String Manipulation in C
Recall that there are two types of strings in the C programming language. String variables can be modified, but string literals cannot. This article will discuss how to safely manipulate string variables. <!--more-->
String Basics
String variables are an array of characters in 'C'. Strings are always null terminated, so the array needs space for the string and the null character (\0
).
/* There are two ways to declare a string variable */
char mystr[4] = "bye"; // '\0' is appended automatically to the end
char mystr2[4] = {'b', 'y', 'e', '\0'};
We can change characters in a string by modifying the array elements:
mystr[0] = 'e'; // mystr is now "eye" instead of "bye"
Identifying a Common Mistake
Suppose we wanted to copy the contents of a string. Here is what beginners often write:
char mystr[6] = "hello";
char newstr[6];
newstr = mystr; // THIS IS INCORRECT.
Notice that newstr
is an array name. It is illegal to use an array name on the left side of the assignment operator (=
). The only exception to this rule is when we initialize a string at the same time as we declare it. So char mystr[6] = "hello"
is acceptable, but newstr = mystr
is not.
Luckily, there is a string function in 'C' that can help us to copy strings.
Introducing the String Library
The string library has functions that can help us manipulate strings. You'll need to add #include <string.h>
at the beginning of your program to access these functions.
Let's continue to solve the problem of copying the contents of a string. There are two functions that may be able to help us: strcpy
and strncpy
. We will look at strcpy
first.
#include <string.h> // for strcpy()
#include <stdio.h> // for printf()
int main(){
char mystr[6] = "hello";
char newstr[6];
/* copy the contents of 'mystr' into 'newstr',
including the null char */
strcpy(newstr, mystr);
printf("Now 'newstr' contains %s \n", newstr);
}
On my machine, the result is:
Now 'newstr' contains hello
What would happen if I made the following change to my program:
#include <string.h>
#include <stdio.h>
int main(){
char mystr[6] = "hello";
char newstr[3]; // CHANGE: newstr only has space for 3 chars
strcpy(newstr, mystr); // THIS IS UNSAFE!
printf("Now 'newstr' contains %s \n", newstr);
}
We have just done something no programmer should ever do. We copied six characters ("hello"
plus '\0'
) into an array that only had space for three characters. This exceeds array bounds and causes a major vulnerability called buffer overflow.
But when I run this program on my machine, I get the same output as before:
Now 'newstr' contains hello
So what happened? We might think that our program is okay because it runs as expected. This is a terrible assumption because our program has a big flaw and 'C' does not warn us.
Safe Functions in the String Library
Let's consider strncpy
: a "safe" alternative to strcpy
. This is because strncpy
does not exceed array bounds when used properly. Let's edit the example from before to use the n
family of string functions:
#include <string.h>
int main(){
char mystr[6] = "hello";
char newstr[3];
// Only copy the first two letters of 'mystr'
strncpy(newstr, mystr, 2);
newstr[2] = '\0';
/* strncpy does NOT add a terminating null
so we must do it ourselves */
printf("Now 'newstr' contains %s \n", newstr);
}
On my machine, the output is:
Now 'newstr' contains he
Notice that we only copied two characters into newstr
so that we could leave room for the null character ('\0'
). We did not exceed array bounds this time.
Applying Your Knowledge to other Functions
The string library has other functions that allow us to manipulate strings. Some of the common ones are:
strchr
(search a string for a character)strcmp
(compare two strings together)strncat
(addn
characters from one string to another string)
Additional Resources
For a more exhaustive list of string functions available in 'C', see the Linux Programmer's Manual.
For more details on the history of the C programming language you can read this article.
Peer Review Contributions by: Nadiv Gold Edelstein