Guides:C/C Crash Course/Characters and Strings

From CoderGuide

Characters and Strings

Traditionally, a single text character occupies one byte of memory (8 bits). In C, the char data type is one byte in size (Java differs: its char is 16 bits wide, and it has a separate byte type for 8-bit data). This matters because working with some files, such as graphics, sound, and compressed files (gzip, zip, bzip2, etc.), relies on being able to access their individual bytes. Many compilers also provide a separate, larger data type for "wide" or Unicode characters (standard C calls it wchar_t, and it is often 16 or 32 bits wide), but char itself stays one byte in size. We can assign a value to a char either by giving it a number (as long as it fits in 8 bits; otherwise the high bits will typically be lost), or by giving it a character in single quotes, such as 'a' (the single quotation mark shares a key with the double quotation mark on the standard US QWERTY keyboard layout). There will be examples later.

Some people have trouble with the concept that the computer doesn't know whether the data stored in a char is actually a character or just data. In fact, to the CPU, it's all just data. It is up to the program to decide how to use and represent that data, whether as a number, a character, or a pixel on the screen. Just something to keep in mind. On most modern computers, printable text is encoded using ASCII for the lower 128 values (0-127), which cover the basic characters and control codes. The upper 128 values (128-255) are determined by whatever character encoding the system is using, and these are called "extended" or "high-bit" characters. You can find the ASCII table here.

Now a text string in C is just an array of characters terminated by a zero (null). This zero indicates the end of the printable string. This means that if you want to have a string of 80 characters, you need to add one more character to allow for the null terminator (the zero at the end), so it would have to be a full 81-element array. If your 81-element array is only storing a string of 8 characters, then the null terminator would be in the 9th element of that array (but the entire 81 elements would still occupy memory). C++ has a way around this with its string class, though it can be a bit slower (as can Java's String class). You can give a character array the value of a string by enclosing the entire string in double quotes, such as: "Hello World." But you should only do this to initialize the array, not after you have created it. A later assignment would not copy the characters into the array: C will not compile an assignment to an array at all, and with a char pointer the assignment just makes the pointer refer to different memory instead of copying anything (more on this when we talk about pointers). To put new text into an existing array, use a copying function such as strcpy().

Now for a few examples:

#include <stdio.h> /*Include the standard IO header file*/
int main(void){ /*Define function main(), the first function a
                  program runs, and the program's entry point.*/
        char a; /* a single character */
        char s[30]="Hello world!"; /*an array of characters */
        a=120; /*Assign 120 to a, which also is the value of the
                 ASCII character x */
        a='A'; /*Assign the character value 'A' to a (65 in
                 decimal, 101 in octal).*/
        s[2]=0; /*Assign the value of 0 to the third element of the
                  character array s.  Now the string will read "He"
                  because the zero in the third element indicates the
                  end of the string (The array is still 30 elements
                  in length).*/
        printf("%s\n",s); /*And now we'll prove it! The "%s" tells
                            printf to print a string and that the
                            next parameter is a string variable.
                            This will be explained in more detail
                            later.*/
        return 0; /*Report success to the operating system*/
} /*the end of function main*/

We will cover the printf() function in more detail later.
