Guides:C/C Crash Course/File IO

From CoderGuide


File I/O

After a program quits, all of its data is lost, and you'll have to enter that data again if you want to use it, unless you save it to, and load it from, a disk or other mass storage device. That's where file I/O comes in. There are several facilities for reading, writing, and manipulating files, but we'll only be covering basic buffered file I/O here.

Before you can access a file, you must open a stream for reading, writing, or both. After you are finished with that file, you must close that stream. When you close a file stream, any data still sitting in the buffer is automatically written to disk; you don't have to flush the data as a separate step.

Let's talk about the functions you'll be using, and then write a program that utilizes some of them.

File I/O Functions

FILE *fopen(const char *path, const char *mode)

Opens a file and returns a pointer to that stream if successful; otherwise it returns a null pointer (NULL). path is the path to the file (including the file name), and mode describes the access mode for the file. Remember, Windows and other Microsoft operating systems use the backslash '\' for the directory separator, and inside a C string literal each backslash has to be written as "\\". Unix systems, Mac OS X, and the majority of other operating systems use the forward slash '/' (the slash below the question mark) for the directory separator (URLs on the Internet also use only the forward slash).

Here is a list of values you can use for mode:

Mode string   Access mode
r             Open an existing text file for reading.
r+            Open an existing text file for reading and writing (the file is not created if it is missing).
w+            Create, or overwrite, a text file for reading and writing.
w             Create, or overwrite, a text file for writing.
a             Write to the end of a text file, creating it if it does not exist.
rb            Same as r, except explicitly states it's a binary file.
r+b           Same as r+, except explicitly states it's a binary file.
wb            Same as w, except explicitly states it's a binary file.
ab            Same as a, except explicitly states it's a binary file.
rt            Same as r, perform character translations on some systems.
r+t           Same as r+, perform character translations on some systems.
wt            Same as w, perform character translations on some systems.
at            Same as a, perform character translations on some systems.

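Unix/POSIX systems ignore the 'b' suffix, since no character translations are needed on these systems for text files. The 't' suffix is not part of standard C; it is an extension offered by some C libraries (Microsoft's, for example), so leave it out of code that is meant to be portable.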

When you open a file as a text file, as opposed to a binary file, additional translations occur for that system. In the case of Microsoft Systems, and some very early operating systems, the carriage return character is removed from the input stream on reading, and added to the output stream on writing. Unix systems require no such translations, hence the 'b' option is ignored on Unix systems.
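
The open, use, close pattern can be sketched roughly like this. The helper name write_greeting() and the text it writes are made up for illustration; the point is the "w" mode string, the check for a null pointer, and the check on fclose().

```c
#include <stdio.h>

/* Hypothetical helper: creates (or overwrites) the file at `path` and
   writes one line of text to it. Returns 0 on success, -1 on failure. */
int write_greeting(const char *path)
{
    FILE *fp = fopen(path, "w");      /* "w": create or overwrite a text file */

    if (fp == NULL) {
        perror(path);                 /* report why the open failed */
        return -1;
    }

    if (fputs("Hello, file!\n", fp) == EOF) {
        fclose(fp);
        return -1;
    }

    /* fclose() writes out any buffered data; a non-zero result means
       that final write failed. */
    if (fclose(fp) != 0)
        return -1;
    return 0;
}
```

Calling write_greeting("hello.txt") from main() should leave a one-line text file in the current directory. Swapping "w" for "a" would add a new line on every run instead of overwriting the file.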


Caution: Operating systems restrict which characters you may use in a file name. Unix forbids only the slash '/' and the null character. Microsoft operating systems are stricter: Windows does not allow any of the characters \ / : * ? " < > | in file names, and MS-DOS also reserved the '+' (they wanted to use it for the "copy" command to combine two files together, a feature that was hardly ever used). You shouldn't use "*?\/" characters in file names on any system, even where they are allowed, and you should avoid "$&%!" characters as well.

A note on text files

Text files are a special case. When opening up a text file/stream on a Unix system, no character translations are done. That is because Unix uses the newline character '\n' for the newline terminator. MS-DOS/Windows does not. It uses carriage return/line feed pairs "\r\n". Mac OS prior to version 10 (Mac OS X) used carriage returns to mark the end of a line of text in text files. Mac OS X, being based on BSD Unix, uses newline characters.

When you send a '\n' character to a stream, the stream functions will translate the newline character into the proper line termination character for that operating system. Likewise, when reading in a file, the line termination characters (whether they be singles or pairs) are translated into '\n' characters. Unix, on the other hand, pretty much ignores the distinction, since its text files already use '\n'.

This is all fine and dandy, except when working with MS-DOS/Windows files. You see, fseek seeks to a location in bytes, so, when reading back the line:


This is a text line in MS-DOS\r\n


The '\r' character is not read in. So, instead of reading in the 31 bytes stored on disk, you'd only get 30 characters. But if you try to seek to the end of this line, thinking it's only 30 bytes long, you'd be off by one byte, and instead read in a '\n' character, which happens to mark the end of a line of text.

Since text files are usually not opened for random access (that is, they're read in as a continuous stream from start to finish), this isn't so much of an issue, but it is something to be aware of.

If you want your program to be able to read ASCII text files from all systems, old and new, then you'd be better off reading the file in as binary data and determining which line-ending characters are used in that particular file. Mostly, all you'd need to do is check whether a \n and a \r sit side by side, strip off one of those characters, and then translate all remaining \r characters into \n.
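
Here is one possible sketch of that clean-up step. It assumes the file's contents have already been read into a buffer in binary mode (with fread(), for instance), and the helper name normalize_newlines() is invented for this example.

```c
#include <stddef.h>

/* Hypothetical helper: rewrites buf[0..len) in place so that "\r\n",
   "\n\r" and a lone '\r' each become a single '\n'.
   Returns the new length of the data. */
size_t normalize_newlines(char *buf, size_t len)
{
    size_t in = 0, out = 0;

    while (in < len) {
        char c = buf[in++];

        if (c == '\r' || c == '\n') {
            /* If the next byte is the *other* line-ending character,
               treat the two as one pair and skip the second byte. */
            if (in < len && (buf[in] == '\r' || buf[in] == '\n') && buf[in] != c)
                in++;
            c = '\n';
        }
        buf[out++] = c;
    }
    return out;
}
```

Because the output never grows, the conversion can safely reuse the same buffer. Two identical characters in a row (such as "\n\n") are kept as two line endings, so blank lines survive.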

Although you aren't likely to encounter this, there are other methods for representing text characters. IBM mainframe and midrange systems (such as the S/390 and AS/400) use EBCDIC for character encoding, which is quite a bit different from ASCII. Fortunately, you are unlikely to run into EBCDIC anywhere else.

Another thing to note is that early versions of MS-DOS append ASCII character code 26 (control-Z) to the end of text files. You may find this extra code in some older text files. This is also the termination character used by CP/M to mark the end of a text file (due to the fact that file sizes were stored in blocks/records, and not in bytes).

int fclose(FILE *stream)

Closes a stream you opened with fopen(). Returns 0 if successful, or EOF if an error occurs. Trying to use that stream again after it has been closed, without opening it again with fopen(), is undefined behavior and could have any number of unknown results. In other words: it's a really bad idea to use a stream after you have closed it.

int fgetc(FILE *stream)

Reads a single character/byte from the stream and returns it as an int. Returns EOF at the end of the file or if an error occurs.
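
A typical fgetc() loop might look like the sketch below. The helper name count_char() is hypothetical; the detail worth noticing is that the result is stored in an int, not a char, so that EOF can be told apart from a real byte.

```c
#include <stdio.h>

/* Hypothetical helper: counts how many times the byte `target` occurs
   between the current position and the end of the stream. */
long count_char(FILE *fp, int target)
{
    long count = 0;
    int ch;                 /* int, not char, so EOF can be told apart */

    while ((ch = fgetc(fp)) != EOF) {
        if (ch == target)
            count++;
    }
    return count;
}
```

For example, count_char(fp, '\n') gives a rough line count for a text file.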

char *fgets(char *s, int size, FILE *stream)

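Reads a line of text into s: at most size - 1 characters, stopping after a newline (which is kept) or at the end of the file, and then adds a null terminator. Returns s if successful, or a null pointer if nothing could be read. fgets() was covered in the Standard Input section.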
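
Since fgets() keeps the newline, programs often strip it right after reading. A minimal sketch, using a made-up helper called read_line():

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (at most size - 1
   characters) and strips the trailing newline, if there is one.
   Returns 1 if a line was read, 0 at end of file or on error. */
int read_line(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* cut the string at the first '\n' */
    return 1;
}
```

Calling read_line() in a while loop walks through a text file one line at a time. A line longer than the buffer simply comes back in several pieces.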

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream)

Reads in up to nmemb records, each size bytes long, from the stream stream and stores the data into ptr. A void pointer is a pointer of no type. This way, the fread() function can read in data of any type, whether it be an integer, a character string, a structure, whatever. The return value is the number of records actually read in (not the number of bytes), which is less than nmemb only at the end of the file or if an error occurs.

size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream)

fwrite() is similar to fread(), but writes data to a stream rather than reading it. It returns the number of records actually written.
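
As a rough illustration of how the two fit together, here is a sketch that saves and loads an array of structures. The struct point type and both helper names are invented for this example, and the stream is assumed to be open in binary mode.

```c
#include <stdio.h>

/* Hypothetical record type used only for this illustration. */
struct point { int x; int y; };

/* Writes n records to fp; returns the number of records actually written. */
size_t save_points(FILE *fp, const struct point *pts, size_t n)
{
    return fwrite(pts, sizeof pts[0], n, fp);
}

/* Reads up to max records from fp; returns the number actually read. */
size_t load_points(FILE *fp, struct point *pts, size_t max)
{
    return fread(pts, sizeof pts[0], max, fp);
}
```

Comparing the return value against the count you asked for is how you notice a short read or a failed write. A file written this way is only reliable when read back by the same program on the same kind of machine.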

int fputc(int c, FILE *stream)

Writes a single character/byte to a stream. Returns the character written, or EOF if an error occurs.
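
Paired with fgetc(), fputc() is enough to copy one stream to another. The helper below, copy_stream(), is a hypothetical sketch rather than the fastest way to do it.

```c
#include <stdio.h>

/* Hypothetical helper: copies everything from `in` to `out`, one byte
   at a time. Returns the number of bytes copied, or -1 if a read or
   write error occurs. */
long copy_stream(FILE *in, FILE *out)
{
    long copied = 0;
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        if (fputc(ch, out) == EOF)
            return -1;
        copied++;
    }
    return ferror(in) ? -1 : copied;
}
```

Both streams are buffered, so copying a byte at a time is not as slow as it looks, although fread() and fwrite() with a larger buffer would normally be quicker.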

int fputs(const char *s, FILE *stream)

Writes a string to a file-- everything up to, but not including, the null terminator. Unlike puts(), it does not add a newline.

int fprintf(FILE *stream, const char *format, ...)

fprintf() is printf() for streams! It acts just like printf(), but allows you to send text to any stream, which could be a file.

Using fprintf() like this would do the same thing as printf():

     fprintf(stdout, "Hello %s! How's the world?\n", name);

Sometimes you may see a program send output to stderr instead. This is normally for error messages, which are usually unaffected when standard output is redirected (take a look at Shell Basics for more on that).

     fprintf(stderr,"Yo dude, the thingy is spitting out random bits. I'm confused.\n");
     fprintf(stderr,"oh man.. it's too much dude... I'm going to eat the hard drive if I don't go now\n");
     exit(1);

But, you could as easily write to a file instead of the screen using fprintf().
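
A small sketch of that idea: the hypothetical helper write_score() below takes the destination stream as a parameter, so the same code can print to the screen or to a file.

```c
#include <stdio.h>

/* Hypothetical helper: writes "name: score" as one line of text to any
   open stream. Returns the number of characters written, or a negative
   value on error (the same convention fprintf() itself uses). */
int write_score(FILE *out, const char *name, int score)
{
    return fprintf(out, "%s: %d\n", name, score);
}
```

Passing stdout as the first argument prints to the screen; passing a stream returned by fopen() writes the same line to that file.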

void rewind(FILE *stream)

Moves the read/write position of a file back to the beginning of the stream.

long ftell(FILE *stream)

Returns the current position in the open stream, or -1L if an error occurs.

int fseek(FILE *stream, long offset, int whence)

Moves the file position indicator to some other location in the stream, based on offset and whence. Here is a table of valid values for whence and how it affects fseek():

Whence     Effect
SEEK_SET   Moves the file position by offset relative to the beginning of the file
SEEK_CUR   Moves the file position by offset relative to the current position in the file
SEEK_END   Moves the file position by offset relative to the end of the file

So, if you use the function like this:

fseek(fp,0,SEEK_END);

That would take you to the very end of the file. And if you used fseek() like this:

fseek(fp,0,SEEK_SET);

It would move to the start of the file like rewind().
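
Combining fseek() with ftell() gives a common way to find out how big a file is. The sketch below uses a made-up helper name, stream_size(), and assumes the stream was opened in binary mode on a system where seeking to SEEK_END is supported (which is the usual case for disk files).

```c
#include <stdio.h>

/* Hypothetical helper: returns the size of an open binary stream in
   bytes, or -1 on error. The original file position is restored
   before returning. */
long stream_size(FILE *fp)
{
    long start = ftell(fp);           /* remember where we were */
    long size;

    if (start == -1L || fseek(fp, 0, SEEK_END) != 0)
        return -1;
    size = ftell(fp);                 /* position at the end == size */
    if (fseek(fp, start, SEEK_SET) != 0)
        return -1;
    return size;
}
```

On a text stream under MS-DOS/Windows the number reported this way may not match the number of characters you can actually read, for the reasons given in the note on text files.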

int feof(FILE *stream)

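Returns a zero value (false) if the end of the file has not yet been reached. Returns a non-zero value (true) if it has. The EOF flag is only set after a read has actually tried to go past the end of the file, so check the result of the read itself rather than calling feof() first. Once set, the flag stays set until you call clearerr() or successfully reposition the stream with rewind() or fseek().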

void clearerr(FILE *stream)

Clears the error flag and the EOF flag for the stream.
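
To see how the EOF flag behaves in practice, consider this small sketch. The helper name drain() is hypothetical; it reads to the end of a stream and then uses feof() to report why the reading stopped.

```c
#include <stdio.h>

/* Hypothetical helper: reads and discards the rest of the stream, then
   reports whether the loop ended at end-of-file (1) or because of a
   read error (0). */
int drain(FILE *fp)
{
    /* Test the result of the read itself; feof() only becomes true
       after a read has already run into the end of the file. */
    while (fgetc(fp) != EOF)
        ;                             /* discard the byte */
    return feof(fp) != 0;
}
```

Before drain() is called, feof(fp) is still false, even on an empty file. Afterwards it is true, and it goes back to false once clearerr(fp) is called.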

int fflush(FILE *stream)

Writes all buffered data of a stream, or forces an update of the stream. For files, this forces a write to disk of all data still in the buffer and not yet written to disk. For stdout, and stderr, it forces a write to the screen.
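
One place fflush() earns its keep is a log file that should stay up to date while the program is still running. A minimal sketch, with an invented helper called log_line():

```c
#include <stdio.h>

/* Hypothetical helper: writes one message plus a newline to an open log
   stream and flushes it right away, so the text is handed to the
   operating system even if the program crashes before fclose().
   Returns 0 on success, -1 on failure. */
int log_line(FILE *log, const char *msg)
{
    if (fputs(msg, log) == EOF || fputc('\n', log) == EOF)
        return -1;
    return fflush(log) == 0 ? 0 : -1;
}
```

Flushing after every line costs some speed, so it is a trade-off: fine for an occasional log message, wasteful inside a loop that writes thousands of records.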

sizeof

sizeof is not a function; it is an operator (and a keyword) in the C/C++ programming languages that yields the size, in bytes, of a particular data type or variable. Knowing how much space a particular variable or data type takes up is very useful, and is required in order to dynamically allocate memory for that type, and to save/load the data to the disk (or other storage device).

It's pretty easy to use:

#include <stdio.h>

int main(void){

	int inta, intb[10];
	long a,b[10];
	char ch, s[80];

	char *sp;

	/* sizeof yields a size_t, which is printed with %zu. */
	/* long long and %zu both need a C99 compliant compiler. */
	printf("The size of a long long is %zu\n",sizeof (long long));

	printf("The size of inta is %zu\n",sizeof inta);
	printf("The size of intb[10] is %zu\n",sizeof intb);
	printf("The size of a is %zu\n",sizeof(a) );
	printf("The size of b[10] is %zu\n",sizeof b);
	printf("The size of a long is %zu\n",sizeof (long) );
	printf("The size of ch is %zu\n",sizeof (ch) );
	printf("The size of s[80] is %zu\n",sizeof s);
	/* sizeof (sp) is the size of the pointer itself, not of what it points to */
	printf("The size of *sp is %zu\n",sizeof (sp) );
	sp=s;
	printf("The size of *sp is still is %zu\n", sizeof sp);
	printf("The size of a char pointer is %zu\n",sizeof (char *) );
	printf("The size of int pointer is %zu\n",sizeof (int *) );
	printf("The size of a void/typeless pointer is %zu\n",
	   sizeof (void*) );
	printf("The size of a float is %zu\n",sizeof (float));
	printf("The size of a double is %zu\n",sizeof (double));
	printf("The size of a long double is %zu\n",
	   sizeof (long double) );

	return 0;
}

When compiled with GNU C for 32-bit Linux, I get:

The size of a long long is 8
The size of inta is 4
The size of intb[10] is 40
The size of a is 4
The size of b[10] is 40
The size of a long is 4
The size of ch is 1
The size of s[80] is 80
The size of *sp is 4
The size of *sp is still is 4
The size of a char pointer is 4
The size of int pointer is 4
The size of a void/typeless pointer is 4
The size of a float is 4
The size of a double is 8
The size of a long double is 12


When compiled with GNU C for 64-bit Linux, I get:

The size of a long long is 8
The size of inta is 4
The size of intb[10] is 40
The size of a is 8
The size of b[10] is 80
The size of a long is 8
The size of ch is 1
The size of s[80] is 80
The size of *sp is 8
The size of *sp is still is 8
The size of a char pointer is 8
The size of int pointer is 8
The size of a void/typeless pointer is 8
The size of a float is 4
The size of a double is 8
The size of a long double is 16

On a Mac Mini running Mac OS X 10.3 and a PowerPC chip with Xcode tools (GNU C/C++ compiler):

The size of inta is 4
The size of intb[10] is 40
The size of a is 4
The size of b[10] is 40
The size of a long is 4
The size of ch is 1
The size of s[80] is 80
The size of *sp is 4
The size of *sp is still is 4
The size of a char pointer is 4
The size of int pointer is 4
The size of a void/typeless pointer is 4
The size of a float is 4
The size of a double is 8
The size of a long double is 8

And finally, for MS-DOS, compiled using the small memory model, for the 8086/8088 series of CPUs:

The size of inta is 2
The size of intb[10] is 20
The size of a is 4
The size of b[10] is 40
The size of a long is 4
The size of ch is 1
The size of s[80] is 80
The size of *sp is 2
The size of *sp is still is 2
The size of a char pointer is 2
The size of int pointer is 2
The size of a void/typeless pointer is 2
The size of a float is 4
The size of a double is 8
The size of a long double is 10

As you can see, the sizes of these data types vary widely. On top of that, Intel/PC and PowerPC chips store the bytes of an integer in opposite orders. That means, if you save an integer to a file on a PC and try to read it back with the same program on a PowerPC-based Mac, the bytes will be in a different order, so a different integer will be read in. There is a way to fix this for integers, which we'll cover in more advanced sections, as it deals with shuffling the actual bits. It's not so easy with floating point numbers, since floating point numbers could be handled in any number of ways, so it's better to save them as text strings.
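
As a small preview of the integer fix, one common approach is to pick a byte order for the file and write the bytes out one at a time. The two helpers below are hypothetical names for this sketch, which assumes 32-bit values stored least significant byte first.

```c
#include <stdio.h>

/* Writes the low 32 bits of v to fp, least significant byte first.
   Returns 0 on success, -1 on failure. */
int write_u32_le(FILE *fp, unsigned long v)
{
    int i;

    for (i = 0; i < 4; i++) {
        if (fputc((int)((v >> (8 * i)) & 0xFF), fp) == EOF)
            return -1;
    }
    return 0;
}

/* Reads four bytes written by write_u32_le() back into *v.
   Returns 0 on success, -1 on end of file or error. */
int read_u32_le(FILE *fp, unsigned long *v)
{
    int i, ch;

    *v = 0;
    for (i = 0; i < 4; i++) {
        if ((ch = fgetc(fp)) == EOF)
            return -1;
        *v |= (unsigned long)ch << (8 * i);
    }
    return 0;
}
```

Because the shifts work on the value rather than on memory, the file comes out the same no matter which byte order the processor itself uses.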

A char is always exactly one byte, which is 8 bits on all modern mainstream computers. ASCII defines 128 characters (codes 0 through 127), and all modern computers use the same codes for them.

Next...

Great! Now it's time to add file read/write ability to phonebook.c! Let's move on to Adding write capability to phonebook.c
