If you've read the previous chapters and have executed some of the programs, then you already know that a file is a series of bytes stored on a disk instead of inside the computer's memory. A file is good for long-term storage of information. Information in the computer's memory is lost when the computer is turned off. Information on a disk, however, is persistent. It will be there when the computer is turned back on.
Back in Chapter 1, "Getting Your Feet Wet," you saw how to create a file using the edit program that comes with Windows 95 and Windows NT. In this chapter you'll see how to manipulate files with Perl.
There are four basic operations that you can do with files. You can open them, read from them, write to them, and close them. Opening a file creates a connection between your program and the location on the disk where the file is stored. Closing a file shuts down that connection.
Every file has a unique fully qualified name so that it can't be confused with other files. The fully qualified name includes the name of the disk, the directory, and the file name. Files in different directories can have the same name because the operating system considers the directory name to be a part of the file name. Here are some fully qualified file names:
c:/windows/win95.txt
c:/windows/command/scandisk.ini
c:/a_long_directory_name/a_long_subdirectory_name/a_long_file_name.doc
Caution |
You may be curious to know if spaces can be used inside
filenames. Yes, they can. But, if you use spaces you need to surround the
file name with quotes when referring to it from a DOS or UNIX command
line. |
Note |
It is very important that you check for errors when
dealing with files. To simplify the examples in this chapter, little error
checking will be used. Instead, error checking
will be discussed in Chapter 13, "Handling Errors and
Signals." |
Name | Description |
---|---|
STDIN | Reads program input. Typically this is the computer's keyboard. |
STDOUT | Displays program output. This is usually the computer's monitor. |
STDERR | Displays program errors. Most of the time, it is equivalent to STDOUT, which means the error messages will be displayed on the computer's monitor. |
You've been using the STDOUT file handle without knowing it for every print() statement in this book. The print() function uses STDOUT as the default if no other file handle is specified. Later in this chapter, in the "Example: Printing Revisited" section, you will see how to send output to a file instead of to the monitor.
Listing 9.1-09LST01.PL - Read from Standard Input Until an End-of-file Character Is Found |
|
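The body of this listing is not shown here. Judging from the discussion that follows and from the program output printed later in this chapter, it is probably close to the three-line sketch below; treat it as a reconstruction rather than the original listing.

```perl
# Copy standard input to standard output, one line at a time.
# The loop ends when <STDIN> returns the undefined value at end-of-file
# (Ctrl+Z on DOS and Windows, Ctrl+D on UNIX).
while (<STDIN>) {
    print();
}
```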
The <> characters, when used together, are called the diamond operator. It tells Perl to read a line of input from the file handle named inside it, in this case STDIN. Later, you'll use the diamond operator to read from other file handles.
In this example, the line read by the diamond operator was assigned to $_. (Perl makes this assignment automatically when the diamond operator is the only thing in the condition of a while loop.) Then, the print() function was called with no parameters, which tells print() to use $_ as the default parameter. Using the $_ variable can save a lot of typing, but I'll let you decide which is more readable. Here is the same program without using $_.
while ($inputLine = <STDIN>) {
    print($inputLine);
}
When you pressed Ctrl+Z or Ctrl+D, you told Perl
that the input was finished. This caused the diamond operator to return the
undefined value, which Perl treats as false, and the while loop
ended. In DOS (and therefore in all of the flavors of Windows), 26 - the value
of Ctrl+Z - is considered to be the end-of-file indicator. When DOS reads
a file in text mode, it monitors the data stream and stops reading as soon as a value of 26 is
encountered. UNIX works differently. Ctrl+D (the value 4) is interpreted by the
terminal driver, which signals end-of-file to the program when you type it at the
start of a line. A byte with the value 4 that is stored in a disk file is read as
ordinary data.
Tip |
When a file is read using the diamond operator, the
newline character that ends the line is kept as part of the input string.
Frequently, you'll see the chop() function used to remove the
newline. For instance, chop($inputLine = <INPUT_FILE>);.
This statement reads a line from the input file, assigns its value to
$inputLine, and then removes the last character from
$inputLine - which is almost guaranteed to be a newline
character. If you fear that the last character is not a newline, use the
chomp() function instead; it removes the last character only if it is a newline. |
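The difference between chop() and chomp() is easiest to see on strings that do and do not end in a newline. The following is a small sketch; the variable names and sample strings are made up for the illustration.

```perl
# chop() always removes the last character, whatever it is.
$withNewline    = "Jan Jaspree\n";
$withoutNewline = "Jan Jaspree";
chop($withNewline);       # removes the newline
chop($withoutNewline);    # removes the final "e"

# chomp() removes the last character only if it is a newline
# (more precisely, the current input record separator).
$safeA = "Kelly Horton\n";
$safeB = "Kelly Horton";
chomp($safeA);            # removes the newline
chomp($safeB);            # leaves the string unchanged

print("chop:  '$withNewline' and '$withoutNewline'\n");
print("chomp: '$safeA' and '$safeB'\n");
```

When the input might lack a final newline, as the last line of a file sometimes does, chomp() is the safer choice.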
Listing 9.1 can be run with a command line like this:
perl -w 09lst01.pl
In the previous example, Perl read the keyboard
to get the standard input. But if there were a way to tell Perl to use the file
09LST01.PL as the standard input, you could have the program print
itself. Pretty neat, huh? Well, it turns out that you can change the standard
input. It's done this way:
perl -w 09lst01.pl < 09lst01.pl
The < character is
used to redirect the standard input to the 09LST01.PL file. You
now have a program that duplicates the functionality of the DOS type command.
And it only took three lines of Perl code!
You can redirect standard output to a file using the > character. So if you wanted a copy of 09LST01.PL to be sent to OUTPUT.LOG you could use this command line:
perl -w 09lst01.pl <09lst01.pl >output.log
Keep this use of
the < and > characters in mind. You'll be using them
again shortly when we talk about the open() function. The <
character will signify that files should be opened for input and the
> will be used to signify an output file. But first, let's continue
talking about accessing files listed on the command line.
Listing 9.2-09LST02.PL - Read from Multiple Files or from STDIN |
|
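The body of this listing is not shown here. Based on the program output printed just below, it is probably the same loop as Listing 9.1 with the file handle left out of the diamond operator; the sketch below is a reconstruction on that assumption.

```perl
# With an empty diamond operator, Perl reads each file named on the
# command line in turn, or STDIN if no file names were given.
while (<>) {
    print();
}
```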
The command line to run the program might look like this:
perl -w 09lst02.pl 09lst01.pl 09lst02.pl
And the output would be:
while (<STDIN>) {
print();
}
while (<>) {
print();
}
Perl will create the @ARGV array from the command line. Each
file name on the command line - after the program name - will be added to the
@ARGV array as an element. When the program runs, the diamond operator
starts reading from the file named in the first element of the array. When that
entire file has been read, the next file is read, and so on, until all of
the elements have been used. When the last file has been finished, the
while loop will end.
Using the diamond operator to iterate over a list of filenames is very handy. You can use it in the middle of your program by explicitly assigning a list of filenames to the @ARGV array. Listing 9.3 shows what this might look like in a program.
Listing 9.3-09LST03.PL - Read from Multiple Files Using the @ARGV Array |
|
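The body of this listing is not shown here. A minimal sketch of the technique, assuming the same two file names that were used on the command line for Listing 9.2, might look like this:

```perl
# Assign the file names directly, as if they had been typed on the
# command line. The two names are the ones used earlier in the chapter.
@ARGV = ("09lst01.pl", "09lst02.pl");

# The empty diamond operator now reads each file in @ARGV in turn,
# removing each name from the array as it opens the file.
while (<>) {
    print();
}
```

If one of the files does not exist, Perl prints a warning and moves on to the next name.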
This program displays:
while (<STDIN>) {
print();
}
while (<>) {
print();
}
Next, we will take a look at the ways that Perl lets you test files,
and following that, the functions that can be used with files.
Operator | Description |
---|---|
-A OPERAND | Returns the access age of OPERAND when the program started. |
-b OPERAND | Tests if OPERAND is a block device. |
-B OPERAND | Tests if OPERAND is a binary file. If OPERAND is a file handle, then the current buffer is examined, instead of the file itself. |
-c OPERAND | Tests if OPERAND is a character device. |
-C OPERAND | Returns the inode change age of OPERAND when the program started. |
-d OPERAND | Tests if OPERAND is a directory. |
-e OPERAND | Tests if OPERAND exists. |
-f OPERAND | Tests if OPERAND is a regular file as opposed to a directory, symbolic link, or other type of file. |
-g OPERAND | Tests if OPERAND has the setgid bit set. |
-k OPERAND | Tests if OPERAND has the sticky bit set. |
-l OPERAND | Tests if OPERAND is a symbolic link. Under DOS, this operator will always return false. |
-M OPERAND | Returns the age of OPERAND in days when the program started. |
-o OPERAND | Tests if OPERAND is owned by the effective uid. Under DOS, it always returns true. |
-O OPERAND | Tests if OPERAND is owned by the real uid. Under DOS, it always returns true. |
-p OPERAND | Tests if OPERAND is a named pipe. |
-r OPERAND | Tests if OPERAND can be read from. |
-R OPERAND | Tests if OPERAND can be read from by the real uid/gid. Under DOS, it is identical to -r. |
-s OPERAND | Returns the size of OPERAND in bytes. Therefore, it returns true if the size of OPERAND is non-zero. |
-S OPERAND | Tests if OPERAND is a socket. |
-t OPERAND | Tests if OPERAND is opened to a tty. |
-T OPERAND | Tests if OPERAND is a text file. If OPERAND is a file handle, then the current buffer is examined, instead of the file itself. |
-u OPERAND | Tests if OPERAND has the setuid bit set. |
-w OPERAND | Tests if OPERAND can be written to. |
-W OPERAND | Tests if OPERAND can be written to by the real uid/gid. Under DOS, it is identical to -w. |
-x OPERAND | Tests if OPERAND can be executed. |
-X OPERAND | Tests if OPERAND can be executed by the real uid/gid. Under DOS, it is identical to -x. |
-z OPERAND | Tests if OPERAND size is zero. |
Note |
If the OPERAND is not specified in the file
test, the $_ variable will be used
instead. The one exception is -t, which tests STDIN. |
The operand used by the file tests can be either a file handle or a file name. The file tests work by internally calling the operating system to determine information about the file in question. The operators will evaluate to true if the test succeeds and false if it does not.
If you need to perform two or more tests on the same file, you use the special underscore (_) file handle. This tells Perl to use the file information for the last system query and saves time. However, the underscore file handle does have some caveats. It does not work with the -t operator. In addition, the lstat() function and -l test will leave the system buffer filled with information about a symbolic link, not a real file.
The -T and -B file tests will examine the first block or so of the file. If more than 10% of the bytes are unusual characters, such as control codes or bytes with the high bit set, or if a null byte is encountered, then the file is considered a binary file. (Later Perl releases raised this threshold to about a third.) Binary files are normally data files, as opposed to text or human-readable files. If you need to work with binary files, be sure to use the binmode() file function, which is described in the section called "Example: Binary Files" later in this chapter.
Pseudocode |
Start a foreach loop that looks at the command line array. Each element in the array is assigned to the default loop variable $_. Print the file name contained in $_. Print a message indicating the type of file by checking the evaluation of the -f operator. |
Listing 9.4-09LST04.PL - Using the -f Operator to Find Regular Files Inside a foreach Loop |
|
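The body of this listing is not shown here. The pseudocode above can be sketched as follows; the output format is chosen to match the program output shown below.

```perl
foreach (@ARGV) {
    print;                                        # print the file name in $_
    print((-f) ? " -REGULAR\n" : " -SPECIAL\n");  # -f tests $_ by default
}
```

The parentheses around -f keep Perl from reading the question mark as part of the operand.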
When this program is run using the following command line:
perl -w 09lst04.pl 09lst01.pl \perl5 perl.exe \windows
The
following is displayed:
09lst01.pl -REGULAR
\perl5 -SPECIAL
perl.exe -REGULAR
\windows -SPECIAL
Each of the directories listed on the command line
was recognized as a special file. If you want to ignore all special files on the
command line, you can do so like this:
Pseudocode |
Start a foreach loop that looks at the command line array. If the current file is special, then skip it and go on to the next iteration of the foreach loop. Print the current file name that is contained in $_. Print a message indicating the type of file. |
Listing 9.5-09LST05.PL - Using the -f Operator to Ignore Special Files Inside a foreach Loop |
|
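The body of this listing is not shown here. A minimal sketch that follows the pseudocode, with a message worded to match the output shown below, might be:

```perl
foreach (@ARGV) {
    next unless -f;      # skip directories and other special files
    print("$_ is a Regular file.\n");
}
```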
When this program is run using the following command line:
perl -w 09lst05.pl 09lst01.pl \perl5 perl.exe \windows
The
following is displayed:
09lst01.pl is a Regular file.
perl.exe is a Regular file.
Notice that only the regular file names are
displayed. The two directories on the command line were ignored.
As mentioned above, you can use the underscore file handle to make two tests in a row on the same file so that your program can execute faster and use fewer system resources. This could be important if your application is time critical or makes many repeated tests on a large number of files.
Pseudocode |
Start a foreach loop that looks at the command line array. If the current file is special, then skip it and go on to the next iteration of the foreach loop. Determine the number of bytes in the file with the -s operator, applied to the underscore file handle so that a second operating system call is not needed. Print a message indicating the name and size of the file. |
Listing 9.6-09LST06.PL - Finding the Size in Bytes of Regular Files Listed on the Command Line |
|
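The body of this listing is not shown here. The pseudocode above can be sketched as follows; $fileSize is a variable name chosen for the illustration.

```perl
foreach (@ARGV) {
    next unless -f;      # this test asks the operating system about $_
    $fileSize = -s _;    # reuse that answer; no second system call
    print("$_ is $fileSize bytes long.\n");
}
```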
When this program is run using the following command line:
perl -w 09lst06.pl \perl5 09lst01.pl \windows perl.exe
The
following is displayed:
09lst01.pl is 36 bytes long.
perl.exe is 61952 bytes long.
Tip |
Don't get the underscore file handle confused with
the $_ special variable. The underscore file handle tells Perl to
use the file information from the last system call and the $_
variable is used as the default parameter for a variety of
functions. |
Function | Description |
---|---|
binmode(FILE_HANDLE) | This DOS-based function puts FILE_HANDLE into a binary mode. For more information, see the section "Example: Binary Files" later in this chapter. |
chdir(DIR_NAME) | Causes your program to use DIR_NAME as the current directory. It will return true if the change was successful, false if not. |
chmod(MODE, FILE_LIST) | This UNIX-based function changes the permissions for a list of files. A count of the number of files whose permissions were changed is returned. There is no DOS equivalent for this function. |
chown(UID, GID, FILE_LIST) | This UNIX-based function changes the owner and group for a list of files. A count of the number of files whose ownership was changed is returned. There is no DOS equivalent for this function. |
close(FILE_HANDLE) | Closes the connection between your program and the file opened with FILE_HANDLE. |
closedir(DIR_HANDLE) | Closes the connection between your program and the directory opened with DIR_HANDLE. |
eof(FILE_HANDLE) | Returns true if the next read on FILE_HANDLE will result in hitting the end of the file or if the file is not open. If FILE_HANDLE is not specified the status of the last file read is returned. All input functions return the undefined value when the end of file is reached, so you'll almost never need to use eof(). |
fcntl(FILE_HANDLE, FUNCTION, SCALAR) | Implements the fcntl() function which lets you perform various file control operations. Its use is beyond the scope of this book. |
fileno(FILE_HANDLE) | Returns the file descriptor for the specified FILE_HANDLE. |
flock(FILE_HANDLE, OPERATION) | This UNIX-based function will place a lock on a file so that multiple users or programs can't simultaneously use it. There is no DOS equivalent for this function. The flock() function is beyond the scope of this book. |
getc(FILE_HANDLE) | Reads the next character from FILE_HANDLE. If FILE_HANDLE is not specified, a character will be read from STDIN. |
glob(EXPRESSION) | Returns a list of files that match the specification of EXPRESSION, which can contain wildcards. For instance, glob("*.pl") will return a list of all Perl program files in the current directory. |
ioctl(FILE_HANDLE, FUNCTION, SCALAR) | Implements the ioctl() function which lets you perform various file control operations. Its use is beyond the scope of this book. For more in-depth discussion of this function see Que's Special Edition Using Perl for Web Programming. |
link(OLD_FILE_NAME, NEW_FILE_NAME) | This UNIX-based function creates a new filename that is linked to the old filename. It returns true for success and false for failure. There is no DOS equivalent for this function. |
lstat(FILE_HANDLE_OR_FILE_NAME) | Returns file statistics in a 13-element array. lstat() is identical to stat() except that it can also return information about symbolic links. See the section "Example: Getting File Statistics" for more information. |
mkdir(DIR_NAME, MODE) | Creates a directory named DIR_NAME. If you try to create a sub-directory, the parent must already exist. This function returns false if the directory can't be created. The special variable $! is assigned the error message. |
open(FILE_HANDLE, EXPRESSION) | Creates a link between FILE_HANDLE and a file specified by EXPRESSION. See the section "Example: Opening Files" for more information. |
opendir(DIR_HANDLE, DIR_NAME) | Creates a link between DIR_HANDLE and the directory specified by DIR_NAME. opendir() returns true if successful, false otherwise. |
pipe(READ_HANDLE, WRITE_HANDLE) | Opens a pair of connected pipes like the corresponding system call. Its use is beyond the scope of this book. For more on this function see Que's Special Edition Using Perl for Web Programming. |
print FILE_HANDLE (LIST) | Sends a list of strings to FILE_HANDLE. If FILE_HANDLE is not specified, then STDOUT is used. See the section "Example: Printing Revisited" for more information. |
printf FILE_HANDLE (FORMAT, LIST) | Sends a list of strings in a format specified by FORMAT to FILE_HANDLE. If FILE_HANDLE is not specified, then STDOUT is used. See the section "Example: Printing Revisited" for more information. |
read(FILE_HANDLE, BUFFER, LENGTH, OFFSET) | Reads up to LENGTH bytes from FILE_HANDLE into the scalar variable called BUFFER. The optional OFFSET is a position within BUFFER (not within the file) at which the data is placed. It returns the number of bytes read, 0 at the end of the file, or the undefined value if an error occurs. |
readdir(DIR_HANDLE) | Returns the next directory entry from DIR_HANDLE when used in a scalar context. If used in an array context, all of the file entries in DIR_HANDLE will be returned in a list. If there are no more entries to return, the undefined value or a null list will be returned depending on the context. |
readlink(EXPRESSION) | This UNIX-based function returns the value of a symbolic link. If an error occurs, the undefined value is returned and the special variable $! is assigned the error message. The $_ special variable is used if EXPRESSION is not specified. |
rename(OLD_FILE_NAME, NEW_FILE_NAME) | Changes the name of a file. You can use this function to change the directory where a file resides but not the disk drive or volume. |
rewinddir(DIR_HANDLE) | Resets DIR_HANDLE so that the next readdir() starts at the beginning of the directory. |
rmdir(DIR_NAME) | Deletes an empty directory. If the directory can't be deleted, it returns false and $! is assigned the error message. The $_ special variable is used if DIR_NAME is not specified. |
seek(FILE_HANDLE, POSITION, WHENCE) | Moves to POSITION in the file connected to FILE_HANDLE. The WHENCE parameter determines if POSITION is an offset from the beginning of the file (WHENCE=0), the current position in the file (WHENCE=1), or the end of the file (WHENCE=2). |
seekdir(DIR_HANDLE, POSITION) | Sets the current position for readdir(). POSITION must be a value returned by the telldir() function. |
select(FILE_HANDLE) | Sets the default FILE_HANDLE for the write() and print() functions. It returns the currently selected file handle so that you may restore it if needed. You can see the section "Example: Printing Revisited" to see this function in action. |
sprintf(FORMAT, LIST) | Returns a string whose format is specified by FORMAT. |
stat(FILE_HANDLE_OR_FILE_NAME) | Returns file statistics in a 13-element array. See the section "Example: Getting File Statistics" for more information. |
symlink(OLD_FILE_NAME, NEW_FILE_NAME) | This UNIX-based function creates a new filename symbolically linked to the old filename. It returns false if the NEW_FILE_NAME could not be created. |
sysread(FILE_HANDLE, BUFFER, LENGTH, OFFSET) | Reads up to LENGTH bytes from FILE_HANDLE into the scalar variable called BUFFER, bypassing Perl's buffered input. The optional OFFSET is a position within BUFFER at which the data is placed. It returns the number of bytes read or the undefined value. |
syswrite(FILE_HANDLE, BUFFER, LENGTH, OFFSET) | Writes LENGTH bytes from the scalar variable called BUFFER to FILE_HANDLE. The optional OFFSET is the position within BUFFER at which to start taking data. It returns the number of bytes written or the undefined value. |
tell(FILE_HANDLE) | Returns the current file position for FILE_HANDLE. If FILE_HANDLE is not specified, the file position for the last file read is returned. |
telldir(DIR_HANDLE) | Returns the current position for DIR_HANDLE. The return value may be passed to seekdir() to access a particular location in a directory. |
truncate(FILE_HANDLE, LENGTH) | Truncates the file opened on FILE_HANDLE to be LENGTH bytes long. |
unlink(FILE_LIST) | Deletes a list of files. If FILE_LIST is not specified, then $_ will be used. It returns the number of files successfully deleted. Therefore, it returns false or 0 if no files were deleted. |
utime(ACCESS_TIME, MODIFICATION_TIME, FILE_LIST) | This UNIX-based function changes the access and modification times on each file in FILE_LIST. It returns the number of files successfully changed. |
write(FILE_HANDLE) | Writes a formatted record to FILE_HANDLE. See Chapter 11, "Creating Reports," for more information. |
open(FILE_HANDLE);
The FILE_HANDLE parameter in this
version of open() is the name for the new file handle. It is also the
name of the scalar variable that holds the name of the file that you'd like to open for
input. For example:
Pseudocode |
Assign the file name, FIXED.DAT, to the $INPUT_FILE variable. All capital letters are used for the variable name to indicate that it is also the name of the file handle. Open the file for reading. Read the entire file into @array. Each line of the file becomes a single element of the array. Close the file. Use a foreach loop to look at each element of @array. Print $_, the loop variable, which contains one of the elements of @array. |
Listing 9.7-09LST07.PL - How to Open a File for Input |
|
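The body of this listing is not shown here. The pseudocode above can be sketched as follows. The sketch relies on the one-argument form of open() just described, which only works when $INPUT_FILE is an ordinary global variable, and it assumes that a file called fixed.dat exists in the current directory.

```perl
$INPUT_FILE = "fixed.dat";    # variable and file handle share one name

open(INPUT_FILE);             # opens the file named in $INPUT_FILE
@array = <INPUT_FILE>;        # each line becomes one array element
close(INPUT_FILE);

foreach (@array) {
    print();                  # prints $_, the current line
}
```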
This program displays:
1212Jan Jaspree Painter
3453Kelly Horton Jockey
It is considered good
programming practice to close any connections that are made with the
open() function as soon as possible. While not strictly needed, it does
ensure that all temporary buffers and caches are written to the hard disk in
case of a power failure or other catastrophic failure.
Note |
DOS - and by extension, Windows - limits the number
of files that you can have open at any given time. Typically, you can have
from 20 to 50 files open. Normally this is plenty. If you need to open
more files, please see your DOS documentation. |
The open() function has many variations to let you access files in different ways. Table 9.4 shows all of the different methods used to open a file.
Open Statement | Description |
---|---|
open(FILE_HANDLE); | Opens the file named in $FILE_HANDLE and connects to it using FILE_HANDLE as the file handle. The file will be opened for input only. |
open(FILE_HANDLE, "FILENAME.EXT"); | Opens the file called FILENAME.EXT for input using FILE_HANDLE as the file handle. |
open(FILE_HANDLE, "<FILENAME.EXT"); | Opens FILENAME.EXT for input using FILE_HANDLE as the file handle. |
open(FILE_HANDLE, ">FILENAME.EXT"); | Opens FILENAME.EXT for output using FILE_HANDLE as the file handle. The file is created if it does not exist; any existing contents are lost. |
open(FILE_HANDLE, "-"); | Opens standard input. |
open(FILE_HANDLE, ">-"); | Opens standard output. |
open(FILE_HANDLE, ">>FILENAME.EXT"); | Opens FILENAME.EXT for appending using FILE_HANDLE as the file handle. The file is created if it does not exist. |
open(FILE_HANDLE, "+<FILENAME.EXT"); | Opens an existing FILENAME.EXT for both input and output using FILE_HANDLE as the file handle. The existing contents are kept. |
open(FILE_HANDLE, "+>FILENAME.EXT"); | Opens FILENAME.EXT for both input and output using FILE_HANDLE as the file handle. The file is created if needed and any existing contents are lost. |
open(FILE_HANDLE, "+>>FILENAME.EXT"); | Opens FILENAME.EXT for both input and output using FILE_HANDLE as the file handle. The existing contents are kept and everything written is added to the end of the file. |
open(FILE_HANDLE, "| PROGRAM"); | Sends the output printed to FILE_HANDLE to another program. |
open(FILE_HANDLE, "PROGRAM |"); | Reads the output from another program using FILE_HANDLE. |
Note by Matthew Kleiman |
The +< prefix will open the file only if it exists and it will maintain the original data. This is the equivalent of using the < prefix, with the added bonus that the file can be written to as well. The +> prefix will open the file regardless of whether it exists or not. If it does exist, the file will be truncated (all data will be lost and a new file will be created) but the new file will be open for input/output instead of just output. I do not know what the +>> does for certain (it is not in the documentation as far as I can tell) but I deduce that it is similar to +< but starts at the end of the file just as the normal append prefix does. |
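The behavior of the three read/write prefixes can be checked with a short experiment. The following is a sketch, not part of the original chapter; the scratch file name modes.tmp and the variable names are made up for the illustration, and the file is deleted at the end.

```perl
$file = "modes.tmp";    # scratch file name invented for this sketch

# Create the file with one line in it.
open(FH, ">$file") or die("Can't create $file: $!");
print FH ("original line\n");
close(FH);

# +< keeps the existing data and allows reading and writing.
open(FH, "+<$file") or die("Can't open $file: $!");
$firstLine = <FH>;                 # reads the original line
close(FH);

# +>> also keeps the existing data; writes always go to the end.
open(FH, "+>>$file") or die("Can't open $file: $!");
print FH ("appended line\n");
seek(FH, 0, 0);                    # move back to the start before reading
@lines = <FH>;                     # both lines are there
close(FH);

# +> truncates the file first, so the earlier data is gone.
open(FH, "+>$file") or die("Can't open $file: $!");
$emptyAfterTruncate = -z FH;       # true: the file now has zero size
close(FH);

unlink($file);
print("+<  read: $firstLine");
print("+>> file now has ", scalar(@lines), " lines\n");
print("+>  file is empty right after opening\n") if $emptyAfterTruncate;
```

Because +> destroys the existing data, +< is usually the prefix you want when updating a file in place.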
By prefixing the file name with a > character you open the file for output. This next example opens a file that will hold a log of messages.
Pseudocode |
Call the open() function to open the MESSAGE.LOG file for writing with LOGFILE as the file handle. If the open was successful a true value will be returned and the statement block will be executed. Send the first message to the MESSAGE.LOG file using the print() function. Notice that an alternate method is being used to call print(). Send the second message to the MESSAGE.LOG file. Close the file. |
if (open(LOGFILE, ">message.log")) {
    print LOGFILE ("This is message number 1.\n");
    print LOGFILE ("This is message number 2.\n");
    close(LOGFILE);
}
This program displays nothing. Instead, the output from the
print() function is sent directly to the MESSAGE.LOG file
using the connection established by the open() function.
In this example, the print() function uses the first parameter as a file handle and the second parameter as a list of things to print. You can find more information about printing in the section "Example: Printing Revisited" later in this chapter.
If you need to add something to the end of the MESSAGE.LOG file, use >> as the file name prefix when opening the file. For example:
Pseudocode |
Call the open() function to open the MESSAGE.LOG file for appending with LOGFILE as the file handle. If the file does not exist it will be created, otherwise anything printed to LOGFILE will be added to the end of the file. Send a message to the MESSAGE.LOG file. Send a message to the MESSAGE.LOG file. Close the file. |
if (open(LOGFILE, ">>message.log")) {
    print LOGFILE ("This is message number 3.\n");
    print LOGFILE ("This is message number 4.\n");
    close(LOGFILE);
}
Now, when MESSAGE.LOG is viewed, it contains the following
lines:
This is message number 1.
This is message number 2.
This is message number 3.
This is message number 4.
Note |
The examples in this section relate to the DOS
operating system. |
To demonstrate the differences between text mode and binary mode, we'll use a data file called BINARY.DAT with the following contents:
01
02
03
First, we'll read the file in the default text mode.
Pseudocode |
Initialize a buffer variable. Both read() and sysread() need their buffer variables to be initialized before the function call is executed. Open the BINARY.DAT file for reading. Read the first 20 characters of the file using the read() function. Close the file. Create an array out of the characters in the $buffer variable and iterate over that array using a foreach loop. Print the value of the current array element in hexadecimal format. Print a newline character if the current array element is a newline character. |
Listing 9.8-09LST08.PL - Reading a File to Show Text Mode Line Endings |
|
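The body of this listing is not shown here. The pseudocode above can be sketched as follows, assuming that binary.dat exists in the current directory. The file handle name FILE is chosen for the illustration.

```perl
$buffer = "";                       # initialize the buffer

open(FILE, "<binary.dat");          # open for reading in text mode
read(FILE, $buffer, 20, 0);         # read up to 20 characters
close(FILE);

foreach (split(//, $buffer)) {
    printf("%02x ", ord($_));       # character code in hexadecimal
    print("\n") if $_ eq "\n";      # break the output after each newline
}
```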
This program displays:
30 31 0a
30 32 0a
30 33 0a
Note |
The %02x notation used in this program is not specific to Perl. The % character tells the printf() function that a format specification follows. The 02 asks for at least two digits, padded with a leading zero, and the x character tells printf() to display the value in hexadecimal. |
Errata Note |
The printed version used a > character to open the binary.dat file instead of the < character. |
This example does a couple of things that haven't been seen yet in this book. The read() function is used as an alternative to the line-by-line input done with the diamond operator. It reads up to a specified number of bytes from the input file and assigns them to a buffer variable. The fourth parameter specifies an offset into the buffer variable at which the data is stored; it does not select a position in the file. In this example, an offset of 0 was used, so the data was stored at the beginning of $buffer. Reading always begins at the current file position, which is the beginning of the file just after it has been opened.
The split() function in the foreach loop breaks a string into pieces and returns those pieces as a list. The empty pattern (the double slashes, //) indicates that each character in the string should be a separate element of that list.
Once the array of characters has been created, the foreach loop iterates over the array. The printf() statement converts the ordinal value of the character into hexadecimal before displaying it. The ordinal value of a character is the value of the ASCII representation of the character. For example, the ordinal value of '0' is 0x30 or 48.
The next line, the print statement, forces the output onto a new line if the current character is a newline character. This was done simply to make the output display look a little like the input file.
Now, let's read the file in binary mode and see how the output is changed.
Pseudocode |
Initialize a buffer variable. Open the BINARY.DAT file for reading. Change the mode to binary. Read the first 20 characters of the file using the read() function. Close the file. Create an array out of the characters in the $buffer variable and iterate over that array using a foreach loop. Print the value of the current array element in hexadecimal format. Print a newline character if the current array element is a newline character. |
Listing 9.9-09LST09.PL - Reading a File to Show Binary Mode Line Endings |
|
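The body of this listing is not shown here. A sketch that follows the pseudocode, again assuming that binary.dat exists in the current directory, differs from the text mode version only in the call to binmode():

```perl
$buffer = "";                       # initialize the buffer

open(FILE, "<binary.dat");          # open for reading
binmode(FILE);                      # switch the file handle to binary mode
read(FILE, $buffer, 20, 0);         # read up to 20 characters
close(FILE);

foreach (split(//, $buffer)) {
    printf("%02x ", ord($_));       # character code in hexadecimal
    print("\n") if $_ eq "\n";      # break the output after each newline
}
```

On UNIX, where text mode and binary mode are the same, both versions display the bytes exactly as they are stored in the file.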
This program displays:
30 31 0d 0a
30 32 0d 0a
30 33 0d 0a
Our next example will look at the end of file character in both text and binary modes. We'll use a data file called EOF.DAT with the following contents:
01
02
<end of file character>03
Since the end of file character is a
non-printing character, it can't be shown directly. The spot marked <end of
file character> above really holds the value 26.
Here is the program that you saw previously, which read the BINARY.DAT file, only this time it will read EOF.DAT.
Pseudocode |
Initialize a buffer variable. Open the EOF.DAT file for reading. Read the first 20 characters of the file using the read() function. Close the file. Create an array out of the characters in the $buffer variable and iterate over that array using a foreach loop. Print the value of the current array element in hexadecimal format. Print a newline character if the current array element is a newline character. |
Listing 9.10-09LST10.PL - Reading a File to Show the Text Mode End-of-File Character |
|
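The body of this listing is not shown here. Following the pseudocode, a sketch would be the text mode program from before with the file name changed, assuming that eof.dat exists in the current directory:

```perl
$buffer = "";                       # initialize the buffer

open(FILE, "<eof.dat");             # open for reading in text mode
read(FILE, $buffer, 20, 0);         # under DOS, stops at the value 26
close(FILE);

foreach (split(//, $buffer)) {
    printf("%02x ", ord($_));       # character code in hexadecimal
    print("\n") if $_ eq "\n";      # break the output after each newline
}
```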
This program displays:
30 31 0d 0a
30 32 0d 0a
Pseudocode |
Initialize a buffer variable. Open the EOF.DAT file for reading. Change the mode to binary. Read the first 20 characters of the file using the read() function. Close the file. Create an array out of the characters in the $buffer variable and iterate over that array using a foreach loop. Print the value of the current array element in hexadecimal format. Print a newline character if the current array element is a newline character. |
Listing 9.11-09LST11.PL - Reading a File to Show that Binary Mode Does Not Recognize the End-of-File Character |
|
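The body of this listing is not shown here. A sketch that follows the pseudocode adds binmode() to the previous program, so that the value 26 is read as ordinary data. It assumes that eof.dat exists in the current directory.

```perl
$buffer = "";                       # initialize the buffer

open(FILE, "<eof.dat");             # open for reading
binmode(FILE);                      # binary mode: 26 is just another byte
read(FILE, $buffer, 20, 0);         # read up to 20 characters
close(FILE);

foreach (split(//, $buffer)) {
    printf("%02x ", ord($_));       # character code in hexadecimal
    print("\n") if $_ eq "\n";      # break the output after each newline
}
```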
This program displays:
30 31 0d 0a
30 32 0d 0a
1a 30 33 0d 0a
You've already seen that you can read a file directly into a regular array using this syntax:
@array = <FILE_HANDLE>;
Unfortunately, there is no similar
way to read an entire file into a hash. But it's still pretty easy to do. The
following example will use the line number as the hash key for each line of a
file.
Pseudocode |
Open the FIXED.DAT file for reading. For each line of FIXED.DAT, create a hash element using the record number special variable ($.) as the key and the line of input ($_) as the value. Close the file. Iterate over the keys of the hash. Print each key, value pair. |
Listing 9.12-09LST12.PL - Reading a Fixed Length Record with Fixed Length Fields into a Hash |
|
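The body of this listing is not shown here. The pseudocode above can be sketched as follows, assuming that fixed.dat exists in the current directory. The numeric sort is an addition made for this sketch, because keys() returns the keys in no particular order.

```perl
open(FILE, "<fixed.dat");
while (<FILE>) {
    $hash{$.} = $_;          # key: line number ($.), value: the line itself
}
close(FILE);

# Sort the line numbers numerically so the lines print in file order.
foreach (sort { $a <=> $b } keys(%hash)) {
    print("$_: $hash{$_}");
}
```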
This program displays:
1: 1212Jan Jaspree Painter
2: 3453Kelly Horton Jockey
Pseudocode |
Assign the return list from the stat() function to 13 scalar variables. Print the scalar values. |
Listing 9.13-09LST13.PL - Using the stat() Function |
|
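The body of this listing is not shown here. A sketch that follows the pseudocode is given below. It assumes the file being examined is eof.dat, which is a guess based on the 13-byte size in the output that follows.

```perl
($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
    $atime, $mtime, $ctime, $blksize, $blocks) = stat("eof.dat");

print("dev     = $dev\n");
print("ino     = $ino\n");
print("mode    = $mode\n");
print("nlink   = $nlink\n");
print("uid     = $uid\n");
print("gid     = $gid\n");
print("rdev    = $rdev\n");
print("size    = $size\n");
print("atime   = $atime\n");
print("mtime   = $mtime\n");
print("ctime   = $ctime\n");
print("blksize = $blksize\n");
print("blocks  = $blocks\n");
```

If the file does not exist, stat() returns an empty list and every value prints as blank.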
In the DOS environment, this program displays:
dev = 2
ino = 0
mode = 33206
nlink = 1
uid = 0
gid = 0
rdev = 2
size = 13
atime = 833137200
mtime = 833195316
ctime = 833194411
blksize =
blocks =
Some of this information is specific to the UNIX environment
and is beyond the scope of this book. For more information on this topic see
Que's 1994 Edition of Using Unix. One interesting piece of information is
the $mtime value - the date and time of the last modification made to
the file. You can interpret this value by using the following line of code:
($sec, $min, $hr, $day, $month, $year, $day_Of_Week,
$julianDate, $dst) = localtime($mtime);
If you are only interested
in the modification date, you can use the array slice notation to just grab that
value from the 13 element array returned by stat(). For example:
$mtime = (stat("eof.dat"))[9];
Notice that the stat()
function is surrounded by parentheses so that the return value is evaluated in
a list context and can be subscripted. Then the element at index 9 (the tenth element, because
counting starts at zero) is assigned to $mtime. You can
use this technique whenever a function returns a list.
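The slice technique and the localtime() call can be combined in one short sketch. The file name stat_demo.dat and the date layout are assumptions made for illustration. Note that localtime() returns the year as an offset from 1900 and the month as a number from 0 to 11.

```perl
use strict;
use warnings;

# Hypothetical scratch file holding 13 bytes.
my $file = "stat_demo.dat";
open(my $out, ">", $file) or die "Cannot create $file: $!";
print $out "0123456789abc";
close($out) or die "Cannot close $file: $!";

# Slice out the size (index 7) and modification time (index 9).
my ($size, $mtime) = (stat($file))[7, 9];

# Convert the modification time into calendar fields.
my ($sec, $min, $hr, $day, $month, $year) = localtime($mtime);
my $stamp = sprintf("%04d-%02d-%02d %02d:%02d:%02d",
                    $year + 1900, $month + 1, $day, $hr, $min, $sec);

print "size     = $size\n";
print "modified = $stamp\n";

unlink($file);
```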
Finding out which files are in a directory is done with the opendir(), readdir(), and closedir() functions. The next example will show you how to create a list of all Perl programs in the current directory - well, at least those files that end with the pl extension.
Pseudocode |
Open the current directory using DIR as the directory handle. Read a list of file names using the readdir() function; extract only those that end in pl; and then sort the list. The sorted list is assigned to the @files array variable. Close the directory. Print the file names from the @files array unless the file is a directory. |
Listing 9.14-09LST14.PL - Print All Files in the Current Directory Whose Name Ends in PL |
|
This program will display each file name that ends in pl on a separate line. If you need to know the number of Perl programs, evaluate the @files array in a scalar context. For example:
$num_Perl_Programs = @files;
Tip |
For this example, I modified the naming convention
used for the variables. I feel that $num_Perl_Programs is easier
to read than $numPerlPrograms. No naming convention should be
inflexible. Use it as a guideline and break the rules when it seems
wise. |
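The directory-reading steps described earlier (opendir(), readdir(), grep(), sort, and closedir()) can be sketched as follows. This is an illustration, not the book's listing: it builds a hypothetical scratch directory so the result is predictable, and it uses a lexical directory handle instead of DIR.

```perl
use strict;
use warnings;

# Hypothetical scratch directory with two scripts, one text file,
# and a subdirectory whose name also ends in pl.
my $dir = "dir_demo_$$";
mkdir($dir) or die "Cannot create $dir: $!";
foreach my $name ("b.pl", "a.pl", "notes.txt") {
    open(my $fh, ">", "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}
mkdir("$dir/sub.pl") or die "Cannot create subdirectory: $!";

# readdir() returns bare names, so the -d test needs the directory prefix.
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
my @matches = grep { /\.pl$/i && ! -d "$dir/$_" } readdir($dh);
closedir($dh);
my @files = sort @matches;

print "$_\n" foreach @files;

# An array evaluated in scalar context yields its element count.
my $num_Perl_Programs = @files;
print "count = $num_Perl_Programs\n";

# Clean up the scratch directory.
unlink("$dir/b.pl", "$dir/a.pl", "$dir/notes.txt");
rmdir("$dir/sub.pl");
rmdir($dir);
```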
The print() function is used to send output to a file handle. Most of the time, we've been using STDOUT as the file handle. Since STDOUT is the default, we did not need to specify it. The syntax for the print() function is:
print FILE_HANDLE (LIST)
You can see from the syntax that
print() is a list operator because it's looking for a list of values to
print. If you don't specify a list, then $_ will be used. You can
change the default file handle by using the select() function. Let's
take a look at this:
Pseudocode |
Open TESTFILE.DAT for output. Change the default file handle for write and print statements. Notice that the old default handle is returned and saved in the $oldHandle variable. This line prints to the default handle, which is now the TESTFILE.DAT file. Change the default file handle back to STDOUT. This line prints to STDOUT. |
open(OUTPUT_FILE, ">testfile.dat");
$oldHandle = select(OUTPUT_FILE);
print("This is line 1.\n");
select($oldHandle);
print("This is line 2.\n");
This program displays:
This is line 2.
And creates the TESTFILE.DAT file with a
single line in it:
This is line 1.
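The print() syntax shown earlier also lets you name the file handle directly, which avoids changing the default with select(). The sketch below assumes a hypothetical scratch file named print_demo.dat and reads it back to confirm what was written. Note that no comma follows the file handle.

```perl
use strict;
use warnings;

# Hypothetical scratch file.
my $file = "print_demo.dat";
open(my $out, ">", $file) or die "Cannot create $file: $!";

# Name the handle explicitly; there is no comma after it.
print $out "This is line 1.\n";
print STDOUT "This is line 2.\n";

close($out) or die "Cannot close $file: $!";

# Read the file back to see what landed in it.
open(my $in, "<", $file) or die "Cannot open $file: $!";
my @lines = <$in>;
close($in);
unlink($file);

print "File holds: $lines[0]";
```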
Perl also has the printf() function which
lets you be more precise in how things are printed out. The syntax for
printf() looks like this:
printf FILE_HANDLE (FORMAT_STRING, LIST)
Like print(),
the default file handle is STDOUT. The FORMAT_STRING parameter
controls what is printed and how it looks. For simple cases, the formatting
parameter looks identical to the list that is passed to print(). For
example:
Pseudocode |
Create two variables to hold costs for January and February. Print the cost variables using variable interpolation. Notice that the dollar sign needs to be preceded by the backslash to avoid interpolation that you don't want. |
$januaryCost = 123.34;
$februaryCost = 23345.45;
printf("January = \$$januaryCost\n");
printf("February = \$$februaryCost\n");
This program displays:
January = $123.34
February = $23345.45
In this example, only one parameter is passed to
the printf() function - the formatting string. Since the formatting
string is enclosed in double quotes, variable interpolation will take place just
like for the print() function.
This display is not good enough for a report because the decimal points of the numbers do not line up. You can use the formatting specifiers shown in Table 9.5 together with the modifiers shown in Table 9.6 to solve this problem.
Specifier | Description |
---|---|
c | Indicates that a single character should be printed. |
s | Indicates that a string should be printed. |
d | Indicates that a decimal number should be printed. |
u | Indicates that an unsigned decimal number should be printed. |
x | Indicates that a hexadecimal number should be printed. |
o | Indicates that an octal number should be printed. |
e | Indicates that a floating point number should be printed in scientific notation. |
f | Indicates that a floating point number should be printed. |
g | Indicates that a floating point number should be printed using the most space-saving format, either e or f. |
Modifier | Description |
---|---|
- | Indicates that the value should be printed left-justified. |
# | Forces octal numbers to be printed with a leading zero. Hexadecimal numbers will be printed with a leading 0x. |
+ | Forces signed numbers to be printed with a leading + or - sign. |
0 | Pads the displayed number with zeros instead of spaces. |
. | Separates the minimum field width from the precision. For example, %10.3f means that the value will be at least 10 positions wide. And since f is used for floating point, exactly 3 positions to the right of the decimal point will be displayed. %.10s will print a string at most 10 characters long. |
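The specifiers and modifiers in the two tables can be tried out with sprintf(), which formats the same way as printf() but returns the string instead of printing it. The sample values below are assumptions chosen for illustration; the square brackets only make the padding visible.

```perl
use strict;
use warnings;

# Each pair is a format string and a sample value.
my @samples = (
    [ '[%-6d]',   42 ],                  # left-justified in 6 positions
    [ '[%6d]',    42 ],                  # right-justified (the default)
    [ '%#o',      8 ],                   # octal with a leading zero
    [ '%#x',      255 ],                 # hexadecimal with a leading 0x
    [ '%+d',      42 ],                  # explicit sign
    [ '%06d',     42 ],                  # zero padding
    [ '[%10.3f]', 3.14159 ],             # width 10, 3 decimal places
    [ '[%.10s]',  'abcdefghijklmnop' ],  # at most 10 characters
);

my @results;
foreach my $sample (@samples) {
    my ($format, $value) = @$sample;
    push @results, sprintf($format, $value);
    printf("%-10s => %s\n", $format, $results[-1]);
}
```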
Pseudocode |
Create two variables to hold costs for January and February. Print the cost variables using format specifiers. |
$januaryCost = 123.34;
$februaryCost = 23345.45;
printf("January = \$%8.2f\n", $januaryCost);
printf("February = \$%8.2f\n", $februaryCost);
This program displays:
January = $  123.34
February = $23345.45
This example uses the f format specifier
to print a floating point number. The February value is printed right next to the
dollar sign because $februaryCost is exactly 8 positions wide.
If you did not know the width of the numbers that you need to print in advance, you could use the following technique.
Pseudocode |
Create two variables to hold costs for January and February. Find the length of the largest number. Print the cost variables using variable interpolation to determine the width of the numbers to print. Define the max() function. You can look in the "Example: Foreach Loops" section of Chapter 7, "Control Statements," for more information about the max() function. |
Listing 9.15-09LST15.PL - Using Variable Interpolation to Align Numbers When Printing |
|
This program displays:
January = $  123.34
February = $23345.45
Taking the time to find the longest number
is more work, but I think you'll agree that the result is worth it.
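The idea of interpolating a computed width into the format string can be sketched as follows. This is an illustration rather than the book's listing: instead of a max() function it tracks the longest formatted value in a loop, and the padded month column is an added assumption.

```perl
use strict;
use warnings;

my @costs = ( [ "January", 123.34 ], [ "February", 23345.45 ] );

# Find the width of the longest value once it is formatted to 2 decimals.
my $width = 0;
foreach my $entry (@costs) {
    my $len = length(sprintf("%.2f", $entry->[1]));
    $width = $len if $len > $width;
}

# ${width} is interpolated into the format string before sprintf() sees it,
# so the format becomes "%-8s = $%8.2f" for this data.
my @report;
foreach my $entry (@costs) {
    my ($month, $cost) = @$entry;
    push @report, sprintf("%-8s = \$%${width}.2f", $month, $cost);
}

print "$_\n" foreach @report;
```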
Tip |
In the next chapter, "Regular Expressions," you'll
see how to add commas when printing numbers for even more
readability. |
So far, we've only looked at printing numbers. You can also use printf() to control printing strings. Like the printing of numbers above, printf() is best used for controlling the alignment and length of strings. Here is an example:
Pseudocode |
Assign "John O'Mally" to $name. Print using format specifiers to make the value 10 characters wide but only print the first 4 characters from the string. |
$name = "John O'Mally";
printf("The name is %10.4s.\n", $name);
This program displays:
The name is       John.
The left side of the period modifier
controls the width of the printed value, also called the print field. If
the length of the string to be printed is less than the width of the print
field, then the string is right-justified and padded with spaces.
You can left-justify the string by using the dash modifier. For example:
Pseudocode |
Assign "John O'Mally" to $name. Print using format specifiers to left-justify the value. |
$name = "John O'Mally";
printf("The name is %-10.5s.\n", $name);
This program displays:
The name is John      .
The period way off to the right shows that
the string was left-justified and padded with spaces until it was 10 positions
wide.
unlink(<*.bak>);
The file specification, *.bak, is
placed inside the angle brackets of the diamond operator and, when evaluated, returns a list of files
that match the specification. An asterisk means zero or more of any character
will be matched. So this unlink() call will delete all files with a
BAK extension.
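To get a list of all files that start with
the letter f, you can use the following:
@array = <f*.*>;
The next chapter, "Regular Expressions," will show
you more ways to specify file names. Some of the meta-characters used in Chapter
10, such as the asterisk, the question mark, and square brackets, also appear inside globs, although there they follow the shell's wildcard rules rather than regular-expression rules.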
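The same pattern matching is available through the glob() function, which takes the file specification as a string. The sketch below works inside a hypothetical scratch directory so that nothing outside it is deleted; the directory and file names are assumptions.

```perl
use strict;
use warnings;

# Hypothetical scratch directory with two backup files and one other file.
my $dir = "glob_demo_$$";
mkdir($dir) or die "Cannot create $dir: $!";
foreach my $name ("a.bak", "b.bak", "keep.txt") {
    open(my $fh, ">", "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

# glob() returns the list of names that match the wildcard pattern.
my @backups = glob("$dir/*.bak");

# unlink() returns the number of files it removed.
my $deleted = unlink(@backups);

my @remaining = glob("$dir/*");
print "Deleted $deleted file(s)\n";
print "Remaining: @remaining\n";

# Clean up the scratch directory.
unlink(@remaining);
rmdir($dir);
```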
1212:Jan:Jaspree:Painter
3453:Kelly:Horton:Jockey
The individual fields or values are separated
from each other by the colon (:) character. The split()
function will be used to create an array of fields. And then a foreach
loop will print the fields. Listing 9.16 shows how to input lines from a file
and split them into fields.
Pseudocode |
Use the qw() notation to create an array of words. Open the FIELDS.DAT file for input. Loop while there are lines to read in the file. Use the split function to create an array of fields, using the colon as the field separator. The scalar value of @fieldList is passed to split to indicate how many fields to expect. Each element in the new array is then added to the %data hash with a key of the field name. Loop through @fieldList array. Print each element and its value in the %data hash. |
Listing 9.16-09LST16.PL - Reading Records from Standard Input |
|
This program will display:
fName = 1212
lName = Jan
job = Jaspree
age = Painter
fName = 3453
lName = Kelly
job = Horton
age = Jockey
The first line of this program uses the
qw() notation to create an array of words. It is identical to
@fieldList = ("fName", "lName", "job", "age"); but without the
distracting quotes and commas.
The split statement might require a little explanation. It is duplicated here so that you can focus on it.
@data{@fieldList} = split(/:/, $_, scalar @fieldList);
Let's use
the first line of the input file as an example. The first line looks like this:
1212:Jan:Jaspree:Painter
The first thing that happens is that
split() breaks the line apart using the colon as the separator. This creates a list
that looks like this:
("1212", "Jan", "Jaspree", "Painter")
You can substitute this list
in place of the split() function in the statement.
@data{@fieldList} = ("1212", "Jan", "Jaspree", "Painter");
And you
already know that @fieldList is a list of field names. So the statement
can be further simplified to:
@data{"fName", "lName", "job", "age"} =
("1212", "Jan", "Jaspree", "Painter");
This assignment statement
shows that each array element on the right is paired with a key value on the
left so that four separate hash assignments are taking place in this statement.
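The hash slice assignment can be seen end to end in the following sketch. It is not the book's listing: the records are held in an array instead of being read from FIELDS.DAT, and the field names (id, fName, lName, job) are assumptions chosen to match the sample data.

```perl
use strict;
use warnings;

# Assumed field names, in the order the fields appear in each record.
my @fieldList = qw(id fName lName job);

# Sample records; the book reads these from a file instead.
my @records = (
    "1212:Jan:Jaspree:Painter",
    "3453:Kelly:Horton:Jockey",
);

my @report;
foreach my $line (@records) {
    my %data;

    # The hash slice pairs each field name with the matching split() result.
    @data{@fieldList} = split(/:/, $line, scalar @fieldList);

    foreach my $field (@fieldList) {
        push @report, sprintf("%-5s = %s", $field, $data{$field});
    }
}

print "$_\n" foreach @report;
```

Passing scalar @fieldList as the third argument limits split() to that many fields, so any extra colons stay inside the last field.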
Let's review what you know about files. You read that files are a series of bytes stored somewhere outside the computer's memory. Most of the time, a file will be on a hard disk in a directory. But the file could also be on a floppy disk or on a networked computer. The physical location is not important as long as you know the fully qualified file name. This name will include any computer name, drive name, and directory name that is needed to uniquely identify the file.
There are three files - actually file handles - that are always opened before your program starts. These are STDIN, STDOUT, and STDERR. The STDIN file handle is used to connect to the standard input, usually the keyboard. You can use the < character to override the standard input on the command line so that input comes from a file instead of the keyboard. The STDOUT file handle is used to connect to the standard output, usually the monitor. The > character is used to override the standard output. And finally, the STDERR file handle is used when you want to output error messages. STDERR usually points to the computer's monitor.
The diamond operator (<>) is used to read an entire line of text from a file. It stops reading when the end-of-line character (the newline) is read. The returned string normally includes the newline character. If no file handle is used with the diamond operator, it will attempt to read from files listed in the @ARGV array. If that array is empty, it will read from STDIN.
Next, you read about Perl's file test operators. There are way too many to recap here, but some of the more useful ones are the -d used to test for a directory name, -e used to see if a file exists, and -w to see if a file can be written to. The special file handle, _, can be used to prevent Perl from making a second system call if you need to make two tests on the same file one right after another.
A table of file functions (refer to Table 9.3) listed many functions that deal with opening files, reading and writing information, and closing files. Some functions were specific to UNIX, although not many.
You learned how to open a file and that files can be opened for input, for output, or for appending. When you read a file, you can use text mode (the default) or binary mode. In binary mode on DOS systems, line endings are read as two characters - the carriage return and the line feed. On both DOS and UNIX systems, binary mode lets you read the end-of-file character as a regular character with no special meaning.
Reading file information directly from the directory was shown to be very easy by using the opendir(), readdir(), and closedir() functions. An example was given that showed how to find all files with an extension of PL by using the grep() function in conjunction with readdir().
Then, we looked closely at the print() and printf() functions. Both can be used to send output to a file handle. The select() function was used to change the default handle from STDOUT to another file. In addition, some examples were given of the formatting options available with the printf() function.
The topic of Globbing was briefly touched on. Globs let you specify a file name using wildcards. A list of file names is returned that can be processed like any other array.
And finally, you read about how to split a record into fields based on a separator character.
This chapter covered a lot of ground, and some of the examples did not relate to each other. Instead, I tried to give you a feel for the many ways that files can be used. An entire book could be written on the different ways to use files. But you now know enough to create any kind of file that you might need.
Chapter 10, "Regular Expressions," covers the most difficult topics related to Perl. In fact, Perl's regular expressions are one of the main reasons to learn the language. Few other languages will give you equivalent functionality.
open(FILE_ONE, ">FILE_ONE.DAT");
open(FILE_TWO, ">>FILE_TWO.DAT");
(stat("09lst01.pl"))[7];
printf("%x", 16);
open(FILE, "dir *.pl |");