Perl has a few special features that let you create simple reports. The reports can have a header area where you can place a title, page number, and other information that stays the same from one page to the next. Perl will track how many lines have been used in the report and automatically generate new pages as needed.
Compared to learning about regular expressions, learning how to create reports will be a breeze. There are only a few tricky parts, which I'll be sure to point out.
This chapter starts out by using the print() function to display a CD collection and then gradually moves from displaying the data to a fully formatted report. The data file shown in Listing 11.1 is used for all of the examples in this chapter. The format is pretty simple: the CD album's title, the artist's name, and the album's price.
Listing 11.1-FORMAT.DAT - The Data File |
|
You'll find that Perl is very handy for small text-based data files like this. You can create them in any editor and use any field delimiter you like. In this file, I used an exclamation point to delimit the fields. However, I could just as easily have used a caret, a tilde, or any other character that does not appear in the data.
Now that we have some data, let's look at Listing 11.2, which is a program that reads the data file and displays the information.
Pseudocode |
Open the FORMAT.DAT file. Read all the file's lines and place them in the @lines array. Each line becomes a different element in the array. Close the file. Iterate over the @lines array. $_ is set to a different array element each time through the loop. Remove the linefeed character from the end of the string. Split the string into three fields using the exclamation point as the delimiter. Place each field into the $album, $artist, and $price variables. Print the variables. |
Listing 11.2-11LST02.PL - A Program to Read and Display the Data File |
|
This program displays:
Use of uninitialized value at 11lst02.pl line 8.
Album=The Lion King Artist= Price=
Album=Tumbleweed Connection Artist=Elton John Price=123.32
Album=Photographs & Memories Artist=Jim Croce Price=4.95
Album=Heads & Tales Artist=Harry Chapin Price=12.50
Why is an error being displayed on the first line of the output? If you said that the split() function was returning the undefined value when there was no matching field in the input file, you were correct. The first input line was the following:
The Lion King!
There are no entries for the Artist or Price fields. Therefore, the $artist and $price variables were assigned the undefined value, which resulted in Perl complaining about uninitialized values. You can avoid this problem by assigning the empty string to any variable that has the undefined value. Listing 11.3 shows a program that does this.
Pseudocode |
Open the FORMAT.DAT file, read all the lines into @lines, and then close the file. Iterate over the @lines array. Remove the linefeed character. Split the string into three fields. If any of the three fields are not present in the line, provide a default value of an empty string. Print the variables. |
Listing 11.3-11LST03.PL - How to Avoid the Uninitialized Error When Using the Split() Function |
|
Clarification Note |
The following code lines are responsible for assigning an empty string to the three variables if no information was present in the record:
$album = "" if !defined($album);
$artist = "" if !defined($artist);
$price = "" if !defined($price);
The defined() function is used to see if each variable is defined. If a variable is undefined, the empty string "" is assigned to it. |
Errata Note |
The printed version of this book showed the split call to be: ($album, $artist, $price) = (split(/::/)); which was incorrect. |
The first four lines this program displays are the following:
Album=The Lion King Artist= Price=
Album=Tumbleweed Connection Artist=Elton John Price=123.32
Album=Photographs & Memories Artist=Jim Croce Price=4.95
Album=Heads & Tales Artist=Harry Chapin Price=12.50
The error
has been eliminated, but it is still very hard to read the output because the
columns are not aligned. The rest of this chapter is devoted to turning this
jumbled output into a report.
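
The split-and-default steps described above can be sketched as follows. This is a minimal illustration, not the book's Listing 11.3: two sample records stand in for FORMAT.DAT, and parse_record() is a hypothetical helper name introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: split one record on "!" and replace any
# missing field with the empty string.
sub parse_record {
    my ($line) = @_;
    chomp($line);
    my ($album, $artist, $price) = split(/!/, $line);
    $album  = "" if !defined($album);
    $artist = "" if !defined($artist);
    $price  = "" if !defined($price);
    return ($album, $artist, $price);
}

# Sample records standing in for the lines of FORMAT.DAT.
my @lines = (
    "The Lion King!\n",
    "Tumbleweed Connection!Elton John!123.32\n",
);

foreach my $line (@lines) {
    my ($album, $artist, $price) = parse_record($line);
    print("Album=$album Artist=$artist Price=$price\n");
}
```

Because split() drops trailing empty fields, the first record yields only one value, and the two defaults keep Perl from warning about uninitialized values.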
Perl reports have both heading and detail lines. A heading is used to identify the report title, the page number, the date, and any other information that needs to appear at the top of each page. Detail lines are used to show information about each record in the report. In the data file being used for the examples in this chapter (see Listing 11.1), each CD has its own detail line.
Headings and detail lines are defined by using format statements, which are discussed in the next section.
format FORMATNAME =
FIELD_LINE
VALUE_LINE
.
The FORMATNAME is usually the same name as the file handle
that is used to accept the report output. The section "Example:
Changing Formats," later in this chapter, talks about using the
format statement where the FORMATNAME is different from the
file handle. If you don't specify a FORMATNAME, Perl uses
STDOUT. The FIELD_LINE part of the format statement consists
of text and field holders. A field holder represents a given line width
that Perl will fill with the value of a variable. The VALUE_LINE line
consists of a comma-delimited list of expressions used to fill the field holders
in FIELD_LINE.
Report headings, which appear at the top of each page, have the following format:
format FORMATNAME_TOP =
FIELD_LINE
VALUE_LINE
.
Yes, the only difference between a detail line and a heading is that
_TOP is appended to the FORMATNAME.
Note |
The location of format statements is unimportant because they define only a format and are never executed. I feel that they should appear either at the beginning of a program or the end of a program, rarely in the middle. Placing format statements in the middle of your program might make them hard to find when they need to be changed. Of course, you should be consistent where you place them. |
A typical format statement might look like this:
format =
The total amount is $@###.##
$total
.
The at character @ is used to start a field holder. In this
example, the field holder is seven characters long (the at sign and decimal
point count, as well as the pound signs #). The next section, "Example:
Using Field Lines," goes into more detail about field lines and field
holders.
Format statements are used only when invoked by the write() function. The write() function takes only one parameter: a file handle to send output to. Like many things in Perl, if no parameter is specified, a default is provided. In this case, the currently selected file handle, normally STDOUT, is used when no file handle is specified. In order to use the preceding format, you simply assign a value to $total and then call the write() function. For example:
$total = 243.45;
write();
$total = 50.00;
write();
These lines will display:
The total amount is $ 243.45
The total amount is $  50.00
The output will be sent to
STDOUT. Notice that the decimal points are automatically lined up when
the lines are displayed.
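
Putting the format and the write() calls together gives a complete program along these lines. It is only a sketch: the format is named STDOUT explicitly, and show_total() is a hypothetical helper that does not appear in the chapter.

```perl
use strict;
use warnings;

# Declared before the format so the value line can refer to it.
our $total;

format STDOUT =
The total amount is $@###.##
$total
.

# Hypothetical helper: store the amount and invoke the format
# on the currently selected file handle (normally STDOUT).
sub show_total {
    ($total) = @_;
    write();
}

show_total(243.45);
show_total(50.00);
```

The field holder is seven characters wide, so both amounts are right-aligned on the decimal point.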
You saw a field holder in action in the last section in which I demonstrated sending the report to STDOUT. I'll repeat the format statement here so you can look at it in more detail:
format =
The total amount is $@###.##
$total
.
The character sequence The total amount is $ is static text. It will not change no matter how many times the report is printed. The character sequence @###.##, however, is a field holder. It reserves seven character positions in the line for a number to be inserted. The third line is the value line; it tells Perl which variable to use with the field holder. Table 11.1 contains a list of the different format characters you can use in field lines.
Format Character | Description |
---|---|
@ | This character represents the start of a field holder. |
< | This character indicates that the field should be left-justified. |
> | This character indicates that the field should be right-justified. |
\| | This character indicates that the field should be centered. |
# | This character indicates that the field will be numeric. If used as the first character in the line, it indicates that the entire line is a comment. |
. | This character indicates that a decimal point should be used with numeric fields. |
^ | This character also represents the start of a field holder. Moreover, it tells Perl to turn on word-wrap mode. See the section "Example: Using Long Pieces of Text in Reports" later in this chapter for more information about word-wrapping. |
~ | This character indicates that the line should not be written if it is blank. |
~~ | This sequence indicates that lines should be written as needed until the value of a variable is completely written to the output file. |
@* | This sequence indicates that a multi-line field will be used. |
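
To see the three justification characters from the table side by side, a small sketch such as the following can help. The square brackets are static text added only to make the padding visible, and show_row() is a hypothetical helper.

```perl
use strict;
use warnings;

our ($left, $right, $center);

# Each field holder is eight characters wide (the @ plus seven
# justification characters).
format STDOUT =
[@<<<<<<<] [@>>>>>>>] [@|||||||]
$left,     $right,    $center
.

# Hypothetical helper: fill the three fields and print one line.
sub show_row {
    ($left, $right, $center) = @_;
    write();
}

show_row("ab", "cd", "ef");
show_row("a much longer value", 12345, "mid");
```

Values that are longer than a text field holder are cut off at the field width, which is why the first value of the second row does not print in full.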
Let's start using some of these formatting characters by formatting a report to display information about the FORMAT.DAT file we used earlier. The program in Listing 11.4 displays the information in nice, neat columns.
Pseudocode |
Declare a format for the STDOUT file handle. Open the FORMAT.DAT file, read all the lines into @lines, and then close the file. Iterate over the @lines array. Remove the linefeed character. Split the string into three fields. If any of the three fields are not present in the line, provide a default value of an empty string. Notice that a numeric value must be given to $price instead of the empty string. Invoke the format statement by using the write() function. |
Listing 11.4-11LST04.PL - Using a Format with STDOUT |
|
This program displays the following:
Album=The Lion King Artist= Price=$ 0.00
Album=Tumbleweed Con Artist= Elton John Price=$123.32
Album=Photographs & Artist= Jim Croce Price=$ 4.95
Album=Heads & Tales Artist= Harry Chapin Price=$ 12.50
You can
see that the columns are now neatly aligned. This was done with the
format statement and the write() function. The format
statement used in this example used three field holders. The first field holder,
@<<<<<<<<<<<<<, created a
left-justified spot for a 14-character-wide field filled by the value in
$album. The second field holder,
@>>>>>>>>>>>>, created a
right-justified spot for a 12-character-wide field filled by the value in
$artist. The last field holder, @##.##, created a
6-character-wide field filled by the numeric value in $price.
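
A format along the lines of that description can be sketched as follows. This is not the book's Listing 11.4: it assumes field widths of 14, 12, and 6 characters, uses two sample records in place of FORMAT.DAT, and introduces a hypothetical show_record() helper.

```perl
use strict;
use warnings;

our ($album, $artist, $price);

format STDOUT =
Album=@<<<<<<<<<<<<< Artist=@>>>>>>>>>>> Price=$@##.##
$album,              $artist,            $price
.

# Hypothetical helper: parse one record and print it with the format.
sub show_record {
    my ($line) = @_;
    chomp($line);
    ($album, $artist, $price) = split(/!/, $line);
    $album  = "" if !defined($album);
    $artist = "" if !defined($artist);
    $price  = 0  if !defined($price);   # numeric field, so default to 0
    write();
}

show_record($_) foreach ("The Lion King!\n",
                         "Tumbleweed Connection!Elton John!123.32\n");
```

Note that $price defaults to 0 rather than the empty string, because it fills a numeric field holder.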
You might think it's wasteful to have the field labels repeated on each line, and I would agree with that. Instead of placing field labels on the line, you can put them in the report heading. The next section discusses how to do this.
To add a heading to the report about the CD collection, you might use the following format statement:
format STDOUT_TOP =
@|||||||||||||||||||||||||||||||||||| Pg @<
"CD Collection of David Medinets", $%
Album Artist Price
----------------- ---------------- -------
.
Adding this format statement to Listing 11.4 produces this
output:
CD Collection of David Medinets Pg 1
Album Artist Price
----------------- ---------------- -------
Album=The Lion King Artist= Price=$ 0.00
Album=Tumbleweed Con Artist= Elton John Price=$123.32
Album=Photographs & Artist= Jim Croce Price=$ 4.95
Album=Heads & Tales Artist= Harry Chapin Price=$ 12.50
Whenever
a new page is generated, the heading format is automatically invoked. Normally,
a page is 60 lines long. However, you can change this by setting the $=
special variable.
Another special variable, $%, holds the current page number. It will be initialized to zero when your program starts. Then, just before invoking the heading format, it is incremented so its value is one. You can change $% if you need to change the page number for some reason.
You might notice that the | formatting character was used to center the report title over the columns. You might also notice that placing the field labels into the heading allows the columns to be expanded in width.
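
The heading mechanism can be tried in isolation with a sketch like this one. It uses a shortened title and only two columns, and show_cd() is a hypothetical helper, so treat it as an illustration rather than the chapter's program.

```perl
use strict;
use warnings;

our ($album, $price);

# Heading format: Perl invokes it automatically at the top of each page.
format STDOUT_TOP =
@||||||||||||||||||||||| Pg @<
"CD Collection",         $%
Album             Price
----------------- -------
.

# Detail format: one line per record.
format STDOUT =
@<<<<<<<<<<<<<<<< $@##.##
$album,           $price
.

# Hypothetical helper: fill the fields and print one detail line.
sub show_cd {
    ($album, $price) = @_;
    write();
}

show_cd("Heads & Tales", 12.50);
show_cd("Photographs & Memories", 4.95);
```

The heading is printed once, before the first detail line, with $% already incremented to 1.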
Unfortunately, Perl does not truly have any facility for adding footer detail lines. However, you can try a bit of "magic" in order to fool Perl into creating footers with static text. The $^L variable holds the string that Perl writes before every report page except for the first, and the $= variable holds the number of lines per page. By changing $^L to hold your footer and by reducing the value in $= by the number of lines your footer will need, you can create primitive footers. Listing 11.5 displays the CD collection report on two pages by using this technique.
Pseudocode |
Declare a format for the STDOUT file handle. Declare a heading format for the STDOUT file handle. Open the FORMAT.DAT file, read all the lines into @lines, and then close the file. Assign a value of 6 to $=. Normally, it has a value of 60. Changing the value to 6 will create very short pages - ideal for small example programs. Assign a string to $^L, which usually is equal to the form-feed character. The form-feed character causes printers to eject a page. Iterate over the @lines array. Remove the linefeed character. Split the string into three fields. If any of the three fields are not present in the line, provide a default value of an empty string. Notice that a numeric value must be given to $price instead of the empty string. Invoke the format statement using the write() function. Print the footer on the last page. You need to explicitly do this because the last page of the report will probably not be a full page. |
Listing 11.5-11LST05.PL - Tricking Perl into Creating Primitive Footers |
|
This program displays the following:
CD Collection of David Medinets Pg 1
Album Artist Price
----------------- ---------------- -------
Album=The Lion King Artist= Price=$ 0.00
Album=Tumbleweed Con Artist= Elton John Price=$123.32
------------------------------------------------------------
Copyright, 1996 by Eclectic Consulting
CD Collection of David Medinets Pg 2
Album Artist Price
----------------- ---------------- -------
Album=Photographs & Artist= Jim Croce Price=$ 4.95
Album=Heads & Tales Artist= Harry Chapin Price=$ 12.50
------------------------------------------------------------
Copyright, 1996 by Eclectic Consulting
Let me explain the assignment to
$^L in more detail. The assignment is duplicated here for your
convenience:
$^L = '-' x 60 . "\n" .
"Copyright, 1996 by Eclectic Consulting\n" .
"\n\n";
The first part of the assignment, '-' x 60,
creates a line of 60 dash characters. Then a newline character is concatenated
to the line of dashes. Next, the copyright line is appended. Finally, two more
linefeeds are appended to separate the two pages of output. Normally, you
wouldn't add the ending linefeeds because the form-feed character makes them
unnecessary. Here's how the code would look when designed to be sent to a
printer:
$^L = '-' x 60 . "\n" .
"Copyright, 1996 by Eclectic Consulting" .
"\014";
The "\014" string is the equivalent of a form-feed character because the ASCII value for a form-feed is 12 in decimal, which is 14 in octal notation.
Note |
I feel that it's important to say that the coding style in this example is not really recommended for "real" programming. I concatenated each footer element separately so I could discuss what each element did. The last three elements in the footer assignment should probably be placed inside one string literal for efficiency. |
Tip |
This example is somewhat incomplete. If the last page of the report ends at line 20 and there are 55 lines per page, simply printing the $^L variable will not place the footer at the bottom of the page. Instead, the footer will appear after line 20. This is probably not the behavior you'd like. Try the following statement to fix this problem:
print("\n" x $- . "$^L");
The $- special variable holds the number of lines left on the current page, so this statement prepends enough linefeeds to the footer to place it at the bottom of the page. |
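
The footer technique, including the padding from the tip above, can be sketched as follows. The page length of 5, the footer text, and the print_report() helper are assumptions made for this illustration and are not taken from Listing 11.5.

```perl
use strict;
use warnings;

our ($album, $price);

format STDOUT_TOP =
Album           Price
--------------  -------
.

format STDOUT =
@<<<<<<<<<<<<<  $@##.##
$album,         $price
.

# Hypothetical helper: writes the records to the currently selected
# file handle, with a footer after every page.
sub print_report {
    my @records = @_;
    $=  = 5;                                   # 2 heading + 3 detail lines
    $^L = '-' x 23 . "\nEnd of page\n\n";      # written before pages 2, 3, ...
    foreach my $record (@records) {
        ($album, $price) = @$record;
        write();
    }
    # Last page: pad with the lines still left ($-), then add the footer.
    print("\n" x $- . $^L);
}

print_report(["The Lion King", 0],
             ["Tumbleweed Connection", 123.32],
             ["Photographs & Memories", 4.95],
             ["Heads & Tales", 12.50]);
```

With four records and three detail lines per page, Perl writes $^L once on its own (before the second page), and the final print() supplies the footer for the last page.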
Pseudocode |
Declare a format for the STDOUT file handle. In this example, the value line calls the dotize() function. Declare a heading format for the STDOUT file handle. Declare the dotize() function. Initialize local variables called $width and $string. If the width of $string is greater than $width, return a value that consists of $string shortened to $width-3 with ... appended to the end; otherwise, return $string. Open the FORMAT.DAT file, read all the lines into @lines, and then close the file. Iterate over the @lines array. Remove the linefeed character. Split the string into three fields. If any of the three fields are not present in the line, provide a default value of an empty string. Notice that a numeric value must be given to $price instead of the empty string. Invoke the format statement by using the write() function. |
Listing 11.6-11LST06.PL - Using a Function with a Value Line |
|
This program displays the following:
CD Collection of David Medinets Pg 1
Album Artist Price
----------------- ---------------- -------
The Lion King $ 0.00
Tumbleweed Con... Elton John $123.32
Photographs & ... Jim Croce $ 4.95
Heads & Tales Harry Chapin $ 12.50
The second and third
detail lines have benefited from the dotize() function. You can use a
similar technique to invoke any function in the value line. You can also use
expressions directly in the value line, but it might be harder to maintain
because the intent of the expression might not be clear.
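
Based on the pseudocode for dotize(), a function call in a value line might look like the sketch below. The argument order (width first, then string) and the column width of 17 are assumptions inferred from the pseudocode and the sample output, not a copy of Listing 11.6.

```perl
use strict;
use warnings;

our ($album, $artist, $price);

# Return $string unchanged if it fits in $width characters; otherwise
# shorten it to $width - 3 characters and append "...".
sub dotize {
    my ($width, $string) = @_;
    return $string if length($string) <= $width;
    return substr($string, 0, $width - 3) . "...";
}

format STDOUT =
@<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<< $@##.##
dotize(17, $album), $artist,       $price
.

foreach my $record (["Tumbleweed Connection", "Elton John", 123.32],
                    ["Heads & Tales", "Harry Chapin", 12.50]) {
    ($album, $artist, $price) = @$record;
    write();
}
```

Passing the same width to dotize() that the field holder reserves keeps the trailing dots from being cut off by the field itself.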
Pseudocode |
Declare a format for the STDOUT file handle. Declare a format for the total price information. Declare a heading format for the STDOUT file handle. Declare the dotize() function. Initialize local variables called $width and $string. If the width of $string is greater than $width, return a value that consists of $string shortened to $width-3 with ... appended to the end; otherwise, return $string. Open the FORMAT.DAT file, read all the lines into @lines, and then close the file. Initialize the $total variable to zero. Iterate over the @lines array. Remove the linefeed character. Split the string into three fields. Provide a default value for any empty variables. Invoke the format statement by using the write() function. Change the current format by assigning a value to the $~ special variable. Invoke the format statement by using the write() function. |
Listing 11.7-11LST07.PL - Using an Alternative format Statement |
|
This program displays the following:
CD Collection of David Medinets Pg 1
Album Artist Price
----------------- ---------------- --------
The Lion King $ 0.00
Tumbleweed Con... Elton John $ 123.32
Photographs & ... Jim Croce $ 4.95
Heads & Tales Harry Chapin $ 12.50
---------------------------------------------
$ 140.77
This example shows you
how to keep a running total and how to switch to an alternative detail line
format. If you need to switch to an alternative heading format, assign the new
header format name to the $^ special variable.
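
Switching detail formats with $~ can be sketched like this. The format names, column widths, and the print_with_total() helper are assumptions chosen for the illustration.

```perl
use strict;
use warnings;

our ($album, $price, $total);

format STDOUT =
@<<<<<<<<<<<<<<<< $@###.##
$album,           $price
.

format STDOUT_TOTAL =
----------------- --------
                  $@###.##
                  $total
.

# Hypothetical helper: prints the detail lines, then switches to the
# total format for the currently selected file handle.
sub print_with_total {
    my @records = @_;
    $total = 0;
    $~ = "STDOUT";               # detail format
    foreach my $record (@records) {
        ($album, $price) = @$record;
        $total += $price;        # keep a running total
        write();
    }
    $~ = "STDOUT_TOTAL";         # alternative format for the last lines
    write();
}

print_with_total(["The Lion King", 0],
                 ["Tumbleweed Connection", 123.32],
                 ["Photographs & Memories", 4.95],
                 ["Heads & Tales", 12.50]);
```

The first line of STDOUT_TOTAL contains no field holders, so it is printed as static text and needs no value line.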
Pseudocode |
Declare a format for the STDOUT file handle. The field and value lines are repeated enough times to print the entire length of the expected output. Initialize the $word and $definition variables. The $definition variable is initialized by using concatenated strings to avoid line breaks caused by the book printing process. A line of asterisks is printed. The format is invoked. Another line of asterisks is printed. |
Listing 11.8-11LST08.PL - Using the ^ Formatting Character to Print Long Text Values |
|
This program displays the following:
****************
outlier 1. someone sleeping outdoors. 2.
someone whose office is not at
home. 3. an animal who strays from
the fold. 4. something that has
been separated from the main body.
****************
The ^ formatting character causes Perl to do
word-wrapping on the specified variable. Word-wrapping means that Perl
will accumulate words into a temporary buffer, stopping when the next word will
cause the length of the accumulated string to exceed the length of the field.
The accumulated string is incorporated into the report, and the accumulated
words are removed from the variable. Therefore, the next time Perl looks at the
variable, it can start accumulating words that have not been used yet.
Note |
Any linefeed characters in the variable are ignored when the ^ formatting character is used in the format statement. |
Caution |
Because the value of the variable used in the value line changes when word-wrapping is being used, make sure to use only copies of variables in the format statement. By using copies of the variables, you'll still have the original value available for further processing. |
The asterisks in the preceding example were printed to show that a blank line was printed by the format. This happened because the $definition variable ran out of words before the format ran out of field lines. Extra blank lines can be eliminated by placing the ~ character somewhere - usually at the beginning or end - of the field line. The format statement would then look like this:
format =
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$word, $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$word, $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$word, $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$word, $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$word, $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$word, $definition
.
The new report would not have a blank line.
****************
outlier 1. someone sleeping outdoors. 2.
someone whose office is not at
home. 3. an animal who strays from
the fold. 4. something that has
been separated from the main body.
****************
It is rather wasteful to have to repeat the field
lines often enough to account for the longest possible length of
$definition. In fact, if you are reading the definitions from a file,
you might not know how long the definitions could be ahead of time. Perl
provides the ~~ character sequence to handle situations like this. By
placing ~~ on the field line, Perl will repeat the field line as often
as needed until a blank line would be printed. Using this technique would change
the format statement to this:
format =
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~~
$word, $definition
.
You might be wondering how Perl decides when a word ends. This
behavior is controlled by the $: variable. The default value for
$: is a string consisting of the space, newline, and dash characters.
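
A short sketch of the ~~ technique follows. The narrower field widths, the shortened definition text, and the show_entry() helper are assumptions for this example. The helper also follows the earlier caution by handing copies to the format.

```perl
use strict;
use warnings;

our ($word, $definition);

# ~~ repeats the field line until both fields are used up.
format STDOUT =
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<< ~~
$word,    $definition
.

# Hypothetical helper. The arguments are copied into $word and
# $definition, so the format consumes the copies and not the
# caller's variables.
sub show_entry {
    ($word, $definition) = @_;
    write();
}

my $text = "1. someone sleeping outdoors. 2. someone whose office "
         . "is not at home.";
show_entry("outlier", $text);
print("Original is still intact: $text\n");
```

After write() returns, $definition is empty because word-wrapping removed each chunk as it was printed, while $text still holds the full definition.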
Listing 11.9 shows how easy it is to convert an existing program from using STDOUT to using a file. The program shown is a reworking of the program in Listing 11.4. Four changes needed to be made for the conversion. The format statement was changed to specify a format name identical to the file handle used in the second open() statement. A second open() statement was added. The write() function was changed to specify the file handle to use, and a second close() statement was added.
Pseudocode |
Declare a format for the CD_REPORT file handle. Open the FORMAT.DAT file, read all the lines into @lines, and then close the file. Open the FORMAT.RPT file for output to hold the report. Iterate over the @lines array. Remove the linefeed character. Split the string into three fields. If any of the three fields are not present in the line, provide a default value of an empty string. Notice that a numeric value must be given to $price instead of the empty string. Invoke the format statement by using the write() function specifying the file handle to use. Close the FORMAT.RPT file. |
Listing 11.9-11LST09.PL - Saving a Report in a File |
|
This program creates a file called FORMAT.RPT that contains the following:
Album=The Lion King Artist= Price=$ 0.00
Album=Tumbleweed Con Artist= Elton John Price=$123.32
Album=Photographs & Artist= Jim Croce Price=$ 4.95
Album=Heads & Tales Artist= Harry Chapin Price=$ 12.50
The
contents of FORMAT.RPT are identical to the display created by the
program in Listing 11.4.
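
The four changes listed earlier can be seen in a sketch like the following. It is not the book's Listing 11.9: a single sample record stands in for FORMAT.DAT, and save_report() is a hypothetical helper that takes the output file name as a parameter.

```perl
use strict;
use warnings;

our ($album, $artist, $price);

# The format name matches the file handle that write() is given.
format CD_REPORT =
Album=@<<<<<<<<<<<<< Artist=@>>>>>>>>>>> Price=$@##.##
$album,              $artist,            $price
.

# Hypothetical helper: open the report file, write one detail line
# per record, and close the file again.
sub save_report {
    my ($path, @records) = @_;
    open(CD_REPORT, ">", $path) or die("Cannot open $path: $!");
    foreach my $record (@records) {
        ($album, $artist, $price) = @$record;
        write(CD_REPORT);
    }
    close(CD_REPORT);
}

save_report("FORMAT.RPT", ["Heads & Tales", "Harry Chapin", 12.50]);
```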
Using more than one format in reports destined for files is slightly more complicated than it was when STDOUT was used. The process is more involved because you need to make the output file handle the default file handle before setting the $~ or $^ special variables.
Pseudocode |
Declare a format for the CD_REPORT file handle. Declare a format for the total price information using CD_REPORT_TOTAL as the format name. Declare a heading format for the CD_REPORT file handle using CD_REPORT_TOP as the format name. Declare the dotize() function. Initialize local variables called $width and $string. If the width of $string is greater than $width, return a value that consists of $string shortened to $width-3 with ... appended to the end; otherwise, return $string. Open the FORMAT.DAT file, read all the lines into @lines, and then close the file. Open the FORMAT.RPT file for output to hold the report. Initialize the $total variable to zero. Iterate over the @lines array. Remove the linefeed character. Split the string into three fields. Provide a default value for any empty variables. Invoke the format statement by using the write() function specifying the CD_REPORT file handle. Change the current format by assigning a value to the $~ special variable. This statement uses some advanced concepts and is explained further after the listing. Invoke the format statement by using the write() function. Close the FORMAT.RPT file. |
Listing 11.10-11LST10.PL - Using an Alternative format Statement |
|
This program creates a file called FORMAT.RPT that contains the following:
CD Collection of David Medinets Pg 1
Album Artist Price
----------------- ---------------- --------
The Lion King $ 0.00
Tumbleweed Con... Elton John $ 123.32
Photographs & ... Jim Croce $ 4.95
Heads & Tales Harry Chapin $ 12.50
---------------------------------------------
$ 140.77
The contents of
FORMAT.RPT are identical to the display created by the program in
Listing 11.7.
The statement that changes a default file handle and format name is a little complicated. Let's take a closer look at it.
select((select(CD_REPORT), $~ = "CD_REPORT_TOTAL")[0]);
In order to understand most statements, you need to look at the innermost parentheses first, and this one is no different. The innermost expression to evaluate is
select(CD_REPORT), $~ = "CD_REPORT_TOTAL"
You might recall that the comma operator lets you place two or more expressions where normally you can place only one. That's what is happening here. First, CD_REPORT is selected as the default file handle for the print and write statements, and then the $~ variable is changed to the new format name. Because the two expressions are enclosed in parentheses and followed by a subscript, they are evaluated in list context and their return values form a list. The [0] notation then retrieves the first element of that list: the value returned from the select() function. Because the select() function returns the previous default file handle, the outer select() restores the default file handle to its previous value.
This bit of code could have been written like this:
$oldhandle = select(CD_REPORT);
$~ = "CD_REPORT_TOTAL";
select($oldhandle);
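
For context, the three-line form can be dropped into a small report-to-file program like this sketch. The format layouts, sample data, and the save_report_with_total() helper are assumptions for illustration and do not reproduce Listing 11.10.

```perl
use strict;
use warnings;

our ($album, $price, $total);

format CD_REPORT =
@<<<<<<<<<<<<<<<< $@###.##
$album,           $price
.

format CD_REPORT_TOTAL =
----------------- --------
                  $@###.##
                  $total
.

# Hypothetical helper: writes detail lines, then a total, to $path.
sub save_report_with_total {
    my ($path, @records) = @_;
    open(CD_REPORT, ">", $path) or die("Cannot open $path: $!");
    $total = 0;
    foreach my $record (@records) {
        ($album, $price) = @$record;
        $total += $price;
        write(CD_REPORT);
    }
    # $~ belongs to the selected handle, so select CD_REPORT first,
    # change its format, and then restore the previous default handle.
    my $oldhandle = select(CD_REPORT);
    $~ = "CD_REPORT_TOTAL";
    select($oldhandle);
    write(CD_REPORT);
    close(CD_REPORT);
}

save_report_with_total("FORMAT.RPT",
                       ["Photographs & Memories", 4.95],
                       ["Heads & Tales", 12.50]);
```

Restoring the old handle matters because any later print() without an explicit file handle would otherwise end up in the report file.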
Header and detail lines are defined by using format statements that have alternating field and value lines. The field lines hold the static text and field holders while the value lines hold a comma-delimited list of expressions.
You can use several different format characters when creating the field holder to have left-justified, right-justified, or centered fields. You can also use word-wrapping to display long pieces of text in your reports.
Directing a report to a file instead of to STDOUT requires some simple steps. The output file needs to be opened; the format statement needs to use the file handle's name as the format name; the file handle needs to be specified in the write() call; and the output file needs to be closed.
The next chapter focuses on special variables. All the different special variables you have seen so far - and more - are discussed along with some examples of how to use them.
select((select(ANNUAL_RPT), $^ = "REGIONAL_SALES")[0]);