From BookGlutton.com

Apache Cookbook

Ken Coar

Rich Bowen

O'Reilly Media

Preface

Platform Notes

The recipes in this book are geared toward two major platforms: Unixish ones (such as Linux, FreeBSD, and Solaris) and Windows. There are many that have no platform-specific aspects, and for those any mention of the underlying operating system or hardware is gratefully omitted. Due to the authors' personal preferences and experiences, Unixish coverage is more complete than that for the Windows platforms. However, contributions, suggestions, and corrections for Windows-specific recipes will be gladly considered for future revisions and inclusion on the web site.

Other Books

There are a number of books currently in print that deal with the Apache web server and its operation. Among them are:

Apache: The Definitive Guide, Third Edition (O'Reilly)

Apache Server Unleashed (Macmillan)

Apache Administrator's Handbook (Macmillan)

You can also keep an eye on a couple of web pages that track Apache titles:

http://Apache-Server.Com/store.html

http://httpd.apache.org/info/apache_books.html

Other Sources

In addition to books, there is a wealth of information available online. There are web sites, mailing lists, and USENET newsgroups devoted to the use and management of the Apache web server. The web sites are limitless, but here are some active and useful sources of information.

The comp.infosystems.www.servers.unix and comp.infosystems.www.servers.ms-dos

How This Book Is Organized

This book is broken up into twelve chapters and two appendixes, as follows:

Chapter 1 covers the basics of installing the vanilla Apache software, from source on Unixish systems, and on Windows from the Microsoft Software Installer (MSI) package built by the Apache developers.

Chapter 2 describes the details of installing some of the most common third-party modules, and includes generic instructions that apply to many others that have less complex installation needs.

Chapter 3

Conventions Used in This Book

Throughout this book certain stylistic conventions are followed. Once you are accustomed to them, you can easily distinguish between comments, commands you need to type, values you need to supply, and so forth.

In some cases, terms in the main text are set in a different typeface, as is some of the text in code examples. The meaning of each style (italic, boldface, etc.) is described in the following sections.

Programming Conventions

In this book, most code examples are excerpts from scripts rather than complete applications. When commands need to be issued at a command-line prompt (such as an xterm for a Unixish system or a DOS command prompt for Windows), they will look something like this:

% find /usr/local -name apachectl -print
# /usr/local/apache/bin/apachectl graceful
C:>cd "\Program Files\Apache Group\Apache\bin"
C:\Program Files\Apache Group\Apache\bin>apache -k stop

On Unixish systems, command prompts that begin with # indicate that you need to be logged in as the superuser (root username); if the prompt begins with %

We'd Like to Hear from You

We have tested and verified the information in this book to the best of our ability, but you may find that features have changed (which may in fact resemble bugs). Please let us know about any errors you find, as well as your suggestions for future editions, by writing to:

Please address comments and questions concerning this book to the publisher:

O'Reilly & Associates, Inc.

Acknowledgments

Originally, each recipe was going to be individually attributed, but that turned out to be logistically impossible.

Many people have helped us during the writing of this book, by posing a problem, providing a solution, proofreading, reviewing, editing, or just (!) providing moral support. This multitude, to each of whom we are profoundly grateful, includes Nat Torkington (our project editor and demonstrator of Herculean feats of patience), Sharco and Guy- from #apache on irc.freenode.net, Mads Toftum, Morbus Iff (known to the FBI under the alias Kevin Hemenway), and Andy Holman.

Ken Coar

Chapter 1. Installation

For this cookbook to be useful, you need to install the Apache web server software. So what better way to start than with a set of recipes that deal with the installation?

There are many ways of installing this package; one of the features of open software like Apache is that anyone may make an installation kit. This allows vendors (such as Debian, FreeBSD, Red Hat, Mandrake, Hewlett-Packard, and so on) to customize the Apache file locations and default configuration settings so that these settings fit with the rest of their software. Unfortunately, one of the consequences of customization is that the various prepackaged installation kits are almost all different from one another.

In addition to installing it from a prepackaged kit, of which the variations are legion, there's always the option of building and installing it from the source yourself. This has both advantages and disadvantages; on the one hand you know exactly what you installed and where you put it, but on the other hand, it's likely that binary add-on packages will expect files to be in locations different than those you have chosen.

Installing Apache on Windows

Problem

You want to install the Apache web server software on a Windows platform.

Tip

If you already have Apache installed on your Windows system, remove it before installing a new version. Failure to do this results in unpredictable behavior. See Recipe 1.7.

Solution

Primarily, Windows is a graphically oriented environment, so the Apache install for Windows is correspondingly graphical in nature.

The simplest way to install Apache is to download and execute the Microsoft Software Installer (MSI) package from the Apache web site at http://httpd.apache.org/download. The following screenshots come from an actual installation made using this method.

Each step of the installation process is distinct, and you can go back and revise earlier decisions until the files are actually installed. The first screen (Figure 1-1) simply confirms what you're about to do and the version of the package you're installing.

Figure 1-1. First screen of Apache MSI install

The second screen (Figure 1-2) presents the Apache license. Its basic tenets boil down to the following: do what you want with the software, don't use the Apache marks (trademarks like the feather or the name Apache) without permission, and provide proper attribution for anything you build based on Apache software. (This only applies if you plan to distribute your package; if you use it strictly on an internal network, this isn't required.) You can't proceed past this screen until you agree to the license terms.

Figure 1-2. License agreement

Figure 1-3

Downloading the Apache Sources

Problem

You want to build the Apache web server yourself from the sources directly (see Recipe 1.4), but don't know how to obtain them.

Solution

There are a number of ways to obtain the sources. You can access the latest version in close to real time by using CVS, the tool used by the Apache developers for source control; you can download a release tarball; or you can install a source package prepared by a distributor, among others.

To start from a prepackaged tarball, download it from http://httpd.apache.org/dist/, and then unpack it:

% tar xzvf apache_1.3.27.tar.gz

If your version of tar doesn't support the z option for processing zipped archives, use this command instead:

% gunzip -c < apache_1.3.27.tar.gz | tar xvf -

Building Apache from the Sources

Problem

You want to build your Apache web server from the sources directly rather than installing it from a prepackaged kit.

Solution

Assuming that you already have the Apache source tree, whether you installed it from a tarball, CVS, or some distribution package, the following commands, executed in the top directory of the tree, build the server package with most of the standard modules as DSOs:

Apache 1.3:

% ./configure --prefix=/usr/local/apache --with-layout=Apache --enable-shared=max --enable-module=most
% make

Installing with ApacheToolbox

Problem

You have a complicated collection of modules you want to install correctly.

Solution

Download ApacheToolbox from http://www.apachetoolbox.com/. (Note that the version numbers will probably be different than these, which were the latest available when this section was written.) Unpack the file:

% bunzip2 Apachetoolbox-1.5.65.tar.bz2
% tar xvf Apachetoolbox-1.5.65.tar

(Depending on your version of tar, you may be able to combine these operations into a single tar xjvf command.)

Then run the installation script:

# cd Apachetoolbox-1.5.65
# ./install.sh

Discussion

ApacheToolbox is developed and maintained by Bryan

Starting, Stopping, and Restarting Apache

Problem

You want to be able to start and stop the server at need, using the appropriate tools.

Solution

On Unixish systems, use the apachectl script; on Windows, use the options in the Apache folder of the Start menu.

Discussion

The basic Apache package includes tools to make it easy to control the server. For Unixish systems, this is usually a script called apachectl

Uninstalling Apache

Problem

You have the Apache software installed on your system, and you want to remove it.

Solution

On Red Hat Linux, to remove an Apache version installed with the RPM tool, use:

# rpm -ev apache

Other packaging systems may provide some similar mechanism.

Chapter 2. Adding Common Modules

There are a number of extremely popular modules for the Apache web server that are not included in the basic distribution. Most of these are separate because of licensing or support reasons; some are not distributed by the Apache Software Foundation because of a decision by the Apache developers; and some are integral parts of other projects. For instance, mod_ssl for Apache 1.3 is developed and maintained separately not only because of the U.S. export control laws (which were more restrictive when the package was originally developed), but because it requires changes to the core software that the Apache developers chose not to integrate.

This chapter provides recipes for installing some of the most popular of these third-party modules; when available, there are separate recipes for installation on Unixish systems and on Windows.

The most comprehensive list of third-party modules can be found in the Apache Module Registry at http://modules.apache.org/. Some modules are so popular—or complex—that they have entire sites devoted to them, as do the ones listed in this chapter.

Although hundreds of third-party modules are available, many module developers are only concerned with their single module. This means that there are potentially as many different sets of installation instructions as there are modules. The first recipe in this chapter describes an installation process that should work with many Apache 1.3 modules, but you should check with the individual packages' instructions to see if they have a different or more detailed process.

Many of the modules are available from organizations that prepackage or distribute Apache software, such as in the form of an RPM from Mandrake or Red Hat, but such prebuilt module packages include the assumptions of the packager. In other words, if you build the server from source and use custom locations for the files, don't be surprised if the installation of a packaged module fails.

All of the modules described in this chapter are supported with Apache 1.3 on Unixish systems. Status of support with Apache 2.0 on Windows is shown in Table 2-1.

Table 2-1. Module support status

Installing mod_dav on a Unixish System

Problem

You want to add or enable WebDAV capabilities to your server. WebDAV permits specific documents to be reliably and securely manipulated by remote users without the need for FTP, to perform such tasks as adding, deleting, or updating files.

Solution

If you're using Apache 2.0, mod_dav is automatically available, although you may need to enable it at compile time with --enable-dav.

If you are using Apache 1.3, download and unpack the mod_dav source package from http://webdav.org/mod_dav/, and then:

% cd mod_dav-1.0.3-1.3.6
% ./configure --with-apxs=/usr/local/apache/bin/apxs
% make
# make install

Restart the server, and be sure to read Recipe 6.18.

Discussion

mod_dav is an encapsulated and well-behaved module that is easily built and added to an existing server. To test that it has been properly installed, you need to enable some location on the server for WebDAV management and verify access to that location with some WebDAV-capable tool. We recommend cadaver

Installing mod_dav on Windows

Problem

You want to enable WebDAV capabilities on your existing Apache 1.3 server with mod_dav.

Solution

Apache 2.0 includes mod_dav as a standard module, so you do not need to download and build it.

Download and unpack the mod_dav Windows package from http://webdav.org/mod_dav/win32/. Verify that your Apache installation already has the xmlparse.dll and xmltok.dll files in the ServerRoot directory; if they aren't there, check through the Apache directories to locate and copy them to the ServerRoot. mod_dav requires the Expat package, which is included with versions of the Apache web server after 1.3.9; these files hook into Expat, which mod_dav will use.

Put the mod_dav DLL file into the directory where Apache keeps its modules:

C:\>cd mod_dav-1.0.3-dev
C:\mod_dav-1.0.3-dev>copy mod_dav.dll C:\Apache\modules
C:\mod_dav-1.0.3-dev>cd \Apache

Add the following lines to your httpd.conf file:

LoadModule dav_module modules/mod_dav.dll

You may also need to add an AddModule line if your httpd.conf file includes a ClearModuleList

Installing mod_perl on a Unixish System

Problem

You want to install the mod_perl scripting module to allow better Perl script performance and easy integration with the web server.

Solution

Download and unpack the mod_perl source package from http://perl.apache.org/. Then use the following command:

% perl Makefile.PL \
> USE_APXS=1 \
> WITH_APXS=/usr/local/apache/bin/apxs \
> EVERYTHING=1 \
> PERL_USELARGEFILES=0
% make
% make install

Restart your server.

Discussion

mod_perl

Installing mod_php on a Unixish System

Problem

You want to add the mod_php scripting module to your existing Apache web server.

Solution

Download the mod_php package source from the web site at http://php.net/ (follow the links for downloading) and unpack it. Then:

% cd php-4.3.2 %

Installing mod_php on Windows

Problem

You want to add the mod_php scripting module to your existing Apache server on Windows.

Solution

This recipe needs to be described largely in terms of actions rather than explicit commands to be issued.

Download the PHP Windows binary .zip file with API extensions (not the .exe file) from http://php.net/.

Unpack the .zip file into a directory where you can keep its contents indefinitely (such as C:\PHP4). If you use WinZip, be sure to select the Use

Installing the mod_snake Python Module

Problem

You want to add the mod_snake Python scripting module to your existing Apache server.

Solution

To install mod_snake on a Unixish system, download the source from the http://sourceforge.net/projects/modsnake/

Installing mod_ssl

Problem

You want to add SSL support to your Apache server with the mod_ssl secure HTTP module.

Solution

Windows

At the time of this writing, there is no supported means of installing mod_ssl on Windows.

Apache 2.0

mod_ssl is included with 2.0, although it is neither compiled nor installed automatically when you build from source. You need to include the --enable-ssl option on your ./configure line, and enable it with LoadModule

Chapter 3. Logging

Apache can, and usually does, record information about every request it processes. Controlling how this is done and extracting useful information out of these logs after the fact is at least as important as gathering the information in the first place.

The logfiles may record two types of data: information about the request itself, and possibly one or more messages about abnormal conditions encountered during processing (such as file permissions). You, as the webmaster, have a limited amount of control over the logging of error conditions, but a great deal of control over the format and amount of information logged about request processing (activity logging). The server may log activity information about a request in multiple formats in multiple logfiles, but it will only record a single copy of an error message.

One aspect of activity logging you should be aware of is that the log entry is formatted and written after the request has been completely processed. This means that the interval between the time a request begins and when it finishes may be long enough to make a difference.

For example, if your logfiles are rotated while a particularly large file is being downloaded, the log entry for the request will appear in the new logfile when the request completes, rather than in the old logfile when the request was started. In contrast, an error message is written to the error log as soon as it is encountered.

The web server will continue to record information in its logfiles as long as it's running. This can result in extremely large logfiles for a busy site and uncomfortably large ones even for a modest site. To keep the file sizes from growing ever larger, most sites rotate or roll over their logfiles on a semi-regular basis. Rolling over a logfile simply means persuading the server to stop writing to the current file and start recording to a new one. Due to Apache's determination to see that no records are lost, cajoling it to do this according to a specific timetable may require a bit of effort; some of the recipes in this chapter cover how to accomplish the task successfully and reliably (see Recipe 3.8 and Recipe 3.9).

The log declaration directives, CustomLog and ErrorLog, can appear inside <VirtualHost> containers, outside them (in what's called the main or global server, or sometimes the global scope), or both. Entries will only be logged in one set or the other; if a <VirtualHost> container applies to the request or error and has an applicable log directive, the message will be written only there and won't appear in any globally declared files. On the other hand, if no <VirtualHost> log directive applies, the server will fall back on logging the entry according to the global directives.

However, whichever scope is used to determine which logging directives apply, all CustomLog directives in that scope are processed and treated independently. That is, if you have one CustomLog directive in the global scope and two inside a <VirtualHost> container, both of the directives inside the container will be used for requests handled by that virtual host. Similarly, if a CustomLog directive uses the env= option, it has no effect on what requests will be logged by other CustomLog directives in the same scope.
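The scoping rule described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as the text states it, not of Apache's internals: the function name logs_written, the logfile paths, and the is_image variable are made up for illustration, and only the plain env=name form is handled.

```python
def logs_written(global_logs, vhost_logs, request_env):
    """Return the logfiles that receive an entry for one request.

    Each log is a (path, env_name) pair; env_name is None when the
    CustomLog directive has no env= condition. vhost_logs is empty
    when no <VirtualHost> log directive applies to the request.
    """
    # Only one scope is consulted: the virtual host's if it declares
    # any CustomLog directives, otherwise the global one.
    scope = vhost_logs if vhost_logs else global_logs
    # Within that scope every directive is evaluated on its own.
    return [path for path, env_name in scope
            if env_name is None or env_name in request_env]


# Hypothetical configuration: one global log, two logs in a vhost.
GLOBAL_LOGS = [("logs/access_log", None)]
VHOST_LOGS = [("logs/site_access_log", None),
              ("logs/site_image_log", "is_image")]
```

A request to the virtual host with is_image set is written to both of the host's logs and not to logs/access_log; a request that matches no virtual host log directive falls back to the global file.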

Activity logging has been around since the Web first appeared, and it didn't take long for the original users to decide what items of information they wanted logged. The result is called the common log format (CLF). In Apache terms, this format is:

"%h %l %u %t \"%r\" %>s %b"

That is, it logs the client's hostname or IP address, the name of the user on the client (as defined by RFC 1413 and if Apache has been told to snoop for it with an IdentityCheck On directive), the username with which the client authenticated (if weak access controls are being imposed by the server), the time at which the request was received, the actual HTTP request line, the final status of the server's processing of the request, and the number of bytes of content that were sent in the server's response.
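To make the field order concrete, here is a minimal Python sketch that splits one common log format entry into the seven fields just described. The regular expression, the function name parse_clf, and the sample entry are illustrative assumptions; a request line containing an escaped double quote would need a more careful pattern.

```python
import re

# One named group per CLF field: %h %l %u %t "%r" %>s %b
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # %b is written as "-" when no content bytes were sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


# Made-up sample entry using a documentation-only IP address.
SAMPLE = ('192.0.2.10 - - [10/Oct/2003:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326')
```

The two hyphens in the sample stand for the RFC 1413 identity and the authenticated username, neither of which is available for this request.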

Before long, as the HTTP protocol advanced, the common log format was found to be wanting, so an enhanced format, called the combined log format, was created:

"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""

The two additions were the Referer (it's spelled incorrectly in the specifications) and the User-agent. These are the URL of the page that linked to the document being requested, and the name and version of the browser or other client software making the request.

Both of these formats are widely used, and many logfile analysis tools assume log entries are made in one or the other.

The Apache web server's standard activity logging module allows you to create your own formats; it is highly configurable and is called (surprise!) mod_log_config. Apache 2.0 has an additional module, mod_logio, which enhances mod_log_config

Getting More Detailed Errors

Problem

You want more information in the error log in order to debug a problem.

Solution

Change (or add) the LogLevel line in your httpd.conf file. There are several possible arguments, which are enumerated below:

For example:

LogLevel Debug

Discussion

There are several hierarchical levels of error logging available, each identified by its own keyword. The default value of LogLevel is warn. Listed in descending order of importance, the possible values are:

emerg

Emergencies; web server is unusable

alert

Action must be taken immediately

crit

Critical conditions
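The hierarchy can be pictured as a simple threshold test: a message is recorded when it is at least as important as the configured LogLevel. The Python sketch below assumes the standard Apache level names; the five names after crit come from the Apache documentation, and the function name is_logged is made up for illustration. It ignores any special cases the server itself may apply.

```python
# Level names in descending order of importance. The names after
# "crit" are assumed from the Apache documentation.
LEVELS = ["emerg", "alert", "crit", "error",
          "warn", "notice", "info", "debug"]


def is_logged(message_level, log_level="warn"):
    """Return True if a message at message_level passes the threshold.

    Keywords are compared case-insensitively, so "Debug" and "debug"
    are treated alike.
    """
    return (LEVELS.index(message_level.lower())
            <= LEVELS.index(log_level.lower()))
```

With the default of warn, a crit message is recorded but an info message is not; setting the level to debug lets everything through.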

Logging POST Contents

Problem

You want to record data submitted with the POST method, such as from a web form.

Solution

Generally not possible in Apache 1.3 unless the POST-handling module explicitly records the data; possible in Apache 2.0

Logging a Proxied Client's IP Address

Problem

You want to log the IP address of the actual client requesting your pages, even if they're being requested through a proxy.

Solution

None.

Logging Client MAC Addresses

Problem

You want to record the

Logging Cookies

Problem

You want to record all the cookies sent to your server by clients and all the cookies your server asks clients to set in their databases; this can be useful when debugging web applications that use cookies.

Solution

To log cookies received from the client:

CustomLog logs/cookies_in.log "%{UNIQUE_ID}e %{Cookie}i"
CustomLog logs/cookies2_in.log "%{UNIQUE_ID}e %{Cookie2}i"

To log cookie values set and sent by the server to the client:

CustomLog logs/cookies_out.log "%{UNIQUE_ID}e %{Set-Cookie}o"
CustomLog logs/cookies2_out.log "%{UNIQUE_ID}e %{Set-Cookie2}o"

Using the %{Set-Cookie}o format effector for debugging is not recommended if multiple cookies are (or may be) involved. Only the first one will be recorded in the logfile. See the Discussion text for an example.

Tip

At the time of this writing, the Apache package includes no way to record all cookie values, but one of the authors of this book is working on one. When it's available, it should be mentioned on this book's web site, http://Apache-Cookbook.Com/.

Not Logging Image Requests from Local Pages

Problem

You want to log requests for images on your site, except when they're requests from one of your own pages. You might want to do this to keep your logfile size down, or possibly to track down sites that are hijacking your artwork and using it to adorn their pages.

Solution

Use SetEnvIfNoCase to restrict logging to only those requests from outside of your site:

Logging Requests by Day or Hour

Problem

You want to automatically roll over the Apache logs at specific times without having to shut down and restart the server.

Solution

Use CustomLog and the rotatelogs program:

CustomLog "| /path/to/rotatelogs /path/to/logs/access_log.%Y-%m-%d 86400" combined
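To see what filename a given moment maps to under the directive above, the following Python sketch rounds a timestamp down to the start of its rotation interval and expands the strftime-style template. It assumes interval boundaries aligned to multiples of the rotation time in UTC; the function name rotated_log_name is made up for illustration, and this is a model of the naming scheme rather than rotatelogs' own code.

```python
import time


def rotated_log_name(template, now, interval=86400):
    """Return the logfile name in use at Unix time `now`.

    The timestamp is rounded down to the start of the current
    rotation interval, then the strftime fields in the template
    are filled in from that start time (UTC assumed).
    """
    interval_start = (int(now) // interval) * interval
    return time.strftime(template, time.gmtime(interval_start))


# Unix time 1000000000 is 2001-09-09 01:46:40 UTC.
TEMPLATE = "/path/to/logs/access_log.%Y-%m-%d"
```

Every request completed during the same 86400-second interval ends up in the same file; the first entry written after the boundary opens the next one.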

Rotating Logs on the First of the Month

Problem

You want to close the previous month's logs and open new ones on the first of each month.

Solution

Logging Hostnames Instead of IP Addresses

Problem

You want to see hostnames in your activity log instead of IP addresses.

Solution

You can let the web server resolve the hostname when it processes the request by enabling runtime lookups with the Apache directive:

HostnameLookups On

Or, you can let Apache use the IP address during normal processing and let a piped logging process resolve them as part of recording the entry:

HostnameLookups Off
CustomLog "| /path/to

Maintaining Separate Logs for Each Virtual Host

Problem

You want to have separate activity logs for each of your virtual hosts, but you don't want to have all the open files that multiple CustomLog directives would use.

Solution

Use the split-logfile program that comes with Apache. To split logfiles after they've been rolled over (replace /path/to/ServerRoot with the correct path):

# cd /path/to

Logging Proxy Requests

Problem

You want to log requests that go through your proxy to a different file than the requests coming directly to your server.

Solution

Use the SetEnv directive to earmark those requests that came through the proxy server, in order to trigger conditional logging:

<Directory proxy:*>
    SetEnv is_proxied 1
</Directory>
CustomLog logs/proxy_log combined env=is_proxied

Discussion

Apache 1.3 has a special syntax for the <Directory>

Logging Errors for Virtual Hosts to Multiple Files

Problem

Unlike access logs, Apache only logs errors to a single location. You want Apache to log errors that refer to a particular virtual host to the host's error log, as well as to the global error log.

Solution

There are at least two possible ways of doing this:

Logging Server IP Addresses

Problem

You want to log the IP address of the server that responds to a request, possibly because you have virtual hosts with multiple addresses each.

Solution

Use the %A

Logging the Referring Page

Problem

You want to record the URL of pages that refer clients to yours, perhaps to find out how people are reaching your site.

Solution

Logging the Name of the Browser Software

Problem

You want to know the software visitors use to access your site, for example, so you can optimize its appearance for the browser that most of your audience uses.

Solution

Logging Arbitrary Request Header Fields

Problem

You want to record the values of arbitrary fields that clients send in their request headers, perhaps to tune the types of content you have available to the needs of your visitors.

Solution

Use the %{...}i

Logging Arbitrary Response Header Fields

Problem

You want to record the values of arbitrary fields the server has included in a response header, probably to debug a script or application.

Logging Activity to a MySQL Database

Problem

Rather than logging accesses to your server in flat text files, you want to log the information directly to a database for easier analysis.

Solution

Install the latest release of mod_log_sql

Logging to syslog

Problem

You want to send your log entries to syslog.

Solution

To log your error log to syslog, simply tell Apache to log to syslog:

ErrorLog syslog:user

Tip

Some syslog reporting class other than user, such as local1, might be more appropriate in your environment.

Logging your access log to syslog takes a little work. Add the following to your configuration file:

Logging User Directories

Problem

You want each user directory web site (i.e., those accessed via http://servername/~username) to have its own logfile.

Solution

In httpd.conf, add the directive:

CustomLog "|/usr/local/apache/bin/userdir_log" combined

Then, in the file /usr/local/apache/bin/userdir_log, place the following code:

Chapter 4. Virtual Hosts

A web server system supports multiple web sites in a way similar to a person who responds to her given name, as well as her nickname. In the Apache configuration file, each alternate identity, and probably the "main" one as well, is known as a virtual host (sometimes written as vhost) identified with a <VirtualHost> container directive. Depending on the name used to access the web server, Apache responds appropriately, just as someone might answer differently depending on whether she is addressed as "Miss Jones" or "Hey, Debbie!" If you want to have a single system support multiple web sites, you must configure Apache appropriately.

There are two different types of virtual host supported by Apache. The first type, called address-based or IP-based, is tied to the numeric network address used to reach the system. Bruce Wayne never answered the parlour telephone with "Batman here!" nor did he answer the phone in the Batcave by saying, "Bruce Wayne speaking." However, it's the same person answering the phone, just as it's the same web server receiving the request.

The other type of virtual host is name-based, because the server's response depends on what it is called. To continue the telephone analogy, consider an apartment shared by multiple roommates; you call the same number whether you want to speak to Dave, Joyce, Amaterasu, or George. Just as multiple people may share a single telephone number, multiple web sites can share the same IP address. However, all IP addresses shared by multiple Apache virtual hosts need to be declared with a NameVirtualHost directive.
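The roommate analogy can be sketched as a lookup on the name the client asked for. The Python below is a simplified model, not Apache's matching code: the hostnames, document roots, and the function name select_vhost are hypothetical, and it assumes that an unmatched name falls through to the first host listed.

```python
def select_vhost(host_header, vhosts):
    """Return the document root for the requested hostname.

    vhosts is an ordered list of (server_name, document_root) pairs
    that all share one IP address. Port suffixes and case are ignored;
    IPv6 literals are not handled in this sketch.
    """
    name = host_header.split(":")[0].lower()
    for server_name, document_root in vhosts:
        if server_name.lower() == name:
            return document_root
    # Assumption: the first host listed acts as the default.
    return vhosts[0][1]


# Hypothetical sites sharing a single address.
VHOSTS = [("dave.example.com", "/www/dave"),
          ("joyce.example.com", "/www/joyce")]
```

Both names resolve to the same address, so the Host field sent with the request is the only thing that tells the sites apart.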

Designating One Name-Based Virtual Host as the Default

Problem

You want all requests, whether they match a virtual host or use an IP address, to be directed to a default host, possibly with a "host not found" error message.

Solution

Add the following <VirtualHost> section, and list it before all of your other ones:

Setting Up Address-Based Virtual Hosts

Problem

You have multiple IP addresses assigned to your system, and you want to support one web site on each.

Solution

Creating a Default Address-Based Virtual Host

Problem

You want to create a virtual host to catch all requests that don't map to one of your address-based virtual hosts.

Solution

Use the _default_

Mixing Address-Based and Name-Based Virtual Hosts

Problem

You have multiple IP addresses assigned to your system, and you want to support more than one web site on each address.

Solution

Mass Virtual Hosting with mod_vhost_alias

Problem

You want to host many virtual hosts, all of which have exactly the same configuration.

Solution

Use VirtualDocumentRoot and VirtualScriptAlias provided by mod_vhost_alias.

VirtualDocumentRoot /www/vhosts/%-1/%-2.1/%-2/htdocs
VirtualScriptAlias /www/vhosts/%-1/%-2.1/%-2/cgi-bin

Discussion

This recipe uses directives from mod_vhost_alias, which you may not have installed when you built Apache, as it is not one of the modules that is enabled by default.

These directives map requests to a directory built up from pieces of the hostname that was requested. Each of the variables represents one part of the hostname, so that each hostname is mapped to a different directory.

In this particular example, requests for content from www.example.com are served from the directory /www/vhosts/com/e/example/htdocs, or from /www/vhosts/com/e/example/cgi-bin (for CGI requests). The full range of available variables is shown in Table 4-1.

Table 4-1. mod_vhost_alias variables

Variable

Meaning

%%

insert a %

%p

insert the port number of the virtual host

%M.N

insert (part of) the name
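The way these variables expand can be worked through in Python. The sketch below is a rough model of the interpolation, covering %%, %p, and the %N and %N.M forms with positive or negative numbers; the function names are made up for illustration, and the choice to substitute an underscore for a missing part is an assumption based on the module's documentation.

```python
import re

# Matches %%, %p, %N and %N.M, where N and M may be negative.
_TOKEN = re.compile(r"%(%|p|(-?\d+)(?:\.(-?\d+))?)")


def _pick(items, index):
    """1-based selection; negative indexes count from the end."""
    if 0 < index <= len(items):
        return items[index - 1]
    if index < 0 and -index <= len(items):
        return items[index]
    return None


def interpolate(template, hostname, port=80):
    """Expand mod_vhost_alias-style variables for one hostname."""
    name = hostname.lower()
    parts = name.split(".")

    def expand(match):
        if match.group(1) == "%":
            return "%"
        if match.group(1) == "p":
            return str(port)
        n = int(match.group(2))
        part = name if n == 0 else _pick(parts, n)
        if part is None:
            return "_"          # assumed placeholder for a missing part
        if match.group(3) is None or int(match.group(3)) == 0:
            return part
        char = _pick(part, int(match.group(3)))
        return char if char is not None else "_"

    return _TOKEN.sub(expand, template)
```

For www.example.com, %-1 selects the last part (com), %-2 the second-to-last (example), and %-2.1 the first character of that part (e), which reproduces the directory given in the Discussion.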

Mass Virtual Hosting Using Rewrite Rules

Problem

Although there is a module—mod_vhost_alias—which is explicitly for the purpose of supporting large numbers of virtual hosts, it is very limiting and requires that every virtual host be configured exactly the same way. You want to support a large number of vhosts, configured dynamically, but, at the same time, you want to avoid mod_vhost_alias.

Solution

SSL and Name-Based Virtual Hosts

Problem

You want to have multiple SSL web sites on the same server.

Solution

Logging for Each Virtual Host

Problem

You want each virtual host to have its own logfiles.

Solution

Specify ErrorLog and CustomLog within each virtual host declaration:

Splitting Up a LogFile

Problem

Due to a large number of virtual hosts, you want to have a single logfile and split it up afterwards.

Solution

LogFormat "%v %h %l %u %t \"%r\" %>s %b" vhost
CustomLog logs/vhost_log vhost

Then, after rotating your logfile:

split-logfile < logs/vhost_log

Discussion

The LogFormat directive in this recipe creates a logfile that is similar to the common log file format but additionally contains the name of the virtual host being accessed. The split-logfile utility splits up this logfile into its constituent virtual hosts.
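As a rough illustration of what the splitting step involves, the following Python sketch groups log lines by their first field, which is the %v virtual host name written by the LogFormat above. The function name is hypothetical, and the lowercasing and removal of the first field are an approximation of the utility's behavior rather than a copy of it.

```python
from collections import defaultdict


def split_by_vhost(lines):
    """Group log lines by their first field (the virtual host name)."""
    per_host = defaultdict(list)
    for line in lines:
        vhost, _, rest = line.partition(" ")
        if not rest:
            continue  # skip blank or malformed lines
        # Store the line without the leading virtual host field
        per_host[vhost.lower()].append(rest)
    return dict(per_host)


if __name__ == "__main__":
    sample = [
        'www.example.com 192.0.2.1 - - [date] "GET / HTTP/1.0" 200 512',
        'other.example.org 192.0.2.7 - - [date] "GET /x HTTP/1.0" 404 0',
    ]
    for host, entries in split_by_vhost(sample).items():
        print(host, len(entries))
```

Each list in the result corresponds to the contents of one per-host logfile in common log format.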

See Also

Recipe 3.11

Port-Based Virtual Hosts

Problem

You want to present different content for HTTP connections on different ports.

Solution

Displaying the Same Content on Several Addresses

Problem

You want to have the same content displayed on two of your addresses.

Solution

Specify both addresses in the <VirtualHost>

Chapter 5. Aliases, Redirecting, and Rewriting

When Apache receives a request, it is assumed that the client will be served a file out of the DocumentRoot directory. However, there will be times when you want these resources to be served from some other location. For example, if you want to place an existing set of documents on your web site, it may be more convenient to leave them where they are than to move them to a new location.

In this chapter, we deal with three general categories of these sorts of cases. Aliasing refers to mapping a URL to a particular directory. Redirecting refers to mapping a URL to another URL. And Rewriting refers to using mod_rewrite to alter the URL in some way.

Other recipes in this chapter are related because they map URLs to resources that are at unexpected places in the filesystem.

Mapping a URL to a Directory

Problem

You want to serve content out of a directory other than the DocumentRoot directory. For example, you may have an existing directory of documents that you want to have on your web site but do not want to move into the Apache document root.

Solution

Alias /desired-URL-prefix /path/to/other/directory

Discussion

The example given maps URLs starting with /desired-URL-prefix to files in the /path/to/other/directory directory. For example, a request for the URL:

http://example.com/desired-URL-prefix/something.html

results in the file /path/to/other/directory/something.html being sent to the client.

Creating a New URL for Existing Content

Problem

You have an existing directory which you want to access using a different name.

Solution

Use an Alias directive in httpd.conf:

Giving Users Their Own URL

Problem

You want to give each user on your system his own web space.

Solution

If you want users' web locations to be under their home directories, add this to your httpd.conf file:

UserDir public_html

To put all users' web directories under a central location:

UserDir /www/users/*/htdocs

If you have mod_perl installed, you can do something more advanced like this (again, added to your httpd.conf file):

<Perl>
# Folks you don't want to have this privilege
my %forbid = map { $_ => 1 } qw(root postgres bob);
opendir H, '/home/';
my @dir = readdir(H);
closedir H;
foreach my $u (@dir) {
    next if $u =~ m/^\./;
    next if $forbid{$u};
    if (-e "/home/$u/public_html") {
        push @Alias, "/$u/", "/home/$u/public_html/";
    }
}
</Perl>

Discussion

The first solution is the simplest and most widely used of the possible recipes we present here. With this directive in place, all users on your system are able to create a directory called public_html in their home directories and put web content there. Their web space is accessible via a URL starting with a tilde (~), followed by their usernames. So, a user named bacchus accesses his personal web space via the URL:

http://www.example.com/~bacchus/
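The mapping from a tilde URL to a directory can be sketched as follows. This is a simplified model for illustration only: the function name is hypothetical, and home directories are assumed to live under /home, whereas the server looks up each user's real home directory.

```python
import re


def userdir_path(url_path, userdir, home_root="/home"):
    """Map a /~user/... URL path to a filesystem path for a UserDir value."""
    match = re.match(r"^/~([^/]+)(/.*)?$", url_path)
    if match is None:
        return None  # not a per-user URL
    user, rest = match.group(1), match.group(2) or ""
    if "*" in userdir:
        # Central location: the * stands for the username
        base = userdir.replace("*", user)
    elif userdir.startswith("/"):
        # Absolute path without *: the username is appended
        base = userdir.rstrip("/") + "/" + user
    else:
        # Relative name: a subdirectory of the user's home directory
        base = f"{home_root}/{user}/{userdir}"
    return base + rest


if __name__ == "__main__":
    print(userdir_path("/~bacchus/index.html", "public_html"))
    print(userdir_path("/~bacchus/index.html", "/www/users/*/htdocs"))
```

The two calls correspond to the first two forms of the UserDir directive shown in the solution.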

Aliasing Several URLs with a Single Directive

Problem

You want to have more than one URL map to the same directory but don't want multiple Alias directives.

Solution

Mapping Several URLs to the Same CGI Directory

Problem

You want to have a number of URLs map to the same CGI directory but don't want to have multiple ScriptAlias directives.

Solution

Creating a CGI Directory for Each User

Problem

You want each user to have their own cgi-bin directory rather than giving them all access to the main server CGI directory.

Solution

Put this in your httpd.conf:

Redirecting to Another Location

Problem

You want requests to a particular URL to be redirected to another server.

Solution

Use a Redirect directive in httpd.conf, and give an absolute URL as the second argument:

Redirect /example http://www.other.server/new/location

Discussion

Whereas Alias maps a URL to something in the local filesystem, Redirect maps a URL to another URL, usually on another server. The second argument is a full URL and is sent back to the client (browser), which makes a second request for the new URL.

It is also important to know that the Redirect directive preserves path information, if there is any. Therefore, this recipe redirects a request for http://original.server/example/something.html to http://www.other.server/new/location/something.html.
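The way path information is carried over can be modeled in a few lines of Python. This is a sketch of the prefix substitution only, with a hypothetical function name; it ignores query strings and the different redirect types.

```python
def redirect_target(url_path, url_prefix, target):
    """Return the URL the client is sent to, or None if the prefix does not match."""
    prefix = url_prefix.rstrip("/")
    if url_path != prefix and not url_path.startswith(prefix + "/"):
        return None
    # Whatever follows the matched prefix is appended to the target URL
    return target + url_path[len(prefix):]


if __name__ == "__main__":
    print(redirect_target("/example/something.html", "/example",
                          "http://www.other.server/new/location"))
```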

Redirections come in several different flavors, too; you can specify which particular type of redirect you want to use by inserting the appropriate keyword between the Redirect

Redirecting Several URLs to the Same Destination

Problem

You want to redirect a number of URLs to the same place. For example, you want to redirect requests for /fish and /Fishing to http://fish.example.com/.

Solution

Permitting Case-Insensitive URLs

Problem

You want requested URLs to be valid whether uppercase or lowercase letters are used.

Solution

Use mod_speling

Replacing Text in Requested URLs

Problem

You want to change all occurrences of string1 to string2 in a request's URI.

Rewriting Path Information to CGI Arguments

Problem

You want to pass arguments as part of the URL but have these components of the URL rewritten as CGI QUERY_STRING arguments.

Solution

This is just an example, of course; make appropriate changes to the RewriteRule line to fit your own environment and needs:

Denying Access to Unreferred Requests

Problem

You want to prevent other web sites from using your images (or other types of documents) in their pages and allow your images to be accessed only if they were referred from your own site.

Solution

Rewriting Based on the Query String

Problem

You want to translate one URI into another based on the value of the query string.

Solution

Put this in your httpd.conf:

Redirecting All—or Part—of Your Server to SSL

Problem

You want certain parts of your non-SSL web space to be redirected to a secured area.

Solution

You can redirect everything that is attached to port 80 with the following RewriteRule:

RewriteCond "%{SERVER_PORT}" "^80$"
RewriteRule "^(.*)$" "https://%{SERVER_NAME}$1" [R,L]

You can redirect particular URLs to a secure version:

Turning Directories into Hostnames

Problem

You want to migrate pathnames under a single hostname to distinct hostnames.

Solution

Use RewriteRule

Redirecting All Requests to a Single Host

Problem

You want all requests made of your system to be redirected to a specific host.

Solution

Put this in your httpd.conf:

Turning Document Names into Arguments

Problem

You want to redirect requests for documents to a CGI script, or other handler, that gets the

Chapter 6. Security

In this chapter, security means allowing people to see what you want them to see and preventing them from seeing what you don't want them to see. Additionally, there are the issues of what measures you need to take on your server in order to restrict access via non-Web means. This chapter illustrates the precautions you need to take to protect your server from malicious access and modification of your web site.

The most common questions ask how to protect documents and restrict access. Unfortunately, due to the complexity of the subject and the nature of the web architecture, these questions tend to also have the most complex answers or often no convenient answers at all.

Normal security nomenclature and methodology separate the process of applying access controls into two discrete steps; in the case of the Web, they may be thought of as the server asking itself these questions:

Are you really who you claim to be?

Are you allowed to be here?

These steps are called authentication and authorization, respectively. Here's a real-world example: a flight attendant checks your photo identification (authentication) and your ticket (authorization) before permitting you to board an airplane.

Authentication can be broken down into what might be called weak and strong. Weak authentication is based on the correctness of credentials that the end user supplies (which may therefore have been stolen from the real owner, hence the name "weak"), whereas strong authentication is based on attributes of the request over which the end user has little or no control and that cannot change from request to request, such as the IP address of his system.

Although checking authentication and authorization are clearly separate activities, their application gets a bit blurred in the context of the Apache web server modules. Even though the main difference between the many security modules is how they store the credentials (in a file, a database, an LDAP directory, etc

Setting Up Single-Use Passwords

Problem

You want to be able to provide credentials that will allow visitors into your site only once.

Solution

No solution is available with standard Apache features.

Expiring Passwords

Problem

You want a user's username and password to expire at a particular time or after some specific interval.

Solution

No solution is available with standard Apache features, but a few third-party solutions exist.

Discussion

Refer to HTTP, Browsers, and Credentials. In order for Apache to provide this functionality, it would need to store more than just the valid username and password; it would also have to maintain information about the credentials' expiration time. No module provided as part of the standard Apache distribution does this.

There are several third-party solutions to this problem, including the Perl module Apache::Htpasswd::Perishable and the mod_perl handler Apache::AuthExpire

Limiting Upload Size

Problem

With more and more web hosting services allowing customers to upload documents, uploads may become too large. With a little creativity, you can put a limit on uploads by using the security capabilities of the server.

Solution

Assume you want to put a limit on uploads of ten thousand (10,000) bytes. Here's how you could do that for your /upload location:

SetEnvIf Content-Length "^[1-9][0-9]{4,}" upload_too_large=1
<Location /upload>
    Order Deny,Allow
    Deny from env=upload_too_large
    ErrorDocument 403 /cgi-bin/remap-403-to-413
</Location>
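To see why this pattern corresponds to a 10,000-byte limit, it can be tried against a few Content-Length values. The Python sketch below uses the same regular expression; the function name is hypothetical.

```python
import re

# A nonzero digit followed by at least four more digits: 10000 or more
TOO_LARGE = re.compile(r"^[1-9][0-9]{4,}")


def upload_too_large(content_length):
    """Return True if the Content-Length header value matches the pattern."""
    return TOO_LARGE.search(content_length) is not None


if __name__ == "__main__":
    for value in ("512", "9999", "10000", "250000"):
        print(value, upload_too_large(value))
```

Because the pattern counts digits, it expresses limits that are powers of ten; changing {4,} to {6,} would move the threshold to one million bytes.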

You can tailor the response by making the /cgi-bin/remap-403-to-413 script look something like this:

Restricting Images from Being Used Off-Site

Problem

Other sites are linking to images on your system, stealing bandwidth from you and incidentally making it appear as though the images belong to them. You want to ensure that all access to your images is from documents that are on your server.

Solution

Add the following lines to the .htaccess

Requiring Both Weak and Strong Authentication

Problem

You want to require both weak and strong authentication for a particular resource. For example, you wish to ensure that the user accesses the site from a particular location and to require that he provides a password.

Solution

Use the Satisfy

Managing .htpasswd Files

Problem

You wish to create password files for use with Basic HTTP authentication.

Solution

Use the htpasswd utility to create your password file, as in Table 6-1.

Table 6-1. Managing password files with htpasswd

Command

Action

% htpasswd -c user.pass waldo

Create a new password file called user.pass with this one new entry for user waldo. Will prompt for password.

% htpasswd user.pass ralph

Add an entry for user ralph in password file user.pass. Will prompt for password.

% htpasswd -b

Making Password Files for Digest Authentication

Problem

You need to create a password file to be used for Digest authentication.

Solution

Use the following command forms to set up a credential file for a realm to be protected by Digest authentication:

% htdigest -c

Relaxing Security in a Subdirectory

Problem

There are times when you might want to apply a tight security blanket over portions of your site, such as with something like:

<Directory /usr/local/apache/htdocs/BoD>
    Satisfy All
    AuthUserFile /usr/local/apache/access/bod.htpasswd
    Require valid-user
</Directory>

Due to Apache's scoping rules, this blanket applies to all documents in that directory and in any subordinate subdirectories underneath it. But suppose you want to make a subdirectory, such as BoD/minutes, available without restriction?

Solution

The Satisfy directive is the answer. Add the following to either the .htaccess file in the subdirectory or in an appropriate <Directory> container:

Satisfy Any
Order Deny,Allow
Allow from all
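The effect of the two Satisfy settings can be summarized as a small truth function. This Python sketch is a conceptual model with hypothetical names, not a description of how the server implements the check.

```python
def access_granted(satisfy, host_allowed, user_authenticated):
    """Combine the host-based and user-based checks as Satisfy does."""
    mode = satisfy.lower()
    if mode == "all":
        # Both the Allow/Deny rules and the Require rules must pass
        return host_allowed and user_authenticated
    if mode == "any":
        # Passing either set of rules is enough
        return host_allowed or user_authenticated
    raise ValueError("Satisfy must be 'All' or 'Any'")


if __name__ == "__main__":
    # Parent directory: Satisfy All, visitor has not logged in
    print(access_granted("All", host_allowed=True, user_authenticated=False))
    # Subdirectory: Satisfy Any with Allow from all
    print(access_granted("Any", host_allowed=True, user_authenticated=False))
```

With Allow from all in the subdirectory, the host check always passes, so Satisfy Any admits visitors who have not supplied credentials.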

HTTP, Browsers, and Credentials

Lifting Restrictions Selectively

Problem

You want most documents to be restricted, such as requiring a username and password, but want a few to be available to the public. For example, you may want index.html to be publicly accessible, while the rest of the files in the directory require password authentication.

Solution

Use the Satisfy Any directive in the appropriate place in your .htaccess or httpd.conf file:

Authorizing Using File Ownership

Problem

You wish to require user authentication based on system file ownership. That is, you want to require that the user that owns the file matches the username that authenticated.

Solution

Use the

Storing User Credentials in a MySQL Database

Problem

You wish to use user and password information in your MySQL database for authenticating users.

Solution

For Apache 1.3, use mod_auth_mysql:

Auth_MySQL_Info db_host.example.com db_user my_password
Auth_MySQL_General_DB auth_database_name
<Directory /www/htdocs/private>
    AuthName "Protected directory"
    AuthType Basic
    require valid-user
</Directory>

For Apache 2.1 and later, use mod_authn_dbi:

Accessing the Authenticated Username

Problem

You want to know the name of the user who has authenticated.

Solution

Consult the environment variable

Obtaining the Password Used to Authenticate

Problem

You want to get the password that the user authenticated with.

Solution

Preventing Brute-Force Password Attacks

Problem

You want to disable a username when there are repeated failed attempts to authenticate using it, as if it is being attacked by a password-cracker.

Solution

Using Digest Versus Basic Authentication

Problem

You want to understand the distinction between the Basic and Digest authentication methods.

Solution

Use AuthType Basic and the htpasswd tool to control access using Basic authentication. Use AuthType Digest and the

Accessing Credentials Embedded in URLs

Problem

You know people access your site using URLs with embedded credentials, such as http://user:password@host/, and you want to extract them from the URL for validation or other purposes.

Solution

None; this is a nonissue that is often misunderstood.

Discussion

For nonproxy requests, this doesn't even exist; the browser dissects the URL and turns the credentials into the appropriate request header field (i.e., Authorization). For proxy requests, who knows?
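The following Python sketch shows roughly what a client does with such a URL for Basic authentication: it removes the credentials from the URL and encodes them into an Authorization request header. The function name is hypothetical, and real browsers differ in when they send the header.

```python
import base64
from urllib.parse import urlsplit


def basic_auth_header(url):
    """Build the Basic Authorization header value from credentials in a URL."""
    parts = urlsplit(url)
    if parts.username is None:
        return None  # no embedded credentials
    credentials = f"{parts.username}:{parts.password or ''}"
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    print("Authorization:", basic_auth_header("http://user:password@host/"))
```

The server therefore sees the credentials only in the header, never as part of the requested URL.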

Securing WebDAV

Problem

You want to allow your users to upload and otherwise manage their web documents with WebDAV, but without exposing your server to any additional security risks.

Solution

Require authentication to use WebDAV:

Enabling WebDAV Without Making Files Writable by the Web User

Problem

You want to run WebDAV but don't want to make your document files writable by the Apache server user.

Solution

Run two web servers as different users. The DAV-enabled server, for example, might run as User dav, Group dav, while the other server, which is responsible for serving your content, might run as User nobody, Group nobody

Restricting Proxy Access to Certain URLs

Problem

You don't want people using your proxy server to access particular URLs or patterns of URLs (such as MP3 or streaming video files).

Solution

You can block by keyword:

ProxyBlock .rm .ra .mp3

You can block by specific backend URLs:

<Directory proxy:http://other-host.org/path>
    Order Allow,Deny
    Deny from all
    Satisfy All
</Directory>

Or you can block according to regular expression pattern matching:

<Directory proxy:*>
    RewriteEngine On
    #
    # Disable proxy access to Real movie and audio files
    #
    RewriteRule "\.(rm|ra)$" "-" [F,NC]
    #
    # Don't allow anyone to access .mil sites through us
    #
    RewriteRule "^[a-z]+://[-.a-z0-9]*\.mil($|/)" "-" [F,NC]
</Directory>

Protecting Files with a Wrapper

Problem

You have files to which you want to limit access using some method other than standard web authentication (such as a members-only area).

Solution

In httpd.conf, add the following lines to a <Directory> container whose contents should be accessed only through a script:

RewriteEngine On
RewriteRule "\.(dll|zip|exe)$" protect.php [NC]
RewriteCond %{REMOTE_ADDR} "!^my.servers.ip"
RewriteRule "\.cgi$" protect.php [NC]

And an example protect.php that just displays the local URI of the document that was requested:

Protecting All Files Except a Subset

Problem

You want to deny all web access to files in a directory, except for those with a particular extension (for example, a directory with HTML files in it, where you don't want other files to be accessible).

Protecting Server Files from Malicious Scripts

Problem

Scripts running on your web server may access, modify, or destroy files located on your web server if they are not adequately protected. You want to ensure that this cannot happen.

Solution

Ensure that none of your files are writable by the nobody user or the nobody group, and that sensitive files are not readable by that user and group:

Setting Correct File Permissions

Problem

You want to set file permissions to provide the maximum level of security.

Solution

The bin directory under the ServerRoot should be owned by user root, group root, and have file permissions of 755 (rwxr-xr-x). Files contained therein should also be owned by root.root and be mode 755.
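A check along these lines can be scripted. The Python sketch below compares a path's owner and permission bits against expected values; the function name and defaults are illustrative assumptions (uid 0 for root, mode 755), and it applies to Unixish systems only.

```python
import os
import stat


def check_permissions(path, expected_mode=0o755, expected_uid=0):
    """Return a list of problems found with the path's mode and owner."""
    info = os.stat(path)
    problems = []
    mode = stat.S_IMODE(info.st_mode)  # keep only the permission bits
    if mode != expected_mode:
        problems.append(f"mode is {mode:o}, expected {expected_mode:o}")
    if info.st_uid != expected_uid:
        problems.append(f"owner uid is {info.st_uid}, expected {expected_uid}")
    return problems


if __name__ == "__main__":
    # The current directory is used here only as a stand-in path
    for problem in check_permissions("."):
        print(problem)
```

An empty list means the path matches the expected owner and mode.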

Document directories, such as htdocs, cgi-bin, and icons, will have to have permissions set in a way that makes the most sense for the development model of your particular web site, but under no circumstances should any of these directories or files contained in them be writable by the web server user.

Tip

The solution provided here is specific to Unixish systems. Users of other operating systems should adhere to the principles laid out here, although the actual implementation will vary.

The conf directory should be readable and writable only by root, as should all the files contained therein.

The include and libexec directories should be readable by everyone, writable by no one.

The logs directory should be owned and writable by root. You may, if you like, permit other users to read files in this directory, as it is often useful for users to be able to access their logfiles, particularly for troubleshooting purposes.

The man directory should be readable by all users.

Finally, the proxy

Running a Minimal Module Set

Problem

You want to eliminate all modules that you don't need in order to reduce the potential exposure to security holes. What modules do you really need?

Solution

For Apache 1.3, you can run a bare-bones server with just three modules. (Actually, you can get away with not running any modules at all, but it is not recommended.)

% ./configure --disable-module=all --enable-module=dir \
>     --enable-module=mime --enable-module=log_config \

For Apache 2.0, this is slightly more complicated, as you must individually disable modules you don't want:

% ./configure --disable-access \
>     --disable-auth --disable-charset-lite \
>     --disable-include --disable-log-config --disable-env --disable-setenvif \
>     --disable-mime --disable-status --disable-autoindex --disable-asis \
>     --disable-cgid --disable-cgi --disable-negotiation --disable-dir \
>     --disable-imap --disable-actions --disable-alias --disable-userdir

Note that with 2.0, as with 1.3, you may wish to enable mod_dir, mod_mime, and mod_log_config, by simply leaving them off of this listing.

Discussion

Restricting Access to Files Outside Your Web Root

Problem

You want to make sure that files outside of your web directory are not accessible.

Solution

For Unixish systems:

<Directory />
    Order deny,allow
    Deny from all
    AllowOverride None
    Options None
</Directory>

For Windows systems:

Limiting Methods by User

Problem

You want to allow some users to use certain methods but prevent their use by others. For instance, you might want users in group A to be able to use both GET and POST but allow everyone else to use only GET.

Solution

Apply user authentication per method using the Limit

Restricting Range Requests

Problem

You want to prevent clients from requesting partial downloads of documents within a particular scope, forcing them to request the entire document instead.

Solution

You can overload ErrorDocument 403 to make it handle range requests. To do this, put the following into the appropriate <Directory> container in your httpd.conf file or in the directory's .htaccess file:

SetEnvIf "Range" "." partial_requests
Order Allow,Deny
Allow from all
Deny from env=partial_requests
ErrorDocument 403 /forbidden.cgi

Then put the following into a file named forbidden.cgi in your server's DocumentRoot:

Chapter 7. SSL

Secure Sockets Layer (SSL) is the standard way to implement secure web sites. By encrypting the traffic between the server and the client, SSL protects that content from a third party listening to the traffic going past.

The exact mechanism by which this encryption is accomplished is discussed extensively in the SSL specification, which you can read at http://wp.netscape.com/eng/ssl3/. For a more user-friendly discussion of SSL, we recommend looking through the mod_ssl manual, which you can find at http://www.modssl.org/docs/2.8/index.html. This document not only discusses the specific details of setting up mod_ssl but also covers the general theory behind SSL, with pictures illustrating the concepts.

In this chapter, we talk about some of the common things that you might want to do with your secure server, including how to install it.

Installing SSL

Problem

You want to install SSL on your Apache server.

Solution

The solutions to this problem fall into several categories, depending on how you installed Apache in the first place (or whether you are willing to rebuild Apache to get SSL).

Generating SSL Certificates

Problem

You want to generate certificates to use on your SSL server.

Solution

Use the openssl command-line program that comes with OpenSSL:

% openssl genrsa -out hostname.key 1024
% openssl req -new -key hostname.key -out hostname.csr

At this point, you can either send your Certificate Signing Request (CSR) off to one of the certificate authority companies, such as Thawte or Entrust, for them to sign, or, if you prefer, you can sign the key yourself:

% openssl x509 -req -days 365 -in hostname.csr -signkey hostname.key -out hostname.crt

Then move these files to your Apache server's configuration directory, such as /www/conf/, and then add the following lines in your httpd.conf configuration file:

SSLCertificateFile /www/conf/hostname.crt
SSLCertificateKeyFile /www/conf/hostname.key

Discussion

The SSL certificate is a central part of the SSL conversation and is required before you can run a secure server. Thus, generating the certificate is a necessary first step to configuring your secure server.

Generating the key is a multistep process, but it is fairly simple.

Generating the private key

In the first step, we generate the private key. SSL is a private/public key encryption system, with the private key residing on the server and the public key going out with each connection to the server and encrypting data sent back to the server.

The first argument passed to the

Generating a Trusted CA

Problem

You want to generate SSL keys that browsers will accept without a warning message.

Solution

Issue the following commands:

% CA.pl -newca
% CA.pl -newreq
% CA.pl -signreq
% CA.pl -pkcs12

Discussion

Recipe 7.2

Serving a Portion of Your Site via SSL

Problem

You want to have a certain portion of your site available via SSL exclusively.

Solution

This is done by making changes to your httpd.conf file.

For Apache 1.3, add a line such as the following:

Redirect /secure/ https://secure.domain.com/secure/

For Apache 2.0:

<Directory /www/secure>
    SSLRequireSSL
</Directory>

Or, with mod_rewrite:

RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^/(.*) https://%{SERVER_NAME}/$1 [R,L]

Discussion

It is perhaps best to think of your site's normal pages and its SSL-protected pages as being handled by two separate servers, rather than one. While they may point to the same content, they run on different ports, are configured differently, and, most importantly, the browser considers them to be completely separate servers. So you should too.

Don't think of enabling SSL for a particular directory; rather, you should think of it as redirecting requests for one directory to another.

Note that the Redirect

Authenticating with Client Certificates

Problem

You want to use client certificates to authenticate access to your site.

Solution

Chapter 8. Dynamic Content

CGI programs are one of the simplest ways to provide dynamic content for your web site. They tend to be easy to write, because you can write them in any language. Thus, you don't have to learn a new language to write CGI programs.

Other dynamic content providers, such as PHP and mod_perl, also enjoy a great deal of popularity, because they provide many of the same functions as CGI programs but typically execute faster.

Very few web sites can survive without some mechanism for providing dynamic content—content that is generated in response to the needs of the user. The recipes in this chapter guide you through enabling various mechanisms for producing this dynamic content and help you troubleshoot possible problems that may occur.

Enabling a CGI Directory

Enabling CGI Scripts in Non-ScriptAliased Directories

Problem

You want to put a CGI program in a directory that contains non-CGI documents.

Solution

Use AddHandler to map the CGI handler to the particular files that you want to be executed:

<Directory "/foo">
    Options +ExecCGI
    AddHandler cgi-script .cgi .py .pl
</Directory>

Using Windows File Extensions to Launch CGI Programs

Problem

You want to have CGI programs on Windows executed by the program associated with the file extension. For example, you want .pl files to be executed by perl.exe without having to change the #! line to point at the right location.

Solution

Add the following line to your httpd.conf file:

ScriptInterpreterSource registry

Discussion

Using Extensions to Identify CGI Scripts

Problem

You want Apache to know that all files with a particular extension should be treated as CGI scripts.

Solution

Add the following to your httpd.conf

Testing That CGI Is Set Up Correctly

Problem

You want to test that you have CGI enabled correctly. Alternatively, you are receiving an error message when you try to run your CGI script and you want to ensure the problem doesn't lie in the web server before you try to find a problem in the script.

Solution

#! /usr/bin/perl
print "Content-type: text/plain\r\n\r\n";
print "It's working.\n";

And then, if things are still not working, look in the error log.

Discussion

Because Perl is likely to be installed on any Unixish system, this CGI program should be a pretty safe way to test that CGI is configured correctly. In the event that you do not have Perl installed, an equivalent shell program may be substituted:

#! /bin/sh
echo Content-type: text/plain
echo
echo It\'s working.

And, if you are running Apache on Windows, so that neither of the above options works for you, you could also try this with a batch file:

echo off
echo Content-type: text/plain
echo.
echo It's working.

Make sure that you copy the program code exactly, with all the right punctuation, slashes, etc., so that you don't introduce additional complexity by having to troubleshoot the program itself.
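If Python is the interpreter you have at hand, an equivalent test program might look like the sketch below. It is not one of the recipe's original variants; the interpreter path on the first line and the function name are assumptions to adjust for your system.

```python
#!/usr/bin/env python3
import sys


def cgi_response():
    """Return the header block, a blank line, and a one-line body."""
    return "Content-type: text/plain\n\n" + "It's working.\n"


if __name__ == "__main__":
    sys.stdout.write(cgi_response())
```

As with the other versions, the blank line after the Content-type header is what separates the headers from the body.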

Reading Form Parameters

Problem

You want your CGI program to read values from forms for use in your program.

Solution

First, look at an example in Perl, which uses the popular CGI.pm module:

#!/usr/bin/perl
use CGI;
use strict;
use warnings;

my $form = CGI->new;

# Load the various form parameters
my $name = $form->param("name");

# Multi-value select lists will return a list
my @foods = $form->param("favorite_foods");

# Output useful stuff
print "Content-type: text/html\n\n";
print "Name: " . $name . "\n";
print "Favorite foods: <ul>";
foreach my $food (@foods) {
    print "<li>$food</li>";
}
print "</ul>\n";

Next, look at the same program in C, which uses the cgic C library:

#include "cgic.h" /* Boutell.com's cgic library */

int cgiMain() {
    char name[100];

    /* Send content type */
    cgiHeaderContentType("text/html");

    /* Load a particular variable */
    cgiFormStringNoNewlines("name", name, 100);

    fprintf(cgiOut, "Name: ");
    cgiHtmlEscape(name);
    return 0;
}

For this example, you will also need a Makefile, which looks something like this:

Invoking a CGI Program for Certain Content Types

Problem

You want to invoke a CGI program to act as a sort of content filter for certain document types. For example, a photographer may wish to create a custom handler to add a watermark to photographs served from his web site.

Solution

Use the Action directive to create a custom handler, which will be implemented by a CGI program. Then use the AddHandler

Getting SSIs to Work

Problem

You want to enable Server-Side Includes (SSIs) to make your HTML documents more dynamic.

Solution

There are at least two different ways of doing this.

Specify which files are to be parsed by using a filename extension such as .shtml. For Apache 1.3, add the following directives to your httpd.conf in the appropriate scope:

<Directory /www/html/example>
    Options +Includes
    AddHandler server-parsed .shtml
    AddType "text/html; charset=ISO-8859-1" .shtml
</Directory>

Or, for Apache 2.0:

<Directory /www/html/example>
    Options +Includes
    AddType text/html .shtml
    AddOutputFilter INCLUDES .shtml
</Directory>

Add the XBitHack directive to the appropriate scope in your httpd.conf file and allow the file permissions to indicate which files are to be parsed for SSI directives:

XBitHack On

Displaying Last Modified Date

Problem

You want your web page to indicate when it was last modified but not have to update the date every time.

Solution

Including a Standard Header

Problem

You want to include a header (or footer) in each of your HTML documents.

Solution

Use SSI by inserting a line in all your parsed files:

<!--#include virtual="/include/headers.html" -->

Discussion

By using the SSI include

Including the Output of a CGI Program

Problem

You want to have the output of a CGI program appear within the body of an existing HTML document.

Solution

Use SSIs by adding a line such as the following to the document (which must be enabled for SSI parsing):

Running CGI Scripts as a Different User with suexec

Problem

You want to have CGI programs executed by some user other than nobody. For example, you may have a database that is not accessible to anyone except a particular user, so the server needs to temporarily assume that user's identity to access it.

Solution

When building Apache, enable suexec by passing the —enable-suexec argument to configure.

Then, in a virtual host section, specify which user and group you'd like to use to run CGI programs:

User rbowen
Group users

Also, suexec will be invoked for any CGI programs run out of username-type URLs for the affected virtual host.
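On Apache 2.0, the user and group for suexec are set with the SuexecUserGroup directive from mod_suexec rather than with User and Group. A minimal sketch, in which the hostname and document root are hypothetical:

```apache
<VirtualHost *:80>
    # Hypothetical host name and document root
    ServerName www.example.com
    DocumentRoot /www/vhosts/example
    # Run CGI programs for this virtual host as user rbowen, group users
    SuexecUserGroup rbowen users
</VirtualHost>
```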

Discussion

The suexec wrapper is a suid (runs as the user ID of the user that owns the file) program that allows you to run CGI programs as any user you specify, rather than as the nobody user that Apache runs as. suexec is a standard part of Apache but is not enabled by default, which is why it must be requested when the server is built.

Tip

The suexec concept does not fit well into the Windows environment, and so suexec is not available under Windows.

When suexec

Installing a mod_perl Handler from CPAN

Problem

You want to install one of the many mod_perl handler modules available on CPAN. For example, you want to install the Apache::Perldoc module, which generates HTML documentation for any Perl module you happen to have installed.

Solution

Assuming you already have mod_perl installed, you'll just need to install the module from CPAN, and then add a few lines to your Apache configuration file.

To install the module, run the following command from the shell as root:

#

Writing a mod_perl Handler

Problem

You want to write your own mod_perl handler.

Solution

Here's a simple handler:

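package Apache::Cookbook::Example;

use strict;
use Apache::Constants qw(OK);

# Content handler: sends a plain-text greeting for every request
sub handler {
    my $r = shift;                          # the Apache request object
    $r->send_http_header( 'text/plain' );   # send headers with this content type
    $r->print( "Hello, World." );
    return OK;                              # report that the request was handled
}

1;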

Place this code in a file called Example.pm, in a directory Apache/Cookbook/, somewhere that Perl knows to look for it.
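Once the module is somewhere Perl can find it, the handler still has to be attached to a URL. A minimal sketch using mod_perl Version 1 syntax; the /example path is a hypothetical choice, not something the recipe specifies:

```apache
# Hypothetical URL path; any <Location> or <Directory> scope could be used
<Location /example>
    # Hand requests in this scope to mod_perl
    SetHandler perl-script
    # Call the handler() subroutine in this package
    PerlHandler Apache::Cookbook::Example
</Location>
```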

Enabling PHP Script Handling

Problem

You want to enable PHP scripts on your server.

Solution

If you have mod_php installed, use AddHandler to map .php and .phtml files to the PHP handler:

AddHandler application/x-httpd-php .phtml .php

Discussion

This recipe maps all files with a .phtml or .php extension to the PHP handler. You must ensure that the mod_php module is installed.

See Also

Recipe 2.5

Installation instructions on the mod_php web site at http://www.php.net/manual/en/install.apache.php for Apache 1.3 or http://www.php.net/manual/en/install.apache2.php for Apache 2.0

Verifying PHP Installation

Problem

You want to verify that you have PHP correctly installed and configured.

Solution

Put the following in your test PHP file:

<?php phpinfo( ); ?>

Discussion

Place the above text in a file called something.php in a directory where you believe you have enabled PHP script execution. Accessing that file should give you a list of all configured PHP system variables. The first screen of the output should look something like Figure 8-2.

Figure 8-2. Sample phpinfo( ) output

See Also

Recipe 8.15

Chapter 9. Error Handling

When you're running a web site, things go wrong. And when they do, it's important that they are handled gracefully, so that the user experience is not too greatly diminished. In this chapter, you'll learn how to handle error conditions, return useful messages to the user, and capture information that will help you fix the problem so that it does not happen again.

Handling a Missing Host Field

Problem

You have multiple virtual hosts in your configuration, and at least one of them is name-based. For name-based virtual hosts to work properly, the client must send a valid Host

Changing the Response Status for CGI Scripts

Problem

There may be times when you want to change the status for a response—for example, you want 404 Not Found errors to be sent back to the client as 403 Forbidden instead.

Solution

Point your ErrorDocument

Customized Error Messages

Problem

You want to display a customized error message, rather than the default Apache error page.

Solution

Providing Error Documents in Multiple Languages

Problem

On a multilingual (content negotiated) web site, you want your error documents to be content negotiated as well.

Solution

The Apache 2.0 default configuration file contains a configuration section, initially commented out, that allows you to provide error documents in multiple languages customized to the look of your web site, with very little additional work.

Uncomment those lines. You can identify the lines by looking for the following comment in your default configuration file:

# The internationalized error documents require mod_alias, mod_include
# and mod_negotiation. To activate them, uncomment the following 30 lines.

Redirecting Invalid URLs to Some Other Page

Problem

You want all "not found" pages to go to some other page instead, such as the front page of the site, so that there is no loss of continuity on bad URLs.

Solution

Use the ErrorDocument directive to catch 404 (Not Found) errors:

ErrorDocument 404 /index.html
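The form of the target matters. A sketch of the two common variants, with a hypothetical hostname in the second; only one of them would be active at a time:

```apache
# Local URL path: the page is served internally, and the client still
# receives the 404 status along with the replacement content
ErrorDocument 404 /index.html

# Full URL: Apache sends a redirect to the client instead, so the client
# no longer sees the original 404 status
# ErrorDocument 404 http://www.example.com/index.html
```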

Making Internet Explorer Display Your Error Page

Problem

You have an ErrorDocument correctly configured, but IE is displaying its own error page, rather than yours.

Solution

Notification on Error Conditions

Problem

You want to receive email notification when there's an error condition on your server.

Solution

Point the ErrorDocument directive to a CGI program that sends mail, rather than to a static document:

ErrorDocument 404 /cgi-bin/404.cgi

404.cgi looks like the following:

Chapter 10. Proxies

Proxy means to act on behalf of another. In the context of a web server, this means one server fetching content from another server, then returning it to the client. For example, you may have several web servers that hide behind a proxy server. The proxy server is responsible for having requests end up going to the right backend server.

mod_proxy, which comes with Apache, handles proxying behavior. The recipes in this chapter cover various techniques that can be used to take advantage of this capability. We discuss securing your proxy server, caching content proxied through your server, and ways to use mod_proxy to map requests to services running on alternate ports.

Additional information about mod_proxy can be found at http://httpd.apache.org/docs/mod/mod_proxy.html for Apache 1.3, or http://httpd.apache.org/docs-2.0/mod/mod_proxy.html for Apache 2.0.

Preventing Your Proxy Server from Being Used as an Open Mail Relay

Problem

If your Apache server is set up to operate as a proxy, it is possible for it to be used as a mail relay unless precautions are taken. This means that your system may be functioning as an "open relay" even though your mail server software is actually securely configured.

Solution

Forwarding Requests to Another Server

Problem

You want requests for particular URLs to be transparently forwarded to another server.

Solution

Use ProxyPass and ProxyPassReverse directives in your

Blocking Proxied Requests to Certain Places

Problem

You want to use your proxy server as a content filter, forbidding requests to certain places.

Solution

Proxying mod_perl Content to Another Server

Problem

You want to run a second HTTPD server for dynamically generated content and have Apache transparently map requests for this content to the other server.

Solution

Configuring a Caching Proxy Server

Problem

You want to run a caching proxy server.

Solution

Configure your server to proxy requests, and provide a location for the cached files to be placed:

ProxyRequests on
CacheRoot /var/spool/httpd/proxy
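A forward proxy that accepts requests from anyone is open to abuse, so it is usually paired with an access restriction. A minimal sketch, assuming Apache 2.0 and a hypothetical internal network of 192.168.1.x:

```apache
ProxyRequests On

# Apache 2.0 syntax; Apache 1.3 uses <Directory proxy:*> for the same purpose
<Proxy *>
    Order Deny,Allow
    Deny from all
    # Hypothetical internal network that is allowed to use the proxy
    Allow from 192.168.1
</Proxy>
```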

Filtering Proxied Content

Problem

You want to apply some filter to proxied content, such as altering certain words.

Solution

In Apache 2.0 and later, you can use mod_ext_filter to create output filters to apply to content before it is sent to the user:

Requiring Authentication for a Proxied Server

Problem

You wish to proxy content from a server, but it requires a login and password before content may be served from this proxied site.

Solution

Use standard authentication techniques to require logins for proxied content:

Chapter 11. Performance

Your web site can probably be made to run faster, if you are willing to make a few tradeoffs, and spend a little time benchmarking your site to see what is really slowing it down.

There are a number of things that you can configure differently to get a performance boost, although there are other things to which you may have to make more substantial changes. It all depends on what you can afford to give up and what you are willing to trade off. For example, in many cases, you may need to trade performance for security, or vice versa.

In this chapter, we make some recommendations of things that you can change, and we warn against things that can cause substantial slow-downs. Be aware that web sites are very individual, and what may speed up one web site may not necessarily speed up another web site.

Topics covered include hardware considerations, configuration file changes, and dynamic content generation, which can all be factors in getting every ounce of performance out of your web site.

Note

Benchmarking Apache with ab

Problem

You want to benchmark changes that you are making to verify that they are in fact making a difference in performance.

Solution

Use ab (Apache bench), which you will find in the bin directory of your Apache installation:

ab -n 1000 -c 10 http://www.example.com/test.html

Discussion

ab

Tuning Keepalive Settings

Problem

You want to tune the keepalive-related directives to the best possible setting for your web site.

Solution

Turn on the KeepAlive setting, and set the related directives to sensible values:

KeepAlive On
MaxKeepAliveRequests 0
KeepAliveTimeout 15

Discussion

The default behavior of HTTP is for each document to be requested over a new connection. This causes a lot of time to be spent opening and closing connections. KeepAlive allows multiple requests to be made over a single connection, thus reducing the time spent establishing socket connections. This, in turn, speeds up the load time for clients requesting content from your site.

Getting a Snapshot of Your Site's Activity

Problem

You want to find out exactly what your server is doing.

Solution

Enable the server-status handler to get a snapshot of what child processes are running and what each one is doing. Enable ExtendedStatus to get even more detail:

<Location /server-status>
    SetHandler server-status
</Location>

ExtendedStatus On
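The status page reveals details about client requests, so it is commonly limited to trusted addresses. A sketch that allows only the local machine; the address is an assumption and should be replaced with whatever hosts you administer the server from:

```apache
<Location /server-status>
    SetHandler server-status
    # Refuse everyone except the address listed below
    Order Deny,Allow
    Deny from all
    Allow from 127.0.0.1
</Location>

ExtendedStatus On
```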

Avoiding DNS Lookups

Problem

You want to avoid situations where you have to do DNS lookups of client addresses, as this is a very slow process.

Solution

Always set the HostNameLookups directive to Off:

HostNameLookups Off

And make sure that, whenever possible, Allow from and/or Deny from directives use the IP address, rather than the hostname of the hosts in question.
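For illustration, an access rule written with addresses rather than names, so that no lookup is needed to evaluate it. The directory and network below are hypothetical:

```apache
HostNameLookups Off

# Hypothetical protected directory
<Directory /www/htdocs/private>
    Order Deny,Allow
    Deny from all
    # An address range is matched directly against the client IP;
    # a hostname here would force a DNS lookup on each request
    Allow from 192.168.1.0/24
</Directory>
```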

Optimizing Symbolic Links

Problem

You wish to balance the security needs associated with symbolic links with the performance impact of a solution, such as using Options SymLinksIfOwnerMatch, which causes a server slowdown.

Solution

For tightest security, use Options SymLinksIfOwnerMatch, or Options -FollowSymLinks if you seldom or never use symlinks.

Minimizing the Performance Impact of .htaccess Files

Problem

You want per-directory configuration but want to avoid the performance hit of .htaccess files.

Solution

Turn on AllowOverride only in directories where it is required, and tell Apache not to waste time looking for .htaccess files elsewhere:

AllowOverride None

Then use <Directory> sections to selectively enable .htaccess files only where needed.
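A sketch of that arrangement, in which overrides are switched off for the whole filesystem and then re-enabled for one subtree. The directory path and the AuthConfig override class are illustrative choices:

```apache
# Do not look for .htaccess files anywhere by default
<Directory />
    AllowOverride None
</Directory>

# Hypothetical directory where users may control authentication settings
<Directory /www/htdocs/users>
    AllowOverride AuthConfig
</Directory>
```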

Discussion

.htaccess files cause a substantial reduction in Apache's performance, because it must check for a .htaccess in every directory along the path to the requested file to be assured of getting all of the relevant configuration overrides. This is necessary because Apache configuration directives apply not only to the directory in which they are set, but also to all subdirectories. Thus, we must check for .htaccess

Disabling Content Negotiation

Problem

Content negotiation causes a big reduction in performance.

Solution

Disable content negotiation where it is not needed. If you do require content negotiation, use the type-map handler, rather than the MultiViews option:

Options -MultiViews
AddHandler type-map var

Discussion

If at all possible, disable content negotiation. However, if you must do content negotiation—if, for example, you have a multilingual web site—you should use the type-map handler, rather than the MultiViews method.

When MultiViews is used, Apache needs to get a directory listing each time a request is made. The resource requested is compared to the directory listing to see what variants of that resource might exist. For example, if index.html is requested, the variants index.html.en and index.html.fr might exist to satisfy that request. Each matching variant is compared with the user's preferences, expressed in the various Accept headers passed by the client. This information allows Apache to determine which resource is best suited to the user's needs.

Optimizing Process Creation

Problem

You're using Apache 1.3, or Apache 2.0 with the prefork MPM, and you want to tune MinSpareServers and MaxSpareServers to the best settings for your web site.

Solution

The best settings will vary from one site to another. You'll need to watch traffic on your site and decide accordingly.
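As a place to start observing from, the directives involved look like this. The numbers are illustrative only, not recommendations, and should be adjusted after watching real traffic:

```apache
# Illustrative values for Apache 1.3 or the Apache 2.0 prefork MPM
# Number of child processes created at startup
StartServers 5
# Keep at least this many idle children waiting for requests
MinSpareServers 5
# Kill off idle children beyond this number
MaxSpareServers 10
# Upper limit on simultaneous child processes
MaxClients 150
```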

Discussion

The MinSpareServers

Tuning Thread Creation

Problem

You're using Apache 2.0 with one of the threaded MPMs, and you want to optimize the settings for the number of threads.

Solution

The best settings will vary from server to server.

Discussion

The various threaded MPMs on Apache 2.0 handle thread creation somewhat differently. In Apache 1.3, the Windows and Netware versions are threaded, while the Unixish version is not. Tuning the thread creation values will vary from one of these versions to another.

Setting the number of threads on single-child MPMs

On MPMs that run Apache with a single threaded child process, such as the Windows MPM (mpm_winnt), and the Windows and Netware versions of Apache 1.3, there is a fixed number of threads in the child process. This number is controlled by the ThreadsPerChild directive and must be large enough to handle the peak traffic of the site on any given day. There really is no performance tuning that can be done here, as this number is fixed throughout the lifetime of the Apache process.

Number of threads when using the worker MPM

The worker

Caching Frequently Viewed Files

Problem

You want to cache files that are viewed frequently, such as your site's front page, so that they don't have to be loaded from the filesystem every time.

Solution

Use mod_mmap_static or mod_file_cache (for Apache 1.3 and 2.0, respectively) to cache these files in memory:

MMapFile /www/htdocs/index.html
MMapFile /www/htdocs/other_page.html

For Apache 2.0, you can use either the MMapFile or the CacheFile directive. MMapFile caches the file contents in memory, while CacheFile

Sharing Load Between Servers Using mod_proxy

Problem

You want to have a certain subset of your web site served from another machine, in order to share the load of the site.

Solution

Use ProxyPass and ProxyPassReverse to have Apache fetch the content from another server:
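A minimal sketch of such a pairing, in which the /other/ URL prefix and the backend hostname are hypothetical:

```apache
# Fetch anything under /other/ from the second machine
ProxyPass /other/ http://other.example.com/
# Rewrite redirect headers from the backend so clients keep using this server
ProxyPassReverse /other/ http://other.example.com/
```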

Distributing Load Evenly Between Several Servers

Problem

You want to serve the same content from several servers and have hits distributed evenly among the servers.

Solution

Use DNS round-robin to have requests distributed evenly, or at least fairly evenly, among the servers:

Caching Directory Listings

Problem

You want to provide a directory listing but want to reduce the performance hit of doing so.

Solution

Use the TrackModified argument to IndexOptions

Speeding Up Perl CGI Programs with mod_perl

Problem

You have existing functional Perl CGI programs and want them to run faster.

Solution

If you have the mod_perl module installed, you can configure it to run your Perl CGI programs, instead of running mod_cgi. This gives you a big performance boost, without having to modify your CGI code.

There are two slightly different ways to do this.

For Apache 1.3 and mod_perl Version 1:

Alias /cgi-perl/ /usr/local/apache/cgi-bin/
<Location /cgi-perl>
    Options ExecCGI
    SetHandler perl-script
    PerlHandler Apache::PerlRun
    PerlSendHeader On
</Location>

Alias /perl/ /usr/local/apache/cgi-bin/
<Location /perl>
    Options ExecCGI
    SetHandler perl-script
    PerlHandler Apache::Registry
    PerlSendHeader On
</Location>
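For comparison, here is a sketch of how the Apache::Registry setup is commonly expressed for Apache 2.0 with mod_perl Version 2, where the handler module is ModPerl::Registry and PerlSendHeader gives way to PerlOptions +ParseHeaders. This reflects a typical mod_perl 2 configuration and is an assumption, not the text's own listing; the path is reused from the example above.

```apache
Alias /perl/ /usr/local/apache/cgi-bin/
<Location /perl/>
    Options +ExecCGI
    SetHandler perl-script
    # mod_perl 2 name for the registry handler
    PerlResponseHandler ModPerl::Registry
    # Have mod_perl parse the headers that the script prints
    PerlOptions +ParseHeaders
</Location>
```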

Chapter 12. Miscellaneous Topics

With its hundreds of configuration directives, and dozens upon dozens of modules providing additional functionality, the Apache web server can be terrifically complex. So too can the questions about how to use it. We have collected many of the most common questions we have seen and categorized them, putting related topics into their own chapters when there were enough of them.

However, some of the things that come up don't fall readily into one of the categories we have chosen, or perhaps are more fundamental, so we've collected them into this catch-all chapter of "things that don't belong anywhere else."

Placing Directives Properly

Problem

You know what directive you need but aren't sure where to put it.

Solution

If you wish the scope of the directive to be global (i.e., you want it to affect all requests to the web server), then it should be put in the main body of the configuration file or it should be put in the section starting with the line <Directory /> and ending with </Directory>.

If you wish the directive to affect only a particular directory, it should be put in a <Directory> section that specifies that directory. Be aware that directives specified in this manner also affect subdirectories of the stated directory.

Likewise, if you wish the directive to affect a particular virtual host or a particular set of URLs, then the directive should be put in a <VirtualHost> section, <Location> section, or perhaps a <Files> section, referring to the particular scope in which you want the directive to apply.

In short, the answer to "Where should I put it?" is "Where do you want it to be in effect?"
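The scopes described above can be sketched side by side. Every path, hostname, and directive choice here is hypothetical and serves only to show where each kind of section sits in the configuration file:

```apache
# Main body of the file: applies to every request
ServerSignature Off

# One directory and all of its subdirectories
<Directory /www/htdocs/reports>
    Options -Indexes
</Directory>

# One virtual host
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /www/vhosts/example
</VirtualHost>

# A set of URLs, regardless of where they map in the filesystem
<Location /server-status>
    SetHandler server-status
</Location>

# Files with a particular name pattern, wherever they live
<Files "*.bak">
    Order Allow,Deny
    Deny from all
</Files>
```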

Renaming .htaccess Files

Problem

You want to change the default name of per-directory configuration files to something else, such as on a Windows system, because filenames beginning with a dot can cause problems.

Solution

Use the AccessFileName

Generating Directory/Folder Listings

Problem

You want to see a directory listing when a directory is requested.

Solution

Turn on Options Indexes for the directory in question:

<Directory /www/htdocs/images> Options +Indexes </Directory>

Discussion

When a URL maps to a directory or folder in the filesystem, Apache will respond to the request in one of three ways:

If mod_dir is part of the server configuration, and the mapped directory is within the scope of a DirectoryIndex directive, and the server can find one of the files identified in that directive, then the file will be used to generate the response.

If

Solving the "Trailing Slash" Problem

Problem

Loading a particular URL works with a trailing slash but does not work without it.

Solution

Make sure that ServerName is set correctly and that none of the Alias directives have a trailing slash.

Discussion

The "trailing slash" problem can be caused by one of two configuration problems: an incorrect or missing value of ServerName, or an Alias with a trailing slash that doesn't work without it.
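Apache answers a request for a directory without the trailing slash by redirecting to the same URL with the slash added, and it builds that redirect using the server name. A sketch of a configuration that avoids both causes; the hostname and paths are hypothetical:

```apache
# A name that clients can actually resolve, used when the redirect is built
ServerName www.example.com

# No trailing slash on either argument, so /example and /example/ both match
Alias /example /www/htdocs/example
```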

Setting the Content-Type According to Browser Capability

Problem

You want to set Content-Type headers differently for different browsers, which may render the content incorrectly otherwise.

Handling Missing Host: Header Fields

Problem

You want to treat differently all requests that are made without a Host: request header field.

Solution

Alternate Default Document

Problem

You want to have some file other than index.html appear by default.

Solution

Setting Up a Default "Favicon"

Problem

You want to define a default favorite icon, or "favicon," for your site, but allow individual sites or users to override it.

Solution

Put your default favicon.ico file into the /icons/

Appendix A. Using Regular Expressions in Apache

A number of the Apache web server's configuration directives permit (or require!) the use of what are called regular expressions. Regular expressions are used to determine if a string, such as a URL or a user's name, matches a pattern.

There are numerous resources that cover regular expressions in excruciating detail, so this appendix is not designed to be a tutorial for their use. Instead, it documents the specific features of regular expressions used by Apache—what's available and what isn't. Even though there are quite a number of regular expression packages, with differing feature sets, there are some commonalities among them. The Perl language, for instance, has a particularly rich set of regular expressions but only a small subset of them are available in the Apache regex library, which is different from Perl's.

Regular expressions, as mentioned, are a language that allows you to determine if a particular string or variable looks like some pattern. For example, you may wish to determine if a particular string is all uppercase, or if it contains at least 3 numbers, or perhaps if it contains the word "monkey" or "Monkey." Regular expressions provide a vocabulary for talking about these sorts of tests. Most modern programming languages contain some variety of regular expression library, and they tend to have a large number of things in common, although they may differ in small details.

Apache 1.3 uses a regular expression library called hsregex, so called because it was developed by Henry Spencer. Note that this is the same regular expression library used in egrep, which is the same thing as grep on many Unixish platforms.

Apache 2.0 uses a somewhat more full-featured regular expression library called Perl Compatible Regular Expressions (PCRE), so called because it implements many of the features available in the regular expression engine that comes with the Perl programming language. While this appendix does not attempt to communicate all the differences between these two implementations, you should know that hsregex is a subset of PCRE, as far as functionality goes, so everything you can do with regular expressions in Apache 1.3, you can do in 2.0, but not necessarily the other way around.

To grossly simplify, regular expressions implement two kinds of characters. Some characters mean exactly what they say (for example, a G appearing in a regular expression will usually mean the literal character G), while some characters have special significance (for example, the period (.) will match any character at all—a wildcard character). Regular expressions can be composed of these characters to represent (almost) any desired pattern appearing in a string.

What Directives Use Regular Expressions?

Two main categories of Apache directives use regular expressions. Any directive with a name containing the word Match, such as FilesMatch, can be assumed to use regular expressions in its arguments. And directives supplied by the module mod_rewrite use regular expressions to accomplish their work.

For more about mod_rewrite, see Chapter 5.

Each SomethingMatch directive implements the same functionality as its counterpart without the Match. For example, the RedirectMatch directive does essentially the same thing as the Redirect directive, except that the first argument, rather than being a literal string, is a regular expression, which will be compared to the incoming request URL.
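To make the contrast concrete, here is a sketch that places a literal directive next to Match counterparts. The hostnames and URL paths are hypothetical:

```apache
# Literal prefix: /old/page.html is redirected to /new/page.html on the target
Redirect /old/ http://www.example.com/new/

# Regular expression: any request ending in .gif is redirected to the same
# path with a .jpg extension on another server; $1 is the text matched by (.*)
RedirectMatch (.*)\.gif$ http://images.example.com$1.jpg

# FilesMatch applies the enclosed directives to filenames matching the pattern
<FilesMatch "\.(gif|jpe?g|png)$">
    Order Allow,Deny
    Allow from all
</FilesMatch>
```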

Regular Expression Basics

To get started in writing your own regular expressions, you'll need to know a few basic pieces of vocabulary, such as shown in Table A-1 and Table A-2. These constitute the bare minimum that you need to know. Although this will hardly qualify you as an expert, it will enable you to solve many of the regex scenarios you will find yourself faced with.

Table A-1. A basic regex vocabulary

Character    Meaning

.            Matches any character. This is the wildcard character.

+

Appendix B. Troubleshooting

The Apache web server is a very complex beast. In the vanilla package it includes over 30 functional modules and more than 12 dozen configuration directives. This means that there are significant opportunities for interactions that produce unexpected or undesirable results. This appendix covers some of the more common issues that cause problems, as culled from various support forums.

Troubleshooting Methodology

In the Error Log

Debugging the Configuration

Debugging Premature End of Script Headers

When you're working with CGI scripts, certain messages can quickly become extremely familiar and tiresome; typically the output in the browser window will be either a blank page or an Internal Server Error page.

This message has several different possible causes. These include, but are not necessarily limited to:

Common Problems on Windows

Windows has its own distinct set of problem areas that don't apply to Unixish environments.

Cannot Determine Hostname

When trying to start Apache from a DOS window, you receive a message like "Cannot determine hostname. Use ServerName directive to set it manually."

If you don't explicitly supply Apache with a name for your system, it tries to figure it out. This message is the result of that process failing.

The cure for this is really quite simple: edit your conf\httpd.conf file, look for the string ServerName, and make sure there's an uncommented directive such as:

ServerName localhost

or:

ServerName www.foo.com

in the file. Correct it if there is one there with wrong information, or add one if you don't already have one.

Also, make sure that your Windows system has DNS enabled. See the TCP/IP setup component of the Networking or Internet Options control panel.

After verifying that DNS is enabled and that you have a valid hostname in your ServerName directive, try to start the server again.

Fixing Build-Time Error Messages

__inet Symbols

If you have installed BIND-8, then this is normally due to a conflict between your include files and your libraries. BIND-8 installs its include files and libraries in /usr/local/include/ and /usr/local/lib/, while the resolver that comes with your system is probably installed in /usr/include/ and /usr/lib/.

If your system uses the header files in /usr/local/include/ before those in

Getting Server-Side Includes to Work

The solution is to make sure that Options

Debugging Rewrites That Result in "Not Found" Errors

If your RewriteRule directives keep resulting in 404 Not Found error pages, add the PT (PassThrough) flag to the RewriteRule line. Without this flag, Apache won't process a lot of other factors that might apply, such as

.htaccess Files Having No Effect

Make sure that AllowOverride is set to an appropriate value. Then, to make sure that the .htaccess file is being parsed at all, put the following line in the file and ensure that it causes a server error page to show up in your browser:

Garbage Goes Here

Address Already in Use

If, when attempting to start your Apache server, you get the following error message:

[Thu May 15 01:23:40 2003] [crit] (98)Address already in use: make_sock: could not bind to port 80

then one of three things is happening:

Colophon

Our look is the result of reader comments, our own experimentation, and feedback from distribution channels. Distinctive covers complement our distinctive approach to technical topics, breathing personality and life into potentially dry subjects.

The animal on the cover of Apache Cookbook is a moose. The moose roams the forests of North America, Europe, and Russia. It's the largest of the deer family, and the largest moose of all, Alces alces gigas
