Copyright © 2009 Ken Coar and Richard Bowen
O'Reilly Media
The recipes in this book are geared toward two major platforms: Unixish ones (such as Linux, FreeBSD, and Solaris) and Windows. There are many that have no platform-specific aspects, and for those any mention of the underlying operating system or hardware is gratefully omitted. Due to the authors' personal preferences and experiences, Unixish coverage is more complete than that for the Windows platforms. However, contributions, suggestions, and corrections for Windows-specific recipes will be gladly considered for future revisions and inclusion on the web site.
There are a number of books currently in print that deal with the Apache web server and its operation. Among them are:
Apache: The Definitive Guide, Third Edition (O'Reilly)
Apache Server Unleashed (Macmillan)
Apache Administrator's Handbook (Macmillan)
You can also keep an eye on a couple of web pages that track Apache titles:
http://Apache-Server.Com/store.html
http://httpd.apache.org/info/apache_books.html
In addition to books, there is a wealth of information available online. There are web sites, mailing lists, and USENET newsgroups devoted to the use and management of the Apache web server. The web sites are limitless, but here are some active and useful sources of information.
The comp.infosystems.www.servers.unix and comp.infosystems.www.servers.ms-dos
This book is broken up into twelve chapters and two appendixes, as follows:
Chapter 1 covers the basics of installing the vanilla Apache software, from source on Unixish systems, and on Windows from the Microsoft Software Installer (MSI) package built by the Apache developers.
Chapter 2 describes the details of installing some of the most common third-party modules, and includes generic instructions that apply to many others that have less complex installation needs.
Throughout this book certain stylistic conventions are followed. Once you are accustomed to them, you can easily distinguish between comments, commands you need to type, values you need to supply, and so forth.
In some cases, the typeface of terms in the main text will be different and likewise in code examples. The details of what the different styles (italic, boldface, etc.) mean are described in the following sections.
In this book, most code examples will be in the form of excerpts from scripts, rather than actual application code. When commands need to be issued at a command-line prompt (such as an xterm for a Unixish system or a DOS command prompt for Windows), they will look something like this:
    % find /usr/local -name apachectl -print
    # /usr/local/apache/bin/apachectl graceful
    C:>cd "\Program Files\Apache Group\Apache\bin"
    C:\Program Files\Apache Group\Apache\bin>apache -k stop
On Unixish systems, command prompts that begin with # indicate that you need to be logged in as the superuser (root username); if the prompt begins with %
We have tested and verified the information in this book to the best of our ability, but you may find that features have changed (which may in fact resemble bugs). Please let us know about any errors you find, as well as your suggestions for future editions, by writing to:
Please address comments and questions concerning this book to the publisher:
O'Reilly & Associates, Inc.
Originally, each recipe was going to be individually attributed, but that turned out to be logistically impossible.
Many people have helped us during the writing of this book, by posing a problem, providing a solution, proofreading, reviewing, editing, or just (!) providing moral support. This multitude, to each of whom we are profoundly grateful, includes Nat Torkington (our project editor and demonstrator of Herculean feats of patience), Sharco and Guy- from #apache on irc.freenode.net, Mads Toftum, Morbus Iff (known to the FBI under the alias Kevin Hemenway), and Andy Holman.
For this cookbook to be useful, you need to install the Apache web server software. So what better way to start than with a set of recipes that deal with the installation?
There are many ways of installing this package; one of the features of open software like Apache is that anyone may make an installation kit. This allows vendors (such as Debian, FreeBSD, Red Hat, Mandrake, Hewlett-Packard, and so on) to customize the Apache file locations and default configuration settings so that these settings fit with the rest of their software. Unfortunately, one of the consequences of customization is that the various prepackaged installation kits are almost all different from one another.
In addition to installing it from a prepackaged kit, of which the variations are legion, there's always the option of building and installing it from the source yourself. This has both advantages and disadvantages; on the one hand you know exactly what you installed and where you put it, but on the other hand, it's likely that binary add-on packages will expect files to be in locations different than those you have chosen.
You want to install the Apache web server software on a Windows platform.
If you already have Apache installed on your Windows system, remove it before installing a new version. Failure to do this results in unpredictable behavior. See Recipe 1.7.
Primarily, Windows is a graphically oriented environment, so the Apache install for Windows is correspondingly graphical in nature.
The simplest way to install Apache is to download and execute the Microsoft Software Installer (MSI) package from the Apache web site at http://httpd.apache.org/download. The following screenshots come from an actual installation made using this method.
Each step of the installation is a distinct screen, and you can go back and revise earlier decisions until the files are actually installed. The first screen (Figure 1-1) simply confirms what you're about to do and the version of the package you're installing.
Figure 1-1. First screen of Apache MSI install
The second screen (Figure 1-2) presents the Apache license. Its basic tenets boil down to the following: do what you want with the software, don't use the Apache marks (trademarks like the feather or the name Apache) without permission, and provide proper attribution for anything you build based on Apache software. (This only applies if you plan to distribute your package; if you use it strictly on an internal network, this isn't required.) You can't proceed past this screen until you agree to the license terms.
Figure 1-2. License agreement
You want to build the Apache web server yourself from the sources directly (see Recipe 1.4), but don't know how to obtain them.
There are a number of ways to obtain the sources. You can access the latest version in close to real time by using CVS, the tool used by the Apache developers for source control; you can download a release tarball; or you can install a source package prepared by a distributor, among others.
From a prepackaged tarball, download the tarball from http://httpd.apache.org/dist/, and then:
    % tar xzvf apache_1.3.27.tar.gz
If your version of tar doesn't support the z option for processing zipped archives, use this command instead:
    % gunzip -c < apache_1.3.27.tar.gz | tar xvf -
You want to build your Apache web server from the sources directly rather than installing it from a prepackaged kit.
Assuming that you already have the Apache source tree, whether you installed it from a tarball, CVS, or some distribution package, the following commands, executed in the top directory of the tree, build the server package with most of the standard modules as DSOs:
Apache 1.3:
    % ./configure --prefix=/usr/local/apache --with-layout=Apache --enable-shared=max --enable-module=most
    % make
You have a complicated collection of modules you want to install correctly.
Download ApacheToolbox from http://www.apachetoolbox.com/. (Note that the version numbers will probably be different than these, which were the latest available when this section was written.) Unpack the file:
    % bunzip2 Apachetoolbox-1.5.65.tar.bz2
    % tar xvf Apachetoolbox-1.5.65.tar
(Depending on your version of tar, you may be able to combine these operations into a single tar xjvf command.)
Then run the installation script:
    # cd Apachetoolbox-1.5.65
    # ./install.sh
ApacheToolbox is developed and maintained by Bryan
You want to be able to start and stop the server at need, using the appropriate tools.
On Unixish systems, use the apachectl script; on Windows, use the options in the Apache folder of the Start menu.
The basic Apache package includes tools to make it easy to control the server. For Unixish systems, this is usually a script called apachectl
You have the Apache software installed on your system, and you want to remove it.
On Red Hat Linux, to remove an Apache version installed with the RPM tool, use:
    # rpm -ev apache
Other packaging systems may provide some similar mechanism.
There are a number of extremely popular modules for the Apache web server that are not included in the basic distribution. Most of these are separate because of licensing or support reasons; some are not distributed by the Apache Software Foundation because of a decision by the Apache developers; and some are integral parts of other projects. For instance, mod_ssl for Apache 1.3 is developed and maintained separately not only because of the U.S. export control laws (which were more restrictive when the package was originally developed), but because it requires changes to the core software that the Apache developers chose not to integrate.
This chapter provides recipes for installing some of the most popular of these third-party modules; when available, there are separate recipes for installation on Unixish systems and on Windows.
The most comprehensive list of third-party modules can be found in the Apache Module Registry at http://modules.apache.org/. Some modules are so popular—or complex—that they have entire sites devoted to them, as do the ones listed in this chapter.
Although hundreds of third-party modules are available, many module developers are only concerned with their single module. This means that there are potentially as many different sets of installation instructions as there are modules. The first recipe in this chapter describes an installation process that should work with many Apache 1.3 modules, but you should check with the individual packages' instructions to see if they have a different or more detailed process.
Many of the modules are available from organizations that prepackage or distribute Apache software, such as in the form of an RPM from Mandrake or Red Hat, but such prebuilt module packages include the assumptions of the packager. In other words, if you build the server from source and use custom locations for the files, don't be surprised if the installation of a packaged module fails.
All of the modules described in this chapter are supported with Apache 1.3 on Unixish systems. Status of support with Apache 2.0 on Windows is shown in Table 2-1.
Table 2-1. Module support status
You want to add or enable WebDAV capabilities to your server. WebDAV permits specific documents to be reliably and securely manipulated by remote users without the need for FTP, to perform such tasks as adding, deleting, or updating files.
If you're using Apache 2.0, mod_dav is automatically available, although you may need to enable it at compile time with --enable-dav.
If you are using Apache 1.3, download and unpack the mod_dav source package from http://webdav.org/mod_dav/, and then:
    % cd mod_dav-1.0.3-1.3.6
    % ./configure --with-apxs=/usr/local/apache/bin/apxs
    % make
    # make install
Restart the server, and be sure to read Recipe 6.18.
mod_dav is an encapsulated and well-behaved module that is easily built and added to an existing server. To test that it has been properly installed, you need to enable some location on the server for WebDAV management and verify access to that location with some WebDAV-capable tool. We recommend cadaver
You want to enable WebDAV capabilities on your existing Apache 1.3 server with mod_dav.
Apache 2.0 includes mod_dav as a standard module, so you do not need to download and build it.
Download and unpack the mod_dav Windows package from http://webdav.org/mod_dav/win32/. Verify that your Apache installation already has the xmlparse.dll and xmltok.dll files in the ServerRoot directory; if they aren't there, check through the Apache directories to locate and copy them to the ServerRoot. mod_dav requires the Expat package, which is included with versions of the Apache web server after 1.3.9; these files hook into Expat, which mod_dav will use.
Put the mod_dav DLL file into the directory where Apache keeps its modules:
    C:\>cd mod_dav-1.0.3-dev
    C:\mod_dav-1.0.3-dev>copy mod_dav.dll C:\Apache\modules
    C:\mod_dav-1.0.3-dev>cd \Apache
Add the following lines to your httpd.conf file:
    LoadModule dav_module modules/mod_dav.dll
You may also need to add an AddModule line if your httpd.conf file includes a ClearModuleList
You want to install the mod_perl scripting module to allow better Perl script performance and easy integration with the web server.
Download and unpack the mod_perl source package from http://perl.apache.org/. Then use the following command:
    % perl Makefile.PL \
    > USE_APXS=1 \
    > WITH_APXS=/usr/local/apache/bin/apxs \
    > EVERYTHING=1 \
    > PERL_USELARGEFILES=0
    % make
    % make install
Restart your server.
mod_perl
You want to add the mod_php scripting module to your existing Apache web server.
Download the mod_php package source from the web site at http://php.net/ (follow the links for downloading) and unpack it. Then:
    % cd php-4.3.2
    %
You want to add the mod_php scripting module to your existing Apache server on Windows.
This recipe needs to be described largely in terms of actions rather than explicit commands to be issued.
Download the PHP Windows binary .zip file with API extensions (not the .exe file) from http://php.net/.
Unpack the .zip file into a directory where you can keep its contents indefinitely (such as C:\PHP4). If you use WinZip, be sure to select the Use
You want to add the mod_snake Python scripting module to your existing Apache server.
To install mod_snake on a Unixish system, download the source from the http://sourceforge.net/projects/modsnake/
You want to add SSL support to your Apache server with the mod_ssl secure HTTP module.
At the time of this writing, there is no supported means of installing mod_ssl on Windows.
Apache 2.0:
mod_ssl is included with 2.0, although it is neither compiled nor installed automatically when you build from source. You need to include the --enable-ssl option on your ./configure line, and enable it with LoadModule
Apache can, and usually does, record information about every request it processes. Controlling how this is done and extracting useful information out of these logs after the fact is at least as important as gathering the information in the first place.
The logfiles may record two types of data: information about the request itself, and possibly one or more messages about abnormal conditions encountered during processing (such as file permissions). You, as the webmaster, have a limited amount of control over the logging of error conditions, but a great deal of control over the format and amount of information logged about request processing (activity logging). The server may log activity information about a request in multiple formats in multiple logfiles, but it will only record a single copy of an error message.
One aspect of activity logging you should be aware of is that the log entry is formatted and written after the request has been completely processed. This means that the interval between the time a request begins and when it finishes may be long enough to make a difference.
For example, if your logfiles are rotated while a particularly large file is being downloaded, the log entry for the request will appear in the new logfile when the request completes, rather than in the old logfile when the request was started. In contrast, an error message is written to the error log as soon as it is encountered.
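To make the timing concrete, here is a minimal Python sketch. It assumes, purely for illustration, that logfiles are rotated at fixed interval boundaries (measured in UTC) and named after the start of each interval; the function names, the filename pattern, and the one-day interval are hypothetical and not part of Apache itself.

```python
from datetime import datetime, timedelta, timezone

ROTATION_SECONDS = 86400  # assumed rotation interval: one day


def logfile_for(moment, interval=ROTATION_SECONDS):
    """Name the logfile that is open at `moment` (hypothetical naming scheme)."""
    epoch = int(moment.timestamp())
    period_start = epoch - (epoch % interval)
    start = datetime.fromtimestamp(period_start, tz=timezone.utc)
    return start.strftime("access_log.%Y-%m-%d")


def logfile_for_request(started, duration_seconds):
    """An activity entry is written when the request finishes, not when it starts."""
    finished = started + timedelta(seconds=duration_seconds)
    return logfile_for(finished)


# A large download that starts five minutes before midnight (UTC) and runs for ten.
started = datetime(2003, 10, 10, 23, 55, tzinfo=timezone.utc)
print(logfile_for(started))               # file open when the request began
print(logfile_for_request(started, 600))  # file that actually receives the entry
```

The two printed names differ, which is the situation described above: the entry lands in the file that is current when the request completes.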
The web server will continue to record information in its logfiles as long as it's running. This can result in extremely large logfiles for a busy site and uncomfortably large ones even for a modest site. To keep the file sizes from growing ever larger, most sites rotate or roll over their logfiles on a semi-regular basis. Rolling over a logfile simply means persuading the server to stop writing to the current file and start recording to a new one. Due to Apache's determination to see that no records are lost, cajoling it to do this according to a specific timetable may require a bit of effort; some of the recipes in this chapter cover how to accomplish the task successfully and reliably (see Recipe 3.8 and Recipe 3.9).
The log declaration directives, CustomLog and ErrorLog, can appear inside <VirtualHost> containers, outside them (in what's called the main or global server, or sometimes the global scope), or both. Entries will only be logged in one set or the other; if a <VirtualHost> container applies to the request or error and has an applicable log directive, the message will be written only there and won't appear in any globally declared files. On the other hand, if no <VirtualHost> log directive applies, the server will fall back on logging the entry according to the global directives.
However, whichever scope is used for determining what logging directives to use, all CustomLog directives in that scope are processed and treated independently. That is, if you have a CustomLog directive in the global scope and two inside a <VirtualHost> container, both of the directives inside the container will be used for requests handled by that virtual host (and the global one will not). Similarly, if a CustomLog directive uses the env= option, it has no effect on what requests will be logged by other CustomLog directives in the same scope.
Activity logging has been around since the Web first appeared, and it didn't take long for the original users to decide what items of information they wanted logged. The result is called the common log format (CLF). In Apache terms, this format is:
    "%h %l %u %t \"%r\" %>s %b"
That is, it logs the client's hostname or IP address, the name of the user on the client (as defined by RFC 1413 and if Apache has been told to snoop for it with an IdentityCheck On directive), the username with which the client authenticated (if weak access controls are being imposed by the server), the time at which the request was received, the actual HTTP request line, the final status of the server's processing of the request, and the number of bytes of content that were sent in the server's response.
Before long, as the HTTP protocol advanced, the common log format was found to be wanting, so an enhanced format, called the combined log format, was created:
    "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
The two additions were the Referer (it's spelled incorrectly in the specifications) and the User-agent. These are the URL of the page that linked to the document being requested, and the name and version of the browser or other client software making the request.
Both of these formats are widely used, and many logfile analysis tools assume log entries are made in one or the other.
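As an illustration of what such an analysis tool has to do, the following Python sketch splits a single entry in either format into its fields. The regular expression, the parse_entry name, and the sample entry are our own assumptions for the sake of the example; real analysis tools are considerably more forgiving about malformed lines.

```python
import re

# Fields of the common log format; the two combined-format additions are optional.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)")?$'
)


def parse_entry(line):
    """Split one CLF or combined-format line into a dict, or return None."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # %b is written as "-" when no content bytes were sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


# Hypothetical sample entry, not taken from a real log.
sample = ('192.0.2.10 - bacchus [10/Oct/2003:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "ExampleBrowser/1.0"')
entry = parse_entry(sample)
print(entry["host"], entry["status"], entry["referer"])
```

For a plain CLF entry, the referer and agent keys are simply None, so the same function can serve both formats.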
The Apache web server's standard activity logging module allows you to create your own formats; it is highly configurable and is called (surprise!) mod_log_config. Apache 2.0 has an additional module, mod_logio, which enhances mod_log_config
You want more information in the error log in order to debug a problem.
Change (or add) the LogLevel line in your httpd.conf file. There are several possible arguments, which are enumerated below:
For example:
    LogLevel Debug
There are several hierarchical levels of error logging available, each identified by its own keyword. The default value of LogLevel is warn. Listed in descending order of importance, the possible values are:
emerg
    Emergencies; web server is unusable
alert
    Action must be taken immediately
crit
    Critical conditions
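The hierarchy works as a threshold: a message is recorded only if it is at least as important as the configured level. The Python sketch below illustrates that idea using the full set of LogLevel keywords (emerg through debug). The is_logged function is hypothetical, and the sketch ignores any special cases in how the server itself treats particular levels.

```python
# Keywords in descending order of importance (most severe first).
LEVELS = ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"]


def is_logged(message_level, configured_level="warn"):
    """Return True if a message of `message_level` passes the LogLevel threshold."""
    rank = {name: position for position, name in enumerate(LEVELS)}
    return rank[message_level.lower()] <= rank[configured_level.lower()]


print(is_logged("crit"))            # more severe than the default, warn
print(is_logged("info"))            # less severe than warn
print(is_logged("info", "Debug"))   # LogLevel Debug lets everything through
```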
You want to record data submitted with the POST method, such as from a web form.
Generally not possible in Apache 1.3 unless the POST-handling module explicitly records the data; possible in Apache 2.0
You want to log the IP address of the actual client requesting your pages, even if they're being requested through a proxy.
None.
You want to record all the cookies sent to your server by clients and all the cookies your server asks clients to set in their databases; this can be useful when debugging web applications that use cookies.
To log cookies received from the client:
    CustomLog logs/cookies_in.log "%{UNIQUE_ID}e %{Cookie}i"
    CustomLog logs/cookies2_in.log "%{UNIQUE_ID}e %{Cookie2}i"
To log cookie values set and sent by the server to the client:
    CustomLog logs/cookies_out.log "%{UNIQUE_ID}e %{Set-Cookie}o"
    CustomLog logs/cookies2_out.log "%{UNIQUE_ID}e %{Set-Cookie2}o"
Using the %{Set-Cookie}o format effector for debugging is not recommended if multiple cookies are (or may be) involved. Only the first one will be recorded in the logfile. See the Discussion text for an example.
At the time of this writing, the Apache package includes no way to record all cookie values, but one of the authors of this book is working on one. When it's available, it should be mentioned on this book's web site, http://Apache-Cookbook.Com/.
You want to log requests for images on your site, except when they're requests from one of your own pages. You might want to do this to keep your logfile size down, or possibly to track down sites that are hijacking your artwork and using it to adorn their pages.
Use SetEnvIfNoCase to restrict logging to only those requests from outside of your site:
You want to automatically roll over the Apache logs at specific times without having to shut down and restart the server.
Use CustomLog and the rotatelogs program:
    CustomLog "| /path/to/rotatelogs /path/to/logs/access_log.%Y-%m-%d 86400" combined
You want to close the previous month's logs and open new ones on the first of each month.
You want to see hostnames in your activity log instead of IP addresses.
You can let the web server resolve the hostname when it processes the request by enabling runtime lookups with the Apache directive:
    HostnameLookups On
Or, you can let Apache use the IP address during normal processing and let a piped logging process resolve them as part of recording the entry:
    HostnameLookups Off
    CustomLog "| /path/to
You want to have separate activity logs for each of your virtual hosts, but you don't want to have all the open files that multiple CustomLog directives would use.
Use the split-logfile program that comes with Apache. To split logfiles after they've been rolled over (replace /path/to/ServerRoot with the correct path):
    # cd /path/to
You want to log requests that go through your proxy to a different file than the requests coming directly to your server.
Use the SetEnv directive to earmark those requests that came through the proxy server, in order to trigger conditional logging:
    <Directory proxy:*>
        SetEnv is_proxied 1
    </Directory>
    CustomLog logs/proxy_log combined env=is_proxied
Apache 1.3 has a special syntax for the <Directory>
Unlike access logs, Apache only logs errors to a single location. You want Apache to log errors that refer to a particular virtual host to the host's error log, as well as to the global error log.
There are at least two possible ways of doing this:
You want to log the IP address of the server that responds to a request, possibly because you have virtual hosts with multiple addresses each.
You want to record the URL of pages that refer clients to yours, perhaps to find out how people are reaching your site.
You want to know the software visitors use to access your site, for example, so you can optimize its appearance for the browser that most of your audience uses.
You want to record the values of arbitrary fields that clients send in their request headers, perhaps to tune the types of content you have available to the needs of your visitors.
Use the %{...}i
You want to record the values of arbitrary fields the server has included in a response header, probably to debug a script or application.
Rather than logging accesses to your server in flat text files, you want to log the information directly to a database for easier analysis.
Install the latest release of mod_log_sql
You want to send your log entries to syslog.
To log your error log to syslog, simply tell Apache to log to syslog:
    ErrorLog syslog:user
Some syslog reporting class other than user, such as local1, might be more appropriate in your environment.
Logging your access log to syslog takes a little work. Add the following to your configuration file:
You want each user directory web site (i.e., those accessed via http://servername/~username) to have its own logfile.
In httpd.conf, add the directive:
    CustomLog "|/usr/local/apache/bin/userdir_log" combined
Then, in the file /usr/local/apache/bin/userdir_log, place the following code:
A web server system supports multiple web sites in a way similar to a person who responds to her given name, as well as her nickname. In the Apache configuration file, each alternate identity, and probably the "main" one as well, is known as a virtual host (sometimes written as vhost) identified with a <VirtualHost> container directive. Depending on the name used to access the web server, Apache responds appropriately, just as someone might answer differently depending on whether she is addressed as "Miss Jones" or "Hey, Debbie!" If you want to have a single system support multiple web sites, you must configure Apache appropriately.
There are two different types of virtual host supported by Apache. The first type, called address-based or IP-based, is tied to the numeric network address used to reach the system. Bruce Wayne never answered the parlour telephone with "Batman here!" nor did he answer the phone in the Batcave by saying, "Bruce Wayne speaking." However, it's the same person answering the phone, just as it's the same web server receiving the request.
The other type of virtual host is name-based, because the server's response depends on what it is called. To continue the telephone analogy, consider an apartment shared by multiple roommates; you call the same number whether you want to speak to Dave, Joyce, Amaterasu, or George. Just as multiple people may share a single telephone number, multiple web sites can share the same IP address. However, all IP addresses shared by multiple Apache virtual hosts need to be declared with a NameVirtualHost directive.
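The roommate analogy can be sketched in a few lines of Python. This is only a model of the idea, not Apache's implementation: the select_vhost function, the hostnames, and the directory paths are hypothetical, and the sketch assumes that a request naming no known host falls back to the first virtual host listed for the address.

```python
def select_vhost(host_header, vhosts):
    """Pick the virtual host whose name matches the request's Host header.

    `vhosts` is an ordered list of (server_name, document_root) pairs that all
    share one IP address. If nothing matches, fall back to the first entry.
    """
    # Ignore case and any ":port" suffix when comparing names.
    requested = (host_header or "").split(":")[0].lower()
    for server_name, document_root in vhosts:
        if server_name.lower() == requested:
            return server_name, document_root
    return vhosts[0]


# Hypothetical roommates sharing one "telephone number" (IP address).
vhosts = [
    ("dave.example.com", "/www/dave"),
    ("joyce.example.com", "/www/joyce"),
]
print(select_vhost("joyce.example.com", vhosts))
print(select_vhost("unknown.example.com", vhosts))
```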
You want all requests, whether they match a virtual host or use an IP address, to be directed to a default host, possibly with a "host not found" error message.
Add the following <VirtualHost> section, and list it before all of your other ones:
You have multiple IP addresses assigned to your system, and you want to support one web site on each.
You want to create a virtual host to catch all requests that don't map to one of your address-based virtual hosts.
You have multiple IP addresses assigned to your system, and you want to support more than one web site on each address.
You want to host many virtual hosts, all of which have exactly the same configuration.
Use VirtualDocumentRoot and VirtualScriptAlias provided by mod_vhost_alias.
    VirtualDocumentRoot /www/vhosts/%-1/%-2.1/%-2/htdocs
    VirtualScriptAlias /www/vhosts/%-1/%-2.1/%-2/cgi-bin
This recipe uses directives from mod_vhost_alias, which you may not have installed when you built Apache, as it is not one of the modules that is enabled by default.
These directives map requests to a directory built up from pieces of the hostname that was requested. Each of the variables represents one part of the hostname, so that each hostname is mapped to a different directory.
In this particular example, requests for content from www.example.com are served from the directory /www/vhosts/com/e/example/htdocs, or from /www/vhosts/com/e/example/cgi-bin (for CGI requests). The full range of available variables is shown in Table 4-1.
Table 4-1. mod_vhost_alias variables
    Variable    Meaning
    %%          insert a %
    %p          insert the port number of the virtual host
    %M.N        insert (part of) the name
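To show how a hostname turns into a directory, here is a simplified Python sketch of the substitution. It is not the module's code: the interpolate and pick names are hypothetical, only the %%, %p, %M, and %M.N forms are handled (the first number selects a dot-separated part of the name, the second a character within it, and negative numbers count from the end), and the underscore substituted for a missing part is an assumption of this sketch.

```python
import re

# %% | %p | %M or %M.N, where M and N are integers that may be negative.
TOKEN = re.compile(r"%(?:(%)|(p)|(-?\d+)(?:\.(-?\d+))?)")


def pick(items, index):
    """Select from a list or string: 1 is the first item, -1 the last.

    Returns None when the index falls outside the sequence.
    """
    position = index - 1 if index > 0 else len(items) + index
    if 0 <= position < len(items):
        return items[position]
    return None


def interpolate(template, hostname, port=80):
    """Expand mod_vhost_alias-style variables in `template` (simplified sketch)."""
    name = hostname.lower()
    parts = name.split(".")

    def expand(match):
        if match.group(1):
            return "%"
        if match.group(2):
            return str(port)
        part_index = int(match.group(3))
        # 0 stands for the whole name; anything else selects one part.
        piece = name if part_index == 0 else pick(parts, part_index)
        if piece is not None and match.group(4) is not None:
            char_index = int(match.group(4))
            if char_index != 0:
                piece = pick(piece, char_index)
        # Assumption: a missing part or character becomes an underscore.
        return "_" if piece is None else piece

    return TOKEN.sub(expand, template)


print(interpolate("/www/vhosts/%-1/%-2.1/%-2/htdocs", "www.example.com"))
```

With the recipe's VirtualDocumentRoot pattern and the hostname www.example.com, the sketch produces the same directory named in the text, /www/vhosts/com/e/example/htdocs.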
Although there is a module—mod_vhost_alias—which is explicitly for the purpose of supporting large numbers of virtual hosts, it is very limiting and requires that every virtual host be configured exactly the same way. You want to support a large number of vhosts, configured dynamically, but, at the same time, you want to avoid mod_vhost_alias.
You want to have multiple SSL web sites on the same server.
You want each virtual host to have its own logfiles.
Specify ErrorLog and CustomLog within each virtual host declaration:
Due to a large number of virtual hosts, you want to have a single logfile and split it up afterwards.
Then, after rotating your logfile:
    split-logfile < logs/vhost_log
The LogFormat directive in this recipe creates a logfile that is similar to the common log file format but additionally contains the name of the virtual host being accessed. The split-logfile utility splits up this logfile into its constituent virtual hosts.
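The splitting step itself is simple enough to sketch in Python. The version below is not the split-logfile utility; it assumes that the virtual host name is the first space-separated field of each entry, and it only groups the entries in memory rather than writing one file per host. The split_by_vhost name and the sample entries are hypothetical.

```python
from collections import defaultdict


def split_by_vhost(lines):
    """Group log lines by their first field, assumed to be the virtual host name."""
    per_host = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        vhost, _, rest = line.partition(" ")
        per_host[vhost.lower()].append(rest)
    return dict(per_host)


# Hypothetical entries: virtual host name followed by a common-log-format record.
log = [
    'www.example.com 192.0.2.10 - - [10/Oct/2003:13:55:36 -0700] "GET / HTTP/1.0" 200 2326',
    'docs.example.com 192.0.2.11 - - [10/Oct/2003:13:55:40 -0700] "GET /faq HTTP/1.0" 200 512',
    'www.example.com 192.0.2.12 - - [10/Oct/2003:13:56:02 -0700] "GET /a.png HTTP/1.0" 304 -',
]
for vhost, entries in split_by_vhost(log).items():
    print(vhost, len(entries))
```

Because the host name is stripped from each entry, what remains for every virtual host is an ordinary common-log-format record that standard analysis tools can read.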
You want to present different content for HTTP connections on different ports.
You want to have the same content displayed on two of your addresses.
Specify both addresses in the <VirtualHost>
When Apache receives a request, it is assumed that the client will be served a file out of the DocumentRoot directory. However, there will be times when you want these resources to be served from some other location. For example, if you wanted to place a set of documents on your web site, it may be more convenient to leave them where they are, rather than to move them to a new location.
In this chapter, we deal with three general categories of these sorts of cases. Aliasing refers to mapping a URL to a particular directory. Redirecting refers to mapping a URL to another URL. And Rewriting refers to using mod_rewrite to alter the URL in some way.
Other recipes in this chapter are related because they map URLs to resources that are at unexpected places in the filesystem.
You want to serve content out of a directory other than the DocumentRoot directory. For example, you may have an existing directory of documents that you want to make available on your web site but do not want to move into the Apache document root.
The example given maps URLs starting with /desired-URL-prefix to files in the /path/to/other/directory directory. For example, a request for the URL:
    http://example.com/desired/something.html
results in the file /path/to/other/directory/something.html being sent to the client.
You have an existing directory which you want to access using a different name.
Use an Alias directive in httpd.conf:
You want to give each user on your system his own web space.
If you want users' web locations to be under their home directories, add this to your httpd.conf file:
    UserDir public_html
To put all users' web directories under a central location:
    UserDir /www/users/*/htdocs
If you have mod_perl installed, you can do something more advanced like this (again, added to your httpd.conf file):
    <Perl>
    # Folks you don't want to have this privilege
    my %forbid = map { $_ => 1 } qw(root postgres bob);
    opendir H, '/home/';
    my @dir = readdir(H);
    closedir H;
    foreach my $u (@dir) {
        next if $u =~ m/^\./;
        next if $forbid{$u};
        if (-e "/home/$u/public_html") {
            push @Alias, "/$u/", "/home/$u/public_html/";
        }
    }
    </Perl>
The first solution is the simplest and most widely used of the possible recipes we present here. With this directive in place, all users on your system are able to create a directory called public_html in their home directories and put web content there. Their web space is accessible via a URL starting with a tilde (~), followed by their usernames. So, a user named bacchus accesses his personal web space via the URL:
    http://www.example.com/~bacchus/
You want to have more than one URL map to the same directory but don't want multiple Alias directives.
You want to have a number of URLs map to the same CGI directory but don't want to have multiple ScriptAlias directives.
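One way this might be done is with a single ScriptAliasMatch directive. The sketch below is only an illustration: the three URL prefixes and the /www/cgi-bin/ path are hypothetical and should be replaced with your own.

```apache
# Requests under /scripts/, /cgi/, or /cgi-bin/ all run programs from the
# same CGI directory; $2 carries the remainder of the requested path
ScriptAliasMatch "^/(scripts|cgi|cgi-bin)/(.*)" "/www/cgi-bin/$2"
```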
You want each user to have their own cgi-bin directory rather than giving them all access to the main server CGI directory.
Put this in your httpd.conf:
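A sketch of what such a configuration might look like, assuming that home directories live under /home and that UserDir public_html is in effect (both are assumptions; adjust the path to your own layout):

```apache
# Treat everything in each user's public_html/cgi-bin as a CGI program
<Directory /home/*/public_html/cgi-bin/>
    Options ExecCGI
    SetHandler cgi-script
</Directory>
```

With this in place, a program stored as cgi-bin/example.cgi in a user's public_html directory would be reachable under that user's tilde URL.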
You want requests to a particular URL to be redirected to another server.
Use a Redirect directive in httpd.conf, and give an absolute URL as the second argument:
Redirect /example http://www.other.server/new/location
Whereas Alias maps a URL to something in the local filesystem, Redirect maps a URL to another URL, usually on another server. The second argument is a full URL and is sent back to the client (browser), which makes a second request for the new URL.
It is also important to know that the Redirect directive preserves path information, if there is any. Therefore, this recipe redirects a request for http://original.server/example/something.html to http://www.other.server/new/location/something.html.
Redirections come in several different flavors, too; you can specify which particular type of redirect you want to use by inserting the appropriate keyword between the Redirect
You want to redirect a number of URLs to the same place. For example, you want to redirect requests for /fish and /Fishing to http://fish.example.com/.
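A single RedirectMatch directive is one possible approach. The pattern below is a sketch written for the /fish and /Fishing example and should be treated as a starting point rather than a definitive rule:

```apache
# Matches /fish, /Fish, /fishing, and /Fishing, with or without a trailing
# path; the trailing path is not carried over to the target
RedirectMatch "^/[fF]ish(ing)?(/.*)?$" "http://fish.example.com/"
```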
You want requested URLs to be valid whether uppercase or lowercase letters are used.
Use mod_speling
You want to change all occurrences of string1 to string2 in a request's URI.
You want to pass arguments as part of the URL but have these components of the URL rewritten as CGI QUERY_STRING arguments.
This is just an example, of course; make appropriate changes to the RewriteRule line to fit your own environment and needs:
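As an illustration only, a rule of this kind might look like the following. The /book URL layout, the book.cgi script, and the author and subject parameter names are all hypothetical, and the pattern assumes the rule is placed in the main server or a virtual host rather than in an .htaccess file:

```apache
RewriteEngine On
# /book/<author>/<subject> is handed to the CGI program as
# /cgi-bin/book.cgi?author=<author>&subject=<subject>
RewriteRule "^/book/([^/]+)/([^/]+)$" "/cgi-bin/book.cgi?author=$1&subject=$2" [PT,L]
```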
You want to prevent other web sites from using your images (or other types of documents) in their pages and allow your images to be accessed only if they were referred from your own site.
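A common way to approach this is with mod_rewrite and the Referer request header. The following is a sketch, with www.example.com standing in for your own site name. Because the Referer header is supplied by the client, this should be seen as a deterrent rather than real access control:

```apache
RewriteEngine On
# Let requests with no Referer through (some clients never send one)
RewriteCond "%{HTTP_REFERER}" "!^$"
# Refuse image requests referred from anywhere other than this site
RewriteCond "%{HTTP_REFERER}" "!^http://www\.example\.com/" [NC]
RewriteRule "\.(gif|jpe?g|png)$" "-" [F,NC]
```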
You want to translate one URI into another based on the value of the query string.
Put this in your httpd.conf:
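The general shape of such a rule is a RewriteCond that inspects the query string followed by a RewriteRule that uses what it captured. In this sketch the /people path, the user parameter, and the users.example.com domain are all made up for illustration:

```apache
RewriteEngine On
# Hypothetical: /people?user=jones is sent to http://jones.users.example.com/
RewriteCond "%{QUERY_STRING}" "^user=([^&]+)$"
# %1 is the value captured by the RewriteCond above; the trailing "?"
# keeps the original query string out of the redirect target
RewriteRule "^/people$" "http://%1.users.example.com/?" [R,L]
```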
You want certain parts of your non-SSL web space to be redirected to a secured area.
You can redirect everything that is attached to port 80 with the following RewriteRule:
RewriteCond "%{SERVER_PORT}" "^80$"
RewriteRule "^(.*)$" "https://%{SERVER_NAME}$1" [R,L]
You can redirect particular URLs to a secure version:
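For instance, a rule limited to one part of the site might be sketched as follows. The /secure prefix is a hypothetical example, and the port test is there so that requests already arriving over SSL are not redirected again:

```apache
RewriteEngine On
# Only requests that arrived on the non-SSL port are redirected
RewriteCond "%{SERVER_PORT}" "^80$"
# Send /secure and everything beneath it to the same path on the HTTPS site
RewriteRule "^/secure(/.*)?$" "https://%{SERVER_NAME}/secure$1" [R,L]
```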
You want to migrate pathnames under a single hostname to distinct hostnames.
You want all requests made of your system to be redirected to a specific host.
Put this in your httpd.conf:
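One possible form, shown as a sketch in which www.example.com stands in for the single host you want everyone to use:

```apache
RewriteEngine On
# Any request whose Host header is not the preferred name...
RewriteCond "%{HTTP_HOST}" "!^www\.example\.com$" [NC]
# ...is redirected to the same path on the preferred host
RewriteRule "^(.*)$" "http://www.example.com$1" [R,L]
```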
You want to redirect requests for documents to a CGI script, or other handler, that gets the
In this chapter, security means allowing people to see what you want them to see and preventing them from seeing what you don't want them to see. Additionally, there are the issues of what measures you need to take on your server in order to restrict access via non-Web means. This chapter illustrates the precautions you need to take to protect your server from malicious access and modification of your web site.
The most common questions ask how to protect documents and restrict access. Unfortunately, due to the complexity of the subject and the nature of the web architecture, these questions tend to also have the most complex answers or often no convenient answers at all.
Normal security nomenclature and methodology separate the process of applying access controls into two discrete steps; in the case of the Web, they may be thought of as the server asking itself these questions:
Are you really who you claim to be?
Are you allowed to be here?
These steps are called authentication and authorization, respectively. Here's a real-world example: a flight attendant checks your photo identification (authentication) and your ticket (authorization) before permitting you to board an airplane.
Authentication can be broken down into what might be called weak and strong. Weak authentication is based on the correctness of credentials that the end user supplies (which therefore may have been stolen from the real owner, hence the name "weak"), whereas strong authentication is based on attributes of the request over which the end user has little or no control and which cannot change from request to request, such as the IP address of his system.
Although checking authentication and authorization are clearly separate activities, their application gets a bit blurred in the context of the Apache web server modules. Even though the main difference between the many security modules is how they store the credentials (in a file, a database, an LDAP directory, etc
You want to be able to provide credentials that will allow visitors into your site only once.
No solution is available with standard Apache features.
You want a user's username and password to expire at a particular time or after some specific interval.
No solution is available with standard Apache features, but a few third-party solutions exist.
Refer to HTTP, Browsers, and Credentials. In order for Apache to provide this functionality, it would need to store more than just the valid username and password; it would also have to maintain information about the credentials' expiration time. No module provided as part of the standard Apache distribution does this.
There are several third-party solutions to this problem, including the Perl module Apache::Htpasswd::Perishable and the mod_perl handler Apache::AuthExpire
With more and more web hosting services allowing customers to upload documents, uploads may become too large. With a little creativity, you can put a limit on uploads by using the security capabilities of the server.
Assume you want to put a limit on uploads of ten thousand (10,000) bytes. Here's how you could do that for your /upload location:
SetEnvIf Content-Length "^[1-9][0-9]{4,}" upload_too_large=1
<Location /upload>
    Order Deny,Allow
    Deny from env=upload_too_large
    ErrorDocument 403 /cgi-bin/remap-403-to-413
</Location>
You can tailor the response by making the /cgi-bin/remap-403-to-413 script look something like this:
Other sites are linking to images on your system, stealing bandwidth from you and incidentally making it appear as though the images belong to them. You want to ensure that all access to your images is from documents that are on your server.
Add the following lines to the .htaccess
You want to require both weak and strong authentication for a particular resource. For example, you wish to ensure that the user accesses the site from a particular location and to require that he provides a password.
Use the Satisfy
You wish to create password files for use with Basic HTTP authentication.
Use the htpasswd utility to create your password file, as in Table 6-1.
Table 6-1. Managing password files with htpasswd
Command
Action
% htpasswd -c user.pass waldo
Create a new password file called user.pass with this one new entry for user waldo. Will prompt for password.
% htpasswd user.pass ralph
Add an entry for user ralph in password file user.pass. Will prompt for password.
% htpasswd -b
You need to create a password file to be used for Digest authentication.
Use the following command forms to set up a credential file for a realm to be protected by Digest authentication:
% htdigest -c
There are times when you might want to apply a tight security blanket over portions of your site, such as with something like:
<Directory /usr/local/apache/htdocs/BoD>
    Satisfy All
    AuthUserFile /usr/local/apache/access/bod.htpasswd
    Require valid-user
</Directory>
Due to Apache's scoping rules, this blanket applies to all documents in that directory and in any subordinate subdirectories underneath it. But suppose you want to make a subdirectory, such as BoD/minutes, available without restriction?
The Satisfy directive is the answer. Add the following to either the .htaccess file in the subdirectory or in an appropriate <Directory> container:
Satisfy Any
Order Deny,Allow
Allow from all
HTTP, Browsers, and Credentials
You want most documents to be restricted, such as requiring a username and password, but want a few to be available to the public. For example, you may want index.html to be publicly accessible, while the rest of the files in the directory require password authentication.
Use the Satisfy Any directive in the appropriate place in your .htaccess or httpd.conf file:
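A sketch of how this might look for the index.html example, assuming the surrounding directory already requires a username and password:

```apache
<Files index.html>
    # Satisfy Any grants access when EITHER the host-based check OR the
    # authentication check passes; "Allow from all" makes the host-based
    # check pass for everyone, so no password is requested for this file
    Order Deny,Allow
    Allow from all
    Satisfy Any
</Files>
```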
You wish to require user authentication based on system file ownership. That is, you want to require that the user that owns the file matches the username that authenticated.
Use the
You wish to use user and password information in your MySQL database for authenticating users.
For Apache 1.3, use mod_auth_mysql:
Auth_MySQL_Info db_host.example.com db_user my_password
Auth_MySQL_General_DB auth_database_name
<Directory /www/htdocs/private>
    AuthName "Protected directory"
    AuthType Basic
    require valid-user
</Directory>
For Apache 2.1 and later, use mod_authn_dbi:
You want to know the name of the user who has authenticated.
Consult the environment variable
You want to get the password that the user authenticated with.
You want to disable a username when there are repeated failed attempts to authenticate using it, as if it is being attacked by a password-cracker.
You want to understand the distinction between the Basic and Digest authentication methods.
Use AuthType Basic and the htpasswd tool to control access using Basic authentication. Use AuthType Digest and the
You know people access your site using URLs with embedded credentials, such as http://user:password@host/, and you want to extract them from the URL for validation or other purposes.
None; this is a nonissue that is often misunderstood.
For nonproxy requests, this doesn't even exist; the browser dissects the URL and turns it into the appropriate request header fields (i.e., Authorization). For proxy requests, who knows?
You want to allow your users to upload and otherwise manage their web documents with WebDAV, but without exposing your server to any additional security risks.
Require authentication to use WebDAV:
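A minimal sketch of such a configuration, written in Apache 2.0 syntax. The /dav location, the lock database path, and the password file path are all hypothetical. Basic authentication is used here only to keep the example short; since it sends passwords essentially in the clear, Digest authentication or SSL would be a better choice on a real site:

```apache
# Lock database required by mod_dav; set at the server level
DavLockDB /var/lock/apache/DavLock

<Location /dav>
    Dav On
    AuthType Basic
    AuthName "WebDAV area"
    AuthUserFile /www/acl/dav.htpasswd
    # Plain reads stay open; every other method requires a login
    <LimitExcept GET HEAD OPTIONS>
        Require valid-user
    </LimitExcept>
</Location>
```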
You want to run WebDAV but don't want to make your document files writable by the Apache server user.
Run two web servers as different users. The DAV-enabled server, for example, might run as User dav, Group dav, while the other server, which is responsible for serving your content, might run as User nobody, Group nobody
You don't want people using your proxy server to access particular URLs or patterns of URLs (such as MP3 or streaming video files).
You can block by keyword:
ProxyBlock .rm .ra .mp3
You can block by specific backend URLs:
<Directory proxy:http://other-host.org/path>
    Order Allow,Deny
    Deny from all
    Satisfy All
</Directory>
Or you can block according to regular expression pattern matching:
<Directory proxy:*>
    RewriteEngine On
    #
    # Disable proxy access to Real movie and audio files
    #
    RewriteRule "\.(rm|ra)$" "-" [F,NC]
    #
    # Don't allow anyone to access .mil sites through us
    #
    RewriteRule "^[a-z]+://[-.a-z0-9]*\.mil($|/)" "-" [F,NC]
</Directory>
You have files to which you want to limit access using some method other than standard web authentication (such as a members-only area).
In httpd.conf, add the following lines to a <Directory> container whose contents should be accessed only through a script:
RewriteEngine On
RewriteRule "\.(dll|zip|exe)$" protect.php [NC]
RewriteCond %{REMOTE_ADDR} "!^my.servers.ip"
RewriteRule "\.cgi$" protect.php [NC]
And an example protect.php that just displays the local URI of the document that was requested:
You want to deny all web access to files in a directory, except for those with a particular extension (for example, a directory with HTML files in it, where you don't want other files to be accessible).
Scripts running on your web server may access, modify, or destroy files located on your web server if they are not adequately protected. You want to ensure that this cannot happen.
Ensure that none of your files are writable by the nobody user or the nobody group, and that sensitive files are not readable by that user and group:
You want to set file permissions to provide the maximum level of security.
The bin directory under the ServerRoot should be owned by user root, group root, and have file permissions of 755 (rwxr-xr-x). Files contained therein should also be owned by root.root and be mode 755.
Document directories, such as htdocs, cgi-bin, and icons, will have to have permissions set in a way that makes the most sense for the development model of your particular web site, but under no circumstances should any of these directories or files contained in them be writable by the web server user.
The solution provided here is specific to Unixish systems. Users of other operating systems should adhere to the principles laid out here, although the actual implementation will vary.
The conf directory should be readable and writable only by root, as should all the files contained therein.
The include and libexec directories should be readable by everyone, writable by no one.
The logs directory should be owned and writable by root. You may, if you like, permit other users to read files in this directory, as it is often useful for users to be able to access their logfiles, particularly for troubleshooting purposes.
The man directory should be readable by all users.
Finally, the proxy
You want to eliminate all modules that you don't need in order to reduce the potential exposure to security holes. What modules do you really need?
For Apache 1.3, you can run a bare-bones server with just three modules. (Actually, you can get away with not running any modules at all, but it is not recommended.)
% ./configure --disable-module=all --enable-module=dir \
>     --enable-module=mime --enable-module=log_config \
For Apache 2.0, this is slightly more complicated, as you must individually disable modules you don't want:
% ./configure --disable-access \
>     --disable-auth --disable-charset-lite \
>     --disable-include --disable-log-config --disable-env --disable-setenvif \
>     --disable-mime --disable-status --disable-autoindex --disable-asis \
>     --disable-cgid --disable-cgi --disable-negotiation --disable-dir \
>     --disable-imap --disable-actions --disable-alias --disable-userdir
Note that with 2.0, as with 1.3, you may wish to enable mod_dir, mod_mime, and mod_log_config, by simply leaving them off of this listing.
You want to make sure that files outside of your web directory are not accessible.
For Unixish systems:
<Directory />
    Order deny,allow
    Deny from all
    AllowOverride None
    Options None
</Directory>
For Windows systems:
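A Windows counterpart might be sketched as follows, mirroring the Unixish block. The drive letter C: is an assumption; the block would presumably need repeating for every drive on the system:

```apache
<Directory C:/>
    Order deny,allow
    Deny from all
    AllowOverride None
    Options None
</Directory>
```

In either case, the directories you do want served (such as the document root) then need their own <Directory> blocks that allow access again.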
You want to allow some users to use certain methods but prevent their use by others. For instance, you might want users in group A to be able to use both GET and POST but allow everyone else to use only GET.
Apply user authentication per method using the Limit
You want to prevent clients from requesting partial downloads of documents within a particular scope, forcing them to request the entire document instead.
You can overload ErrorDocument 403 to make it handle range requests. To do this, put the following into the appropriate <Directory> container in your httpd.conf file or in the directory's .htaccess file:
SetEnvIf "Range" "." partial_requests
Order Allow,Deny
Allow from all
Deny from env=partial_requests
ErrorDocument 403 /forbidden.cgi
Then put the following into a file named forbidden.cgi in your server's DocumentRoot:
Secure Sockets Layer (SSL) is the standard way to implement secure web sites. By encrypting the traffic between the server and the client, SSL protects that content from a third party listening to the traffic going past.
The exact mechanism by which this encryption is accomplished is discussed extensively in the SSL specification, which you can read at http://wp.netscape.com/eng/ssl3/. For a more user-friendly discussion of SSL, we recommend looking through the mod_ssl manual, which you can find at http://www.modssl.org/docs/2.8/index.html. This document not only discusses the specific details of setting up mod_ssl, but also covers the general theory behind SSL and has pictures illustrating the concepts.
In this chapter, we talk about some of the common things that you might want to do with your secure server, including how to install it.
You want to install SSL on your Apache server.
The solutions to this problem fall into several categories, depending on how you installed Apache in the first place (or whether you are willing to rebuild Apache to get SSL).
You want to generate certificates to use on your SSL server.
Use the openssl command-line program that comes with OpenSSL:
% openssl genrsa -out hostname.key 1024
% openssl req -new -key hostname.key -out hostname.csr
At this point, you can either send your Certificate Signing Request (CSR) off to one of the certificate authority companies, such as Thawte or Entrust, for them to sign, or, if you prefer, you can sign the key yourself:
% openssl x509 -req -days 365 -in hostname.csr -signkey hostname.key -out hostname.crt
Then move these files to your Apache server's configuration directory, such as /www/conf/, and then add the following lines in your httpd.conf configuration file:
SSLCertificateFile /www/conf/hostname.crt
SSLCertificateKeyFile /www/conf/hostname.key
The SSL certificate is a central part of the SSL conversation and is required before you can run a secure server. Thus, generating the certificate is a necessary first step to configuring your secure server.
Generating the key is a multistep process, but it is fairly simple.
In the first step, we generate the private key. SSL is a private/public key encryption system, with the private key residing on the server and the public key going out with each connection to the server and encrypting data sent back to the server.
The first argument passed to the
You want to generate SSL keys that browsers will accept without a warning message.
Issue the following commands:
% CA.pl -newca
% CA.pl -newreq
% CA.pl -signreq
% CA.pl -pkcs12
You want to have a certain portion of your site available via SSL exclusively.
This is done by making changes to your httpd.conf file.
For Apache 1.3, add a line such as the following:
Redirect /secure/ https://secure.domain.com/secure/
For Apache 2.0:
<Directory /www/secure>
    SSLRequireSSL
</Directory>
Or, with mod_rewrite:
RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^/(.*) https://%{SERVER_NAME}/$1 [R,L]
It is perhaps best to think of your site's normal pages and its SSL-protected pages as being handled by two separate servers, rather than one. While they may point to the same content, they run on different ports, are configured differently, and, most importantly, the browser considers them to be completely separate servers. So you should too.
Don't think of enabling SSL for a particular directory; rather, you should think of it as redirecting requests for one directory to another.
You want to use client certificates to authenticate access to your site.
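With mod_ssl, this is typically a matter of a few directives in the SSL-enabled server or virtual host. The following is a sketch; the path to the CA certificate is hypothetical and should point at the certificate of whichever authority signed your clients' certificates:

```apache
# Refuse the SSL connection unless the client presents a certificate
SSLVerifyClient require
# The client certificate must be signed directly by a CA listed below
SSLVerifyDepth 1
# Certificate(s) of the CA(s) whose client certificates you accept
SSLCACertificateFile /www/conf/ca.crt
```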
CGI programs are one of the simplest ways to provide dynamic content for your web site. They tend to be easy to write, because you can write them in any language. Thus, you don't have to learn a new language to write CGI programs.
Other dynamic content providers, such as PHP and mod_perl, also enjoy a great deal of popularity, because they provide many of the same functions as CGI programs but typically execute faster.
Very few web sites can survive without some mechanism for providing dynamic content—content that is generated in response to the needs of the user. The recipes in this chapter guide you through enabling various mechanisms for producing this dynamic content and help you troubleshoot possible problems that may occur.
You want to put a CGI program in a directory that contains non-CGI documents.
Use AddHandler to map the CGI handler to the particular files that you want to be executed:
<Directory "/foo">
    Options +ExecCGI
    AddHandler cgi-script .cgi .py .pl
</Directory>
You want to have CGI programs on Windows executed by the program associated with the file extension. For example, you want .pl files to be executed by perl.exe without having to change the #! line to point at the right location.
Add the following line to your httpd.conf file:
ScriptInterpreterSource registry
You want Apache to know that all files with a particular extension should be treated as CGI scripts.
Add the following to your httpd.conf
You want to test that you have CGI enabled correctly. Alternatively, you are receiving an error message when you try to run your CGI script and you want to ensure the problem doesn't lie in the web server before you try to find a problem in the script.
And then, if things are still not working, look in the error log.
Because Perl is likely to be installed on any Unixish system, this CGI program should be a pretty safe way to test that CGI is configured correctly. In the event that you do not have Perl installed, an equivalent shell program may be substituted:
#! /bin/sh
echo Content-type: text/plain
echo
echo It\'s working.
And, if you are running Apache on Windows, so that neither of the above options works for you, you could also try this with a batch file:
echo off
echo Content-type: text/plain
echo.
echo It's working.
Make sure that you copy the program code exactly, with all the right punctuation, slashes, etc., so that you don't introduce additional complexity by having to troubleshoot the program itself.
You want your CGI program to read values from forms for use in your program.
First, look at an example in Perl, which uses the popular CGI.pm module:
#!/usr/bin/perl
use CGI;
use strict;
use warnings;

my $form = CGI->new;

# Load the various form parameters
my $name = $form->param("name");

# Multi-value select lists will return a list
my @foods = $form->param("favorite_foods");

# Output useful stuff
print "Content-type: text/html\n\n";
print "Name: " . $name . "\n";
print "Favorite foods: <ul>";
foreach my $food (@foods) {
    print "<li>$food</li>";
}
print "</ul>\n";
Next, look at the same program in C, which uses the cgic C library:
#include "cgic.h"
/* Boutell.com's cgic library */
int cgiMain( ) {
    char name[100];
    /* Send content type */
    cgiHeaderContentType("text/html");
    /* Load a particular variable */
    cgiFormStringNoNewlines("name", name, 100);
    fprintf(cgiOut, "Name: ");
    cgiHtmlEscape(name);
    return 0;
}
For this example, you will also need a Makefile, which looks something like this:
You want to invoke a CGI program to act as a sort of content filter for certain document types. For example, a photographer may wish to create a custom handler to add a watermark to photographs served from his web site.
Use the Action directive to create a custom handler, which will be implemented by a CGI program. Then use the AddHandler
You want to enable Server-Side Includes (SSIs) to make your HTML documents more dynamic.
There are at least two different ways of doing this.
Specify which files are to be parsed by using a filename extension such as .shtml. For Apache 1.3, add the following directives to your httpd.conf in the appropriate scope:
<Directory /www/html/example>
    Options +Includes
    AddHandler server-parsed .shtml
    AddType "text/html; charset=ISO-8859-1" .shtml
</Directory>
Or, for Apache 2.0:
<Directory /www/html/example>
    Options +Includes
    AddType text/html .shtml
    AddOutputFilter INCLUDES .shtml
</Directory>
Add the XBitHack directive to the appropriate scope in your httpd.conf file and allow the file permissions to indicate which files are to be parsed for SSI directives:
XBitHack On
You want your web page to indicate when it was last modified but not have to update the date every time.
You want to include a header (or footer) in each of your HTML documents.
Use SSI by inserting a line in all your parsed files:
<!--#include virtual="/include/headers.html" -->
You want to have the output of a CGI program appear within the body of an existing HTML document.
Use SSIs by adding a line such as the following to the document (which must be enabled for SSI parsing):
You want to have CGI programs executed by some user other than nobody. For example, you may have a database that is not accessible to anyone except a particular user, so the server needs to temporarily assume that user's identity to access it.
When building Apache, enable suexec by passing the --enable-suexec argument to configure.
Then, in a virtual host section, specify which user and group you'd like to use to run CGI programs:
User rbowen
Group users
Also, suexec will be invoked for any CGI programs run out of username-type URLs for the affected virtual host.
The suexec wrapper is a suid (runs as the user ID of the user that owns the file) program that allows you to run CGI programs as any user you specify, rather than as the nobody user that Apache runs as. suexec is a standard part of Apache but is not enabled by default.
The suexec concept does not fit well into the Windows environment, and so suexec is not available under Windows.
When suexec
You want to install one of the many mod_perl handler modules available on CPAN. For example, you want to install the Apache::Perldoc module, which generates HTML documentation for any Perl module you happen to have installed.
Assuming you already have mod_perl installed, you'll just need to install the module from CPAN, and then add a few lines to your Apache configuration file.
To install the module, run the following command from the shell as root:
#
You want to write your own mod_perl handler.
Here's a simple handler:
package Apache::Cookbook::Example;
sub handler {
    my $r = shift;
    $r->send_http_header( 'text/plain' );
    $r->print( "Hello, World." );
}
1;
Place this code in a file called Example.pm, in a directory Apache/Cookbook/, somewhere that Perl knows to look for it.
You want to enable PHP scripts on your server.
If you have mod_php installed, use AddHandler to map .php and .phtml files to the PHP handler:
AddHandler application/x-httpd-php .phtml .php
This recipe maps all files with a .phtml or .php extension to the PHP handler. You must ensure that the mod_php module is installed.
Installation instructions on the mod_php web site at http://www.php.net/manual/en/install.apache.php for Apache 1.3 or http://www.php.net/manual/en/install.apache2.php for Apache 2.0
You want to verify that you have PHP correctly installed and configured.
Put the following in your test PHP file:
<?php phpinfo( ); ?>
Place the above text in a file called something.php in a directory where you believe you have enabled PHP script execution. Accessing that file should give you a list of all configured PHP system variables. The first screen of the output should look something like Figure 8-2.
Figure 8-2. Sample phpinfo( ) output
When you're running a web site, things go wrong. And when they do, it's important that they are handled gracefully, so that the user experience is not too greatly diminished. In this chapter, you'll learn how to handle error conditions, return useful messages to the user, and capture information that will help you fix the problem so that it does not happen again.
You have multiple virtual hosts in your configuration, and at least one of them is name-based. For name-based virtual hosts to work properly, the client must send a valid Host
There may be times when you want to change the status for a response—for example, you want 404 Not Found errors to be sent back to the client as 403 Forbidden instead.
Point your ErrorDocument
You want to display a customized error message, rather than the default Apache error page.
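The ErrorDocument directive is the usual tool for this. The sketch below shows two forms it can take; the file path and the message text are hypothetical, and the quoted-string form is written in Apache 2.0 syntax:

```apache
# A local URL-path is served internally, so the client still receives
# the original 404 status along with your custom page
ErrorDocument 404 /errors/notfound.html

# A quoted string is sent as the text of the error message itself
ErrorDocument 403 "Sorry, you may not view that document."
```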
On a multilingual (content negotiated) web site, you want your error documents to be content negotiated as well.
The Apache 2.0 default configuration file contains a configuration section, initially commented out, that allows you to provide error documents in multiple languages customized to the look of your web site, with very little additional work.
Uncomment those lines. You can identify the lines by looking for the following comment in your default configuration file:
# The internationalized error documents require mod_alias, mod_include
# and mod_negotiation. To activate them, uncomment the following 30 lines.
You want all "not found" pages to go to some other page instead, such as the front page of the site, so that there is no loss of continuity on bad URLs.
Use the ErrorDocument directive to catch 404 (Not Found) errors:
ErrorDocument 404 /index.html
You have an ErrorDocument correctly configured, but IE is displaying its own error page, rather than yours.
You want to receive email notification when there's an error condition on your server.
Point the ErrorDocument directive to a CGI program that sends mail, rather than to a static document:
ErrorDocument 404 /cgi-bin/404.cgi
404.cgi looks like the following:
Proxy means to act on behalf of another. In the context of a web server, this means one server fetching content from another server, then returning it to the client. For example, you may have several web servers that hide behind a proxy server. The proxy server is responsible for having requests end up going to the right backend server.
mod_proxy, which comes with Apache, handles proxying behavior. The recipes in this chapter cover various techniques that can be used to take advantage of this capability. We discuss securing your proxy server, caching content proxied through your server, and ways to use mod_proxy to map requests to services running on alternate ports.
Additional information about mod_proxy can be found at http://httpd.apache.org/docs/mod/mod_proxy.html for Apache 1.3, or http://httpd.apache.org/docs-2.0/mod/mod_proxy.html for Apache 2.0.
If your Apache server is set up to operate as a proxy, it is possible for it to be used as a mail relay unless precautions are taken. This means that your system may be functioning as an "open relay" even though your mail server software is actually securely configured.
You want requests for particular URLs to be transparently forwarded to another server.
Use ProxyPass and ProxyPassReverse directives in your
You want to use your proxy server as a content filter, forbidding requests to certain places.
You want to run a second HTTPD server for dynamically generated content and have Apache transparently map requests for this content to the other server.
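One way to approach this is with the ProxyPass and ProxyPassReverse directives. In this sketch, the /dynamic/ prefix and the backend address on port 8080 are hypothetical:

```apache
# Requests under /dynamic/ are fetched from the second server
ProxyPass /dynamic/ http://localhost:8080/
# Rewrite Location headers in the backend's redirects so that clients
# never see the backend's own address
ProxyPassReverse /dynamic/ http://localhost:8080/
```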
You want to run a caching proxy server.
Configure your server to proxy requests, and provide a location for the cached files to be placed:
ProxyRequests on
CacheRoot /var/spool/httpd/proxy
You want to apply some filter to proxied content, such as altering certain words.
In Apache 2.0 and later, you can use mod_ext_filter to create output filters to apply to content before it is sent to the user:
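A sketch of how such a filter might be defined and applied. The filter name wordswap and the word substitution are made up for illustration, and the example assumes sed is installed at /bin/sed:

```apache
# Define an output filter that pipes HTML responses through sed
ExtFilterDefine wordswap mode=output intype=text/html cmd="/bin/sed s/darned/blasted/g"

# Apply the filter to everything served through the proxy
<Proxy *>
    SetOutputFilter wordswap
</Proxy>
```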
You wish to proxy content from a server, but it requires a login and password before content may be served from this proxied site.
Use standard authentication techniques to require logins for proxied content:
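For example, a sketch in Apache 2.0 syntax might look like the following (Apache 1.3 would use a <Directory proxy:*> container instead). The realm name and the password file path are hypothetical:

```apache
<Proxy *>
    AuthType Basic
    AuthName "Proxy users"
    AuthUserFile /www/passwords/proxy.htpasswd
    Require valid-user
</Proxy>
```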
Your web site can probably be made to run faster, if you are willing to make a few tradeoffs, and spend a little time benchmarking your site to see what is really slowing it down.
There are a number of things that you can configure differently to get a performance boost, although there are other things to which you may have to make more substantial changes. It all depends on what you can afford to give up and what you are willing to trade off. For example, in many cases, you may need to trade performance for security, or vice versa.
In this chapter, we make some recommendations of things that you can change, and we warn against things that can cause substantial slow-downs. Be aware that web sites are very individual, and what may speed up one web site may not necessarily speed up another web site.
Topics covered include hardware considerations, configuration file changes, and dynamic content generation, which can all be factors in getting every ounce of performance out of your web site.
You want to benchmark changes that you are making to verify that they are in fact making a difference in performance.
Use ab (Apache bench), which you will find in the bin directory of your Apache installation:
ab -n 1000 -c 10 http://www.example.com/test.html
You want to tune the keepalive-related directives to the best possible setting for your web site.
Turn on the KeepAlive setting, and set the related directives to sensible values:
KeepAlive On
MaxKeepAliveRequests 0
KeepAliveTimeout 15
The default behavior of HTTP is for each document to be requested over a new connection. This causes a lot of time to be spent opening and closing connections. KeepAlive allows multiple requests to be made over a single connection, thus reducing the time spent establishing socket connections. This, in turn, speeds up the load time for clients requesting content from your site.
You want to find out exactly what your server is doing.
Enable the server-status handler to get a snapshot of what child processes are running and what each one is doing. Enable ExtendedStatus to get even more detail:
<Location /server-status>
    SetHandler server-status
</Location>
ExtendedStatus On
You want to avoid situations where you have to do DNS lookups of client addresses, as this is a very slow process.
Always set the HostNameLookups directive to Off:
HostNameLookups Off
And make sure that, whenever possible, Allow from and/or Deny from directives use the IP address, rather than the hostname of the hosts in question.
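A short sketch of address-based access control follows, using the Order, Allow, and Deny syntax of Apache 1.3 and 2.0 (later releases use Require ip instead). The directory and the address range are hypothetical:

```apache
<Directory /www/htdocs/private>
    Order deny,allow
    Deny from all
    # An address range can be checked without any DNS lookup;
    # a hostname here would force a lookup on every request.
    Allow from 192.168.1.0/24
</Directory>
```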
You wish to balance the security needs associated with symbolic links with the performance impact of a solution, such as using Options SymLinksIfOwnerMatch, which causes a server slowdown.
For tightest security, use Options SymLinksIfOwnerMatch, or Options -FollowSymLinks if you seldom or never use symlinks.
You want per-directory configuration but want to avoid the performance hit of .htaccess files.
Turn on AllowOverride only in directories where it is required, and tell Apache not to waste time looking for .htaccess files elsewhere:
AllowOverride None
Then use <Directory> sections to selectively enable .htaccess files only where needed.
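For instance, the pattern might be sketched like this, where /www/htdocs/users is a hypothetical directory whose owners need to manage their own authentication settings:

```apache
# Disable .htaccess processing everywhere by default.
<Directory />
    AllowOverride None
</Directory>

# Re-enable it, for authentication directives only, in one subtree.
<Directory /www/htdocs/users>
    AllowOverride AuthConfig
</Directory>
```

Limiting the override to a single category such as AuthConfig, rather than All, also reduces what a stray .htaccess file can change.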
.htaccess files cause a substantial reduction in Apache's performance, because it must check for a .htaccess in every directory along the path to the requested file to be assured of getting all of the relevant configuration overrides. This is necessary because Apache configuration directives apply not only to the directory in which they are set, but also to all subdirectories. Thus, we must check for .htaccess
Content negotiation causes a big reduction in performance.
Disable content negotiation where it is not needed. If you do require content negotiation, use the type-map handler, rather than the MultiViews option:
Options -MultiViews
AddHandler type-map var
If at all possible, disable content negotiation. However, if you must do content negotiation (if, for example, you have a multilingual web site), you should use the type-map handler, rather than the MultiViews method.
When MultiViews is used, Apache needs to get a directory listing each time a request is made. The resource requested is compared to the directory listing to see what variants of that resource might exist. For example, if index.html is requested, the variants index.html.en and index.html.fr might exist to satisfy that request. Each matching variant is compared with the user's preferences, expressed in the various Accept headers passed by the client. This information allows Apache to determine which resource is best suited to the user's needs.
You're using Apache 1.3, or Apache 2.0 with the prefork MPM, and you want to tune MinSpareServers and MaxSpareServers to the best settings for your web site.
The best settings will vary from one site to another. You'll need to watch traffic on your site and decide accordingly.
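As a starting point for experimentation, a prefork configuration might look like the sketch below. The numbers are illustrative only, not recommendations; compare them against what the server-status page shows under your real load and adjust from there:

```apache
# Illustrative starting values only; tune against observed traffic.
StartServers         5
MinSpareServers      5
MaxSpareServers     10
MaxClients         150
```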
The MinSpareServers
You're using Apache 2.0 with one of the threaded MPMs, and you want to optimize the settings for the number of threads.
The best settings will vary from server to server.
The various threaded MPMs on Apache 2.0 handle thread creation somewhat differently. In Apache 1.3, the Windows and Netware versions are threaded, while the Unixish version is not. Tuning the thread creation values will vary from one of these versions to another.
On MPMs that run Apache with a single threaded child process, such as the Windows MPM (mpm_winnt), and the Windows and Netware versions of Apache 1.3, there are a fixed number of threads in the child process. This number is controlled by the ThreadsPerChild directive and must be large enough to handle the peak traffic of the site on any given day. There really is no performance tuning that can be done here, as this number is fixed throughout the lifetime of the Apache process.
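The following sketch shows where these settings would go in an Apache 2.0 configuration, with the worker MPM included for comparison. All of the values are assumptions chosen for illustration and should be sized against your own peak traffic:

```apache
# Windows: one child process with a fixed pool of threads.
<IfModule mpm_winnt.c>
    ThreadsPerChild 250
</IfModule>

# worker MPM on Unixish systems: several children, each with its own threads.
<IfModule worker.c>
    StartServers         2
    MinSpareThreads     25
    MaxSpareThreads     75
    ThreadsPerChild     25
    MaxClients         150
</IfModule>
```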
You want to cache files that are viewed frequently, such as your site's front page, so that they don't have to be loaded from the filesystem every time.
Use mod_mmap_static or mod_file_cache (for Apache 1.3 and 2.0, respectively) to cache these files in memory:
MMapFile /www/htdocs/index.html
MMapFile /www/htdocs/other_page.html
For Apache 2.0, you can use either module or the CacheFile directive. MMapFile caches the file contents in memory, while CacheFile
You want to have a certain subset of your web site served from another machine, in order to share the load of the site.
Use ProxyPass and ProxyPassReverse to have Apache fetch the content from another server:
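A minimal sketch, assuming a hypothetical second server named other.server.example.com that should handle everything under /other/:

```apache
# Requests for /other/... are fetched from the second server.
ProxyPass        /other/ http://other.server.example.com/

# Rewrites redirect headers from that server so clients keep using this one.
ProxyPassReverse /other/ http://other.server.example.com/
```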
You want to serve the same content from several servers and have hits distributed evenly among the servers.
Use DNS round-robin to have requests distributed evenly, or at least fairly evenly, among the servers:
You want to provide a directory listing but want to reduce the performance hit of doing so.
Use the TrackModified argument to IndexOptions
You have existing functional Perl CGI programs and want them to run faster.
If you have the mod_perl module installed, you can configure it to run your Perl CGI programs, instead of running mod_cgi. This gives you a big performance boost, without having to modify your CGI code.
There are two slightly different ways to do this.
For Apache 1.3 and mod_perl Version 1:
Alias /cgi-perl/ /usr/local/apache/cgi-bin/
<Location /cgi-perl>
    Options ExecCGI
    SetHandler perl-script
    PerlHandler Apache::PerlRun
    PerlSendHeader On
</Location>

Alias /perl/ /usr/local/apache/cgi-bin/
<Location /perl>
    Options ExecCGI
    SetHandler perl-script
    PerlHandler Apache::Registry
    PerlSendHeader On
</Location>
With its hundreds of configuration directives, and dozens upon dozens of modules providing additional functionality, the Apache web server can be terrifically complex. So too can the questions about how to use it. We have collected many of the most common questions we have seen and categorized them, putting related topics into their own chapters when there were enough of them.
However, some of the things that come up don't fall readily into one of the categories we have chosen, or perhaps are more fundamental, so we've collected them into this catch-all chapter of "things that don't belong anywhere else."
You know what directive you need but aren't sure where to put it.
If you wish the scope of the directive to be global (i.e., you want it to affect all requests to the web server), then it should be put in the main body of the configuration file or it should be put in the section starting with the line <Directory /> and ending with </Directory>.
If you wish the directive to affect only a particular directory, it should be put in a <Directory> section that specifies that directory. Be aware that directives specified in this manner also affect subdirectories of the stated directory.
Likewise, if you wish the directive to affect a particular virtual host or a particular set of URLs, then the directive should be put in a <VirtualHost> section, <Location> section, or perhaps a <Files> section, referring to the particular scope in which you want the directive to apply.
In short, the answer to "Where should I put it?" is "Where do you want it to be in effect?"
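The scopes described above can be sketched in one configuration fragment. The directives placed in each section are arbitrary examples, and the paths and hostname are hypothetical:

```apache
# Main body: applies to every request.
HostnameLookups Off

# Applies to this directory and its subdirectories.
<Directory /www/htdocs/images>
    Options +Indexes
</Directory>

# Applies only to requests handled by this virtual host.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /www/vhosts/example
</VirtualHost>

# Applies to a URL path, regardless of where it maps on disk.
<Location /server-status>
    SetHandler server-status
</Location>

# Applies to files by name, wherever they are found.
<Files "*.bak">
    Order allow,deny
    Deny from all
</Files>
```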
You want to change the default name of per-directory configuration files to something else, such as on a Windows system, because filenames beginning with a dot can cause problems.
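One way to approach this is the AccessFileName directive, sketched here with a hypothetical replacement name of ht.access. If you rename the file, it is worth also updating any rule that stops clients from downloading it:

```apache
# Look for per-directory configuration in ht.access instead of .htaccess.
AccessFileName ht.access

# Keep files with the new name from being served to clients.
<FilesMatch "^ht\.">
    Order allow,deny
    Deny from all
</FilesMatch>
```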
You want to see a directory listing when a directory is requested.
Turn on Options Indexes for the directory in question:
<Directory /www/htdocs/images>
    Options +Indexes
</Directory>
When a URL maps to a directory or folder in the filesystem, Apache will respond to the request in one of three ways:
If mod_dir is part of the server configuration, and the mapped directory is within the scope of a DirectoryIndex directive, and the server can find one of the files identified in that directive, then the file will be used to generate the response.
If
Loading a particular URL works with a trailing slash but does not work without it.
Make sure that ServerName is set correctly and that none of the Alias directives have a trailing slash.
The "trailing slash" problem can be caused by one of two configuration problems: an incorrect or missing value of ServerName, or an Alias with a trailing slash that doesn't work without it.
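A configuration that avoids both causes might look like the following sketch, in which the hostname and paths are placeholders:

```apache
# Must be a name that clients can actually resolve and reach.
ServerName www.example.com

# No trailing slash on either argument, so /example and /example/ both match.
Alias /example /home/www/example
```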
You want to set Content-Type headers differently for different browsers, which may render the content incorrectly otherwise.
You want to treat differently all requests that are made without a Host: request header field.
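One possible approach, sketched here under the assumption that such requests should simply be refused, is to flag them with SetEnvIf and then deny access based on that flag. The variable name no_host and the directory are hypothetical, and the access directives use the Apache 1.3 and 2.0 syntax:

```apache
# mod_setenvif treats a missing header as an empty string, so "^$" matches it.
SetEnvIf Host "^$" no_host=1

<Directory /www/htdocs>
    Order allow,deny
    Allow from all
    Deny from env=no_host
</Directory>
```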
You want to have some file other than index.html appear by default.
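This is normally handled by mod_dir's DirectoryIndex directive. In the sketch below the filenames are examples only; Apache tries them in the order listed and serves the first one it finds:

```apache
DirectoryIndex default.htm index.html index.php
```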
You want to define a default favorite icon, or "favicon," for your site, but allow individual sites or users to override it.
Put your default favicon.ico file into the /icons/
A number of the Apache web server's configuration directives permit (or require!) the use of what are called regular expressions. Regular expressions are used to determine if a string, such as a URL or a user's name, matches a pattern.
There are numerous resources that cover regular expressions in excruciating detail, so this appendix is not designed to be a tutorial for their use. Instead, it documents the specific features of regular expressions used by Apache—what's available and what isn't. Even though there are quite a number of regular expression packages, with differing feature sets, there are some commonalities among them. The Perl language, for instance, has a particularly rich set of regular expressions but only a small subset of them are available in the Apache regex library, which is different from Perl's.
Regular expressions, as mentioned, are a language that allows you to determine if a particular string or variable looks like some pattern. For example, you may wish to determine if a particular string is all uppercase, or if it contains at least 3 numbers, or perhaps if it contains the word "monkey" or "Monkey." Regular expressions provide a vocabulary for talking about these sorts of tests. Most modern programming languages contain some variety of regular expression library, and they tend to have a large number of things in common, although they may differ in small details.
Apache 1.3 uses a regular expression library called hsregex, so called because it was developed by Henry Spencer. Note that this is the same regular expression library used in egrep, which is the same thing as grep on many Unixish platforms.
Apache 2.0 uses a somewhat more full-featured regular expression library called Perl Compatible Regular Expressions (PCRE), so called because it implements many of the features available in the regular expression engine that comes with the Perl programming language. While this appendix does not attempt to communicate all the differences between these two implementations, you should know that hsregex is a subset of PCRE, as far as functionality goes, so everything you can do with regular expressions in Apache 1.3, you can do in 2.0, but not necessarily the other way around.
To grossly simplify, regular expressions implement two kinds of characters. Some characters mean exactly what they say (for example, a G appearing in a regular expression will usually mean the literal character G), while some characters have special significance (for example, the period (.) will match any character at all—a wildcard character). Regular expressions can be composed of these characters to represent (almost) any desired pattern appearing in a string.
Two main categories of Apache directives use regular expressions. Any directive with a name containing the word Match, such as FilesMatch, can be assumed to use regular expressions in its arguments. And directives supplied by the module mod_rewrite use regular expressions to accomplish their work.
For more about mod_rewrite, see Chapter 5.
Each SomethingMatch directive implements the same functionality as its counterpart without the Match. For example, the RedirectMatch directive does essentially the same thing as the Redirect directive, except that the first argument, rather than being a literal string, is a regular expression, which will be compared to the incoming request URL.
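To make the difference concrete, here is a small sketch that sets a plain directive beside Match counterparts. The URLs and file extensions are hypothetical:

```apache
# Literal prefix match: anything under /old/ is sent to /new/.
Redirect /old/ http://www.example.com/new/

# Regular expression match; $1 holds the text captured by the parentheses.
RedirectMatch ^/pictures/(.*)\.gif$ http://www.example.com/images/$1.png

# FilesMatch takes a regular expression where Files takes a filename.
<FilesMatch "\.(gif|jpe?g|png)$">
    Order allow,deny
    Allow from all
</FilesMatch>
```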
To get started in writing your own regular expressions, you'll need to know a few basic pieces of vocabulary, such as shown in Table A-1 and Table A-2. These constitute the bare minimum that you need to know. Although this will hardly qualify you as an expert, it will enable you to solve many of the regex scenarios you will find yourself faced with.
Table A-1. A basic regex vocabulary
Character    Meaning
.            Matches any character. This is the wildcard character.
+
The Apache web server is a very complex beast. In the vanilla package it includes over 30 functional modules and more than 12 dozen configuration directives. This means that there are significant opportunities for interactions that produce unexpected or undesirable results. This appendix covers some of the more common issues that cause problems, as culled from various support forums.
When you're working with CGI scripts, certain messages can quickly become extremely familiar and tiresome; typically the output in the browser window will be either a blank page or an Internal Server Error page.
This message has several different possible causes. These include, but are not necessarily limited to:
Windows has its own distinct set of problem areas that don't apply to Unixish environments.
When trying to start Apache from a DOS window, you receive a message like "Cannot determine hostname. Use ServerName directive to set it manually."
If you don't explicitly supply Apache with a name for your system, it tries to figure it out. This message is the result of that process failing.
The cure for this is really quite simple: edit your conf\httpd.conf file, look for the string ServerName, and make sure there's an uncommented directive such as:
ServerName localhost
or:
ServerName www.foo.com
in the file. Correct it if there is one there with wrong information, or add one if you don't already have one.
Also, make sure that your Windows system has DNS enabled. See the TCP/IP setup component of the Networking or Internet Options control panel.
After verifying that DNS is enabled and that you have a valid hostname in your ServerName directive, try to start the server again.
If you have installed BIND-8, then this is normally due to a conflict between your include files and your libraries. BIND-8 installs its include files and libraries in /usr/local/include/ and /usr/local/lib/, while the resolver that comes with your system is probably installed in /usr/include/ and /usr/lib/.
If your system uses the header files in /usr/local/include/ before those in
The solution is to make sure that Options
If your RewriteRule directives keep resulting in 404 Not Found error pages, add the PT (PassThrough) flag to the RewriteRule line. Without this flag, Apache won't process a lot of other factors that might apply, such as
Make sure that AllowOverride is set to an appropriate value. Then, to make sure that the .htaccess file is being parsed at all, put the following line in the file and ensure that it causes a server error page to show up in your browser:
Garbage Goes Here
If, when attempting to start your Apache server, you get the following error message:
[Thu May 15 01:23:40 2003] [crit] (98)Address already in use: make_sock: could not bind to port 80
One of three things is happening:
Our look is the result of reader comments, our own experimentation, and feedback from distribution channels. Distinctive covers complement our distinctive approach to technical topics, breathing personality and life into potentially dry subjects.
The animal on the cover of Apache Cookbook is a moose. The moose roams the forests of North America, Europe, and Russia. It's the largest of the deer family, and the largest moose of all, Alces alces gigas