Tag Archive: apache


improving php performance on apache

Apache is available on both Unix and Windows. It is the most popular web server in the world. Apache 1.3 uses a pre-forking model for web serving. When Apache starts up, it creates multiple child processes that handle HTTP requests. The initial parent process acts like a guardian angel, making sure that all the child processes are working properly and coordinating everything. As more HTTP requests come in, more child processes are spawned to process them. As the HTTP requests slow down, the parent will kill the idle child processes, freeing up resources for other processes. The beauty of this scheme is that it makes Apache extremely robust. Even if a child process crashes, the parent and the other child processes are insulated from the crashing child.
The pre-forking model is not as fast as some other possible designs, but to me that is “much ado about nothing” on a server serving PHP scripts, because other bottlenecks will kick in long before Apache performance issues become significant. The robustness and reliability of Apache are more important.

Apache 2.0 offers operation in multi-threaded mode. My benchmarks indicate there is little performance advantage in this mode. Also be warned that many PHP extensions are not compatible (e.g. GD and IMAP). Tested with Apache 2.0.47.
Apache is configured using the httpd.conf file. The following parameters are particularly important in configuring child processes:

MaxClients : default: 256
The maximum number of child processes to create. The default means that up to 256 HTTP requests can be handled concurrently. Any further connection requests are queued.

StartServers: default: 5
The number of child processes to create on startup.

MinSpareServers: default:5
The number of idle child processes that should be created. If the number of idle child processes falls to less than this number, 1 child is created initially, then 2 after another second, then 4 after another second, and so forth till 32 children are created per second.

MaxSpareServers: default:10
If more than this number of child processes are alive, then these extra processes will be terminated.

MaxRequestsPerChild: default: 0
Sets the number of HTTP requests a child can handle before terminating. Setting this to 0 means the child never terminates. Set it to a value between 100 and 10000 if you suspect memory leaks are occurring, or to free under-utilized resources.

For large sites, values close to the following might be better:

MinSpareServers 32
MaxSpareServers 64
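
The ramp-up described for MinSpareServers above (1 child, then 2, then 4, doubling each second up to 32 per second) can be sketched in a few lines of Python. This is only an illustration of the doubling schedule as the text describes it, not Apache's actual code; the function name spawn_schedule and its parameters are made up for this example.

```python
def spawn_schedule(children_needed, max_rate=32):
    """Return how many children are created in each successive second.

    Assumes the doubling behaviour described in the text: 1, 2, 4, ...
    children per second, capped at max_rate per second.
    """
    schedule = []
    rate = 1
    remaining = children_needed
    while remaining > 0:
        created = min(rate, remaining)
        schedule.append(created)
        remaining -= created
        rate = min(rate * 2, max_rate)
    return schedule


if __name__ == "__main__":
    # Seconds needed to build up 32 and 100 spare children.
    print(len(spawn_schedule(32)), spawn_schedule(32))
    print(len(spawn_schedule(100)), spawn_schedule(100))
```

Under these assumptions it takes several seconds to build up a large pool of spare children, which is one reason a busy site may prefer the higher MinSpareServers value shown above.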

Apache on Windows behaves differently. Instead of using child processes, Apache uses threads. The above parameters are not used. Instead we have one parameter: ThreadsPerChild, which defaults to 50. This parameter sets the number of threads that can be spawned by Apache. As there is only one child process in the Windows version, the default setting of 50 means only 50 concurrent HTTP requests can be handled. For web servers experiencing higher traffic, increase this value to between 256 and 1024.

Other useful performance parameters you can change include:

SendBufferSize: Set to OS default
Determines the size of the output buffer (in bytes) used in TCP/IP connections. This is primarily useful for congested or slow networks when packets need to be buffered; you then set this parameter close to the size of the largest file normally downloaded. One TCP/IP buffer will be created per client connection.

KeepAlive [on|off] default: On
In the original HTTP specification, every HTTP request had to establish a separate connection to the server. To reduce the overhead of frequent connects, the keep-alive header was developed. Keep-alive tells the server to reuse the same socket connection for multiple HTTP requests.

If a separate dedicated web server serves all images, you can disable this option. This technique can substantially improve resource utilization.

KeepAliveTimeout:default:15
The number of seconds to keep the socket connection alive. This time includes the generation of content by the server and acknowledgements by the client. If the client does not respond in time, it must make a new connection.

This value should be kept low as the socket will be idle for extended periods otherwise.

MaxKeepAliveRequests: default:100
Socket connections will be terminated when the number of requests set by MaxKeepAliveRequests is reached. Keep this to a high value below MaxClients or ThreadsPerChild.

TimeOut: default:300
Disconnect when idle time exceeds this value. You can set this value lower if your clients have low latencies.

LimitRequestBody: default:0
Maximum size of a PUT or POST. 0 means there is no limit.

If you do not require DNS lookups and you are not using .htaccess files to configure Apache settings for individual directories, you can set:

# disable DNS lookups: PHP scripts only get the IP address
HostnameLookups off

# disable htaccess checks

<Directory />

AllowOverride none

</Directory>

If you are not worried about the directory security when accessing symbolic links, turn on FollowSymLinks and turn off SymLinksIfOwnerMatch to prevent additional lstat() system calls from being made:

Options FollowSymLinks

#Options SymLinksIfOwnerMatch

Fronting Tomcat with Apache or IIS

Summary

Running a cluster of Tomcat servers behind a Web server can be a demanding
task if you wish to achieve maximum performance and stability.
This article describes best practices for accomplishing that.

By Mladen Turk

Fronting Tomcat

One might ask: why put a Web server in front of Tomcat
at all? Thanks to the latest advances in Java Virtual Machine (JVM)
technology and in the Tomcat core itself, standalone Tomcat is quite
comparable in performance to native web servers.
Even when delivering static content it is only 10%
slower than recent Apache 2 web servers.

The answer is: scalability.

Tomcat can serve many concurrent users by assigning a separate thread of
execution to each concurrent client connection. It does that nicely, but
there is a problem when the number of concurrent connections rises.
The time the operating system spends managing those threads will degrade
overall performance. The JVM will spend more time managing and switching
threads than doing the real job of serving requests.

Besides connectivity there is one more significant problem, and it is caused
by the applications running on Tomcat. A typical application will process
client data, access the database, do some calculations and present the data
back to the client. All that can be a time-consuming job that in most cases
must be finished within half a second for users to perceive the application
as working. Simple math shows that with a 10ms application response time you
will be able to serve at most 50 concurrent users (500ms / 10ms) before your
users start complaining. So what do you do if you need to support more users?
The simplest thing is to buy faster hardware, add more CPUs or add more boxes.
Two 2-way boxes are usually cheaper than one 4-way box, so adding more boxes
is generally a cheaper solution than buying a mainframe.

The first thing to do to ease the load on Tomcat is to use the Web server
for serving static content such as images.

Figure 1. Generic configuration

Figure 1 shows the simplest possible configuration scenario. Here the
Web server is used to deliver static content while Tomcat only does the
real job of serving the application. In most cases this is all that you will need.
With a 4-way box and a 10ms application time you’ll be capable of serving 200
concurrent users, thus giving 3.5 million hits per day, which is by all
means a respectable number.

For that kind of load you generally do not need a Web server in front of
Tomcat. But here comes the second reason to put a Web server in front:
creating a DMZ (demilitarized zone). Putting the Web server on a
host placed in a “neutral zone” between a company’s private network
and the internet or some other outside public network gives the applications
hosted on Tomcat the ability to access private company data, while securing
access to other private resources.

Figure 2. Secure generic configuration

Besides providing a DMZ and secure access to a private network, there can
be many other factors, such as the need for custom authentication.

If you need to handle more load you will eventually have to add more Tomcat
application servers. The reason can be either that your client load simply
cannot be handled by a single box, or that you
need some sort of failover in case one of the nodes breaks.

Figure 3. Load balancing configuration

A configuration containing multiple Tomcat application servers needs a load balancer
between the web server and Tomcat. For Apache 1.3, Apache 2.0 and IIS Web servers
you can use the Jakarta Tomcat Connector (also known as JK), because it offers
both software load balancing and sticky sessions. For the upcoming Apache 2.1/2.2,
use the advanced mod_proxy_balancer, a new module designed and integrated
within the Apache httpd core.


Calculating Load

When determining the number of Tomcat servers that you will need to satisfy
the client load, the first and major task is determining the Average Application
Response Time (hereafter AART). As said before, to satisfy the user experience
the application has to respond within half a second. The content received by the client
browser usually triggers a couple of physical requests to the Web server (e.g. images). A
web page usually consists of HTML and image data, so the client issues a series
of requests, and the time it takes for all of this to be processed and delivered is
called the AART. To get the most out of Tomcat you should limit the number of concurrent
requests to 200 per CPU.

So we can come up with a simple formula to calculate the maximum
number of concurrent connections a physical box can handle:

                              500
    Concurrent requests = ( ---------- max 200 ) * Number of CPU's
                            AART (ms)
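
The formula above can be written out as a small Python function. This is a sketch that takes the text literally: a 500ms response budget divided by the AART, capped at 200 requests per CPU, multiplied by the number of CPUs. The function and parameter names are made up for illustration.

```python
def cpu_bound_concurrency(aart_ms, cpus, per_cpu_cap=200, budget_ms=500):
    """Maximum concurrent requests one box can handle per the formula above.

    aart_ms     -- Average Application Response Time in milliseconds
    cpus        -- number of CPUs in the box
    per_cpu_cap -- the "max 200" limit per CPU from the text
    budget_ms   -- the half-second response budget from the text
    """
    if aart_ms <= 0 or cpus <= 0:
        raise ValueError("aart_ms and cpus must be positive")
    per_cpu = min(budget_ms / aart_ms, per_cpu_cap)
    return int(per_cpu * cpus)


if __name__ == "__main__":
    print(cpu_bound_concurrency(10, 1))  # single CPU, 10ms AART
    print(cpu_bound_concurrency(10, 4))  # 4-way box, 10ms AART
```

With a 10ms AART this reproduces the figures used earlier in the article: 50 concurrent requests on one CPU and 200 on a 4-way box.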

The other thing that you must take care of is the network throughput between the
Web server and the Tomcat instances. This introduces a new variable called
Average Application Response Size (hereafter AARS), which is the number of
bytes of all content on a web page presented to the user. On a standard
100Mbps network card, at 8 bits per byte, the maximum theoretical
throughput is 12.5 MBytes per second.

                               12500
    Concurrent requests = ---------------
                            AARS (KBytes)

For a 20KB AARS this gives a theoretical maximum of 625 concurrent
requests. You can add more cards or use faster 1Gbps hardware if you need
to handle more load.
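
The network-side limit can be sketched the same way. The Python snippet below assumes, as the text does, that 1 MByte is 1000 KBytes and that the link runs at its theoretical maximum; the function name and the link_mbps parameter are illustrative.

```python
def network_bound_concurrency(aars_kbytes, link_mbps=100):
    """Theoretical request limit imposed by the network link.

    aars_kbytes -- Average Application Response Size in KBytes
    link_mbps   -- link speed in megabits per second (100 in the text)
    """
    if aars_kbytes <= 0 or link_mbps <= 0:
        raise ValueError("aars_kbytes and link_mbps must be positive")
    # 8 bits per byte: 100 Mbps -> 12.5 MBytes/s -> 12500 KBytes/s.
    kbytes_per_second = link_mbps * 1000 / 8
    return int(kbytes_per_second // aars_kbytes)


if __name__ == "__main__":
    print(network_bound_concurrency(20))        # 100Mbps card
    print(network_bound_concurrency(20, 1000))  # 1Gbps card
```

The smaller of this value and the CPU-bound value from the first formula is the one that limits a given box.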

The formulas above will give you a rudimentary estimate of the number of
Tomcat boxes and CPUs that you will need to handle the desired
number of concurrent client requests.
If you have to plan the deployment without
having the actual hardware, the closest you can get is to measure the AART on
a test platform and then compare the hardware vendor Specmarks.


Fronting Tomcat with Apache

If you need to put Apache in front of Tomcat, use Apache2 with the
worker MPM. You can use Apache1.3 or Apache2 with the prefork MPM for handling
simple configurations like the one shown in Figure 1. If you need to front
several Tomcat boxes and implement load balancing, use Apache2 with the worker
MPM compiled in.

The MPM, or Multi-Processing Module, is an Apache2 core feature; it is responsible
for binding to network ports on the machine, accepting requests,
and dispatching children to handle the requests.
MPMs must be chosen during configuration and compiled into the server.
Compilers are capable of optimizing a lot of functions if threads are used,
but only if they know that threads are being used. Because some MPMs use threads
on Unix and others don’t, Apache will always perform better if the MPM is
chosen at configuration time and built into Apache.

The worker MPM offers higher scalability than the standard prefork
mechanism, in which each client connection is handled by a separate Apache process.
It combines the best of both worlds, having a set of child processes, each
with its own set of threads. There are sites that are running
10K+ concurrent connections using this technology.

Connecting to Tomcat

In the simplest scenario, when you need to connect to a single Tomcat instance,
you can use mod_proxy, which comes as part of every Apache distribution.
However, using the mod_jk connector will provide approximately double the performance.
There are several reasons for that. The major one is that mod_jk manages a
persistent connection pool to Tomcat, thus avoiding opening and closing
connections to Tomcat for each request. The other reason is that mod_jk uses a custom
protocol named AJP, and so avoids assembling and disassembling, for each request,
header parameters that have already been processed on the Web server.
You can find more details about the AJP protocol on the Jakarta Tomcat connectors site.

For those reasons you should use mod_proxy only for low-load sites
or for testing purposes. From now on I’ll focus on mod_jk for fronting
Tomcat with Apache, because it offers better performance and scalability.

One of the major design considerations when fronting Tomcat with Apache
or any other Web server is to synchronize the maximum number of concurrent
connections. Developers often leave the default configuration values in both Apache and
Tomcat, and are then faced with spurious error messages in their
log files. The reason is very simple. Tomcat and Apache can each accept only
a predefined number of connections. If those
two configuration parameters differ, usually with Tomcat having
the lower configured number of connections, you will be faced with
sporadic connection errors. If the load gets even higher, your users will
start receiving HTTP 500 server errors even if your hardware is capable
of dealing with the load.

The maximum number of connections to Tomcat,
in the case of the Apache web server, depends on the MPM used.

MPM        Configuration parameter
Prefork    MaxClients
Worker     MaxClients
WinNT      ThreadsPerChild
Netware    MaxThreads

On the Tomcat side the configuration parameter that limits the number
of allowed concurrent requests is maxProcessors with default value of
20. This number needs to be equal to the MPM configuration parameter.

Load balancing

Load balancing is one of the ways to increase the number of concurrent
client connections to the application server. There are two types of
load balancers that you can use. The first one is hardware load balancer
and the second one is software load balancer. If you are using load balancing
hardware, instead of a mod_jk or proxy, it must support a compatible passive
or active cookie persistence mechanism, and SSL persistence.

Mod_jk has an integrated virtual load balancer worker that can contain
any number of physical workers, or particular physical nodes.
Each of the nodes can have its own balance factor, also called the worker’s
quota or lbfactor. The lbfactor is how much we expect this worker
to work, or the worker’s work quota.
This parameter usually depends on the hardware topology itself, and
it makes it possible to create a cluster from nodes with different hardware configurations.
Each lbfactor is compared to all other lbfactors in the cluster, and its
relationship to them gives the actual load. If the lbfactors are equal, the workers’
load will be equal as well (e.g. 1-1, 2-2, 50-50, etc.). If the first
node has lbfactor 2 while the second has lbfactor 1, then the first node
will receive twice as many requests as the second one.
This asymmetric load configuration makes it possible to have nodes with different
hardware architectures.
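
As a minimal sketch of how lbfactors translate into load, the Python function below computes each node's share as its lbfactor divided by the sum of all lbfactors. This only illustrates the proportional relationship described above; it is not how mod_jk itself picks a worker for each request, and the function name is hypothetical.

```python
from fractions import Fraction


def load_shares(lbfactors):
    """Map each node name to its expected fraction of the total load.

    lbfactors -- dict of node name -> positive integer lbfactor
    """
    total = sum(lbfactors.values())
    if total <= 0:
        raise ValueError("at least one positive lbfactor is required")
    return {name: Fraction(factor, total) for name, factor in lbfactors.items()}


if __name__ == "__main__":
    for name, share in load_shares({"node1": 2, "node2": 1}).items():
        print(name, share)
```

With lbfactors 2 and 1 the first node is expected to receive two thirds of the requests and the second one third, matching the "twice as many" rule in the text.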

In the simplest load balancer topology, with only two nodes in the
cluster, the number of concurrent connections on the web server side
can be twice as high as on a particular node. But …

    1 + 1 != 2

The statement above means that the sum of the connections allowed on the
particular nodes does not give the total number of connections allowed.
Each node has to allow a slightly higher number of
connections than its share of the desired total. This margin is usually
20%, and it means that

    1 * 1.2 + 1 * 1.2 == 2

So if you wish to have 100 concurrent connections with two nodes,
each node will have to handle a maximum of 60 connections.
The 20% margin factor was determined experimentally, and depends on the Apache
server used. For the prefork MPM it can rise up to 50%, while for
NT or Netware its value is 0%. The reason is that
each particular child process manages its own balance statistics,
thus giving this 20% error for web servers with multiple child processes.

    worker.node1.type=ajp13
    worker.node1.host=10.0.0.10
    worker.node1.lbfactor=1

    worker.node2.type=ajp13
    worker.node2.host=10.0.0.11
    worker.node2.lbfactor=2

    worker.node3.type=ajp13
    worker.node3.host=10.0.0.12
    worker.node3.lbfactor=1

    worker.list=lbworker
    worker.lbworker.type=lb
    worker.lbworker.balance_workers=node1,node2,node3

The minimum configuration for a three node cluster shown in the
example above will give a 25%-50%-25% distribution of the load,
meaning that node2 will get as much load as the other two members combined.
It will also impose the following maxProcessors values for each particular
node when MaxClients=200.

    node1 :
        <Connector ... maxProcessors="60" ... />
    node2 :
        <Connector ... maxProcessors="120" ... />
    node3 :
        <Connector ... maxProcessors="60" ... />

Using simple math the load should be 50-100-50, but we needed to add the
20% load distribution error. If this additional 20% is not sufficient,
you will need to set a higher value, up to 50%. Of course the average
number of connections for each particular node will still follow the
load balancer distribution quota.
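
The sizing rule above (each node's share of MaxClients plus the load distribution margin) can be sketched in Python. The function name and parameters are illustrative; exact fractions are used so that rounding up never overshoots because of floating-point error.

```python
from fractions import Fraction
from math import ceil


def max_processors(max_clients, lbfactors, margin_percent=20):
    """Suggest a maxProcessors value per node.

    max_clients    -- the web server side limit (e.g. MaxClients)
    lbfactors      -- dict of node name -> positive integer lbfactor
    margin_percent -- load distribution error to add (20 in the text,
                      up to 50 for prefork, 0 for NT or Netware)
    """
    total = sum(lbfactors.values())
    if total <= 0:
        raise ValueError("at least one positive lbfactor is required")
    scale = Fraction(100 + margin_percent, 100)
    return {
        name: ceil(Fraction(max_clients * factor, total) * scale)
        for name, factor in lbfactors.items()
    }


if __name__ == "__main__":
    print(max_processors(200, {"node1": 1, "node2": 2, "node3": 1}))
```

For MaxClients=200 and lbfactors 1-2-1 this gives the 60-120-60 values from the Connector example, and for 100 connections across two equal nodes it gives 60 each.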

Sticky sessions and failover

One of the major problems with having multiple backend
application servers is determining the client-server relationship.
Once the client makes a request to a server application that
needs to track user actions over a designated time period,
some sort of state has to be enforced on top of the stateless HTTP
protocol. Tomcat issues a session identifier that
uniquely distinguishes each user. The problem with that session
identifier is that it does not carry any information about the
particular Tomcat instance that issued it.

To solve that, Tomcat adds an extra configurable jvmRoute
mark to the session identifier. The jvmRoute can be any name that
uniquely identifies the particular Tomcat instance in the cluster.
On the other side of the wire, mod_jk will use that jvmRoute
as the name of the worker in its load balancer list. This means
that the name of the worker and the jvmRoute must be equal.

The jvmRoute is appended to the session identifier:

http://host/app;jsessionid=0123456789ABCDEF0123456789ABCDEF.jvmRouteName
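
To make the routing idea concrete, here is a small Python sketch of how a balancer could recover the route from a session identifier of the form shown above and match it against its worker names. It illustrates the principle only and is not mod_jk's implementation; both function names are hypothetical.

```python
def route_from_session_id(session_id):
    """Return the jvmRoute suffix after the last '.', or None if absent."""
    _, separator, route = session_id.rpartition(".")
    return route if separator and route else None


def pick_sticky_worker(session_id, workers):
    """Return the worker whose name equals the session's jvmRoute, if any.

    workers -- collection of worker names from the load balancer list
    """
    route = route_from_session_id(session_id)
    return route if route in workers else None


if __name__ == "__main__":
    sid = "0123456789ABCDEF0123456789ABCDEF.node2"
    print(pick_sticky_worker(sid, {"node1", "node2", "node3"}))
```

If the route does not match any worker name, the sketch returns None, which corresponds to the case where the request cannot be kept sticky and has to be balanced like a new one.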

When you have multiple nodes in a cluster you can improve your application
availability by implementing failover. Failover means that if the
particular elected node cannot fulfill the request, another node
will be selected automatically. In the case of three nodes you are actually doubling your
application availability. The application response
time will be slower during failover, but none
of your users will be rejected. Inside the mod_jk configuration there
is a special configuration parameter called worker.retries that has a default value of 3, but
that needs to be adjusted to the actual number of nodes in the cluster.

    ...
    worker.list=lbworker
    worker.lbworker.type=lb
    # Adjust to the number of workers
    worker.retries=4
    worker.lbworker.balance_workers=node1,node2,node3,node4

If you add more than three workers to the load balancer,
adjust the retries parameter to reflect that number.
It will ensure that even in the worst case scenario the request
gets served if there is a single operable node. Of course, the
request will be rejected if there are no free connections available on the
Tomcat side, so you should increase the allowed number of connections
on each Tomcat instance. In the three node scenario (1-2-1),
if one of the nodes goes down, the other
two will have to take its load. So, with the load redistributed across the
remaining nodes, you will need to set the following Tomcat configuration:

    node1 :
        <Connector ... maxProcessors="120" ... />
    node2 :
        <Connector ... maxProcessors="160" ... />
    node3 :
        <Connector ... maxProcessors="120" ... />

This configuration will ensure that 200 concurrent connections will
always be allowable no matter which of the nodes goes down. The reason for
doubling the number of processors on node1 and node3 is because they
need to handle the additional load in case node2 goes down (load 1-1).
Node2 also needs the adjustment because
if one of the other two nodes goes down, the load will be 1-2. As you
can see the 20% load error is always calculated in.

Figure 4. Three node example load balancer
Figure 5. Failover for node2

As shown in the two figures above setting maxProcessors depends both
on 20% load balancer error and expected single node failure. The
calculation must include the node with the highest lbfactor as
the worst case scenario.
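
The failover sizing can be sketched as follows, assuming exactly one node may fail at a time and that the full web server limit is then redistributed over the surviving nodes according to their lbfactors. The function name and parameters are illustrative, not part of mod_jk or Tomcat.

```python
from fractions import Fraction
from math import ceil


def failover_max_processors(max_clients, lbfactors, margin_percent=20):
    """Suggest maxProcessors per node so any single node failure is covered.

    For every node, take its largest share of max_clients over all cases
    where one other node is down, then add the load distribution margin.
    """
    if len(lbfactors) < 2:
        raise ValueError("failover needs at least two nodes")
    scale = Fraction(100 + margin_percent, 100)
    sizes = {}
    for node, factor in lbfactors.items():
        worst = Fraction(0)
        for failed in lbfactors:
            if failed == node:
                continue
            surviving_total = sum(f for n, f in lbfactors.items() if n != failed)
            worst = max(worst, Fraction(max_clients * factor, surviving_total))
        sizes[node] = ceil(worst * scale)
    return sizes


if __name__ == "__main__":
    print(failover_max_processors(200, {"node1": 1, "node2": 2, "node3": 1}))
```

For the 1-2-1 cluster with 200 connections this reproduces the 120-160-120 values above: node1 and node3 are sized for the 1-1 split when node2 is down, and node2 for the 2-1 split when one of the others is down.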

Domain Clustering model

Since JK version 1.2.8 there is a new domain clustering model, which
offers horizontal scalability and better performance for a Tomcat cluster.

A Tomcat cluster only allows session replication to all nodes in the cluster.
Once you work with more than 3-4 nodes there is too much overhead and risk in
replicating sessions to all nodes. Instead, we split the nodes into clustered groups.
The newly introduced worker attribute domain lets
mod_jk know to which other nodes a session gets replicated (all workers with
the same value in the domain attribute). So a load balancing worker knows on
which nodes the session is alive. If a node fails or is taken down
administratively, mod_jk chooses another node that has a replica of the session.

For example, if you have a cluster with four nodes you can make
two virtual domains and replicate the sessions only inside the domains.
This will lower the replication network traffic by half.
Figure 6. Domain model clustering

For the above example the configuration would look like:

    worker.node1.type=ajp13
    worker.node1.host=10.0.0.10
    worker.node1.lbfactor=1
    worker.node1.domain=A

    worker.node2.type=ajp13
    worker.node2.host=10.0.0.11
    worker.node2.lbfactor=1
    worker.node2.domain=A

    worker.node3.type=ajp13
    worker.node3.host=10.0.0.12
    worker.node3.lbfactor=1
    worker.node3.domain=B

    worker.node4.type=ajp13
    worker.node4.host=10.0.0.13
    worker.node4.lbfactor=1
    worker.node4.domain=B

    worker.list=lbworker
    worker.lbworker.type=lb
    worker.lbworker.balance_workers=node1,node2,node3,node4

Now assume you have multiple Apaches and Tomcats. The Tomcats are clustered and
mod_jk uses sticky sessions. Now you shut down one Tomcat for maintenance.
All Apaches will start connections to all Tomcats. You end up with all
Tomcats getting connections from all Apache processes, so the number of threads
needed inside the Tomcats will explode.
If you group the Tomcats into domains as explained above, the connections will normally
stay inside the domain and you will need far fewer threads.
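
The effect of domains on failover can be sketched in Python. The function below prefers surviving nodes that share the failed node's domain, since those hold a replica of the session, and only falls back to the rest of the cluster when the whole domain is down. This is an illustration of the idea under those assumptions, not mod_jk's actual selection logic; the function name is hypothetical.

```python
def failover_candidates(failed, domains, alive):
    """Return the nodes that may take over sessions of a failed node.

    failed  -- name of the node that went down
    domains -- dict of node name -> domain name
    alive   -- set of node names currently able to serve requests
    """
    others = sorted(n for n in alive if n != failed)
    same_domain = [n for n in others if domains.get(n) == domains.get(failed)]
    # Nodes in the same domain already have the replicated session.
    return same_domain if same_domain else others


if __name__ == "__main__":
    domains = {"node1": "A", "node2": "A", "node3": "B", "node4": "B"}
    print(failover_candidates("node1", domains, {"node2", "node3", "node4"}))
```

With the four-node configuration above, a failure of node1 sends its sessions to node2 only, so node3 and node4 do not see the extra connections.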


Fronting Tomcat with IIS

Just like the Apache Web server for Windows, Microsoft IIS maintains
a separate child process and thread pool for serving concurrent client
connections. For non-server products like Windows 2000 Professional or
Windows XP the number of concurrent connections is limited to 10.
This means that you cannot use workstation products for production
servers unless the 10 connection limit fulfils your needs.
The server range of products does not impose that 10 connection
limit, but just like with Apache, 2000 connections is the limit at which
thread context switching takes its share and reduces the
effective number of concurrent connections.
If you need to handle a higher load you will need to deploy additional web servers
and use Windows Network Load Balancer (WNLB) in front of the Tomcat servers.

Figure 7. WNLB High load configuration

For topologies using Windows Network Load Balancer the same rules are in place
as for the Apache with worker MPM. This means that each Tomcat instance
will have to handle 20% higher connection load per node than its real lbfactor.
The workers.properties configuration must be
identical on each node that constitutes WNLB, meaning that you will have to
configure all four Tomcat nodes.


Apache 2.2 and new mod_proxy

For the new Apache 2.1/2.2, mod_proxy has been rewritten and has
a new AJP-capable protocol module (mod_proxy_ajp) and an integrated
software load balancer (mod_proxy_balancer).

Because it can maintain a constant connection pool to the backend
servers, it can replace the mod_jk functionality.

    LoadModule proxy_module modules/mod_proxy.so
    LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
    LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
    ...
    <Proxy balancer://mycluster>
        BalancerMember ajp://10.0.0.10:8009 min=10 max=100 route=node1 loadfactor=1
        BalancerMember ajp://10.0.0.11:8009 min=20 max=200 route=node2 loadfactor=2
    </Proxy>
    ProxyPass /servlets-examples balancer://mycluster/servlets-examples

The above example shows how easy it is to configure a Tomcat cluster with
the proxy load balancer. Among the major advantages of using the proxy are the
integrated caching and the fact that there is no need to compile an external module.

Mod_proxy_balancer has an integrated manager for dynamic parameter changes.
It allows changing session routes or disabling a node for maintenance.

    <Location /balancer-manager>
        SetHandler balancer-manager
        Order deny,allow
        Allow from localhost
    </Location>
Figure 8. Changing BalancerMember parameters

Future development of mod_proxy will include the option to
dynamically discover the particular node topology. It will also allow
load factors and session routes to be updated dynamically.


About the Author

Mladen Turk is a Developer and Consultant for JBoss Inc in Europe, where he is
responsible for native integration. He is a long-time committer for the Jakarta Tomcat Connectors,
Apache Httpd and Apache Portable Runtime projects.


Links and Resources

Jakarta Tomcat connectors documentation

Apache 2.0 documentation

Apache 2.1 documentation

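Getting “Server Load” information with PHP

Sometimes you want to control the amount of traffic reaching your website; through Apache you can work out the current load figure. Using the get_server_load() function, suppose your server only allows a maximum of 1000 simultaneous visits and the 1001st visit has to wait. You only need to write the following code. Note that the function reports the system load average (or, on Windows, the average CPU load percentage) rather than a visitor count, so choose the threshold accordingly.

if (get_server_load(true) > 1000) {
    echo "Server busy now. Try again later!";
    exit(0);
}

Below is the get_server_load() function:

<?php
// Returns a load average on Unix-like systems, the average CPU load
// percentage (a string such as "12.5%") on Windows when $windows is true,
// or false when the load cannot be determined.
function get_server_load($windows = false) {
    $os = strtolower(PHP_OS);
    // Compare the prefix only: a substring search for "win" also matches "darwin".
    if (strncmp($os, "win", 3) !== 0) {
        if (file_exists("/proc/loadavg")) {
            // The first field of /proc/loadavg is the 1-minute load average.
            $load = file_get_contents("/proc/loadavg");
            $load = explode(' ', $load);
            return $load[0];
        } elseif (function_exists("shell_exec")) {
            // Fall back to the last field printed by uptime (the 15-minute average).
            $load = explode(' ', trim(`uptime`));
            return $load[count($load) - 1];
        } else {
            return false;
        }
    } elseif ($windows) {
        if (class_exists("COM")) {
            $wmi = new COM("WinMgmts:\\\\.");
            $cpus = $wmi->InstancesOf("Win32_Processor");
            $cpuload = 0;
            $i = 0;
            // COM collections are iterated with foreach in PHP.
            foreach ($cpus as $cpu) {
                $cpuload += $cpu->LoadPercentage;
                $i++;
            }
            if ($i === 0) {
                return false;
            }
            $cpuload = round($cpuload / $i, 2);
            return "$cpuload%";
        } else {
            return false;
        }
    }
    return false;
}
print_r(get_server_load(true));
?>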

Speed up Apache – how I went from F to A in YSlow

I decided to embark on figuring out how to make my site as fast as possible. There were a few tips I was already aware of but decided to grade myself using YSlow. My initial score was bad, an F. I realized I had to do a few things.

  • Compress text/* files using gzip
  • Decrease HTTP requests by combining multiple JavaScript (and CSS) files into single files
  • Add aggressive caching since the site isn’t updated very often (especially images, JavaScript and CSS files)

Here’s what my original scores looked like.



Making fewer HTTP requests

In order to make fewer HTTP requests I knew that I had to combine my 3 CSS files into one as well as my 7 JavaScript files. I decided to use a little bit of Apache’s mod_rewrite magic. I set up a rewrite rule for any file which didn’t exist inside my js directory. The rule looks like this:

RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)\?*$ compress.php?__args__=$1 [L,QSA]
  

This lets me have a script tag that looks like this.

<script src="file1.js|file2.js|file3.js"></script>

I have a PHP file named compress.php which takes the file names in the query string and concatenates them together. (Thanks to Pascal in the comments for pointing out a security issue in the original version.) That file looks like this.

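  <?php
  ini_set('include_path', '.');
  header('Content-Type: text/javascript');
  if (!empty($_GET['__args__'])) {
      // The rewrite rule passes the pipe-separated file list in __args__.
      $files = explode('|', $_GET['__args__']);
      foreach ($files as $file) {
          if (isValidFile($file)) {
              readfile($file);
              echo "\n";
          }
      }
  }
  function isValidFile($file) {
      // add whatever logic here you'd like
      // NOTE: this only checks the extension; also reject names containing
      // '/' or '..' if requests must stay inside this directory.
      return substr($file, -3) == '.js';
  }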
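
The validation step is where the security issue mentioned above lives, so here is a hedged sketch of a stricter check, written in Python rather than PHP. It is not the author's code: resolve_requested and concatenate are hypothetical names, and the rule it applies (the resolved path must end in .js, exist, and sit inside the script directory) is one reasonable assumption about what "valid" should mean.

```python
from pathlib import Path


def resolve_requested(names, base_dir, suffix=".js"):
    """Return the requested files that are safe to serve, in request order.

    names    -- pipe-separated list, e.g. "file1.js|file2.js"
    base_dir -- directory the files must live in
    """
    base = Path(base_dir).resolve()
    accepted = []
    for name in names.split("|"):
        candidate = (base / name).resolve()
        # Reject anything that resolves outside base_dir (e.g. "../x.js").
        inside_base = base in candidate.parents
        if name.endswith(suffix) and inside_base and candidate.is_file():
            accepted.append(candidate)
    return accepted


def concatenate(paths):
    """Join file contents, adding a newline after each file."""
    return "".join(p.read_text(encoding="utf-8") + "\n" for p in paths)
```

Resolving the path before checking it means that tricks such as ../ segments or absolute paths are judged by where they actually point, not by how the name is spelled.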
  

I did the exact same thing for CSS files so I won’t go into great detail with that.

Adding expire tags to the HTTP headers

The next step was to add expires headers to the response. I originally thought about doing it with PHP but decided that it would be a much better idea to do it using Apache. I am not going to go into the details of caching (maybe another post someday). I decided to add expiration headers by file type. This can be done in Apache with mod_expires. You can turn this on in your httpd.conf file by searching for mod_expires and making sure its LoadModule line is not commented out. Once that’s done you can add the following to your configuration files. I did mine inside of my VirtualHost block.

  ExpiresActive On  
  ExpiresByType text/html "A7200"
  ExpiresByType text/javascript "A604800"  
  ExpiresByType text/css "A604800"  
  ExpiresByType image/x-icon "A31536000"  
  ExpiresByType image/gif "A604800"  
  ExpiresByType image/jpg "A604800"  
  ExpiresByType image/jpeg "A604800"  
  ExpiresByType image/png "A604800"   
  Header set Cache-Control "must-revalidate"
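
The ExpiresByType values above use a compact code: a letter (A for access time, M for modification time) followed by a number of seconds. As a rough Python sketch of what such a value means, the function below turns a code like A604800 into an expiry timestamp. The function name and arguments are made up for illustration; it mirrors the syntax as described in this post, not mod_expires internals.

```python
from datetime import datetime, timedelta


def expiry_time(spec, accessed, modified):
    """Compute when a response expires for a spec such as "A7200" or "M60".

    spec     -- base letter (A or M) followed by seconds
    accessed -- datetime of the client's request
    modified -- datetime the file was last modified
    """
    base_code, seconds = spec[0], int(spec[1:])
    if base_code == "A":
        base = accessed
    elif base_code == "M":
        base = modified
    else:
        raise ValueError("spec must start with 'A' or 'M'")
    return base + timedelta(seconds=seconds)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    changed = datetime(2024, 1, 1, 0, 0, 0)
    print(expiry_time("A7200", now, changed))    # HTML: two hours after access
    print(expiry_time("A604800", now, changed))  # JS/CSS/images: one week
```

So A7200 keeps HTML for two hours after it is fetched, A604800 keeps scripts, stylesheets and most images for a week, and A31536000 keeps the favicon for a year.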
  

You first have to enable the module and then you can specify when you want specific file types to expire. The syntax for the 2nd parameter is [A|M]seconds. So A3600 would mean one hour since it was last accessed and M60 would mean one minute since it was last modified. I also added the must-revalidate cache control so that all browsers and proxies let me specify exactly how my site handles caching.

Compressing files (text/*)

This one was easy. I suggest using Apache’s mod_deflate or mod_gzip to do this but I was getting lazy and decided to do it with PHP. It was a one liner for me and I only had to add it in one spot for my entire site to obey it. Doing this also took care of minifying JS listed as #10.

ob_start('ob_gzhandler');

Configuring ETags

I had no idea what ETags were so I had to look them up. ETags are server generated ids for each object. The nice thing about this is that it can be updated programmatically to let the browser or proxy know that a new file is available. I decided to have Apache handle this for all requests and base it off of the modification time and the file size. I added the following immediately below the cache configurations in my VirtualHost block.

FileETag MTime Size

The results…

The results were pretty good. I can do some more tweaking but a few of the remaining issues are due to my usage of Google Analytics and the Photagious API. My final score was an A (92).


Serving JavaScript Fast

With our so-called “Web 2.0” applications and their rich content and interaction, we expect our applications to increasingly make use of CSS and JavaScript. To make sure these applications are nice and snappy to use, we need to optimize the size and nature of content required to render the page, making sure we’re delivering the optimum experience. In practice, this means a combination of making our content as small and fast to download as possible, while avoiding unnecessarily refetching unmodified resources.

This is complicated a little by the nature of CSS and JavaScript resources. In contrast to image assets, CSS and JavaScript source code is very likely to change many times as time goes by. When these resources change, we need our clients to download them all over again, invalidating the version in their local cache (and any versions stored in other caches along the way). In this article, we’ll look at ways we can make the whole experience as fast as possible for our users – the initial page load, subsequent page loads and ongoing resource loading as the application evolves and content changes.

I believe strongly in making things as simple as possible for developers, so we’ll also be looking at ways we can set up our systems to automatically take care of these optimization issues for us. With a little up-front work, we can get the best of both worlds – an environment that makes development easy with great end-user performance – all without changing the way we work.

Monolith

The old school of thought was that we could achieve optimal performance by combining multiple CSS and JavaScript files into fewer, larger blocks. Rather than having ten 5k JavaScript files, we combine them into a single 50k file. While the total size of the code is still the same, we avoid having the overhead associated with multiple HTTP requests. Each request has a setup and teardown phase on both the client and server, incurs request and response header size overhead, and resource overhead on the server side in the form of more processes or threads (and perhaps more CPU time for on-the-fly gzipped content).

The parallelization aspect is also important. By default, both Internet Explorer and Mozilla/Firefox will only download two resources from a single domain at once when using persistent connections (as suggested in the HTTP 1.1 spec, section 8.1.4). This means that while we’re waiting to download those JavaScript files, two at a time, we’re not loading image assets – the page our users see during the loading phase will be missing its images.

However, there are a couple of downsides to this approach. By bundling all of our resources together, we force the user to download everything up front. By chunking content into multiple files we can spread out the cost of loading across several pages, amortizing the speed hit across a session (or avoiding some of the cost completely, depending on the path the user chooses). If we make the first page slow to speed up subsequent pages, we might find that we have more users who never wait around to request a second page.

The big downside to the single file approach has not often, historically, been considered. In an environment where we will often have to change our resources, any change to a single-file system will require the client to re-download a copy of the entire CSS or JavaScript working set. If our application has a single monolithic 100k JavaScript source file, any tiny change to our code will force all clients to suck down the 100k all over again.

A splintered approach

The alternative approach lies somewhere in the middle – we split our CSS and JavaScript resources into multiple sub-files, while at the same time keeping that number functionally low. This compromise comes at a cost – we need to be able to develop applications with our code split out into logical chunks to increase development efficiency, while delivering merged files for performance. With a few additions to our build system (the set of tools which turn your development code into production code, ready for deployment), this needn’t be a compromise we have to make.

For an application with distinct development and production environments, you can use a few simple techniques to keep your code manageable. In your development environment, code can be split into many logical components to make separation clear. In Smarty (a PHP templating language) we can create a simple function to manage the loading of our JavaScript:

SMARTY:{insert_js files="foo.js,bar.js,baz.js"}
PHP: function smarty_insert_js($args){
  foreach (explode(',', $args['files']) as $file){
      echo "<script type=\"text/javascript\" src=\"/javascript/$file\"></script>\n";
  }
}
OUTPUT:
<script type="text/javascript" src="/javascript/foo.js"></script>
<script type="text/javascript" src="/javascript/bar.js"></script>
<script type="text/javascript" src="/javascript/baz.js"></script>

So far, so easy. But then we instruct our build process to merge certain files together into single resources. In our example, imagine we merged foo.js and bar.js into foobar.js, since they are nearly always loaded together. We can then record this fact in our application configuration and modify our template function to use this information.

SMARTY:{insert_js files="foo.js,bar.js,baz.js"}

PHP:
# map of where we can find .js source files after the build process
# has merged as necessary
$GLOBALS['config']['js_source_map'] = array(
   'foo.js'	=> 'foobar.js',
   'bar.js'	=> 'foobar.js',
   'baz.js'	=> 'baz.js',
);
function smarty_insert_js($args){
  if ($GLOBALS['config']['is_dev_site']){
    $files = explode(',', $args['files']);
  }else{
    $files = array();
    foreach (explode(',', $args['files']) as $file){
      # use the merged name as the key so each merged file is listed once
      $files[$GLOBALS['config']['js_source_map'][$file]] = true;
    }
    $files = array_keys($files);
  }
  foreach ($files as $file){
   echo "<script type=\"text/javascript\" src=\"/javascript/$file\"></script>\n";
  }
}
OUTPUT:
<script type="text/javascript" src="/javascript/foobar.js"></script>
<script type="text/javascript" src="/javascript/baz.js"></script>

The source code in our templates doesn’t need to change between development and production, but allows us to keep files separated while developing and merged in production. For bonus points, we can write our merging process in PHP and use the same configuration block to perform the merge process, allowing us to keep a single configuration file and avoid having to keep anything in sync. For super-bonus points, we could analyze the occurrence of scripts and style sheets together on pages we serve, to determine which files would be best to merge (files that nearly always appear together are good candidates for merging).

For CSS, a useful model to start from is that of a master and subsection relationship. A single master style sheet controls style across your entire application, while multiple sub-sheets control various distinct feature areas. In this way, most pages will load only two sheets, one of which is cached the first time any page is requested (the master sheet).

For small CSS and JavaScript resource sets, this approach may be slower for the first request than a single large resource, but if you keep the number of components low then you’ll probably find it’s actually faster, since the data size per page is much lower. The painful loading costs are spread out around different application areas, so the number of parallel loads is kept to a minimum while also keeping the resources-per-page size low.
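The merge step that reuses the template configuration could look something like the sketch below. It is only an illustration: merge_js_files(), $src_dir and $out_dir are made-up names, and it assumes the merged files are written to a different directory than the sources.

```php
<?php
// Hypothetical build step: concatenate source files into the merged files
// named by the same js_source_map that the template function reads.
function merge_js_files(array $source_map, $src_dir, $out_dir){
  // invert the map: merged file => list of source files, in map order
  $groups = array();
  foreach ($source_map as $source => $merged){
    $groups[$merged][] = $source;
  }
  foreach ($groups as $merged => $sources){
    $out = '';
    foreach ($sources as $source){
      $code = file_get_contents("$src_dir/$source");
      if ($code === false){
        throw new RuntimeException("Can't read $src_dir/$source");
      }
      // end every source file with exactly one newline
      $out .= rtrim($code)."\n";
    }
    file_put_contents("$out_dir/$merged", $out);
  }
  # return the list of merged files that were written
  return array_keys($groups);
}
```

Calling merge_js_files($GLOBALS['config']['js_source_map'], 'src/javascript', 'build/javascript') at deploy time would then produce foobar.js and baz.js from the one map, so the template function and the build cannot drift apart.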

Compression

When we talk about asset compression, most people think immediately of mod_gzip. Beware, however – mod_gzip is actually evil, or at the least, a resource-hogging nightmare. The idea behind it is simple – browsers request resources and send along a header to show what kind of content encodings they accept. It looks something like this:

Accept-Encoding: gzip,deflate

When a server encounters this header, it can then gzip or deflate (compress) the content it’s sending to the client, where the client will then decompress it. This burns CPU time on both the client and server, while reducing the amount of data transferred. All well and good. The way mod_gzip works, however, is to create a temporary file on disk in which to compress the source data, serve that file out, then delete it. For high volume systems, you very quickly become bound by disk IO. We can avoid this by using mod_deflate instead (Apache 2 only), which does all the compression in memory – sensible. For Apache 1 users, you can instead create a RAM disk and have mod_gzip write its temporary files there – not quite as fast as pure in-memory compression, but not nearly as slow as writing to disk.

Even so, we can avoid the compression overhead completely by pre-compressing the relevant static resources and using mod_gzip to serve people the compressed version where appropriate. If we add this compression into our build process, it all happens transparently to us. The number of files that need compressing is typically quite low – we don’t compress images since we don’t gain much, if any, size benefit (since they’re already compressed) so we only need to compress our JavaScript and CSS (and any other uncompressed static content). Configuration options tell mod_gzip where to look for pre-compressed files.

mod_gzip_can_negotiate	Yes
mod_gzip_static_suffix	.gz
AddEncoding	gzip	.gz
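If the build process is already driven by PHP, the pre-compression step might be sketched as follows. The function name precompress_files() is made up for illustration, and the sketch assumes PHP’s zlib extension is available.

```php
<?php
// Hypothetical build step: write a gzipped copy next to each static file,
// using the .gz suffix that the server is configured to look for.
function precompress_files(array $paths){
  $written = array();
  foreach ($paths as $path){
    $data = file_get_contents($path);
    if ($data === false){
      continue; // skip unreadable files
    }
    // level 9 gives the smallest output; the cost is paid once, at build time
    file_put_contents($path.'.gz', gzencode($data, 9));
    $written[] = $path.'.gz';
  }
  return $written;
}
```

Running this over the JavaScript and CSS files at deploy time leaves the originals in place for clients that can’t accept gzipped content.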

Newer versions of mod_gzip (starting with version 1.3.26.1a) can pre-compress files for you automatically by adding a single extra configuration option. You’ll need to make sure that Apache has the correct permissions to create and overwrite the gzipped files for this to work.

mod_gzip_update_static	Yes

However, it’s not that simple. Certain versions of Netscape 4 (specifically 4.06 to 4.08) identify themselves as being able to interpret gzipped content (they send a header saying they do), but they cannot correctly decompress it. Most other versions of Netscape 4 have issues with loading compressed JavaScript and CSS in different and exciting ways. We need to detect these agents on the server side and make sure they get served an uncompressed version. This is fairly easy to work around, but Internet Explorer (versions 4 through 6) has some more interesting issues. When loading gzipped JavaScript, Internet Explorer will sometimes incorrectly decompress the resource, or halt decompression halfway through, presenting half a file to the client. If you rely on your JavaScript working, you need to avoid sending gzipped content to Internet Explorer. In the cases where Internet Explorer does receive gzipped JavaScript correctly, some older 5.x versions won’t cache the file, regardless of its e-tag headers.

Since gzip compression of content is so problematic, we can instead turn our attention to compressing content without changing its format. There are many JavaScript compression scripts available, most of which use a regular expression driven rule set to reduce the size of JavaScript source. There are several things which can be done to make the source smaller – removing comments, collapsing whitespace, shortening privately scoped variable names and removing optional syntax.

Unfortunately, most of these scripts either obtain a fairly low compression rate, or are destructive under certain circumstances (or both). Without understanding the full parse tree, it’s difficult for a compressor to distinguish between a comment and what looks like a comment inside a quoted string. Adding closures to the mix, it’s not easy to find which variables have a private lexical scope using regular expressions, so some variable name shortening techniques will break certain kinds of closure code.

One compressor does avoid this fate – the Dojo Compressor (there’s a ready-to-use version here) works by using Rhino (Mozilla’s JavaScript engine implemented in Java) to build a parse tree, which it then reduces before serializing it to a file. The Dojo Compressor can give pretty good savings for a low cost – a single compression at build time. By building this compression into our build process, it all happens transparently for us. We can add as much whitespace and as many comments as we like to our JavaScript in our development environment, without worrying about bloating our production code.

Compared to JavaScript, CSS is relatively simple to compress. Because of a general lack of quoted strings (typically paths and font names) we can mangle the whitespace using regular expressions. In the cases where we do have quoted strings, we can nearly always collapse a whitespace sequence into a single space (since we don’t tend to find multiple spaces or tabs in URL paths or font names). A simple Perl script should be all we need:

#!/usr/bin/perl
my $data = '';
open F, $ARGV[0] or die "Can't open source file: $!";
$data .= $_ while <F>;
close F;
$data =~ s!\/\*(.*?)\*\/!!gs; # remove comments (/s lets a comment span lines)
$data =~ s!\s+! !g;           # collapse space
$data =~ s!\} !}\n!g;         # add line breaks
$data =~ s!\n$!!;             # remove last break
$data =~ s! \{ ! {!g;         # trim inside brackets
$data =~ s!; \}!}!g;          # trim inside brackets
print $data;

We can then feed individual CSS files through the script to compress them like so:

perl compress.pl site.source.css > site.compress.css
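For a build process written in PHP, the same substitutions can be ported directly. The sketch below is a hypothetical port rather than part of the original script: compress_css() is a made-up name, the comment pattern uses the s modifier so that comments spanning several lines are removed, and a final trim() is added to drop any leading space.

```php
<?php
// Hypothetical PHP port of the Perl CSS compressor: same rules, same order.
function compress_css($data){
  $data = preg_replace('!/\*.*?\*/!s', '', $data); // remove comments
  $data = preg_replace('!\s+!', ' ', $data);        // collapse space
  $data = str_replace('} ', "}\n", $data);          // add line breaks
  $data = preg_replace('!\n$!', '', $data);         // remove last break
  $data = str_replace(' { ', ' {', $data);          // trim inside brackets
  $data = str_replace('; }', '}', $data);           // trim inside brackets
  return trim($data);                               // addition: drop stray edges
}
```

Like the Perl version, this relies on the CSS not containing significant runs of whitespace inside quoted strings.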

With these simple plaintext optimizations we can reduce the amount of data sent over the wire by as much as 50% (depending upon your coding style – it might be much less), which can translate to a much faster experience for our users. But what we’d really like to do is avoid users having to even request files unless completely necessary – and that’s where an intimate knowledge of HTTP caching comes in handy.

Caching is your friend

When a user agent requests a resource from a server for the first time, it caches the response to avoid making the same request in the future. How long it stores this response for is influenced by two factors – the agent configuration and any cache control response headers from the server. All browsers have subtly different configuration options and behaviors, but most will cache a given resource for at least the length of a session, unless explicitly told otherwise.

It’s quite likely you already send out anti-caching headers for dynamic content pages to avoid the browser caching pages which constantly change. In PHP, you can achieve this with a pair of function calls:

<?php
header("Cache-Control: private");
header("Cache-Control: no-cache", false);
?>

Sounds too easy? It is – some agents will ignore this header under certain circumstances. To really convince a browser not to cache a document, you’ll need to be a little more forceful:

<?php
# 'Expires' in the past
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
# Always modified
header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");
# HTTP/1.1
header("Cache-Control: no-store, no-cache, must-revalidate");
header("Cache-Control: post-check=0, pre-check=0", false);
# HTTP/1.0
header("Pragma: no-cache");
?>

This is fine for content we don’t want to be cached, but for content that doesn’t change with every request we want to encourage the browser to cache it aggressively. The “If-Modified-Since” request header allows us to get part of the way there. If a client sends an “If-Modified-Since” header with its request, Apache (or your web server of choice) can respond with status code 304 (“Not Modified”), telling the browser that its cached copy of the file is already up to date. With this mechanism, we can avoid sending the contents of a file to the browser, but we still incur the overhead of an HTTP request. Hmmm.

Similar to the if-modified-since mechanism are entity tags. Under Apache, each response for a static resource is given an “ETag” header containing a value generated from the file’s modified-time, size and inode number. A browser can then send that e-tag back in an “If-None-Match” request header, and the server can answer with a 304 if the tag still matches, without sending the file again. E-tags suffer from the same problem as the if-modified-since mechanism – the client still needs to perform an HTTP request to determine the validity of the locally cached copy.

In addition, you need to be careful with if-modified-since and e-tags if you serve content from multiple servers. With two load-balanced web servers, a single resource could be requested from either server by a single agent – and could be requested from each at different times. This is great – it’s why we load balance. However, if the two servers generate different e-tags or modified dates for the same files, then browsers won’t be able to properly cache content. By default, e-tags are generated using the inode number of the file, which will vary from server to server. You can turn this off using a single Apache configuration option:

FileETag MTime Size

With this option, Apache will use only the modification time and file size to determine the e-tag. This, unfortunately, leads us to the other problem with e-tags, which can affect if-modified-since too (though not nearly as badly). Since the e-tag relies on the modified time of the file, we need those times to be in sync. If we’re pushing files to multiple web servers, there’s always a chance that the times at which the files are pushed differ by a second or two. In this case, the e-tags generated by two servers will still be different. We could change the configuration to generate e-tags only from the file size, but this means that we’ll generate the same e-tag if we change a file’s contents without changing its size. Not ideal.
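Both failure modes are easy to see with a toy e-tag function. The sketch below is purely illustrative: sketch_etag() is a made-up name and its output is not Apache’s real e-tag format, it just combines the same inputs.

```php
<?php
// Illustration only: a stand-in for an e-tag built from file metadata.
function sketch_etag($size, $mtime = null){
  return $mtime === null
    ? sprintf('"%x"', $size)              // size only
    : sprintf('"%x-%x"', $size, $mtime);  // size and modification time
}

// the same file, pushed to two servers one second apart
$server_a = sketch_etag(10240, 1234567890);
$server_b = sketch_etag(10240, 1234567891);
$tags_match = ($server_a === $server_b);  // false: identical file, different tags

// size-only tags: an edit that keeps the length also keeps the tag
$old_css = 'a{color:red}';
$new_css = 'a{color:tan}';
$edit_detected = (sketch_etag(strlen($old_css)) !== sketch_etag(strlen($new_css))); // false
```

In the first case a browser revalidating against the other server re-downloads a file it already has; in the second it keeps a stale copy of a file that has changed.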

Caching is your best friend

The problem here is that we are approaching the issue from the wrong direction. These possible caching strategies all revolve around the client asking the server if its cached copy is fresh. If we could notify the client when we change a file, it would know that its own cached copy was fresh, until we told it otherwise. But the web doesn’t work that way – the client makes requests to the server.

But that’s not quite true – before fetching any JavaScript or CSS files, the client makes a request to the server for the page which will be loading those files via <script> or <link> tags. We can use the response from the server to notify the client of any changes in those resources. This is all a little cryptic, so let’s spell it out – if we change the filenames of JavaScript and CSS files when we change their contents, we can tell the client to cache every URL forever, since the content of any given URL will never change.

If we are sure that a given resource will never change, then we can send out some seriously aggressive caching headers. In PHP, we just need a couple of lines:

<?php
header("Expires: ".gmdate("D, d M Y H:i:s", time()+315360000)." GMT");
header("Cache-Control: max-age=315360000");
?>

Here we tell the browser that the content will expire in 10 years (there are 315,360,000 seconds in 10 years, more or less) and that it can keep it around for 10 years. Of course, we’re probably not serving our JavaScript and CSS via PHP – we’ll address that in a few moments.

Mistakes abound

Manually changing the filenames of resources when the contents are modified is a dangerous task. What happens if you rename the file, but not the templates pointing to it? What happens if you change some templates but not others? What happens if you change the templates but don’t rename the file? Most likely of all, what happens if you modify a resource but forget to rename it or change any references to it? In the best of these cases, users will not see the new content and be stuck with the old versions. In the worst case, no valid resource is found and your site stops working. This sounds like a dumb idea.

Luckily computers are really good at this sort of thing – dull repetitive tasks which need to be done exactly right, over and over again, when some kind of change occurs.

The first step in making this process as painless as possible is to realize that we don’t need to rename files at all. URLs we serve content from and where the content is located on disk don’t have to have anything to do with each other. Using Apache’s mod_rewrite we can create a simple rule to map certain URLs to certain files.

RewriteEngine on
RewriteRule ^/(.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$	/$1$2	[L]

This rule matches any URL with one of the specified extensions which also contains a ‘version’ nugget. The rule then rewrites these URLs to a path without the version nugget. Some examples:

URL			   Path
/images/foo.v2.gif	-> /images/foo.gif
/css/main.v1.27.css	-> /css/main.css
/javascript/md5.v6.js	-> /javascript/md5.js
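To check what the pattern does before putting it into the server configuration, the same regular expression can be exercised from PHP. This is only a test harness for the pattern, not something the site needs at runtime; strip_version() is a made-up name.

```php
<?php
// Mirror of the rewrite rule: strip the 'version' nugget from a URL.
// URLs that don't match the pattern are returned unchanged.
function strip_version($url){
  return preg_replace('!^/(.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$!', '/$1$2', $url);
}

foreach (array('/images/foo.v2.gif', '/css/main.v1.27.css', '/javascript/md5.v6.js') as $url){
  echo $url, ' -> ', strip_version($url), "\n";
}
```

The loop prints the three URLs from the table alongside the paths they map to.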

With this rule in place, we can change the URL (by changing the version number) without changing where the file lives on disk. Because the URL has changed, the browser treats it as a different resource. For bonus points, you can combine this with the script grouping function from earlier to produce a list of versioned <script> tags as needed.

At this point, you might ask why we don’t just add a query string to the end of the resource – /css/main.css?v=4. According to the letter of the HTTP caching specification, user agents should never cache URLs with query strings. While Internet Explorer and Firefox ignore this, Opera and Safari don’t – to make sure all user agents can cache your resources, we need to keep query strings out of their URLs.

Now that we can change our URLs without moving the file, it would be nice to be able to have the URLs updated automatically. In a small production environment (or a development environment, for people with large production environments), we can do this really easily using a template function. This example is for Smarty, but applies equally well to other templating engines.

SMARTY:<link href="{version src='/css/group.css'}" rel="stylesheet" type="text/css" />
PHP: function smarty_version($args){
  $stat = stat($GLOBALS['config']['site_root'].$args['src']);
  $version = $stat['mtime'];
  echo preg_replace('!\.([a-z]+?)$!', ".v$version.\$1", $args['src']);
}
OUTPUT:<link href="/css/group.v1234567890.css" rel="stylesheet" type="text/css" />

For each linked resource, we determine the file’s location on disk, check its mtime (the date and time the file was last modified on disk) and insert that into the URL as the version number. This works great for low traffic sites (where stat operations are cheap) and for development environments, but it doesn’t scale well to high volume deployments – each call to stat requires a disk read.

The solution is fairly simple. In a large system we already have a version number for each resource, in the form of the source control revision number (you’re already using source control, right?). At the point when we go to build our site for deployment, we simply check the revision numbers of all of our resource files and write them to a static configuration file.

<?php
$GLOBALS['config']['resource_versions'] = array(
  '/images/foo.gif'    => '2.1',
  '/css/main.css'      => '1.27',
  '/javascript/md5.js' => '6.1.4',
);
?>
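Writing that file can itself be a small build script. In the sketch below, render_version_config() is a made-up name and $revisions is assumed to have been collected from your source control tool by some earlier step that isn’t shown here.

```php
<?php
// Hypothetical build step: turn a path => revision map into the PHP source
// of the static configuration file.
function render_version_config(array $revisions){
  ksort($revisions); // stable ordering keeps diffs between builds small
  return "<?php\n"
    ."\$GLOBALS['config']['resource_versions'] = "
    .var_export($revisions, true).";\n";
}
```

The build would then save the result with something like file_put_contents('config_versions.php', render_version_config($revisions)), where the file name is again only an example.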

We can then modify our templating function to use these version numbers when we’re operating in production.

<?php
function smarty_version($args){
  if ($GLOBALS['config']['is_dev_site']){
    $stat = stat($GLOBALS['config']['site_root'].$args['src']);
    $version = $stat['mtime'];
  }else{
    $version = $GLOBALS['config']['resource_versions'][$args['src']];
  }
  echo preg_replace('!\.([a-z]+?)$!', ".v$version.\$1", $args['src']);
}
?>

In this way, we don’t need to rename any files, or even remember when we modify resources – the URL will be automatically changed everywhere whenever we push out a new revision – lovely. We’re almost where we want to be.
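The grouping and versioning functions can be combined, as suggested earlier, so that a single template call produces merged and versioned script tags. The following is a sketch under assumptions: versioned_script_tags() is a made-up name, the two maps are passed in as arguments rather than read from $GLOBALS, and files missing from either map are passed through unchanged.

```php
<?php
// Hypothetical helper: map source files to merged files, de-duplicate,
// then insert the version number into each URL.
function versioned_script_tags($files, array $source_map, array $versions){
  $tags = '';
  $seen = array();
  foreach (explode(',', $files) as $file){
    $merged = isset($source_map[$file]) ? $source_map[$file] : $file;
    if (isset($seen[$merged])){
      continue; // this merged file has already been emitted
    }
    $seen[$merged] = true;
    $src = "/javascript/$merged";
    if (isset($versions[$src])){
      $src = preg_replace('!\.([a-z]+?)$!', ".v{$versions[$src]}.\$1", $src);
    }
    $tags .= "<script type=\"text/javascript\" src=\"$src\"></script>\n";
  }
  return $tags;
}
```

Returning the markup instead of echoing it keeps the function easy to test; a Smarty wrapper can simply echo the result.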

Bringing it all together

When we talked about sending very-long-period cache headers with our static resources earlier, we noted that since this content isn’t usually served through PHP, we can’t easily add the cache headers. We have a couple of obvious choices for dealing with this: inserting PHP into the process or letting Apache do the work.

Getting PHP to do our work for us is fairly simple. All we need to do is change the rewrite rule for the static files to be routed through a PHP script, then have the PHP script output headers before outputting the content of the requested resource.

Apache: RewriteRule ^/(.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$  /redir.php?path=$1$2  [L]
PHP:
$path = isset($_GET['path']) ? $_GET['path'] : '';
# ignore paths with a '..'
if (preg_match('!\.\.!', $path)){
 go_404();
}
# make sure our path starts with a known directory
if (!preg_match('!^(javascript|css|images)/!', $path)){
 go_404();
}
# does the file exist?
if (!file_exists($path)){
 go_404();
}
# only send the long-lived cache headers once we know the path is valid,
# so that a 404 response is never cached for 10 years
header("Expires: ".gmdate("D, d M Y H:i:s", time()+315360000)." GMT");
header("Cache-Control: max-age=315360000");
# output a mediatype header
$ext = pathinfo($path, PATHINFO_EXTENSION);
switch ($ext){
  case 'css':
    header("Content-type: text/css");
    break;
  case 'js' :
    header("Content-type: text/javascript");
    break;
  case 'gif':
    header("Content-type: image/gif");
    break;
  case 'jpg':
    header("Content-type: image/jpeg");
    break;
  case 'png':
    header("Content-type: image/png");
    break;
  default:
    header("Content-type: text/plain");
}
# echo the file's contents
readfile($path);
function go_404(){
  header("HTTP/1.0 404 File not found");
  exit;
}

While this works, it’s not a great solution. PHP demands more memory and execution time than if we did everything in Apache. In addition, we have to be careful to protect against exploits made possible by sending us doctored values for the path query parameter. To avoid all this headache, we can have Apache add the headers directly. The RewriteRule directive allows us to set environment variables when a rule is matched, while the Header directive lets us add headers only when a given environment variable is set. Combining these two directives, we can easily chain the rewrite rule together with the header settings.

RewriteEngine on
RewriteRule ^/(.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$ /$1$2 [L,E=VERSIONED_FILE:1]
Header add "Expires" "Mon, 28 Jul 2014 23:30:00 GMT" env=VERSIONED_FILE
Header add "Cache-Control" "max-age=315360000" env=VERSIONED_FILE

Because of Apache’s order of execution, we need to add the RewriteRule line to the main configuration file (httpd.conf) and not a per-directory (.htaccess) configuration file, otherwise the Header lines get run first, before the environment variable gets set. The Header lines can either go in the main configuration file or in an .htaccess file – it makes no difference.

Skinning rabbits

By combining the above techniques, we can build a flexible development environment and a fast and performant production environment. Of course, this is far from the last word on speed. There are further techniques we could look at (separate serving of static content, multiple domain names for increased concurrency) and different ways of approaching the ones we’ve talked about (building an Apache filter to modify outgoing URLs in HTML source to add versioning information on the fly). Tell us about techniques and approaches that have worked well for you by leaving a comment.

Read more

Cal’s new book, Building Scalable Web Sites, contains more tips and tricks to help you develop and manage the next generation of web applications.
