performance and scalability goals. It helps you to identify what you
need to accomplish, showing how to set concrete and attainable goals.
are available to aid in this effort. It stresses the importance of
making regular backups, of testing changes before they are made, and of
deploying only tested changes onto production servers.
Section 2: Measuring Progress
Setting A Baseline
It is a common mistake to make performance-oriented modifications to
a website before measuring the site's existing performance. Without a
baseline, it can prove impossible to determine whether your changes
have resulted in real performance improvements, or whether they have
instead reduced performance. For this reason, the first thing you
should do is set up proper monitoring of your website, and quantify
your current performance.
What To Monitor
There are many useful measurements that can be regularly monitored
on your website, and a large number of tools that can help with taking
these measurements. Each website will have some unique monitoring needs;
however, there are some basic measurements that almost all websites will
want to regularly monitor. The following list will give you some ideas
of what you should monitor. Though the list is numbered, the numbers
do not indicate the level of importance of each item. Instead, we
number items so we can provide specific examples when we discuss
monitoring tools.
1. The time it takes to load the front page when not logged in and with nothing cached by your browser.
2. The time it takes to load the front page again, when not logged in but when you already have the CSS, JavaScript, and images cached by your browser.
3. The time it takes to load the front page when you're logged in.
4. The time it takes to load each of your 25 most popular types of web pages.
5. The time it takes to load the above pages from different areas of the world.
6. The popularity of the various types of web pages on your website. (Example types of pages include the front page, forum pages, RSS feeds, and custom pages generated by modules.)
7. Server resource utilization, such as CPU, load average, free memory, cached memory, swap, disk IO, and network traffic.
8. The number of pages being served by your web server(s).
9. Your database, including the number of queries per second, the efficiency of your query cache, how much memory is being used, and how often temporary tables are being created.
10. Database queries taking more than 1 second to complete.
11. Database queries not using indexes.
12. The number of searches being performed per hour.
13. Memcache, including how much memory is being used, how many queries are being made per second, and your hit versus miss rates.
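Even a simple curl invocation can capture the page-timing measurements described above. The following is a hedged sketch; the URL is a placeholder for your own site, and you would run it both with and without a warmed cache, and ideally from several locations:

```shell
# Measure how long the front page takes to load from this machine.
# www.example.com is a placeholder; substitute your own site's URL.
curl -o /dev/null -s \
  -w 'DNS: %{time_namelookup}s  Connect: %{time_connect}s  Total: %{time_total}s\n' \
  http://www.example.com/
```

Scheduling such a command from cron and logging its output over time gives you a crude but useful baseline before investing in heavier monitoring tools.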
Monitoring Tools
You will need to use multiple tools to fully monitor your website.
Some of these tools can run on your existing infrastructure, while
other tools may need to live outside of your network.
ps
Ps displays information about all processes that are
currently running on your server. The command line utility supports a
large number of optional flags that control which processes are
displayed, and what information is displayed about each process.
Information that can be displayed includes CPU usage, memory usage, how
much CPU time the process has used, and much more. Common invocations
of ps include ps -ef and ps aux. Learn more about ps by typing
man ps on most Unix servers.
top
Top provides an automatically updating view of the
processes running on a server. It offers a quick summary of a server's
health, showing CPU utilization, as well as memory and swap usage.
Processes can be sorted in many ways, such as listing the processes
that are consuming the most CPU, or the processes that are using the
most memory.
vmstat
Vmstat offers a useful report on several areas of system
health, including the number of processes waiting to run, memory usage,
swap activity, CPU utilization, and disk IO. A common invocation of
vmstat is vmstat 1 10. Learn more about vmstat by typing
man vmstat on most Unix servers.
Sar
Sar is part of the Sysstat collection of Unix performance
monitoring tools. Sar can be configured to collect regular
comprehensive snapshots of a system's health without putting any
noticeable load on the system. It is a very good idea to enable Sar on
any server that you are managing, as the historical information this
utility collects can prove invaluable when tuning a server, or when
performing damage control on a failed server.
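As a sketch of how sar is typically used once sysstat's data collection has been enabled, the following invocations report on live and historical activity. The flags shown are standard, but the daily data file path varies between distributions; /var/log/sa/sa15 is one common layout and is used here only as an example:

```shell
# Report CPU utilization once per second, three times:
sar -u 1 3

# Review CPU utilization collected earlier today:
sar -u

# Review run-queue length and load averages from a saved daily file:
sar -q -f /var/log/sa/sa15
```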
Cacti
Cacti is a PHP front-end for RRDTool, displaying useful
graphs based on historical data collected from your servers. By default
it tracks useful system information such as CPU and memory utilization,
however it can also be integrated with programs such as MySQL, Apache,
and memcache, displaying useful historical graphs of their performance.
YSlow
YSlow is a Firefox add-on that enhances Firebug to analyze
how quickly your web pages load, highlighting areas that can be
improved. This tool is discussed in depth in chapter 13.
AWStats
AWStats is a log analyzer that can be used to create
graphical reports from web server and proxy log files. When scaling a
Drupal website, you can achieve better performance by disabling
Drupal's core statistics module, and instead using AWStats to generate
regular reports from Apache's own access logs.
devel module
The devel module is one of a suite of development-oriented
Drupal modules. Among its many useful features, it can display a list
of all queries used to build each page served by a Drupal-powered
website, highlighting slow queries and queries that are run multiple
times. The devel module is discussed in depth in chapter 6.
mysqlreport
Mysqlreport is a perl script that generates reports based
on numerous internal "status variables" maintained by MySQL. With this
script, you can quickly interpret what these variables mean, helping
you to tune your server for better performance.
Mysqlreport is discussed in depth in chapter 22.
mysqlsla
Mysqlsla, the MySQL Statement Log Analyzer, is a perl
script that helps you analyze MySQL logs. This script will be discussed
in depth in chapter 23, detailing how it can be used to review MySQL's
slow query logs.
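Before tools such as mysqlsla can analyze slow queries, MySQL must first be logging them. A minimal configuration fragment for MySQL 5.0-era servers might look like the following; the log path is only an example, and note that in these versions long_query_time is measured in whole seconds:

```
[mysqld]
# Log queries taking longer than 1 second:
log-slow-queries = /var/log/mysql/mysql-slow.log
long_query_time = 1
# Also log queries that do not use indexes:
log-queries-not-using-indexes
```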
mytop
Mytop is a useful tool for monitoring a MySQL database from
the command line. It offers a summary of database threads in a format
similar to how top lists running server processes.
innotop
Innotop was originally written to monitor MySQL's InnoDB
storage engine, but it has long since evolved into a very powerful tool
for monitoring all aspects of MySQL. Inspired by mytop, it takes MySQL
monitoring to a new level.
MySQL Enterprise Monitor
The MySQL Enterprise Monitor is a commercial offering by Sun
Microsystems for monitoring one or more MySQL servers. The
comprehensive tool provides useful charts and graphs, makes tuning
suggestions, and can send alerts when your MySQL servers need attention.
Online Services
There are many online services that can help you with monitoring
your website. It is beyond the scope of this book to list and review
all of these services, but popular examples include Google Analytics,
IndexTools, ClickTracks and Omniture. Other online services can help
you to understand how quickly your web pages are loading from various
locations around the world, including Keynote, Webmetrics, Alert Bot,
and host-tracker.com.
Section 3: Backups
Why To Backup
Hopefully you already understand the general importance of
maintaining regular backups. For example, if a server fails and all
data on that server is lost, you can create a new server just like the
old by restoring a backup. If someone runs a bad query and accidentally
deletes data from your database, you can restore the lost data from a
backup. If you make a change to your website and later find that it was
a buggy change, you can roll back to the previous version of your
website from a backup. If you're setting up multiple web servers, you
can build the second server from a backup of the first. When you need
to test changes before deploying them on a live website, you can create
a copy of your actual website on a development server by restoring a
backup.
What To Backup
Generally speaking, it is important to back up anything that you
can't afford to lose and you can't easily recreate. For example, you
will certainly want to make regular backups of your database. If you
have written custom themes and modules, they too should be backed up.
If you have written custom patches for Drupal unique to your website,
back them up. Any customized configuration files on your servers should also
be backed up. If your users upload files such as pictures or sounds,
this data should also be backed up.
Backups are an inexpensive insurance policy for when things go
wrong, as well as a useful tool for duplicating servers. When backups
are combined with a revision control system, they can also be useful
for reviewing changes over time, and for understanding how changes have
affected your website. Oftentimes data loss is not immediately
detected, in which case it is important to have multiple copies of
backups.
The following list offers a suggestion of data that you should
consider backing up. When deciding what from the following list you
will be backing up, ask yourself, "what happens if I lose this data?"
Data to include in your backups
- Database
- Database configuration file(s)
- Web server configuration file(s)
- PHP configuration file(s)
- User uploaded content
- Custom modules and themes
- Custom patches
What You May Not Want To Backup
While it is possible to back up your entire server, including the
underlying operating system, this is often not necessary. The
underlying operating system can be re-installed on a new server with
minimal fuss. Then, the various customized configuration changes can be
restored from backups. Furthermore, backing up your entire server will
require significantly more storage space. This becomes more and more of
a concern as you add additional servers to your infrastructure.
Finally, a backup of one server may not easily restore to another server
if it has different hardware, such as different network cards or a hard
drive of another size.
When backing up your database tables, it is possible to skip
certain tables. For example, you don't have to back up Drupal 6's
four search tables as they can be regenerated if they are lost. The
many cache tables also do not have to be backed up. As the watchdog and
access log tables are already automatically flushed after a certain
amount of time, they are also good candidates for tables to skip if
trying to minimize the size of your backups. If you decide to skip
certain tables when making your backups, be aware that this can
complicate the restoration process. If you are building a new server
from backups, in addition to restoring your backup you will also have
to manually create any tables that weren't included in your backup.
Redundancy vs. Backups
You may have set up redundant systems, and expect this to take the
place of backups. For example, you may have two databases with one
replicating to the other. Or, your data may be stored on a high end
RAID system, mirrored onto multiple physical drives. However, remember
that you're not only trying to protect yourself from system failures.
One of the most common reasons for data loss is human error. If you
accidentally run a query that deletes half your users, this errant
query will run on your database slave as well and delete your users in
both places. Or, if you accidentally delete a directory containing
user-contributed content, again this change will also be made on the
mirrored drives. For this reason, it's important to not assume that
redundancy replaces the need for regular backups.
When To Backup
A single backup of the above data from all your servers is a good
start. But most websites are constantly changing, with new content
being posted, old content being updated, and new users signing up all
the time. Any changes made between the time of your last backup and
when something goes wrong will be lost. Thus, it is important to make
regular backups.
In the first section of this chapter one of the discussed goals
asked you to define how much data you can afford to lose. Can you
afford to lose an hour of data? Can you afford to lose 24 hours of
data? Can you afford to lose a week of data? Obviously you would prefer
to not have any lost data, but at the end of the day it comes down to a
question of practicality and budget. Set realistic goals for yourself,
and then figure out how you can meet those goals. If you can afford to
lose a week of data, obviously your backup strategy can be much simpler
than someone who can't afford to lose more than an hour of data.
Also note that different types of data may change with different
frequency. For example, your database is likely to be constantly
changing, while your custom themes and modules are rarely changing.
Thus, different data can be backed up at different frequencies.
Backup Schedules
Now that you've defined how much data you can afford to lose in the
event of a catastrophic failure, it's time to set up a regular backup
schedule that meets your requirements. Your backup schedule needs to
take into account two significant questions:
- How often does the backed up data change?
- How much data can you afford to lose?
If the data being backed up never or very rarely changes, you can
update your backup each time you make a change. If your data changes
all the time, then you'll instead need to automate regular backups that
happen at least as frequently as your needs dictate. For example, if
you can only afford to lose 6 hours of data should your database fail,
set up your backup scripts to back up your database once every 6 hours.
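For example, assuming your backup script has been saved as /usr/local/bin/backup-database.sh (an illustrative path), a crontab entry running it every 6 hours might look like:

```
# m  h    dom mon dow  command
0    */6  *   *   *    /usr/local/bin/backup-database.sh
```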
Examples
Tracking Multiple Text Database Backups With Git
The following script is a simple yet powerful example of how you
could efficiently store multiple backups of your database within a
revision control system. In this example, we are using 'git', however
you could easily replace git with your favorite source control system.
Note that git is designed for storing lots of small files, not for
storing one large file, so it may not be the best choice of tool
for maintaining backups of a growing database. Our use of the
"--single-transaction" flag for mysqldump assumes that you are using
MySQL's InnoDB storage engine.
To use this script, you should edit the configuration section as
appropriate for your system. You then need to create an empty directory
at the path defined by the script's BACKUP_DIRECTORY variable. Next,
create a new git repository by moving into this directory and typing
'git init'. With the repository initialized, manually run the mysqldump
command to generate the first copy of your database. Add this text
backup to the repository using 'git add', and check it in using 'git
commit -a'.
The steps described in the previous paragraph could have been
automated, however my goal was to keep the script as simple as
possible. Furthermore, you may end up deciding to use a different
revision control system than 'git', in which case you will need to set
things up differently.
The actual backup script follows:
#!/bin/sh
# Configuration:
BACKUP_DIRECTORY="/var/backup/mysql.git"
DATABASE="database_name"
DATABASE_USERNAME="username"
DATABASE_PASSWORD="password"
# End of configuration.
export PATH="/usr/bin:/usr/local/bin:$PATH"
cd "$BACKUP_DIRECTORY" || exit 1
START=`date +'%m-%d-%Y %H:%M:%S'`
mysqldump -u$DATABASE_USERNAME -p$DATABASE_PASSWORD \
--single-transaction --add-drop-table \
$DATABASE > $DATABASE.sql
END=`date +'%m-%d-%Y %H:%M:%S'`
CHANGES=`git diff --stat`
SIZE=`ls -lh $DATABASE.sql | awk '{print $5}'`
git commit -v -m "Started: $START
Finished: $END
File size: $SIZE
$CHANGES" $DATABASE.sql
Each time you run the above script, it will generate a current backup
of your database and check in the difference between this backup and
the previous backup. The script should be called from a regular
cronjob, causing your database to be backed up every few hours or every
day, depending on your needs.
Using 'git log', you can review the versions of your database that
have been checked in, and you can see the information that is logged
each time you make a backup:
Author: Jeremy Andrews
Date: Sun Jul 20 15:14:09 2008 -0400
Started: 07-20-2008 15:13:01
Finished: 07-20-2008 15:14:02
File size: 14M
database.sql | 44 ++++++++++++++++++++++----------------------
1 files changed, 22 insertions(+), 22 deletions(-)
There are many simple improvements you could make to increase the usefulness of this script, including:
- Occasionally run 'git gc' to compress all the older copies of your database stored in your git repository.
- Replace 'git' with your favorite source control system.
- Push a copy of your repository to a remote server, so the
backups don't live only on the same server as your database. It is
important that you can access the backups if your database server
fails.
- Generate an email each time the backup is completed, sending a brief status report.
- Redirect stdout and stderr to a log file so you can see any errors that happen when running the script from crontab.
- Minimize the size of the changes between each backup by making
two backups of your database. One backup should only include your table
definition using the --no-data option to mysqldump, and one backup
should only include your data using the --no-create-info option.
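The last suggestion above can be sketched with two mysqldump invocations. This assumes the same configuration variables as the backup script shown earlier, and InnoDB tables (required for --single-transaction):

```shell
# Schema only: table definitions change rarely, so this file rarely differs.
mysqldump -u$DATABASE_USERNAME -p$DATABASE_PASSWORD --no-data \
    $DATABASE > $DATABASE-schema.sql

# Data only: keeping the schema out of this file makes the day-to-day
# diffs checked into git smaller.
mysqldump -u$DATABASE_USERNAME -p$DATABASE_PASSWORD --single-transaction \
    --no-create-info $DATABASE > $DATABASE-data.sql
```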
Backing Up Your Website With Git
Git provides a very simple method for backing up your website. It
offers much more than a backup, but that's all we're concerned about in
this section. In preparation, first create an empty Git repository on
your backup server. If you have multiple servers or web directories you
wish to back up, you should create an empty Git repository for each. By
using the "--bare" flag, we reduce the size of our backup as it won't
maintain an uncompressed copy of the latest version of the files:
$ mkdir backup.git
$ cd backup.git
$ git --bare init
Initialized empty Git repository in /home/user/backup.git/
Next, on the web server that you are backing up, "initialize" a
repository in your web directory. Add your website files to this
repository, and then "push" it to the empty repository on the backup
server. It is safe to initialize a Git repository on your live server
and check files into it as this does not modify your files in any way.
Instead, it creates a ".git" subdirectory where the local repository is
stored. In this example, we'll assume that your backup server has an IP
address of 10.10.10.10:
$ cd /var/www/html
$ git init
Initialized empty Git repository in .git/
$ git add .
$ git commit -a -m "Backup all files in website."
$ git remote add backup-server user@10.10.10.10:backup.git
$ git push backup-server master
Now, as you add new files to your web server, add them to your git
repository by running "git add". Commit these new files and any changed
files by running "git commit -a". And finally, push these updates to
the backup server by running "git push backup-server master".
You will learn more about using Git in the next section of this chapter.
Testing Backups
Simply making backups of your data is only half of the job. It's
also critical that you regularly validate your backups, ensuring that
they are not corrupt and that they contain everything you need to
rebuild your websites.
One way to test your backups is to restore them to your development
server, building an up-to-date development environment. Doing this one
time is not enough: though it validates your general backup strategy,
it doesn't regularly validate the integrity of each backup.
You should instead update your development environment from backups on
a regular schedule, such as once a week. The process can be automated
through simple scripts.
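As one hedged sketch of such automation, the following shell function performs a minimal sanity check on a mysqldump file before it is restored to a development server. It only verifies that the dump is non-empty and ends with the completion marker mysqldump writes on success, so it complements, rather than replaces, a full test restore:

```shell
# Sanity-check a mysqldump backup file before restoring it.
check_dump() {
  file="$1"
  # A valid dump is never empty.
  [ -s "$file" ] || { echo "EMPTY"; return 1; }
  # mysqldump appends "-- Dump completed ..." only when it finishes
  # cleanly, so a missing marker suggests a truncated backup.
  tail -n 1 "$file" | grep -q 'Dump completed' || { echo "TRUNCATED"; return 1; }
  echo "OK"
}
```

A weekly cron job could run this check and only restore the dump to the development database when it prints OK.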
Section 4: Staging Changes
Testing Changes
As you scale your website and its popularity grows, it becomes
increasingly important to properly test all changes prior to updating
your production website. At minimum, you should have a separate testing
server which duplicates your production environment. Your development
environment should be using the same exact operating system as your
production servers, with the same extensions installed and updates
applied. If, for example, you use CentOS on your production
servers and Fedora on your development servers, you may find that code
which works perfectly on your development server fails in production
due to issues such as failed dependencies.
The more similar you make your development environment to your
production environment, the more valid your testing will be. That said,
very often while your production environment may be composed of
numerous servers, your development environment may be limited to a
single server. In this case, you should do what you can to simulate
your production infrastructure.
In this final section of chapter one, I offer best practices for
testing changes and pushing these changes to your production servers.
Revision Control
It's often tempting to maintain your website one file at a time,
manually copying individual files into place. Usually this involves
first making a backup copy of the file you wish to change, over time
resulting in dozens of old backups cluttering the directories of your
production servers. Often this can also involve editing files directly
on a production server and hoping for the best. Unfortunately even the
most trivial seeming change can have unforeseen consequences. It can
also quickly become confusing which files have been updated, and which
files are still an older version. This can result in bug fixes never
actually making it into production, or new and bigger bugs being
created while trying to fix old bugs.
A simple yet extremely effective solution to this problem is to
utilize a revision control system. Revision control is one of many
phrases used to describe the management of multiple versions of the
same information. Other popular phrases often used to describe this
functionality include version control, source control, and source code
management. There are a great many open source and
proprietary revision control tools available to you. Popular examples
include CVS, Subversion (SVN), Perforce, and Git.
For the example contained in this book, we have chosen to use Git, a
fast and flexible distributed source control system originally designed
by Linus Torvalds for managing the Linux kernel. Git was selected
because of its distributed design, its growing popularity, its
flexibility, its applicability to the problems we are trying to solve, and its
free availability. However, this does not mean that you also need to
use Git to manage your website. It is possible to apply the tips and
best practices we explain here to your favorite source control system.
Tracking File Changes
The basic steps required for managing files with Git were briefly
discussed in the previous section on backups. In this section, we build
upon our previous examples, showing you how Git can offer you much more
than a versioned backup of your website.
Managing Drupal Core With Git
In this first example you will learn how you can manage a website
built from Drupal's core files. Start with an older version of Drupal,
which you will manually patch. You will then use Git to easily upgrade
your website to a newer version of Drupal. Start by checking Drupal
6.2 out of CVS:
$ cvs -z6 -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal \
co -d html -r DRUPAL-6-2 drupal
Next, create a git repository in your website's directory, and in it
store all the files you checked out from CVS, including the CVS files
themselves. Then create a "drupal-core" branch where you'll keep an
unmodified copy of Drupal for use in upgrading the site later. Finally,
tag your release with the same tag found in CVS for simplified
reference, and switch back to the "master" branch:
$ cd html/
$ git init
Initialized empty Git repository in .git/
$ git add .
$ git commit -a -m "Drupal 6.2"
$ git checkout -b drupal-core
Switched to a new branch "drupal-core"
$ git tag DRUPAL-6-2
$ git checkout master
Switched to branch "master"
Now use your web browser to configure your Drupal installation, which
will create and configure settings.php. Once completed, add your new
settings.php file to the master branch of your Git repository.
Throughout these examples, you will be using many branches for merging
and website development, but the "master" branch will always contain
your actual website:
$ git add sites/default/settings.php
$ git commit -a -m "Add configured settings.php"
At this point you are ready to patch your new Drupal website. For this
example, you will apply a very simple patch to bootstrap.inc that is
intentionally a slightly different version of a change made to the file
in Drupal 6.3. You do this to cause a conflict when you upgrade the
website to Drupal 6.3:
$ cat bootstrap.inc.patch
index 44cd0d7..d45cf5d 100644
--- a/includes/bootstrap.inc
+++ b/includes/bootstrap.inc
@@ -283,0 +284,7 @@ function conf_init() {
+ // Do not use the placeholder url from default.settings.php.
+ if (isset($db_url)) {
+ if ($db_url == 'mysql://username:password@localhost/databasename') {
+ $db_url = '';
+ }
+ }
+
Manually apply this patch to your master branch, checking in the modified bootstrap.inc include file:
$ patch -p1 < bootstrap.inc.patch
$ git commit -a -m "custom bootstrap patch"
Now, upgrade your website to Drupal 6.3. First, update the version in
your "drupal-core" branch from CVS. You update the "drupal-core" branch
so that CVS won't run into any conflicts. If you instead update your
"master" branch, CVS will corrupt the bootstrap.inc include file due to
our patch. We will later rely on Git to more intelligently help us
resolve the merge conflict:
$ git checkout drupal-core
Switched to branch "drupal-core"
$ cvs update -r DRUPAL-6-3
With your "drupal-core" branch updated to Drupal 6.3, commit the
updated files to your Git repository and tag them for possible future
reference:
$ git commit -a -m "Drupal 6.3"
$ git tag DRUPAL-6-3
Now you use this updated "drupal-core" branch to upgrade your website.
You will perform the merge in a temporary branch, though it would be
just as easy to perform the merge in the "master" branch. Either way,
Git provides easy mechanisms for undoing a merge if you make a mistake
or change your mind. In this case, you should test the merge in your
temporary branch before merging it into your official "master" branch:
$ git checkout master -b temporary
Switched to branch "temporary"
$ git merge drupal-core
Auto-merged includes/bootstrap.inc
CONFLICT (content): Merge conflict in includes/bootstrap.inc
Automatic merge failed; fix conflicts and then commit the result.
Git was able to automatically merge all files except for
includes/bootstrap.inc, which failed because of the custom changes
which modified the file in the exact same lines as Drupal 6.3. You will
quickly resolve this conflict using a graphical tool, verify that the
changes look sane, then check in all the merged results:
$ git mergetool
$ git diff --color master includes/bootstrap.inc
$ git commit -m "Upgrade to Drupal 6.3"
If you make a mistake during the merge, you can easily and safely
delete the temporary branch ("git branch -d temporary"), recreate it,
and try the above steps again, fixing your mistake. Once you've
confirmed that the website is working as expected, merge the temporary
branch into your master branch, and delete the temporary branch:
$ git checkout master
Switched to branch "master"
$ git merge temporary
$ git branch -d temporary
Deleted branch temporary.
Managing Contributed Themes And Modules With Git
Managing contributed themes and modules is best done by using
another branch. It is helpful to create one branch for each remote
source for the files you use to build your website. You can use a
single branch for all your contributed modules and themes, as they all
come from Drupal's "contrib" CVS repository.
In this example, we'll add the devel module to our website, checking it out of CVS:
$ git checkout master -b drupal-contrib
Switched to branch "drupal-contrib"
$ cd sites/default
$ mkdir modules
$ cvs -z6 \
-d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal-contrib \
checkout -r DRUPAL-6--1-10 -d modules/devel contributions/modules/devel
$ git add modules
$ git commit -a -m "Devel module version 6.1.10"
You can repeat this process to check out additional contributed modules
or themes from CVS, checking them in to your local 'drupal-contrib' Git
branch. Once you've checked out all the modules and themes you need for
your website, merge them into your master branch:
$ git checkout master
Switched to branch "master"
$ git merge drupal-contrib
When you need to upgrade any of your contributed modules or themes,
follow the same steps described above for updating Drupal core. Switch
to the 'drupal-contrib' branch to check out the updated version from
CVS. Commit the changes to your "drupal-contrib" branch, then use Git
to merge the changed files into your "master" branch.
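As a sketch of those steps, upgrading the devel module to a hypothetical newer release would look like the following. The DRUPAL-6--1-11 tag is illustrative only; substitute the actual CVS tag of the release you are upgrading to:

```
$ git checkout drupal-contrib
$ cd sites/default
$ cvs update -r DRUPAL-6--1-11 modules/devel
$ git commit -a -m "Devel module version 6.1.11"
$ git checkout master
$ git merge drupal-contrib
```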
The important thing is to keep the files in your 'drupal-contrib'
branch unmodified so that CVS can update the files without any
conflicts. If you need to modify any of the contributed modules or
themes, do it in the 'master' branch, or in another development branch.
If your changes conflict with future upgrades, you can easily resolve
these conflicts in the same way that you did in our previous example
with a conflict in bootstrap.inc.
Managing And Upgrading An Existing Website With Git
The previous examples assumed that you were creating a new website
with Drupal. In this example, we will show you how Git can also help
you to manage and upgrade an existing website, even if you've not been
using revision control up to this point.
The first step is to create a new Git repository within your website
directory, and to add your existing website files to this new
repository. This first step is identical to the example provided in the
previous section for backing up your website files:
$ cd /var/www/html
$ git init
Initialized empty Git repository in .git/
$ git add .
$ git commit -a -m "Initial commit."
When you're ready to upgrade your website, check out the version of
Drupal that you wish to upgrade your website to, creating a new Git
repository with this new version of Drupal. In this example, you'll
upgrade your website to Drupal 6.3:
$ cvs -z6 -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal \
co -r DRUPAL-6-3 drupal
$ cd drupal
$ git init
Initialized empty Git repository in .git/
$ git add .
$ git commit -a -m "Drupal 6.3"
In previous examples, you've always kept all your files in different
branches of the same Git repository. In this example, you take
advantage of Git's distributed design to merge code from two different
repositories. To upgrade your website to Drupal 6.3, switch back to
your website repository and create a "drupal-core" branch. Now, "pull"
the updated version of Drupal from the second repository you just
created. Finally, merge the "drupal-core" branch into your "master"
branch and manually resolve any conflicts that Git is unable to
automatically merge:
$ cd ../html
$ git checkout -b drupal-core
$ git pull ../drupal
$ git mergetool
$ git commit -a -m "Drupal 6.3, resolved conflicts."
$ git checkout master
$ git merge drupal-core
At this point, you can either continue tracking Drupal core in the
"drupal-core" branch of your website repository, or you can instead
continue tracking Drupal core in the external "drupal" repository,
deleting your local "drupal-core" branch until you need it again. There
is no technical reason to favor one solution over the other, so it is
left to you to decide which method works best for you.
Apply this same technique when you wish to upgrade contributed
themes or modules. Once again, check out the new version of the module
and create a new repository with it. Then, merge this repository into a
new branch of your website repository. Once you are happy that the
upgrade has gone smoothly, merge the update into your "master" branch.
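As a concrete sketch, the self-contained script below simulates this module upgrade with a hypothetical "views" module, using placeholder files in scratch repositories instead of a real checkout. Note that modern versions of Git also require --allow-unrelated-histories when merging two repositories that share no history:

```shell
set -e
G="git -c user.name=you -c user.email=you@example.com"
work=$(mktemp -d)

# The website repository, holding the old release of the module.
git init -q -b master "$work/html" && cd "$work/html"
mkdir -p sites/all/modules/views
echo "old release" > sites/all/modules/views/views.module
git add . && $G commit -q -m "Site with the old views release."

# A fresh checkout of the new release, committed to its own repository
# (this stands in for a real checkout of the new module version).
git init -q -b master "$work/views" && cd "$work/views"
mkdir -p sites/all/modules/views
echo "new release" > sites/all/modules/views/views.info
git add . && $G commit -q -m "New views release."

# Merge the new release into a branch of the website repository, then
# into master once you are happy with the result.
cd "$work/html"
git checkout -q -b views-update
$G pull -q --no-edit --allow-unrelated-histories "$work/views" master
git checkout -q master
git merge -q views-update
```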
Finally, if multiple people are involved in the ongoing development
of your website, each developer can use Git to "clone" your repository
and implement their own custom changes. When they finish, you can then
"pull" their changes back into your repository. Git excels at exactly
this kind of distributed development, providing powerful tools that
allow you to pick and choose which changes to merge from another
repository, and to undo commits if they later prove problematic. There
is extensive documentation available online to help you master Git,
greatly increasing your productivity.
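For instance, the runnable sketch below (scratch repositories, placeholder files and commit messages) uses git cherry-pick to take just one commit from a developer's clone, and git revert to back out an earlier commit that proved problematic:

```shell
set -e
G="git -c user.name=you -c user.email=you@example.com"
work=$(mktemp -d)

# Your repository, including one change that will prove problematic.
git init -q -b master "$work/site" && cd "$work/site"
echo "base" > index.php && git add . && $G commit -q -m "Base."
echo "mistake" > oops.inc && git add . && $G commit -q -m "Problematic change."

# A developer clones the repository and commits two features.
git clone -q "$work/site" "$work/dev"
cd "$work/dev"
echo "feature A" > a.inc && git add . && $G commit -q -m "Feature A."
echo "feature B" > b.inc && git add . && $G commit -q -m "Feature B."

# Back in your repository: fetch the developer's work and pick only
# the tip commit ("Feature B"), leaving "Feature A" out.
cd "$work/site"
git fetch -q "$work/dev" master
$G cherry-pick FETCH_HEAD

# Undo the earlier problematic commit without rewriting history.
$G revert --no-edit HEAD~1
```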
Tracking Database Schema Changes
As your website evolves, you will find that you sometimes need to
update your database schema. Fortunately, Drupal provides a method of
tracking and automating such schema changes. When developing custom
modules for Drupal, you can define various "hooks" in the .install
file. For example, the _install hook is called when your custom module
is first enabled, and should be used to create custom database tables.
If you need to modify your schema in the future, you define an
_update_N hook in your module's .install file, then run update.php on
your website. Drupal will track which updates have already been
installed on your website, and will alert you as new updates become
available. As you update your .install file, be sure to commit your
changes to your Git repository. Read more about .install files,
_install hooks, and _update_N hooks in the official Drupal API
documentation.
Staging Configuration Changes From Development Servers
It is best to test all configuration changes on your development
server, before attempting to make changes on your production server.
Once you have made all your desired changes, you have to decide the
best method for duplicating these changes in production. Many people
tediously take notes as they make changes on their development servers,
then manually repeat the same steps on their production servers.
It is much preferable if you can automate some of this process,
allowing you to test for consistency and to track all configuration
changes in the database. What follows is a recipe for partially
automating and tracking this process using Git. It does require a solid
knowledge of SQL.
To begin, first configure an exact copy of your production website
on a development server by restoring an up-to-date backup. Do not
attempt to work from an outdated backup or the following steps may have
unexpected results.
Next, create an empty sub-directory and capture a baseline database
backup from your new development server. This will contain exactly the
same data as the backup you used to create this development server, but
it will be formatted differently because you will use different
mysqldump options. In most cases, it will make sense to use the
--no-create-info option, as you will not be adding new tables or
altering table definitions. It can also be very helpful to use the
--skip-extended-insert option so that each change is on its own line,
simplifying patch generation. Finally, the --complete-insert option can
prove helpful for generating database queries to use when data in your
database is being updated rather than simply inserted.
Once you have created your baseline snapshot with the appropriate
mysqldump options, initialize a new git repository, and commit your
database snapshot into your new empty repository:
$ mkdir snapshot
$ cd snapshot/
$ mysqldump -uUSERNAME -p --no-create-info --skip-extended-insert \
--complete-insert DATABASE > snapshot.sql
$ git init
$ git add snapshot.sql
$ git commit -m "initial database snapshot"
Now, log in to your development website and make the necessary
configuration changes. Do not attempt to make too many changes at one
time, or it may prove too difficult to later merge these changes into
your production website.
In our example, we will visit the
Site configuration section in the Drupal Administration pages and make the following changes:
- On the date and time page, disable user-configurable time zones
- On the site information page, configure a new slogan.
You're now ready to extract the changes you've made on your
development server, preparing to push them into production. First, get
a new database snapshot using the exact same mysqldump flags that you
used previously. Now, use a handy Git feature to commit only the
relevant changes into your temporary development repository. Finally,
use Git to generate a patch from this commit.
To commit only the relevant changes, you will use the
git add --patch command. It will logically split your changes by table,
referring to each table as a "hunk" and asking for each whether or not
you wish to "stage this hunk". In this example, you will answer "n" to
all changes affecting the cache* tables, the sessions table, and the
watchdog table. You will only answer "y" to the changes affecting the
variable table. You do not stage the changes for the many cache tables
because these will be automatically generated on your production server
as needed. You do not stage the changes to the sessions or users
tables, because these are specific to your current session on your
development server and unrelated to your configuration changes. You
also do not stage the changes to the watchdog table, as this is only
internal logging information and not relevant to updating the
configuration of your website:
$ mysqldump -uUSERNAME -p --no-create-info --skip-extended-insert \
--complete-insert DATABASE > snapshot.sql
$ git add --patch snapshot.sql
$ git commit -m "example configuration changes"
You can now generate a patch from your partial commit. First, use
git log to find the previous commit against which a patch will be
generated. In our example this is the initial database snapshot with an
ID of 908f027ba0077baad4b7c52ebbe986fb89b40f41. Second, call
git format-patch to generate the actual patch, passing in enough unique
characters of the commit ID:
$ git log
commit 968fe8271ed7ff08fa46d789371b626b80c46ac6
Author: Jeremy Andrews
Date: Fri Aug 22 16:20:54 2008 -0700
example configuration changes
commit 908f027ba0077baad4b7c52ebbe986fb89b40f41
Author: Jeremy Andrews
Date: Fri Aug 22 16:06:22 2008 -0700
initial database snapshot
$ git format-patch 908f02
0001-example-configuration-changes.patch
Next, use this automatically generated patch file to create an
appropriate _update_N hook for a custom .install file. This is done by
first opening
the patch file with a text editor. Reviewing the patch, note that any
pre-existing configuration options which you have updated involve two
lines in the patch, one starting with a "-", and one starting with a
"+". All lines starting with a "-" are being removed from your
database, while all lines starting with a "+" are being added to your
database. On our example website the site slogan was previously
defined, so in our patch file we see a "-" line removing the old
slogan, and a "+" line adding the new slogan:
-INSERT INTO `variable` (`name`, `value`) VALUES \
('site_slogan','s:18:\"This is my slogan.\";');
+INSERT INTO `variable` (`name`, `value`) VALUES \
('site_slogan','s:26:\"This is my updated slogan.\";');
Using our knowledge of SQL, we manually convert this into a single update as follows:
UPDATE `variable` SET `value` = \
's:26:\"This is my updated slogan.\";' WHERE `name` = 'site_slogan';
Our other change was to disable user-configurable time zones. As this
setting had never been updated on our website before, we only find a
single relevant line in our patch starting with a "+", and none
starting with a "-":
+INSERT INTO `variable` (`name`, `value`) VALUES \
('configurable_timezones','s:1:\"0\";');
Finally, we use the queries we collected above and create a new
_update_N hook in a custom module used for pushing database updates to
our website. If you are not already using a custom module, you can
create an empty custom.module file, a proper custom.info file, and a
custom.install file. In the custom.install file, you will add a new
_update_N hook. Refer to the official Drupal API documentation for a
more in-depth description of how these hooks work. In our example, we
add the following function to our
custom.install file. In your own usage, be sure to increment N in your
new _update_N hook:
function custom_update_6001() {
  $ret = array();
  $ret[] = update_sql("UPDATE `variable` SET `value` = \
    's:26:\"This is my updated slogan.\";' WHERE `name` \
    = 'site_slogan';");
  $ret[] = update_sql("INSERT INTO `variable` (`name`, \
    `value`) VALUES ('configurable_timezones','s:1:\"0\";');");
  return $ret;
}
You should commit the changes you have made to your custom module files
into your website source code repository. You can then push these
changes to your production website as explained below. Note that it is
highly recommended that you first push these changes to a staging
server, testing the update process and verifying that you have properly
written your update hook. To have your actual updates performed on your
staging and production servers, you will need to point your browser to
yoursite/update.php and follow the directions.
The same principles that have been documented in this simplistic
example can be applied to more complex configuration changes. You are
not limited to calling UPDATE and INSERT in your _update_N hooks; you
can also call DELETE, CREATE, ALTER, and any other appropriate SQL
command. When making more complex configuration changes, you should
dump your database regularly without actually committing each
individual change. After each database dump, you can use
git diff --color to view how your changes are affecting the database.
The more you do this, and the more familiar you get with how Drupal
works under the hood, the quicker the process will become.
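As a concrete illustration of that dump-and-diff loop, the runnable sketch below uses a hand-written snapshot.sql as a stand-in for real mysqldump output; the table contents are placeholders:

```shell
set -e
G="git -c user.name=you -c user.email=you@example.com"
cd "$(mktemp -d)"
git init -q

# Stand-in for: mysqldump -uUSERNAME -p ... DATABASE > snapshot.sql
cat > snapshot.sql <<'EOF'
INSERT INTO `variable` (`name`, `value`) VALUES ('site_slogan','s:18:"This is my slogan.";');
EOF
git add snapshot.sql && $G commit -q -m "initial database snapshot"

# ...change the setting on the development site, then re-dump:
cat > snapshot.sql <<'EOF'
INSERT INTO `variable` (`name`, `value`) VALUES ('site_slogan','s:26:"This is my updated slogan.";');
EOF

# Review exactly what changed before deciding what to commit.
git diff --color -- snapshot.sql
```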
There has been much discussion about how these processes can be
further automated in Drupal 7 and beyond. There are also existing
projects attempting to further automate the process for earlier
versions of Drupal, such as the
Database Scripts project found at
http://drupal.org/project/dbscripts.
Pushing Changes To Production
In previous examples, you've learned how you can use Git to manage
your website, simplifying many processes including upgrading to a newer
release of Drupal, and making configuration changes to your website.
This final section discusses using Git to push changes to your
production server. In an earlier example dealing with backups, we
configured a Git repository on a backup server with the IP address
10.10.10.10. We will use this previously configured backup server again
in this example.
At this point, you have updated your website to Drupal 6.3, and
merged all of your changes into the master branch of your Git
repository. You have tested all your changes, and are now ready to push
them to your live web server. You should first tag your release for
easy reference in the future. As you're working in a different
repository than you used in the backup example, you need to configure
the remote backup server. Then, push your current code to the remote
server:
$ git tag RELEASE-2008-07-002
$ git remote add backup-server user@10.10.10.10:backup.git
$ git push backup-server master
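The push-and-pull cycle can be sketched end to end with a local bare repository standing in for the backup server; all paths, the tag name, and the file contents below are placeholders:

```shell
set -e
G="git -c user.name=you -c user.email=you@example.com"
work=$(mktemp -d)

# A bare repository standing in for user@10.10.10.10:backup.git.
git init -q --bare -b master "$work/backup.git"

# Your working repository, holding the tested release.
git init -q -b master "$work/site" && cd "$work/site"
echo "Drupal 6.3 plus local changes" > VERSION.txt
git add . && $G commit -q -m "Tested release."
git tag RELEASE-2008-07-002
git remote add backup-server "$work/backup.git"
git push -q --tags backup-server master

# On the production web server: pull, never edit files directly.
git init -q -b master "$work/production" && cd "$work/production"
$G pull -q --tags "$work/backup.git" master
```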
This process is greatly simplified if only one person (or one Git
repository) is pushing changes to the backup server. This one person
can be responsible for merging together everyone else's work, and
testing all the changes. Once the code is pushed to the backup server,
it is now available to be pulled to your website. When using this
workflow, it's important that you don't edit files directly on your web
server, but instead that you always pull changes to files via your Git
repository. On the production web server:
$ git pull user@10.10.10.10:backup.git master
If for any reason you want to revert to an earlier version of your
website, this can be easily done using tags. We'll assume that your
previous release was tagged as 'RELEASE-2008-07-001'. We use the
"--hard" option to reset both the repository and the working tree to
that tag:
$ git reset --hard RELEASE-2008-07-001
You can now fix whatever problems you ran into by making changes to
your local repository. Once things are fixed and tested, add a new tag
and again push your changes to the backup server. Finally, pull these
changes to your production server.
With this strategy, you always know exactly what version of your
website is currently being used in production. It also becomes possible
to quickly back out any changes if problems arise. Finally, if you have multiple web
servers, it is now trivial to keep them all in sync by checking out
files from the same remote Git repository.