performance and scalability goals. It helps you to identify what you
need to accomplish, showing how to set concrete and attainable goals.
are available to aid in this effort. It stresses the importance of
making regular backups, of testing changes before they are made, and of
deploying only tested changes onto production servers.
Section 2: Measuring Progress
Setting A Baseline
It is a common mistake to make performance-oriented modifications to
a website before measuring the site's existing performance. Without a
baseline, it can prove impossible to determine whether your changes
have resulted in real performance improvements, or whether they have
instead reduced performance. For this reason, the first thing you
should do is set up proper monitoring of your website, and quantify
your current performance.
What To Monitor
There are many useful measurements that can be regularly monitored
on your website, and a large number of tools that can help with taking
these measurements. Each website will have some unique monitoring needs;
however, there are some basic measurements that almost all websites will
want to regularly monitor. The following list will give you some ideas
of what you should monitor. Though the list is numbered, the numbers
do not indicate the level of importance of each item. Instead, we
number items so we can provide specific examples when we discuss
monitoring tools.
1. The time it takes to load the front page when not logged in and with nothing cached by your browser.
2. The time it takes to load the front page again, when not logged in but when you already have the CSS, JavaScript, and images cached by your browser.
3. The time it takes to load the front page when you're logged in.
4. The time it takes to load each of your 25 most popular types of web pages.
5. The time it takes to load the above pages from different areas of the world.
6. The popularity of the various types of web pages on your website. (Example types of pages include the front page, forum pages, RSS feeds, and custom pages generated by modules.)
7. Server resource utilization, such as CPU, load average, free memory, cached memory, swap, disk IO, and network traffic.
8. The number of pages being served by your web server(s).
9. Your database, including the number of queries per second, the efficiency of your query cache, how much memory is being used, and how often temporary tables are being created.
10. Database queries taking more than 1 second to complete.
11. Database queries not using indexes.
12. The number of searches being performed per hour.
13. Memcache, including how much memory is being used, how many queries are being made per second, and your hit versus miss rates.
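Even a simple curl invocation can capture the page-timing measurements described above. The following is a hedged sketch; the URL is a placeholder for your own site, and you would run it both with and without a warmed cache, and ideally from several locations:

```shell
# Measure how long the front page takes to load from this machine.
# www.example.com is a placeholder; substitute your own site's URL.
curl -o /dev/null -s \
  -w 'DNS: %{time_namelookup}s  Connect: %{time_connect}s  Total: %{time_total}s\n' \
  http://www.example.com/
```

Scheduling such a command from cron and logging its output over time gives you a crude but useful baseline before investing in heavier monitoring tools.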
Monitoring Tools
You will need to use multiple tools to fully monitor your website.
Some of these tools can run on your existing infrastructure, while
other tools may need to live outside of your network.
ps
Ps displays information about all processes that are
currently running on your server. The command line utility supports a
large number of optional flags that control which processes are
displayed, and what information is displayed about each process.
Information that can be displayed includes CPU usage, memory usage, how
much CPU time the process has used, and much more. Common invocations
of ps include ps -ef and ps aux. Learn more about ps by typing
man ps on most Unix servers.
top
Top provides an automatically updating view of the
processes running on a server. It offers a quick summary of a server's
health, showing CPU utilization, as well as memory and swap usage.
Processes can be sorted in many ways, such as listing the processes
that are consuming the most CPU, or the processes that are using the
most memory.
vmstat
Vmstat offers a useful report on several areas of system
health, including the number of processes waiting to run, memory usage,
swap activity, CPU utilization, and disk IO. A common invocation of
vmstat is vmstat 1 10. Learn more about vmstat by typing
man vmstat on most Unix servers.
Sar
Sar is part of the Sysstat collection of Unix performance
monitoring tools. Sar can be configured to collect regular
comprehensive snapshots of a system's health without putting any
noticeable load on the system. It is a very good idea to enable Sar on
any server that you are managing, as the historical information this
utility collects can prove invaluable when tuning a server, or when
performing damage control on a failed server.
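As a sketch of how sar is typically used once sysstat's data collection has been enabled, the following invocations report on live and historical activity. The flags shown are standard, but the daily data file path varies between distributions; /var/log/sa/sa15 is one common layout and is used here only as an example:

```shell
# Report CPU utilization once per second, three times:
sar -u 1 3

# Review CPU utilization collected earlier today:
sar -u

# Review run-queue length and load averages from a saved daily file:
sar -q -f /var/log/sa/sa15
```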
Cacti
Cacti is a PHP front-end for RRDTool, displaying useful
graphs based on historical data collected from your servers. By default
it tracks useful system information such as CPU and memory utilization,
however it can also be integrated with programs such as MySQL, Apache,
and memcache, displaying useful historical graphs of their performance.
YSlow
YSlow is a Firefox add-on that enhances Firebug to analyze
how quickly your web pages load, highlighting areas that can be
improved. This tool is discussed in depth in chapter 13.
AWStats
AWStats is a log analyzer that can be used to create
graphical reports from web server and proxy log files. When scaling a
Drupal website, you can achieve better performance by disabling
Drupal's core statistics module, and instead using AWStats to generate
regular reports from Apache's own access logs.
devel module
The devel module is one of a suite of development-oriented
Drupal modules. Among its many useful features, it can display a list
of all queries used to build each page served by a Drupal-powered
website, highlighting slow queries and queries that are run multiple
times. The devel module is discussed in depth in chapter 6.
mysqlreport
Mysqlreport is a perl script that generates reports based
on numerous internal "status variables" maintained by MySQL. With this
script, you can quickly interpret what these variables mean, helping
you to tune your server for better performance.
Mysqlreport is discussed in depth in chapter 22.
mysqlsla
Mysqlsla, the MySQL Statement Log Analyzer, is a perl
script that helps you analyze MySQL logs. This script will be discussed
in depth in chapter 23, detailing how it can be used to review MySQL's
slow query logs.
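Before tools such as mysqlsla can analyze slow queries, MySQL must first be logging them. A minimal configuration fragment for MySQL 5.0-era servers might look like the following; the log path is only an example, and note that in these versions long_query_time is measured in whole seconds:

```
[mysqld]
# Log queries taking longer than 1 second:
log-slow-queries = /var/log/mysql/mysql-slow.log
long_query_time = 1
# Also log queries that do not use indexes:
log-queries-not-using-indexes
```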
mytop
Mytop is a useful tool for monitoring a MySQL database from
the command line. It offers a summary of database threads in a format
similar to how top lists running server processes.
innotop
Innotop was originally written to monitor MySQL's InnoDB
storage engine, but it has long since evolved into a very powerful tool
for monitoring all aspects of MySQL. Inspired by mytop, it takes MySQL
monitoring to a new level.
MySQL Enterprise Monitor
The MySQL Enterprise Monitor is a commercial offering by Sun
Microsystems for monitoring one or more MySQL servers. The
comprehensive tool provides useful charts and graphs, makes tuning
suggestions, and can send alerts when your MySQL servers need attention.
Online Services
There are many online services that can help you with monitoring
your website. It is beyond the scope of this book to list and review
all of these services, but popular examples include Google Analytics,
IndexTools, ClickTracks and Omniture. Other online services can help
you to understand how quickly your web pages are loading from various
locations around the world, including Keynote, Webmetrics, Alert Bot,
and host-tracker.com.
Section 3: Backups
Why To Backup
Hopefully you already understand the general importance of
maintaining regular backups. For example, if a server fails and all
data on that server is lost, you can create a new server just like the
old by restoring a backup. If someone runs a bad query and accidentally
deletes data from your database, you can restore the lost data from a
backup. If you make a change to your website and later find that it was
a buggy change, you can roll back to the previous version of your
website from a backup. If you're setting up multiple web servers, you
can build the second server from a backup of the first. When you need
to test changes before deploying them on a live website, you can create
a copy of your actual website on a development server by restoring a
backup.
What To Backup
Generally speaking, it is important to back up anything that you
can't afford to lose and you can't easily recreate. For example, you
will certainly want to make regular backups of your database. If you
have written custom themes and modules, they too should be backed up.
If you have written custom patches for Drupal unique to your website,
back them up. Any customized configuration files on your servers should also
be backed up. If your users upload files such as pictures or sounds,
this data should also be backed up.
Backups are an inexpensive insurance policy for when things go
wrong, as well as a useful tool for duplicating servers. When backups
are combined with a revision control system, they can also be useful
for reviewing changes over time, and for understanding how changes have
affected your website. Oftentimes data loss is not immediately
detected, in which case it is important to have multiple copies of
backups.
The following list offers a suggestion of data that you should
consider backing up. When deciding what from the following list you
will be backing up, ask yourself, "what happens if I lose this data?"
Data to include in your backups
- Database
- Database configuration file(s)
- Web server configuration file(s)
- PHP configuration file(s)
- User uploaded content
- Custom modules and themes
- Custom patches
What You May Not Want To Backup
While it is possible to back up your entire server, including the
underlying operating system, this is often not necessary. The
underlying operating system can be re-installed on a new server with
minimal fuss. Then, the various customized configuration changes can be
restored from backups. Furthermore, backing up your entire server will
require significantly more storage space. This becomes more and more of
a concern as you add additional servers to your infrastructure.
Finally, a backup of one server may not easily restore to another server
if it has different hardware, such as different network cards or a hard
drive of another size.
When backing up your database tables, it is possible to skip
certain tables. For example, you don't have to back up Drupal 6's
four search tables as they can be regenerated if they are lost. The
many cache tables also do not have to be backed up. As the watchdog and
access log tables are already automatically flushed after a certain
amount of time, they are also good candidates for tables to skip if
trying to minimize the size of your backups. If you decide to skip
certain tables when making your backups, be aware that this can
complicate the restoration process. If you are building a new server
from backups, in addition to restoring your backup you will also have
to manually create any tables that weren't included in your backup.
Redundancy vs. Backups
You may have set up redundant systems, and expect this to take the
place of backups. For example, you may have two databases with one
replicating to the other. Or, your data may be stored on a high end
RAID system, mirrored onto multiple physical drives. However, remember
that you're not only trying to protect yourself from system failures.
One of the most common reasons for data loss is human error. If you
accidentally run a query that deletes half your users, this errant
query will run on your database slave as well and delete your users in
both places. Or, if you accidentally delete a directory containing
user-contributed content, again this change will also be made on the
mirrored drives. For this reason, it's important to not assume that
redundancy replaces the need for regular backups.
When To Backup
A single backup of the above data from all your servers is a good
start. But most websites are constantly changing, with new content
being posted, old content being updated, and new users signing up all
the time. Any changes made between the time of your last backup and
when something goes wrong will be lost. Thus, it is important to make
regular backups.
In the first section of this chapter one of the discussed goals
asked you to define how much data you can afford to lose. Can you
afford to lose an hour of data? Can you afford to lose 24 hours of
data? Can you afford to lose a week of data? Obviously you would prefer
to not have any lost data, but at the end of the day it comes down to a
question of practicality and budget. Set realistic goals for yourself,
and then figure out how you can meet those goals. If you can afford to
lose a week of data, obviously your backup strategy can be much simpler
than someone who can't afford to lose more than an hour of data.
Also note that different types of data may change with different
frequency. For example, your database is likely to be constantly
changing, while your custom themes and modules are rarely changing.
Thus, different data can be backed up at different frequencies.
Backup Schedules
Now that you've defined how much data you can afford to lose in the
event of a catastrophic failure, it's time to set up a regular backup
schedule that meets your requirements. Your backup schedule needs to
take into account two significant questions:
- How often does the backed up data change?
- How much data can you afford to lose?
If the data being backed up never or very rarely changes, you can
update your backup each time you make a change. If your data changes
all the time, then you'll instead need to automate regular backups that
happen at least as frequently as your needs dictate. For example, if
you can only afford to lose 6 hours of data should your database fail,
set up your backup scripts to back up your database once every 6 hours.
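For example, assuming your backup script has been saved as /usr/local/bin/backup-database.sh (an illustrative path), a crontab entry running it every 6 hours might look like:

```
# m  h    dom mon dow  command
0    */6  *   *   *    /usr/local/bin/backup-database.sh
```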
Examples
Tracking Multiple Text Database Backups With Git
The following script is a simple yet powerful example of how you
could efficiently store multiple backups of your database within a
revision control system. In this example, we are using 'git', however
you could easily replace git with your favorite source control system.
Note that git is designed for storing lots of small files, not for
storing one large file, so it may not be the best choice of tool
for maintaining backups of a growing database. Our use of the
"--single-transaction" flag for mysqldump assumes that you are using
MySQL's InnoDB storage engine.
To use this script, you should edit the configuration section as
appropriate for your system. You then need to create an empty directory
at the path defined by the script's BACKUP_DIRECTORY variable. Next,
create a new git repository by moving into this directory and typing
'git init'. With the repository initialized, manually run the mysqldump
command to generate the first copy of your database. Add this text
backup to the repository using 'git add', and check it in using 'git
commit -a'.
The steps described in the previous paragraph could have been
automated, however my goal was to keep the script as simple as
possible. Furthermore, you may end up deciding to use a different
revision control system than 'git', in which case you will need to set
things up differently.
The actual backup script follows:
#!/bin/sh
# Configuration:
BACKUP_DIRECTORY="/var/backup/mysql.git"
DATABASE="database_name"
DATABASE_USERNAME="username"
DATABASE_PASSWORD="password"
# End of configuration.
export PATH="/usr/bin:/usr/local/bin:$PATH"
cd "$BACKUP_DIRECTORY" || exit 1
START=`date +'%m-%d-%Y %H:%M:%S'`
mysqldump -u$DATABASE_USERNAME -p$DATABASE_PASSWORD \
--single-transaction --add-drop-table \
$DATABASE > $DATABASE.sql
END=`date +'%m-%d-%Y %H:%M:%S'`
CHANGES=`git diff --stat`
SIZE=`ls -lh $DATABASE.sql | awk '{print $5}'`
git commit -v -m "Started: $START
Finished: $END
File size: $SIZE
$CHANGES" $DATABASE.sql
Each time you run the above script, it will generate a current backup
of your database and check in the difference between this backup and
the previous backup. The script should be called from a regular
cronjob, causing your database to be backed up every few hours or every
day, depending on your needs.
Using 'git log', you can review the versions of your database that
have been checked in, and you can see the information that is logged
each time you make a backup:
Author: Jeremy Andrews
Date: Sun Jul 20 15:14:09 2008 -0400
Started: 07-20-2008 15:13:01
Finished: 07-20-2008 15:14:02
File size: 14M
database.sql | 44 ++++++++++++++++++++++----------------------
1 files changed, 22 insertions(+), 22 deletions(-)
There are many simple improvements you could make to increase the usefulness of this script, including:
- Occasionally run 'git gc' to compress all the older copies of your database stored in your git repository.
- Replace 'git' with your favorite source control system.
- Push a copy of your repository to a remote server, so the
backups don't live only on the same server as your database. It is
important that you can access the backups if your database server
fails.
- Generate an email each time the backup is completed, sending a brief status report.
- Redirect stdout and stderr to a log file so you can see any errors that happen when running the script from crontab.
- Minimize the size of the changes between each backup by making
two backups of your database. One backup should only include your table
definition using the --no-data option to mysqldump, and one backup
should only include your data using the --no-create-info option.
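The last suggestion above can be sketched with two mysqldump invocations. This assumes the same configuration variables as the backup script shown earlier, and InnoDB tables (required for --single-transaction):

```shell
# Schema only: table definitions change rarely, so this file rarely differs.
mysqldump -u$DATABASE_USERNAME -p$DATABASE_PASSWORD --no-data \
    $DATABASE > $DATABASE-schema.sql

# Data only: keeping the schema out of this file makes the day-to-day
# diffs checked into git smaller.
mysqldump -u$DATABASE_USERNAME -p$DATABASE_PASSWORD --single-transaction \
    --no-create-info $DATABASE > $DATABASE-data.sql
```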
Backing Up Your Website With Git
Git provides a very simple method for backing up your website. It
offers much more than a backup, but that's all we're concerned about in
this section. In preparation, first create an empty Git repository on
your backup server. If you have multiple servers or web directories you
wish to back up, you should create an empty Git repository for each. By
using the "--bare" flag, we reduce the size of our backup as it won't
maintain an uncompressed copy of the latest version of the files:
$ mkdir backup.git
$ cd backup.git
$ git --bare init
Initialized empty Git repository in /home/user/backup.git/
Next, on the web server that you are backing up, "initialize" a
repository in your web directory. Add your website files to this
repository, and then "push" it to the empty repository on the backup
server. It is safe to initialize a Git repository on your live server
and check files into it as this does not modify your files in any way.
Instead, it creates a ".git" subdirectory where the local repository is
stored. In this example, we'll assume that your backup server has an IP
address of 10.10.10.10:
$ cd /var/www/html
$ git init
Initialized empty Git repository in .git/
$ git add .
$ git commit -a -m "Backup all files in website."
$ git remote add backup-server user@10.10.10.10:backup.git
$ git push backup-server master
Now, as you add new files to your web server, add them to your git
repository by running "git add". Commit these new files and any changed
files by running "git commit -a". And finally, push these updates to
the backup server by running "git push backup-server master".
You will learn more about using Git in the next section of this chapter.
Testing Backups
Simply making backups of your data is only half of the job. It's
also critical that you regularly validate your backups, ensuring that
they are not corrupt and that they contain everything you need to
rebuild your websites.
One way to test your backups is to restore them to your development
server, building an up-to-date development environment. Doing this one
time is not enough: though it validates your general backup strategy,
it doesn't regularly validate the integrity of each backup.
You should instead update your development environment from backups on
a regular schedule, such as once a week. The process can be automated
through simple scripts.
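As one hedged sketch of such automation, the following shell function performs a minimal sanity check on a mysqldump file before it is restored to a development server. It only verifies that the dump is non-empty and ends with the completion marker mysqldump writes on success, so it complements, rather than replaces, a full test restore:

```shell
# Sanity-check a mysqldump backup file before restoring it.
check_dump() {
  file="$1"
  # A valid dump is never empty.
  [ -s "$file" ] || { echo "EMPTY"; return 1; }
  # mysqldump appends "-- Dump completed ..." only when it finishes
  # cleanly, so a missing marker suggests a truncated backup.
  tail -n 1 "$file" | grep -q 'Dump completed' || { echo "TRUNCATED"; return 1; }
  echo "OK"
}
```

A weekly cron job could run this check and only restore the dump to the development database when it prints OK.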
Section 4: Staging Changes
Testing Changes
As you scale your website and its popularity grows, it becomes
increasingly important to properly test all changes prior to updating
your production website. At minimum, you should have a separate testing
server which duplicates your production environment. Your development
environment should be using the same exact operating system as your
production servers, with the same extensions installed and updates
applied. If, for example, you use CentOS on your production
servers and Fedora on your development servers, you may find that code
which works perfectly on your development server fails in production
due to issues such as failed dependencies.
The more similar you make your development environment to your
production environment, the more valid your testing will be. That said,
very often while your production environment may be composed of
numerous servers, your development environment may be limited to a
single server. In this case, you should do what you can to simulate
your production infrastructure.
In this final section of chapter one, I offer best practices for
testing changes and pushing these changes to your production servers.
Revision Control
It's often tempting to maintain your website one file at a time,
manually copying individual files into place. Usually this involves
first making a backup copy of the file you wish to change, over time
resulting in dozens of old backups cluttering the directories of your
production servers. Often this can also involve editing files directly
on a production server and hoping for the best. Unfortunately even the
most trivial seeming change can have unforeseen consequences. It can
also quickly become confusing which files have been updated, and which
files are still an older version. This can result in bug fixes never
actually making it into production, or new and bigger bugs being
created while trying to fix old bugs.
A simple yet extremely effective solution to this problem is to
utilize a revision control system. Revision control is one of many
phrases used to describe the management of multiple versions of the
same information. Other popular phrases often used to describe this
functionality include version control, source control, and source code
management. There are a great many open source and
proprietary revision control tools available to you. Popular examples
include CVS, Subversion (SVN), Perforce, and Git.
For the example contained in this book, we have chosen to use Git, a
fast and flexible distributed source control system originally designed
by Linus Torvalds for managing the Linux kernel. Git was selected
because of its distributed design, its growing popularity, its
flexibility, its applicability to the problems we are trying to solve, and its
free availability. However, this does not mean that you also need to
use Git to manage your website. It is possible to apply the tips and
best practices we explain here to your favorite source control system.
Tracking File Changes
The basic steps required for managing files with Git were briefly
discussed in the previous section on backups. In this section, we build
upon our previous examples, showing you how Git can offer you much more
than a versioned backup of your website.
Managing Drupal Core With Git
In this first example you will learn how you can manage a website
built from Drupal's core files. Start with an older version of Drupal,
which you will manually patch. You will then use Git to easily upgrade
your website to a newer version of Drupal. Start by checking Drupal
6.2 out of CVS:
$ cvs -z6 -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal \
co -d html -r DRUPAL-6-2 drupal
Next, create a git repository in your website's directory, and in it
store all the files you checked out from CVS, including the CVS files
themselves. Then create a "drupal-core" branch where you'll keep an
unmodified copy of Drupal for use in upgrading the site later. Finally,
tag your release with the same tag found in CVS for simplified
reference, and switch back to the "master" branch:
$ cd html/
$ git init
Initialized empty Git repository in .git/
$ git add .
$ git commit -a -m "Drupal 6.2"
$ git checkout -b drupal-core
Switched to a new branch "drupal-core"
$ git tag DRUPAL-6-2
$ git checkout master
Switched to branch "master"
Now use your web browser to configure your Drupal installation, which
will create and configure settings.php. Once completed, add your new
settings.php file to the master branch of your Git repository.
Throughout these examples, you will be using many branches for merging
and website development, but the "master" branch will always contain
your actual website:
$ git add sites/default/settings.php
$ git commit -a -m "Add configured settings.php"
At this point you are ready to patch your new Drupal website. For this
example, you will apply a very simple patch to bootstrap.inc that is
intentionally a slightly different version of a change made to the file
in Drupal 6.3. You do this to cause a conflict when you upgrade the
website to Drupal 6.3:
$ cat bootstrap.inc.patch
index 44cd0d7..d45cf5d 100644
--- a/includes/bootstrap.inc
+++ b/includes/bootstrap.inc
@@ -283,0 +284,7 @@ function conf_init() {
+ // Do not use the placeholder url from default.settings.php.
+ if (isset($db_url)) {
+ if ($db_url == 'mysql://username:password@localhost/databasename') {
+ $db_url = '';
+ }
+ }
+
Manually apply this patch to your master branch, checking in the modified bootstrap.inc include file:
$ patch -p1 < bootstrap.inc.patch
$ git commit -a -m "custom bootstrap patch"
Now, upgrade your website to Drupal 6.3. First, update the version in
your "drupal-core" branch from CVS. You update the "drupal-core" branch
so that CVS won't run into any conflicts. If you instead update your
"master" branch, CVS will corrupt the bootstrap.inc include file due to
our patch. We will later rely on Git to more intelligently help us
resolve the merge conflict:
$ git checkout drupal-core
Switched to branch "drupal-core"
$ cvs update -r DRUPAL-6-3
With your "drupal-core" branch updated to Drupal 6.3, commit the
updated files to your Git repository and tag them for possible future
reference:
$ git commit -a -m "Drupal 6.3"
$ git tag DRUPAL-6-3
Now you use this updated "drupal-core" branch to upgrade your website.
You will perform the merge in a temporary branch, though it would be
just as easy to perform the merge in the "master" branch. Either way,
Git provides easy mechanisms for undoing a merge if you make a mistake
or change your mind. In this case, you should test the merge in your
temporary branch before merging it into your official "master" branch:
$ git checkout master -b temporary
Switched to branch "temporary"
$ git merge drupal-core
Auto-merged includes/bootstrap.inc
CONFLICT (content): Merge conflict in includes/bootstrap.inc
Automatic merge failed; fix conflicts and then commit the result.
Git was able to automatically merge all files except for
includes/bootstrap.inc, which failed because of the custom changes
which modified the file in the exact same lines as Drupal 6.3. You will
quickly resolve this conflict using a graphical tool, verify that the
changes look sane, then check in all the merged results:
$ git mergetool
$ git diff --color master includes/bootstrap.inc
$ git commit -m "Upgrade to Drupal 6.3"
If you make a mistake during the merge, you can easily and safely
delete the temporary branch ("git branch -d temporary"), recreate it,
and try the above steps again, fixing your mistake. Once you've
confirmed that the website is working as expected, merge the temporary
branch into your master branch, and delete the temporary branch:
$ git checkout master
Switched to branch "master"
$ git merge temporary
$ git branch -d temporary
Deleted branch temporary.
Managing Contributed Themes And Modules With Git
Managing contributed themes and modules is best done by using
another branch. It is helpful to create one branch for each remote
source for the files you use to build your website. You can use a
single branch for all your contributed modules and themes, as they all
come from Drupal's "contrib" CVS repository.
In this example, we'll add the devel module to our website, checking it out of CVS:
$ git checkout master -b drupal-contrib
Switched to branch "drupal-contrib"
$ cd sites/default
$ mkdir modules
$ cvs -z6 \
-d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal-contrib \
checkout -r DRUPAL-6--1-10 -d modules/devel contributions/modules/devel
$ git add modules
$ git commit -a -m "Devel module version 6.1.10"
You can repeat this process to check out additional contributed modules
or themes from CVS, checking them in to your local 'drupal-contrib' Git
branch. Once you've checked out all the modules and themes you need for
your website, merge them into your master branch:
$ git checkout master
Switched to branch "master"
$ git merge drupal-contrib
When you need to upgrade any of your contributed modules or themes,
follow the same steps described above for updating Drupal core. Switch
to the 'drupal-contrib' branch to check out the updated version from
CVS. Commit the changes to your "drupal-contrib" branch, then use Git
to merge the changed files into your "master" branch.
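As a sketch of those steps, upgrading the devel module to a hypothetical newer release would look like the following. The DRUPAL-6--1-11 tag is illustrative only; substitute the actual CVS tag of the release you are upgrading to:

```
$ git checkout drupal-contrib
$ cd sites/default
$ cvs update -r DRUPAL-6--1-11 modules/devel
$ git commit -a -m "Devel module version 6.1.11"
$ git checkout master
$ git merge drupal-contrib
```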
The important thing is to keep the files in your 'drupal-contrib'
branch unmodified so that CVS can update the files without any
conflicts. If you need to modify any of the contributed modules or
themes, do it in the 'master' branch, or in another development branch.
If your changes conflict with future upgrades, you can easily resolve
these conflicts in the same way that you did in our previous example
with a conflict in bootstrap.inc.
Managing And Upgrading An Existing Website With Git
The previous examples assumed that you were creating a new website
with Drupal. In this example, we will show you how Git can also help
you to manage and upgrade an existing website, even if you've not been
using revision control up to this point.
The first step is to create a new Git repository within your website
directory, and to add your existing website files to this new
repository. This first step is identical to the example provided in the
previous section for backing up your website files:
$ cd /var/www/html
$ git init
Initialized empty Git repository in .git/
$ git add .
$ git commit -a -m "Initial commit."
When you're ready to upgrade your website, check out the version of
Drupal that you wish to upgrade your website to, creating a new Git
repository with this new version of Drupal. In this example, you'll
upgrade your website to Drupal 6.3:
$ cvs -z6 -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal \
co -r DRUPAL-6-3 drupal
$ cd drupal
$ git init
Initialized empty Git repository in .git/
$ git add .
$ git commit -a -m "Drupal 6.3"
In previous examples, you've always kept all your files in different
branches of the same Git repository. In this example, you take
advantage of Git's distributed design to merge code from two different
repositories. To upgrade your website to Drupal 6.3, switch back to
your website repository and create a "drupal-core" branch. Now, "pull"
the updated version of Drupal from the second repository you just
created. Finally, merge the "drupal-core" branch into your "master"
branch and manually resolve any conflicts that Git is unable to
automatically merge:
$ cd ../html
$ git checkout -b drupal-core
$ git pull ../drupal
$ git mergetool
$ git commit -a -m "Drupal 6.3, resolved conflicts."
$ git checkout master
$ git merge drupal-core
At this point, you can either continue tracking Drupal core in the
"drupal-core" branch of your website repository, or you can instead
continue tracking Drupal core in the external "drupal" repository,
deleting your local "drupal-core" branch until you need it again. There
is no technical reason to favor one solution over the other, so it is
left to you to decide which method works best for you.
Apply this same technique when you wish to upgrade contributed
themes or modules. Once again, check out the new version of the module
and create a new repository with it. Then, merge this repository into a
new branch of your website repository. Once you are happy that the
upgrade has gone smoothly, merge the update into your "master" branch.
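As a concrete sketch, the self-contained script below simulates this module upgrade with a hypothetical "views" module, using placeholder files in scratch repositories instead of a real checkout. Note that modern versions of Git also require --allow-unrelated-histories when merging two repositories that share no history:

```shell
set -e
G="git -c user.name=you -c user.email=you@example.com"
work=$(mktemp -d)

# The website repository, holding the old release of the module.
git init -q -b master "$work/html" && cd "$work/html"
mkdir -p sites/all/modules/views
echo "old release" > sites/all/modules/views/views.module
git add . && $G commit -q -m "Site with the old views release."

# A fresh checkout of the new release, committed to its own repository
# (this stands in for a real checkout of the new module version).
git init -q -b master "$work/views" && cd "$work/views"
mkdir -p sites/all/modules/views
echo "new release" > sites/all/modules/views/views.info
git add . && $G commit -q -m "New views release."

# Merge the new release into a branch of the website repository, then
# into master once you are happy with the result.
cd "$work/html"
git checkout -q -b views-update
$G pull -q --no-edit --allow-unrelated-histories "$work/views" master
git checkout -q master
git merge -q views-update
```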
Finally, if multiple people are involved in the ongoing development
of your website, each developer can use Git to "clone" your repository
and implement their own custom changes. When they finish, you can then
"pull" their changes back into your repository. Git excels at exactly
this kind of distributed development, providing powerful tools that
allow you to pick and choose which changes to merge from another
repository, and to undo commits if they later prove problematic. There
is extensive documentation available online to help you master Git,
greatly increasing your productivity.
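For instance, the runnable sketch below (scratch repositories, placeholder files and commit messages) uses git cherry-pick to take just one commit from a developer's clone, and git revert to back out an earlier commit that proved problematic:

```shell
set -e
G="git -c user.name=you -c user.email=you@example.com"
work=$(mktemp -d)

# Your repository, including one change that will prove problematic.
git init -q -b master "$work/site" && cd "$work/site"
echo "base" > index.php && git add . && $G commit -q -m "Base."
echo "mistake" > oops.inc && git add . && $G commit -q -m "Problematic change."

# A developer clones the repository and commits two features.
git clone -q "$work/site" "$work/dev"
cd "$work/dev"
echo "feature A" > a.inc && git add . && $G commit -q -m "Feature A."
echo "feature B" > b.inc && git add . && $G commit -q -m "Feature B."

# Back in your repository: fetch the developer's work and pick only
# the tip commit ("Feature B"), leaving "Feature A" out.
cd "$work/site"
git fetch -q "$work/dev" master
$G cherry-pick FETCH_HEAD

# Undo the earlier problematic commit without rewriting history.
$G revert --no-edit HEAD~1
```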
Tracking Database Schema Changes
As your website evolves, you will find that you sometimes need to
update your database schema. Fortunately, Drupal provides a method of
tracking and automating such schema changes. When developing custom
modules for Drupal, you can define various "hooks" in the .install
file. For example, the _install hook is called when your custom module
is first enabled, and should be used to create custom database tables.
If you need to modify your schema in the future, you define an
_update_N hook in your module's .install file, then run update.php on
your website. Drupal will track which updates have already been
installed on your website, and will alert you as new updates become
available. As you update your .install file, be sure to commit your
changes to your Git repository. Read more about .install files,
_install hooks, and _update_N hooks in the official Drupal API
documentation.
Staging Configuration Changes From Development Servers
It is best to test all configuration changes on your development
server, before attempting to make changes on your production server.
Once you have made all your desired changes, you have to decide the
best method for duplicating these changes in production. Many people
tediously take notes as they make changes on their development servers,
then manually repeat the same steps on their production servers.
It is much preferable if you can automate some of this process,
allowing you to test for consistency and to track all configuration
changes in the database. What follows is a recipe for partially
automating and tracking this process using Git. It does require a solid
knowledge of SQL.
To begin, first configure an exact copy of your production website
on a development server by restoring an up-to-date backup. Do not
attempt to work from an outdated backup or the following steps may have
unexpected results.
Next, create an empty sub-directory and capture a baseline database
backup from your new development server. This will contain exactly the
same data as the backup you used to create this development server, but
it will be formatted differently because you will use different
mysqldump options. In most cases, it will make sense to use the
--no-create-info option, as you will not be adding new tables or
altering table definitions. It can also be very helpful to use the
--skip-extended-insert option so that each change is on its own line,
simplifying patch generation. Finally, the --complete-insert option can
prove helpful for generating database queries to use when data in your
database is being updated rather than simply inserted.
Once you have created your baseline snapshot with the appropriate
mysqldump options, initialize a new git repository, and commit your
database snapshot into your new empty repository:
$ mkdir snapshot
$ cd snapshot/
$ mysqldump -uUSERNAME -p --no-create-info --skip-extended-insert \
--complete-insert DATABASE > snapshot.sql
$ git init
$ git add snapshot.sql
$ git commit -m "initial database snapshot"
Now, log in to your development website and make the necessary
configuration changes. Do not attempt to make too many changes at one
time, or it may prove too difficult to later merge these changes into
your production website.
In our example, we will visit the
Site configuration section in the Drupal Administration pages and make the following changes:
- On the date and time page, disable user-configurable time zones
- On the site information page, configure a new slogan.
You're now ready to extract the changes you've made on your
development server, preparing to push them into production. First, get
a new database snapshot using the exact same mysqldump flags that you
used previously. Now, use a handy Git feature to commit only the
relevant changes into your temporary development repository. Finally,
use Git to generate a patch from this commit.
To commit only the relevant changes, you will use the
git add --patch command. It will logically split your changes by table,
referring to each table as a "hunk" and asking for each whether or not
you wish to "stage this hunk". In this example, you will answer "n" to
all changes affecting the cache* tables, the sessions table, and the
watchdog table. You will only answer "y" to the changes affecting the
variable table. You do not stage the changes for the many cache tables
because these will be automatically generated on your production server
as needed. You do not stage the changes to the sessions or users
tables, because these are specific to your current session on your
development server and unrelated to your configuration changes. You
also do not stage the changes to the watchdog table, as this is only
internal logging information and not relevant to updating the
configuration of your website:
$ mysqldump -uUSERNAME -p --no-create-info --skip-extended-insert \
--complete-insert DATABASE > snapshot.sql
$ git add --patch snapshot.sql
$ git commit -m "example configuration changes"
You can now generate a patch from your partial commit. First, use
git log to find the previous commit against which a patch will be
generated. In our example this is the initial database snapshot with an
ID of 908f027ba0077baad4b7c52ebbe986fb89b40f41. Second, call
git format-patch to generate the actual patch, passing in enough unique
characters of the commit ID:
$ git log
commit 968fe8271ed7ff08fa46d789371b626b80c46ac6
Author: Jeremy Andrews
Date: Fri Aug 22 16:20:54 2008 -0700
example configuration changes
commit 908f027ba0077baad4b7c52ebbe986fb89b40f41
Author: Jeremy Andrews
Date: Fri Aug 22 16:06:22 2008 -0700
initial database snapshot
$ git format-patch 908f02
0001-example-configuration-changes.patch
Next, use this automatically generated patch file to create an
appropriate _update_N hook for a custom .install file. This is done by
first opening
the patch file with a text editor. Reviewing the patch, note that any
pre-existing configuration options which you have updated involve two
lines in the patch, one starting with a "-", and one starting with a
"+". All lines starting with a "-" are being removed from your
database, while all lines starting with a "+" are being added to your
database. On our example website the site slogan was previously
defined, so in our patch file we see a "-" line removing the old
slogan, and a "+" line adding the new slogan:
-INSERT INTO `variable` (`name`, `value`) VALUES \
('site_slogan','s:18:\"This is my slogan.\";');
+INSERT INTO `variable` (`name`, `value`) VALUES \
('site_slogan','s:26:\"This is my updated slogan.\";');
Using our knowledge of SQL, we manually convert this into a single update as follows:
UPDATE `variable` SET `value` = \
's:26:\"This is my updated slogan.\";' WHERE `name` = 'site_slogan';
Our other change was to disable user-configurable time zones. As this
setting had never been updated on our website before, we only find a
single relevant line in our patch starting with a "+", and none
starting with a "-":
+INSERT INTO `variable` (`name`, `value`) VALUES \
('configurable_timezones','s:1:\"0\";');
Finally, we use the queries we collected above and create a new
_update_N hook in a custom module used for pushing database updates to
our website. If you are not already using a custom module, you can
create an empty custom.module file, a proper custom.info file, and a
custom.install file. In the custom.install file, you will add a new
_update_N hook. Refer to the official Drupal API documentation for a
more in-depth description of how these hooks work. In our example, we
add the following function to our
custom.install file. In your own usage, be sure to increment N in your
new _update_N hook:
function custom_update_6001() {
  $ret = array();
  $ret[] = update_sql("UPDATE `variable` SET `value` = \
    's:26:\"This is my updated slogan.\";' WHERE `name` \
    = 'site_slogan';");
  $ret[] = update_sql("INSERT INTO `variable` (`name`, \
    `value`) VALUES ('configurable_timezones','s:1:\"0\";');");
  return $ret;
}
You should commit the changes you have made to your custom module files
into your website source code repository. You can then push these
changes to your production website as explained below. Note that it is
highly recommended that you first push these changes to a staging
server, testing the update process and verifying that you have properly
written your update hook. To have your actual updates performed on your
staging and production servers, you will need to point your browser to
yoursite/update.php and follow the directions.
The same principles that have been documented in this simplistic
example can be applied to more complex configuration changes. You are
not limited to calling UPDATE and INSERT in your _update_N hooks; you
can also call DELETE, CREATE, ALTER, and any other appropriate SQL
command. When making more complex configuration changes, you should
dump your database regularly without actually committing each
individual change. After each database dump, you can use
git diff --color to view how your changes are affecting the database.
The more you do this, and the more familiar you get with how Drupal
works under the hood, the quicker the process will become.
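As a concrete illustration of that dump-and-diff loop, the runnable sketch below uses a hand-written snapshot.sql as a stand-in for real mysqldump output; the table contents are placeholders:

```shell
set -e
G="git -c user.name=you -c user.email=you@example.com"
cd "$(mktemp -d)"
git init -q

# Stand-in for: mysqldump -uUSERNAME -p ... DATABASE > snapshot.sql
cat > snapshot.sql <<'EOF'
INSERT INTO `variable` (`name`, `value`) VALUES ('site_slogan','s:18:"This is my slogan.";');
EOF
git add snapshot.sql && $G commit -q -m "initial database snapshot"

# ...change the setting on the development site, then re-dump:
cat > snapshot.sql <<'EOF'
INSERT INTO `variable` (`name`, `value`) VALUES ('site_slogan','s:26:"This is my updated slogan.";');
EOF

# Review exactly what changed before deciding what to commit.
git diff --color -- snapshot.sql
```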
There has been much discussion about how these processes can be
further automated in Drupal 7 and beyond. There are also existing
projects attempting to further automate the process for earlier
versions of Drupal, such as the
Database Scripts project found at
http://drupal.org/project/dbscripts.
Pushing Changes To Production
In previous examples, you've learned how you can use Git to manage
your website, simplifying many processes including upgrading to a newer
release of Drupal, and making configuration changes to your website.
This final section discusses using Git to push changes to your
production server. In an earlier example dealing with backups, we
configured a Git repository on a backup server with the IP address
10.10.10.10. We will use this previously configured backup server again
in this example.
At this point, you have updated your website to Drupal 6.3, and
merged all of your changes into the master branch of your Git
repository. You have tested all your changes, and are now ready to push
them to your live web server. You should first tag your release for
easy reference in the future. As you're working in a different
repository than you used in the backup example, you need to configure
the remote backup server. Then, push your current code to the remote
server:
$ git tag RELEASE-2008-07-002
$ git remote add backup-server user@10.10.10.10:backup.git
$ git push backup-server master
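The push-and-pull cycle can be sketched end to end with a local bare repository standing in for the backup server; all paths, the tag name, and the file contents below are placeholders:

```shell
set -e
G="git -c user.name=you -c user.email=you@example.com"
work=$(mktemp -d)

# A bare repository standing in for user@10.10.10.10:backup.git.
git init -q --bare -b master "$work/backup.git"

# Your working repository, holding the tested release.
git init -q -b master "$work/site" && cd "$work/site"
echo "Drupal 6.3 plus local changes" > VERSION.txt
git add . && $G commit -q -m "Tested release."
git tag RELEASE-2008-07-002
git remote add backup-server "$work/backup.git"
git push -q --tags backup-server master

# On the production web server: pull, never edit files directly.
git init -q -b master "$work/production" && cd "$work/production"
$G pull -q --tags "$work/backup.git" master
```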
This process is greatly simplified if only one person (or one Git
repository) is pushing changes to the backup server. This one person
can be responsible for merging together everyone else's work, and
testing all the changes. Once the code is pushed to the backup server,
it is now available to be pulled to your website. When using this
workflow, it's important that you don't edit files directly on your web
server, but instead that you always pull changes to files via your Git
repository. On the production web server:
$ git pull user@10.10.10.10:backup.git master
If for any reason you want to revert to an earlier version of your
website, this can be easily done using tags. We'll assume that your
previous release was tagged as 'RELEASE-2008-07-001'. We use the
"--hard" option to reset both the repository and the working tree to
that tag:
$ git reset --hard RELEASE-2008-07-001
You can now fix whatever problems you ran into by making changes to
your local repository. Once things are fixed and tested, add a new tag
and again push your changes to the backup server. Finally, pull these
changes to your production server.
With this strategy, you always know exactly what version of your
website is currently being used in production. It also becomes possible
to quickly back out any changes if problems arise. Finally, if you have multiple web
servers, it is now trivial to keep them all in sync by checking out
files from the same remote Git repository.