Optimus Cache Prime (Legacy)

This is the legacy Python version of Optimus Cache Prime. For the latest version, click here.

Optimus Cache Prime (OCP) is a smart cache preloader for websites with XML sitemaps. It crawls all URLs in a given sitemap so the web server builds cached versions of the pages before visitors or search engine spiders arrive.

Since Google began penalizing websites with long response times in its rankings, serving all of your pages quickly has become more important than ever. Optimus Cache Prime helps you do that by making sure your cache (be it an in-memory cache like memcached or APC, or a flat file cache like WP Super Cache or W3 Total Cache) is primed, so that requests for any page are served lightning fast.

OCP has two usage modes:

Remote Mode (default)

Run OCP on any machine and point it at a target sitemap (for example, one that links to all of your pages, or one that lists only the high-priority pages), and OCP primes every link listed in it. You can enable throttling to reduce load on the web server.
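
As a rough illustration of the first step, the sketch below pulls the page URLs out of a sitemap with Python's standard xml.etree.ElementTree module. It is not taken from ocp.py: the function name extract_urls and the example.com URLs are made up for this example, and it is written for Python 3.

```python
import xml.etree.ElementTree as ET

# XML namespace used by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(sitemap_xml):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Hypothetical sitemap used only to demonstrate the function.
example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about/</loc></url>
</urlset>"""

for url in extract_urls(example):
    print(url)
```

Each URL returned this way would then be requested once so that the server builds its cached copy.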

Local Mode (for static file caches)

When run locally on your web server, OCP probes your static file cache before making any requests to your web server, drastically reducing the number of requests and redundant log messages. Pages are only crawled if they aren’t already cached.

OCP checks up to 10,000 pages per second with Local mode enabled if the cache is mostly primed from previous runs. It was designed for use with W3 Total Cache and WP Super Cache for WordPress, but will work with any system that uses a URL-relative flat file cache, i.e. one where /about/ is stored on disk as, for example, ‘about’ or ‘about/index.html’.

Set the options inside ocp.py to enable Local mode.
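
The cache probe described above can be sketched as follows. This is an illustration under assumptions, not the code in ocp.py: the helper names cache_candidates and is_cached are hypothetical, the two candidate file names simply follow the ‘about’ and ‘about/index.html’ example, and a real cache plugin may name its files differently. The url_base and cache_dir parameters mirror the option names used in the example settings further down.

```python
import os


def cache_candidates(url, url_base, cache_dir):
    """Return the paths under cache_dir where a cached copy of url might be."""
    if not url.startswith(url_base):
        return []
    # Strip the base so only the URL-relative part remains, e.g. "about".
    relative = url[len(url_base):].strip("/")
    base = os.path.join(cache_dir, relative) if relative else cache_dir
    return [base, os.path.join(base, "index.html")]


def is_cached(url, url_base, cache_dir):
    """True if any candidate path exists as a regular file on disk."""
    return any(os.path.isfile(path)
               for path in cache_candidates(url, url_base, cache_dir))


# Hypothetical values; only pages for which is_cached() is False need a request.
print(cache_candidates("http://example.com/blog/about/",
                       "http://example.com/blog",
                       "/var/www/cache"))
```

Because the check only touches the file system, a page that is already cached costs no HTTP request at all.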

Download

  • ocp.py — Version 1.1 – December 29th, 2010
  • Git repository: git clone git://github.com/patrickmn/ocp.git

In some browsers you may need to right-click the link, choose ‘Save as’, then save the file as ‘ocp.py’.

Usage

  • Run python ocp.py <URL/path of sitemap>
  • When configured, ocp.py can be run without parameters. See the options inside the script.

System Requirements

  • Python 2.5 or above (tested with Python 2.5, 2.6, and 2.7)
  • Local mode only: WordPress with W3 Total Cache or WP Super Cache, or something else that uses a caching system with URL-relative file names

FAQ

Q: How do I make an XML sitemap?
A: You can use an online sitemap generator like XML-Sitemaps.com. For WordPress I highly recommend Arne Brachhold’s XML Sitemap Generator. You can also make one manually — here’s an example.

Q: Do I need to have WordPress, W3 Total Cache, WP Super Cache, memcached, … to use OCP?
A: No. All you need is Python and an XML sitemap. To use Local mode you need something which stores its cached pages with file/directory names that are relative to the original URLs. (Both W3 Total Cache and WP Super Cache do just that.)

Q: How do I slow down OCP?
A: Increase the crawl_delay value inside ocp.py. A value of 1 means OCP waits one second before requesting another page, 5 means five seconds, 0.5 half a second, and so on. OCP only ever loads one page at a time, so the load should be negligible even with no delay.
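
A minimal sketch of this kind of throttling, assuming a simple sequential loop. The function prime and its fetch and sleep parameters are hypothetical names for this example, not part of ocp.py; only crawl_delay corresponds to an option named above.

```python
import time


def prime(urls, fetch, crawl_delay=0, sleep=time.sleep):
    """Request each URL in turn, pausing crawl_delay seconds between requests."""
    results = []
    for index, url in enumerate(urls):
        # No pause before the first request; one pause before each later one.
        if index > 0 and crawl_delay > 0:
            sleep(crawl_delay)
        results.append((url, fetch(url)))
    return results


# Example with a stand-in fetch function that makes no network requests.
pages = ["http://example.com/", "http://example.com/about/"]
print(prime(pages, fetch=lambda url: "requested", crawl_delay=0.1))
```

Passing the fetch and sleep functions in as arguments keeps the loop easy to try out without touching a real web server.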

Q: Can you demonstrate how to use Local mode?
A: Here are some example settings you can set inside ocp.py:

WordPress with W3 Total Cache

sitemap = '/var/www/patrickmylund.com/blog/sitemap.xml'
crawl_delay = 0
local = True
url_base = 'http://patrickmylund.com/blog'
cache_dir = '/var/www/patrickmylund.com/blog/wp-content/w3tc/pgcache'

WordPress with WP Super Cache

sitemap = '/var/www/patrickmylund.com/blog/sitemap.xml'
crawl_delay = 0
local = True
url_base = 'http://'
cache_dir = '/var/www/patrickmylund.com/blog/wp-content/cache/supercache'

With WP Super Cache the URL base is just ‘http://’ because WP Super Cache also creates a folder with your domain name in the cache folder.

After configuring the settings, simply run python ocp.py.

Q: How do I know if Local mode is working?
A: You shouldn’t see any requests from “Optimus Cache Prime” in your web server’s access log, and runs subsequent to the first should complete in less than a second.

Q: How do I preload the cache regularly?
A: The easiest way is to set up a cron job. On most Linux distributions you can add one with crontab -e. For example, the entry */5 * * * * /usr/bin/python /home/patrick/ocp.py runs OCP every five minutes. For more information, see Ubuntu’s Cron Howto.

Note that cron runs with a very minimal environment and PATH, so a bare command like ‘python’ might not be found. Use full paths such as /usr/bin/python.

Q: Why doesn’t OCP use HTTP Keep-alive?
A: To reuse a persistent connection (assuming the web server allows one at all), the body of each response must be read in full before the next request can be sent. To reduce bandwidth consumption, OCP doesn’t retrieve any of the contents of your pages, so it cannot keep the connection open.
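
One way to request a page without downloading its body is sketched below with Python 3's standard http.client module. This is an assumption about the general approach, not the code in ocp.py: the names request_target and touch and the User-Agent string are invented for the example. The connection is closed as soon as the status line and headers have arrived, which is exactly why it cannot be kept alive.

```python
import http.client
from urllib.parse import urlsplit


def request_target(url):
    """Split a URL into (scheme, host[:port], path including any query)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.netloc, path


def touch(url, timeout=10):
    """Send a GET request, read only the status and headers, then close."""
    scheme, host, path = request_target(url)
    if scheme == "https":
        conn = http.client.HTTPSConnection(host, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", path, headers={
            "User-Agent": "cache-primer-sketch",  # hypothetical name
            "Connection": "close",
        })
        response = conn.getresponse()  # parses the status line and headers
        return response.status
    finally:
        # The body is never read, so the connection is discarded, not reused.
        conn.close()


print(request_target("http://example.com/about/?page=2"))
```

Calling touch on a page URL returns the HTTP status code. The server still generates the page, and therefore caches it, even though the client never reads the body.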

License

Optimus Cache Prime (OCP) is released under the MIT license (see source).