Taming The WordPress Time Machine: WP-Cron

   Posted by: BJ Johnson in hacks, WordPress

While dealing with server slowdowns, I've been digging into a relatively little-known function in WordPress, wp-cron, and what it does behind the scenes. A cron is a routine that runs tasks on a schedule, so it's necessary to make sure it is running and running well. It's also good to lighten the load on your server to ensure the best experience for your visitors. There are reports of wp-cron sometimes using a lot of CPU, which is what happened to us a couple of days back, and it brought the server to its knees. Everything was going along as normal, then BOOM! The server load spiked through the roof: Apache had 500+ connections open, CPU at 100%, physical RAM exhausted and swap space nearly full before it finally died down. During that time, the blogs showed "Error Connecting To Database" in the browser. All blogs were offline. Not good.

Checking the Apache access logs, I found that just before the server went wheels up, wp-cron had run. OK, what caused that? A couple of lines up, there was an access from YandexBot, a crawler from Russia that is not known for its good behavior. Could be it. I blocked it temporarily to see what happens.

Just following the crawler access, there was a line from Feedburner. Going through the logs again looking for correlations, both Yandex and Feedburner show up so many times and right before wp-cron that it is difficult to tell which may be causing the wp-cron spawn; but the pattern is too prevalent to miss — or dismiss.
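For anyone wanting to repeat this kind of log digging without eyeballing every line, here is a minimal sketch in Python. It assumes the access log is in Apache's Combined Log Format and that cron spawns appear as requests for wp-cron.php; the sample lines, bot names and the `agents_before_cron` helper are invented for illustration and are not taken from my logs. It only counts which user agents appear shortly before a cron hit, so it shows correlation, not cause.

```python
import re
from collections import Counter

# Combined Log Format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def agents_before_cron(lines, lookback=3):
    """Count user agents seen in the `lookback` requests before each wp-cron.php hit."""
    recent = []        # user agents of the latest non-cron requests
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue   # skip lines that are not in the expected format
        if "wp-cron.php" in match.group("request"):
            counts.update(set(recent))   # credit each agent once per cron hit
            recent = []
        else:
            recent.append(match.group("agent"))
            recent = recent[-lookback:]
    return counts


# Invented sample lines (hypothetical agents and addresses).
SAMPLE = [
    '203.0.113.5 - - [05/Aug/2010:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.6 - - [05/Aug/2010:10:00:01 +0000] "GET /feed/ HTTP/1.1" 200 512 "-" "ExampleFeedFetcher"',
    '198.51.100.1 - - [05/Aug/2010:10:00:01 +0000] "POST /wp-cron.php?doing_wp_cron HTTP/1.0" 200 0 "-" "WordPress/3.0.1"',
    '203.0.113.6 - - [05/Aug/2010:11:00:00 +0000] "GET /feed/ HTTP/1.1" 200 512 "-" "ExampleFeedFetcher"',
    '198.51.100.1 - - [05/Aug/2010:11:00:00 +0000] "POST /wp-cron.php?doing_wp_cron HTTP/1.0" 200 0 "-" "WordPress/3.0.1"',
]

for agent, hits in agents_before_cron(SAMPLE).most_common():
    print(hits, agent)
```

To run it against a real log, pass an open file instead of `SAMPLE`; the `lookback` window is a judgment call and worth varying.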

There are 5 blogs active, 4 with 11 scheduled cron runs each and one with 13 - totaling 57 scheduled runs in a 24 hour period. Yet, wp-cron ran 305 times the day the server choked. No posts were added or updated during this period and none were set for posting at a later date; both of which would spawn a cron run. There are spurious jobs that pop up from time to time; tweetbacks, for instance, runs a (now) job every so often but not a whole bunch. Maybe we have a smoking gun. Unsure which gun, yet, but there's definitely gunpowder in the air.

WordPress core developer Andrew Nacin visited my thread on the support forums and offered some valuable insight:

The cron gets spawned via a loopback HTTP request, which in turn is triggered by standard pageloads. Feedburner has a knack for hitting your blog at the times that the cron needs to be spawned. (Having the cron triggered by feed readers and search engine crawlers is rather common, as they make up a good portion of HTTP requests, depending on the size of the site.)
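To picture what Andrew describes, here is a toy model in Python. It is not WordPress code: the `simulate` function, the job names used as keys and the 60-second gap between spawns are all assumptions made for illustration. The point it shows is that a due job does nothing until some request arrives, so whichever visitor (or bot) happens to show up next is the one that appears in the log right before wp-cron.

```python
def simulate(page_loads, schedule, lock_timeout=60):
    """Return the request times at which a cron spawn would be triggered.

    page_loads:   sorted request timestamps, in seconds
    schedule:     dict of job name -> (first_due, interval), in seconds
    lock_timeout: assumed minimum gap between two spawns, in seconds
    """
    next_due = {name: due for name, (due, _) in schedule.items()}
    interval = {name: every for name, (_, every) in schedule.items()}
    spawns = []
    last_spawn = None
    for now in page_loads:
        due = [name for name, when in next_due.items() if when <= now]
        if not due:
            continue   # nothing scheduled yet: an ordinary page load
        if last_spawn is not None and now - last_spawn < lock_timeout:
            continue   # a recent spawn is assumed to still hold the lock
        spawns.append(now)
        last_spawn = now
        for name in due:
            # push each job that ran forward to its next future slot
            while next_due[name] <= now:
                next_due[name] += interval[name]
    return spawns


TWICE_DAILY = 12 * 60 * 60

# One twice-daily job, due at t=0; requests arrive at 10s, 20s, 30s and 50000s.
print(simulate([10, 20, 30, 50000], {"wp_version_check": (0, TWICE_DAILY)}))
# prints [10, 50000]

# No traffic at all means no spawn, however overdue the job is.
print(simulate([], {"wp_version_check": (0, TWICE_DAILY)}))
# prints []
```

In this model a site that gets most of its requests from feed fetchers and crawlers will see most of its cron spawns right after them, which matches the pattern in the logs without those bots doing anything wrong.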

I installed wp-crontrol, a plugin by Edward Dale, to see which WordPress internal cronjobs are scheduled. (Note: the plugin is circa 2008 and compatible up to v2.5.1. To make it partially compatible with v2.7 and later, visit this thread. After making this edit, there have been no side effects in v3.0.1 that I am aware of.)

Running it, I find that there are a number of jobs that really only need to be run by the main site. Each sub-site has its own cronjobs for update checks (wp_update_plugins, wp_update_themes and wp_version_check, for instance), each running twice a day. These only need to be run once by the main site, not by each and every sub-site in the network; they're redundant. 24 of these can be eliminated without any real loss in functionality. With the above condition of crawlers and Feedburner spawning crons with each access, reducing scheduled runs can only be a Good Thing.
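The arithmetic behind that figure of 24, written out as a short Python check. The variable names are mine; the numbers come from the counts given in this post (5 blogs, of which 4 are sub-sites, 3 update hooks, twice a day each, and 57 scheduled runs in total).

```python
SUB_SITES = 4                      # 5 blogs minus the main site
UPDATE_HOOKS = ["wp_update_plugins", "wp_update_themes", "wp_version_check"]
RUNS_PER_DAY = 2                   # each hook is scheduled twice daily

redundant_runs = SUB_SITES * len(UPDATE_HOOKS) * RUNS_PER_DAY
scheduled_runs = 4 * 11 + 13       # four blogs with 11 runs, one with 13
remaining_runs = scheduled_runs - redundant_runs

print(redundant_runs)   # prints 24
print(scheduled_runs)   # prints 57
print(remaining_runs)   # prints 33
```

So dropping the sub-site update checks would take the schedule from 57 runs a day to 33, a cut of roughly 40 percent, before counting any spurious runs on top.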

Is there a way to do this?

Andrew had this to share:

I agree that the update checks should only be running on the main blog in the network. There is a related issue I discovered yesterday about this, and the ultimate solution will be to limit update checks to the server. That said, 3.0 is actually (unwittingly) way better at this than MU was. I plan to address this in 3.1.

He also pointed me to a couple of DIFF patches in Trac, which I applied. They don't directly speak to this situation but I committed them, so as to keep current with the state of the code.

Having a viewpoint from someone who sees WP from deep within the code is invaluable. There might be something further I can do with this. Using wp-crontrol again, I set the wp_version_check, wp_update_themes and wp_update_plugins cronjobs on only the sub-sites to "Non-repeating", leaving them as recurring on the main site. Didn't know if these changes would be sticky but one never knows until (s)he tries. Today, I fired up wp-crontrol again. The sub-site cronjobs do not return after running the one final time. Until the core is updated to add this efficiency, this will work just fine. Thanks, Andrew!

If you have a lot of sites, this can save a lot of wp-cron runs. Curious thing is, with these sub-site crons now out of the queue, and only the main site running them, the sub-sites still display updates of plugins and themes as they become available; as if you'd never changed anything. I'd say that makes them really redundant. Either that, or they're badly mis-named for their function.


Short URL for this post: http://spherical.org/s/3v


3 Responses to Taming The WordPress Time Machine: WP-Cron

Rudy
  

BJ, good post there! One thing puzzling me is why they need to have wp_version_check, wp_update_themes and wp_update_plugins even running on the public side at all. Those should be run whenever a logged in person (such as an administrator) accesses the dashboard. There is no need to have this updating constantly--the public at large neither knows nor cares about plugins or versions or anything of that nature. Why retrieve this information at all if it's not needed? Grab it when an admin logs in.

On vBulletin (which HAS killed a server or two in its day, as you know ;) ), the admin updates (version check, "phone home" to check for legit software, updated news notices, etc.) only happen when you login to the admin control panel. It actually delays loading sometimes when vB's server is busy.

It also has a function that lets you adjust vB's own "cron" settings. I have everything sufficiently staggered, and also made a new database index or two that helped with some of the more nasty ones, reducing processing time from minutes to literally just a few seconds. (Whenever someone deletes private messages, they aren't instantly deleted, but saved in a batch and processed via cron...that cron job used to KILL us at the same time every hour during peak visits. Just adding an appropriate index to the PM database broke through the bottleneck.)

Anyway, good to read all of this. My WP installs are nowhere near as busy as yours, but it's helpful nonetheless. Thanks, and take care!

August 7, 2010 at 8:20 pm
    B.E. Johnson
      

    Ya know, up until this time, I thought that all of that did happen at login time. The current way doesn't make a whole lot of sense to me, and yours does. Doing it the way it's done now, this has to be stored in the database waiting for an admin to log in. Now, I suppose that there might be some load spread planning involved, such that the wordpress.org servers don't get a ton of update queries all bunched together when admins in certain timezones wake up and log in.
    Then again, I now learn that there doesn't seem to be any correlation. The crons in the sub-sites are disabled and the update messages show up there anyway. Maybe a check at login is happening; and that makes the cron check even more redundant.

    August 7, 2010 at 9:29 pm
Mike Otgaar
  

BJ - interesting article on wp-cron.
I agree wp-cron runs much too often. I have several WordPress sites of my own, as well as client sites, and cron runs on all of these after bots visit the site. Worse still - cron usually runs after each lookup request by the bot unless they are very close together in time (a minute seems the upper limit!)
Cron often runs when a visitor comes to a site as well.
(Note: Using WP3.5 on all sites)
I also run Drupal sites - and there cron only runs when required - usually twice a day at set times and intervals, except for a few MANUALLY set instances, e.g. RSS feed updates that run more frequently.
Interestingly (to me anyway), I have my main site (Drupal 7) and 2 WordPress sub-sites on sub-domains... The Drupal site is busier - at least 6 times more traffic than the WP sub-sites - but uses fewer server resources than either WP site. Pages are also generated and served faster (about twice as fast).
I'd like to see an enterprising developer release a plugin to take total control over wp-cron - I'm sure there are many other admins out there who would agree.
Regards...

January 3, 2013 at 12:44 am

Copyright © 1976-2017 The Art & Engineering of B.E.Johnson