Automate Recycling WordPress Posts on a Yearly Basis

by on August 2, 2013

3arrows-recycle-bigIf you want to skip ahead and go directly to the code I created, click here.

I write a blog about technology history (thisdayintechhistory.com) where I compose at least one post for every day of the calendar year detailing an event that took place on that day. I’ve been writing this blog for a couple of years and have developed a little bit of a following. After the first year of writing the blog, I realized I was going to have a problem. The way WordPress works, the latest post is always shown first. This was fine when I making posts daily during the first year, since I was posting in chronological order anyway. But when January 1st of the next year came around, the posts I had made for the previous year would be buried a year back. I wanted my readers to be able to visit the site daily and see the latest posts for that day, no matter what day of the year it was.

At one point I figured out a way to make WordPress display the blog posts for any calendar day, depending on what day it was. But I wasn’t particularly satisfied with that solution, especially when I realized that I had a bigger problem on my hands. The bigger issue was the fact that if I wasn’t posting articles for every day of the year, the RSS feed wouldn’t update. Since I was using the RSS feed to update articles to a Facebook page, as well as send out a daily e-mail, I had to figure out a way to “refresh” the previous year’s posts for every day of the year. Plus that way anyone who was subscribed to my feed would always get every day’s history articles for that particular day.

I did some research but could not find any automated way to do what I had in mind. So eventually I settled on simply manually updating each post. Since the events that happened on a particular day of the year don’t change, I simply “recycle” the posts by updating the year of the post when their particular day rolls around. For example, on January 1st I update the publishing date for all the posts of that day to January 1, 2013. This puts that day’s posts at the top of the blog and also re-injects them into the RSS feed. The Facebook page gets updated with that day’s posts and an e-mail goes out with the posts for that day.

The problem was that I had to do this every single day for just the posts of that day. I couldn’t do posts ahead of time, as that would make the posts disappear from the blog since they now became “scheduled” for a future time. Since I had people using my blog to research technology history for any day of the year, that wouldn’t work. I had to keep all posts available to view in the archives. So I had to try to remember every day to update my blog, preferably in the morning.

As you can imagine, this wasn’t a great solution. Life gets in the way and I’d forget to do this most days. So at times I was catching up for several days at a time. Not a huge deal, but some of my readers were complaining. I couldn’t blame them. It’s not as much fun to read daily history articles for events that happened a few days ago. So I finally decided to do something about it, since I still could not find a solution that met my needs.

My Big Fat WordPress Adventure

In my career, I have learned much about various programming and scripting languages. I am pretty good at understanding code, but by no means am I a professional programmer. This is why I put off this project for so long. I knew it would take a lot of research and testing as I hacked my way towards a solution. My adventure to create a working process for my problem actually was more difficult than I had imagined. It seemed at every point I made progress, I would discover more details that needed to be taken care of. All in all, the story I’m about to tell you covered the span of about a week-and-a-half.

I knew that WordPress stores all data in a MySQL database. My plan was to figure out a way to run a job at the same time every day that would directly edit the database with the updated date info. My first goal was to find where the date information was stored and create a MySQL statement that would update the year with the current year. I have a site setup on my reseller account that I use for testing and demoing sites for my clients, so I used that for a lot of my testing.

I found that each post stored the published date in the post_date field of the wp_posts table. I discovered that there is no simple function in MySQL to update a date field’s year to the current year. There is a function to increment a date by one year, which would work if I could guarantee that each post was only one year old. But I figured it would be better to simply set it to this year’s date regardless of which year the post was currently published. I figured this would cover more situations, as I did have a few posts that still had 2011 publish dates, plus I wanted my solution to be applicable to other people’s situations. So I created and tested a MySQL statement that accomplished what I wanted. Great, step one complete, right?

Not quite. I discovered that while the posts themselves did indeed show up with the current year, when listed in the WordPress administrative interface under the All Posts section, they showed as still being posted in the previous year. If I clicked on them to edit them, they showed the current year. Weird. What was wrong? In attempting to figure out the problem, I studied the raw XML RSS feed from my testing server and noticed that the <pubdate> element showed a timestamp represented with a +0000 time zone. +0000 represents Greenwich Mean Time, or GMT, now officially known as UTC, or Universal Time Coordinated (also Coordinated Universal Time). Many computer programs store time in UTC  to avoid the complications of time zone information. I remembered that previously I had found another field in the wp_posts table called post_date_gmt. It appears that WordPress uses both post_date and post_date_gmt for different purposes. Why WordPress stores time information in two different ways, I’m not sure. It would seem to make sense to only store it in UTC, but I’m sure they have a good reason. I found one possible explanation here. So anyway, I now had to modify my SQL statement to not only modify post_date but also post_date_gmt. After some trial and error, I settled on a method of converting post_date into UTC time and setting post_date_gmt to that value.

After further testing, I was satisfied that I had created the correct procedure for refreshing the date of a post. Now the next step was how to fire off the MySQL statement at a specific time. I researched SQL events, but discovered that my host, Hostgator, did not exactly support the set up of SQL events. I needed SUPER privileges to start the event manager and Hostgator does not support this on shared or reseller accounts. The support representative I chatted with claimed that Hostgator does support MySQL events without granting SUPER privilege but that was beyond the scope of their support. I’m not sure how this is possible, but regardless I gave up on this option because I figured out that I could run a UNIX cron job with the mysql command line tool.

Sure enough I was able to refresh my posts using the MySQL statement running from a cron job on my testing server. But there was a problem. When calling the mysql command line tool, I had to pass the password as a parameter on the command itself. This is considered a security hole, as passing a password in cleartext is not good practice, especially when the cron job logs the command and sends it by e-mail. So before I implemented the MySQL commands on my live server, I had to find a way around this. Luckily, I quickly discovered that if you store the password in a MySQL configuration file, you do not need to pass the password on the command line. With Hostgator, I could created a file called “.my.cnf” on the root of my server directory, outside of the “public_html” folder. This was a sufficiently secure method of storing the password for use with the mysql command line tool.

Feed Me, Seymour!

I implemented the process on my live server and I thought I was golden. The posts seemed to be updated correctly but I noticed that my RSS feed wasn’t updating. I use the new service FeedPress to syndicate my RSS feed, and it wasn’t reflecting the latest posts. In my testing I discovered that if I manually updated my latest post, even if I didn’t actually change anything but simply pressed the update button, the feed would in fact refresh. I studied the raw XML RSS feed from FeedPress and noticed the <lastBuildDate> element was not updated. Further research showed that <lastBuildDate> is filled with data returned by the get_lastpostmodified() function. Studying the MySQL database, I saw fields called post_modified and post_modified_gmt. Could it be that WordPress searches the post_modified and/or post_modified_gmt fields to find the latest date and uses that for <lastBuildDate>? I tested modifying both fields with MySQL statements and sure enough that seemed to bump the <lastBuildDate> element in the RSS feed from my testing server. Hoping I had reached the end of my journey, I implemented the changes to my live server.

Unfortunately, the RSS feed did not update on my live server. Obviously, the fact that I was using FeedPress created a different scenario than the setup I had on my testing server, which isn’t using FeedPress. Something wasn’t happening to trigger FeedPress to rebuild its feed. However, I could force refresh my FeedPress feed from FeedPress’s web site and I verified that the feed was updated and the <lastBuildDate> element was correct. So my feed was being properly generated, it simply wasn’t triggering FeedPress to automatically refresh itself. According to the FeedPress plugin documentation, when a post is published, it uses an XML-RPC “ping” to notify the FeedPress service to update itself when a post is published or updated. Something wasn’t happening to trigger this ping using the MySQL statements I had created. Off to do more research.

What I discovered in regard to XML-RPC pings was that WordPress automatically triggers these pings when a post is updated or published by normal means. The first thing I found was that WordPress inserts a row into the table wp_postmeta containing the id of the post that was updated, a value of “_pingme”, and a value of “1″. In english it seems WordPress is saying to itself, “I need to send out an XML-RPC ping for the post with ID xxx”. So I created and tested an additional MySQL statement to insert these rows as necessary. Was I finished? Not yet, but I was much closer.

The next issue I ran into was the fact that the rows I was inserting weren’t being processed in a consistent manner. Once WordPress processes the “_pingme” records, it removes them. I could verify that WordPress cleans up these rows when the normal publishing process was followed, but it would not remove the rows I created with MySQL – at least not right away. Further research indicated that WordPress processes those “_pingme” rows using a function called wp_cron(). Basically, WordPress uses the wp_cron() function to do various maintenance tasks on itself, including processing any XML-RPC pings that need to be fired off. Unlike the UNIX cron function which can be triggered on a time-based schedule, by default WordPress checks if it needs to run wp_cron() every time a page is loaded either in the administrative interface or by a user visiting the WordPress site. This isn’t to say that wp_cron() runs every single time a user visits a WordPress site. WordPress only checks to see if wp_cron() needs to be run. If wp_cron() actually does run is dependent on whether any events have been scheduled for it to run. So my method of inserting the rows by using MySQL statements wasn’t generating an event for wp_cron() to process. However, if anything happened to generate an event for wp_cron() to process, such as manually updating a single post, then the very next time I would load a page on my WordPress site, wp_cron() would run and process all the “_pingme” rows I had created. In theory, I could have been satisfied with the fact that I could count on WordPress to eventually process the XML-RPC pings I created, but I really wanted to make sure that things would happen in a timely basis. So I needed a way create events for wp_cron() to process. Off to do some more research.

My research lead me to the function wp_schedule_single_event(). This function was the one that would schedule events for wp_cron() to process. By passing this function a “do_pings” value, I could tell WordPress that it needed to process those “_pingme” rows I had generated with MySQL. But how could I call this function, given that WordPress functions are all in PHP code? I had to figure out a way to run PHP code from a cron job, similar to the way I was running the MySQL statements from a cron job. Yet even more research led me to the correct way to call WordPress functions from a PHP command line. So I implemented the PHP script to call wp_schedule_single_event() with a “do_pings” value, followed by a call to wp_cron() to process the event. My testing showed that everything was working correctly … yet FeedPress was still not updating! Argh!

At this point in my testing I literally went to bed to sleep on it. At some point in my sleep I came up with an epiphany. I needed to study the code of the FeedPress plugin to see how it pings the FeedPress service. It must do something outside the normal WordPress XML-RPC ping process. So when I woke up, I discovered that the FeedPress plugin uses a function called feedpress_publish_post() to send the service an XML-RPC ping. This function is added to a WordPress action called publish_post, which most certainly runs when a post is published using normal means. So it would appear that all I had to do was call this feedpress_publish_post() function in my PHP code and I should be done. I wasn’t sure if I would be able to call this function in my existing PHP code, or jump through some more hoops to call the plugin code first. So I went ahead and just added the function call to my PHP code and tested. After all this work, was I finally done?

Halleujah! The function call worked perfectly the first time I tested it! All that I had left to do was to clean up the way I was calling the MySQL statements and PHP code to satisfy my own code compulsion. I did this by putting the MySQL statements into a file instead of directly on a command line. Then I created a simple linux shell script that called first the mysql command line, running the MySQL statements, then called the PHP code that scheduled the “do_pings”, called wp_cron(), and feedpress_publish_post().

I’ve been running this setup for 4 solid days now and all has worked without a hitch. Looking back, given all that I’ve learned during this process, I’m certain I could have accomplished all this a lot cleaner by using various PHP functions. But since the process is working, at this point I’m satisfied and need to spend time on other things, like my paying clients! So if anyone would like to look over my code and suggest a more streamlined or “correct” way of implementing this, I’m happy to review your ideas. I know I’m not the only one who writes a daily history blog, so I’m sure this is something that many others need help with.

The Code

The following is the actual code I have implemented to update the publish date for every post on a particular calendar date. Offhand, I’m sure there will be a problem during a leap year, but I’ve got a few years to deal with that! If you want to implement this on your own setup, there are no guarantees it will work, but you will need to change a few lines to match your own setup, which I’ve added as comments to the code.

I set up a cron job to run every day at 8:30 AM, in my time zone:

30 8 * * * /bin/sh wp_update_year.sh

wp_update_year.sh

#!/bin/sh
echo "Starting WP Update Year ..."
mysql -v < wp_update_year.sql
php wp_do_pings.php
echo "WP Update Year complete."

wp_update_year.sql

USE thisday_wrdp1;      # Modify this line with the applicable MySQL database for your WordPress installation

UPDATE wp_posts, wp_options
SET post_date = concat(CURDATE(),' ',time(post_date)), post_date_gmt = CONVERT_TZ(concat(CURDATE(),' ',time(post_date)),option_value,'UTC' ), post_modified = NOW(), post_modified_gmt = UTC_TIMESTAMP()
WHERE MONTH(post_date) = MONTH(CURDATE())
AND DAYOFMONTH(post_date) = DAYOFMONTH(CURDATE())
AND post_type = 'post'
AND post_status = 'publish'
AND option_name = 'timezone_string';

INSERT INTO wp_postmeta (post_id,meta_key,meta_value)
SELECT id,'_pingme','1'
FROM wp_posts
WHERE MONTH(post_date) = MONTH(CURDATE())
AND DAYOFMONTH(post_date) = DAYOFMONTH(CURDATE())
AND post_type = 'post'
AND post_status = 'publish';

wp_do_pings.php

<?php

//setup global $_SERVER variables to keep WP from trying to redirect
$_SERVER = array(
  "HTTP_HOST" =--> "http://thisdayintechhistory.com",      //modify this line with your URL
  "SERVER_NAME" => "http://thisdayintechhistory.com",      //also modify this line with your URL
  "REQUEST_URI" => "/",
  "REQUEST_METHOD" => "GET"
);

//require the WP bootstrap
require_once("/home/thisday/public_html/wp-load.php"); //modify this line with the correct path for your installation

//schedule WordPress XML-RPC pings
echo "Scheduling do_pings ..." . PHP_EOL;
if ( ! wp_next_scheduled( 'do_pings' ) ) {
	wp_schedule_single_event(time(), 'do_pings');
}

echo "Starting wp_cron ..." . PHP_EOL;
wp_cron();

//The following lines are only necessary if you are using the FeedPress service
echo "Pinging Feedpress ..." . PHP_EOL;
feedpress_publish_post();

Again, I’m sure there is a lot of opportunity to make this a more streamlined and programatically “correct” process, so I’m happy to review your ideas!

  • Miguel Perez

    Thanks for the article. I am going to try the same thing in the next few days, even though I am new to this type of customization.

    • http://marcelbrown.com/ Marcel Brown

      Good luck! Let me know your site and if there’s anything I can help with.

    • http://marcelbrown.com/ Marcel Brown

      Hey Miguel, did you set this up somewhere? I’d love to know what site it was.

  • http://www.andrewpitchford.com Andrew Pitchford

    Marcel, we’ve been looking for a solution for a ‘daily reading’ type recycling of a 365 day content by a writer. This looks good. I’d just like to check if you’ve updated your process in anyway since you wrote this?

    • http://marcelbrown.com/ Marcel Brown

      I’m happy to say I have not changed anything and so far everything has been going swimmingly! In fact, I subscribe to my own RSS feed to keep an eye on the process and every once in a while I read my own posts as if I am reading them for the first time!

      Good luck implementing this and let me know what the site is where you are going to set this up.

Previous post:

Next post: