Archiving ain’t easy: bringing old one-off WP sites into WPMu

Every summer we try to both update and archive some of the old projects we have built on our various Bluehost accounts over the last 3 or 4 years. It is a painstaking process, and when you have anywhere from 50 to 100 one-off WordPress blogs, MediaWiki installs, Drupal sites, and phpBB forums out in the wild for a number of years, the possibilities for kipple haunt the archivist’s soul. So, for starters, I’ve been trying to use UMW Blogs as a space to archive the numerous WordPress blogs I set up over the last few years. It’s a logical layup: import all the data, throw up a 301 redirect to the new URL, and wham, bam, one update ma’am. Sounds simple enough, and for WordPress blogs with one author it actually is quite easy.

However, when you try archiving a group blog with numerous authors into a WPMu install, the plot thickens. So, I’m going to take you through my process for archiving a group blog from back in Fall 2007, in those fertile, heady days before UMW Blogs (hell, that was even before ELS Blogs). This particular course site was centered around a directed study on Poetic Sequence led by professors Mara Scanlon and Claudia Emerson. The blog had fifteen students all independently tracing the work they were doing over the course of the semester. The group blog became a hub for sharing ideas, assignments, project progress, and their finished works. It was an interesting model for me given that they only met as a group in an actual classroom a few times over the course of the semester. In fact, it is the closest I have ever come to designing a space for a predominantly online learning environment, and a fully online classroom is still something I am very interested in trying out with some of the designs we have come up with over the years.

I’m focusing on the archiving of this site in particular because I still think it is one of the best early projects I worked on, and it was also a very particular setup that clearly illustrates the challenges of importing group blogs on a single WP install into WPMu. So, here we go with the play-by-play:

Importing a One-Off WP Group Blog into WPMu

The User Conundrum

The challenge of simply exporting and importing a group blog from a one-off WP install into WPMu has everything to do with assigning authors. Therein lies the root of about 90% of the issues when importing a one-off group blog into WPMu. See, the thing is that all the students who were part of the directed study group blog back in Fall 2007 had graduated by the time we got UMW Blogs up and running, which means they were not users on UMW Blogs. Moreover, even if a few of them were, I would have to track down each of their email addresses and usernames, add them as users to this one particular blog, and wait for them to accept my invitation so I could then map them to the posts they wrote. (Although I found a way to force-add users as an über-admin: go to Site Admin–>Blogs, find the blog you are looking for, and click the edit link. From this administrative/backend screen you can force-add any user you want to a blog.) So, even if they were in the system, their email would be long defunct (a huge issue with using UMW emails in UMW Blogs, which I am revisiting thanks to D’Arcy), and adding them in this manner would prove futile.

So, to combat this issue I thought I had come up with a great idea, and when I saw Ron’s new plugin that allows you to decide which elements of a blog you want to export, I figured it was high time to try out my idea. My plan was to use FeedWordPress to pull in all the posts from the original Poetic Sequence blog (which you can see here) to the new blog on UMW Blogs (which you can see here). Why? Well, because FeedWordPress pulls in all the authors and immediately creates accounts for them. It makes my job simple: all I would need to do after that is import the pages and copy the theme into the UMW Blogs system.

A piece of cake, right? Well, kinda. It did pull in all the posts and create the authors as expected, and Ron’s Advanced Export plugin did a fine job with just importing the pages. Alas, once again the problem was with syndicating comments. What a nightmare. It’s always the comments with FeedWordPress!!!! I had no way to import the comments cleanly: there is no special way to do it with Advanced Export, and FeedWordPress, as I have noted extensively over the last year or two, doesn’t syndicate comments. I even tried to import the comments table into the archival blog on UMW Blogs, but the post IDs were thrown off and the comments did not associate themselves with any posts….fail!

So, I had to delete the syndicated posts and import them through the import file, but the cool thing I discovered is that FeedWordPress had already created the users from the one-off WP blog. They were still there, and I could now map the authors appropriately from the import file. So, my idea kind of worked, though it adds an extra step.
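The failed comments-table import above comes down to the new blog assigning different post IDs than the old one. A minimal Python sketch of how those IDs might be remapped is below. It is not what was done for this archive, and it assumes the post and comment rows have already been dumped into plain dictionaries. The column names (ID, post_name, post_date, comment_post_ID) follow the standard WordPress tables, but matching posts on slug plus date is an assumption, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_post_id_map(old_posts: List[dict], new_posts: List[dict]) -> Dict[int, int]:
    """Map old post IDs to new post IDs.

    Assumption: a post keeps the same slug (post_name) and post_date
    after import, so that pair can serve as a matching key.
    """
    new_by_key = {(p["post_name"], p["post_date"]): p["ID"] for p in new_posts}
    id_map = {}
    for post in old_posts:
        key = (post["post_name"], post["post_date"])
        if key in new_by_key:
            id_map[post["ID"]] = new_by_key[key]
    return id_map


def remap_comments(
    comments: List[dict], id_map: Dict[int, int]
) -> Tuple[List[dict], List[dict]]:
    """Return (remapped, orphans).

    Remapped comments are copies pointing at the new post ID; orphans
    are comments whose old post could not be matched to a new one.
    """
    remapped, orphans = [], []
    for comment in comments:
        new_id = id_map.get(comment["comment_post_ID"])
        if new_id is None:
            orphans.append(comment)
        else:
            remapped.append({**comment, "comment_post_ID": new_id})
    return remapped, orphans
```

Keeping the orphans in a separate list makes it obvious which comments would still need attention by hand, rather than silently attaching them to the wrong post.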

The Theme and Plugins

I did eventually get all the posts and comments in and assigned to the proper authors, so then I turned to preserving the original theme. This is one of the things that works beautifully in WPMu, and one of the things I love about it. All I had to do was copy the theme into the wp-content/themes folder and make it available to everyone for a split second. After that, I enabled Userthemes for that blog and the theme was automatically copied into the upload directory for that blog, something like blogs.dir/2477/themes. After that, I can delete the theme from the wp-content/themes directory and still have an archived version of the old theme, which I can edit to make it match the original site perfectly. Do your own comparison between the two here and here.

With plugins, I had very few on the original blog: podPress and a quotes plugin called Yarq (which is long defunct). We have podPress on UMW Blogs (although I hate it and want to get rid of it), so I simply grabbed the mp3 URLs from the associated fields on posts with audio (there was only one in this instance) and copied them into the post directly. Why? Well, one of the great benefits of Anarchy Media Player is that it will convert any URL ending with mp3, mp4, etc. into a flash media player automatically, no strings attached. For the random quotes in the sidebar, I had to download the table created by the Yarq plugin and copy and paste the quotes into the slick Quotes Manager plugin we have in UMW Blogs.

Blogroll


Blogroll links, or just sidebar links in general, are always a special case. To import a blogroll you have to go to Tools–>Import and give it the blog URL with the following suffix added: wp-links-opml.php. So, for example, I grabbed http://poetic-sequence.elsweb.org/blog/wp-links-opml.php, and wham, it’s all imported.
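For anyone who wants to check what a blogroll contains before importing it, here is a small Python sketch that builds the OPML address from a blog URL and lists the links in an OPML document. The function names are hypothetical, and the parser only assumes that each link is an outline element carrying an htmlUrl or xmlUrl attribute, which is the usual OPML convention rather than anything confirmed for this particular blog.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def opml_url(blog_url: str) -> str:
    """Append the wp-links-opml.php suffix to a blog's base URL."""
    return blog_url.rstrip("/") + "/wp-links-opml.php"


def list_links(opml_text: str) -> List[Tuple[str, str]]:
    """Return (label, url) pairs for every outline element that has a URL."""
    root = ET.fromstring(opml_text)
    links = []
    for outline in root.iter("outline"):
        url = outline.get("htmlUrl") or outline.get("xmlUrl")
        if url:
            label = outline.get("text") or outline.get("title") or ""
            links.append((label, url))
    return links
```

Calling opml_url("http://poetic-sequence.elsweb.org/blog/") gives the same address used above; the text fetched from that address could then be passed to list_links to see which links the import will bring over.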

Links and Files

Now, to add another dimension to the archiving, there were a whole bunch of uploaded files in the wp-content/uploads directory of the one-off WordPress blog that were linked to from within a number of posts. To make the links cleaner, I downloaded the uploads directory and copied the files and folders within it (not the uploads directory itself) into the uploads folder for that blog on the WPMu install, in this case the blogs.dir/2477/files/ folder. Once you do this, you can change the existing links from …/wp-content/uploads/2006/09/image.jpg to the following path for WPMu: …/files/2006/09/image.jpg

Now, Shannon Hauser did all the leg work on changing the various links in the Student Projects section of the site, and this is still far too laborious. I should have done a find and replace in the exported XML file, but I forgot this step.
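The find and replace that got skipped could look something like the Python sketch below, run over the text of the export file before importing it. This is an illustration, not the process used here: the function name is hypothetical, and old_base and new_base stand in for the old and new blog addresses, which you would supply yourself.

```python
import re
from typing import Tuple


def rewrite_upload_links(xml_text: str, old_base: str, new_base: str) -> Tuple[str, int]:
    """Point one-off uploads links at the WPMu files path.

    Turns <old_base>/wp-content/uploads/... into <new_base>/files/...
    and returns the rewritten text plus the number of links changed.
    """
    pattern = re.compile(re.escape(old_base.rstrip("/")) + r"/wp-content/uploads/")
    replacement = new_base.rstrip("/") + "/files/"
    # A function is used as the replacement so that any backslashes in
    # new_base are inserted literally instead of being read as escapes.
    return pattern.subn(lambda match: replacement, xml_text)
```

The returned count is a cheap sanity check: if it comes back as zero, the old base address probably does not match what is actually in the export file.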

MediaWiki

This class also had a MediaWiki install, though it was not very successful during the course of the semester. It was used as a space to build a bibliography of primary and secondary sources, and that was just one page. However, one student wrote two of his papers in the MediaWiki for the class. Rather than trying to preserve the whole MediaWiki, I just copied the three pages into the UMW Blogs wiki here, which will allow me to get rid of the original and all its ugly spam. One cool side effect of this is that by using Wiki Append (or the plugin formerly known as Wiki INC), I was able to pull the student’s wiki papers into blog pages in the Student Projects section. (You can see the wikified papers here and here, and see them here and here as blog pages being pulled in with Wiki Append.)

Conclusion

This process is still far too laborious and difficult. It needs to be far, far easier than this if we are going to preserve some of the stuff we have done over the years.

