Home > Uncategorized > Saving the WordPress.com Export File and The Linked Media Files (and wget’s strictness)

Saving the WordPress.com Export File and The Linked Media Files (and wget’s strictness)

December 7, 2009 Leave a comment Go to comments

So I’ve been wanting a way to automatically backup my wordpress.com export file. I decided to go for a bash and wget mix to do this work. But I soon had a problem wget won’t save cookies that have a path different to the file you are downloading. This is a problem because, well here is what I basically do to get the export file.

Grab wp-login.php. This will issue a cookie that WP looks for as proof that I can indeed store cookies.

Next I post login credentials to wp-login.php. This will issue a bunch of authentication cookies. Specifically,

Set-Cookie: wordpress_test_cookie=WP+Cookie+check; path=/; domain=.wordpress.com
Set-Cookie: wordpress=some_string; path=/wp-content/plugins; domain=.wordpress.com; httponly
Set-Cookie: wordpress=some_string path=/wp-admin; domain=.wordpress.com; httponly
Set-Cookie: wordpress_logged_in=some_string; path=/; domain=.wordpress.com; httponly
Set-Cookie: wordpress_sec=some_string; path=/wp-content/plugins; domain=.wordpress.com; secure; httponly
Set-Cookie: wordpress_sec=some_string path=/wp-admin; domain=.wordpress.com; secure; httponly

The problem is Wget will refuse to save number 2,3,5 and 6 (only saving wordpress_test_cookie and wordpress_logged_in). It refuses the rest because it requires the cookie path to be the same as the path of the file you are requesting. Using –debug wget says,

cdm: 1 2 3 4 5 6 7 8Attempt to fake the path: /wp-content/plugins, /wp-login.php
cdm: 1 2 3 4 5 6 7 8Attempt to fake the path: /wp-admin, /wp-login.php
cdm: 1 2 3 4 5 6 7 8Attempt to fake the path: /wp-content/plugins, /wp-login.php
cdm: 1 2 3 4 5 6 7 8Attempt to fake the path: /wp-admin, /wp-login.php

Specifically to get the export file I need the wordpress_sec cookie for the path /wp-admin. I can’t just request /wp-admin and try to get the cookie from there because only wp-login.php will let me post credentials.

Possible solutions are A) write a hacky solution that just grabs the cookie value using grep/sed and manually add this to the cookies file, B) recompile wget to accept some other argument that will accept these cookies, or C) don’t use wget.

I took a look at the source for wget, and it was easy to identify the problem area, I could just simply remove this segment,

/* The cookie sets its own path; verify that it is legal. */
 if (!check_path_match (cookie->path, path))
 DEBUGP (("Attempt to fake the path: %s, %s\n",
 cookie->path, path));
 goto out;

But then my download script wouldn’t be as portable and I’ll have to make sure I use and have the patched wget available.

I ended up using curl for some parts, but I probably could have done option A.

Anyhow, the script is here. It should grab the export xml file as well as any media files that it references and were uploaded to that wordpress.com blog.

Categories: Uncategorized Tags:
  1. May 4, 2010 at 7:13 pm

    I also had trouble with wget, but I solved the cookie issue a bit differently, by forcing it to every request.

    My article about archiving WPMU can be found here: http://mig.hyper.fi/article/355

  2. mgutman
    September 6, 2012 at 11:07 am

    Thanks for the script, It saved me a lot of time 🙂

  1. September 23, 2010 at 8:12 pm
  2. September 6, 2012 at 11:32 am

I don't read comments anymore due to an increase in spam comments. If you want to get in touch please send me an email (see tianjara.net for details).

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: