Extracting LJ entries into local files, OS X edition

ETA 2007-08-03: I have a newer, betterer tool in progress. This tool will a) back up locally, and b) optionally migrate your posts. I need testers. If you're reasonably comfortable with the shell and running commands in the Terminal, please hop over there and give it a shot. Thanks!

Xjournal is an OS X client that has a "download all LJ posts" feature. Go to the History view (command-1), refresh the history, then click the "download" button. Wait a few minutes while it chugs and grabs all your posts.

This is handy. However, Xjournal puts the data in a really terrible location: everything goes into the single file ~/Library/Application Support/Xjournal/History.plist. This is a property list file, a common XML document type used in lots of ways by OS X apps. It's not very convenient for humans, though, if you want to do something else with your post data.
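History.plist is just XML under the hood, and Python's standard plistlib module can read it directly. Here's a minimal sketch; the key names in the sample are made up for illustration, and Xjournal's real keys will differ:

```python
import plistlib

# A tiny, hypothetical History.plist fragment -- the real structure
# Xjournal writes is more elaborate, but the parsing step is the same.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
 "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>12345</key>
    <dict>
        <key>subject</key><string>Hello LJ</string>
        <key>content</key><string>First post!</string>
    </dict>
</dict>
</plist>
"""

# plistlib.loads() is the modern (Python 3) spelling; older Pythons
# used plistlib.readPlist() instead.
entries = plistlib.loads(SAMPLE)
for post_id, entry in entries.items():
    print(post_id, entry["subject"])
```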

I have written a small Python script that shows how you might extract the data and save it into individual files: extractXJournal.py. Download it, save it somewhere, and make it executable. That is, launch Terminal and type:
chmod a+x extractXJournal.py
Then run it, also from a terminal window:
./extractXJournal.py

It'll extract the entries into the current directory, inside a "year/month/day" file hierarchy, with the files named for the post ids.
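The layout is easy to picture in code. Here's a sketch of the path-building step (entry_path is an illustrative name, not a function from the actual script):

```python
import os
from datetime import datetime

def entry_path(root, post_id, when):
    """Build the year/month/day path for one entry, with the file
    named for the post id -- mirroring the layout described above."""
    return os.path.join(root,
                        "%04d" % when.year,
                        "%02d" % when.month,
                        "%02d" % when.day,
                        "%s.html" % post_id)

print(entry_path("archive", "12345", datetime(2007, 5, 31)))
```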

This is a five-minute hack job, and there might be bugs. Tell me about them here and I will fix them immediately. It did, however, extract all my journal entries nicely into HTML files viewable in a browser.

ljdump is another Python option that downloads your entries directly from the server. However, its file naming convention is annoying, and its output is XML, which is handy for further scripting but not so handy for browsing.

ETA: Okay, we've had two people with very different environments run this successfully, so I think we've got the bugs squashed. Thanks to elementalv and memoryfloodsin for their help with testing!
ETA2: The latest version of the script converts lj user & comm tags (but NOT cut tags) to working links. It also handles illegal characters like the Windows heart character.
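That tag conversion can be sketched with a regular expression. The regex and URL shapes below are my assumptions about how the tags are stored, not the script's actual code:

```python
import re

# Rewrite <lj user="name"> and <lj comm="name"> tags into plain HTML
# links. The URL patterns are the classic LiveJournal ones; the exact
# markup Xjournal records is an assumption here.
LJ_TAG = re.compile(r'<lj\s+(user|comm)="([^"]+)"\s*/?>')

def linkify(html):
    def repl(m):
        kind, name = m.group(1), m.group(2)
        if kind == "comm":
            url = "http://community.livejournal.com/%s/" % name
        else:
            url = "http://%s.livejournal.com/" % name
        return '<a href="%s">%s</a>' % (url, name)
    return LJ_TAG.sub(repl, html)

print(linkify('Thanks to <lj user="elementalv"> for testing!'))
```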
ETA3: The latest version (10:30am PDT May 31) also generates an index file at the top level, listing all your posts in a spectacularly ugly way.
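A hypothetical version of that index pass, assuming only the year/month/day layout the script produces, would just walk the tree and emit one link per entry:

```python
import os

# Walk the extracted year/month/day tree and write a top-level
# index.html with one link per entry file. Only the directory layout
# described above is assumed; this is not the script's actual code.
def write_index(root):
    links = []
    for dirpath, _dirs, files in sorted(os.walk(root)):
        for name in sorted(files):
            if name.endswith(".html") and name != "index.html":
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                links.append('<li><a href="%s">%s</a></li>' % (rel, rel))
    with open(os.path.join(root, "index.html"), "w") as f:
        f.write("<ul>\n" + "\n".join(links) + "\n</ul>\n")
    return len(links)
```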
ETA4: It's possible to modify ljdump to do something kinda neat: versions of your posts with comment threads. It's a bit of work, and I realize that the urgency level of the problem is no longer what it was yesterday. But I think I might do it. YNK when the next apocalypse is gonna come.
ETA5: Running an older version of OS X? Try extractForOldMacs.py instead. Probably won't work.
ETA6: Charm is another python command-line client that might work to archive your posts in one step. It successfully archived my personal journal, but failed several times to download all of the posts in a community I mod. So mileage may vary.
You are made of awesome and win, m'dear. (Dang it, I just deleted my "Willow prefers Macs" icon a couple days ago :( )
I get the following when I try to run your script:

Parsing entries from History.plist.
Bad plist file:
'module' object has no attribute 'readPlist'
Thanks for the report. I have duplicated the problem. I, being a big python geek, am running 2.5, and you, not being a geek, are running 2.3. Fix in progress...
I've duplicated the problem and have a fix in progress. Update in a few minutes when I have it working (it's a python version issue). Thanks for the report!
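For the curious, the fix boils down to picking whichever plistlib loader your Python actually has. A version-tolerant sketch (the version comments are approximate):

```python
import plistlib

# Ancient Pythons shipped a plistlib without readPlist(), which is
# exactly the "'module' object has no attribute 'readPlist'" error
# above. Probe for whichever API this interpreter provides.
def load_plist(path):
    if hasattr(plistlib, "load"):          # Python 3.4 and later
        with open(path, "rb") as f:
            return plistlib.load(f)
    if hasattr(plistlib, "readPlist"):     # most 2.x, and 3.x up to 3.8
        return plistlib.readPlist(path)
    return plistlib.Plist.fromFile(path)   # the 2.3-era API
```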
Thank you so much for this! I do not use Terminal often, but I'll watch here to see if people are having success with it and give it a try if it's favorable. You're a gem!
I've recently (as in about two weeks ago) moved to Mac. I'd like to be able to use this, I've got Xjournal downloaded and history is saved as a plist thingy, but what is python, is this another program?
Sorry if this is completely stupid, but I really am a total newbie when it comes to Macs.
Python is a scripting language, like Perl. It comes with every Macintosh, built-in, and you don't need to do anything to get it installed. You get it for free because secretly, underneath the nice user interface, OS X is Unix and is quite similar to Linux in many ways.

The only truly geeky thing you need to be able to do to run this little tool is run an app called "Terminal". This gives you a command-line view of what's going on with your Mac.

- Download this script. Move it to your desktop.
- Run Terminal.
- In the "shell" (that is, the command-line window with the prompt visible), type
cd ~/Desktop
Exactly like that. Then hit return.
- Then type: chmod a+x extractXJournal.py
- Then type: ./extractXJournal.py
- Watch for a few seconds while it tells you what it's doing. When I run it, I see this:

antenna@jetboy:Archive> extractXJournal.py
Parsing entries from History.plist.
Extracting entries for 2006 ...
     March ...
     April ...
     May ...
     June ...
     July ...
     August ...
     September ...
     October ...
     November ...
     December ...
Extracting entries for 2007 ...
     January ...
     February ...
     March ...
     April ...
     May ...
918 entries found in total.

- On your desktop, you should have a folder for each year you've been keeping your journal. Inside each year folder are folders for months with entries, then days, then finally! individual entries. Double-clicking on an entry should show you the oh-so-not-pretty extracted post data.
Yay! Thank you so much! This is a great little tool, and I had no problems running it.
Thank you so much for this! It worked wonderfully!
You wouldn't happen to know of a backup tool that grabs comments, would you? It's not really important, but I was just curious.
LJdump will do this, but again, not conveniently the way you'd like to browse them. Have been pondering a post-processing pass for that data as well.
Hi! Thanks for this, I found it through another post... I'm currently downloading my posts, but the server resets every time, and the message I get says it's because the network is too busy!
But thanks, I hope I manage to run the script alright!
Thanks so much! I think I've used Terminal about 2x now... so this is easy enough for even the sadly non-techie-types to use.
Hi! I found you via catrinella's post here.

You are amazing and wonderful to put this together and provide it to random passers-by. Thank you! I think the best part is that I got to use Terminal, which made me feel kind of geeky and technosavvy. Whee!

An odd thing I noticed is that, when a post originally referred to a LJ user (e.g., "ardent_muses"), I get a blank space in the text instead of the name. I'm guessing that's something to do with the Xjournal download, because it happens in the Xjournal interface (when I review the history) as well. Or perhaps I'm just cursed among LJ users. :)

I am thrilled to have my posts backed up and readable. Thank you again!
For what it's worth, the HTML code does show the LJ user's name, but browsers just don't seem to know what to do with the LJ user "tag". The nice thing is that it's still all there in the code. :)
Hi *^_^* I got something I haven't seen mentioned here yet....

Parsing entries from History.plist.
Bad plist file:
not well-formed (invalid token): line 4317, column 14

Thought I'd mention it. Please. Pretty please. #^___^#
Thank you for writing this.

I've been fighting XJournal all day (with a brief stop to try and use LJDump, which can't ever seem to find its config file), and so far no luck. The bloody thing keeps crashing on me, which is why I suspect it's the problem when extractXJournal gave me this:

Parsing entries from History.plist.
Bad plist file:
not well-formed (invalid token): line 99, column 52

I'm going to delete the History.plist files and try that again.

Okay, deleted History.plist. Problem def. on the XJournal side. It seems that the download history feature doesn't support multiple users/journals well, so it'd gotten confused/corrupted when I'd tried to download a second journal's history earlier in the day.

Deleted original file. Tried to get my main journal (this one) which returned a server busy error (haha, like I'm surprised that other paid users are bailing too), so I tried the other one which is unpaid, on another server, yadda yadda.

Downloaded all 54 entries from the second journal, and extracted them without a problem. I made a folder in my home dir with the journal name, and then dropped the Xjournal-created History.plist file in for good measure (and to get it out of the way for the next run).

Hopefully people don't have to do this for every journal they might have - I'm hoping that this is a problem with my particular machine and her various unique issues.
When I try to download 'extractXJournal.py' it's coming up as a text file. Is it something I'm doing wrong or have you changed the link?

Also, thanks for letting me know about 'Terminal.'