Extracting LJ entries into local files, OS X edition

ETA 2007-08-03: I have a newer, betterer tool in progress. This tool will a) back up locally, and b) optionally migrate your posts. I need testers. If you're reasonably comfortable with the shell and running commands in the Terminal, please hop over there and give it a shot. Thanks!

Xjournal is an OS X client that has a "download all LJ posts" feature. Go to the History view (command-1), refresh the history, then click the "download" button. Wait a few minutes while it chugs and grabs all your posts.

This is handy. However, Xjournal puts the data in a really terrible location: everything ends up in the single file ~/Library/Application Support/Xjournal/History.plist. This is a property list file, a common XML document type that OS X apps use in lots of ways. It's not very convenient for humans, though, if you want to do something else with your post data.
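To get a feel for what's inside a property list, here is a minimal sketch in modern Python 3 (an assumption; it is not the code from extractXJournal.py). It parses plist bytes with the standard plistlib module and reports only the top-level shape, because the actual layout of History.plist isn't described here. The function names parse_plist and summarize are made up for illustration.

```python
import plistlib


def parse_plist(data):
    """Parse raw property-list bytes into plain Python objects."""
    return plistlib.loads(data)


def summarize(obj):
    """Describe the top-level shape without assuming any Xjournal-specific keys."""
    if isinstance(obj, dict):
        return "dict with keys: " + ", ".join(sorted(obj))
    if isinstance(obj, list):
        return "list of %d items" % len(obj)
    return type(obj).__name__
```

To try it on the real file, read ~/Library/Application Support/Xjournal/History.plist in binary mode, pass the bytes to parse_plist, and print summarize() of the result.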

I have written a small python script that shows how you might extract the data and save it into individual files: extractXJournal.py. Download this, save it somewhere, and make it executable. That is, launch Terminal, and type:
chmod a+x extractXJournal.py
Then run it, also from a terminal window:
./extractXJournal.py

It'll extract the entries into the current directory, inside a file hierarchy like "year/month/date", with the files named for the post ids.
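A layout like that might be built along these lines. This is a sketch under assumptions, not the script's actual code: the zero-padded numeric folder names, the .html extension, and the helper names entry_path and write_entry are all hypothetical.

```python
import os
from datetime import datetime


def entry_path(root, posted, post_id):
    """Return root/YYYY/MM/DD/<post_id>.html for one entry."""
    return os.path.join(
        root,
        "%04d" % posted.year,
        "%02d" % posted.month,
        "%02d" % posted.day,
        "%s.html" % post_id,
    )


def write_entry(root, posted, post_id, html_text):
    """Create the dated folders as needed and write one entry file."""
    path = entry_path(root, posted, post_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html_text)
    return path
```

For example, write_entry(".", datetime(2007, 5, 31), 918, "<p>hello</p>") would create the folders 2007/05/31 in the current directory and put 918.html inside.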

This is a five minute hack job, and there might be bugs. Tell me about them here and I will fix them immediately. It did, however, extract all my journal entries nicely into html files viewable in a browser. ljdump is another Python option to download directly. However, its file naming convention was annoying, and its output is also XML. This is handy for further scripting, but not so handy for your browsing.

ETA: Okay, we've had two people with very different environments run this successfully, so I think we've got the bugs squashed. Thanks to elementalv and memoryfloodsin for their help with testing!
ETA2: The latest version of the script converts lj user & comm tags (but NOT cut tags) to working links. It also handles illegal characters like the Windows heart character.
ETA3: The latest version (10:30am PDT May 31) also generates an index file at the top level, listing all your posts in a spectacularly ugly way.
ETA4: It's possible to modify ljdump to do something kinda neat: versions of your posts with comment threads. It's a bit of work, and I realize that the urgency level of the problem is no longer what it was yesterday. But I think I might do it. YNK when the next apocalypse is gonna come.
ETA5: Running an older version of OS X? Try extractForOldMacs.py instead. Probably won't work.
ETA6: Charm is another python command-line client that might work to archive your posts in one step. It successfully archived my personal journal, but failed several times to download all of the posts in a community I mod. So mileage may vary.
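For anyone curious how the tag conversion, character cleanup, and index from ETA2 and ETA3 could work, here is a rough sketch. None of it is taken from the script itself. The URL patterns, the guess that "illegal characters" means control characters that XML 1.0 forbids, and all the function names are assumptions made for illustration.

```python
import re
from html import escape

# Matches <lj user="name"> and <lj comm="name">; <lj-cut ...> is left alone.
LJ_TAG = re.compile(
    r'<lj\s+(user|comm)\s*=\s*["\']?([A-Za-z0-9_]+)["\']?\s*/?>',
    re.IGNORECASE,
)

# Control characters XML 1.0 does not allow (tab, newline, return are fine).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def link_lj_tags(text):
    """Replace lj user/comm tags with ordinary links (URL forms are assumed)."""
    def repl(match):
        kind = match.group(1).lower()
        name = match.group(2)
        section = "users" if kind == "user" else "community"
        url = "https://www.livejournal.com/%s/%s/" % (section, name)
        return '<a href="%s">%s</a>' % (url, name)
    return LJ_TAG.sub(repl, text)


def strip_control_chars(text):
    """Drop characters that would make an XML parser reject the text."""
    return CONTROL_CHARS.sub("", text)


def index_html(rel_paths):
    """Build a bare-bones index page linking to each entry file."""
    items = [
        '<li><a href="%s">%s</a></li>' % (escape(p, quote=True), escape(p))
        for p in sorted(rel_paths)
    ]
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>\n"
```

The index here is just a sorted list of relative paths such as 2007/05/31/918.html, so it is at least as ugly as advertised.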
You are made of awesome and win, m'dear. (Dang it, I just deleted my "Willow prefers Macs" icon a couple days ago :( )
I get the following when I try to run your script:

Parsing entries from History.plist.
Bad plist file:
'module' object has no attribute 'readPlist'
Thanks for the report. I have duplicated the problem. I, being a big python geek, am running 2.5, and you, not being a geek, are running 2.3. Fix in progress...
I've duplicated the problem and have a fix in progress. Update in a few minutes when I have it working (it's a python version issue). Thanks for the report!
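The usual shape of a fix for this kind of version mismatch is to check which plistlib function exists before calling it. The sketch below shows the idea in modern Python 3 terms, where plistlib.load is the current name and readPlist is the older one; it is not the actual patch, and read_plist is a made-up helper name.

```python
import plistlib


def read_plist(path):
    """Load a plist using whichever reader this Python's plistlib provides."""
    loader = getattr(plistlib, "load", None)
    if loader is not None:
        # Newer interpreters: load() takes an open binary file object.
        with open(path, "rb") as f:
            return loader(f)
    legacy = getattr(plistlib, "readPlist", None)
    if legacy is not None:
        # Older interpreters: readPlist() takes a path directly.
        return legacy(path)
    raise RuntimeError("this Python's plistlib has no known plist reader")
```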
Thank you so much for this! I do not use Terminal often, but I'll watch here to see if people are having success with it and give it a try if it's favorable. You're a gem!
I've recently (as in about two weeks ago) moved to Mac. I'd like to be able to use this, I've got Xjournal downloaded and history is saved as a plist thingy, but what is python, is this another program?
Sorry if this is completely stupid, but I really am a total newbie when it comes to Macs.
Python is a scripting language, like Perl. It comes with every Macintosh, built-in, and you don't need to do anything to get it installed. You get it for free because secretly, underneath the nice user interface, OS X is Unix and is quite similar to Linux in many ways.

The only truly geeky thing you need to be able to do to run this little tool is run an app called "Terminal". This gives you a command-line view of what's going on with your Mac.

- Download this script. Move it to your desktop.
- Run Terminal.
- In the "shell" (that is, the command-line window with the prompt visible), type
cd ~/Desktop
Exactly like that. Then hit return.
- Then type: chmod a+x extractXJournal.py
- Then type: ./extractXJournal.py
- Watch for a few seconds while it tells you what it's doing. When I run it, I see this:

antenna@jetboy:Archive> extractXJournal.py
Parsing entries from History.plist.
Extracting entries for 2006 ...
     March ...
     April ...
     May ...
     June ...
     July ...
     August ...
     September ...
     October ...
     November ...
     December ...
Extracting entries for 2007 ...
     January ...
     February ...
     March ...
     April ...
     May ...
918 entries found in total.

- On your desktop, you should have a folder for each year you've been keeping your journal. Inside each year folder are folders for months with entries, then days, then finally! individual entries. Double-clicking on an entry should show you the oh-so-not-pretty extracted post data.
Yay! Thank you so much! This is a great little tool, and I had no problems running it.
Thank you so much for this! It worked wonderfully!
You wouldn't happen to know of a backup tool that grabs comments, would you? It's not really important, but I was just curious.
LJdump will do this, but again, not conveniently the way you'd like to browse them. Have been pondering a post-processing pass for that data as well.
Hi! Thanks for this, I found it through another post... thanks a lot, I'm currently downloading my posts but the server resets every time, and the message I get says it's because the network is too busy!
But thanks, I hope I manage to run the script alright!
Thanks so much! I think I've used Terminal about 2x now... so this is easy enough for even the sadly non-techie-types to use.
Hi! I found you via catrinella's post here.

You are amazing and wonderful to put this together and provide it to random passers-by. Thank you! I think the best part is that I got to use Terminal, which made me feel kind of geeky and technosavvy. Whee!

An odd thing I noticed is that, when a post originally referred to a LJ user (e.g., "ardent_muses"), I get a blank space in the text instead of the name. I'm guessing that's something to do with the Xjournal download, because it happens in the Xjournal interface (when I review the history) as well. Or perhaps I'm just cursed among LJ users. :)

I am thrilled to have my posts backed up and readable. Thank you again!
For what it's worth, the HTML code does show the LJ user's name, but the browsers just don't seem to know what to do with the Lj user "tag". The nice thing is that it's still all there in the code. :)
Hi *^_^* I got something I haven't seen mentioned here yet....

Parsing entries from History.plist.
Bad plist file:
not well-formed (invalid token): line 4317, column 14

Thought I'd mention it. Please. Pretty please. #^___^#
Thank you for writing this.

I've been fighting XJournal all day (with a brief stop to try and use LJDump, which can't ever seem to find its config file), and so far no luck. The bloody thing keeps crashing on me, which is why I suspect it's the problem when extractXJournal gave me this:

Parsing entries from History.plist.
Bad plist file:
not well-formed (invalid token): line 99, column 52

I'm going to delete the History.plist files and try that again.

Okay, deleted History.plist. Problem def. on the XJournal side. It seems that the download history feature doesn't support multiple users/journals well, so it'd gotten confused/corrupted when I'd tried to download a second journal's history earlier in the day.

Deleted original file. Tried to get my main journal (this one) which returned a server busy error (haha, like I'm surprised that other paid users are bailing too), so I tried the other one which is unpaid, on another server, yadda yadda.

Downloaded all 54 entries from the second journal, and extracted them without a problem. I made a folder in my home dir with the journal name, and then dropped the Xjournal-created History.plist file in for good measure (and to get it out of the way for the next run).

Hopefully people don't have to do this for every journal they might have - I'm hoping that this is a problem with my particular machine and her various unique issues.
When I try to download 'extractXJournal.py' it's coming up as a text file. Is it something I'm doing wrong or have you changed the link?

Also, thanks for letting me know about 'Terminal.'
+Mem this entry for future reference. I've already used it, but I can't wait for you to add in the LJ-Cut tags as a working link. :D Great job!
A day late, and apparently too dim to run Xjournal. It's open, I hit Refresh in History, and yet... Nothing. I've tried running via Safari and Firefox. Have I missed something?

Thanks in advance for your assistance. I feel like I'm crashing a party and haven't brought anything for the host/ess.
First, make sure you're logged in. Does the "write a new entry" window say you're logged in, show you a user pic, and all that happy stuff? Command-N, then take a look.

Then, Window > History (or command-1) to bring up the history window. The "Refresh" button updates Xjournal's info about what you've written when. Then "Download" fetches all the post contents. You'll see a progress pane saying "Downloading history, 62 of 922" or whatever your numbers are. That might take a few minutes to chug through, depending on how much you've written and how busy LiveJournal is.

Then the next step is where it gets geeky. Quit Xjournal, then run the script as per the instructions above. This thread has the ultra-detailed instructions if you're new to the ways of the Terminal and the unix shell. But to recap:

Download the script.
Move it to your Desktop.
Run Terminal. (Applications, Utilities, Terminal.app)
In a terminal window, type cd ~/Desktop then press return.
Type: chmod a+x extractXJournal.py
Type: ./extractXJournal.py
It'll run for a few seconds, telling you what it's extracting as it goes.
On your desktop you'll now see some folders and a file named index.html. Double-click that, and zoom! Your posts!

Ideally that's what happens anyway. :)
Do I need anything special, other than OS X, to run a python script?
Nope! Python comes with OS X, right out of the box. All you need to do is download & run XJournal, and download and run the script.

You will have trouble with this script if you're running a very old version of OS X, because the version of python will be so old it won't have some libraries the script uses. 10.2.8 is too old, for instance. If you're running 10.4, you're golden.
I downloaded those files you posted... I tried to read them as well as what you posted, and I confess I understand nothing at all... I tried to do the command-1 thing, but I don't know how it can be done, since I have an ibook G4 that has no numeric pad... I'll try again later, but I confess this makes me O.O.
A numeric pad shouldn't be necessary: just the plain old number row across the top. The squiggle/command/apple key shortcut is just a shortcut for a menu item, anyway: if you browse through your menu you should see it. But, er, it's a bit confusing if you're not used to using the Terminal to do stuff, I agree.
you do iz besz
(Anonymous)
Hi

Looks good! Very useful, good stuff. Good resources here. Thanks much!

Bye
Thanks so much for the link to XJournal, and sorry for coming into your journal just to ask a question, but I'm having a problem with the script - every time I try to type "cmod a+x extractXJournal.py" or "./extractXJournal.py", I get a message saying "No such file or directory". Am I doing something wrong?
In the Terminal, you're not in the same directory as the one you put the script in. Er. The command pwd tells you where you are in the terminal shell. If you're not where the script is, you can type cd blah, where "blah" is the folder you stuck the script file into. "cd" is short for "change directory", if that helps.

Welcome to the marvel that is the Unix underbelly of OS X.
Thanks to today's little exercise in LJ stupidity, I've started work on another tool that should do a little more than this one. I will announce when it's working :)
I came here by way of a long-ish chain of links, but I'm so glad I found this. You truly rock - this was so helpful. Worked like a charm!

One question - I did this for my personal LJ, but I'm not sure how to switch over to a community I maintain in the XJournal history view - is this possible?
Yay!
Note that I have a better tool in testing now. I will update & notify & inform the world when I'm happy with it. Definitely this weekend.