Antennapedia
Books, swords, and tweed.
Extracting LJ entries into local files, OS X edition 
30th-May-2007 12:32 pm
pc/mac
ETA 2007-08-03: I have a newer, betterer tool in progress. This tool will a) back up locally, and b) optionally migrate your posts. I need testers. If you're reasonably comfortable with the shell and running commands in the Terminal, please hop over there and give it a shot. Thanks!

Xjournal is an OS X client that has a "download all LJ posts" feature. Go to the History view (command-1), refresh the history, then click the "download" button. Wait a few minutes while it chugs and grabs all your posts.

This is handy. However, Xjournal stores the data in an awkward place: every post ends up in the single file ~/Library/Application Support/Xjournal/History.plist. This is a property list file, an XML document type that OS X apps use for all sorts of things. It's not very convenient for humans, though, if you want to do something else with your post data.
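If you're curious what a property list looks like from Python, here's a minimal sketch. It assumes a modern Python 3, where plistlib.load exists, and it assumes nothing about what Xjournal actually stores inside the file, so it only reports the top-level shape. The describe_plist helper is a name I made up for illustration.

```python
import plistlib
from pathlib import Path

# Location Xjournal uses, per the paragraph above.
HISTORY = Path.home() / "Library/Application Support/Xjournal/History.plist"

def describe_plist(path):
    """Return a short description of the top-level object in a plist file."""
    with open(path, "rb") as f:
        data = plistlib.load(f)
    if isinstance(data, dict):
        return "dict with keys: " + ", ".join(sorted(data))
    if isinstance(data, list):
        return "list with %d items" % len(data)
    return type(data).__name__

if __name__ == "__main__":
    if HISTORY.exists():
        print(describe_plist(HISTORY))
    else:
        print("No History.plist found at", HISTORY)
```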

I have written a small python script that shows how you might extract the data and save it into individual files: extractXJournal.py. Download this, save it somewhere, and make it executable. That is, launch Terminal, cd to the folder where you saved the script, and type:
chmod a+x extractXJournal.py
Then run it, also from a terminal window:
./extractXJournal.py

It'll extract the entries into the current directory, inside a file hierarchy like "year/month/date", with the files named for the post ids.
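The layout amounts to building a path from each post's date and id. Here is a rough sketch of that idea, not the script's actual code: entry_path and save_entry are hypothetical helpers, and the .html extension and zero-padded folder names are my assumptions.

```python
import os
from datetime import datetime

def entry_path(root, posted, post_id):
    """Build root/year/month/day/<post_id>.html for one entry."""
    return os.path.join(
        root,
        "%04d" % posted.year,
        "%02d" % posted.month,
        "%02d" % posted.day,
        "%s.html" % post_id,
    )

def save_entry(root, posted, post_id, body):
    """Write one entry's body into the date-based hierarchy."""
    path = entry_path(root, posted, post_id)
    # Create the year/month/day folders if they are not there yet.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(body)
    return path
```

Calling save_entry(".", datetime(2007, 5, 30), 918, "<p>hello</p>") would create the folders under the current directory as needed and write the file there.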

This is a five minute hack job, and there might be bugs. Tell me about them here and I will fix them immediately. It did, however, extract all my journal entries nicely into html files viewable in a browser. ljdump is another Python option to download directly. However, its file naming convention was annoying, and its output is also XML. This is handy for further scripting, but not so handy for your browsing.

ETA: Okay, we've had two people with very different environments run this successfully, so I think we've got the bugs squashed. Thanks to [info]elementalv and [info]memoryfloodsin for their help with testing!
ETA2: The latest version of the script converts lj user & comm tags (but NOT cut tags) to working links. It also handles illegal characters like the Windows heart character.
ETA3: The latest version (10:30am PDT May 31) also generates an index file at the top level, listing all your posts in a spectacularly ugly way.
ETA4: It's possible to modify ljdump to do something kinda neat: versions of your posts with comment threads. It's a bit of work, and I realize that the urgency level of the problem is no longer what it was yesterday. But I think I might do it. YNK when the next apocalypse is gonna come.
ETA5: Running an older version of OS X? Try extractForOldMacs.py instead. Probably won't work.
ETA6: Charm is another python command-line client that might work to archive your posts in one step. It successfully archived my personal journal, but failed several times to download all of the posts in a community I mod. So mileage may vary.
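For anyone wondering how the tag conversion in ETA2 might work: it boils down to a regular-expression substitution. This is a sketch under stated assumptions, not a copy of what the script does. The link_lj_tags name is hypothetical, and the two URL patterns are my guess at reasonable link targets, so change them if they don't suit.

```python
import re

# Matches <lj user="name"> and <lj comm="name">, with or without quotes.
# Cut tags (<lj-cut ...>) do not match because there is no space after "lj".
LJ_TAG = re.compile(r'<lj\s+(user|comm)\s*=\s*["\']?(\w+)["\']?\s*/?>', re.IGNORECASE)

# ASSUMPTION: illustrative URL patterns, not taken from the script.
URLS = {
    "user": "http://www.livejournal.com/users/%s/",
    "comm": "http://www.livejournal.com/community/%s/",
}

def link_lj_tags(html):
    """Replace lj user/comm tags with ordinary links; leave everything else alone."""
    def repl(match):
        kind, name = match.group(1).lower(), match.group(2)
        return '<a href="%s">%s</a>' % (URLS[kind] % name, name)
    return LJ_TAG.sub(repl, html)
```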
Comments 
30th-May-2007 09:25 pm (UTC)
You are made of awesome and win, m'dear. (Dang it, I just deleted my "Willow prefers Macs" icon a couple days ago :( )
31st-May-2007 06:27 am (UTC)
:) Nice to have one's profession turn out to be handy for the fandom!
30th-May-2007 09:47 pm (UTC)
I get the following when I try to run your script:

Parsing entries from History.plist.
Bad plist file:
'module' object has no attribute 'readPlist'
30th-May-2007 11:18 pm (UTC)
Thanks for the report. I have duplicated the problem. I, being a big python geek, am running 2.5, and you, not being a geek, are running 2.3. Fix in progress...
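This kind of breakage comes back with every Python generation, so a defensive loader just picks whichever call exists. The sketch below is not the fix that went into the script. It is written against today's Python, where the situation has flipped: plistlib.load is the current call (Python 3.4 and later) and readPlist has been removed from recent Python 3 releases. read_plist is a hypothetical helper name.

```python
import plistlib

def read_plist(path):
    """Load a plist with whichever plistlib call this Python provides."""
    if hasattr(plistlib, "load"):
        # Python 3.4 and later: load() takes an open binary file.
        with open(path, "rb") as f:
            return plistlib.load(f)
    if hasattr(plistlib, "readPlist"):
        # Older Pythons: readPlist() accepts a path.
        return plistlib.readPlist(path)
    raise RuntimeError("this Python's plistlib has no reader this sketch knows about")
```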
30th-May-2007 10:07 pm (UTC)
First: you're made of awesome! Thank you for sharing this.

I have, however, a slight problem. Whenever I want to execute the file, I get the error message:

Parsing entries from History.plist.
Bad plist file:
'module' object has no attribute 'readPlist'

I have no idea where the problem is, and it's been years since I dug into any code (much less code that isn't mine). So... any suggestions about what I'm missing would be great!
30th-May-2007 11:19 pm (UTC)
I've duplicated the problem and have a fix in progress. Update in a few minutes when I have it working (it's a python version issue). Thanks for the report!
30th-May-2007 10:48 pm (UTC)
Thank you so much for this! I do not use Terminal often, but I'll watch here to see if people are having success with it and give it a try if it's favorable. You're a gem!
30th-May-2007 11:57 pm (UTC)
Worked for [info]elementalv just now, so I have one report of success from somebody who isn't me :) :)
30th-May-2007 11:32 pm (UTC)
I've recently (as in about two weeks ago) moved to Mac. I'd like to be able to use this, I've got Xjournal downloaded and history is saved as a plist thingy, but what is python, is this another program?
Sorry if this is completely stupid, but I really am a total newbie when it comes to Macs.
30th-May-2007 11:51 pm (UTC)
Python is a scripting language, like Perl. It comes with every Macintosh, built-in, and you don't need to do anything to get it installed. You get it for free because secretly, underneath the nice user interface, OS X is Unix and is quite similar to Linux in many ways.

The only truly geeky thing you need to be able to do to run this little tool is run an app called "Terminal". This gives you a command-line view of what's going on with your Mac.

- Download this script. Move it to your desktop.
- Run Terminal.
- In the "shell" (that is, the command-line window with the prompt visible), type
cd ~/Desktop
Exactly like that. Then hit return.
- Then type: chmod a+x extractXJournal.py
- Then type: ./extractXJournal.py
- Watch for a few seconds while it tells you what it's doing. When I run it, I see this:

antenna@jetboy:Archive> extractXJournal.py
Parsing entries from History.plist.
Extracting entries for 2006 ...
     March ...
     April ...
     May ...
     June ...
     July ...
     August ...
     September ...
     October ...
     November ...
     December ...
Extracting entries for 2007 ...
     January ...
     February ...
     March ...
     April ...
     May ...
918 entries found in total.

- On your desktop, you should have a folder for each year you've been keeping your journal. Inside each year folder are folders for months with entries, then days, then finally! individual entries. Double-clicking on an entry should show you the oh-so-not-pretty extracted post data.
31st-May-2007 03:15 am (UTC)
Yay! Thank you so much! This is a great little tool, and I had no problems running it.
31st-May-2007 07:32 am (UTC)
Yay!
31st-May-2007 04:35 am (UTC)
Thank you so much for this! It worked wonderfully!
You wouldn't happen to know of a backup tool that grabs comments, would you? It's not really important, but I was just curious.
31st-May-2007 08:09 am (UTC)
LJdump will do this, but again, not conveniently the way you'd like to browse them. Have been pondering a post-processing pass for that data as well.
31st-May-2007 05:18 am (UTC)
Hi! Thanks for this, I found it through another post... thanks a lot, I'm currently downloading my posts but the server resets every time, and the message I get says it's because the network is too busy!
But thanks, I hope I manage to run the script alright!
31st-May-2007 05:29 am (UTC)
This worked great, thanks.
31st-May-2007 08:12 am (UTC)
Excellent!
31st-May-2007 05:38 am (UTC)
Thanks so much! I think I've used Terminal about 2x now... so this is easy enough for even the sadly non-techie-types to use.
31st-May-2007 08:17 am (UTC)
Excellent! I'm a happy camper tonight.
31st-May-2007 06:08 am (UTC)
Hi! I found you via [info]catrinella's post here.

You are amazing and wonderful to put this together and provide it to random passers-by. Thank you! I think the best part is that I got to use Terminal, which made me feel kind of geeky and technosavvy. Whee!

An odd thing I noticed is that, when a post originally referred to a LJ user (e.g., "[info]ardent_muses"), I get a blank space in the text instead of the name. I'm guessing that's something to do with the Xjournal download, because it happens in the Xjournal interface (when I review the history) as well. Or perhaps I'm just cursed among LJ users. :)

I am thrilled to have my posts backed up and readable. Thank you again!
31st-May-2007 06:34 am (UTC)
For what it's worth, the HTML code does show the LJ user's name, but the browsers just don't seem to know what to do with the Lj user "tag". The nice thing is that it's still all there in the code. :)
31st-May-2007 06:29 am (UTC)
Hi *^_^* I got something I haven't seen mentioned here yet....

Parsing entries from History.plist.
Bad plist file:
not well-formed (invalid token): line 4317, column 14

Thought I'd mention it. Please. Pretty please. #^___^#
31st-May-2007 08:12 am (UTC)
Got a fix in progress for you. Check back tomorrow morning. Er, my time, which is US Pacific time.
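One common cause of an "invalid token" error like the one above is a raw control character sitting in the XML, which XML 1.0 does not allow. I'm assuming that's what is happening here; I can't tell from the error alone. Under that assumption, a sketch of a workaround is to strip those bytes before parsing. load_plist_leniently is a made-up name, and this only makes sense for XML plists, not binary ones.

```python
import plistlib
import re

# Bytes below 0x20 are not allowed in XML 1.0, except tab, newline,
# and carriage return.
ILLEGAL = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def load_plist_leniently(path):
    """Parse an XML plist after dropping control bytes that break the XML parser."""
    with open(path, "rb") as f:
        raw = f.read()
    # ASSUMPTION: the file is an XML plist; stripping bytes would corrupt a binary one.
    return plistlib.loads(ILLEGAL.sub(b"", raw))
```

Note that this throws the offending characters away rather than translating them, so a heart typed on Windows simply vanishes from the post.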
31st-May-2007 06:39 am (UTC)
Thank you for writing this.

I've been fighting XJournal all day (with a brief stop to try and use LJDump, which can't ever seem to find its config file), and so far no luck. The bloody thing keeps crashing on me, which is why I suspect it's the problem when extractXJournal gave me this:

Parsing entries from History.plist.
Bad plist file:
not well-formed (invalid token): line 99, column 52

I'm going to delete the History.plist files and try that again.

31st-May-2007 06:44 am (UTC)
Okay, deleted History.plist. Problem def. on the XJournal side. It seems that the download history feature doesn't support multiple users/journals well, so it'd gotten confused/corrupted when I'd tried to download a second journal's history earlier in the day.

Deleted original file. Tried to get my main journal (this one) which returned a server busy error (haha, like I'm surprised that other paid users are bailing too), so I tried the other one which is unpaid, on another server, yadda yadda.

Downloaded all 54 entries from the second journal, and extracted them without a problem. I made a folder in my home dir with the journal name, and then dropped the Xjournal-created History.plist file in for good measure (and to get it out of the way for the next run).

Hopefully people don't have to do this for every journal they might have - I'm hoping that this is a problem with my particular machine and her various unique issues.
31st-May-2007 08:55 am (UTC)
When I try to download 'extractXJournal.py' it's coming up as a text file. Is it something I'm doing wrong or have you changed the link?

Also, thanks for letting me know about 'Terminal.'
31st-May-2007 04:38 pm (UTC)
Try this link instead. I did that deliberately so people could take a look at the script, but probably it's more annoying than helpful.
31st-May-2007 12:57 pm (UTC)
Thank you so much for this script! It worked like a charm!!
4th-Jun-2007 05:13 am (UTC)
Yay! You're so welcome! (soooo slowly catching up with comments :))
31st-May-2007 01:06 pm (UTC)
+Mem this entry for future reference. I've already used it, but I can't wait for you to add in the LJ-Cut tags as a working link. :D Great job!
1st-Jun-2007 06:16 pm (UTC)
Yay thanks! (And boggle at how slow the comment notifications are...)
1st-Jun-2007 02:55 am (UTC)
A day late, and apparently too dim to run Xjournal. It's open, I hit Refresh in History, and yet... Nothing. I've tried running via Safari and Firefox. Have I missed something?

Thanks in advance for your assistance. I feel like I'm crashing a party and haven't brought anything for the host/ess.
1st-Jun-2007 03:08 am (UTC)
First, make sure you're logged in. Does the "write a new entry" window say you're logged in, show you a user pic, and all that happy stuff? Command-N, then take a look.

Then, Window > History (or command-1) to bring up the history window. The "Refresh" button updates Xjournal's info about what you've written when. Then "Download" fetches all the post contents. You'll see a progress pane saying "Downloading history, 62 of 922" or whatever your numbers are. That might take a few minutes to chug through, depending on how much you've written and how busy LiveJournal is.

Then the next step is where it gets geeky. Quit Xjournal, then run the script as per the instructions above. This thread has the ultra-detailed instructions if you're new to the ways of the Terminal and the unix shell. But to recap:

Download the script.
Move it to your Desktop.
Run Terminal. (Applications, Utilities, Terminal.app)
In a terminal window, type cd ~/Desktop then press return.
Type: chmod a+x extractXJournal.py
Type: ./extractXJournal.py
It'll run for a few seconds, telling you what it's extracting as it goes.
On your desktop you'll now see some folders and a file named index.html. Double-click that, and zoom! Your posts!

Ideally that's what happens anyway. :)
5th-Jun-2007 04:41 am (UTC)
Do I need anything special, other than OS X, to run a python script?
5th-Jun-2007 04:45 am (UTC)
Nope! Python comes with OS X, right out of the box. All you need to do is download & run XJournal, and download and run the script.

You will have trouble with this script if you're running a very old version of OS X, because the version of python will be so old it won't have some libraries the script uses. 10.2.8 is too old, for instance. If you're running 10.4, you're golden.
20th-Jun-2007 09:16 pm (UTC)
I downloaded those files you posted... I tried to read them as well as what you posted, and I confess I understand nothing at all... I tried to do the command-1 thing, but I don't know how it can be done, since I have an ibook G4 that has no numeric pad... I'll try again later, but I confess this makes me O.O.
20th-Jun-2007 09:41 pm (UTC)
A numeric pad shouldn't be necessary: just the plain old number row across the top. The squiggle/command/apple key shortcut is just a shortcut for a menu item, anyway: if you browse through your menu you should see it. But, er, it's a bit confusing if you're not used to using the Terminal to do stuff, I agree.
10th-Jul-2007 07:29 pm (UTC) - you do iz besz
Anonymous
Hi

Looks good! Very useful, good stuff. Good resources here. Thanks much!

Bye
12th-Jul-2007 03:55 pm (UTC) - Thanks much!
Anonymous
Hi all!

Looks good! Very useful, good stuff. Good resources here. Thanks much!


Bye
20th-Jul-2007 12:44 am (UTC)
Thanks so much for the link to XJournal, and sorry for coming into your journal just to ask a question, but I'm having a problem with the script - every time I try to type "chmod a+x extractXJournal.py" or "./extractXJournal.py", I get a message saying "No such file or directory". Am I doing something wrong?
20th-Jul-2007 02:27 am (UTC)
In the Terminal, you're not in the same directory as the one you put the script in. Er. The command pwd tells you where you are in the terminal shell. If you're not where the script is, you can type cd blah, where "blah" is the folder you stuck the script file into. "cd" is short for "change directory", if that helps.

Welcome to the marvel that is the Unix underbelly of OS X.
3rd-Aug-2007 06:06 pm (UTC)
You rule! Thank you so much for doing this!
3rd-Aug-2007 06:20 pm (UTC)
Thanks to today's little exercise in LJ stupidity, I've started work on another tool that should do a little more than this one. I will announce when it's working :)
3rd-Aug-2007 08:56 pm (UTC)
I came here by way of a long-ish chain of links, but I'm so glad I found this. You truly rock - this was so helpful. Worked like a charm!

One question - I did this for my personal LJ, but I'm not sure how to switch over to a community I maintain in the XJournal history view - is this possible?
4th-Aug-2007 12:09 am (UTC)
Thank you so much for writing this (and for the tip to point Terminal at the desktop above) - it worked like a charm for me!
4th-Aug-2007 12:38 am (UTC)
Yay!
Note that I have a better tool in testing now. I will update & notify & inform the world when I'm happy with it. Definitely this weekend.