Poetry
I'm migrating my poetry in Serbian. Blogspot exported my blog as a single, hard-to-decipher XML file. It looks like this:
tag:blogger.com,1999:blog-4660798809155512499.archive 2022-06-23T05:33:46.459-07:00 le film français vydd https://www.blogger.com/profile/09137899300722374576 noreply@blogger.com Blogger tag:blogger.com,1999:blog-4660798809155512499.layout 2008-09-10T14:32:16.799-07:00 2022-06-23T05:33:46.459-07:00 Шаблон: le film français <?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html>
<html b:version='2' class='v2' expr:dir='data:blog.languageDirection' expr:lang='data:blog.locale' xmlns='http://w
?xml-stylesheet>
Very funny. Now I'm writing a Python script to extract my posts and updating this file manually using Emacs and TRAMP to edit raw HTML. That date above? C-u M-! date.
UPDATE: Take a look at this section to see how I might have done it much more easily using atoma.parse_atom_feed. The export turns out to be a proper Atom XML feed.
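Since the export is plain Atom, the posts can also be pulled out with nothing but the standard library. This is a minimal sketch that uses xml.etree.ElementTree instead of atoma, and it is not the script I actually ran. The function name extract_posts and the sample feed are made up for illustration. The "kind#post" category filter is my assumption about how Blogger separates real posts from template and settings entries, so check it against your own export.

```python
import xml.etree.ElementTree as ET

# Namespace used by every element in an Atom feed.
ATOM = "{http://www.w3.org/2005/Atom}"


def extract_posts(xml_bytes, kind_suffix="kind#post"):
    """Return a sorted list of (published, title, html) tuples.

    kind_suffix is an assumption: entries are kept only if one of their
    <category> elements has a term ending in this string.
    """
    root = ET.fromstring(xml_bytes)
    posts = []
    for entry in root.iter(ATOM + "entry"):
        terms = [c.get("term", "") for c in entry.findall(ATOM + "category")]
        if not any(term.endswith(kind_suffix) for term in terms):
            continue  # skip layout, settings and other non-post entries
        published = entry.findtext(ATOM + "published", default="")
        title = entry.findtext(ATOM + "title", default="")
        content = entry.findtext(ATOM + "content", default="")
        posts.append((published, title, content))
    return sorted(posts)


# A tiny made-up feed: one template entry and one real post.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>le film français</title>
  <entry>
    <category term="http://schemas.google.com/blogger/2008/kind#template"/>
    <published>2008-09-10T14:32:16.799-07:00</published>
    <title>Шаблон: le film français</title>
    <content type="text">template markup goes here</content>
  </entry>
  <entry>
    <category term="http://schemas.google.com/blogger/2008/kind#post"/>
    <published>2009-01-05T10:00:00.000-08:00</published>
    <title>Песма</title>
    <content type="html">&lt;p&gt;Први стих&lt;/p&gt;</content>
  </entry>
</feed>"""

if __name__ == "__main__":
    for published, title, html_body in extract_posts(SAMPLE.encode("utf-8")):
        print(published, title, html_body)
```

For the real file, read it in binary mode and pass the bytes straight in, so the parser picks the encoding from the XML declaration.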
This is not version controlled at this point. If it disappears, it disappears. I think it'd be interesting to make git commits whenever this file changes instead of committing to make the file change.
I hear that a new season of Futurama is coming out next year. But that's beside the point. I've managed to tame the export using Python. The script is the worst, but who cares, it's a one-shot. You can see the results here. Everything is in Serbian. Once the big migration is complete, I might decide to translate a few that I still think are good. Finally, I'm aware that special characters weren't imported with the correct encoding. That will be fixed some other time. UPDATE: It was just a matter of adding <meta charset="utf-8">.
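For anyone hitting the same garbled Cyrillic, here is a rough sketch of the encoding fix: declare the charset in the page head and write the file as UTF-8 so the two agree. The helper names render_page and write_page are hypothetical, and my one-shot script is not organized this way.

```python
import html
from pathlib import Path


def render_page(title, body_html):
    """Wrap an already-HTML post body in a minimal page that declares UTF-8."""
    return (
        "<!DOCTYPE html>\n"
        '<html lang="sr">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'  # without this, browsers may guess the encoding wrong
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        f"<body>\n{body_html}\n</body>\n"
        "</html>\n"
    )


def write_page(path, title, body_html):
    """Write the page to disk as UTF-8, matching the declared charset."""
    Path(path).write_text(render_page(title, body_html), encoding="utf-8")
```

The title is escaped because it is plain text, while the body is passed through untouched because the export already stores it as HTML.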
Next up: moving this to a proper git repo, then the blog posts I published on this domain.
Here's the script I used, maybe it helps if you decide to migrate your own blog.