vydd.space/2022/08/poetry

Poetry

I'm migrating my poetry in Serbian. Blogspot exported my blog as a single, hard-to-decipher XML file. It looks like this:

          
            tag:blogger.com,1999:blog-4660798809155512499.archive2022-06-23T05:33:46.459-07:00le film françaisvyddhttps://www.blogger.com/profile/09137899300722374576noreply@blogger.comBloggertag:blogger.com,1999:blog-4660798809155512499.layout2008-09-10T14:32:16.799-07:002022-06-23T05:33:46.459-07:00Шаблон: le film français<?xml version="1.0" encoding="UTF-8" ?>
                  <!DOCTYPE html>
                  <html b:version='2' class='v2' expr:dir='data:blog.languageDirection' expr:lang='data:blog.locale' xmlns='http://w
          
        

Very funny. Now I'm writing a Python script to extract my posts, and I'm updating this file by hand, editing the raw HTML in Emacs over TRAMP. That date above? C-u M-! date.

UPDATE Take a look at this section to see how I might have done it much more easily using atoma.parse_atom_feed. The export turns out to be proper Atom XML.
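Since the export is Atom, here is a minimal sketch of what pulling the posts out could look like. It uses the standard library's xml.etree.ElementTree rather than atoma, and it is not the script mentioned on this page. The names extract_posts, POST_KIND_SUFFIX and SAMPLE are made up for illustration, and the idea that Blogger marks real posts with a category term ending in "kind#post" is an assumption worth checking against your own export.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace (this part is standard Atom).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Assumption: Blogger tags real posts with a category term ending like this,
# while templates, settings and comments use other "kind#..." terms.
POST_KIND_SUFFIX = "kind#post"


def extract_posts(xml_text):
    """Return a list of (title, published, content) tuples, one per post."""
    root = ET.fromstring(xml_text)
    posts = []
    for entry in root.findall("atom:entry", ATOM_NS):
        terms = [c.get("term", "") for c in entry.findall("atom:category", ATOM_NS)]
        if not any(term.endswith(POST_KIND_SUFFIX) for term in terms):
            continue  # skip layout, settings and comment entries
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        published = entry.findtext("atom:published", default="", namespaces=ATOM_NS)
        content = entry.findtext("atom:content", default="", namespaces=ATOM_NS)
        posts.append((title, published, content))
    return posts


# A tiny hand-written feed, not real export data.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <category scheme="http://schemas.google.com/g/2005#kind"
              term="http://schemas.google.com/blogger/2008/kind#template"/>
    <title>Шаблон</title>
    <content type="text">ignored</content>
  </entry>
  <entry>
    <category scheme="http://schemas.google.com/g/2005#kind"
              term="http://schemas.google.com/blogger/2008/kind#post"/>
    <published>2010-01-02T03:04:05.000-07:00</published>
    <title>Песма</title>
    <content type="html">&lt;p&gt;стих&lt;/p&gt;</content>
  </entry>
</feed>"""

posts = extract_posts(SAMPLE)
```

Run against the sample, posts holds only the second entry, with the escaped HTML in its content already decoded back into markup.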

This is not version controlled at this point. If it disappears, it disappears. I think it'd be interesting to make git commits whenever this file changes instead of committing to make the file change.

News!

I hear that a new season of Futurama is coming out next year. But that's beside the point. I've managed to tame the export using Python. The script is the worst, but who cares, it's a one-shot. You can see the results here. Everything is in Serbian. Once the big migration is complete, I might decide to translate a few that I still think are good. Finally, I'm aware that special characters weren't imported with the correct encoding. That will be fixed some other time. UPDATE It was just a matter of adding <meta charset="utf-8">.
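For anyone hitting the same encoding problem, here is a rough sketch of the fix in Python. It assumes each post becomes a standalone HTML file written as UTF-8. The names render_page and save_page are hypothetical and not taken from the actual script. The point is that the charset declaration in the head has to match the encoding used when the file is written.

```python
import html
from pathlib import Path


def render_page(title, body_html):
    """Wrap a post body in a minimal HTML page that declares UTF-8."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        # Without this line a browser may guess the wrong encoding,
        # and Cyrillic text can show up as mojibake.
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{body_html}\n"
        "</body>\n</html>\n"
    )


def save_page(path, title, body_html):
    """Write the page to disk using the same encoding the meta tag declares."""
    Path(path).write_text(render_page(title, body_html), encoding="utf-8")


page = render_page("Песма & стих", "<p>Ћирилица</p>")
```

The title is escaped because it is plain text, while the body is passed through untouched because it is assumed to already be HTML from the export.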

Next up: moving this to a proper git repo, then the blog posts I published on this domain.

Here's the script I used, maybe it helps if you decide to migrate your own blog.

I've also found a fun snippet online to display my current progress: Blinking cursor