
Lisp Again!

I've been a sucker for Common Lisp ever since I learned it. When I started the blog, I chose Coleslaw, a static blog generator written in the language. It worked great for writing blog posts, but when I started migrating my complete online presence here, it suddenly felt like an unnecessary constraint. I wanted to be able to customize the experience, create private areas of the website, have full control. This is also known as the Lisp Curse - even if Coleslaw might have supported everything I needed, possibly with some tweaking, it was hard not to fantasize about basing the site on my own blogging solution. Knowing about the curse, I counterintuitively tried ignoring Lisp, and decided to build functionality piece by piece, using the tools most appropriate for the task at hand. For the OpenID Connect integration I used Python, as good libraries were already available, and that worked fine for a while. I was writing more, and it seemed that I had control. But then I hit a bump in the road. In the interest of speed and developing the blog in the leanest way possible, I had been appending every post to the same single HTML file. Blasphemy, right? But I'm glad I did it, because it helped me write quickly and write a lot. When it was time to split the posts into individual files, I naively assumed that HTML5 would come to the rescue, because surely there would be a way to import pages by now. There is and there isn't. So I had to decide whether to go back to using a static site generator, start using a standard platform like WordPress, or write my own thing. Obviously, I decided on the last.

What you are seeing now is rendered through semistatic - two hundred lines of code for something in between a static site generator and standard blogging software. I'm going to try to stabilize it a little in the coming days before putting it online, but here's what I tried designing for:

  1. I should still be able to write standard HTML pages for my blog posts. If it turns out I want to create a post with a non-standard structure - a microsite, really - it should come with zero overhead.
  2. In the unlikely event any of the posts I write becomes popular, semistatic should be smart enough to route viewers to the original static version.
  3. The software should be smart enough to do the basic things that static generators do: aggregate pages, create feeds, group posts by tags. It should also be able to resize images, apply common styling, and in general work seamlessly with static data without requiring any special syntax. It should support transforming posts in any way imaginable: syntax highlighting, math formulas - which Coleslaw can do, by the way. (I'll sketch what I mean by a transform right after this list.)
  4. Where it gets interesting is that I want to support functionality that has to be backed by a dynamic service. If I want to restrict access to certain posts as before, that should be possible as well. If I want to show how many people are viewing a post at the same time, or support Medium-style highlights, or ActivityPub - that too.
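
To make the transform idea concrete, here's a simplified sketch - illustrative names only, not the actual semistatic code - where a transform is just a function from HTML to HTML, and rendering threads a post through every registered transform:

    # Illustrative sketch, not the actual semistatic code: a "transform"
    # is a function from HTML string to HTML string, and rendering a post
    # means threading it through every registered transform in order.
    from typing import Callable, List

    Transform = Callable[[str], str]
    TRANSFORMS: List[Transform] = []

    def transform(fn: Transform) -> Transform:
        """Register a post transform."""
        TRANSFORMS.append(fn)
        return fn

    @transform
    def common_styling(html: str) -> str:
        # Example transform: inject the site-wide stylesheet.
        return html.replace("<head>", '<head><link rel="stylesheet" href="/common.css">', 1)

    def render(html: str) -> str:
        for fn in TRANSFORMS:
            html = fn(html)
        return html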

Exciting, right? Let's find out how bad of an idea this was while I continue migrating stuff.

Unmanageable

The only thing I've done for the website in the past month was writing the Aurora travel log, which still stands unfinished. This is OK; there's not much point in having a website if there is no content. What bothers me, however, is how quickly the whole site has become unmanageable. On the one hand, that was to be expected, since I was just trying to pump out as much as I could as quickly as possible, but still, using a single file for this feed and another for the blog is making edits uncomfortable. This became very evident when I was trying to make RSS work. Ideally, I would have a separate file for each post - a clean sheet every time I try writing something new - but how do I show multiple posts on the same page using only HTML then? There doesn't seem to be a good way to do this. HTML imports were briefly supported in Chrome, but that - to me very obvious - feature didn't get much traction with other browsers, so Chrome killed it too. I really don't want to depend on server-side generation of content, so I think I'll have to accept that there's going to be only a single post per page for now.

This is the plan of action:

  1. Move blogposts, the travel log, and optionally poetry to individual pages.
  2. Use semantic HTML so the posts can be parsed more easily for syndication.
  3. Keep this feed and the index as a single page for now.

By the way, I removed the notice related to the old content.

Feed

Writing about this website has made me write more in general these past couple of days. Sharp-eyed readers will notice a new blog post on the topic of walled gardens, and another story which I'll share when it's finished. This new situation - which is making me extremely happy, by the way - made me think about how I might publicize the newly created content. It seems that the easiest way to start would be to use Atom or RSS. I believe both are usable by the people who consume that kind of thing, but I do want to look at each in a bit more detail to see which one I prefer as a producer. Due to the way I'm updating this space, I'm probably going to need a custom solution to create the feeds. Concretely, I'm posting content as I write it and planning to continue doing so, so I'll need to find a way to syndicate updates only when they are done. Optionally, I might also spend some time on pushing live updates to the website, at least the way JIRA (which I dislike intensely) does it, where a notification pops up every time there's an update, but the current view doesn't actually change.

It seems that Atom is more modern, but that it doesn't support something called <enclosure>, which is apparently useful for podcasts. I'm not sure I'm going to be doing any of those in the near future, so let's do Atom first. Maybe supporting both later won't be hard.
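
A minimal Atom document is also small enough to generate by hand. Something like this standard-library sketch - the structure is the bare minimum Atom requires, but the function and its names are just mine, untested against real readers - would probably do for a start:

    # Sketch of hand-rolled Atom generation using only the standard library.
    from datetime import datetime, timezone
    from xml.sax.saxutils import escape

    def atom_feed(title, site_url, entries):
        """entries: list of (title, url, updated_rfc3339, html_content) tuples."""
        items = []
        for entry_title, url, updated, content in entries:
            items.append(f"""
      <entry>
        <title>{escape(entry_title)}</title>
        <link href="{escape(url)}"/>
        <id>{escape(url)}</id>
        <updated>{updated}</updated>
        <content type="html">{escape(content)}</content>
      </entry>""")
        now = datetime.now(timezone.utc).isoformat()
        return f"""<?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>{escape(title)}</title>
      <link href="{escape(site_url)}"/>
      <id>{escape(site_url)}</id>
      <updated>{now}</updated>{''.join(items)}
    </feed>"""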

I was looking into libraries I could use, just to play with this, and found atoma, which is used for parsing. That turned my attention back to how I'd migrated my poetry. What I encountered then was weird XML which I ended up parsing forcefully - but what if I had used this library? It's amazing how short short-term memory is, because it turns out I had actually installed it for that project, but when I tried it out, it failed:

    >>> import atoma
    >>> atoma.parse_rss_file('backup.xml')
    ...
    atoma.exceptions.FeedParseError: Cannot process RSS feed version "None"

I'm sure you can see my error here. I tried parsing the file as RSS, instead of using parse_atom_file to parse my Atom backup. Here's what the official README says about parsing at the time of writing:

    # Load and parse an Atom XML file:

    >>> import atoma
    >>> feed = atoma.parse_rss_file('rss-feed.xml')

Well, this is simply not true, and I just posted a PR to fix the docs. Atoma still seems like a great library, but the docs need some work. Finally, it doesn't seem to have support for Blogger drafts, so parsing the feed manually turned out to be useful in the end.

So this was one huge distraction. Let's see what needs to be done to actually export the feed.

I stopped myself from adding one of those "5 hours later" SpongeBob SquarePants intermissions at this point. It seems that one of the cleanest ways to do this would be to start using microformats2 when writing these posts, and then have a script look for changes in the HTML at regular intervals. I'm going to create a mock document and see how easy it really is before committing to using it on this and other pages.
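
The mock test could be as simple as this - a sketch that queries the microformats2 classes with BeautifulSoup (a dedicated mf2 parser would be the proper tool, but this is enough to get a feel for it):

    # Sketch: query microformats2 classes from a mock post with BeautifulSoup.
    from bs4 import BeautifulSoup

    MOCK = """
    <article class="h-entry">
      <h1 class="p-name">A mock post</h1>
      <time class="dt-published" datetime="2022-06-23">2022-06-23</time>
      <div class="e-content"><p>Hello, feeds.</p></div>
    </article>
    """

    soup = BeautifulSoup(MOCK, "html.parser")
    for entry in soup.select(".h-entry"):
        title = entry.select_one(".p-name").get_text()
        published = entry.select_one(".dt-published")["datetime"]
        content = entry.select_one(".e-content").decode_contents()
        print(published, title, content)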

Instagram, part 2

End of detour. Now that the authentication mechanism is up and running, I can start migrating Instagram images. Conveniently, the export contains a folder with photos partitioned into folders for each month, and a JSON file containing references to all the photos, grouped by post and tagged with a timestamp. Less conveniently, bandwidth costs are scary. My setup on Vultr comes with 1 TB of bandwidth. That might sound like a lot to some, but the 250MB of images I'm planning to host can burn through that very quickly. Moving hosting providers now seems like too much yak shaving, so I won't be doing it yet, but it's definitely something I'm going to need to look into. Hetzner, for example, has a very cheap plan offering 40x more bandwidth. Tempting. I'm starting to wonder if the reason Instagram presents images in a feed and aggressively tries to entwine them with ads is a fix for bandwidth costs. I'm also realizing how wonderful it would be if people could advise me in comments here, but the comment system is something else I'll need to leave for later.
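
For the migration itself, grouping photos into stories should be a simple loop. I haven't opened the JSON yet, so the schema in this sketch is just my assumption of roughly what's in there, to be corrected against the real file:

    # Sketch for walking the Instagram export. The JSON schema below
    # ("photos", "taken_at", "path") is an assumption, not the real format.
    import json
    from collections import defaultdict

    with open("media.json") as f:
        export = json.load(f)

    photos_by_month = defaultdict(list)
    for photo in export["photos"]:      # assumed: a list of photo records
        taken_at = photo["taken_at"]    # assumed: an ISO timestamp string
        photos_by_month[taken_at[:7]].append(photo["path"])

    for month, paths in sorted(photos_by_month.items()):
        print(month, len(paths), "photos")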

Let's assume for now that my Instagram photos are not going to be downloaded 4000 times every month, and just jump to the creative part. I was thinking about displaying images similarly to how it's done in iOS Photos: all photos in one giant, zoomed-out grid. Selecting an image would present it on the screen in full - either in a side panel, or overlaid above everything else.

Ok, scratch that. Why would anyone want to navigate a wall of photos lacking description? Maybe for the kind of fix Instagram gives, but really, who cares. I'm going to write stories around photos I posted to Instagram, starting from the most recent events. If there are photos that seem more private, I'll just move those to the restricted area and link from the main content.

Why can't I open sftp://vydd@vydd.space through Finder? I just downloaded Cyberduck and Chrome decided to open the archive in Calibre. Sometimes using computers is a nightmare.

OpenID Connect

I've decided to use OAuth 2 and OpenID Connect to restrict access to certain parts of the website. Also, I've started a new section because this won't have much to do with Instagram - it's going to be a general mechanism that I'm hoping to reuse across the site. Here's the flow I have in mind:

  1. Restricted parts of the website can be full pages, or something as granular as a single image or a file.
  2. When a Visitor tries to access such a restricted part of the website, they are presented with the well-known "log in with google / facebook / twitter / what have you" form.
  3. After completing the flow, the Visitor is put on the waitlist, of which I'm notified.
  4. I can then decide if I want to let the person in or not. If I do, they get an email notifying them that they are no longer on the waitlist, and they can see the content.

This should sound familiar to anyone who's ever requested access to a Google Doc.

It seems simplest to start with a well-maintained auth library such as authlib, so that's what I'm going to do, at least for the prototype. Going forward, I would like to invest in rebuilding that in Common Lisp. I'm also going to use this tutorial to set everything up.

To start, the program is going to be simple. It will keep a list of ids retrieved from OIDC providers in memory, with a backup in a single text file, and cookies will be used to store the encrypted session data needed to decide whether access restrictions apply. I haven't used cookies in a while, what with the whole world moving to "stateless", so maybe this is not how it should be done. I'm expecting to run into problems with session durations and the possibility of sessions being stolen, but that bridge will be crossed later.

Authlib seems really nice. The example they give for integrating OAuth 2.0 login with Flask just works. I've created a page here which I'm going to use for testing. Feel free to take a look while you can!
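
For reference, the core of it is roughly this - condensed from their Google example, so treat it as a sketch: the client id and secret come from the Google API console, and newer authlib versions put the parsed ID token under token['userinfo']:

    # Condensed shape of the authlib + Flask OpenID Connect login.
    from flask import Flask, session, url_for
    from authlib.integrations.flask_client import OAuth

    app = Flask(__name__)
    app.secret_key = "change-me"  # session cookies are signed with this

    oauth = OAuth(app)
    oauth.register(
        name="google",
        client_id="...",      # from the Google API console
        client_secret="...",
        server_metadata_url="https://accounts.google.com/.well-known/openid-configuration",
        client_kwargs={"scope": "openid email profile"},
    )

    @app.route("/login")
    def login():
        return oauth.google.authorize_redirect(url_for("auth", _external=True))

    @app.route("/auth")
    def auth():
        token = oauth.google.authorize_access_token()
        session["email"] = token["userinfo"]["email"]
        return f"Hello, {session['email']}"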

Sometimes I feel like I understand Git, and then it just knocks me down with its dark magicks. Why is something like git config receive.denyCurrentBranch updateInstead regarded as sane? Have a look at the documentation, then think about the time your team fought over naming, and be glad none of your teammates suggested anything as ridiculous as this. Why am I talking about Git arcana? I just want to push the auth server code to the server.

In other news, old tricks for quick development cycles still work. Whenever I need to integrate with webhook APIs, or with something like the OAuth 2.0 provider I'm working against now, I just make an SSH tunnel using ssh -N -R 8000:127.0.0.1:5000 vydd.space, and let nginx forward the traffic I want to handle to 8000. Easy.

Not much progress today. I've updated Python on this server to 3.8 - anything more recent needs an update to the operating system itself. I've also written a script for manual user approval. It's nothing spectacular; it outputs the emails of people who are in the waiting state, and approves those whose emails I type in.
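
It's short enough to sketch from memory - something along these lines, with the single-text-file "database" from before (the file layout here is illustrative, not the actual format):

    # Sketch of the manual approval loop. The "database" is the single
    # text file of ids mentioned earlier; the layout here is illustrative.
    USERS_FILE = "users.txt"  # one "email state" pair per line

    def load():
        with open(USERS_FILE) as f:
            return dict(line.split() for line in f if line.strip())

    def save(users):
        with open(USERS_FILE, "w") as f:
            for email, state in users.items():
                f.write(f"{email} {state}\n")

    users = load()
    print("\n".join(e for e, s in users.items() if s == "waiting"))

    for email in iter(input, ""):  # an empty line ends the session
        if email in users:
            users[email] = "approved"
    save(users)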

Following is working! Now I need to set up the auth server service so that it always runs in the background and gets restarted when it inevitably crashes. Gunicorn has extensive deployment docs, so I'm just going to try to copy those. If it wasn't obvious already, I would completely fail at this without a search engine.
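
For the record, the unit file ends up being tiny - roughly this, adapted from the Gunicorn deployment docs (the paths are illustrative):

    # /etc/systemd/system/space.vydd.auth.service - paths are illustrative
    [Unit]
    Description=vydd.space auth server
    After=network.target

    [Service]
    User=vydd
    WorkingDirectory=/home/vydd/auth
    ExecStart=/home/vydd/auth/venv/bin/gunicorn -w 2 -b 127.0.0.1:5000 app:app
    Restart=always

    [Install]
    WantedBy=multi-user.target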

I managed to stop myself from adding a random image depicting success. You are going to get a random code dump though:

    $ service space.vydd.auth status
    ● space.vydd.auth.service - vydd.space auth server

This means that my auth service is happily serving auth logic, supervised by systemd. Remember that page I linked before? If you try accessing it now, you'll be greeted with a rudimentary follow page that will let you use your Google account to get on the waiting list. As there is no notification system set up yet, you will need to notify me manually that you are on the list. While you are waiting to see the page, I'll continue working on migrating photos from Instagram.

Blog

Restoring the blog posts should be pretty easy. I'm not sure what to do with tag links, though. For now, it might be easiest to just remove them. Eventually, however, I would like to be able to add tags to every part of the content and deeply link as much content as possible. Sidenote: there is something about Project Xanadu that I feel is missing from the everyday web experience.

Looking at how little content I have, I think I'll just go ahead and migrate it manually.

Manual work is the worst, but I would've probably spent more time trying to write a script to do it. Good news again: the blog is up.

Git

Ordering posts is weird. I've decided to present each step as a thread of events in ascending chronological order, but to order the steps themselves in reverse. This way, the newest part of the migration story is always at the top, while its contents can be read in order, as if the dates weren't there.

With the old blog, I didn't like how I had to use git to create posts. Each time I made a commit, I had to wait a considerable amount of time for the git hook that triggered static site generation to complete. It felt worse than compiling code. What I'm doing right now, however, is instantaneous, and ideally it will stay that way. Here's what I have in mind:

  1. I will keep adding content directly on the server.
  2. There will be a job that watches for changes in the directory. If there are any, it will commit to the local git repository.
  3. To make backups, I'll have the job push to a remote - GitHub, for example.
  4. In the hopefully not too far future, I'm going to make an entry mechanism for types of content I'm working with often, and have that replace my manual HTML work.
  5. This will also enable adding content from devices such as my smartphone - or other people's computers.

I didn't know that magit works well with TRAMP! By well I mean that it - just works. I've cloned the repo to my computer, and now I have backups. Onwards to monitoring the repo for changes.

I correctly guessed that git would have an option to return a non-zero exit code when there are uncommitted changes in the repo. I'll try using a simple cron job to look at that exit code and commit if needed. Actually, that's silly. I can just have the script run git -C my-directory commit -a -m "automatic update" every time, whether or not anything changed.

I've added a line in my crontab to do this. Hopefully, after I save this now, the commit will happen automatically.

    * * * * * git -C /home/vydd/www/vydd.space commit -a -m "automatic update"

I don't think there's any benefit to actually monitoring the files. UPDATE It's working!

With a (private) GitHub repo up in addition to the local one, I think I can stop worrying about losing all this work. I just need to make sure I synchronize both remotes regularly.

Instagram, part 1

I used to love Instagram. It was very easy to feel creative, and sharing was fun. Then Instagram became part of Meta's dystopia. While I've always been acutely aware of how much personal data is being shared to fuel the advertising machinery, I thought that my browsing habits - frequent use of Incognito mode and regular deletion of all the tracking data I'm able to find - would protect me from it all. It sounds very naive now that I'm writing it down. One day a friend sent me a screenshot in which he highlighted that Instagram was recommending him an ad based on my interest in it. I slipped, and I don't know when or how. Then we were chatting over Viber about how everything has gone to hell, and in less than ten minutes he sent me another screenshot - this time it was a news agency recommending him an article on data protection. Now that was scary, and I really can't see how it could've been a coincidence. There and then I decided I would take the time to move all of my posts away from the prying tentacles of Meta and friends. This is me actually doing it.

Migrating text was an interesting exercise, but now comes the hard part. I need to figure out how to migrate the 800M of media I downloaded from Instagram so that it's easy to browse, doesn't cost me much in bandwidth, and, if possible, can be restricted to my friends. I'm assuming I'm going to need a CDN, but I've never set one up privately. For access, I could do something quick and dirty with a .htaccess file, or whatever the nginx equivalent is, but I'm afraid that the already limited audience of this page would lose what little motivation it has that way, and the whole exercise wouldn't be worth the trouble.

I'm going to get some sleep and let the solutioning happen in the background.

Poetry

I'm migrating my poetry in Serbian. Blogspot exported my blog as a single, hard-to-decipher XML file. It looks like this:

    tag:blogger.com,1999:blog-4660798809155512499.archive2022-06-23T05:33:46.459-07:00le film françaisvyddhttps://www.blogger.com/profile/09137899300722374576noreply@blogger.comBloggertag:blogger.com,1999:blog-4660798809155512499.layout2008-09-10T14:32:16.799-07:002022-06-23T05:33:46.459-07:00Шаблон: le film français<?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE html>
    <html b:version='2' class='v2' expr:dir='data:blog.languageDirection' expr:lang='data:blog.locale' xmlns='http://w

Very funny. Now I'm writing a Python script to extract my posts, and updating this file manually using Emacs and TRAMP to edit the raw HTML. That date above? C-u M-! date.

UPDATE Take a look at this section to see how I might have done it much more easily using atoma.parse_atom_file. This turns out to be proper Atom XML.
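
Something like this sketch is probably all the extraction would have taken - going by atoma's models, where Atom text constructs carry a .value string; untested against the actual export:

    # Sketch: extracting posts from the Blogger export with atoma.
    import atoma

    feed = atoma.parse_atom_file("backup.xml")
    for entry in feed.entries:
        title = entry.title.value if entry.title else ""
        body = entry.content.value if entry.content else ""
        print(entry.published, title, len(body))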

This is not version controlled at this point. If it disappears, it disappears. I think it'd be interesting to make git commits whenever this file changes, instead of committing to make the file change.

News!

I hear that a new season of Futurama is coming out next year. But that's beside the point. I've managed to tame the export using Python. The script is the worst, but who cares - it's a one-shot. You can see the results here. Everything is in Serbian. Once the big migration is complete, I might decide to translate a few I still think are good. Finally, I'm aware that special characters weren't imported with the correct encoding. That will be fixed some other time. UPDATE It was just a matter of adding <meta charset="utf-8">.

Next up: moving this to a proper git repo, then the blog posts I published on this domain.

Here's the script I used - maybe it helps if you decide to migrate your own blog.

I've also found a fun snippet online to display my current progress: a blinking cursor.