On Nikola

Eight months ago I switched this site to using Nikola. It is a static site generator which takes a bunch of files on disk and spits out your site in HTML, ensuring there are links everywhere as appropriate, applying templates etc. You then serve up those HTML files via any regular web server. (The opposite approach is a CMS.)

Sadly there is one design pattern that has severely affected my use of it. Nikola attempts to only do the work necessary each time you do a build. For example if you create a new post, it wants to only generate the HTML for that post and update the indexes that list posts. This should be a lot faster than generating everything from scratch.

Sadly it is nowhere near that simple. For example a setting in the configuration file could change how posts are output (eg point to a different template) which requires everything to be rebuilt. Or something could have been deleted or renamed, actions that Nikola doesn't know about. It would have to scan the output directory for files and deduce they should be deleted/renamed too. Once you add in templates, processors, configuration settings this gets very complicated.

When I first started using Nikola I quickly discovered that it would frequently get the incremental builds wrong. I reported these failures to the author, who was surprised that any existed and was sceptical. I have no doubt it always worked for him though!

Through the bug tracker I discovered others seeing even more cases of the incremental builds being wrong. I quickly resorted to doing full clean builds, since there was no evidence Nikola would ever get incremental builds right in all circumstances. That worked, and I was happy with the result.

My long experience has left me with several principles when using software to build outputs:

  • A build, then a clean, then another build should give the same results from both builds
  • Doing the build on different machines should give the same result
  • If incremental building is possible then it must give the same results as a full clean build

If these principles are not met then the output is non-deterministic. That means you can't depend on it, and testing it is problematic.
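One way to check these principles is to hash every file in two output directories and compare the results. The following is a minimal sketch, not part of Nikola; the function names `tree_digest` and `build_differences` are made up for illustration, and the two directories are assumed to hold the outputs of the two builds being compared (clean versus incremental, or machine A versus machine B).

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def build_differences(first, second):
    """Return sorted relative paths that are missing from one tree or differ in content."""
    a, b = tree_digest(first), tree_digest(second)
    # A path counts as different if only one tree has it or the hashes disagree.
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list from `build_differences` means the two builds produced byte-identical output; anything else names the files worth investigating.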

An example of this is Nikola-generated sitemaps. Nikola uses the date of the generated file in the output directory as the date of change in the sitemap. That breaks all three principles above. An approach that would work correctly is to give the output file the same timestamp as the input file. Sadly the author refuses to fix the problem.
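The timestamp approach could look roughly like the sketch below. It is not Nikola's code: `copy_mtime` and `lastmod` are hypothetical helper names, and it assumes each output file has exactly one input file whose modification time should carry over.

```python
import os
from datetime import datetime, timezone


def copy_mtime(source_path, output_path):
    """Give output_path the same modification time as source_path."""
    source = os.stat(source_path)
    output = os.stat(output_path)
    # Keep the output's access time; replace only its modification time.
    os.utime(output_path, ns=(output.st_atime_ns, source.st_mtime_ns))


def lastmod(output_path):
    """Format a file's modification time as a UTC date for a sitemap entry."""
    mtime = os.stat(output_path).st_mtime
    return datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%d")
```

With the modification time copied from the input, the date in the sitemap no longer depends on when or where the build ran, or on whether it was a clean or incremental build.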

This is the straw that breaks the camel's back for me. I don't want the content of my site to vary based on what type of build I did, to differ between machines, or to depend on when I do the build. So I have to figure out what to migrate to, or write my own.

Writing general purpose software is harder than just writing something for internal use. I applaud the author for making that effort and doing a good job. But there are times where principles matter, and we don't share these particular ones.

Gay Marriage

There has been a lot of fuss this week over Brendan Eich becoming Mozilla CEO. The issue is a donation against gay marriage. For many, especially outside the US, it can be hard to see why this is a big deal, irrespective of your agreement with the issue. Twenty years ago someone was kind enough to explain it to me.

According to the United States Government Accountability Office (GAO), there are 1,138 statutory provisions in which marital status is a factor in determining benefits, rights, and privileges. [1]

Rights and responsibilities of marriages in the United States

Campaigning against gay marriage isn't just about the marriage itself (whatever your moral framework); it also denies those 1,138 benefits, rights and privileges to gay couples. It makes them less equal than opposite sex couples. Understandably, making other people less equal is an issue.

It seems very contradictory to be all about inclusiveness at Mozilla, but against it outside of work for those same employees, not to mention Mozilla's customer/user base. Brendan has never publicly explained his opinion nor changed his mind - statement from 2012.

I personally think this is an important human rights issue - I'm for human rights and equality, and hope you are too.

[1] Note that those benefits, rights and privileges don't only apply to the married couple themselves, but also impose obligations on others. For example a hospital has to let the married partner visit or make care decisions, but doesn't for unmarried people. This is an example of what can happen.

Health Insurance Profit

The dysfunction in the US healthcare system, especially around how everything gets paid for, is well known. My health "insurer" [1], Anthem Blue Cross of California, a subsidiary of publicly traded Wellpoint, has a new way [2] of bumping their profits.

They aren't allowed to do premium changes targeting individuals, but can do it for groups (eg age ranges, geographical areas). They group people into age ranges (eg 40-44, 45-49). When your age changes into the next range up you get a ~25% premium increase [3]. The clever bit is that they don't actually wait until your birthday and instead increase the premiums near the beginning of the year. Consequently a 39 year old pays the increased premiums of a 40 year old for on average 6 months. Across their customer base this adds up quickly.
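To put a number on it, here is a small worked example. The 25% increase and the 6 month average come from the description above; the $400 monthly premium is a made-up figure purely for illustration.

```python
def early_increase_cost(monthly_premium, increase=0.25, months_early=6):
    """Extra amount paid because the age-band increase starts before the birthday."""
    return monthly_premium * increase * months_early


# Hypothetical $400/month premium, increase applied 6 months early (the average case).
print(early_increase_cost(400))                    # 600.0
# Same premium with a birthday late in the year, so the increase is 11 months early.
print(early_increase_cost(400, months_early=11))   # 1100.0
```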

California has a regulatory agency, the Department of Managed Healthcare (DMHC), which regulates my plan, so I submitted a complaint to them. Sadly I got the usual nonsense in return. Like many customer support organisations, they have 10 stock answers, and whatever the issue the goal is to give you the closest one, however irrelevant it actually is. The answer to me was about how they aren't a regulator, plans aren't regulated etc, which is rather comical given just how often they describe themselves as exactly that. At this point I give up and pay the penalty for having a birthday late in the year. Score one point for the system.

On the technical side, the DMHC approach is beyond comical. There is lots of use of the word "secure" as in "secure web portal" and "secure email". Their response to me was an email that looked exactly like a malicious email. It was an envelope image with "click here" in the middle, and no other information about the sender, why I would want to click, or what the heck was going on. It was only by examining the email headers and some additional digital sleuthing that I was able to work out that it was actually a legitimate email. Clicking the link gave an error, while copying and pasting it into the browser worked. It then proceeded to force me to set up a username and password to read the email. I finally got to read the email, which answered something I didn't ask and ignored my actual issue. When I later wanted to reread the answer, reproduce it here etc I couldn't. I kept being told I had to go to my "Inbox" to do so, without any indication as to where (or what for that matter) that inbox is. I also noted how several pages had a footer saying Copyright 2011 Microsoft. Nothing says "secure" like "we haven't updated this in many years".

[1] What is provided doesn't really resemble actual insurance, and is closer to a payment and costs obfuscation mechanism.
[2] Compare to the old ways and look at how many times they have been fined.
[3] This is in addition to the historic 22% annual increases.

Recommended: Hardcore History Podcast

I highly recommend the Hardcore History podcast. The presenter Dan Carlin is an amateur historian (ie no professional reputation to protect) thoroughly researching each subject. He then takes as much time as needed to cover various topics.

I tried to come up with a list of recommended episodes, but you really should listen to all of them - they are that good.

Lifecycle of a Linux distro

There seems to me to be a pattern for how many Linux distributions gain prominence and then fade away. Here is a family tree up until 2012. Doing a distribution is a lot of work needing components like installers, package managers, pulling in upstreams regularly, configuration, security etc.

The genesis is a need not being met. It is fascinating what these have been historically ranging from how stable/fast moving they are, how they are built, preferring certain software (eg a particular gui like KDE), wanting a certain kind of community, targeting certain users or uses, and numerous other reasons.

What this means is that as a distro starts it has a reason, a focus and a way to see what work is necessary. The problems start once it becomes successful. It becomes a lot easier to add "one more thing" to the distro such as another package, another configuration option or even good old fashioned feature creep. This will gain more users and make the distro better.

But the larger distribution, more users and wider scope make the distribution harder to maintain. The hardware it is used with, the software it interacts with, changes in upstream and the users all lead to far more work. Every change breaks something, but not changing pieces also breaks things due to changes in others. The symptoms of this show up in the bug trackers with increasing numbers of open or abandoned tickets.

Some distros respond by narrowing their audience (eg charging for it, restricting use cases, narrowing hardware and software etc). Others have their developers pull back into the work they are most interested in, leaving the other parts languishing. Sometimes politics break out as various parties fight for what they think is important.

Ultimately, people then see their needs not being met causing a whole new cycle of distros.

Monarchs

Every year some Monarch butterflies overwinter in Santa Cruz. I finally went to have a look.

There weren't quite as many as I expected but that is probably due to the weather and the afternoon visit.

The images are here along with some of West Cliff Drive and the sky. The video (no audio) is looking up and around. It is a bit shaky due to being zoomed in and image stabilisation!

Long Read: Motorbiking around Angola

This is a story I reread about once a year and recommend to everyone who will listen. In 2007 some South Africans took their motorbikes around Angola. There are lots of pictures, stories about the Angolan people, touching parts, motorbiking and just an overall wonderful read.

The first part is here. Part two is a few posts down. That part then has links to each successive part.

Moving to Github

I have two active open source projects, and have been hosting them at Google Code. Now I'm moving them to Github. I'm still tidying up odds and ends.

Google Code (old home)                         Github (new home)
https://code.google.com/p/apsw/                https://github.com/rogerbinns/apsw
https://code.google.com/p/java-mini-python/    https://github.com/rogerbinns/jia-mini-python

So let's start with some of the good things about Google Code:

  • Mercurial: I prefer Mercurial as my DVCS of choice, and it is well supported.

  • Decent Web Interface: Because code hosting sites are built by developers they tend to have idiosyncratic usability (architecture astronaut style), with Google Code being the least worst. For example on Github you can't even sort issues by priority. It is completely absurd.

  • Multiple repositories per project: The popular hosting services have an issue tracker, wiki, links etc per project. With the exception of Google Code, you only get one source repository per project. DVCS encourages a style of single purpose smaller repositories versus the large agglomeration that was more typical in the subversion and CVS days.

    It is far more pleasant being able to use a single issue tracker, wiki etc for a group of related source repositories. Only Google Code does this.

So why leave?

  • Downloads terminated: I have 14 files per release of APSW (most are Windows binaries since Windows developers rarely have compilers and related installed) and one for the other project. Google no longer want downloads, yet that is the primary way others use my projects. Google also do not believe in supporting Linux for Drive, not to mention that it is a terrible alternative.
  • No future: There is no indication of any ongoing interest in Google Code and they refuse to take money for it. They keep shutting down services, and it is good practice to not be dependent on them. (No customer service, erroneous disabling of accounts etc.)

There are three choices for where to go:

  • Sourceforge: No chance
  • Bitbucket: I used them in the past, but they couldn't support personal and corporate use at the same time. They supported Mercurial, but Atlassian means an enterprisey design (I hate using Jira). Most importantly when I asked if they would guarantee availability of a download service for any period of time they said they would not. So they are out.
  • Github: There are many things I don't like about Github, but at the end of the day they are the least worst, and only practical alternative.

This is how I converted my projects to github:

  • Use fast-export to convert the Mercurial history to Git. (In theory you can talk Mercurial to Github but it isn't worth it. Also THG lost the ability to do line by line commits.)

  • Use github pages to generate documentation into. However do not follow all those instructions - in particular the gh-pages branch needs to be created as an orphan as it has nothing to do with the master branch and its files. (Better instructions).

    The pages didn't render correctly, which was because they returned 404 for stylesheets and images. You have to do some magic to fix it.

  • Use Google Code Issues Migrator to copy all existing issues over to Github and keep the same ids.

  • Grep the source tree for all mentions of `code.google` and fix them.

  • Edit the projects at Google Code to say they have moved
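The grep step in the list above can also be done with a short script, which is handy for rerunning after each round of fixes. This is only a sketch: `find_mentions` is a made-up helper name, and it assumes the source files are UTF-8 text and that anything under `.git` or `.hg` should be ignored.

```python
from pathlib import Path


def find_mentions(root, needle="code.google"):
    """Yield (relative path, line number, line) for each line containing needle."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip directories and anything inside version control metadata.
        if not path.is_file() or {".git", ".hg"} & set(rel.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield rel.as_posix(), number, line.strip()


if __name__ == "__main__":
    for name, number, line in find_mentions("."):
        print(f"{name}:{number}: {line}")
```

Once the script prints nothing, every mention of the old home has been fixed.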

Before doing the real projects I experimented on a test repository to make sure that the documentation and releases worked. Sadly releases have to be done manually because there don't appear to be any usable command line clients (not even github's one) and none of the libraries I looked at did releases either.