On Nikola

Eight months ago I switched this site to using Nikola. It is a static site generator which takes a bunch of files on disk and spits out your site in HTML, ensuring there are links everywhere as appropriate, applying templates etc. You then serve up those HTML files via any regular web server. (The opposite approach is a CMS.)

Sadly there is one design pattern that has severely affected my use of it. Nikola attempts to only do the work necessary each time you do a build. For example if you create a new post, it wants to only generate the HTML for that post and update the indexes that list posts. This should be a lot faster than generating everything from scratch.

Sadly it is nowhere near that simple. For example a setting in the configuration file could change how posts are output (eg point to a different template) which requires everything to be rebuilt. Or something could have been deleted or renamed, actions that Nikola doesn't know about. It would have to scan the output directory for files and deduce they should be deleted/renamed too. Once you add in templates, processors, configuration settings this gets very complicated.

When I first started using Nikola I quickly discovered that it would frequently get the incremental builds wrong. I did report them to the author who was surprised that any existed, and was sceptical. I have no doubt it always worked for him though!

Through the bug tracker I discovered others seeing even more cases of the incremental builds being wrong. I quickly resorted to doing full clean builds since there is no evidence Nikola would ever get incremental builds right in all circumstances. The result was me being happy.

My long experience has left me with several principles when using software to build outputs:

  • A build then clean then build should give the same results between the two builds
  • Doing the build on different machines should give the same result
  • If incremental building is possible then it must give the same results as a full clean build

If these principles are not met then the output is non-deterministic. That means you can't depend on it, and testing it is problematic.

An example of this is Nikola generated sitemaps. It uses the date of the generated file in the output directory as the date of change in the sitemap. That breaks all 3 principles above. An approach that would work correctly is to give the output file the same timestamp as the input file. Sadly the author refuses to fix the problem.

This is the straw that breaks the camel's back for me. I don't want the content of my site to vary based on what type of build I did, to differ between machines, or when I do the build. So I have to figure out what to migrate to, or write my own.

Writing general purpose software is harder than just writing something for internal use. I applaud the author for making that effort and doing a good job. But there are times where principles matter, and we don't share these particular ones.


Comments powered by Disqus