Exit Review: Python 2 (and some related thoughts)

Roger Binns — Tue 08 December 2020

Python 2 has come to an end. I ported the last of my personal scripts to Python 3 a few months ago.

Perhaps the greatest feature of Python 2 was that after the first few releases, it stayed stable. Code ran and worked. New releases didn't break anything. It was predictable. And existing Python 2 code won't break for a long time.

The end of Python 2 has led to the end of that stability, which isn't a bad thing. Python 3 is now competing across a broader ecosystem of languages and environments trying to improve developer and runtime efficiency. Great!

I did see a quote that Python is generally the second best solution to any problem. That is a good summary, and shows why Python is so useful when you need to solve many different problems. It is also my review of Python 2.

So let's have some musings ...

Python has had poor timing. The first Python release (1994) was when unicode was being developed, so the second major Python version (2000) had to bolt on unicode support. But if it had waited a few more years, then things could have been simpler by going straight to utf8 (see also PEP 0538).

Every language has been adding async with Python 3 (2008) increasing support with each minor release. However like most other languages, functions ended up coloured. This will end up solved, almost certainly by having the runtime automagically doing the right thing.

Python 3 made a big mistake with the 2to3 tool. It works exactly as described. But it had the unfortunate effect of maintainers keeping their code in Python 2, and using that to make releases that supported both Python 2 and 3. The counter-example is javascript where tools provide the most recent syntax and transpiling to support older versions. Hopefully future Python migration tools will follow the same pattern so that code can be maintained in the most recent release, and transpiled to support older versions. This should also be the case for using the C API.

The CPython C API is quite nice for a C based object API. Even the internal objects use it. It followed the standard pattern of the time with an object (structure) pointer and methods taking it as a parameter. There are also macros for "optimised access". But this style makes changing underlying implementation details difficult, as alternate Python interpeter implementations have found out. If for example a handle based API was used instead, then it would have been slower due to an indirection, but allow easier changing of implementation details.

Another mistake was not namespacing the third party package repository PyPI. Others have made the same mistake. For example when SourceForge was a thing, they did not use namespacing so the urls were sf.net/projectname - which then led to issues over who legitimately owned projectname. Github added namespaces so the urls are github.com/user/projectname. (user can also be an organization.) This means the same projectname can exist many times over. That makes forking really easy, and is perhaps one of the most important software freedoms.

Using NPM as an example, this is the only package that can be named database. It hasn't been updated in 6 years. On PyPI this is apsw and hasn't been updated in 5 years. (I am the apsw author updating it about quarterly but not the publisher on PyPI for reasons.) Go does use namespacing. A single namespace prevents forks (under the same name) and also makes name squatting very easy. Hopefully Python will figure out a nice solution.

Posted on: Tue 08 December 2020

Category: misc – Tags: exit review, python