I've implemented full async support in APSW with documentation here. This is about the how of the implementation. The why is that I kept getting requests for it, and async programming has become very popular.
What is async?
A common problem for programmers is how to run several tasks at the same time, such as some database, network, and file access. You don't want to do them one after the other because that will take the sum of the times of the individual items.
You will eventually end up with a solution that lets you start the multiple tasks and wait for them all to complete, such as Python's concurrent.futures. It works, but quickly gets clumsy when you want to add timeouts, and cancel tasks if their peers failed.
The "secret" about code that does networking is that it spends most of its time doing nothing, just waiting for more data to be received and responses to be sent. Rather than use heavyweight threads and processes, it instead uses an event loop in a single thread, managing task progress with incoming and outgoing data, as well as timeouts. In the olden days that meant a loop using select; io_uring is its modern equivalent. Writing code this way is painful though, especially when network code goes through multiple phases, each tracking its context, any of which could error. It made code hard to read, hard to write, impossible to debug, and had names like callback hell and pyramid of doom.
A new pattern emerged called async / await. The idea
is that calls don't return the result directly, but rather something
you can await when you need the result or error. It is easier to
read, write, debug etc.
# old way - does them one after the other
response1 = fetch("url1")
response2 = fetch("url2")
response3 = fetch("url3")
# new way - ask an async framework to start them all first ...
task1 = asyncio.create_task(fetch("url1"))
task2 = asyncio.create_task(fetch("url2"))
task3 = asyncio.create_task(fetch("url3"))
# ... and then process the responses as
# they complete
response1 = await task1
if response1.status == 200:
    # etc
    ...
But this has an important consequence. fetch in the first block
returns a direct result, while in the second block it returns
something that has to be await-ed. In language implementations it
also required functions like fetch to be marked as async
because they needed different internals, and there are restrictions
because only async functions can work with an event loop.
This led to the humorous (to developers) post What Color Is Your Function? It describes the world today for many programming languages, including Python, where you have to colour your functions.
Consequences
It is very common for there to exist libraries providing some functionality, and a parallel variant that provides an async interface to the same functionality. Often the async version just calls the non-async version in the background, while providing equivalent async marked APIs.
But this is a lot of work - the async variant has to essentially duplicate the API surface of the non-async library, and keep it in sync. In practice that means the async ones have a subset of the API, and lag behind changes. Even documentation and type information can lag, because this is thankless work.
A few libraries now do it the other way around, writing the async version as the primary, and then having a secondary library to make sync versions of the same code. It is still a lot of work and duplication.
Async Python SQLite
The Python standard library provides sqlite3, but it isn't async.
Several folks have made async versions that call it, with these being the ones I found: aiosqlite, aiosqlite3, asqlite, py-asqlite3, asqlite3, fastsqlite3, rapptz-asqlite, anysqlite3, sqlite-anyio, anyio-sqlite, anysqlite
There are multiple Python event loops including asyncio (part of Python standard library), trio (a better API), curio (OG that recently became EOL), and anyio (abstracts the actual implementation). The various async versions support different subsets.
Behind the scenes, the implementations are the same. Provide an async function that gathers its parameters, and send them to a background thread for execution, providing an awaitable result. Some go further and provide iteration of query results, so you can use async for. None of them provide the full API of the sqlite3 module.
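That shared pattern can be sketched in a few lines. This is an illustration only - execute_fetchall is a made-up name, not any particular library's API, and asyncio.to_thread (a thread pool) stands in for the dedicated worker thread the real wrappers use:

```python
import asyncio
import sqlite3

# Hypothetical wrapper in the style described above: the async-marked
# function gathers its arguments, runs the real sqlite3 call in a
# thread, and gives back an awaitable result.
async def execute_fetchall(con, sql, params=()):
    def work():
        return con.execute(sql, params).fetchall()
    return await asyncio.to_thread(work)

async def main():
    # check_same_thread=False because calls happen in another thread
    con = sqlite3.connect(":memory:", check_same_thread=False)
    await execute_fetchall(con, "create table t(x)")
    await execute_fetchall(con, "insert into t values(?)", (42,))
    rows = await execute_fetchall(con, "select x from t")
    con.close()
    return rows

print(asyncio.run(main()))  # → [(42,)]
```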
And none of them let you use an async function in the numerous callbacks SQLite provides. So for example you can't register a function named fetch and then call it from SQL.
The existence of so many async SQLite wrappers shows the demand is there, but the maintenance, event loop support, and Python version updates over time are hard work and hard to keep up with.
APSW async principles
I had been considering providing async APSW for a while, and kept thinking about the trade-offs and functionality. That led to a list of principles.
Virtual tables
Virtual tables are the natural case for async. The usual examples fetch information via network requests to the cloud, and that is a perfect match for async. Note this is calling async code from APSW, not APSW being async itself, but the user would want both at the same time.
No separate package
I am not going to publish a separate async-apsw or similar package. It would require separate documentation, separate maintenance, a system could end up with different versions of sync and async APSW installed etc.
Complete
I am not going to do a small subset of the APSW API. It is important to me that everything you can do from C code with SQLite, you can do from Python code using APSW. That means you should be able to do all the things in async too.
Worker thread
The SQLite C code is inherently synchronous and does not have non-blocking or cooperative scheduling APIs. The only practical approach is to run the SQLite code in a background worker thread. It is possible to have pools of threads and not care which one is used for each call, and APSW is thread safe for this, but that makes things harder to debug, document, and test. It would also result in deadlocks on re-entrant calls. So one thread is the best solution, especially as it ensures a predictable order of operations.
Ergonomic
A common pattern with async wrappers is having different names for doing the same thing as the sync versions. So if a developer wants to register a callback, there would be a register_callback method and a register_async_callback method. Wherever possible that should be avoided, because code should be able to figure it out.
Developers using the sync interface must continue to find using the APSW interface simple and pleasant. That must also be the case for async developers.
Automated maintenance
I am not going to do duplicate manual work. As each SQLite release adds and updates APIs, I have to manually update the APSW C code to reflect that. But I really am not going to repeat the work again for async.
APSW has extensive checking code and tooling behind the scenes for argument parsing, error handling, SQLite mutex handling, database of all objects, methods, attributes, docstrings, and more. I often deliberately update code wrong to verify the tooling detects problems. It must also catch issues with async code too.
What I dreaded
It seemed like the approach would be an automated version of what the other wrappers wrote by hand. For example for Connection.set_busy_timeout I'd generate something like this for an AsyncConnection class.
async def set_busy_timeout(self, milliseconds: int) -> None:
    # there would have to be a call to send execution to the worker
    # thread (exec_in_thread in this case). The AsyncConnection
    # would have the underlying sync connection as a member.
    return self._exec_in_thread(
        self._sync_connection.set_busy_timeout, milliseconds)
But it is ugly. I want immediate exceptions for arguments such as passing a string instead of an int for milliseconds. And I'd have a very hard time verifying the code is correct with automated tooling. It gets more complicated adding in cursors, blobs, backups, sessions etc. (Again notice how the human written async wrappers omit virtually all of those.)
I also have the vast majority of the coding of APSW in C. This results in better performance because of the direct connection of the Python C API and the SQLite C API, but also I extensively use statement expressions in debug and testing builds to verify the integrity of Python and SQLite objects, mutex and GIL state, etc. Those are not visible at the Python level.
No arbitrariness
Unless there are overwhelming good reasons, there shouldn't be limitations. They are annoying to test, annoying to document, annoying to developers. For example that means you should be able to have sync and async connections in the same process. You should be able to use asyncio, trio, anyio, or anything else that comes along in the future.
I should provide defaults, but where more specialised behaviour is needed it should be possible to do so.
It isn't easy
Doing it right and completely is hard. The first 90% is trivial - generate code like the above. But the last 10% is the hard part and what has to be right. I looked through the issues and pull requests of existing similar projects to see what the edge cases and feature requests are. It is important to look at the closed issues too. This saved me several times from bad design decisions, and showed what should be covered in the test suite. For example here are the issues for the popular aiosqlite project.
It begins
My first test was: what happens if I provide an async function as a callback? The answer was that calling it gave back a coroutine object. At the C level these can be detected by calling PyCoro_CheckExact. A call to asyncio.run_coroutine_threadsafe will send that to the event loop, and let me wait for the result.
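A minimal sketch of that mechanism, in Python rather than C (asyncio.iscoroutine standing in for PyCoro_CheckExact, and the helper names are mine):

```python
import asyncio

# A worker thread calls a user callback; if the result is a coroutine,
# send it to the event loop and block until the answer comes back.
def call_callback(callback, loop, *args):
    result = callback(*args)
    if asyncio.iscoroutine(result):  # the C code uses PyCoro_CheckExact
        return asyncio.run_coroutine_threadsafe(result, loop).result()
    return result

async def main():
    loop = asyncio.get_running_loop()

    async def async_cb(x):
        await asyncio.sleep(0)
        return x * 2

    def sync_cb(x):
        return x + 1

    # invoke the callbacks from threads, as the SQLite worker would
    return await asyncio.gather(
        asyncio.to_thread(call_callback, async_cb, loop, 21),
        asyncio.to_thread(call_callback, sync_cb, loop, 41),
    )

print(asyncio.run(main()))  # → [42, 42]
```

Blocking on future.result() in the worker is fine - it is the event loop that must never block.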
Contextvars for the win
To call run_coroutine_threadsafe you need the loop to run it on. In the olden days you could use thread local storage to stash a value somewhere. These days Python has contextvars which work like thread local storage for normal code, but are also async aware.
That made life extremely easy. Stash the event loop in a contextvar, and everything works. I did some manual code that ran in the async event loop receiving a call and forwarding it to the worker thread passing along the loop, and had async callbacks in APSW.
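The stash-and-forward can be sketched like this (names are illustrative; contextvars.copy_context carries the async side's context into the worker):

```python
import asyncio
import contextvars

# stash the running event loop in a contextvar...
loop_var = contextvars.ContextVar("event_loop")

async def main():
    loop_var.set(asyncio.get_running_loop())
    # ...and carry a snapshot of the context to the worker thread
    ctx = contextvars.copy_context()

    def in_worker():
        # the worker finds the loop the async caller stashed
        return ctx.run(loop_var.get)

    found = await asyncio.to_thread(in_worker)
    return found is asyncio.get_running_loop()

print(asyncio.run(main()))  # → True
```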
I use PyObject_VectorcallMethod (and the non-Method suffix variant) exclusively in APSW to call callbacks. It was trivial to make a version that checked if the result was a coroutine, and if so use the contextvar to send it to the event loop for evaluation.
A quick #define later, and every single callback in APSW now supported async functions. There are about 120 of these callbacks!
(It later got changed to thread local for semantic and efficiency reasons, although a controller could do either depending on needs.)
Controller
At this point I started formalising the Python code that would make everything work. It needed to start and stop the worker thread, send calls to the worker, and send coroutines back to the event loop for execution. I wrapped this up as a Controller interface.
This abstraction is necessary to support different event loops, but also to allow different implementation decisions. For example it is possible to log every call, gather statistics about how long they take to execute, change the priority of the worker thread, and many other details. I provide a standard implementation, but developers can easily make their own, as they are under 50 lines of code.
The controller to use is of course found by consulting a contextvar.
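As a rough illustration (names and details are mine, not the real apsw.aio interface), a controller boils down to a worker thread consuming a queue of calls and resolving futures back on the event loop:

```python
import asyncio
import queue
import threading

class MiniController:
    """Illustrative controller sketch - not the real apsw interface."""

    def __init__(self):
        self.loop = asyncio.get_running_loop()
        self.work = queue.SimpleQueue()
        self.thread = threading.Thread(target=self._worker)
        self.thread.start()

    def _worker(self):
        # the single worker thread where all "SQLite" calls would run
        while True:
            item = self.work.get()
            if item is None:
                return
            future, func, args = item
            try:
                result = func(*args)
                # loop futures are not thread safe, so resolve on the loop
                self.loop.call_soon_threadsafe(future.set_result, result)
            except Exception as exc:
                self.loop.call_soon_threadsafe(future.set_exception, exc)

    async def call(self, func, *args):
        # send a call to the worker, getting back an awaitable result
        future = self.loop.create_future()
        self.work.put((future, func, args))
        return await future

    def stop(self):
        self.work.put(None)
        self.thread.join()

async def main():
    controller = MiniController()
    try:
        return await controller.call(lambda a, b: a + b, 2, 3)
    finally:
        controller.stop()

print(asyncio.run(main()))  # → 5
```

A real controller also handles cancellation, deadlines, and forwarding coroutines back to the loop; this shows only the skeleton.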
Calls
The C code wrapping each SQLite C API starts something like this, with the pattern being check for closed, figure out the arguments, acquire mutexes and then do the work.
static PyObject *
Connection_something(PyObject *self, PyObject *const *fast_args,
                     Py_ssize_t fast_nargs, PyObject *fast_kwnames)
{
  /* Sets an exception and returns if connection is closed */
  CHECK_CLOSED();
  {
    /* tool generated and updated code that converts
       parameters into C types goes here */
  }
  /* check database mutex can be acquired in order to
     correctly handle concurrency */
  DBMUTEX_ENSURE();
  /* do the actual calls into SQLite here */
I could easily change the mutex acquire to fault if the connection is async and this is not the worker thread.
This path meant I didn't need separate sync and async classes - the connection knows which it is. I added a simple helper after the parameter parsing that if it is an async connection, uses the controller to call the same function with the same parameters but in the worker thread, returning an awaitable object for the result.
This means the same C implementation supports both sync and async execution paths.
Timeouts and cancellation
None of the existing async SQLite wrappers support timeouts or cancellation. Using my original example of fetching network resources for virtual tables: those fetches can sometimes take a very long time, so it should be possible to set an upper limit on how long something takes, because SQLite is blocked while waiting.
Best practice is to use structured concurrency when you have a group of tasks running at the same time. The concept is that you logically group them and wait for the completion of the group. If any fail, then any outstanding tasks are cancelled. You'd ideally also want a timeout for the group.
It took me two attempts to get this right. The timeouts and cancellations are in the async event loop, and need to be carried to the worker thread, and back again with any async callbacks.
I had originally started out using asyncio's Future
as the model, but that turned out to be inadequate. Each of the async
frameworks has a concept of event loop time, which can advance
differently than real time - for example if you are testing a 10
minute timeout you don't actually want to wait 10 minutes. It is
necessary to track deadlines in event loop time, and separately
in the worker thread, where I use time.monotonic. Trio
and AnyIO let you get the current_effective_deadline but
asyncio doesn't, so I had to provide a contextvar for that.
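The translation between the two clocks is simple but easy to get wrong - a deadline in loop time has to become a remaining duration, and then a deadline on the worker's clock. A sketch of the conversion (under asyncio, where loop.time() happens to be monotonic by default):

```python
import asyncio
import time

async def main():
    loop = asyncio.get_running_loop()
    # a deadline expressed in event loop time...
    loop_deadline = loop.time() + 2.5
    # ...becomes a remaining duration, and then a deadline on the
    # worker thread's clock (time.monotonic)
    remaining = loop_deadline - loop.time()
    worker_deadline = time.monotonic() + remaining
    # the worker never has more time left than the loop granted
    return worker_deadline - time.monotonic() <= 2.5

print(asyncio.run(main()))  # → True
```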
Fine tuning
Benchmarking insights
It is always important to make sure the code performs as expected. I
wrote a quick naive test comparing to aiosqlite, and found performance
was dismal. Tracking it down showed the root cause was iterating rows
returned from a query. Python uses a __next__ method behind
the scenes to get the next iteration value. I was doing just that
requiring a trip through the worker thread for each row. That turned
out to be very expensive.
aiosqlite was getting batches of rows (default 64) in the worker thread, and then iterating the batch directly in the event loop. That meant I had to do the same, which included making the batch size configurable via a contextvar. I initially made the C code be async, but later had to redo it so Python code did the work. I put effort into ensuring this batching is invisible - for example if 7 rows and then an exception happen, code will see exactly that. (aiosqlite will only provide the exception.)
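A Python-level sketch of the batching idea, using sqlite3 and asyncio.to_thread for illustration (the real code goes through the worker thread and controller, and the names here are mine):

```python
import asyncio
import sqlite3

BATCH_SIZE = 64  # aiosqlite's default; in apsw configurable via a contextvar

async def rows(cursor):
    while True:
        # one thread round trip fetches a whole batch...
        batch = await asyncio.to_thread(cursor.fetchmany, BATCH_SIZE)
        if not batch:
            return
        # ...which is then iterated directly in the event loop
        for row in batch:
            yield row

async def main():
    con = sqlite3.connect(":memory:", check_same_thread=False)
    con.execute("create table t(x)")
    con.executemany("insert into t values(?)", [(i,) for i in range(200)])
    total = 0
    async for (x,) in rows(con.execute("select x from t")):
        total += x
    con.close()
    return total

print(asyncio.run(main()))  # → 19900
```

200 rows cost four thread round trips instead of 200.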
Benchmarking is now included in the tools and doc. I was also surprised at how slow the inter-thread communication was. As a result of feedback in a discussion group, I made a standalone repository measuring only that performance - python-async-bench.
The notable result of that is just how much slower each framework's builtin mechanism for dispatching work to threads is.
AnyIO
I originally started writing an AnyIO controller before removing it. The point of AnyIO is to be agnostic of the underlying event loop so a specific controller isn't necessary. Later when documenting the deadline handling for AnyIO users, I had to explain how the current_effective_deadline had to be measured relative to the current_time and then adjusted to time.monotonic() to set the deadline contextvar.
That is a ridiculous burden, so I made a dedicated AnyIO controller which automatically handled the deadlines correctly with no extra coding needed.
Auto controller
Originally you had to specify the controller to use - eg
apsw.aio.AsyncIO for asyncio, apsw.aio.Trio for trio,
etc. I was constantly changing the event loop framework in my
exploration code, so it got tedious very quickly. If you use the
wrong one, the various framework APIs will complain about there being
no event loop.
The SniffIO project showed me how to do that detection, although I had to figure out AnyIO because it explicitly is intended to not be detected!
That let me make the default be auto-detection.
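For asyncio alone the gist of detection is simple - sniffio does the equivalent check for each framework, which is what makes the general case possible:

```python
import asyncio

def detect():
    # inside a running asyncio loop get_running_loop() succeeds;
    # outside any loop it raises RuntimeError
    try:
        asyncio.get_running_loop()
        return "asyncio"
    except RuntimeError:
        return "no event loop"

async def main():
    return detect()

print(asyncio.run(main()), "/", detect())  # → asyncio / no event loop
```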
Documentation and type stubs
I still had the coloured function problem in the sense that although I only had one implementation that automatically behaved as sync or async, I still needed to document that for each method and attribute.
I also needed the type stubs as used by IDEs and type checkers to believe there are actually flavoured classes - ie Connection and AsyncConnection, Backup and AsyncBackup, Session and AsyncSession etc with each method and attribute in the async version declared awaitable as implemented.
I implemented a test that called all the methods and accessed all the attributes, noting what they did. That was compared to a JSON file, so now I had the behaviour encoded as data. It also meant that all my assertions were exercised, so I could verify the C code was doing the right thing. This will also catch future changes to the codebase.
For documentation I wasn't going to duplicate the classes, so instead I added a badge in each that gives more explanatory text when clicked on.
Duplicating the classes for the type stubs then gave me an idea of what the effort and frustration would have been in automatically generating an async wrapper in Python forwarding to the sync methods. It requires getting every little detail right, not just about the methods but also their parameters. The async classes allow both sync and async callbacks in many places, so I also had to ensure that was reflected.
Mistake: tasks not coroutines
Because my original exploration had given me a coroutine, my initial work focussed on them as the async unit of work. How do I run one back in the event loop, how do I cancel them, how do I set a timeout etc. That worked out okay until I started fleshing out the test code and realised that while a coroutine represented some executing async code, the actual unit that all the frameworks cared about was a task.
A task wraps a coroutine, and it is tasks that the frameworks supervise, cancel, schedule etc. When all I had was a coroutine, the correct path forward was to get it seen as part of a task.
That made some code a lot simpler, and like many mistakes was a valuable learning opportunity.
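The distinction in miniature, using asyncio: a bare coroutine is inert, while a task is what the framework can schedule and cancel:

```python
import asyncio

async def work():
    await asyncio.sleep(10)

async def main():
    coro = work()                        # a coroutine: inert until driven
    task = asyncio.ensure_future(coro)   # a task: scheduled and supervisable
    task.cancel()                        # frameworks cancel tasks, not coroutines
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"

print(asyncio.run(main()))  # → cancelled
```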
Test all the things
It takes a lot longer to write the test code than the original working code. But I still test everything, including all the corner cases and combinations, no matter how redundant it seems. In this case it had two benefits - one was finding that the working code had the occasional small issue, and the other was revealing my coroutine vs task focus above.
It even led to late changes like allowing close() to be
called on async objects. Normally close() raises an exception and you
must use the async version aclose(), which is still
recommended. But aclose() will only work while there is a
running event loop and it is await-ed.
This interacts with a second implementation decision, which is that the worker thread is not marked as daemon - ie the Python process will not exit until the thread exits. The reason is that there could still be queued operations like committing transactions, backups completing, optimize running etc, so an abrupt exit could roll back all that.
If the event loop exits for whatever reason and the database wasn't
closed, the process will hang forever. Allowing the close()
call means it can still be manually closed in that circumstance. The
database will also be automatically closed when garbage collection
happens, but it is easy to have a reference that keeps it alive.
Success
It works, it is robust, it is easy to use, and meets all the original goals, especially future maintenance as SQLite and Python update their C APIs. The final result provides complete async support across the entire APSW API, including callbacks, virtual tables, and advanced SQLite features. The implementation supports asyncio, Trio, and AnyIO without requiring separate packages or duplicated APIs.
This is ably demonstrated by the tour.
Thanks
aiosqlite
aiosqlite provided a complete working implementation of how to go about this, together with example usage, documentation, open and closed issues etc. I often wondered why they made particular choices, tried alternatives and quickly found out.
If it meets your needs, use it!
AI
It is customary these days to say how you used AI/LLMs. What I did not do is hook up a coding agent and have it do any of the work.
My view is that the closer to the core, the more important it is for the human to make design decisions and their trade-offs, and it is especially important to write the code so that you understand what, why, and how. It also exposes patterns and allows iteration. Even making mistakes is important because they can be learned from. And pretty much all of APSW is core - I have maintained it since 2004 and have no intention of stopping, so current details affect the future.
So what did I use AI for? I had the free versions of Claude, ChatGPT, and Gemini open in adjacent web browser tabs and would frequently post the same prompt into all of them, to compare the results. The "best" answers ebbed and flowed, with Claude the worst, ChatGPT the most helpful, and Gemini getting better after several instructions to not give multi-page answers and exposition.
Better web search
Unfortunately regular web search is almost useless these days. My situation is more complicated because I am receiving calls from async, whereas most content is focussed on making calls in async.
Making matters worse is that searching the Python docs is annoying because it covers both the C and Python APIs, and I only want one of them, and the ReadTheDocs search for Trio and AnyIO is downright frustrating for finding things.
The AIs excel at giving the specific magical incantation to do things, versus trying to piece it together yourself, saving a lot of time.
Naming things
This is a lot harder than it sounds. Names have expectations attached to them so it is harmful to use something misleading. But you also want something familiar, otherwise you've imposed cognitive overhead with an unfamiliar name. The AIs are very good at giving a list and reasons for each entry. Often I'd end up combining suggestions from them.
Code Queries and Suggestions
It was nice to express a concept and have them show how they would code it, or paste in a snippet of my code that didn't work. It is a lot like rubber duck debugging with instant feedback. Or I'd try to bend an API a particular way it didn't like, and the AI would give plausible explanations as to why that wasn't allowed.
I especially liked it when they provided suggestions I had never thought of, for example detecting that the anyio event loop is running by looking for a stack frame whose code object matches the anyio.run method's code object.
Code and data
Ok, there are 15 lines of AI generated code buried deep inside some of my processing that produces the documentation where I needed to expand the contents of something that is arbitrarily nested in square brackets.
The CSS for the badges in the doc showing the sync/async applicability of each function was also generated by them, with me combining the best bits of each. They also helped with some other CSS.
My absolute favourite usage is the generation of test and example data.