Replacing a 3 GB SQLite db with a 10 MB FST (finite state transducer) binary

Hacker News · May 10, 2026, 10:33 AM

Key takeaways

  • Note for numberphiles: all numbers have been rounded to their first significant digit, because I'm a fan of Rob Eastaway's zequals method of getting to the point when it comes to estimation.
  • And this worked wonderfully for the first implementation of tsk, which was in Go (and which I have written about elsewhere and elsewhere and elsewhere).
  • It's not impossible for a single base word in the language to have over one hundred possible endings, when all combinations are considered.

Note for numberphiles: all numbers have been rounded to their first significant digit, because I'm a fan of Rob Eastaway's zequals method of getting to the point when it comes to estimation. It's much more valuable to walk away with the heuristic "some dude got a 300x memory reduction by swapping out a database he hacked together for a tiny, static, specialized data structure that does exactly what he needs it to and no more".

I found myself with an increasingly rare opportunity to work this weekend on Taskusanakirja, also often called tsk, a Finnish-English dictionary with incremental search-as-you-type. Fundamentally this problem reduces to prefix search, and the canonical solution for prefix search with autocomplete is to implement a trie.
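To make the idea concrete, here is a minimal sketch of prefix search over a trie. It is written in Python for brevity and is not the tsk implementation (which was in Go); the class names, the `with_prefix` method, and the handful of Finnish words are illustrative assumptions.

```python
from typing import Dict, Iterator, List


class TrieNode:
    """One node per character; `terminal` marks the end of a stored word."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.terminal = False


class Trie:
    """Hypothetical minimal trie, for illustration only."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            # Create the child node on first use, then descend into it.
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True

    def _walk(self, node: TrieNode, prefix: str) -> Iterator[str]:
        # Depth-first traversal; children are visited in sorted order so
        # that results come back alphabetically.
        if node.terminal:
            yield prefix
        for ch in sorted(node.children):
            yield from self._walk(node.children[ch], prefix + ch)

    def with_prefix(self, prefix: str) -> List[str]:
        # Follow the prefix one character at a time; if any character is
        # missing, no stored word starts with this prefix.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return list(self._walk(node, prefix))


trie = Trie()
for w in ["talo", "talossa", "talvi", "tie", "kissa"]:  # sample words
    trie.insert(w)

print(trie.with_prefix("tal"))  # ['talo', 'talossa', 'talvi']
```

Each keystroke only has to descend one more level from the node reached by the previous prefix, which is what makes the structure a natural fit for search-as-you-type.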

And this worked wonderfully for the first implementation of tsk, which was in Go (and which I have written about elsewhere and elsewhere and elsewhere), with a few basic optimizations. To avoid matching on some single-digit percentage of the mid-six-figure number of words we were baking into the binary (it's been a design goal from the start to ship the entire program as one .exe, one .app, or one statically linked binary), we capped results at the first 50 or 100 matches or so and memoized all of the 1-, 2-, and 3-character combinations. Since then I haven't noticed a perceptible delay in the program in a year of heavy personal use. We could just about squeeze a trie with some basic optimizations like that into ~60 MB of space.
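The two optimizations described here, capping the number of results and memoizing very short prefixes, can be sketched as follows. This is a rough Python illustration under stated assumptions, not the original Go code: a sorted list with `bisect` stands in for the trie to keep the example short, `MAX_RESULTS` is set to 3 rather than 50 or 100 so the cap is visible on a tiny word list, and only prefixes that actually occur in the word list are precomputed.

```python
import bisect
from itertools import islice
from typing import Dict, Iterator, List

# Sample word list; a sorted list stands in for the trie in this sketch.
WORDS = sorted(["kissa", "talo", "talossa", "talvi", "tie", "tieto"])
MAX_RESULTS = 3   # hypothetical cap; the article mentions 50 or 100
MEMO_MAX_LEN = 3  # precompute prefixes of 1 to 3 characters


def iter_matches(prefix: str) -> Iterator[str]:
    """Lazily yield words starting with `prefix`, in sorted order."""
    # In a sorted list, all words sharing a prefix are contiguous and
    # begin at the insertion point of the prefix itself.
    start = bisect.bisect_left(WORDS, prefix)
    for i in range(start, len(WORDS)):
        if not WORDS[i].startswith(prefix):
            break
        yield WORDS[i]


def search_uncached(prefix: str) -> List[str]:
    # Stop after MAX_RESULTS matches instead of collecting all of them.
    return list(islice(iter_matches(prefix), MAX_RESULTS))


def build_memo() -> Dict[str, List[str]]:
    """Precompute capped results for every short prefix in the word list."""
    memo: Dict[str, List[str]] = {}
    for word in WORDS:
        for n in range(1, min(MEMO_MAX_LEN, len(word)) + 1):
            p = word[:n]
            if p not in memo:
                memo[p] = search_uncached(p)
    return memo


MEMO = build_memo()


def search(prefix: str) -> List[str]:
    # Short prefixes match the most words, so they are served from the
    # precomputed table; a short prefix absent from it has no matches.
    if 0 < len(prefix) <= MEMO_MAX_LEN:
        return MEMO.get(prefix, [])
    return search_uncached(prefix)


print(search("t"))     # ['talo', 'talossa', 'talvi'] (capped at 3)
print(search("tiet"))  # ['tieto']
```

The short prefixes are the expensive ones because they match the largest share of the dictionary, so precomputing them removes the worst-case lookups while longer prefixes stay cheap on their own.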
