
Show HN: Keybench – Scriptable, extensible performance tool for key value stores

Hacker News · Jun 6, 2026, 11:06 PM

You write the workload in Lua. keybench drives it across one or more storage engines, times every operation, and reports throughput and latency. The same script runs unchanged against every engine, so a comparison measures the engines and not the harness.

What it measures
================

keybench reports two rates and a latency distribution.

wu/s    workload units per second. One unit is one call to your run() function. This is the rate of whole operations as your script defines them, a "view cart" or a "checkout", regardless of how many key touches each one costs.

ops/s   primitive operations per second. One primitive op is a single put, get, del, range, or scan call, or one key handled inside an mget, mput, or mdel. A range or a scan counts once no matter how many rows it returns. This is the rate of raw key touches.

When every unit is one primitive op the two rates are equal and the report prints one throughput line. When a unit does several primitive ops, such as a batch of B keys or a cart checkout that scans then deletes, the report prints wu/sec and ops/sec on separate lines. For a batched verb ops/sec is exactly B times wu/sec, because each unit performs B key touches, so batching shows as ops/sec rising with B while wu/sec falls, the difference being the fixed per call cost spread over more keys.

Latency is recorded per operation kind as a distribution, not an average. Each of put, get, del, range, mget, mput, and mdel has its own histogram, and a scan is timed into the range histogram. The report gives p50, p99, p99.9, and the max for each.

How it works
============

Five parts, each replaceable.

The engine owns concurrency. keybench spawns the worker threads, splits the op budget across them, and joins them. A worker runs your script in its own Lua state with its own random seed. The harness never holds a lock around an engine call, so a serialized engine reports as serialized and a parallel one reports as parallel.
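
The spawn, split, and join pattern described above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not keybench's actual code: the names worker_t, worker_share, run_workers, and MAX_WORKERS are hypothetical, the even split with the remainder going to the first workers is an assumed policy, and the loop body stands in for the call into the script's run().

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

enum { MAX_WORKERS = 64 };  /* hypothetical cap, chosen for the sketch */

/* Hypothetical per-worker state; keybench's real structures are not shown. */
typedef struct {
    int      thread;  /* worker index, 0..threads-1 */
    uint64_t ops;     /* this worker's share of the op budget */
    uint64_t seed;    /* per-worker seed, would feed the worker's own RNG */
    uint64_t done;    /* units completed, written only by this worker */
} worker_t;

/* Share of `total` for worker `i` of `n`. Assumed policy: an even split,
 * with the remainder handed out one unit each to the first workers. */
uint64_t worker_share(uint64_t total, int n, int i)
{
    assert(n > 0 && i >= 0 && i < n);
    uint64_t base  = total / (uint64_t)n;
    uint64_t extra = total % (uint64_t)n;
    return base + ((uint64_t)i < extra ? 1u : 0u);
}

static void *worker_main(void *arg)
{
    worker_t *w = arg;
    for (uint64_t k = 0; k < w->ops; k++) {
        /* A real worker would call the script's run() here.
         * No harness lock is taken around the call. */
        w->done++;
    }
    return NULL;
}

/* Spawn n workers, split the budget, join them, and return the total number
 * of units completed. Returns 0 if n is out of range. */
uint64_t run_workers(uint64_t total_ops, int n, uint64_t base_seed)
{
    pthread_t tids[MAX_WORKERS];
    worker_t  ws[MAX_WORKERS];
    int       started = 0;
    uint64_t  done = 0;

    if (n < 1 || n > MAX_WORKERS)
        return 0;

    for (int i = 0; i < n; i++) {
        ws[i].thread = i;
        ws[i].ops    = worker_share(total_ops, n, i);
        ws[i].seed   = base_seed + (uint64_t)i;  /* distinct seed per worker */
        ws[i].done   = 0;
        if (pthread_create(&tids[i], NULL, worker_main, &ws[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        done += ws[i].done;  /* safe to read: the worker has been joined */
    }
    return done;
}
```

Each worker touches only its own worker_t, so the sketch needs no mutex, and any serialization seen in a real run would come from the storage engine itself. On toolchains where pthreads is a separate library, compile with -pthread.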
The Lua script stays single threaded and never reasons about locks.

The workload is a Lua table. A file returns a table with a name, an optional load function, and a run function. load seeds the store before timing, and the harness runs it on every worker thread at once, each filling its own shard, so a large dataset loads in parallel. run is the measured unit and the harness calls it until the op budget or the time budget is spent. Both receive a ctx table carrying users, items, ops, seed, batch, thread, threads, and the current iter.

The store is one verb set over every engine. Your script calls a global kv table. The same eight verbs reach whichever engine is selected. keybench times the call, files the sample in the right histogram, and counts the primitive ops.

The backend is a self registering plugin. Each engine lives under backends/ with its own build fragment and registers itself before main runs. Selecting one by name, or several for a comparison, is a command flag. Adding one is a new directory and a vtable, no edit to the core.

The reporter is a sink. A run broadcasts its metadata, its probes, its aggregated points, and its live samples to every reporter you asked for. A console reporter prints a human report. A tsv reporter writes machine rows. A timeline reporter writes one row per live sample. You can run several at once.

Building
========

keybench vendors its own Lua, so the default build needs only a C compiler and pthreads.

    make

That produces ./keybench with the in memory skiplist engine compiled in. The skiplist needs no external library and is the reference engine.

To compile in the persistent engines, name them.

    make ROCKSDB=1      add the RocksDB engine
    make TIDESDB=1      add the TidesDB engine
    make BACKENDS="skiplist rocksdb tidesdb"

No special allocator is linked by default, keybench uses the system one.
If a distro rocksdb or tidesdb was itself built against jemalloc, link the same allocator first with ALLOC=jemalloc so the whole malloc family agrees, which avoids a crash when a database is reopened across sweep points. ALLOC=tcmalloc works too, and ALLOC_LIBS="..." links any other allocator or a custom path. The en

Article preview — originally published by Hacker News. Full story at the source.