

Today's Drop is both "late" (apologies) and "brief". Twas a "fun" night thanks to the aftermath of the spike protein invasion, and today's WebR experiment ate up more of the "less AM time than usual" than expected.
Speedb
I'm Dropping this b/c it looked super cool at first try, but I have not had a chance to put it through some paces, so here's a quick overview, today, and we'll add some more color in the next week or three.
Speedb (GH) is (yet-another) open-source, high-performance key-value store that is designed to work across far too many use cases to be believed. While we haven't covered RocksDB at length, yet, I used it in a side project just under a year ago and am eager to see how Speedb compares.
The devs claim it has more stable write performance compared to RocksDB and improves overwrite performance with something called "Proactive Flushes". This doc section does a good job covering the problems with existing memtable flushing (to storage) strategies and how Speedb solves some of them. I did not run benchmarks, but the claim is that Speedb shows nearly a 70% improvement in a 50% "Random Read" workload.
On top of the raw, smart speed improvements, it also supports live config changes, ships a brand new memtable representation (compared to RocksDB) that is part of the reason it is so much faster, and uses something called a "Paired Bloom Filter". This new "Paired" thing is a variation of the Bloom Filter data structure that is designed to provide probabilistic set membership tests for pairs of elements, rather than individual elements.
If I may wax "poetic" for a moment (I really dig Bloom Filters), a plain ol' Bloom maps a set of items to a fixed-size bit array via hashing. Each element is given the spa treatment and hashed independently. These hashes help set bits in the bit array. When you try to find something, said thing is hashed and the bit positions are tested for set membership.
Paired Blooms — you guessed it! — hash pairs of elements, then do the same dance. They aren't as space efficient as their boring Bloom sibling, but we seem to be awash in storage and memory these days, so it likely isn't an issue unless you're on the stingy side.
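If you'd rather see those mechanics in code than in metaphor, here's a toy sketch of a plain ol' Bloom filter. The hash function, bit-array size, and hash count are purely illustrative (nothing like what Speedb actually ships), but the add/check flow is the real deal:

```typescript
// Toy Bloom filter: hash each item k ways, set those bits on add,
// test those bits on lookup. Sizes/hashes here are illustrative only.
class BloomFilter {
  private bits: Uint8Array;

  constructor(private size = 1024, private hashes = 3) {
    this.bits = new Uint8Array(size);
  }

  // Cheap FNV-1a-style hash, salted per hash-function index.
  private hash(item: string, seed: number): number {
    let h = 2166136261 ^ seed;
    for (let i = 0; i < item.length; i++) {
      h ^= item.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.size;
  }

  add(item: string): void {
    for (let i = 0; i < this.hashes; i++) this.bits[this.hash(item, i)] = 1;
  }

  // Answers "maybe present" (false positives possible) or "definitely absent".
  mightContain(item: string): boolean {
    for (let i = 0; i < this.hashes; i++) {
      if (this.bits[this.hash(item, i)] === 0) return false;
    }
    return true;
  }
}

const bf = new BloomFilter();
bf.add("spike");
console.log(bf.mightContain("spike"));   // true
console.log(bf.mightContain("protein")); // almost certainly false
```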
Check back for a real-world test! I'll try to wire it up to my previous project and do some benchmarks in the next ~month or so.
Fast Unix Commands
Fast Unix Commands (GH) is a "project that aims to create the world’s fastest Unix commands."
Somehow, even with today's ultrafast SSD storage, we humans feel the need to tune every microsecond of performance out of our daily drivers. For denizens of the command line, some of those drivers include the venerable cp (copy stuff) and rm (remove stuff). The ops for the former feel fast enough, but I also come from the days when you could open up hard drives and hold the cylinders without ruining them. So, everything kind of feels fast to me. Well, except for removing gigantic directory trees (👋🏼 Hi, Xcode.app!). So, I could benefit from an improvement to the latter.
Alex Saveau is seemingly pretty darned good at this performance thing, and also 👍🏼 at designing algorithms. His post on the File Tree Fuzzer is required reading before continuing here, or before using his new cpz and rmz tools.
The key speedups come from a nice artifact of the way most modern filesystems + kernel operations work. In Alex's words:
The key insight is that file operations in separate directories don’t (for the most part) interfere with each other, enabling parallel execution. The intuition here is that directories are a shared resource for their direct children and must therefore serialize concurrent directory-modifying operations, causing contention. In brief, file creation or deletion cannot occur at the same time within one directory. Thus, the goal is to schedule one task per directory and execute each task in parallel.
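To make that scheduling idea concrete, here's a minimal sketch in TypeScript/Node — emphatically not Alex's implementation (cpz/rmz are Rust and far more sophisticated), just the "one task per directory, directories in parallel" shape:

```typescript
// Sketch: each directory gets exactly one task that serially unlinks its
// own files (the kernel serializes those anyway), while subdirectory
// tasks run concurrently since separate directories don't contend.
import { readdir, unlink, rmdir } from "node:fs/promises";
import { join } from "node:path";

async function removeTree(dir: string): Promise<void> {
  const entries = await readdir(dir, { withFileTypes: true });

  // One task per subdirectory, all launched concurrently.
  const subdirTasks = entries
    .filter((e) => e.isDirectory())
    .map((e) => removeTree(join(dir, e.name)));

  // This directory's own files: deleted serially by the single owning task.
  const fileTask = (async () => {
    for (const e of entries) {
      if (!e.isDirectory()) await unlink(join(dir, e.name));
    }
  })();

  await Promise.all([...subdirTasks, fileTask]);
  await rmdir(dir); // now empty
}
```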
Alex is also an excellent communicator, so I'll close with a request to hit the links at the top of this section, and then prove out the claims with some speed tests of your own this week. I can 100% say that it makes removing Xcode betas a much less time-consuming experience.
Rspack
Now that I'm back deep in the throes of web development (thanks to WebR), speeding up deployment workflows for my work and personal projects is a task on the horizon, and one I've been preparing for over the past few weeks.
Now, for experiments I'm building, the seconds I'll be shaving won't really matter in the long run. But, I'm also trying to show R folks the WebR + JavaScript ropes and would like to leave them with the bestest tooling possible when all these experiments of mine have quiesced.
Rspack (GH) is a "bundler" — a tool that slurps up all production things associated with your amazing web app/site and wraps them up, so they can be sent to their final homes. It combines TypeScript and Rust with a parallelized architecture in the hopes of speeding up the (front-end) developer experience.
Their (ByteDance's) bundler has a built-in incremental compilation mechanism that provides an innovative approach to "Hot Module Replacement" (HMR) for large-scale projects. HMR is a feature of many modern web development tools and frameworks that lets developers see changes to their code reflected in the browser immediately, without a manual refresh. It works by identifying which parts of the code have changed and replacing only those parts in the running application, so you see changes in real time without reloading the entire page or losing the application's current state. That tight loop is particularly useful during development: you iterate quickly, catch and resolve issues sooner, and (the core selling point of Rspack) spend less time waiting on builds.
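For the curious, here's roughly what that contract looks like from application code, assuming the classic webpack-style module.hot API that Rspack aims to stay compatible with. The renderWidget module here is a made-up example, not anything from Rspack itself:

```typescript
// Sketch of webpack-style HMR from the app side (hypothetical module names).
import { renderWidget } from "./widget";

renderWidget(document.getElementById("app"));

// `module.hot` is the classic webpack HMR handle. When "./widget" changes,
// only that branch of the module graph is swapped in and re-rendered;
// no full page reload, no lost application state.
declare const module: { hot?: { accept(dep: string, cb: () => void): void } };

if (module.hot) {
  module.hot.accept("./widget", () => {
    renderWidget(document.getElementById("app"));
  });
}
```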
Rspack is also compatible with existing webpack plugins and config, making it easy to integrate into existing ecosystems. It comes with TypeScript, JSX, CSS, CSS Modules, Sass, and many more batteries included. It also has built-in, modern "tree shaking" and minification components. You can use any front-end framework, and Rspack will still be at home.
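And since config compatibility is the other big selling point, a minimal rspack.config.ts can look a lot like a webpack one. Treat this as a hedged sketch: the field names mirror webpack's, and the @rspack/core import and exact options are assumptions on my part, so check the Rspack docs before copying anything:

```typescript
// rspack.config.ts — minimal, hypothetical sketch leaning on Rspack's
// webpack-compatible configuration claim.
import type { Configuration } from "@rspack/core";

const config: Configuration = {
  // Same shape as a basic webpack config: one entry, one bundled output.
  entry: { main: "./src/index.ts" },
  output: { filename: "[name].js" },
  // webpack-style loaders and plugins are meant to drop in here as-is.
  module: { rules: [] },
};

export default config;
```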
I'll be showcasing it in an upcoming WebR experiment, since it's working super well in some of my half-finished, unpublished ones.
FIN
I'll close with a link to a company that is literally trying to make human virtual "hamster wheels". ☮