Dragonfly; FastFEC; Kagi Search
It's been a soul-crushing week in the U.S. provided one actually still has any humanity left in them. Hopefully, one or all of these resources will help take your mind off even one of the many tragic events that have occurred.
There are many in-memory datastores to choose from these days, and one might think it'd be hard to wrest the crown from the likes of Redis or memcached given how well both have been battle-tested (and raided by attackers when left open on the cold, cruel internets).
Well, think again, and meet Dragonfly (GH), an open source, "modern in-memory datastore, fully compatible with Redis and Memcached APIs". It boasts 25x the performance of those it seeks to replace in your stack, and is designed for speed, efficiency, and (for now) vertical scalability.
The speed boost comes from some clever thread/core models, I/O improvements in the Linux kernel, and dashtables. I'll let the Dragonfly devs explain said dashtables:
Similarly to a classic hashtable, dashtable (DT) also holds an array of pointers at front. However, unlike with classic tables, it points to segments and not to linked lists of items. Each segment is, in fact, a mini-hashtable of constant size. The front array of pointers to segments is called directory. Similarly to a classic table, when an item is inserted into a DT, it first determines the destination segment based on item's hashvalue. The segment is implemented as a hashtable with open-addressed hashing scheme and as I said - constant in size. Once segment is determined, the item inserted into one of its buckets. If an item was successfully inserted, we finished, otherwise, the segment is "full" and needs splitting. The DT splits the contents of a full segment in two segments, and the additional segment is added to the directory. Then it tries to reinsert the item again. To summarize, the classic chaining hash-table is built upon a dynamic array of linked-lists while dashtable is more like a dynamic array of flat hash-tables of constant size.
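To make that description a bit more concrete, here is a toy Python sketch of the idea: a directory of pointers to fixed-size, open-addressed segments, where a full segment is split in two and the insert is retried. This is not Dragonfly's implementation. The names (DashTable, Segment, mix), the linear probing, the directory-doubling scheme borrowed from extendible hashing, and the tiny segment size are all assumptions made for illustration, and the sketch skips deletion and the pathological case of many keys sharing one hash.

```python
MASK64 = (1 << 64) - 1


def mix(key):
    """Spread Python's hash over 64 bits (multiplier is an arbitrary choice)."""
    return ((hash(key) & MASK64) * 0x9E3779B97F4A7C15) & MASK64


class Segment:
    """A constant-size, open-addressed mini hash table."""

    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth   # how many hash bits this segment "owns"
        self.slots = [None] * capacity   # each slot is None or (key, value)

    def find(self, key, h):
        """Return (index, found). index is -1 when the segment is full
        and the key is not already present."""
        n = len(self.slots)
        start = (h >> 32) % n            # high bits pick the starting slot
        for step in range(n):
            i = (start + step) % n       # linear probing
            slot = self.slots[i]
            if slot is None:
                return i, False
            if slot[0] == key:
                return i, True
        return -1, False


class DashTable:
    def __init__(self, segment_capacity=8):
        self.capacity = segment_capacity
        self.global_depth = 0
        self.directory = [Segment(0, segment_capacity)]

    def _segment_for(self, h):
        # Low bits of the hash pick the directory entry.
        return self.directory[h & ((1 << self.global_depth) - 1)]

    def get(self, key, default=None):
        h = mix(key)
        seg = self._segment_for(h)
        i, found = seg.find(key, h)
        return seg.slots[i][1] if found else default

    def put(self, key, value):
        h = mix(key)
        while True:
            seg = self._segment_for(h)
            i, found = seg.find(key, h)
            if i >= 0:                   # free slot or existing key
                seg.slots[i] = (key, value)
                return
            self._split(seg)             # segment is "full": split, then retry

    def _split(self, old):
        d = old.local_depth
        if d == self.global_depth:       # directory must grow first
            self.directory = self.directory * 2
            self.global_depth += 1
        pair = (Segment(d + 1, self.capacity), Segment(d + 1, self.capacity))
        for slot in old.slots:           # redistribute by the next hash bit
            if slot is not None:
                h = mix(slot[0])
                target = pair[(h >> d) & 1]
                i, _ = target.find(slot[0], h)
                target.slots[i] = slot
        for idx, seg in enumerate(self.directory):
            if seg is old:
                self.directory[idx] = pair[(idx >> d) & 1]


table = DashTable(segment_capacity=4)
for n in range(1000):
    table.put(n, n * n)
print(table.get(12), len(table.directory))
```

The point to notice is that growth is local: a full segment only forces its own handful of items to move, rather than a rehash of the entire table.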
You can read their full explanation over at GitHub.
It's refreshing to see significant improvements on what many might call a solved problem, especially when those improvements come because of a combination of hardware and software innovations.
Even if you aren't in a position to run Dragonfly (due to the system requirements), I encourage you to at least scroll to the bottom of the README to see why these clever folks decided to create this new (and kind of exciting) datastore.
It's election season in the U.S., which means it's time to continuously scrutinize what funds political hopefuls bring in and how they spend said coin. The Federal Election Commission (GH; yes, the FEC has a GH!) requires candidates to file reports on their financial dealings and makes this data available to the public. They use a file format they unimaginatively named "FEC", which is a delimited-text format that uses the Unicode INFORMATION SEPARATOR FOUR / FILE SEPARATOR (hex: 0x1C, decimal: 28) character as a delimiter.
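For a sense of what reading that format involves, here is a minimal Python sketch that splits delimited text on the 0x1C character using the standard csv module. The function name parse_fec and the sample rows are made up for illustration and are not taken from a real filing; real filings have more structure than this handles. One gotcha worth knowing: Python's str.splitlines() treats 0x1C as a line boundary, so the sketch lets io.StringIO do the line splitting instead.

```python
import csv
import io

FS = "\x1c"  # FILE SEPARATOR, decimal 28


def parse_fec(text):
    """Split FS-delimited text into a list of rows (lists of fields).

    QUOTE_NONE keeps any double quotes inside fields as literal characters.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=FS,
        quoting=csv.QUOTE_NONE,
    )
    return [row for row in reader if row]


# Hypothetical sample data, NOT a real filing.
sample = (
    "HDR" + FS + "FEC" + FS + "8.3\n"
    "F3XN" + FS + "C00000000" + FS + "Example Committee\n"
)

for row in parse_fec(sample):
    print(row)
```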
It's pretty straightforward to read these files, but the Washington Post's spiffy data team made a tool called FastFEC which makes quick work of these filings, allowing researchers to specify one of:
a numeric ID, in which case the filing is streamed from the FEC website
a file, in which case the filing is read from disk at the specified local path
a URL, in which case the filing is streamed from the specified remote URL
They also did something pretty cool: compiled it to WebAssembly and integrated it into a small single-page utility that lets anyone load FEC files and download CSVs, 100% locally with no uploads or remote processing (handy for folks who are CLI-timid or have limited ability to install things locally).
The raw FEC data includes addresses (not available in bulk), is available faster than the FEC post-processed data, and avoids API rate-limiting constraints.
The online FastFEC tool could be a neat thing for teachers in classrooms full of Chromebooks to, perhaps, use in data courses in the fall.
This is just a teaser drop for a longer post next week, but I wanted to let folks head into the weekend with a new web search service to explore. Kagi Search is "a quick, user-centric, 100% privacy-respecting search engine with results augmented by non-commercial indexes and personalized searches."
I post it here because it is out of private beta, has a free tier (more on that in a moment), and has some ridiculously cool features such as lenses.
Yes, this is a paid search engine ($10/month), and I know we're all getting pretty sick of subscriptions. I'm doing a trial sub, and by paying for the service (and paying attention to ToS changes) I have more assurance of privacy and some real recourse if they do violate the ToS.
The extra features, so far, seem worth the price of unlimited searches, and the service works great with both Vivaldi (too much stuff at work breaks in WebKit, so it's my work-daily browser, again) and Orion, a zero-telemetry, privacy-preserving WebKit browser made by the same developers as Kagi Search.
I'll be investigating all the features Kagi Search has to offer and reporting back here next week with the results.
Today marks the 100th day of Putin's evil war against Ukraine. Please pause for even a moment to reflect on the Ukrainian people's determination to stand up to authoritarian evil and remain free. ☮