Drop #215 (2023-03-08): Need For Speed 0.2.0
McFly; hey; Lance
Programming note: this is the second installment of the Drop's Need For Speed.
It also finally dawned on me that I can include #hashtags within each edition (which I’ll place at the bottom of the posts). This will make it possible to craft handy URLs like that “Need For Speed” one to link to related posts without having to sift through the extra cruft in the Substack archive search results page. Whatever algorithm Substack uses to recommend “similar” posts does not seem to work very well.
I'll be (slowly) going back through the archive to add tags, and am also working on a resource index to make it easier for you (and me) to look up previously covered resources.
Today, we feature some resources to satisfy the speed 🏎️ demon that lurks inside all of us.
First, we tap a resource that will accelerate a very common terminal op. Then, we run over to a tool that will help you find and fix bottlenecks in server land. And, finally, a 100x speedup for anyone who does data work with Arrow or DuckDB.
This is another one of those “do one thing, and do it well” utilities that is 100% guaranteed to speed up terminal ops by leveling up the way you work with shell command history.
The command history feature is available in various operating system shells, computer algebra programs, programming language REPLs, and other software. This feature enables us to retrieve, modify, and rerun past commands. Command line history was first introduced in Unix through Bill Joy's C shell back in 1978.
Joy channeled inspiration from an earlier version implemented in Interlisp. The feature became extremely popular among users as it made the C shell more efficient and user-friendly. Since then, history has become a standard feature in other shells, including Bash. This feature primarily addresses two scenarios: executing the same command or a short sequence of commands repeatedly, and correcting errors or rerunning a command with minor adjustments.
The original C shell history implementation only had some basic features and a few shortcuts. Newer generation shells continue to enhance and evolve how we work with command history. There is also a robust subculture dedicated to creating and evolving even more advanced shell command history operations.
McFly is one of those enhancements. Here's the pitch from McFly's creator, Andrew Cantino:
McFly replaces your default ctrl-r shell history search with an intelligent search engine that takes into account your working directory and the context of recently executed commands. McFly's suggestions are prioritized in real time with a small neural network.
The tool is written in Rust and is backed by a SQLite database, but preserves the traditional command history file associated with whatever shell you use. By using a proper database, McFly can track additional information — like command exit status, the timestamp when the command was run, and which directory you were in when you ran the command.
The section header shows McFly's TUI. Searches are powered by a small neural network which has been trained to guess the most relevant commands (versus just traditional pattern matching). It uses various features for this AI augmentation:
how often + when you run the command
if you've selected the command in McFly before.
the directory where you ran the command (you're likely to run that command in the same directory in the future)
what commands you typed before the command (a.k.a. the command's execution context)
the command's historical exit status. (I mean, who wants to make the same mistake twice?)
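To make the signals above concrete, here is a minimal sketch of context-aware history ranking on top of SQLite. It is not McFly's schema or its neural network: the `history` table, its columns, and the hand-picked weights are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: one row per executed command, plus its context.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (cmd TEXT, dir TEXT, exit_code INTEGER, ran_at INTEGER)"
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [
        ("git status", "/src/app", 0, 100),
        ("git status", "/src/app", 0, 200),
        ("git stash", "/src/other", 0, 250),
        ("git sttaus", "/src/app", 1, 300),  # a typo that failed
    ],
)


def suggest(conn, typed, cwd, limit=5):
    """Rank matching commands with simple, hand-picked weights (assumed)."""
    query = """
        SELECT cmd,
               COUNT(*)                    -- how often the command was run
             + 2 * SUM(dir = ?)            -- bonus for runs in this directory
             - 3 * SUM(exit_code != 0)     -- penalty for failed runs
               AS score
        FROM history
        WHERE cmd LIKE ?
        GROUP BY cmd
        ORDER BY score DESC, MAX(ran_at) DESC
        LIMIT ?
    """
    return [row[0] for row in conn.execute(query, (cwd, f"%{typed}%", limit))]


print(suggest(conn, "git st", "/src/app"))
```

With this sample data, the frequently used, successful command run in the current directory ranks first, and the failed typo sinks to the bottom even though it is the most recent entry.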
One handy feature helps you clean up crufty shell history (like, say, if you thought your clipboard had a command line string in it, but it really had the text of a log file you were debugging). Another is the use of SQL's % wildcard matching operator.
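If you have not used SQL's % wildcard before, this small sketch shows how it behaves in a plain SQLite LIKE query. The table and sample commands are made up for the example, and this is not a claim about the query McFly runs internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (cmd TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO history VALUES (?)",
    [("docker compose up -d",), ("docker ps",), ("git push origin main",)],
)

# % matches any run of characters (including none), so this pattern finds
# commands that start with "docker" and contain "up" somewhere after it.
pattern = "docker%up%"
matches = [
    row[0]
    for row in conn.execute(
        "SELECT cmd FROM history WHERE cmd LIKE ? ORDER BY cmd", (pattern,)
    )
]
print(matches)
```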
If you switch shells, McFly can come along for the ride, and was designed with extensibility in mind (i.e., to make it easy to support as-yet-uncreated shells).
McFly is far from the only shell history ops augmentation tool, and we'll cover some other ones over the coming weeks that offer similar, and other modern approaches for working with your past self on the command line.
A long time ago on servers far, far away, Apache httpd stole the top spot from NCSA's httpd server when it came to hosting internet accessible HTTP services. This is no longer true, though it does, still, power over 233 million sites.
It came with many batteries included, one of which is the diminutively named ab utility, which is short for “Apache HTTP server benchmarking tool”. It was designed to help server runners gauge server/site performance, including a key metric: how many requests per second a server is capable of serving.
There are, now, a ridiculous number of HTTP server performance benchmarking tools and services. One of them is hey. While other tools (which we'll eventually cover) pack in tons of features, hey takes a “keep it simple/focused” approach. So much so, that you can glean all you can do with it from the CLI's help:
    Usage: hey [options...] <url>

    Options:
      -n  Number of requests to run. Default is 200.
      -c  Number of workers to run concurrently. Total number of requests cannot
          be smaller than the concurrency level. Default is 50.
      -q  Rate limit, in queries per second (QPS) per worker. Default is no rate limit.
      -z  Duration of application to send requests. When duration is reached,
          application stops and exits. If duration is specified, n is ignored.
          Examples: -z 10s -z 3m.
      -o  Output type. If none provided, a summary is printed.
          "csv" is the only supported alternative. Dumps the response
          metrics in comma-separated values format.

      -m  HTTP method, one of GET, POST, PUT, DELETE, HEAD, OPTIONS.
      -H  Custom HTTP header. You can specify as many as needed by repeating the flag.
          For example, -H "Accept: text/html" -H "Content-Type: application/xml" .
      -t  Timeout for each request in seconds. Default is 20, use 0 for infinite.
      -A  HTTP Accept header.
      -d  HTTP request body.
      -D  HTTP request body from file. For example, /home/user/file.txt or ./file.txt.
      -T  Content-type, defaults to "text/html".
      -a  Basic authentication, username:password.
      -x  HTTP Proxy address as host:port.
      -h2 Enable HTTP/2.

      -host HTTP Host header.

      -disable-compression  Disable compression.
      -disable-keepalive    Disable keep-alive, prevents re-use of TCP
                            connections between different HTTP requests.
      -disable-redirects    Disable following of HTTP redirects
      -cpus                 Number of used cpu cores.
                            (default for current machine is 8 cores)
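If you would rather drive hey from a script, a thin wrapper is enough. The sketch below only uses flags listed in the help text above (-n, -c, -o csv). The helper names and the localhost URL are hypothetical, and the run is skipped when hey is not on your PATH.

```python
import shutil
import subprocess


def build_hey_cmd(url, requests=200, workers=50, csv=False):
    """Build a hey command line from flags documented in its help text."""
    # The help text says total requests cannot be smaller than the worker count.
    if requests < workers:
        raise ValueError("total requests cannot be smaller than the worker count")
    cmd = ["hey", "-n", str(requests), "-c", str(workers)]
    if csv:
        cmd += ["-o", "csv"]  # "csv" is the only alternative output type
    return cmd + [url]


def run_hey(url, **options):
    """Run hey if it is installed and return its stdout, else None."""
    if shutil.which("hey") is None:
        return None
    result = subprocess.run(
        build_hey_cmd(url, **options), capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical target; only point this at a server you are allowed to load test.
print(build_hey_cmd("http://localhost:8080/", requests=500, workers=25))
```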
Back in those early days of serving up HTML and images over HTTP, many of us used ab to brag about our l33t system tuning skills, usually on mailing lists or Usenet groups. These days, I don't see that as much, though I also don't generally hang with the performance tuning community.
The tool has two output modes: CSV and a summary view that looks like:
    Summary:
      Total:        0.0527 secs
      Slowest:      0.0399 secs
      Fastest:      0.0003 secs
      Average:      0.0103 secs
      Requests/sec: 3793.4198

      Total data:   123000 bytes
      Size/request: 615 bytes

    Response time histogram:
      0.000  |■
      0.004  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
      0.008  |■■■■■■■■■■■■■■■■■■■■■■
      0.012  |■■■■■■■■■■■■■■■
      0.016  |■■■■■■
      0.020  |■■■■■■■■■
      0.024  |■■■■■■■■■■■■■
      0.028  |■■■■■■■
      0.032  |
      0.036  |■
      0.040  |■■■

    Latency distribution:
      10% in 0.0009 secs
      25% in 0.0031 secs
      50% in 0.0072 secs
      75% in 0.0192 secs
      90% in 0.0232 secs
      95% in 0.0260 secs
      99% in 0.0395 secs

    Details (average, fastest, slowest):
      DNS+dialup:   0.0021 secs, 0.0003 secs, 0.0399 secs
      DNS-lookup:   0.0017 secs, 0.0000 secs, 0.0083 secs
      req write:    0.0005 secs, 0.0000 secs, 0.0118 secs
      resp wait:    0.0038 secs, 0.0003 secs, 0.0206 secs
      resp read:    0.0007 secs, 0.0000 secs, 0.0070 secs

    Status code distribution:
      200 responses
The above is the output from a vanilla run of the command between two hosts on my local lab network. The target was an nginx default page hosted over HTTPS.
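The headline numbers in that summary are easy to sanity check by hand. This short sketch recomputes them from the printed values. It assumes the total time shown is rounded to four decimal places, which would explain the small gap between the recomputed and reported requests per second.

```python
# Values copied from the summary output above.
total_secs = 0.0527
total_bytes = 123000
size_per_request = 615
reported_rps = 3793.4198

# Total data divided by size per request gives the request count,
# which lines up with hey's default of 200 requests.
requests = total_bytes // size_per_request

# Requests per second is the request count over wall-clock time.
rps = requests / total_secs

print(requests)
print(round(rps, 1))
print(f"gap vs reported: {abs(rps - reported_rps) / reported_rps:.2%}")
```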
If you've never benchmarked a web server before, hey might be a good way to explore this space. Be careful, though! Once you get bit by the performance optimization bug, there's no turning back.
I am only going to lightly introduce Lance, today, since the issue is getting a bit long in the tooth, and I'll be featuring it in the weekend Bonus Drop.
Lance is a “modern columnar data format that is optimized for machine learning workflows and datasets”. It's designed to be used with images, videos, 3D point clouds, audio, and (ofc) tabular data.
Key features of Lance include:
high-performance random access: the project claims 100x faster than Parquet.
vector search: find nearest neighbors in under 1 millisecond and combine OLAP-queries with vector search.
zero-copy, automatic versioning: manage versions of your data automatically, and reduce redundancy with zero-copy logic built-in.
compatibility with Arrow and DuckDB
It's written in Rust and has a robust Python API.
You can convert existing Parquet files to the Lance format in just a couple lines of Rust or Python code. Arrow and DuckDB compatibility means that you're also not limited to working with Lance data files in only Rust or Python. Anything that speaks Arrow or DuckDB can play along.
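As a rough idea of what the Parquet conversion looks like in Python, here is a minimal sketch. It assumes the `pylance` and `pyarrow` packages are installed, and that `lance.write_dataset` and `lance.dataset` behave as described in the project's README at the time of writing. The function name and file paths are hypothetical.

```python
def parquet_to_lance(parquet_path, lance_path):
    """Convert a Parquet file or directory into a Lance dataset (sketch)."""
    # Imported lazily so this snippet still loads on machines where the
    # optional `pylance` and `pyarrow` packages are not installed.
    import lance
    import pyarrow.dataset as pads

    # Read the existing Parquet data as an Arrow dataset...
    source = pads.dataset(parquet_path, format="parquet")
    # ...write it back out in the Lance format...
    lance.write_dataset(source, lance_path)
    # ...and reopen it as a Lance dataset.
    return lance.dataset(lance_path)


# Hypothetical usage:
#   ds = parquet_to_lance("events.parquet", "events.lance")
#   table = ds.to_table()  # an Arrow table that Arrow-aware tools can consume
```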
If you do any data work, Lance is something you should start poking at, if you aren't already using it.
Speaking of speed…
The D2 “modern diagramming scripting language” — which we covered back in December 2022 — has been churning out new releases at a frenetic pace. If it piqued your interest back then, but you haven't dug into it yet, you may want to check out the new features.
In other news, The Daily Drop hit 500 subscribers today! Thanks to all for supporting the newsletter! ☮
#Diagram #Hey #McFly
Ref: Section 2.3; Page 13; Bill Joy's “An Introduction to the C shell”.
Which will eventually make its way into an upcoming M-F issue.
NCSA's httpd’s main guy was Rob McCool, Stunt Programmer™️, who left with Marc Andreessen to start Netscape a few years later, where he reimplemented it (presumably from memory) as a commercial product for http and https.