OK dkim; Awesome Quarto; lazyhtml
Unfortunately, I lost control of both Saturday and Sunday. I'm going to call last week a miss of my 5/5 goal and strive to get more of these in a queue that I can pick from when weeks get crazy.
Back in 2020, Matthew Green (@matthew_d_green) wrote a solid post on DomainKeys Identified Mail (DKIM). DKIM⁺ is "a protocol that allows an organization to take responsibility for transmitting a message by signing it in a way that mailbox providers can verify. DKIM record verification is made possible through cryptographic authentication."
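To make the "signing" part a bit more concrete: the signature travels in a `DKIM-Signature` header made of `tag=value` pairs, and the verifier looks up the public key in DNS under the selector (`s=`) and domain (`d=`). Here's a minimal sketch of pulling those pieces apart with plain Python. The header value below is made up for illustration (the domain, selector, and base64 blobs are hypothetical and won't verify against anything), and this only parses the header; it does not check the signature.

```python
import re

# Hypothetical header value for illustration only; the domain, selector,
# and base64 blobs are invented and will not verify against any real key.
EXAMPLE_SIGNATURE = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=selector2020;\r\n"
    "\th=from:to:subject:date; bh=MTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0NTY3ODkwMTI=;\r\n"
    "\tb=dGhpcyBpcyBub3QgYSByZWFsIHNpZ25hdHVyZQ=="
)


def parse_dkim_signature(value: str) -> dict[str, str]:
    """Split a DKIM-Signature header value into its tag=value pairs."""
    tags = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the first "=" only, since base64 values may end in "=".
        name, _, raw = part.partition("=")
        # Drop folding whitespace that header wrapping leaves inside values.
        tags[name.strip()] = re.sub(r"\s+", "", raw)
    return tags


def dns_name_for_key(tags: dict[str, str]) -> str:
    """Build the DNS name where the signer's public key is published."""
    return f"{tags['s']}._domainkey.{tags['d']}"


tags = parse_dkim_signature(EXAMPLE_SIGNATURE)
print(tags["d"], tags["s"], tags["a"])
print(dns_name_for_key(tags))
```

The `b=` tag is the signature itself and `bh=` is the hash of the body, which is why a saved copy of a message can still be checked long after delivery, as long as the matching public key is known.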
Electronic mail was created back in the days when the Internet was still the ARPANET. This was a gentler time when modern security measures — and frankly, even the notion that the Internet would need security — was a distant, science-fiction future.
The early email protocols (like SMTP) work on the honor system. Emails can arrive at your mail server directly from a sender’s mail server, or they can pass through intermediaries. In either case, when an email says it comes from your friend Alice, you trust that it comes from Alice. What possible reason would there be for anyone to lie?
The mainstream adoption of email showed that this attitude was pretty badly miscalibrated. In the space of a few years, Internet users discovered that there were plenty of people who would lie about who they were. Most of these were email spammers, who were thrilled that SMTP allowed them to impersonate just about any sender — your friend Alice, your boss, the IRS, a friendly Nigerian prince. Without a reliable mechanism to prevent that spamming, email proved hilariously vulnerable to spoofing.
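The honor system is easy to see for yourself. In this small sketch (using only Python's standard library, with made-up addresses), the `From` header is just text the sender chooses; nothing in the message format ties it to who actually built the message.

```python
from email.message import EmailMessage


def build_message(claimed_sender: str, recipient: str, body: str) -> EmailMessage:
    """Build a message whose From header is whatever the caller claims."""
    msg = EmailMessage()
    msg["From"] = claimed_sender  # not checked against anything
    msg["To"] = recipient
    msg["Subject"] = "Hello"
    msg.set_content(body)
    return msg


# Hypothetical addresses; no mail is sent, the message is only constructed.
msg = build_message("alice@example.com", "you@example.org", "It's me, Alice. Honest.")
print(msg["From"])
print("DKIM-Signature" in msg)
```

The message carries no signature at all, which is the gap DKIM was meant to close.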
In trying to "solve" the spam problem, DKIM created another, kind of big one: "a life-long guarantee of email authenticity that anyone can use to cryptographically verify the authenticity of stolen emails, even years after they were sent."
Rob Graham (@erratarob) used it to verify the authenticity of a certain person's leaked emails:
Ian Jackson⁺ wrote dkim-rotate in an attempt to "solve" the non-repudiation problem. It does this by (surprise) rotating the DKIM keys and then publishing the retired private keys to the internets, so an old signature no longer proves who actually sent a message. Ian discusses why this is a complicated task over at his journal, and it's worth a read even if you just want to see an example of solid failure-mode planning.
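As a rough mental model of the rotation idea (a toy sketch, not dkim-rotate's actual state machine or configuration), a key can be thought of as moving through three phases: it signs new mail for a while, it stays published so mail still in transit can be verified, and finally its private half is made public. The function name, phase names, and day counts below are all assumptions picked for illustration.

```python
from datetime import date


def key_phase(created: date, today: date,
              signing_days: int = 1, grace_days: int = 3) -> str:
    """Classify a key by age. The intervals are illustrative assumptions."""
    age = (today - created).days
    if age < 0:
        raise ValueError("key was created in the future")
    if age < signing_days:
        return "signing"                # used to sign outgoing mail
    if age < signing_days + grace_days:
        return "verify-only"            # public key still in DNS for mail in transit
    return "private key published"      # anyone could now forge old signatures


created = date(2022, 1, 1)
for day in (date(2022, 1, 1), date(2022, 1, 3), date(2022, 1, 10)):
    print(day.isoformat(), key_phase(created, day))
```

Once a key reaches the last phase, a valid signature made with it stops being evidence, because anyone holding the published private key could have produced it.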
This tool won't be for everyone (it has some major DIY requirements that weren't so major back in the day), but hopefully the topic itself sheds some light on one of those "hidden" parts of the internet (I mean, nobody looks at email headers anymore, right?).
Quarto is an open-source scientific and technical publishing system built on the shoulders of Haskell (i.e. Pandoc). With it, you can:
create dynamic content with Python, R, Julia, and Observable (in reality, you can use almost any programming language)
author documents as plain text markdown or Jupyter notebooks
publish high-quality articles, reports, presentations, websites, blogs, and books in HTML, PDF, MS Word, ePub, and more
author with scientific markdown, including equations, citations, crossrefs, figure panels, callouts, advanced layout, and more
It's the successor to R Markdown.
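If you've never seen one, a Quarto document is just a text file with a YAML header and fenced code cells. The sketch below uses Python to write out a minimal `.qmd` file; the file name, title, and cell contents are placeholders, and it assumes you'd render with the Jupyter `python3` kernel.

```python
from pathlib import Path

FENCE = "`" * 3  # built at runtime so this example does not nest code fences


def minimal_qmd(title: str) -> str:
    """Return the text of a small Quarto document with one Python cell."""
    lines = [
        "---",
        f'title: "{title}"',
        "format: html",
        "jupyter: python3",
        "---",
        "",
        "## A computed result",
        "",
        f"{FENCE}{{python}}",
        "print(sum(range(10)))",
        FENCE,
        "",
    ]
    return "\n".join(lines)


def write_qmd(directory: str, title: str) -> Path:
    """Write the document to hello.qmd (a placeholder name) and return its path."""
    path = Path(directory) / "hello.qmd"
    path.write_text(minimal_qmd(title), encoding="utf-8")
    return path


print(minimal_qmd("Hello, Quarto"))
```

With Quarto and a Jupyter kernel installed, `quarto render hello.qmd` should turn a file like that into HTML.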
I noticed Quarto ages ago (Quarto was "technically" released at RStudio's July 2022 conference) due to my git stalking and have dabbled a bit with it, but I'm far from the only human who has done so.
Mickaël Canouil (@mickaelcanouil) started one of those "awesome-" GitHub repos that curates resources associated with a given topic. In this case, the topic is Quarto, and it already has scads of fantastic resources from around the internets.
Take some time to go through these resources and dig into Quarto. It is not just for "data science", and you may find it ticks your productivity up a few notches.
It's been forever and a day since I dropped some Rust on y'all, so here's lazyhtml, an "HTML5-compliant parser and serializer that enables building transformation pipelines in a pluggable manner". Note, first, that it's one of those "awesome tools written by a terrible company". As time goes by, I'm trying to use fewer of those tools, but not every person in every terrible organization is responsible for the organization itself being terrible, so I choose to hope that it is those folks who encourage publishing of said tools.
If you need to take in an HTML document, parse it, and process those parsed chunks, you will like this library.
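To give a feel for the parse, transform, serialize shape of that kind of pipeline, here's a small stand-in using Python's built-in `html.parser`. To be clear, this is not lazyhtml's API; the `LinkRewriter` class and `transform` function are names I made up, and the rewrite rule (upgrading `http://` links in `<a href>`) is just an example transformation.

```python
from html import escape
from html.parser import HTMLParser


class LinkRewriter(HTMLParser):
    """Re-serialize HTML token by token, upgrading http:// links in <a href>."""

    def __init__(self):
        # Keep character references as separate tokens so text passes through as-is.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _start(self, tag, attrs):
        rendered = []
        for name, value in attrs:
            if value is None:
                rendered.append(f" {name}")  # bare attribute, e.g. "disabled"
                continue
            if tag == "a" and name == "href" and value.startswith("http://"):
                value = "https://" + value[len("http://"):]
            # The parser unescapes attribute values, so escape them again.
            rendered.append(f' {name}="{escape(value, quote=True)}"')
        return f"<{tag}{''.join(rendered)}>"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._start(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax like <br/> is emitted as a plain start tag.
        self.out.append(self._start(tag, attrs))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def transform(html_text: str) -> str:
    """Run the rewriter over a document and return the serialized result."""
    parser = LinkRewriter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(transform('<p>See <a href="http://example.com/">this</a></p>'))
```

This sketch drops comments and doctypes and normalizes tag names to lowercase, so treat it as an illustration of the token-handler idea and not as a faithful serializer.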
It makes heavy use of the seriously cool Ragel framework, which we'll spend some time on in a future edition.
The repo is super-detailed, and has a comparison of lazyhtml (lhtml) and other popular/widely used similar libraries.
Part of the weekend's lack of newsletter productivity was due to U.S. liberal Democracy taking a few serious hits last week. If you're in the U.S., it'd be worth your time to read this 2021 Journal of Democracy piece on The Rise of Political Violence in the United States if you still aren't convinced we're headed in the wrong direction. ☮