Discover more from hrbrmstr's Daily Drop
Marp; MLU-Explain; curl-impersonate
While I'm very partial to Quarto, it is not the only markdown-to-XYZ ecosystem in the universe, especially when it comes to presentation slides. And, if you only need to make slides, and want to try an alternative to the Reveal(js) ecosystem, you may be interested in Marp [GH].
Dubbed the "Markdown Presentation Ecosystem", it has the familiar feel of Quarto/R Markdown/Reveal deck markdown syntax, with its own rendering system, Marpit [GH], that includes theming capabilities along with a choice of HTML, PDF, and PowerPoint output.
Unlike Quarto (et al.), there is no dependency on Pandoc, and there is no real need to install anything, since you can do pretty much everything right in your browser. (The section banner image is a capture from the browser app.)
The Marp ecosystem appears to be rich and full of features, but I'm not sure if I'd be as bold as the Marpers are in using "the" in front of the phrase.
Have a go and share one of the decks you make!
There's always room for more educational resources on statistics and machine learning (ML). Amazon seems to believe this knowledge needs to be more ubiquitous as they have their own Machine Learning University, described thusly:
Machine Learning University (MLU) provides anybody, anywhere, at any time access to the same machine learning courses used to train Amazon’s own developers on machine learning. With MLU, all developers can learn how to use machine learning with the learn-at-your-own-pace MLU Accelerator learning series. The MLU Accelerator series is designed to kick-start your ML journey with three, three-day foundational courses on Natural Language Processing, Tabular Data, and Computer Vision. Upon completion of the Accelerator Series, the Decision Trees and Ensemble Methods course offers a more advanced, five-day lecture series on tree-based and ensemble models. Through sequential YouTube videos taught by Amazon scientists with hands-on practical examples, Jupyter notebooks, and slide decks, MLU provides a comprehensive self-service pathway to understanding the foundations of machine learning. Course materials are available on GitHub, see below for more details about our courses.
Paired with the MLU resources is MLU-Explain, a set of visual essays that present ML topics in a "fun, informative, and accessible manner".
Current MLU-Explain topics include:
They're cute, interesting, and accurate resources, plus they're open source with a pretty permissive license.
If I were teaching these concepts, I'd definitely be incorporating them into my syllabus.
I do a tad less internet-scale scraping these days since I'm doing more listening than poking at GreyNoise (though we do scrape the internet, but more on that at a later date). Curl is still the workhorse of many programs that fetch internet resources, whether used as a standalone command line binary or via the curl library (many of your "smart" retail appliances run curl in some way).
Using straight curl to hit internet resources may not get you what you're looking for, even if you get all clever and change your User-Agent string, as there are other ways for a server to fingerprint clients that connect to it, like via TLS profiling. Wouldn't it be handy if you could use the same core tool, but really have it look like a real browser? Well, you can, and I'll let the curl-impersonate developers explain it:
When you use an HTTP client with a TLS website, it first performs a TLS handshake. The first message of that handshake is called Client Hello. The Client Hello message that most HTTP clients and libraries produce differs drastically from that of a real browser.
If the server uses HTTP/2, then in addition to the TLS handshake there is also an HTTP/2 handshake where various settings are exchanged. The settings that most HTTP clients and libraries use differ as well from those of any real browsers.
For these reasons, some web services use the TLS and HTTP handshakes to fingerprint which client is accessing them, and then present different content for different clients. These methods are known as TLS fingerprinting and HTTP/2 fingerprinting respectively. Their widespread use has led to the web becoming less open, less private and much more restrictive towards specific web clients.
With the modified curl in this repository, the TLS and HTTP handshakes look exactly like those of a real browser.
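To make the fingerprinting idea a bit more concrete, here is a minimal sketch of one well-known approach, a JA3-style hash, which joins a handful of Client Hello fields into a string and takes its MD5. The quoted text does not say which fingerprinting scheme any given server uses, so treat JA3 as just one example. The `ClientHello` class and both sets of field values are made up for illustration (they were not captured from real clients), and the sketch skips details such as filtering out GREASE values.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClientHello:
    """Only the Client Hello fields a JA3-style fingerprint looks at.

    Hypothetical helper type for this sketch, not part of any library.
    """
    tls_version: int
    cipher_suites: Tuple[int, ...]
    extensions: Tuple[int, ...]
    supported_groups: Tuple[int, ...]
    ec_point_formats: Tuple[int, ...]


def ja3_style_fingerprint(hello: ClientHello) -> str:
    """Join fields as 'version,ciphers,extensions,groups,formats' and MD5 it.

    Values inside a field are joined with '-', fields with ','.
    Order is preserved, so the same ciphers in a different order
    produce a different fingerprint.
    """
    def dash_join(values):
        return "-".join(str(v) for v in values)

    ja3_string = ",".join([
        str(hello.tls_version),
        dash_join(hello.cipher_suites),
        dash_join(hello.extensions),
        dash_join(hello.supported_groups),
        dash_join(hello.ec_point_formats),
    ])
    return hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Illustrative values only: NOT captured from real clients.
browser_like = ClientHello(
    tls_version=771,
    cipher_suites=(4865, 4866, 4867, 49195, 49199),
    extensions=(0, 23, 65281, 10, 11, 35, 16, 5, 13),
    supported_groups=(29, 23, 24),
    ec_point_formats=(0,),
)
library_like = ClientHello(
    tls_version=771,
    cipher_suites=(49199, 49195, 4865),
    extensions=(0, 11, 10, 13),
    supported_groups=(23, 24),
    ec_point_formats=(0,),
)

print("browser-like:", ja3_style_fingerprint(browser_like))
print("library-like:", ja3_style_fingerprint(library_like))
```

Nothing in that hash depends on HTTP headers, which is why swapping the User-Agent alone does not change what a fingerprinting server sees.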
What dark magic did they use to create this Franken-curl? Here are some of the modifications they made:
Compiling curl with nss, the TLS library that Firefox uses, instead of OpenSSL. For the Chrome version, compiling with BoringSSL, Google's TLS library.
Modifying the way curl configures various TLS extensions and SSL options.
Adding support for new TLS extensions.
Changing the settings that curl uses for its HTTP/2 connections.
Running curl with some non-default flags, for example
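As a rough illustration of why the TLS library and its configuration matter in the list above, the sketch below uses Python's standard `ssl` module (not curl) to show that the cipher suites a client is prepared to offer come from the TLS context, and that reconfiguring the context changes them. The `offered_cipher_names` helper and the `"ECDHE+AESGCM"` cipher string are choices made for this example; the exact names printed depend on the TLS library your Python was built against.

```python
import ssl


def offered_cipher_names(cipher_string=None):
    """Return the cipher names enabled on a fresh default client context.

    If cipher_string is given, it is applied with SSLContext.set_ciphers()
    first. On OpenSSL-based builds this string generally does not affect
    TLS 1.3 suites, so those may still appear in the result.
    """
    ctx = ssl.create_default_context()
    if cipher_string is not None:
        ctx.set_ciphers(cipher_string)
    return [cipher["name"] for cipher in ctx.get_ciphers()]


default_names = offered_cipher_names()
narrowed_names = offered_cipher_names("ECDHE+AESGCM")

print(len(default_names), "ciphers enabled by default")
print(len(narrowed_names), "ciphers enabled after narrowing")
for name in narrowed_names:
    print("  ", name)
```

Two clients with different libraries or different settings end up advertising different lists, which is the kind of difference the modifications above are meant to erase.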
You can hit this JSON endpoint to see what browsers they currently impersonate, and you can hit up their GitHub to see how to use the modified curl as a library. You can also use their pre-built Docker images if that's the way you roll.
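If you would rather drive the modified curl from a script than from the shell, a minimal sketch might look like the following. It assumes curl-impersonate is installed and that a browser wrapper script is on your `PATH`; the wrapper name `curl_chrome116` is an assumption here, so check which wrappers your install actually ships. The helper function names and the `https://example.com/` URL are placeholders for this example.

```python
import shutil
import subprocess


def build_impersonate_command(url, wrapper="curl_chrome116", extra_args=()):
    """Build the argument list for a curl-impersonate wrapper script.

    The wrapper name is an assumption; substitute one your install provides.
    --silent and --show-error are ordinary curl flags.
    """
    return [wrapper, "--silent", "--show-error", *extra_args, url]


def fetch_with_impersonation(url, wrapper="curl_chrome116", timeout=30):
    """Run the wrapper and return the response body as text.

    Returns None if the wrapper script is not found on PATH.
    Raises subprocess.CalledProcessError if curl exits with an error.
    """
    if shutil.which(wrapper) is None:
        return None
    result = subprocess.run(
        build_impersonate_command(url, wrapper),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    body = fetch_with_impersonation("https://example.com/")
    if body is None:
        print("curl-impersonate wrapper not found on PATH")
    else:
        print(body[:200])
```

Shelling out keeps the sketch independent of any particular language binding; the project's GitHub covers using it as a library instead.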
I think we're going to be playing quite heavily with curl-impersonate at GN, so I'll report back on if/how it changes some of our scanning results. ☮