Discover more from hrbrmstr's Daily Drop
Drop #117 (2022-10-12): Terminal (Browser) Velocity
glow; hget; crawley
Programming note: I need to keep the screenshots in this edition fairly small as Substack will cut off the email version of the newsletter if I do not, so definitely check out each utility to see larger examples in your own terminal.
Glow is a terminal-based Markdown reader written in Go. While that short description is accurate, it does not do justice to this utility.
Running glow in a directory finds all the Markdown files in the directory tree below it and presents an interactive navigator that lets you cursor your way around and render a given Markdown file in whatever theme you like.
It can also read markdown from a URL or stdin, and has a built-in pager that behaves like less. Glow also supports mouse interaction, but pro terminal jockeys won't need that option.
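As a rough sketch of those input modes, here are two tiny shell functions. The function names are made up for illustration, and the --style and --pager flags plus the "-" stdin argument are assumptions based on glow's help output, so check glow --help on your version.

```shell
# Hypothetical helpers (names are illustrative, not part of glow).

# Render one Markdown file with a named built-in style, "dark" by default.
render_md() {
  glow --style "${2:-dark}" "$1"
}

# Read Markdown from stdin ("-") and show it in glow's less-like pager.
page_md() {
  glow --pager -
}
```

With those defined, render_md README.md light renders a local file in the light theme, and any command that prints Markdown can be piped into page_md.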
It has a fancy way of securely letting you "stash" markdown you're reading (which seems only really useful if hitting a URL) that takes advantage of Charm Cloud. The data is encrypted, and only you can decrypt it.
We're going to use glow in the next section, so head there to see another view of it.
These days, I'm finding it increasingly difficult to trust any browser and the organizations that create them. As a result, I'm on the hunt for a decent command line "browser" experience, so you can expect to see newsletter sections with various ones I'm [re-]trying.
Today's installment is hget, a "CLI and an API to convert HTML into plain text. Can be used to fetch a site's HTML version and convert it into plain text, or to deliver plain text versions of your site dynamically."
You can also convert HTML into HTML, ignoring certain document elements, and starting at a root element other than <html>. Better still, you can choose to convert HTML to raw markdown output. The default, however, is terminal-formatted plain text.
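A minimal sketch of combining those options might look like the following. The function name and its defaults are hypothetical, and it assumes hget's --root and --ignore flags take CSS selectors, so treat it as a starting point and confirm against hget's own help.

```shell
# Hypothetical helper (the name and defaults are illustrative).
# Fetch a URL, start at one root element, skip some elements, print Markdown.
# Assumption: --root and --ignore accept CSS selectors.
html_to_md() {
  hget --no-paging --markdown \
    --root "${2:-article}" \
    --ignore "${3:-footer}" \
    "$1"
}
```

Calling html_to_md with just a URL uses the article element as the root and drops any footer; the second and third arguments override those selectors.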
We're going to use hget’s markdown option with glow (above) to read a post from Heather Cox Richardson's newsletter (I receive nada for this plug; it's just a good read that folks might like).
hget will use whatever PAGER you want by default, so we'll turn its paging off, ask it to retrieve Heather’s most recent post, ensure it spits out Markdown, and use glow to read it with glow's paging interface:
hget --no-paging \
  https://substack.com/inbox/post/77694575 \
  --markdown | glow --pager -
This is a snippet of the result:
While I'm using hget plus glow as a poor-dude's terminal browser, hget can also just be a utility you use to turn any given URL into plain text or markdown, which is handy on its own.
I won't be settling on any single terminal browser for a while, but will keep using each as I find them, and (heh) render a verdict in a few weeks.
Browsing from the terminal is nice and all, but sometimes you just need the links from a given page, and crawley is pretty darned good at this specific task.
It:
grabs most useful resource URLs (pics, videos, audio, forms, etc.)
streams (unique) discovered URLs to stdout
can crawl rules and sitemaps from robots.txt
has a brute mode, which enables scanning HTML comments for URLs
supports HTTP proxies
enables supplying cookies and headers
can be customized to only pull links from various HTML tags
is able to ignore URLs with specified strings
I've found this tool to be quite useful, especially when I want a local copy of "awesome" lists without the surrounding cruft; e.g., you can use it to get all the Awesome Quarto links.
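A sketch of that kind of link harvesting, wrapped in a hypothetical helper, could look like this. The function name is made up, and the -depth and -silent flags are assumptions based on crawley's usage text, so confirm them against your installed version.

```shell
# Hypothetical helper (the name is illustrative, not part of crawley).
# List the unique links found on a single page, optionally keeping only
# those that match an extended regular expression.
# Assumptions: -depth 0 stays on the starting page; -silent quiets stderr.
page_links() {
  crawley -depth 0 -silent "$1" | grep -E -- "${2:-.}" | sort -u
}
```

For the Awesome Quarto case, that would be something like page_links https://github.com/mcanouil/awesome-quarto 'quarto' > quarto-links.txt, where the repository URL is an assumption about where that list lives.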
Golang goroutines make this tool superfast, too.
If you have a fav CLI “browser” tool (or two), drop a note in the comments. ☮