

Get Internet
(Note: this is heavily U.S. focused.)
Yesterday, the U.S. White House did a full-court press pimping the Affordable Connectivity Program (ACP), which provides eligible households $30 per month off their internet bills.
If you're in the U.S. and follow tech news or politics, you likely know this already. I'm mentioning it here, because the individuals and families that truly need to hear about this likely did not see this news. They will likely not even know about this provision in the Bipartisan Infrastructure Law (BIL) — or even that there is a BIL (as this administration is terrible about promotion).
Not only can the ACP help these folks and families get online and significantly reduce the cost of having internet connectivity, there's also a provision to subsidize the purchase of a device. It's meager ($100.00 USD), but if the participating ACP organizations have any decency, it may not cost too much more than that to get a minimal Chromebook. This can open doors for both kids and adults and be a game changer for future education and employment opportunities.
There's a "Find out if you qualify" section that needs you to click a "+" button to expand (I'm not sure that was the best way to do that), but I find the main ACP site to be a much better resource about the entire program. Said site has a cadre of outreach resources.
I mention those resources because I have a challenge to all U.S. residents reading this post: make at least one outreach effort using these tools.
It can be something as simple as sharing the news with a friend or family member you believe might be eligible.
It could be holding a session at your local library.
You could also pleasantly surprise a local school board and ask to do a quiet, productive presentation on the topic at school or even a board meeting (which would likely be a welcome change from the shouting, violent MAGA blatherings).
Perhaps ask for a small (free) space at a local craft fair or farmers' market.
Your outreach effort could be the catalyst for massive positive change in someone's life.
The Incredible, Shrinking Web
(Since the first story was, in a way, about bandwidth, this one can be said to be about saving said resource.)
Minify HTML is an "extremely fast and smart HTML + JS + CSS minifier, available for Rust, Node.js, Python, Java, and Ruby" (though calling the repo minify-html might not convey the extent of the tool's capabilities).
In many places on this planet, bandwidth is neither cheap nor speedy. Even Elon's new night sky-cluttering satellites have both bandwidth and latency challenges. Reducing the size of content that flows from servers to endpoints can improve the experience for everyone.
I link to it as it is, indeed, super-fast, and works across many languages (for now, R folks can use it via the Python library, but I may make a package for it since I've got some grey matter clarity again). The authors and contributors also used some very clever techniques, which they expound upon a bit and which you can examine for yourself in the source code.
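For R folks who want to try the Python route right now, here is a minimal sketch via {reticulate}. It assumes the Python module is importable as minify_html and installable from PyPI as minify-html, and that your installed version accepts the minify_js option (option names have shifted between releases, so check the README for yours). The sample page is made up for illustration.

```r
library(reticulate)

# A small, deliberately whitespace-heavy page to shrink (illustrative only)
html <- paste(
  "<!DOCTYPE html>",
  "<html>",
  "  <head>",
  "    <title>  Hello  </title>",
  "  </head>",
  "  <body>",
  "    <p>  Hello,   world!  </p>",
  "  </body>",
  "</html>",
  sep = "\n"
)

# Only call into Python when the module is actually installed
if (py_module_available("minify_html")) {
  minify_html <- import("minify_html")
  # minify_js asks the library to also minify inline <script> content
  minified <- minify_html$minify(html, minify_js = TRUE)
  cat(nchar(html, type = "bytes"), "bytes ->",
      nchar(minified, type = "bytes"), "bytes\n")
} else {
  # Assumption: the PyPI package name is "minify-html"
  message("Python module 'minify_html' not found; try ",
          "reticulate::py_install(\"minify-html\", pip = TRUE)")
}
```

The guard keeps the script from erroring on machines where the Python side is not set up yet, which is handy if you share it with someone else.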
Mining Microdata
Yesterday's post had a feature on archiving web content, focusing on link rot. I decided to keep a "web scraping" theme this week (plus you have to have internet access to scrape content, which makes the first ACP story, above, all the more essential).
I 💙 the folks over at ScrapingHub (now "Zyte") and find their extruct library for extracting embedded metadata from HTML markup immensely useful.
What is "embedded metadata"? Rather than turn this edition into a tome, I'll point you to a pretty thorough 'splainer by Tony Gill, Murtha Baca, Joan Cobb, Nathaniel Deines, and Moon Kim over at the Getty Foundation.
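The extruct tool covers quite a bit of ground, as it can pull:
W3C HTML Microdata
embedded JSON-LD
Microformat via mf2py
Facebook's Open Graph
RDFa (experimental) via rdflib
Dublin Core Metadata
from pretty much any HTML document.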
I highly suggest reading the above-referenced 'splainer, as one can't always trust what is in metadata, though I'd argue that's actually a pretty solid reason to extract metadata from HTML content (i.e., to help verify the veracity of what you scraped).
The repo has lots of examples, but I'll close with a use of the extruct library from R via {reticulate}, and use the Getty article as an example as they have some pretty thorough Dublin Core Metadata in their posts.
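library(reticulate)
# extruct is a Python library; {reticulate} imports it as an R object
extruct <- import("extruct")
# Fetch the page and keep the raw HTML as a single string
httr::GET("https://www.getty.edu/publications/intrometadata/metadata-and-the-web/") |>
  httr::content(as = "text") -> res
# The second argument is the base URL extruct uses to resolve relative links
meta <- extruct$extract(res, "https://www.getty.edu/publications/intrometadata/metadata-and-the-web/")
# Peek at the top two levels of the result
str(meta, 2)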
## List of 6
## $ microdata : list()
## $ json-ld : list()
## $ opengraph : list()
## $ microformat: list()
## $ rdfa : list()
## $ dublincore :List of 1
## ..$ :List of 3
meta$dublincore[[1]]$elements |>
  str(2)
## List of 20
## $ :List of 3
## ..$ name : chr "description"
## ..$ content: chr "Metadata provides a means of indexing, accessing, preserving, and discovering digital resources. The volume of "| __truncated__
## ..$ URI : chr "http://purl.org/dc/elements/1.1/description"
## $ :List of 3
## ..$ name : chr "dcterms.creator"
## ..$ content: chr "Murtha Baca"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/creator"
## $ :List of 3
## ..$ name : chr "dcterms.contributor"
## ..$ content: chr "Tony Gill"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/contributor"
## $ :List of 3
## ..$ name : chr "dcterms.contributor"
## ..$ content: chr "Anne J. Gilliland"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/contributor"
## $ :List of 3
## ..$ name : chr "dcterms.contributor"
## ..$ content: chr "Maureen Whalen"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/contributor"
## $ :List of 3
## ..$ name : chr "dcterms.contributor"
## ..$ content: chr "Mary S. Woodley"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/contributor"
## $ :List of 3
## ..$ name : chr "dcterms.date"
## ..$ content: chr "2016-07-20"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/date"
## $ :List of 3
## ..$ name : chr "dcterms.description"
## ..$ content: chr "Metadata provides a means of indexing, accessing, preserving, and discovering digital resources. The volume of "| __truncated__
## ..$ URI : chr "http://purl.org/dc/elements/1.1/description"
## $ :List of 3
## ..$ name : chr "dcterms.format"
## ..$ content: chr "text/html"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/format"
## $ :List of 3
## ..$ name : chr "dcterms.identifier"
## ..$ content: chr "978-1-60606-500-6"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/identifier"
## $ :List of 3
## ..$ name : chr "dcterms.language"
## ..$ content: chr "en-US"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/language"
## $ :List of 3
## ..$ name : chr "dcterms.publisher"
## ..$ content: chr "Getty Research Institute, Los Angeles"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/publiser"
## $ :List of 3
## ..$ name : chr "dcterms.rights"
## ..$ content: chr "© 2008, 2016 J. Paul Getty Trust"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/rights"
## $ :List of 3
## ..$ name : chr "dcterms.subject"
## ..$ content: chr "Reference"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/subject"
## $ :List of 3
## ..$ name : chr "dcterms.subject"
## ..$ content: chr "ART / Reference"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/subject"
## $ :List of 3
## ..$ name : chr "dcterms.subject"
## ..$ content: chr "ART / Digital"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/subject"
## $ :List of 3
## ..$ name : chr "dcterms.subject"
## ..$ content: chr "LANGUAGE ARTS & DISCIPLINES / Library & Information Science / General"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/subject"
## $ :List of 3
## ..$ name : chr "dcterms.title"
## ..$ content: chr "Introduction to Metadata"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/title"
## $ :List of 3
## ..$ name : chr "dcterms.type"
## ..$ content: chr "InteractiveResource"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/type"
## $ :List of 3
## ..$ rel : chr "publisher"
## ..$ href: chr "http://www.getty.edu"
## ..$ URI : chr "http://purl.org/dc/elements/1.1/publiser"
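That nested list is a bit unwieldy, since names like dcterms.contributor and dcterms.subject repeat. One way to tame it (a base R sketch, not the only approach) is to flatten the named entries into a data frame and then group values by name. To keep it runnable offline, the elements object below is a hand-copied subset of the output above; with a live result you would use meta$dublincore[[1]]$elements instead.

```r
# Hand-copied subset of the elements printed above; a stand-in for
# meta$dublincore[[1]]$elements so the sketch runs without network access
elements <- list(
  list(name = "dcterms.creator", content = "Murtha Baca",
       URI = "http://purl.org/dc/elements/1.1/creator"),
  list(name = "dcterms.contributor", content = "Tony Gill",
       URI = "http://purl.org/dc/elements/1.1/contributor"),
  list(name = "dcterms.contributor", content = "Anne J. Gilliland",
       URI = "http://purl.org/dc/elements/1.1/contributor"),
  list(name = "dcterms.date", content = "2016-07-20",
       URI = "http://purl.org/dc/elements/1.1/date"),
  list(rel = "publisher", href = "http://www.getty.edu",
       URI = "http://purl.org/dc/elements/1.1/publiser")
)

# Keep only name/content entries (the last one carries rel/href instead)
has_name <- vapply(elements, function(el) !is.null(el[["name"]]), logical(1))
named <- elements[has_name]

# One row per name/content pair
dc <- data.frame(
  name = vapply(named, function(el) el[["name"]], character(1)),
  content = vapply(named, function(el) el[["content"]], character(1)),
  stringsAsFactors = FALSE
)

# Group repeated names so each one maps to a character vector of values
dc_by_name <- split(dc$content, dc$name)

dc_by_name[["dcterms.contributor"]]
```

The last line returns both contributors in document order, and the rel/href entry is skipped rather than forced into the name/content shape.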
FIN
Don't forget to share the news about the ACP! (And, as usual, be kind in the comments). ☮