

Discover more from hrbrmstr's Daily Drop
CLI Data Wrangling
This category will come up fairly frequently, as there are so many great CLI tools for working with various kinds of data. If the acronym is unfamiliar, it stands for command-line interface, which is a fancy way of saying that unadorned, black-background, text-only window that gets you to the terminal/shell of your favorite operating system.
For macOS users, there’s the built-in Terminal.app, but there are more modern alternatives such as iTerm and WezTerm. Linux users likely already know what’s going on CLI-wise, and Windows folks should really consider using Microsoft’s modern Terminal. As with all things “app”, there are scads of alternatives on all three operating systems. If you are new to the command line, there are also scads of learning resources, including this one.
There are many established “go-to” CLI data wrangling tools, including some built into most operating systems’ command libraries that we may cover at a later date, but today we’re focusing on trdsql, a tool that can execute SQL queries on CSV, LTSV, JSON and TBLN and output to various formats.
“CSV” should be familiar to most folks reading this, as it’s the central way to move tabular data around platform-agnostically. “LTSV” stands for Labeled Tab-separated Values and is a variant of Tab-separated Values (“TSV”) that differs from a traditional TSV in that:
Each record in an LTSV file is represented as a single line. Each field is separated by TAB and has a label and a value. The label and the value are separated by ':'. With the LTSV format, you can easily parse each line by splitting on TAB (like the original TSV format), and extend any fields with unique labels in no particular order.
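To make the label:value idea concrete, here is a minimal sketch of pulling one field out of LTSV records by label, written in plain POSIX awk rather than trdsql. The ltsv_get function name is hypothetical, and the sketch assumes well-formed input (no TAB characters inside values). It splits each record on TAB, then splits each field on its first ':' only, since values such as timestamps may contain colons of their own. Because the lookup is by label, field order per line does not matter.

```shell
# ltsv_get LABEL: read LTSV records on stdin and print the value of LABEL
# for each record (an empty line if the record lacks that label).
# Hypothetical helper for illustration; not part of trdsql.
ltsv_get() {
  awk -F '\t' -v want="$1" '
    {
      out = ""
      for (i = 1; i <= NF; i++) {
        # Split each field on the FIRST colon only: values such as
        # times or URLs may themselves contain colons.
        p = index($i, ":")
        if (p > 0 && substr($i, 1, p - 1) == want) {
          out = substr($i, p + 1)
          break
        }
      }
      print out
    }'
}

# Labels appear in a different order on each line; lookup is by label.
printf 'host:127.0.0.1\tstatus:200\nstatus:404\thost:10.0.0.1\n' | ltsv_get status
# prints 200, then 404
```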
“TBLN” (for the life of me I can’t figure out what it expands to) is like a CSV file but contains more metadata and can also include comments. (FWIW, I’ve never seen a TBLN file IRL.) While the linked site has amazing documentation, I’ll drop this tiny example of turning the output of the ps command (which displays running processes) into JSON:
$ ps -ef | trdsql -ojson -id " " "SELECT * FROM -"
[
  {
    "c1": "UID",
    "c2": "PID",
    "c3": "PPID",
    "c4": "C",
    "c5": "STIME",
    "c6": "TTY",
    "c7": "TIME",
    "c8": "CMD"
  },
  {
    "c1": "0",
    "c2": "1",
    "c3": "0",
    "c4": "0",
    "c5": "Thu03PM",
    "c6": "??",
    "c7": "70:07.66",
    "c8": "/sbin/launchd"
  },…
(NB: I’ll likely refrain from using Substack code blocks given how poor the rendering choices are.)
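One wrinkle with splitting ps output on spaces is that the CMD column often contains spaces of its own (command-line arguments), so a naive split would scatter a single command across extra columns. Below is a minimal pre-processing sketch, assuming the eight-column layout of ps -ef shown above; the cols_to_csv helper is hypothetical and not part of trdsql. It emits CSV with the first N-1 whitespace-separated tokens as their own columns and keeps the rest of the line intact as the last column, doubling any embedded double quotes per the usual CSV convention.

```shell
# cols_to_csv N: turn whitespace-aligned text (like ps -ef output) into CSV
# with N columns; everything after column N-1 stays together as the last
# column, so a command line with spaces remains one field.
# Hypothetical helper for illustration; short lines yield fewer columns.
cols_to_csv() {
  awk -v n="$1" '
    # Quote a CSV field, doubling any embedded double quotes.
    function q(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
    {
      line = $0
      out = ""
      for (i = 1; i < n; i++) {
        sub(/^[ \t]+/, "", line)                 # drop leading blanks
        if (!match(line, /^[^ \t]+/)) break      # next whitespace-free token
        out = out q(substr(line, 1, RLENGTH)) ","
        line = substr(line, RLENGTH + 1)
      }
      sub(/^[ \t]+/, "", line)                   # the remainder is the last column
      print out q(line)
    }'
}

printf '  501   123     1   0 Thu03PM ??         0:01.00 /usr/bin/foo --bar baz\n' | cols_to_csv 8
# prints "501","123","1","0","Thu03PM","??","0:01.00","/usr/bin/foo --bar baz"
```

Piping ps -ef through cols_to_csv 8 and then into trdsql with -icsv -ih should let the header line supply UID, PID and friends as column names instead of c1…c8, since -ih tells trdsql to treat the first CSV line as the header; check trdsql’s documentation for the flags your version supports.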
What impresses me most is that it’s both fast and memory-efficient.

While R is my 🔨, I’m looking to “play the field” a bit more this year. I may just try to crank out a {dbplyr} back-end (or at least a package wrapper) for this handy CLI tool.
Tolkien Maps
Our entire clan can quote large passages from J. R. R. Tolkien’s works, and I find one particular feature of his LotR and Hobbit books incredibly compelling: his hand-drawn maps.
The Tolkien Estate recently dropped high resolution versions of selected maps Tolkien created along with some commentary on the creation process.
The above is a tiny version of one of them, as the Estate seems to have gone to great lengths to copy-protect the site (the maps page is one giant SVG!), and I prefer to honor such efforts, at least when I feel they are justified.
The caption for the map notes that it “grew over time” (by taping sheets together) as Tolkien was world-building. I can only imagine what he would have crafted with modern tooling.
DataVis Design Principles
If you thought that my mentioning of “design principles” in the previous post was a signal there’d be future installments, each diving into specific ones, you were correct!
I 💙 data visualization, and making good datavis is hard. Design principles help by giving structure and guidance to the creative process, so you can be sure you’re crafting the narrative you intend.
I found this “Dos and don’ts of data visualisation” resource by the European Environment Agency (EEA) to be one of the better principled design guides when it comes to datavis. Each principle, such as “Do tell the ‘why’ and ‘how’: annotations,” has:
a quick overview of the principle
a larger solid exposition with detailed, specific guidance
on-page good/bad chart examples
links to IRL published materials that embody the principle
EEA covers quite a bit of ground:
Highlight Your Message
Do tell the ‘why’ and ‘how’: annotations.
Do highlight what’s important, tell one story
Hierarchy of the information
Choose Your Chart
Tables are preferable to graphics for many small data sets
Exploratory/explanatory: do choose the right format (flow chart)
Static or interactive?
Do choose the chart type wisely
Bar chart: do use the full axis and avoid distortion
Pie charts: cons (and pros)
Small multiples
Stacked charts are difficult for comparing data
Dual axis charts, pros and cons
Make Charts Easy To Read
Do use clear language and avoid acronyms
Do remove any visual clutter (increase data-ink ratio, Tufte’s principle)
Do rotate bar chart when category names are too long
Don’t use a legend when you have only one data category
Do use direct labelling wherever possible, avoiding indirect look-up
Do sort your data for easier comparisons
Don't use more than (about) six colours
Do be aware of colour blindness (colour vision deficiency)
Make Charts Correct
Do use consistent intervals on axis (be transparent on data gaps)
Do use proper aspect ratio to minimise dramatic slope effects
Don't confuse correlation with causation
Do adjust for inflation in long-time series
Do be careful about how you treat ‘no-data/missing data’
Don't compare apples with oranges
Do show the level of confidence
Dashboard
10 best practices for building effective dashboards
Final Checks
Data visualisation checklist
Do ask others for opinions
A personal goal for 2022 is to start using guides like this in a more regular and deliberate fashion. In a future edition, I’ll reference some of the tools I use to keep resources like this handy.
FIN
That’s a wrap for this post! If you choose to interact in the comments, the only rule is to be kind to each other. ☮