Drop #142 (2022-11-23): Data/Vis/Git Toolbox Tidbits
Crosswalker ⟕; RIATE 🗺️ 🛠️; Husky 🐾
There are some universal truths that will never go away: James Holden loves coffee; there's "OPA", and there's "O.P.A."; and data is messy.
One of the problems with messy data is that you can have one lovingly curated, pristine dataset (say, a list of voting precincts) that you need to join with another dataset, which was "crafted" by someone who has not yet had good data care and feeding skills beaten into them. Your precinct names may be correct, but they may not be spelled the way various precinct captains think they should be, or typing errors may have mangled entries enough to make it impossible to just LEFT JOIN your way to insights.
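For a rough feel of what a "fuzzy" left join looks like in code, here is a minimal sketch using Python's standard-library `difflib`. This is not how Crosswalker works under the hood. The precinct names, the `fuzzy_left_join` helper, and the 0.6 cutoff are all assumptions made up for illustration.

```python
from difflib import get_close_matches


def normalize(name: str) -> str:
    """Lowercase and collapse runs of whitespace so trivial differences vanish."""
    return " ".join(name.lower().split())


def fuzzy_left_join(left, right, cutoff=0.6):
    """Pair every name in `left` with its closest name in `right`, or None.

    Every left-hand row is kept, like a LEFT JOIN. The cutoff is an
    arbitrary, hypothetical threshold; tune it for your own data.
    """
    # Map normalized spellings back to the original right-hand values.
    # (Names that normalize identically would collide; fine for a sketch.)
    lookup = {normalize(r): r for r in right}
    joined = []
    for name in left:
        hits = get_close_matches(normalize(name), list(lookup), n=1, cutoff=cutoff)
        joined.append((name, lookup[hits[0]] if hits else None))
    return joined


# Hypothetical data: a curated list and a hand-typed one.
clean = ["Ward 1 Precinct 2", "North Berwick", "Kittery Point", "Ogunquit"]
messy = ["ward 1 precint 2", "N. Berwick", "Kitery  Pt"]

for curated, typed in fuzzy_left_join(clean, messy):
    print(f"{curated!r} -> {typed!r}")
```

An exact join would match none of these rows. The fuzzy version pairs up the three misspelled ones and leaves the row with no counterpart as `None`, which is where a human still has to step in.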
Crosswalker was originally developed as "Precinct Matcher," a tool with the express use case of matching precinct names for elections at The Washington Post (precinct names change each election cycle, and it's hard to build election models matching historical data when the names slightly change). This first iteration of the tool was designed hastily but still proved to be very useful for team members who used to employ a very manual code-based approach.
In this redesign of the tool, Crosswalker was made more generally useful by abstracting the details away from precincts and broadening its scope to text-based crosswalking problems in general. Counties became "join" columns, and additional info to show, like GeoJSON IDs, became "metadata." The original tool only presented a single possible match for each precinct, using a brittle text matching algorithm. The redesigned tool presents every possible match in an interactive, performant spreadsheet, ranked by a new fast and thorough algorithm.
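The "join column plus metadata plus ranked matches" idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Crosswalker's actual algorithm. The `rank_candidates` helper, the column names, and the `geo_id` values are hypothetical, and the scoring uses `difflib.SequenceMatcher` as a stand-in.

```python
from difflib import SequenceMatcher


def rank_candidates(source_row, target_rows, join_col="county", match_col="precinct"):
    """Return every target row that shares the join value, best match first.

    Each result is a (score, row) pair, so extra fields on the row
    (the "metadata") travel along with the match.
    """
    pool = [t for t in target_rows if t[join_col] == source_row[join_col]]
    scored = [
        (
            SequenceMatcher(
                None, source_row[match_col].lower(), t[match_col].lower()
            ).ratio(),
            t,
        )
        for t in pool
    ]
    # Sort on the score only, so the row dicts are never compared.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical rows; "geo_id" plays the role of metadata.
source = {"county": "York", "precinct": "Kittery Ward 2"}
targets = [
    {"county": "York", "precinct": "Eliot", "geo_id": "hypo-002"},
    {"county": "York", "precinct": "Kittery W2", "geo_id": "hypo-001"},
    {"county": "Cumberland", "precinct": "Kittery Ward 2", "geo_id": "hypo-003"},
]

for score, row in rank_candidates(source, targets):
    print(f"{score:.2f}  {row['precinct']}  ({row['geo_id']})")
```

Restricting candidates to the same join value keeps an identically named precinct in another county from winning. Returning the whole ranked list, rather than only the top hit, leaves the final call to a person.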
It's a really cool tool that I will likely use for both work and personal projects quite a bit.
RIATE 🗺️ 🛠️
#30DayMapChallenge is nearly over, but that doesn't mean we have to stop talking about all things geospatial!
This year, I've been nerding out on Bertin.js in many of the challenge entries. Bertin is an incredibly rich and diverse toolbox, but it's only one of the tools from the RIATE team. RIATE is a "Support and Research Unit of the CNRS and the Université Paris Cité (UAR2414). RIATE's activities are part of a reproducible research approach favoring the production and dissemination of free and open source data and methodologies."
jsts. It allows to simply deal with GeoJSON properties and provides several GIS operations useful for thematic cartography.
Geocountries has tools to get ISO codes and geometries from country names. It has a "generous" mode, akin to the Crosswalker tool in the first section.
Go play with the Observable notebooks to see how useful these resources can be (some outside of geo-proper).
Husky 🐾
Git hooks can help ensure your project meets minimum coding standards and save you from some crushingly bad deployments. While you can code them up on your own, there are many frameworks to choose from. I saw WaPo using Husky [GH] and decided to give it a quick look.
If you don't know what Git hooks are and did not tap the definition link, the TL;DR is: hooks are programs you can place in a special directory to trigger actions at certain points in git’s execution.
Say you want to make sure your code is in the right style for your team (so they don't strangle you) and/or only gets pushed if the linter is happy with it. Git hooks got ya covered.
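To make that concrete, here is a minimal sketch of a hand-rolled `pre-commit` hook written in Python (hooks can be any executable). It assumes a hypothetical team rule that debugger leftovers must never be committed. The `FORBIDDEN` list and the helper names are made up for illustration, and this is plain Git rather than anything Husky-specific.

```python
#!/usr/bin/env python3
"""Sketch of a pre-commit hook: save as .git/hooks/pre-commit and make it executable."""
import subprocess

# Hypothetical team rule: these markers must not appear in committed code.
FORBIDDEN = ("breakpoint()", "import pdb")


def staged_files():
    """List staged .py paths that were added, copied, or modified."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.splitlines() if p.endswith(".py")]


def find_violations(path, text):
    """Return (path, line_number, marker) for each forbidden marker found."""
    return [
        (path, n, marker)
        for n, line in enumerate(text.splitlines(), start=1)
        for marker in FORBIDDEN
        if marker in line
    ]


def main():
    problems = []
    for path in staged_files():
        # Simplification: this reads the working-tree copy, which can differ
        # from the staged content if the file was edited after `git add`.
        with open(path, encoding="utf-8") as fh:
            problems += find_violations(path, fh.read())
    for path, n, marker in problems:
        print(f"{path}:{n}: found {marker!r}")
    return 1 if problems else 0  # a non-zero exit status makes git abort the commit


# In the installed hook file, finish with: raise SystemExit(main())
```

Because Git only looks at the hook's exit status, the same skeleton works for running a linter or a test suite instead of a string search.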
Husky provides a framework for managing these Git hook scripts, and this is a great intro article on how to get started with both Git hooks and Husky if you're looking for something to do over the upcoming break in the Colonies.
Safe travels to everyone travelling this week! ☮