

Drop #142 (2022-11-23): Data/Vis/Git Toolbox Tidbits
Crosswalker; RIATE 🗺️ 🛠️; Husky 🐾
Crosswalker
There are some universal truths that will never go away. James Holden loves coffee; there's "OPA", and there's "O.P.A."; and data is messy.
One of the problems with messy data is that you can have one lovingly curated, pristine dataset (say, a list of voting precincts) that you need to join with another dataset, which was "crafted" by someone who has not had good data care and feeding skills beaten into them (yet). Your precinct names may be correct, but they may not be spelled the way various precinct captains think they should, or typing errors may have just mangled entries sufficiently to make it impossible to just LEFT JOIN your way to insights.
If you find yourself in need of some record linking, then take a look at Crosswalker [GH], "a general purpose tool [from WaPo] for joining columns of text data that don't match perfectly."
Crosswalker was originally developed as "Precinct Matcher," a tool with the express use case of matching precinct names for elections at The Washington Post (precinct names change each election cycle, and it's hard to build election models matching historical data when the names slightly change). This first iteration of the tool was designed hastily but still proved to be very useful for teammembers who used to employ a very manual code-based approach.
In this redesign of the tool, Crosswalker was made to be more generally useful by abstracting the details away from precincts and broadening the tool's scope to be broadly for text-based crosswalking problems. Counties became "join" columns, additional info to show like GeoJSON IDs became "metadata." The original tool only presented a single possible match for each precinct, using a brittle text matching algorithm. The redesigned tool presents every possible match in an interactive, performant spreadsheet as ranked by a new fast and thorough algorithm.
It's a really cool tool that I will likely use for both work and personal projects quite a bit.
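To make the "rank every possible match" idea concrete, here's a minimal sketch in Python. It is not Crosswalker's algorithm (the post doesn't describe that); it leans on the standard library's `difflib.SequenceMatcher` as a stand-in similarity score, and the `rank_matches` helper and the precinct names are made up for illustration.

```python
from difflib import SequenceMatcher


def rank_matches(name, candidates, limit=3):
    """Return up to `limit` (candidate, score) pairs, best match first."""
    def score(candidate):
        # ratio() is 1.0 for identical strings and drops toward 0.0 as they diverge.
        return SequenceMatcher(None, name.lower(), candidate.lower()).ratio()

    scored = [(candidate, score(candidate)) for candidate in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


# Hypothetical data: a curated precinct list and some hand-typed variants.
clean = ["Ward 1 Precinct 2", "Ward 1 Precinct 3", "Ward 2 Precinct 1"]
messy = ["ward 1 precint 2", "Wrd 2 Precinct 1"]

# For each messy entry, keep the ranked candidates so a human can confirm the pick.
crosswalk = {entry: rank_matches(entry, clean) for entry in messy}

for entry, ranked in crosswalk.items():
    print(entry, "->", ranked[0][0])
```

Keeping the whole ranked list, rather than only the top hit, mirrors the "present every possible match" approach described above: the score narrows the field, and a person makes the final call.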
RIATE 🗺️ 🛠️
The #30DayMapChallenge is nearly over, but that doesn't mean we have to stop talking about all things geospatial!
This year, I've been nerding out on Bertin.js in many of the challenge entries. Bertin is an incredibly rich and diverse toolbox, but it's only one from the RIATE team. RIATE is a "Support and Research Unit of the CNRS and the Université Paris Cité (UAR2414). RIATE's activities are part of a reproducible research approach favoring the production and dissemination of free and open source data and methodologies."
Bertin.js is just one of many JavaScript libraries they've built and maintain, and should be your go-to resource for thematic mapping (or mapping in general).
Geotoolbox is a JavaScript tool for geographers based on d3geo, topojson and jsts. It makes it simple to work with GeoJSON properties and provides several GIS operations useful for thematic cartography.
Geoverview, which is based on MapLibre, is a tool to effortlessly display any GeoJSON (and the information it contains) on a map.
Statsbreaks is a JavaScript package to group the values of a statistical series into classes (easiest n-tile JS lib I've found).
Geocountries has tools to get ISO codes and geometries from country names. It has a "generous" mode, akin to the Crosswalker tool in the first section.
Go play with the Observable notebooks to see how useful these resources can be (some outside of geo-proper).
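If "group the values of a statistical series into classes" sounds abstract, here's a rough sketch of the n-tile flavor in Python. It uses the standard library's `statistics.quantiles`, not Statsbreaks itself, so the function names and the sample series are assumptions for illustration and say nothing about the Statsbreaks API.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_breaks(values, n_classes):
    """Return the n_classes - 1 interior cut points of an n-tile classification."""
    return quantiles(values, n=n_classes, method="inclusive")


def classify(value, breaks):
    """Return the 0-based class index of `value` given sorted interior breaks."""
    return bisect_right(breaks, value)


# Hypothetical series, e.g. one value per map region.
series = [2, 4, 4, 7, 9, 12, 15, 21, 30, 48, 51, 80]

breaks = quantile_breaks(series, 4)
classes = [classify(v, breaks) for v in series]

print(breaks)
print(classes)
```

Each class index can then be mapped to a color in a choropleth palette, which is the usual reason to compute breaks for a thematic map in the first place.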
Husky 🐾
Git hooks can help ensure your project meets minimum coding standards and save you from some crushingly bad deployments. While you can code them up on your own, there are many frameworks to choose from. I saw WaPo using Husky [GH] and decided to give it a quick look.
If you don't know what Git hooks are and did not tap the definition link, here's the TL;DR: hooks are programs you can place in a special directory to trigger actions at certain points in git's execution.
Say you want to make sure your code is in the right style for your team (so they don't strangle you) and/or only gets pushed if the linter is happy with it. Git hooks got ya covered.
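Here's roughly what that looks like without any framework, sketched in Python. It assumes the default hooks location (`.git/hooks` inside the repository), and both the `install_hook` helper and the `npm run lint` command are hypothetical stand-ins for whatever your team actually runs. The demo writes into a throwaway directory rather than a real repo.

```python
import os
import stat
import tempfile
from pathlib import Path


def install_hook(repo_dir, hook_name, command):
    """Write an executable hook script that runs `command`; return its path."""
    hook_path = Path(repo_dir) / ".git" / "hooks" / hook_name
    # git aborts the operation (for example, the commit) if the hook exits non-zero.
    hook_path.write_text(f"#!/bin/sh\nexec {command}\n")
    # The hook file must be executable, otherwise git ignores it.
    mode = hook_path.stat().st_mode
    hook_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path


# Demo against a temporary directory standing in for a repository.
with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / ".git" / "hooks").mkdir(parents=True)
    hook = install_hook(repo, "pre-commit", "npm run lint")  # hypothetical lint command
    hook_text = hook.read_text()
    hook_is_executable = os.access(hook, os.X_OK)

print(hook_text)
```

The catch with hand-rolled hooks is that `.git/hooks` isn't versioned with the project, so every teammate has to install them separately. That gap is what hook-management frameworks exist to fill.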
Husky provides a framework for managing these Git hook scripts, and this is a great intro article on how to get started with both Git hooks and Husky if you're looking for something to do over the upcoming break in the Colonies.
FIN
Safe travels to everyone travelling this week! ☮