hrbrmstr's Daily Drop

Drop #142 (2022-11-23): Data/Vis/Git Toolbox Tidbits

dailyfinds.hrbrmstr.dev

Crosswalker ⟕; RIATE 🗺️ 🛠️; Husky 🐾

boB Rudis
Nov 23, 2022

Crosswalker ⟕

There are some universal truths that will never go away: James Holden loves coffee; there's "OPA", and there's "O.P.A."; and data is messy.

One of the problems with messy data is that you can have one lovingly curated, pristine dataset (say, a list of voting precincts) that you need to join with another dataset, which was "crafted" by someone who has not had good data care and feeding skills beaten into them (yet). Your precinct names may be correct, but they may not be spelled the way various precinct captains think they should be, or typing errors may have mangled entries enough to make it impossible to just LEFT JOIN your way to insights.

If you find yourself in need of some record linking, then take a look at Crosswalker [GH], which is "a general purpose tool [from WaPo] for joining columns of text data that don’t match perfectly."

Crosswalker was originally developed as "Precinct Matcher," a tool with the express use case of matching precinct names for elections at The Washington Post (precinct names change each election cycle, and it's hard to build election models matching historical data when the names slightly change). This first iteration of the tool was designed hastily but still proved to be very useful for team members who used to employ a very manual code-based approach.

In this redesign of the tool, Crosswalker was made to be more generally useful by abstracting the details away from precincts and broadening the tool's scope to be broadly for text-based crosswalking problems. Counties became "join" columns, additional info to show like GeoJSON IDs became "metadata." The original tool only presented a single possible match for each precinct, using a brittle text matching algorithm. The redesigned tool presents every possible match in an interactive, performant spreadsheet as ranked by a new fast and thorough algorithm.
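
To make the "rank every possible match" idea concrete, here is a minimal sketch in Python. It is not Crosswalker's algorithm (the post does not describe that); it swaps in the standard library's `difflib.SequenceMatcher` as a stand-in scorer, and the precinct names, `normalize`, and `rank_matches` are all made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't hurt the score."""
    return " ".join(name.lower().split())


def rank_matches(query: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Return every candidate paired with a similarity score in [0, 1], best first."""
    q = normalize(query)
    scored = [
        (candidate, SequenceMatcher(None, q, normalize(candidate)).ratio())
        for candidate in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical precinct names, invented for this example.
clean = ["Ward 1 Precinct 2", "Ward 1 Precinct 3", "Ward 2 Precinct 1"]
messy = "ward 1  precint 2"  # extra space and a typo

ranked = rank_matches(messy, clean)
best_name, best_score = ranked[0]
for name, score in ranked:
    print(f"{score:.2f}  {name}")
```

Keeping the full ranked list, rather than only the top hit, mirrors the point made above: a human can then review near-ties instead of trusting a single brittle guess.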

It's a really cool tool that I will likely use for both work and personal projects quite a bit.

RIATE 🗺️ 🛠️

The #30DayMapChallenge is nearly over, but that doesn't mean we have to stop talking about all things geospatial!

This year, I've been nerding out on Bertin.js in many of the challenge entries. Bertin is an incredibly rich and diverse toolbox, but it's only one from the RIATE team. RIATE is a "Support and Research Unit of the CNRS and the Université Paris Cité (UAR2414). RIATE's activities are part of a reproducible research approach favoring the production and dissemination of free and open source data and methodologies."

  • Bertin.js is just one of many JavaScript libraries they've built and maintain, and should be your go-to resource for thematic mapping (or mapping in general).

  • Geotoolbox is a JavaScript tool for geographers based on d3geo, topojson and jsts. It makes it simple to work with GeoJSON properties and provides several GIS operations useful for thematic cartography.

  • Geoverview, which is based on MapLibre, is a tool to effortlessly display any GeoJSON (and the information it contains) on a map.

  • Statsbreaks is a JavaScript package to group the values of a statistical series into classes (easiest n-tile JS lib I've found).

  • Geocountries has tools to get ISO codes and geometries from country names. It has a "generous" mode, akin to the Crosswalker tool in the first section.

  • Go play with the Observable notebooks to see how useful these resources can be (some outside of geo-proper).
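
If "grouping the values of a statistical series into classes" feels abstract, the n-tile case can be sketched with nothing but the Python standard library. This is not the Statsbreaks API (that is a JavaScript package, and its function names are not shown here); `quantile_breaks` and `classify` are hypothetical helpers, and the data is made up.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_breaks(values: list[float], n_classes: int) -> list[float]:
    """Return the n_classes - 1 cut points that split values into n-tiles."""
    return quantiles(values, n=n_classes, method="inclusive")


def classify(value: float, breaks: list[float]) -> int:
    """Return the 0-based class index for value, given sorted break points."""
    return bisect_right(breaks, value)


# Made-up series: the integers 1 through 9, split into quartiles.
series = list(range(1, 10))
breaks = quantile_breaks(series, 4)
classes = [classify(v, breaks) for v in series]
print(breaks)
print(classes)
```

On a choropleth, each class index would then map to one color in the palette; that last step is left out to keep the sketch small.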

Husky 🐾

Git hooks can help ensure your project meets minimum coding standards and save you from some crushingly bad deployments. While you can code them up on your own, there are many frameworks to choose from. I saw WaPo using Husky [GH] and decided to give it a quick look.

If you don't know what Git hooks are and did not tap the definition link, the TL;DR is that hooks are programs you can place in a special directory to trigger actions at certain points in git’s execution.

Say you want to make sure your code is in the right style for your team (so they don't strangle you) and/or only gets pushed if the linter is happy with it. Git hooks got ya covered.
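
For a feel of what one of these hook programs might look like, here is a rough sketch of a pre-commit check written in Python. Everything about it is an assumption for illustration: the function names are hypothetical, it only looks at `.py` files, and it uses the standard library's `py_compile` as a stand-in for whatever linter your team actually runs. It also checks the working-tree copy of each file rather than the staged content, which a real hook may want to handle more carefully.

```python
#!/usr/bin/env python3
"""Sketch of a pre-commit hook: refuse the commit if a staged .py file fails a check."""
import subprocess
import sys


def staged_python_files(diff_output: str) -> list[str]:
    """Pick the .py paths out of `git diff --cached --name-only` output."""
    return [
        line.strip()
        for line in diff_output.splitlines()
        if line.strip().endswith(".py")
    ]


def check_files(paths: list[str]) -> int:
    """Byte-compile each file (a stand-in for a real linter); return the failure count."""
    failures = 0
    for path in paths:
        result = subprocess.run([sys.executable, "-m", "py_compile", path])
        if result.returncode != 0:
            failures += 1
    return failures


def main() -> int:
    """Return 0 to let the commit proceed, 1 to block it."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    failures = check_files(staged_python_files(diff.stdout))
    if failures:
        print(f"pre-commit: {failures} file(s) failed the check", file=sys.stderr)
    return 1 if failures else 0


# To use this as a real hook, end the file with `raise SystemExit(main())`,
# save it as .git/hooks/pre-commit, and make it executable.
```

Git treats a non-zero exit status from a pre-commit hook as "abort the commit," which is why `main` returns 1 on any failure.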

Husky provides a framework for managing these Git hook scripts, and this is a great intro article on how to get started with both Git hooks and Husky if you're looking for something to do over the upcoming break in the Colonies.

FIN

Safe travels to everyone travelling this week! ☮
