

Discover more from hrbrmstr's Daily Drop
dot
Before diving into "dot", I think it would be helpful if you did a quick read about a process called "KYC", which stands for “Know Your Client”. The folks at Stoke did a decent job explaining it. We'll be here when you get back.
Back? Good!
Now, on to dot, which stands for "Deepfake Offensive Toolkit".
You'll see that the aforelinked GitHub repo is owned by Sensity, a company with KYC, fraudulent document detection, and deepfake detection and monitoring offerings. If you've managed not to come across the term "deepfake" before, you can catch up here, and know that you also have my envy regarding better life choices.
Here's the setup from Sensity's paper that introduces dot:
Deepfakes are exploited by fraudsters to bypass biometric KYC checks in online onboarding processes used in banking, fintech, telco, insurance, crypto, and gambling. This report details Sensity's systematic penetration tests of 10 among the top KYC vendors worldwide, equivalent to an estimated 1/4 of the global market share. We have created dot (the deepfake offensive toolkit), the industry-first toolkit for injecting deepfakes into biometrics systems, with the specific goal to test the vulnerability of the standard steps in every KYC solution: ID document authentication, face matching and liveness detection. We found that 90% of the vendors are severely vulnerable to deepfake attacks.
Don't judge Sensity too harshly for having a service that tries to defend organizations from KYC attacks while also creating and freely releasing a tool for conducting such attacks (it's a common practice in old-school cybersecurity spaces like vulnerability management, too).
dot is a real-time deepfake face swapper that will likely run smoothly on your PC or Mac if you have the patience to go through the setup processes. Give the model a photo, fire up dot, and use the dot stream as an OBS virtual camera. Now you, too, can commit wire fraud from the comfort of your own home/desk.
The Biometrics Research Group's Chris Burt (@afakechrisburt) covered Sensity's creation and the overall topic of deepfakes in the KYC biometric space pretty well, so I'll leave you with that as a reading assignment along with a very accessible video on deepfakes from friend-of-the-newsletter and all around stellar human, Erick Galinkin (@ErickGalinkin):
Oh, and Sensity's own disclaimer, so I don’t get in legal trouble for advocating wire fraud:
dot is developed for research and demonstration purposes. As an end user, you have the responsibility to obey all applicable laws when using this program. Authors and contributing developers assume no liability and are not responsible for any misuse or damage caused by the use of this program.
Roslingifier
Between the TED platform, YouTube, and other viewing venues, Hans Rosling's 2006 "Debunking myths about the 'third world'" (the section banner video) has garnered tens of millions of views. In it, Rosling narrated an animated scatterplot to show the progression of various country demographics through time, in an attempt to shatter our biases. You can interact with a modern version of said chart and data over at Gapminder.
Rosling's creation was quite compelling, though I'm fairly certain it had a greater impact on future work in data-driven storytelling than it did on societal biases. Still, this type of storytelling is effective, if only in the moment. The catch is that not everyone is capable of wielding the tools and talents necessary to pull off a "Rosling", even with all our modern tooling.
A group of researchers set out to democratize both the tooling and narrative script crafting, and recently published their findings in "Roslingifier: Semi-Automated Storytelling for Animated Scatterplots" [PDF].
I'll let them introduce it:
We present Roslingifier, a data-driven storytelling method for animated scatterplots. Like its namesake, Hans Rosling (1948–2017), a professor of public health and a spellbinding public speaker, Roslingifier turns a sequence of entities changing over time—such as countries and continents with their demographic data—into an engaging narrative telling the story of the data. This data-driven storytelling method with an in-person presenter is a new genre of storytelling technique and has never been studied before. In this paper, we aim to define a design space for this new genre—data presentation—and provide a semi-automated authoring tool for helping presenters create quality presentations. From an in-depth analysis of video clips of presentations using interactive visualizations, we derive three specific techniques to achieve this: natural language narratives, visual effects that highlight events, and temporal branching that changes playback time of the animation. Our implementation of the Roslingifier method is capable of identifying and clustering significant movements, automatically generating visual highlighting and a narrative for playback, and enabling the user to customize. From two user studies, we show that Roslingifier allows users to effectively create engaging data stories and the system features help both presenters and viewers find diverse insights.
The paper is very accessible (a rare attribute of academic publications), and the authors walk the reader through how they came up with their classification of storytelling techniques and intentions, and how they went about crafting the "Roslingifier" tool.
I'm all in favor of giving more folks easy-to-use tools to help share their own stories. I just hope we're able, as a society, to get back (soon) to a place where we're ready, willing, and able to start listening to said stories, again.
Back at the beginning of 2020, in the early stages of the pandemic, we were awash in visually compelling, data-driven narrative frames of how we could, together, get through SARS-CoV-2. It now seems that only a handful of us found them compelling. The evidence is pretty clear that the effects of said work were meager at best.
You can get a glimpse of the tool and hear more from the researchers over at YouTube, which should be a far more uplifting experience than the previous paragraph:
JSONPath
Since I made y'all read a ton of material in the first two sections, I'll do a quick drop of a focused tool to round out this edition.
I grew up with XPath, which is a way to select/target one or more nodes in an XML document. At some point, the data masses cut themselves one too many times on XML's pointy, sharp bracket edges, decided to stop the bloodshed, and migrated over to JSON as the de facto data standard (XML is dead! Long live XML!). In doing so, they lost the ability to perform said node selection. That is, they did until JSONPath came around.
With XML, you can get to the title of the first book in the classic bookstore demo XML via this XPath expression:
/catalog/book[1]/title
If one were to encode the bookstore data as JSON, the JSONPath expression would be:
$.catalog.book[0].title
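For the curious, here's a minimal Python sketch of both lookups side by side. The two-book catalog and the `simple_jsonpath` helper are made up for illustration. The helper handles only `$`, `.name`, and `[index]` steps, so treat it as a toy and not a stand-in for a real JSONPath library. The XPath half leans on the standard library's ElementTree, which supports only a subset of XPath and evaluates paths relative to the root element.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical two-book catalog, encoded both ways for comparison.
XML_DOC = """
<catalog>
  <book><title>First Book</title></book>
  <book><title>Second Book</title></book>
</catalog>
""".strip()

JSON_DOC = """
{"catalog": {"book": [{"title": "First Book"}, {"title": "Second Book"}]}}
"""


def xpath_first_title(xml_text):
    # ElementTree paths are relative to the root element (<catalog>), so
    # /catalog/book[1]/title becomes book[1]/title. XPath positions start at 1.
    root = ET.fromstring(xml_text)
    return root.find("book[1]/title").text


def simple_jsonpath(data, path):
    # Toy evaluator (an assumption for this sketch): supports only `$`,
    # `.name`, and `[index]` steps. Real JSONPath also has wildcards,
    # filters, and recursive descent.
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    current = data
    for name, index in re.findall(r"\.(\w+)|\[(\d+)\]", path[1:]):
        # Exactly one of the two groups is non-empty for each step.
        current = current[name] if name else current[int(index)]
    return current


xml_title = xpath_first_title(XML_DOC)
json_title = simple_jsonpath(json.loads(JSON_DOC), "$.catalog.book[0].title")
print(xml_title, json_title)
```

Note the off-by-one between the two: XPath positions start at 1, while JSONPath array indices start at 0.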
We've mentioned some CLI JSON tools in previous newsletter editions, and most JSON tools support JSONPath expressions (some add their own extraction syntax into the mix because tool lock-in is appreciated by so many of us).
Getting the hang of JSONPath can be tricksy, and figuring out the right path for some truly gnarly JSON (we switched from XML again, why?) can be maddening.
Enter JSONPath Online Evaluation Tool (GH), a tool that does what it says on the tin.
Give it your JSON and start throwing random JSONPath strings at it until you get what you want. (What…that's not how we're supposed to do it?)
FIN
I'm eager to see if anyone tries dot IRL during their next work Zoom call. Drop a note in the comments if you do! ☮