Drop #277 (2023-06-14): Multi-threaded Edition v0.7.0
PassGPT; Guidelines for the Organization of Fully Online Meetings; Text Editor Data Structures; DEI (Language) In Cyber
I'll be peppering Drops this week and next with bits I'm catching up on from being AFK last week. A bit of said spice is in the first section, on a novel password guesser and generator based on GPT-2. However, an entire Drop on passwords/credentials would be dreadfully boring, so we'll lean into the “multi-threaded” catch-all Drop title and walk through a somewhat diverse (there are two “cyber” bits) array of other recent-ish finds.
PassGPT
I realize we've been promised a future without passwords, but I suspect we'll still be using them for quite some time. As such, it should be obvious by now that a credential manager of some sort is something we all should install immediately on any new system/browser. As we saw with the plethora of blog posts after the LastPass debacle, these managers use various techniques not only to keep passwords safe, but also to generate new, safe ones. Such generation techniques can and do evolve, and credential keepers/generators have a new method at their disposal thanks to our AI overlords.
The paper “PassGPT: Password Modeling and (Guided) Generation with Large Language Models” (direct PDF), by Javier Rando, Fernando Perez-Cruz, and Briland Hitaj, discusses a large language model (LLM) trained on password leaks. It is useful both for generating new passwords (character-by-character) and for understanding and predicting human password creation patterns.
Here's the abstract:
Large language models (LLMs) successfully model natural language from vast amounts of text without the need for explicit supervision. In this paper, we investigate the efficacy of LLMs in modeling passwords. We present PassGPT, an LLM trained on password leaks for password generation. PassGPT outperforms existing methods based on generative adversarial networks (GAN) by guessing twice as many previously unseen passwords. Furthermore, we introduce the concept of guided password generation, where we leverage PassGPT sampling procedure to generate passwords matching arbitrary constraints, a feat lacking in current GAN-based strategies. Lastly, we conduct an in-depth analysis of the entropy and probability distribution that PassGPT defines over passwords and discuss their use in enhancing existing password strength estimators.
While I tend to think of LLMs/GPTs as slow beasties (based upon almost every interaction with them in generic contexts), domain-specific ones that don't use the most gigantic base models are more than performant enough to use in practical contexts. PassGPT is based on GPT-2, so no API key is required, a modern CPU is all you need, and the skeezy folks at OpenAI won't steal your passwords.
PassGPT is pretty impressive when compared with previous work on other deep generative models in this password domain. It can guess 20% more unseen passwords and demonstrates decent generalization capabilities to novel leaks.
The researchers also enhanced PassGPT with vector quantization, resulting in PassVQT, an architecture that can increase the perplexity of generated passwords. As noted up top, the model sequentially samples each character, allowing a more granular guided exploration of the search space based on arbitrary constraints — a significant improvement over existing GAN-based strategies, which lack this feature.
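To make the guided-generation idea a bit more concrete, here's a minimal sketch of character-by-character sampling under a template. It does not use PassGPT: `next_char_weights` is a hypothetical stand-in for a trained model's next-character distribution, and the template codes (`l`, `u`, `d`, `s`) are assumptions made up for this illustration.

```python
import random
import string

# Hypothetical template codes: each maps to the characters allowed at that position.
CHAR_CLASSES = {
    "l": string.ascii_lowercase,
    "u": string.ascii_uppercase,
    "d": string.digits,
    "s": "!@#$%^&*",
}
ALPHABET = "".join(CHAR_CLASSES.values())


def next_char_weights(prefix):
    """Toy stand-in for a model: one weight per character in ALPHABET.

    A real model would condition on the whole prefix; this one only
    discourages repeating the previous character.
    """
    last = prefix[-1] if prefix else None
    return {c: (0.5 if c == last else 1.0) for c in ALPHABET}


def guided_sample(template, rng):
    """Sample one character per template position, masking disallowed ones."""
    password = ""
    for code in template:
        candidates = list(CHAR_CLASSES[code])
        weights = next_char_weights(password)
        # Mask: keep only the weights for characters this position permits.
        masked = [weights[c] for c in candidates]
        password += rng.choices(candidates, weights=masked, k=1)[0]
    return password


rng = random.Random(0)
print(guided_sample("ulllldds", rng))  # one uppercase, four lowercase, two digits, one symbol
```

Because the constraint is applied at every sampling step, the output always matches the template. That per-position control is the part a generate-the-whole-string-at-once approach can't easily offer.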
They also discuss how password probabilities under their PassGPT model can be used to enhance existing strength estimators, potentially identifying passwords that are easy to guess by generative approaches, even though they are considered “strong” by conventional metrics.
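The strength-estimation idea boils down to "how surprised is the model by this password?" Here's a rough sketch that scores passwords by their negative log-probability (in bits) under a toy character-bigram model. The bigram model, the add-one smoothing, and the tiny `TOY_LEAK` list are all assumptions for illustration; the paper uses PassGPT's own probabilities, not this.

```python
import math
import string
from collections import Counter, defaultdict

# Hypothetical "leak" used only to fit the toy model; not real data.
TOY_LEAK = ["password", "password1", "passw0rd", "letmein", "qwerty123", "dragon"]
START, END = "\x02", "\x03"  # sentinel characters that can't appear in a password
# Assumed alphabet: printable ASCII letters, digits, punctuation, plus END.
VOCAB_SIZE = len(string.ascii_letters + string.digits + string.punctuation) + 1


def fit_bigrams(passwords):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for pw in passwords:
        chars = [START, *pw, END]
        for prev, cur in zip(chars, chars[1:]):
            counts[prev][cur] += 1
    return counts


def surprisal_bits(password, counts):
    """Return -log2 P(password) under the bigram model with add-one smoothing."""
    chars = [START, *password, END]
    bits = 0.0
    for prev, cur in zip(chars, chars[1:]):
        seen = counts[prev][cur] if prev in counts else 0
        total = sum(counts[prev].values()) if prev in counts else 0
        bits -= math.log2((seen + 1) / (total + VOCAB_SIZE))
    return bits


model = fit_bigrams(TOY_LEAK)
for candidate in ["password", "xK#9vQ!2"]:
    print(candidate, round(surprisal_bits(candidate, model), 1))
```

Lower scores mean the model finds the password more predictable, so an estimator could flag low-surprisal passwords even when they tick the usual length/character-class boxes.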
It was a very accessible read, and I hope credential managers are also experimenting with similar approaches. It'll also be neat to see how the “breakers” in cybersecurity adopt this and similar tools.
Guidelines for the Organization of Fully Online Meetings
An immediately practical-for-all RFC hit my RSS reader this week: RFC 9400, Guidelines for the Organization of Fully Online Meetings. This Informational RFC provides guidelines for planning and organizing fully online [IETF] meetings.
While it is directed at IETF meetings, the document provides a useful set of guidelines for planning and organizing any online meeting or workshop that has a potentially large attendee list from diverse places across the globe. It exists due to the shift to fully online IETF meetings during the first two years of the ongoing COVID-19 pandemic. IETF folks are deep data nerds, so they did what any good data nerd would do: analyzed the efficacy of the choices made to figure out what did/didn't work so future meetings could be more engaging and productive.
One of the core guidelines/recommendations is the selection of meeting time zones that rotate across regions to minimize late night sessions for participants.
Time-of-day is not the only consideration: the authors found that session lengths of 60 and 120 minutes, with 30-minute breaks, provide a good balance for the majority of participants.
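If you want to sanity-check a proposed agenda against those two guidelines, something like the sketch below may help. It lays out sessions with 30-minute breaks and shows each session's local start time for a few regions. The region list and the fixed UTC offsets are assumptions for illustration (they ignore daylight saving; a real tool would use proper time zone data), and none of this comes from the RFC itself.

```python
from datetime import datetime, timedelta, timezone

BREAK = timedelta(minutes=30)

# Hypothetical attendee regions with fixed UTC offsets (no daylight saving handling).
REGIONS = {
    "San Francisco": timezone(timedelta(hours=-7)),
    "Berlin": timezone(timedelta(hours=2)),
    "Tokyo": timezone(timedelta(hours=9)),
}


def build_day(start, session_minutes):
    """Return (start, end) pairs for sessions separated by 30-minute breaks."""
    slots = []
    cursor = start
    for minutes in session_minutes:
        end = cursor + timedelta(minutes=minutes)
        slots.append((cursor, end))
        cursor = end + BREAK
    return slots


def local_starts(slot_start):
    """Map each region name to the session's local start time (HH:MM)."""
    return {
        name: slot_start.astimezone(tz).strftime("%H:%M")
        for name, tz in REGIONS.items()
    }


day_start = datetime(2023, 6, 14, 14, 0, tzinfo=timezone.utc)
for begin, end in build_day(day_start, [120, 60, 120]):
    print(begin.strftime("%H:%M"), "-", end.strftime("%H:%M UTC"), local_starts(begin))
```

Shifting `day_start` from meeting to meeting is one way to see who gets stuck with the late-night slot each time.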
While it may be tempting to have parallel tracks (since meatspace is not a consideration), the authors note that these may still increase scheduling conflicts. Interim meetings can help reduce conflicts but lack the cross-participation of full meetings.
There are other great suggestions in the RFC that anyone organizing an online event/conference/workshop should consider experimenting with.
Text Editor Data Structures
ripgrep through the markdown history of the Drop suggests I occasionally note the oddity that is our human need to build new text editors.
We have another installment in this space via a blog post/project by Cameron DaCamara that walks through the complexities associated with designing and building yet-another-text-editor, focusing the discussion on the data structures used to represent a “document”.
It's a 15-20 minute read, and Cameron is a solid communicator, so I won't detract from their work with an attempt at a summary here. Grab a cup/mug of your fav beverage and dig into Cameron's dive into piece trees/tables (which date all the way back to 1998).
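If you've never met a piece table before, here's a bare-bones sketch to take into the post with you. It is the plain list-based variant, not the tree from Cameron's post, and the class and method names are my own for illustration. The idea: the original text and an append-only "add" buffer are never modified in place, and the document is just an ordered list of pieces pointing into those two buffers.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    source: str  # which buffer this piece points into: "original" or "add"
    start: int   # offset into that buffer
    length: int


class PieceTable:
    def __init__(self, text):
        self.buffers = {"original": text, "add": ""}
        self.pieces = [Piece("original", 0, len(text))] if text else []

    def text(self):
        """Stitch the document together by walking the pieces in order."""
        return "".join(
            self.buffers[p.source][p.start:p.start + p.length] for p in self.pieces
        )

    def insert(self, offset, new_text):
        if offset < 0:
            raise IndexError("negative offset")
        if not new_text:
            return
        new_piece = Piece("add", len(self.buffers["add"]), len(new_text))
        self.buffers["add"] += new_text  # append-only: existing text never moves
        pos = 0
        for i, p in enumerate(self.pieces):
            if offset <= pos + p.length:
                left = Piece(p.source, p.start, offset - pos)
                right = Piece(p.source, p.start + left.length, p.length - left.length)
                # Swap the piece for its non-empty halves around the new piece.
                self.pieces[i:i + 1] = [x for x in (left, new_piece, right) if x.length]
                return
            pos += p.length
        if offset != pos:
            raise IndexError("offset past end of document")
        self.pieces.append(new_piece)  # only reached when the table is empty

    def delete(self, offset, count):
        kept, pos, end = [], 0, offset + count
        for p in self.pieces:
            p_end = pos + p.length
            if pos < offset:  # keep the part before the deleted range
                kept.append(Piece(p.source, p.start, min(p.length, offset - pos)))
            if p_end > end:   # keep the part after the deleted range
                skip = max(0, end - pos)
                kept.append(Piece(p.source, p.start + skip, p.length - skip))
            pos = p_end
        self.pieces = kept


doc = PieceTable("hello world")
doc.insert(5, ",")
doc.delete(0, 1)
doc.insert(0, "H")
print(doc.text())
```

Edits only ever touch the (small) list of pieces, which is why the structure is attractive for large files and for undo. The linear scan over pieces is the part a tree-based version speeds up.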
Every time I read one of these “build a new text editor” deep-dives, I am very thankful that I have never been bitten by that particular “need to write one” bug.
DEI (Language) In Cyber
I am usually loath to say anything nice about my profession's “premier” accreditation organization (ISC(2)), but they actually did A Good Thing™!
Cybersecurity is still one of the least inclusive professions. It is dominated by the usual suspects and has an ongoing problem with “bro” culture, especially at the “summer camp” events in August of every year.
We urgently need more diversity, equity, and inclusion, so I'm thrilled the ISC(2) folks were brave enough to release a DEI Language Guide (direct large PDF) to help us move away from some fairly old school toxic/archaic language.
The document provides an “Alternative Vocabulary Guide” with the following categories:
Race and Ethnicity
Gender and Orientation
Military and Criminal Justice
As the authors note, inclusive terminology is just one way to demonstrate commitment to DEI and cultivate a sense of belonging. Still, it should help show that the cybersecurity industry can evolve and is evolving, and that there are an increasing number of us committed to removing barriers.
I tried to reduce the tech-heaviness today, but fear I only increased cognitive load with this tome of an edition. We'll try to keep things a bit shorter tomorrow in preparation for Friday's WPE! ☮