

I have no doubt some percentage of Drop readers have explored the world of graph databases and may even use them as a daily driver, or at least in some task-specific ways. However, my Spidey-sense suggests that many of you are like me, and lean into rectangular data way more often than not.
So, today, we introduce (or refresh) what graph databases are, how they work, why you might choose one over a "traditional" relational database or key/value store, and then close with a little project that won't require much infrastructure at all, but will help you get [re]familiar with these networked DBeasties, so you have more choices at your disposal when deciding where to store that next bit of information.
Substack has a fancy new “generate” option for images, so I used it and included the (lame) prompts I “crafted” to have it pass it on to whatever back-end it is using.
Graph Databases 101
A graph database is a type of database management system (DBMS) designed specifically for storing, managing, and querying graph data. In a graph database, data is represented as a graph, consisting of nodes (also called vertices) and the edges (also called relationships, though "edges" is my preferred term) that connect the nodes.
In contrast to traditional relational databases, which use tables and columns to organize data, graph databases use a graph model, where the nodes and edges have properties that can be queried and analyzed. This makes graph databases well-suited for managing complex data structures and relationships, such as social networks, recommendation engines, and fraud detection systems.
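To make "nodes and edges with properties" concrete, here is a minimal sketch in plain Python with no database involved. The names `nodes`, `edges`, and `neighbors` are made up for illustration and are not part of any graph database's API.

```python
# A toy property graph: plain dicts and tuples, no database involved.
# All names here (nodes, edges, neighbors) are illustrative, not a real API.

# Each node has an id and a bag of properties.
nodes = {
    "George Orwell": {"kind": "author"},
    "1984": {"kind": "book", "year": 1949},
}

# Each edge connects two node ids, has a label, and carries its own properties.
edges = [
    ("George Orwell", "authored", "1984", {"role": "sole author"}),
]


def neighbors(node_id, label):
    """Return ids reachable from node_id over outgoing edges with this label."""
    return [dst for src, lbl, dst, _props in edges if src == node_id and lbl == label]


books = neighbors("George Orwell", "authored")
print(books)                    # ['1984']
print(nodes[books[0]]["year"])  # 1949
```

A real graph database adds indexing, persistence, and a query language on top, but the mental model of "things, plus labeled connections between things" is roughly this.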
Graph databases are designed to be highly scalable, and most advertise that they can handle large amounts of data with ease. In relational database land, you usually need to define what you're storing up front (with table schemas), along with the relationships between sets of things (i.e. primary/foreign keys across tables). In contrast, graph databases are very flexible, allowing users to easily add or modify the structure of the data without the need for complex schema migrations. Key/value stores are also somewhat flexible, but for some use cases the "value" is often a binary blob that requires protocol buffers to make sense of, and that means updating the protocol buffer spec/code whenever the contents change.
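Here is a rough sketch of that flexibility difference, using Python's built-in `sqlite3` for the relational side and a bare list of triples as a stand-in for the graph side. The "reviewed" relationship and the "hypothetical reader" are invented purely for this example.

```python
import sqlite3

# Relational: the shape of the data is declared up front.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE authored (author TEXT NOT NULL, book TEXT NOT NULL)")
db.execute("INSERT INTO authored VALUES ('George Orwell', '1984')")

# A new kind of relationship ("reviewed") needs a schema change first.
db.execute("CREATE TABLE reviewed (reviewer TEXT NOT NULL, book TEXT NOT NULL)")
db.execute("INSERT INTO reviewed VALUES ('hypothetical reader', '1984')")

# Graph-style: every fact is a (subject, edge, object) triple,
# so a new edge type is just another row of data.
triples = [("George Orwell", "authored", "1984")]
triples.append(("hypothetical reader", "reviewed", "1984"))

edge_types = sorted({edge for _s, edge, _o in triples})
tables = sorted(
    row[0] for row in db.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(edge_types)  # ['authored', 'reviewed']
print(tables)      # ['authored', 'reviewed']
```

Both sides end up holding the same facts. The difference is that the relational side needed a structural change to get there, while the triple list only needed more data.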
There are several popular, "big time" graph databases available, including Neo4j, Amazon Neptune, and JanusGraph, among others. These databases are typically used in a variety of applications, including social networks, financial services, healthcare, and e-commerce.
These can be a bit, er, intimidating if you're not used to them, so let's start off by letting you get familiar with graph databases in a risk-free, straightforward environment.
Graph Databases 201
CogDB is a focused Python module for working with graph data. It is not as extensive as the bigger players (including the one you're getting for homework), but it's pretty friendly, and you don't need to install anything to check out the basic concepts of graph databases. Make sure to keep the docs bookmarked, since this isn't a super popular package, and the (scant) info provided from the developers will be useful as you explore it. I don't suggest spending a ton of time in it, but it is a good environment to help nail down the concept of vertices, edges, and relations.
If working locally, you'll need to do python -m pip install cogdb; otherwise, head over to the playground. I did the following in Quarto, but you should be able to copy each block after g = Graph("books") into the playground query pane and have it "just work".
Let's create the graph (use the "Create" button in the playground):
from cog.torque import Graph
g = Graph("books")
Now, let's add some relations:
g.put("James S.A. Corey", "authored", "Leviathan Wakes")
g.put("James S.A. Corey", "authored", "Persepolis Rising")
g.put("George Orwell", "authored", "1984")
g.put("J.R.R. Tolkien", "authored", "The Lord of the Rings")
g.scan()
and see if it captured all the vertices:
{'result': [{'id': '1984'},
{'id': 'James S.A. Corey'},
{'id': 'The Lord of the Rings'},
{'id': 'J.R.R. Tolkien'},
{'id': 'Persepolis Rising'},
{'id': 'Leviathan Wakes'},
{'id': 'George Orwell'}]}
Now, let's see if it captured our relation (the "authored" edge):
g.lsv()
Note that "follows" is a built-in edge.
Which books did "James S.A. Corey" author?
g.v("James S.A. Corey").out().all()
{'result': [{'id': 'Persepolis Rising'}, {'id': 'Leviathan Wakes'}]}
How many was that, again?
g.v("James S.A. Corey").out().count()
Can I "see" all the authors/books in the database?
g.v().tag("from").out("authored").tag("to").view("authored").render()
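If it helps to see what those queries boil down to, here is a plain-Python stand-in for them. It is only a sketch of the idea and is not how CogDB is implemented: the `put` and `out` helpers below are hypothetical and only borrow the names from the calls above.

```python
# A plain-Python stand-in for the queries above. This is NOT how CogDB
# works internally; put() and out() here are hypothetical helpers.
triples = []


def put(subject, edge, obj):
    """Record one relation as a (subject, edge, object) triple."""
    triples.append((subject, edge, obj))


def out(subject, edge=None):
    """Follow outgoing edges from subject, optionally filtered by edge label."""
    return [o for s, e, o in triples if s == subject and (edge is None or e == edge)]


put("James S.A. Corey", "authored", "Leviathan Wakes")
put("James S.A. Corey", "authored", "Persepolis Rising")
put("George Orwell", "authored", "1984")
put("J.R.R. Tolkien", "authored", "The Lord of the Rings")

# Every distinct subject or object is a vertex.
vertices = {s for s, _e, _o in triples} | {o for _s, _e, o in triples}
print(len(vertices))  # 7

corey_books = out("James S.A. Corey")
print(corey_books)       # ['Leviathan Wakes', 'Persepolis Rising']
print(len(corey_books))  # 2

# The "from"/"to" pairs that the tagged view query draws as a picture.
pairs = [(s, o) for s, e, o in triples if e == "authored"]
print(pairs)
```

The list scan here is fine for four relations; the point of a real graph database is doing the same kind of hop efficiently across millions of them.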
I kept my intro short, and the examples will give you some more things to try, but I really only intended this to help you get your feet wet and your mind used to the terms. This installs everywhere (I may try to set up a Pyodide app for this over the weekend, too).
Your Mission
If you're a graph database "pro", your mission is to go out and get some fresh air and de-screen for a bit 😎.
For everyone else, I'm going to invite you over to the Neo4j Academy where you can learn how to wield practical, real-world, modern graph database concepts in focused "chunks" regardless of what level you are at right now.
I'm saving "why" for the end, which you are almost at!
FIN
So, why bother with this when there are so many Chat-ty toys to play with?
When the grifter-caused hoopla finally settles down, we're going to have a few dozen LLMs capable of spitting out data in almost every field (hopefully trained on focused data sets to solve specific problems). They will help us make connections between bits of data. In fact, they already are.
Knowing how to wield that “connections” data will be the real superpower (along with being a top-notch prompt engineer). I'd really like as many folks to be ready for that future as possible, hence the topic of today's Drop.
If you have found other resources for learning about "graphs", please share them in the comments. ☮