

I suspect I am not the only one who deals with massive quantities of JSON on a regular basis. One particular bit of JSON is quite central to the data workflows where I work, and I have become even more fond of JSON correctness than I was when I headed up the team at the Verizon DBIR.
So, today, we take a look at some ways to keep your JSON neat, tidy, and correct.
All three resources are in the “do one thing, well” category, so this is also a pretty short edition blather-wise.
TL;DR
This is an AI-generated summary of today's Drop.
Perplexity had the notion of 'bots' long before OpenAI's reveal last week; they just didn't call them that. So, I finally configured a system prompt for this section into its own 'collection' (Perplexity's term) and it seems to have done a better job than the cut/paste prompt routine I'd been using.
The blog post begins by discussing the importance of JSON correctness and introduces the concept of JSON Schema, a way to define the expected structure of a JSON data source. It highlights a tool called json-schema-generator that can be used to convert JSON records into a schema.
The post then moves on to explain how to use the generated schema to validate records. It introduces v8r, a JSON and YAML validator that uses Schema Store to detect a suitable schema for input files based on the filename or explicit input.
Lastly, the post briefly touches on how to check whether two JSON files are the same using a tool called json_diff. It explains that this tool is particularly useful because JSON fields can be in any order, so a specialized tool is needed to accurately compare two JSON files.
json-schema-generator
Folks who live in R and Python land tend to not have to think too much about reading and writing JSON. Both languages will gladly [de]serialize anything, figuring out what to do on-the-fly. Go, Rust, Swift, and other languages need you to specify the structure for the [de]serialization, but that does little good if the JSON ends up being garbage.
Enter, JSON Schema, a way to define the expected structure of a JSON data source so you can validate it, and/or generate the structures needed by the aforementioned Go, Swift, Rust, et al.
A nifty tool to turn your JSON records into a schema is json-schema-generator (GH).
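If it isn't already on your system, the CLI can most likely be installed straight from npm (assuming the npm package of the same name is the one behind the linked repo):
$ npm install -g json-schema-generator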
It's simple to use (yes, 'simple'):
$ json-schema-generator rule1.json -o rule-schema.json
Here's what that outputs:
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "description": "",
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "minLength": 1
    },
    "id": {
      "type": "string",
      "minLength": 1
    },
    "uuid": {
      "type": "string",
      "minLength": 1
    },
    "description": {
      "type": "string",
      "minLength": 1
    },
    "sub_category": {
      "type": "string",
      "minLength": 1
    },
    "confidence": {
      "type": "string",
      "minLength": 1
    },
    "intention": {
      "type": "string",
      "minLength": 1
    },
    "references": {
      "type": "array",
      "items": {
        "required": [],
        "properties": {}
      }
    },
    "cves": {
      "type": "array",
      "items": {
        "required": [],
        "properties": {}
      }
    },
    "queries": {
      "type": "array",
      "items": {
        "required": [],
        "properties": {}
      }
    },
    "negates": {
      "type": "boolean"
    },
    "silent": {
      "type": "boolean"
    },
    "user_submitted": {
      "type": "boolean"
    },
    "recommend_block": {
      "type": "boolean"
    },
    "enabled": {
      "type": "boolean"
    },
    "category": {
      "type": "string",
      "minLength": 1
    },
    "created": {
      "type": "string",
      "minLength": 1
    }
  },
  "required": [
    "name",
    "id",
    "uuid",
    "description",
    "sub_category",
    "confidence",
    "intention",
    "references",
    "cves",
    "queries",
    "negates",
    "silent",
    "user_submitted",
    "recommend_block",
    "enabled",
    "category",
    "created"
  ]
}
You can then tweak the schema to your heart's content.
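For instance (a purely illustrative excerpt showing only the changed bits; these particular constraints are my additions, not part of the generated output), you could reject unknown keys and require created to be an ISO 8601 timestamp:
{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "created": {
      "type": "string",
      "format": "date-time"
    }
  }
}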
Next, we'll show you how to use said schema to validate your records.
v8r
v8r (GH) is a JSON and YAML validator, and a handy tool that defaults to using Schema Store (GH) to detect a suitable schema for your input files based on the filename (or via explicit input).
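With zero flags, v8r will try to match your filename against the Schema Store catalog, so something like this needs no schema argument at all (any file with a Schema Store entry — package.json, GitHub workflow files, etc. — works the same way):
$ npx v8r@latest package.json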
The options kind of self-document the utility (as all help screens should):
--help
: Shows help information [boolean]
--version
: Shows the version number [boolean]
-v, --verbose
: Runs with verbose logging. Can be stacked, e.g.: -vv -vvv [count]
-s, --schema
: Specifies a local path or URL of a schema to validate against. If not supplied, the utility will attempt to find an appropriate schema on schemastore.org using the filename. If passed with glob pattern(s) matching multiple files, all matching files will be validated against this schema [string]
-c, --catalogs
: Specifies a local path or URL of custom catalogs to use prior to schemastore.org [array]
--ignore-errors
: Exits with code 0 even if an error was encountered. Passing this flag means a non-zero exit code is only issued if validation could be completed successfully and one or more files were invalid [boolean] [default: false]
--cache-ttl
: Removes cached HTTP responses older than the given number of seconds. Passing 0 clears and disables the cache completely [number] [default: 600]
--format
: Specifies the output format for validation results [string] [choices: "text", "json"] [default: "text"]
Here's a sample run against good and bad records for the schema in the previous section:
$ npx v8r@latest rule1.json -c ~/Data/jschema.json
ℹ No config file found
ℹ Processing ./rule1.json
ℹ Found schema in /Users/hrbrmstr/Data/jschema.json ...
ℹ Validating ./rule1.json against schema from https://rud.is/dl/tag-schema.json ...
✔ ./rule1.json is valid
$ npx v8r@latest rule2.json -c ~/Data/jschema.json
ℹ No config file found
ℹ Processing ./rule2.json
ℹ Found schema in /Users/hrbrmstr/Data/jschema.json ...
ℹ Validating ./rule2.json against schema from https://rud.is/dl/tag-schema.json ...
✖ ./rule2.json is invalid
./rule2.json# must have required property 'sub_category'
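In case you're curious about that jschema.json file: -c takes a catalog in Schema Store's catalog format, which maps filename patterns to schema URLs. A rough sketch of one (the fileMatch pattern is my guess; the schema URL is the one from the runs above; check the catalog schema for the exact required fields):
{
  "version": 1,
  "schemas": [
    {
      "name": "tag rules",
      "description": "Schema for the rule JSON records",
      "fileMatch": [ "rule*.json" ],
      "url": "https://rud.is/dl/tag-schema.json"
    }
  ]
}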
You can also run your own Schema Store.
This is a super handy utility to use in CI/CD workflows.
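A minimal sketch of that (the rules/ directory and schema filename here are made up for illustration): since v8r exits non-zero when any file fails validation, a single command is enough to fail a build:
$ npx v8r@latest 'rules/*.json' -s ./rule-schema.json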
json_diff
Super quick section that has almost nothing to do with JSON schemas, but covers something you can do to check whether two JSON files are the same.
json_diff is meant for humans (no JSON/parse-able output unless you use the lib crate to write your own), and provides quick clarity on where two files diverge.
If you're wondering, "Why not just use diff?", the answer would be that JSON fields can be in any order, so to truly get a real diff, you need to remove that potential variability from the test.
This, too, is simple to use.
$ json_diff f rule1.json rule2.json
Extra on left:
sub_category
FIN
What are your fav JSON schema/diff'ing/sanity-preserving tools? ☮️