

Some folks strive to grab the shortest username on a new service/system. Others race to register a prized single-character domain on a new, 2-character TLD (usually for nefarious purposes). And a remaining few humans seem intent on rapidly reducing the number of available 2-character combinations of “[[:alpha:]]q” names for command-line utilities for the rest of us. I don't know if the “q” craze started with the venerable jq utility, but folks certainly have hopped on that bandwagon.
Today we cover three of them, all of which hold to the q standing for “query”.
fq
We “data” people typically have it so good. CSV. TSV. JSON. XML (that one is def a fringe “good”, tho). Lots of solid text formats that we can slice, dice and clean to our hearts' content. But, we're not always that lucky.
One area where my “cyber” side gets to nerd out with my “data” side is when I encounter mysterious binary blobs. An example of this is my arabia R package which I wrote to help someone out back in my “live on StackOverflow” days. These “fine folks” do not publish the format of their data files and someone needed to be able to read them in R, and I like solving binary data puzzles.
Rant warning for the next ❡.
There are locked/hidden binary formats in our data world, too. One is from horrendous, dragon-droppings (play on “drag and drop”) Tableau, where they'll sue you into oblivion if you even attempt to decode their locked format. This is made even more terrible since it adds inertia to enabling the “data” option in the “download” button on visualizations that organizations and municipalities publish, further keeping precious data away from more useful, open tools.
Other companies also have binary blobs with no public documentation.
There are many ways to dissect binary files to open up their secrets, but a recent one by Mattias Wadman is pretty darn cool in its current form and has much future promise.
fq is inspired by the well known jq tool and language and allows you to work with binary formats the same way you would using jq. In addition it can present data like a hex viewer, transform, slice and concatenate binary data. It also supports nested formats and has an interactive REPL with auto-completion.
It was originally designed to query, inspect and debug media codecs and containers like mp4, flac, mp3, and jpeg, but has since been extended to support a variety of formats like executables, packet captures (with TCP reassembly), and serialization formats like JSON, YAML, XML, ASN1 BER, Avro, CBOR, and protobuf. In addition, it has functions to work with URLs, convert to/from hex and other number bases, search for things, etc.
In summary it aims to be jq, hexdump, dd and gdb for files combined into one.
It already has support for a crazy number of formats:
aac_frame, adts, adts_frame, amf0, apev2, ar, asn1_ber, av1_ccr, av1_frame, av1_obu, avc_annexb, avc_au, avc_dcr, avc_nalu, avc_pps, avc_sei, avc_sps, avi, avro_ocf, bencode, bitcoin_blkdat, bitcoin_block, bitcoin_script, bitcoin_transaction, bits, bplist, bsd_loopback_frame, bson, bytes, bzip2, cbor, csv, dns, dns_tcp, elf, ether8023_frame, exif, fairplay_spc, flac, flac_frame, flac_metadatablock, flac_metadatablocks, flac_picture, flac_streaminfo, gif, gzip, hevc_annexb, hevc_au, hevc_dcr, hevc_nalu, hevc_pps, hevc_sps, hevc_vps, html, icc_profile, icmp, icmpv6, id3v1, id3v11, id3v2, ipv4_packet, ipv6_packet, jpeg, json, jsonl, macho, macho_fat, markdown, matroska, mp3, mp3_frame, mp3_frame_tags, mp4, mpeg_asc, mpeg_es, mpeg_pes, mpeg_pes_packet, mpeg_spu, mpeg_ts, msgpack, ogg, ogg_page, opus_packet, pcap, pcapng, png, prores_frame, protobuf, protobuf_widevine, pssh_playready, rtmp, sll2_packet, sll_packet, tar, tcp_segment, tiff, toml, udp_datagram, vorbis_comment, vorbis_packet, vp8_frame, vp9_cfm, vp9_frame, vpx_ccr, wasm, wav, webp, xml, yaml, zip
The usage is also pretty straightforward once you get used to it (it's very similar to jq, so it may be immediately grokable by a large swath of readers):
# recursively display decode tree but truncate long arrays
fq d file
# same as
fq display file
# display all bytes for each value
fq 'd({display_bytes: 0})' file
# display 200 bytes for each value
fq 'd({display_bytes: 200})' file
# recursively display decode tree without truncating
fq da file
# recursively and verbosely display decode tree
fq dv file
# JSON representation for whole file
fq tovalue file
# recursively look for decode value roots for a format
fq '.. | select(format=="jpeg")' file
# can also use grep_by
fq 'grep_by(format=="jpeg")' file
# recursively look for first decode value root for a format
fq 'first(.. | select(format=="jpeg"))' file
fq 'first(grep_by(format=="jpeg"))' file
# recursively look for objects fulfilling condition
fq '.. | select(.type=="trak")?' file
fq 'grep_by(.type=="trak")' file
# grep whole tree
fq 'grep("^prefix")' file
fq 'grep(123)' file
fq 'grep_by(. >= 100 and . <= 100)' file
# decode file as mp4 and return a result even if there are some errors
fq -d mp4 file.mp4
# decode file as mp4 and also ignore validity assertions
fq -o force=true -d mp4 file.mp4
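The interactive REPL mentioned in the project description is a single flag away. A minimal sketch, assuming a local file.mp4 to poke at (the filename is just a placeholder):
# start an interactive REPL (with auto-completion) rooted at the top of the decode tree
fq -i . file.mp4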
I'll dig around SO for some other questions regarding how to open various binary files to show a practical example of how to use fq to explore unknown formats and then write a format handler for it. Drop a note in the comments if you have such a use case!
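In the meantime, here's a small sketch of the kind of thing the jq-style filters above make easy: pulling the first embedded JPEG (cover art, say) out of a media file. The song.mp3/cover.jpg names are placeholders, and this assumes the file actually carries a jpeg blob:
# find the first embedded jpeg in the decode tree and write its raw bytes to a file
fq 'first(grep_by(format=="jpeg")) | tobytes' song.mp3 > cover.jpg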
hq
This is a quick one as I actually use a different utility for the following use case, but I felt compelled to stick with the “2-characters” shtick (so stay tuned for a better alternative), and because it's a highly focused utility.
hq is an “HTML processor inspired by jq.” The CLI help really explains it all (see below), but, TL;DR: given an HTML data source and a CSS selector, it returns either the raw HTML, the inner text, or the contents of a node attribute:
hq (html query) - commandline HTML processor © Robin Broda, 2018
Usage: build/hq [options] <selector> <mode> [mode argument]
Options:
  -h, --help
    show this text
  -f, --file <file>
    file to read (defaults to stdin)
  -d, --delimiter <delim>
    delimiter character to use between results (defaults to newline)
  -0, --null
    uses \0 as delimiter

  <selector>
    CSS selector to match against
  <mode>
    processing mode
    may be one of { data, text, attr }:
      data - return raw html of matching elements
      text - return inner text of matching elements
        [mode argument: formatting]
        supported modes: { plain, ansi, md }
        default: plain
        for plain, ANSI, or markdown formatted output respectively
      attr - return attribute value of matching elements
        <mode argument: attr>
        attribute to return

Examples:
  curl -sSL https://example.com | build/hq a data
  curl -sSL https://example.com | build/hq a attr href
You can do far more inside any given programming language, but this is handy at the command line.
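For instance, following the selector/mode/argument pattern in the help text above (the URL is a placeholder, and these are quick sketches rather than canonical examples):
# inner text of every paragraph, rendered as markdown
curl -sSL https://example.com | hq p text md
# the src attribute of every image on the page
curl -sSL https://example.com | hq img attr src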
oq
oq [GH] is “a performant, portable jq wrapper that facilitates the consumption and output of formats other than JSON; using jq filters to transform the data.”
It supports JSON, YAML (ugh), and XML and is written in Crystal (which means I have to cover that language, soon).
This short example should be enough to grok how it works:
$ echo '{"name": "Jim"}' | oq -o xml .
<?xml version="1.0" encoding="UTF-8"?>
<root>
<name>Jim</name>
</root>
It's an especially nice tool if you have some icky XML/YAML and want to convert it to proper JSON.
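Going the other direction is just as terse. A minimal sketch, assuming -i selects the input format (the YAML here is made up):
$ printf 'name: Jim\nlangs:\n  - R\n  - Crystal\n' | oq -i yaml '.langs[]'
"R"
"Crystal"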
FIN
🎄D e c e m b e r 1 s t !! (which means I get to see #2.1 IRL in a scan in 2.25 weeks!) ☮