

Discover more from hrbrmstr's Daily Drop
yq; Seafowl
Programming note: I spent an inordinate number of minutes (that I should not have spared) on an attempt to come up with a cool tag line for this Drop. I eventually decided that writing was more important than an ill-fated attempt at being “cool”. To that end, I'm giving myself the freedom to lean on an idiom I started with the “Need For Speed” versioned tag line series. When there is no single theme topic to a Drop, and also when I get stuck on a pseudo-clever tag line, I'll resort to bumping up the semver on this tag line.
I've been a wee bit distracted since around the middle of last week thanks to webR. I'm not going to bore folks with that here, but if you're curious about it, this is one blog post I made that shows how to work with it in an “app-y” way. There will be references to webR associated bits in this edition, but that is only due to how enthralled I am with the project.
Today, we cover just two wholly different tools. The “just” is because I got carried away experimenting with Seafowl, and the Seafowl section is a tad long.
yq
Unless you're new to the Drop, you know at least two things about me. First: I detest YAML. Second: I like two-letter command line utilities. Let's bring both of those together to help rescue everyone from YAML Hades.
The yq (GH) utility is named after the venerable jq, and is a Golang-based portable command line YAML, JSON, XML, CSV and properties processor. It supports most of the query syntax / capability of jq.
While you can filter with it (which is my most common use of jq), it really shines at being a tool to help edit YAML files. Think, “being able to replace a field / value across a whole directory of YAML files”. The “How It Works” section covers that jq-ish functionality better than I can (or should) here.
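To make the “replace a field / value across a whole directory” idea concrete, here is a minimal sketch. It assumes mikefarah's yq v4 is on your PATH; the temp directory, the two files, and the version field are all invented for the demo.

```shell
#!/bin/sh
# Sketch: set one field to the same value in every YAML file in a directory.
# Assumption: mikefarah's yq v4. The files and the .version field are made up.
set -eu

demo_dir=$(mktemp -d)
printf 'name: alpha\nversion: 1.0.0\n' > "$demo_dir/a.yaml"
printf 'name: beta\nversion: 1.0.0\n'  > "$demo_dir/b.yaml"

if yq --version 2>/dev/null | grep -q mikefarah; then
  for f in "$demo_dir"/*.yaml; do
    # -i edits the file in place; the expression assigns a new string value
    yq -i '.version = "1.1.0"' "$f"
  done
  cat "$demo_dir"/*.yaml
else
  echo "mikefarah yq v4 not found; skipping the edit" >&2
fi
```

Point the loop at a real directory only after trying it on a copy, since -i overwrites each file.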
I really like yq for a few other reasons.
First, I can use it to convert YAML into saner formats (and back again). Please tap this link to see one of webR's GH Actions. We can turn that into much nicer JSON:
curl -s https://raw.githubusercontent.com/r-lib/actions/v2-branch/.github/workflows/rtools.yaml | \
  yq -o json
{
  "on": {
    "workflow_dispatch": null
  },
  "name": "Rtools test",
  "jobs": {
    "R-CMD-check": {
      "runs-on": "windows-latest",
      "name": "Windows, R ${{ matrix.config.r }}, Rtools ${{ matrix.config.rtools-version }}",
      "strategy": {
        "fail-fast": false,
        "matrix": {
          "config": [
            {
              "r": "release"
            },
            {
              "r": "release",
              "rtools-version": "42"
            },
            {
              "r": "devel"
            }
          ]
        }
      },
      "env": {
        "GITHUB_PAT": "${{ secrets.GITHUB_TOKEN }}",
        "R_KEEP_PKG_SOURCE": "yes"
      },
      "steps": [
        {
          "uses": "actions/checkout@v3"
        },
        {
          "uses": "./setup-pandoc"
        },
        {
          "uses": "./setup-r",
          "with": {
            "r-version": "${{ matrix.config.r }}",
            "rtools-version": "${{ matrix.config.rtools-version }}",
            "use-public-rspm": true
          }
        },
        {
          "name": "Check what version of rtools is installed",
          "run": "ls c:/\n"
        },
        {
          "name": "Install package from source",
          "run": "install.packages(\"filelock\", type = \"source\")\n",
          "shell": "Rscript {0}"
        }
      ]
    }
  }
}
I can edit that JSON — which is way easier to not mess up — and turn it back to YAML. You can also verify that the above `yq` incantation is not breaking anything by doing a round trip conversion back to YAML:
curl -s https://raw.githubusercontent.com/r-lib/actions/v2-branch/.github/workflows/rtools.yaml | \
  yq -o json | \
  yq -o yaml -p json
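If eyeballing the round trip feels too trusting, a scripted comparison is one option. This is a rough sketch, assuming mikefarah's yq v4; the small JSON document is invented (loosely modeled on the workflow above), and it goes JSON to YAML to JSON so that both sides of the comparison are formatted by yq.

```shell
#!/bin/sh
# Sketch: check that JSON -> YAML -> JSON leaves the data unchanged.
# Assumption: mikefarah's yq v4. The sample JSON is made up for the demo.
set -eu

json='{"name":"demo","steps":[{"uses":"actions/checkout@v3"},{"run":"ls c:/\n"}]}'
status="skipped"

if yq --version 2>/dev/null | grep -q mikefarah; then
  # Normalize the input once so both sides share yq's JSON formatting
  before=$(printf '%s' "$json" | yq -p json -o json '.')
  # Convert to YAML, then parse that YAML and emit JSON again
  after=$(printf '%s' "$json" | yq -p json -o yaml '.' | yq -p yaml -o json '.')
  if [ "$before" = "$after" ]; then
    status="unchanged"
  else
    status="changed"
  fi
fi

echo "round trip: $status"
```

Comparing the JSON on both ends sidesteps differences that are only cosmetic in YAML, such as quoting style or comments.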
Speaking of GH Actions, yq's repo provides one you can use, which can be beneficial if you, say, want to automate the conversion of some XML/JSON/YAML/CSV/TSV into another format, or just ensure you've got valid YAML.
It's also smart enough to recognize files with YAML “frontmatter” (such as Quarto documents) and let you perform yq ops on just the frontmatter.
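Here is a small sketch of the frontmatter handling, assuming mikefarah's yq v4 and its --front-matter flag (extract prints only the frontmatter, process rewrites it and keeps the body). The report.qmd file and its contents are invented for the demo.

```shell
#!/bin/sh
# Sketch: read and update the YAML frontmatter of a Quarto-style document.
# Assumption: mikefarah's yq v4. report.qmd is a made-up file.
set -eu

fm_dir=$(mktemp -d)
doc="$fm_dir/report.qmd"
cat > "$doc" <<'EOF'
---
title: "Draft"
format: html
---

# Heading

Body text stays untouched.
EOF

if yq --version 2>/dev/null | grep -q mikefarah; then
  # extract: evaluate the expression against the frontmatter only
  yq --front-matter=extract '.title' "$doc"
  # process: update the frontmatter and print the whole document, body included
  yq --front-matter=process '.title = "Final"' "$doc"
else
  echo "mikefarah yq v4 not found; skipping" >&2
fi
```

Adding -i to the process call should write the change back to the file instead of printing it.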
Another neat feature is the ability to use yq to split one file into multiple ones based on a yq expression.
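The split feature boils down to the -s (--split-exp) flag. A rough sketch, assuming mikefarah's yq v4; services.yaml and its two documents are invented for the demo.

```shell
#!/bin/sh
# Sketch: split a multi-document YAML file into one file per document.
# Assumption: mikefarah's yq v4. services.yaml is a made-up file.
set -eu

work=$(mktemp -d)
cd "$work"
cat > services.yaml <<'EOF'
name: web
port: 8080
---
name: worker
port: 9090
EOF

if yq --version 2>/dev/null | grep -q mikefarah; then
  # -s / --split-exp: the expression must return a string, which is used
  # as the output file name for each document
  yq -s '.name' services.yaml
  ls
else
  echo "mikefarah yq v4 not found; skipping" >&2
fi
```

Each output file should be named after its document's name value. As far as I know yq appends a .yml extension, but the ls at the end will show what your version actually does.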
Finally, there are some handy tips and tricks you should pore over and consider using, especially if you need to deal with YAML in a shell scripting context.
Seafowl

I could have just dedicated a full Drop to Seafowl (GH). But, I really wanted to get this in front of folks today, and the documentation site for it does a fantastic job communicating what it does and how to use it. So, I'll just provide some info on what it is, and talk a bit about its strengths, then do a little demo.
Seafowl is “an analytical database for modern data-driven Web applications”. For those that are not steeped in data industry jargon, an analytic database is a database management system that is optimized for business analytics applications and services. They're different from transactional databases, which are databases that support ACID (atomicity, consistency, isolation, and durability) transactions.
It uses Apache DataFusion under the hood. DataFusion is a Rust-based extensible query execution framework that takes advantage of the awesomeness that is Apache Arrow for fast / efficient ops. It does everything from building logical query plans to optimizing queries, and sports a bonkers-fast query execution engine. It works super well on partitioned data.
Rather than force you to learn yet-another-SQL variant, Seafowl supports almost all of PostgreSQL's SQL features.
One of the absolute froodiest features is that you can write user-defined functions (UDF) in any programming language that can spit out WebAssembly. You only need to ensure your library adheres to the Seafowl UDF data contract.
You use Seafowl over an HTTP REST API (which I'll demonstrate momentarily). Since this is an analytical database, you are more likely to be pulling data with it than you are storing data in it. If it's likely that a lot of folks are going to make the same query (think “a canned query inside an iOS app”, or “Observable Notebook”), and the data being queried has some decent lifetime before it radically changes, Seafowl is configured to be super HTTP cache-friendly. This means that ISP, CDN, and browser caches can re-deliver results superfast because the data now lives as close as possible to the thing that queries it.
The magic behind the caching involves “running queries by passing the query hash in the URL and the actual query in the request header (with support for URL-encoding unprintable and non-ASCII characters) or a GET request body”.
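A sketch of what that looks like from the shell. The assumptions here are mine, so check them against the Seafowl docs: the hash is the SHA-256 hex digest of the query text, the query travels in an X-Seafowl-Query header, and the endpoint is /q/<hash>. The localhost URL is a stand-in for a hypothetical local instance, and the script only prints the request unless RUN_QUERY=1 is set.

```shell
#!/bin/sh
# Sketch: build a cache-friendly GET request for a Seafowl query.
# Assumptions (verify against the Seafowl docs): SHA-256 hex digest of the
# query in the URL, query text in the X-Seafowl-Query header.
set -eu

# Hypothetical local instance; swap in your own Seafowl URL
SEAFOWL_URL="${SEAFOWL_URL:-http://localhost:8080}"
query='SELECT 1 AS one'

# Hash the exact query text (printf '%s' avoids hashing a trailing newline)
if command -v sha256sum >/dev/null 2>&1; then
  hash=$(printf '%s' "$query" | sha256sum | cut -d ' ' -f 1)
else
  hash=$(printf '%s' "$query" | shasum -a 256 | cut -d ' ' -f 1)
fi

echo "GET ${SEAFOWL_URL}/q/${hash}"

# Only touch the network when explicitly asked to
if [ "${RUN_QUERY:-0}" = "1" ]; then
  curl --silent -H "X-Seafowl-Query: ${query}" "${SEAFOWL_URL}/q/${hash}"
fi
```

Because the URL depends only on the query text, identical queries map to the same URL, which is what lets browser and CDN caches answer without reaching the database.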
Speaking of Observable, there is a snazzy Seafowl demo you can walk through (it also embeds very nicely).
I extracted a list of all the presently available R packages for webR, put them into a Parquet file, and added them to my playground. (You should be able to query it if I set up CORS properly.) Steps to perform something similar are in the Seafowl tutorial. You can play with it on Observable or via your REST API client of choice:
curl --silent \
  -H "Content-Type: application/json" \
  "https://circus.rudis.net/seafowl/q" \
  -d'{"query": "SELECT * FROM demo.wasmrpackages LIMIT 2"}' | \
  yq -p json
depends: R (>= 2.9.0)
enhances: png
license: GPL-2 | GPL-3
md5sum: 0f476dacdd11a3e0ad56d13f5bc2f190
needs_compilation: yes
package: base64enc
repository: https://repo.r-wasm.org/src/contrib
version: 0.1-3
---
depends: R (>= 2.9.2)
license: GPL-2 | GPL-3
md5sum: df4e6215d31058ea2860fadc7a7c80f4
needs_compilation: yes
package: bit
repository: https://repo.r-wasm.org/src/contrib
suggests: |-
  testthat (>= 0.11.0), roxygen2, knitr, rmarkdown,
  microbenchmark, bit64 (>= 4.0.0), ff (>= 4.0.0)
version: 4.0.5
While that is an affront to the eyes, it shows off both Seafowl and yq.
Seafowl is also container-friendly.
I barely scratched the surface of what Seafowl can do. Have some fun playing with it! Oh, I guess you could use it for work, too.
Please take a mo’ and go through their introduction and demo. The site is designed as well as the actual tool.
FIN
U.S. folk: I really hope you — unlike me — remembered yesterday was clock-switch day. ☮
#yq #seafowl #observable #arrow #datafusion #xml #csv #yaml #json #quarto #postgres #postgresql #jq