Postgres Unbound; Accessible APIs; Connected WASM
Today’s Bonus Drop has three sections that all focus on either retrieving data remotely or providing data to remote calls with a clever addition to the WASM ecosystem.
Topics Covered
Postgres Unbound
A recent article — Remote Access Anything from Postgres — from the folks at Crunchy Data (they do managed/supported Postgres) reminded me that I have not sung the praises of a part of the GIS world either at all, or at least in a while.
Fret not! This isn’t a “map” post!
I’m covering this to show that the rest of us have the GIS folks to thank for so many things when it comes to data access/wrangling.
I would rather not steal too much thunder from the aforelinked article, so I’ll just round out some corners, starting with “foreign data wrappers”.
If you’re a die-hard Postgres user, you likely know about “foreign data wrappers” (FDWs). They’re a type of PostgreSQL extension that gives us the superpower of being able to access and manipulate data from external sources using good, ol’ SQL statements, such as SELECT
and UPDATE
. They were introduced as part of the SQL/MED (SQL/MED – PostgreSQL wiki) specification in the SQL standard. PostgreSQL 9.1 initially supported read-only access to this standard, and write support was added in PostgreSQL 9.3.
FDWs enable PostgreSQL to connect to various remote data stores, including other SQL databases, NoSQL databases, flat files, and even platforms like (ugh) Twitter/X and Facebook. They can simplify data querying and analysis when we need data from disparate sources. Some FDWs also benefit from a pushdown mechanism, which allows running SQL to update, filter, or sort data in the data source.
To set up a Foreign Data Wrapper in PostgreSQL, you need to follow these steps:
Install the required FDW extension
Create a “server” that defines the connection to the external data source using the
CREATE SERVER
command.(If necessary) Set up user mapping to provide the necessary credentials for connecting to the external data source.
Import the foreign schema or create foreign tables that act as proxies for the external data source.
Now, there are already dedicated FDWs for many data sources. But, we can take advantage of the hard work done by the Geospatial Data Abstraction Library (GDAL/OGR) folks to use it to gain access to a whole host of formats that may or may not be strictly associated with GIS ops.
If you aren’t a GIS-person, you may not have any of these tools/libraries installed. Covering that is a bit beyond the objectives of this section, so I’ll leave you in these good hands if you need to do that step.
Newcomers to this space will not believe the wide-ranging data format/access support provided by this ecosystem. It’s bonkers:
$ ogrinfo --formats
Supported Formats:
FITS -raster,vector- (rw+): Flexible Image Transport System
PCIDSK -raster,vector- (rw+v): PCIDSK Database File
netCDF -raster,multidimensional raster,vector- (rw+vs): Network Common Data Format
PDS4 -raster,vector- (rw+vs): NASA Planetary Data System 4
VICAR -raster,vector- (rw+v): MIPL VICAR file
JP2OpenJPEG -raster,vector- (rwv): JPEG-2000 driver based on OpenJPEG library
PDF -raster,vector- (rw+vs): Geospatial PDF
MBTiles -raster,vector- (rw+v): MBTiles
BAG -raster,multidimensional raster,vector- (rw+v): Bathymetry Attributed Grid
EEDA -vector- (ro): Earth Engine Data API
OGCAPI -raster,vector- (rov): OGCAPI
ESRI Shapefile -vector- (rw+v): ESRI Shapefile
MapInfo File -vector- (rw+v): MapInfo File
UK .NTF -vector- (rov): UK .NTF
LVBAG -vector- (rov): Kadaster LV BAG Extract 2.0
OGR_SDTS -vector- (rov): SDTS
S57 -vector- (rw+v): IHO S-57 (ENC)
DGN -vector- (rw+v): Microstation DGN
OGR_VRT -vector- (rov): VRT - Virtual Datasource
REC -vector- (ro): EPIInfo .REC
Memory -vector- (rw+): Memory
CSV -vector- (rw+v): Comma Separated Value (.csv)
NAS -vector- (rov): NAS - ALKIS
GML -vector- (rw+v): Geography Markup Language (GML)
GPX -vector- (rw+v): GPX
LIBKML -vector- (rw+v): Keyhole Markup Language (LIBKML)
KML -vector- (rw+v): Keyhole Markup Language (KML)
GeoJSON -vector- (rw+v): GeoJSON
GeoJSONSeq -vector- (rw+v): GeoJSON Sequence
ESRIJSON -vector- (rov): ESRIJSON
TopoJSON -vector- (rov): TopoJSON
Interlis 1 -vector- (rw+v): Interlis 1
Interlis 2 -vector- (rw+v): Interlis 2
OGR_GMT -vector- (rw+v): GMT ASCII Vectors (.gmt)
GPKG -raster,vector- (rw+vs): GeoPackage
SQLite -vector- (rw+v): SQLite / Spatialite
ODBC -vector- (ro):
WAsP -vector- (rw+v): WAsP .map format
PGeo -vector- (ro): ESRI Personal GeoDatabase
MSSQLSpatial -vector- (rw+): Microsoft SQL Server Spatial Database
OGR_OGDI -vector- (ro): OGDI Vectors (VPF, VMAP, DCW)
PostgreSQL -vector- (rw+): PostgreSQL/PostGIS
MySQL -vector- (rw+): MySQL
OpenFileGDB -vector- (rov): ESRI FileGDB
DXF -vector- (rw+v): AutoCAD DXF
CAD -raster,vector- (rovs): AutoCAD Driver
FlatGeobuf -vector- (rw+v): FlatGeobuf
Geoconcept -vector- (rw+v): Geoconcept
GeoRSS -vector- (rw+v): GeoRSS
GPSTrackMaker -vector- (rw+v): GPSTrackMaker
VFK -vector- (ro): Czech Cadastral Exchange Data Format
PGDUMP -vector- (w+v): PostgreSQL SQL dump
OSM -vector- (rov): OpenStreetMap XML and PBF
GPSBabel -vector- (rw+): GPSBabel
OGR_PDS -vector- (rov): Planetary Data Systems TABLE
WFS -vector- (rov): OGC WFS (Web Feature Service)
OAPIF -vector- (ro): OGC API - Features
SOSI -vector- (ro): Norwegian SOSI Standard
Geomedia -vector- (ro): Geomedia .mdb
EDIGEO -vector- (rov): French EDIGEO exchange format
SVG -vector- (rov): Scalable Vector Graphics
CouchDB -vector- (rw+): CouchDB / GeoCouch
Cloudant -vector- (rw+): Cloudant / CouchDB
Idrisi -vector- (rov): Idrisi Vector (.vct)
ARCGEN -vector- (rov): Arc/Info Generate
XLS -vector- (ro): MS Excel format
ODS -vector- (rw+v): Open Document/ LibreOffice / OpenOffice Spreadsheet
XLSX -vector- (rw+v): MS Office Open XML spreadsheet
Elasticsearch -vector- (rw+): Elastic Search
Walk -vector- (ro):
Carto -vector- (rw+): Carto
AmigoCloud -vector- (rw+): AmigoCloud
SXF -vector- (rov): Storage and eXchange Format
Selafin -vector- (rw+v): Selafin
JML -vector- (rw+v): OpenJUMP JML
PLSCENES -raster,vector- (ro): Planet Labs Scenes API
CSW -vector- (ro): OGC CSW (Catalog Service for the Web)
VDV -vector- (rw+v): VDV-451/VDV-452/INTREST Data Format
GMLAS -vector- (rwv): Geography Markup Language (GML) driven by application schemas
MVT -vector- (rw+v): Mapbox Vector Tiles
NGW -raster,vector- (rw+s): NextGIS Web
MapML -vector- (rw+v): MapML
TIGER -vector- (rw+v): U.S. Census TIGER/Line
AVCBin -vector- (rov): Arc/Info Binary Coverage
AVCE00 -vector- (rov): Arc/Info E00 (ASCII) Coverage
HTTP -raster,vector- (ro): HTTP Fetching Wrapper
For “normal” data folks:
CSV
SQLite
MySQL
XLS
XLSX
may be the most immediately useful from that list. Note that your list may be different from mine, depending on what was available in the way you installed these GIS components.
Now, data file format support is nice, and all, but we’re additionally focused on remote access to these data sources. On top of just the formats, this ecosystem also has the concept of a “virtual file system”.
GDAL Virtual File Systems (VFS) are a feature of the GDAL that makes it possible to access various types of files, such as in-memory files, compressed files (.zip, .gz, .tar, .tar.gz archives), encrypted files, and files stored on networks (either publicly accessible or in private buckets of commercial cloud storage services). Each special file system has a prefix, and the general syntax to name a file is /vsiPREFIX/
.
Since we’re talking “remote”, now, we need to focus on the GDAL Virtual File Systems network-based filesystems, of which there are many:
/vsicurl/
: (http/https/ftp files: random access)/vsicurl_streaming/
: (http/https/ftp files: streaming)/vsis3/
: (AWS S3 files)/vsis3_streaming/
: (AWS S3 files: streaming)/vsigs/
: (Google Cloud Storage files)/vsigs_streaming/
: (Google Cloud Storage files: streaming)/vsiaz/
: (Microsoft Azure Blob files)/vsiaz_streaming/
: (Microsoft Azure Blob files: streaming)/vsiadls/
: (Microsoft Azure Data Lake Storage Gen2)/vsioss/
: (Alibaba Cloud OSS files)/vsioss_streaming/
: (Alibaba Cloud OSS files: streaming)/vsiswift/
: (OpenStack Swift Object Storage)/vsiswift_streaming/
: (OpenStack Swift Object Storage: streaming)/vsihdfs/
: (Hadoop File System)/vsiwebhdfs/
: (Web Hadoop File System REST API)
While the primary linked article has a few walkthroughs, we’ll cover one here (just for kicks).
So, how do we let Postgres use the powerful data abstraction/access features of GDAL/OGR? Well, we can use the tools provided by the pgsql-ogr-fdw project, which provides a PostgreSQL foreign data wrapper for OGR. This seems to be in pretty much every package manager. But, just in case it’s not in your fav repo, I asked Perplexity to give y’all the deets on how to get this on your boxes (since this section is already too long). Once it’s installed, you’l need to tell Postgres about it (from within a psql
session):
CREATE EXTENSION ogr_fdw;
Now, if you also build from the sources in that repo, you’ll also get a handy CLI tool — ogr_fdw_info
— which simplifies wiring up Postgres to remote data sources.
For example, I have a CSV file that maps CVEs to CWEs that I generate every month or so for use in Observable and other remote places. (CVEs are vulnerability descriptions and CWEs are software weaknesses that make the given CVE components vulnerable). We can generate a foreign, remote wrapper for this by first getting the available “layers” (remember, we’re hacking on to GIS tools which are all about “layers”):
$ ./ogr_fdw_info -s "https://rud.is/data/cve-to-cwe.csv"
Format: CSV
Layers:
cve-to-cwe
Then asking it to define all the bits we need to paste into a Postgres CLI
$ ./ogr_fdw_info -s "https://rud.is/data/cve-to-cwe.csv" -l cve-to-cwe
CREATE SERVER "cwe"
FOREIGN DATA WRAPPER ogr_fdw
OPTIONS (
datasource '/vsicurl/https://rud.is/data/cve-to-cwe.csv',
format 'CSV');
NOTE that in the above, generated SQL, we need to prepend /vsicurl/
to the https
URL to tell it how to access the file.
CREATE FOREIGN TABLE cve_to_cwe (
fid bigint,
cve_id varchar,
cwe_id varchar
) SERVER "cwe"
OPTIONS (layer 'cve-to-cwe');
The utility did the tedious work of giving us the SERVER
and table/schema for the CSV. Now, we can query it like any other data source:
bob=# select * from cve_to_cwe limit 10;
fid | cve_id | cwe_id
-----+----------------+---------
1 | CVE-2022-20741 | CWE-79
2 | CVE-2022-20780 | CWE-284
3 | CVE-2022-20804 | CWE-754
4 | CVE-2022-36428 | CWE-79
5 | CVE-2022-36853 | CWE-20
6 | CVE-2022-36009 | CWE-863
7 | CVE-2022-36322 | CWE-88
8 | CVE-2022-36865 | CWE-284
9 | CVE-2022-36089 | CWE-294
10 | CVE-2022-36969 | CWE-611
(10 rows)
There’s tons more to dig into, and we may cover some other aspects of this ecosystem in future Drops.
Accessible APIs
Given the infoDoS in the first section, we’ll lighten things up a bit with a list of nice distractions from boring/tedious work. Said list is “just” a directory of APIs you can hit for free and without credentials. It’s pretty well maintained, and there are some gems in it, like:
There are over 200 APIs to dig into during your next, dreadfully boring Microsoft Teams meeting.
Connected WASM
The pace of WebAssembly (WASM) advancements is only slightly slower than the “there’s a new GPT/LLM every four seconds” pace in AI-land. One very promising one is WAGI, which is a way to write HTTP handlers (the “GI” is an homage to “CGI”) in WASM with a minimal amount of work
Mete Atamel, a Software Engineer and a Developer Advocate who works in the Developer Relations team at Google, recently posted a 👍🏽 blog showing how to add HTTP around Wasm with Wagi.
Given that WASM is still pretty niche tech, I’ll leave the folks interested in this topic in Mete’s capable hands.
FIN
It’s super hard to believe July is almost over. I hope folks in both hemispheres are dealing with this year’s climate wackiness as well as possible. Continued 🙏🏽 for your support! ☮
Leave a comment