Bonus Drop #20 (2023-07-30): Remote Possibilities

Postgres Unbound; Accessible APIs; Connected WASM

Today’s Bonus Drop has three sections that all focus on either retrieving data remotely or providing data to remote calls with a clever addition to the WASM ecosystem.

Topics Covered


Postgres Unbound

photo of 5-story library building

A recent article — Remote Access Anything from Postgres — from the folks at Crunchy Data (they do managed/supported Postgres) reminded me that I have not sung the praises of a part of the GIS world either at all, or at least in a while.

Fret not! This isn’t a “map” post!

I’m covering this to show that the rest of us have the GIS folks to thank for so many things when it comes to data access/wrangling.

I would rather not steal too much thunder from the aforelinked article, so I’ll just round out some corners, starting with “foreign data wrappers”.

If you’re a die-hard Postgres user, you likely know about “foreign data wrappers” (FDWs). They’re a type of PostgreSQL extension that gives us the superpower of being able to access and manipulate data from external sources using good, ol’ SQL statements, such as SELECT and UPDATE. They were introduced as part of the SQL/MED (SQL/MED – PostgreSQL wiki) specification in the SQL standard. PostgreSQL 9.1 initially supported read-only access to this standard, and write support was added in PostgreSQL 9.3.

FDWs enable PostgreSQL to connect to various remote data stores, including other SQL databases, NoSQL databases, flat files, and even platforms like (ugh) Twitter/X and Facebook. They can simplify data querying and analysis when we need data from disparate sources. Some FDWs also benefit from a pushdown mechanism, which allows running SQL to update, filter, or sort data in the data source.

To set up a Foreign Data Wrapper in PostgreSQL, you need to follow these steps:

  • Install the required FDW extension

  • Create a “server” that defines the connection to the external data source using the CREATE SERVER command.

  • (If necessary) Set up user mapping to provide the necessary credentials for connecting to the external data source.

  • Import the foreign schema or create foreign tables that act as proxies for the external data source.

Now, there are already dedicated FDWs for many data sources. But, we can take advantage of the hard work done by the Geospatial Data Abstraction Library (GDAL/OGR) folks to use it to gain access to a whole host of formats that may or may not be strictly associated with GIS ops.

If you aren’t a GIS-person, you may not have any of these tools/libraries installed. Covering that is a bit beyond the objectives of this section, so I’ll leave you in these good hands if you need to do that step.

Newcomers to this space will not believe the wide-ranging data format/access support provided by this ecosystem. It’s bonkers:

$ ogrinfo --formats
Supported Formats:
  FITS -raster,vector- (rw+): Flexible Image Transport System
  PCIDSK -raster,vector- (rw+v): PCIDSK Database File
  netCDF -raster,multidimensional raster,vector- (rw+vs): Network Common Data Format
  PDS4 -raster,vector- (rw+vs): NASA Planetary Data System 4
  VICAR -raster,vector- (rw+v): MIPL VICAR file
  JP2OpenJPEG -raster,vector- (rwv): JPEG-2000 driver based on OpenJPEG library
  PDF -raster,vector- (rw+vs): Geospatial PDF
  MBTiles -raster,vector- (rw+v): MBTiles
  BAG -raster,multidimensional raster,vector- (rw+v): Bathymetry Attributed Grid
  EEDA -vector- (ro): Earth Engine Data API
  OGCAPI -raster,vector- (rov): OGCAPI
  ESRI Shapefile -vector- (rw+v): ESRI Shapefile
  MapInfo File -vector- (rw+v): MapInfo File
  UK .NTF -vector- (rov): UK .NTF
  LVBAG -vector- (rov): Kadaster LV BAG Extract 2.0
  OGR_SDTS -vector- (rov): SDTS
  S57 -vector- (rw+v): IHO S-57 (ENC)
  DGN -vector- (rw+v): Microstation DGN
  OGR_VRT -vector- (rov): VRT - Virtual Datasource
  REC -vector- (ro): EPIInfo .REC
  Memory -vector- (rw+): Memory
  CSV -vector- (rw+v): Comma Separated Value (.csv)
  NAS -vector- (rov): NAS - ALKIS
  GML -vector- (rw+v): Geography Markup Language (GML)
  GPX -vector- (rw+v): GPX
  LIBKML -vector- (rw+v): Keyhole Markup Language (LIBKML)
  KML -vector- (rw+v): Keyhole Markup Language (KML)
  GeoJSON -vector- (rw+v): GeoJSON
  GeoJSONSeq -vector- (rw+v): GeoJSON Sequence
  ESRIJSON -vector- (rov): ESRIJSON
  TopoJSON -vector- (rov): TopoJSON
  Interlis 1 -vector- (rw+v): Interlis 1
  Interlis 2 -vector- (rw+v): Interlis 2
  OGR_GMT -vector- (rw+v): GMT ASCII Vectors (.gmt)
  GPKG -raster,vector- (rw+vs): GeoPackage
  SQLite -vector- (rw+v): SQLite / Spatialite
  ODBC -vector- (ro):
  WAsP -vector- (rw+v): WAsP .map format
  PGeo -vector- (ro): ESRI Personal GeoDatabase
  MSSQLSpatial -vector- (rw+): Microsoft SQL Server Spatial Database
  OGR_OGDI -vector- (ro): OGDI Vectors (VPF, VMAP, DCW)
  PostgreSQL -vector- (rw+): PostgreSQL/PostGIS
  MySQL -vector- (rw+): MySQL
  OpenFileGDB -vector- (rov): ESRI FileGDB
  DXF -vector- (rw+v): AutoCAD DXF
  CAD -raster,vector- (rovs): AutoCAD Driver
  FlatGeobuf -vector- (rw+v): FlatGeobuf
  Geoconcept -vector- (rw+v): Geoconcept
  GeoRSS -vector- (rw+v): GeoRSS
  GPSTrackMaker -vector- (rw+v): GPSTrackMaker
  VFK -vector- (ro): Czech Cadastral Exchange Data Format
  PGDUMP -vector- (w+v): PostgreSQL SQL dump
  OSM -vector- (rov): OpenStreetMap XML and PBF
  GPSBabel -vector- (rw+): GPSBabel
  OGR_PDS -vector- (rov): Planetary Data Systems TABLE
  WFS -vector- (rov): OGC WFS (Web Feature Service)
  OAPIF -vector- (ro): OGC API - Features
  SOSI -vector- (ro): Norwegian SOSI Standard
  Geomedia -vector- (ro): Geomedia .mdb
  EDIGEO -vector- (rov): French EDIGEO exchange format
  SVG -vector- (rov): Scalable Vector Graphics
  CouchDB -vector- (rw+): CouchDB / GeoCouch
  Cloudant -vector- (rw+): Cloudant / CouchDB
  Idrisi -vector- (rov): Idrisi Vector (.vct)
  ARCGEN -vector- (rov): Arc/Info Generate
  XLS -vector- (ro): MS Excel format
  ODS -vector- (rw+v): Open Document/ LibreOffice / OpenOffice Spreadsheet
  XLSX -vector- (rw+v): MS Office Open XML spreadsheet
  Elasticsearch -vector- (rw+): Elastic Search
  Walk -vector- (ro):
  Carto -vector- (rw+): Carto
  AmigoCloud -vector- (rw+): AmigoCloud
  SXF -vector- (rov): Storage and eXchange Format
  Selafin -vector- (rw+v): Selafin
  JML -vector- (rw+v): OpenJUMP JML
  PLSCENES -raster,vector- (ro): Planet Labs Scenes API
  CSW -vector- (ro): OGC CSW (Catalog  Service for the Web)
  VDV -vector- (rw+v): VDV-451/VDV-452/INTREST Data Format
  GMLAS -vector- (rwv): Geography Markup Language (GML) driven by application schemas
  MVT -vector- (rw+v): Mapbox Vector Tiles
  NGW -raster,vector- (rw+s): NextGIS Web
  MapML -vector- (rw+v): MapML
  TIGER -vector- (rw+v): U.S. Census TIGER/Line
  AVCBin -vector- (rov): Arc/Info Binary Coverage
  AVCE00 -vector- (rov): Arc/Info E00 (ASCII) Coverage
  HTTP -raster,vector- (ro): HTTP Fetching Wrapper

For “normal” data folks:

  • CSV

  • SQLite

  • MySQL

  • XLS

  • XLSX

may be the most immediately useful from that list. Note that your list may be different from mine, depending on what was available in the way you installed these GIS components.

Now, data file format support is nice, and all, but we’re additionally focused on remote access to these data sources. On top of just the formats, this ecosystem also has the concept of a “virtual file system”.

GDAL Virtual File Systems (VFS) are a feature of the GDAL that makes it possible to access various types of files, such as in-memory files, compressed files (.zip, .gz, .tar, .tar.gz archives), encrypted files, and files stored on networks (either publicly accessible or in private buckets of commercial cloud storage services). Each special file system has a prefix, and the general syntax to name a file is /vsiPREFIX/.

Since we’re talking “remote”, now, we need to focus on the GDAL Virtual File Systems network-based filesystems, of which there are many:

  • /vsicurl/: (http/https/ftp files: random access)

  • /vsicurl_streaming/: (http/https/ftp files: streaming)

  • /vsis3/: (AWS S3 files)

  • /vsis3_streaming/: (AWS S3 files: streaming)

  • /vsigs/: (Google Cloud Storage files)

  • /vsigs_streaming/: (Google Cloud Storage files: streaming)

  • /vsiaz/: (Microsoft Azure Blob files)

  • /vsiaz_streaming/: (Microsoft Azure Blob files: streaming)

  • /vsiadls/: (Microsoft Azure Data Lake Storage Gen2)

  • /vsioss/: (Alibaba Cloud OSS files)

  • /vsioss_streaming/: (Alibaba Cloud OSS files: streaming)

  • /vsiswift/: (OpenStack Swift Object Storage)

  • /vsiswift_streaming/: (OpenStack Swift Object Storage: streaming)

  • /vsihdfs/: (Hadoop File System)

  • /vsiwebhdfs/: (Web Hadoop File System REST API)

While the primary linked article has a few walkthroughs, we’ll cover one here (just for kicks).

So, how do we let Postgres use the powerful data abstraction/access features of GDAL/OGR? Well, we can use the tools provided by the pgsql-ogr-fdw project, which provides a PostgreSQL foreign data wrapper for OGR. This seems to be in pretty much every package manager. But, just in case it’s not in your fav repo, I asked Perplexity to give y’all the deets on how to get this on your boxes (since this section is already too long). Once it’s installed, you’l need to tell Postgres about it (from within a psql session):

CREATE EXTENSION ogr_fdw;

Now, if you also build from the sources in that repo, you’ll also get a handy CLI tool — ogr_fdw_info — which simplifies wiring up Postgres to remote data sources.

For example, I have a CSV file that maps CVEs to CWEs that I generate every month or so for use in Observable and other remote places. (CVEs are vulnerability descriptions and CWEs are software weaknesses that make the given CVE components vulnerable). We can generate a foreign, remote wrapper for this by first getting the available “layers” (remember, we’re hacking on to GIS tools which are all about “layers”):

$ ./ogr_fdw_info -s "https://rud.is/data/cve-to-cwe.csv"
Format: CSV

Layers:
  cve-to-cwe

Then asking it to define all the bits we need to paste into a Postgres CLI

$ ./ogr_fdw_info -s "https://rud.is/data/cve-to-cwe.csv" -l cve-to-cwe
CREATE SERVER "cwe"
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (
    datasource '/vsicurl/https://rud.is/data/cve-to-cwe.csv',
    format 'CSV');

NOTE that in the above, generated SQL, we need to prepend /vsicurl/ to the https URL to tell it how to access the file.

CREATE FOREIGN TABLE cve_to_cwe (
  fid bigint,
  cve_id varchar,
  cwe_id varchar
) SERVER "cwe"
OPTIONS (layer 'cve-to-cwe');

The utility did the tedious work of giving us the SERVER and table/schema for the CSV. Now, we can query it like any other data source:

bob=# select * from cve_to_cwe limit 10;
 fid |     cve_id     | cwe_id
-----+----------------+---------
   1 | CVE-2022-20741 | CWE-79
   2 | CVE-2022-20780 | CWE-284
   3 | CVE-2022-20804 | CWE-754
   4 | CVE-2022-36428 | CWE-79
   5 | CVE-2022-36853 | CWE-20
   6 | CVE-2022-36009 | CWE-863
   7 | CVE-2022-36322 | CWE-88
   8 | CVE-2022-36865 | CWE-284
   9 | CVE-2022-36089 | CWE-294
  10 | CVE-2022-36969 | CWE-611
(10 rows)

There’s tons more to dig into, and we may cover some other aspects of this ecosystem in future Drops.

Accessible APIs

black and white cat lying on brown bamboo chair inside room

Given the infoDoS in the first section, we’ll lighten things up a bit with a list of nice distractions from boring/tedious work. Said list is “just” a directory of APIs you can hit for free and without credentials. It’s pretty well maintained, and there are some gems in it, like:

There are over 200 APIs to dig into during your next, dreadfully boring Microsoft Teams meeting.

Connected WASM

empty walkway during daytime

The pace of WebAssembly (WASM) advancements is only slightly slower than the “there’s a new GPT/LLM every four seconds” pace in AI-land. One very promising one is WAGI, which is a way to write HTTP handlers (the “GI” is an homage to “CGI”) in WASM with a minimal amount of work

Mete Atamel, a Software Engineer and a Developer Advocate who works in the Developer Relations team at Google, recently posted a 👍🏽 blog showing how to add HTTP around Wasm with Wagi.

Given that WASM is still pretty niche tech, I’ll leave the folks interested in this topic in Mete’s capable hands.

FIN

It’s super hard to believe July is almost over. I hope folks in both hemispheres are dealing with this year’s climate wackiness as well as possible. Continued 🙏🏽 for your support! ☮

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.