

Happy Weekend Project Edition recovery day for all who celebrate!
In the wide world of web development, working with the Document Object Model (DOM) is a central aspect of creating dynamic and interactive web applications and data visualizations. Those who are able to bend any given DOM to their will have a mighty, digital Green Lantern-esque superpower, indeed.
The presence of a DOM usually means that a web browser is nearby, but — thanks to some clever folks — we shall see that is not necessarily the case.
So, today we look at a way to work with the DOM without a traditional browser context being handy, show a practical example of the possibilities of said work, and close with a DOM-adjacent command line utility.
JSDOM
While full-on headless browsers, such as headless Chrome, are a dime a dozen these days, using them requires actually having one of those pesky, resource-hogging, insecure entities around. This is serious overkill and a needless, giant dependency for a bonkers number of tasks one might want to perform. There's got to be a better way, and…there is!
JSDOM is a pure JavaScript implementation of the venerable Document Object Model, which means it can be used in a Node.js environment without the need for a browser. This opens up a world of possibilities for web developers (et al.), as it enables server-side rendering of anything that can be represented in a DOM; this includes full web pages or — as we'll see in the next section — fancy data visualizations. Additionally, JSDOM enables developers to write and run tests for their web creations in a simulated and safe-r browser environment, making it easier to ensure the reliability and stability of their code. Ultimately, JSDOM aims for as much WHATWG standards compatibility as is technically possible.
One of the key benefits of JSDOM is its compatibility with various front-end libraries and frameworks, such as the three I work hard to never use: React, Vue, and Angular. This means that developers leaning into those gargantuan frameworks can leverage JSDOM to render components and manipulate the DOM in a way that is consistent with their chosen albatross, streamlining the development process and reducing the learning curve.
Other benefits include:
DOM simulation: jsdom provides a broad DOM implementation, allowing you to create, query, and manipulate HTML elements and their attributes as you would in a browser. This includes support for the DOM tree structure, events, and traversal methods.
Browser-like APIs: jsdom provides implementations of various browser APIs, such as window, document, and navigator, making it easier to work with web pages in a Node.js environment.
CSS support: jsdom can parse stylesheets and exposes getComputedStyle(), so you can inspect many computed styles. It does not implement layout, though, so layout-dependent values such as getBoundingClientRect() or offsetTop are not meaningful.
JavaScript execution: jsdom can execute JavaScript code embedded in web pages, allowing you to interact with scripts and dynamic content. Script execution is off by default and has to be switched on with the runScripts option.
XMLHttpRequest and subresource loading: jsdom implements XMLHttpRequest and, when the resources option is enabled, can load subresources such as scripts and stylesheets. This can be useful when working with web pages that load resources dynamically. Note that, as of this writing, jsdom does not provide its own fetch() on window.
A customizable environment: jsdom allows you to configure various aspects of the simulated browser environment, such as the URL, the user agent string, or the referrer.
A small example might help grok the power and utility of JSDOM.
To start using it, you'll first need to install it as a dependency in a Node.js project. You can do this by running the following command in a bare-bones project directory:
npm install jsdom
Once installed, you can import JSDOM into your project and begin using it to create and manipulate DOM elements. Here's a straightforward example of how to create a new DOM instance and add an element to it:
// ES module syntax: use an .mjs file or set "type": "module" in package.json
import { JSDOM } from "jsdom";
const dom = new JSDOM('<!DOCTYPE html><html><body></body></html>');
// this will make it feel like you're in a real browser environment
// but I tend to be un-DRY and use the fully dotted notation to
// remind me that I am _not_ in a real browser environment so I
// remember there are limits to what can be done and edge-case security
// concerns that I should not forget exist. Feel encouraged to
// not be as daft as me and use this handy shortcut method.
const document = dom.window.document;
const heading = document.createElement('h1');
heading.textContent = 'Hello, Daily Drop readers!';
document.body.appendChild(heading);
console.log(dom.serialize());
In this example, we first import JSDOM and create a new instance with a basic HTML structure. We then create an <h1> element, set its text content, and append it to the body of the document. Finally, we log the serialized HTML to the console, which will display the updated DOM structure.
The capabilities of JSDOM are far too wide-ranging to cover here, so apart from the main repo, you should check out:
Testim's guide on how to get started with JSDOM;
Twilio's guide to performing web scraping tasks with JSDOM; and,
This older-ish post which covers many features of JSDOM, though much has changed in ~5 years.
If you need more resources, def hit me up.
We'll see a full practical example of JSDOM in the next section.
JSDOM + OJS Plot
If you're sick of working with Observable Plot after the most recent Weekend Project Edition, then that's just too bad, since OJS Plot and JSDOM were a match made in cyber-heaven.
While y'all were cranking away on re-creating {ggplot2} examples in OJS Plot, one of my weekend hacking projects was showing how to put together a repeatable CLI workflow for generating JS-free static OJS Plots with JSDOM[1].
My uncreatively named ojs-plot-jsdom GH repo contains a small Node.js CLI app that shows how to create said static plots, complete with self-contained title + subtitle, and dark/light-mode support. NOTE: if you use git’s time-travel capabilities, you can see the original version, which has a much less-ambitious Plot example if you want to keep it simple.
The project functionality and layout are pretty self-explanatory, and give you the ability to create lightweight SVGs to power a fast-loading static dashboard, or generate static assets for a larger application. Drop an issue, comment, or socmed post if any of it needs any further clarity.
Given said assertion, I thought I'd close out this section with a peek behind the curtain for what comes along for the ride with the two primary dependencies (@observablehq/plot and jsdom) of this CLI app:
├─┬ @observablehq/plot@0.6.6
│ ├─┬ d3@7.8.4
│ │ ├─┬ d3-array@3.2.3
│ │ │ └── internmap@2.0.3
│ │ ├── d3-axis@3.0.0
│ │ ├─┬ d3-brush@3.0.0
│ │ │ ├── d3-dispatch@3.0.1 deduped
│ │ │ ├── d3-drag@3.0.0 deduped
│ │ │ ├── d3-interpolate@3.0.1 deduped
│ │ │ ├── d3-selection@3.0.0 deduped
│ │ │ └── d3-transition@3.0.1 deduped
│ │ ├─┬ d3-chord@3.0.1
│ │ │ └── d3-path@3.1.0 deduped
│ │ ├── d3-color@3.1.0
│ │ ├─┬ d3-contour@4.0.2
│ │ │ └── d3-array@3.2.3 deduped
│ │ ├─┬ d3-delaunay@6.0.4
│ │ │ └─┬ delaunator@5.0.0
│ │ │   └── robust-predicates@3.0.1
│ │ ├── d3-dispatch@3.0.1
│ │ ├─┬ d3-drag@3.0.0
│ │ │ ├── d3-dispatch@3.0.1 deduped
│ │ │ └── d3-selection@3.0.0 deduped
│ │ ├─┬ d3-dsv@3.0.1
│ │ │ ├── commander@7.2.0
│ │ │ ├── iconv-lite@0.6.3 deduped
│ │ │ └── rw@1.3.3
│ │ ├── d3-ease@3.0.1
│ │ ├─┬ d3-fetch@3.0.1
│ │ │ └── d3-dsv@3.0.1 deduped
│ │ ├─┬ d3-force@3.0.0
│ │ │ ├── d3-dispatch@3.0.1 deduped
│ │ │ ├── d3-quadtree@3.0.1 deduped
│ │ │ └── d3-timer@3.0.1 deduped
│ │ ├── d3-format@3.1.0
│ │ ├─┬ d3-geo@3.1.0
│ │ │ └── d3-array@3.2.3 deduped
│ │ ├── d3-hierarchy@3.1.2
│ │ ├─┬ d3-interpolate@3.0.1
│ │ │ └── d3-color@3.1.0 deduped
│ │ ├── d3-path@3.1.0
│ │ ├── d3-polygon@3.0.1
│ │ ├── d3-quadtree@3.0.1
│ │ ├── d3-random@3.0.1
│ │ ├─┬ d3-scale-chromatic@3.0.0
│ │ │ ├── d3-color@3.1.0 deduped
│ │ │ └── d3-interpolate@3.0.1 deduped
│ │ ├─┬ d3-scale@4.0.2
│ │ │ ├── d3-array@3.2.3 deduped
│ │ │ ├── d3-format@3.1.0 deduped
│ │ │ ├── d3-interpolate@3.0.1 deduped
│ │ │ ├── d3-time-format@4.1.0 deduped
│ │ │ └── d3-time@3.1.0 deduped
│ │ ├── d3-selection@3.0.0
│ │ ├─┬ d3-shape@3.2.0
│ │ │ └── d3-path@3.1.0 deduped
│ │ ├─┬ d3-time-format@4.1.0
│ │ │ └── d3-time@3.1.0 deduped
│ │ ├─┬ d3-time@3.1.0
│ │ │ └── d3-array@3.2.3 deduped
│ │ ├── d3-timer@3.0.1
│ │ ├─┬ d3-transition@3.0.1
│ │ │ ├── d3-color@3.1.0 deduped
│ │ │ ├── d3-dispatch@3.0.1 deduped
│ │ │ ├── d3-ease@3.0.1 deduped
│ │ │ ├── d3-interpolate@3.0.1 deduped
│ │ │ ├── d3-selection@3.0.0 deduped
│ │ │ └── d3-timer@3.0.1 deduped
│ │ └─┬ d3-zoom@3.0.0
│ │   ├── d3-dispatch@3.0.1 deduped
│ │   ├── d3-drag@3.0.0 deduped
│ │   ├── d3-interpolate@3.0.1 deduped
│ │   ├── d3-selection@3.0.0 deduped
│ │   └── d3-transition@3.0.1 deduped
│ ├─┬ interval-tree-1d@1.0.4
│ │ └── binary-search-bounds@2.0.5
│ └── isoformat@0.2.1
└─┬ jsdom@21.1.1
  ├── abab@2.0.6
  ├─┬ acorn-globals@7.0.1
  │ ├── acorn-walk@8.2.0
  │ └── acorn@8.8.2 deduped
  ├── acorn@8.8.2
  ├── UNMET OPTIONAL DEPENDENCY canvas@^2.5.0
  ├─┬ cssstyle@3.0.0
  │ └── rrweb-cssom@0.6.0 deduped
  ├─┬ data-urls@4.0.0
  │ ├── abab@2.0.6 deduped
  │ ├── whatwg-mimetype@3.0.0 deduped
  │ └── whatwg-url@12.0.1 deduped
  ├── decimal.js@10.4.3
  ├─┬ domexception@4.0.0
  │ └── webidl-conversions@7.0.0 deduped
  ├─┬ escodegen@2.0.0
  │ ├── esprima@4.0.1
  │ ├── estraverse@5.3.0
  │ ├── esutils@2.0.3
  │ ├─┬ optionator@0.8.3
  │ │ ├── deep-is@0.1.4
  │ │ ├── fast-levenshtein@2.0.6
  │ │ ├─┬ levn@0.3.0
  │ │ │ ├── prelude-ls@1.1.2 deduped
  │ │ │ └── type-check@0.3.2 deduped
  │ │ ├── prelude-ls@1.1.2
  │ │ ├─┬ type-check@0.3.2
  │ │ │ └── prelude-ls@1.1.2 deduped
  │ │ └── word-wrap@1.2.3
  │ └── source-map@0.6.1
  ├─┬ form-data@4.0.0
  │ ├── asynckit@0.4.0
  │ ├─┬ combined-stream@1.0.8
  │ │ └── delayed-stream@1.0.0
  │ └─┬ mime-types@2.1.35
  │   └── mime-db@1.52.0
  ├─┬ html-encoding-sniffer@3.0.0
  │ └── whatwg-encoding@2.0.0 deduped
  ├─┬ http-proxy-agent@5.0.0
  │ ├── @tootallnate/once@2.0.0
  │ ├─┬ agent-base@6.0.2
  │ │ └── debug@4.3.4 deduped
  │ └─┬ debug@4.3.4
  │   └── ms@2.1.2
  ├─┬ https-proxy-agent@5.0.1
  │ ├── agent-base@6.0.2 deduped
  │ └── debug@4.3.4 deduped
  ├── is-potential-custom-element-name@1.0.1
  ├── nwsapi@2.2.4
  ├─┬ parse5@7.1.2
  │ └── entities@4.5.0
  ├── rrweb-cssom@0.6.0
  ├─┬ saxes@6.0.0
  │ └── xmlchars@2.2.0
  ├── symbol-tree@3.2.4
  ├─┬ tough-cookie@4.1.2
  │ ├── psl@1.9.0
  │ ├── punycode@2.3.0
  │ ├── universalify@0.2.0
  │ └─┬ url-parse@1.5.10
  │   ├── querystringify@2.2.0
  │   └── requires-port@1.0.0
  ├─┬ w3c-xmlserializer@4.0.0
  │ └── xml-name-validator@4.0.0 deduped
  ├── webidl-conversions@7.0.0
  ├─┬ whatwg-encoding@2.0.0
  │ └─┬ iconv-lite@0.6.3
  │   └── safer-buffer@2.1.2
  ├── whatwg-mimetype@3.0.0
  ├─┬ whatwg-url@12.0.1
  │ ├─┬ tr46@4.1.1
  │ │ └── punycode@2.3.0 deduped
  │ └── webidl-conversions@7.0.0 deduped
  ├─┬ ws@8.13.0
  │ ├── UNMET OPTIONAL DEPENDENCY bufferutil@^4.0.1
  │ └── UNMET OPTIONAL DEPENDENCY utf-8-validate@>=5.0.2
  └── xml-name-validator@4.0.0
We all get by with a little help from our friends, including these two powerful JS libraries.
webpalm

Webpalm is a command-line tool that generates a tree of all the webpages and their links on a website. It can also dump data from the body of the pages using regular expressions and store the result in a file. It is quite useful for getting a quick overview of a website structure and checking for [sensitive] data using whatever regular expressions you can think of. Furthermore, it is especially good at “spidering” website networks and going into all the linked nooks and crannies to pull from any resource it can find.
Since it is a modern CLI application, it has fancy, colorful human output, along with practical JSON, XML, or plain text output. You have full control over how to deal with various HTTP status codes, how deep the “palming” operations go, and can watch it work in real time.
Webpalm was designed to be an OSINT (Open-Source Intelligence) tool, but anyone can use it for a host of other practical activities.
Here are some examples of how to use webpalm brazenly stolen from the GH repo:
webpalm -u https://google.com -l1 --live: this command will get the palm tree of a website and show live output mode.
webpalm -u https://google.com -l1 -x 404,500: this command will get the palm tree of a website and exclude some status codes.
webpalm -u https://google.com -l1 --regexes comments="\<\!--.*?-->" -o result.json: this command will dump the comments found in the body of each page and export them to a JSON file.
Webpalm also provides several baked-in regex patterns that users can use, such as emails, comments, tokens, and passwords (b/c “OSINT”).
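If the named-regex idea feels abstract, the following sketch shows what a pattern like comments does to a single page body. To be clear, this is not how webpalm is implemented; it is plain JavaScript with no dependencies, the page body is invented, and dumpMatches is a hypothetical helper name. The regex is the one from the example above, minus the shell escaping.

```javascript
// Hypothetical page body, invented for this sketch.
const body = `<html>
<!-- TODO: remove staging API key before launch -->
<body><p>Hello</p><!-- build 1234 --></body>
</html>`;

// Named patterns, mirroring the name=regex shape of the --regexes flag.
// Without the "s" flag, "." does not cross newlines, so only
// single-line comments match.
const patterns = { comments: /<!--.*?-->/g };

// Collect every match for every named pattern.
function dumpMatches(text, namedPatterns) {
  const result = {};
  for (const [name, regex] of Object.entries(namedPatterns)) {
    result[name] = [...text.matchAll(regex)].map((match) => match[0]);
  }
  return result;
}

const dumped = dumpMatches(body, patterns);
console.log(JSON.stringify(dumped, null, 2));
```

Swap in whatever patterns you care about (emails, tokens, and so on) and the shape of the output stays the same: one list of matches per name.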
If you do any web scraping at all, this should be in your toolbox.
FIN
I cannot believe it's taken me this long to cover JSDOM.
Oh, and “Happy 1/3 of Q2 2023 is gone, gone, gone”! ☮
#jsdom #plot #whatwg #webpalm
[1] How is that even a legit English sentence?