Warp; Academic Link Usage & Rot; Scrape All The Things
If the 2022-04-13.01 post did not convey my 💙 for both the command line and terminal emulators, then today's absolutely will.
I recently discovered Warp (GH) and have made it my default terminal to put it through the paces of daily work. This new entrant on the terminal scene bills itself as "a blazingly-fast modern Rust based GPU-accelerated terminal built to make you and your team more productive." (
#protip to aspiring terminal builders: tagging your creation with 'Rust' and 'GPU-accelerated' is a guaranteed way to get me to try a new app.)
How is Warp different from the plethora of alternatives? So far, the most significant difference involves the framing of what a terminal interface should be. I grew up in CRT green screen 80x24 land and have always appreciated the same minimalism in even the most modern of terminal apps. Warp fundamentally re-imagines the terminal in the following ways:
there are "cloud" components, both for some (legit) features based on artificial intelligence and for team collaboration
output from commands is rendered as "blocks" that can be interacted with (e.g., you can copy the command and/or the output; create a "permalink" to share with team members or the world, like this one from the screenshot, above; and more)
the prompt is a fully-fledged code-editor (multi-cursors! spiffy tab completion, too)
customizable command workflows that do not require scads of clicks to set up, modify, and use
natural language-based shell commands builder, based on OpenAI’s codex engine
They've got a great "how we did this" 'splainer, and many new features in development.
For the moment, it's macOS-only, but the company behind Warp plans to support Linux and Windows. It is free to use for individuals, and paid plans for teams/organizations are in the works.
Academic Link Usage & Rot
Today, we have another installment from the ODU WSDL files, this time covering link rot in academic papers.
At the end of 2021, ODU joined with other institutions on a two-year project titled "Collaborative Software Archiving for Institutions (CoSAI)", with a charter focusing on "institutional approaches to provide machine-repeatable and human-understandable workflows for preserving web-based scholarship, specifically source code, while forefronting the role of education, outreach, and community building."
This phase of the project focused on links from academic papers to GitHub, GitLab, SourceForge, and Bitbucket (referenced as GHPs, for "Git Hosting Platforms"). One statistic that will likely not surprise anyone is that papers overwhelmingly link to GitHub repositories vs. the other GHPs in the research corpus:
Source: ODU WSDL
1.56 million publications
4,039,772 URIs to the Web
218,192 URIs to the 4 GHPs (92% are to GitHub)
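The tallying behind numbers like those can be sketched in a few lines. To be clear, this is my own illustrative guess, not the CoSAI project's actual tooling; the hostname table is an assumption based on the four GHPs named above.

```python
from urllib.parse import urlparse
from collections import Counter

# The four Git Hosting Platforms (GHPs) named in the study. The exact
# hostname matching the project uses is not published here, so this
# mapping is an illustrative assumption.
GHP_HOSTS = {
    "github.com": "GitHub",
    "gitlab.com": "GitLab",
    "sourceforge.net": "SourceForge",
    "bitbucket.org": "Bitbucket",
}

def classify_uri(uri: str):
    """Return the GHP name for a URI, or None if it is not a GHP link."""
    host = urlparse(uri).netloc.lower()
    # Strip any port and tolerate a leading "www." prefix.
    host = host.split(":")[0].removeprefix("www.")
    return GHP_HOSTS.get(host)

def tally(uris):
    """Count URIs per GHP, ignoring non-GHP links entirely."""
    return Counter(p for p in map(classify_uri, uris) if p)
```

Run over a corpus of extracted reference URIs, `tally` would produce the per-platform breakdown (and `sum(counts.values())` the GHP total) reported above.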
Emily Escamilla (@EmilyEscamilla_) has a very accessible presentation on the to-date findings, including a note on how necessary and difficult it is to preserve the GHP resources referenced by academic papers, including the issue threads, pull requests, and wiki contents that may be part of any GHP project.
There's great food for thought here, and I suggest anyone even remotely interested in academic knowledge preservation keep an eye on Emily and this ODU project. (NOTE: this effort is not the only take on such an analysis.)
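One way to spot rot in a reference list is simply to probe each URI. This sketch is mine, not CoSAI's; the status-fetch function is injectable so the "does this look rotted?" logic can be exercised without network access.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def http_status(uri: str, timeout: float = 10.0):
    """HEAD-request a URI; return its HTTP status code, or None when
    the host is unreachable (DNS failure, timeout, refused, ...)."""
    try:
        with urlopen(Request(uri, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # server answered, e.g. 404 Not Found
    except URLError:
        return None      # no answer at all

def find_rotted(uris, status_of=http_status):
    """Return the URIs that look rotted: unreachable, or 4xx/5xx.

    `status_of` is a parameter so the filter is testable offline."""
    return [u for u in uris if (s := status_of(u)) is None or s >= 400]
```

Note that a 200 response is no guarantee the content is what the paper originally cited (content drift is a separate problem from link rot), which is part of why archiving, not just checking, matters.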
Scrape All The Things
Folks may remember a legal battle between hiQ Labs and LinkedIn a few years ago. LinkedIn (initially successfully) sued to stop hiQ from pulling down data from public profile pages of LinkedIn users. hiQ successfully appealed the decision. And, lawyers being lawyers, the appeal escalations continued all the way up to the Supreme Court, which tossed the case back to the Ninth Circuit for reconsideration. A panel duly performed said reconsideration and found in favor of hiQ [direct PDF].
The battle is not over, but, for now, you can freely scrape anything on the web that doesn't require authentication (since requiring authentication implies the data is not "public"). LinkedIn and other organizations are also allowed to put technical controls in place to prevent service abuse (i.e., in the event you, like hiQ, essentially create a denial-of-service situation by flooding a site with too many requests).
Tread carefully and keep an eye on this case if you are a scraper.
I have mixed thoughts on the hiQ/LinkedIn decision that I may opine on further over at the main blog in the not-too-distant future.
If y'all engage in the comments, the only rule is to be kind. ☮