Today's Drop is fairly obviously AI-themed.
I mention that up front since I know there is a burgeoning cadre of folks who, for lack of a better phrase, are “AI weary”. If that describes you, then you may want to bail from this edition while you have the chance.
At this point, I don't think I need a preamble on large language models (LLMs), Generative Pre-trained Transformers (GPTs), or latent diffusion models (LDMs). I receive so many LLM-, GPT-, and LDM-related blog posts, newsletters, and mainstream news pieces in my inbox, apps, and RSS feeds that I suspect nigh everyone in tech circles knows what they are. We're covering two “practical” uses of GPT, and sliding in another, new, typography-related AI project that pairs nicely with a recent Drop topic.
Before we get to the resources, I'm going to try to contain any “soapboxing” here rather than pepper said boxing throughout each section. That should help keep this edition brief. Well, at least for folks who skip this next part, since all three resource sections link to astonishingly well-crafted long-form pieces or super straightforward “how to install/use” guides that won't benefit (much) from my blatherings.
Now's your chance to do said skipping to the sections if you'd like to avoid a brief pontification on a personal, ethical problem I'm wrestling with whilst attempting to fold use of this new tech into my work/hobbies.
A disturbing percentage of civilization “level ups” that have occurred across the centuries have been built by relying on processes that either seriously harm the regional or global environment or take major advantage of what my faith would call “the least of us”. And the reward$ of those level ups also tend to be deposited into the (offshore) bank accounts of folks you'd have a hard time distinguishing from Scrooge McDuck. The same is true for many of the hugely popular AI projects that have come to life over the past few years. I'm acrophobic, so I really need to dismount from this soapbox soon, which means I'll focus my opinionated ire on GPT-#.
One of my work responsibilities is keeping up with the latest advancements in “data science” to determine if any shiny new toy could improve our platform or enhance how we work as teams of humans in a company. I held off as long as possible, but I finally, personally and professionally, dropped some coin on OpenAI's paid offering for GPT-#.
I paused for, no joke, around ninety seconds before ultimately pressing the button that would seal the transaction. Like Google-cum-Alphabet, OpenAI just about lied to us all about their work and intentions. They also used the equivalent of slave labor to train their models. And they, along with the grifters who finance[d] and hype them, stand to profit wildly off the backs of this deceit and exploitation, as they have with so many previous “revolutions”. What's more, they are becoming increasingly opaque about how their newer offerings truly work.
As a result, each prompt I pass over to ChatGPT, and each ChatGPT-“enabled” service I use, weighs pretty heavily on me. Going “all in” on these new knowledge-work aids means that I will likely become as desensitized to these concerns as I shamefully am with my use of Apple's ecosystem of devices, which is also built on the backs of “the least of us”, and most certainly increases the coffers of other McDucks. I realize we would not be able to function at all in modern society if the weight of each comfort or utility came crashing down on us every time we engaged in some activity. However, I am hoping I — and the rest of us — do not adopt and embrace these “revolutionary” AI modern comforts without at least some personal acknowledgement that, when we do so, we are most certainly saying “I endorse this”.
scrapeghost
In this two-part series on “Automated Scraping with GPT-4”, James Turk (@jamesturk@mastodon.social) takes us on a journey of experimenting with a practical use of GPT-# in the world of web scraping. James used to run Open States, a project that scraped state legislative websites to make them more accessible to the public, so he knows _tons_ about web scraping.
As James notes in Part 1, writing web scrapers is a translation task. You take a piece of HTML and transform it into a structured data format, which makes this something LLMs should be pretty decent at. He walks us through the steps of creating a semantic schema for what we want the extracted data to look like, e.g.:
schema = {
    "name": "string",
    "url": "url",
    "district": "string",
    "party": "string",
}
then passing the HTML from a given target election results page into the model to have it extract the entities.
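A minimal sketch of that schema-plus-HTML flow (my own illustration, not scrapeghost's internals; the prompt wording is an assumption, and the model call is left injectable so the sketch doesn't require an API key):

```python
import json

# The schema from the article, mapping field names to expected types.
schema = {
    "name": "string",
    "url": "url",
    "district": "string",
    "party": "string",
}

def build_messages(html: str, schema: dict) -> list[dict]:
    """Frame the translation task: schema in the system prompt, raw HTML as the user message."""
    return [
        {"role": "system",
         "content": "Extract the entities from the HTML below. "
                    "Reply with only JSON matching this schema: " + json.dumps(schema)},
        {"role": "user", "content": html},
    ]

def extract(html: str, schema: dict, complete) -> dict:
    """`complete` is whatever sends the messages to the model and returns its
    text reply (e.g. a thin wrapper around the OpenAI chat API); injecting it
    keeps this sketch runnable without credentials."""
    return json.loads(complete(build_messages(html, schema)))
```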
This is your periodic reminder that neither GPT-3.5 nor GPT-4 can grab data from a URL in a prompt. When you use their API, you will need to pass in any text you want processed along with the prompt.
Now, these are probabilistic models, so I'm going to be skeptical about the efficacy of relying solely on them for such an important task, but you should also read James' take on the assumptions he had going into this effort.
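Given that skepticism, one cheap guardrail (my own sketch, not something from James' series) is to sanity-check whatever the model returns against the schema before trusting it:

```python
def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of problems with a model-extracted record; an empty list means it passed."""
    problems = []
    # Every schema field should be present...
    for key in schema:
        if key not in record:
            problems.append(f"missing field: {key}")
    # ...and the model shouldn't hallucinate extra ones.
    for key in record:
        if key not in schema:
            problems.append(f"unexpected field: {key}")
    # Crude type check for fields the schema marks as URLs.
    for key, kind in schema.items():
        value = record.get(key)
        if kind == "url" and isinstance(value, str) and not value.startswith(("http://", "https://")):
            problems.append(f"{key} does not look like a URL: {value!r}")
    return problems
```

Boring deterministic checks like this pair well with a probabilistic extractor: the model does the fuzzy translation, and plain code decides whether to accept the result.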
Part 2 introduces scrapeghost, a Python library and CLI built on James' initial experiments.
You can read up on how that works yourself, since I want to leave some space to discuss the elephant in the room: ChatGPT and OpenAI's API are not free for any practical use of the tech. I rarely see folks talking about that, likely due to some bits of it being free (for now). One major aspect I appreciated about James' series is his coverage of these costs, and his emphasis on using “boring” (my term) old/existing tech to reduce the inputs sufficiently so as not to go broke.
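To make the cost point concrete, here's some back-of-the-envelope arithmetic. The per-token prices are my recollection of OpenAI's published March 2023 rates (they change; check the current pricing page), and the page sizes and token-per-character ratio are rough assumptions:

```python
# Assumed prices in USD per 1K tokens (OpenAI's March 2023 list; verify before relying on them).
PRICES = {
    "gpt-4": {"prompt": 0.03, "completion": 0.06},
    "gpt-3.5-turbo": {"prompt": 0.002, "completion": 0.002},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one API call, given token counts for the prompt and the reply."""
    p = PRICES[model]
    return (prompt_tokens / 1000) * p["prompt"] + (completion_tokens / 1000) * p["completion"]

# A raw ~100 KB legislator listing is on the order of 25K tokens (a token is
# roughly four characters of English/HTML), which would not even fit in GPT-4's
# 8K context window. Trimming the HTML first with boring old tools changes the
# economics entirely:
full_page = estimate_cost("gpt-4", 25_000, 500)   # ~$0.78 per page
trimmed = estimate_cost("gpt-4", 2_000, 500)      # ~$0.09 per page
```

At hundreds or thousands of pages per scrape run, that order-of-magnitude difference is the gap between a rounding error and a real bill.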
Follow the series, start git-stalking James, and keep an eye on that API usage balance.
DS-Fusion
In a recent Drop, we introduced a fun new way to augment text segments, displayed in any font, with semantic meaning using Stable Diffusion. Today we take a (brief) look at something new!
“DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion” is a “novel method to automatically generate an artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while ensuring that the output remains readable”.
Here's the abstract of their paper [direct PDF]:
We introduce a novel method to automatically generate an artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while ensuring that the output remains readable. To address an assortment of challenges with our task at hand including conflicting goals (artistic stylization vs. legibility), lack of ground truth, and immense search space, our approach utilizes large language models to bridge texts and visual images for stylization and build an unsupervised generative model with a diffusion model backbone. Specifically, we employ the denoising generator in Latent Diffusion Model (LDM), with the key addition of a CNN-based discriminator to adapt the input style onto the input text. The discriminator uses rasterized images of a given letter/word font as real samples and output of the denoising generator as fake samples. Our model is coined DS-Fusion for discriminated and stylized diffusion. We showcase the quality and versatility of our method through numerous examples, qualitative and quantitative evaluation, as well as ablation studies. User studies comparing to strong baselines including CLIPDraw and DALL-E 2, as well as artist-crafted typographies, demonstrate strong performance of DS-Fusion.
I'm not going to blather much because (a) there's no code for me to link to (yet), and (b) I just know — almost for certain — that Lynn will be covering this and will do a much better job than me.
The header image gives you an idea of what you'll eventually be able to do with this neat new toy, and I'm eagerly awaiting the results of all the new storytelling opportunities both this and the project in the previous Drop are providing.
chatblade
Given the length of today's edition, we'll cut right to the chase.
Self-described as “A CLI Swiss Army Knife for ChatGPT”, chatblade is a tool for interacting with OpenAI's ChatGPT. Key features include:
Accepts piped input, arguments, or both
Allows saving common prompt preambles
Provides utility methods to extract JSON or Markdown
Supports continuation of a previous conversation (preserving context)
Has an option to check token count and estimated costs
Lets us use custom prompts
It requires an OpenAI API key and supports the GPT-3.5 and GPT-4 models.
FIN
I’m curious as to how many others are working to figure out ways to practically use these LLMs (GPT-# or anything else) in the context of their knowledge work. ☮
I 100% do not mind! I totally get it.
a.k.a. Stable Diffusion.
More often than not, both.
e.g., Kagi's, Raycast's, and Notion's new built-in AI-enhanced offerings.
I highly suspect there are more coming.
ICYMI: we were also used as slave labor by OpenAI.