Drop #338 (2023-09-21): Happy ThursdAI
How to prompt Code Llama; An AI capitalism primer; BERT For Laptops
It feels like it's been a minute since I did a ThursdAI edition, but that's only due to September compressing the time stream, at least here in Southern Maine (where did this month go?). Today's section mirepoix is a mix of practical and “meta” (kind of literally, too), with hopefully something new (and useful) for everyone.
This is an AI-generated summary of today's Drop.
Perplexity failed in all three link embeds.
How to prompt Code Llama: A guide on how to structure prompts for Code Llama, an AI model built on top of Meta's Llama 2, designed to generate and discuss code, making workflows more efficient for coders and lowering the barrier to entry for beginners.
An AI capitalism primer: Ben Werdmuller's article discussing the economics of AI, focusing on legal aspects of AI-generated content, copyright issues, and the equitable distribution of profits in the AI economy.
BERT For Laptops: Sam van Herwaarden's notebook providing a step-by-step guide to training a BERT lookalike model on a laptop, with a focus on educational purposes and obtaining ~94% of the performance of the original BERT-base on the GLUE benchmark.
How to prompt Code Llama
Some readers picked up on an unstated goal of the still somewhat new-ish TL;DR section: showing the need to regularly iterate on prompts (even in such a focused use case). While I agree with HBR that folks should not bet their careers on becoming “Prompt Engineers”, those of us who need to interact with these beasts will need to up our prompting craft and get better at problem formulation (the HBR article is def worth your time).
The folks behind ollama continue to strive to make it easy for us mere mortals to run Meta's (eh, let's just remind folks that they're “Facebook” and not give in to corporate attempts to whitewash evil history) entry into the environmental disaster that is LLMs. Before we get to a new resource from them, we should talk about the llama in the room.
By now, almost every reader likely knows that Code Llama is an AI model built on top of Meta's dual-humped beast (Llama 2; the second link in the above ❡), fine-tuned for generating and discussing code. It is designed to make workflows faster and more efficient for coders and lower the barrier to entry for folks just learning to code.
Code Llama can generate code and blathering about code from both code and natural language prompts, making it a useful productivity and educational tool for programmers. It can also be used for code completion and debugging.
It supports a range of programming languages, including C++, Java, PHP, TypeScript, C#, and Bash. The core model includes support for Python, but there's also a Python-focused variant that was fine-tuned on 100B additional Python tokens (b/c Python programmers need all the help they can get, I suppose).
A neat aspect of the Llama efforts is that there are quantized versions of the model that we can run on systems without 5,000 super fancy GPUs. A quantized LLM is a language model that has been compressed using quantization: converting the model's weights from higher-precision data types to lower-precision ones, so that each value is represented with fewer bits. In the context of LLMs, this reduces the memory footprint and improves computational efficiency, making models easier to deploy on resource-constrained devices such as mobile phones, embedded systems, or IoT devices.
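To make the bit-shrinking concrete, here's a toy sketch of symmetric 8-bit quantization in plain Python + NumPy. (This is a teaching sketch, not how llama.cpp actually packs its tensors; real schemes quantize per-block and keep multiple scales.)

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: float32 -> int8 plus one scale."""
    scale = np.abs(weights).max() / 127.0          # map the largest weight to ±127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for use at inference time."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(42)
w = rng.normal(0, 0.02, size=(4096, 4096)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(f"memory: {w.nbytes / 2**20:.0f} MiB -> {q.nbytes / 2**20:.0f} MiB")  # 64 MiB -> 16 MiB
print(f"max abs error: {np.abs(w - w_hat).max():.6f}")
```

Same matrix, a quarter of the bytes, and the worst-case error is bounded by half the scale. That's the whole memory-footprint story in miniature.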
Quantized LLMs offer several benefits, including reduced memory requirements, faster inference times, and potential cost savings. However, there are some potential disadvantages to consider, such as a trade-off between model size, computational efficiency, and model accuracy, as well as increased development complexity. But, that's a cautionary tale for another Drop.
So, we have Code Llama. Yay?!
But, how should we use it?
I occasionally play with it in Perplexity's Lab, and bolted it on to VS Code (via Cody and Continue). However, I don't code all day, and I'm not part of a software engineering/development team that has to write code for a living and do all the formal things like code reviews. I suspect many readers fit a similar description.
The Ollama folks have us covered in a new guide that explains how to structure prompts for each variation using their Ollama project. In the post, they walk through prompting each model variant. The “instruct” model responds to questions like writing a Fibonacci function. The “code completion” model generates subsequent code tokens based on a prompt. It also supports an infill format to complete code within an existing function. The “Python variation” is (as noted) fine-tuned on additional Python data, making it suitable for machine learning tasks.
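As a taste of the infill format the guide covers: the code-completion models were trained with `<PRE>`/`<SUF>`/`<MID>` sentinel tokens, and the model's reply is the code that belongs between your prefix and suffix. A tiny helper (the function name and sample snippet are mine; pass the resulting string as the prompt to a base Code Llama model, e.g. via `ollama run`):

```python
def infill_prompt(prefix: str, suffix: str) -> str:
    """Build a Code Llama fill-in-the-middle prompt.

    The base (code-completion) variants were trained with <PRE>/<SUF>/<MID>
    sentinels; the generation that comes back is the middle piece.
    """
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

# Ask the model to fill in the body of a function:
prompt = infill_prompt(
    prefix="def remove_non_ascii(s: str) -> str:\n    ",
    suffix="\n    return result",
)
print(prompt)
```

The ordering matters: prefix after `<PRE>`, suffix after `<SUF>`, and `<MID>` last to cue the model to start generating the gap.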
While it may not be a full-on “prompt engineering” course, it may help folks use this model in a more efficient way.
It's also nice to not have to pay the OPENAI_API_KEY tax to use something.
An AI capitalism primer
This article is nearly two weeks old, so it may have made it to your inbox via one of the 3+ billion other newsletters out there. If so, def jump ahead to the next section.
Ben Werdmuller wrote a well-formed piece that breaks down the economics of AI. How is it going to make money, and for whom?
It is a deliberately non-technical piece, so you won't have to suffer through the “basics of LLM/GPTs” or words like “quantize”. The post is 100% focused on the 💶.
Ben delves into the legal aspects of AI-generated content, specifically focusing on copyright issues. He refers to a federal court ruling stating that AI-generated content cannot be copyrighted, which has prompted the US Copyright Office to re-evaluate relevant laws. The discussion raises important questions about the future of copyright legislation, such as whether it will evolve to protect publishers or further empower AI vendors.
In the last part of the article, he highlights the fact that many content creators and publishers, whose work is essential for AI models to function, are not reaping the financial benefits of AI. The author argues that value is being extracted from these individuals and downstream users, whose data is fed into AI systems. This phenomenon disproportionately affects smaller publishers and underrepresented voices, raising concerns about the equitable distribution of profits in the AI economy.
I'm deliberately leaving the details out, since Ben did a great job on the post.
I also hope this article is spread far and wide, since I'm still not sure a sufficient mass of humans realizes we've pwnd ourselves once again.
BERT For Laptops
By now, the acronym “GPU” may even be something that the former Hermit of Maine would have known about. It's a unit of technology that is now the barrier to entry to “do data science” (Narrator: “It's really not; NVIDIA execs just want you to believe that.”)
GPUs are, however, a necessity when one is working with large language models, and even ones not so YUGE.
If you're blessed with a gaming laptop, Sam van Herwaarden has put together a solid notebook with extensive commentary to help you through the process of training a “simple” BERT lookalike. I'm going to grift from the notebook's preamble to try to convince you to at least skim over it. Rather than indent all the things, every word until the section end is Sam's (with some extra links tossed in and light edits to read better in this context).
The notebook is developed for educational purposes more than performance, but in a bit more than half a day of training you can get a model that (after further fine-tuning) obtains ~94% of the performance of the original BERT-base on the GLUE benchmark. The code here builds on work by Geiping & Goldstein, Izsak et al., and Karpathy, who have all made LLMs more accessible for modest budgets.
You can execute the notebook from start to end to see the full process of setting up and training a tokenizer, pre-training a BERT model, and fine-tuning a BERT model on downstream NLP tasks. Most of the code from the notebook can also be found in the accompanying repository in regular Python files if you prefer, together with a few extra bits (e.g. SpanBERT style sample generation).
The document is split into three sections:
data: this is where we obtain and pre-process the data for pre-training, and build and train the BPE tokenizer.
architecture: this is where we define our BERT.
training: first we pre-train the model on a “masked language modelling” (MLM) objective with a lot of data, then we fine-tune on a few smaller tasks from the GLUE benchmark.
If you want to run the full notebook on a full-size model, expect training the tokenizer to take ~15 hours, pre-training with the MLM objective to take ~17 hours (on a 3070 RTX; adjust expectations for your system), and fine-tuning to take about an hour. The notebook was tested with 32 GB of regular RAM and 8 GB of GPU memory; if you have less, you might need to make some changes.
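(Back to me.) For the curious, the “masked language modelling” objective Sam mentions boils down to hiding tokens and asking the model to guess them. Here's a toy sketch of the classic BERT 80/10/10 masking recipe; the `MASK_ID` and vocabulary-size constants match the original BERT-base vocab, but check them against whatever tokenizer you actually train:

```python
import random

MASK_ID = 103        # [MASK] token id in the original BERT vocab
VOCAB_SIZE = 30522   # BERT-base vocabulary size

def mask_for_mlm(token_ids, mask_prob=0.15, seed=0):
    """Return (inputs, labels) for masked language modelling.

    ~15% of positions are selected; of those, 80% become [MASK],
    10% become a random token, and 10% are left unchanged.
    Labels are -100 (ignored by the loss) everywhere else.
    """
    rng = random.Random(seed)
    inputs, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok                            # predict the original token
            roll = rng.random()
            if roll < 0.8:
                inputs[i] = MASK_ID                    # 80%: replace with [MASK]
            elif roll < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
            # else 10%: keep the token as-is
    return inputs, labels
```

The 10% “random token” and 10% “unchanged” cases exist so the model can't just learn “anything under [MASK] needs predicting, everything else is trustworthy” — it has to build representations for every position.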
I hope the lazy tech and regular media soon become sufficiently distracted with the upcoming U.S. POTUS death match to tone down all the AI hype. ☮️