Drop #351 (2023-10-12): 🛡️ Happy (Secure) ThursdAI
nbdefense; modelscan; rebuff; Perplexity.ai API
We're a week early with our ~monthly recurring glance at advancements in the gradual subjugation of humanity, but when two of my worlds collide — cybersecurity and data science — I gotta make exceptions. Plus, I'm still in travel mode — and all 3 + 1 (the last one has nothing to do with cybersecurity) resources have excellent documentation — which means I can let them do most of the heavy lifting when it comes to providing examples.
This is an AI-generated summary of today's Drop.
I switched Perplexity over to Claude-2, today, with a prompt that explicitly said, “four sections” and it only gave me three, but the three were superb and quite concise.
Here is a concise three bullet summary of the key sections in the attached blog post:
NBDefense helps secure Jupyter notebooks by detecting leaked credentials, PII, and licensing issues
ModelScan scans AI models for malicious code before deployment or loading
Rebuff protects against prompt injection attacks on large language models like GPT-3
Perplexity has a new/second API endpoint.
NBDefense is the first of three open-source tools that Protect AI has released to help orgs and individuals stay safe as they (I guess I have to say “we”) crank on environmentally-destructive data science tasks.
If you are in the “notebook cult”, you are a target. Attackers are clever, and have figured out that anything “Jupyter” can be (pretty easily) pwnd.
NBDefense can scan Jupyter notebooks and detect potential security issues such as:
Usage of open source code with restrictive licenses
It can be used as either a JupyterLab extension that scans notebooks within JupyterLab or as a CLI tool that can scan notebooks and projects. The CLI is especially useful for scanning many notebooks at once or setting up automated scanning workflows. The JupyterLab extension enables continuous scanning of notebooks as y'all work on them interactively.
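To make the "leaked credentials" idea a bit more concrete, here is a rough sketch of what a notebook scan boils down to. This is not NBDefense's implementation: the function names are made up for illustration, and it only looks for one well-known pattern (AWS access key IDs) in cell sources and text outputs, assuming the standard `.ipynb` JSON layout.

```python
import json
import re

# AWS access key IDs follow a well-known shape: "AKIA" plus 16 uppercase
# letters or digits. A real scanner checks many more secret types.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def cell_text(cell: dict) -> str:
    """Join a cell's source and any text outputs into one string."""
    source = cell.get("source", "")
    parts = [source if isinstance(source, str) else "".join(source)]
    for output in cell.get("outputs", []):
        text = output.get("text", "")
        parts.append(text if isinstance(text, str) else "".join(text))
    return "\n".join(parts)


def find_aws_key_ids(notebook_json: str) -> list[tuple[int, str]]:
    """Return (cell_index, matched_key_id) pairs found in a notebook."""
    notebook = json.loads(notebook_json)
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        for match in AWS_KEY_ID.finditer(cell_text(cell)):
            hits.append((index, match.group()))
    return hits


# A tiny in-memory notebook that uses the example key ID from AWS docs.
SAMPLE_NOTEBOOK = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Totally safe notebook\n"]},
        {
            "cell_type": "code",
            "source": ['key = "AKIAIOSFODNN7EXAMPLE"\n'],
            "outputs": [],
        },
    ]
})

hits = find_aws_key_ids(SAMPLE_NOTEBOOK)
```

The real tool covers far more ground (PII, licenses, and more secret types), so treat this purely as a mental model.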
Jupyter Notebooks often contain an organization's most sensitive code and data. According to research from Protect AI, public notebooks from major tech companies frequently contain credentials, PII, and licensing issues.
Attackers can exploit these notebooks to steal data, hijack accounts, or reverse engineer models. Notebooks may also unintentionally expose your organization to legal risks if they use restrictive open-source licenses improperly.
Detailed installation instructions are available in the NBDefense documentation.
AI models are increasingly being used to make critical decisions across various industries. However, like any software application, AI models can have vulnerabilities that malicious actors could exploit. A new class of attack, the “model serialization attack”, poses a particular risk for organizations using AI.
Model serialization attacks involve inserting malicious code into AI model files that are shared between teams or deployed into production. For example, an attacker could add code that steals credentials or sensitive data when the model file is loaded. This is similar to a Trojan horse attack.
To defend against model serialization attacks, organizations need to scan AI models before use to detect any malicious code that may have been inserted. However, until recently, there were no open-source tools available to easily scan models from various frameworks like PyTorch, TensorFlow, and scikit-learn.
ModelScan is a project from Protect AI that provides protection against model serialization attacks. It is the first model scanning tool that supports multiple model formats including H5, Pickle, and SavedModel.
ModelScan scans model files to detect unsafe code without actually loading the models. This approach keeps the environment safe even when scanning a potentially compromised model. Scans take seconds, and any unsafe code found is classified as critical, high, medium or low risk.
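If "scan without loading" sounds like magic, a small sketch with Python's standard library may help. The idea, for the Pickle format at least, is to walk the pickle opcodes and list which callables the file would import, never calling `pickle.loads`. This is an illustration of the general technique, not ModelScan's code. The `TamperedModel` class and the tiny `UNSAFE_GLOBALS` denylist are hypothetical.

```python
import pickle
import pickletools

# Hypothetical denylist for illustration; a real scanner tracks far more.
UNSAFE_GLOBALS = {
    "builtins.eval",
    "builtins.exec",
    "os.system",
    "posix.system",
    "subprocess.Popen",
}


class TamperedModel:
    """Stand-in for a poisoned model: unpickling it would call eval()."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


def referenced_globals(data: bytes) -> list[str]:
    """List the module.name callables a pickle references, without loading it."""
    found = []
    strings = []  # string arguments seen so far (used by STACK_GLOBAL)
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols carry "module name" as one space-separated arg.
            found.append(arg.replace(" ", ".", 1))
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+ pushes module and name as the two preceding strings.
            # Simplification: ignores strings fetched back from the memo.
            found.append(".".join(strings[-2:]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


def unsafe_globals(data: bytes) -> list[str]:
    """Return only the referenced globals that appear on the denylist."""
    return [name for name in referenced_globals(data) if name in UNSAFE_GLOBALS]


# Serializing is safe; only *loading* would trigger the payload.
poisoned = pickle.dumps(TamperedModel(), protocol=4)
benign = pickle.dumps({"weights": [0.1, 0.2]}, protocol=4)
```

Here `unsafe_globals(poisoned)` flags the `eval` reference while `unsafe_globals(benign)` comes back empty, and at no point does the payload run.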
With ModelScan, we can:
scan models from PyTorch, TensorFlow, Keras, scikit-learn, XGBoost, and more
check models in seconds by reading files instead of loading models
see which bits of code are unsafe (i.e., categorized as critical, high, medium or low risk)
It can also be used at any stage in the development process:
before loading models: Scan all pre-trained models from third parties before use in case they have been compromised.
during model development: Scan models after training to detect any poisoning attacks on new models.
before model deployment: Verify models contain no unsafe code before deployment to production.
in ML pipelines: Integrate ModelScan scans into CI/CD pipelines and at each stage of ML workflows.
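For the pipeline case, one way to wire this up is a small wrapper that finds model files in a repo and runs the CLI on each one. Treat the following as a sketch under assumptions: the suffix list is my own guess at a starting set (not ModelScan's supported-format list), the helper names are made up, and what a nonzero exit code means should be checked against the ModelScan docs before failing a build on it. The only thing taken from the tool itself is the `modelscan -p` invocation.

```python
import subprocess
from pathlib import Path

# Assumption: a hypothetical starting set of suffixes worth scanning.
MODEL_SUFFIXES = {".pkl", ".pickle", ".h5", ".pt", ".pb"}


def find_model_files(root: Path) -> list[Path]:
    """Recursively collect files whose suffix looks like a serialized model."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in MODEL_SUFFIXES
    )


def scan_command(model: Path) -> list[str]:
    """Build the CLI call: modelscan -p /path/to/model_file."""
    return ["modelscan", "-p", str(model)]


def scan_all(root: Path) -> dict[Path, int]:
    """Run modelscan on every model file and collect the exit codes.

    Requires modelscan to be installed and on PATH. How to interpret the
    exit codes is left to the caller (see the ModelScan docs).
    """
    return {
        model: subprocess.run(scan_command(model)).returncode
        for model in find_model_files(root)
    }
```

A CI job could call `scan_all(Path("models"))` and decide what to do with the results.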
Provided your Python environment is not woefully busted, you are a quick:
$ python3 -m pip install modelscan
$ modelscan -p /path/to/model_file
away from leveling up your safety.
It's no secret that prompt injection attacks have emerged as a serious threat to AI systems built on large language models (LLMs) like GPT-3 and ChatGPT. In these attacks, adversaries manipulate the prompts fed into the LLM to make it behave in unintended ways.
Prompt injection can provide a means for attackers to exfiltrate sensitive data, take unauthorized actions, or cause the model to generate harmful content. Recent examples include bypassing content filters, extracting training data, and stealing API keys. Prompt injection has been ranked as the number one threat to LLMs (direct PDF) by OWASP.
The core vulnerability arises from the fact that LLMs process instructions and input text in the same way. There is no built-in mechanism to distinguish harmless user input from malicious instructions designed to manipulate the model.
Rebuff layers four defenses against this:
filters potentially malicious inputs before they reach the LLM
uses a separate LLM to analyze prompts for attacks
stores embeddings of past attacks to recognize similar ones
plants canary words to detect data leaks
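The canary word layer is the easiest of the four to picture in code. Here is a minimal, generic sketch of the technique: plant a random token in the prompt template, then check whether the model's output ever contains it. This is not Rebuff's implementation. The function names and the comment-style placement of the canary are assumptions made for illustration.

```python
import secrets


def add_canary(prompt_template: str) -> tuple[str, str]:
    """Prefix a prompt template with a random canary word.

    Returns the guarded template and the canary so the caller can check
    completions for it later.
    """
    canary = secrets.token_hex(8)  # 16 hex chars, unlikely to occur by chance
    guarded = f"<!-- {canary} -->\n{prompt_template}"
    return guarded, canary


def canary_leaked(completion: str, canary: str) -> bool:
    """True if the completion contains the canary, i.e. the prompt leaked."""
    return canary in completion


guarded_template, canary = add_canary("Translate to French: {user_input}")

# A completion that parrots the prompt back leaks the canary...
leaky_completion = "Sure! My instructions were: " + guarded_template
# ...while a normal answer does not.
normal_completion = "Bonjour le monde"
```

If `canary_leaked(...)` fires, the app can refuse to return the completion and record the offending input for future matching.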
This new tool makes it possible to integrate prompt injection protection into our LLM apps with minimal code changes. We simply need to pass the prompt through Rebuff's detect_injection method, which returns whether an attack was detected. (NOTE: “simply” is doing an awful lot of heavy lifting in that sentence.)
While Rebuff mitigates many prompt injection risks, it is not foolproof. Skilled attackers may still find ways to bypass protections. However, Rebuff offers a pretty spiffy first line of defense.
I pay my AI tax to Perplexity (and also, indirectly, to Kagi, I guess), and they've added replit-code-v1.5-3b (ref 1/ref 2) via a new /completions endpoint. They're mimicking the OpenAI API, but for those not familiar with that, this endpoint generates text to complete a non-conversational prompt. They've had a /chat/completions endpoint for a few weeks now.
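For those who haven't poked at an OpenAI-style completions endpoint, a request looks roughly like the sketch below. Big caveats: the base URL, the bearer-token header, and the `prompt`/`max_tokens` field names are assumptions based on the "mimicking the OpenAI API" bit, so check the API docs before relying on any of it. The code only builds the request and does not send it.

```python
import json
import urllib.request

# Assumption: base URL guessed for illustration; confirm it in the API docs.
API_URL = "https://api.perplexity.ai/completions"


def build_completion_request(
    prompt: str,
    api_key: str,
    model: str = "replit-code-v1.5-3b",
    max_tokens: int = 64,
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style completions request."""
    # Assumed OpenAI-style body: a bare prompt, not a list of chat messages.
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_completion_request("def fibonacci(n):", api_key="YOUR_API_KEY")
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` with a real key. The contrast with /chat/completions is the body: a single `prompt` string here versus a list of role-tagged messages there.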
Find out more at the API docs.
In unrelated — and, arguably, far more important — news: those looking to provide some practical help to folks in need during this time of crisis and conflict in the Middle East, CharityWatch has a list of “legitimate, efficient, and accountable charities involved in efforts to aid and assist the people of Israel – Palestine during active conflict in the region”. ☮️