Drop #313 (2023-08-09): Watch Out Wednesday

[Blocking] GPTBot; Residential Proxy Harms; Cyber Trust Mark

It’s “Hacker Summer Camp” week, so I’ll take this as an opportunity to dedicate a full Drop to cybersecurity content.

TL;DR

This is an AI-generated summary of today’s post.

Today, I uploaded the Markdown copy of this post to claude.ai with the prompt “please provide a concise three-bullet summary of the attached file with a link to the main resource in each section”. I did not need to edit the summary.

  • OpenAI has published information on how to block its GPTBot web crawler, including its IP ranges and honoring robots.txt. You can decide if you want your content used for AI training. Full details

  • Residential proxy services route traffic through residential IPs, but can enable harmful activities. You could face legal liability. Consider terminating relationships with major providers like Smartproxy or NetNut. More context

  • New U.S. Cyber Trust Mark will help consumers identify IoT devices meeting security standards. Look for the logo when buying smart home tech by end of 2024. FCC details


[Blocking] GPTBot

OpenAI has published information on GPTBot, the web crawler it uses to interact with and grift content from web sites. They claim they will honor robots.txt rules, so if you want to prevent your content from being slurped up into their ecosystem, you can use something like:

User-agent: GPTBot
Disallow: /

or use more targeted path rules to block it from only parts of your site.
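If you want to sanity-check rules like the ones above before deploying them, Python’s standard-library robots.txt parser can evaluate them locally. This is a minimal sketch, not anything OpenAI publishes: the `is_allowed` helper, the `/drafts/` path, and the `example.com` domain are all made up for illustration, and a real crawler’s interpretation of your file may differ from this parser’s.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule shown above.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# A hypothetical path-scoped variant; "/drafts/" is only an example path.
BLOCK_DRAFTS = """\
User-agent: GPTBot
Disallow: /drafts/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "https://example.com"  # placeholder domain
    for label, rules in (("block all", BLOCK_ALL), ("block drafts", BLOCK_DRAFTS)):
        for agent in ("GPTBot", "Googlebot"):
            for path in ("/posts/1", "/drafts/wip"):
                verdict = is_allowed(rules, agent, site + path)
                print(f"{label:12} {agent:10} {path:12} allowed={verdict}")
```

Note that this only tells you what a well-behaved crawler should do with your rules; it says nothing about whether any given crawler actually honors them.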

They also published their crawler’s IP ranges, which are (at present):

20.15.240.64/28
20.15.240.80/28
20.15.240.96/28
20.15.240.176/28
20.15.241.0/28
20.15.242.128/28
20.15.242.144/28
20.15.242.192/28
40.83.2.64/28

should you want to block their infrastructure outright. Be warned, though, that those ranges are owned by Microsoft. They could change at any time, and blocking them could prevent Bing from indexing you in the future, should OpenAI crash, burn, and die in a fire and Microsoft then decide to reuse them for Bing or other Azure services.
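If you would rather check addresses in application code or log analysis than in a firewall, the ranges above can be tested with Python’s standard `ipaddress` module. Treat this as a sketch under assumptions: the list is copied from this post and may already be stale, and `is_gptbot_ip` is a hypothetical helper name, not part of any published tooling.

```python
import ipaddress

# CIDR ranges as listed in this post; they may have changed since publication.
GPTBOT_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "20.15.240.64/28",
        "20.15.240.80/28",
        "20.15.240.96/28",
        "20.15.240.176/28",
        "20.15.241.0/28",
        "20.15.242.128/28",
        "20.15.242.144/28",
        "20.15.242.192/28",
        "40.83.2.64/28",
    )
]


def is_gptbot_ip(address: str) -> bool:
    """Return True if address falls inside any of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in GPTBOT_RANGES)


if __name__ == "__main__":
    for candidate in ("20.15.240.70", "40.83.2.80", "8.8.8.8"):
        print(candidate, is_gptbot_ip(candidate))
```

Each /28 covers only 16 addresses, so the whole list is 144 addresses. That makes it cheap to check, but also easy for the operator to move elsewhere.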

My sentiments on this matter are likely pretty obvious from language I’ve used in this section. However, we should take a moment for a more thoughtful/nuanced look at why you may, or may not, want entities such as OpenAI accessing and using your content in formulating prompt responses or training new models.

On the “pro-blocking” side, some content creators may not want their work to be used for training AI models without permission, credit, or compensation. It’s also far too easy to casually include information in blog posts or articles that is just sensitive enough that one may want to keep it from being misused.

When it comes to the “pro-AI-scraping” perspective, you may want these AI-bots to ingest and use your content, so your creations do become part of the AI zeitgeist and help others who are prompting these systems. Furthermore, unless you credential-wall content, it’s public data (though public information is 100% copyrightable) and you likely let Google, Bing, Kagi, etc. scrape your site today. Is there any real difference in letting OpenAI have it directly? REMEMBER, though, that OpenAI is, for all intents and purposes, Microsoft, given Microsoft’s deliberate, antitrust-dodging 49% (vs. a majority) ownership stake in it.

Whichever side you personally fall on (you can see where I stand), you may want to have a discussion with your team, lab, or organization (provided you’re in a position to do so) about the topic as well. Reddit, Stack Overflow, and others have decided they want cold, hard cash from these AI systems for their content, and your organization should have some similar policy in place.

Residential Proxy Harms

Residential proxy services are becoming increasingly popular as they provide users with a way to access the internet through a residential (vs cloud or data center) IP address, allowing them to bypass certain restrictions and maintain anonymity. However, allowing these services to use your internet connection can pose several risks.

These services work by routing your (client) internet traffic through an intermediary server. This server assigns you an alternative IP address associated with physical locations, such as residences, which are (oddly) considered more trustworthy by websites due to their residential nature. Web scraping is the most common use for these services. But, you can also use them as a proxy gateway to the internet if you are just looking to have your browser access websites through an alternate IP address. They can also help you look like you are coming from a different region of the globe. While you are, in some way, masking your source address, these services do not provide the same level of pseudo-anonymity you would get by using, say, Tor.

Last month, our internet sensor network snared an individual who was using one of these services to try to gain access to WordPress sites in Ukraine. Normally, using a residential proxy network is fairly cheap, depending on what you are using it for. But, we managed to cause this individual to spend some non-trivial coin (it appears they weren’t monitoring their script), as they burned through over 89,000 U.S. IP addresses in their ill-fated malicious attempts. The section header is a map of all of those geolocated points (limited to the continental U.S.).

I used the word “harms” in the section title, since this individual most certainly was trying to do harm to entities in a country that’s trying to fend off a global tyrant. If you allowed one of these services to use your resources in such a way, you most certainly were aiding and abetting said harm. They’re also used for other harms, like gift card fraud, chargeback fraud, and click fraud. In fact, you could be held legally responsible for the actions of third parties using your connection. And, you may violate the terms of service put forth by your ISP.

I won’t name the service directly, but very common residential proxy service companies include Smartproxy, Bright Data, NetNut, IPRoyal, and Oxylabs. If you’re presently involved with any of them, I’d highly suggest terminating said business relationship.

Cyber Trust Mark

The Biden administration has introduced the U.S. Cyber Trust Mark, a new cybersecurity labeling program for smart devices, aimed at protecting American consumers from cyber threats. It’s a voluntary program proposed by Federal Communications Commission (FCC) Chairwoman Jessica Rosenworcel, which aims to raise the bar for cybersecurity across common Internet of Things (IoT) devices, such as smart refrigerators, televisions, climate control systems, and fitness trackers. Devices that meet established cybersecurity criteria will display a distinct shield logo, the Cyber Trust Mark, to inform consumers about their security features.

The criteria for the Cyber Trust Mark are based on guidelines (direct PDF) from the National Institute of Standards and Technology (NIST). They include such measures as use of unique and strong default passwords, data protection, software updates, and incident detection capabilities.

As IoT devices become ubiquitous, they also become a bigger target for cyberattacks. This new “nutrition label” for the digital devices you procure can help you make more informed choices and help make the internet (and your home) a bit safer.

I am not authorized to display the mark in this blog post (I didn’t reach out for the required permission) but you can see the various flavors of it on the FCC’s website.

The U.S. Cyber Trust Mark is expected to be in place by the end of 2024. As the program rolls out, consumers will be able to look for the distinct shield logo on IoT devices to make informed decisions about their security features.

FIN

If y’all have other cybersecurity topics you’d like to see me cover more often, drop a note wherever you’re most comfortable. ☮
