Despite your Drop herder still being in “quick hits” mode, we can't go into a weekend without an [optional] project to tackle. This week, it's all about reducing costs, saving time, and keeping data you need as close as possible via caching.
While we're going to be focused on HTTP API caching, the emphasis is on generic HTTP API caching rather than, say, setting up a caching layer for a specific API you might be running.
But, we're getting ahead of ourselves. Let's start by discussing what this “caching” is.
A [Quick?] Intro To Caching
HTTP caching is a technique used to store frequently accessed resources, such as web pages, API results, or images, in some type of data store. When a client requests a resource, the cache can serve the resource directly instead of requesting it from the server.
Implementing a caching layer can provide several benefits, including:
faster response times: by serving cached resources instead of requesting them from the server, response times can be significantly reduced
reduced network bandwidth consumption: caching can reduce the amount of data that needs to be transferred over the network, which can be especially beneficial for mobile devices or users with limited bandwidth
improved scalability: caching reduces load on the origin server, since repeat requests can be served from cache instead of hitting the backend every single time
frugality: most APIs are not free, so caching can significantly reduce costs and help keep your API credits topped off, especially if the data/pages/images/result sets being fetched do not change often.
There are several popular idioms for performing generic HTTP caching:
expiration-based caching: in this approach, the server includes an expiration time in the response headers, indicating how long the resource can be cached. The client can then use this information to determine whether to serve the cached resource or request a fresh copy from the server
validation-based caching: In this approach, the server includes an ETag (entity tag) in the response headers, which is a unique identifier for the resource. The client can then use this identifier to check whether its cached copy of the resource is still valid by sending a conditional request to the server. If the resource has not changed, the server can respond with a 304 Not Modified status code and the client can serve its cached copy of the resource
Cache-Control header: the Cache-Control header allows servers to specify how clients should cache responses. For example, it can be used to specify whether a response can be cached at all, whether it should be revalidated before being served from cache, and how long it can be cached
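To make the validation-based idiom concrete, here's a minimal sketch of an ETag/`If-None-Match` round trip. The "server" is a stand-in function so the example runs offline; in real use you'd swap in an actual HTTP call (e.g., via `urllib.request`) against an API that returns `ETag` headers.

```python
# A toy client-side cache using validation-based (ETag) caching.
# The fake_server function stands in for a real origin so this runs offline.

CACHE = {}  # url -> (etag, body)

def fake_server(url, if_none_match=None):
    """Pretend origin: always serves the same body with a fixed ETag."""
    etag = '"abc123"'
    if if_none_match == etag:
        return 304, etag, None          # Not Modified: no body on the wire
    return 200, etag, b"resource body"  # full response

def get(url, server=fake_server):
    cached = CACHE.get(url)
    status, etag, body = server(url, if_none_match=cached[0] if cached else None)
    if status == 304:
        return cached[1]                # still valid: serve our cached copy
    CACHE[url] = (etag, body)           # store/refresh the cache entry
    return body
```

The first `get()` pays full freight; every subsequent one costs only a conditional request and a tiny 304 response until the resource actually changes.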
HTTP API requests are a slightly different beast when it comes to caching. While some API servers will inform you that the data has not changed since the last time you made the exact same query, most do not. It's really up to you to know the ins and outs of the service you are querying to have as robust a caching strategy as possible.
For example, at present, my $WORK GreyNoise dataset updates (just about) hourly. This means you are quite safe caching the results of a lookup for, say, a given IP address for at least an hour. Older API data results can be cached for a tad longer, but we do regularly perform “backfills” when new tags are created or existing tags are updated.
NVD's CVE API, on the other hand, is far more static. This means you are very likely safe caching each CVE entry for a much longer period of time. In fact, you probably should, since they also force a 0.6-second delay between API calls and will likely ban you if you regularly try to hammer the service.
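Here's a hedged sketch of what that kind of "be polite to NVD" client-side strategy could look like: a tiny wrapper that caches results for a TTL and enforces a minimum gap between real API calls. The TTL, 0.6-second interval, and the shape of the fetched data are illustrative assumptions, and the fetch/clock/sleep hooks are injectable so the example runs (and is testable) offline.

```python
# Sketch: cache-plus-throttle wrapper for a rate-limited API like NVD's.
# Values (ttl, min_interval) are illustrative, not official guidance.
import time

class PoliteCache:
    def __init__(self, fetch, ttl=86400.0, min_interval=0.6,
                 clock=time.monotonic, sleep=time.sleep):
        self.fetch, self.ttl, self.min_interval = fetch, ttl, min_interval
        self.clock, self.sleep = clock, sleep
        self.store = {}          # key -> (fetched_at, value)
        self.last_call = None    # timestamp of the last *real* fetch

    def get(self, key):
        hit = self.store.get(key)
        now = self.clock()
        if hit and now - hit[0] < self.ttl:
            return hit[1]                      # fresh enough: no network hit
        if self.last_call is not None:
            wait = self.min_interval - (now - self.last_call)
            if wait > 0:
                self.sleep(wait)               # honor the inter-call delay
        value = self.fetch(key)
        self.last_call = self.clock()
        self.store[key] = (self.last_call, value)
        return value
```

You'd wire it up with something like `PoliteCache(lambda cve_id: your_nvd_lookup(cve_id))`, where `your_nvd_lookup` is whatever HTTP call you already make.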
You likely have more than a few APIs you hit on-the-regular, and there are many ways to slide a caching strategy into your routine.
Generic API/Resource Caching Resources (That Aren't Part Of Your Mission-proper)
While we're not covering these today, you might want to try using one of the more traditional/legacy caching solutions to help keep things simple.
Nginx, Apache httpd, and Caddy all have robust built-in and third-party generic caching solutions.
The venerable Squid proxy has been a dedicated generic caching solution for ages, and modern reverse proxies like Traefik offer caching via middleware/plugins as well.
IETF RFCs Relevant To HTTP/Caching
Grokking how caching fits into the way HTTP was designed and evolved can help wrap a mental model around what we're trying to accomplish. Here are three to get you started:
Your Mission: Deploy Kong!
Kong is an open-source API gateway and microservices management layer that can be used to manage, secure, and scale APIs. Some features of Kong include:
Load balancing
Authentication and authorization
Rate limiting
Logging and monitoring
Caching
Plugins for extending functionality
It can be used with any programming language and any backend service, making it a super flexible solution for managing all the APIs you need to access. It is available as a free, open-source community edition or as a paid enterprise edition with additional features and support.
Your mission is to: Get Kong and configure Kong Proxy Caching for at least one API you regularly use.
The Docker setup drops in like a hot knife through butter, and I know each and every one of the Drop's readers can follow their most excellent documentation.
Since the NVD API (mentioned above) is free, and has a time-based restriction between GET requests, that might be a fun one to get working behind Kong before handling more complex situations that, say, require authentication.
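To give you a head start, here's a sketch of the Admin API payloads for putting the NVD CVE API behind Kong's proxy-cache plugin. The service/route names, the NVD path, and the one-hour TTL are my assumptions, so check the plugin docs for the full config surface; you'd POST these as JSON to your local Admin API (commonly http://localhost:8001).

```python
# Sketch: Kong Admin API payloads for caching NVD CVE lookups.
# POST service -> /services, route -> /services/nvd/routes,
# plugin -> /services/nvd/plugins (endpoints per Kong's Admin API).
import json

def kong_payloads(ttl_seconds=3600):
    # assumed names/paths; adjust to taste
    service = {"name": "nvd", "url": "https://services.nvd.nist.gov"}
    route = {"name": "nvd-cves", "paths": ["/rest/json/cves/2.0"]}
    plugin = {
        "name": "proxy-cache",
        "config": {
            "strategy": "memory",            # built-in in-memory store
            "cache_ttl": ttl_seconds,        # seconds to keep a response
            "request_method": ["GET"],       # only cache idempotent GETs
            "response_code": [200],          # only cache successful responses
            "content_type": ["application/json"],
        },
    }
    return service, route, plugin
```

With that in place, repeat GETs through the Kong proxy within the TTL should come back with an `X-Cache-Status: Hit` header instead of costing you a (rate-limited) trip to NVD.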
FIN
We're back on the road tomorrow (Saturday) so look for a proper-length Bonus Drop on Sunday and a return to Drop normalcy on Monday! ☮
A sign that you know the youth of your readers is that you didn't feel it useful to mention, in passing, that Andreessen rode the 304 bus to the first big dotcom IPO by partially alleviating the world wide wait.