hrbrmstr's Daily Drop

Drop #265 (2023-05-19): Weekend Project “Quick Hits” Edition

Cache Me If You Can

boB Rudis
May 19, 2023

Despite your Drop herder still being in “quick hits” mode, we can't go into a weekend without an [optional] project to tackle. This week, it's all about reducing costs, saving time, and keeping data you need as close as possible via caching.

While we're going to be focused on HTTP API caching, the focus is on generic HTTP API caching rather than, say, setting up a caching layer for a specific API you might be running.

But, we're getting ahead of ourselves. Let's start by discussing what this “caching” is.

A [Quick?] Intro To Caching

[Image: person holding yellow and green plastic bottle. Photo by Martin Lostak on Unsplash]

HTTP caching is a technique used to store frequently accessed resources, such as web pages, API results, or images, in some type of data store. When a client requests a resource, the cache can serve the resource directly instead of requesting it from the server.

Implementing a caching layer can provide several benefits, including:

  • faster response times: by serving cached resources instead of requesting them from the server, response times can be significantly reduced

  • reduced network bandwidth consumption: caching can reduce the amount of data that needs to be transferred over the network, which can be especially beneficial for mobile devices or users with limited bandwidth

  • improved scalability: caching can reduce the load on the server, since cached resources are served directly instead of being regenerated for every request

  • frugality: most APIs are not free, so caching can help significantly reduce costs and help keep your API credits topped off, especially if the data/pages/images/resultsets being fetched do not change often.

There are several popular idioms for performing generic HTTP caching:

  • expiration-based caching: in this approach, the server includes an expiration time in the response headers, indicating how long the resource can be cached. The client can then use this information to determine whether to serve the cached resource or request a fresh copy from the server

  • validation-based caching: in this approach, the server includes a validator, such as an ETag (entity tag), in the response headers. An ETag identifies a specific version of the resource. The client can then check whether its cached copy is still valid by sending a conditional request (with an If-None-Match header) to the server. If the resource has not changed, the server can respond with a 304 Not Modified status code and the client can serve its cached copy of the resource

  • Cache-Control header: the Cache-Control header allows servers to specify how clients should cache responses. For example, it can be used to specify whether a response can be cached at all, whether it should be revalidated before being served from cache, and how long it can be cached
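The three idioms above can be sketched in a few lines of Python. This is a minimal illustration, not a full RFC 9111 implementation: the `CachedResponse` class and the helper names are hypothetical, only `max-age`, `no-store`, and `no-cache` are looked at, and `no-cache` is (simplistically) treated the same as "never fresh".

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedResponse:
    """Hypothetical container for one stored response."""
    body: bytes
    stored_at: float          # time.time() when the response was stored
    max_age: Optional[int]    # seconds, taken from Cache-Control: max-age
    etag: Optional[str]       # validator from the ETag header, if any


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return max-age in seconds, 0 for no-store/no-cache, None if unspecified."""
    directives = [d.strip().lower() for d in cache_control.split(",") if d.strip()]
    if "no-store" in directives or "no-cache" in directives:
        return 0  # simplification: both are treated as "never fresh"
    for directive in directives:
        if directive.startswith("max-age="):
            try:
                return max(0, int(directive.split("=", 1)[1]))
            except ValueError:
                return None
    return None


def is_fresh(entry: CachedResponse, now: Optional[float] = None) -> bool:
    """Expiration-based check: is the entry younger than its max-age?"""
    if entry.max_age is None:
        return False
    now = time.time() if now is None else now
    return (now - entry.stored_at) < entry.max_age


def revalidation_headers(entry: CachedResponse) -> Dict[str, str]:
    """Validation-based check: headers for a conditional request."""
    return {"If-None-Match": entry.etag} if entry.etag else {}
```

If `is_fresh()` says no, the client would send the request with `revalidation_headers()` attached and, on a 304 Not Modified, keep serving the stored body.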

HTTP API requests are a slightly different beast when it comes to caching. While some API servers will inform you that the data has not changed since the last time you made the exact same query, most do not. It's really up to you to know the ins and outs of the service you are querying to have the most robust caching strategy possible.

For example, at present, my $WORK GreyNoise dataset updates (just about) hourly. This means you are quite safe caching the results of a lookup for, say, a given IP address for at least an hour. Older API data results can be cached for a tad longer, but we do regularly perform “backfills” when new tags are created or existing tags are updated.
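When the API gives you no caching hints, a do-it-yourself time-to-live (TTL) cache is one option. The sketch below assumes an hourly refresh like the one described above; `TTLCache`, `get_or_fetch`, and the `fetch` callable are hypothetical names, and the store is a plain in-memory dict that vanishes when the process exits.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple
from urllib.parse import urlencode

Params = Optional[Dict[str, str]]


class TTLCache:
    """Tiny in-memory cache keyed on URL plus sorted query parameters."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def key(url: str, params: Params = None) -> str:
        # Sort parameters so the same query always maps to the same key.
        query = urlencode(sorted((params or {}).items()))
        return f"{url}?{query}" if query else url

    def get_or_fetch(self, url: str, params: Params,
                     fetch: Callable[[str, Params], Any]) -> Any:
        k = self.key(url, params)
        now = self.clock()
        hit = self._store.get(k)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]              # still within the TTL: serve cached copy
        value = fetch(url, params)     # expired or missing: call the real API
        self._store[k] = (now, value)
        return value
```

With `TTLCache(ttl_seconds=3600)`, repeated lookups for the same IP within the hour would only hit the API once. The `fetch` argument is whatever function you already use to call the service.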

NVD's CVE API, on the other hand, is far more static. This means you are very likely safe caching each CVE entry for a much longer period of time. In fact, you probably should, since they also force a 0.6-second delay between API calls and will likely ban you if you regularly try to hammer the service.
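For the calls that do miss the cache, a small throttle can keep you on the polite side of a rate limit. This is a sketch under assumptions: `Throttle` is a hypothetical helper, and the 0.6-second interval in the usage note is simply the figure quoted above, so check the service's current guidance before relying on it.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None  # time of the previous call

    def wait(self) -> None:
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)   # pause until the interval has elapsed
                now = self.clock()
        self._last = now
```

Create one `Throttle(0.6)` per API and call `wait()` right before each uncached request. The `clock` and `sleep` parameters exist only so the behavior can be tested without actually sleeping.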

You likely have more than a few APIs you hit on-the-regular, and there are many ways to slide a caching strategy into your routine.

Generic API/Resource Caching Resources (That Aren't Part Of Your Mission-proper)

[Image: assorted handheld tools in tool rack. Photo by Barn Images on Unsplash]

While we're not covering these today, you might want to try using one of the more traditional/legacy caching solutions to help keep things simple.

Nginx, Apache httpd, and Caddy all have robust built-in and third-party generic caching solutions.

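The venerable Squid proxy has been a dedicated generic caching solution for ages, and Traefik is a modern, robust reverse proxy that can take on a caching role as well (check which edition or plugin provides HTTP caching for your setup).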

IETF RFCs Relevant To HTTP/Caching

[Image: close view of The Compact Encyclopedia collection. Photo by James on Unsplash]

Grokking how caching fits into the way HTTP was designed and evolved can help you build a mental model of what we're trying to accomplish. Here are three to get you started:

  • RFC 9110 - HTTP Semantics

  • RFC 9111 - HTTP Caching

  • RFC 9205 - Building Protocols with HTTP

Your Mission: Deploy Kong!

Kong is an open-source API gateway and microservices management layer that can be used to manage, secure, and scale APIs. Some features of Kong include:

  • Load balancing

  • Authentication and authorization

  • Rate limiting

  • Logging and monitoring

  • Caching

  • Plugins for extending functionality

It can be used with any programming language and any backend service, making it a super flexible solution for managing all the APIs you need to access. It is available as a free, open-source community edition or as a paid enterprise edition with additional features and support.

Your mission is to get Kong and configure Kong Proxy Caching for at least one API you regularly use.

The Docker setup drops in like a hot knife through butter, and I know each and every one of the Drop's readers can follow their most excellent documentation.

Since the NVD API (mentioned above) is free, and has a time-based restriction between GET requests, that might be a fun one to get working behind Kong before handling more complex situations that, say, require authentication.
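To give a feel for what the configuration involves, here is a hedged sketch that builds the three Kong Admin API calls (service, route, `proxy-cache` plugin) for the NVD case. The assumptions are many: the Admin API is listening on Kong's default `http://localhost:8001`, the service name `nvd`, the `/nvd` route path, and the one-day TTL are all made up for illustration, and the plugin's field names should be checked against the documentation for the Kong version you install.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

ADMIN_URL = "http://localhost:8001"  # assumption: Kong's default Admin API port

AdminCall = Tuple[str, Dict[str, Any]]


def kong_setup_requests(service_name: str, upstream_url: str,
                        route_path: str, cache_ttl: int) -> List[AdminCall]:
    """Build (path, JSON payload) pairs to POST to the Admin API, in order."""
    return [
        # 1. A service pointing at the upstream API.
        ("/services", {"name": service_name, "url": upstream_url}),
        # 2. A route exposing that service on the Kong proxy.
        (f"/services/{service_name}/routes",
         {"name": f"{service_name}-route", "paths": [route_path]}),
        # 3. The proxy-cache plugin, scoped to the service.
        (f"/services/{service_name}/plugins",
         {"name": "proxy-cache",
          "config": {
              "strategy": "memory",          # in-memory store
              "cache_ttl": cache_ttl,        # seconds to keep an entry
              "request_method": ["GET"],
              "response_code": [200],
              # Must match the upstream Content-Type header; adjust as needed.
              "content_type": ["application/json"],
          }}),
    ]


def apply_requests(calls: List[AdminCall], admin_url: str = ADMIN_URL) -> None:
    """POST each payload to a running Kong instance (performs network I/O)."""
    for path, payload in calls:
        req = urllib.request.Request(
            admin_url + path,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            print(path, resp.status)
```

Something like `apply_requests(kong_setup_requests("nvd", "https://services.nvd.nist.gov", "/nvd", 86400))` would then wire it up, after which requests sent to the proxy port under `/nvd/...` go through the cache. Kong's `X-Cache-Status` response header should tell you whether a given request was a hit or a miss.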

FIN

We're back on the road tomorrow (Saturday) so look for a proper-length Bonus Drop on Sunday and a return to Drop normalcy on Monday! ☮

1 Comment
Richard Careaga
May 19 (liked by boB Rudis)

A sign that you know the youth of your readers is that you didn’t feel it useful to mention, in passing, that Andreessen rode the 304 bus to the first big dotcom IPO by partially alleviating the world wide wait.
