Drop #296 (2023-07-17): Load Reduction

The Biggest (Docker Image Size) Loser; Spying On Social URLs; Gita

Happy Monday, Drop readers! July is half over, O_O!

This year, summer speeding by (in the Northern Hemisphere) is kind of a good thing, since I’m somewhat tired of melting each time I leave the abode. Hopefully, today’s resources will help keep y’all company as you huddle near the A/C.

Today, we have three resources that (in order) reduce:

  • size;

  • volume; and,

  • complexity

Let’s dig in!

The Biggest (Docker Image Size) Loser


As readers are no doubt keenly aware, I’m more than willing to blather on (and, on; and, on) about any given resource that makes it to the Drop, but — sometimes — brevity has its place. Given that this section is a link to how to make Docker images smaller, I’ll do my best to be brief.

Full disclosure: the post itself is about “Go” stuff, but the advice holds across other domains; I just happen to be keenly interested in reducing Golang Docker image sizes at this particular moment.

Fundamentally, producing smaller Docker images means lower storage costs and faster deployment times. The article I keep talking around discusses several techniques to reduce the size of a Docker image for a Go application.

TL;DR:

  • Switching the base image from Debian to Alpine Linux reduced the image size by half.

  • Separating the build and distribution images and only copying the compiled binary further reduced the size.

  • Omitting debugging symbols and compressing the binary using UPX reduced the final image size to just 1% of the original.

With these optimizations, the author(s) achieved a 99% reduction in bandwidth and storage costs.
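Stitched together, those techniques look roughly like the multi-stage Dockerfile below. This is my own sketch of the approach, not the article’s exact file; the image tags, the `/src` workdir, and the output path `/app` are illustrative:

```dockerfile
# Build stage: full Go toolchain on Alpine
FROM golang:1.20-alpine AS build
WORKDIR /src
COPY . .
# -s -w omits the symbol table and DWARF debug info from the binary
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /app .
# Compress the binary with UPX (trades a bit of startup time for size)
RUN apk add --no-cache upx && upx --best /app

# Distribution stage: ship only the compiled binary, not the toolchain
FROM alpine:3.18
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```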

We’ve covered this broader topic before, but I found this particular post to be very targeted and digestible. Hopefully, it will be of use to you as well!

Spying On Social URLs


I do my best to feature other resources (i.e. ones besides my own) on the Drop. And, granted, most readers likely have me in one or more of their socmed timelines and already know about this. But, it’s been working well enough, and is self-contained enough, and is potentially useful enough that I figured it warranted a place in today’s edition.

This past Saturday, to distract from present-moment elder care issues, I built a small JavaScript CLI to yank URLs from the Bluesky firehose.

The AT Protocol (which I covered in a recent Bonus Drop that was sent to all readers) is verbose, but it’s also fairly well-conceived. Their “firehose” is just a WebSocket subscription. At the moment, you do not need to authenticate to gain access to the firehose. WebSockets does support auth, so expect this to change once Bluesky starts bleeding cash.

In theory, Bluesky wants you to subscribe to the firehose to create algorithmic feeds that you and others can use. But, you can do anything with this data. And, you should!

So, I set up a small script that logs all URLs posted to Bluesky to a SQLite database.
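For flavor, here’s a heavily simplified sketch of the core of such a script (mine, not a copy of the actual CLI). The WebSocket plumbing is elided — the real firehose frames are DAG-CBOR and need proper decoding via the `@atproto` packages — and a plain array stands in for the SQLite table here:

```javascript
// Sketch: the URL-harvesting core. A real script would feed post text
// into logUrls() from a decoded firehose subscription and swap the
// array out for a SQLite insert (e.g. via better-sqlite3).

// Pull anything that looks like an http(s) URL out of a post's text.
function extractUrls(text) {
  const matches = text.match(/https?:\/\/[^\s<>"')\]]+/g);
  return matches ?? [];
}

// Record each URL with a timestamp; `store` stands in for the DB table.
function logUrls(store, text) {
  for (const url of extractUrls(text)) {
    store.push({ url, seen_at: new Date().toISOString() });
  }
  return store;
}
```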

No “Kubernetes”. No “Postgres”. No “message queue”. Just something you can run from your laptop/desktop with almost no fuss.

It should be fairly straightforward for anyone with even a tiny amount of JavaScript knowledge to riff from.

I have this ObservableHQ notebook doing some presentation-layer bits on the SQLite db.

Now, the database is starting to get to the “probably should not be loading a multi-MB file into your browser” size. So, I’ll have to come up with a different presentation layer, but you can use this starter to begin building RSS feeds or analyzing anything else you want to capture.

Caveat: Bluesky has had a really rough time of it the past couple of weeks. Each “rough” event has been a self-inflicted wound. Not to say “I told you so”, or anything, but I still stand by my “I don’t really trust their intentions” vibe in a previous rant about them on the Drop.

However, I think the AT Protocol is likely here to stay, at least in some form. And, it is built on already-established tech that we all should have at least a passing familiarity with. So, exercising some muscles on a service that already has some data shouldn’t be an errant time sink.

If you do or have built anything that speaks Bluesky/ATProto, def drop a note, so others can learn!

Gita


Gita is an open-source command-line tool, created by Dong Zhou, that simplifies the orchestration of multiple Git repositories. As someone who has — between various social coding sites — close to a thousand Git repos, I’m constantly searching for something that will help me tame this Git sprawl.

From my trials so far, Gita has let me view the status of all my repositories at once, including branch, modification, and commit messages. This is helpful, as I do have quite a few “in flight” projects and ideas, and the fog of long covid has diminished some instant recall I used to have at the ready.

It also lets me execute Git commands or shell commands on multiple repositories from any working directory. This has already saved me a bit of time and effort when performing tasks like fetching updates or making updates to projects that have some commonalities.

You can go beyond just plain “shell commands” execution, though. Gita allows the creation of custom commands, where — if you have some tasks you regularly perform manually or with some script — you can tailor the tool to fit precisely how you work.
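As a taste of that, custom commands live in a small YAML file under your config directory. The location and schema below are from my reading of the project docs, so double-check them against the repo before relying on this; the `comaster` name is just an example:

```yaml
# ~/.config/gita/cmds.yml — user-defined gita sub-commands
comaster:
  cmd: checkout master
  help: checkout the master branch in the selected repos
```

With something like that in place, `gita comaster` would run `git checkout master` across the repos you point it at.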

To get started with Gita, you can install it using pip:

python3 -m pip install gita

Once installed, it’s easy (yes, a deliberate use of that word) to add a repo to Gita’s management domain:

gita add /path/to/repo

To view the status of all your repositories, use the gita ll command:

gita ll

This will display the status of all your repositories, including branch, modification, and commit messages.

To execute Git commands on multiple repositories, use the gita command followed by the Git command you want to execute:

gita fetch

This will fetch updates from all your repositories. This has turned out to be one of the most-used features for me, as I do work across at least three different systems.

It supports shell auto-complete and makes deliberate use of colors and symbols.

The branch color (which is 100% customizable) distinguishes 5 situations between local and remote branches:

  • white: local has no remote

  • green: local is the same as remote

  • red: local has diverged from remote

  • purple: local is ahead of remote (good for push)

  • yellow: local is behind remote (good for merge)

The status symbols denote:

  • +: staged changes

  • *: unstaged changes

  • ?: untracked files/folders

Hit up the repo to learn why purple and yellow were chosen (the author def deserves some 👀 on their work). You’ll also want to keep it handy as a full command reference, as it has an extensive command set.

FIN

Last week, I had the privilege of meeting some socmed pals IRL for the first time, and also the secondary privilege of presenting at the 2023 New York R Conference. My slides on WebR are available, now, and the YouTube videos of all the presentations should be up soon. It was a great conference, and I am so thankful for the opportunity to hang with such amazing humans.

Also, many thanks to those who have bumped up to supporting readers last week! ☮
