Drop #255 (2023-05-05): Sync[o] de MinIO 🎉
Litestream; MinIO; Pure [ba]sh bible
Well, it turns out your Drop herder did, indeed, overreach this week. Who knew two hours of driving, IRL conference presenting + hobnobbing, attending a National Honor Society induction for offspring #4, and cooking dinner, all on the same day, would be so tiring for someone who is still enduring the long aftereffects of a spike protein invasion?
As a result, the planned Weekend Project Edition must be postponed until next week (they take a bit more planning than the normal Drops).
In its stead, you get a lame holiday pun as a Drop tagline and three diverse and fun resources to check out over the weekend.
Litestream
SQLite is the most used database on the planet. It's baked into a bonkers number of integral tasks in applications and operating systems, and it has a very low barrier to entry when it comes to learning and wielding it.
However, at the end of the day, it's just a file that can be deleted or corrupted at any point in time, and it lives on just one system since there's no “server”. This is a disaster waiting to happen.
Enter Litestream (GH), a “standalone disaster recovery tool for SQLite. It runs as a background process and safely replicates changes incrementally to another file [or storage object]. Litestream only communicates with SQLite through the SQLite API, so it will not corrupt your database.”
With it, you can continuously stream SQLite changes to AWS S3, Azure Blob Storage, Google Cloud Storage, SFTP, NFS, and more. When failure strikes (b/c it always will), you can quickly restore your database to the point of failure.
It's super easy to use, and you may want to consider using it with the next item in today's Drop.
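Litestream relies on SQLite's write-ahead log (WAL) journaling mode and, as quoted above, talks to the database only through the SQLite API. The sketch below is not Litestream: it's a one-shot, point-in-time copy made with Python's built-in sqlite3 module (Connection.backup), i.e., the manual chore Litestream automates continuously. The file names and the snapshot helper are hypothetical.

```python
import os
import sqlite3
import tempfile


def snapshot(src_path: str, dest_path: str) -> None:
    """Copy a (possibly live) SQLite database via SQLite's own backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # backup() reads pages through SQLite, so the copy is consistent;
        # a raw file copy can miss the -wal file or catch a half-written page.
        src.backup(dest)
    finally:
        dest.close()
        src.close()


workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "app.db")  # hypothetical file names
backup_path = os.path.join(workdir, "app-backup.db")

conn = sqlite3.connect(db_path)
# Switch to WAL mode; the PRAGMA returns the resulting journal mode.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO notes (body) VALUES (?)", [("first",), ("second",)])
conn.commit()

snapshot(db_path, backup_path)

# Read the copy back to confirm the rows survived.
restored = sqlite3.connect(backup_path)
rows = restored.execute("SELECT body FROM notes ORDER BY id").fetchall()
restored.close()
conn.close()
print(mode, rows)
```

Running a snapshot like this on a cron schedule still loses everything written between runs, which is exactly the gap Litestream's continuous streaming closes.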
MinIO
When someone says they're “using S3”, you may assume they mean Amazon's nigh-ubiquitous service. But the S3 protocol has kind of become the de facto object storage API, and many other commercial services and open-source projects ride on S3's coattails.
One of these (that is both commercial and open-source) is MinIO (GH). It's an open-source/freemium, distributed object storage server designed for local/cloud applications and hybrid multi-cloud deployments (which is kind of super cool when you think about it, since you aren't beholden to a single provider). As noted, it implements the Amazon S3-compatible API, which allows for seamless integration with existing applications and tools. Its three core focus areas are simplicity, performance, and scalability (I can attest to all three, too).
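In practice, “S3-compatible” means the request shapes stay the same and mostly just the endpoint changes. As a rough sketch, here's how a path-style object URL (endpoint/bucket/key) is built for AWS and for a MinIO server assumed to be listening locally on its default API port, 9000. The bucket and key are hypothetical. AWS generally prefers virtual-hosted-style URLs (bucket in the hostname), while MinIO serves path-style requests by default. Real requests also need SigV4 signatures, which SDKs handle for you (boto3, for example, accepts an endpoint_url argument when creating a client).

```python
from urllib.parse import quote


def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style S3 object URL: <endpoint>/<bucket>/<key>."""
    # Percent-encode the key, but keep "/" so prefix "folders" stay readable.
    return f"{endpoint.rstrip('/')}/{bucket}/{quote(key, safe='/')}"


# Hypothetical bucket and key; only the endpoint differs between services.
key = "lab/scans/2023-05-05 results.parquet"
aws = object_url("https://s3.us-east-1.amazonaws.com", "drop-data", key)
minio = object_url("http://localhost:9000", "drop-data", key)
print(aws)
print(minio)
```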
If you're not familiar with “object storage”, that fancy term is nothing more than an approach which stores data as objects instead of traditional file hierarchies or block storage. Each object consists of the data, its associated metadata, and a unique identifier. This architecture allows for easy scaling and management of massive amounts of unstructured data, such as images, videos, and logs. These days, it's often much faster to use object storage environments (even locally) than it is to use raw filesystems, especially for lots of small files.
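To make the data + metadata + unique identifier model concrete, here's a hypothetical, in-memory ToyBucket. It is not how MinIO or S3 are implemented; it only illustrates the flat key namespace, whole-object writes, and prefix listing that stands in for folders.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the bytes, free-form metadata, and a unique identifier."""
    data: bytes
    metadata: dict[str, str]
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class ToyBucket:
    """Hypothetical in-memory bucket illustrating the object-storage model."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}  # flat: key -> object

    def put(self, key: str, data: bytes, **metadata: str) -> str:
        # Objects are written whole; a second put replaces the old object.
        obj = StoredObject(data, dict(metadata))
        self._objects[key] = obj
        return obj.object_id

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        # No real directories: "folders" are just keys sharing a prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ToyBucket()
bucket.put("logs/2023/05/05.log", b"GET / 200\n", content_type="text/plain")
bucket.put("logs/2023/05/06.log", b"GET / 404\n", content_type="text/plain")
first_id = bucket.put("images/taco.png", b"\x89PNG...", content_type="image/png")
print(bucket.list_keys("logs/"))
print(bucket.get("images/taco.png").metadata)
```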
While you can use MinIO with Litestream, you can also wire it up to Presto/Drill as part of your (sigh) data lake. Or you can have the Arrow ecosystem pull Parquet files from it instead of from raw filesystems. Or just get some practice in before helping Jeff Bezos buy another yacht.
MinIO is 100% open source, installation takes just a few minutes, and the mc CLI is super handy + very intuitive.
I've migrated most of the data storage on my home data science/security/development lab server to MinIO and will never look back.
Pure [ba]sh bible
The Pure sh bible and Pure bash bible are two books with a common goal: documenting commonly known and lesser-known methods of doing various tasks using only built-in POSIX sh and bash features, respectively. They provide scads of snippets that can help remove unneeded dependencies from scripts and, in most cases, make them faster.
Each covers a lot of territory, with core topics that include:
Strings: Operations for manipulating and analyzing strings, such as stripping patterns, trimming white-space, checking for substrings, and splitting strings.
Files: Operations for handling files, such as parsing key-value files, counting lines, creating empty files, and counting files or directories.
File Paths: Operations for working with file paths, such as extracting directory names and base-names.
Loops: Techniques for iterating over ranges of numbers, file contents, and files or directories.
Variables: Operations for naming and using variables based on other variables.
Escape Sequences: Techniques for manipulating text output, such as colors, attributes, cursor movement, and erasing text.
Parameter Expansion: Operations for manipulating variables and their values, such as prefix and suffix deletion, length, and default values.
Conditional Expressions: Techniques for testing file properties, variable conditions, and variable comparisons.
Arithmetic Operators: A range of assignment, arithmetic, bitwise, logical, and miscellaneous operators.
Arithmetic: Ternary tests, checking if a number is a float or integer.
Traps: Operations for handling script termination and ignoring terminal interrupts.
Obsolete Syntax: Deprecated command substitution techniques.
Internal and Environment Variables: Operations for accessing and manipulating built-in shell variables, such as opening the user's preferred text editor, getting the current working directory, and reading shell options.
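Both books lean heavily on parameter expansion. As an illustration (in Python, to spell out the matching rules rather than to replace the shell), here's a sketch of what ${var#pat} / ${var##pat} (shortest/longest prefix removal) and ${var%pat} / ${var%%pat} (shortest/longest suffix removal) do. It uses fnmatch.fnmatchcase, whose glob syntax is close to, but not identical to, the shell's (no extglob, for instance); the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def strip_prefix(value: str, pattern: str, longest: bool = False) -> str:
    """Mimic ${value#pattern} (shortest) or ${value##pattern} (longest)."""
    # Try prefix lengths short-to-long (or long-to-short); first match wins.
    lengths = range(len(value), -1, -1) if longest else range(len(value) + 1)
    for n in lengths:
        if fnmatchcase(value[:n], pattern):
            return value[n:]
    return value  # no match: the value comes back unchanged, as in the shell


def strip_suffix(value: str, pattern: str, longest: bool = False) -> str:
    """Mimic ${value%pattern} (shortest) or ${value%%pattern} (longest)."""
    # Try suffix start positions so the suffix grows (or shrinks) in length.
    starts = range(len(value) + 1) if longest else range(len(value), -1, -1)
    for i in starts:
        if fnmatchcase(value[i:], pattern):
            return value[:i]
    return value


path = "/home/drop/data/file.tar.gz"
print(strip_prefix(path, "*/", longest=True))           # ${path##*/} -> file.tar.gz
print(strip_suffix(path, "/*"))                          # ${path%/*}  -> /home/drop/data
print(strip_suffix("file.tar.gz", ".*"))                 # ${f%.*}     -> file.tar
print(strip_suffix("file.tar.gz", ".*", longest=True))   # ${f%%.*}    -> file
```

The first two calls are rough pure-shell stand-ins for basename and dirname (they differ on edge cases like trailing slashes), and the last two are the classic strip-one-extension versus strip-all-extensions idiom.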
They're absolutely worth including in your personal snippets library.
What's that? You don't have a personal snippets library? No worries! We'll be covering that in the Bonus Drop this weekend!
Happy 🌮 🍹 🫔 🌯 consumption and cultural appropriation day! ☮