Discover more from hrbrmstr's Daily Drop
Drop #214 (2023-03-07): Underused LOLbins And Oxidized Multicall Bins
comm; join; Re-imagining Coreutils
The phrase “Living off the land” was coined by Christopher Campbell and Matt Graeber at DerbyCon 3, one of far, far too many cybersecurity conferences, over nine years ago. The phrase refers to an attacker technique which involves primarily using only tools and information available on a given system/device to achieve their nefarious goals.
It can be risky for attackers to download custom toolkits. Such activities generate extra/anomalous internet traffic/system activity, which may be detected by various security tools.
The term “LOLBins” specifically refers to “living off the land binaries” and was coined by Philip Goh in a Twitter discussion back in 2018. These are local executables that come along for the ride with operating system or application installs.
While the technique and term are primarily used in contexts involving malicious activity, there are times when data scientists, researchers, or the average user might want to use OS utilities and features that are part of the system default installation. Why create a dependency on Python or R when you can accomplish some given task that performs equally well with tools that are guaranteed to be available and portable?
Today, we cover two often overlooked LOLBin utilities that are “batteries included” on any modern, useful operating system (so, that excludes all of Microsoft's pane-ful OSes).
We then take a peek at a modern re-imagining of coreutils — basic file, shell, and text manipulation utilities originally developed as part of the GNU operating system. These (mostly) come along for the ride on macOS and Linux. The name implies that these are “core utilities” which are expected to exist on every operating system.
Often, it is necessary to compare files or directories, and identify the differences between them. One might even say this is a (…wait for it…) very comm-on task.
Now, you absolutely know about diff, the most overused member of the Diffverse; and, if you work collaboratively with a decent-sized team, there's a good chance you've dug deep into git and even used diff3. But, when's the last time you used comm?
The comm (short for “common”) utility compares two sorted files line by line and displays the differences or similarities between them.
comm displays three columns of output:
lines that are only in the first file
lines that are only in the second file
lines that are common to both files
Before we look at an example, one YUGEly important thing to remember when using comm is that both files need to be sorted before using it.
Say we've got two files, roci-crew-manifest-1.txt:
Holden
Burton
Nagata
Kamal
Draper
Johnson
Miller
and roci-crew-manifest-2.txt:
Holden
Burton
Nagata
Kamal
After ensuring they're both sorted:
$ sort -o roci-crew-manifest-1.txt roci-crew-manifest-1.txt
$ sort -o roci-crew-manifest-2.txt roci-crew-manifest-2.txt
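If you are not sure whether a file is already sorted, you can ask sort itself before handing the file to comm. This is only a rough sketch: the scratch directory and the crew.txt name are made up for illustration, and LC_ALL=C is an assumption used to pin the collation order so every tool agrees on what "sorted" means.

```shell
# Hypothetical scratch file; the name is for illustration only.
tmp=$(mktemp -d)
printf '%s\n' Holden Burton Nagata Kamal > "$tmp/crew.txt"

# sort -c only checks: it exits non-zero if the file is not sorted.
if LC_ALL=C sort -c "$tmp/crew.txt" 2>/dev/null; then
  before=sorted
else
  before=unsorted
fi

# Sort in place (-o may name the input file), then check again.
LC_ALL=C sort -o "$tmp/crew.txt" "$tmp/crew.txt"
if LC_ALL=C sort -c "$tmp/crew.txt" 2>/dev/null; then
  after=sorted
else
  after=unsorted
fi

echo "before: $before, after: $after"
rm -rf "$tmp"
```

Run the check and comm under the same locale; a file sorted under one collation order can look unsorted under another.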
let's see what comm can tell us with vanilla invocations, which will produce three columns of output:
lines unique to the first file
lines unique to the second file
lines that appear in both files
$ comm roci-crew-manifest-1.txt roci-crew-manifest-2.txt
		Burton
Draper
		Holden
Johnson
		Kamal
Miller
		Nagata

$ comm roci-crew-manifest-2.txt roci-crew-manifest-1.txt
		Burton
	Draper
		Holden
	Johnson
		Kamal
	Miller
		Nagata
While that lets us eyeball the common/unique entries, I wouldn't want to have to parse that if all I needed was just unique or common line info. Thankfully, the makers of comm didn't either, so you can use various options to get what you want:
-1: suppress printing of column 1 (lines unique to the first file).
-2: suppress printing of column 2 (lines unique to the second file).
-3: suppress printing of column 3 (lines that appear in both files).
-i: ignore case differences in the input files (available in the BSD/macOS comm; GNU comm does not have it).
My most comm-on usage pattern is to find the unique entries:
$ comm -23 roci-crew-manifest-1.txt roci-crew-manifest-2.txt
Draper
Johnson
Miller
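The same column-suppression trick covers the other two questions you might ask of a pair of files. Here is a small sketch, assuming the two manifests from above are recreated in a scratch directory (the directory is made up for illustration) and that everything runs under LC_ALL=C so sort and comm agree on ordering:

```shell
tmp=$(mktemp -d)
m1="$tmp/roci-crew-manifest-1.txt"
m2="$tmp/roci-crew-manifest-2.txt"

# Recreate the two manifests, already sorted.
printf '%s\n' Holden Burton Nagata Kamal Draper Johnson Miller | LC_ALL=C sort > "$m1"
printf '%s\n' Holden Burton Nagata Kamal | LC_ALL=C sort > "$m2"

# -12 hides columns 1 and 2, leaving only the lines common to both files.
common=$(LC_ALL=C comm -12 "$m1" "$m2")

# -13 hides columns 1 and 3, leaving lines found only in the second file.
only_second=$(LC_ALL=C comm -13 "$m1" "$m2")

printf 'common to both:\n%s\n' "$common"
printf 'only in second file: [%s]\n' "$only_second"
rm -rf "$tmp"
```

With these manifests the common lines are Burton, Holden, Kamal, and Nagata, and nothing is unique to the second file, so the brackets come back empty.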
As you can see, comm is pretty handy to have around.
Databases are great! We even have lightweight and lightning fast ones like duckdb which can help make quick work of everyday data tasks. But, they aren't listed on the Monroney sticker of the standard equipment package of most operating systems. What's more, you need to shove data into databases to perform the operations. Still, databases give us powerful operations, such as the ability to SQL join two tables by one or more fields.
But, we don't necessarily need to use a full-on database to perform a join task thanks to the spot-on-uncreatively named join utility.
join is primarily used to merge two files on a common field or key, in similar fashion to the aforementioned SQL join operation. And, just like the lazy comm, join expects both files to already be sorted on the join field, so take care of that first if you have not already.
Folks usually use join with options, and the options vary by operating system, so we'll just focus on some common ones:
-1 FIELD: join on this FIELD of file 1
-2 FIELD: join on this FIELD of file 2
-e MISSING: specifies the string to use for missing fields in the output.
-i: performs a case-insensitive join.
-t CHAR: specifies the field delimiter character.
Absolutely do a man join on your operating system, since the version that comes with, say, Debian-esque systems has some very useful extra options.
Examples > blatherings.
ships.db:
Rocinante,class1,book1
Canterbury,class3,book1
Razorback,class2,book1
Barbapiccola,class4,book4
Defiant,,

classes.db:
class3,Water Hauler
class1,Corvette
class2,Racing Pinnace
class4,Freighter

books.db:
book1,Leviathan Wakes
book2,Caliban's War
book3,Abaddon's Gate
book4,Cibola Burn
book5,Nemesis Games
book6,Babylon's Ashes
book7,Persepolis Rising
book8,Tiamat's Wrath
book9,Leviathan Falls
Add the full name for the ship class, using NA for missing fields:
$ join -t, -a 1 -e NA -1 2 -2 1 ships.db classes.db
class1,Rocinante,book1,Corvette
class3,Canterbury,book1,Water Hauler
class2,Razorback,book1,Racing Pinnace
class4,Barbapiccola,book4,Freighter
NA,Defiant,NA
See the book a ship first appeared in, omitting ones that aren't in the Expanse series:
$ join -t, -1 1 -2 3 books.db ships.db
book1,Leviathan Wakes,Rocinante,class1
book1,Leviathan Wakes,Canterbury,class3
book1,Leviathan Wakes,Razorback,class2
book4,Cibola Burn,Barbapiccola,class4
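POSIX describes join as working on inputs that are sorted on the join field, and output can differ between implementations when they are not. The following is a hedged variation on the ship-class lookup that sorts both files first and uses -o to pick exactly which fields come out. The scratch directory and the .sorted file names are made up for illustration, and LC_ALL=C is an assumption that keeps sort and join on the same collation order.

```shell
tmp=$(mktemp -d)
cat > "$tmp/ships.db" <<'EOF'
Rocinante,class1,book1
Canterbury,class3,book1
Razorback,class2,book1
Barbapiccola,class4,book4
Defiant,,
EOF
cat > "$tmp/classes.db" <<'EOF'
class3,Water Hauler
class1,Corvette
class2,Racing Pinnace
class4,Freighter
EOF

# Sort each file on its own join field (field 2 for ships, field 1 for classes).
LC_ALL=C sort -t, -k2,2 "$tmp/ships.db" > "$tmp/ships.sorted"
LC_ALL=C sort -t, -k1,1 "$tmp/classes.db" > "$tmp/classes.sorted"

# -o 1.1,2.2 prints the ship name and the class name; -a 1 keeps ships
# with no matching class, and -e NA fills in the class name they lack.
joined=$(LC_ALL=C join -t, -a 1 -e NA -o 1.1,2.2 -1 2 -2 1 \
  "$tmp/ships.sorted" "$tmp/classes.sorted")
echo "$joined"
rm -rf "$tmp"
```

Because the output follows the sorted join field, Defiant (with its empty class) comes out first as Defiant,NA, followed by Rocinante, Razorback, Canterbury, and Barbapiccola with their class names.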
It is a bit hard to justify using join when you can do so much more with duckdb, but it is comforting knowing you can still do basic data ops on foreign systems without your fav enhanced tools around.
This is a FOSDEM 2023 Daily Drop featured talk.
Sylvestre Ledru presented “Reimplementing the Coreutils in a modern language (Rust): Doing old things with modern tools” at FOSDEM 2023. The talk title is pretty self-explanatory.
This re-imagining project (under the uutils moniker) can be found over at GitHub.
The goal is to make Coreutils work on as many platforms as possible, to help ensure, for example, that scripts can be easily transferred between platforms. Rust was chosen not only because it is fast and safe, but also because it is excellent for writing cross-platform code.
You can try it out right now, if you have a local Rust installation, since it's mostly feature-complete. Just clone the repo and do:
$ cargo build --release
That will build the most portable common core set of uutils into a multicall binary, named coreutils, on most Rust-supported platforms.
A multicall binary is an executable that performs the action of more than one utility. Multicall binaries take advantage of a number of operating system features, including ISO/IEC 9899 5.1.2.2.1 (program startup; page 24), that make it possible for a user of a system to not even know that the programs they are running are all, in fact, the same file.
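The dispatch-on-name idea is easy to mimic with a toy shell script. To be clear, this is not how uutils does it; it is just a sketch of the general technique, and the multi, hello, and shout names are all made up for illustration.

```shell
tmp=$(mktemp -d)

# A toy "multicall" script: one file, behavior chosen from the invoked name.
cat > "$tmp/multi" <<'EOF'
#!/bin/sh
case "$(basename "$0")" in
  hello) echo "hello, $1" ;;
  shout) echo "$1" | tr '[:lower:]' '[:upper:]' ;;
  *)     echo "unknown applet: $(basename "$0")" >&2; exit 1 ;;
esac
EOF
chmod +x "$tmp/multi"

# Two extra names for the same file: a hard link and a symbolic link.
ln "$tmp/multi" "$tmp/hello"
ln -s multi "$tmp/shout"

out_hello=$("$tmp/hello" world)
out_shout=$("$tmp/shout" world)
echo "$out_hello"
echo "$out_shout"
rm -rf "$tmp"
```

Calling the file as hello prints "hello, world", while calling it as shout prints "WORLD", even though only one script exists on disk.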
Linux and macOS folks have a bunch of multicall executables on their systems right now. One pair is bzcmp and bzdiff, which compare bzip2-compressed files. The former will accept cmp (another file comparison utility) options and the latter will accept diff options. You should be able to do the following to prove they're the same:
$ find -L /usr/bin -samefile /usr/bin/bzdiff
/usr/bin/bzdiff
/usr/bin/bzcmp
The -L and -samefile options are used to discover hard and soft links to a file. When the linked binary executes, it determines the name it was called under, and then picks the operations to use based on it.
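You can watch what those two find options contribute with a throwaway directory. This is a sketch that assumes a find with -samefile support (GNU and BSD find both have it; it is not in POSIX), and the file names are hypothetical.

```shell
tmp=$(mktemp -d)
echo data > "$tmp/original"
ln "$tmp/original" "$tmp/hardlink"    # same inode as original
ln -s original "$tmp/symlink"         # separate inode that points at original
echo other > "$tmp/unrelated"

# Without -L, only names that share the inode (hard links) match.
hard_only=$(find "$tmp" -samefile "$tmp/original" \
  | sed 's|.*/||' | LC_ALL=C sort | paste -s -d ' ' -)

# With -L, symbolic links are followed, so the symlink matches as well.
with_links=$(find -L "$tmp" -samefile "$tmp/original" \
  | sed 's|.*/||' | LC_ALL=C sort | paste -s -d ' ' -)

echo "without -L: $hard_only"
echo "with -L:    $with_links"
rm -rf "$tmp"
```

The first search finds hardlink and original; adding -L brings symlink along too, while unrelated never shows up.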
Check out the repo to see how to build only some of the Coreutils utilities into the resulting executable, or how to build each of them as a standalone utility.
What other, generally underused LOLbins are in your daily arsenal? ☮
It was super hard to type that without bursting into laughter, as most organizations couldn’t detect a meteorite if it landed right on their headquarters.
I’m here til Thursday! Try the veal and make sure to tip the waitstaff and bartender.