

Drop #214 (2023-03-07): Underused LOLbins And Oxidized Multicall Bins
comm; join; Re-imagining Coreutils
The phrase “Living off the land” was coined by Christopher Campbell and Matt Graeber at DerbyCon 3, one of far, far too many cybersecurity conferences, over nine years ago. The phrase refers to an attacker technique which involves primarily using only tools and information available on a given system/device to achieve their nefarious goals.
It can be risky for attackers to download custom toolkits. Such activities generate extra/anomalous internet traffic/system activity, which may be detected by various security tools.
The term “LOLBins” specifically refers to “living off the land binaries” and was coined by Philip Goh in a Twitter discussion back in 2018. These are local executables that come along for the ride with operating system or application installs.
While the technique and term are primarily used in contexts involving malicious activity, there are times when data scientists, researchers, or the average user might want to use OS utilities and features that are part of the system default installation. Why create a dependency on Python or R when you can accomplish a given task equally well with tools that are guaranteed to be available and portable?
Today, we cover two often overlooked LOLBin utilities that are “batteries included” on any modern, useful operating system (so, that excludes all of Microsoft's pane-ful OSes).
We then take a peek at a modern re-imagining of coreutils — basic file, shell, and text manipulation utilities originally developed as part of the GNU operating system. These (mostly) come along for the ride on macOS and Linux. The name implies that these are “core utilities” which are expected to exist on every operating system.
comm
Often, it is necessary to compare files or directories, and identify the differences between them. One might even say this is a (…wait for it…) very comm-on task.
Now, you absolutely know about diff, the most overused member of the Diffverse; and, if you work collaboratively with a decent-sized team, there's a good chance you've dug deep into git and even used diff3. But, when's the last time you used comm?
The comm (short for “common”) utility is a tool that is used to compare two sorted files line by line, and display the differences or similarities between them.
By default, comm displays three columns of output:
lines that are only in the first file
lines that are only in the second file
lines that are common to both files
Before we look at an example, one YUGEly important thing to remember when using comm is that both files need to be sorted before using it.
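If you're not sure whether a file is already sorted, sort -c will tell you without touching it. Here's a minimal sketch, assuming a POSIX-ish sort; the scratch directory and the crew.txt name are made up for illustration, and LC_ALL=C is my choice so that sort and comm agree on ordering:

```shell
# Scratch directory and file name are illustrative only.
export LC_ALL=C                      # keep sort and comm on the same collation
tmp=$(mktemp -d)
printf '%s\n' Holden Burton Nagata Kamal > "$tmp/crew.txt"

# sort -c exits non-zero (and complains on stderr) if the input is out of order
if sort -c "$tmp/crew.txt" 2>/dev/null; then before=sorted; else before=unsorted; fi

# Sort the file in place, then check again
sort -o "$tmp/crew.txt" "$tmp/crew.txt"
if sort -c "$tmp/crew.txt" 2>/dev/null; then after=sorted; else after=unsorted; fi

echo "before: $before, after: $after"
```

The exit status is the useful bit here, since it lets a script decide whether to sort before handing files to comm.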
Say we've got two files:
roci-crew-manifest-1.txt:
Holden
Burton
Nagata
Kamal
Draper
Johnson
Miller
roci-crew-manifest-2.txt:
Holden
Burton
Nagata
Kamal
After ensuring they're both sorted:
$ sort -o roci-crew-manifest-1.txt roci-crew-manifest-1.txt
$ sort -o roci-crew-manifest-2.txt roci-crew-manifest-2.txt
let's see what comm can tell us with vanilla invocations, which will produce three columns of output:
lines unique to the first file
lines unique to the second file
lines that appear in both files
$ comm roci-crew-manifest-1.txt roci-crew-manifest-2.txt
                Burton
Draper
                Holden
Johnson
                Kamal
Miller
                Nagata
$ comm roci-crew-manifest-2.txt roci-crew-manifest-1.txt
                Burton
        Draper
                Holden
        Johnson
                Kamal
        Miller
                Nagata
While that lets us eyeball the common/unique entries, I wouldn't want to have to parse that if all I needed was just unique or common line info. Thankfully, the makers of comm didn't either, so you can use various options to get what you want:
-1: suppress printing of column 1 (lines unique to the first file).
-2: suppress printing of column 2 (lines unique to the second file).
-3: suppress printing of column 3 (lines that appear in both files).
-i: ignore case differences in the input files (a BSD/macOS extension; GNU comm doesn't have it).
-u: suppress printing of lines that appear in both files (not in POSIX, and -3 already covers it, so check man comm before relying on it).
My most comm-on usage pattern is to find the unique entries:
$ comm -23 roci-crew-manifest-1.txt roci-crew-manifest-2.txt
Draper
Johnson
Miller
As you can see, comm is pretty handy to have around.
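The same trick covers the other set operations, too. This is a rough sketch rather than anything canonical: it rebuilds the two manifests in a scratch directory (the m1.txt/m2.txt names and variable names are mine) and pulls out each column on its own using only the POSIX -1, -2, and -3 flags:

```shell
# File and variable names are illustrative; data mirrors the two manifests.
export LC_ALL=C                      # keep sort and comm on the same collation
tmp=$(mktemp -d)
printf '%s\n' Holden Burton Nagata Kamal Draper Johnson Miller | sort > "$tmp/m1.txt"
printf '%s\n' Holden Burton Nagata Kamal | sort > "$tmp/m2.txt"

only_in_1=$(comm -23 "$tmp/m1.txt" "$tmp/m2.txt")   # drop columns 2 and 3
only_in_2=$(comm -13 "$tmp/m1.txt" "$tmp/m2.txt")   # drop columns 1 and 3
in_both=$(comm -12 "$tmp/m1.txt" "$tmp/m2.txt")     # drop columns 1 and 2

echo "only in first:";  echo "$only_in_1"
echo "only in second:"; echo "$only_in_2"
echo "in both:";        echo "$in_both"
```

Since the second manifest is a strict subset of the first, the "only in second" result comes back empty, which is a handy way to test for subset-ness in a script.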
join
Databases are great! We even have lightweight and lightning fast ones like sqlite and duckdb which can help make quick work of everyday data tasks. But, they aren't listed on the Monroney sticker of the standard equipment package of most operating systems. What's more, you need to shove data into databases to perform the operations. Still, databases give us powerful operations, such as the ability to SQL join two tables by one or more fields.
But, we don't necessarily need to use a full-on database to perform a join task thanks to the spot-on-uncreatively named join utility.
join is primarily used to merge two files on a common field or key, in similar fashion to the aforementioned SQL join operation.
And, just like comm, join expects both files to already be sorted on the join field. It won't sort them for you, and unsorted input can produce missing or mismatched rows.
Folks usually use join with options, and the options vary by operating system, so we'll just focus on some common ones:
-1 FIELD: join on this FIELD of file 1.
-2 FIELD: join on this FIELD of file 2.
-e MISSING: specifies the string to use for missing fields in the output.
-i: performs a case-insensitive join.
-t CHAR: specifies the field delimiter character.
Absolutely do a man join on your operating system, since the version that comes with, say, Debian-esque systems has some very useful extra options.
Examples > blatherings.
ships.db
Rocinante,class1,book1
Canterbury,class3,book1
Razorback,class2,book1
Barbapiccola,class4,book4
Defiant,,
classes.db
class3,Water Hauler
class1,Corvette
class2,Racing Pinnace
class4,Freighter
books.db
book1,Leviathan Wakes
book2,Caliban's War
book3,Abaddon's Gate
book4,Cibola Burn
book5,Nemesis Games
book6,Babylon's Ashes
book7,Persepolis Rising
book8,Tiamat's Wrath
book9,Leviathan Falls
Add the full name for the ship class, using NA for missing fields:
$ join -t, -a 1 -e NA -1 2 -2 1 ships.db classes.db
class1,Rocinante,book1,Corvette
class3,Canterbury,book1,Water Hauler
class2,Razorback,book1,Racing Pinnace
class4,Barbapiccola,book4,Freighter
NA,Defiant,NA
See the book a ship first appeared in, omitting ones that aren't in the Expanse series:
$ join -t, -1 1 -2 3 books.db ships.db
book1,Leviathan Wakes,Rocinante,class1
book1,Leviathan Wakes,Canterbury,class3
book1,Leviathan Wakes,Razorback,class2
book4,Cibola Burn,Barbapiccola,class4
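POSIX says join wants its inputs sorted on the join field, so a more defensive take on the ship-class lookup sorts each file on its own key first. Treat this as a sketch under a few assumptions: the scratch directory and the .sorted file names are mine, LC_ALL=C is set so sort and join agree on ordering, and it does a plain inner join (no -a or -e):

```shell
# Scratch paths and *.sorted names are illustrative; data is copied from above.
export LC_ALL=C                      # keep sort and join on the same collation
tmp=$(mktemp -d)
cat > "$tmp/ships.db" <<'EOF'
Rocinante,class1,book1
Canterbury,class3,book1
Razorback,class2,book1
Barbapiccola,class4,book4
Defiant,,
EOF
cat > "$tmp/classes.db" <<'EOF'
class3,Water Hauler
class1,Corvette
class2,Racing Pinnace
class4,Freighter
EOF

# Sort each file on the field it will be joined on
sort -t, -k2,2 "$tmp/ships.db"   > "$tmp/ships.sorted"     # field 2 = class id
sort -t, -k1,1 "$tmp/classes.db" > "$tmp/classes.sorted"   # field 1 = class id

# Inner join: the join field is printed first, then the rest of each line
joined=$(join -t, -1 2 -2 1 "$tmp/ships.sorted" "$tmp/classes.sorted")
echo "$joined"
```

With sorted inputs the result comes back ordered by class id (class1 through class4), and Defiant drops out because its empty class never matches anything in classes.db.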
It is a bit hard to justify using join when you can do so much more with sqlite or duckdb, but it is comforting knowing you can still do basic data ops on foreign systems without your fav enhanced tools around.
Reimagining Coreutils
This is a FOSDEM 2023 Daily Drop featured talk.
Sylvestre Ledru presented “Reimplementing the Coreutils in a modern language (Rust): Doing old things with modern tools” at FOSDEM 2023. The talk title is pretty self-explanatory.
This re-imagining project (under the uutils moniker) can be found over at GitHub.
The goal is to make Coreutils work on as many platforms as possible, to help ensure, for example, that scripts can be easily transferred between platforms. Rust was chosen not only because it is fast and safe, but also because it is excellent for writing cross-platform code.
You can try it out right now, if you have a local Rust installation, since it's mostly feature-complete. Just clone the repo and do:
$ cargo build --release
That will build the most portable common core set of uutils into a multicall binary, named coreutils, on most Rust-supported platforms.
A multicall binary is an executable that performs the action of more than one utility. Multicall binaries take advantage of a number of operating system features — including ISO-IEC 9899 5.1.2.2.1 (page 24; direct PDF) — that make it possible for a user of a system to not even know that the programs they are running are all, in fact, the same file.
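You can fake the same idea in a few lines of shell, which makes the mechanism easier to see. This is a toy sketch, not how uutils does it: the script name multi, the hello and bye personalities, and the arguments are all invented for illustration, and it assumes the scratch directory allows executing files:

```shell
# Everything here (multi, hello, bye, the names passed in) is hypothetical.
tmp=$(mktemp -d)

# One file whose behaviour depends on the name it was invoked under
cat > "$tmp/multi" <<'EOF'
#!/bin/sh
case "$(basename "$0")" in
  hello) echo "hello, $1" ;;
  bye)   echo "goodbye, $1" ;;
  *)     echo "usage: invoke as 'hello' or 'bye'" >&2; exit 64 ;;
esac
EOF
chmod +x "$tmp/multi"

# Give the same file two more names: one hard link, one symlink
ln    "$tmp/multi" "$tmp/hello"
ln -s "$tmp/multi" "$tmp/bye"

out_hello=$("$tmp/hello" Naomi)
out_bye=$("$tmp/bye" Amos)
echo "$out_hello"
echo "$out_bye"
```

The script only ever looks at $0 (the shell's cousin of C's argv[0]), so adding a new personality is just another case branch plus another link.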
Linux and macOS folks have a bunch of multicall executables on their systems right now. One pair is bzcmp and bzdiff, which compare bzip2 compressed files. The former will accept cmp (another file comparison utility) options and the latter will accept diff options. You should be able to do the following to prove they're the same:
$ find -L /usr/bin -samefile bzdiff
/usr/bin/bzdiff
/usr/bin/bzcmp
The -L and -samefile options are used to discover hard and soft links to a file. When the linked binary executes, it determines the name it was called under, and then picks the operations to use based on it.
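If you want to watch -L and -samefile do their thing without poking around /usr/bin, here's a small sandbox version. It's a sketch that assumes a find with -samefile support (GNU and BSD find both have it, but it isn't in POSIX); the file names are invented:

```shell
# File names are illustrative; everything lives in a throwaway directory.
tmp=$(mktemp -d)
echo 'payload' > "$tmp/original"
ln    "$tmp/original" "$tmp/hardlink"   # second name for the same inode
ln -s "$tmp/original" "$tmp/symlink"    # separate inode that points at it
echo 'payload' > "$tmp/copy"            # same bytes, but a different file

# Without -L, only names sharing the inode match; with -L, symlinks are followed
without_L=$(find "$tmp" -samefile "$tmp/original" | sort)
with_L=$(find -L "$tmp" -samefile "$tmp/original" | sort)

echo "without -L:"; echo "$without_L"
echo "with -L:";    echo "$with_L"
```

The copy never shows up in either list, since -samefile compares identity rather than content.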
Check out the repo to see how to build only some of the Coreutils utilities into the resultant executable, or how to build them each as standalone utilities.
FIN
What other, generally underused LOLbins are in your daily arsenal? ☮
It was super hard to type that without bursting into laughter, as most organizations couldn’t detect a meteorite if it landed right on their headquarters.
I’m here til Thursday! Try the veal and make sure to tip the waitstaff and bartender.