Drop #335 (2023-09-14): The Commands We [Deliberately] Left Behind
task-spooler; nq; pueue
Despite all of us essentially having supercomputers adorning our wrists, shoved in our pockets, and sitting on our desks/laps, far too many things that execute on them take time.
You know what I mean.
cargo install you thought would be quick, but stalls your active terminal session much longer than you had anticipated.
./configure && make -j8 && make install you were hoping would take mere seconds just continues to scroll, and scroll, and scroll text into your terminal's output history, seemingly with no end in sight.
That Amazon Athena query you dreaded incanting, knowing full well that Jeff Bezos seems to hate you on an oddly personal level, given how long you always have to wait for it to finish.
In these situations, one could have the foresight to use tmux, or just pop open another terminal window/tab, but what if you need to do something immediately after, and dependent upon, those time-consuming operations?
Those with even greater foresight might toss a script together, but I assert that most of us aren't that proactive.
Thankfully, we are awash in tooling that lets us channel our inner Brit, and queue up commands with style and ease. We'll look at three of them today.
This is an AI-generated summary of today's Drop.
Perplexity messed up one URL, again. Sigh.
Here is a concise three bullet summary of the blog post:
Task Spooler: A simple Unix batch system that allows users to manage a per-user task queue, monitor the queue, and view task results. It is particularly useful for running multiple commands without waiting for one to finish before starting the next. GitHub
nq: A set of three small utilities for creating lightweight job queue systems without setup or maintenance. It is designed for ad-hoc queuing of command lines and can be used as an alternative to nohup for running processes in the background. GitHub
Pueue: A Rust-based command-line task management tool for sequential and parallel execution of long-running tasks. It allows users to process a queue of shell commands and offers several convenient features and abstractions. Pueue is not bound to any terminal, so tasks can be controlled from any terminal on the same machine. GitHub
Task Spooler (GH) is a modern replacement for a utility originally written by Lluis Batlle i Rossell. It is a simple Unix batch system that enables us to manage a per-user task queue. We can add commands to the queue, monitor the queue at any time, and view the task results, including standard output and exit errors. Task Spooler is particularly useful when you have to run multiple commands, but don't want to wait for one command to finish before running the next one. You can queue them all up, and Task Spooler will execute them one by one.
Unlike its predecessor, this modern version knows about GPUs, so it is especially handy for GPU-centric workloads (e.g., machinating models or hacking hashes).
The project has solid installation instructions, the above-linked homepage (do we still call them homepages?) does a fine job describing the basics, and there's an excellent walkthru using a deep learning example. So, we'll just include a small/basic “how to use it” here.
To add a command to the queue, simply run tsp followed by the command you want to execute:
$ tsp your_command
To view the current task queue, run tsp without any arguments:
$ tsp
To view the live output of a running job, use the -t option followed by the job ID:
$ tsp -t job_id
To remove a job from the queue, use the -r option followed by the job ID:
$ tsp -r job_id
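Queued jobs can also depend on one another, which covers the "do something right after the slow thing" case from the intro. The function below is a minimal sketch, not something lifted from the project's docs: queue_build_then_install is a made-up name, the make commands are placeholders, and it assumes the binary is installed as tsp (some installs call it ts). The -d, -w, and -c flags come from Task Spooler's help output, but how -d treats a failed predecessor has differed between versions, so check tsp -h on your system.

```shell
# Hypothetical helper: queue a build, then an install step that depends on it.
queue_build_then_install() {
    # tsp prints the new job's ID on stdout when it queues a command.
    build_id=$(tsp make -j8) || return 1

    # -d: make this job depend on the job queued just before it.
    install_id=$(tsp -d make install) || return 1

    # -w: block until the given job finishes.
    tsp -w "$install_id"

    # -c: print the stored output of a job (here, the build log).
    tsp -c "$build_id"
}
```

Nothing runs until you call queue_build_then_install from inside a source tree; the terminal is only tied up by the -w step, which you can drop if you don't need to wait.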
'Tis incredibly lightweight and runs pretty much everywhere.
nq is a set of three small utilities that allow you to create very lightweight job queue systems with no setup, maintenance, supervision, or long-running processes. Developed by Leah Neukirchen, it is designed to run on any POSIX.1-2008 compliant system that also provides a working flock(2).
The repo just drops a mention of that flock requirement without explanation. You can man flock or hit up the manual page online for all the deets, but flock is a function that applies or removes an advisory lock on the file associated with a given file descriptor. Essentially, these locks are a mechanism that allows processes to request exclusive access to a file or portion of a file. They are called “advisory” because they do not enforce mutual exclusion; rather, they rely on cooperation between processes to avoid conflicts.
Advisory locks are used in the nq suite to coordinate access to files among multiple processes. These utilities are designed to run concurrently and need a way to prevent race conditions and ensure consistent results. Advisory locks allow them to achieve this without resorting to mandatory locks, which could potentially block other processes unnecessarily.
In nq, the lock is taken by the nq command itself. Every job gets its own output file in the queue directory, and nq holds an exclusive advisory lock on that file from the moment the job is queued until the job's command exits. A newly queued job first waits for the locks on all earlier jobs' files to be released, and only then runs its command. By using advisory locks in this way, the nq suite ensures that jobs run strictly one at a time, in the order they were queued, without any daemon coordinating them.
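To see advisory locking in action outside of nq, here is a small sketch that uses the flock(1) command-line wrapper from util-linux rather than the flock(2) call nq makes itself. The worker function and the temp files are made up for the demo, and it assumes flock(1) is installed (stock macOS does not ship it). Two background workers each ask for an exclusive lock on the same file, so one has to wait until the other is done.

```shell
# Demo only: assumes the flock(1) utility (util-linux) is installed.
lockfile=$(mktemp)
out=$(mktemp)

worker() {
    # flock -x takes an exclusive lock on $lockfile, runs the command,
    # and releases the lock when the command exits.
    flock -x "$lockfile" sh -c '
        echo "start $1" >> "$2"
        sleep 1
        echo "end $1" >> "$2"
    ' sh "$1" "$out"
}

worker a &
worker b &
wait

# Each start/end pair stays together, whichever worker wins the lock first.
cat "$out"
```

Because the locks are advisory, a third process that simply appends to the output file without calling flock would not be held back at all; the scheme only works when everyone plays along.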
As the project's README notes well (this is stolen from it):
<begin-content-grifting/> the intended purpose of nq is ad-hoc queuing of command lines (e.g., for building several targets of a Makefile, downloading multiple files one at a time, running benchmarks in several configurations, or simply as a glorified nohup). But as any good Unix tool, it can be abused for whatever you like.
Job order is enforced by a timestamp nq gets immediately when started. Synchronization happens on file-system level. Timer resolution is milliseconds. No sub-second file system time stamps are required. Polling is not used. Exclusive execution is maintained strictly.
Enforcing job order works like this:
every job has a flock'ed output file, ala
every job starts only after all earlier flock(2)ed files are unlocked
To demonstrate how nq works, let's consider a simple example. Suppose we have a script uncreatively named process_data.sh that processes a large dataset (perhaps with the new AWK!) and takes a long time to complete. We want to run this script multiple times with different input files, but don't want to run them all at once, to avoid overwhelming the system.
With nq, you can easily queue these tasks:
$ nq process_data.sh input1.txt
$ nq process_data.sh input2.txt
$ nq process_data.sh input3.txt
These commands add the tasks to the queue, and there is no separate step to start processing: nq runs each job in the background, one by one, in the order they were queued. To watch the output of the queued jobs as they run, use the fq command. If you just want to check whether the queue has finished, use the nq command with the -t option:
$ nq -t
This command exits with status 0 once every job in the queue has finished, and with a non-zero status while any are still waiting or running.
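Putting the pieces together, here is a hedged sketch of queueing a batch of inputs into their own queue directory. queue_inputs and the ~/.nq-demo path are invented for this example; the NQDIR variable and the -w (wait) flag are described in nq's man page. It assumes process_data.sh sits in the current directory.

```shell
# Hypothetical helper: queue one process_data.sh job per input file.
queue_inputs() {
    # NQDIR is where nq keeps its job files (default: the current directory).
    export NQDIR="${NQDIR:-$HOME/.nq-demo}"
    mkdir -p "$NQDIR"

    for input in "$@"; do
        # Each call returns right away; the job waits its turn in the background.
        nq ./process_data.sh "$input"
    done

    # -w: block until every job in this queue has finished.
    nq -w
}
```

Call it as queue_inputs input1.txt input2.txt input3.txt, and run fq in another terminal (with the same NQDIR) if you want to follow the output while it works.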
It, like Task Spooler (in section one), runs pretty much everywhere and is incredibly lightweight. It makes for an especially handy alternative to nohup for running processes in the background.
The first two utilities are C-based, so we gotta get some Rust in to avoid overflowing our buffers.
Pueue is a Rust-based command-line task management tool designed for sequential and parallel execution of long-running tasks. It enables us to process a queue of shell commands and offers several convenient features and abstractions. Since pueue is not bound to any terminal, we can control our tasks from any terminal on the same machine, and the queue will be continuously processed even if we no longer have any active ssh sessions.
Like the other two tools mentioned today, pueue addresses the need for a simple and efficient way to manage and execute long-running tasks on the command line. It solves for:
needing to execute tasks sequentially or in parallel.
managing tasks from any terminal on the same machine.
ensuring that tasks continue to run even after disconnecting from an ssh session.
One added bit of complexity with pueue is that it does need a running daemon.
The official instructions show how to get that set up on *nix, but macOS folks should just brew install pueue and follow the instructions it provides to start it as a service.
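If you'd rather not set up a service at all, a minimal sketch like the following can start the daemon on demand. ensure_pueued is a hypothetical name; it leans on two assumptions about pueue's behavior, namely that pueue status exits non-zero when it cannot reach the daemon and that pueued accepts -d to daemonize itself.

```shell
# Hypothetical helper: start pueued only if the client can't reach it.
ensure_pueued() {
    # If the client can talk to the daemon, there is nothing to do.
    if pueue status >/dev/null 2>&1; then
        return 0
    fi
    # -d: fork into the background instead of holding the terminal.
    pueued -d
}
```

A daemon started this way will not survive a reboot, which is the part a proper service definition takes care of.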
We add tasks to the queue using the pueue add command:
$ pueue add -- "your_command_here"
Note that the -- is only needed if your_command_here carries its own flags (which pueue would otherwise try to parse itself), but I find it easier to just always use it.
We can check the status of our tasks with the pueue status command:
$ pueue status
And we can also pause, resume, or remove tasks from the queue using the respective commands (resuming is done with pueue start):
$ pueue pause <task_id>
$ pueue start <task_id>
$ pueue remove <task_id>
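For the "run this only after that finishes" case, pueue lets one task depend on another. The function below is a sketch under a few assumptions: queue_pipeline is a made-up name, the make commands are placeholders, and the daemon is already running. The --print-task-id and --after options, plus the wait and log subcommands, are from pueue's own help output.

```shell
# Hypothetical helper: queue a build and an install that depends on it.
queue_pipeline() {
    # --print-task-id makes `pueue add` print only the new task's ID.
    build=$(pueue add --print-task-id -- make -j8) || return 1

    # --after: start this task only once the listed tasks have finished
    # successfully; if a dependency fails, this task fails too.
    pueue add --after "$build" -- make install

    # Block until the queued tasks are done, then show the build's log.
    pueue wait
    pueue log "$build"
}
```

Drop the pueue wait line if you want your prompt back right away; the daemon keeps working through the queue either way.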
We can also control pueued (the server component) remotely, but it might just be easier to use ssh to do any orchestration.
I strongly suggest hitting up the official wiki, and reading it cover-to-cover if you intend to give pueue a go. It’s filled to the brim with highly useful information.
Now, if only there was a pueue for IRL task management. ☮