

Tokei
Many a tome has been penned on the efficacy of counting "lines of code" ("LoC" from now on) in a project. This is one of them; this is another; as is this. Each is a sample from a different "camp", and those are only three of many camps in this debate.
My first take on LoC is that if you're measuring the worth of an individual developer or team by LoC, you should probably seek a career move, since you're likely a pretty bad manager.
My second take on counting LoC is that I find it useful, especially in a data science context, to understand the complexity and scope of a particular project. I say "especially", since we are in an age where I can have a datasci project that is wrapped up in a Quarto document where I use six or more languages to gather, clean, analyze, and communicate results, and that's not counting things like icky YAML files and such.
In modern web application projects, HTML, SCSS/CSS, JavaScript/TypeScript, and even Rust→Wasm are often intermingled, so you can't just use good ol' wc (the word count command) to grok the scope/complexity of a given application. Further, more traditional LoC utilities are generally ill-equipped to account for this language intermingling, leaving you more blind to the scope/complexity than you may imagine.
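Here is a toy Python sketch of the intermingling problem. It compares a `wc -l`-style single line count with a naive per-language split of an HTML page that carries inline `<style>` and `<script>` blocks. The `split_html_by_language` helper is hypothetical and is not part of any tool mentioned here. It assumes those tags sit on their own lines, which real-world HTML often won't oblige.

```python
def split_html_by_language(html: str) -> dict[str, int]:
    """Attribute each line of an HTML page to HTML, CSS, or JavaScript.

    Naive on purpose: assumes <style>/<script> open and close tags sit on
    their own lines (a one-line <script src=...></script> stays HTML).
    """
    counts = {"HTML": 0, "CSS": 0, "JavaScript": 0}
    current = "HTML"  # language of the lines we are currently inside
    for line in html.splitlines():
        tag = line.strip().lower()
        if tag.startswith("<script"):
            counts["HTML"] += 1  # the tag line itself is markup
            if "</script" not in tag:
                current = "JavaScript"
        elif tag.startswith("<style"):
            counts["HTML"] += 1
            if "</style" not in tag:
                current = "CSS"
        elif tag.startswith("</script") or tag.startswith("</style"):
            counts["HTML"] += 1
            current = "HTML"
        else:
            counts[current] += 1
    return counts


page = """<html>
<head>
<style>
body { color: #333; }
</style>
</head>
<body>
<script>
console.log("hi");
</script>
</body>
</html>
"""

total = len(page.splitlines())  # one number, like `wc -l`
print(total)  # 12
print(split_html_by_language(page))  # {'HTML': 10, 'CSS': 1, 'JavaScript': 1}
```

A real counter has to do the same kind of attribution across dozens of languages and far messier nesting, which is where a tool like Tokei earns its keep.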
What can you do to regain some objective visibility? For starters, you can check out Tokei [GH] (docs), a "program that displays statistics about your code." Tokei shows the number of files, the total lines within those files, and the code, comment, and blank lines, grouped by language. One of the cool features is that Tokei can analyze and count multiple languages embedded in your source code, and it also supports [icky] Jupyter Notebooks.
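A toy classifier shows what "code, comments, and blanks" means in practice. The Python sketch below handles C-style `//` and `/* */` comments and skips comment markers inside double-quoted strings. Two rules are assumptions made for this illustration: a whitespace-only line is always a blank, and a line mixing code and a comment counts as code. This is not Tokei's implementation. For example, it does not handle the nested comments Tokei supports.

```python
def classify_lines(source: str) -> dict[str, int]:
    """Count code, comment, and blank lines for a C-style language (toy rules)."""
    counts = {"code": 0, "comments": 0, "blanks": 0}
    in_block = False  # inside a /* ... */ comment carried over from an earlier line
    for line in source.splitlines():
        if not line.strip():
            counts["blanks"] += 1
            continue
        has_code = False
        in_string = False
        i = 0
        while i < len(line):
            ch = line[i]
            two = line[i:i + 2]
            if in_block:
                if two == "*/":
                    in_block = False
                    i += 2
                    continue
                i += 1
            elif in_string:
                has_code = True
                if ch == "\\":
                    i += 2  # skip an escaped character such as \"
                    continue
                if ch == '"':
                    in_string = False
                i += 1
            elif two == "//":
                break  # the rest of the line is a comment
            elif two == "/*":
                in_block = True
                i += 2
            else:
                if ch == '"':
                    in_string = True
                if not ch.isspace():
                    has_code = True
                i += 1
        counts["code" if has_code else "comments"] += 1
    return counts


sample = """int main(void) {
    // a line comment
    /* a block comment
       spanning two lines */
    printf("not a // comment\\n");

    return 0; /* trailing */
}
"""

print(classify_lines(sample))  # {'code': 4, 'comments': 3, 'blanks': 1}
```

The string-tracking step is what keeps the `//` inside the `printf` literal from being miscounted as a comment.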
Here's their pitch:
it is very fast, and is able to count millions of lines of code in seconds.
it is accurate: Tokei correctly handles multi-line comments, nested comments, and not counting comments that are in strings, providing accurate code statistics.
it has a huge range of languages, supporting over 150 languages and their various extensions.
it can output in multiple formats (CBOR, JSON, YAML), allowing Tokei's output to be easily stored and reused. These can also be reused in Tokei, combining a previous run's statistics with another set.
it is available on Mac, Linux, and Windows.
it is also a library allowing you to easily integrate it with other projects.
it comes with and without color. Set the env variable NO_COLOR to 1, and it'll be black and white.
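Because Tokei can emit JSON (`tokei <path> --output json`), its numbers are easy to post-process. The Python sketch below assumes each top-level key in the report is a language name mapping to an object with an integer `code` field, plus an aggregate `Total` entry. Treat that shape as an assumption and check what your Tokei version actually produces. The `sample_report` dict is hand-built from the table in this post and is not real Tokei output. The `run_tokei` and `code_share` helpers are hypothetical names.

```python
import json
import subprocess


def run_tokei(path: str = ".") -> dict:
    """Run the tokei CLI on `path` and parse its JSON report (needs tokei on PATH)."""
    result = subprocess.run(
        ["tokei", path, "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def code_share(report: dict) -> dict[str, float]:
    """Percent of code lines per language, largest first, ignoring the "Total" entry.

    Assumes each language maps to an object with an integer "code" field.
    """
    langs = {name: stats for name, stats in report.items() if name != "Total"}
    total_code = sum(stats["code"] for stats in langs.values())
    if total_code == 0:
        return {}
    ranked = sorted(langs.items(), key=lambda item: item[1]["code"], reverse=True)
    return {name: round(100 * stats["code"] / total_code, 1) for name, stats in ranked}


# Hand-built stand-in for run_tokei(...) output, using this post's numbers
sample_report = {
    "HTML": {"code": 1106, "comments": 0, "blanks": 0},
    "JavaScript": {"code": 4, "comments": 0, "blanks": 0},
    "Markdown": {"code": 0, "comments": 135, "blanks": 66},
    "Shell": {"code": 2, "comments": 2, "blanks": 2},
    "TOML": {"code": 28, "comments": 1, "blanks": 2},
    "Rust": {"code": 833, "comments": 12, "blanks": 340},
    "Total": {"code": 1973, "comments": 150, "blanks": 410},
}

print(code_share(sample_report))
# {'HTML': 56.1, 'Rust': 42.2, 'TOML': 1.4, 'JavaScript': 0.2, 'Shell': 0.1, 'Markdown': 0.0}
```

Against a real project, you would call `code_share(run_tokei("path/to/project"))` instead of using the sample dict.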
This is an out-of-the-box sample output run on my Rust-based Apple WeatherKit CLI app (Substack code block apologies, once more):
```text
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 HTML                   41         1106         1106            0            0
 JavaScript              4            4            4            0            0
 Markdown                4          201            0          135           66
 Shell                   2            6            2            2            2
 TOML                    1           31           28            1            2
-------------------------------------------------------------------------------
 Rust                    4         1185          833           12          340
 |- Markdown             3           45            0           33           12
 (Total)                           1230          833           45          352
===============================================================================
 Total                  56         2533         1973          150          410
===============================================================================
```
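A quick arithmetic check helps with reading the table. In this run, Rust's "(Total)" row is Rust's own row plus the nested "|- Markdown" row. The grand "Total" row, however, sums only the top-level rows, so it uses Rust's 1185 lines rather than 1230. The numbers below are copied straight from the output above.

```python
# Top-level rows from the Tokei output above: (files, lines, code, comments, blanks)
rows = {
    "HTML": (41, 1106, 1106, 0, 0),
    "JavaScript": (4, 4, 4, 0, 0),
    "Markdown": (4, 201, 0, 135, 66),
    "Shell": (2, 6, 2, 2, 2),
    "TOML": (1, 31, 28, 1, 2),
    "Rust": (4, 1185, 833, 12, 340),
}
# The "|- Markdown" row nested under Rust
rust_children = (3, 45, 0, 33, 12)

# Column-wise sums of the top-level rows reproduce the grand "Total" row
totals = tuple(sum(column) for column in zip(*rows.values()))
print(totals)  # (56, 2533, 1973, 150, 410)

# Rust plus its nested Markdown reproduces the "(Total)" row (which omits files)
rust_with_children = tuple(a + b for a, b in zip(rows["Rust"], rust_children))
print(rust_with_children[1:])  # (1230, 833, 45, 352)
```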
Quarto/R Markdown support isn't great yet (guess I've got a TODO, eh?), but given the support it has for so many other languages, and the vibrant community backing it, I'm sure it will only get better.
The Tokei docs are great, so there's not much more I can or should drop here, save for the fact that I'm doing my first vow-break to say I'll be making an R package for this as I did for cloc, and that you should check out tokei-pie, a companion project you can use to generate (interactive) images like the one in the section header.
Quadratic
Quadratic [GH] is a relatively new kid on the data science block that feels a bit like the Frankenstein product of an unholy genetic combination of EtherCalc, Figma, and Jupyter. In the project’s own words, "Quadratic is a data science spreadsheet; an Infinite data grid with Python, JavaScript, and SQL built-in; with Data Connectors to pull your data from SaaS tools and databases." The section header has a screenshot of some Quadratic cells.
I normally wait until projects have gone through at least one EasyBake Oven cycle before dropping them in the newsletter (Quadratic is SUPER alpha), but my Spidey-sense is telling me this might be something that helps make data science topics more approachable and usable by the average tool-wielder (and I'm all for democratizing datasci).
Here are Quadratic's MVP goals:
State-of-the-art free form spreadsheet, for everyday use.
Endless grid, in all directions. Pinch, pan, and zoom.
Support for Python, JavaScript, Excel Formulas, and SQL. Natively.
Easily add live data with Third-Party Connectors (SaaS tools, DB, etc).
Accessible to everyone + powerful for power users.
Add UI elements to quickly build internal apps that your whole team can use.
Multiplayer, see others' mouse movements and keystrokes.
Set Themes and install Extensions to personalize your environment.
Improve the way people do science, finance, math, marketing and more.
Spreadsheets touch so many aspects of our work, and we deserve a better spreadsheet!
Another plus is that their grid file format is just schema'd JSON, so no daft XML or proprietary formats to deal with.
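An "endless grid, in all directions" stored as JSON suggests a sparse representation, where only occupied cells are stored and keyed by possibly negative coordinates. The Python sketch below is a hypothetical illustration of that idea. The `SparseGrid` class and its list-of-`{"x", "y", "value"}` JSON layout are made up here and are not Quadratic's code or file schema.

```python
import json


class SparseGrid:
    """A hypothetical 'endless' grid that stores only non-empty cells, keyed by (x, y).

    Coordinates may be negative, so the grid can grow in every direction
    without preallocating rows or columns.
    """

    def __init__(self) -> None:
        self.cells: dict[tuple[int, int], str] = {}

    def set(self, x: int, y: int, value: str) -> None:
        if value == "":
            self.cells.pop((x, y), None)  # clearing a cell frees its storage
        else:
            self.cells[(x, y)] = value

    def get(self, x: int, y: int) -> str:
        return self.cells.get((x, y), "")  # unset cells read as empty

    def bounds(self) -> tuple[int, int, int, int]:
        """(min_x, min_y, max_x, max_y) of occupied cells; raises on an empty grid."""
        if not self.cells:
            raise ValueError("grid is empty")
        xs = [x for x, _ in self.cells]
        ys = [y for _, y in self.cells]
        return min(xs), min(ys), max(xs), max(ys)

    def to_json(self) -> str:
        # Made-up schema: a list of {"x", "y", "value"} objects, sorted by position
        return json.dumps(
            [{"x": x, "y": y, "value": v} for (x, y), v in sorted(self.cells.items())]
        )

    @classmethod
    def from_json(cls, text: str) -> "SparseGrid":
        grid = cls()
        for cell in json.loads(text):
            grid.set(cell["x"], cell["y"], cell["value"])
        return grid


grid = SparseGrid()
grid.set(0, 0, "hello")
grid.set(-5000, 12, "far left")
grid.set(1_000_000, -3, "far right")
restored = SparseGrid.from_json(grid.to_json())
print(restored.get(-5000, 12))  # far left
print(restored.bounds())  # (-5000, -3, 1000000, 12)
```

With this layout, storage scales with the number of filled cells rather than with how far apart they sit.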
There's a live demo environment you can play in, and the instructions to install it locally do exactly what they say on the tin. I'm hoping the drop here inspires a few folks to pile on to the project and keep it moving forward.
xx
The xx utility is for anyone who likes to poke at file formats. The author self-describes it as a "simple text-based file format for creating binary files and data buffers," but those words do not do it justice. For example…
Substack's platform won't do this justice, so please hit up this gist to see the actual text of this xx ICMP PCAP packet descriptor in the image, below:
If you run the xx.py script in the repo on that spec, you'll get a valid ICMP PCAP file you can examine in, say, Wireshark. The section header image shows support for fancy ANSI colors, and you should keep an eye on this Twitter search for more examples (though, with a program name of xx, one never knows what’s going to pop up in there, so caveat 👀er).
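To get a feel for what an ICMP PCAP spec ultimately boils down to, here is the same kind of file built by hand with Python's `struct` module instead of xx. This sketch does not reproduce xx's own syntax. It packs a libpcap global header, one per-packet record header, and an Ethernet/IPv4/ICMP echo-request frame, computing the RFC 1071 checksums along the way. The MAC addresses, the 192.0.2.x documentation-range IPs, the payload, the fake timestamps, and the `icmp.pcap` filename are all placeholders.

```python
import socket
import struct


def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def icmp_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    # Type 8 = echo request, code 0; checksum is computed over a zeroed field
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def ipv4_packet(src: str, dst: str, payload: bytes, proto: int = 1) -> bytes:
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(payload),  # version 4 + IHL 5 words, TOS, total length
        0, 0,                        # identification, flags/fragment offset
        64, proto, 0,                # TTL, protocol (1 = ICMP), checksum placeholder
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    csum = inet_checksum(header)
    # The header checksum lives at byte offsets 10-11
    return header[:10] + struct.pack("!H", csum) + header[12:] + payload


def ethernet_frame(payload: bytes) -> bytes:
    dst_mac = bytes.fromhex("ffffffffffff")  # placeholder: broadcast
    src_mac = bytes.fromhex("020000000001")  # placeholder: locally administered
    return dst_mac + src_mac + struct.pack("!H", 0x0800) + payload  # 0x0800 = IPv4


def pcap_file(frames: list[bytes]) -> bytes:
    # Global header: magic, version 2.4, thiszone, sigfigs, snaplen, linktype 1 (Ethernet)
    out = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    for i, frame in enumerate(frames):
        # Record header: ts_sec (fake), ts_usec, captured length, original length
        out += struct.pack("<IIII", i, 0, len(frame), len(frame)) + frame
    return out


if __name__ == "__main__":
    icmp = icmp_echo_request(ident=0x1234, seq=1, payload=b"hello, xx")
    frame = ethernet_frame(ipv4_packet("192.0.2.1", "192.0.2.2", icmp))
    with open("icmp.pcap", "wb") as fh:
        fh.write(pcap_file([frame]))
```

Running it should leave an `icmp.pcap` you can open in Wireshark. The appeal of xx is getting the same bytes from a readable, commented text description instead of hand-packed structs.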
If you need to work with or describe things at this low a level, xx should be something you keep in your toolbox.
FIN
Q3 2022 ends this week, folks! I don't know about y'all, but it sure feels like this year has flown by. ☮