Drop #259 (2023-05-11): Source Search
Code Search Guide; hound; searchcode; Bonus: Your New Personal jq Library
Programming note: TIL that the Deno TUI library I'm using for the Weatherflow app I mentioned yesterday does not re-paint the entire screen on field updates. This made the already janky TUI even more janky in the browser. Apologies to the ~20ish folks who checked it out!
Keeping on the topic of seashells, I robustified the server code a tad and added some more configuration variables, plus enhanced the README a bit.
Finally: I let Substack put a “Notes tab” on the home page, since I do occasionally drop resources throughout the week there.
You may be thinking, “why cover 'code search' when GitHub recently-ish opened up their new groovy code search for everyone”? Well, as you should know by now, I'm #notafan of GH. Plus, GH is not the be-all, end-all of code hosting platforms. And, lots of us have our own local/private internal directories and repos full of useful code that needs searching.
So, today we democratize code search a bit with a guide, a tool you can run, and a tool you can use. Plus, I toss in something I keep forgetting to mention, but was reminded of while writing the second section.
Code Search Guide
Sourcegraph is a code intelligence platform that helps developers search, navigate, and understand code across their personal/organization codebase. We're not covering them today because they're fairly well-known. So, why the namedrop? Well, the Sourcegraph team are serious experts in this space. When serious experts choose to make time to share knowledge, it seems pretty daft not to hoover up as much of said knowledge as possible.
These experts have created a lovely guide to code search that is not just a sham self-promotion document. It contains information about “code search from around the internet and from interviews with devs who use code search”. This is how they define “code search”:
A tool that’s separate from, but integrated with, your code host and editor that lets you search, navigate, and understand all code. Usually this means your own code, your organization’s code, the code of projects you depend on and that depend on your code, and any other code that’s relevant to you.
At the start, they help us grok “why code search”, noting that it helps you form a clear mental model of the overall system, plus knowing:
where the code is or being able to find it
what dependencies there are on the code (i.e., what would be the impact of changing the code)
what the code depends on (i.e., internal/external libraries used)
why the code was written as such
They dig into four use cases for code search:
Further, they provide examples of how code search is used in some organizations you all know the names of.
Penultimately, they showcase many other code search tools besides their own (which is where I found the tool we're covering in the next section).
Finally, they give you a launchpad to go forth and learn more about code search.
Keep it in your bookmarks/Arc favs since it's a great reference on the topic.
hound
Ripped right from the README: Hound is “an extremely fast source code search engine. The core is based on this article (and code) from Russ Cox: Regular Expression Matching with a Trigram Index. Hound itself is a static React frontend that talks to a Go backend. The backend keeps an up-to-date index for each repository and answers searches through a minimal API.”
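To make the "trigram index" idea a bit more concrete, here is a minimal sketch of the first step: breaking a string into its unique three-character substrings. This is not Hound's code (that lives in Go); the `trigrams` function name is made up for illustration. A real index maps each trigram to the files containing it, so a query only needs to scan files that share the query's trigrams.

```shell
#!/bin/sh
# Print the unique trigrams (3-character substrings) of a string, one per line.
# `trigrams` is a hypothetical helper for illustration, not part of Hound.
trigrams() {
  printf '%s\n' "$1" |
    awk '{ for (i = 1; i + 2 <= length($0); i++) print substr($0, i, 3) }' |
    LC_ALL=C sort -u
}

trigrams "hound"
# prints:
# hou
# oun
# und
```

A search for `hound` would then only need to look at files whose trigram sets contain all three of those entries before running the full regular expression.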
I recommend building the Golang binaries (it comes with a CLI search tool, hound, and the houndd server) vs. trying the Docker route, as the Docker route failed me with an error that looks like the container has some missing files.
You'll need a config file that lists all the remote repos and local repos/directories you want to include in the search. To help y'all out, I made a small shell script that will bootstrap the config file with all your GH repos:
#!/bin/bash
# Bootstrap a hound config.json that lists all of your GitHub repos.
# Requires an authenticated GitHub CLI (gh).

HOUND_DIR="${HOME}/Data/hound"
mkdir -p "${HOUND_DIR}"
cd "${HOUND_DIR}" || exit

# Your GitHub login name
GH_USER=$(gh api user --jq '.login')

# One `"name" : { "url" : "..." },` line per repo
REPOS=$(gh api --paginate users/"${GH_USER}"/repos --jq '.[] | "\"\(.name)\" : { \"url\" : \"\(.html_url)\" },"')

# Drop the final trailing comma so the JSON stays valid
REPOS="${REPOS%,}"

PRE='{
  "max-concurrent-indexers" : 2,
  "dbpath" : "db",
  "title" : "hrbrsearch",
  "health-check-uri" : "/healthz",
  "vcs-config" : {
    "git": {
      "detect-ref" : true
    }
  },
  "repos": {
'

POST='
  }
}'

echo "${PRE} ${REPOS} ${POST}" > config.json
WARNING: Mine (~800 repos) eat ~19 GB of space in the hound database, so you might want to start small if your repo count is high.
It's lightweight, super-fast, and pretty straightforward to build/use.
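The bootstrap script above glues JSON together with string concatenation, which works but leaves the quoting and comma handling up to you. Here is an alternative sketch that lets jq assemble the whole config. It assumes the same settings as the script above, and `build_hound_config` is a hypothetical helper name. It reads the page-by-page output of `gh api --paginate` on stdin, which is why it slurps with `-s` and merges the pages first.

```shell
#!/bin/sh
# Build a hound config from one or more JSON arrays of GitHub repo objects
# on stdin. `build_hound_config` is a hypothetical name; the settings mirror
# the bootstrap script above.
build_hound_config() {
  jq -s '
    (add // [])                                   # merge paginated arrays into one
    | (map({ (.name): { url: .html_url } }) | add // {}) as $repos
    | {
        "max-concurrent-indexers": 2,
        "dbpath": "db",
        "title": "hrbrsearch",
        "health-check-uri": "/healthz",
        "vcs-config": { "git": { "detect-ref": true } },
        "repos": $repos
      }'
}

# Intended usage (needs an authenticated gh):
#   GH_USER=$(gh api user --jq '.login')
#   gh api --paginate "users/${GH_USER}/repos" | build_hound_config > config.json
```

Since jq does the serializing, repo names with odd characters get escaped properly and there is no trailing comma to trim.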
searchcode
Searchcode is a source code search engine that enables us to search for code snippets using various filters such as:
function/method names
constant and variable names
operations
security flaws
usage
special characters
and tons more.
The search results can be filtered down to a specific source or identified language using various refinement options. The estimated cost for any file or project is created using the Basic COCOMO algorithmic software cost estimation model.
Along with the actual code-file result, you get some code-metadata such as code/blank/comment line count, complexity score, and hash.
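For a feel of what a Basic COCOMO estimate involves, here is a rough sketch using the textbook "organic" project coefficients. Those coefficients and the `cocomo_basic` function name are my assumptions; searchcode's exact parameters (and the labor rate it uses to turn effort into cost) are not stated here.

```shell
#!/bin/sh
# Basic COCOMO, "organic" class, textbook coefficients (an assumption here):
#   effort (person-months) = 2.4 * KLOC^1.05
#   schedule (months)      = 2.5 * effort^0.38
# `cocomo_basic` is a hypothetical helper; $1 is the source line count.
cocomo_basic() {
  awk -v sloc="$1" 'BEGIN {
    kloc   = sloc / 1000
    effort = 2.4 * (kloc ^ 1.05)
    months = 2.5 * (effort ^ 0.38)
    printf "effort_pm=%.1f schedule_months=%.1f\n", effort, months
  }'
}

cocomo_basic 10000
# prints: effort_pm=26.9 schedule_months=8.7
```

A cost figure then falls out of multiplying the effort by whatever monthly labor cost you assume.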
Say you want to see how to use the {tidyverse} pivot_wider function from real world use vs. contrived code examples. Just specify the language, add the function name and 💥 https://searchcode.com/?q=lang:r+pivot_wider.
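If you find yourself doing this a lot, the URL pattern above is easy to wrap in a tiny shell function. This is just a sketch: `searchcode_url` is a made-up name, and it does no URL-encoding, so it only suits simple terms like the one above.

```shell
#!/bin/sh
# Build a searchcode query URL following the pattern shown above.
# `searchcode_url` is a hypothetical helper: $1 = language, $2 = search term.
# No URL-encoding is attempted.
searchcode_url() {
  printf 'https://searchcode.com/?q=lang:%s+%s\n' "$1" "$2"
}

searchcode_url r pivot_wider
# prints: https://searchcode.com/?q=lang:r+pivot_wider
```

Pipe the result to your platform's URL opener of choice to land straight on the results page.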
One thing that makes searchcode truly impressive is that it's the work of one dude. I highly suggest carving out some time to read more about how it came to be, and check out his tech blog while you're at it.
Your New JQ Library
As I was crafting the script for the second section, I remembered that I’ve been meaning to mention the -L option to jq, which lets you point it at a local directory (or file) that contains a “library” of personal jq functions. This is super useful if you find yourself doing a lot of copypasta from snippets or other jq scripts.
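Here is a small sketch of what that looks like in practice. The library directory, the `helpers` module name, and both functions are made up for illustration. The `include "helpers";` directive makes jq look for `helpers.jq` in the directory passed to -L.

```shell
#!/bin/sh
# Create a throwaway library directory with one module, helpers.jq.
# The directory, module name, and functions are all hypothetical examples.
JQ_LIB_DIR="$(mktemp -d)"

cat > "${JQ_LIB_DIR}/helpers.jq" <<'EOF'
# "Source Search" -> "source-search"
def kebab: ascii_downcase | split(" ") | join("-");

# {name, html_url} -> {"<name>": {"url": "<html_url>"}}
def repo_entry: { (.name): { url: .html_url } };
EOF

# Pull the module in with `include` and call its functions like builtins.
echo '"Source Search"' | jq -L "${JQ_LIB_DIR}" 'include "helpers"; kebab'
# prints: "source-search"
```

Using `import "helpers" as h;` instead of `include` keeps the functions namespaced (`h::kebab`), which helps once a personal library grows past a handful of definitions.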
I'm surprised there aren't more libraries of generic convenience jq functions out there, but this one from Zoltán Reegn is super handy for changing case.
FIN
Seek and ye shall (usually) find; but, if these tools fail you, you can always ask ChatGPT 😎. ☮