Drop #348 (2023-10-06): Weekend Project Edition
A common idiom for projects that end up facing a web audience is for the content being presented to be searchable in some way, shape, or form.
For some contexts, the browser's handy Edit⇒Search/Find feature may work perfectly. Other contexts may require more heavy lifting on the server side of things, but introducing another component into the mix adds both complexity and one or more additional failure modes.
Even the most resource-constrained modern mobile device has a fair amount of memory, and baseline bandwidth offerings are fast enough for downloading large text blobs, so more search ops are being thrust fully into browser-land.
Now, there are scads of in-browser search libraries. Some are even quite mature. But, with said maturity comes a fair amount of complexity, or — at least — many opportunities for either decision paralysis or just getting lost in API docs.
If your (data) search needs are not on the yuge side, MiniSearch, the library we'll be using in today's project, could just be the one you default to when building a new data+search-centric app/site.
Some key features include:
a memory-efficient index, designed to support memory-constrained use cases like mobile browsers
exact match, prefix search, fuzzy match, and field boosting configuration options
an auto-suggestion engine, for auto-completion of search queries
a modern search result ranking algorithm
the ability to add/remove documents from the index at any time
MiniSearch uses a combination of algorithms and techniques to provide its full-text search capabilities. One of the key algorithms it uses is the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm. TF-IDF is a numerical statistic that reflects how important a word is to a document in a collection or corpus. It is often used in information retrieval and text mining.
A word's TF-IDF score increases in proportion to the number of times the word appears in a document, but is offset by how often the word appears across the corpus, which adjusts for the fact that some words are simply more common in general.
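To make that concrete, here is a toy TF-IDF calculation in plain JavaScript. It is a sketch for illustration only, not MiniSearch's internal scoring code: the function names, the tiny corpus, and the "+ 1" smoothing in the IDF term are all assumptions made for this example.

```javascript
// Toy TF-IDF for illustration; NOT how MiniSearch scores results internally.

// Lowercase the text and split on anything that is not a letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Term frequency: occurrences of `term` relative to the document's length.
function termFrequency(term, tokens) {
  if (tokens.length === 0) return 0;
  const hits = tokens.filter((t) => t === term).length;
  return hits / tokens.length;
}

// Inverse document frequency: terms found in fewer documents score higher.
// The "+ 1" smoothing is one common variant (an assumption for this sketch).
function inverseDocumentFrequency(term, corpusTokens) {
  const containing = corpusTokens.filter((tokens) => tokens.includes(term)).length;
  return Math.log((1 + corpusTokens.length) / (1 + containing)) + 1;
}

// TF-IDF of `term` for the document at `docIndex` within `corpus`.
function tfidf(term, docIndex, corpus) {
  const corpusTokens = corpus.map(tokenize);
  return (
    termFrequency(term, corpusTokens[docIndex]) *
    inverseDocumentFrequency(term, corpusTokens)
  );
}

// Hypothetical three-document corpus.
const corpus = [
  "remote code execution in the web server",
  "privilege escalation in the kernel",
  "the kernel kernel panic",
];

// "the" appears in every document, so it scores lower than "remote" in doc 0.
console.log(tfidf("the", 0, corpus), tfidf("remote", 0, corpus));
```

A word that shows up everywhere (like "the") gets dragged down by the IDF term, while a word that is rare across the corpus gets a lift.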
In addition to TF-IDF, MiniSearch also uses prefix search and fuzzy search techniques. Prefix search allows for searching of terms that start with the given prefix, which is useful for auto-completion of search queries. Fuzzy search allows for finding terms that are similar to the given term, which is useful for handling typos or variations in spelling.
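If those two ideas feel abstract, the sketch below shows them over a plain array of terms. It is a hedged illustration and not MiniSearch's implementation: the library matches against its own index rather than scanning a list, and the function names, the sample terms, and the use of Levenshtein edit distance with a `maxDistance` cutoff are assumptions made here.

```javascript
// Illustrative prefix and fuzzy matching over a flat list of terms.
// A real search library would match against an index, not a linear scan.

// Prefix search: keep the terms that start with the given prefix.
function prefixMatches(prefix, terms) {
  const p = prefix.toLowerCase();
  return terms.filter((t) => t.toLowerCase().startsWith(p));
}

// Levenshtein edit distance, computed one row at a time.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Fuzzy search: keep the terms within `maxDistance` edits of the query.
function fuzzyMatches(query, terms, maxDistance = 1) {
  const q = query.toLowerCase();
  return terms.filter((t) => editDistance(q, t.toLowerCase()) <= maxDistance);
}

// Hypothetical term list.
const terms = ["exchange", "exim", "explorer", "apache", "log4j"];

console.log(prefixMatches("ex", terms));        // terms starting with "ex"
console.log(fuzzyMatches("apachee", terms));    // one stray letter still finds "apache"
```

Note that a swapped pair of letters (say, "exchnage") counts as two edits under plain Levenshtein distance, so it only matches when `maxDistance` is at least 2.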
Finally, MiniSearch uses a ranking algorithm to order the search results.
You can (and, should!) get further information about it right from the author (Luca Ongaro) before continuing.
We're going to take a look at a small, starter web app that lets folks search what should be a very familiar dataset to readers by now: CISA's Known Exploited Vulnerabilities catalog.
On load, the app will:
fetch the JSON
set up MiniSearch and index selected fields
have a search box that reacts to keystrokes and does the live search-as-you-type thing
as it does ^^, the result set will display/change in real time
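The steps above can be sketched roughly as follows. This is not the code from the repo, just a minimal outline under some assumptions: the `MiniSearch` class (from the `minisearch` package) is passed in as an argument so the sketch stays independent of any bundler setup, the choice of indexed fields is mine, the `prefix`/`fuzzy` settings are arbitrary, and `url`, `input`, and `render` are hypothetical placeholders for wherever the KEV JSON is served from, the search box element, and whatever draws the results.

```javascript
// Fields to index (an assumed selection); these keys exist on each entry
// in the KEV feed's `vulnerabilities` array.
const SEARCH_FIELDS = [
  "cveID", "vendorProject", "product", "vulnerabilityName", "shortDescription",
];

// Fields to hand back with each search result (also an assumed selection).
const STORE_FIELDS = [
  "cveID", "vendorProject", "product", "vulnerabilityName", "dateAdded",
];

// Build the index. `MiniSearch` is the class exported by the `minisearch`
// package, e.g. `import MiniSearch from "minisearch"` in a Vite project.
function buildKevIndex(MiniSearch, vulnerabilities) {
  const index = new MiniSearch({
    idField: "cveID",        // KEV entries have no `id` key, so use the CVE ID
    fields: SEARCH_FIELDS,
    storeFields: STORE_FIELDS,
  });
  index.addAll(vulnerabilities);
  return index;
}

// Run one search; an empty box yields an empty result set.
function searchKev(index, query) {
  const q = query.trim();
  if (q === "") return [];
  // Prefix matching gives the search-as-you-type feel; fuzzy tolerates typos.
  return index.search(q, { prefix: true, fuzzy: 0.2 });
}

// Browser wiring: fetch the JSON, index it, and react to keystrokes.
// `url`, `input`, and `render` are hypothetical and supplied by the caller.
async function initKevSearch(MiniSearch, { url, input, render }) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`KEV fetch failed: ${response.status}`);
  const { vulnerabilities } = await response.json();
  const index = buildKevIndex(MiniSearch, vulnerabilities);
  input.addEventListener("input", () => render(searchKev(index, input.value)));
  return index;
}
```

Keeping `buildKevIndex` and `searchKev` free of any DOM code means they can be poked at in a console or a test without a page around them; only `initKevSearch` touches `fetch` and the input element.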
I've got two different versions of it:
This minimally styled one which uses Vite, Tachyons (CSS), and Lit, with the project broken up into separate files.
These are what they look like:
For the rest of the WPE breakdown, I'll need to shunt you over to the companion post.
You can also hit the GitLab repo where the code lives. ☮️