YaCy; SIMD Sort; DNS Toys
I've harped on privacy for the past couple editions, and it occurred to me that one reason we're in a privacy pickle is that, at some point, "the internet" transitioned from a place where an inordinate number of individuals experimented with building and hosting their own infrastructure/services to one where millions of users (myself included) rely on centralized services. I used to run my own mail server, RSS reader, calendar server, chat server, and more, but don't anymore, mostly out of convenience. Sure, there are folks who still run their own infrastructure/services, but the fraction is much smaller than it used to be.
As noted in the previous edition, one service that is particularly problematic for privacy is the search engine, especially when it's run by evil corporations like Microsoft and Google, both of which also have their own browsers and tracker bits on more websites than one can count. As a result, both of them know more about you than you can really imagine. But we can't run our own search engines anymore, can we? The web is just too big, isn't it?
Meet YaCy (GH), an open-source personal (or peer-to-peer) search engine that you can run on your own system (including desktops/laptops). It's Java-based, has a Docker version, and is easy to set up. I seldom use the term "easy" lightly, and I didn't here. I had a YaCy node up in about 90 seconds (hrbrsrch is my node ID if you set one of your own up), and it only took that long because the YaCy download was slower than expected. I figured out how to index my rud.is website shortly thereafter (there's a big "crawler" button that takes you to a page with a big "enter address" input box), and the crawl finished quickly.
You can just run and search on your own server for ultimate privacy, or use the peer-to-peer option to search the distributed hash table (DHT). A DHT is (to lift from Wikipedia) "a class of a decentralized distributed system that provides a lookup service similar to a hash table; (key, value) pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption."
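To make that "minimal amount of disruption" property concrete, here is a toy consistent-hashing ring in Python. It is a generic sketch, not YaCy's actual DHT scheme; the names `ring_position`, `Ring`, and `keys_moved_by_adding`, and the choice of a SHA-1-based 32-bit ring, are illustrative assumptions. Each key belongs to the first node at or after the key's hash on the ring, so a newly added node can only take over keys from its immediate neighbor.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Hash a string onto a 32-bit ring (SHA-1 is an arbitrary, illustrative choice)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:4], "big")


class Ring:
    """Toy consistent-hashing ring; hypothetical, not YaCy's implementation."""

    def __init__(self, nodes):
        self.positions = []  # sorted ring positions of the nodes
        self.nodes = []      # node names, kept parallel to self.positions
        for node in nodes:
            self.add(node)

    def add(self, node):
        pos = ring_position(node)
        i = bisect.bisect_left(self.positions, pos)
        self.positions.insert(i, pos)
        self.nodes.insert(i, node)

    def owner(self, key):
        # First node at or after the key's position, wrapping past the end of the ring.
        i = bisect.bisect_left(self.positions, ring_position(key))
        return self.nodes[i % len(self.nodes)]


def keys_moved_by_adding(ring, new_node, keys):
    """Add new_node to the ring and report which keys changed owner (and to whom)."""
    before = {k: ring.owner(k) for k in keys}
    ring.add(new_node)
    return {k: ring.owner(k) for k in keys if ring.owner(k) != before[k]}


ring = Ring(["peer-a", "peer-b", "peer-c"])
moved = keys_moved_by_adding(ring, "peer-d", [f"word{i}" for i in range(1000)])
print(f"{len(moved)} of 1000 keys moved, new owners: {set(moved.values())}")
```

Whatever the exact count, every key that moves lands on `peer-d`; the only keys affected are the ones the new node takes over, and everything else stays where it was.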
Their FAQ should answer most of your questions about how YaCy works, the privacy protections built into the peer-to-peer component, and more. There's also a paper [PDF] on YaCy if you want to know more about the inner workings.
I'll report back on YaCy in future editions as I work with it more and figure out how to get a certificate on it.
One thing to be cautious about: while YaCy's examples have you hitting localhost, it binds to all addresses, so anyone who can reach your machine can see your open port 8090. If you run it on your desktop/laptop, they also have instructions for poking a port hole in your router (which you may or may not want to do). Make sure to set the password on it and/or read up on how to change that binding behavior. This "security" stuff is one big reason folks tend to just use centralized services and give up their privacy.
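For the curious, "binds to all addresses" means the listener is attached to the wildcard address `0.0.0.0` rather than the loopback address `127.0.0.1`. This minimal Python sketch is not YaCy code: `listen` is a hypothetical helper, and it asks the OS for a free port instead of using 8090 so it won't clash with a running node. It simply shows what each choice of bind address looks like.

```python
import socket


def listen(host: str) -> socket.socket:
    """Open a TCP listener on `host`; port 0 lets the OS pick any free port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((host, 0))
    s.listen()
    return s


# 0.0.0.0 accepts connections on every IPv4 interface, so other machines on the
# network (or the internet, if a router forwards the port) can reach it.
# 127.0.0.1 accepts connections from this machine only.
for host in ("0.0.0.0", "127.0.0.1"):
    with listen(host) as s:
        print(host, "->", s.getsockname())
```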
When one sees a sentence like:
"Today we're sharing open source code that can sort arrays of numbers about ten times as fast as the C++ std::sort, and outperforms state of the art architecture-specific algorithms, while being portable across all modern CPU architectures"
one cannot help but be curious, especially if one has been programming computers their entire adult life.
This week, Google released an open-source, SIMD-vectorized, performance-portable Quicksort implementation, promising to sort at 1 GB/s on a single CPU core.
The linked blog post explains things well enough that I can avoid taking up storage and bandwidth with any further exposition.
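That said, the core trick is easy to sketch. A vectorized quicksort partitions a whole register's worth of elements at once: a vector compare produces a lane mask, and a "compress" operation packs the selected lanes contiguously, so there is no per-element branch to mispredict. The Python below is only a scalar simulation of that idea under stated assumptions (`LANES`, `compress`, `partition_blocks`, and `quicksort` are hypothetical names, and it uses a three-way, out-of-place partition for simplicity). It is not Google's code, which works in place with real SIMD instructions.

```python
LANES = 8  # pretend vector width, e.g. eight 32-bit lanes in a 256-bit register


def compress(block, mask):
    """Pack the lanes whose mask bit is set into a contiguous run (a SIMD 'compress')."""
    return [v for v, keep in zip(block, mask) if keep]


def partition_blocks(values, pivot):
    """Three-way partition, processing one LANES-wide block at a time."""
    less, equal, greater = [], [], []
    for start in range(0, len(values), LANES):
        block = values[start:start + LANES]
        lt = [v < pivot for v in block]  # stands in for one vector compare
        gt = [v > pivot for v in block]  # and a second one
        eq = [not (a or b) for a, b in zip(lt, gt)]
        less.extend(compress(block, lt))
        equal.extend(compress(block, eq))
        greater.extend(compress(block, gt))
    return less, equal, greater


def quicksort(values):
    if len(values) <= LANES:
        # Vectorized sorts typically hand small inputs to sorting networks;
        # the built-in sort stands in for that here.
        return sorted(values)
    pivot = sorted((values[0], values[len(values) // 2], values[-1]))[1]  # median of three
    less, equal, greater = partition_blocks(values, pivot)
    # equal is never empty (the pivot came from values), so each call makes progress.
    return quicksort(less) + equal + quicksort(greater)
```

For example, `quicksort([5, 3, 9, 1])` returns `[1, 3, 5, 9]`. The speedup in the real thing comes from each compare and compress touching many lanes in a single instruction, which a Python simulation obviously cannot show.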
I've always been a sucker for [ab]using DNS to do other things besides return answers to traditional lookups. So, when I heard about DNS Toys I had to both take a look and drop a link in this edition.
The website is all the documentation you need, but I'll whet your lookup appetite with a demonstration:
$ dig +short help @dns.toys
"get time for a city" "dig mumbai.time @dns.toys"
"convert currency rates" "dig 99USD-INR.fx @dns.toys"
"get your host's requesting IP." "dig ip @dns.toys"
"get weather forestcast for a city." "dig berlin.weather @dns.toys"
"convert between units." "dig 42km-cm.unit @dns.toys"
"convert numbers to words." "dig 123456.words @dns.toys"
Now, if you're at a command line and want to know the weather, your fingers don't have to leave the keyboard.
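If you'd rather have those answers inside a script than a terminal, they are ordinary DNS lookups, so the Python standard library is enough. This sketch hand-builds a query, sends it over UDP, and pulls out the TXT strings. It assumes dns.toys answers with TXT records when asked for TXT explicitly (`dig` asks for A records by default), and `build_query`, `txt_answers`, and `ask` are hypothetical helpers, not part of any dns.toys client. There are no retries, no TCP fallback for truncated replies, and only minimal validation.

```python
import random
import socket
import struct
from typing import List, Optional

TXT = 16  # DNS record type code for TXT (RFC 1035)


def build_query(name: str, qtype: int = TXT, qid: Optional[int] = None) -> bytes:
    """Encode a minimal DNS query: a 12-byte header plus one question."""
    qid = random.randrange(1 << 16) if qid is None else qid
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # RD=1, QDCOUNT=1
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS=1 (IN)


def skip_name(msg: bytes, off: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[off]
        if length == 0:
            return off + 1
        if length & 0xC0 == 0xC0:  # a 2-byte compression pointer ends the name
            return off + 2
        off += 1 + length


def txt_answers(msg: bytes) -> List[List[str]]:
    """Collect the character-strings of every TXT record in the answer section."""
    _, _, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", msg[:12])
    off = 12
    for _ in range(qdcount):
        off = skip_name(msg, off) + 4  # skip QTYPE and QCLASS
    records = []
    for _ in range(ancount):
        off = skip_name(msg, off)
        rtype, _, _, rdlen = struct.unpack("!HHIH", msg[off:off + 10])
        off += 10
        if rtype == TXT:
            strings, p = [], off
            while p < off + rdlen:  # RDATA is a run of length-prefixed strings
                n = msg[p]
                strings.append(msg[p + 1:p + 1 + n].decode("utf-8", "replace"))
                p += 1 + n
            records.append(strings)
        off += rdlen
    return records


def ask(name: str, server: str = "dns.toys") -> List[List[str]]:
    """Send one UDP query to `server` and return the TXT strings from the reply."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(5)
        s.sendto(query, (server, 53))
        reply, _ = s.recvfrom(4096)
    if reply[:2] != query[:2]:
        raise ValueError("reply ID does not match query ID")
    return txt_answers(reply)
```

Usage: `ask("berlin.weather")` or `ask("42km-cm.unit")`, which returns one list of strings per TXT record in the reply.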
DNS is a great way to bypass corporate firewall restrictions, too. Just ask any decent attacker.
Lots of toys to play with in today's edition! ☮