Drop #257 (2023-05-09): Advanced Encoding, Simple Markup, And Web Integrity
basE91; smu; Web Environment Integrity
Thankfully, there are no (known) security holes in any of the resources in today's Drop, though I will wax opinionated about whether you can truly trust the intentions espoused in the third section.
basE91
Standards are great, until they're not. What I mean by that is that humans ultimately control what “standards” are, and we humans are a cagey, self-centered lot, with tons of biases, and, ultimately, finite attention spans and awareness thresholds. As a result, we're stuck in the mud with a 15% less efficient message encoding mechanism thanks to the paid-off purveyors of RFC 2045. We can, and should, be better than our basest selves!
Bombastic introduction aside, chances are you've never heard of base 91 encoding, which uses all printable ASCII characters except “-” (0x2D), “\” (0x5C), and “'” (0x27). I hadn't, either, until stumbling across this WASM library for it.
Base 91 works by dividing the binary data into groups of 13 bits, which can represent any value between 0 and 8191. Since 91 × 91 = 8281 is a bit more than 8192, each group is written as two characters from a 91-character lookup table, and the spare headroom lets the encoder grab a 14th bit whenever the 13-bit value is 88 or less. The resulting string of ASCII characters is the base 91-encoded representation of the original binary data.
The lookup table (ref: section header) used in base 91 is designed to ensure that the encoded data does not contain any characters that may cause problems when transmitted over certain protocols or stored in certain file formats. For example, the lookup table excludes characters that may be interpreted as line breaks, carriage returns, or other special characters.
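For the curious, here is a minimal Python sketch of the encoding side. It follows my reading of the reference basE91 C code rather than any official spec, and the names `ALPHABET` and `base91_encode` are mine, not part of any library. It pulls 13 bits off a queue (14 when the 13-bit value is 88 or less) and writes each group as two characters from the lookup table.

```python
# The 91-character lookup table: position in the string is the symbol's value.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!#$%&()*+,./:;<=>?@[]^_`{|}~\""
)


def base91_encode(data: bytes) -> str:
    """Encode bytes as base 91 text (sketch modeled on the reference C code)."""
    queue = 0   # bits waiting to be encoded, least significant first
    nbits = 0   # number of valid bits in the queue
    out = []
    for byte in data:
        queue |= byte << nbits
        nbits += 8
        if nbits > 13:
            value = queue & 8191            # try a 13-bit group first
            if value > 88:
                queue >>= 13
                nbits -= 13
            else:                           # room to spare: take 14 bits
                value = queue & 16383
                queue >>= 14
                nbits -= 14
            # every group becomes two characters (low digit first)
            out.append(ALPHABET[value % 91])
            out.append(ALPHABET[value // 91])
    if nbits:                               # flush whatever is left over
        out.append(ALPHABET[queue % 91])
        if nbits > 7 or queue > 90:
            out.append(ALPHABET[queue // 91])
    return "".join(out)


print(base91_encode(b"test"))  # fPNKd
```

Treat it as a toy for seeing the bit-shuffling in action; for real work, reach for one of the maintained implementations.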
We use ASCII encodings of things to make them “safe-r” to transport over woefully outdated message exchange systems like email. You'd think we'd want these systems to be as efficient as possible, but you'd then, also, be very, very wrong.
Sure, the efficiency gains of base 91 are mostly seen in messages with higher byte counts. And, virtually nobody, save monsters, attempted to send gigabyte email attachments back when RFC 2045 was being cooked up. But, even on the smaller side of the email attachment size distribution, you can see the difference with your own eyes. Take this base 64 (top) and base 91 (bottom) encoding of some totally random text:
Ikxpc3RlbiwgYnVkLCIgc2FpZCBGb3JkLCAiaWYgSSBoYWQgb25lIEFsdGFpcmlhbiBkb2xsYXIg
Zm9yIGV2ZXJ5IHRpbWUgSSBoZWFyZCBvbmUgYml0IG9mIHRoZSBVbml2ZXJzZSBsb29rIGF0IGFu
b3RoZXIgYml0IG9mIHRoZSBVbml2ZXJzZSBhbmQgc2F5ICdUaGF0J3MgdGVycmlibGUnIEkgd291
bGRuJ3QgYmUgc2l0dGluZyBoZXJlIGxpa2UgYSBsZW1vbiBsb29raW5nIGZvciBhIGdpbi7igJ0K
Mi+<[["@/HAwPy3Pk"cQFlJB!/r@%*7F>5tE%Z>WzI,WT[;R1B+xYi.z+f5=TX1Ts!,alLdrFKL,
&eL%9tgQrmKBnGV/=HKUztk.8j~FYJ;WSk#F]4pE`dwa^gp@8e#F$&7mkLe5gZ5={Yi#@+_XFlZB
,2,Wzv;R%dJQ6p)z7g,W]@7R^,wPmL#49f=[(R$F!0wbHl<in<3W3B|%O;+x:m#4>vb,1RxS,/N.
hk>W_1f,U)x#509MjLXPM=8=U)2$I9N.hkVR/2:WMCN#Z%xYdt>dA
Your own, human, eyes can even see that there are ~20 fewer characters in the bestest ASCII binary-to-text encoding scheme ever invented!
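If you'd rather trust arithmetic than eyeballs, the Python sketch below estimates both lengths from the input size alone. The base 64 figure is exact (4 characters per 3 bytes, with padding). The base 91 figures are rough bounds that assume every two-character pair carries either 14 bits (best case) or 13 bits (the common case) and ignore the final partial group. The function names are made up for illustration.

```python
import math


def base64_chars(n_bytes: int) -> int:
    """Exact padded base 64 length: 4 characters per 3 input bytes."""
    return 4 * math.ceil(n_bytes / 3)


def base91_chars_estimate(n_bytes: int) -> tuple[float, float]:
    """Rough (best, common) base 91 length: 2 characters per 14 or 13 bits."""
    bits = 8 * n_bytes
    return (2 * bits / 14, 2 * bits / 13)


# Four unpadded 76-character base 64 lines correspond to 228 input bytes.
n = 228
low, high = base91_chars_estimate(n)
print(base64_chars(n))           # 304
print(round(low), round(high))   # 261 281
```

Put another way, base 64 inflates data by about 33%, while base 91 lands somewhere between roughly 14% and 23%.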
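The main base 91 link has source for C, Java, PHP, 8086 assembly, and AWK versions of base 91 tooling, and there are Rust crates and Golang modules for you to play with. R folks can use this decoder I hacked together:
base91_decode <- function(input_string) {
  # the base 91 alphabet; a character's position (minus one) is its value
  alphabet <- "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$%&()*+,./:;<=>?@[]^_`{|}~\""
  # decode table indexed by ASCII code; -1 marks characters outside the alphabet
  d <- rep(-1, 256)
  for (i in 1:91) {
    d[as.integer(charToRaw(substr(alphabet, i, i))[[1]])] <- i - 1
  }
  b <- 0   # bit queue
  n <- 0   # number of bits currently in the queue
  v <- -1  # first character of the current pair (-1 means none pending)
  output <- raw()
  for (ch in strsplit(input_string, "")[[1]]) {
    index <- d[as.integer(charToRaw(ch)[[1]])]
    if (index < 0) next  # skip anything that is not in the alphabet
    if (v < 0) {
      v <- index
    } else {
      # second character of the pair: rebuild the 13- or 14-bit value
      v <- v + index * 91
      b <- b + v * 2^n
      n <- n + ifelse(v %% 8192 > 88, 13, 14)
      # emit every complete byte sitting in the queue
      while (n > 7) {
        b <- b - (tmp_c <- b %% 256)
        output <- c(output, as.raw(tmp_c))
        b <- b / 256
        n <- n - 8
      }
      v <- -1
    }
  }
  # a lone trailing character holds the last byte; keep only its low 8 bits
  if (v >= 0) output <- c(output, as.raw((b + v * 2^n) %% 256))
  rawToChar(output, multiple = FALSE)
}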
as such:
base91_decode(
  paste0(
    readLines(
      con = "/path/to/some/base/91/encoded/file.txt"
    ),
    collapse = ""
  )
)
smu
As wisely heralded by the renowned “Another spec on the [doc] wall”:
We don't need more markdown formats
We don't need no more doc control
No more syntax to memorize
Developers: Leave our md's alone
Except that, of course we do!
Just like we humans seem to need to birth a new text editor every 24–48 hours, we're heck bent on “improving” upon established formats, such as markdown, and the tooling that processes said formats.
One such creation is smu, which is (stealing gobs of text from the SourceHut repo) a fork of the original smu by Enno Boland (gottox).
The main differences to the original smu are:
Support for code fences
Improved CommonMark compatibility. E.g.
Code blocks need four spaces indentation instead of three
Skip empty lines at end of code blocks
Ignore single spaces around code spans
Keep HTML comments in output
Improved spec compliance for lists
Nesting code blocks in blockquotes works
“Empty” lines in lists behave identically, no matter how much whitespace they contain
No backslash escapes in code blocks
Use the first number as the start number for ordered lists
Added a simple test suite to check for compliance and avoid regressions
The format differs from CommonMark in the following ways:
No support for reference style links
Stricter indentation rules for lists
Lists don't end paragraphs by themselves (blank line needed)
Horizontal rules (<hr>) must use - - - as syntax
Code fences have stricter syntax
You're one:
git clone https://git.sr.ht/~bt/smu && cd smu && make
away from using smu.
We're covering smu today, so I can cover something that uses smu in an upcoming Drop.
Web Environment Integrity
The internet is a dangerous place, but the most dangerous part is very likely the computing environment of the glowing rectangle you're reading this Drop on. We do all sorts of terrible things to our systems and browsing environments. Many of us do this unintentionally. However, there is a sizable crowd of mischievous malcontents who do truly terrible things to their compute environments to win games, hack sites, and cheat in many other ways. An honest website purveyor has almost no way to tell if their content is being executed in a safe space.
The Web Environment Integrity API is a proposed “solution” to the problem. It allows websites to request a token that attests key facts about the environment their client code is running in. For example, it could show a target site that a user is operating a web client on a secure Android device, and the tokens are cryptographically signed so that tampering with the attestation can be detected.
The trust relationship between websites and clients is frequently established through the collection and interpretation of highly re-identifiable information. Unfortunately, the signals that are considered essential for these safety use cases can also serve as a near-unique fingerprint that can be used to track users across sites without their knowledge or control. The proposed API aims to address these use cases with better privacy-respecting properties.
The proposal calls for at least the following information in the signed attestation:
the attester's identity
a verdict saying whether the attester considers the device trustworthy
the platform identity of the application that requested the attestation.
The proposal also discusses challenges and threats it needs to address, such as the quality of attesters, tracking of users' browser history, and cross-site tracking. The document concludes by asking the community group for feedback on the idea of a holdback, and for alternative suggestions that would allow both goals to be met.
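To make the “signed attestation” idea a bit more concrete, here is a toy Python sketch. It is emphatically not the proposal's token format or API. The field names, the JSON layout, and the shared demo key are all assumptions of mine. I've also swapped in an HMAC from the standard library where a real attester would presumably use an asymmetric signature. The only point is to show how a signature over the three fields lets a site notice tampering.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret, for demonstration only. A real attester would
# sign with a private key and publish the matching public key.
ATTESTER_KEY = b"demo-only-secret"


def sign_attestation(attester: str, verdict: str, platform: str,
                     key: bytes = ATTESTER_KEY) -> tuple[bytes, str]:
    """Serialize the three fields and return (payload, signature tag)."""
    payload = json.dumps(
        {"attester": attester, "verdict": verdict, "platform": platform},
        sort_keys=True,  # stable byte layout so the tag is reproducible
    ).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def verify_attestation(payload: bytes, tag: str,
                       key: bytes = ATTESTER_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


payload, tag = sign_attestation("example-attester", "trusted",
                                "com.example.browser")
print(verify_attestation(payload, tag))                        # True
tampered = payload.replace(b"trusted", b"dubious")
print(verify_attestation(tampered, tag))                       # False
```

Note what the signature does and doesn't buy you: it proves the attester said these things, not that the attester deserves your trust.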
This is very likely going to become a standard, and the four folks who put forth this proposal are Google employees. Let me be up front and apologize (on so many levels) if you, dear reader, work for Google. My default assumption is that if you work for Google, you're part of the problem (or, at least, aren't really concerned with your fellow humans as much as you may think or say you do). That may seem unfair, but I also hold the same default assumption if you work for Twitter and a number of other companies. I grok you may be handcuffed to said job. I also grok you may be trying to effect change by working at one of these orgs. That’s why my assumption is a default and not a permanent, hard-coded setting.
I don't know any of these four folks personally. So, I'm going to suggest that the real intent of this standard is an attempt to whitewash a power grab by Google to grasp more control over you, and gain even more ability to track what you do. Please feel encouraged to serve up a slice of humble pie if this does make it into Chrome/Chromium in a totally benign way and never gets abused.
FIN
Longer-time readers may recall a post about WASM-ified GraphViz back in August. The API changed and broke the example I threw together since I did not version pin it. Said example has now been un-broked. ☮
totally made this up
mimes do ruin everything, don't they?
one variant uses “\” (0x5C) in place of “"” (0x22)
do not even attempt to decode them. Who knows what will happen if you do!
again, under no circumstances should you run that on the above encoded bits.
there is no such thing as a secure android device
and, likely Microsoft since they use Chromium in Edge