

hrbrmstr's Daily Drop
2022-05-03.01
On {groundhog}s And Reproducible R Research; Uncurled; Kat's Mastodon Quickstart for Twitter Users
On {groundhog}s And Reproducible R Research
A recent blog post titled "Groundhog: Addressing The Threat That R Poses To Reproducible Research" appeared on the scene and has caused quite a stir in part of the R language social sphere.
The article is short, and the fundamental premise is that frequent changes in R package "APIs" can break things.
Yes.
The author's solution is their {groundhog} package (hopefully the irony of them positing using a package solving package-caused reproducibility problems is as "not lost" on you as it is "not lost" on me). In fact, the concept of sustaining reproducibility is large enough to warrant there being a Reproducible Research CRAN Task View with numerous ways to help ensure code re-execution consistency (hit the CTV page for links to each package):
checkpoint: Allows you to install packages as they existed on CRAN on a specific snapshot date as if you had a CRAN time machine.
groundhog: Make R scripts that rely on packages reproducible, by ensuring that every time a given script is run, the same version of the used packages are loaded.
liftr: Persistent reproducible reporting by containerization of R Markdown documents.
miniCRAN: Makes it possible to create an internally consistent repository consisting of selected packages from CRAN-like repositories.
packrat: Manage the R packages your project depends on in an isolated, portable, and reproducible way.
rbundler: Manages a project-specific library for dependency package installation.
renv: Create and manage project-local R libraries, save the state of these libraries to a ‘lockfile’, and later restore your library as required.
Require: A single key function, ‘Require’ that wraps ‘install.packages’, ‘remotes::install_github’, ‘versions::install.versions’, and ‘base::require’ that allows for reproducible workflows.
switchr: Provides an abstraction for managing, installing, and switching between sets of installed R packages.
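For a sense of what the date-pinned approach looks like in a bare script, here is a minimal sketch built on {groundhog}'s `groundhog.library()`. The `load_as_of()` wrapper, the package choice, and the snapshot date are all made up for illustration; none of them come from the blog post.

```r
# Sketch only: the package and date below are illustrative assumptions.
snapshot_date <- "2022-04-01"
pkgs <- c("jsonlite")

# Hypothetical helper: validate the date, then hand off to {groundhog}.
load_as_of <- function(pkgs, date) {
  # Fail early on a malformed date instead of letting the install step guess.
  parsed <- as.Date(date, format = "%Y-%m-%d")
  if (is.na(parsed)) stop("date must be YYYY-MM-DD")
  if (!requireNamespace("groundhog", quietly = TRUE)) {
    stop("install {groundhog} first")
  }
  # Loads the versions of `pkgs` that were current on CRAN on that date.
  groundhog::groundhog.library(pkgs, format(parsed))
}

# load_as_of(pkgs, snapshot_date)  # uncomment to run; needs network access
```

The point is only that the date, not the installed library, decides which package versions get loaded.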
The idea of reproducibility is also widely discussed, including:
Keeping science reproducible in a world of custom code and data [Wired]
Reproducible Research and Reports with R [American Association for Clinical Chemistry]
Reproducible Research in R [Monash University]
The issue of reproducibility is not just limited to R packages and R versions. The operating system environment can have an impact on whether you get the same results as the original crafter of the analyses. We can solve this as well (this is a non-exhaustive list but good starting points):
Reproducibility issues are not limited to R, either. Python has them (good starting point here), and even Matlab has them.
I'm not sure this concept needed a bombastic headline to stir up even more heated emotions/debate than already existed. At least the author provided some links to other resources.
One claim the author makes, which I vehemently disagree with, is that none of the other solutions work for plain ol' R scripts (i.e., outside of a project or package). I'd argue that they overlooked a resource noted in the above list: {miniCRAN}, which would let you create a local CRAN-like directory alongside a bare R script (which I would never suggest actually doing) and then ensure the local "mini-CRAN" is first in the package search path.
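A rough sketch of that {miniCRAN} idea follows. The helper names (`build_mini_cran()`, `use_local_repo()`), the mirror URL, and the directory name are my own placeholders. One caveat, as far as I know: when several repositories are configured, `install.packages()` picks the newest version it can find across all of them, so this sketch points the session at the local directory only, rather than merely listing it first.

```r
# Sketch only: package names, paths, and the mirror URL are illustrative assumptions.
cran_url <- "https://cloud.r-project.org"

# Download `pkgs` plus their dependencies into a CRAN-like directory tree.
build_mini_cran <- function(pkgs, path, repos = cran_url) {
  if (!requireNamespace("miniCRAN", quietly = TRUE)) {
    stop("install {miniCRAN} first")
  }
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  deps <- miniCRAN::pkgDep(pkgs, repos = repos, type = "source", suggests = FALSE)
  miniCRAN::makeRepo(deps, path = path, repos = repos, type = "source")
  invisible(deps)
}

# Point this R session at the local repository and nothing else.
use_local_repo <- function(path) {
  full <- normalizePath(path, winslash = "/", mustWork = TRUE)
  # Unix paths already start with "/"; Windows drive paths need the extra slash.
  prefix <- if (startsWith(full, "/")) "file://" else "file:///"
  options(repos = c(LOCAL = paste0(prefix, full)))
}

# Typical use (the first step needs network access):
# build_mini_cran("jsonlite", "mini-cran")
# use_local_repo("mini-cran")
# install.packages("jsonlite", type = "source")
```

The snapshot holds whatever versions were current when `build_mini_cran()` ran, so the directory has to travel with the script for this to help anyone else.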
The {groundhog} package was funded R&D by the school the author is affiliated with, and it's fine to have multiple ways to solve problems, but perhaps the next version of {groundhog} can be introduced with more finesse than flair.
Uncurled
It is likely that every single one of you reading this post has used or will use curl (in some way) today. Curl, and the libcurl library, power most anything that requires fetching network resources; from toasters, to televisions, to code.
Daniel Stenberg has been working on curl since he created it and fosters a strong community that continues to make curl better with each release.
Since Daniel has logged many, many years in the open source community, he finally took time to sit down and write Uncurled, a book about "everything [he knows] and learned about running and maintaining Open Source projects for three decades."
It covers quite a bit of ground:
Experience. Stories from half a dozen Open Source projects I have created or joined and spent a significant time and energy in.
Start. Some words and advice about getting started with Open Source.
People. I have learned something about working with humans after a while.
Project. Lessons and insights about project specific things.
Money. Conclusions about the monetary side of Open Source.
Source. The code is certainly key in a project and there are clues for us there.
Security. A really important area that we must never ignore.
Maintainer. What is it and what does it take to be a project maintainer?
Evolution. Let me take you with me on a little journey and show how Open Source has developed.
Life. Can you be a successful Open Source maintainer and have a life at the same time?
Emails. A collection of "interesting" questions and feedback that I received over the years.
It is also a phenomenal resource from one of the most impactful developers in modern history.
You can read Daniel's introduction to the project on his blog, and follow its development over at GitHub.
Kat's Mastodon Quickstart for Twitter Users
(Yes, the image is an elephant vs a mastodon, but image uploading was busted when I tried to post this so you’re left with what Substack had via Unsplash).
If you are one of us digital nomads who are considering abandoning Twitter due to Rocket Man's purchase of the platform, rest assured there are alternatives. I'll be presenting a few of them over the coming editions, but today I'm linking to a solid and quick "how to jump on the Mastodon" bandwagon.
Kat Marchán is the former tech lead/architect for the NPM CLI, and is now working for Microsoft on the NuGet package manager. Kat is also a Rust aficionado, so you know they're good people as well.
Kat's guide to grokking and using Mastodon is a great bootstrap for folks wanting to keep their options open. Kat provides options for using existing Mastodon instances and links to resources to run your own.
FIN
The edition is late today as it's been a super weird day with the Supreme Court leak, here in the U.S., and an extended FaceTime session with my 6-month-old grandson. I'll try to get Wednesday's out all the earlier. Remember to be kind in the comments! ☮