Despite this being the "no assumptions" year, I almost have to assume everyone who reads the Daily Drop knows about GitHub Actions. On the outside chance I'm wrong, "GitHub Actions" is an automation platform integrated with GitHub, enabling users to create custom workflows for continuous integration, deployment, and task automation within their repositories.
We've tapped into this subject before, but we'll cover it a bit differently today: first with a resource that will let you break free of the evil that is Microsoft and run actions on your own (it's just container orchestration, after all), then with a look at two useful actions you may want to add to your workflow library.
act
Proudly sporting the motto, "Think globally, act locally" right up front, act lets you run your GitHub Actions locally.
Why would you want to do this? Well, they have an answer:
Fast Feedback - Rather than having to commit/push every time you want to test out the changes you are making to your .github/workflows/ files (or for any changes to embedded GitHub actions), you can use act to run the actions locally. The environment variables and filesystem are all configured to match what GitHub provides.
Local Task Runner - I love make. However, I also hate repeating myself. With act, you can use the GitHub Actions defined in your .github/workflows/ to replace your Makefile!
act itself is a Golang binary that reads the workflow files in .github/workflows/ and works out which actions need to be run. You need to have a Docker environment installed (it failed to run under OrbStack), as it uses the Docker API to either pull or build the necessary images. It resolves all the workflow and container dependencies as well. Then, it runs a container for each action based on the images it has prepared. All your environment variables and filesystem requirements are configured to match what would be expected in the GH environment.
It's super straightforward to use (output reformatted b/c “Substack”):
$ act -l
Stage: 0
Job ID: scheduled
Job name: scheduled
Workflow name: Daily scraper of CISA KEV
Workflow file: scraper.yml
Events: workflow_dispatch,schedule
That's from my hrbrmstr/cisa-known-exploited-vulns repo, and that action grabs the KEV JSON from CISA every day. The act schedule command will kick it off (it's a tad more complex than that with some secrets involved, but that's all in the README).
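To make the discovery step behind that listing a bit more concrete, here is a minimal Python sketch of the idea: walk .github/workflows/ and report each workflow file along with its name. This is not how act does it (act is a Go binary and parses the YAML properly). The function name list_workflows, the naive top-level `name:` lookup, and the fallback to the file name are all assumptions made for illustration.

```python
from pathlib import Path


def list_workflows(repo_root):
    """Return (file name, workflow name) pairs found under .github/workflows/.

    Naive on purpose: it only looks for a top-level `name:` line instead of
    parsing YAML, so it is an illustration, not a replacement for `act -l`.
    """
    workflow_dir = Path(repo_root) / ".github" / "workflows"
    found = []
    if not workflow_dir.is_dir():
        return found
    for path in sorted(workflow_dir.iterdir()):
        if path.suffix not in (".yml", ".yaml"):
            continue
        # Assumption for this sketch: use the file stem when no `name:` key exists.
        name = path.stem
        for line in path.read_text(encoding="utf-8").splitlines():
            # A top-level key starts in column 0; job and step names are indented.
            if line.startswith("name:"):
                name = line[len("name:"):].strip().strip("'\"")
                break
        found.append((path.name, name))
    return found
```

Calling list_workflows(".") from the root of a repository returns a list of tuples, one per workflow file, which is roughly the "Workflow file" and "Workflow name" pair shown in the listing above.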
You have three runner options:
Large size image: +20GB Docker image, includes almost all tools used on GitHub Actions (IMPORTANT: currently only ubuntu-18.04 platform is available)
Medium size image: ~500MB, includes only necessary tools to bootstrap actions and aims to be compatible with all actions
Micro size image: <200MB, contains only NodeJS required to bootstrap actions, doesn't work with all actions
It's not perfect, and I've seen a few workflows fail, but it's nice to have as an option. I've ended up having to use the large size one, but storage is pretty inexpensive these days.
urlcheck
I 💙 semantically named tools! urlcheck (docs) is a GH action to "collect and check URLs in a project (code and documentation). The action aims at detecting and reporting broken links".
Their example workflow YAML isn't too big to put here, and it’s pretty self-documenting:
name: Check URLs
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: urls-checker
      uses: urlstechie/urlchecker-action@master
      with:
        # A subfolder or path to navigate to in
        # the present or cloned repository
        subfolder: docs
        # A comma-separated list of file types
        # to cover in the URL checks
        file_types: .md,.py,.rst
        # Choose whether to include file with
        # no URLs in the prints.
        print_all: false
        # The timeout seconds to provide to
        # requests, defaults to 5 seconds
        timeout: 5
        # How many times to retry a failed
        # request (each is logged, defaults to 1)
        retry_count: 3
        # A comma separated links to exclude during URL checks
        exclude_urls: https://github.com/SuperKogito/URLs-checker/issues/1,https://github.com/SuperKogito/URLs-checker/issues/2
        # A comma separated patterns to exclude during URL checks
        exclude_patterns: https://github.com/SuperKogito/Voice-based-gender-recognition/issues
        # choose if the force pass or not
        force_pass: true
This is an especially handy one to use when dealing with CRAN submissions, but it's also a good tool to run in larger projects, or repositories that are more focused on external content (think: "static blog sites").
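If you're curious what "collect and check URLs" boils down to, here's a rough, standard-library-only Python sketch of the idea. It is not the urlchecker implementation. The regular expression, the function names (extract_urls, fetch_status, is_alive), and the reading of retry_count as "total attempts" are assumptions for illustration; the timeout, retry_count, and exclude_patterns parameters just mirror the settings in the workflow above.

```python
import re
import urllib.error
import urllib.request

# Deliberately simple pattern: stop at whitespace, quotes, and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text, exclude_patterns=()):
    """Collect unique URLs from text, skipping any that contain an excluded pattern."""
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # trailing punctuation is rarely part of the link
        if any(pattern in url for pattern in exclude_patterns):
            continue
        if url not in urls:
            urls.append(url)
    return urls


def fetch_status(url, timeout):
    """Return the HTTP status code for url (raises OSError on network failure)."""
    request = urllib.request.Request(url, headers={"User-Agent": "link-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses still carry a status code


def is_alive(url, timeout=5, retry_count=3, fetch=fetch_status):
    """Try url up to retry_count times; True on the first status below 400."""
    for _ in range(retry_count):
        try:
            if fetch(url, timeout) < 400:
                return True
        except OSError:
            pass  # DNS failure, refused connection, timeout: try again
    return False
```

The fetch parameter is there so the retry logic can be exercised without touching the network; in real use you would leave it at its default and loop over extract_urls(...) for each file you care about.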
Automated Data Scraping
We mentioned shot-scraper in a previous Drop, but Simon's epic tooling isn't the only route to go when using GitHub Actions for scraping data.
swyx has a short, focused post that outlines the general use case, provides an example idiom to follow, and discusses the pros/cons of engaging in this endeavor.
Along with the CISA KEV scraper, I have a few others running on GitHub's dime since I know Microsoft can well afford it. Having version-controlled data is a super nice plus.
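The heart of any such scraper is small: fetch the data, write it out in a stable format, and only touch the file when something actually changed, so the commit history stays meaningful. Below is a minimal Python sketch of that step. It is not the code from my scraper repo or from swyx's post; the function names (fetch_json, save_if_changed, scrape) and the choice of sorted, indented JSON are assumptions for illustration.

```python
import json
import urllib.request
from pathlib import Path


def fetch_json(url, timeout=30):
    """Download url and parse the body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def save_if_changed(data, path):
    """Write data as stable, pretty-printed JSON; return True only if the file changed."""
    path = Path(path)
    # Sorted keys and fixed indentation keep diffs small and meaningful.
    rendered = json.dumps(data, indent=2, sort_keys=True) + "\n"
    if path.exists() and path.read_text(encoding="utf-8") == rendered:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(rendered, encoding="utf-8")
    return True


def scrape(url, path, fetch=fetch_json):
    """Fetch url and store the result at path, reporting whether anything changed."""
    return save_if_changed(fetch(url), path)
```

A scheduled workflow would run something like scrape("https://example.com/feed.json", "data/feed.json") (a placeholder URL and path) and then commit only when it returns True. The fetch argument exists so the function can be tried out with canned data instead of a live request.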
FIN
Today's Drop title was chosen b/c — for some reason — I really liked that Star Trek episode when I was a kid. ☮
#act #urlcheck #github #githubactions #scraping #webscraping