hrbrmstr's Daily Drop

Drop #226 (2023-03-23): A Piece Of The Action(s)

act; urlcheck; Automated Data Scraping

boB Rudis
Mar 23, 2023

Despite this being the "no assumptions" year, I almost have to assume everyone who reads the Daily Drop knows about GitHub Actions. On the outside chance I'm wrong, "GitHub Actions" is an automation platform integrated with GitHub, enabling users to create custom workflows for continuous integration, deployment, and task automation within their repositories.

We've tapped into this subject before, but we'll cover it a bit differently today, first with a resource that will let you break free of the evil that is Microsoft and run actions on your own (it's just container orchestration, after all), then take a look at two useful actions you may want to add to your workflow library.

act

Photo by Steven Lelham on Unsplash

Proudly sporting the motto, "Think globally, act locally" right up front, act lets you run your GitHub Actions locally.

Why would you want to do this? Well, they have an answer:

  • Fast Feedback - Rather than having to commit/push every time you want to test out the changes you are making to your .github/workflows/ files (or for any changes to embedded GitHub actions), you can use act to run the actions locally. The environment variables and filesystem are all configured to match what GitHub provides.

  • Local Task Runner - I love make. However, I also hate repeating myself. With act, you can use the GitHub Actions defined in your .github/workflows/ to replace your Makefile!

act itself is a Golang binary that reads the workflow files in .github/workflows/ and works out which actions need to be run. You need to have a Docker environment installed (it failed to run under OrbStack), as it uses the Docker API to either pull or build the necessary images. It also resolves all the workflow and container dependencies, then runs a container for each action based on the images it has prepared. Environment variables and the filesystem are configured to match what the GH environment provides.

It's super straightforward to use (output reformatted b/c “Substack”):

$ act -l
        Stage: 0
       Job ID: scheduled
     Job name: scheduled
Workflow name: Daily scraper of CISA KEV
Workflow file: scraper.yml
       Events: workflow_dispatch,schedule

That's from my hrbrmstr/cisa-known-exploited-vulns repo, and that action grabs the KEV JSON from CISA every day. The act schedule command will kick it off (it's a tad more complex than that with some secrets involved, but that's all in the README).

You have three runner options:

  • Large size image: +20GB Docker image, includes almost all tools used on GitHub Actions (IMPORTANT: currently only ubuntu-18.04 platform is available)

  • Medium size image: ~500MB, includes only necessary tools to bootstrap actions and aims to be compatible with all actions

  • Micro size image: <200MB, contains only NodeJS required to bootstrap actions, doesn't work with all actions

It's not perfect, and I've seen a few workflows fail, but it's nice to have as an option. I've ended up having to use the large size one, but storage is pretty inexpensive these days.
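Picking a runner image comes down to act's -P flag, which maps a runs-on label to a Docker image. Here is a minimal sketch of a wrapper around that, assuming the workflow uses runs-on: ubuntu-latest. The act_cmd helper and the ACT_RUN variable are hypothetical names made up for illustration, and the image name is an assumption (recalled from the act README as the medium image), so check the current docs before relying on it.

```shell
# Build an act command line for one event, optionally pinning the runner image.
# Prints the command by default; set ACT_RUN=1 to execute it instead.
# act_cmd and ACT_RUN are illustrative names, not part of act itself.
act_cmd() {
  event="$1"
  image="${2:-}"
  set -- act "$event"
  if [ -n "$image" ]; then
    # -P maps a runs-on label to the Docker image act should use for it.
    set -- "$@" -P "ubuntu-latest=$image"
  fi
  if [ "${ACT_RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Show the commands without running anything.
act_cmd schedule
# Image name is an assumption; substitute whichever image you settled on.
act_cmd schedule catthehacker/ubuntu:act-latest
```

Printing first and running second makes it easy to see exactly what act will be asked to do before a large image gets pulled.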

urlcheck

Photo by Agence Olloweb on Unsplash

I 💙 semantically named tools! urlcheck (docs) is a GH action to "collect and check URLs in a project (code and documentation). The action aims at detecting and reporting broken links".

Their example workflow YAML isn't too big to put here, and it’s pretty self-documenting:

name: Check URLs

on: [push]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3
    - name: urls-checker
      uses: urlstechie/urlchecker-action@master
      with:
        # A subfolder or path to navigate to in 
        # the present or cloned repository
        subfolder: docs

        # A comma-separated list of file types
        # to cover in the URL checks
        file_types: .md,.py,.rst

        # Choose whether to include file with 
        # no URLs in the prints.
        print_all: false

        # The timeout seconds to provide to 
        # requests, defaults to 5 seconds
        timeout: 5

        # How many times to retry a failed 
        # request (each is logged, defaults to 1)
        retry_count: 3

        # A comma-separated list of links to exclude during URL checks
        exclude_urls: https://github.com/SuperKogito/URLs-checker/issues/1,https://github.com/SuperKogito/URLs-checker/issues/2

        # A comma-separated list of patterns to exclude during URL checks
        exclude_patterns: https://github.com/SuperKogito/Voice-based-gender-recognition/issues

        # Choose whether to force a pass even when some URLs fail
        force_pass: true

This is an especially handy one to use when dealing with CRAN submissions, but it's also a good tool to run in larger projects, or repositories that are more focused on external content (think: "static blog sites").
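To make the "collect and check" idea concrete, here is a rough shell sketch of the same two steps: pull URLs out of files, then see whether each one answers. This is not how urlchecker is implemented. The function names are hypothetical, the URL pattern is deliberately crude, and the 5 second timeout and 3 retries simply mirror the values in the workflow above.

```shell
# Print each unique http(s) URL found in the given files, one per line.
# The pattern is crude and will miss or mangle some edge cases.
extract_urls() {
  grep -ohE 'https?://[^[:space:]<>")]+' "$@" | sort -u
}

# Succeed only if the URL answers with an HTTP status from 200 to 399.
check_url() {
  status=$(curl --silent --location --max-time 5 --retry 3 \
    --output /dev/null --write-out '%{http_code}' "$1") || return 1
  [ "$status" -ge 200 ] && [ "$status" -lt 400 ]
}

# Report every broken link in the given files.
# Returns non-zero if at least one URL failed.
report_broken() {
  broken=0
  urls=$(extract_urls "$@")
  while IFS= read -r url; do
    [ -n "$url" ] || continue
    if ! check_url "$url"; then
      echo "BROKEN: $url"
      broken=1
    fi
  done <<EOF
$urls
EOF
  return "$broken"
}
```

Something like report_broken README.md docs/*.md would then fail the step whenever a link has rotted, which is roughly the behavior the action gives you with force_pass set to false.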

Automated Data Scraping

Photo by Stephen Phillips - Hostreviews.co.uk on Unsplash

We mentioned shot-scraper in a previous Drop, but Simon's epic tooling isn't the only route to go when using GitHub Actions for scraping data.

swyx has a short, focused post that outlines the general use case, provides an example idiom to follow, and discusses the pros/cons of engaging in this endeavor.

Along with the CISA KEV scraper, I have a few others running on GitHub's dime since I know Microsoft can well afford it. Having version-controlled data is a super nice plus.
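The general shape of such a scraper step is: download, compare with what is already tracked, and commit only when something changed. Below is a rough sketch of that shape, not swyx's code and not the KEV scraper's. DATA_URL, OUT_FILE, and both function names are hypothetical placeholders.

```shell
# Hypothetical feed URL and output path; substitute your own.
DATA_URL="${DATA_URL:-https://example.com/feed.json}"
OUT_FILE="${OUT_FILE:-data/feed.json}"

# Copy the fresh download over the tracked file only when the content differs.
# Returns 0 if the tracked file was updated, 1 if nothing changed.
update_if_changed() {
  fresh="$1"
  dest="$2"
  if [ -f "$dest" ] && cmp -s "$fresh" "$dest"; then
    return 1
  fi
  mkdir -p "$(dirname "$dest")"
  cp "$fresh" "$dest"
}

# Download the feed and commit it, skipping the commit when the data is unchanged.
scrape_and_commit() {
  tmp=$(mktemp)
  if ! curl --fail --silent --show-error --location --output "$tmp" "$DATA_URL"; then
    rm -f "$tmp"
    return 1
  fi
  if update_if_changed "$tmp" "$OUT_FILE"; then
    git add "$OUT_FILE"
    git commit -m "Data update $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    git push
  else
    echo "No change, nothing to commit"
  fi
  rm -f "$tmp"
}
```

In a workflow, scrape_and_commit would be called from a run: step after the checkout step, with git user.name and user.email configured first. Skipping no-op commits keeps the history readable, so each commit marks a real change in the data.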

FIN

Today's Drop title was chosen b/c — for some reason — I really liked that Star Trek episode when I was a kid. ☮

#act #urlcheck #github #githubactions #scraping #webscraping
