When I started in computer science, one thing that fascinated me about security is that “secure” is not really an end-state, but a process. There’s an inherent asymmetry in this industry: to guard against attacks you must close off every possible avenue of attack, but to successfully exploit an application you only need to find one way in. Any non-trivial application left to collect dust will eventually become insecure as we learn about new and exciting ways in which people can exploit it.

If you’re a service team, these concerns might not be front of mind. Investing in security tooling and processes takes away time from being able to develop new features and fix existing bugs and maybe, just maybe, pay back some of that tech debt. Dependency updates need to be tested to prevent additional bugs, taking away even more of the critical resource of time. With all of that in mind, is it so bad to let a few versions go by?

As it turns out, vulnerability patching is no joke: the Canadian Centre for Cyber Security ranks patching applications and operating systems as one of the top 10 security actions to take, placing it at #2. When you consider that 75% of attacks in 2020 used vulnerabilities known for at least 2 years, it becomes even more critical to keep software and dependencies up-to-date.

In this post I will go over some tools that you can use to scan dependencies and containers for vulnerabilities. We will also use GitHub Actions to automate these tools so that we get regular updates on the status of a service’s container image.

Preliminaries

If you are familiar with software development and containers as a whole, you can skip this section.

General Concepts

  • A software vulnerability is a flaw in a piece of software that leaves it exposed to potential attacks.
  • A Continuous Integration / Continuous Deployment (CI/CD) system lets you automate the testing, building, and deploying of applications.
  • GitHub Actions is a CI/CD platform integrated into GitHub.
  • A pipeline is a specific definition inside your CI/CD platform. It can be used to build, test, and release software, among other things. In GitHub Actions, a pipeline is synonymous with a workflow.
  • A branch is a concept from the version control system called git. Service deployments are usually done from the main branch, which is (appropriately) often called main.
  • A commit is a specific point in a branch. It effectively acts as a snapshot of your code, on a specific branch, at a specific moment in time.

Containers

I assume for this post some level of familiarity with container tools such as Docker. Below is a list of concepts that you need to know:

  • A container is a way to package software and its dependencies to help it be more portable.
  • Docker is a tool that lets you build and run containers.
  • An image is the immutable definition of the container. You can think of it as a template.
  • An image has some base image, which usually has some tools and software installed on it. Unless you are building an image from scratch, the base image you use will have been published by someone. Examples of this are alpine:latest and rust:1.67.
  • An image tag is some semantic name given to a specific version of the image. Tags are not immutable, meaning the image that is associated with a specific tag can change.
  • A container registry allows you to upload container images, which can then be distributed. GitHub’s integrated container registry is part of GitHub Packages.
  • A container image is “uniquely” identified by its hash, also called its digest. The hash is computed with a specific version of the Secure Hash Algorithm (SHA), typically SHA-256.

Required Software

In this post I will assume you have Docker installed. I also mention Rust tooling for vulnerability scanning, but I don’t walk through a practical example of it.

You should be familiar with how to pull an image from a registry if you would like to follow along.

Dependency Scanning

A key benefit of having security be so front-of-mind in modern development is that there has been a lot of tooling developed specifically for improving security of applications. In particular, most languages have some sort of dependency scanning tool that compares your project’s dependencies against a list of known-vulnerable packages in your ecosystem.

For Rust, one such tool is cargo-audit, which uses the Rust Advisory Database to check for vulnerabilities. It integrates very nicely with cargo, the Rust package manager.

To install it, run cargo install cargo-audit, then run a check with cargo audit from your project directory. The tool also has additional options that let you customize your scan more granularly.
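As a minimal sketch of how those two commands might sit in a script, the helpers below wrap them in shell functions. The function names are my own invention. The --locked and --deny warnings flags are existing options of cargo install and cargo audit respectively, but whether you want the stricter behaviour is a judgment call for your project.

```shell
# Install the scanner once per machine. --locked builds it with the
# dependency versions recorded in its own lockfile.
install_cargo_audit() {
  cargo install cargo-audit --locked
}

# Run from the directory that contains your Cargo.lock.
# cargo audit exits non-zero when a known-vulnerable dependency is found;
# --deny warnings additionally fails on warnings such as unmaintained
# or yanked crates. Extra arguments are passed straight through.
audit_deps() {
  cargo audit --deny warnings "$@"
}
```

Calling audit_deps with no arguments runs the strict check; dropping --deny warnings from the function gives you the default behaviour described above.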

Container Scanning with Trivy

You might assume that code dependencies are the only thing you have to worry about scanning, but sadly in the world of containers this isn’t really true.

Most base images available are based off of some operating system, and give you many of the same tools. For example, by using the ubuntu:latest image, you have access to a whole suite of tools. These are not dependencies that you explicitly opted in to, but ones that came bundled in.

Beyond that, if you use a language that requires an interpreter (like Python or JavaScript) or some additional runtime engine (like Java or C#), these will also have their own set of dependencies that they need to run. This introduces additional avenues for vulnerabilities that would not be caught by a dependency scanner.

Thankfully, container scanning tools are also fairly prevalent. One common one is a tool called trivy, which lets you (among other things) scan container images for vulnerable packages, tools, libraries, etc.

Manually Scanning Images

First, you should install Trivy in whatever way makes the most sense for your system and use case. I will assume that however you’ve installed it, you can access it from your command line with the trivy command.

Next, let’s pull an image with some known vulnerabilities:

docker pull python:3.7-slim-stretch

Next, let’s run a trivy scan:

trivy image python:3.7-slim-stretch

You should see a table output with lots of vulnerabilities listed (by code), the package that is vulnerable, its severity rating, what version you have installed, and the minimum version needed to fix the vulnerability.
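On an older image like this one, the full table can run long. As a minimal sketch, the helper below narrows the scan to the findings you are most likely to act on. The name scan_image is something I made up for illustration; --severity, --ignore-unfixed, and --exit-code are existing options of trivy image.

```shell
# Hypothetical helper: scan one image, reporting only HIGH and CRITICAL
# findings that already have a fixed version available.
# --exit-code 1 makes trivy exit non-zero when such findings exist,
# which is what lets a script or CI job fail on them.
scan_image() {
  trivy image \
    --severity HIGH,CRITICAL \
    --ignore-unfixed \
    --exit-code 1 \
    "$1"
}

# Example: scan_image python:3.7-slim-stretch
```

Filtering like this trades completeness for signal: lower-severity and unfixed findings are still there, they just are not shown.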

Now, let’s run a scan on an image that has no vulnerabilities (at the time of writing). Note, there’s a chance that when you are reading this, the image might be vulnerable. I’ve picked Alpine for this purpose because their vulnerability patching tends to be very timely.

docker pull alpine:latest
trivy image alpine:latest

Running Regular Scans

Now that we’re able to scan our dependencies and containers for known vulnerabilities, it’s time to add some automation. While it’s lovely to be able to do the process manually, leaving this task for developers will add overhead that you just don’t need, or it will mean that it doesn’t get done.

Besides, I’m very lazy. If a computer can do something for me instead, that is preferable.

A few things to keep in mind as we start this process:

  • Regular scanning is nice, but you also need to have a plan for when vulnerabilities are found.
    • It’s possible for a dependency to release a fix for a vulnerability alongside a breaking change. Good testing will allow you to catch this and address necessary changes.
    • If you prevent deployments from happening when the images fail a vulnerability scan, you could inadvertently prevent critical fixes from being deployed in your own service. Have a plan in mind for how to deploy hotfixes even if there is a dependency you can’t upgrade yet
  • Not all vulnerabilities are created equal, but the security team won’t love you if you point that out
    • Especially as teams and companies grow, it’s harder and harder to ensure that whoever is in charge of security has all the context required to accept the risk of a specific vulnerability.
    • The easiest way to ensure that your software isn’t vulnerable even by transitivity is to not let any part of it have an unpatched known vulnerability.
  • Your release cycle needs to be considered when setting the frequency and type of scans

For web services I’ve written as personal projects, I like to do three types of scans as part of my nightly scans:

  1. Scan the dependencies to see if anything needs to be updated
  2. Scan the containers we most recently deployed (the latest tag, usually). These should be the containers that are running in production.
  3. Build new release images at the latest commit of the main branch, and then scan those images.

This is by no means an exhaustive list (we could, for example, scan the files for secrets), but is a very good foundation.
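To make the three scans concrete, here is a rough local equivalent as a shell function. Treat it as a sketch under assumptions: the image name is a placeholder (ghcr.io/example/websvc-rs is hypothetical), the function name and the nightly tag are my own, and it assumes cargo-audit, Docker, and Trivy are all installed and that you run it from the root of the repository.

```shell
# Hypothetical image name; override with your own registry path.
IMAGE="${IMAGE:-ghcr.io/example/websvc-rs}"

nightly_scan() {
  local failed=0

  # 1. Scan the code dependencies.
  cargo audit || failed=1

  # 2. Scan what was most recently deployed.
  { docker pull "$IMAGE:latest" &&
    trivy image --exit-code 1 --ignore-unfixed "$IMAGE:latest"; } || failed=1

  # 3. Build from the current checkout, then scan the result.
  { docker build -t "$IMAGE:nightly" . &&
    trivy image --exit-code 1 --ignore-unfixed "$IMAGE:nightly"; } || failed=1

  # Non-zero if any of the three scans (or their setup) failed.
  return "$failed"
}
```

Each scan runs even if an earlier one fails, so a single run reports everything that needs attention.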

A Note On Published vs Built Containers

It’s quite common for the most recent commit to the main branch to also be what was most recently deployed. You might be wondering, then: why do both scans?

This is where release cycles come into play: not only your own, but also that of whatever container dependencies you use in your project!

It’s reasonably common to use version tags that get updated, such as rust:1.67 (which gets updated on patch versions), or latest tags. If that is the case in your service, it is possible to have the most recently deployed version be flagged as vulnerable while the most recently built container is not, because the fresh build picked up a newer, patched base image.

If you use tags that don’t tend to change (like rust:1.67.1) or use SHA hashes for your containers, doing both scans won’t give you anything different. In those cases, you’ll need to upgrade the dependency manually.
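If you do want to pin by hash, you first need to know what a tag currently points to. The helper below is a minimal sketch of one way to look that up; image_digest is a name I made up, and it assumes the image was pulled from a registry, since locally built images have no registry digest recorded.

```shell
# Print the first registry digest recorded for a locally available image.
# Assumes the image was pulled from a registry beforehand
# (e.g. with `docker pull alpine:latest`).
image_digest() {
  docker image inspect --format '{{index .RepoDigests 0}}' "$1"
}

# Example: image_digest alpine:latest
```

The output has the form name@sha256:<hex>, which you can use in place of a tag. Unlike a tag, that reference will not silently start pointing at a different image.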

Automating the Scans

Since I use GitHub as my primary code storage, I tend to use GitHub Actions for automation. Thankfully, it supports running workflows on a schedule, using a syntax similar to cron, which we can leverage.

Below is the workflow file definition used for my websvc-rs repository template, which covers all three of the discussed scans. I have annotated it with additional explanations of what each job does.

It’s important to keep in mind that, if your service gets released in multiple ways per version, you should scan all of them. In this example, websvc-rs releases 4 images for every version, broken down into:

  • One version compiled with musl that is based off of scratch, and includes nothing except the service and a tool to check that the service is healthy, tagged latest
  • One version compiled with musl that is based off of alpine:latest that includes additional tools on top of what I’ve built (i.e. a shell, a package manager, etc), tagged debug
  • One version compiled with glibc that is based off of scratch, tagged latest-gnu
  • One version compiled with glibc that is based off of alpine:latest, tagged debug-gnu

Each of these versions should be scanned.


# .github/workflows/nightly-scan.yml
name: nightly-scan

on:
  schedule:
    # 7 AM UTC every day. Roughly translates to 2 AM on the US east coast
    - cron: "0 7 * * *"
  # Allows the workflow to be manually triggered
  workflow_dispatch:

env:
  # Github container registry's domain
  REGISTRY: ghcr.io
  # The name of the image. I always use the repository
  # name when only publishing one container image
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # Job to run the audit scan for `cargo` dependencies
  cargo-audit:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Install rust tool chain
        uses: dtolnay/rust-toolchain@master
        with:
          toolchain: stable
          components: rustfmt, clippy

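      # Assumes cargo-audit is already available on the runner image.
      # If it is not, add a step that runs `cargo install cargo-audit` first.
      - name: Run the security scanner
        run: cargo audit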

  # Job to build an image from the latest commit &
  # scan the resulting image. This is because the
  # service uses `alpine:latest` for debug containers.
  #
  # Prod targets are still scanned despite being based
  # off of `scratch` in case the basis changes. For example,
  # if instead of `scratch` I ended up needing to use one of
  # the distroless images, they will contain some dependencies
  sha-scan:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        target:
          - prod
          - debug
          - prod-gnu
          - debug-gnu
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Build the docker container
        run: |
          docker build -t ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
            --target ${{ matrix.target }} \
            .
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: "${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}"
          format: "table"
          exit-code: "1"
          ignore-unfixed: true

  # Pulls from Github Packages each of the 4 images that
  # are released, and scans them.
  registry-image-scan:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        tag:
          - latest
          - latest-gnu
          - debug
          - debug-gnu
    steps:
      - name: Pull the docker image to scan
        run: docker pull ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ matrix.tag }}

      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: "${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ matrix.tag }}"
          format: "table"
          exit-code: "1"
          ignore-unfixed: true

Now, any time this workflow fails, I will get a GitHub notification and be able to see what went wrong and, potentially, run a new deployment to fix it!

Why Scan scratch Images?

You might be wondering why I bother scanning the latest and latest-gnu images. After all, if they only have the service and the health check, won’t they always come back clean when our service doesn’t have a vulnerability itself?

Yes, that is exactly correct. However, this assumes that these images are currently based off of scratch and will always be based off of scratch. Similarly, it assumes that nothing else will ever get added to the image.

If any of these assumptions change down the line, we would have to update our nightly scanner, which means we would also have to remember to do so. However, if we scan these images anyway (even when we don’t strictly need to), we are covered when our initial assumptions no longer hold.

Wrapping Up

In this post we talked about the importance of security scanning, and gave context to the different methods of scanning. We covered an example of scanning two real containers for vulnerabilities, and set up a GitHub Actions workflow to run regularly scheduled scans.

Questions to Reflect On

  1. Since we can use a workflow file to deploy our application, we could have this nightly scan deploy a new version from the newly-built containers if the existing ones are vulnerable and the new ones are not. What are some potential benefits to doing that? What are some potential drawbacks?
  2. In this example, I configure Trivy to ignore unfixed issues. What are some benefits of doing this? What are the drawbacks?
  3. In this post, we only covered one specific dependency scanning tool, and only for one programming language. If Rust is not your primary language, what dependency scanning tool is the most popular one for your language of choice?
  4. If Trivy can scan so many targets, including filesystems and git repositories, why did we go over how to run a dependency scan using cargo-audit? Are there (security or otherwise) benefits to using it instead of Trivy?