Deploying apps via GitHub and Docker.

This is mostly for self-documentation, because it's pretty convenient and I remember having trouble - heh.

Created: Jul 27, 2023



Been a while.

From a bird’s-eye view, the way this site gets deployed now is that I push a commit, which triggers a workflow that copies the build artifacts over to the server and then runs them.

It’s incredibly convenient, but there’s some work in setting the whole thing up - so let’s start simple.

The Dockerfile

Docker is great for replicating an environment so that you don’t have to fiddle around trying to set everything up yourself. Instead of manually configuring each machine - which might be on a different OS, missing some packages, or different in any number of other ways - you have a reproducible container that should - technically - work anywhere Docker is installed.
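
For context, using Docker by hand boils down to something like this - my-site is just a placeholder name, and the run step assumes the image actually defines a command (mine doesn’t; compose supplies it, as we’ll see below):

docker build -t my-site .          # build an image from the Dockerfile in the current directory
docker run -p 3000:3000 my-site    # run a container from it, exposing port 3000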

As is, the Dockerfile I have right now is… really barebones.

FROM node:16

WORKDIR /usr/src/app

COPY . .

At least, for the frontend. For the backend, it’s a bit more complicated:

FROM node:lts-slim

WORKDIR /usr/src/app

# We don't need the standalone Chromium
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true

# Flag for code to use
ENV CHROME_PATH /usr/bin/google-chrome

# Install Google Chrome Stable and fonts
# Note: this installs the necessary libs to make the browser work with Puppeteer.
RUN apt-get update && apt-get install gnupg wget -y && \
  wget --quiet --output-document=- https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor > /etc/apt/trusted.gpg.d/google-archive.gpg && \
  sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' && \
  apt-get update && \
  apt-get install google-chrome-stable -y --no-install-recommends && \
  rm -rf /var/lib/apt/lists/*

COPY . .

Fun fact, I use Puppeteer for web scraping. The YouTube API is an infuriating nightmare that is obnoxious to deal with - and for cases where, for example, I want a real-time stream of data coming from an existing YouTube stream’s chat (my own, so I can make my own stream widgets), there is quite literally no usable API.

YouTube is stuck in the past and requires you to literally just poll for messages. Add to that the fact that you get a pitiful amount of quota, forcing you to poll very infrequently, and it becomes functionally useless. Instead, I just scrape the chat and build the service myself.

Of course, this is much harder than I’m making it seem, and I’ll probably write another article about that at some point - but for now all you need to know is that all the extra garbage in the backend Dockerfile is purely there to facilitate Puppeteer - commands run while building the container (which uses node:lts-slim, an existing image, as its base).
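
For what it’s worth, the CHROME_PATH flag from the Dockerfile is just there for the code to pick up. Roughly along these lines, assuming Puppeteer - the exact launch options are my own illustration, not a copy of my backend:

const puppeteer = require('puppeteer');

(async () => {
  // Point Puppeteer at the Chrome installed by the Dockerfile
  // instead of a bundled Chromium (which we told it to skip downloading).
  const browser = await puppeteer.launch({
    executablePath: process.env.CHROME_PATH,
    args: ['--no-sandbox'], // commonly needed when running inside a container
  });

  const page = await browser.newPage();
  await page.goto('https://www.youtube.com/');
  // ...scraping happens here...
  await browser.close();
})();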

Google makes it hard for you to like it.

Docker Compose

A Dockerfile is really nice… for a single container. But in an environment you probably want to set up more stuff. Maybe a database or two. Or maybe you want to do more but don’t want to fiddle around with the CLI or Docker syntax, and would rather settle for a YAML file.

And that’s where docker-compose.yml comes in - a file written in pure YAML that can be used to set up a whole environment. So let’s start with the frontend, where things are always simpler…

Frontend

version: "3"
services:
  web:
    build: "."
    command: node build
    volumes:
      - .:/usr/src/app
      - /usr/src/app/node_modules
    ports:
      - "3000:3000"

We’re using version 3 evidently, and then we’re defining a single service - web, because it’s the frontend. Duh.

We then specify where the dockerfile is - which happens to be in the same directory. You need to specify this, since you need a Dockerfile to… y’know. Build a docker container. This is why the dockerfile was so barebones by the way - it’s there to facilitate this.

After that, well, we need to bring the server up. node build looks like we’re building with Node, but actually we’re just running the script in build/index.js.

note:

Notice how it’s node build, and not something like npm run build. This is because by the time the container gets set up, we’ll only have the production code - the result of building this Svelte codebase. In other words, it’s the direct output of compilation. We’ll get into this in the workflow stage.
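
Concretely, by the time compose runs, the directory looks roughly like this - matching what the workflow archives later on (layout approximate):

node_modules/        # production dependencies (dev dependencies pruned)
package.json
build/               # compiled output - `node build` ends up running build/index.js
Dockerfile
docker-compose.yml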

Then there’s the volumes. These effectively map a directory in our system to one in the container. To explain further, let’s look into .:/usr/src/app.

This maps our current directory (.), to /usr/src/app in the container. If the current directory is just the production code, this means that our application will be stored in /usr/src/app on the container.

The second one is an anonymous volume. The reasoning behind it goes like this:

  1. Firstly, docker builds the image using the Dockerfile we specified before. The file may be barebones, but it still has two instructions:
    1. Setting the work directory to /usr/src/app - look familiar?
    2. Copying the contents of the current directory - which contains the built code - onto the current directory of the container - which is /usr/src/app as per the previous command.
  2. This is good, but we want to bind our directory to the container’s - in case we change a file on our system, say for a quick bugfix. That’s where .:/usr/src/app comes in - we’re binding our current directory to /usr/src/app.
  3. However, we don’t want all files to be bound. node_modules holds the dependencies, and we’d like those to stay the same throughout. So we specify an anonymous volume, /usr/src/app/node_modules - which tells Docker to leave that path alone when binding.

Now, full transparency here. I’m still a bit shaky on volumes and how they fully work. Looking around has helped me understand a bit, but it’s not 100%. The anonymous volume - at least in my case - could be extra.

Usually it’s there because people start with the uncompiled code and need to install all the dependencies during the Docker build. In that case, the volume is required since the host directory doesn’t contain node_modules - without it, all the dependencies installed while building the image would be overwritten by whatever you have locally (which is nothing, since no one commits node_modules, for multiple reasons).
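
If you want to sanity-check what actually ended up in the container, something like this works once the service is up (web being the service name from the compose file above):

docker compose exec web ls /usr/src/app/node_modules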

You can tell how confused I was while making this. I should’ve taken notes.

Lastly, we’re mapping ports. By default, the container doesn’t expose any ports - so we map port 3000 in the container to port 3000 on the server. This whitelisting is very useful for security - for cases where you want a certain port to just be inaccessible externally.
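
As an aside - and this is just a sketch, not something I’m doing here - compose also lets you bind a port to localhost only, so it’s reachable on the server itself but not from outside:

ports:
  - "127.0.0.1:3000:3000" # only reachable from the server, not externally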

Backend

version: "3"
services:
  backend:
    build: "."
    restart: unless-stopped
    command:
      - /bin/sh
      - -c
      - |
        node build/index.js
    volumes:
      - .:/usr/src/app
      - /usr/src/app/node_modules
    ports:
      - "1600:1600" # websocket endpoint
      - "1601:1601" # api endpoint
    environment:
      ENVFILE: /run/secrets/environment
      NODE_ENV: production
    secrets:
      - environment

secrets:
  environment:
    file: /root/.env

As always, the backend version is more complicated. In addition to build: "." (same reason - the Dockerfile is in the same directory), the backend service also has restart: unless-stopped. Why? Because if my backend fails critically I would rather have it restart than stay dead and keep the main site down until I notice. Of course, most failures I get happen at startup, where this doesn’t benefit me much - but still, it’s a precaution.

Then we have a list for command - mostly to be incredibly specific. The full command becomes /bin/sh -c 'node build/index.js' - which is similar to node build but relies a lot less on assumptions. Why is this one different? Honestly, it could just be rampant debugging, and it could probably be replaced with node build. It works well though, so I haven’t really given it much thought.

The volumes work similarly - it’s a node backend so that makes sense - same for ports, though in this case we have another one since I have both HTTP APIs and a websocket (and couldn’t find a way to make them work on the same port).

However, then we have environment and secrets. See, the backend is where all the spicy secrets go. For many APIs, you need some kind of authentication - your credentials - to actually hit them. I don’t need to tell you how bad it would be if those credentials were exposed. Since the backend lives on the server - which is safe, barring someone actually breaking into it - that’s where the secrets end up.

But putting your secrets in the code is a horrible idea for many reasons - code exists on git repositories and wherever you’ve cloned said repository - which means suddenly you have multiple points of weakness. Instead, they should be stored on the server itself.

The way I did this is by putting all my secrets in /root/.env, and then injecting them into the container - where they end up under /run/secrets/environment. Storing secrets in plaintext is usually bad - but in all honesty, the only way to expose these would be if someone got access to my server - at which point it’s frankly kind of a total game over anyways.

The truth is, storing them as plaintext in the server is safe - so long as you both secure your server and never commit said plaintext via version control. If you’re going to update them, do so manually by accessing your server and running commands directly.
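
On the code side, the ENVFILE variable from the compose file just tells the app where to load those secrets from. Something along these lines, assuming dotenv - the loading code and the secret name are illustrative, not lifted from my backend:

// Load secrets from wherever compose mounted them (/run/secrets/environment),
// falling back to a local .env during development.
require('dotenv').config({ path: process.env.ENVFILE || '.env' });

const apiKey = process.env.SOME_API_KEY; // hypothetical secret name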

The Workflow

Finally, comes the actual workflow. The reason why this all ends up automated on every push. This is where I stop posting the full file, and start posting snippets. It’s also where things stop being so different between the frontend and the backend.

Huge thanks to philo.dev who effectively served as my main source on how to do any of this.

name: Build Website

on:
  push:
    branches:
      - 'main'
  workflow_dispatch:

First the simple bit - name and triggers. All this shows is that the following workflow will run when a push happens on main - or if it’s manually started from GitHub itself.

jobs:
  setup:
    runs-on: ubuntu-latest
    outputs:
      DEPLOYMENT_MATRIX: ${{ steps.export-deployment-matrix.outputs.DEPLOYMENT_MATRIX }}
    steps:

Then the setup. It runs on ubuntu-latest - but it also defines an output. This will be used later during deployment, and elaborated on when we get there.

- name: Pull repository
  uses: actions/checkout@v3

- name: Setup Node
  uses: actions/setup-node@v3

- name: Installing dependencies
  run: npm i

- name: Building site
  run: npm run build

- name: Prune dev dependencies
  run: npm prune --production

The start of the workflow is simple enough - pull the repository, set up Node in order to build, install the needed dependencies, and build the actual code. The last step prunes dev dependencies, because we don’t need those - this is strictly for the production environment.

- name: Prepare artifact
  env:
    GITHUB_SHA: ${{ github.sha }}
  run: tar -czf "${GITHUB_SHA}".tar.gz node_modules package.json build Dockerfile docker-compose.yml

- name: Upload production artifacts
  uses: actions/upload-artifact@v3
  with:
    name: site-build
    path: ${{ github.sha }}.tar.gz

Then we create and upload the artifact. Why? Because the built code will not persist past this job, and it’s kind of the thing we’re actually deploying. Uploading it as an artifact persists it for the workflow, so that other jobs can use it.

But what do we call it? We need something unique, since otherwise it’ll conflict with other deployments on the server. So we just use the SHA of the commit - available via github.sha. Since the artifact is uploaded as a single unit, we archive all the files we actually need into a .tar.gz - and then the next step uploads it under the name site-build.
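
If you ever want to double-check what made it into the archive, tar can list the contents without extracting anything:

tar -tzf "${GITHUB_SHA}.tar.gz"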

- name: Export deployment matrix
  id: export-deployment-matrix
  run: |
    delimiter="$(openssl rand -hex 8)"
    JSON="$(cat ./.github/workflows/servers.json)"
    echo "DEPLOYMENT_MATRIX<<${delimiter}" >> "${GITHUB_OUTPUT}"
    echo "$JSON" >> "${GITHUB_OUTPUT}"
    echo "${delimiter}" >> "${GITHUB_OUTPUT}"

    echo "$JSON"
    echo "${GITHUB_OUTPUT}"

The last step is where we get into the deployment matrix. This, right here, configures what we’ll actually be deploying to. It may be overkill since I only have a single server - but it’s useful to set up, and realistically doesn’t complicate things too much.

First, I’ll show the source .json file from which we’re building said matrix:

[
    {
        "name": "enbyss-main-server",
        "ip": "49.12.76.246",
        "username": "github",
        "port": "22",
        "beforeHooks": "",
        "afterHooks": "",
        "path": "/home/github/enbyss.com"
    }
]

This specifies all the useful information for the deployment.

  • name is just the name of the deployment, purely for cosmetic purposes like displaying in a job title.
  • ip is the actual IP of the server we’re deploying to.
  • username is the user with which we’ll access the server - in this case I made a special user github because why would I use root privileges for this.
  • port is the SSH port that’ll be used - here it’s 22 which is the default.
  • beforeHooks and afterHooks are remnants of this being copied and pasted from elsewhere (thanks philo.dev).
  • path is where we want the artifact to actually go on the server.

This step parses servers.json and outputs it as DEPLOYMENT_MATRIX - so that it can be used by other jobs for the actual deployment.

prepare-release-on-servers:
  name: "${{ matrix.server.name }}: Prepare release"
  runs-on: ubuntu-latest
  needs: setup
  strategy:
    matrix:
      server: ${{ fromJson(needs.setup.outputs.DEPLOYMENT_MATRIX) }}
  steps:

Here’s the job in question. There you can see the name being used for its purpose - cosmetics. You can also see us setting up the actual matrix - since we want this job to run for every server we specify.

So, as a result, we parse DEPLOYMENT_MATRIX as JSON - which we know is a list. Each server will be available as matrix.server.

- uses: actions/download-artifact@v3
  with:
    name: site-build

- name: Upload
  uses: appleboy/scp-action@master
  with:
    host: ${{ matrix.server.ip }}
    username: ${{ matrix.server.username }}
    key: ${{ secrets.SSH_KEY }}
    port: ${{ matrix.server.port }}
    source: ${{ github.sha }}.tar.gz
    target: ${{ matrix.server.path }}/artifacts

First we download the artifact we’d just uploaded - called site-build - and then we proceed to upload said artifact onto the server. This can be done with scp, which lets you copy files over ssh connections - and since someone already spent time making a reusable action for us, we’re using that instead of scp directly. Thank you appleboy.

This is where we shove all of our details - including the source, aka what we want to copy, and the target, aka where we want to copy it - in this case being an artifacts subfolder of whatever we set as the path.

You can also see the key - which is where convenience sacrifices a bit of security. Unfortunately, we need to somehow let GitHub access our server to actually copy the files over - and the only way to do that is to give it an SSH key for its own user. This adds a point of weakness in my GitHub account, which, although unfortunate, at least means I can rely on the security that GitHub itself has.

It’s an extra point of weakness, but at least it’s a well protected one unless I fuck up significantly and leave my own account vulnerable. If I ever self-host my own Gitea instance, hopefully this weakness can be patched out as a result.
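
For reference, setting up a key like this for the github user boils down to something like the following (paths and the key comment are just examples):

# On the server, as the dedicated `github` user:
ssh-keygen -t ed25519 -C "github-actions-deploy" -f ~/.ssh/deploy_key
cat ~/.ssh/deploy_key.pub >> ~/.ssh/authorized_keys

# The *private* key then gets pasted into the repository's secrets as SSH_KEY -
# never committed anywhere.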

Enough about that though - we copied the files but now we need to deploy what we’ve got. Or more accurately, redeploy since the old version would still be running at this point (hopefully).

- name: Extract archive and redeploy
  uses: appleboy/ssh-action@master
  env:
    GITHUB_SHA: ${{ github.sha }}
  with:
    host: ${{ matrix.server.ip }}
    username: ${{ matrix.server.username }}
    key: ${{ secrets.SSH_KEY }}
    port: ${{ matrix.server.port }}
    envs: GITHUB_SHA
    script: |
        mkdir -p "${{ matrix.server.path }}/releases/${GITHUB_SHA}"
        tar xzf ${{ matrix.server.path }}/artifacts/${GITHUB_SHA}.tar.gz -C "${{ matrix.server.path }}/releases/${GITHUB_SHA}"

        rm -rf ${{ matrix.server.path }}/production/*
        cp -r ${{ matrix.server.path }}/releases/${GITHUB_SHA}/* ${{ matrix.server.path }}/production/

        cd ${{ matrix.server.path}}/production
        docker compose up -d --build --force-recreate

scp has copied our files over; time for ssh to let us in and run some commands (again, thank you appleboy).

We specify all the required values again, and then start working.

  1. First, create a releases subfolder for the artifact so that we can extract its contents into there.
  2. Then, extract the actual contents into that subfolder.
  3. In order to do the Indiana Jones switcheroo between the old and new versions, first you have to remove the old one. We do this using our trusty rm -rf on the contents of the production folder.
  4. Copy the releases subfolder’s contents into the production folder so that we can start it up.
  5. Enter into production and use docker compose to build and (re)start everything up.

I decided to have 3 main folders mostly for organization - artifacts is for the raw artifact, releases is for the extracted contents, and production is for the currently running code. This does let all these artifacts and releases accumulate, but I’m mostly keeping them for backups and so far haven’t run into storage issues. I just need to occasionally clear them out.

Technically I could write a script to remove everything but the last 5, but that’s for another day.
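
If I ever get around to it, it would probably boil down to something like this - an untested sketch that leans on the folders sorting by modification time:

# Keep the 5 most recent releases, delete the rest. Same idea works for artifacts/.
cd /home/github/enbyss.com/releases
ls -1t | tail -n +6 | xargs -r rm -rf --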

note:

The backend works in the same way. Technically speaking, anything would follow the same format - set up, copy with scp, deploy - the only things that change are the particulars, like how to set up, where to move the files, and so on. Also, never - never - ssh with a username and password. Always do it using keys.
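
On that last point: once key-based login works, it’s worth going a step further and disabling password authentication on the server entirely - roughly this in /etc/ssh/sshd_config (then restart the SSH service):

# /etc/ssh/sshd_config
PasswordAuthentication no
PermitRootLogin prohibit-password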

Conclusion

This is, in effect, me self-documenting what I’ve done to make this all work. Some particulars may be wrong, but in my opinion it provides a solid basis to start from - which can then be customized according to different approaches.

I want to thank philo.dev once again for effectively being the basis for all of this - in essence, I just changed some things around, added some Docker information, and now have a public example of it working.

If I got something wrong, feel free to send a message to contact@enbyss.com, or use any of the links I’ve got stashed in the sidebar for social media and what have you. I’d love to correct any mistakes here.