It's been a while since I've done web stuff.

azonenberg@ioc.exchange

It's been a while since I've done web stuff. Did I screw up anything too horribly here?

scopehal-ci-scripts/api/github-hook.php at main · ngscopeclient/scopehal-ci-scripts

Test scripts for continuous integration builds on our internal cluster

GitHub (github.com)

This is a web hook that is only triggered by the 'push' event on ngscopeclient/scopehal-apps.

Goal is to ensure that

a) nobody but github can trigger builds (to prevent DoSing the CI platform with a ridiculous number of builds etc)

b) a compromise of GitHub's webhook infrastructure can trigger builds of real commits in the repo, but not run arbitrary shell commands on the CI runner or pull from an untrusted fork (i.e. no shell command injections etc in the branch/commit strings which will be fed to a zillion bash scripts downstream)
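As a rough illustration of the "push event on one specific repo" constraint, here is a minimal sketch in Python rather than the hook's PHP. It is not the code in github-hook.php. The function name `extract_push` and the constant `EXPECTED_REPO` are made up for this example; `ref`, `after` and `repository.full_name` are fields of GitHub's push payload, and the event name is assumed to come from the `X-GitHub-Event` header. This check only makes sense after the signature has been verified, and it does not by itself prove the commit is reachable from the upstream repo.

```python
import json

# Assumption: the only repository this hook should ever act on.
EXPECTED_REPO = "ngscopeclient/scopehal-apps"

def extract_push(event, body):
    """Return (branch, commit) for a branch push to EXPECTED_REPO, else None.

    `event` is the value of the X-GitHub-Event header, `body` the raw
    request body as bytes. The returned strings are still untrusted and
    need their own character-set validation.
    """
    if event != "push":
        return None
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers both malformed JSON and invalid UTF-8.
        return None
    if not isinstance(payload, dict):
        return None
    repo = payload.get("repository")
    if not isinstance(repo, dict) or repo.get("full_name") != EXPECTED_REPO:
        return None
    ref = payload.get("ref")
    after = payload.get("after")
    if not isinstance(ref, str) or not isinstance(after, str):
        return None
    prefix = "refs/heads/"
    if not ref.startswith(prefix):
        # Tag pushes arrive as refs/tags/... and are rejected here.
        return None
    return ref[len(prefix):], after
```

Rejecting anything that is not a well-formed branch push to the expected repository keeps the later checks simple, since they only ever see two short strings.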

azonenberg@ioc.exchange

One trust boundary I'm trying to maintain here is that the orchestrator node (the box that runs this hook and launches the actual runner instances) is slightly more trusted than the runners.

While right now, we're only building from master on the upstream repo, I would like to eventually use the same platform to test pull requests after a cursory human review.

The runner VMs are ephemeral and blown away after a job completes and live in an isolated sandbox network, so the damage they can cause if compromised is limited (assuming nobody cares enough to burn a Xen hypervisor escape on me).

But I want to avoid any opportunity for pivoting to the orchestrator, which will be able to do things like publish build artifacts to other servers.

cr1901@mastodon.social

@azonenberg > i.e. no shell command injections etc in the branch/commit strings

Yea, that was my question... how are you sanitizing $branch and $after?

azonenberg@ioc.exchange

@cr1901 The checks on lines 54 and 65 are intended to make sure that $branch is only alphanumeric characters and dashes, and $after is only lowercase hex characters.

After that, they should be safe to pass to e.g. "git checkout $COMMIT".

The question is...
* Did I do that right? I'm not a regex expert
* Is there any way that this can lead to me checking out a branch/commit from a fork rather than the upstream repo via some github server side weirdness?

azonenberg@ioc.exchange

@cr1901 The downstream bash scripts will do no further sanitizing, as it's assumed the branch and ref are valid by the time you get to them.
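Since nothing downstream re-validates, one cheap extra layer is to hand the values over as separate argv entries instead of building a shell command string. The sketch below (Python, with a hypothetical `launch` helper that is not part of scopehal-ci-scripts) shows that a hostile-looking string passed this way reaches the child process as one literal argument. It does not help if a bash script later expands the variable unquoted, so `"$COMMIT"` still needs quoting there.

```python
import subprocess
import sys

def launch(argv):
    """Run argv directly, with no shell in between, and return its stdout.

    Passing a list (and leaving shell=False, the default) means no shell
    ever parses the arguments, so metacharacters like ';' or '$(...)'
    have no special meaning.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    # Demo: the child just echoes its first argument back unchanged.
    hostile = "$(echo pwned); ls"
    child = [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile]
    print(launch(child), end="")
```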

Right now these arguments are ignored and I always build latest master, but that is about to change and I want to not get pwned when that happens

dch@bsd.network

@azonenberg I've built one of these, and one thing you may end up needing, *before* doing the HMAC validation, is to limit by inbound IP, e.g. at the firewall or (in my case) haproxy.

See https://docs.github.com/en/webhooks/using-webhooks/best-practices-for-using-webhooks#allow-githubs-ip-addresses and use https://api.github.com/meta to populate the allowlist.
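A minimal sketch of that allowlist check in Python, assuming the CIDR strings have already been fetched (the meta endpoint publishes webhook source ranges under its `hooks` key). The helper names `parse_networks` and `is_allowed` are made up for this example, and the ranges in the usage section are documentation placeholders, not GitHub's real addresses.

```python
import ipaddress

def parse_networks(cidrs):
    """Turn CIDR strings (IPv4 or IPv6) into network objects."""
    return [ipaddress.ip_network(c) for c in cidrs]

def is_allowed(remote_addr, networks):
    """True if remote_addr falls inside any of the allowed networks."""
    addr = ipaddress.ip_address(remote_addr)
    return any(
        addr.version == net.version and addr in net
        for net in networks
    )

if __name__ == "__main__":
    # Placeholder ranges reserved for documentation, NOT GitHub's.
    allowed = parse_networks(["192.0.2.0/24", "2001:db8::/32"])
    print(is_allowed("192.0.2.17", allowed))
    print(is_allowed("198.51.100.1", allowed))
```

In practice this filtering would more likely live in the firewall, haproxy or nginx configuration than in application code, and the list needs refreshing because the published ranges change over time.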

azonenberg@ioc.exchange

@dch the hmac is fast enough it's not a viable DoS vector. My inbound pipe will saturate before I'll run out of CPU doing hashes.

If somebody wants to DDoS my API endpoint there's not a whole lot I can do about it.

azonenberg@ioc.exchange

@dch The overall flow is that incoming traffic to my site hits the core router / firewall which blocks requests to unrecognized ports or from blocklisted IPs etc.

Then anything headed to port 443 or 80 goes to the VM server where a single nginx instance does HTTPS termination as well as serving some static web endpoints like my blog. Non-HTTPS traffic is 301 redirected to HTTPS.

A bunch of other endpoints, like the CI dashboard and the API, are reverse-proxied to other VMs on the same physical host, including the VM that hosts the dashboard and the one the webhook API endpoint lives on. That internal hop currently has no TLS, but I will eventually use internal enterprise-CA certificates there just to be extra safe.

Since the main TLS termination host needs to accept connections from anywhere on the internet, I would need IP filtering in that nginx instance to drop stuff going to the API endpoint from non-GitHub IPs. While doable, the hash is fast enough I don't think it is going to really protect me from much. The whole point of the HMAC itself is that anyone without the preshared key can't spoof requests.
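For reference, the HMAC check being discussed looks roughly like this in Python (again, not the PHP in github-hook.php). GitHub sends the signature in the `X-Hub-Signature-256` header as `sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. The function name `verify_signature` is invented for this sketch.

```python
import hashlib
import hmac

def verify_signature(secret, body, header):
    """Check an X-Hub-Signature-256 header value against the raw body.

    `secret` and `body` are bytes; `header` is the header value as a str.
    """
    if not header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so response timing does not
    # leak how many leading characters of a forged signature were right.
    # Comparing bytes avoids a TypeError on non-ASCII header values.
    return hmac.compare_digest(expected.encode("utf-8"), header.encode("utf-8"))
```

The signature has to be computed over the exact bytes received, before any JSON parsing or re-serialization, or legitimate requests will fail to verify.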
