Writing a GitHub Webhook as an Azure Function

I recently experimented with Azure Functions and GitHub apps, and I wanted to share what I learned.

A bit of background

As you may already know, I’m one of the maintainers of the FakeItEasy mocking library. As is common in open-source projects, we use a workflow based on feature branches and pull requests. When a change is requested in a PR during code review, we usually make the change as a fixup commit, because it makes it easier to review, and because we like to keep a clean history. When the changes are approved, the author squashes the fixup commits before the PR is merged. Unfortunately, I’m a little absent minded, and when I review a PR, I often forget to wait for the author to squash their commits before I merge… This causes the fixup commits to appear in the main dev branch, which is ugly.

Which leads me to the point of this post: I wanted to make a bot that could prevent a PR from being merged if it had commits that needed to be squashed (i.e. commits whose messages start with fixup! or squash!). And while I was at it, I thought I might as well make it usable by everyone, so I made it a GitHub app: DontMergeMeYet.

GitHub apps

Now, you might be wondering, what on Earth is a GitHub app? It’s simply a third-party application that is granted access to a GitHub repository using its own identity; what it can do with the repo depends on which permissions were granted. A GitHub app can also receive webhook notifications when events occur in the repo (e.g. a comment is posted, a pull request is opened, etc.).

A GitHub app could, for instance, react when a pull request is opened or updated, examine the PR details, and add a commit status to indicate whether the PR is ready to merge or not (this WIP app does this, but doesn’t take fixup commits into account).

As you can see, it’s a pretty good fit for what I’m trying to do!

In order to create a GitHub app, you need to go to the GitHub apps page, and click New GitHub app. You then fill in at least the name, homepage, and webhook URL, give the app the necessary permissions, and subscribe to the webhook events you need. In my case, I only needed read-only access to pull requests, read-write access to commit statuses, and to receive pull request events.

At this point, we don’t yet have an URL for the webhook, so enter any valid URL; we’ll change it later after we actually implemented the app.

Azure Functions

I hadn’t paid much attention to Azure Functions before, because I didn’t really see the point. So I started to implement my webhook as a full-blown ASP.NET Core app, but then I realized several things:

  • My app only had a single HTTP endpoint
  • It was fully stateless and didn’t need a database
  • If I wanted the webhook to always respond quickly, the Azure App Service had to be "always on"; that option isn’t available in free plans, and I didn’t want to pay a fortune for a better service plan.

I looked around and realized that Azure Functions had a "consumption plan", with a generous amount (1 million per month) of free requests before I had to pay anything, and functions using this plan are "always on". Since I had a single endpoint and no persistent state, an Azure Function seemed to be the best fit for my requirements.

Interestingly, Azure Functions can be triggered, among other things, by GitHub webhooks. This is very convenient as it takes care of validating the payload signature.

So, Azure Functions turn out to be a perfect match for implementing my webhook. Let’s look at how to create one.

Creating an Azure Function triggered by a GitHub webhook

It’s possible to write Azure functions in JavaScript, C# (csx) or F# directly in the portal, but I wanted the comfort of the IDE, so I used Visual Studio. To write an Azure Function in VS, follow the instructions on this page. When you create the project, a dialog appears to let you choose some options:

New function dialog

  • version of the Azure Functions runtime: v1 targets the full .NET Framework, v2 targets .NET Core. I picked v1, because I had trouble with the dependencies in .NET Core.
  • Trigger: GitHub webhooks don’t appear here, so just pick "HTTP Trigger", we’ll make the necessary changes in the code.
  • Storage account: pick the storage emulator; when you publish the function, a real Azure storage account will be set instead
  • Access rights: it doesn’t matter what you pick, we’ll override it in the code.

The project template creates a class named Function1 with a Run method that looks like this:

public static class Function1
{
    [FunctionName("Function1")]
    public static async Task<HttpResponseMessage> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)]HttpRequestMessage req, TraceWriter log)
    {
        ...
    }
}

Rename the class to something that makes more sense, e.g. GitHubWebHook, and don’t forget to change the name in the FunctionName attribute as well.

Now we need to tell the Azure Functions runtime that this function is triggered by a GitHub webhook. To do this, change the method signature to look like this:

    [FunctionName("GitHubWebHook")]
    public static async Task<HttpResponseMessage> Run(
        [HttpTrigger("POST", WebHookType = "github")] HttpRequestMessage req,
        TraceWriter log)

GitHub webhooks always use the HTTP POST method; the WebHookType property is set to "github" to indicate that it’s a GitHub webhook.

Note that it doesn’t really matter what we respond to the webhook request; GitHub doesn’t do anything with the response. I chose to return a 204 (No content) response, but you can return a 200 or anything else, it doesn’t matter.

Publishing the Azure Function

To publish your function, just right click on the Function App project, and click Publish. This will show a wizard that will let you create a new Function App resource on your Azure subscription, or select an existing one. Not much to explain here, it’s pretty straightforward; just follow the wizard!

When the function is published, you need to tell GitHub how to invoke it. Open the Azure portal in your browser, navigate to your new Function App, and select the GitHubWebHook function. This will show the content of the (generated) function.json file. Above the code view, you will see two links: Get function URL, and Get GitHub secret:

Azure Function URL and secret

You need to copy the URL to the Webhook URL field in the GitHub app settings, and copy the secret to the Webhook secret field. This secret is used to calculate a signature for webhook payloads, so that the Azure Function can ensure the payloads really come from GitHub. As I mentioned earlier, this verification is done automatically when you use a GitHub HTTP trigger.

And that’s it, your webhook is online! Now you can go install the GitHub app into one of your repositories, and your webhook will start receiving events for this repo.

Points of interest

I won’t describe the whole implementation of my webhook in this post, because it would be too long and most of it isn’t that interesting, but I will just highlight a few points of interest. You can find the complete code on GitHub.

Parsing the payload

Rather than reinventing the wheel, we can leverage the Octokit .NET library. Octokit is a library made by GitHub to consume the GitHub REST API. It contains classes representing the entities used in the API, including webhook payloads, so we can just deserialize the request content as a PullRequestEventPayload. However, if we just try to do this with JSON.NET, this isn’t going to work: Octokit doesn’t use JSON.NET, so the classes aren’t decorated with JSON.NET attributes to map the C# property names to the JSON property names. Instead, we need to use the JSON serializer that is included in Octokit, called SimpleJsonSerializer:

private static async Task<PullRequestEventPayload> DeserializePayloadAsync(HttpContent content)
{
    string json = await content.ReadAsStringAsync();
    var serializer = new SimpleJsonSerializer();
    return serializer.Deserialize<PullRequestEventPayload>(json);
}

There’s also another issue: the PullRequestEventPayload from Octokit is missing the Installation property, which we’re going to need later to authenticate with the GitHub API. An easy workaround is to make a new class that inherits PullRequestEventPayload and add the new property:

public class PullRequestPayload : PullRequestEventPayload
{
    public Installation Installation { get; set; }
}

public class Installation
{
    public int Id { get; set; }
}

And we’ll just use PullRequestPayload instead of PullRequestEventPayload.

Authenticating with the GitHub API

We’re going to need to call the GitHub REST API for two things:

  • to get the list of commits in the pull request
  • to update the commit status

In order to access the API, we’re going to need credentials… but which credentials? We could just generate a personal access token and use that, but then we would access the API as a "real" GitHub user, and we would only be able to access our own repositories (for writing, at least).

As I mentioned earlier, GitHub apps have their own identity. What I didn’t say is that when authenticated as themselves, there isn’t much they’re allowed to do: they can only get management information about themselves, and get a token to authenticate as an installation. An installation is, roughly, an instance of the application that is installed on one or more repo. When someone installs your app on their repo, it creates an installation. Once you get a token for an installation, you can access all the APIs allowed by the app’s permissions on the repos it’s installed on.

However, there are a few hoops to jump through to get this token… This page describes the process in detail.

The first step is to generate a JSON Web Token (JWT) for the app. This token has to contain the following claims:

  • iat: the timestamp at which the token was issued
  • exp: the timestamp at which the token expires
  • iss: the issuer, which is actually the app ID (found in the GitHub app settings page)

This JWT needs to be signed with the RS256 algorithm (RSA signature with SHA256); in order to sign it, you need a private key, which must be generated from the GitHub app settings page. You can download the private key in PEM format, and store it somewhere your app can access it. Unfortunately, the .NET APIs to generate and sign a JWT don’t handle the PEM format, they need an RSAParameters object… But Stackoverflow is our friend, and this answer contains the code we need to convert a PEM private key to an RSAParameters object. I just kept the part I needed, and manually reformatted the PEM private key to remove the header, footer, and newlines, so that it could easily be stored in the settings as a single line of text.

Once you have the private key as an RSAParameters object, you can generate a JWT like this:

public string GetTokenForApplication()
{
    var key = new RsaSecurityKey(_settings.RsaParameters);
    var creds = new SigningCredentials(key, SecurityAlgorithms.RsaSha256);
    var now = DateTime.UtcNow;
    var token = new JwtSecurityToken(claims: new[]
        {
            new Claim("iat", now.ToUnixTimeStamp().ToString(), ClaimValueTypes.Integer),
            new Claim("exp", now.AddMinutes(10).ToUnixTimeStamp().ToString(), ClaimValueTypes.Integer),
            new Claim("iss", _settings.AppId)
        },
        signingCredentials: creds);

    var jwt = new JwtSecurityTokenHandler().WriteToken(token);
    return jwt;
}

A few notes about this code:

  • It requires the following NuGet packages:
    • Microsoft.IdentityModel.Tokens 5.2.1
    • System.IdentityModel.Tokens.Jwt 5.2.1
  • ToUnixTimeStamp is an extension method that converts a DateTime to a UNIX timestamp; you can find it here
  • As per the GitHub documentation, the token lifetime cannot exceed 10 minutes

Once you have the JWT, you can get an installation access token by calling the "new installation token" API endpoint. You can authenticate to this endpoint by using the generated JWT as a Bearer token

public async Task<string> GetTokenForInstallationAsync(int installationId)
{
    var appToken = GetTokenForApplication();
    using (var client = new HttpClient())
    {
        string url = $"https://api.github.com/installations/{installationId}/access_tokens";
        var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            Headers =
            {
                Authorization = new AuthenticationHeaderValue("Bearer", appToken),
                UserAgent =
                {
                    ProductInfoHeaderValue.Parse("DontMergeMeYet"),
                },
                Accept =
                {
                    MediaTypeWithQualityHeaderValue.Parse("application/vnd.github.machine-man-preview+json")
                }
            }
        };
        using (var response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            var json = await response.Content.ReadAsStringAsync();
            var obj = JObject.Parse(json);
            return obj["token"]?.Value<string>();
        }
    }
}

OK, almost there. Now we just need to use the installation token to call the GitHub API. This can be done easily with Octokit:

private IGitHubClient CreateGitHubClient(string installationToken)
{
    var userAgent = new ProductHeaderValue("DontMergeMeYet");
    return new GitHubClient(userAgent)
    {
        Credentials = new Credentials(installationToken)
    };
}

And that’s it, you can now call the GitHub API as an installation of your app.

Note: the code above isn’t exactly what you’ll find in the repo; I simplified it a little for the sake of clarity.

Testing locally using ngrok

When creating your Azure Function, it’s useful to be able to debug on your local machine. However, how will GitHub be able to call your function if it doesn’t have a publicly accessible URL? The answer is a tool called ngrok. Ngrok can create a temporary host name that forwards all traffic to a port on your local machine. To use it, create an account (it’s free) and download the command line tool. Once logged in to the ngrok website, a page will give you the command to save an authentication token on your machine. Just execute this command:

ngrok authtoken 1beErG2VTJJ0azL3r2SBn_2iz8johqNv612vaXa3Rkm

Start your Azure Function in debug from Visual Studio; the console will show you the local URL of the function, something like http://localhost:7071/api/GitHubWebHook. Note the port, and in a new console, start ngrok like this:

ngrok http 7071 --host-header rewrite

This will create a new hostname and start forwarding traffic to the 7071 port on your machine. The --host-header rewrite argument causes ngrok to change the Host HTTP header to localhost, rather than the temporary hostname; Azure Functions don’t work correctly without this.

You can see the temporary hostname in the command output:

ngrok by @inconshreveable                                                                                                                                                                                         (Ctrl+C to quit)

Session Status                online
Account                       Thomas Levesque (Plan: Free)
Version                       2.2.8
Region                        United States (us)
Web Interface                 http://127.0.0.1:4040
Forwarding                    http://89e14c16.ngrok.io -> localhost:7071
Forwarding                    https://89e14c16.ngrok.io -> localhost:7071

Connections                   ttl     opn     rt1     rt5     p50     p90
                              0       0       0.00    0.00    0.00    0.00

Finally, go to the GitHub app settings, and change the webook URL to https://89e14c16.ngrok.io/api/GitHubWebHook (i.e. the temporary domain with the same path as the local URL).

Now you’re all set. GitHub will send the webhook payloads to ngrok, which will forward them to your app running locally.

Note that unless you have a paid plan for ngrok, the temporary subdomain changes every time you start the tool, which is annoying. So it’s better to keep it running for the whole development session, otherwise you will need to change the GitHub app settings again.

Conclusion

Hopefully you learned a few things from this article. With Azure Functions, it’s almost trivial to implement a GitHub webhook (the only tricky part is the authentication to call the GitHub API, but not all webhooks need it). It’s much lighter than a full-blown web app, and much simpler to write: you don’t have to care about MVC, routing, services, etc. And if it wasn’t enough, the pricing model for Azure Functions make it a very cheap option for hosting a webhook!

Understanding the ASP.NET Core middleware pipeline

Middlewhat?

The ASP.NET Core architecture features a system of middleware, which are pieces of code that handle requests and responses. Middleware are chained to each other to form a pipeline. Incoming requests are passed through the pipeline, where each middleware has a chance to do something with them before passing them to the next middleware. Outgoing responses are also passed through the pipeline, in reverse order. If this sounds very abstract, the following schema from the official ASP.NET Core documentation should help you understand:

Middleware pipeline

Middleware can do all sort of things, such as handling authentication, errors, static files, etc… MVC in ASP.NET Core is also implemented as a middleware.

Configuring the pipeline

You typically configure the ASP.NET pipeline in the Configure method of your Startup class, by calling Use* methods on the IApplicationBuilder. Here’s an example straight from the docs:

public void Configure(IApplicationBuilder app)
{
    app.UseExceptionHandler("/Home/Error");
    app.UseStaticFiles();
    app.UseAuthentication();
    app.UseMvcWithDefaultRoute();
}

Each Use* method adds a middleware to the pipeline. The order in which they’re added determines the order in which requests will traverse them. So an incoming request will first traverse the exception handler middleware, then the static files middleware, then the authentication middleware, and will eventually be handled by the MVC middleware.

The Use* methods in this example are actually just "shortcuts" to make it easier to build the pipeline. Behind the scenes, they all end up using (directly or indirectly) these low-level primitives: Use and Run. Both add a middleware to the pipeline, the difference is that Run adds a terminal middleware, i.e. a middleware that is the last in the pipeline.

A basic pipeline with no branches

Let’s look at a simple example, using only the Use and Run primitives:

public void Configure(IApplicationBuilder app)
{
    // Middleware A
    app.Use(async (context, next) =>
    {
        Console.WriteLine("A (before)");
        await next();
        Console.WriteLine("A (after)");
    });

    // Middleware B
    app.Use(async (context, next) =>
    {
        Console.WriteLine("B (before)");
        await next();
        Console.WriteLine("B (after)");
    });

    // Middleware C (terminal)
    app.Run(async context =>
    {
        Console.WriteLine("C");
        await context.Response.WriteAsync("Hello world");
    });
}

Here, each middleware is defined inline as an anonymous method; they could also be defined as full-blown classes, but for this example I picked the more concise option. Non-terminal middleware take two arguments: the HttpContext and a delegate to call the next middleware. Terminal middleware only take the HttpContext. Here we have two middleware A and B that just log to the console, and a terminal middleware C which writes the response. Here’s the console output when we send a request to our app:

A (before)
B (before)
C
B (after)
A (after)

We can see that each middleware was traversed in the order in which it was added, then traversed again in reverse order. The pipeline can be represented like this:

Basic pipeline

Short-circuiting middleware

A middleware doesn’t necessarily have to call the next middleware. For instance, if the static files middleware can handle a request, it doesn’t need to pass it down to the rest of the pipeline, it can respond immediately. This behavior is called short-circuiting the pipeline.

In the previous example, if we comment out the call to next() in middleware B, we get the following output:

A (before)
B (before)
B (after)
A (after)

As you can see, middleware C is never invoked. The pipeline now looks like this:

Short-circuited pipeline

Branching the pipeline

In the previous examples, there was only one "branch" in the pipeline: the middleware coming after A was always B, and the middleware coming after B was always C. But it doesn’t have to be that way. You might want a given request to be processed by a completely different pipeline, based on the path or anything else.

There are two types of branches: branches that rejoin the main pipeline, and branches that don’t.

Making a non-rejoining branch

This can be done using the Map or MapWhen method. Map lets you specify a branch based on the request path. MapWhen gives you more control: you can specify a predicate on the HttpContext to decide whether to branch or not. Let’s look at a simple example using Map:

public void Configure(IApplicationBuilder app)
{
    app.Use(async (context, next) =>
    {
        Console.WriteLine("A (before)");
        await next();
        Console.WriteLine("A (after)");
    });

    app.Map(
        new PathString("/foo"),
        a => a.Use(async (context, next) =>
        {
            Console.WriteLine("B (before)");
            await next();
            Console.WriteLine("B (after)");
        }));

    app.Run(async context =>
    {
        Console.WriteLine("C");
        await context.Response.WriteAsync("Hello world");
    });
}

The first argument for Map is a PathString representing the path prefix of the request. The second argument is a delegate that configures the branch’s pipeline (the a parameter represents the IApplicationBuilder for the branch). The branch defined by the delegate will process the request if its path starts with the specified path prefix.

For a request that doesn’t start with /foo, this code produces the following output:

A (before)
C
A (after)

Middleware B is not invoked, since it’s in the branch and the request doesn’t match the prefix for the branch. But for a request that does start with /foo, we get the following output:

A (before)
B (before)
B (after)
A (after)

Note that this request returns a 404 (Not found) response: this is because the B middleware calls next(), but there’s no next middleware, so it falls back to returning a 404 response. To solve this, we could use Run instead of Use, or just not call next().

The pipeline defined by this code can be represented as follows:

Non-rejoining branch

(I omited the response arrows for clarity)

As you can see, the branch with middleware B doesn’t rejoin the main pipeline, so middleware C isn’t called.

Making a rejoining branch

You can make a branch that rejoins the main pipeline by using the UseWhen method. This method accepts a predicate on the HttpContext to decide whether to branch or not. The branch will rejoin the main pipeline where it left it. Here’s an example similar to the previous one, but with a rejoining branch:

public void Configure(IApplicationBuilder app)
{
    app.Use(async (context, next) =>
    {
        Console.WriteLine("A (before)");
        await next();
        Console.WriteLine("A (after)");
    });

    app.UseWhen(
        context => context.Request.Path.StartsWithSegments(new PathString("/foo")),
        a => a.Use(async (context, next) =>
        {
            Console.WriteLine("B (before)");
            await next();
            Console.WriteLine("B (after)");
        }));

    app.Run(async context =>
    {
        Console.WriteLine("C");
        await context.Response.WriteAsync("Hello world");
    });
}

For a request that doesn’t start with /foo, this code produces the same output as the previous example:

A (before)
C
A (after)

Again, middleware B is not invoked, since it’s in the branch and the request doesn’t match the predicate for the branch. But for a request that does start with /foo, we get the following output:

A (before)
B (before)
C
B (after)
A (after)

We can see that the request passes trough the branch (middleware B), then goes back to the main pipeline, ending with middleware C. This pipeline can be represented like this:

Rejoining branch

Note that there is no Use method that accepts a PathString to specify the path prefix. I’m not sure why it’s not included, but it would be easy to write, using UseWhen:

public static IApplicationBuilder Use(this IApplicationBuilder builder, PathString pathMatch, Action<IApplicationBuilder> configuration)
{
    return builder.UseWhen(
        context => context.Request.Path.StartsWithSegments(new PathString("/foo")),
        configuration);
}

Conclusion

As you can see, the idea behind the middleware pipeline is quite simple, but it’s very powerful. Most of the features baked in ASP.NET Core (authentication, static files, caching, MVC, etc) are implemented as middleware. And of course, it’s easy to write your own!

Cleanup Git history to remove unwanted files

I recently had to work with a Git repository whose modifications needed to be ported to another repo. Unfortunately, the repo had been created without a .gitignore file, so a lot of useless files (bin/obj/packages directories…) had been commited. This made the history hard to follow, because each commit had hundreds of modified files.

Fortunately, it’s rather easy with Git to cleanup a branch, by recreating the same commits without the files that shouldn’t have been there in the first place. Let’s see step by step how this can be achieved.

A word of caution

The operation we need to perform here involves rewriting the history of the branch, so a warning is in order: never rewrite the history of a branch that is published and shared with other people. If someone else bases works on the existing branch and the branch has its history rewritten, it will become much harder to integrate their commits into the rewritten branch.

In my case, it didn’t matter, because I didn’t need to publish the rewritten branch (I just wanted to examine it locally). But don’t do this on a branch your colleagues are currently working on, unless you want them to hate you 😉.

Create a working branch

Since we’re going to make pretty drastic and possibly risky changes on the repo, we’d better be cautious. The easiest way to avoid causing damage to the original branch is, of course, to work on a separate branch. So, assuming the branch we want to cleanup is master, let’s create a master2 working branch:

git checkout -b master2 master

Identify the files to remove

Before we start to cleanup, we need to identify what needs to be cleaned up. In a typical .NET project, it’s usually the contents of the bin and obj directories (wherever they’re located) and the packages directory (usually at the root of the solution). Which gives us the following patterns to remove:

  • **/bin/**
  • **/obj/**
  • packages/**

Cleanup the branch: the git filter-branch command

The Git command that will let us remove unwanted files is named filter-branch. It’s described in the Pro Git book Pro Git as the "nuclear option", because it’s very powerful and possibly destructive, so it should be used with great caution.

This command works by taking each commit in the branch, applying a filter to it, and recommiting it with the changes caused by the filter. There are several kinds of filter, for instance:

  • --msg-filter : a filter that can rewrite commit messages
  • --tree-filter : a filter that applies to the working tree (causes each commit to be checked out, so it can take a while on a large repo)
  • --index-filter : a filter that applies to the index (doesn’t require checking out each commit, hence faster)

In our current scenario, --index-filter is perfectly adequate, since we only need to filter files based on their path. The filter-branch command with this kind of filter can be used as follows:

git filter-branch --index-filter '<command>'

<command> is a bash command that will be executed for each commit on the branch. In our case, it will be a call to git rm to remove unwanted files from the index:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch **/bin/** **/obj/** packages/**'

The --cached parameter means we’re working on the index rather than on the working tree; --ignore-unmatch tells Git to ignore patterns that don’t match any file. By default, the command only applies to the current branch.

If the branch has a long history, this command can take a while to run, so be patient… Once it’s done, you should have a branch with the same commits as the original branch, but without the unwanted files.

More complex cases

In the example above, there were only 3 file patterns to remove, so the command was short enough to be written inline. But if there are many patterns, of if the logic to remove files is more complex, it doesn’t really work out… In this case, you can just write a (bash) script that contains the necessary commands, and use this script as the command passed to git filter-branch --index-filter.

Cleanup from a specific commit

In the previous example, we applied filter-branch to the whole branch. But it’s also possible to apply it only from a given commit, by specifying a revision range:

git filter-branch --index-filter '<command>' <ref>..HEAD

Here <ref> is a commit reference (SHA1, branch or tag). Note that the end of the range has to be HEAD, i.e. the tip of the current branch: you can’t rewrite the beginning or middle of a branch without touching the following commits, since each commit’s SHA1 depends on the previous commit.