Using wkhtmltopdf with Docker

Recently I needed to get wkhtmltopdf working in a Docker container with ASP.NET Core. The wonderful thing about Docker containers is that all dependencies are packaged up so you don’t have to worry about installs, versions, deployments, etc. But it does mean setting up the initial environment is a little trickier.

TL;DR: Gimme the Dockerfile

For the impatient the full Dockerfile is as follows:

FROM microsoft/aspnetcore:1.1.2

COPY lib/wkhtmltox /usrtmp
RUN (cd /usrtmp && tar c .) | (cd /usr && tar xf -) && \
    chmod 555 /usr/bin/wkhtmltopdf && \
    apt-get update && \
    apt-get install -y --no-install-recommends zlib1g fontconfig libfreetype6 libx11-6 libxext6 libxrender1 && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY bin/Release/PublishOutput .
ENTRYPOINT ["dotnet", "Site.dll"]

Version Issues

You need to run a new enough version to operate correctly in headless mode on newer versions of Linux. 0.12.1-2 is the current version used by Debian, and this will not work. I used the latest 0.12.4 from the main site. Older versions also have heavy dependencies on X11, severely bloating your container.

The default package available in apt will not work.

Permissions Issues

By default, copying files into the Docker container will result in them being read-only, no execute permissions. Additionally, you can’t just do a RUN chmod and expect the files to have the updated permissions. This is due to a bug (maybe?) where you can’t really amend the permissions of another layer – it just doesn’t work. Instead you need to copy the files to one place, then apply your chmod changes:

COPY lib/wkhtmltox /usrtmp
RUN (cd /usrtmp && tar c .) | (cd /usr && tar xf -) && \
    chmod 555 /usr/bin/wkhtmltopdf

This copies the binaries from lib/wkhtmltox to the /usrtmp directory in the image, then moves the files to the real location of /usr. Finally, we set execute permissions for the binary itself.

Dependency Issues

Finally, wkhtmltopdf requires certain dependencies to be present to function. These can be installed as usual via apt:

RUN apt-get update && \
    apt-get install -y --no-install-recommends zlib1g fontconfig libfreetype6 libx11-6 libxext6 libxrender1 && \
    rm -rf /var/lib/apt/lists/*

We have a few extra options here above what you’d normally run on your own personal Linux install:

--no-install-recommends

We only want the libraries we absolutely must have to make the tool work. So, this flag ensures only the minimum is installed, and not any extra utility packages that might be useful in a full-install. Remember this is a container, so the bare minimum is all we’ll ever need.

rm -rf /var/lib/apt/lists/*

Once we’ve got all the packages installed, we won’t be using the package cache again. So, remove it to keep the container size down.


Pre-compress static web files using GZip and Brotli automatically

If you’ve worked with the web for any amount of time, you’ll know that compression is one of the very best ways of improving page load times. You might also be annoyed by the fact that you’re wasting CPU cycles on compressing the same files over and over – not to mention the added latency waiting for the compression to complete. The best option is to pre-compress all of your static files as part of the build or deploy process of your web application. For just this requirement, I’ve created a small Node script that’ll recurse through a directory compressing all of the files it locates.

The Code

You can find the Gist here: https://gist.github.com/danclarke/7a5b647d38a63241b71fb3743db15160

Simply update the last line to point to the directory you want to compress. By default, I’ve got the script compressing ‘dist’.

compressDir('dist');

Nginx

Next, you’ll need to configure your web server to serve the pre-compressed files instead of compressing them on the fly. For Nginx, add the following directives to either the http block or the relevant location block:

gzip_static on;
brotli_static on;

And that’s it!

Docker

If you want to use Nginx in a Docker container with Brotli, you can use this very cool Github project: https://github.com/fholzer/docker-nginx-brotli. Then in the Dockerfile for your website, ensure you use your custom Nginx image instead of the official one.

My configuration for Nginx looks like the following:

server {
    listen       80;
    server_name  localhost;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
        gzip_static   on;
        brotli_static on;
    }
}

Azure Blob Upload Speed – Don’t use OpenWriteAsync()

Uploading to Azure Storage with the .NET Client can often require some customisation to ensure acceptable performance. First, let’s look at some options in BlobRequestOptions:

SingleBlobUploadThresholdInBytes

The minimum size of a blob before it’ll be uploaded in ‘chunks’. This only works for the non-stream based upload methods.

Minimum: 1MB, or 1,048,576 bytes.

ParallelOperationThreadCount

The maximum number of upload operations to perform in parallel, for a single blob.

There’s also a useful property on the Blob itself:

StreamWriteSizeInBytes

The size of each block to upload. So, for example, if you set it to 1MB, a 4MB file will be chunked into 4 separate 1MB blocks.

Default value: 4MB, or 4,194,304 bytes.

// Options
var options = new BlobRequestOptions
{
    SingleBlobUploadThresholdInBytes = 1024 * 1024, //1MB, the minimum
    ParallelOperationThreadCount = 1
};

client.DefaultRequestOptions = options;

// Blob stream write
blob.StreamWriteSizeInBytes = 1024 * 1024;

A more thorough explanation is available here: https://www.simple-talk.com/cloud/platform-as-a-service/azure-blob-storage-part-4-uploading-large-blobs/

When it’s all ignored – OpenWriteAsync()

You can set all of these options, but if you use blob.OpenWriteAsync() it’s going to upload files in 5KB chunks as you write to the stream. This will absolutely destroy performance if you’re uploading larger files or a lot of files. Instead, you’ll need to use the blob.UploadFromStreamAsync() method:

// Buffer to a memory stream so that the client uploads in one chunk instead of multiple
// By default the client seems to upload in 5KB chunks
using (var memStream = new MemoryStream())
{
    // Save to memory stream
    await saveActionAsync(memStream);

    // Upload to Azure
    memStream.Seek(0, SeekOrigin.Begin);
    await blob.UploadFromStreamAsync(memStream);
}

If you use the UploadFromStreamAsync() method, the settings you set will be honoured and blobs will be uploaded in a much more efficient manner.


Including / Excluding files from build with VS2017 and .NET Core

I have a project where the wwwroot for an ASP.NET Core application is an SPA managed entirely by a Node build chain. This means VS is completely hands-off with regards to the development of the UI, but it does have to be somewhat aware of it in order to debug and publish.

VS2015 / project.json Build

The project.json file looked as follows:

  "buildOptions": {
    "emitEntryPoint": true,
    "preserveCompilationContext": true,
    "compile": {
      "exclude": ["wwwroot"]
    }
  },

  "publishOptions": {
    "include": [
      "wwwroot/dist/prod",
      "appsettings.json",
      "web.config"
    ]
  }

This would result in the wwwroot not getting compiled whatsoever, but the files within ‘dist’ would get published along with the .NET code. The node_modules folder was hidden within the VS Solution Explorer.

VS2017 / CSProj build

The auto-migration with VS2017 will attempt to follow the changes in the project.json, but the end result won’t work. VS will be super slow due to the node_modules folder, while also trying to compile the TypeScript. You can block TypeScript compilation wholesale, but I prefer to be a bit more explicit about what I want VS to do. Edit the .csproj for your project, and use the following ItemGroup to ensure it does the same as the project.json from earlier:

<ItemGroup>
  <Compile Remove="wwwroot\**" />
  <Content Remove="wwwroot\**" />
  <EmbeddedResource Remove="wwwroot\**" />
  <None Remove="wwwroot\**" />
  <Content Include="wwwroot\dist\prod\**\*" />
  <Content Update="wwwroot\dist\prod\**\*;appsettings.json;web.config">
    <CopyToPublishDirectory>PreserveNewest</CopyToPublishDirectory>
  </Content>
</ItemGroup>

This has the additional advantage that the files we don’t care about in wwwroot are automatically hidden for all users of the solution – no manual hiding from Solution Explorer.


Building an ASP.NET Core Docker Image on Linux

ASP.NET Core is super awesome, especially since it plays really well with the latest deployment methods. Microsoft has even gone as far as to pre-package an optimised .NET Core Docker image with the core libraries pre-compiled for a super-fast startup. So let’s get started, but first you’ll need to install Docker and the .NET Tools on your Linux machine if you haven’t already.

Add a Dockerfile

Add a new text file called ‘Dockerfile’ (case sensitive) to the root of your project, and make sure it doesn’t have any extension (such as .txt). In the Dockerfile add the following code:

FROM microsoft/aspnetcore:1.1.0
WORKDIR /app
COPY ./output .
ENTRYPOINT ["dotnet", "MySite.dll"]

Update the ‘MySite.dll’ reference to the name of your project with a .dll extension. Also while you’re at it, change the version number of .NET Core if you’re not using 1.1 like I am. I highly recommend 1.1 or later with Linux due to much better performance.

Build from command line

Run the following commands in the project directory:

dotnet restore
dotnet publish -o output -c release

This will get all dependencies, compile a release build, and put the result into the ‘output’ directory. The reason we’re using a sub-directory is a bug in the tooling: if we use a parent relative path (../), the ‘publishOptions/include’ setting in project.json is ignored and you’ll be missing a chunk of your project!

Build the docker image

Now let’s get that code into a Docker image! Run the following command:

sudo docker build -t myapp .

Feel free to rename myapp to something a little more descriptive. You can also specify multiple tags, such as:

sudo docker build -t myapp -t myapp:1.0 .

It’s really up to you with regards to tagging. If you’re publishing to a repository (likely), make sure to add the applicable repository tag to the list.

Running the image

Run the following command to start up a new container:

sudo docker run -p 8000:80 myapp 

Make sure to update myapp to the name you used earlier when building. Your site should now be accessible on port 8000: http://localhost:8000

You may be wondering why we forwarded to port 80, and not 5000 or whatever port you’ve specified in launchSettings.json. As part of the aspnetcore image, an environment variable is set to tell Kestrel to host on port 80. This can be overridden either in your Dockerfile, or by calling .UseUrls() on the WebHostBuilder in your Program.cs file.
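
As a rough sketch (ASP.NET Core 1.x style, assuming the usual Startup class; the port number here is illustrative only), overriding the URL in Program.cs looks something like this:

public static void Main(string[] args)
{
    // Minimal ASP.NET Core 1.x host setup; .UseUrls() overrides the
    // port-80 environment variable set by the base image.
    var host = new WebHostBuilder()
        .UseKestrel()
        .UseContentRoot(Directory.GetCurrentDirectory())
        .UseUrls("http://*:5000")
        .UseStartup<Startup>()
        .Build();

    host.Run();
}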

Don’t connect directly to the container

Remember, the site is still running in Kestrel so it’s not magically secure by virtue of running in a container. Make sure to use a reverse-proxy such as nginx when running your site in a production environment. Microsoft are working on hardening Kestrel so that you can use it directly in the future – but we’re not there yet.

Possible errors

Did you mean to run dotnet SDK commands?

You might get the following error upon execution:

Did you mean to run dotnet SDK commands? Please install dotnet SDK from:
http://go.microsoft.com/fwlink/?LinkID=798306&clcid=0x409

This basically means the .dll you referenced couldn’t be found. Double-check all of your paths and filenames. It’s easy to include the source instead of the output, or even save to / from the wrong directory.

Could not load <dependency>.dll

If you get dependency errors on execution, it means the output of the publish wasn’t saved to the /app folder within the container. The .dll files from the publish operation must all go directly into the root of the /app folder, not into a subdirectory of it. If you do change the folder name, make sure the WORKDIR in your Dockerfile matches.

More information

You can get more information from:

Github

Official MS Github: https://github.com/aspnet/aspnet-docker. Navigate to the version of .NET Core you want (1.0/jessie, 1.1/jessie, etc.). If you’re using the output of a publish, you’ll want the ‘runtime’ subfolder.

Docker Hub

Official MS Docker Hub: https://hub.docker.com/r/microsoft/aspnetcore/. A quick overview of the base image, although it looks like MS want to spend most of their focus on the Github account.


How ‘this’ works in Javascript

The this keyword in Javascript has a certain reputation for being awkward. Fortunately, it’s very easy to learn how this works, and then leverage it for your benefit rather than detriment.

Let’s start with some code to play around with:

name = 'Global';

function myClass() {
    this.name = 'My Class';

    this.getMyName = function() {
        return this.name;
    };
}

function person() {
    this.name = 'Bob';
}

var classInstance = new myClass();
var nameMethod = classInstance.getMyName;
var personInstance = new person();

In this example, the function ‘getMyName’ gets the name value from ‘this’. Let’s see what it outputs:

console.log(classInstance.getMyName()); // My Class

So far, so good. But what if we call the method directly, and not via the class?

console.log(nameMethod()); // Global

Uh oh, we’re seeing the global ‘name’ value, not the one we set up for the class. What’s happening? Well, this is effectively passed in implicitly when the function is called – you just never see it. By default, this is the object you call the function on. Conceptually, the definition and the call behave as if this were an extra parameter (the snippet below is illustrative pseudo-code, not valid JavaScript):

function(this) {
    return this.name;
};

console.log(classInstance.getMyName(classInstance));

When you call the method directly without the classInstance receiver, the global scope is passed in as this. As a result, we can do some pretty clever things. Remember how we have another class called ‘person’? We can detach the method from myClass and call it using person:

console.log(nameMethod.call(personInstance)); // Bob

Hang on, what’s this ‘call’ method? Call allows you to explicitly set the this value. The first argument becomes this, while all subsequent arguments are passed straight through to the function. If you don’t specify an argument, call falls back to the global scope (in non-strict code):

console.log(classInstance.getMyName.call()); // Global

Pretty cool, huh? This is how jQuery allows you to use this to refer to the element you’re working on with your anonymous functions and the like.

But, what if we want getMyName to always refer to the name within the class? For this we’ll want to ‘capture’ the this variable so that it never changes:

function myClass() {
    var self = this;
    this.name = 'My Class';

    this.getMyName = function() {
        return self.name;
    };
}

By copying this into a local variable at instantiation, we can then bring it into the getMyName function via a closure. Now the output is as expected without any this shenanigans:

console.log(classInstance.getMyName()); // My Class

console.log(classInstance.getMyName.call()); // My Class

console.log(nameMethod()); // My Class

console.log(nameMethod.call(personInstance)); // My Class

Alexa Skills with ASP.NET Core – Porting Reindeer Games

Making a custom Alexa skill is surprisingly easy, especially with the samples available from Amazon. I recently took the opportunity to port the NodeJS Reindeer Games project to .NET, and at the same time improve the code to something a little more readable and maintainable.

I’ve put the code up on my GitHub here: https://github.com/danclarke/ReindeerGamesNetCore

There’s a project for both a Web Service hosted on Azure and a Lambda Function hosted on AWS.

Hosting-wise a Lambda Function is easily the preferred solution simply due to latency. Calls from Alexa to an AWS Lambda are much faster than leaving the data centre and calling your remote web service.

If you do need to create a web service it must be SSL secured. This is easy in production, but a little harder during development. To make this easier, you might want to use something like Caddy with Let’s Encrypt.


Read-Only Public Properties / State in C#

There are a few ways you may choose to implement a read-only property in C#. Some options are better than others, but how do you know which is the best? Let’s take a look at what you could potentially use and what the pros/cons are:

Naive approach

public class Reader
{
    private string _filename = "Inputfile.txt";

    public string Filename { get { return _filename; } }
}

At first glance, the code above should be ideal. It gives you an explicitly read-only property and correctly hides the internal backing variable. So, what’s wrong?

Internal variable is visible, as well as the property

This makes the code a little more brittle to change. In the implementation of Reader, developers could use either _filename or the property Filename. If the code were later amended to make the Filename property virtual, every direct usage of _filename would bypass the override and become a latent bug. You always want the code to be as resilient as possible. Of course, you should always use the property in code, but mistakes are very easy to make and overlook. It’s much better to make it impossible to write incorrect code – ensuring you end up in the pit of success.

_filename is mutable!

While the property itself is read-only, the backing variable is mutable! Our implementation of Reader is free to change the value behind the property as much as it likes, as the snippet below demonstrates. This might be desirable, but generally we want the state to be as fixed as possible.
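
To make the risk concrete, here’s a small illustrative extension of the same class (the Reset method is made up):

public class Reader
{
    private string _filename = "Inputfile.txt";

    public string Filename { get { return _filename; } }

    public void Reset()
    {
        // Perfectly legal: the 'read-only' property now reports a different value.
        _filename = "SomethingElse.txt";
    }
}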

Better approach, but not ideal

public class Reader
{
    private const string _filename = "Inputfile.txt";

    public string Filename => _filename;
}

The property accessor is now more concise, and the backing store is read-only. This is good, but it could be better.

Use of const is risky

What? const is risky? In this case, it could be. Const works by substituting the literal value directly into the compiled IL of every use site at compile time. This means const is super-fast because there’s no field to read at runtime. The drawback appears if the const value is changed but an assembly referencing this one isn’t re-compiled: that assembly will still see the ‘old’ value. In this example, if another assembly extended the Reader class, it might not see changes to the const value unless it was also re-compiled. To make it even more confusing, it would likely ‘see’ the changes with Debug builds, but not with Release builds. The simple fix to this const difficulty is to use readonly instead of const for values that could change, such as settings. Genuine constants, such as PI, should remain const.
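
To make the distinction concrete, here’s a small illustrative example (the names are made up):

public static class AppDefaults
{
    // Genuine constant: the literal is baked into the IL of every use site at
    // compile time, so consumers only see a new value once they are recompiled.
    public const double Pi = 3.14159265358979;

    // Value that could change between releases: read at runtime, so consumers
    // always see the current value even without being recompiled.
    public static readonly string DefaultFilename = "Inputfile.txt";
}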

Better again, but still not ideal

public class Reader
{
    private static readonly string _filename = "Inputfile.txt";

    public string Filename => _filename;
}

The code is getting a little more reliable here. The filename state variable is now static readonly, so its memory usage is low while still offering flexibility: consuming assemblies will pick up changes to the value even if they aren’t recompiled with every change we make to our assembly.

As an aside, read-only state variables should be static wherever possible. If the variable isn’t static, it’ll be created and initialised with every single instance of the class – rather than just once across all instances.

Ideal approach

public class Reader
{
    public string Filename { get; } = "Inputfile.txt";
}

This is the best solution possible because it gives us maximum flexibility, with the minimum amount of potential issues going forward.

Good Locality

The initialisation happens right alongside the property declaration, ensuring high code locality. Previously, the property could sit in a completely different part of the file from the private backing field that actually holds the state.

No access to backing variable / state

The backing variable is now hidden from us, so we can’t accidentally use it instead of the property. If we later make the property virtual, there’s no backing variable left to use by mistake elsewhere in the code.

Easy to change to constructor initialisation

We can just remove the initialiser and put initialisation into the constructor. The property is still read-only, and the backing variable is still invisible. Easy!
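
For example, a minimal sketch of the constructor-initialised version:

public class Reader
{
    public Reader(string filename)
    {
        Filename = filename;
    }

    // Still a get-only auto-property; it can only be assigned from a
    // constructor or an initialiser.
    public string Filename { get; }
}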


ASP.NET Identity 3 with EF is bloated and slow… let’s fix that

Since first using ASP.NET 5 and MVC 6 I’ve been fighting the EF-based Identity framework. I can’t do much about the rather obtuse API, but I can fix the constant moaning about migrations and terrible performance by replacing the EF data stores with something a little (OK a LOT) better. I’m a big fan of the new ‘micro’ ORMs out there like PetaPOCO, Dapper, Massive, etc. They’re extremely easy to use, and incredibly fast – almost as fast as manually typing out ADO.NET commands yourself but without the pain. At work I use NPOCO, which is an extended and in my opinion ‘better’ version of PetaPOCO.

While the API for ASP.NET Identity may be ‘odd’, Microsoft have made it quite easy to extend or change the framework. Removing the hard dependency on Entity Framework is a simple case of replacing the ‘UserStore’ and ‘RoleStore’ with your own implementations. Technically you only have to support the features you’re actually using, but it’s pretty easy to support everything except the ‘Queryable’ functionality, and you do this by implementing various interfaces such as the following (a registration sketch follows the list):

  • IUserLoginStore<TUser>
  • IUserRoleStore<TUser>
  • IUserClaimStore<TUser>
  • IUserPasswordStore<TUser>
  • IUserSecurityStampStore<TUser>
  • IUserEmailStore<TUser>
  • IUserLockoutStore<TUser>
  • IUserPhoneNumberStore<TUser>
  • IUserTwoFactorStore<TUser>
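
As a rough illustration of the wiring (not taken from my implementation), registering custom stores in Startup.ConfigureServices looks something like this, assuming hypothetical MyUserStore and MyRoleStore classes that implement the relevant interfaces above:

// ConfigureServices: swap the EF stores for your own implementations.
// MyUserStore / MyRoleStore are placeholder names for classes implementing
// the interfaces listed above.
services.AddIdentity<IdentityUser, IdentityRole>()
    .AddUserStore<MyUserStore>()
    .AddRoleStore<MyRoleStore>()
    .AddDefaultTokenProviders();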

Part of the reason the storage API is so over-engineered is so that the system can support both databases and remote services without impacting performance too much. The disadvantage, ironically, is that performance is slightly lower if you’re using a traditional database as the data store. It also means that when using Identity you have to ask for things explicitly, rather than fetching a user and immediately seeing, for example, their roles.
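
For example (illustrative only, assuming an injected UserManager), roles have to be requested explicitly rather than arriving pre-loaded on the user object:

// Roles are not part of the user object; ask the UserManager for them explicitly.
var user = await userManager.FindByNameAsync("bob");
var roles = await userManager.GetRolesAsync(user);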

Microsoft have completely open-sourced Identity so you can see the default implementation over at GitHub here. I used this code as the basis for my own implementation. Of particular interest are the UserStore and RoleStore classes in the EntityFramework project.

I’ve open sourced my implementation and made it available on GitHub here. In short, you lose almost no functionality, yet gain a staggering performance improvement of up to 1,136%. You can easily transition from your current EF version to the new NPOCO version since the DB schema is identical (in fact I’ve just stolen MS’ schema for compatibility purposes). You can still use your own custom user and role classes as long as they inherit from IdentityUser and IdentityRole – just like with the MS EF version.


When to use an Interface and when to use a concrete type

When you first start working with Inversion of Control (might have to write a blog post on that…) & unit testing you’ll probably go interface crazy! Interfaces are great, and they really make it easy to unit test and take full advantage of inversion of control. However, sometimes you don’t want to use one. Fortunately if you’re using design patterns properly, it’s easy to decide if you need to use an interface or not:

  • Is the class just data storage? If so, it should always just be a concrete type
  • Is the class a service that does something? If so it should always have an interface

The problem is when you’ve got a class that’s mostly data storage but also has some logic to it. That means you’ve broken ‘Separation of Concerns’ and now have a problem! You’ll want to expose an interface simply to make testing easier, but ideally you should refactor the logic out into one or more new classes. A quick illustration of the rule of thumb is shown below.
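
As a quick illustration (the types here are made up):

// Pure data storage: a plain concrete type is all you need.
public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }
}

// A service that does something: give it an interface so it can be
// mocked in tests and swapped via inversion of control.
public interface IOrderProcessor
{
    void Process(Order order);
}

public class OrderProcessor : IOrderProcessor
{
    public void Process(Order order)
    {
        // Real logic omitted from the sketch.
    }
}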
