Docker and Image Sizes

Posted on Sat 11 August 2018 in Posts

So this surprised me. I had a Dockerfile that looked basically like this:

FROM alpine:latest

RUN apk add --no-cache --update \
    python3 nodejs-current-npm make git curl

RUN python3 -m ensurepip
RUN pip3 install --upgrade pip

RUN npm install -g markdownlint-cli

# needed for one of the packages in requirements.txt
RUN apk add --no-cache --update python3-dev gcc build-base

COPY requirements.txt /build/requirements.txt
RUN pip3 install -r /build/requirements.txt

Building this produced an image that docker images reported as 379MB. That's a little large, so I wanted to trim it.

Since the packages installed just before copying requirements.txt into the image were only there so that one of the Python packages could be installed, there's no reason for them to remain in the image. Cool, so we can turf them to save on image size:

FROM alpine:latest

RUN apk add --no-cache --update \
    python3 nodejs-current-npm make git curl

RUN python3 -m ensurepip
RUN pip3 install --upgrade pip

RUN npm install -g markdownlint-cli

# needed for one of the packages in requirements.txt
RUN apk add --no-cache --update python3-dev gcc build-base

COPY requirements.txt /build/requirements.txt
RUN pip3 install -r /build/requirements.txt

# cleanup unneeded dependencies
RUN apk del python3-dev gcc build-base

Sweet, and this resulted in an image size of 381MB, a savings of NEGATIVE 2MB. Wait.... WAT?

So I removed some stuff and ended up with an image that's a couple of MB larger? How does that work?

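This is where things get technical. Docker images are built on a layered filesystem: each instruction in a Dockerfile produces a new layer stacked on top of the previous ones, and the image size is the sum of all of its layers. Deleting a file in a later layer doesn't remove it from the earlier layer that added it; it only hides it. So (not entirely unlike Git) once something is added to an image, it can't really (or at least easily) be removed. The apk del layer just adds its own bookkeeping on top, which is presumably where the extra 2MB comes from.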

See this issue which mentions what I'm talking about: https://github.com/gliderlabs/docker-alpine/issues/45
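
If you want to see the effect in isolation, here's a minimal sketch. It isn't from my actual project; the file name and the 50MB size are made up purely for the demo:

```dockerfile
# Hypothetical demo: /bigfile and its size are illustrative, not from the real image.
FROM alpine:latest

# Layer 1: writes 51200 blocks of 1024 bytes (50MB) into this layer.
RUN dd if=/dev/zero of=/bigfile bs=1024 count=51200

# Layer 2: only records that /bigfile is now hidden.
# Layer 1 is untouched and still ships with the image.
RUN rm /bigfile
```

Building this and running docker history on the result should show the dd layer still carrying its 50MB and the rm layer at close to nothing. Fold both commands into one RUN with && and the 50MB never lands in a layer at all.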

So what do we do? Well, we combine the operations into a single Docker instruction:

FROM alpine:latest

RUN apk add --no-cache --update \
    python3 nodejs-current-npm make git curl

RUN python3 -m ensurepip
RUN pip3 install --upgrade pip

RUN npm install -g markdownlint-cli

COPY requirements.txt /build/requirements.txt

RUN apk add --no-cache --update python3-dev gcc build-base && \
    pip3 install -r /build/requirements.txt && \
    apk del python3-dev gcc build-base

Because adding and removing the apk packages now happens in a single Docker instruction, the packages never make it into a layer, so they don't inflate the size of the built image (you can think of layers as "checkpoints" taken after each instruction in a Dockerfile).
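
A variation on the same idea, as a sketch rather than what I actually ran: apk can group packages under a virtual package name with --virtual, so the cleanup step only has to name the group. The name .build-deps is just a label I picked for illustration; any name works.

```dockerfile
FROM alpine:latest

RUN apk add --no-cache --update \
    python3 nodejs-current-npm make git curl

RUN python3 -m ensurepip
RUN pip3 install --upgrade pip

RUN npm install -g markdownlint-cli

COPY requirements.txt /build/requirements.txt

# ".build-deps" is an arbitrary, hypothetical group name.
# Install the build tools under that name, use them, then delete the whole
# group, all within one instruction so none of it is kept in a layer.
RUN apk add --no-cache --virtual .build-deps python3-dev gcc build-base && \
    pip3 install -r /build/requirements.txt && \
    apk del .build-deps
```

I wouldn't expect this to change the image size compared to listing the packages twice. The point is only that the add list and the delete list can't drift apart.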

With this change my image size dropped significantly. How much? Let's let the tool tell us:

$ docker images
REPOSITORY                   TAG                 IMAGE ID            CREATED              SIZE
someimage                    combineops          1744771da3fa        About a minute ago   216MB
someimage                    removepckgs         fc5877e2afad        4 minutes ago        381MB
someimage                    original            b6e5e43b22e0        5 minutes ago        379MB

That is, it dropped from 379MB to 216MB. Not a bad savings at all.

This is a classic "time vs space" tradeoff though. Because I had to move the requirements.txt line up, above the apk add for the build packages, builds of this image are often slower: the Docker cache is invalidated from the first changed instruction onward, so any time requirements.txt changes, those apk packages have to be installed all over again. However, I think the savings in space (about 43%) is worth it.
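
One way to possibly claw back some of that build time is a multi-stage build, where the compilers live in a throwaway stage that gets cached separately. I haven't measured this against the image above, so treat it as an untested sketch: the stage name builder and the /install prefix are my own illustrative choices, and it assumes the compiled packages don't need extra shared libraries at runtime.

```dockerfile
# Stage 1: throwaway build environment. "builder" is an illustrative name.
FROM alpine:latest AS builder

# These layers stay cached when only requirements.txt changes.
RUN apk add --no-cache python3 python3-dev gcc build-base
RUN python3 -m ensurepip && pip3 install --upgrade pip

COPY requirements.txt /build/requirements.txt

# Install into a separate prefix (hypothetical path) so the result
# can be copied out of this stage in one piece.
RUN pip3 install --prefix=/install -r /build/requirements.txt

# Stage 2: the image that actually ships.
FROM alpine:latest

RUN apk add --no-cache --update \
    python3 nodejs-current-npm make git curl

RUN npm install -g markdownlint-cli

# Only the installed Python packages cross over;
# the compilers stay behind in the builder stage.
COPY --from=builder /install /usr
```

Note that this final image has no pip in it, unlike the ones above, and both stages need to end up with the same Python version for the copied packages to be picked up.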