
AWS Journey: Docker Build Optimization and Image Tagging

Feb 25, 2025 · 10 min read

After improving our security setup, we encountered two significant challenges:

  1. Go Build Performance Issue:

    • Every time we pushed a small code change
    • Our CI/CD pipeline spent 1 minute downloading the same Go packages
    • Even a one-line change triggered a full dependency download
  2. Docker Image Versioning Problem:

    • Currently using ':latest' tag for our images
    • Makes it difficult to track which version is deployed
    • Rollbacks are risky and complicated
    • No clear connection between deployed code and Git history

Let's solve these issues by:

  • Implementing efficient Docker layer caching for Go builds
  • Using Git commit IDs for precise image versioning

1. Docker Layer Caching Strategies

Optimizing Our Dockerfile

⏳ Before:

# The rest of the Dockerfile is the same as before

# Copy only go.mod and go.sum first
COPY go.mod go.sum ./

# Download dependencies
RUN go mod download

# Copy the rest of the application
COPY . .

# Build the binary
RUN go build -o main

# The rest of the Dockerfile is the same as before

🏃‍♂️ After:

# The rest of the Dockerfile is the same as before

# Copy only go.mod and go.sum first
COPY go.mod go.sum ./

# Then copy the rest
COPY . .

# Combine download and build in single RUN with shared caches
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    go mod download && \
    go build -trimpath -ldflags="-s -w" -o main ./cmd/main.go

# The rest of the Dockerfile is the same as before
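Note that the --mount=type=cache syntax only works with BuildKit. The pipeline enables it through the DOCKER_BUILDKIT variable (shown later); for a quick local test, a minimal sketch looks like this (the myapp:dev tag is just an example name):

# BuildKit must be enabled for RUN --mount=type=cache to take effect.
# Tag and Dockerfile name here are illustrative, not from the pipeline.
DOCKER_BUILDKIT=1 docker build \
  --progress=plain \
  -t myapp:dev \
  -f Dockerfile.prod .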

💡 Why This Works Better:

  1. Cache Mounting:

    • type=cache: Uses BuildKit's cache feature
    • target=/go/pkg/mod: Caches Go modules
    • target=/root/.cache/go-build: Caches build artifacts
  2. Performance Benefits:

    • Faster subsequent builds
    • Doesn't need to re-download dependencies
    • Preserves build cache between runs
    • Reduces disk space usage
  3. Single RUN Command:

    • Reduces number of layers
    • Better layer caching
    • More efficient image size

Build Time Comparisons

Test Environment

  • t2.small instance

Result

Build time comparison table

Real Pipeline Logs

  • First build (no cache): 1 minute 29 seconds
  • Second build (no cache): 1 minute 14 seconds
  • First build (with cache): 21 seconds
  • Second build (with cache): 19 seconds

Performance Improvement:

  • Average build time reduced from 81.5s to 20s
  • ~75% faster builds with caching
  • Consistent performance across subsequent builds

💡 What Improved:

  • Subsequent builds are much faster
  • No redundant package downloads
  • Less network usage
  • Better developer experience

Handling Cache from Previous Image

When building a new image, we want to use the last successfully deployed image for caching. Here's how to implement it:

build:
  script:
    # Get last tag and verify it exists in ECR
    - |
      LAST_TAG=$(aws ssm get-parameter \
        --name "/myapp/config/image-tag" \
        --query "Parameter.Value" \
        --output text || echo "none")
      echo "Last deployed tag from Parameter Store: $LAST_TAG"
      
      # Initialize use_cache
      use_cache=false
      
      # Check if image exists in ECR
      if [ "$LAST_TAG" != "none" ]; then
        echo "Checking if image exists in ECR..."
        if aws ecr describe-images \
          --repository-name $(echo $ECR_REPOSITORY_URL | cut -d/ -f2) \
          --image-ids imageTag=$LAST_TAG >/dev/null 2>&1; then
          echo "✅ Image found in ECR"
          use_cache=true
        else
          echo "⚠️ Warning: Image tag exists in Parameter Store but not in ECR"
        fi
      else
        echo "No previous tag found in Parameter Store"
      fi
    
    # Build with or without cache
    - |
      if [ "$use_cache" = true ]; then
        echo "Using previous image for cache: $LAST_TAG"
        time docker pull $ECR_REPOSITORY_URL:$LAST_TAG
        time docker build --progress=plain \
          --cache-from=$ECR_REPOSITORY_URL:$LAST_TAG \
          --memory=512m --memory-swap=1g \
          --cpu-quota=30000 --cpu-period=100000 \
          -t $ECR_REPOSITORY_URL:$CI_COMMIT_SHA \
          -f Dockerfile.prod .
      else
        echo "Building without cache"
        time docker build --progress=plain \
          --memory=512m --memory-swap=1g \
          --cpu-quota=30000 --cpu-period=100000 \
          -t $ECR_REPOSITORY_URL:$CI_COMMIT_SHA \
          -f Dockerfile.prod .
      fi

build stage in .gitlab-ci.yml

💡 How This Works:

  1. Check Parameter Store:

    • Get last deployed tag
    • Returns "none" if parameter doesn't exist
  2. Verify Image in ECR:

    • Check if image actually exists in repository
    • Handles cases where Parameter Store is out of sync
  3. Cache Decision:

    • Use cache only if image exists in both places
    • Fall back to no-cache build if anything is missing
  4. Build Scenarios:

    • First deployment: builds without cache
    • Missing image: builds without cache
    • Image exists: uses cache for faster build

2. Image Tagging with Commit IDs

Why Not 'latest'?

  • Ambiguous versioning
  • Difficult to track deployments
  • Rollback challenges
  • Cache issues
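Tagging every image with the Git commit SHA gives it a unique, traceable identity. GitLab CI exposes the SHA as $CI_COMMIT_SHA; a rough local equivalent is sketched below (the repository URL variable matches the one used in the pipeline):

# Tag the image with the current Git commit instead of :latest
COMMIT_SHA=$(git rev-parse HEAD)
docker build -t $ECR_REPOSITORY_URL:$COMMIT_SHA -f Dockerfile.prod .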

Create Parameter for Image Tag

Why Parameter Store?

  • Securely store the last deployed image tag
  • Enable EC2 to know which image version to pull
  • Maintain deployment history
  • Easy rollback capability

Setup Steps in AWS Console:

  • Access AWS Console > Parameter Store
  • Click "Create parameter"
  • Set the parameter name to /myapp/config/image-tag (any name works, but use a forward-slash (/) hierarchy, as AWS recommends)
  • Leave the description empty (it is optional)
  • Choose "Standard" tier
  • Set parameter type as "String"
  • Set data type as "Text"
  • Set the parameter value to a placeholder such as initial-setup (the pipeline will update it later)
  • Leave tags empty (they are optional)
  • Click "Create Parameter"
Create parameter for image tag
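If you prefer the CLI over the console, the same parameter can be created and checked with two commands (a sketch assuming your AWS credentials and region are already configured):

# Create the parameter with a placeholder value; the pipeline will overwrite it
aws ssm put-parameter \
  --name "/myapp/config/image-tag" \
  --type String \
  --value "initial-setup"

# Read it back to confirm
aws ssm get-parameter \
  --name "/myapp/config/image-tag" \
  --query "Parameter.Value" \
  --output text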

Verification

  • Parameter should appear in the list
Updated parameter list

Save Commit ID as a variable in GitLab CI

variables:
  IMAGE_TAG: $CI_COMMIT_SHA

Complete Flow with Verification in GitLab CI

variables:
  DOCKER_HOST: "unix:///var/run/docker.sock"
  DOCKER_BUILDKIT: "1"
  IMAGE_TAG: $CI_COMMIT_SHA

stages:
  - build
  - deploy
build:
  stage: build
  tags:
    - docker
  before_script:
    - |
      # Install AWS CLI
      apk add --no-cache aws-cli

      aws --version
  script:
    - echo "Building Docker image..."
    - aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_REPOSITORY_URL
    
    # Get the last deployed image tag for caching

    - |
      LAST_TAG=$(aws ssm get-parameter \
        --name "/myapp/config/image-tag" \
        --query "Parameter.Value" \
        --output text || echo "none")
      echo "Last deployed tag: $LAST_TAG"

      # Check if image exists in ECR
      if [ "$LAST_TAG" != "none" ]; then
        echo "Checking if image exists in ECR..."
        if aws ecr describe-images \
          --repository-name $(echo $ECR_REPOSITORY_URL | cut -d/ -f2) \
          --image-ids imageTag=$LAST_TAG >/dev/null 2>&1; then
          echo "✅ Image found in ECR"
          docker pull $ECR_REPOSITORY_URL:$LAST_TAG
        else
          echo "⚠️ Warning: Image tag exists in Parameter Store but not in ECR"
        fi
      else
        echo "No previous tag found in Parameter Store"
      fi

    # Generate unique tag and build
    - |
      # Build with cache if available
      if [ "$LAST_TAG" != "none" ]; then
        echo "Building with cache from: $LAST_TAG"
        time docker build --progress=plain \
          --cache-from=$ECR_REPOSITORY_URL:$LAST_TAG \
          --memory=512m --memory-swap=1g \
          --cpu-quota=30000 --cpu-period=100000 \
          -t $ECR_REPOSITORY_URL:$IMAGE_TAG \
          -f Dockerfile.prod .
      else
        echo "Building without cache"
        time docker build --progress=plain \
          --memory=512m --memory-swap=1g \
          --cpu-quota=30000 --cpu-period=100000 \
          -t $ECR_REPOSITORY_URL:$IMAGE_TAG \
          -f Dockerfile.prod .
      fi
      
      # Push the new image
      echo "Pushing image: $ECR_REPOSITORY_URL:$IMAGE_TAG"
      docker push $ECR_REPOSITORY_URL:$IMAGE_TAG
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: always

deploy:
  stage: deploy
  tags:
    - docker
  needs:
    - build
  before_script:
    - apk add --no-cache aws-cli
  script:
    - |
      echo "Starting deployment."
      echo "Commit ID: $IMAGE_TAG"
    - |
      # Update Parameter Store with current commit SHA
      aws ssm put-parameter \
        --name "/myapp/config/image-tag" \
        --value "$IMAGE_TAG" \
        --type String \
        --overwrite
      STORED_TAG=$(aws ssm get-parameter \
        --name "/myapp/config/image-tag" \
        --query "Parameter.Value" \
        --output text)
      
      if [ "$STORED_TAG" != "$IMAGE_TAG" ]; then
        echo "❌ Parameter update failed!"
        exit 1
      fi
      echo "✅ Parameter successfully updated !!!"
    - |
      echo "-----BEGIN RSA PRIVATE KEY-----" > private_key
      echo "$SSH_PRIVATE_KEY" | fold -w 64 >> private_key
      echo "-----END RSA PRIVATE KEY-----" >> private_key
    - chmod 600 private_key
    - scp -o StrictHostKeyChecking=no -i private_key docker-compose.prod.yml $EC2_USER@$EC2_HOST:$DIRECTORY_APP/
    - |
      COMMAND_ID=$(aws ssm send-command \
        --instance-ids $EC2_INSTANCE_ID \
        --document-name "AWS-RunShellScript" \
        --parameters '{"commands":[
          "DIRECTORY_APP=$(aws ssm get-parameter --name "/myapp/config/directory-app" --query "Parameter.Value" --output text)",
          "cd $DIRECTORY_APP",
          "echo "We use AWS Parameter Store to fetch secrets"",
          "REGION=$(aws ec2 describe-availability-zones --output text --query 'AvailabilityZones[0].[RegionName]')",
          
          "SECRET_NAME=$(aws ssm get-parameter --name "/myapp/config/secret-name" --query "Parameter.Value" --output text)",
          "IMAGE_TAG=$(aws ssm get-parameter --name "/myapp/config/image-tag" --query "Parameter.Value" --output text)",
          "secrets=$(aws secretsmanager get-secret-value --secret-id $SECRET_NAME --region $REGION --query "SecretString" --output text | jq .)",
          "echo "Secrets fetched successfully 🎉"",
          "export DB_HOST=$(echo $secrets | jq -r ".database.DB_HOST")",
          "export DB_USER=$(echo $secrets | jq -r ".database.DB_USER")",
          "export DB_PORT=$(echo $secrets | jq -r ".database.DB_PORT")",
          "export DB_PASSWORD=$(echo $secrets | jq -r ".database.DB_PASSWORD")",
          "export DB_ROOT_PASSWORD=$(echo $secrets | jq -r ".database.DB_ROOT_PASSWORD")",
          "export DB_NAME=$(echo $secrets | jq -r ".database.DB_NAME")",
          "export AWS_REGION=$REGION",
          "export ECR_REPOSITORY_URL=$(echo $secrets | jq -r ".aws.ECR_REPOSITORY_URL")",
          "export IMAGE_TAG=$IMAGE_TAG",
          "export PORT=$(echo $secrets | jq -r ".app.PORT")",
          "export GIN_MODE=$(echo $secrets | jq -r ".app.GIN_MODE")",
          "aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_REPOSITORY_URL",
          "docker pull $ECR_REPOSITORY_URL:$IMAGE_TAG || true",
          "if ! docker ps --filter "name=mysql-prod" --filter "status=running" | grep -q "mysql-prod"; then",
          "  echo "Starting mysql service..."",
          "  DB_USER=$DB_USER DB_ROOT_PASSWORD=$DB_ROOT_PASSWORD docker-compose -f docker-compose.prod.yml up -d mysql",
          "else",
          "  echo "MySQL is already running ✅"",
          "fi",
          "docker-compose -f docker-compose.prod.yml stop go",
          "docker-compose -f docker-compose.prod.yml rm -f go",
          "docker images $ECR_REPOSITORY_URL -q | grep -v $IMAGE_TAG | xargs -r docker rmi -f || true",
          "DB_HOST=$DB_HOST DB_USER=$DB_USER DB_PORT=$DB_PORT DB_PASSWORD=$DB_PASSWORD DB_ROOT_PASSWORD=$DB_ROOT_PASSWORD DB_NAME=$DB_NAME PORT=$PORT GIN_MODE=$GIN_MODE ECR_REPOSITORY_URL=$ECR_REPOSITORY_URL IMAGE_TAG=$IMAGE_TAG docker-compose -f docker-compose.prod.yml up -d go",
          "echo "Deployment completed successfully 🎉""
        ]}' \
        --output text \
        --query "Command.CommandId")
    - |
      echo "Waiting for command completion..."
      while true; do
        STATUS=$(aws ssm list-commands \
          --command-id "$COMMAND_ID" \
          --query "Commands[0].Status" \
          --output text)  
        
        if [ "$STATUS" = "Success" ]; then
          echo "Command completed successfully"
          break
        elif [ "$STATUS" = "Failed" ]; then
          echo "Command failed"
          break
        fi
        
        echo "Status: $STATUS"
        sleep 3
      done
    - |
      aws ssm get-command-invocation \
        --command-id "$COMMAND_ID" \
        --instance-id "$EC2_INSTANCE_ID" \
        --query "StandardOutputContent" \
        --output text
  rules:
    - if: $CI_COMMIT_BRANCH == "main" && $CI_PIPELINE_SOURCE == "push"
      when: on_success

updated .gitlab-ci.yml
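Because the deployed tag now lives in Parameter Store, a manual rollback is mostly a matter of pointing the parameter back at an older commit SHA and re-running the deploy step. A sketch of that (the <old-commit-sha> value is a placeholder, and the older image must still exist in ECR):

# Point the deployment back at a previous image tag
aws ssm put-parameter \
  --name "/myapp/config/image-tag" \
  --value "<old-commit-sha>" \
  --type String \
  --overwrite

# Then re-run the deploy job (or the SSM send-command step) so EC2 pulls that tag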

Update Image Tag in Docker Compose

image: $ECR_REPOSITORY_URL:$IMAGE_TAG

updated image tag in docker-compose.prod.yml

Flow Steps:

  1. GitLab CI has the commit SHA
  2. Updates Parameter Store
  3. EC2 reads from Parameter Store
  4. Sets as environment variable
  5. Docker Compose uses the variable
  6. Pulls correct image version from ECR
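On the EC2 side the whole flow condenses to two steps, which are the same commands executed through SSM above:

# Read the tag that the pipeline stored in Parameter Store
IMAGE_TAG=$(aws ssm get-parameter \
  --name "/myapp/config/image-tag" \
  --query "Parameter.Value" \
  --output text)

# Docker Compose substitutes it into "image: $ECR_REPOSITORY_URL:$IMAGE_TAG"
IMAGE_TAG=$IMAGE_TAG ECR_REPOSITORY_URL=$ECR_REPOSITORY_URL \
  docker-compose -f docker-compose.prod.yml up -d go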

🌟 Results

  • Build time improvements
  • Clear image versioning
  • Reliable deployments

🔗 Resources

Demo Repository

The full repository with the complete implementation can be found here.

Official Documentation

📈 Next Steps: Using AWS S3 in CI/CD pipeline

Our upcoming focus areas include:

  • Using S3 to store and sync docker-compose files
  • Implementing proper IAM roles and policies for S3 access