Zero‑Downtime Blue/Green Deployments with GitHub Actions and Kubernetes


# Introduction

[Figure: technical diagram of a blue/green deployment pipeline with GitHub Actions, Helm, and Kubernetes services]

A recent survey of 1,200 DevOps teams reported that 42 % of production incidents stem from faulty releases, and 19 % of those could have been avoided with a proper traffic‑switch strategy. The blue/green pattern isolates the new version (green) from the live version (blue) until health checks pass, then flips traffic instantly. This tutorial shows you how to implement that pattern with modern tools that are still supported in 2025.

“The moment you trust a single deployment to be both new and safe, you invite failure. Blue/green forces you to prove safety before users see it.” – Senior Site Reliability Engineer, PPIL

# Prerequisites

A Kubernetes cluster (v1.27+) with kubectl configured.
A GitHub repository with admin rights to create Actions workflows.
Helm 3.12+ installed locally.
A service mesh that supports traffic splitting (e.g., Istio 1.20 or Linkerd 2.14).
Docker Engine 24.0+ for building container images.

All commands assume a Unix‑like shell.
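
As a quick preflight for the list above, the sketch below reports which command-line tools are missing from PATH. The function name missing_tools is a hypothetical helper, and it only checks presence, not the minimum versions listed.

```shell
# Print each named command that is not found on PATH, one per line.
# Returns non-zero if anything is missing. Versions are NOT checked.
missing_tools() {
  mt_status=0
  for mt_tool in "$@"; do
    if ! command -v "$mt_tool" >/dev/null 2>&1; then
      echo "$mt_tool"
      mt_status=1
    fi
  done
  return "$mt_status"
}

# Usage:
#   missing_tools kubectl helm docker || echo "install the tools listed above" >&2
```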

# Step‑by‑Step Blueprint

# 1. Prepare the Kubernetes namespace and base manifests

Create a dedicated namespace for the application and store Helm charts in a charts/ directory.

# Create namespace
kubectl create namespace prod-bluegreen

# Verify
kubectl get ns prod-bluegreen

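In charts/myapp/values.yaml, define the values that the templates consume. Helm does not evaluate template expressions inside values.yaml, so keep this file to plain values and compose derived names, such as the deployment name, in the templates:

# charts/myapp/values.yaml
# Plain values only: {{ ... }} expressions are not rendered in this file.
environment: blue        # the green chart overrides this
imageTag: latest         # overridden at deploy time with --set imageTag=<sha>
replicaCount: 3
image:
  repository: ghcr.io/yourorg/myapp
service:
  port: 80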
service:
  port: 80

# 2. Add a Helm chart for the green deployment

Duplicate the chart folder to charts/myapp-green and adjust the environment value.

cp -r charts/myapp charts/myapp-green
sed -i 's/environment: .*/environment: green/' charts/myapp-green/values.yaml

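The blue chart remains unchanged (environment: blue by default). The pods of both releases sit behind the same service name, allowing the mesh to route traffic between them. Helm refuses to install a resource that another release already owns, so make sure only one release creates the Service object, or manage the Service outside the two charts.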

# 3. Configure Istio VirtualService for traffic splitting

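Create istio/virtualservice.yaml, which routes to two subsets of the myapp service. The subsets themselves must be defined in a DestinationRule for the same host; without one, Istio cannot resolve blue and green.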

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
  namespace: prod-bluegreen
spec:
  hosts:
  - myapp.example.com
  http:
  - route:
    - destination:
        host: myapp
        subset: blue
      weight: 100
    - destination:
        host: myapp
        subset: green
      weight: 0

Later the pipeline will patch the weight fields.
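
A DestinationRule that defines the two subsets might look like the sketch below, wrapped in a shell function so the output can be piped to kubectl. The pod label key environment is an assumption that mirrors the chart's environment value; it must match the labels your Deployment template actually sets. The function name is a hypothetical helper.

```shell
# Print a DestinationRule that defines the blue and green subsets.
# Assumption: pods carry the label environment=blue or environment=green.
render_destination_rule() {
  dr_host="$1"        # service host, e.g. myapp
  dr_namespace="$2"   # namespace, e.g. prod-bluegreen
  cat <<EOF
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: ${dr_host}
  namespace: ${dr_namespace}
spec:
  host: ${dr_host}
  subsets:
  - name: blue
    labels:
      environment: blue
  - name: green
    labels:
      environment: green
EOF
}

# Usage (requires cluster access):
#   render_destination_rule myapp prod-bluegreen | kubectl apply -f -
```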

# 4. Set up GitHub Actions workflow

Create .github/workflows/bluegreen-deploy.yml. The file uses docker/build-push-action (v4), azure/setup-kubectl (v2), and azure/setup-helm (v3). Newer major versions of these actions have since been released, so check each action's release page and pin the version you have validated.

name: Blue/Green Deploy

on:
  push:
    branches: [ main ]

permissions:
  contents: read
  packages: write
  id-token: write

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    outputs:
      image-tag: ${{ steps.set-tag.outputs.tag }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Compute image tag
        id: set-tag
        run: |
          TAG=$(git rev-parse --short HEAD)
          echo "tag=$TAG" >> "$GITHUB_OUTPUT"

      - name: Build and push image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ghcr.io/yourorg/myapp:${{ steps.set-tag.outputs.tag }}

  deploy-green:
    needs: build-and-push
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Set up kubectl
        uses: azure/setup-kubectl@v2
        with:
          version: 'v1.27.3'   # matches the v1.27+ prerequisite; keep kubectl within one minor version of your cluster

      - name: Set up Helm
        uses: azure/setup-helm@v3
        with:
          version: 'v3.12.3'

      - name: Deploy green release
        env:
          IMAGE_TAG: ${{ needs.build-and-push.outputs.image-tag }}
        run: |
          helm upgrade --install myapp-green charts/myapp-green \
            --namespace prod-bluegreen \
            --set environment=green \
            --set imageTag=$IMAGE_TAG

      - name: Verify green pods are ready
        run: |
          kubectl rollout status deployment/myapp-green -n prod-bluegreen --timeout=120s

  switch-traffic:
    needs: [deploy-green]
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Set up kubectl
        uses: azure/setup-kubectl@v2
        with:
          version: 'v1.27.3'

      - name: Patch VirtualService to 100 % green
        run: |
          kubectl -n prod-bluegreen patch virtualservice myapp \
            --type='json' -p='[
              {"op":"replace","path":"/spec/http/0/route/0/weight","value":0},
              {"op":"replace","path":"/spec/http/0/route/1/weight","value":100}
            ]'

      - name: Wait for traffic to settle
        run: sleep 30

      - name: Decommission blue release
        run: |
          helm uninstall myapp-blue -n prod-bluegreen || true

Why this works

The workflow builds a container image tagged with the current commit SHA, guaranteeing traceability.
The deploy-green job installs the new version under the green label while the blue version continues serving traffic.
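The switch-traffic job updates the Istio VirtualService, moving 100 % of requests to the green pods in a single API call. The API server applies the patch atomically, but the mesh distributes the new route to the sidecars asynchronously, so expect a brief window (typically seconds) in which some proxies still route to blue.
Finally, the blue release is removed, freeing resources. After this first cut‑over green is the live color, so a later run that again deploys to myapp-green would upgrade the live release in place; alternate the target color between runs to keep the pattern intact.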
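
The workflow above always deploys to green, yet once blue has been removed the live color is green. One way to alternate, sketched here under the assumption that route 0 is blue and route 1 is green as in istio/virtualservice.yaml, is to read the current weights and deploy to whichever color receives no traffic. idle_color is a hypothetical helper, not part of any tool.

```shell
# Echo the color that currently receives no traffic.
# Arguments: current blue weight, current green weight (integers 0-100).
# A missing weight is treated as 0.
idle_color() {
  ic_blue="${1:-0}"
  ic_green="${2:-0}"
  if [ "$ic_green" -eq 0 ]; then
    echo green
  elif [ "$ic_blue" -eq 0 ]; then
    echo blue
  else
    echo "traffic is split ${ic_blue}/${ic_green}; no idle color" >&2
    return 1
  fi
}

# Usage (requires cluster access):
#   blue=$(kubectl -n prod-bluegreen get virtualservice myapp \
#            -o jsonpath='{.spec.http[0].route[0].weight}')
#   green=$(kubectl -n prod-bluegreen get virtualservice myapp \
#            -o jsonpath='{.spec.http[0].route[1].weight}')
#   target=$(idle_color "$blue" "$green")
#   helm upgrade --install "myapp-$target" charts/myapp \
#     --namespace prod-bluegreen --set environment="$target"
```

The traffic patch and the uninstall step would then need to use the same target variable instead of hard-coded colors.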

# 5. Add health‑check probes to the Helm chart

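Open charts/myapp/templates/deployment.yaml and insert liveness and readiness probes that use HTTP GET on /healthz. A probe attempt succeeds when /healthz answers with a status in the 200-399 range within 2 seconds. After failureThreshold consecutive failures (3 by default), a failing readiness probe removes the pod from the Service endpoints and a failing liveness probe restarts the container.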

          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 10
            periodSeconds: 15
            timeoutSeconds: 2
          readinessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 2

These probes give the mesh confidence that the green pods can accept traffic before the switch.
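
To check the endpoint by hand with the same success rule the kubelet applies, a minimal sketch follows. Both function names are hypothetical helpers; the 2-second curl timeout mirrors timeoutSeconds: 2 from the probe definition.

```shell
# Return 0 if an HTTP status code counts as a probe success (200-399).
probe_success() {
  [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}

# Poll a URL until it answers successfully or the attempts run out.
# Arguments: url, max attempts, seconds to sleep between attempts.
wait_for_healthz() {
  wh_url="$1"; wh_attempts="$2"; wh_delay="$3"
  wh_i=1
  while [ "$wh_i" -le "$wh_attempts" ]; do
    # curl prints 000 as the status code when the connection fails
    wh_code=$(curl -s -o /dev/null --max-time 2 -w '%{http_code}' "$wh_url") || wh_code=000
    if probe_success "$wh_code"; then
      return 0
    fi
    wh_i=$((wh_i + 1))
    sleep "$wh_delay"
  done
  return 1
}

# Usage (after: kubectl -n prod-bluegreen port-forward deploy/myapp-green 8080:80):
#   wait_for_healthz http://localhost:8080/healthz 10 3
```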

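# 6. Verify the deployment on a staging cluster before merging

Run the workflow on a feature branch using the workflow_dispatch event, pointing the kubectl context at a staging cluster. The steps are identical to the production run. GitHub only offers the manual trigger once a workflow file containing workflow_dispatch exists on the default branch; after that you can select the feature branch to run against. Add the trigger alongside the existing push trigger: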

on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Target environment (staging or prod)'
        required: true
        default: 'staging'

Adjust the namespace and mesh configuration accordingly. A successful run proves that the pipeline works end‑to‑end without affecting live users.

# 7. Implement rollback logic (optional but recommended)

Add a final job rollback that runs only on failure of switch-traffic. It restores the original weights and redeploys the blue release if it was removed.

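  rollback:
    # Run only when switch-traffic itself failed; a bare failure() would also
    # fire when an earlier job in the chain fails.
    if: always() && needs.switch-traffic.result == 'failure'
    # build-and-push must be listed here for its output to resolve below.
    needs: [build-and-push, switch-traffic]
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Restore blue traffic
        run: |
          kubectl -n prod-bluegreen patch virtualservice myapp \
            --type='json' -p='[
              {"op":"replace","path":"/spec/http/0/route/0/weight","value":100},
              {"op":"replace","path":"/spec/http/0/route/1/weight","value":0}
            ]'

      - name: Re-install blue release if missing
        env:
          # Caution: this is the tag that was just built. To return to the last
          # known-good version, substitute the previously released tag.
          IMAGE_TAG: ${{ needs.build-and-push.outputs.image-tag }}
        run: |
          helm upgrade --install myapp-blue charts/myapp \
            --namespace prod-bluegreen \
            --set environment=blue \
            --set imageTag="$IMAGE_TAG" || true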

GitHub Actions marks the run as failed and notifies according to each user's notification settings; paging the on‑call team requires a separate integration. The rollback job then restores service continuity.

# 8. Clean up old resources

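Periodically delete old image versions from GitHub Container Registry, either manually from the package's settings page or with a scheduled workflow that calls the GitHub Packages REST API (for example through the actions/delete-package-versions action). Note that docker image prune only removes images from the local Docker host and has no effect on the registry. Keeping the registry tidy prevents storage bloat and reduces attack surface.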
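
A retention rule such as "keep the newest N versions" can be sketched as a small filter. versions_to_prune is a hypothetical helper; the gh api calls in the usage comment assume an organization-owned container package named myapp under yourorg and should be checked against the GitHub Packages REST API documentation before use.

```shell
# Read version identifiers from stdin, newest first, one per line,
# and print every entry after the first $1 (the candidates for deletion).
versions_to_prune() {
  tail -n +"$(( $1 + 1 ))"
}

# Usage (assumed endpoints; requires a token with package delete rights):
#   gh api "/orgs/yourorg/packages/container/myapp/versions" \
#        --jq 'sort_by(.created_at) | reverse | .[].id' \
#     | versions_to_prune 10 \
#     | while read -r id; do
#         gh api -X DELETE "/orgs/yourorg/packages/container/myapp/versions/$id"
#       done
```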

# Full Pipeline Overview

Phase	Action	Tool
Build	Container image creation	docker/build-push-action (v4) with Buildx
Publish	Push to GHCR	docker/login-action (v3) + docker/build-push-action
Deploy Green	Helm upgrade of the green release	Helm (v3.12)
Validate	Pod rollout status, probes	kubectl rollout
Traffic Switch	Istio VirtualService patch	kubectl patch
Cleanup	Helm uninstall blue release	Helm
Rollback (on fail)	Re‑apply blue weight, reinstall blue	Helm + kubectl

# Common Pitfalls and How to Avoid Them

Stale mesh configuration – Always apply the VirtualService patch after the green pods report ready. The probes guarantee that the mesh sees healthy endpoints.
Image tag collision – Using the short commit SHA (git rev-parse --short HEAD) prevents overwriting previous builds. If two commits share the same short SHA, append the CI run number (${{ github.run_number }}).
Namespace leakage – Keep all blue/green resources in a dedicated namespace; otherwise a stray service could receive traffic unexpectedly.
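Permission errors – The id-token: write permission is needed only when the workflow authenticates to the cluster's cloud provider via OIDC; in that case, ensure a federated credential is set up in the cloud provider for the GitHub environment. The jobs shown above omit the authentication step, so add your provider's login action (or a kubeconfig secret) before the first kubectl or helm call.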
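
For the image tag collision case above, appending the run number could look like this sketch. image_tag is a hypothetical helper; GITHUB_RUN_NUMBER is the environment variable GitHub Actions sets for the run number.

```shell
# Build an image tag from a short commit SHA and the CI run number.
image_tag() {
  printf '%s-%s\n' "$1" "$2"
}

# Usage inside the "Compute image tag" step:
#   TAG=$(image_tag "$(git rev-parse --short HEAD)" "$GITHUB_RUN_NUMBER")
#   echo "tag=$TAG" >> "$GITHUB_OUTPUT"
```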

# Testing the Switch in a Live Environment

Smoke test – After the green pods are up, run a curl against the internal service endpoint (curl -s http://myapp.prod-bluegreen.svc.cluster.local/healthz).
Canary probe – Use kubectl exec on a pod that routes through the mesh and request the public hostname; verify the response comes from the green version (e.g., version header).
Load test – Run a brief hey or wrk test for 30 seconds at 200 RPS to confirm latency stays below the SLA threshold (e.g., 120 ms).

Run these checks as steps in deploy-green, or in a job that switch-traffic lists under needs. A failing check then stops the pipeline before the switch-traffic job, leaving the blue version untouched.
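
A minimal sketch of such a gate: each check is a shell function, and the first failure stops the sequence with a non-zero exit so the job fails. run_gate and smoke_test are hypothetical names; the curl target is the internal endpoint from the smoke test above.

```shell
# Run each named check in order; stop at the first failure.
run_gate() {
  for rg_check in "$@"; do
    if "$rg_check"; then
      echo "check passed: $rg_check"
    else
      echo "check failed: $rg_check" >&2
      return 1
    fi
  done
}

# Smoke test against the internal service; -f makes curl fail on HTTP errors.
# The cluster-internal hostname only resolves from inside the cluster.
smoke_test() {
  curl -sf --max-time 5 http://myapp.prod-bluegreen.svc.cluster.local/healthz >/dev/null
}

# Usage, e.g. as the last step of deploy-green:
#   run_gate smoke_test
```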

# Extending the Pattern

Multi‑region deployments – Replicate the same namespace in each region and use a global DNS service that supports weighted routing. The same GitHub Actions workflow can target multiple clusters by iterating over a list of kubeconfig contexts.
Feature flags – Combine blue/green with a flag service (e.g., LaunchDarkly) to enable a subset of users to see the new version before full traffic shift.
Canary instead of blue/green – Replace the weight‑patch step with incremental weight increases (10 %, 30 %, 60 %, 100 %) and add automated monitoring thresholds before each step.
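
The canary variant can be sketched as a loop over the weight steps. This assumes route 0 is blue and route 1 is green, as in the VirtualService from step 3; canary_patch and canary_rollout are hypothetical helpers, and the gate command is a placeholder for whatever monitoring check you supply.

```shell
# Echo a JSON patch that sets the green weight to $1 and blue to the remainder.
canary_patch() {
  cp_green="$1"
  cp_blue=$((100 - cp_green))
  printf '[{"op":"replace","path":"/spec/http/0/route/0/weight","value":%s},' "$cp_blue"
  printf '{"op":"replace","path":"/spec/http/0/route/1/weight","value":%s}]\n' "$cp_green"
}

# Step through 10, 30, 60, 100 % green; stop at the first failed gate.
# $1 names a command that receives the current green weight and returns
# non-zero when the metrics look bad.
canary_rollout() {
  cr_gate="$1"
  for cr_weight in 10 30 60 100; do
    kubectl -n prod-bluegreen patch virtualservice myapp \
      --type=json -p "$(canary_patch "$cr_weight")" || return 1
    "$cr_gate" "$cr_weight" || return 1
  done
}

# Usage (requires cluster access; my_metrics_check is yours to write):
#   canary_rollout my_metrics_check
```

When a gate fails, the weights stay at the last applied step, so pair this with the rollback patch that restores 100 % blue.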

# Final Checklist

Namespace prod-bluegreen exists.
Helm charts for blue and green are version‑controlled.
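Istio DestinationRule defines the blue and green subsets, and the VirtualService routes to them.
Cluster credentials are available to the workflow (for example a KUBE_CONFIG_DATA secret or an OIDC federated credential). The registry login shown above uses the built‑in GITHUB_TOKEN, so a separate GHCR_TOKEN secret is needed only if you replace it.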
Liveness/readiness probes return 200 within 2 seconds.
Rollback job is enabled and tested.

Running this checklist before merging guarantees that the pipeline can execute without manual intervention.

PPIL Takeaway
Zero‑downtime releases embody PPIL’s belief that reliability is earned, not assumed. By automating every safety check and making traffic switches observable, teams turn deployment risk into a repeatable process.
