Zero‑Downtime Blue/Green Deployments with GitHub Actions and Kubernetes
Written by PPIL Intelligence Brief
"Learn how to build a fully automated blue/green release pipeline that swaps traffic between two Kubernetes deployments using GitHub Actions, Helm, and service mesh routing. The guide walks you through environment preparation, CI workflow definition, and post‑deployment validation, all with production‑grade safety checks."
Introduction
A recent survey of 1,200 DevOps teams reported that 42 % of production incidents stem from faulty releases, and 19 % of those could have been avoided with a proper traffic‑switch strategy. The blue/green pattern isolates the new version (green) from the live version (blue) until health checks pass, then flips traffic instantly. This tutorial shows you how to implement that pattern with modern tools that are still supported in 2025.
“The moment you trust a single deployment to be both new and safe, you invite failure. Blue/green forces you to prove safety before users see it.” – Senior Site Reliability Engineer, PPIL
Prerequisites
- A Kubernetes cluster (v1.27+) with kubectl configured.
- A GitHub repository with admin rights to create Actions workflows.
- Helm 3.12+ installed locally.
- A service mesh that supports traffic splitting (e.g., Istio 1.20 or Linkerd 2.14).
- Docker Engine 24.0+ for building container images.
All commands assume a Unix‑like shell.
Step‑by‑Step Blueprint
1. Prepare the Kubernetes namespace and base manifests
Create a dedicated namespace for the application and store Helm charts in a charts/ directory.
# Create namespace
kubectl create namespace prod-bluegreen
# Verify
kubectl get ns prod-bluegreen
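Once this step moves into a pipeline it helps if it can be re-run safely. The sketch below is one way to do that, assuming Istio with automatic sidecar injection enabled through the istio-injection=enabled namespace label (Linkerd uses a different mechanism). The function name ensure_namespace is illustrative and not part of the original setup.

```shell
# Sketch: create the namespace idempotently and opt it into Istio sidecar injection.
# Assumes kubectl is on PATH and already points at the intended cluster.
ensure_namespace() {
  ns="$1"
  # "create --dry-run=client -o yaml" only renders the manifest;
  # piping it into "apply" makes the step safe to repeat.
  kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f - || return 1
  # Istio-specific assumption: injection is switched on per namespace via this label.
  kubectl label namespace "$ns" istio-injection=enabled --overwrite
}
```

Calling `ensure_namespace prod-bluegreen` twice in a row should leave the cluster in the same state as calling it once.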
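In charts/myapp/values.yaml, define the values that the templates combine into the Deployment name and image reference. Helm does not render template expressions inside values.yaml, so expressions such as {{ .Release.Name }} belong in the files under templates/. The workflow in step 4 expects the Deployment to be named after the release (for example myapp-green), so templates/deployment.yaml should use {{ .Release.Name }} for metadata.name and {{ .Values.image.repository }}:{{ .Values.imageTag }} for the image.
# charts/myapp/values.yaml
environment: blue   # default; the green chart overrides this
replicaCount: 3
image:
  repository: ghcr.io/yourorg/myapp
imageTag: ""        # supplied at deploy time with --set imageTag=<tag>
service:
  port: 80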
2. Add a Helm chart for the green deployment
Duplicate the chart folder to charts/myapp-green and adjust the environment value.
cp -r charts/myapp charts/myapp-green
sed -i 's/environment: .*/environment: green/' charts/myapp-green/values.yaml
The blue chart remains unchanged (environment: blue by default). Both charts share the same service name, allowing the mesh to route traffic between them.
3. Configure Istio VirtualService for traffic splitting
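Create a template istio/virtualservice.yaml that routes to two subsets, blue and green. Each subset must also be defined in a DestinationRule for the host myapp that selects the matching pods by label; without it, Istio cannot resolve the subset names.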
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
  namespace: prod-bluegreen
spec:
  hosts:
    - myapp.example.com
  http:
    - route:
        - destination:
            host: myapp
            subset: blue
          weight: 100
        - destination:
            host: myapp
            subset: green
          weight: 0
Later the pipeline will patch the weight fields.
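The blue and green subsets named in the VirtualService only resolve if a DestinationRule defines them. The following sketch prints such a rule so it can be piped into kubectl. The pod label key environment is an assumption: it has to match whatever label the chart's pod template actually sets, and destination_rule is a hypothetical helper name.

```shell
# Sketch: print a DestinationRule that defines the "blue" and "green" subsets.
# Assumption: pods carry an "environment" label with the value blue or green.
destination_rule() {
  ns="${1:-prod-bluegreen}"
  cat <<EOF
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
  namespace: ${ns}
spec:
  host: myapp
  subsets:
    - name: blue
      labels:
        environment: blue
    - name: green
      labels:
        environment: green
EOF
}
```

A typical use would be `destination_rule prod-bluegreen | kubectl apply -f -`, run once before the first traffic switch.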
4. Set up GitHub Actions workflow
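Create .github/workflows/bluegreen-deploy.yml. The workflow uses docker/build-push-action (v4) to build and push the image, and azure/setup-kubectl (v2) and azure/setup-helm (v3) to install the CLI tools. Newer major versions of these actions may be available, so check each action's releases page before pinning.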
name: Blue/Green Deploy
on:
  push:
    branches: [ main ]
permissions:
  contents: read
  packages: write
  id-token: write
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    outputs:
      image-tag: ${{ steps.set-tag.outputs.tag }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Compute image tag
        id: set-tag
        run: |
          TAG=$(git rev-parse --short HEAD)
          # Write the step output; both variables need their leading "$".
          echo "tag=$TAG" >> "$GITHUB_OUTPUT"
      - name: Build and push image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ghcr.io/yourorg/myapp:${{ steps.set-tag.outputs.tag }}
  deploy-green:
    needs: build-and-push
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
      - name: Set up kubectl
        uses: azure/setup-kubectl@v2
        with:
          version: 'v1.27.3' # keep within one minor version of your cluster
      - name: Set up Helm
        uses: azure/setup-helm@v3
        with:
          version: 'v3.12.3'
      # Cluster credentials (kubeconfig or OIDC login) must be configured before the next step.
      - name: Deploy green release
        env:
          IMAGE_TAG: ${{ needs.build-and-push.outputs.image-tag }}
        run: |
          helm upgrade --install myapp-green charts/myapp-green \
            --namespace prod-bluegreen \
            --set environment=green \
            --set imageTag="$IMAGE_TAG"
      - name: Verify green pods are ready
        run: |
          kubectl rollout status deployment/myapp-green -n prod-bluegreen --timeout=120s
  switch-traffic:
    needs: [deploy-green]
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
      - name: Set up kubectl
        uses: azure/setup-kubectl@v2
        with:
          version: 'v1.27.3'
      - name: Patch VirtualService to 100 % green
        run: |
          kubectl -n prod-bluegreen patch virtualservice myapp \
            --type='json' -p='[
              {"op":"replace","path":"/spec/http/0/route/0/weight","value":0},
              {"op":"replace","path":"/spec/http/0/route/1/weight","value":100}
            ]'
      - name: Wait for traffic to settle
        run: sleep 30
      - name: Decommission blue release
        run: |
          helm uninstall myapp-blue -n prod-bluegreen || true
Why this works
- The workflow builds a container image tagged with the short commit SHA, guaranteeing traceability.
- The deploy-green job installs the new version under the green label while the blue version continues serving traffic.
- The switch-traffic job updates the Istio VirtualService, moving 100 % of requests to the green pods in a single API call. The resource update itself is atomic; Istio then pushes the new route to the sidecar proxies, which normally takes a few seconds.
- Finally, the blue release is removed, freeing resources.
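The hard-coded patch in the workflow can be generated instead, so that the two weights always add up to 100. This is a minimal sketch that assumes the route order from step 3 (index 0 is blue, index 1 is green); weights_patch is a hypothetical helper name.

```shell
# Sketch: build the JSON Patch that sets the green weight and gives blue the remainder.
# Assumes route index 0 is blue and index 1 is green, as in the VirtualService above.
weights_patch() {
  green="$1"
  # Reject empty or non-numeric input before doing arithmetic on it.
  case "$green" in
    ''|*[!0-9]*) echo "green weight must be an integer from 0 to 100" >&2; return 1 ;;
  esac
  [ "$green" -le 100 ] || { echo "green weight must be an integer from 0 to 100" >&2; return 1; }
  blue=$((100 - green))
  printf '[{"op":"replace","path":"/spec/http/0/route/0/weight","value":%d},{"op":"replace","path":"/spec/http/0/route/1/weight","value":%d}]\n' "$blue" "$green"
}
```

The output can be passed straight to the existing command: `kubectl -n prod-bluegreen patch virtualservice myapp --type=json -p "$(weights_patch 100)"`.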
5. Add health‑check probes to the Helm chart
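Open charts/myapp/templates/deployment.yaml and insert liveness and readiness probes that use HTTP GET on /healthz. Because the chart was duplicated in step 2, apply the same change to charts/myapp-green. The endpoint must answer with a success status (any code from 200 to 399) within 2 seconds; otherwise the probe fails, and after repeated failures the pod is marked unready (readiness) or restarted (liveness). The port: http reference assumes the container declares a port named http.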
livenessProbe:
  httpGet:
    path: /healthz
    port: http
  initialDelaySeconds: 10
  periodSeconds: 15
  timeoutSeconds: 2
readinessProbe:
  httpGet:
    path: /healthz
    port: http
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 2
These probes give the mesh confidence that the green pods can accept traffic before the switch.
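6. Verify the pipeline on staging before merging
Before merging to main, exercise the workflow against a staging cluster by adding a workflow_dispatch trigger. The steps are identical, but the kubectl context points at staging. GitHub only offers manual runs for workflows whose file already exists on the default branch, so merge the trigger definition first, then select the feature branch when starting the run.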
on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Target environment (staging or prod)'
        required: true
        default: 'staging'
Read the selected value in the jobs with ${{ inputs.environment }} and adjust the namespace and mesh configuration accordingly. A successful run shows that the pipeline works end‑to‑end without affecting live users.
7. Implement rollback logic (optional but recommended)
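Add a final job rollback that runs only when switch-traffic fails. It reinstalls the blue release if it was removed, waits for it to become ready, and then restores the original weights, so traffic is never routed to a subset without pods. The job must list build-and-push in needs to read its image-tag output. Note that this example reuses the tag built by the current run; for a true rollback, pass the last known-good tag instead.
  rollback:
    # Run only when the traffic switch itself failed.
    if: ${{ failure() && needs.switch-traffic.result == 'failure' }}
    needs: [build-and-push, switch-traffic]
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
      - name: Re‑install blue release if missing
        env:
          # This is the tag built by the current run; substitute the last known-good tag.
          IMAGE_TAG: ${{ needs.build-and-push.outputs.image-tag }}
        run: |
          # Install only when the release is gone; upgrading a healthy blue would overwrite it.
          if ! helm status myapp-blue -n prod-bluegreen >/dev/null 2>&1; then
            helm upgrade --install myapp-blue charts/myapp \
              --namespace prod-bluegreen \
              --set environment=blue \
              --set imageTag="$IMAGE_TAG" \
              --wait
          fi
      - name: Restore blue traffic
        run: |
          kubectl -n prod-bluegreen patch virtualservice myapp \
            --type='json' -p='[
              {"op":"replace","path":"/spec/http/0/route/0/weight","value":100},
              {"op":"replace","path":"/spec/http/0/route/1/weight","value":0}
            ]'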
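If switch-traffic fails, GitHub Actions marks the run as failed and the rollback job restores service continuity. By default GitHub only sends its standard failure notification, so connect the workflow to your paging or chat tool if the on‑call team must be alerted.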
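After a rollback it is worth confirming which subset is actually receiving traffic. The sketch below reads the blue weight back from the cluster, assuming the route order from step 3 (index 0 is blue); live_color is a hypothetical helper name.

```shell
# Sketch: report which subset currently receives traffic by reading the blue
# route weight (route index 0 in the VirtualService from step 3).
live_color() {
  ns="${1:-prod-bluegreen}"
  blue=$(kubectl -n "$ns" get virtualservice myapp \
    -o jsonpath='{.spec.http[0].route[0].weight}') || return 1
  case "$blue" in
    100) echo blue ;;
    0)   echo green ;;
    '')  echo "could not read the blue weight" >&2; return 1 ;;
    *)   echo "split (blue=${blue})" ;;
  esac
}
```

A rollback step could end with `[ "$(live_color)" = blue ]` so that the job fails visibly if traffic did not return to blue.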
8. Clean up old resources
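Periodically prune old images from GitHub Container Registry, either by deleting old package versions from the package's settings page or by scheduling a workflow that removes them, for example with the actions/delete-package-versions action. Note that docker image prune only cleans the local Docker host and has no effect on a remote registry. Keeping the registry tidy prevents storage bloat and reduces attack surface.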
Full Pipeline Overview
| Phase | Action | Tool |
|---|---|---|
| Build | Container image creation | docker/build-push-action (v4) with Buildx |
| Publish | Push to GHCR | docker/login-action (v3) for authentication, docker/build-push-action for the push |
| Deploy Green | Helm upgrade of the green release | Helm (v3.12) |
| Validate | Pod rollout status, probes | kubectl rollout status |
| Traffic Switch | Istio VirtualService patch | kubectl patch |
| Cleanup | Helm uninstall blue release | Helm |
| Rollback (on fail) | Re‑apply blue weight, reinstall blue | Helm + kubectl |
Common Pitfalls and How to Avoid Them
- Stale mesh configuration – Always apply the VirtualService patch after the green pods report ready. The probes guarantee that the mesh sees healthy endpoints.
- Image tag collision – Using the short commit SHA (git rev-parse --short HEAD) prevents overwriting previous builds. If two commits share the same short SHA, append the CI run number (${{ github.run_number }}).
- Namespace leakage – Keep all blue/green resources in a dedicated namespace; otherwise a stray service could receive traffic unexpectedly.
- Permission errors – The workflow needs id-token: write to authenticate with the Kubernetes cluster via OIDC. Ensure the GitHub environment has a federated credential set up in the cloud provider.
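The image tag advice above can be sketched as a small helper that joins the short SHA and the run number. The name image_tag is hypothetical; inside a workflow the run number is also available to shell steps as the GITHUB_RUN_NUMBER environment variable.

```shell
# Sketch: derive a collision-resistant image tag from the short SHA and the run number.
image_tag() {
  sha="$1"
  run="$2"
  # Both parts are required; refuse to emit a half-built tag.
  [ -n "$sha" ] && [ -n "$run" ] || { echo "usage: image_tag <short-sha> <run-number>" >&2; return 1; }
  printf '%s-%s\n' "$sha" "$run"
}
```

In the "Compute image tag" step this could replace the plain SHA: `echo "tag=$(image_tag "$(git rev-parse --short HEAD)" "$GITHUB_RUN_NUMBER")" >> "$GITHUB_OUTPUT"`.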
Testing the Switch in a Live Environment
- Smoke test – After the green pods are up, run a curl against the internal service endpoint (curl -s http://myapp.prod-bluegreen.svc.cluster.local/healthz).
- Canary probe – Use kubectl exec on a pod that routes through the mesh and request the public hostname; verify the response comes from the green version (e.g., version header).
- Load test – Run a brief hey or wrk test for 30 seconds at 200 RPS to confirm latency stays below the SLA threshold (e.g., 120 ms).
Run these checks as steps in deploy-green, or in a separate job that switch-traffic depends on. If any check fails, the pipeline then aborts before the switch-traffic job, leaving the blue version untouched.
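The smoke test can be made robust against slow starts by retrying a few times before giving up. This is a minimal sketch; smoke_test and the SMOKE_DELAY variable (seconds between attempts) are hypothetical names, and the 2-second limit mirrors the probe timeout from step 5.

```shell
# Sketch: poll a health endpoint until it returns HTTP 200 or the attempts run out.
# SMOKE_DELAY (seconds between attempts, default 3) is an illustrative knob.
smoke_test() {
  url="$1"
  attempts="${2:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s hides progress output, -o discards the body, -w prints only the status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url")
    if [ "$code" = "200" ]; then
      echo "healthy after ${i} attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "${SMOKE_DELAY:-3}"
    fi
    i=$((i + 1))
  done
  echo "unhealthy: last status ${code}" >&2
  return 1
}
```

Because the function returns a non-zero status on failure, a workflow step that calls `smoke_test http://myapp.prod-bluegreen.svc.cluster.local/healthz` fails the job and blocks any job that depends on it.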
Extending the Pattern
- Multi‑region deployments – Replicate the same namespace in each region and use a global DNS service that supports weighted routing. The same GitHub Actions workflow can target multiple clusters by iterating over a list of kubeconfig contexts.
- Feature flags – Combine blue/green with a flag service (e.g., LaunchDarkly) to enable a subset of users to see the new version before full traffic shift.
- Canary instead of blue/green – Replace the weight‑patch step with incremental weight increases (10 %, 30 %, 60 %, 100 %) and add automated monitoring thresholds before each step.
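The canary variant can be sketched as a loop that checks a gate before each weight increase and reverts to blue when the gate fails. Both hooks are placeholders: set_weight stands in for the kubectl patch call and gate_ok for a real metrics query, and neither name comes from the original pipeline.

```shell
# Sketch: walk the green weight through fixed steps, checking a gate before each one.
# set_weight and gate_ok are hypothetical hooks; override them with a real
# "kubectl patch" call and a real monitoring check.
set_weight() { echo "green weight -> $1"; }
gate_ok()    { return 0; }

canary_rollout() {
  for w in 10 30 60 100; do
    if ! gate_ok "$w"; then
      echo "gate failed before ${w}%; reverting to blue" >&2
      set_weight 0
      return 1
    fi
    set_weight "$w"
  done
}
```

With the default hooks, `canary_rollout` simply prints the four steps in order, which makes it easy to dry-run the sequence before wiring in the real commands.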
Final Checklist
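- Namespace prod-bluegreen exists.
- Helm charts for blue and green are version‑controlled.
- Istio VirtualService references both blue and green subsets, and a DestinationRule defines them.
- The workflow can authenticate to the registry (the built-in GITHUB_TOKEN with packages: write) and to the cluster (for example a KUBE_CONFIG_DATA secret or OIDC).
- Liveness/readiness probes return a success status within 2 seconds.
- Rollback job is enabled and tested.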
Running this checklist before merging guarantees that the pipeline can execute without manual intervention.
PPIL Takeaway
Zero‑downtime releases embody PPIL’s belief that reliability is earned, not assumed. By automating every safety check and making traffic switches observable, teams turn deployment risk into a repeatable process.