# Deployment
Snowpack is deployed via Terraform using a `helm_release` resource that manages
the `charts/snowpack/` Helm chart. All changes flow through `terraform apply` —
never run `helm install`, `helm upgrade`, or `helm uninstall` directly.
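Under the hood, a single Terraform resource pins the release. A minimal sketch of what that resource might look like (the resource name, chart path, and variables here are illustrative, not the actual module):

```hcl
# Sketch only — names, paths, and variables are assumptions.
resource "helm_release" "snowpack" {
  name      = "snowpack"
  namespace = "snowpack"
  chart     = "${path.module}/../../charts/snowpack"
  version   = var.chart_version # must match charts/snowpack/Chart.yaml

  set {
    name  = "image.tag"
    value = var.image_tag
  }
}
```

Terraform records this chart version and the rendered values in its state, which is why every change must pass through `terraform apply`.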
## Deployment flow
Every deployment follows the same sequence:
1. Modify the chart or application code under `charts/snowpack/`.
2. Bump the version in `charts/snowpack/Chart.yaml`. Terraform tracks the chart
   by path + version + values. If you change template files without bumping the
   version, `terraform plan` will show no diff and your changes will not deploy.
3. Build and push the Docker image to ECR (if application code changed).
4. Plan and apply from the target environment directory:

   ```shell
   cd terraform/snowpack-api/env/dev
   terraform plan
   terraform apply
   ```

Terraform compares the new chart version and values against its state, generates
a Helm upgrade under the hood, and records the result. The entire cycle takes
roughly 60-90 seconds for a clean apply.
## Why Terraform only
Terraform owns the Helm release via the `helm_release` resource. Running Helm
commands directly (even `helm status`) is fine for read-only inspection, but
any mutating Helm command creates state drift. After a direct
`helm upgrade`, the next `terraform apply` will see a version mismatch and
either fail or force a destructive re-deploy.
If you need to inspect what is deployed:
```shell
# Safe read-only Helm commands
helm list -n snowpack
helm get values snowpack -n snowpack
```

## Chart versioning rule
Bump the version in `charts/snowpack/Chart.yaml` whenever you modify any file
under `charts/snowpack/templates/`. This is the only mechanism Terraform uses to
detect chart changes. A common mistake is modifying a template and forgetting
the version bump — `terraform plan` reports "No changes" and the new template
never deploys.
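This failure mode is easy to demonstrate. A minimal sketch using a sample `Chart.yaml` in a temp directory (the version numbers are illustrative, and `deployed` is a placeholder — in practice it would come from `helm list` or the Terraform state):

```shell
# Simulate the "forgot to bump" check against a sample Chart.yaml.
workdir=$(mktemp -d)
cat > "$workdir/Chart.yaml" <<'EOF'
apiVersion: v2
name: snowpack
version: 1.4.2
EOF

deployed="1.4.2"   # placeholder for the version Terraform last applied
current=$(grep '^version:' "$workdir/Chart.yaml" | awk '{print $2}')

# If the on-disk version matches what was last deployed, nothing will ship.
if [ "$current" = "$deployed" ]; then
  echo "WARNING: chart version unchanged; terraform plan will report no diff"
fi
```

A pre-commit hook or CI check along these lines catches the mistake before it reaches `terraform plan`.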
## Infrastructure inventory
| Resource | Kind | Key settings |
|---|---|---|
| API | Deployment (2 replicas, arm64) | 250m CPU / 768Mi memory requests, port 8000 |
| NLB Service | LoadBalancer | ACM TLS termination, external-dns hostname |
| Worker | KEDA ScaledJob | postgresql trigger, 30s polling, max 5 replicas (3 dev) |
| Orchestrator | CronJob | Hourly at :30 (dev), every 2h (default), concurrencyPolicy: Forbid |
| Health Sync | CronJob | Every 15 minutes, concurrencyPolicy: Forbid |
| Postgres | Deployment (1 replica) | 17-alpine, PVC-backed (gp3), Recreate strategy |
| IRSA ServiceAccount | ServiceAccount | eks.amazonaws.com/role-arn annotation, OIDC-bound |
All resources live in the `snowpack` namespace. Nodes are selected by
`kubernetes.io/arch: arm64`.
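As one concrete example from the table, the worker's KEDA `ScaledJob` might be shaped roughly like this (the query, image, and metadata names are placeholders; only the trigger type, polling interval, replica cap, and node selector come from the inventory above):

```yaml
# Sketch only — query, image, and names are assumptions.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: snowpack-worker        # placeholder name
  namespace: snowpack
spec:
  pollingInterval: 30          # 30s polling, per the table
  maxReplicaCount: 5           # 3 in dev
  triggers:
    - type: postgresql
      metadata:
        query: "SELECT count(*) FROM jobs WHERE status = 'queued'"  # placeholder
        targetQueryValue: "1"
  jobTargetRef:
    template:
      spec:
        nodeSelector:
          kubernetes.io/arch: arm64
        restartPolicy: Never
        containers:
          - name: worker
            image: snowpack-worker:latest   # placeholder image
```

A real spec would also carry the Postgres connection settings for the trigger; the point here is the shape: KEDA polls the database every 30 seconds and spawns up to five job pods.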
## Rollback tiers
Snowpack has three rollback tiers depending on the severity of the issue.
### Tier 1: Application bug
Flip the image tag back to the last known-good version and apply:
```shell
# In terraform.tfvars or the helm_release set block
image_tag = "abc123-previous"

terraform apply
```

This is the fastest rollback (~30 seconds). Only the pod image changes; the
chart, values, and infrastructure remain the same.
### Tier 2: Chart regression
If a template or values change caused the problem, revert the
`charts/snowpack/Chart.yaml` version to the previous value and apply:
```shell
git revert <commit-that-broke-the-chart>
terraform apply
```

Terraform detects the version change and performs a Helm rollback internally.
This takes slightly longer because Kubernetes must reconcile the full chart diff.
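The revert mechanics can be sketched in a throwaway repo (file contents and version numbers are illustrative):

```shell
# Throwaway repo: a known-good chart version, a bad bump, then a revert.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email tester@example.com
git config user.name tester

echo "version: 1.4.2" > Chart.yaml
git add Chart.yaml
git commit -qm "chart 1.4.2 (known good)"

echo "version: 1.5.0" > Chart.yaml
git commit -qam "bump to 1.5.0 (regression)"

# Revert the breaking commit; Chart.yaml returns to the known-good version.
git revert --no-edit HEAD >/dev/null
cat Chart.yaml   # version: 1.4.2
```

In the real flow, the `terraform apply` that follows the revert is what actually rolls the release back — the revert alone only changes what Terraform will see on the next plan.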
### Tier 3: Postgres corruption
If the Postgres data is corrupted or schema migrations went wrong:
- **PVC-backed internal Postgres** — Delete the Postgres deployment, delete
  the PVC, and re-apply. Snowpack auto-creates tables on startup via
  `CREATE TABLE IF NOT EXISTS` DDL, so a fresh database starts cleanly.
  Historical job data will be lost.
- **RDS-backed external Postgres** — Restore from an RDS snapshot to a
  point in time before the corruption. Re-run any pending schema migrations
  after restore.
In both cases, verify recovery by checking that `GET /readyz` returns 200 and
by submitting a test dry-run job.