Authentication
Snowpack uses three credential paths: Polaris OAuth2 for table discovery, IRSA for AWS API access (Glue and S3), and an internal Postgres password for job state. There are no static AWS credentials anywhere in the deployment.
Polaris service principal
The Polaris REST catalog authenticates via an OAuth2 service principal. The credential flow is:
- Secrets Manager stores the principal at {env}/polaris/snowpack-principal as a JSON object: {"client_id": "...", "client_secret": "..."}
- Terraform reads the secret via a data "aws_secretsmanager_secret_version" data source and injects both values into the helm_release resource using set_sensitive blocks. This masks the credentials in Terraform plan output and keeps them out of version-controlled values files. The values are still recorded in Terraform state, as all resource arguments are, so access to the state backend should be restricted.
- Helm maps the values to environment variables on the API and worker pods: SNOWPACK_POLARIS_CLIENT_ID, SNOWPACK_POLARIS_CLIENT_SECRET, SNOWPACK_POLARIS_URI, and SNOWPACK_POLARIS_CATALOG.
- PolarisConfig validates at startup that client_id and client_secret are both present when uri is set. If either is missing, the application fails fast with a clear error message rather than silently falling back.
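The startup check in the last step can be sketched as follows. This is a minimal illustration, not the actual PolarisConfig: the dataclass layout, the from_env helper, and the wording of the error are assumptions. Only the environment variable names and the rule (credentials required once a URI is set) come from the description above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PolarisConfig:
    """Illustrative stand-in for the real PolarisConfig (field layout is assumed)."""

    uri: Optional[str] = None
    catalog: Optional[str] = None
    client_id: Optional[str] = None
    client_secret: Optional[str] = None

    def __post_init__(self) -> None:
        # Credentials are only required once a Polaris URI is configured.
        if not self.uri:
            return
        missing = [
            name
            for name, value in (
                ("SNOWPACK_POLARIS_CLIENT_ID", self.client_id),
                ("SNOWPACK_POLARIS_CLIENT_SECRET", self.client_secret),
            )
            if not value
        ]
        if missing:
            # Fail fast instead of silently falling back to another catalog.
            raise ValueError(
                "Polaris URI is set but required credentials are missing: "
                + ", ".join(missing)
            )

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "PolarisConfig":
        # Hypothetical helper: empty strings are treated like unset variables.
        return cls(
            uri=env.get("SNOWPACK_POLARIS_URI") or None,
            catalog=env.get("SNOWPACK_POLARIS_CATALOG") or None,
            client_id=env.get("SNOWPACK_POLARIS_CLIENT_ID") or None,
            client_secret=env.get("SNOWPACK_POLARIS_CLIENT_SECRET") or None,
        )
```

Running the validation in the constructor means a misconfigured pod crashes at startup, where the error is visible in the pod logs, rather than at the first catalog call.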
PyIceberg REST catalog OAuth2
The health-sync worker and API endpoints that use PyIceberg construct a REST catalog with OAuth2 token exchange:
- credential="client_id:client_secret" (colon-separated)
- scope=PRINCIPAL_ROLE:ALL
PyIceberg handles the token exchange automatically — the application never manages tokens directly.
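As a rough sketch of that construction, the properties could be assembled and passed to PyIceberg's load_catalog as shown here. The helper names, the catalog name "polaris", and the mapping of the Polaris catalog name onto the warehouse property are assumptions for illustration; the credential and scope values follow the description above.

```python
from typing import Dict


def polaris_catalog_properties(
    uri: str, warehouse: str, client_id: str, client_secret: str
) -> Dict[str, str]:
    """Build REST catalog properties for OAuth2 client-credentials auth."""
    return {
        "type": "rest",
        "uri": uri,
        # Assumption: the Polaris catalog name is passed as the warehouse.
        "warehouse": warehouse,
        # PyIceberg expects the client ID and secret joined by a colon.
        "credential": f"{client_id}:{client_secret}",
        "scope": "PRINCIPAL_ROLE:ALL",
    }


def load_polaris_catalog(
    uri: str, warehouse: str, client_id: str, client_secret: str
):
    """Hypothetical helper; PyIceberg performs the token exchange itself."""
    # Imported lazily so the property builder is usable without pyiceberg.
    from pyiceberg.catalog import load_catalog

    return load_catalog(
        "polaris",
        **polaris_catalog_properties(uri, warehouse, client_id, client_secret),
    )
```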
IRSA (IAM Roles for Service Accounts)
AWS API access uses IAM Roles for Service Accounts (IRSA). No static
AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY variables exist anywhere in the
deployment.
The flow:
- Terraform creates an IAM role (aws_iam_role.snowpack) with an OIDC trust policy scoped to the EKS cluster and the snowpack namespace service account.
- The Helm chart creates a ServiceAccount annotated with eks.amazonaws.com/role-arn pointing at that role.
- The API, worker, and health-sync pods mount the service account. The AWS SDK automatically exchanges the projected service account token for temporary IAM credentials via the OIDC provider.
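For illustration, the trust policy from the first step typically has the shape produced by the sketch below, which follows the standard IRSA pattern. The function name and its arguments are hypothetical, and the actual Terraform definition may differ; the account ID, OIDC provider, and service account name must be supplied by the caller.

```python
from typing import Any, Dict


def irsa_trust_policy(
    account_id: str, oidc_provider: str, namespace: str, service_account: str
) -> Dict[str, Any]:
    """Build an IRSA trust policy document.

    oidc_provider is the cluster's OIDC issuer without the https:// prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": (
                        f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                    )
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only this namespace/service account may assume the role.
                        f"{oidc_provider}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                        # The token must have been issued for AWS STS.
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }
```

The sub condition is what scopes the role to a single service account; without it, any service account in the cluster could assume the role.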
Permissions granted
The IRSA role has two policy statements:
| Sid | Actions | Resources |
|---|---|---|
| GlueCatalogRead | glue:GetDatabase, glue:GetDatabases, glue:GetTable, glue:GetTables, glue:GetPartition, glue:GetPartitions, glue:BatchGetPartition, glue:GetUserDefinedFunctions | All databases and tables in the account’s Glue catalog |
| ReadLakehouseData | s3:GetObject, s3:ListBucket | The lakehouse S3 bucket and all objects within it |
These are read-only permissions. Snowpack never writes to Glue or S3 directly — all write operations (compaction, snapshot expiry) go through Spark SQL.
Internal Postgres password
The Postgres password is generated by a random_password Terraform resource and
injected into the Helm release via set_sensitive. It is never stored in values
files or checked into version control.
Both the Postgres deployment and the application pods receive the same password through separate environment variable paths:
- Postgres pod: POSTGRES_PASSWORD env var
- Application pods: SNOWPACK_POSTGRES_PASSWORD env var (via the snowpack.postgresEnv Helm helper)
The KEDA TriggerAuthentication resource also references the password from a
Kubernetes Secret so the ScaledJob trigger can query the job_queue table.
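On the application side, consuming the password might look like the sketch below. Only SNOWPACK_POSTGRES_PASSWORD comes from the description above; the other variable names, their defaults, and the postgres_dsn helper are hypothetical. The password is percent-encoded because a generated password can contain characters that are not valid in a connection URL.

```python
import os
from typing import Mapping
from urllib.parse import quote


def postgres_dsn(env: Mapping[str, str] = os.environ) -> str:
    """Build a Postgres connection URL from environment variables."""
    password = env.get("SNOWPACK_POSTGRES_PASSWORD")
    if not password:
        raise RuntimeError("SNOWPACK_POSTGRES_PASSWORD is not set")

    # Hypothetical variable names and defaults, for illustration only.
    user = env.get("SNOWPACK_POSTGRES_USER", "snowpack")
    host = env.get("SNOWPACK_POSTGRES_HOST", "localhost")
    port = env.get("SNOWPACK_POSTGRES_PORT", "5432")
    database = env.get("SNOWPACK_POSTGRES_DB", "snowpack")

    # safe="" also encodes "/", which would otherwise break the URL.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )
```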
Catalog/metadata split
Snowpack uses two different catalog backends for different purposes:
| Component | Catalog | Purpose |
|---|---|---|
| TableCacheSyncWorker | Polaris REST | Table discovery: lists databases and tables for the table cache. Polaris is the authoritative catalog for “what tables exist.” |
| Health endpoints, workers, health-sync | Glue + S3 (via PyIceberg) | Metadata access — reads Iceberg metadata files to compute health metrics and execute maintenance. Uses Glue/S3 directly to avoid Polaris metadata-size limits. |
This split exists because Polaris imposes size limits on metadata responses that can cause failures for large tables with many snapshots or manifests. By routing metadata-heavy operations through Glue/S3 directly, Snowpack avoids those limits while still using Polaris as the single source of truth for table discovery.