Authentication

Snowpack uses three credential paths: Polaris OAuth2 for table discovery, IRSA for AWS API access (Glue and S3), and an internal Postgres password for job state. There are no static AWS credentials anywhere in the deployment.

Polaris service principal

The Polaris REST catalog authenticates via an OAuth2 service principal. The credential flow is:

  1. Secrets Manager stores the principal at {env}/polaris/snowpack-principal as a JSON object:

    {
      "client_id": "...",
      "client_secret": "..."
    }
  2. Terraform reads the secret through an aws_secretsmanager_secret_version data source and injects both values into the helm_release resource using set_sensitive blocks. This keeps the credentials out of plan output and out of version-controlled values files. The values are still recorded in Terraform state, so the state backend needs to be encrypted and access-restricted.

  3. Helm maps the values to environment variables on the API and worker pods:

    • SNOWPACK_POLARIS_CLIENT_ID
    • SNOWPACK_POLARIS_CLIENT_SECRET
    • SNOWPACK_POLARIS_URI
    • SNOWPACK_POLARIS_CATALOG
  4. PolarisConfig validates at startup that client_id and client_secret are both present when uri is set. If either is missing, the application fails fast with a clear error message rather than silently falling back.
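The startup check in step 4 can be sketched as follows. This is a minimal illustration, not Snowpack's actual PolarisConfig: the class body, the from_env helper, and the error wording are assumptions. Only the four environment variable names and the rule "credentials are required when uri is set" come from the steps above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PolarisConfig:
    """Hypothetical stand-in for Snowpack's PolarisConfig (illustration only)."""

    uri: Optional[str] = None
    catalog: Optional[str] = None
    client_id: Optional[str] = None
    client_secret: Optional[str] = None

    def __post_init__(self) -> None:
        # Credentials are only required when a Polaris URI is configured.
        if not self.uri:
            return
        missing = [
            name
            for name, value in (
                ("SNOWPACK_POLARIS_CLIENT_ID", self.client_id),
                ("SNOWPACK_POLARIS_CLIENT_SECRET", self.client_secret),
            )
            if not value
        ]
        if missing:
            # Fail fast instead of silently falling back to another catalog.
            raise ValueError(
                "SNOWPACK_POLARIS_URI is set but missing: " + ", ".join(missing)
            )

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "PolarisConfig":
        # Assumed helper: reads the variables that Helm sets on the pods.
        return cls(
            uri=env.get("SNOWPACK_POLARIS_URI"),
            catalog=env.get("SNOWPACK_POLARIS_CATALOG"),
            client_id=env.get("SNOWPACK_POLARIS_CLIENT_ID"),
            client_secret=env.get("SNOWPACK_POLARIS_CLIENT_SECRET"),
        )
```

Calling PolarisConfig.from_env() once at process start surfaces a misconfigured secret as an immediate crash with a named variable, rather than as an authentication failure on the first catalog request.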

PyIceberg REST catalog OAuth2

The health-sync worker and API endpoints that use PyIceberg construct a REST catalog with OAuth2 token exchange:

  • credential = "client_id:client_secret" (colon-separated)
  • scope = PRINCIPAL_ROLE:ALL

PyIceberg handles the token exchange automatically — the application never manages tokens directly.
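As a rough sketch of how those two settings reach PyIceberg, the helper below assembles the property dictionary that PyIceberg's load_catalog accepts as keyword arguments. The helper name is hypothetical, and passing the Polaris catalog name as the warehouse property is an assumption about how SNOWPACK_POLARIS_CATALOG is used; the credential format and scope value come from the list above.

```python
from typing import Dict


def polaris_catalog_properties(
    uri: str, warehouse: str, client_id: str, client_secret: str
) -> Dict[str, str]:
    """Build REST catalog properties for PyIceberg (hypothetical helper)."""
    return {
        "type": "rest",
        "uri": uri,
        # Assumption: the Polaris catalog name is passed as the warehouse.
        "warehouse": warehouse,
        # PyIceberg expects "client_id:client_secret" and performs the
        # OAuth2 client-credentials exchange itself.
        "credential": f"{client_id}:{client_secret}",
        "scope": "PRINCIPAL_ROLE:ALL",
    }


# Usage, assuming pyiceberg is installed:
#   from pyiceberg.catalog import load_catalog
#   props = polaris_catalog_properties(uri, catalog_name, client_id, client_secret)
#   catalog = load_catalog("polaris", **props)
```

Keeping the property assembly separate from the load_catalog call makes the credential formatting easy to unit test without a running Polaris instance.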

IRSA (IAM Roles for Service Accounts)

AWS API access uses IAM Roles for Service Accounts (IRSA). No static AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY variables exist anywhere in the deployment.

The flow:

  1. Terraform creates an IAM role (aws_iam_role.snowpack) with an OIDC trust policy scoped to the EKS cluster and the snowpack namespace service account.
  2. The Helm chart creates a ServiceAccount annotated with eks.amazonaws.com/role-arn pointing at that role.
  3. The API, worker, and health-sync pods run under that service account. The AWS SDK automatically exchanges the projected service account token for temporary IAM credentials by calling STS AssumeRoleWithWebIdentity, which validates the token against the cluster's OIDC provider.
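One way to verify the flow above from inside a pod is to inspect the environment. On EKS, the pod identity webhook injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into pods whose service account carries the role annotation, and the AWS SDKs pick these up through their default credential chain. The check function below is a hypothetical diagnostic, not part of Snowpack.

```python
import os
from typing import Mapping

# Static keys that should never be present in this deployment.
STATIC_KEY_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
# Variables injected by the EKS pod identity webhook for IRSA.
IRSA_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def check_irsa_environment(env: Mapping[str, str] = os.environ) -> None:
    """Raise if the pod is not set up for IRSA-only credentials."""
    static = [name for name in STATIC_KEY_VARS if env.get(name)]
    if static:
        raise RuntimeError("static AWS credentials found: " + ", ".join(static))
    missing = [name for name in IRSA_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("IRSA variables not injected: " + ", ".join(missing))
```

If the second error fires, the usual suspects are a missing eks.amazonaws.com/role-arn annotation or a pod that started before the annotation was applied.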

Permissions granted

The IRSA role has two policy statements:

| Sid | Actions | Resources |
| --- | --- | --- |
| GlueCatalogRead | glue:GetDatabase, glue:GetDatabases, glue:GetTable, glue:GetTables, glue:GetPartition, glue:GetPartitions, glue:BatchGetPartition, glue:GetUserDefinedFunctions | All databases and tables in the account’s Glue catalog |
| ReadLakehouseData | s3:GetObject, s3:ListBucket | The lakehouse S3 bucket and all objects within it |

These are read-only permissions. Snowpack never writes to Glue or S3 directly — all write operations (compaction, snapshot expiry) go through Spark SQL.
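For illustration, the two statements could be expressed as an IAM policy document along these lines. The Sids and actions come from the permissions listed above; the function name, its parameters, and the exact Resource ARN patterns are assumptions, since the real Terraform definition is not shown here.

```python
from typing import Any, Dict

GLUE_READ_ACTIONS = [
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables",
    "glue:GetPartition",
    "glue:GetPartitions",
    "glue:BatchGetPartition",
    "glue:GetUserDefinedFunctions",
]


def build_read_only_policy(region: str, account_id: str, bucket: str) -> Dict[str, Any]:
    """Sketch of the IRSA role policy (hypothetical helper, assumed ARNs)."""
    glue_prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GlueCatalogRead",
                "Effect": "Allow",
                "Action": GLUE_READ_ACTIONS,
                # Assumed ARN patterns covering the whole account catalog.
                "Resource": [
                    f"{glue_prefix}:catalog",
                    f"{glue_prefix}:database/*",
                    f"{glue_prefix}:table/*/*",
                ],
            },
            {
                "Sid": "ReadLakehouseData",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # ListBucket applies to the bucket, GetObject to its objects.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            },
        ],
    }
```

Serializing the result with json.dumps gives a document that can be compared against the policy attached to the role when auditing for unexpected write actions.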

Internal Postgres password

The Postgres password is generated by a random_password Terraform resource and injected into the Helm release via set_sensitive. It is never stored in values files or checked into version control.

Both the Postgres deployment and the application pods receive the same password through separate environment variable paths:

  • Postgres pod: POSTGRES_PASSWORD env var
  • Application pods: SNOWPACK_POSTGRES_PASSWORD env var (via the snowpack.postgresEnv Helm helper)

The KEDA TriggerAuthentication resource also references the password from a Kubernetes Secret so the ScaledJob trigger can query the job_queue table.

Catalog/metadata split

Snowpack uses two different catalog backends for different purposes:

| Component | Catalog | Purpose |
| --- | --- | --- |
| TableCacheSyncWorker | Polaris REST | Table discovery: lists databases and tables for the table cache. Polaris is the authoritative catalog for “what tables exist.” |
| Health endpoints, workers, health-sync | Glue + S3 (via PyIceberg) | Metadata access: reads Iceberg metadata files to compute health metrics and execute maintenance. Uses Glue/S3 directly to avoid Polaris metadata-size limits. |

This split exists because Polaris imposes size limits on metadata responses that can cause failures for large tables with many snapshots or manifests. By routing metadata-heavy operations through Glue/S3 directly, Snowpack avoids those limits while still using Polaris as the single source of truth for table discovery.