Onboarding a Database
Snowpack discovers Iceberg tables through a PyIceberg catalog and runs maintenance for tables that have explicitly opted in. Onboarding a new database is a two-step process: opt tables in at the catalog level, then register the database in the Helm chart so the health-sync and orchestrator CronJobs know about it.
Step 1 — Opt tables in via Spark SQL
Each table must declare that it wants Snowpack maintenance by setting the
snowpack.maintenance_enabled table property. Connect to Spark (or Kyuubi) and
run:
```sql
ALTER TABLE lakehouse_dev.<database>.<table>
  SET TBLPROPERTIES ('snowpack.maintenance_enabled' = 'true');
```

Replace <database> and <table> with the actual database and table names.
Repeat for every table in the database that should receive automated
maintenance.
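Opting in many tables one by one is tedious; the statements can be generated with a short script. A minimal sketch — the helper name and the table list are illustrative placeholders, not part of Snowpack:

```python
# Sketch: generate the opt-in DDL for a batch of tables.
# The table names below are placeholders, not real Snowpack objects.

def optin_statement(database: str, table: str) -> str:
    """Build the ALTER TABLE statement that opts one table into maintenance."""
    return (
        f"ALTER TABLE lakehouse_dev.{database}.{table} "
        "SET TBLPROPERTIES ('snowpack.maintenance_enabled' = 'true');"
    )

for table in ["orders", "redemptions"]:
    print(optin_statement("offer_service", table))
```

Paste the generated statements into the same Spark SQL session.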
Per-table cadence override. By default the orchestrator respects the
cluster-wide cadenceHours value (6 hours in dev). To override the cadence for
a specific table, set the snowpack.maintenance_cadence_hours property at the
same time:
```sql
ALTER TABLE lakehouse_dev.<database>.<table> SET TBLPROPERTIES (
  'snowpack.maintenance_enabled' = 'true',
  'snowpack.maintenance_cadence_hours' = '12'
);
```

Tables without the snowpack.maintenance_enabled property, or with it set to any
value other than true, are ignored by the orchestrator.
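The opt-in and cadence rules above can be sketched as two small predicates over a table's property map. This is an illustrative reconstruction of the described behavior, not Snowpack source; the function names are assumptions, and the 6-hour default comes from the dev cadenceHours value mentioned earlier:

```python
# Sketch of the orchestrator's opt-in check, as described above.
# Not Snowpack source code; names and structure are illustrative.

DEFAULT_CADENCE_HOURS = 6  # cluster-wide cadenceHours in dev

def maintenance_enabled(properties: dict) -> bool:
    # Only the exact string 'true' opts a table in; any other value
    # (or a missing property) means the orchestrator ignores the table.
    return properties.get("snowpack.maintenance_enabled") == "true"

def effective_cadence_hours(properties: dict) -> int:
    # A per-table override takes precedence over the cluster-wide default.
    raw = properties.get("snowpack.maintenance_cadence_hours")
    return int(raw) if raw is not None else DEFAULT_CADENCE_HOURS
```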
Step 2 — Add the database to Helm values
Open charts/snowpack/values-dev.yaml and add the database name to both
healthSync.databases and orchestrator.includeDatabases. These are
comma-separated strings:
```yaml
healthSync:
  databases: "offer_service,points_service,<new_database>"

orchestrator:
  includeDatabases: "offer_service,points_service,<new_database>"
```

Step 3 — Deploy via Terraform
All Snowpack infrastructure changes are deployed through Terraform. Never run
helm install or helm upgrade directly — Terraform owns the Helm release
and direct Helm commands cause state drift.
```shell
terraform apply
```

If you modified any files under charts/snowpack/templates/, remember to bump
the version field in charts/snowpack/Chart.yaml as well. Terraform detects
chart changes by comparing the chart version; template-only edits without a
version bump are invisible to the plan.
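Because a forgotten version bump silently no-ops the deploy, it is worth automating. A minimal sketch of a patch-version bump for Chart.yaml — a hypothetical helper using plain string handling rather than a YAML parser:

```python
# Sketch: bump the patch component of the `version` field in a Helm
# Chart.yaml. Illustrative helper, not part of the Snowpack tooling.

def bump_patch(version: str) -> str:
    """'1.2.3' -> '1.2.4'."""
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"

def bump_chart_version(chart_yaml: str) -> str:
    # Rewrite only the top-level `version:` line, leaving the rest intact.
    lines = []
    for line in chart_yaml.splitlines():
        if line.startswith("version:"):
            current = line.split(":", 1)[1].strip()
            line = f"version: {bump_patch(current)}"
        lines.append(line)
    return "\n".join(lines)
```

Run it against charts/snowpack/Chart.yaml before `terraform apply` whenever templates change.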
Step 4 — Verify
After Terraform applies successfully, wait for the next orchestrator CronJob
run. In the dev environment the orchestrator runs hourly at :30 past the hour.
Check recent orchestrator runs to confirm the new database’s tables were assessed:
```shell
curl -s https://<snowpack-host>/orchestrator/runs | jq '.[0]'
```

A successful run includes tables_assessed, jobs_submitted, and
jobs_completed counts. If the new tables do not appear, verify that:
- The table property snowpack.maintenance_enabled is set to true in the catalog.
- The database is listed in both healthSync.databases and orchestrator.includeDatabases in the deployed values.
- The health-sync CronJob has completed at least one cycle since the deploy (it runs every 15 minutes).
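A quick sanity check over the latest run record can save digging through the checklist. A sketch, assuming the /orchestrator/runs response is a JSON array of run objects carrying the tables_assessed, jobs_submitted, and jobs_completed counts mentioned above; any field semantics beyond that are assumptions:

```python
# Sketch: evaluate an orchestrator run record. The run dict mirrors the
# counts described above; the sample record is fabricated for illustration.

def run_looks_healthy(run: dict) -> bool:
    # A healthy run assessed at least one table and completed
    # every job it submitted.
    return (
        run.get("tables_assessed", 0) > 0
        and run.get("jobs_submitted", 0) == run.get("jobs_completed", 0)
    )

sample = {"tables_assessed": 14, "jobs_submitted": 3, "jobs_completed": 3}
print(run_looks_healthy(sample))  # True
```

Feed it the first element of the /orchestrator/runs response instead of the sample record.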
You can also confirm a specific table is visible in the cache:
```shell
curl -s "https://<snowpack-host>/tables?database=<new_database>"
```

This returns the list of tables Snowpack knows about for that database. If the
list is empty, health-sync has not yet populated the cache for the new database.
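The presence check can be scripted against that response. A sketch that assumes the endpoint returns a JSON array of table names — the response shape is an assumption, not documented above:

```python
# Sketch: confirm a table is present in Snowpack's cache, assuming the
# /tables endpoint returns a JSON array of table names (an assumption).
import json

def cache_contains(response_body: str, table: str) -> bool:
    return table in json.loads(response_body)

print(cache_contains('["orders", "redemptions"]', "orders"))  # True
```

An empty array here means health-sync has not yet run for the new database, not that onboarding failed.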