How to Use Google's Antigravity IDE to Manage Your Data Lake
Managing a modern data lake with an AI-powered coding assistant
Introduction
I maintain a sophisticated data lake built on the medallion architecture pattern. Managing dozens of SQL transformations, ensuring data quality, and iterating quickly on analytics pipelines can be challenging—especially as the codebase grows.
Enter Antigravity, Google’s agentic AI coding assistant. By integrating Antigravity into our development workflow, I’ve transformed how I build, debug, and maintain our data infrastructure. In this post, I’ll share how I’ve configured Antigravity to understand our data lake, the rules and workflows I’ve created, and real-world use cases that have dramatically improved our productivity.
Our Data Lake Architecture: The Medallion Pattern
Our data lake follows the medallion architecture, a design pattern that organizes data into progressively refined layers:
                             MEDALLION ARCHITECTURE

┌────────┐     ┌───────────────┐     ┌────────┐     ┌─────────────┐     ┌──────┐
│ BRONZE │ ──► │ BRONZE_SILVER │ ──► │ SILVER │ ──► │ SILVER_GOLD │ ──► │ GOLD │
└────────┘     └───────────────┘     └────────┘     └─────────────┘     └──────┘
 Raw data       Cleaned &             Single         Complex              Analytics-
 + aliases      structured            source of      aggregations        ready data
                                      truth
Layer Breakdown
| Layer | Description | Example Tables |
|---|---|---|
| Bronze | Raw data ingestion with aliases and basic filtering | brz_crm_opportunity_scd |
| Bronze-Silver | Initial cleaning, casting, and structuring | brz_slv_opportunity_daily |
| Silver | Conformed, business-level views—the “single source of truth” | slv_opportunity_daily_snapshot |
| Silver-Gold | Intermediate aggregations for specific business use cases | slv_gld_retention_daily_snapshot |
| Gold | Highly aggregated, analytics-ready datasets for BI and reporting | gld_retention_daily, gld_user_analytics_daily |
Each SQL file in our repository corresponds directly to a table name in the data lake, making navigation intuitive. The SQL transformations are compiled by a Python script (compile_layers.py) that generates _compiled.sql files—which Antigravity knows to ignore.
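To make that compile step concrete, here is a minimal sketch of what a script like compile_layers.py could do, assuming one directory per layer and a simple placeholder substitution (both are assumptions for this example; the real script does more):

```python
# Minimal sketch of a layer compiler. Hypothetical: the real compile_layers.py
# handles more (dependencies, templating, deployment targets).
from pathlib import Path

LAYER_DIRS = ["bronze", "bronze_silver", "silver", "silver_gold", "gold"]  # assumed layout
SCHEMA = "analytics"

def compile_layers(repo_root: str = ".") -> None:
    for layer in LAYER_DIRS:
        layer_dir = Path(repo_root, layer)
        if not layer_dir.is_dir():
            continue
        for sql_file in layer_dir.glob("*.sql"):
            if sql_file.name.endswith("_compiled.sql"):
                continue  # never recompile generated output
            raw_sql = sql_file.read_text()
            # Example substitution: resolve a schema placeholder before deployment.
            compiled_sql = raw_sql.replace("{{ schema }}", SCHEMA)
            sql_file.with_name(f"{sql_file.stem}_compiled.sql").write_text(compiled_sql)

if __name__ == "__main__":
    compile_layers()
```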
Teaching Antigravity About Our Repository
The key to making Antigravity effective is context. By providing the agent with rules and workflows, it understands our repository structure, conventions, and how to interact with our data infrastructure.
Rules: Always-On Context
I store rules in .agent/rules/ as markdown files. These provide persistent context that Antigravity references in every interaction:
databricks.md — Connecting to Our Data Warehouse
---
trigger: always_on
---
In order to connect to the databricks sql cluster, use the `dbsqlcli -e "<QUERY>"`
terminal command. All tables use the schema `analytics`.
This rule ensures Antigravity knows how to execute SQL queries against our Databricks cluster. With this context, the agent can:
- Run ad-hoc queries to validate data
- Check record counts after transformations
- Verify data quality constraints
medallion-tables.md — Understanding Our Architecture
---
trigger: always_on
---
This repository consists of SQL and Python files used to populate tables and views
of a data lake following the "medallion architecture". The layers consist of:
- bronze - raw data with aliases and some filtering
- bronze_silver - Initial cleaning, casting, and structuring
- silver - Conformed, business-level view, the "single source of truth"
- silver_gold - Intermediate tables with complex aggregations
- gold - Analytics-ready datasets for BI and reporting
This rule teaches Antigravity the semantic meaning of each directory and layer, enabling it to make intelligent decisions about where to place new tables or find existing transformations.
ignore.md — Avoiding Noise
---
trigger: always_on
---
Ignore the following directories:
@/keep/
@/apps/
@/prompts/
@/tasks/
Some directories contain experimental code, archived work, or unrelated applications. This rule prevents Antigravity from surfacing irrelevant files.
Workflows: Reusable Multi-Step Processes
Workflows are stored in .agent/workflows/ and define repeatable procedures triggered by slash commands.
/check-pk — Validate Primary Key Uniqueness
---
description: Checks that there are no duplicates of the primary key.
---
Run a SQL query against the Databricks SQL cluster to check that there are
no duplicates in the table on the primary key(s) mentioned in the prompt.
Return the number of duplicates.
Usage:
/check-pk slv_orders_daily on order_date, order_id
Antigravity will generate and execute a SQL query like:
SELECT order_date, order_id, COUNT(*) as cnt
FROM analytics.slv_orders_daily
GROUP BY order_date, order_id
HAVING COUNT(*) > 1
This is invaluable for validating data integrity after modifying transformations.
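To make the pattern reusable, the same check can be expressed as a small query builder. The helper below is purely illustrative (it is not part of the repository or the workflow); it only assembles the SQL that the agent would then run via `dbsqlcli -e`.

```python
# Illustrative builder for the duplicate-check query behind /check-pk (hypothetical helper).

def build_pk_check(table: str, keys: list[str], schema: str = "analytics") -> str:
    key_list = ", ".join(keys)
    # Any key combination appearing more than once violates primary-key uniqueness.
    return (
        f"SELECT {key_list}, COUNT(*) AS cnt\n"
        f"FROM {schema}.{table}\n"
        f"GROUP BY {key_list}\n"
        f"HAVING COUNT(*) > 1"
    )

if __name__ == "__main__":
    print(build_pk_check("slv_orders_daily", ["order_date", "order_id"]))
```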
/lineage — Discover Data Lineage
---
description: Find the uses and data lineage of a given table.
---
1. Execute the utility to find definition and usages of the table.
// turbo
2. Run the command: `python3 utils/lineage.py {table_name}`
3. Analyze the output to identify which files define and use the table.
4. If appropriate, summarize the lineage for the user.
Usage:
/lineage slv_orders_daily
The agent executes our custom lineage utility, which scans all SQL files to find:
- Where the table is defined (the source SQL file)
- Where the table is used (downstream dependencies)
This provides instant impact analysis before making changes.
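For a sense of how such a utility can be built, here is a simplified sketch of a lineage scanner. Its heuristics (a file whose name contains the table name defines it, any other file mentioning it consumes it) and the skipping of _compiled.sql artifacts are assumptions about how utils/lineage.py behaves, not its actual source.

```python
# Simplified sketch of a lineage scanner; the real utils/lineage.py may use different heuristics.
import sys
from pathlib import Path

def find_lineage(table_name: str, repo_root: str = ".") -> None:
    defined_in, used_in = [], []
    for sql_file in Path(repo_root).rglob("*.sql"):
        if sql_file.name.endswith("_compiled.sql"):
            continue  # generated artifacts are not part of lineage
        text = sql_file.read_text(errors="ignore")
        if table_name in sql_file.stem:
            defined_in.append(sql_file)   # heuristic: the defining file carries the table name
        elif table_name in text:
            used_in.append(sql_file)      # any other file referencing the table is a consumer

    print(f"Searching for table: {table_name}")
    print("DEFINED IN:", *defined_in, sep="\n  ")
    print(f"USED IN ({len(used_in)} files):", *used_in, sep="\n  ")

if __name__ == "__main__":
    find_lineage(sys.argv[1])
```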
Connecting to Databricks via CLI
A critical piece of our setup is the Databricks SQL CLI (dbsqlcli). This allows Antigravity to execute queries directly against our data warehouse.
Setup

1. Install the CLI:

   pip install databricks-sql-cli

2. Configure environment variables:

   export DBSQLCLI_HOST_NAME="your-workspace.cloud.databricks.com"
   export DBSQLCLI_HTTP_PATH="/sql/1.0/warehouses/your_warehouse_id"
   export DBSQLCLI_ACCESS_TOKEN="your_access_token_here"

3. Run queries:

   dbsqlcli -e "SELECT count(*) FROM analytics.gld_retention_daily"
With this integration, Antigravity can autonomously:
- Run validation queries
- Explore table schemas with `DESCRIBE`
- Sample data to understand transformations
- Debug data issues by inspecting actual records
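Under the hood, that autonomy amounts to composing a query and shelling out to the CLI. The wrapper below is a hypothetical sketch of that loop (the helper name is mine, not part of Antigravity or our repo):

```python
# Hypothetical sketch of the query loop an agent can drive through dbsqlcli.
import subprocess

def run_query(query: str) -> str:
    # dbsqlcli picks up its connection settings from the DBSQLCLI_* environment variables.
    result = subprocess.run(
        ["dbsqlcli", "-e", query], capture_output=True, text=True, check=True
    )
    return result.stdout

if __name__ == "__main__":
    # Explore a table's schema, then sample a few rows to understand the data.
    print(run_query("DESCRIBE TABLE analytics.gld_retention_daily"))
    print(run_query("SELECT * FROM analytics.gld_retention_daily LIMIT 5"))
```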
Real-World Use Cases
1. Making Changes to the Pipeline
Scenario: Add a new column renewal_probability from your CRM to the retention snapshot.
Prompt:
Add the `renewal_probability` field from `crm.opportunity` to our retention pipeline, flowing through bronze → silver → gold.
Antigravity’s Approach:
- Identifies the bronze table (`brz_crm_opportunity_scd.sql`)
- Adds the column with the correct alias (`renewal_probability_c AS renewal_probability`)
- Propagates to silver (`stg_slv_opportunity_daily_snapshot.sql`)
- Updates gold layer aggregations (`slv_gld_retention_daily_snapshot.sql`)
- Ignores `_compiled.sql` files (per our rules)
Time saved: What would take 30+ minutes of manual file navigation and editing is completed in seconds.
2. Adding New Data Assets
Scenario: Create a new silver table for tracking account health scores.
Prompt:
Create a new silver table called `slv_account_health` that combines data from our engagement tables and CRM accounts.
Antigravity’s Approach:
- Creates `silver/slv_account_health.sql`
- Uses the correct schema prefix (`analytics`)
- References existing tables appropriately
- Follows our naming conventions (snake_case, layer prefix)
3. Discovering Data Lineage
Scenario: Before modifying slv_orders_daily, understand its dependencies.
Prompt:
/lineage slv_orders_daily
Output:
Searching for table: slv_orders_daily
DEFINED IN:
notebooks/stg_slv_orders_daily.sql
USED IN (3 files):
silver_gold/slv_gld_revenue_daily_snapshot.sql
gold/gld_revenue_daily.sql
notebooks/dev_slv_org_daily.sql
Now we know exactly which downstream tables will be affected by any changes.
4. Debugging Data Issues
Scenario: A stakeholder reports that retention numbers look incorrect for Q4.
Prompt:
Check if there are any duplicate records in `gld_retention_daily` for Q4 2024 on `account_id` and `snapshot_date`.
Antigravity’s Approach:
- Constructs a validation query:

  SELECT account_id, snapshot_date, COUNT(*) as cnt
  FROM analytics.gld_retention_daily
  WHERE snapshot_date >= '2024-10-01'
    AND snapshot_date < '2025-01-01'
  GROUP BY account_id, snapshot_date
  HAVING COUNT(*) > 1

- Executes via `dbsqlcli`
- Reports findings with analysis
5. Data Analysis and Insights
Scenario: Quickly get top accounts by ARR for executive reporting.
Prompt:
What are the top 10 accounts by ARR for this quarter?
Antigravity’s Approach:
- Understands our table structure (knows ARR is in retention tables)
- Writes and executes the query:

  SELECT account_name, arr_usd
  FROM analytics.slv_gld_retention_daily_snapshot
  WHERE snapshot_date = CURRENT_DATE()
  ORDER BY arr_usd DESC
  LIMIT 10

- Returns formatted results
6. Ad-Hoc Schema Exploration
Scenario: You need to understand what columns are available in a table.
Prompt:
Describe the columns in `brz_crm_opportunity_scd`
Antigravity’s Approach:
- Runs `DESCRIBE TABLE analytics.brz_crm_opportunity_scd`
- Returns a formatted table of column names, types, and descriptions
Benefits We’ve Experienced
| Before Antigravity | After Antigravity |
|---|---|
| Manual file navigation across 100+ SQL files | Natural language queries to find relevant code |
| Copy-paste errors when propagating column changes | Automatic multi-file updates with consistency |
| Running queries in separate Databricks notebooks | Inline query execution with immediate results |
| Tribal knowledge about table dependencies | Instant lineage discovery |
| Time-consuming data validation | One-command primary key checks |
Conclusion
By teaching Antigravity about our medallion architecture through rules and workflows, and connecting it to Databricks via the dbsqlcli, I’ve created a powerful development environment that:
- Understands context — Knows our table layers, naming conventions, and file structure
- Executes queries — Can validate data, explore schemas, and debug issues in real-time
- Accelerates development — Reduces multi-file changes from minutes to seconds
- Preserves knowledge — Workflows encode best practices that any team member can use
The combination of AI-assisted coding with direct database connectivity has fundamentally changed how we build and maintain our data lake. What once required deep institutional knowledge and careful manual work is now accessible through natural language conversations.
If you’re managing a data lake and haven’t explored agentic AI coding tools, I highly recommend giving Antigravity a try. The time investment in setting up rules and workflows pays dividends immediately.