BigQuery Cost Analysis by Label
BigQuery job labels let you tag queries with arbitrary key-value metadata — team names, pipeline IDs, feature flags, or environment tags. This query aggregates costs by label so you can attribute spending to the business dimension that matters most to your organisation.
Why This Matters
User-level cost attribution only goes so far. In modern data stacks, a single service account might run queries on behalf of dozens of teams or pipelines. Labels bridge this gap by letting you tag jobs with business context (e.g., team=analytics, pipeline=revenue_report) and then slice costs accordingly. Without label-based attribution you are flying blind on where your money actually goes.
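Attaching labels is cheap. One way to do it in SQL is the `@@query_label` session variable, which tags every subsequent statement in the script; the specific key-value pairs below are illustrative, not a required convention:

```sql
-- Tag everything that follows in this script with business context.
-- 'team' and 'pipeline' are example keys; pick your own taxonomy.
SET @@query_label = "team:analytics,pipeline:revenue_report";

SELECT COUNT(*) FROM `your-project`.`your_dataset`.`your_table`;
```

Orchestration tools and the bq CLI (`--label key:value`) offer equivalent hooks, so labels can be applied without touching query text.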
How It Works
BigQuery stores labels as a repeated STRUCT field on each job. This query CROSS JOINs the labels array, deduplicates jobs, and then aggregates total_bytes_billed by each label key-value pair. The result is an on-demand cost estimate per label combination.
SQL Query
```sql
-- Break down on-demand cost by job label
DECLARE lookback_days INT64 DEFAULT 30;

WITH flattened AS (
  SELECT
    job_id,
    lbl.key AS label_key,
    lbl.value AS label_value,
    COALESCE(total_bytes_billed, 0) AS bytes_billed,
    -- Deduplicate on the full (job, label) pair. Partitioning by job_id
    -- alone would keep only one arbitrary label per job and drop the rest.
    ROW_NUMBER() OVER (
      PARTITION BY job_id, lbl.key, lbl.value
      ORDER BY end_time DESC
    ) AS rn
  FROM `your-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  CROSS JOIN UNNEST(labels) AS lbl
  WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL lookback_days DAY)
    AND state = 'DONE'
    AND total_slot_ms IS NOT NULL
    AND lbl.key IS NOT NULL
),
deduplicated AS (
  SELECT label_key, label_value, bytes_billed
  FROM flattened
  WHERE rn = 1
)
SELECT
  label_key,
  label_value,
  COUNT(*) AS job_count,
  ROUND(SUM(bytes_billed) / POW(1024, 3), 2) AS gib_scanned,
  -- On-demand pricing: $6.25 per TiB billed
  ROUND(SUM(bytes_billed) / POW(1024, 4) * 6.25, 2) AS estimated_cost_usd
FROM deduplicated
GROUP BY label_key, label_value
ORDER BY estimated_cost_usd DESC;
```
Query Explanation
The query CROSS JOINs the labels repeated field to flatten one row per label per job. It deduplicates each (job_id, label_key, label_value) combination with ROW_NUMBER(), then groups by label_key and label_value to sum total_bytes_billed. The on-demand cost is calculated at $6.25 per TiB. Note that a job carrying several labels contributes its full bytes to each of its label rows, so costs are comparable within a label key but not additive across keys.
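The unit conversion is easy to get wrong, so here is the same arithmetic on a known input: 2 TiB billed should come out as 2,048 GiB and $12.50 at the $6.25/TiB rate.

```sql
-- Sanity-check the pricing arithmetic with a hard-coded 2 TiB.
SELECT
  ROUND(2 * POW(1024, 4) / POW(1024, 3), 2) AS gib_scanned,        -- 2048.0
  ROUND(2 * POW(1024, 4) / POW(1024, 4) * 6.25, 2) AS cost_usd;    -- 12.5
```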
Key Insights
If most of your jobs lack labels, this query will only show a fraction of your costs — start by labelling your most expensive pipelines.
Compare label-level costs month-over-month to detect cost regressions introduced by new features or pipeline changes.
Labels like 'environment=production' vs 'environment=staging' can reveal if non-production workloads are costing more than expected.
dbt can attach job labels carrying the model name (via its query-comment job-label setting on the BigQuery adapter), making this query especially powerful for dbt shops.
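The environment comparison above is a simple variation of the main query: filter the flattened labels to a single key and group by its values. This sketch assumes an 'environment' label key; substitute whatever key your own convention uses:

```sql
-- Compare spend across values of one label key (here: 'environment').
SELECT
  lbl.value AS environment,
  COUNT(*) AS job_count,
  ROUND(SUM(total_bytes_billed) / POW(1024, 4) * 6.25, 2) AS estimated_cost_usd
FROM `your-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
CROSS JOIN UNNEST(labels) AS lbl
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND state = 'DONE'
  AND lbl.key = 'environment'
GROUP BY environment
ORDER BY estimated_cost_usd DESC;
```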
Best Practices
1. Enforce a labelling policy: require at least a 'team' and 'pipeline' label on all scheduled jobs.
2. Use BigQuery's default query labels in your orchestration tool (Airflow, dbt, Dataform) to auto-tag every job.
3. Create a dashboard that tracks label-level costs over time to catch regressions early.
4. Alert on labels whose weekly cost exceeds a threshold.
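The alerting practice above can be sketched as a scheduled query: shorten the lookback to seven days and filter aggregates with HAVING. The $100 threshold is an illustrative assumption, not a recommendation:

```sql
-- Surface labels whose 7-day estimated cost exceeds a threshold.
-- Wire this into a scheduled query or monitoring job.
SELECT
  lbl.key AS label_key,
  lbl.value AS label_value,
  ROUND(SUM(total_bytes_billed) / POW(1024, 4) * 6.25, 2) AS weekly_cost_usd
FROM `your-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
CROSS JOIN UNNEST(labels) AS lbl
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND state = 'DONE'
GROUP BY label_key, label_value
HAVING weekly_cost_usd > 100  -- assumed threshold; tune to your budget
ORDER BY weekly_cost_usd DESC;
```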
Want CloudClerk to find these savings automatically?
Our platform connects to your BigQuery project, runs these analyses automatically, and delivers AI-powered optimisation recommendations — all with your data fully anonymised.
Related Guides
BigQuery Cost Analysis by User
Find which users drive the most BigQuery spending. Query INFORMATION_SCHEMA to rank users by on-demand cost and identify optimization targets.
BigQuery Cost Analysis by Project
Track BigQuery costs across GCP projects. Use INFORMATION_SCHEMA to compare project-level spending and find the most expensive projects in your organization.
BigQuery Top Queries by Cost
Find the most expensive BigQuery queries by on-demand cost. Rank queries by total bytes billed to identify the biggest cost drivers in your project.
BigQuery Cost Analysis by Query
Analyze every individual query's cost, slot usage, and execution time. Get a detailed breakdown of on-demand cost per query execution.