BigQuery Cost Analysis by User

Understanding which users generate the most BigQuery costs is the first step to controlling your cloud bill. This query ranks every user by their total on-demand cost over a configurable time window, letting you quickly spot the biggest spenders.

Why This Matters

Without user-level cost attribution, organizations often struggle to identify who is responsible for unexpected bill spikes. A single user running unoptimized ad-hoc queries can easily account for 30-50% of a project's total spend. By surfacing per-user costs, you can have targeted conversations, set budgets, and prioritise training.

How It Works

BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT view contains metadata for every job executed in a project, including the user's email, total bytes billed, and slot usage. This query aggregates total_bytes_billed per user and converts it to an estimated on-demand cost using Google's current pricing of $6.25 per TiB scanned.
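
The bytes-to-dollars conversion is simple arithmetic. A minimal Python sketch of the same calculation (the function name is ours, not part of the query):

```python
ON_DEMAND_PRICE_PER_TIB = 6.25  # USD per TiB scanned, Google's on-demand rate

def estimated_cost_usd(total_bytes_billed: int) -> float:
    """Convert bytes billed to an estimated on-demand cost in USD."""
    tib_scanned = total_bytes_billed / 1024 ** 4  # bytes -> tebibytes
    return round(tib_scanned * ON_DEMAND_PRICE_PER_TIB, 2)

# A query billed for 2 TiB costs an estimated $12.50 on demand.
print(estimated_cost_usd(2 * 1024 ** 4))  # 12.5
```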

SQL Query

SQL
-- Rank users by estimated on-demand cost over the lookback window

DECLARE lookback_days INT64 DEFAULT 30;

WITH jobs AS (
  SELECT
    user_email,
    job_id,
    COALESCE(total_bytes_billed, 0) AS bytes_billed,
    -- Defensive dedup: keep the latest row if a job_id appears more than once
    ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY end_time DESC) AS rn
  FROM `your-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL lookback_days DAY)
    AND job_type = 'QUERY'
    AND state = 'DONE'
    AND total_slot_ms IS NOT NULL  -- excludes jobs served entirely from cache
),
deduplicated AS (
  SELECT user_email, bytes_billed FROM jobs WHERE rn = 1
)
SELECT
  user_email,
  COUNT(*) AS query_count,
  ROUND(SUM(bytes_billed) / POW(1024, 4), 4) AS tib_scanned,
  -- On-demand rate of $6.25/TiB; this is an estimate even if your
  -- project is actually billed via slot reservations
  ROUND(SUM(bytes_billed) / POW(1024, 4) * 6.25, 2) AS estimated_cost_usd
FROM deduplicated
GROUP BY user_email
ORDER BY estimated_cost_usd DESC;
Replace your-project and region-us with your GCP project and dataset region.

Query Explanation

The query collects every completed query job from INFORMATION_SCHEMA.JOBS_BY_PROJECT within the lookback window. It deduplicates jobs using ROW_NUMBER() partitioned by job_id (BigQuery can log the same job more than once). It then groups by user_email, sums total_bytes_billed, and converts bytes to tebibytes (dividing by 1024^4) before multiplying by the on-demand price of $6.25/TiB.
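
The dedup-then-aggregate logic can be sketched in Python over sample job rows (field names mirror the view's columns; the data is invented, with job "j1" logged twice to show why deduplication matters):

```python
from collections import defaultdict

# Invented rows mirroring INFORMATION_SCHEMA.JOBS_BY_PROJECT columns.
jobs = [
    {"user_email": "alice@example.com", "job_id": "j1", "total_bytes_billed": 1024 ** 4},
    {"user_email": "alice@example.com", "job_id": "j1", "total_bytes_billed": 1024 ** 4},
    {"user_email": "bob@example.com", "job_id": "j2", "total_bytes_billed": 2 * 1024 ** 4},
]

# Keep one row per job_id (the SQL does this with ROW_NUMBER() ... WHERE rn = 1).
deduped = {row["job_id"]: row for row in jobs}.values()

# Group by user and convert bytes -> TiB -> USD at $6.25/TiB.
costs = defaultdict(float)
for row in deduped:
    costs[row["user_email"]] += row["total_bytes_billed"] / 1024 ** 4 * 6.25

for user, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
    print(f"{user}: ${cost:.2f}")
# bob@example.com: $12.50
# alice@example.com: $6.25
```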

Key Insights

  • A small number of users often account for the majority of costs — check whether the top 3 users represent more than 50% of total spend.

  • Service accounts (e.g., dbt, Looker, Airflow) frequently top the list because they run scheduled queries at scale.

  • Users with high costs but low query counts are likely running unoptimized queries that scan large amounts of data.

  • Compare user costs week-over-week to detect sudden changes in behaviour.
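
The first check above is easy to automate once you have per-user totals. A sketch with invented numbers:

```python
# Invented per-user totals (USD), e.g. from the query's estimated_cost_usd column.
user_costs = {"etl-sa": 900.0, "alice": 450.0, "bob": 300.0, "carol": 50.0, "dan": 25.0}

total = sum(user_costs.values())
top3 = sorted(user_costs.values(), reverse=True)[:3]
share = sum(top3) / total

print(f"Top 3 users: {share:.0%} of spend")  # 96% here
if share > 0.5:
    print("Spend is concentrated: target the top users first.")
```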

Best Practices

  1. Set up BigQuery custom cost controls to cap per-user bytes billed per day.

  2. Require partition filters for interactive/ad-hoc queries (note that a LIMIT clause alone does not reduce the bytes scanned or billed in BigQuery).

  3. Use labels on scheduled query jobs so you can separate automated costs from human-driven costs.

  4. Share this report with engineering leads so they can coach team members on query best practices.
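
Even without labels, user_email itself helps separate automated from human-driven costs, since GCP service account emails end in gserviceaccount.com. A small sketch (emails are invented):

```python
def is_service_account(user_email: str) -> bool:
    """GCP service account emails end in gserviceaccount.com."""
    return user_email.endswith("gserviceaccount.com")

emails = ["dbt-runner@my-proj.iam.gserviceaccount.com", "alice@example.com"]
for email in emails:
    kind = "automated" if is_service_account(email) else "human"
    print(f"{email}: {kind}")
```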

Want CloudClerk to find these savings automatically?

Our platform connects to your BigQuery project, runs these analyses automatically, and delivers AI-powered optimisation recommendations — all with your data fully anonymised.