
BigQuery Top Queries by Duration

Long-running queries tie up resources, delay downstream pipelines, and frustrate users waiting for results. This query identifies the longest-running queries in your project with their execution time, cost, and a runtime-to-bytes ratio that helps distinguish legitimate large scans from poorly optimized queries.

Why This Matters

Execution time matters beyond just cost. A query that runs for 30 minutes blocks a data pipeline, delays dashboard refreshes, or causes an analyst to context-switch. Under Editions pricing, long-running queries also consume more slot-hours (minimum 1-minute billing). Reducing query duration improves both cost and productivity.
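To put a number on the slot-hours point, the slot-milliseconds recorded in INFORMATION_SCHEMA can be converted to an approximate Editions cost. This is a hedged sketch: the 0.04 USD per slot-hour rate below is an assumed illustrative figure, not your contract price, and `your-project` / `region-us` are placeholders.

```sql
-- Rough Editions cost estimate from slot time (illustrative only).
-- 0.04 USD/slot-hour is an assumed example rate; substitute your
-- edition's actual price and your region.
SELECT
  job_id,
  total_slot_ms / (1000 * 60 * 60) AS slot_hours,
  ROUND(total_slot_ms / (1000 * 60 * 60) * 0.04, 4) AS approx_cost_usd
FROM `your-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY' AND state = 'DONE';
```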

How It Works

The query calculates execution time as the TIMESTAMP_DIFF between end_time and start_time. It also computes a seconds-per-GiB ratio: high values indicate queries that are slow relative to the amount of data they process, suggesting optimization potential.
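The two calculations can be shown in isolation on a hypothetical job (a 120-second query over 10 GiB), before they appear inside the full query below:

```sql
-- Duration and seconds-per-GiB on a made-up job:
-- 120 s over 10 GiB gives sec_per_gib = 12.
WITH example_job AS (
  SELECT
    TIMESTAMP '2024-01-01 00:00:00' AS start_time,
    TIMESTAMP '2024-01-01 00:02:00' AS end_time,
    10 * POW(1024, 3) AS total_bytes_billed
)
SELECT
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS duration_sec,  -- 120
  SAFE_DIVIDE(
    TIMESTAMP_DIFF(end_time, start_time, SECOND),
    total_bytes_billed / POW(1024, 3)   -- bytes -> GiB
  ) AS sec_per_gib                      -- 12.0
FROM example_job;
```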

SQL Query


SQL
-- Slowest queries ranked by wall-clock execution time

DECLARE lookback_days INT64 DEFAULT 7;

WITH jobs AS (
  SELECT
    user_email,
    query,
    project_id,
    start_time,
    end_time,
    TIMESTAMP_DIFF(end_time, start_time, SECOND) AS duration_sec,
    COALESCE(total_bytes_billed, 0) AS bytes_billed,
    total_slot_ms,
    ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY end_time DESC) AS rn
  FROM `your-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL lookback_days DAY)
    AND job_type = 'QUERY' AND state = 'DONE' AND total_slot_ms IS NOT NULL
),
deduplicated AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY query ORDER BY bytes_billed DESC) AS qrn
  FROM jobs WHERE rn = 1
)
SELECT
  user_email,
  query,
  project_id,
  start_time,
  duration_sec,
  ROUND(SAFE_DIVIDE(total_slot_ms, duration_sec * 1000), 0) AS avg_slots,
  ROUND(bytes_billed / POW(1024, 3), 2) AS gib_billed,
  ROUND(SAFE_DIVIDE(duration_sec, bytes_billed / POW(1024, 3)), 2) AS sec_per_gib
FROM deduplicated
WHERE qrn = 1
ORDER BY duration_sec DESC
LIMIT 100;
Replace your-project and region-us with your GCP project and dataset region.

Query Explanation

Two levels of deduplication: first by job_id to remove duplicate job entries, then by query text to show only the most expensive execution of each unique query. The sec_per_gib metric divides execution time by bytes billed — high values mean the query is slow relative to how much data it reads.
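The deduplication pattern itself is easy to see in miniature. The rows below are made up; the point is that `ROW_NUMBER()` partitioned by a key, ordered by a "best row" criterion, lets a `qrn = 1` filter keep exactly one row per key:

```sql
-- Dedup pattern in miniature: keep the most expensive run of each query.
WITH runs AS (
  SELECT 'q1' AS query, 100 AS bytes_billed UNION ALL
  SELECT 'q1', 500 UNION ALL
  SELECT 'q2', 300
)
SELECT query, bytes_billed
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY query ORDER BY bytes_billed DESC) AS qrn
  FROM runs
)
WHERE qrn = 1;
-- Returns q1 with 500 and q2 with 300.
```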

Key Insights

  • Queries over 10 minutes are likely scanning unpartitioned tables or performing expensive shuffles.

  • A high sec_per_gib means the query is compute-bound (complex JOINs, sorts, window functions) rather than I/O-bound.

  • A low sec_per_gib means the query is scanning a lot of data quickly; it could benefit from partition pruning.

  • Long-running queries in pipelines can cascade delays to all downstream dependencies.
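One way to act on these insights is to bucket queries by their ratio. This is a sketch: the cutoffs (2 and 30 seconds per GiB) are illustrative rather than universal, and `ranked_queries` stands in for the result of the main query above.

```sql
-- Bucket queries by sec_per_gib to suggest a likely bottleneck.
-- Thresholds are assumptions; tune them to your workload.
SELECT
  query,
  sec_per_gib,
  CASE
    WHEN sec_per_gib >= 30 THEN 'compute-bound: review JOINs, sorts, windows'
    WHEN sec_per_gib <= 2  THEN 'I/O-bound: check partition and cluster pruning'
    ELSE 'balanced'
  END AS likely_bottleneck
FROM ranked_queries;  -- hypothetical table holding the main query's output
```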

Best Practices

  1. Set query timeout limits to prevent runaway queries from consuming resources indefinitely.

  2. For ETL pipelines, break long-running monolithic queries into smaller, staged transformations.

  3. Review the query execution plan (the Execution details graph in the console, or the queryPlan stages in job statistics) to identify the slowest stages and optimize them specifically.

  4. Consider approximate aggregation functions (APPROX_QUANTILES, APPROX_TOP_COUNT) for exploratory queries.
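As a sketch of the last practice, approximate aggregates trade a small accuracy loss for much cheaper computation on large inputs. The table name below is a placeholder for wherever you keep query history:

```sql
-- Approximate aggregation for exploration: p95 duration and top users.
SELECT
  APPROX_QUANTILES(duration_sec, 100)[OFFSET(95)] AS approx_p95_duration_sec,
  APPROX_TOP_COUNT(user_email, 5) AS top_users
FROM `your-project.dataset.query_history`;  -- placeholder table
```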

Want CloudClerk to find these savings automatically?

Our platform connects to your BigQuery project, runs these analyses automatically, and delivers AI-powered optimisation recommendations — all with your data fully anonymised.
