BigQuery Top Queries by Frequency
Repeated queries are a hidden cost multiplier in BigQuery. This query groups identical query texts and counts how many times each has been executed, along with the cumulative cost of all those executions.
Why This Matters
A query that costs $0.10 per execution but runs 10,000 times per month costs $1,000. Frequency analysis often reveals low-cost queries that add up to significant spending due to sheer repetition. These are prime candidates for caching, scheduled materialization, or BI Engine acceleration.
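The arithmetic can be checked directly in BigQuery. As a sketch, using a hypothetical workload of roughly 16 GiB billed per run at the standard $6.25/TiB on-demand rate (the same rate used in the query below):

```sql
-- Hypothetical workload: ~16 GiB billed per run, executed 10,000 times per month
SELECT
  ROUND(16 * POW(1024, 3) / POW(1024, 4) * 6.25, 2) AS cost_per_run_usd,         -- ~0.10
  ROUND(16 * POW(1024, 3) / POW(1024, 4) * 6.25 * 10000, 2) AS monthly_cost_usd; -- ~976.56
```

A dime per run quietly becomes close to a thousand dollars a month at that frequency.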
How It Works
The query first deduplicates jobs by job_id, then groups them by exact query text, counting executions and summing total_bytes_billed per group. The result shows each unique query with its execution count and cumulative on-demand cost.
SQL Query
-- Find the most frequently repeated queries and their cumulative cost
DECLARE lookback_days INT64 DEFAULT 14;

WITH jobs AS (
  SELECT
    query,
    COALESCE(total_bytes_billed, 0) AS bytes_billed,
    -- Keep one row per job_id in case a job appears more than once
    ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY end_time DESC) AS rn
  FROM `your-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL lookback_days DAY)
    AND job_type = 'QUERY'
    AND state = 'DONE'
    AND total_slot_ms IS NOT NULL
),
deduplicated AS (
  SELECT query, bytes_billed FROM jobs WHERE rn = 1
),
grouped AS (
  SELECT
    query,
    COUNT(*) AS exec_count,
    SUM(bytes_billed) AS total_bytes
  FROM deduplicated
  GROUP BY query
)
SELECT
  query,
  exec_count,
  ROUND(total_bytes / POW(1024, 3), 2) AS total_gib_billed,
  -- On-demand pricing: $6.25 per TiB billed
  ROUND(total_bytes / POW(1024, 4) * 6.25, 2) AS cumulative_cost_usd
FROM grouped
ORDER BY exec_count DESC;

Query Explanation
Jobs are deduplicated by job_id first, then grouped by query text. The count gives execution frequency, and the sum of bytes billed gives cumulative cost. This approach efficiently surfaces queries that are run repeatedly at scale.
Key Insights
Queries running >100 times per day are almost certainly automated — check if they can use cached results.
BigQuery caches results for 24 hours by default, but only for identical queries with deterministic results.
Queries that call non-deterministic functions such as CURRENT_TIMESTAMP() bypass the cache (BigQuery has no NOW() function); consider pinning the time filter to a truncated date instead.
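As a sketch of that fix (the table name and date literal are placeholders; in practice a client or scheduler substitutes the day-truncated literal so the query text stays identical, and therefore cacheable, all day):

```sql
-- Cache-busting: a non-deterministic function invalidates the result cache
SELECT COUNT(*) AS weekly_events
FROM `your-project.dataset.events`
WHERE event_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY);

-- Cache-friendly: a fixed, day-truncated literal keeps the query deterministic
SELECT COUNT(*) AS weekly_events
FROM `your-project.dataset.events`
WHERE event_ts >= TIMESTAMP_SUB(TIMESTAMP '2025-06-01 00:00:00 UTC', INTERVAL 7 DAY);
```

Note that wrapping CURRENT_TIMESTAMP() in TIMESTAMP_TRUNC() inside the query is not enough: it still calls a non-deterministic function, so the literal has to be injected by the caller.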
High-frequency queries scanning the same tables may benefit from materialized views or BI Engine.
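A minimal materialized-view sketch, assuming a hypothetical events table behind the repeated dashboard queries:

```sql
-- Hypothetical day-level rollup; names are placeholders
CREATE MATERIALIZED VIEW `your-project.dataset.daily_event_counts` AS
SELECT
  DATE(event_ts) AS event_date,
  event_type,
  COUNT(*) AS event_count
FROM `your-project.dataset.events`
GROUP BY DATE(event_ts), event_type;
```

BigQuery can also transparently rewrite qualifying queries against the base table to read from the materialized view, so dashboards may benefit without being changed.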
Best Practices
1. Enable BigQuery result caching and avoid patterns that defeat it (CURRENT_TIMESTAMP, RAND, etc.).
2. Create materialized views for the top 5 most frequently executed analytical queries.
3. Consolidate duplicate queries from different dashboards or services into a single scheduled query.
4. Use BI Engine for sub-second queries powering interactive dashboards.
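The consolidation practice above can be sketched as one scheduled query that writes a small summary table for all dashboards to read, instead of each dashboard re-running the same SELECT (all names below are placeholders):

```sql
-- Body of a single scheduled query (e.g. run hourly); dashboards read the
-- small result table instead of re-scanning the source
CREATE OR REPLACE TABLE `your-project.reporting.hourly_summary` AS
SELECT
  TIMESTAMP_TRUNC(event_ts, HOUR) AS event_hour,
  COUNT(*) AS events,
  COUNT(DISTINCT user_id) AS active_users
FROM `your-project.dataset.events`
WHERE event_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY event_hour;
```

This turns N identical dashboard scans into one scan per schedule interval.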
Want CloudClerk to find these savings automatically?
Our platform connects to your BigQuery project, runs these analyses automatically, and delivers AI-powered optimisation recommendations — all with your data fully anonymised.
Related Guides
BigQuery Top Queries by Cost
Find the most expensive BigQuery queries by on-demand cost. Rank queries by total bytes billed to identify the biggest cost drivers in your project.
BigQuery Top Queries by Complexity
Find the most compute-intensive BigQuery queries ranked by slot usage. Identify queries consuming disproportionate computational resources.
BigQuery Top Queries by Duration
Find the longest-running BigQuery queries. Identify slow queries that block resources and impact user experience with runtime analysis.
BigQuery Cost Analysis by Query
Analyze every individual query's cost, slot usage, and execution time. Get a detailed breakdown of on-demand cost per query execution.