Most experiment analyses start—and end—the same way.
You group by experiment variant.
You calculate averages.
You compare numbers.
You call it a day.
Sometimes that’s enough.
Often, it’s not.
As experiments become more central to business decisions, simple aggregates stop telling the full story. The good news: you don’t need fancy tools or complex modeling to do better.
With the right SQL patterns, you can perform robust, nuanced experiment analysis directly in your data warehouse.
This article walks through how to move beyond basic aggregates and use SQL to analyze experiments the way experienced data scientists actually do.
Why Simple Aggregates Fall Short in Experiments
A typical “quick” experiment query looks like this:
SELECT
    experiment_group,
    AVG(metric_value) AS avg_metric
FROM experiment_results
GROUP BY experiment_group;
This answers one narrow question:
“What is the average outcome per group?”
But experiments are rarely that simple.
Simple averages hide:
- Uneven exposure or traffic
- Skewed distributions
- Pre-existing differences between groups
- Changes over time
- Data quality issues
If you rely only on aggregates, you risk false confidence—especially when results look “significant” but are actually misleading.
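A quick way to catch the first two issues is to count exposure before averaging anything. A minimal sanity-check sketch, assuming experiment_results carries a user_id column (adjust to your schema):

SELECT
    experiment_group,
    COUNT(DISTINCT user_id) AS users,
    COUNT(*) AS events
FROM experiment_results
GROUP BY experiment_group;

If the user split deviates noticeably from the intended allocation, investigate the assignment mechanism before reading any metric.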
Experiment Analysis Is About Comparisons, Not Metrics
A mindset shift helps here.
BI asks: What is the metric?
Experiment analysis asks: How does behavior differ between groups—and why?
This is why experiment analysis often requires:
- More granular data
- More careful filtering
- Multiple perspectives on the same outcome
SQL is well suited for this kind of work, as long as you move beyond surface-level summaries.
Start With Unit-Level Data (Always)
Before any aggregation, you should be clear about the unit of analysis.
Common units include:
- User
- Session
- Order
- Shipment
- Account
Your experiment conclusions are only valid at the level you analyze.
A good habit is to build a clean unit-level table first, then aggregate after.
-- One row per (user, experiment_group): the grain everything else builds on
SELECT
    user_id,
    experiment_group,
    COUNT(*) AS events,
    SUM(metric_value) AS total_metric
FROM raw_events
GROUP BY user_id, experiment_group;
This gives you flexibility to:
- Compare distributions
- Detect outliers (see the sketch after this list)
- Adjust analysis later without rewriting everything
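For the outlier point, the unit-level table makes inspection cheap. A minimal sketch that surfaces the heaviest units (the LIMIT value is arbitrary; some warehouses use TOP or FETCH FIRST instead):

WITH unit_level AS (
    SELECT
        user_id,
        experiment_group,
        SUM(metric_value) AS total_metric
    FROM raw_events
    GROUP BY user_id, experiment_group
)
-- Eyeball the heaviest units; a handful of extreme users can move a group average
SELECT
    user_id,
    experiment_group,
    total_metric
FROM unit_level
ORDER BY total_metric DESC
LIMIT 20;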
Go Beyond Averages: Distribution Matters
Two experiment groups can have the same average and completely different behavior.
SQL makes it easy to explore distributions using:
- Percentiles
- Buckets
- Histograms (see the sketch below)
-- APPROX_PERCENTILE is Snowflake/Trino syntax; BigQuery uses APPROX_QUANTILES,
-- Postgres uses PERCENTILE_CONT(...) WITHIN GROUP (ORDER BY ...)
SELECT
    experiment_group,
    APPROX_PERCENTILE(metric_value, 0.5) AS median,
    APPROX_PERCENTILE(metric_value, 0.9) AS p90
FROM experiment_results
GROUP BY experiment_group;
Why this matters:
- Medians are often more stable than means
- Tail behavior (p90 / p95) can drive business impact
- Large effects may only affect a small subset of users
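Buckets and histograms are just as easy. A fixed-width bucketing sketch (the width of 10 is an arbitrary choice; pick one that fits your metric’s scale):

SELECT
    experiment_group,
    FLOOR(metric_value / 10) * 10 AS bucket_start,
    COUNT(*) AS n
FROM experiment_results
GROUP BY experiment_group, FLOOR(metric_value / 10) * 10
ORDER BY experiment_group, bucket_start;

Comparing the two resulting histograms side by side often explains a difference in means faster than any single summary number.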
Normalize Before You Compare
Raw totals are rarely comparable across groups.
Common normalization strategies include:
- Per-user
- Per-session
- Per-day
- Per-exposure
SUM(metric_value) / COUNT(DISTINCT user_id) AS metric_per_user
Normalization helps answer:
“Is the difference real, or just driven by volume?”
This is especially important when traffic allocation isn’t perfectly balanced.
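In a full query, per-user normalization might look like this (assuming experiment_results carries a user_id; otherwise start from the unit-level table built earlier):

SELECT
    experiment_group,
    SUM(metric_value) / COUNT(DISTINCT user_id) AS metric_per_user
FROM experiment_results
GROUP BY experiment_group;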
Control for Pre-Experiment Behavior
One of the most overlooked steps in experiment analysis is checking baseline differences.
Before comparing outcomes, ask:
- Were groups similar before the experiment started?
- Did one group already behave differently?
SQL can help you compute pre-period metrics:
SUM(
    CASE
        WHEN event_date < experiment_start_date THEN metric_value
        ELSE 0
    END
) AS pre_metric
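In context, the fragment might sit in a query like this, comparing pre- and post-period totals per group (column names follow the fragment above; in many setups the start date lives in a separate assignment table you would join in):

SELECT
    experiment_group,
    SUM(CASE WHEN event_date < experiment_start_date THEN metric_value ELSE 0 END) AS pre_metric,
    SUM(CASE WHEN event_date >= experiment_start_date THEN metric_value ELSE 0 END) AS post_metric
FROM experiment_results
GROUP BY experiment_group;

If pre_metric already differs meaningfully between groups, treat any post-period difference with suspicion.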
Comparing pre- and post-experiment behavior can reveal:
- Randomization issues
- Segment imbalance
- Hidden biases
Time Matters: Look at Trends, Not Just End States
Many experiments don’t behave consistently over time.
Early effects may fade.
Late effects may emerge.
Operational issues may appear mid-test.
Time-based SQL analysis helps surface this.
-- Daily averages per group; ORDER BY makes the trend readable
SELECT
    experiment_group,
    event_date,
    AVG(metric_value) AS daily_avg
FROM experiment_results
GROUP BY experiment_group, event_date
ORDER BY event_date, experiment_group;
Plotting these trends often reveals:
- Learning effects
- Ramp-up issues
- External events affecting results
Segment Analysis: Where Effects Actually Live
Overall averages often hide heterogeneous effects.
Segmenting by meaningful dimensions can uncover:
- Who benefits
- Who is harmed
- Where the trade-offs are
Examples:
- New vs returning users
- High-volume vs low-volume customers
- Region or device type
SELECT
    experiment_group,
    user_segment,
    AVG(metric_value) AS avg_metric
FROM experiment_results
GROUP BY experiment_group, user_segment;
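One caveat: slicing shrinks sample sizes, and small cells produce noisy averages. A variation that keeps the counts visible (again assuming a user_id column):

SELECT
    experiment_group,
    user_segment,
    COUNT(DISTINCT user_id) AS users,
    AVG(metric_value) AS avg_metric
FROM experiment_results
GROUP BY experiment_group, user_segment;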
This is where experiments become actionable.
Be Explicit About Data Filters
Small filter decisions can dramatically change results.
Always be explicit about:
- Inclusion criteria
- Date ranges
- Event definitions
- Missing or invalid data
SQL forces clarity, which is a feature—not a limitation.
If you can’t explain your filters in plain language, your experiment result probably isn’t ready to be trusted.
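One practical pattern is to collect every inclusion decision in a single CTE at the top of the analysis, so the filters are impossible to miss (dates and conditions below are illustrative):

WITH eligible_events AS (
    SELECT *
    FROM experiment_results
    WHERE event_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'  -- experiment window (illustrative)
      AND metric_value IS NOT NULL                                    -- drop invalid rows explicitly
      AND experiment_group IN ('control', 'treatment')                -- exclude unassigned traffic
)
SELECT
    experiment_group,
    AVG(metric_value) AS avg_metric
FROM eligible_events
GROUP BY experiment_group;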
SQL vs Statistical Testing: Know the Boundary
SQL can take you far:
- Data preparation
- Sanity checks
- Directional comparisons
- Effect size estimation
But SQL alone won’t:
- Establish statistical significance
- Quantify uncertainty or apply variance reduction
- Handle complex causal inference
That’s okay.
Strong experiment analysis often looks like this:
- Use SQL to deeply understand the data
- Validate assumptions and patterns
- Apply statistical methods after the data is clean and well-understood
Skipping step 1 is where many experiments go wrong.
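In practice, the hand-off from SQL to statistics can be as small as one query: compute the per-group summary statistics a two-sample t-test needs, then feed them to whatever statistics tool you use.

SELECT
    experiment_group,
    COUNT(*) AS n,
    AVG(metric_value) AS mean_metric,
    STDDEV_SAMP(metric_value) AS sd_metric  -- sample standard deviation; name varies slightly by warehouse
FROM experiment_results
GROUP BY experiment_group;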
Why SQL-Based Experiment Analysis Still Matters
Even in mature data science teams, SQL remains central because it is:
- Transparent
- Auditable
- Scalable
- Close to the source of truth
The goal is not to replace statistics with SQL, but to use SQL to make statistical analysis meaningful.
Good experiment decisions are rarely made from a single aggregate metric.
They come from understanding:
- Behavior
- Context
- Trade-offs
- Uncertainty
SQL, used thoughtfully, gets you there.
Final Thoughts
Experiment analysis is not about writing clever queries.
It’s about asking better questions—and using SQL to answer them clearly.
If you already know SQL, you don’t need to “wait until you learn Python” to do serious experiment analysis.
You just need to go beyond simple aggregates.