
Incrementality testing (Beta)

Set up a Geo-experiment to explore the incremental impact of marketing channels

Written by Tim Schouten
Updated over 2 weeks ago

Incrementality testing in marketing is a method used to measure the true impact of a marketing campaign or channel by determining the incremental value it adds beyond what would happen naturally or through other efforts. This approach helps you understand whether your campaigns drive additional outcomes (e.g., sales, leads) or simply take credit for results that would have occurred anyway.

Incrementality testing is currently in Beta. We are accepting new clients to set up a geo-test. Contact our team via support@billygrace.com to set up an incrementality test.

How does it work?

  • We run a geo-experiment to explore the incremental impact of a channel on the number of sessions.

  • We turn off a channel (for example, Meta) in a specific region and observe the divergence between the on/off regions to gain insights into the incremental effects.

  • The KPI to monitor will be website traffic/sessions.

It is not an A/B test: the channel will be off in the test region and on in the rest of the country.
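To make the on/off comparison concrete, here is a minimal sketch in Python of what observing the divergence could look like: scale the rest-of-country sessions down to the test region's usual share, then measure how far observed sessions drift from that baseline once the channel is off. All data and column names below are invented for illustration; Billy Grace's actual analysis is more sophisticated than this.

```python
import pandas as pd

# Hypothetical daily session counts: for the first 5 days the channel is on
# everywhere, after that it is switched off in the test region.
df = pd.DataFrame({
    "test_region":     [950, 940, 960, 955, 945, 800, 790, 810, 795, 805],
    "rest_of_country": [9500, 9400, 9600, 9550, 9450, 9480, 9520, 9390, 9510, 9460],
})

PRE_DAYS = 5  # days before the channel was switched off in the test region

# How the test region normally tracks the rest of the country.
pre = df.iloc[:PRE_DAYS]
scale = pre["test_region"].sum() / pre["rest_of_country"].sum()

# Counterfactual for the off-period: expected sessions had the channel stayed on.
post = df.iloc[PRE_DAYS:]
expected = post["rest_of_country"] * scale
divergence = post["test_region"] - expected

print(f"Estimated sessions lost while the channel was off: {-divergence.sum():.0f}")
```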

Differences in incrementality testing: Google vs Meta

The way we run an incrementality test is the same for all channels: we pause ads in one region and compare performance with the rest of the country. The main difference between Google and Meta is what we measure and what kind of impact you should expect.

  • For Google, campaigns like Search, Shopping and Performance Max are closer to the moment of conversion. That means the impact of switching ads on or off is usually visible in conversions. We also look at organic traffic to see whether paid ads are simply replacing organic clicks.

  • For Meta, campaigns typically focus on awareness and consideration earlier in the customer journey. Conversions often happen later, outside the test window. That’s why we mainly measure website traffic (sessions) to understand whether Meta is driving additional demand. Conversions are still monitored, but they are not the primary success metric.

In short:

  • Google tests focus on incremental conversions

  • Meta tests focus on incremental traffic and demand

The testing approach itself is identical for both channels.

An example of an Incrementality test

Let’s say you’re running Meta ads. To measure the incremental value of your Meta ads, you decide to conduct a geo-holdout test:

  • Control Region: In the rest of the country (everywhere except the holdout region), you continue running Meta ads as usual.

  • Holdout Region: In Noord-Holland, you pause all Meta ads for the duration of the test (or just exclude Noord-Holland from targeting).

For several weeks, Billy Grace tracks and compares user behavior in both regions. By comparing the data, Billy Grace can determine how much website traffic in Noord-Holland was lost because Meta ads were paused, while the rest of the country serves as a baseline where ads kept running.
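As a rough sketch of that comparison, assuming you have daily session counts for both regions (all numbers below are invented), a simple linear fit on the pre-test period gives a counterfactual for Noord-Holland. This is control group matching in its most basic form; the actual analysis uses more robust modeling.

```python
import numpy as np

# Hypothetical daily sessions. Pre-period: Meta ads running everywhere.
# Test period: Meta ads paused in Noord-Holland only.
control_pre  = np.array([9100, 9300, 9200, 9400, 9250])  # rest of country
holdout_pre  = np.array([ 930,  945,  935,  958,  940])  # Noord-Holland
control_test = np.array([9150, 9350, 9100, 9300, 9200])
holdout_test = np.array([ 820,  835,  815,  830,  822])

# Fit how Noord-Holland normally tracks the rest of the country.
a, b = np.polyfit(control_pre, holdout_pre, deg=1)

# Counterfactual: what Noord-Holland would have done with ads still on.
expected = a * control_test + b
lost = expected - holdout_test

print(f"Estimated sessions lost: {lost.sum():.0f} "
      f"({lost.sum() / expected.sum():.1%} of expected)")
```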

Steps to set up the test

  1. Feasibility check

    Billy runs simulations on historical data to identify the most suitable regions and channels for testing and determine the necessary time window (e.g., 15 days or 1 month).

  2. Run the experiment for 15 days or 1 month by turning off advertising in the selected region.

  3. Calibrate UMM results based on outcomes.

Result

The experiment results are used to calibrate the UMM models, increasing the certainty of the estimates and validating the estimated UMM view effects. This gives insight into the incrementality of your advertising.
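The article does not spell out the calibration math, but conceptually it comes down to comparing the lift measured in the experiment with the contribution the UMM attributes to the channel, and correcting the model where the two disagree. A deliberately simplified illustration, with invented numbers:

```python
# Conceptual sketch only; Billy Grace's actual UMM calibration procedure is
# not described in this article. One common approach in marketing measurement
# is to compare the experimentally measured lift with the share the model
# attributes to the channel, and use the ratio as a correction.
umm_estimated_share = 0.12  # hypothetical: UMM credits Meta with 12% of sessions
experiment_lift     = 0.09  # hypothetical: geo-test shows a 9% drop when paused

calibration_factor = experiment_lift / umm_estimated_share
print(f"Calibration factor: {calibration_factor:.2f} "
      "(below 1 means the model was over-crediting the channel)")
```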

Incrementality testing is useful when you have questions like the following:

Are my campaigns driving results?

Problem: Are the observed outcomes (e.g., sales, clicks, sign-ups) due to the campaign, or would they have happened organically without it?

Insight: Incrementality testing helps determine the causal impact of the campaign, ensuring resources aren't wasted on initiatives that merely capture existing demand.

Which marketing channels deliver the most value?

Problem: With multiple marketing channels (e.g., social media, email, paid search), it's unclear which is truly contributing to incremental growth.

Insight: By isolating the impact of each channel, you can identify high-performing tactics and reallocate budgets to maximize ROI.

What is the optimal budget allocation?

Problem: How should the marketing budget be distributed across campaigns or channels to achieve the best return on investment?

Insight: Incrementality testing reveals diminishing returns on oversaturated channels and highlights underfunded areas with untapped potential.

Incrementality testing is currently in Beta. We are accepting new clients to set up a geo-test. Contact our team via support@billygrace.com to set up an incrementality test.

FAQ

When is the best time to run an incrementality test?

The test is best run during a period in which sales and spend are not influenced too much by external factors, and when there is no planned large increase or decrease in spend during the test period.

For example, a sale (discount) period is not a good period to do a test because different regions may react to the same sale differently, so by default, there's already some contamination in results, which you want to avoid.

What does Billy Grace look at when doing a feasibility check?

We run simulations with both sessions and events, based on at least 90 days of historical data.

For each region, we repeat the following steps:

  • Assuming a 30-day experiment duration: we use the first 60 days of historical data to fit the model, and use the last 30 days for effect simulations.

  • We simulate different potential effect sizes (for example, an effect size of -5% -> all data points of the last 30 days are artificially decreased by 5%). The purpose is to find the smallest effect size (in absolute value) that the model can detect with high certainty (by default, at a 90% confidence level). This is called the "minimum detectable effect" (see the sketch after this list).

  • We cross-check with our attribution models to see which channels (or campaign clusters) can meet the minimum detectable effect size.
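To illustrate the minimum-detectable-effect search from the steps above, the toy simulation below repeatedly draws synthetic daily sessions, applies an artificial drop to the 30-day test window, and checks how often a simple one-sided mean test flags it at 90% confidence. This is a stand-in for the real procedure, which fits a proper model to actual historical data; every number here is made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def detection_rate(effect_size, n_sims=2000, mu=1000.0, sigma=50.0,
                   n_fit=60, n_test=30, z90=1.2816):
    """Share of simulations in which an artificial drop of `effect_size`
    over the 30-day window is flagged at 90% confidence (one-sided)."""
    hits = 0
    for _ in range(n_sims):
        fit = rng.normal(mu, sigma, n_fit)                  # first 60 days
        test = rng.normal(mu, sigma, n_test) * (1 - effect_size)
        # One-sided bound on the difference between the two window means.
        se = fit.std(ddof=1) * np.sqrt(1 / n_fit + 1 / n_test)
        hits += test.mean() < fit.mean() - z90 * se
    return hits / n_sims

# The minimum detectable effect is the smallest drop flagged with high
# certainty; a 0% effect should rarely be flagged (valid control group).
for effect in [0.00, 0.01, 0.02, 0.03, 0.05]:
    print(f"effect {effect:>4.0%}: detected in {detection_rate(effect):.0%} of runs")
```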

A good channel (or cluster) and a good region to run the experiment need to satisfy the following:

  • Good model fit based on the first 60 days -> reliable control group matching

  • The model should not detect any effect for the last 30 days if the simulated size is 0% -> valid control group

  • The channel / cluster attribution needs to be at least the minimum detectable effect size. For example, based on our attribution models, Google Branded Search accounts for 7% of L30D conversions, but based on the simulations, Noord-Holland requires at least a 10% drop in conversions to be detected with high certainty, while Zuid-Holland requires only a 5% drop. In that case, Zuid-Holland is a good region to experiment with, and Noord-Holland is not (see the sketch below).
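Putting the criteria together, the feasibility decision in that example boils down to a per-region comparison; a hypothetical sketch using the article's numbers:

```python
# A channel/region combination qualifies when the channel's attributed share
# is at least that region's minimum detectable effect (numbers from the
# Google Branded Search example above).
channel_share = 0.07  # 7% of L30D conversions per the attribution models

mde_by_region = {"Noord-Holland": 0.10, "Zuid-Holland": 0.05}

for region, mde in mde_by_region.items():
    ok = channel_share >= mde
    print(f"{region}: MDE {mde:.0%} vs attribution {channel_share:.0%} -> "
          f"{'suitable' if ok else 'not suitable'}")
```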
