Your feature flag is live, but is it actually working? PostHog gives you the data to measure what changed after you rolled out a flag. We'll show you how to calculate the actual impact on your key metrics.
Capture Flag Variant Events
Before you can measure impact, PostHog needs to know which variant each user saw.
Send events tagged with the flag variant
When a user interacts with your feature flag, capture an event that includes the variant they received. Use the PostHog SDK to add the flag and variant as properties on your event.
posthog.capture('feature_interaction', {
  'feature_flag': 'new_checkout_flow',
  'flag_variant': 'treatment',
  'action': 'clicked_button'
});
Ensure your control and treatment groups both log events
Users in the control (original) variant and treatment (new) variant must both send the same event. This is how PostHog compares them. If only one group sends events, you can't measure impact.
if (posthog.isFeatureEnabled('new_checkout_flow')) {
  showNewCheckout();
  posthog.capture('checkout_flow_viewed', {
    'flag_variant': 'treatment'
  });
} else {
  showOldCheckout();
  posthog.capture('checkout_flow_viewed', {
    'flag_variant': 'control'
  });
}
Calculate Impact in Insights
Once both variants are logging events, create an Insight in PostHog to compare their metrics.
Create a Trend or Funnel insight
Go to Insights > New Insight and choose Trend to measure event frequency over time, or Funnel to see how users progress through steps. Select the event you want to measure (e.g., checkout_flow_viewed).
// Private API endpoints are project-scoped and authenticate with a
// personal API key in the Authorization header, not a URL token.
fetch('https://your-instance.posthog.com/api/projects/PROJECT_ID/insights/', {
  method: 'GET',
  headers: {
    'Authorization': 'Bearer YOUR_API_TOKEN'
  }
}).then(res => res.json()).then(data => {
  console.log(data.results);
});
Add a property filter for flag variant
In the Insight, add a filter: flag_variant = control for one graph, then duplicate it and change to flag_variant = treatment. PostHog will show you the metrics side-by-side.
// Query the trends endpoint for one event, filtered to the treatment variant.
const params = new URLSearchParams({
  events: JSON.stringify([{ id: 'checkout_flow_viewed', type: 'events' }]),
  properties: JSON.stringify([
    { key: 'flag_variant', operator: 'exact', value: 'treatment', type: 'event' }
  ])
});
fetch(`https://your-instance.posthog.com/api/projects/PROJECT_ID/insights/trend/?${params}`, {
  headers: {
    'Authorization': 'Bearer YOUR_API_TOKEN'
  }
}).then(res => res.json()).then(data => {
  console.log('Treatment events:', data.result[0].count);
});
Calculate the percentage change
Once you have counts or rates for both variants, calculate impact: (treatment - control) / control × 100. If treatment has 1200 events and control has 1000, that's a +20% improvement.
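That arithmetic is easy to wrap in a small helper so you can reuse it across metrics. A minimal sketch (the function name and sample numbers are ours, not part of PostHog):

```javascript
// Percentage change between a control metric and a treatment metric.
// Positive = treatment improved over control; negative = it regressed.
function percentChange(control, treatment) {
  return ((treatment - control) / control) * 100;
}

console.log(percentChange(1000, 1200)); // 20, i.e. a +20% improvement
```

The same function works for counts, conversion rates, or revenue per user, as long as you feed it the same metric for both variants.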
Check Statistical Significance
Not all improvements are real. PostHog helps you determine if your result is statistically sound.
Use PostHog Experiments for automated significance testing
If you set up a feature flag as an Experiment in PostHog, it automatically tracks control vs. treatment and runs statistical significance testing on your goal metric. Open the Experiment and look for the statistical significance indicator.
fetch('https://your-instance.posthog.com/api/projects/PROJECT_ID/experiments/', {
  method: 'GET',
  headers: {
    'Authorization': 'Bearer YOUR_API_TOKEN'
  }
}).then(res => res.json()).then(data => {
  // List your experiments; per-experiment results and significance
  // are shown in the Experiment page in the PostHog UI.
  const exp = data.results[0];
  console.log('Experiment:', exp.name, 'started:', exp.start_date);
});
Look for p-value < 0.05
PostHog displays the p-value and a confidence level (usually 95%). A p-value below 0.05 means that, if the feature had no real effect, you'd see a difference this large less than 5% of the time. That's the standard threshold for a statistically significant result.
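If you're comparing two conversion rates outside of PostHog Experiments, you can run the classic two-proportion z-test yourself. A minimal sketch (all counts hypothetical; |z| > 1.96 corresponds to p < 0.05 for a two-sided test):

```javascript
// Two-proportion z-test: is the difference between control and treatment
// conversion rates bigger than random variation would explain?
function twoProportionZTest(convControl, totalControl, convTreatment, totalTreatment) {
  const pControl = convControl / totalControl;
  const pTreatment = convTreatment / totalTreatment;
  // Pooled rate under the null hypothesis (no real difference).
  const pPooled = (convControl + convTreatment) / (totalControl + totalTreatment);
  const standardError = Math.sqrt(
    pPooled * (1 - pPooled) * (1 / totalControl + 1 / totalTreatment)
  );
  const z = (pTreatment - pControl) / standardError;
  return { z, significantAt95: Math.abs(z) > 1.96 };
}

// 100/1000 control conversions vs. 150/1000 treatment conversions
console.log(twoProportionZTest(100, 1000, 150, 1000));
```

This is the same family of test significance tools automate for you; the main thing to take away is that the verdict depends on sample size as much as on the size of the difference.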
Common Pitfalls
- Only logging events for the treatment variant — you need both control and treatment sending the same event to compare them
- Stopping analysis after one day — feature flag impact needs time to stabilize; wait at least one week of user data before drawing conclusions
- Confusing event volume with conversion rate — a higher event count doesn't mean better impact if both groups got the feature; measure the rate (events per user) instead
- Ignoring sample size — with small user counts, random variation can masquerade as impact; make sure you have 100+ events in each variant before trusting the result
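To make the event-volume pitfall concrete: normalize by users before comparing. A quick sketch with made-up numbers, where treatment's raw count barely beats control but its per-user rate is clearly higher:

```javascript
// Hypothetical event and user counts per variant.
const control   = { events: 1000, users: 5000 };
const treatment = { events: 1050, users: 4200 };

function eventsPerUser({ events, users }) {
  return events / users;
}

console.log(eventsPerUser(control));   // 0.2 events per user
console.log(eventsPerUser(treatment)); // 0.25 events per user
```

Comparing 1050 vs. 1000 raw events suggests a ~5% lift, but per user the treatment group is actually 25% ahead; raw counts mislead whenever variant group sizes differ.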
Wrapping Up
Now you can ship a feature flag and actually know whether it worked. Track both variants, compare them in Insights, and check statistical significance before declaring victory. If you want to track this automatically across tools and get insights without manual analysis, Product Analyst can help.