Statsig Experimentation Platform: SDK Architecture and Rollouts
Statsig is one of the few feature-flagging vendors whose product surface — flags, experiments, product analytics, session replay — sits on a single ingestion and assignment pipeline. For a senior engineer integrating it, the interesting questions are mechanical rather than marketing: how a user is bucketed, where evaluation runs, what the SDK does on a cold start, how to keep evaluation alive when the Statsig API is unreachable, and how the deployment model (cloud vs. warehouse-native) shapes the rest of the system. This article walks through those mechanics with citations to the official docs and SDK source.
TL;DR
- Two evaluation models, one assignment algorithm. Server SDKs download the full ruleset and evaluate locally; client SDKs receive pre-computed values for the current user from `/initialize`. Both produce the same bucket because both feed `SHA-256(salt + unitID)` into a `mod 10000` (experiments) or `mod 1000` (layers) bucket assignment (How Evaluation Works).
- Deterministic, stateless. SDKs do not persist `user → bucket` mappings. The hash is recomputed on every check; the same user gets the same bucket as long as the rule’s salt is unchanged (How Evaluation Works).
- Bootstrap is the recommended pattern for SSR. A server-side `getClientInitializeResponse(user)` produces a JSON payload you can inline into the page; the browser SDK then calls `initializeSync` with no network request, eliminating UI flicker (Server Core docs, Client SDK docs).
- Failure mode is “serve last known good.” Server SDKs keep evaluating from in-memory specs when the API is unreachable; new instances coming up during an outage need a `DataAdapter` (Redis, Edge Config, your own table) to recover state (DataAdapter docs).
- Two deployment models. Statsig Cloud is fully managed; Statsig Warehouse Native runs the Stats Engine inside your BigQuery / Snowflake / Redshift / Databricks. Pick by where your metric source-of-truth already lives, not by raw price (WHN vs. Cloud).
Mental model
Three concepts cover most of what follows; everything else is mechanics.
- Config spec. The serialized definition of every gate, experiment, layer, dynamic config, and ID list in a project. A version of this JSON document is what server SDKs download from the CDN; an evaluated projection of it is what client SDKs receive from `/initialize`.
- Salt. A stable, per-rule string. Combined with the user’s `unitID` (e.g. `userID`, `stableID`, or any `customID`), it deterministically picks a bucket. Re-using the same rule keeps the same users in the same bucket; creating a new rule re-rolls them (rollout-toggle FAQ).
- Exposure. A logged event that says “user X was bucketed into variant Y at time T.” Statistical analysis runs on exposures, not on rule definitions, so dropping or duplicating them silently corrupts experiment results.
Important
The package landscape is split across two SDK generations. The legacy Node SDK is statsig-node (class methods on a singleton: await Statsig.initialize(key)). The current generation is Server Core — @statsig/statsig-node-core — built on a Rust core, instantiated as new Statsig(key, options); await statsig.initialize(). The two surfaces are similar but not identical; the Vercel EdgeConfigDataAdapter example, in particular, still uses the legacy statsig-node package today. All examples below explicitly mark which generation they target.
Unified pipeline, in practice
The advantage of routing flags, experiments, and analytics through the same pipeline is not aesthetic. When the exposure that bucketed a user into variant B and the conversion event the user generated five minutes later are processed by the same identity model and the same metric definitions, you don’t need to reconcile two pipelines to claim causality. That removes a class of integration bugs that show up as “the feature flag platform says +5% but the analytics dashboard says -2%.”
Statsig’s architecture decomposes into the usual pieces:
| Component | Responsibility |
|---|---|
| Assignment service | Bucketing and rule evaluation. SDKs do this in-process; the service mostly serves the spec. |
| Configuration service | Persists rule definitions; emits the download_config_specs payload consumed by every server SDK. |
| Metrics pipeline | Ingests, dedupes, and stages exposures and custom events. |
| Analysis service | Runs the Stats Engine — CUPED, sequential testing, etc. — against the staged metric data. |
The split matters for failure mode discussion later: a problem in the metrics pipeline shows up as bad analytics, while a problem in the configuration service shows up as stale flags — and the SDK’s behavior in each case is different.
What the Stats Engine actually does
The SDK’s job ends once an exposure is logged; the Stats Engine takes over from there. A senior reader evaluating the platform should know what surface area the engine covers, even though the deep methodology is out of scope for this article:
| Capability | What it does | Source |
|---|---|---|
| Frequentist + Bayesian results | Both engines run by default; pick per-experiment based on the team’s decision protocol. | Statistical methods overview |
| CUPED variance reduction | Uses a 7-day pre-experiment baseline as a covariate to shrink confidence intervals on topline metrics. The technique is from Microsoft’s WSDM 2013 paper by Deng, Xu, Kohavi, and Walker¹. | Variance reduction docs, CUPED docs |
| Sequential testing | Always-valid p-values and confidence sequences so peeking does not inflate false positives. Mandatory if anyone reads results before the planned end date. | Sequential testing docs |
| SRM (sample ratio mismatch) checks | Chi-squared test against the planned allocation, run automatically on every experiment; flags assignment-pipeline bugs that silently bias results. | SRM checks, Statsig SRM primer |
| Guardrail metrics | Always-on metrics (latency, errors, retention) that the engine watches alongside the success metric so a launch never wins on the topline while degrading a guardrail. | Guardrail metrics docs |
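CUPED itself is a small formula: theta = cov(x, y) / var(x), and the adjusted metric is y' = y − theta · (x − mean(x)). A minimal sketch of that core step, with the caveat that the production engine applies it to per-user metric aggregates over the pre-experiment baseline window rather than raw arrays:

```typescript
// CUPED: reduce the variance of metric y using a pre-experiment covariate x.
// theta = cov(x, y) / var(x); y' = y - theta * (x - mean(x)).
// The adjustment preserves the mean of y (unbiased effect estimates) while
// shrinking its variance by the squared correlation between x and y.
export function cupedAdjust(y: number[], x: number[]): number[] {
  const mean = (a: number[]) => a.reduce((s, v) => s + v, 0) / a.length
  const mx = mean(x)
  const my = mean(y)
  let cov = 0
  let varX = 0
  for (let i = 0; i < x.length; i++) {
    cov += (x[i] - mx) * (y[i] - my)
    varX += (x[i] - mx) ** 2
  }
  const theta = varX === 0 ? 0 : cov / varX
  return y.map((v, i) => v - theta * (x[i] - mx))
}
```

The better the pre-period covariate predicts the in-experiment metric, the tighter the resulting confidence intervals; with an uncorrelated covariate, theta is near zero and the adjustment is a no-op.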
The two operational implications worth internalising up front:
- SRM is a stop-the-experiment signal, not a warning. A 50/50 split that arrives as 51/49 with significance is almost always a bug in your assignment plumbing — sticky session caches, downstream filtering, or a bot population landing in only one arm. CUPED will not fix it.
- Sequential testing is opt-in for a reason. It costs statistical power (wider intervals) in exchange for the freedom to peek. If your team genuinely commits to fixed-horizon analysis, frequentist with a power calc remains tighter; in practice most teams peek, and sequential is the honest default.
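The SRM check is a one-degree-of-freedom chi-squared test, and it is worth seeing how little skew it takes to flag at scale. A sketch; the 3.841 critical value (chi-squared at p = 0.05, df = 1) is the textbook threshold, not a documented Statsig constant:

```typescript
// Chi-squared goodness-of-fit of observed arm counts vs. the planned split.
export function srmChiSquared(observed: number[], expectedRatios: number[]): number {
  const total = observed.reduce((s, n) => s + n, 0)
  return observed.reduce((chi2, obs, i) => {
    const expected = total * expectedRatios[i]
    return chi2 + ((obs - expected) ** 2) / expected
  }, 0)
}

// Textbook critical value at alpha = 0.05 with one degree of freedom.
export const CHI2_CRIT_DF1_P05 = 3.841

// The same 51/49 skew is noise at n = 100 (chi2 = 0.04) but a hard SRM
// flag at n = 10,000 (chi2 = 4.0, above the 3.841 threshold).
```

This is why “51/49 with significance” at production traffic volumes is almost never sampling noise: a 1-point skew contributes quadratically as the arms grow.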
Deterministic assignment
This is the most cited claim in the article and the one that makes cross-platform consistency possible at all.
Per the How Evaluation Works doc:
- Each gate or experiment rule generates a unique, stable salt.
- The user’s `unitID` is concatenated with that salt and run through SHA-256.
- The resulting digest is reduced to an integer and taken modulo `10000` (for experiments) or `1000` (for layers).
- That integer is the user’s bucket; rule allocation thresholds determine which variant the bucket maps to.
```typescript
// Conceptual sketch — the canonical implementation lives in
// https://github.com/statsig-io/node-js-server-sdk/blob/main/src/Evaluator.ts
import { createHash } from "node:crypto"

export function bucket(unitID: string, salt: string, modulus: 10_000 | 1_000): number {
  const digest = createHash("sha256").update(`${salt}${unitID}`).digest()
  // The exact byte slice the SDK uses to fold the digest into an integer
  // is an implementation detail; treat the open-source Evaluator as the source of truth.
  const head = digest.readBigUInt64BE(0)
  return Number(head % BigInt(modulus))
}
```

Note
Earlier drafts of this article specified an exact byte slice (“first 8 hex chars” or “first 8 bytes”). The public docs only state that the digest is “subjected to a modulus operation”; the byte width and endianness are implementation details that have differed between SDK generations. If your application depends on bit-exact reproduction (e.g. you’re building your own assignment service that must match Statsig’s), pin the version of the open-source Evaluator.ts and treat that as the contract, not this article.
The consequences of the algorithm are worth stating explicitly:
- Cross-platform consistency. A web client and a Node backend evaluating the same rule for the same user get the same bucket without any coordination, because both compute the same hash.
- Temporal consistency under rollout changes. Take a gate from 0% to 50%, back to 0%, and back to 50% again on the same rule and you re-expose the same 50% of users. Create a new rule and the salt changes, so the population re-rolls. This is the operational contract for safe canaries (How Evaluation Works).
- No cached state. SDKs do not store `user → bucket`. Statsig has explicitly written about customers who tried to add Redis to memoize this and ended up paying more for Redis than for Statsig itself; the deterministic hash is cheaper than caching it (How Evaluation Works).
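The rollout-toggle contract falls directly out of the algorithm: a rollout percentage is just a threshold on the bucket, so with a fixed salt the 30% population is a subset of the 50% population. A self-contained sketch, with the same byte-slice caveat as the conceptual `bucket` function above:

```typescript
import { createHash } from "node:crypto"

// Same conceptual bucketing as the sketch above; the exact byte folding is
// an implementation detail of the real SDK, so this is illustrative only.
function bucket(unitID: string, salt: string, modulus = 10_000): number {
  const digest = createHash("sha256").update(`${salt}${unitID}`).digest()
  return Number(digest.readBigUInt64BE(0) % BigInt(modulus))
}

// A pass percentage is a threshold on the bucket (10,000 buckets means
// 100 buckets per percent), so rollout populations are nested: every user
// passing at 30% also passes at 50% as long as the salt is unchanged.
export function passesRollout(unitID: string, salt: string, passPercent: number): boolean {
  return bucket(unitID, salt) < passPercent * 100
}
```

Creating a new rule changes the salt, which reshuffles every bucket; that is why a fresh rule re-rolls the population while toggling an existing rule re-exposes the same users.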
Server SDK: download and evaluate locally
The server-side model is the simpler one to reason about: keep the project’s specs in memory, evaluate locally on every check, refresh the specs in the background.
Initialization (Server Core)
```typescript
import { Statsig, StatsigUser } from "@statsig/statsig-node-core"

const statsig = new Statsig(process.env.STATSIG_SECRET_KEY!, {
  environment: "production",
})
await statsig.initialize()

const user = new StatsigUser({ userID: "user_123" })
const isOn = statsig.checkGate(user, "new_homepage")
const config = statsig.getDynamicConfig(user, "pricing_tier")
const exp = statsig.getExperiment(user, "recommendation_algorithm")
```

Important deltas from the legacy `statsig-node` SDK (migration notes):
- Construct an instance with `new Statsig(...)`, then `await initialize()`. There is no static `Statsig.initialize(...)` on Server Core.
- The Node Core package ships native binaries; if you use a frozen lockfile (Next.js / Docker multi-stage builds), include the platform-specific subpackages your build and runtime targets need.
- Mark `@statsig/statsig-node-core` as a `serverExternalPackages` entry in `next.config.js` to keep webpack from trying to bundle the native addon.
Where the spec comes from
The server SDK pulls the project spec from the CDN at the documented endpoint (CDN edge testing guide):
```
GET https://api.statsigcdn.com/v1/download_config_specs/<SERVER_SDK_KEY>.json
```

The polling interval is configurable via `rulesetsSyncIntervalMs`; the default is 10 000 ms across the Server Core SDKs (C++ Server SDK docs, legacy Python docs). Updates use a `company_lcut` (“last config-update timestamp”) so the SDK can ask the server “anything new since T?” rather than re-downloading the entire payload every cycle, and the in-memory store is swapped atomically once the new payload is parsed.
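The poll-and-swap behavior can be sketched as a small loop. Everything here is illustrative (the `SpecStore` name, the injected fetcher); the real SDK threads its last config-update timestamp through the request and keeps the previous specs whenever the fetch or parse fails:

```typescript
// Illustrative poll-and-swap loop. SpecStore and the injected fetcher are
// hypothetical names, not the SDK's real internals.
interface Specs {
  time: number
  raw: string
}

export class SpecStore {
  private current: Specs | null = null

  // fetchSpecs(sinceTime) returns the raw payload, or null for "no change".
  constructor(private fetchSpecs: (sinceTime: number) => Promise<string | null>) {}

  get specs(): Specs | null {
    return this.current
  }

  async pollOnce(): Promise<void> {
    try {
      const body = await this.fetchSpecs(this.current?.time ?? 0)
      if (body === null) return // nothing newer than our last update time
      const parsed = JSON.parse(body) as { time: number; has_updates: boolean }
      if (!parsed.has_updates) return
      // One reference assignment: readers observe either the old specs or
      // the fully parsed new specs, never a half-applied update.
      this.current = { time: parsed.time, raw: body }
    } catch {
      // Network or parse failure: keep serving the last known good specs.
    }
  }
}
```

The same structure also exhibits the cold-start failure mode discussed later: an instance whose first poll fails has no specs at all, which is exactly the gap a `DataAdapter` fills.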
The shape of the spec — useful when reasoning about cache TTLs and DataAdapter contents:
```typescript
interface ConfigSpecs {
  feature_gates: Record<string, FeatureGateSpec>
  dynamic_configs: Record<string, DynamicConfigSpec>
  layer_configs: Record<string, LayerSpec>
  id_lists: Record<string, string[]>
  has_updates: boolean
  time: number
}
```

Evaluation latency
Once the spec is in memory, checkGate / getExperiment / getDynamicConfig are synchronous. The Statsig docs target sub-millisecond latency for these calls, with no network involved after init (How Evaluation Works). Treat that as a guideline rather than a guarantee — actual latency depends on rule complexity (geo / segment lookups, custom IDs, ID-list segments) and host load.
Client SDK: pre-computed values
The browser SDK does not download the full ruleset. Two reasons:
- Payload weight. A non-trivial Statsig project has thousands of rules; shipping all of them to every browser session is wasteful.
- Business-logic exposure. Rules often encode targeting (paid users in EU, tier-1 customers, beta cohort) that you don’t want to ship in plaintext to the client.
So /initialize accepts the user object, evaluates everything Statsig-side using the same algorithm a server SDK would, and returns a tailored payload of { gate → value, config → values, experiment → group }.
Modern @statsig/js-client API
```typescript
import { StatsigClient } from "@statsig/js-client"

const client = new StatsigClient(
  "client-xyz",
  { userID: "user_123", custom: { plan: "premium" } },
  { environment: { tier: "production" } },
)
await client.initializeAsync()

const showNewUI = client.checkGate("new_ui")
const layer = client.getLayer("homepage_promo")
const title = layer.get("title", "Welcome")
```

The constructor takes `(sdkKey, user, options?)`; `initializeAsync()` and `initializeSync()` take no arguments. The `details` object on each result surfaces how the value was resolved (`Network:Recognized`, `Cache:Recognized`, `Bootstrap`, `Prefetch`, `NoValues`) — invaluable for debugging stale-flag reports (Client SDK docs).
Initialization strategies
| Strategy | When values arrive | Best for | Cost |
|---|---|---|---|
| `initializeAsync()` (awaited) | After network round-trip | Login flows where staleness is unacceptable | Adds RTT to time-to-interactive |
| `initializeAsync()` (not awaited) | Cache first, network in background | SPAs that can re-render when fresh values land | Mixed values during the bootstrap window |
| `initializeSync()` from cache | Synchronously, from `localStorage` | Repeat visits | Up to one-session staleness |
| Bootstrap (`dataAdapter.setData` + `initializeSync`) | Synchronously, from server-injected payload | SSR (Next.js, Remix) | Couples SSR and client init; needs the server payload to be available before render |
Bootstrap is the recommended pattern when SSR is available, because it eliminates both UI flicker and the network round-trip:
```typescript
// In the browser, after the server has injected `bootstrapValues` into the page:
import { StatsigClient } from "@statsig/js-client"

const client = new StatsigClient("client-xyz", { userID: "user_123" })
client.dataAdapter.setData(JSON.stringify(bootstrapValues))
client.initializeSync()
```

The matching server side uses `getClientInitializeResponse` from the Server Core SDK, with `hashAlgorithm` set to `'djb2'` (the modern client SDK assumes that hash for size and lookup speed; see bootstrap docs):
```typescript
import { Statsig, StatsigUser } from "@statsig/statsig-node-core"

const statsig = new Statsig(process.env.STATSIG_SECRET_KEY!)
await statsig.initialize()

export function bootstrapFor(user: StatsigUser) {
  return statsig.getClientInitializeResponse(user, {
    hashAlgorithm: "djb2",
    featureGateFilter: new Set(["new_ui"]),
    experimentFilter: new Set(["pricing_v2"]),
  })
}
```

Tip
Use featureGateFilter / experimentFilter / layerFilter to ship only the entities the page actually needs. Without filters, the bootstrap payload contains every gate / experiment / config evaluated for the user, which can balloon HTML weight and leak the existence of in-progress experiments to the client.
DataAdapter: the only resilience knob worth tuning
When Statsig’s API is reachable, the SDKs work; when it’s not, behavior depends entirely on whether you supplied a DataAdapter.
The interface
The DataAdapter (also called DataStore in some SDK languages) is the SDK’s pluggable cache for config specs. The shape is intentionally minimal (DataAdapter docs):
```typescript
interface DataAdapter {
  initialize(): Promise<void>
  get(key: string): Promise<string | null>
  set(key: string, value: string): Promise<void>
  shutdown(): Promise<void>
  // Server Core adds:
  // supportsPollingUpdatesFor(key: string): boolean
}
```

In Server Core, the cache key format is `statsig|{path}|{format}|{hashedSDKKey}`, where `path` is `/v1/download_config_specs` or `/v1/get_id_lists`, `format` is `plain_text`, and the SDK key is folded with djb2 (DataAdapter docs). Don’t reconstruct these keys yourself unless you have to — the SDK handles them.
The single most important rule
From the official docs, repeated here because mis-configuring it will silently degrade your platform:
> In most cases, your webservers should only implement the read path (`get`). … Otherwise, every SDK instance across all of your servers will attempt to write to the store whenever it sees an update, which is inefficient and can lead to unnecessary contention or duplication. (DataAdapter docs)
The recommended topology:
- A single writer — either a dedicated microservice with the SDK, or a cron job that fetches `download_config_specs` from the CDN and writes the response into your store.
- Many readers — every webserver instance with the SDK and a read-only `DataAdapter` whose `set` is a no-op.
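A read-only adapter for the many-readers side is a few lines. The class below follows the interface shape above; the backing `kv` client is an assumption (Redis, Edge Config, or a DB table all fit), and the no-op `set` is the point:

```typescript
// Minimal read-only DataAdapter: webservers read cached specs but never
// write, leaving the single writer to own all updates. `kv` is any async
// key-value client (Redis, Edge Config, a DB table) — an assumption here.
interface KV {
  get(key: string): Promise<string | null>
}

export class ReadOnlyDataAdapter {
  constructor(private kv: KV) {}

  async initialize(): Promise<void> {}

  get(key: string): Promise<string | null> {
    return this.kv.get(key)
  }

  async set(_key: string, _value: string): Promise<void> {
    // Intentionally a no-op: only the dedicated writer updates the store.
  }

  async shutdown(): Promise<void> {}
}
```

With this in place, a fleet of webservers can all cold-start during a Statsig outage and still come up fully evaluated from the last spec the writer cached.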
Polling for ongoing updates via the DataAdapter is currently supported in the Node.js, Ruby, Go, Java, and .NET SDKs (DataAdapter docs). Other languages still need to fall back to the Statsig CDN once after cold start.
Vercel Edge Config (the canonical hosted DataAdapter)
The Vercel adapter ships in statsig-node-vercel (note: not @statsig/vercel-server) and exposes EdgeConfigDataAdapter (Vercel guide, GitHub source). It targets the legacy statsig-node SDK today:
```typescript
import Statsig from "statsig-node"
import { createClient } from "@vercel/edge-config"
import { EdgeConfigDataAdapter } from "statsig-node-vercel"

const edgeConfigClient = createClient(process.env.EDGE_CONFIG!)
const dataAdapter = new EdgeConfigDataAdapter({
  edgeConfigClient,
  edgeConfigItemKey: process.env.EDGE_CONFIG_ITEM_KEY!,
})

await Statsig.initialize("statsig-server-api-key-here", { dataAdapter })
```

The Statsig Vercel native integration pushes config updates into your Edge Config item automatically, so the adapter’s read-only `get` is enough for most setups.
Redis (write-it-yourself)
The first-party Redis reference is @statsig/node-js-server-sdk-redis on GitHub (repo). The wiring follows the same shape; the only design decision is who owns the writer.
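The writer half, cron job or sidecar, is equally small. A sketch with the fetcher and store injected for testability; in production the fetch would hit the `download_config_specs` CDN endpoint, and you would normally let the SDK’s own adapter own the cache-key format rather than hand-rolling it:

```typescript
// Single-writer sync job: pull the spec payload once, write it into the
// shared store that every read-only instance consults. fetchPayload and
// store are injected here so the logic is testable without a network;
// the names are illustrative, not a Statsig API.
interface WritableKV {
  set(key: string, value: string): Promise<void>
}

export async function syncSpecsOnce(
  fetchPayload: () => Promise<string>,
  store: WritableKV,
  cacheKey: string,
): Promise<boolean> {
  const body = await fetchPayload()
  const parsed = JSON.parse(body) as { has_updates?: boolean }
  if (parsed.has_updates === false) return false // nothing new; skip the write
  await store.set(cacheKey, body)
  return true
}
```

Run it on a timer in one place; per the operational guidance below, a 30–60 s cadence is enough for most teams.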
Cloud vs. Warehouse Native
The deployment-model decision shapes everything downstream — what data you ship, what the latency budget for analysis looks like, who owns the metrics catalogue. The two options have meaningful trade-offs (WHN vs. Cloud, WHN pipeline overview, decision guide):
| Dimension | Statsig Cloud (managed) | Statsig Warehouse Native |
|---|---|---|
| Where data lives | Statsig’s infrastructure | Your BigQuery / Snowflake / Redshift / Databricks |
| Where the Stats Engine runs | Statsig | Inside your warehouse, against your tables |
| Who owns metric definitions | SDK events autocreate metrics | You define metrics on top of warehouse tables |
| Setup effort | Drop in the SDK | Connect warehouse, model exposures, define metrics |
| Latency to results | Near real-time exposures | Bound by warehouse compute and refresh cadence |
| Egress / compliance | Data leaves your perimeter | Data stays in-warehouse |
| Best fit | Teams without a central warehouse, or teams that want speed-of-iteration over data control | Teams whose source-of-truth metrics already live in the warehouse, or who have egress / sovereignty constraints |
The WHN pipeline is materialized as a sequence of warehouse jobs: identify first exposures, annotate them against metric sources, build per-(metric, user, day) staging, and roll up to group-level summary statistics. It supports Full, Incremental, and Metric refreshes, and exposes job history and cost in the console (WHN pipeline overview).
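The first step, identifying first exposures, is worth seeing concretely because it defines the experiment population: only a unit’s earliest exposure per experiment counts. A TypeScript sketch of the dedup; in WHN this is a SQL job inside your warehouse, not application code:

```typescript
// First-exposure dedup: keep the earliest exposure per (unitID, experiment).
// The Exposure shape mirrors the (unitID, experiment, group, timestamp)
// semantics the pipeline expects; field names here are illustrative.
interface Exposure {
  unitID: string
  experiment: string
  group: string
  timestamp: number
}

export function firstExposures(exposures: Exposure[]): Exposure[] {
  const first = new Map<string, Exposure>()
  for (const e of exposures) {
    const key = `${e.unitID}|${e.experiment}`
    const seen = first.get(key)
    if (!seen || e.timestamp < seen.timestamp) first.set(key, e)
  }
  return [...first.values()]
}
```

Everything downstream (metric annotation, per-day staging, group rollups) joins against this deduped table, which is why a clean exposure source matters so much.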
For the historical context on the engine: Statsig’s own internal experiment pipeline migrated from Spark to BigQuery to escape pipeline error rates, storage limits, and Spark-cluster ops cost (Statsig × Google Cloud postmortem). That same shift is what makes WHN-on-BigQuery viable as a product today.
Bootstrap initialization in Next.js
Putting Server Core, the bootstrap pattern, and the modern client SDK together for an SSR app:
```typescript
// lib/statsig-server.ts
import { Statsig, StatsigUser } from "@statsig/statsig-node-core"

let instance: Statsig | null = null

export async function getStatsig(): Promise<Statsig> {
  if (!instance) {
    instance = new Statsig(process.env.STATSIG_SECRET_KEY!)
    await instance.initialize()
  }
  return instance
}

export async function getBootstrapValues(user: { userID: string; custom?: Record<string, string> }) {
  const s = await getStatsig()
  // Construct the StatsigUser here so page code never imports the server SDK.
  return s.getClientInitializeResponse(new StatsigUser(user), { hashAlgorithm: "djb2" })
}
```

```tsx
// pages/index.tsx
import type { GetServerSideProps } from "next"
import { useEffect, useState } from "react"
import { StatsigClient } from "@statsig/js-client"
import { getBootstrapValues } from "../lib/statsig-server"

export const getServerSideProps: GetServerSideProps = async ({ req }) => {
  const userID = (req.headers["x-user-id"] as string) ?? "anonymous"
  const bootstrapValues = await getBootstrapValues({ userID, custom: { source: "web" } })
  return { props: { bootstrapValues, userID } }
}

export default function Home({ bootstrapValues, userID }: { bootstrapValues: unknown; userID: string }) {
  const [client, setClient] = useState<StatsigClient | null>(null)

  useEffect(() => {
    const c = new StatsigClient(process.env.NEXT_PUBLIC_STATSIG_CLIENT_KEY!, { userID })
    c.dataAdapter.setData(JSON.stringify(bootstrapValues))
    c.initializeSync()
    setClient(c)
    return () => void c.shutdown()
  }, [bootstrapValues, userID])

  const showNewUI = client?.checkGate("new_homepage") ?? false
  return showNewUI ? <NewHomepage /> : <ClassicHomepage />
}
```

The `next.config.js` needs `serverExternalPackages: ['@statsig/statsig-node-core']` so the native binary is not bundled.
Overrides: the testing escape hatch
Overrides skip rule evaluation entirely; the SDK returns the value you set (Server Core overrides).
```typescript
statsig.overrideGate("new_ui", true)
statsig.overrideGate("new_ui", false, "user_123")
statsig.overrideExperimentByGroupName("pricing_v2", "treatment")
statsig.overrideDynamicConfig("homepage_copy", { title: "Hello" }, "user_123")
```

For tests that should never call out, instantiate the SDK in local mode:
```typescript
const statsig = new Statsig("secret-key", { localMode: true })
await statsig.initialize()
```

Failure modes
- Statsig API outage, existing instances. Server SDKs continue evaluating from the in-memory specs from the last successful poll; the SDK retries in the background and atomically swaps when a new payload arrives (DataAdapter docs).
- Statsig API outage, new instance cold start. Without a `DataAdapter`, the SDK has no specs and `checkGate` returns the default, `false`. With a `DataAdapter` whose `get` returns the last cached spec, the new instance comes up fully evaluated. This is the only common scenario where a `DataAdapter` is non-optional.
- Client browser offline. `initializeSync` reads from `localStorage` and the SDK keeps serving the cached values. Reasons surface as `Cache:Recognized` so you can detect it; brand-new sessions with no cache return `NoValues` and fall back to your code-defined defaults (Client SDK docs).
- Stale rule deploy. Because evaluation is deterministic, the same user keeps the same bucket across rule changes that don’t touch the salt. A genuinely new rule (or a salt change) re-rolls the population — desirable for a fresh experiment, dangerous if you didn’t intend to.
- Bootstrap drift. If your SSR `getClientInitializeResponse` runs against a different `STATSIG_SECRET_KEY` (or a stale snapshot) than the client’s `client-xyz` key targets, the client SDK will quietly re-evaluate against the network and you’ll see flicker. Match the keys to the same project and the same environment tier.
- Sample ratio mismatch (SRM). If the engine’s chi-squared check flags an experiment whose realised allocation diverges from its planned split, treat it as a hard stop, not a curiosity (SRM checks). Common upstream causes are bot traffic landing in only one arm, asymmetric pre-bucketing filters, sticky session layers that cache one variant, and crash rates that prune one side. Variance reduction does not fix SRM — the assignment data is biased before any analysis runs.
Operational guidance
A short list of opinionated defaults from running this in production:
- Default to bootstrap for SSR apps. It’s the only initialization mode that gives you correct first-paint values without an extra round-trip. Filter the payload aggressively.
- Run a single DataAdapter writer. Don’t let every webserver fight to update Redis. A cron job pulling the CDN every 30–60s is sufficient for most teams.
- Log evaluation `details.reason` in your client telemetry. It tells you cache vs. network vs. bootstrap and is the fastest path to debugging “why is this user seeing the wrong variant.”
- Match the SDK key to the bootstrap key. A bootstrap payload generated against project A injected into a client running against project B fails silently — the client falls back to network and you lose the bootstrap benefit.
- Don’t memoize assignments yourself. The deterministic hash is faster than a Redis lookup; the only reason to cache is the spec, not the assignment.
- For warehouse-native, model exposures as a first-class table. The pipeline assumes a clean exposure table with `(unitID, experiment, group, timestamp)` semantics; don’t try to bolt experiment analysis onto a generic events table.
References
- How Evaluation Works — the canonical description of the SHA-256 + modulus algorithm and the determinism guarantees.
- Node Server SDK (Server Core) — modern Node init, `getClientInitializeResponse`, override APIs, manual exposures.
- JavaScript Client SDK (Web) — modern `@statsig/js-client` init, `details.reason`, `dataAdapter`.
- Server Data Stores / Data Adapter — interface, cache key format, recommended single-writer topology.
- CDN Edge Testing — the `download_config_specs` URL form.
- Using Edge Config with Statsig (Vercel) — the canonical hosted `DataAdapter` example.
- statsig-io/node-js-server-sdk `Evaluator.ts` — open-source evaluation reference; the source of truth for byte-exact bucket reproduction.
- Statsig Warehouse Native vs. Cloud — the deployment-model trade-off matrix.
- WHN Pipeline Overview — what the warehouse-native pipeline actually computes.
- How Statsig migrated to BigQuery from Spark — primary-source engineering postmortem on the analysis backend.
- Statistical methods overview — the Stats Engine surface area: frequentist + Bayesian, CUPED, sequential testing, SRM checks, guardrail metrics.
Footnotes
1. Alex Deng, Ya Xu, Ron Kohavi, Toby Walker. “Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data”, WSDM 2013 (Microsoft Research). The foundational paper for CUPED, the variance-reduction technique Statsig and most modern experimentation platforms apply by default. ↩