Statsig Experimentation Platform: SDK Architecture and Rollouts
Statsig is one of the few feature-flagging vendors whose product surface — flags, experiments, product analytics, session replay — sits on a single ingestion and assignment pipeline. For a senior engineer integrating it, the interesting questions are mechanical rather than marketing: how a user is bucketed, where evaluation runs, what the SDK does on a cold start, how to keep evaluation alive when the Statsig API is unreachable, and how the deployment model (cloud vs. warehouse-native) shapes the rest of the system. This article walks through those mechanics with citations to the official docs and SDK source.
TL;DR
- Two evaluation models, one assignment algorithm. Server SDKs download the full ruleset and evaluate locally; client SDKs receive pre-computed values for the current user from `/initialize`. Both produce the same bucket because both feed `SHA-256(salt + unitID)` into a `mod 10000` (experiments) or `mod 1000` (layers) bucket assignment (How Evaluation Works).
- Deterministic, stateless. SDKs do not persist `user → bucket` mappings. The hash is recomputed on every check; the same user gets the same bucket as long as the rule’s salt is unchanged (How Evaluation Works).
- Bootstrap is the recommended pattern for SSR. A server-side `getClientInitializeResponse(user)` produces a JSON payload you can inline into the page; the browser SDK then calls `initializeSync` with no network request, eliminating UI flicker (Server Core docs, Client SDK docs).
- Failure mode is “serve last known good.” Server SDKs keep evaluating from in-memory specs when the API is unreachable; new instances coming up during an outage need a `DataAdapter` (Redis, Edge Config, your own table) to recover state (DataAdapter docs).
- Two deployment models. Statsig Cloud is fully managed; Statsig Warehouse Native runs the Stats Engine inside your BigQuery / Snowflake / Redshift / Databricks. Pick by where your metric source-of-truth already lives, not by raw price (WHN vs. Cloud).
Mental model
Three concepts cover most of what follows; everything else is mechanics.
- Config spec. The serialized definition of every gate, experiment, layer, dynamic config, and ID list in a project. A version of this JSON document is what server SDKs download from the CDN; an evaluated projection of it is what client SDKs receive from `/initialize`.
- Salt. A stable, per-rule string. Combined with the user’s `unitID` (e.g. `userID`, `stableID`, or any `customID`), it deterministically picks a bucket. Re-using the same rule keeps the same users in the same bucket; creating a new rule re-rolls them (rollout-toggle FAQ).
- Exposure. A logged event that says “user X was bucketed into variant Y at time T.” Statistical analysis runs on exposures, not on rule definitions, so dropping or duplicating them silently corrupts experiment results.
Important
The package landscape is split across two SDK generations. The legacy Node SDK is statsig-node (class methods on a singleton: await Statsig.initialize(key)). The current generation is Server Core — @statsig/statsig-node-core — built on a Rust core, instantiated as new Statsig(key, options); await statsig.initialize(). The two surfaces are similar but not identical; the Vercel EdgeConfigDataAdapter example, in particular, still uses the legacy statsig-node package today. All examples below explicitly mark which generation they target.
Unified pipeline, in practice
The advantage of routing flags, experiments, and analytics through the same pipeline is not aesthetic. When the exposure that bucketed a user into variant B and the conversion event the user generated five minutes later are processed by the same identity model and the same metric definitions, you don’t need to reconcile two pipelines to claim causality. That removes a class of integration bugs that show up as “the feature flag platform says +5% but the analytics dashboard says -2%.”
Statsig’s architecture decomposes into the usual pieces:
| Component | Responsibility |
|---|---|
| Assignment service | Bucketing and rule evaluation. SDKs do this in-process; the service mostly serves the spec. |
| Configuration service | Persists rule definitions; emits the download_config_specs payload consumed by every server SDK. |
| Metrics pipeline | Ingests, dedupes, and stages exposures and custom events. |
| Analysis service | Runs the Stats Engine — CUPED, sequential testing, etc. — against the staged metric data. |
The split matters for failure mode discussion later: a problem in the metrics pipeline shows up as bad analytics, while a problem in the configuration service shows up as stale flags — and the SDK’s behavior in each case is different.
What the Stats Engine actually does
The SDK’s job ends once an exposure is logged; the Stats Engine takes over from there. A senior reader evaluating the platform should know what surface area the engine covers, even though the deep methodology is out of scope for this article:
| Capability | What it does | Source |
|---|---|---|
| Frequentist + Bayesian results | Both engines run by default; pick per-experiment based on the team’s decision protocol. | Statistical methods overview |
| CUPED variance reduction | Uses a 7-day pre-experiment baseline as a covariate to shrink confidence intervals on topline metrics. The technique is from Microsoft’s WSDM 2013 paper by Deng, Xu, Kohavi, and Walker¹. | Variance reduction docs, CUPED docs |
| Sequential testing | Always-valid p-values and confidence sequences so peeking does not inflate false positives. Mandatory if anyone reads results before the planned end date. | Sequential testing docs |
| SRM (sample ratio mismatch) checks | Chi-squared test against the planned allocation, run automatically on every experiment; flags assignment-pipeline bugs that silently bias results. | SRM checks, Statsig SRM primer |
| Guardrail metrics | Always-on metrics (latency, errors, retention) that the engine watches alongside the success metric so a launch never wins on the topline while degrading a guardrail. | Guardrail metrics docs |
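CUPED itself is a small formula: theta = cov(x, y) / var(x), and the adjusted metric is y' = y − theta · (x − mean(x)). A minimal sketch of that core step, with the caveat that the production engine applies it to per-user metric aggregates over the pre-experiment baseline window rather than raw arrays:

```typescript
// CUPED: reduce the variance of metric y using a pre-experiment covariate x.
// theta = cov(x, y) / var(x); y' = y - theta * (x - mean(x)).
// The adjustment preserves the mean of y (unbiased effect estimates) while
// shrinking its variance by the squared correlation between x and y.
export function cupedAdjust(y: number[], x: number[]): number[] {
  const mean = (a: number[]) => a.reduce((s, v) => s + v, 0) / a.length
  const mx = mean(x)
  const my = mean(y)
  let cov = 0
  let varX = 0
  for (let i = 0; i < x.length; i++) {
    cov += (x[i] - mx) * (y[i] - my)
    varX += (x[i] - mx) ** 2
  }
  const theta = varX === 0 ? 0 : cov / varX
  return y.map((v, i) => v - theta * (x[i] - mx))
}
```

The better the pre-period covariate predicts the in-experiment metric, the tighter the resulting confidence intervals; with an uncorrelated covariate, theta is near zero and the adjustment is a no-op.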
The two operational implications worth internalising up front:
- SRM is a stop-the-experiment signal, not a warning. A 50/50 split that arrives as 51/49 with significance is almost always a bug in your assignment plumbing — sticky session caches, downstream filtering, or a bot population landing in only one arm. CUPED will not fix it.
- Sequential testing is opt-in for a reason. It costs statistical power (wider intervals) in exchange for the freedom to peek. If your team genuinely commits to fixed-horizon analysis, frequentist with a power calc remains tighter; in practice most teams peek, and sequential is the honest default.
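The SRM check is a one-degree-of-freedom chi-squared test, and it is worth seeing how little skew it takes to flag at scale. A sketch; the 3.841 critical value (chi-squared at p = 0.05, df = 1) is the textbook threshold, not a documented Statsig constant:

```typescript
// Chi-squared goodness-of-fit of observed arm counts vs. the planned split.
export function srmChiSquared(observed: number[], expectedRatios: number[]): number {
  const total = observed.reduce((s, n) => s + n, 0)
  return observed.reduce((chi2, obs, i) => {
    const expected = total * expectedRatios[i]
    return chi2 + ((obs - expected) ** 2) / expected
  }, 0)
}

// Textbook critical value at alpha = 0.05 with one degree of freedom.
export const CHI2_CRIT_DF1_P05 = 3.841

// The same 51/49 skew is noise at n = 100 (chi2 = 0.04) but a hard SRM
// flag at n = 10,000 (chi2 = 4.0, above the 3.841 threshold).
```

This is why “51/49 with significance” at production traffic volumes is almost never sampling noise: a 1-point skew contributes quadratically as the arms grow.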
Deterministic assignment
This is the most cited claim in the article and the one that makes cross-platform consistency possible at all.
Per the How Evaluation Works doc:
- Each gate or experiment rule generates a unique, stable salt.
- The user’s `unitID` is concatenated with that salt and run through SHA-256.
- The resulting digest is reduced to an integer and taken modulo `10000` (for experiments) or `1000` (for layers).
- That integer is the user’s bucket; rule allocation thresholds determine which variant the bucket maps to.
```typescript
// Conceptual sketch — the canonical implementation lives in
// https://github.com/statsig-io/node-js-server-sdk/blob/main/src/Evaluator.ts
import { createHash } from "node:crypto"

export function bucket(unitID: string, salt: string, modulus: 10_000 | 1_000): number {
  const digest = createHash("sha256").update(`${salt}${unitID}`).digest()
  // The exact byte slice the SDK uses to fold the digest into an integer
  // is an implementation detail; treat the open-source Evaluator as the source of truth.
  const head = digest.readBigUInt64BE(0)
  return Number(head % BigInt(modulus))
}
```

Note
Earlier drafts of this article specified an exact byte slice (“first 8 hex chars” or “first 8 bytes”). The public docs only state that the digest is “subjected to a modulus operation”; the byte width and endianness are implementation details that have differed between SDK generations. If your application depends on bit-exact reproduction (e.g. you’re building your own assignment service that must match Statsig’s), pin the version of the open-source Evaluator.ts and treat that as the contract, not this article.
The consequences of the algorithm are worth stating explicitly:
- Cross-platform consistency. A web client and a Node backend evaluating the same rule for the same user get the same bucket without any coordination, because both compute the same hash.
- Temporal consistency under rollout changes. Take a gate from 0% to 50%, back to 0%, and back to 50% again on the same rule and you re-expose the same 50% of users. Create a new rule and the salt changes, so the population re-rolls. This is the operational contract for safe canaries (How Evaluation Works).
- No cached state. SDKs do not store `user → bucket`. Statsig has explicitly written about customers who tried to add Redis to memoize this and ended up paying more for Redis than for Statsig itself; the deterministic hash is cheaper than caching it (How Evaluation Works).
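The rollout-toggle contract falls directly out of the algorithm: a rollout percentage is just a threshold on the bucket, so with a fixed salt the 30% population is a subset of the 50% population. A self-contained sketch, with the same byte-slice caveat as the conceptual `bucket` function above:

```typescript
import { createHash } from "node:crypto"

// Same conceptual bucketing as the sketch above; the exact byte folding is
// an implementation detail of the real SDK, so this is illustrative only.
function bucket(unitID: string, salt: string, modulus = 10_000): number {
  const digest = createHash("sha256").update(`${salt}${unitID}`).digest()
  return Number(digest.readBigUInt64BE(0) % BigInt(modulus))
}

// A pass percentage is a threshold on the bucket (10,000 buckets means
// 100 buckets per percent), so rollout populations are nested: every user
// passing at 30% also passes at 50% as long as the salt is unchanged.
export function passesRollout(unitID: string, salt: string, passPercent: number): boolean {
  return bucket(unitID, salt) < passPercent * 100
}
```

Creating a new rule changes the salt, which reshuffles every bucket; that is why a fresh rule re-rolls the population while toggling an existing rule re-exposes the same users.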
Server SDK: download and evaluate locally
The server-side model is the simpler one to reason about: keep the project’s specs in memory, evaluate locally on every check, refresh the specs in the background.
Initialization (Server Core)
```typescript
import { Statsig, StatsigUser } from "@statsig/statsig-node-core"

const statsig = new Statsig(process.env.STATSIG_SECRET_KEY!, {
  environment: "production",
})
await statsig.initialize()

const user = new StatsigUser({ userID: "user_123" })
const isOn = statsig.checkGate(user, "new_homepage")
const config = statsig.getDynamicConfig(user, "pricing_tier")
const exp = statsig.getExperiment(user, "recommendation_algorithm")
```

Important deltas from the legacy `statsig-node` SDK (migration notes):
- Construct an instance with `new Statsig(...)`, then `await initialize()`. There is no static `Statsig.initialize(...)` on Server Core.
- The Node Core package ships native binaries; if you use a frozen lockfile (Next.js / Docker multi-stage builds), include the platform-specific subpackages your build and runtime targets need.
- Mark `@statsig/statsig-node-core` as a `serverExternalPackages` entry in `next.config.js` to keep webpack from trying to bundle the native addon.
Where the spec comes from
The server SDK pulls the project spec from the CDN at the documented endpoint (CDN edge testing guide):
```
GET https://api.statsigcdn.com/v1/download_config_specs/<SERVER_SDK_KEY>.json
```

The polling interval is configurable via `rulesetsSyncIntervalMs`; the default is 10 000 ms across the Server Core SDKs (C++ Server SDK docs, legacy Python docs). Updates use a `company_lcut` (“last config-update timestamp”) so the SDK can ask the server “anything new since T?” rather than re-downloading the entire payload every cycle, and the in-memory store is swapped atomically once the new payload is parsed.
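The poll-and-swap behavior can be sketched as a small loop. Everything here is illustrative (the `SpecStore` name, the injected fetcher); the real SDK threads its last config-update timestamp through the request and keeps the previous specs whenever the fetch or parse fails:

```typescript
// Illustrative poll-and-swap loop. SpecStore and the injected fetcher are
// hypothetical names, not the SDK's real internals.
interface Specs {
  time: number
  raw: string
}

export class SpecStore {
  private current: Specs | null = null

  // fetchSpecs(sinceTime) returns the raw payload, or null for "no change".
  constructor(private fetchSpecs: (sinceTime: number) => Promise<string | null>) {}

  get specs(): Specs | null {
    return this.current
  }

  async pollOnce(): Promise<void> {
    try {
      const body = await this.fetchSpecs(this.current?.time ?? 0)
      if (body === null) return // nothing newer than our last update time
      const parsed = JSON.parse(body) as { time: number; has_updates: boolean }
      if (!parsed.has_updates) return
      // One reference assignment: readers observe either the old specs or
      // the fully parsed new specs, never a half-applied update.
      this.current = { time: parsed.time, raw: body }
    } catch {
      // Network or parse failure: keep serving the last known good specs.
    }
  }
}
```

The same structure also exhibits the cold-start failure mode discussed later: an instance whose first poll fails has no specs at all, which is exactly the gap a `DataAdapter` fills.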
The shape of the spec — useful when reasoning about cache TTLs and DataAdapter contents:
```typescript
interface ConfigSpecs {
  feature_gates: Record<string, FeatureGateSpec>
  dynamic_configs: Record<string, DynamicConfigSpec>
  layer_configs: Record<string, LayerSpec>
  id_lists: Record<string, string[]>
  has_updates: boolean
  time: number
}
```

Evaluation latency
Once the spec is in memory, checkGate / getExperiment / getDynamicConfig are synchronous. The Statsig docs target sub-millisecond latency for these calls, with no network involved after init (How Evaluation Works). Treat that as a guideline rather than a guarantee — actual latency depends on rule complexity (geo / segment lookups, custom IDs, ID-list segments) and host load.
Client SDK: pre-computed values
The browser SDK does not download the full ruleset. Two reasons:
- Payload weight. A non-trivial Statsig project has thousands of rules; shipping all of them to every browser session is wasteful.
- Business-logic exposure. Rules often encode targeting (paid users in EU, tier-1 customers, beta cohort) that you don’t want to ship in plaintext to the client.
So /initialize accepts the user object, evaluates everything Statsig-side using the same algorithm a server SDK would, and returns a tailored payload of { gate → value, config → values, experiment → group }.
Modern @statsig/js-client API
```typescript
import { StatsigClient } from "@statsig/js-client"

const client = new StatsigClient(
  "client-xyz",
  { userID: "user_123", custom: { plan: "premium" } },
  { environment: { tier: "production" } },
)
await client.initializeAsync()

const showNewUI = client.checkGate("new_ui")
const layer = client.getLayer("homepage_promo")
const title = layer.get("title", "Welcome")
```

The constructor takes `(sdkKey, user, options?)`; `initializeAsync()` and `initializeSync()` take no arguments. The `details` object on each result surfaces how the value was resolved (`Network:Recognized`, `Cache:Recognized`, `Bootstrap`, `Prefetch`, `NoValues`) — invaluable for debugging stale-flag reports (Client SDK docs).
Initialization strategies
| Strategy | When values arrive | Best for | Cost |
|---|---|---|---|
| `initializeAsync()` (awaited) | After network round-trip | Login flows where staleness is unacceptable | Adds RTT to time-to-interactive |
| `initializeAsync()` (not awaited) | Cache first, network in background | SPAs that can re-render when fresh values land | Mixed values during the bootstrap window |
| `initializeSync()` from cache | Synchronously, from `localStorage` | Repeat visits | Up to one-session staleness |
| Bootstrap (`dataAdapter.setData` + `initializeSync`) | Synchronously, from server-injected payload | SSR (Next.js, Remix) | Couples SSR and client init; needs the server payload to be available before render |
Bootstrap is the recommended pattern when SSR is available, because it eliminates both UI flicker and the network round-trip:
```typescript
// In the browser, after the server has injected `bootstrapValues` into the page:
import { StatsigClient } from "@statsig/js-client"

const client = new StatsigClient("client-xyz", { userID: "user_123" })
client.dataAdapter.setData(JSON.stringify(bootstrapValues))
client.initializeSync()
```

The matching server side uses `getClientInitializeResponse` from the Server Core SDK, with `hashAlgorithm` set to `'djb2'` (the modern client SDK assumes that hash for size and lookup speed; see bootstrap docs):
```typescript
import { Statsig, StatsigUser } from "@statsig/statsig-node-core"

const statsig = new Statsig(process.env.STATSIG_SECRET_KEY!)
await statsig.initialize()

export function bootstrapFor(user: StatsigUser) {
  return statsig.getClientInitializeResponse(user, {
    hashAlgorithm: "djb2",
    featureGateFilter: new Set(["new_ui"]),
    experimentFilter: new Set(["pricing_v2"]),
  })
}
```

Tip
Use featureGateFilter / experimentFilter / layerFilter to ship only the entities the page actually needs. Without filters, the bootstrap payload contains every gate / experiment / config evaluated for the user, which can balloon HTML weight and leak the existence of in-progress experiments to the client.
DataAdapter: the only resilience knob worth tuning
When Statsig’s API is reachable, the SDKs work; when it’s not, behavior depends entirely on whether you supplied a DataAdapter.
The interface
The DataAdapter (also called DataStore in some SDK languages) is the SDK’s pluggable cache for config specs. The shape is intentionally minimal (DataAdapter docs):
```typescript
interface DataAdapter {
  initialize(): Promise<void>
  get(key: string): Promise<string | null>
  set(key: string, value: string): Promise<void>
  shutdown(): Promise<void>
  // Server Core adds:
  // supportsPollingUpdatesFor(key: string): boolean
}
```

In Server Core, the cache key format is `statsig|{path}|{format}|{hashedSDKKey}`, where `path` is `/v1/download_config_specs` or `/v1/get_id_lists`, `format` is `plain_text`, and the SDK key is folded with djb2 (DataAdapter docs). Don’t reconstruct these keys yourself unless you have to — the SDK handles them.
The single most important rule
From the official docs, repeated here because mis-configuring it will silently degrade your platform:
> In most cases, your webservers should only implement the read path (`get`). … Otherwise, every SDK instance across all of your servers will attempt to write to the store whenever it sees an update, which is inefficient and can lead to unnecessary contention or duplication. (DataAdapter docs)
The recommended topology:
- A single writer — either a dedicated microservice with the SDK, or a cron job that fetches `download_config_specs` from the CDN and writes the response into your store.
- Many readers — every webserver instance with the SDK and a read-only `DataAdapter` whose `set` is a no-op.
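A read-only adapter for the many-readers side is a few lines. The class below follows the interface shape above; the backing `kv` client is an assumption (Redis, Edge Config, or a DB table all fit), and the no-op `set` is the point:

```typescript
// Minimal read-only DataAdapter: webservers read cached specs but never
// write, leaving the single writer to own all updates. `kv` is any async
// key-value client (Redis, Edge Config, a DB table) — an assumption here.
interface KV {
  get(key: string): Promise<string | null>
}

export class ReadOnlyDataAdapter {
  constructor(private kv: KV) {}

  async initialize(): Promise<void> {}

  get(key: string): Promise<string | null> {
    return this.kv.get(key)
  }

  async set(_key: string, _value: string): Promise<void> {
    // Intentionally a no-op: only the dedicated writer updates the store.
  }

  async shutdown(): Promise<void> {}
}
```

With this in place, a fleet of webservers can all cold-start during a Statsig outage and still come up fully evaluated from the last spec the writer cached.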
Polling for ongoing updates via the DataAdapter is currently supported in the Node.js, Ruby, Go, Java, and .NET SDKs (DataAdapter docs). Other languages still need to fall back to the Statsig CDN once after cold start.
Vercel Edge Config (the canonical hosted DataAdapter)
The Vercel adapter ships in statsig-node-vercel (note: not @statsig/vercel-server) and exposes EdgeConfigDataAdapter (Vercel guide, GitHub source). It targets the legacy statsig-node SDK today:
```typescript
import Statsig from "statsig-node"
import { createClient } from "@vercel/edge-config"
import { EdgeConfigDataAdapter } from "statsig-node-vercel"

const edgeConfigClient = createClient(process.env.EDGE_CONFIG!)
const dataAdapter = new EdgeConfigDataAdapter({
  edgeConfigClient,
  edgeConfigItemKey: process.env.EDGE_CONFIG_ITEM_KEY!,
})

await Statsig.initialize("statsig-server-api-key-here", { dataAdapter })
```

The Statsig Vercel native integration pushes config updates into your Edge Config item automatically, so the adapter’s read-only `get` is enough for most setups.
Redis (write-it-yourself)
The first-party Redis reference is @statsig/node-js-server-sdk-redis on GitHub (repo). The wiring follows the same shape; the only design decision is who owns the writer.
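The writer half, cron job or sidecar, is equally small. A sketch with the fetcher and store injected for testability; in production the fetch would hit the `download_config_specs` CDN endpoint, and you would normally let the SDK’s own adapter own the cache-key format rather than hand-rolling it:

```typescript
// Single-writer sync job: pull the spec payload once, write it into the
// shared store that every read-only instance consults. fetchPayload and
// store are injected here so the logic is testable without a network;
// the names are illustrative, not a Statsig API.
interface WritableKV {
  set(key: string, value: string): Promise<void>
}

export async function syncSpecsOnce(
  fetchPayload: () => Promise<string>,
  store: WritableKV,
  cacheKey: string,
): Promise<boolean> {
  const body = await fetchPayload()
  const parsed = JSON.parse(body) as { has_updates?: boolean }
  if (parsed.has_updates === false) return false // nothing new; skip the write
  await store.set(cacheKey, body)
  return true
}
```

Run it on a timer in one place; per the operational guidance below, a 30–60 s cadence is enough for most teams.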
Cloud vs. Warehouse Native
The deployment-model decision shapes everything downstream — what data you ship, what the latency budget for analysis looks like, who owns the metrics catalogue. The two options have meaningful trade-offs (WHN vs. Cloud, WHN pipeline overview, decision guide):
| Dimension | Statsig Cloud (managed) | Statsig Warehouse Native |
|---|---|---|
| Where data lives | Statsig’s infrastructure | Your BigQuery / Snowflake / Redshift / Databricks |
| Where the Stats Engine runs | Statsig | Inside your warehouse, against your tables |
| Who owns metric definitions | SDK events autocreate metrics | You define metrics on top of warehouse tables |
| Setup effort | Drop in the SDK | Connect warehouse, model exposures, define metrics |
| Latency to results | Near real-time exposures | Bound by warehouse compute and refresh cadence |
| Egress / compliance | Data leaves your perimeter | Data stays in-warehouse |
| Best fit | Teams without a central warehouse, or teams that want speed-of-iteration over data control | Teams whose source-of-truth metrics already live in the warehouse, or who have egress / sovereignty constraints |
The WHN pipeline is materialized as a sequence of warehouse jobs: identify first exposures, annotate them against metric sources, build per-(metric, user, day) staging, and roll up to group-level summary statistics. It supports Full, Incremental, and Metric refreshes, and exposes job history and cost in the console (WHN pipeline overview).
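The first step, identifying first exposures, is worth seeing concretely because it defines the experiment population: only a unit’s earliest exposure per experiment counts. A TypeScript sketch of the dedup; in WHN this is a SQL job inside your warehouse, not application code:

```typescript
// First-exposure dedup: keep the earliest exposure per (unitID, experiment).
// The Exposure shape mirrors the (unitID, experiment, group, timestamp)
// semantics the pipeline expects; field names here are illustrative.
interface Exposure {
  unitID: string
  experiment: string
  group: string
  timestamp: number
}

export function firstExposures(exposures: Exposure[]): Exposure[] {
  const first = new Map<string, Exposure>()
  for (const e of exposures) {
    const key = `${e.unitID}|${e.experiment}`
    const seen = first.get(key)
    if (!seen || e.timestamp < seen.timestamp) first.set(key, e)
  }
  return [...first.values()]
}
```

Everything downstream (metric annotation, per-day staging, group rollups) joins against this deduped table, which is why a clean exposure source matters so much.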
For the historical context on the engine: Statsig’s own internal experiment pipeline migrated from Spark to BigQuery to escape pipeline error rates, storage limits, and Spark-cluster ops cost (Statsig × Google Cloud postmortem). That same shift is what makes WHN-on-BigQuery viable as a product today.
Bootstrap initialization in Next.js
Putting Server Core, the bootstrap pattern, and the modern client SDK together for an SSR app:
```typescript
// lib/statsig-server.ts
import { Statsig, StatsigUser } from "@statsig/statsig-node-core"

let instance: Statsig | null = null

export async function getStatsig(): Promise<Statsig> {
  if (!instance) {
    instance = new Statsig(process.env.STATSIG_SECRET_KEY!)
    await instance.initialize()
  }
  return instance
}

export async function getBootstrapValues(user: { userID: string; custom?: Record<string, string> }) {
  const s = await getStatsig()
  // Construct the StatsigUser here so page code never imports the server SDK.
  return s.getClientInitializeResponse(new StatsigUser(user), { hashAlgorithm: "djb2" })
}
```

```tsx
// pages/index.tsx
import type { GetServerSideProps } from "next"
import { useEffect, useState } from "react"
import { StatsigClient } from "@statsig/js-client"
import { getBootstrapValues } from "../lib/statsig-server"

export const getServerSideProps: GetServerSideProps = async ({ req }) => {
  const userID = (req.headers["x-user-id"] as string) ?? "anonymous"
  const bootstrapValues = await getBootstrapValues({ userID, custom: { source: "web" } })
  return { props: { bootstrapValues, userID } }
}

export default function Home({ bootstrapValues, userID }: { bootstrapValues: unknown; userID: string }) {
  const [client, setClient] = useState<StatsigClient | null>(null)

  useEffect(() => {
    const c = new StatsigClient(process.env.NEXT_PUBLIC_STATSIG_CLIENT_KEY!, { userID })
    c.dataAdapter.setData(JSON.stringify(bootstrapValues))
    c.initializeSync()
    setClient(c)
    return () => void c.shutdown()
  }, [bootstrapValues, userID])

  const showNewUI = client?.checkGate("new_homepage") ?? false
  return showNewUI ? <NewHomepage /> : <ClassicHomepage />
}
```

The `next.config.js` needs `serverExternalPackages: ['@statsig/statsig-node-core']` so the native binary is not bundled.
Overrides: the testing escape hatch
Overrides skip rule evaluation entirely; the SDK returns the value you set (Server Core overrides).
```typescript
statsig.overrideGate("new_ui", true)
statsig.overrideGate("new_ui", false, "user_123")
statsig.overrideExperimentByGroupName("pricing_v2", "treatment")
statsig.overrideDynamicConfig("homepage_copy", { title: "Hello" }, "user_123")
```

For tests that should never call out, instantiate the SDK in local mode:
```typescript
const statsig = new Statsig("secret-key", { localMode: true })
await statsig.initialize()
```

Failure modes
- Statsig API outage, existing instances. Server SDKs continue evaluating from the in-memory specs from the last successful poll; the SDK retries in the background and atomically swaps when a new payload arrives (DataAdapter docs).
- Statsig API outage, new instance cold start. Without a `DataAdapter`, the SDK has no specs and `checkGate` returns the default, `false`. With a `DataAdapter` whose `get` returns the last cached spec, the new instance comes up fully evaluated. This is the only common scenario where a `DataAdapter` is non-optional.
- Client browser offline. `initializeSync` reads from `localStorage` and the SDK keeps serving the cached values. Reasons surface as `Cache:Recognized` so you can detect it; brand-new sessions with no cache return `NoValues` and fall back to your code-defined defaults (Client SDK docs).
- Stale rule deploy. Because evaluation is deterministic, the same user keeps the same bucket across rule changes that don’t touch the salt. A genuinely new rule (or a salt change) re-rolls the population — desirable for a fresh experiment, dangerous if you didn’t intend to.
- Bootstrap drift. If your SSR `getClientInitializeResponse` runs against a different `STATSIG_SECRET_KEY` (or a stale snapshot) than the client’s `client-xyz` key targets, the client SDK will quietly re-evaluate against the network and you’ll see flicker. Match the keys to the same project and the same environment tier.
- Sample ratio mismatch (SRM). If the engine’s chi-squared check flags an experiment whose realised allocation diverges from its planned split, treat it as a hard stop, not a curiosity (SRM checks). Common upstream causes are bot traffic landing in only one arm, asymmetric pre-bucketing filters, sticky session layers that cache one variant, and crash rates that prune one side. Variance reduction does not fix SRM — the assignment data is biased before any analysis runs.
Operational guidance
A short list of opinionated defaults from running this in production:
- Default to bootstrap for SSR apps. It’s the only initialization mode that gives you correct first-paint values without an extra round-trip. Filter the payload aggressively.
- Run a single DataAdapter writer. Don’t let every webserver fight to update Redis. A cron job pulling the CDN every 30–60s is sufficient for most teams.
- Log evaluation `details.reason` in your client telemetry. It tells you cache vs. network vs. bootstrap and is the fastest path to debugging “why is this user seeing the wrong variant.”
- Match the SDK key to the bootstrap key. A bootstrap payload generated against project A injected into a client running against project B fails silently — the client falls back to network and you lose the bootstrap benefit.
- Don’t memoize assignments yourself. The deterministic hash is faster than a Redis lookup; the only reason to cache is the spec, not the assignment.
- For warehouse-native, model exposures as a first-class table. The pipeline assumes a clean exposure table with `(unitID, experiment, group, timestamp)` semantics; don’t try to bolt experiment analysis onto a generic events table.
References
- How Evaluation Works — the canonical description of the SHA-256 + modulus algorithm and the determinism guarantees.
- Node Server SDK (Server Core) — modern Node init, `getClientInitializeResponse`, override APIs, manual exposures.
- JavaScript Client SDK (Web) — modern `@statsig/js-client` init, `details.reason`, `dataAdapter`.
- Server Data Stores / Data Adapter — interface, cache key format, recommended single-writer topology.
- CDN Edge Testing — the `download_config_specs` URL form.
- Using Edge Config with Statsig (Vercel) — the canonical hosted `DataAdapter` example.
- statsig-io/node-js-server-sdk `Evaluator.ts` — open-source evaluation reference; the source of truth for byte-exact bucket reproduction.
- Statsig Warehouse Native vs. Cloud — the deployment-model trade-off matrix.
- WHN Pipeline Overview — what the warehouse-native pipeline actually computes.
- How Statsig migrated to BigQuery from Spark — primary-source engineering postmortem on the analysis backend.
- Statistical methods overview — the Stats Engine surface area: frequentist + Bayesian, CUPED, sequential testing, SRM checks, guardrail metrics.
Footnotes
1. Alex Deng, Ya Xu, Ron Kohavi, Toby Walker. “Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data”, WSDM 2013 (Microsoft Research). The foundational paper for CUPED, the variance-reduction technique Statsig and most modern experimentation platforms apply by default. ↩