Uber Schemaless: Building a Scalable Datastore on MySQL

In early 2014, Uber’s trip storage was a single PostgreSQL instance growing roughly 20% per month, on track to exhaust both disk and IOPS by the end of the year¹. Rather than adopt Cassandra or Riak, the team built Schemaless — a thin, horizontally scalable layer on top of sharded MySQL — and cut over a month before Halloween 2014¹. The core insight was operational, not technical: pick the storage engine the team can debug at 3 a.m., then make it look distributed by adding the smallest possible coordination layer on top. This case study walks the cell model, the fixed-shard layout, the two-cluster buffered-write protocol, the trigger framework, and the design lines along which Schemaless eventually broke and was replaced by Docstore.

Schemaless overview: a stateless worker tier hashes the row key into one of 4,096 shards; each shard cluster is one MySQL master plus two cross-DC minions; a trigger framework tails per-shard logs.

Thesis

Schemaless is a deliberately small abstraction. The whole product fits on one whiteboard:

Immutable cells indexed by (row_key, column_name, ref_key) mean every write is idempotent and the database is also a totally-ordered change log per shard²³.
Fixed 4,096 shards with shard = hash(row_key) % 4096 keep routing stateless and remove the hard parts of dynamic resharding¹⁴.
Two-cluster buffered writes — every write hits a randomly chosen secondary plus the primary in lockstep — turns asynchronous MySQL replication into a durability story without taking a 2PC penalty on the hot path⁴.
Eventually consistent secondary indexes with denormalized payloads keep index queries single-shard, at the price of <20 ms staleness windows².
Triggers turn the per-shard log into an at-least-once, idempotent event bus that drives billing, analytics, and cross-DC replication³.

What it did not try to do — multi-row transactions, strict consistency, an expressive query language — is exactly what eventually motivated Docstore in 2021⁵.

Mental model

A senior reader can hold most of Schemaless with five terms:

Term	Meaning
Cell	Immutable `(row_key, column_name, ref_key) → JSON body` tuple. Once written, never rewritten.²
Shard	One of 4,096 logical partitions (typical config). `shard = hash(row_key) % 4096`.⁴
Storage cluster	One MySQL master + two minions, with minions placed in different data centres. Async replication.⁴
Worker node	Stateless HTTP server that routes requests, fans out, runs the circuit breaker, and orchestrates buffered writes.⁴
Buffered write	Always-on dual write: the same cell goes to a secondary cluster’s buffer table and to the primary cluster.⁴

Two non-obvious moves underpin everything else: cells are addressed by (row_key, column_name, ref_key) rather than (row_key) so that one logical “row” can be a sparse, versioned grid; and the entity table’s MySQL primary key is an added_id auto-increment, which lets every shard double as an append-only log keyed on insertion order⁴.

The pre-Schemaless problem

A single Postgres instance under exponential trip growth

In early 2014, the entire trips table was on a single PostgreSQL instance — the largest, fastest-growing, and IOPS-heaviest data type in Uber’s monolith¹. Trip volume was growing at ~20%/month and the team projected the instance would run out of both storage and IOPS by year-end if untouched¹. Even routine schema operations on the trips table — adding a column or an index — were causing downtime¹.

Why the team rejected Postgres for the new tier

Evan Klitzke’s “Why Uber Engineering Switched from Postgres to MySQL”⁶ documents the structural reasons Postgres no longer fit the trips workload at Uber’s scale. The full post is worth reading; the relevant pieces here are:

Index write amplification. Postgres’ MVCC stores immutable row tuples identified by a ctid (physical disk offset). When a row is updated, a new tuple is created with a new ctid, and every index must add a pointer to the new tuple — even if its indexed columns did not change⁶. A 12-index table absorbs 12 index writes per row update.
Verbose WAL replication. Postgres streaming replication ships the WAL, which encodes every physical change including all of the above index writes. Cross-datacenter bandwidth becomes the bottleneck before the master does⁶.
Process-per-connection cost. Postgres forks a process per connection and uses System V IPC for shared structures; MySQL spawns a thread per connection on top of futex-based primitives. Uber routinely ran MySQL deployments at ~10,000 concurrent connections; Postgres started misbehaving past a few hundred even with adequate RAM, forcing them onto pgbouncer and exposing them to “idle in transaction” outages⁶.

Alternatives that were not chosen

The team’s stated requirements for the new trip store were: linear horizontal scaling, write availability under failover, downstream-notification primitives, secondary indexes, and operational trust — could on-call debug it at 3 a.m.².

Option	What it offered	What blocked it
Cassandra	Linear scaling, tunable consistency	No native notification; eventual consistency complicating indexes; ops maturity at Uber’s scale was the deal-breaker²
Riak	Masterless, sibling reconciliation	Same notification gap; CRDT modelling cost; ops experience²
MongoDB	Document model, familiar query language	Scaling concerns and a consistency model the team did not trust for trip data²
Custom on MySQL	Operational depth; deterministic perf model	Build cost; everything beyond a single MySQL had to be built²¹

The 2016 retrospective is candid that the call was about ops trust, not theoretical fit: Cassandra and Riak both ended up in production at Uber for other workloads later². The trip store needed something the team already knew how to operate.

Note

Two acknowledged design influences: FriendFeed’s “schemaless MySQL” pattern (JSON blobs in a flat MySQL row), and Pinterest’s operational discipline around sharded MySQL². The data model itself is described in the Schemaless papers as an “append-only sparse three-dimensional persistent hash map, very similar to Google’s Bigtable”².

The cell model

Definition

1Cell = (row_key, column_name, ref_key) → body

Field	Type	Role
`row_key`	UUID	Logical entity (e.g. `trip_uuid`); the only field used for sharding.
`column_name`	string	App-defined attribute group (e.g. `BASE`, `STATUS`, `FARE`). Schemaless does not interpret it.
`ref_key`	int	Monotonic version number per `(row_key, column_name)`. Highest wins on read.
`body`	compressed JSON blob	Stored as MessagePack-encoded JSON, ZLib-compressed⁴.

In MySQL, every cell lives in a single entity table per shard with a compound index on (row_key, column_name, ref_key). The MySQL primary key is a separate added_id auto-increment integer; this is what makes inserts physically sequential on disk and exposes the shard as a partitioned log⁴.

1type Cell = {2  rowKey: string;       // UUID3  columnName: string;   // e.g. "BASE"4  refKey: number;       // monotonic per (rowKey, columnName)5  body: Record<string, unknown>;6};78const tripBase: Cell = {9  rowKey: "trip_uuid_abc123",10  columnName: "BASE",11  refKey: 3,12  body: {13    driverUuid: "driver_xyz",14    riderUuid: "rider_456",15    status: "completed",16    pickupLat: 37.7749,17    pickupLng: -122.4194,18  },19};

The application — not Schemaless — decides what a column is. Mezzanine, the trip store, splits a trip across BASE (set once at trip end), STATUS (one cell per billing attempt), NOTES (driver/rider notes), and FARE_ADJUSTMENT (rare). Splitting is deliberate: it sidesteps update conflicts and minimises write amplification because each column’s writes are independent².

Why immutability earns its keep

Immutability is the single design choice that pulls the most weight:

Idempotent writes. Re-issuing a write with the same (row_key, column_name, ref_key) is rejected as a duplicate; the buffered-write protocol leans on this⁴.
Total order per shard. added_id is unique and monotonic within a shard, identical across replicas. That makes failover safe and triggers re-readable from any replica without reordering⁴³.
Replication is a memcpy. No conflict resolution, no last-writer-wins logic, no vector clocks. The cell either exists at the destination or it does not.
The DB is also the change log. Triggers ride on added_id rather than a separate WAL pipeline (see Trigger framework).

The trade-off is the obvious one: every “update” is an append, and old versions accumulate until a long-lived entity is compacted out-of-band.

Caution

Immutability fits transaction-shaped data — trips, billing attempts, payment events — well. It is a poor fit for hot, frequently-mutated objects (e.g., a counter or a frequently-edited document) and for large entities where a small change rewrites a large blob.

Sharding architecture

Fixed 4,096 shards, deterministic routing

Schemaless assigns each cell to a shard via shard = hash(row_key) % 4096. The shard count is fixed at instance creation time (typically 4,096 — the number is configurable but 4,096 is the documented default)⁴¹. Worker nodes can compute the routing locally without consulting any coordination service; no metadata service sits in the request path.

1SHARDS = 409623def shard_id(row_key: bytes) -> int:4    return hash_uuid(row_key) % SHARDS

Crucially, “fixed” applies to the logical shard count — not the physical placement. Capacity is added by moving shards onto more MySQL hosts; Uber’s documented expansion path is “split each MySQL server in two”¹. The cells themselves are not rehashed, so there is no consistent-hashing rebalance protocol; only shard ownership moves.

Important

The 4,096-shard ceiling caps the maximum parallelism of a single Schemaless instance, not its absolute capacity. Beyond that ceiling each shard absorbs more load, and the team’s lever is to spin up another Schemaless instance — by 2016 there were 40+, by 2018 more than 50⁷⁸.

Storage cluster topology

Each shard is owned by a storage cluster: one MySQL master plus typically two minions, with minions placed in different data centres for DC-failure tolerance⁴.

1Shard N2└── Master  ────── async MySQL replication ──── Minion (DC2)3                                            └── Minion (DC3)

Reads default to the master (so clients see their own writes) but the API allows per-request routing to a minion when stale-read tolerance is acceptable. Production replication lag is typically subsecond⁴.

Why MySQL specifically

MySQL is the thinnest possible substrate for the Schemaless contract:

The compound index on (row_key, column_name, ref_key) makes “give me the latest cell for this row+column” an indexed lookup.
InnoDB’s per-shard ACID is enough; Schemaless never asks MySQL for cross-shard guarantees.
Async replication, backup, and crash recovery are mature and operationally familiar.

Starting in 2019, Uber migrated Schemaless and Docstore storage nodes from InnoDB to MyRocks (MySQL on RocksDB’s LSM-tree engine), netting >30% disk savings; InnoDB compression alone gave only ~5% because the worker layer already compresses cell bodies before storage⁹.

Worker tier and request paths

Worker nodes are stateless HTTP servers. They do four jobs:

Route. Hash the row key, look up the cluster for the shard, and pick a node by request type.
Aggregate. For multi-row index queries, fan out to the relevant shards and merge.
Detect failure. A circuit breaker on each storage-node connection switches reads to a healthy node when the master misbehaves⁴.
Buffer writes. Coordinate the two-cluster write protocol described below⁴.

Because workers are stateless, Uber scales them by adding more boxes; by mid-2016 there were several thousand worker processes across all Schemaless instances⁷.

Cell write path

A single cell write is a two-cluster commit, not a single insert:

Cell write sequence: the worker writes the cell into the secondary cluster's buffer table, then into the primary cluster's entity table, and only acks after both succeed; async MySQL replication catches the minions up afterward.

Concretely, when a cell arrives at a worker:

Hash row_key to find the primary cluster for shard N.
Pick a secondary cluster at random (the count of secondaries is configurable).
Write the cell into the secondary’s buffer table. If this fails, fail the request.
Write the cell into the primary’s entity table. If this fails, fail the request.
Only after both succeed, return success to the client⁴.
The primary master replicates to its minions asynchronously.
A background reaper polls a primary minion; when the cell appears there, it deletes the buffer row from the secondary⁴.

Buffered writes — the always-on durability story

The repo’s earlier framing of buffered writes as “fail over to another master when the primary is down” is the failure-time behaviour, not the design. Buffered writes happen on every successful request, precisely because MySQL replication from master to minions is asynchronous. If the primary master dies after acking but before replicating, the cell would be lost. The secondary’s buffer table is the temporary backup that closes that window⁴.

Buffered writes: every PUT lands in two clusters in lockstep — the primary's entity table and a randomly chosen secondary's buffer table. A background reaper retires the buffer row only after the cell shows up on a primary minion.

Two consequences fall out:

Failure mode. If the primary master is down, writes still succeed (they land in the secondary buffer), but they are not yet readable through the primary. The client is told the master is down, so it knows the new cell is “buffered, not visible”. This is the hinted-handoff family of techniques, in the same lineage as Dynamo’s⁴.
Idempotency cleanup. If multiple writes with the same (row_key, column_name, ref_key) race, only one wins; the buffer table simply rejects the duplicate when the primary catches up⁴.

Warning

The “client knows the cell is buffered” bit only helps if the client uses it. In production, the read-your-write contract is best-effort during master failover; downstream services should be designed for at-least-once delivery anyway because triggers inherit the same property.

Failure modes by component

Failure	Reads	Writes	Recovery
Worker process crash	Other workers serve; client retries elsewhere.	Same.	Process-level restart.
Primary minion down	Reads route to master or the other minion.	No impact (writes go to master).	Async replication catches up.
Primary master down	Reads fail over to a minion (stale-tolerant).	Writes still succeed via the secondary cluster’s buffer table; reads of the new cell are not yet visible on the primary⁴.	Promote a minion; replay buffered writes once primary is back.
Whole DC outage	Reads served from minions in surviving DCs.	Writes succeed; primary masters in the affected DC are skipped.	DC restored or replaced; minions rebuilt from healthy peers.
Secondary cluster failure	Unaffected.	Worker picks another secondary at random.	None at request time.

Secondary indexes

Eventually consistent, denormalized, single-shard queries

A Schemaless secondary index is a separate MySQL table per shard. When a cell is written, the entity-table insert happens first and ack-able; index updates happen out-of-band, without a 2PC — Uber explicitly chose to avoid the cross-shard 2PC overhead². Production lag between cell writes and the corresponding index updates is “well below 20 ms” but not bounded².

To keep index queries on a single shard, indexes are themselves sharded by a designated shard field. Uber recommends denormalising frequently-queried fields into the index entry so queries can return without a follow-up entity-table fetch:

1shard_field: driver_partner_uuid   # determines shard placement; must be immutable2filter_fields:3  - city_uuid4  - trip_created_at5denormalized_fields:6  - status7  - fare_currency8  - distance_km

Important

The shard field of an index must be immutable². If you index trips by their driver and the driver assignment changes after the trip starts, you cannot rely on the index — you would need a new cell with a new shard placement, or a different index entirely. Uber explicitly notes this is a constraint they were “exploring how to make mutable without too big of a performance overhead” as of 2016.

Choosing the shard field also matters for distribution: UUIDs distribute evenly; city IDs are skewed; status enums are pathological².

Trigger framework

The DB as an event bus

Because every cell has a unique, monotonic added_id per shard, every Schemaless instance is also a partitioned change log. The trigger framework tails that log: client programs register a callback against a column, and the framework calls the callback at-least-once for every new cell in that column³.

1@trigger(column="BASE", table="trips")2def bill_rider(cell):3    status = trips.get_latest(cell.row_key, "STATUS")4    if status and status.body.get("is_completed"):5        return  # idempotency check6    fare = trips.get_latest(cell.row_key, "BASE").body7    result = payments.charge(cell.row_key, fare)8    trips.put(cell.row_key, "STATUS", {"is_completed": result.ok})

Architecture

Trigger framework: per-shard cells are pulled by trigger workers; each worker is assigned a disjoint set of shards by an elected leader, with progress offsets persisted to ZooKeeper or to Schemaless itself.

The framework’s pieces³:

Per-shard offset. Each trigger worker tracks the highest added_id it has processed for each shard it owns. Offsets are persisted to ZooKeeper or to a Schemaless instance.
Leader election. One worker is elected leader and assigns shards to workers; on worker failure, the leader redistributes the failing worker’s shards. During leader election, in-flight processing continues but rebalancing pauses.
In-shard ordering. Cells within a shard are delivered in added_id order; failures stall the shard until either retried successfully or moved to a parking queue past a configured threshold³.
Scale ceiling. Up to 4,096 trigger workers per Schemaless instance — one per shard.

Property	Behaviour
Delivery	At-least-once. Callback must be idempotent.
Ordering	Total order per shard; no cross-shard ordering.
Backpressure	Failed cells park after threshold; system halts to surface systematic bugs.
Coordination	ZooKeeper or Schemaless itself for offsets and leader election.

What this enables

Use case	Trigger column	Action
Billing	`BASE` of trips	Charge rider; write `STATUS`.
Analytics ingest	Any	Stream into the data warehouse.
Cross-DC replication	Any	Forward to a peer DC (eventually replaced by Herb — see below).
Notifications	`BASE`	Push to driver/rider devices.

The pattern generalises: any append-only datastore is already an event source; Schemaless made the event-source semantics first-class.

Multi-datacenter replication: Herb

Schemaless triggers were the first cross-DC replication path; in 2018, Uber published Herb, a Go-based replication engine that reads MySQL binlogs (“commit logs”) instead of polling cells, and runs a full-mesh topology between DCs⁸.

Herb’s design points:

Mesh, not chain. Each DC connects directly to every other DC; one slow DC does not block faster peers.
At-least-once, in-order per origin. All DCs see updates in the same order they were written at the origin DC, which simplifies consumer logic.
Streaming, not polling. Herb tails the binary logs rather than running expensive SELECT queries against the entity table.
Pluggable transport. Started on TCP, moved to HTTP, then YARPC.
Production performance. A reported median of 550–900 cells/sec per stream with end-to-end replication latency of 82–90 ms — of which roughly 40 ms is cross-DC network time⁸.

Migration: Project Mezzanine

Schemaless went live as Project Mezzanine — the trip-store cutover from a single PostgreSQL instance — about a month before Halloween 2014, after roughly five months of design and implementation and a six-week final crunch¹.

The dual-write playbook

Mezzanine migration phases: a lib/tripstore abstraction switches between PostgreSQL and Schemaless. Phase 1 mirrors writes and backfills; phase 2 shadows reads and diffs results probabilistically; phase 3 flips the primary while keeping PG hot for rollback.

The migration was the now-canonical strangler-fig pattern, executed before that name was popular¹:

Introduce an abstraction. A lib/tripstore Python package gated reads and writes behind a feature flag.
Backfill + mirror. Backfill PG into Schemaless; mirror new writes to both. Backfill, refactor, and Schemaless feature work proceeded in parallel and shipped continuously.
Shadow read + diff. Probabilistically replay reads against Schemaless and compare results to PG. Sampling controlled the extra load on the already-saturated PG box.
Flip + decommission. A month before Halloween 2014, the team gathered in a “war room” at 6 a.m. and flipped the primary path to Schemaless. The cutover was uneventful — no alerts, no rollback.

Specific de-risking moves worth stealing

Trip-IDs to UUIDs first. PG used auto-increment IDs that the rest of the codebase had baked in. Hundreds of SQL queries had to be rewritten to UUID before the storage swap could even start¹.
Probabilistic shadow validation. Validating every read would have killed the PG box. The team validated a sampled fraction, dialing the rate up as they got more confident.
Idempotency replaces transactions. The team explicitly notes “the (mental) transition from writing code against a key-value store with limited query capabilities … takes time to master. Idempotency replaces transactions.”¹
Plan for multiple backfills. “Don’t expect to get the data model right the first time. Plan for multiple complete and partial backfills.”¹

Date	Event
Early 2014	Trip growth at ~20%/month; storage + IOPS exhaustion projected for year-end¹.
~5 months	Schemaless design + initial implementation¹.
6 weeks	Final “war room” sprint: backfill, validation, refactor¹.
~Oct 1, 2014	Cutover from PostgreSQL to Schemaless (“a month before Halloween”)¹.
Oct 31, 2014	Halloween — Uber’s traditional traffic peak — passed without incident¹.

Operational outcomes

Scale at which Schemaless settled

Metric	Value	Source
Schemaless instances (2016)	40+	Frontless retrospective⁷
Schemaless instances (2018)	50+	Herb post⁸
Storage nodes (mid-2016)	Many thousands	Frontless⁷
Worker processes (mid-2016)	Several thousand	Frontless⁷
Trip volume (Jun 18, 2016)	2 billion total trips on Uber	Press coverage of Uber’s own announcement¹⁰

Project Frontless: Python → Go on the worker tier

By mid-2016 the Python+uWSGI worker tier was eating CPU and limiting concurrency. Project Frontless rewrote the worker tier in Go, validating each new endpoint by running it side-by-side with the Python version and diffing responses on real production traffic before flipping over⁷. The Mezzanine read endpoints were fully on Frontless by December 2016; the project as a whole was published in February 2018⁷.

Metric	Improvement (vs Python worker)
Median request latency	-85%⁷
P99 request latency	-70%⁷
CPU utilization	-85%⁷

The validation-by-comparison technique generalises: it is the same trick used in Mezzanine for the storage swap, applied to a protocol swap.

Evolution to Docstore

By 2020 the same constraints that made Schemaless cheap to build were friction for general-purpose use cases. Uber published Docstore in February 2021 as Schemaless’s evolution⁵.

Property	Schemaless	Docstore
Data model	Immutable cells	Mutable rows with optional nested structures
Schema	None	Schema-on-write with explicit evolution
Consistency	Eventual	Strict serializability per partition⁵
Transactions	None	Multi-row, ACID, replicated via Raft consensus⁵
Query model	Cell + sharded indexes	Materialized views; richer queries with composite partition keys⁵
Replication unit	MySQL row	MySQL transaction (log entry in a Raft replicated state machine)⁵

The architectural delta is small but consequential: instead of one master + async minions, a Docstore partition is 3–5 nodes running Raft, with each replica’s MySQL applying the same transaction in the same order. MySQL still does row-level locking; Raft just makes the transaction log durable and consistent across replicas⁵.

Uber’s stated goal for Docstore was the explicit mid-point between document and relational: “the best of both worlds: schema flexibility of document models, and schema enforcement seen in traditional relational models.”⁵

What to take from this

When the Schemaless playbook applies

The dominant workload is append-shaped (transactions, trips, events, audit logs).
Operational depth in MySQL/Postgres outweighs the appeal of a NoSQL feature checklist.
Eventual consistency of secondary indexes is acceptable, or your indexes are not consistency-critical.
Single-row consistency is enough; cross-row transactions are rare or can be modeled as event sourcing.
The build-vs-buy budget is real: Schemaless took ~5 months for a full team, plus continuous investment for years¹.

When it does not

You need cross-row ACID transactions with general-purpose semantics — you want Docstore, Spanner, CockroachDB, or a managed equivalent.
Your workload is hot, mutable objects (counters, frequently-edited documents); immutability becomes a tax, not a feature.
You need rich query support (joins, aggregates) at the storage tier.
Your team prefers managed services to operational ownership.

Patterns worth lifting

Idempotent writes as a substitute for distributed transactions. Append-only schema with (key, version) keys means every retry, replay, or out-of-order delivery is harmless.
Two-cluster buffered writes for async-replication durability. Cheaper than synchronous replication, much safer than naked async.
Per-shard log = event bus for free. If your writes are append-only and ordered within a shard, you have an event source — expose it.
Strangler-fig with shadow validation. lib/tripstore + probabilistic read-diffing turned a year-long migration into a non-event¹.
Replace runtimes the same way you replace storage. Project Frontless validated Go endpoints against Python endpoints in production before cutting over⁷.

References

René Schmidt. “Project Mezzanine: The Great Migration”. Uber Engineering, July 28, 2015. ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹ ↩¹² ↩¹³ ↩¹⁴ ↩¹⁵ ↩¹⁶ ↩¹⁷ ↩¹⁸ ↩¹⁹ ↩²⁰ ↩²¹
Jakob Holdgaard Thomsen. “Designing Schemaless, Uber Engineering’s Scalable Datastore Using MySQL”. Uber Engineering, January 12, 2016. ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹ ↩¹² ↩¹³ ↩¹⁴ ↩¹⁵ ↩¹⁶
Jakob Holdgaard Thomsen. “Using Triggers On Schemaless, Uber Engineering’s Datastore Using MySQL”. Uber Engineering, January 19, 2016. ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶
Jakob Holdgaard Thomsen. “The Architecture of Schemaless, Uber Engineering’s Trip Datastore Using MySQL”. Uber Engineering, January 15, 2016. ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹ ↩¹² ↩¹³ ↩¹⁴ ↩¹⁵ ↩¹⁶ ↩¹⁷ ↩¹⁸ ↩¹⁹ ↩²⁰ ↩²¹ ↩²²
Himank Chaudhary, Ovais Tariq, Deba Chatterjee. “Evolving Schemaless into a Distributed SQL Database”. Uber Engineering, February 23, 2021. ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸
Evan Klitzke. “Why Uber Engineering Switched from Postgres to MySQL”. Uber Engineering, July 26, 2016. ↩ ↩² ↩³ ↩⁴
Jesper Lindstrom Nielsen and Anders Johnsen. “Code Migration in Production: Rewriting the Sharding Layer of Uber’s Schemaless Datastore”. Uber Engineering, February 22, 2018. ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹
Himank Chaudhary. “Herb: Multi-DC Replication Engine for Uber’s Schemaless Datastore”. Uber Engineering, July 25, 2018. ↩ ↩² ↩³ ↩⁴
Mehul Patel. “MySQL to MyRocks Migration in Uber’s Distributed Datastores”. Uber Engineering, 2024. ↩
Coverage of Uber’s June 18, 2016 announcement that the platform reached its 2-billionth ride: Reuters, Fortune. ↩