
RPC and API Design

Choosing a wire format and a request shape is a load-bearing architectural decision: it determines what is cacheable at the edge, how breakable the contract is across mobile releases, and how forgiving the system is when a downstream service is slow. This article compares REST, gRPC, and GraphQL at the protocol level, then works through the four cross-cutting concerns that every production API surface eventually has to solve — versioning, pagination, rate limiting, and machine-readable documentation. The audience is a senior engineer who has shipped HTTP APIs but wants the trade-offs explicit before they pick the next one.

Protocol families across the request lifecycle: clients reach an edge that handles auth, rate limiting, and version routing, then fan out to REST, GraphQL, and gRPC surfaces over a backend that aggregates and streams.
Protocol families across the request lifecycle. The same request may flow through REST, GraphQL, or gRPC depending on the client and the operation.

Mental model

API design optimizes against three orthogonal axes. Every protocol pays one of them down to buy the others.

| Axis | REST | gRPC | GraphQL |
|---|---|---|---|
| Coupling | Loose. Resources + media types; clients tolerate added fields. | Tight. Generated stubs from a .proto; field numbers are forever. | Medium. Schema is shared, but clients pick the fields per request. |
| Wire & runtime cost | JSON over HTTP/1.1 or HTTP/2; verbose but cacheable. | Protobuf over HTTP/2; compact framing; lower CPU on parse. | JSON over HTTP/1.1 or HTTP/2; per-field resolver cost on server. |
| Operational ergonomics | Universal tooling, CDN-friendly, curl-debuggable. | Needs codegen, gRPC-aware load balancing, and a proxy (gRPC-Web) for browsers. | Needs query-cost analysis, persisted operations, server-side caching. |

The decision is rarely “which protocol is best” but “which surface for which client”. The decision tree below captures how I split a new API surface between the three.

Decision tree: pick REST when the browser or third parties are the primary client and queries are stable; pick GraphQL when the client mix needs varying field shapes; pick gRPC for low-latency internal RPC and streaming.
A practical protocol selector. Most production stacks land on more than one branch.

REST: constraints, not a protocol

REST is an architectural style — a set of constraints — defined by Roy Fielding in Chapter 5 of his 2000 dissertation, written while he was co-authoring the HTTP/1.1 RFCs.1 It is not a protocol, and a system can be “REST” over almost any underlying transport.

The six constraints

| Constraint | What it means | Property gained |
|---|---|---|
| Client-Server | Separation of concerns | Independent evolution |
| Stateless | No session state on the server | Horizontal scaling |
| Cacheable | Responses self-describe their cacheability | Latency, lower load |
| Uniform Interface | Standardized resource interactions | Simplicity, visibility |
| Layered System | Client unaware of intermediaries | Proxies, gateways, CDNs |
| Code-on-Demand | Optional executable code transfer | Extensibility (rare) |

Statelessness is the constraint most teams unintentionally violate; the moment a server stores per-connection session state, horizontal scaling becomes “sticky” and rolling a node becomes a logout event. The trade is real: clients have to send their identity and context (Authorization, Cookie, idempotency keys) on every request, which inflates payloads.

HATEOAS, and why it almost never ships

The Uniform Interface bundles four sub-constraints; the most-ignored is HATEOAS — hypermedia as the engine of application state. The server returns links that drive the next state transition, so the client only needs to know the entry URI and the media types.

Order resource with hypermedia links
{
  "orderId": "12345",
  "status": "pending",
  "total": 99.99,
  "_links": {
    "self": { "href": "/orders/12345" },
    "cancel": { "href": "/orders/12345/cancel", "method": "POST" },
    "pay": { "href": "/orders/12345/payment", "method": "POST" },
    "items": { "href": "/orders/12345/items" }
  }
}

Fielding clarified the bar in 2008 with characteristic bluntness:

A REST API should be entered with no prior knowledge beyond the initial URI (bookmark) and set of standardized media types that are appropriate for the intended audience.2

Almost no production API meets that bar. Mobile and SDK clients prefer compile-time knowledge of the API surface; runtime link-following adds latency, parsing, and a class of “what if the link is missing” bugs that explicit API references never have. Leonard Richardson’s Maturity Model (popularized by Martin Fowler) captures the gap: most “REST” APIs are Level 2 — resources and HTTP verb semantics — and treat HATEOAS as aspirational.

Note

Calling a Level-2 design “REST” is a vocabulary fight, not a quality fight. The constraints earn architectural properties; whether you bother with all of them depends on whether you actually need those properties.

When REST earns its place

REST is the right default whenever any of the following are true:

  • The client is a third-party integrator. Universal tooling (curl, Postman, every HTTP client in every language) and the Cache-Control / ETag model for shared and edge caches are decisive.
  • The data model is CRUD-shaped and the responses are stable.
  • The browser is a first-class client and you want intermediaries (CDN, reverse proxy, browser cache) to do real work.

The cost shows up the moment the client wants more or less than the resource shape provides: REST forces over-fetching when the client only wants a few fields, and chains of round trips when the client wants related resources. HTTP/2 and HTTP/3 mitigate the round-trip cost via multiplexing but do not change the response shape.

gRPC: contracts, framing, and streams

gRPC is an RPC framework over HTTP/2, with Protocol Buffers as the default IDL and serialization. The combination buys three things at once: a strict contract, a compact wire format, and four communication patterns that map cleanly onto HTTP/2 streams.

Wire format and what protobuf actually saves you

A protobuf field on the wire is a tag (field number + wire type, varint-encoded) followed by a value. Field numbers in the 1–15 range fit into a one-byte tag; 16–2047 take two bytes — which is why hot fields should claim the low numbers.3 Default values for implicit-presence fields are not transmitted, and unknown fields are preserved on the round trip, which is what gives protobuf its forward-compatibility story.4
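The tag arithmetic behind that rule is small enough to verify by hand. A sketch in Python (illustrative only, not the protobuf runtime): a tag is `(field_number << 3) | wire_type`, varint-encoded at seven payload bits per byte, so any field number under 16 fits in one byte.

```python
def encode_varint(n: int) -> bytes:
    """Protobuf varint: 7 bits per byte, MSB set on every byte except the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field's on-the-wire tag: (field_number << 3) | wire_type, varint-encoded."""
    return encode_varint((field_number << 3) | wire_type)

# Field numbers 1-15 fit in a one-byte tag; 16-2047 need two bytes.
assert len(encode_tag(15, 0)) == 1   # 15 << 3 = 120, still under 128
assert len(encode_tag(16, 0)) == 2   # 16 << 3 = 128, spills into a second byte
assert len(encode_tag(2047, 5)) == 2
assert len(encode_tag(2048, 5)) == 3
```

Claiming the low numbers for hot fields is therefore a one-byte-per-field saving on every message.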

The “protobuf is N times faster than JSON” claim is over-circulated. The real shape of the trade is:

  • For typical record-shaped payloads, protobuf is smaller than JSON, often noticeably — Auth0’s published benchmark measured roughly 34% smaller and 21% faster for an uncompressed GET payload, but only 9% smaller and 4% faster once both were gzipped.5
  • For string-dominated payloads (long descriptions, base64 blobs), the size win shrinks toward parity because strings are length-prefixed in protobuf and length-quoted in JSON either way.6
  • The bigger gain in production is parser CPU, not bytes — JSON parsers do allocations and string decoding that a generated protobuf parser skips.

In short: protobuf is reliably smaller and lower-CPU than JSON, but the multiplier depends on data shape. Treat it as 2–10× smaller for record-shaped data, parity for blob-shaped data, and always benchmark your own payloads before quoting a number.

HTTP/2 and the four streaming modes

gRPC mandates HTTP/2 end-to-end. It uses HTTP/2 framing, header compression, multiplexing, and trailers, and sends the te: trailers request header to detect intermediaries that cannot forward trailers.7 All four communication patterns ride on the same :method POST /Service/Method HTTP/2 stream:

| Pattern | Use case | Example |
|---|---|---|
| Unary | Request → response | GetUser(id) → User |
| Server streaming | Tail logs, large result sets, telemetry | ListOrders() → stream Order |
| Client streaming | Chunked uploads, batched writes | stream Chunk → UploadResult |
| Bidirectional | Real-time sync, chat, push | stream Message ⇄ stream Message |

Sequence: each gRPC streaming mode shown as messages on a single HTTP/2 stream — unary, server streaming, client streaming, and bidirectional.
gRPC's four streaming modes mapped onto a single HTTP/2 stream. The protocol buys you all four for free once you have a contract.

order_service.proto
service OrderService {
  rpc GetOrder(GetOrderRequest) returns (Order);
  rpc ListOrders(ListOrdersRequest) returns (stream Order);
  rpc UploadInvoice(stream Chunk) returns (UploadResult);
  rpc Sync(stream Event) returns (stream Event);
}

Why browsers and load balancers are awkward

Two operational details are worth internalizing before betting on gRPC for an external API:

  • Browsers can’t speak gRPC natively. Browser JavaScript cannot read HTTP/2 trailers, and gRPC encodes its status in trailers — so gRPC-Web frames the trailers into the body and a proxy (Envoy is the canonical choice) translates between gRPC-Web and gRPC-on-HTTP/2.8
  • Load balancing is not free. A single gRPC channel multiplexes many requests over one long-lived HTTP/2 connection. An L4 load balancer that hashes by connection sees one connection per client and pins all that traffic to one backend; you need an L7 (gRPC-aware) balancer or a client-side load-balancing strategy to fan out per-RPC.9

Where gRPC earns its place

The companies that go all-in on gRPC do so because the framework solves multiple problems in one drop-in: a multilingual fleet gets generated stubs and a single contract; the network gets HTTP/2 multiplexing; the platform gets streaming primitives that are awkward to bolt onto REST.

  • Netflix runs a deep gRPC stack internally — over 600 applications and 1,300 services on its jrpc Java framework, and uses protobuf FieldMask to let callers ask for only the fields they need over backend-to-backend gRPC.10
  • Uber rebuilt its mobile push platform on gRPC bidirectional streaming (over QUIC/HTTP/3) to replace battery-draining polling with a single long-lived stream, and used shadow traffic plus circuit breakers to migrate safely from the legacy REST path.11

The complement is also true: when there is no codegen budget, no platform discipline around .proto review, or no L7 mesh, gRPC tends to leak its internals into the application.

GraphQL: shifting query shape to the client

GraphQL flips the request/response contract: the client picks fields, and the server’s job is to resolve any tree the schema permits. The first-order win is that one round trip can replace two or three REST calls; the second-order cost is that every field is a potential query path on the server.

orders.gql
query Orders {
  user(id: "123") {
    name
    email
    orders(first: 5) {
      id
      total
      items {
        productName
      }
    }
  }
}

In REST, the same data needs at least three trips (/users/123, /users/123/orders, then per-order /orders/{id}/items) and the client throws away anything it didn’t ask for. In GraphQL, it is one POST with a known cost — provided the server can resolve the tree without falling into N+1.

N+1, DataLoader, and per-request scoping

Per-field resolvers create the N+1 problem: a query that returns 100 orders and asks for each order’s user issues 1 query for the orders and 100 follow-ups for the users. The standard fix is Facebook’s DataLoader pattern — coalesce all the load(key) calls inside a single tick of the event loop, then issue a single WHERE id IN (...) query.12

user-loader.js
const userLoader = new DataLoader(async (userIds) => {
  const users = await db.users.findByIds(userIds)
  return userIds.map((id) => users.find((u) => u.id === id))
})

const resolvers = {
  Order: {
    user: (order) => userLoader.load(order.userId),
  },
}

Sequence: without DataLoader the resolver fires one DB query per order (N+1); with DataLoader, all loads issued in the same tick coalesce into a single batched query before the engine resolves the next field.
DataLoader collapses N per-field reads into one batched query per request, executed at the end of the event-loop tick.

Important

DataLoader caches per-instance, and the recommended pattern is one DataLoader instance per request. Sharing a loader between requests leaks data (and identity) between users.12

Shopify’s GraphQL Batch is the Ruby equivalent: same pattern, same gotcha. The performance impact in their stack is reported as a step-change in query count, not a constant factor.13

Caching, complexity, and the security surface

GraphQL throws away two things REST gets for free:

  • HTTP caching. Queries are POSTed (the body holds the query), so CDNs and browsers won’t cache the response by URL. The mitigation is a persisted operation model (sometimes called trusted documents): the server keeps an allowlist of pre-registered queries, the client sends a SHA-256 hash, and the request becomes idempotent and GET-able — and as a bonus, ad-hoc malicious queries are rejected at the gateway.14 Apollo’s automatic persisted queries (APQ) is the bandwidth-only version of the same idea.
  • Bounded server work. Every field is a code path. Without depth and complexity limits, an attacker (or a sloppy client) can ask for friends { friends { friends { ... } } } and burn server budget. Production GraphQL gateways enforce maximum depth, weight-based query complexity, and per-cost rate limiting.

GitHub’s public GraphQL API illustrates both ends: it ships an explicit schema-first surface alongside the older REST API, uses persisted queries for the mobile clients, and publishes its query cost limits up front.15

When GraphQL earns its place

  • Mobile clients with heterogeneous data needs (different screens, different field selections per release).
  • BFF / aggregator nodes consolidating multiple downstream services into one request shape.
  • Frontends that iterate faster than the backend can ship endpoints.

It is rarely a fit for simple CRUD APIs (the abstraction tax is real), file uploads (multipart extension required), or fully-public APIs where you cannot enforce persisted operations.

API versioning

A version strategy is a contract about how the API will break, not whether. Three families dominate.

URL path versioning

URL path
GET /v1/users
GET /v2/users

Trade-offs:

  • Wins: explicit, impossible to miss; load balancer can route on path; multiple versions can be separate deployments.
  • Loses: versioned URLs aren’t really resource identifiers; clients must update every URL on a major bump.
  • Used by: Twitter, Facebook, Google. Long-lived versions with multi-year overlap are normal.

Header-based versioning

Custom Accept media type
GET /users HTTP/1.1
Accept: application/vnd.myapi.v2+json

Trade-offs:

  • Wins: URLs stay stable; the version sits in content negotiation where it semantically belongs.
  • Loses: invisible in browser address bars and curl-by-default; some intermediaries silently strip custom headers; harder to test with “just change the URL”.

Date-based versioning (Stripe’s model)

Stripe’s API versioning is the most quoted example of API stability as a feature. The version is a date, e.g. Stripe-Version: 2024-10-01. The mechanism that makes it work is layered:

  1. Account pinning. The first API call from a new account pins it to the most recent version.16
  2. Header override. A Stripe-Version header on a request overrides the pin, used for testing the next version before flipping the account.
  3. Compatibility layers. Internal code is always written against the latest schema; request and response transformation modules walk older versions backward to the request’s pinned date.
Date-pinned request
GET /v1/customers HTTP/1.1
Stripe-Version: 2024-10-01
Authorization: Bearer sk_test_...

The architectural cost is real: the transformation modules accumulate, and the backend is permanently responsible for older shapes. The win is that code written against the API in 2011 still works in 2026.17
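The transform-chain idea fits in a few lines of Python (hypothetical transforms and dates; the real modules are per-resource and far richer): handlers always produce the latest shape, then every transform newer than the request's pinned date downgrades the response, newest first.

```python
# Each entry downgrades a response to the shape clients pinned *before* that
# date expect. Ordered newest -> oldest. (Illustrative transforms only.)
TRANSFORMS = [
    ("2024-10-01", lambda r: {**r, "legacy_id": r["id"]}),                   # re-add legacy_id
    ("2023-04-01", lambda r: {k: v for k, v in r.items() if k != "metadata"}),  # drop metadata
]

def render_for(pinned_version: str, latest_shape: dict) -> dict:
    """Walk the latest response backward to the account's pinned date."""
    out = latest_shape
    for date, downgrade in TRANSFORMS:
        if pinned_version < date:   # ISO dates compare correctly as strings
            out = downgrade(out)
    return out

latest = {"id": "cus_123", "metadata": {}}
assert "legacy_id" in render_for("2022-01-01", latest)   # old pin: both transforms apply
assert render_for("2025-01-01", latest) == latest        # current pin: untouched
```

The cost noted above is visible in the sketch: the TRANSFORMS list only ever grows, and every old shape stays on the hot path.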

What counts as a breaking change

| Breaking | Additive (safe) |
|---|---|
| Removing a field | Adding a new optional field |
| Changing a field’s type or semantics | Adding a new endpoint |
| Removing or renaming an endpoint | Adding a new optional query parameter |
| Changing an error code or shape | Adding a new enum value (if clients tolerate it) |
| Tightening a previously-loose validation | Loosening a previously-strict validation |

The safest deprecation pattern is to keep the old field, return both, and surface a structured deprecation warning so consumers can act before the sunset:

Deprecation envelope
{
  "data": { "id": "cus_123", "legacy_id": "cus_123" },
  "_warnings": [
    {
      "code": "deprecated_field",
      "message": "Field 'legacy_id' is deprecated. Use 'id' instead.",
      "deprecated_at": "2024-01-01",
      "sunset_at": "2025-01-01"
    }
  ]
}

Pagination

The right pagination model is dictated by the data model and the access pattern, not by personal preference.

Three pagination strategies against the same B-tree index. Offset scans and discards every preceding row; cursor and keyset both seek directly to the next page.
Offset, cursor, and keyset against the same composite index. Offset's cost grows with page depth; cursor and keyset stay flat.

Offset

Offset request
GET /orders?limit=20&offset=40

Postgres (and every other relational engine) implements LIMIT n OFFSET k by walking the result set and discarding the first k rows. Page 1 is fast; page 1,000 is not.18

| Use when | Avoid when |
|---|---|
| Datasets < ~10K rows | Datasets where users page deep |
| Need “jump to page N” | Underlying data churns |
| Data is mostly static | Latency at depth must stay constant |

Cursor

Cursor request
GET /orders?limit=20&after=eyJpZCI6MTIzNH0

A cursor is an opaque token — usually a base64-encoded JSON of the sort key — that tells the server “resume after this row”. Internally:

Cursor seek
SELECT *
FROM orders
WHERE (created_at, id) > ($cursor_created_at, $cursor_id)
ORDER BY created_at, id
LIMIT 20;

Index seek instead of scan-and-discard; latency stays flat regardless of depth.
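Building the opaque token is mechanical. A sketch in Python, assuming the sort key is (created_at, id) as in the query above:

```python
import base64
import json

def encode_cursor(created_at: str, id_: int) -> str:
    """Opaque continuation token: base64 of the last row's sort key."""
    raw = json.dumps({"created_at": created_at, "id": id_}).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")

def decode_cursor(token: str) -> dict:
    """Restore the padding, then recover the sort-key values for the WHERE clause."""
    padded = token + "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

token = encode_cursor("2024-01-15T10:30:00Z", 12345)
assert decode_cursor(token) == {"created_at": "2024-01-15T10:30:00Z", "id": 12345}
```

Signing or encrypting the token is a common hardening step, since anything the client can decode, the client will eventually hand-edit.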

Keyset

Same shape as cursor, but the keys travel as named query parameters instead of an opaque token:

Keyset request
GET /orders?limit=20&created_after=2024-01-15T10:30:00Z&id_after=12345

Keyset trades opacity for debuggability. The query is the same, and so is the cost — but the schema leaks into the URL, so a future change of sort columns is a breaking change.

Decision matrix

| Factor | Offset | Cursor | Keyset |
|---|---|---|---|
| Dataset size | < 10K | Any | Any |
| Page depth | Shallow only | Any | Any |
| “Jump to page” UI | Yes | No | No |
| Data churns mid-scroll | Skips, dups | Stable | Stable |
| Implementation cost | Trivial | Medium | Medium |
| Latency vs depth | O(offset) | O(1) | O(1) |

A reproducible benchmark on a 1,000,000-row table puts the gap at roughly 17× at depth — keyset/cursor in the tens of milliseconds, offset in the hundreds.18 The exact multiplier varies with the index; the shape never does.

Rate limiting

Rate limiting is two decisions in a trench coat: which algorithm shapes the traffic, and which response contract the client uses to back off.

Side-by-side: token bucket allows bursts up to capacity then enforces refill rate; leaky bucket queues requests and drains at a constant rate.
Token bucket favors burst-friendly clients; leaky bucket smooths the output rate at the cost of queue latency.

Token bucket

A bucket of capacity B is refilled at rate r tokens/sec. Each request consumes one token; an empty bucket means a 429. Bursts up to B are allowed; sustained throughput is bounded by r.
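A lazily-refilled implementation needs no background timer: the refill is computed from elapsed time on each request. A sketch in Python, with the clock injected for testability:

```python
class TokenBucket:
    """Capacity B, refilled at rate r tokens/sec; each request spends one token."""

    def __init__(self, capacity: float, rate: float, now: float = 0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity      # a fresh bucket allows a full burst
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last request, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                # caller should answer 429

bucket = TokenBucket(capacity=5, rate=1.0)
assert all(bucket.allow(0.0) for _ in range(5))   # burst up to capacity
assert not bucket.allow(0.0)                       # bucket drained
assert bucket.allow(1.0)                           # one token refilled after 1s
```

A production version adds atomicity (e.g. a Redis Lua script) so concurrent requests can't both spend the last token.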

AWS API Gateway uses a token bucket per region, per account, per stage, per method, per usage-plan client — a small hierarchy of buckets evaluated bottom-up.19 The default account-level steady-state is 10,000 requests/sec with a 5,000-request burst capacity.20

Tip

Token bucket’s biggest footgun is that the burst is accumulated capacity, not “extra throughput”. If a long-idle client floods the API with B requests, your downstreams are responsible for surviving them, not the rate limiter.

Leaky bucket

A queue of capacity Q drains at constant rate r. Bursts are absorbed by the queue, not by passing through; if the queue is full, the request is rejected. Output is perfectly smooth at the cost of queue latency.
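The same lazy-clock trick works here: drain the queue level by elapsed time, then try to enqueue. A Python sketch (illustrative, single-threaded):

```python
class LeakyBucket:
    """Queue of capacity Q draining at rate r req/sec; a full queue rejects."""

    def __init__(self, capacity: int, rate: float, now: float = 0.0):
        self.capacity = capacity
        self.rate = rate
        self.level = 0.0            # how much of the queue is currently occupied
        self.last = now

    def allow(self, now: float) -> bool:
        # Drain for the elapsed time, then see if one more request fits.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False

lb = LeakyBucket(capacity=3, rate=1.0)
assert [lb.allow(0.0) for _ in range(4)] == [True, True, True, False]  # queue fills
assert lb.allow(2.0)   # two requests drained during the 2s gap
```

Note the contrast with the token bucket above: here an idle period earns you nothing beyond an empty queue, so output can never burst past r.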

Leaky bucket is the right shape when the downstream cannot tolerate bursts — a third-party API with its own throttle, a legacy system with low concurrency, or a database with a known slow path. It is the wrong shape for a UI that benefits from immediate feedback.

Sliding window

| Variant | Memory | Accuracy | Burst behavior |
|---|---|---|---|
| Sliding window log | O(requests in window) | Exact | Smooth |
| Sliding window counter | O(1) | Approximate | Smooth |

The counter variant blends the current and previous fixed window with a weighted overlap: if the previous 60-second window saw 50 requests and we are 25 seconds into a window with 30 so far, the effective count is 30 + 50 × (35/60) ≈ 59.2. Redis-backed implementations almost always pick the counter variant for its O(1) memory footprint and good-enough accuracy.
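The weighted blend is one line. A sketch in Python that reproduces the worked example above:

```python
def sliding_window_count(prev_count: int, curr_count: int,
                         elapsed_in_window: float, window: float = 60.0) -> float:
    """Current window's count plus the still-overlapping share of the previous one."""
    overlap = (window - elapsed_in_window) / window
    return curr_count + prev_count * overlap

# 25s into the window, 30 requests so far, 50 in the previous window:
estimate = sliding_window_count(prev_count=50, curr_count=30, elapsed_in_window=25)
assert abs(estimate - (30 + 50 * 35 / 60)) < 1e-9   # ~59.2 effective requests
```

The limiter then compares the estimate against the limit; two Redis counters (current and previous window) are all the state it needs.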

Headers and the contract with the client

RFC 6585 defines the 429 Too Many Requests status code, and the Retry-After header tells the client how long to wait. The IETF draft-ietf-httpapi-ratelimit-headers standardizes the header surface for advertising the current rate-limit state. The current draft (-10) consolidates everything into a single structured RateLimit field; older drafts shipped as separate RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset headers, which is what most existing APIs still emit.21

Rate-limit response (legacy split-header form)
HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 30
Retry-After: 30
Content-Type: application/json
Rate-limit response (current draft, single structured field)
HTTP/1.1 429 Too Many Requests
RateLimit: "default"; r=0; t=30
Retry-After: 30
Content-Type: application/json

Either form should be present on success responses too, not only on 429s — clients that can see “remaining = 12” pace themselves; clients that only see 429 retry-storm.
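On the client side, the contract reduces to: trust Retry-After when the server sends it, and fall back to capped exponential backoff when it doesn't. A minimal sketch in Python (base and cap values are illustrative):

```python
def backoff_delay(headers: dict, attempt: int,
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retrying: server-directed if possible, else exponential."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return float(retry_after)          # the server knows its own window
    return min(cap, base * (2 ** attempt))  # otherwise back off, capped

assert backoff_delay({"Retry-After": "30"}, attempt=0) == 30.0
assert backoff_delay({}, attempt=3) == 4.0    # 0.5 * 2**3
assert backoff_delay({}, attempt=10) == 30.0  # capped, not 512s
```

A production client would also add jitter to the fallback branch so a fleet of throttled clients doesn't retry in lockstep.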

Machine-readable contracts

OpenAPI

OpenAPI 3.1.0 (released February 2021) aligned the schema dialect with JSON Schema 2020-12; before that, OpenAPI’s schema language was a near-but-not-quite subset of JSON Schema, and the two ecosystems didn’t compose cleanly.22 The practical implication is that any modern tool that speaks JSON Schema — code generators, validators, IDE plugins — can validate against an OpenAPI 3.1 document directly.

Authoring rule of thumb: generate the spec from code annotations (@RequestMapping, @OperationId, decorators in Python/TS) rather than handwriting YAML; the spec drifts the moment the source of truth lives in a different file from the implementation.

AsyncAPI

OpenAPI describes request/response APIs. AsyncAPI describes message-driven ones — Kafka topics, AMQP queues, MQTT channels, WebSockets — using a similar shape but a different vocabulary: channels instead of paths, and operations (send/receive) instead of HTTP verbs. The 3.0 release decouples channels from operations, so a single channel can host multiple operations and be referenced across services.23

orders.asyncapi.yaml
asyncapi: 3.0.0
info:
  title: Order Events
  version: 1.0.0
channels:
  orderCreated:
    address: orders/created
    messages:
      orderCreated:
        payload:
          type: object
          properties:
            orderId: { type: string }
            total: { type: number }
operations:
  publishOrderCreated:
    action: send
    channel:
      $ref: '#/channels/orderCreated'

Springwolf is the canonical Spring-side generator: it scans @KafkaListener, Spring AMQP, JMS, and SNS/SQS annotations and emits an AsyncAPI document at runtime, with a UI similar to Swagger UI.24

Real-world surfaces

The pattern most production systems converge on is multiple protocols, one team.

Slack: REST + Events API + Socket Mode

Slack ships a REST Web API for outbound calls (post message, list channels), an Events API for inbound webhooks (“a user reacted to a message”), and Socket Mode (WebSocket) for apps that can’t expose a public webhook endpoint. The legacy RTM API (raw WebSocket pushed by the server) is in long-tail deprecation — modern Slack apps cannot use it, and rtm.start no longer behaves as advertised.25 The choice is deliberate: REST keeps the integration developer experience accessible (no protobuf, no codegen), and the asynchronous channel is split between webhooks (default) and Socket Mode (firewall-friendly).

Netflix: gRPC inside the wall

Netflix runs gRPC for backend-to-backend traffic across hundreds of services and a thousand-plus contracts in its jrpc Java framework, and uses protobuf FieldMask to let callers ask for partial response shapes — a clean way to dodge the over-fetching that any RPC framework with a fixed message shape inherits.10 The REST surface lives at the edge for browser clients and cacheable assets.

GitHub: REST and GraphQL side-by-side

GitHub kept its REST API and added GraphQL alongside it. The GraphQL API is the better fit for the mobile and integration surfaces that need flexible field selection; REST stays around for backward compatibility and for ops that benefit from HTTP caching. GraphQL clients use persisted operations to enable caching and to enforce a query allowlist.15

Common pitfalls

Treating gRPC as a REST replacement on the browser

The mistake: a benchmark says gRPC is faster, so the team ships it for the browser API too. The price: every request now needs a gRPC-Web proxy, browser DevTools can’t read the body, and the build pipeline owns a .proto-to-TS toolchain it didn’t budget for.8 The fix: gRPC for service-to-service, REST or GraphQL for the browser.

GraphQL without complexity limits

The mistake: a public GraphQL endpoint with no depth or cost ceiling. The price: a single attacker query exhausts your CPU. The fix: depth limit (10–15 is typical), per-field cost weights, and either a persisted-operation allowlist or strict cost-based rate limiting.

Offset pagination as the default

The mistake: LIMIT 20 OFFSET 100000 ships fine in dev (small dataset) and falls over in prod. The price: deep pages take seconds, and the database CPU graph looks like a sawtooth. The fix: default to cursor or keyset pagination, and cap the maximum offset if “jump to page” is a hard requirement.

Breaking the API “because no one uses that field”

The mistake: a field is removed in a point release because the team can’t see anyone using it in their analytics. The price: a long tail of mobile clients fail until users update — except the users who have notifications off and never will. The fix: treat the API as infrastructure. Additive changes only. Deprecate, sunset, then remove — and surface the sunset date in the response.

Practical takeaways

| Constraint | REST | gRPC | GraphQL |
|---|---|---|---|
| Public API | Default | Proxy required | Lock down with allowlist |
| Internal services | Verbose at scale | Default | Usually overkill |
| Mobile apps | Many round trips | Efficient + streams | Flexible field selection |
| Browser direct | Native | Needs gRPC-Web | Native |
| Streaming | SSE / workarounds | First-class | Subscriptions (extra spec) |
| Edge caching | First-class | Custom | Persisted operations |

A defensible default for a greenfield surface in 2026:

  • REST + OpenAPI for the public surface.
  • gRPC for the internal mesh (with FieldMask if you need partial responses).
  • GraphQL only where the client mix actually benefits from query flexibility — a BFF, a mobile app with many surfaces, or a developer platform — and only with persisted operations and cost limits.
  • For every protocol: cursor pagination, RateLimit headers on every response (not just 429), and date-based or path-based versioning chosen for how often the contract will break.

Appendix

Prerequisites

  • HTTP/1.1 and HTTP/2 fundamentals.
  • Familiarity with JSON and at least one binary serialization format.
  • Working knowledge of relational query plans for the pagination section.

Footnotes

  1. Fielding was co-author of RFC 2616 (HTTP/1.1) while writing the dissertation, which is why REST is so tightly coupled to HTTP semantics.

  2. Roy T. Fielding, REST APIs must be hypertext-driven, 2008.

  3. Protocol Buffers — Encoding documents the varint tag layout, field-number costs, and the wire types.

  4. Language Guide (proto 3) — Default values and unknown fields.

  5. Auth0, Beating JSON performance with Protobuf. Concrete benchmark on real REST traffic; the gap shrinks once gzip is involved.

  6. VictoriaMetrics, How Protobuf Works — The Art of Data Encoding. Includes a Go benchmark for parse cost; the parser CPU difference is consistently larger than the byte-size difference.

  7. gRPC over HTTP/2 (gRPC core docs) — pseudo-headers, framing, and trailer semantics.

  8. The state of gRPC in the browser. Trailers in body, proxy required.

  9. gRPC Load Balancing — explains why a multiplexed HTTP/2 connection breaks naive L4 balancing and what client-side balancing or look-aside LB does instead.

  10. Netflix Tech Blog, Practical API Design at Netflix, Part 1: Using protobuf FieldMask.

  11. Uber Engineering, Uber’s Next Gen Push Platform on gRPC. Shadow-traffic migration and circuit-breaker fallback are the operational details worth copying.

  12. graphql/dataloader — README documents the per-request cache rule and the single-tick batching contract.

  13. Shopify Engineering, Solving the N+1 Problem for GraphQL through Batching.

  14. Benjie Gillam, GraphQL Trusted Documents — the security framing for persisted operations.

  15. GitHub GraphQL API documentation and the public rate limits page.

  16. Stripe Engineering, APIs as infrastructure: future-proofing Stripe with versioning. The compatibility-layer architecture is the actual lesson.

  17. The Stripe API has been stable since launch in 2011; the same pinned 2011 request shape still works today.

  18. Milan Jovanović, Understanding Cursor Pagination and Why It’s So Fast — Deep Dive. Benchmark and SQL plans for offset vs keyset on a million-row table. 2

  19. AWS, Throttle requests to your REST APIs in API Gateway.

  20. AWS, Amazon API Gateway quotas.

  21. IETF, draft-ietf-httpapi-ratelimit-headers. The RFC editors moved from three separate headers to a single structured field; both are still in the wild.

  22. OpenAPI Initiative, OpenAPI Specification 3.1.0 released.

  23. AsyncAPI Initiative, AsyncAPI 3.0 specification.

  24. Baeldung, Documenting Spring Event-Driven API Using AsyncAPI and Springwolf.

  25. Slack, Slack APIs overview and the rtm.start deprecation note.