System Design Fundamentals

RPC and API Design

Choosing communication protocols and patterns for distributed systems. This article covers REST, gRPC, and GraphQL—their design trade-offs, when each excels, and real-world implementations. Also covers API versioning, pagination, rate limiting, and documentation strategies that scale.

[Figure] API architecture: clients (web browser, mobile app, internal service) connect through a gateway layer that handles rate limiting and auth/validation, then route to the appropriate protocol handler (REST/HTTP, gRPC, or GraphQL) in front of the backend services. REST for public APIs, gRPC for internal service-to-service, GraphQL for flexible client queries.

API design is about matching communication patterns to constraints:

Protocol | Optimizes For | Sacrifices
REST | Cacheability, uniform interface, tooling | Flexibility, efficiency
gRPC | Performance, type safety, streaming | Browser support, human readability
GraphQL | Query flexibility, reducing round trips | Caching, complexity

The core trade-off: coupling vs efficiency. REST’s uniform interface decouples clients from servers but requires multiple round trips. gRPC’s contract-first approach enables optimizations but couples clients to schema. GraphQL shifts query logic to clients but complicates server-side caching.

Decision heuristic:

  • Public APIs: REST (tooling, cacheability, developer familiarity)
  • Internal microservices: gRPC (performance, type safety)
  • Mobile/frontend with varying data needs: GraphQL (reduce round trips)
  • Real-time bidirectional: gRPC streaming or WebSockets

REST

REST (Representational State Transfer) is an architectural style defined by Roy Fielding in his 2000 doctoral dissertation (Fielding was also a principal author of the HTTP/1.1 specification). It is not a protocol but a set of constraints that, when followed, yield specific architectural properties.

The six constraints:

Constraint | What It Means | Architectural Property
Client-Server | Separation of concerns | Independent evolution
Stateless | No session state on server | Scalability, reliability
Cacheable | Responses labeled cacheable/non-cacheable | Performance, reduced load
Uniform Interface | Standardized resource interactions | Simplicity, visibility
Layered System | Client unaware of intermediaries | Encapsulation, proxies
Code-on-Demand | Optional executable code transfer | Extensibility

Why statelessness matters: Each request contains all information needed to process it. Servers don’t store session state between requests. This enables:

  • Horizontal scaling (any server can handle any request)
  • Simpler failure recovery (no session loss)
  • Better cacheability

Trade-off: Clients must send authentication/context with every request, increasing payload size.

HATEOAS (Hypermedia As The Engine Of Application State) is REST’s most ignored constraint. It requires servers to provide navigation options in responses:

// Example: Order resource with hypermedia links
{
  "orderId": "12345",
  "status": "pending",
  "total": 99.99,
  "_links": {
    "self": { "href": "/orders/12345" },
    "cancel": { "href": "/orders/12345/cancel", "method": "POST" },
    "pay": { "href": "/orders/12345/payment", "method": "POST" },
    "items": { "href": "/orders/12345/items" }
  }
}
// After payment, the "pay" link disappears and "refund" appears
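What a hypermedia-aware client looks like in practice: rather than hard-coding the workflow, it checks which transitions the response offers. A minimal sketch, assuming the fetch API and the link shape above:

const order = await (await fetch('/orders/12345')).json()

// Act on whatever transitions the server currently offers.
if (order._links.cancel) {
  await fetch(order._links.cancel.href, { method: order._links.cancel.method })
} else {
  console.log('Order can no longer be cancelled')
}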

Fielding’s 2008 clarification: “A REST API should be entered with no prior knowledge beyond the initial URI and set of standardized media types.”

Why most APIs ignore it: HATEOAS requires clients to be generic hypermedia browsers rather than knowing the API structure upfront. In practice, client developers prefer explicit API documentation over runtime discovery. The overhead of parsing links adds latency without practical benefit for known API contracts.

Leonard Richardson’s maturity model (popularized by Martin Fowler) categorizes APIs by REST adherence:

Level | Name | Description | Example
0 | Swamp of POX | Single URI, single verb (usually POST) | SOAP over HTTP
1 | Resources | Multiple URIs, but verbs ignored | /getUser, /createUser
2 | HTTP Verbs | Resources + proper verb semantics | GET /users/123, POST /users
3 | Hypermedia | Full HATEOAS with link navigation | GitHub API, PayPal

Production reality: Most “REST” APIs are Level 2. Level 3 adds complexity without proportional benefit for APIs with stable contracts and good documentation.

Best for:

  • Public APIs (universal tooling, curl-debuggable)
  • Cacheable resources (CDN integration, HTTP cache headers)
  • CRUD-dominant operations
  • Browser clients without build tooling

Real-world: Stripe, Twilio, and GitHub use REST for public APIs. Stripe’s API has maintained backward compatibility since 2011 using REST’s uniform interface combined with careful versioning.

Limitations:

  • Over-fetching: Fixed response shapes waste bandwidth
  • Under-fetching: Multiple round trips for related data
  • No streaming: HTTP/1.1 request-response only
  • Chatty for mobile: Each resource requires separate request

gRPC

gRPC uses Protocol Buffers (protobuf) for serialization: a binary format that's 3-10x smaller than JSON and 20-100x faster to parse.

Wire format efficiency:

user.proto

syntax = "proto3";

message User {
  int64 id = 1;             // field number 1 (1-byte tag)
  string name = 2;          // field number 2
  repeated string tags = 3;
}

Encoding details (a worked example follows the list):

  • Field tags use varint encoding: numbers 1-15 take 1 byte, 16-2047 take 2 bytes
  • Strings are length-prefixed (no delimiters)
  • Default values (0, "", false) aren’t transmitted
  • Unknown fields are preserved (forward compatibility)
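Here is the tag-plus-varint scheme worked through by hand (this is the canonical walkthrough from the protobuf documentation), shown as comments:

// message User { int64 id = 1; }  serialized with id = 150:
//
//   tag byte:   (field_number << 3) | wire_type = (1 << 3) | 0 = 0x08
//   varint 150: 0x96 0x01  (7-bit groups, least-significant group first)
//
//   entire message: 08 96 01  -- 3 bytes, vs. {"id":150} at 10 bytes of JSON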

Size comparison (same user object):

Format | Size | Parse Time
JSON | 95 bytes | 2.1 ms
Protobuf | 28 bytes | 0.08 ms

gRPC mandates HTTP/2, enabling features impossible with HTTP/1.1:

Multiplexing: Multiple requests share one TCP connection. No head-of-line blocking at HTTP level (though TCP still has it).

Four communication patterns:

Pattern | Use Case | Example
Unary | Request-response | GetUser(id) → User
Server streaming | Large result sets, real-time updates | ListOrders() → stream Order
Client streaming | File upload, batched writes | stream Chunk → UploadResult
Bidirectional | Chat, real-time sync | stream Message ↔ stream Message

Server streaming example:

service OrderService {
  rpc ListOrders(ListRequest) returns (stream Order);
}

Client receives orders as they’re fetched from database—no need to buffer entire response.
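Consuming that stream from Node might look like the sketch below, assuming a client stub generated for OrderService with @grpc/grpc-js (orderClient and the request shape are illustrative):

// Server-streaming call: the client receives orders as events.
const call = orderClient.listOrders({ customerId: '123' })

call.on('data', (order) => {
  // Handle each order as it arrives; nothing buffers the full result set.
  process.stdout.write(`order ${order.id}: ${order.total}\n`)
})
call.on('error', (err) => console.error('stream failed:', err))
call.on('end', () => console.log('all orders received'))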

Netflix: Migrated recommendation serving to gRPC. Results:

  • 90% latency reduction for live predictions
  • 30,000 concurrent prediction requests per node
  • Recommendations delivered in <25ms

Why the improvement: Binary serialization eliminates JSON parsing overhead. HTTP/2 multiplexing reduces connection setup. Streaming enables incremental responses.

Uber: Uses gRPC for real-time location tracking. Key benefit: low battery impact on mobile from efficient encoding and persistent connections.

Browser support: gRPC-Web requires a proxy (Envoy) to translate between gRPC and browser-compatible format. Native browser gRPC isn’t possible due to lack of HTTP/2 trailer support in browser APIs.

Debugging: Binary format isn’t human-readable. Requires tooling (grpcurl, Postman) instead of curl.

Load balancer compatibility: L7 load balancers need gRPC-aware configuration. Connection-level balancing (L4) doesn’t distribute requests evenly because gRPC multiplexes requests over persistent connections.

Recommendation: Use gRPC between services you control. Keep REST for public APIs and browser clients.

GraphQL

GraphQL lets clients specify exactly what data they need:

query {
  user(id: "123") {
    name
    email
    orders(first: 5) {
      id
      total
      items {
        productName
      }
    }
  }
}

REST equivalent: at least three requests (/users/123, then /users/123/orders, then /orders/{id}/items for each order).

Trade-off: Server complexity increases. Each field is a potential code path. Authorization becomes per-field instead of per-endpoint.

GraphQL’s per-field resolvers create the N+1 query problem:

Query: users(first: 100) { orders { ... } }

Naive execution:
  1. SELECT * FROM users LIMIT 100           -- 1 query
  2. SELECT * FROM orders WHERE user_id = 1  -- +1 query per user
  3. SELECT * FROM orders WHERE user_id = 2
  ...
  (101 queries total: the N+1 problem)

DataLoader solution (Facebook): Batch and deduplicate requests within a single tick of the event loop:

// DataLoader batches loads issued within the same execution frame
import DataLoader from 'dataloader'

const userLoader = new DataLoader(async (userIds) => {
  // Single query: SELECT * FROM users WHERE id IN (1, 2, 3, ...)
  const users = await db.users.findByIds(userIds)
  // Return results in the same order as the input keys (DataLoader contract)
  return userIds.map((id) => users.find((u) => u.id === id))
})

// In a resolver
const resolvers = {
  Order: {
    user: (order) => userLoader.load(order.userId),
  },
}

Critical rule: Create new DataLoader instance per request. Sharing loaders leaks data between users.
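Concretely, that means constructing loaders inside whatever per-request hook your server provides. A sketch in the shape of Apollo Server's context factory (batchLoadUsers stands in for the batch function above):

// The context factory runs once per request, so every request gets
// fresh loaders -- and the loader cache dies with the request.
const createContext = async ({ req }) => ({
  loaders: {
    user: new DataLoader(batchLoadUsers),
  },
})
// Resolvers then use: context.loaders.user.load(order.userId)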

Shopify’s approach: Built GraphQL Batch Ruby library inspired by DataLoader. Reduced database queries by 10-100x for complex queries.

HTTP caching doesn't work out of the box: GraphQL typically sends every query as a POST (to carry the query body), and POST responses aren't cached by CDNs or browsers.

Solutions:

  1. Persisted queries: Hash queries server-side, clients send hash instead of full query. Enables GET requests and CDN caching.
  2. Response caching: Cache at resolver level (per-field), not HTTP level.
  3. Client-side: Apollo Client normalizes responses into entity cache.

Real-world: GitHub’s GraphQL API uses persisted queries for their mobile apps. Reduces payload size and enables caching.
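A sketch of the persisted-query flow, loosely following the shape of Apollo's automatic persisted queries (the hash value and endpoint are illustrative):

// The query text is hashed at build time and registered with the server;
// at runtime the client sends only the hash, which fits in a GET URL.
const sha256Hash = 'a1b2c3...' // sha256 of the query document

const res = await fetch(
  '/graphql?' +
    new URLSearchParams({
      extensions: JSON.stringify({ persistedQuery: { version: 1, sha256Hash } }),
      variables: JSON.stringify({ id: '123' }),
    })
)
// GET plus a stable URL means CDNs and browsers can cache the response.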

Best for:

  • Mobile apps with varying data requirements (reduce round trips)
  • Aggregating multiple backend services (BFF pattern)
  • Rapidly evolving frontends (no backend changes for new field combinations)

Not suitable for:

  • Simple CRUD APIs (overhead not justified)
  • File uploads (requires multipart extensions)
  • Public APIs (attack surface for malicious queries)

Query complexity protection: production GraphQL servers must limit the following (a depth-check sketch follows this list):

  • Query depth (prevent deeply nested queries)
  • Query complexity (weighted field costs)
  • Rate limiting per query cost, not just requests
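A minimal depth check, as a sketch using the graphql reference implementation's parser (fragments are ignored for brevity; production servers usually combine this with per-field cost weights):

import { parse } from 'graphql'

// Depth of a selection: leaf fields count as 0, each nesting level adds 1.
const depth = (node) =>
  node.selectionSet
    ? 1 + Math.max(0, ...node.selectionSet.selections.map(depth))
    : 0

function assertDepth(query, maxDepth = 10) {
  const doc = parse(query)
  const d = Math.max(...doc.definitions.map(depth))
  if (d > maxDepth) throw new Error(`query depth ${d} exceeds limit ${maxDepth}`)
}

assertDepth('query { users { friends { friends { name } } } }') // ok: depth 4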

API Versioning

URL Path Versioning

Mechanism: Version in the URL path: /v1/users, /v2/users

When to use:

  • Public APIs with long-lived versions
  • Need to run multiple versions simultaneously
  • Clear separation between versions

Trade-offs:

  Pros:
  • Explicit, impossible to miss
  • Easy routing at the load balancer/gateway
  • Different versions can be separate deployments

  Cons:
  • URL pollution (version is not a resource attribute)
  • Clients must update all URLs for a new version

Real-world: Twitter, Facebook, and Google Maps use URL versioning. Google runs v1 and v2 simultaneously for years during migrations.

Header Versioning

Mechanism: Version in a custom header or the Accept header

GET /users HTTP/1.1
Accept: application/vnd.myapi.v2+json

When to use:

  • Clean URLs are priority
  • Gradual migration between versions
  • Same resource, different representations

Trade-offs:

  Pros:
  • URLs remain stable
  • Easier version negotiation

  Cons:
  • Hidden, easy to forget
  • Harder to test (can't just change the URL)
  • Some tools don't support custom headers easily

Date-Based Versioning (Stripe)

Mechanism: Version by release date: Stripe-Version: 2024-01-28

How Stripe implements it:

  1. Account pinning: First API call pins account to current version
  2. Header override: Stripe-Version header overrides for testing
  3. Version change modules: Internal code transforms responses between versions

GET /v1/customers HTTP/1.1
Stripe-Version: 2024-01-28

Why it works for Stripe:

  • Breaking changes are rare (biannual)
  • Version modules handle transformation at edges
  • Core code stays clean—versions are an adapter layer
  • 72-hour rollback window after upgrade

Trade-off accepted: Complex internal architecture. Version transformation code accumulates over time.
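A sketch of the version-module idea (names and shapes are illustrative, not Stripe's actual internals): each module captures one breaking change, and responses pass through every module between the current version and the account's pinned version:

// Version modules, newest first. Each downgrades a response across
// exactly one breaking change.
const changes = [
  {
    version: '2024-01-28',
    description: "renamed 'legacy_id' to 'id'",
    downgrade: (customer) => ({ ...customer, legacy_id: customer.id }),
  },
  // ...one module per dated release that broke something
]

function renderForVersion(resource, pinnedVersion) {
  // Apply every change released after the pinned version, newest first.
  return changes
    .filter((change) => change.version > pinnedVersion)
    .reduce((res, change) => change.downgrade(res), resource)
}

Core handlers always produce the latest shape; older versions exist only at this adapter layer.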

Result: Code from 2011 still works. Stripe prioritizes API stability as infrastructure.

What constitutes a breaking change:

  • Removing fields
  • Changing field types
  • Changing field semantics
  • Removing endpoints
  • Changing error codes

Safe changes (additive):

  • Adding new fields
  • Adding new endpoints
  • Adding new optional parameters
  • Adding new enum values (if clients handle unknown values)

Deprecation pattern:

// Response includes a deprecation warning alongside the data
{
  "data": { ... },
  "_warnings": [
    {
      "code": "deprecated_field",
      "message": "Field 'legacy_id' deprecated. Use 'id' instead.",
      "deprecated_at": "2024-01-01",
      "sunset_at": "2025-01-01"
    }
  ]
}

Pagination

Offset Pagination

Mechanism: LIMIT x OFFSET y

GET /orders?limit=20&offset=40

When to use:

  • Small datasets (<10K records)
  • Need “jump to page X” functionality
  • Data changes infrequently

Trade-offs:

  Pros:
  • Simple to implement
  • Supports arbitrary page access
  • Easy to calculate total pages

  Cons:
  • Performance degrades with offset (the database scans and discards rows)
  • Inconsistent results if data changes between pages

Performance cliff: At offset 10,000, Postgres scans and discards 10,000 rows. Page 1: 10ms. Page 1000: several seconds.

Cursor Pagination

Mechanism: An opaque cursor points to a position in the result set

GET /orders?limit=20&after=eyJpZCI6MTIzNH0=

How it works (a code sketch follows the steps):

  1. Encode last item’s sort key as cursor (often base64 JSON)
  2. Query: WHERE id > cursor_id ORDER BY id LIMIT 20
  3. Return next_cursor with response
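Those three steps as code, in a sketch that assumes an Express-style handler and a parameterized SQL client (app and db are illustrative):

const encodeCursor = (row) =>
  Buffer.from(JSON.stringify({ id: row.id })).toString('base64url')
const decodeCursor = (cursor) =>
  JSON.parse(Buffer.from(cursor, 'base64url').toString())

app.get('/orders', async (req, res) => {
  const limit = Math.min(Number(req.query.limit) || 20, 100)
  const afterId = req.query.after ? decodeCursor(req.query.after).id : 0
  // Seek instead of skip: the index answers this directly at any depth.
  const rows = await db.query(
    'SELECT * FROM orders WHERE id > $1 ORDER BY id LIMIT $2',
    [afterId, limit]
  )
  res.json({
    data: rows,
    next_cursor: rows.length === limit ? encodeCursor(rows[rows.length - 1]) : null,
  })
})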

When to use:

  • Large datasets
  • Infinite scroll / real-time feeds
  • Data changes frequently

Trade-offs:

  Pros:
  • Consistent performance regardless of page depth
  • Stable results despite concurrent inserts
  • 17x faster than offset for deep pagination (measured on 1M rows)

  Cons:
  • No "jump to page" capability
  • Can't easily show "page 5 of 100"

Real-world: Twitter, Facebook, and Slack use cursor pagination for feeds. Twitter’s cursor encodes timestamp + tweet ID for deterministic ordering.

Keyset Pagination

Mechanism: Use actual column values instead of an opaque cursor

GET /orders?limit=20&created_after=2024-01-15T10:30:00Z&id_after=12345

Why two columns: Timestamps alone aren’t unique. Tie-breaker (usually ID) ensures deterministic ordering.

Query:

SELECT * FROM orders
WHERE (created_at, id) > ('2024-01-15T10:30:00Z', 12345)
ORDER BY created_at, id
LIMIT 20

Requires: Composite index on (created_at, id)

Trade-offs:

  Pros:
  • Same performance benefits as cursor pagination
  • Debuggable (values visible, not encoded)

  Cons:
  • Exposes internal schema
  • Harder if sort order changes dynamically

Comparison of the three approaches:

Factor | Offset | Cursor | Keyset
Dataset size | <10K | Any | Any
Page depth | Shallow only | Any | Any
Jump to page | Yes | No | No
Real-time data | Inconsistent | Stable | Stable
Implementation | Simple | Medium | Medium
Performance (deep) | O(offset) | O(1) | O(1)

Rate Limiting

Token Bucket

Mechanism: A bucket holds tokens. Each request consumes a token. Tokens refill at a fixed rate up to the bucket capacity.

Parameters:

  • Bucket capacity: Maximum burst size
  • Refill rate: Sustained request rate

Behavior:

  • Allows bursts up to capacity
  • After burst, rate limited to refill rate
  • Unused capacity accumulates (up to max)

Example (capacity 10, refill 1/second):

  t=0:  10 tokens; 10 requests → 0 tokens
  t=5:  5 tokens (refilled); 3 requests → 2 tokens
  t=10: 7 tokens (2 + 5 refilled)
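The whole algorithm fits in a few lines. An in-process sketch (a production limiter would keep the count and timestamp in shared storage such as Redis):

class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity
    this.refillPerSec = refillPerSec
    this.tokens = capacity // start full: permits an initial burst
    this.lastRefill = Date.now()
  }

  allow() {
    const elapsedSec = (Date.now() - this.lastRefill) / 1000
    // Refill continuously; unused capacity accumulates up to the cap.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec)
    this.lastRefill = Date.now()
    if (this.tokens < 1) return false
    this.tokens -= 1
    return true
  }
}

const limiter = new TokenBucket(10, 1) // capacity 10, refill 1/second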

Trade-offs:

  Pros:
  • Allows legitimate bursts
  • Simple state (count + timestamp)
  • Memory efficient

  Cons:
  • A burst can overwhelm downstream systems
  • Requires tuning capacity vs. refill rate

Real-world: AWS API Gateway, Nginx use token bucket. AWS allows bursts of 5000 requests, then 10,000/second sustained.

Leaky Bucket

Mechanism: Requests enter a bucket and leak out at a constant rate. Overflow is rejected.

Key difference from token bucket: Output rate is constant, not bursty.
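A queue-based sketch: requests wait in a bounded queue that drains at the leak rate (setInterval stands in for a real scheduler):

class LeakyBucket {
  constructor(capacity, leakPerSec, handler) {
    this.queue = []
    this.capacity = capacity
    // Drain at a constant rate, no matter how requests arrived.
    setInterval(() => {
      const next = this.queue.shift()
      if (next) handler(next)
    }, 1000 / leakPerSec)
  }

  offer(request) {
    if (this.queue.length >= this.capacity) return false // overflow rejected
    this.queue.push(request)
    return true
  }
}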

Trade-offs:

  Pros:
  • Smooth, predictable output rate
  • Protects downstream from bursts

  Cons:
  • No burst allowance: legitimate spikes are rejected
  • Can reject requests even under the limit if they arrive in bursts

Use case: Traffic shaping where downstream can’t handle bursts (legacy systems, rate-limited third-party APIs).

Sliding Window

Sliding Window Log: Store the timestamp of each request. Count requests in the last N seconds.

Sliding Window Counter: Combine current window count with weighted previous window.

Example (window: 60 seconds):

  Previous window: 50 requests
  Current window:  30 requests (25 seconds into the window)
  Effective count: 30 + (50 × 35/60) ≈ 59.2 requests
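The same estimate as a function (matching the numbers above):

// Approximate the request count in the sliding window from two counters.
function effectiveCount(prevCount, currCount, elapsedSec, windowSec) {
  const prevWeight = (windowSec - elapsedSec) / windowSec
  return currCount + prevCount * prevWeight
}

effectiveCount(50, 30, 25, 60) // ≈ 59.2; reject if over the limit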

Trade-offs:

Variant | Memory | Accuracy | Burst Handling
Log | O(requests) | Exact | Smooth
Counter | O(1) | Approximate | Smooth

Real-world: Redis-based rate limiters often use sliding window counter for balance of accuracy and memory efficiency.

Rate Limit Headers

Standard headers (RFC 6585 plus draft-ietf-httpapi-ratelimit-headers):

HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 1640995200
Retry-After: 30

Best practice: Always include headers so clients can back off gracefully. Include Retry-After on 429 responses.
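On the client side, honoring those headers might look like this sketch (it assumes Retry-After arrives in its delta-seconds form):

async function fetchWithBackoff(url, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url)
    if (res.status !== 429) return res
    // Prefer the server's hint; fall back to exponential backoff.
    const retryAfter = res.headers.get('Retry-After')
    const waitSec = retryAfter ? Number(retryAfter) : 2 ** attempt
    await new Promise((resolve) => setTimeout(resolve, waitSec * 1000))
  }
  throw new Error(`still rate limited after ${maxRetries} retries: ${url}`)
}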

API Documentation

OpenAPI

Current version: OpenAPI 3.1 (2021), which achieved full JSON Schema compatibility.

What it defines:

  • Endpoints, methods, parameters
  • Request/response schemas
  • Authentication methods
  • Server URLs

Tooling ecosystem:

  • Code generation (client SDKs, server stubs)
  • Documentation (Swagger UI, Redoc)
  • Testing (Postman, Insomnia import)
  • Validation (request/response checking)

Best practice: Generate OpenAPI from code annotations (not manual YAML). Keeps spec synchronized with implementation.

AsyncAPI

Purpose: The OpenAPI equivalent for message-based APIs (Kafka, RabbitMQ, WebSockets).

Key differences from OpenAPI:

  • Channels instead of paths
  • Messages instead of request/response
  • Bindings for protocol-specific config
asyncapi: 3.0.0
info:
  title: Order Events
  version: 1.0.0
channels:
  orderCreated:
    address: orders/created
    messages:
      orderCreated:
        payload:
          type: object
          properties:
            orderId: { type: string }
            total: { type: number }

Tooling: Springwolf generates AsyncAPI from Spring Kafka/RabbitMQ annotations.

Case Studies

Slack: REST + WebSockets

Problem: Slack needed real-time messaging, REST-like simplicity for integrations, and efficient mobile sync.

Solution:

  • Web API: REST for third-party integrations (familiar, cacheable)
  • RTM API: WebSocket for real-time events
  • Events API: Webhooks for server-to-server

Why not just gRPC: Slack prioritized developer experience. REST + WebSocket was more accessible than requiring protobuf tooling from integration developers.

Netflix: gRPC for Internal Services

Scale: 100% of internal traffic uses gRPC.

Why:

  • Multilingual services (Java, Node, Python) need shared contract
  • Built-in load balancing and health checking
  • Streaming for large recommendation payloads
  • Protocol buffers reduce bandwidth (significant at Netflix scale)

Trade-off accepted: Built custom tooling for debugging. Invested in gRPC-Web proxy for admin UIs.

GitHub: GraphQL Alongside REST

Problem: The mobile app needed flexible queries; the REST API required many round trips.

Solution: GraphQL API alongside REST.

Implementation details:

  • Persisted queries for mobile (reduced payload, enabled caching)
  • Query complexity limits (prevent abuse)
  • REST API maintained for backward compatibility

Outcome: Mobile app performance improved. Developer adoption slower than expected—GraphQL learning curve higher than REST.

Common Mistakes

Mistake: gRPC for Browser Clients

The mistake: Using gRPC for browser-facing APIs expecting the same ease as REST.

Why it happens: Performance benchmarks show gRPC faster. Teams assume faster = better everywhere.

The consequence: Need gRPC-Web proxy, lose browser DevTools debugging, complicate frontend build.

The fix: Use gRPC for service-to-service. Keep REST/GraphQL for browser clients.

Mistake: Unbounded GraphQL Queries

The mistake: Exposing GraphQL without query depth/cost limits.

Why it happens: Focus on functionality, security as afterthought.

The consequence: Malicious queries exhaust server resources:

query {
  users { friends { friends { friends { friends { ... } } } } }
}

The fix: Implement query complexity analysis. Limit depth (typically 10-15). Assign costs to fields. Reject queries exceeding budget.

Mistake: Deep Offset Pagination

The mistake: Using LIMIT 20 OFFSET 100000 for API pagination.

Why it happens: Offset pagination is intuitive and works fine in development.

The consequence: Production queries take 10+ seconds. Database CPU spikes. Users report “infinite loading” on deep pages.

The fix: Switch to cursor pagination. If “jump to page” needed, limit maximum offset (e.g., 10,000).

Mistake: Breaking Changes Without Versioning

The mistake: Changing field types or removing fields in a production API.

Why it happens: “It’s just a small change” or “no one uses that field.”

The consequence: Client applications break. Mobile apps (can’t force update) fail for weeks.

The fix: Treat API as infrastructure. Additive changes only. Version for breaking changes. Deprecate before removing.

Summary

Protocol selection requires matching characteristics to constraints:

Constraint | REST | gRPC | GraphQL
Public API | Best | Proxy needed | Query limits required
Internal services | Verbose | Best | Overkill
Mobile apps | Many round trips | Efficient | Flexible
Browser direct | Native | Needs gRPC-Web | Works
Streaming | Workarounds | Native | Subscriptions only
Caching | HTTP native | Custom | Complex

The pragmatic approach:

  • REST for public APIs and browser-direct calls
  • gRPC for internal service mesh
  • GraphQL when client flexibility outweighs server complexity
  • Often: multiple protocols in the same system

Versioning, pagination, and rate limiting aren’t optional add-ons—they’re fundamental to operating APIs at scale. Stripe’s decade of API stability demonstrates that treating APIs as infrastructure pays long-term dividends.

Prerequisites:

  • HTTP/1.1 and HTTP/2 fundamentals
  • Basic understanding of serialization formats (JSON, binary)
  • Database query basics (for the pagination section)

Key takeaways:

  • REST optimizes for cacheability and a uniform interface; most "REST" APIs are Level 2 on the Richardson maturity model
  • gRPC provides 3-10x efficiency over JSON with binary serialization and HTTP/2 streaming; use it for internal services
  • GraphQL shifts query complexity to clients; it requires N+1 mitigation (DataLoader) and query complexity limits
  • Cursor pagination outperforms offset by 17x on large datasets; use offset only for small, static data
  • Token bucket allows bursts, leaky bucket enforces a constant rate; choose based on downstream tolerance
  • Version APIs from day one; Stripe's date-based versioning maintains 13+ years of backward compatibility
