
Building for Scale: Architecture Patterns That Actually Work

Most scaling advice is generic. Here are the patterns that have consistently worked across real systems handling millions of requests — and the ones that sound good but fail in practice.

Durval Pereira

The problem with scaling advice

Most articles about scaling repeat the same playbook: add caching, use a CDN, shard your database, go async. This advice isn't wrong — it's just incomplete. It skips the part where you have to figure out which of these things matter for your system, when to apply them, and how to avoid creating new problems while solving old ones.

After building and scaling systems across different domains — from high-throughput APIs to real-time data pipelines — I've found that the most useful scaling patterns share a common trait: they reduce the blast radius of failure while increasing the independence of components.

Start with the bottleneck, not the architecture diagram

The single most common mistake in scaling is optimizing the wrong thing. Before you redesign your system, you need to know where it actually breaks.

// A simple but effective approach: instrument first, optimize second
import { metrics } from '@/lib/telemetry'

export async function handleRequest(req: Request) {
  const timer = metrics.startTimer('request_duration')
  const route = extractRoute(req)

  try {
    const result = await processRequest(req)
    timer.end({ route, status: 'success' })
    return result
  } catch (error) {
    timer.end({ route, status: 'error' })
    throw error
  }
}

This isn't glamorous. But knowing that 80% of your latency comes from three specific database queries is worth more than any architecture diagram.

Pattern 1: The read replica with smart routing

The simplest scaling pattern that delivers outsized results. Most applications are read-heavy — often 90% reads or more. Routing reads to replicas is straightforward, but the implementation details matter.

interface DatabaseRouter {
  write(userId?: string): DatabaseConnection
  read(userId?: string, consistency?: 'eventual' | 'strong'): DatabaseConnection
}

class SmartRouter implements DatabaseRouter {
  // Tracks the timestamp of each user's last write
  private recentWrites: Map<string, number> = new Map()

  constructor(
    private primary: DatabaseConnection,
    private replicas: DatabaseConnection[],
    private readYourWritesMs = 5000
  ) {}

  write(userId?: string): DatabaseConnection {
    if (userId) this.recentWrites.set(userId, Date.now())
    return this.primary
  }

  read(userId?: string, consistency: 'eventual' | 'strong' = 'eventual'): DatabaseConnection {
    if (consistency === 'strong') {
      return this.primary
    }
    // Route a user's reads to the primary for a short window after their own writes
    const lastWrite = userId ? this.recentWrites.get(userId) : undefined
    if (lastWrite !== undefined && Date.now() - lastWrite < this.readYourWritesMs) {
      return this.primary
    }
    return this.selectReplica()
  }

  private selectReplica(): DatabaseConnection {
    const healthy = this.replicas.filter((r) => r.isHealthy())
    if (healthy.length === 0) return this.primary
    return healthy[Math.floor(Math.random() * healthy.length)]
  }
}

The recentWrites map is key. After a user writes data, you route their reads to the primary for a short window to avoid read-your-writes inconsistency. This is the kind of detail that generic scaling advice misses.

Pattern 2: Tiered caching with explicit invalidation

Caching is easy. Cache invalidation is the actual problem. The most robust approach I've found uses explicit tiers with clear ownership.

Tier 1 — Request-level: Memoize within a single request. Zero risk, massive benefit for repeated computations.

Tier 2 — Application-level: In-memory cache (like a TTL map) for hot data. Fast but requires careful sizing.

Tier 3 — Distributed cache: Redis or Memcached for shared state. Adds a network hop but scales horizontally.

Tier 4 — CDN edge: For static and semi-static content. The most effective tier for read-heavy public APIs.

The mistake most teams make is jumping straight to Tier 3 or 4 without exhausting the value of Tiers 1 and 2. An in-memory cache with a 30-second TTL can eliminate the vast majority of identical database queries with zero infrastructure cost.
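As a concrete illustration of Tier 2, here is a minimal sketch of an in-memory TTL cache. The names (`TtlCache`, `maxEntries`) and the eviction policy are illustrative assumptions, not a specific library's API:

```typescript
// A minimal Tier 2 in-memory cache: TTL expiry plus a size bound.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>()

  constructor(private ttlMs: number, private maxEntries = 10_000) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key)
    if (!entry) return undefined
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key) // Lazily evict expired entries on read
      return undefined
    }
    return entry.value
  }

  set(key: string, value: V): void {
    // When full, evict the oldest entry (Map preserves insertion order)
    if (this.store.size >= this.maxEntries) {
      const oldest = this.store.keys().next().value
      if (oldest !== undefined) this.store.delete(oldest)
    }
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs })
  }
}
```

The typical usage pattern is cache-aside: check the cache, and on a miss run the query and store the result. The sizing knob matters — an unbounded in-memory cache is just a slow memory leak.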

Pattern 3: Backpressure as a feature

Most systems fail not because a component is slow, but because a slow component gets overwhelmed by a fast producer. Backpressure — the ability for a consumer to signal that it can't keep up — is essential at scale.

class BoundedQueue<T> {
  private queue: T[] = []
  private waiters: Array<(value: T) => void> = []

  constructor(private maxSize: number) {}

  async enqueue(item: T): Promise<boolean> {
    if (this.queue.length >= this.maxSize) {
      return false // Signal backpressure
    }

    if (this.waiters.length > 0) {
      const waiter = this.waiters.shift()!
      waiter(item)
    } else {
      this.queue.push(item)
    }
    return true
  }

  async dequeue(): Promise<T> {
    if (this.queue.length > 0) {
      return this.queue.shift()!
    }

    return new Promise((resolve) => {
      this.waiters.push(resolve)
    })
  }
}

When enqueue returns false, the producer knows to back off. This is vastly better than unbounded queues that eat memory until the process crashes, or circuit breakers that drop requests without the producer knowing why.
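On the producer side, "back off" usually means retrying with an increasing delay. A sketch of that loop, assuming a queue shaped like the BoundedQueue above (the delay values and `maxAttempts` are illustrative):

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Retry an enqueue with capped exponential backoff when the queue is full.
async function produceWithBackoff<T>(
  queue: { enqueue(item: T): Promise<boolean> },
  item: T,
  maxAttempts = 5
): Promise<boolean> {
  let delayMs = 10
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (await queue.enqueue(item)) return true
    await sleep(delayMs)
    delayMs = Math.min(delayMs * 2, 1000) // Double the delay, capped at 1s
  }
  return false // Shed load at the edge rather than overwhelm the consumer
}
```

Returning false after `maxAttempts` pushes the load-shedding decision to the caller, which can reject the request with a clear error instead of letting memory grow unboundedly downstream.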

What doesn't work

A few patterns that sound reasonable but consistently cause problems:

Premature microservices. If your team can't deploy a monolith reliably, microservices won't help — they'll multiply your operational problems. Start with a well-structured modular monolith.

Shared databases across services. This creates invisible coupling that makes independent scaling impossible. If two services share a database, they are not separate services — they're a distributed monolith.

Over-reliance on async processing. Making everything async can create debugging nightmares and user experience problems. Some operations should be synchronous and fast.

The meta-pattern

The real scaling pattern isn't any single technique. It's this: make components independently deployable, independently scalable, and independently observable. Every concrete pattern — read replicas, caching, queuing, sharding — is just a specific application of this principle.

When you evaluate any scaling proposal, ask three questions:

  1. Does this reduce coupling between components?
  2. Does this make failure more localized?
  3. Can I observe and debug this independently?

If the answer is no to any of these, the pattern may solve your immediate problem but create a harder one later.


This is the first in a series on practical system design. Next: how to think about data modeling decisions that scale.

Tags: scaling, distributed-systems, system-design, patterns