SystemDesign2025-04-10

System design - Rate limiter

Rate limit

Rate Limiter

a rate limiter is used to control the rate of traffic sent by a client or a service.

why

prevent Dos attach to prevent prevent server overload
reduce cost if using third party api, like retireve balance

implement

client side: throttle the button click rate, no guarentee (can be modified source code)
server side: most cases implemented within API Gateway
hard limit or soft limit

Algorithms

Token Bucket
Leaking Bucket
Fixed Window Counter
sliding window log
sliding window counter

token bucket: A token bucket is a container that has pre-defined capacity. Tokens are put in the bucket at preset rates periodically. Once the bucket is full, no more tokens are added.

If there is enough token, process request and consume one token
else: request is denied
token bucket
TokenBucket(bucketSize, RefillRate)
- Refill rate: number of tokens put into the bucket every second
How many bucket is needed depends on different api, if a user is allowed to make 1 post per second, add 150 friends per day, the we need 2 buckets

Pros:
    1. easy to implement
    2. memory efficient
Cons:
    1. hard to tune the params properly

Leaking bucket: FIFO queue - if queue is not full, add it to queue - else drop it - Requests are pulled from the queue and processed at regular intervals.

leakingBucket(BucketSize, OutflowRate):
- Outflow rate: it defines how many requests can be processed at a fixed rate, usually in seconds.

Pros:
    1. memory efficient given the limited queue size
    2. Requests are processed at a fixed rate therefore it is suitable for use cases that a stable outflow rate is needed.
Cons:
    1. when queue filled with old requests, burst recent traffic will be dropped
    2. hard to tune params

Fixed window counter - The algorithm divides the timeline into fix-sized time windows and assign a counter for each window - Each request increments the counter by one - Once the counter reaches the pre-defined threshold, new requests are dropped until a new time window starts.

issue: [0.5 : 1] + [1, 1.5] this one second can process 2 * count requests

Pros:
    1. memory efficient
    2. fit certain cases
Cons:
    issue: [0.5 : 1] + [1, 1.5] this one second can process 2 * count requests

sliding window log : fix the issue of fixed window counter - keep track request timestamp in cache (redis) - when new request came, remove all outdated timestamp, eg request comes at 1:01:30, so the frame is [1:00:30 - 1:01:30], remove all timestamp before 1:00:30 - add timestamp to log - if log size <= allowed count, process else rejected

Pros:
    1. accurate rate limit, in any rolling window, request will not exceed limit
Cons:
    1. need memory to store request timestamp

sliding window counter: combine fixed window counter and sliding window log - Requests in current window + requests in the previous window * overlap percentage of the rolling window and previous window - round up or down, then compare with allowed count

Pros:
    1.  It smooths out spikes in traffic because the rate is based on the average rate of the previous window
    2. memory efficient
Cons:
    1. not so strict

implementation Code with Redis

High level architure

in-memory cache like Redis is a good choice, as this is time-based expiration strategy.

high level

Deep Dive

rules:

domain: auth
descriptors:
  - key: auth_type
    Value: login
    rate_limit:
      unit: minute
      requests_per_unit: 5

rules often stored on configs or saved on disk. generally rules will be cached as well.

Headers: client can get information from headers about the rate limit.

X-Ratelimit-Remaining: The remaining number of allowed requests within the window.
X-Ratelimit-Limit: It indicates how many calls the client can make per time window.
X-Ratelimit-Retry-After: The number of seconds to wait until you can make a request again without being throttled.

When a user has sent too many requests, a 429 too many requests error and X-Ratelimit-Retry-After header are returned to the client.

Rate limiter in distributed systems

In distributed systems, generally it will have two problems:

Race condition:
- Lua script
- sorted set
Sync problem:
- same client to same limiter (not scalable)
- better solution:
  - use centralized data stores like Redis, all rate limiter fetch data from Redis

performance

set up multi-data center to reduce latency
sync data in eventual consistency model (trade strict limit for latency)

metric & monitor

limit algorithm is effective
limit rules are effective

too many valid request are dropped -> relax rule flash sales, burst traffic -> change to token buckets

Improvement with Bloom Filters & Count-Min Sketches

Reference

Use Bloom Filter to track requests and estimate whether a user has made requests in the current time window.

Use Count-min Sketches to count the number of requests for each user, ensuring a precise count within the time window

Bloom Filter Class Implementation

import java.util.BitSet;

public class BloomFilter {
    private BitSet bitSet;
    private int size;
    private int[] hashSeeds; // List of seeds for different hash functions

    public BloomFilter(int size, int[] hashSeeds) {
        this.size = size;
        this.bitSet = new BitSet(size);
        this.hashSeeds = hashSeeds;
    }

    // Insert an item (e.g., User ID) into the Bloom Filter
    public void add(String item) {
        for (int seed : hashSeeds) {
            int hash = hash(item, seed);
            bitSet.set(Math.abs(hash % size));
        }
    }

    // Check if an item exists in the Bloom Filter
    public boolean mightContain(String item) {
        for (int seed : hashSeeds) {
            int hash = hash(item, seed);
            if (!bitSet.get(Math.abs(hash % size))) {
                return false; // If any bit is not set, item is definitely not present
            }
        }
        return true; // Otherwise, item might be present
    }

    // Hashing function
    private int hash(String item, int seed) {
        return item.hashCode() * seed;
    }

    // Reset the Bloom filter for a new time window
    public void reset() {
        bitSet.clear();
    }
}

Count-min Sketches Implementation

import java.util.Random;

public class CountMinSketch {
    private int[][] sketch;
    private int depth;
    private int width;
    private int[] hashSeeds;

    public CountMinSketch(int depth, int width) {
        this.depth = depth;
        this.width = width;
        this.sketch = new int[depth][width];
        this.hashSeeds = new int[depth];

        // Initialize hash seeds for each row
        Random random = new Random();
        for (int i = 0; i < depth; i++) {
            hashSeeds[i] = random.nextInt();
        }
    }

    // Add an item and increment its count
    public void add(String item) {
        for (int i = 0; i < depth; i++) {
            int hash = hash(item, hashSeeds[i]);
            sketch[i][hash % width] += 1;
        }
    }

    // Estimate the count of an item
    public int estimateCount(String item) {
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < depth; i++) {
            int hash = hash(item, hashSeeds[i]);
            min = Math.min(min, sketch[i][hash % width]);
        }
        return min;
    }

    // Hashing function
    private int hash(String item, int seed) {
        return item.hashCode() * seed;
    }

    // Reset Count-Min Sketch for a new time window
    public void reset() {
        for (int i = 0; i < depth; i++) {
            for (int j = 0; j < width; j++) {
                sketch[i][j] = 0;
            }
        }
    }
}

Rate Limiter Class Implementation

public class RateLimiter {
    private BloomFilter bloomFilter;
    private CountMinSketch countMinSketch;
    private int maxRequests; // Max allowed requests per user/IP per time window

    public RateLimiter(int bloomFilterSize, int[] hashSeeds, int depth, int width, int maxRequests) {
        this.bloomFilter = new BloomFilter(bloomFilterSize, hashSeeds);
        this.countMinSketch = new CountMinSketch(depth, width);
        this.maxRequests = maxRequests;
    }

    // Process a request from a user and check if it's within the rate limit
    public boolean isRequestAllowed(String userId) {
        if (bloomFilter.mightContain(userId)) {
            // Check exact count in Count-Min Sketch
            int currentCount = countMinSketch.estimateCount(userId);
            if (currentCount >= maxRequests) {
                return false; // Exceeds rate limit
            }
        }

        // If not exceeded, add the request
        bloomFilter.add(userId);
        countMinSketch.add(userId);
        return true; // Request allowed
    }

    // Reset for new time window
    public void reset() {
        bloomFilter.reset();
        countMinSketch.reset();
    }
}

Window Manager Class Implementation

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class WindowManager {
    private RateLimiter rateLimiter;
    private int windowDuration; // Duration in seconds for each sliding window

    public WindowManager(RateLimiter rateLimiter, int windowDuration) {
        this.rateLimiter = rateLimiter;
        this.windowDuration = windowDuration;
    }

    public void startWindowRotation() {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        
        // Schedule BloomFilter and CountMinSketch reset at regular intervals
        scheduler.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                rateLimiter.reset();
            }
        }, windowDuration, windowDuration, TimeUnit.SECONDS);
    }
}

Filed under

SystemDesign

← Previous

System design - Payment - Amazon

System Design - Rate Limiting/Throttling 方法论框架

Back to All Posts