⚡ Rate Limiting Explained – A Complete Beginner’s Guide


The internet is a busy place. Every second, millions of people log in, download files, stream videos, or use APIs. But what happens if too many requests hit a server at once? 👉 It slows down, crashes, or even becomes vulnerable to attacks.

This is where Rate Limiting comes in.


💡 What is Rate Limiting?

Imagine you’re at an amusement park. There’s a popular roller coaster ride, but only 20 people can ride at a time. The staff doesn’t let 200 people rush in all at once — they control the flow to keep things safe and smooth.

That’s exactly what Rate Limiting does for servers: ➡️ It controls how many requests a user (or app) can make in a given time frame.

For example:

  • 10 login attempts per minute per user
  • 100 API requests per hour per IP address

🎯 Why Do We Need Rate Limiting?

  1. Prevent Abuse: Without rate limits, someone could hammer your login endpoint with brute-force attempts to guess passwords.

  2. Avoid Server Overload: Like a cashier handling one customer at a time, servers need breathing room. Too many requests = breakdown.

  3. Control API Usage: If you provide a free API, you don’t want one user making 1 million requests and hogging all resources.

  4. Mitigate DDoS Attacks: Rate limiting acts like a bouncer at a club, turning away suspiciously high traffic before it causes trouble. It softens an attack rather than stopping it outright, so large distributed attacks still need network-level protection.


👩‍💻 Real-Life Examples of Rate Limiting

Rate limiting is something you bump into daily, even if you don’t notice it. Here are some simple cases 👇

  1. Login Pages 🔑 – You get locked out after too many wrong password attempts. This prevents hackers from guessing your password. (Like an ATM locking your card after 3 wrong PIN tries.)

  2. APIs ⚙️ – Services like GitHub or Twitter only allow a fixed number of requests per minute/hour. This keeps their servers safe and fair for everyone.

  3. E-Commerce 🛒 – During sales, websites limit how often you can refresh or checkout, so no one hoards all the stock.

  4. Messaging Apps 💬 – WhatsApp or Telegram stop you from sending too many messages at once to prevent spam.

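  5. Streaming Platforms 🎬 – Netflix limits how many devices can stream from the same account at the same time. (Strictly speaking this is a concurrency limit, a close cousin of rate limiting that is covered below.)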

👉 These examples show that rate limiting isn’t just a “developer thing” — it’s a rule that keeps systems fair, reliable, and safe in the real world.


🛠️ How Does Rate Limiting Work?

At its core, Rate Limiting is about counting requests and deciding whether to allow, delay, or block them.

Think of it as a traffic signal for your server:

  • Green → Allow request
  • Yellow → Slow down or wait
  • Red → Block the request

Here’s a breakdown of the most common algorithms with examples you can relate to 👇

1. Fixed Window Counter

  • The simplest method: “X requests per fixed time window.”
  • Example: Max 100 requests per minute.

👉 Real-life analogy: Think of a parking garage with 100 slots per hour. If you come at 10:15 and all 100 slots are filled, you’re denied entry until 11:00, when the counter resets.

Pros:

  • Simple and fast.
  • Easy to implement.

Cons:

  • Can cause a burst problem at the reset (e.g., 100 requests at 10:59:59 and another 100 at 11:00:00 = 200 requests within about a second, double the intended limit).
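
👉 To make the burst problem concrete, here is a minimal sketch of a fixed window counter. The class name `FixedWindowCounter`, the `allow` method and the key `"user-1"` are made up for illustration, and the clock is passed in by hand so the example does not depend on real time.

```javascript
// Fixed window counter with an injectable clock (timestamps in milliseconds).
class FixedWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counters = new Map(); // key -> { windowStart, count }
  }

  // Returns true if the request is allowed, false if it is over the limit.
  allow(key, now = Date.now()) {
    // Align the window to fixed boundaries (e.g. the start of each minute).
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    let entry = this.counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      entry = { windowStart, count: 0 }; // new window: the counter resets
      this.counters.set(key, entry);
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// 100 requests in the last second of one minute, 100 more at the start of the next.
const limiter = new FixedWindowCounter(100, 60_000);
const endOfMinute = 59_000; // 00:00:59 on a hypothetical clock
const startOfNext = 60_000; // 00:01:00
let accepted = 0;
for (let i = 0; i < 100; i++) if (limiter.allow("user-1", endOfMinute)) accepted++;
for (let i = 0; i < 100; i++) if (limiter.allow("user-1", startOfNext)) accepted++;
console.log(accepted); // 200
```

All 200 requests pass even though the limit is 100 per minute, because each batch lands in a different window.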

2. Sliding Window ⏳➡️⏳

  • Instead of a strict reset, this method looks at a rolling time window (like “last 60 seconds” from now).
  • Example: Max 100 requests in the last 60 seconds.

👉 Real-life analogy: Imagine a movie theater with 100 tickets available for “any rolling 1-hour period.” If 70 people buy tickets between 7:00 and 7:30, and 40 more try between 7:30 and 7:45, only 30 will be allowed in, because the earlier 70 still count toward the last hour. As time passes and old purchases “slide out” of the window (starting at 8:00 here), new ones get space.

Pros:

  • Smoother traffic handling.
  • No burst at the window boundary, because there is no reset moment to exploit.

Cons:

  • Slightly more complex to implement, and it uses more memory when it stores a timestamp for every request.

3. Token Bucket 🎟️

  • Each user gets a “bucket of tokens.” Each request uses up 1 token.
  • Tokens refill slowly over time (say 1 token per second). If you run out, you wait.

👉 Real-life analogy: Think of a water cooler that drips water into a cup at a steady pace. You can drink when there’s water. If you gulp too fast and empty it, you must wait until it refills.

Pros:

  • Allows short bursts while keeping long-term limits.
  • Very popular for APIs.

Cons:

  • Requires tracking tokens per user/IP.

4. Leaky Bucket 🪣

  • Similar to token bucket, but flipped around: requests fill the bucket, and the bucket drains (requests get processed) at a fixed rate.
  • Incoming requests are added to the bucket. If it overflows → extra requests are dropped.

👉 Real-life analogy: Think of a funnel. No matter how fast you pour, it drains at a steady rate. Pour too fast and the extra spills over the sides and is wasted.

Pros:

  • Keeps traffic smooth and predictable.

Cons:

  • Requests that arrive in a burst larger than the bucket are dropped, even if the server has spare capacity.
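
👉 A rough sketch of the idea, assuming the "meter" flavor of the leaky bucket where overflowing requests are rejected instead of queued. The `LeakyBucket` class and its parameters are hypothetical names, and time is passed in manually to keep the example predictable.

```javascript
// Leaky bucket used as a meter: `level` is how full the bucket currently is.
class LeakyBucket {
  constructor(capacity, leakPerSecond) {
    this.capacity = capacity;
    this.leakPerSecond = leakPerSecond;
    this.level = 0;
    this.lastLeak = null;
  }

  // Returns true if the request fits in the bucket, false if it overflows.
  allow(now = Date.now()) {
    if (this.lastLeak === null) this.lastLeak = now;
    // Drain the bucket for the time that has passed since the last call.
    const elapsedSec = (now - this.lastLeak) / 1000;
    this.level = Math.max(0, this.level - elapsedSec * this.leakPerSecond);
    this.lastLeak = now;
    if (this.level + 1 > this.capacity) return false; // overflow: drop the request
    this.level += 1;
    return true;
  }
}

const bucket = new LeakyBucket(5, 1); // holds 5 requests, drains 1 per second
const burst = [];
for (let i = 0; i < 7; i++) burst.push(bucket.allow(0)); // 7 requests at t = 0
console.log(burst); // the first 5 are true, the last 2 are false
const later = bucket.allow(2000); // 2 seconds later, 2 slots have drained
console.log(later); // true
```

A queue-based variant would hold the extra requests and process them later instead of dropping them. Which one fits depends on whether a delayed response is still useful to the caller.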

5. Concurrency Limits 👥

  • Instead of requests per second, you limit how many requests a user can run at the same time.

👉 Real-life analogy: At a bank counter, only 3 people can be served simultaneously. If 10 people walk in, the rest must wait in line until a counter is free.
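
👉 A minimal sketch of a per-user concurrency limit. The `ConcurrencyLimiter` class and its `tryAcquire` / `release` methods are illustrative names, and this version turns extra requests away instead of making them wait in line.

```javascript
// Tracks how many operations each key (user, IP, ...) has in progress.
class ConcurrencyLimiter {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.active = new Map(); // key -> number of operations in progress
  }

  // Claims a slot; returns false when the key is already at its limit.
  tryAcquire(key) {
    const current = this.active.get(key) || 0;
    if (current >= this.maxConcurrent) return false;
    this.active.set(key, current + 1);
    return true;
  }

  // Frees a slot; must be called once for every successful tryAcquire.
  release(key) {
    const current = this.active.get(key) || 0;
    if (current <= 1) this.active.delete(key);
    else this.active.set(key, current - 1);
  }
}

const uploads = new ConcurrencyLimiter(2);
const first = uploads.tryAcquire("user-1");  // slot 1 of 2
const second = uploads.tryAcquire("user-1"); // slot 2 of 2
const third = uploads.tryAcquire("user-1");  // no free slot
uploads.release("user-1");                   // one upload finishes
const fourth = uploads.tryAcquire("user-1"); // a slot is free again
console.log(first, second, third, fourth); // true true false true
```

In an Express app you would probably call `release` when the response ends, for example from a `res.on("finish", ...)` or `res.on("close", ...)` callback, so a slot is never leaked.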

6. Dynamic / Adaptive Rate Limiting 🤖

  • Smarter systems adjust limits based on current server load or user reputation.
  • Example: If the server is 90% busy, it cuts request limits in half automatically.

👉 Real-life analogy: During festival season, a store might tighten entry (only 50 customers inside at once) compared to a normal day (200 allowed).
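
👉 The "cut limits in half at 90% load" rule could be sketched like this. The function name `adaptiveLimit` and the `threshold` parameter are assumptions for illustration; where the load figure comes from (for instance `os.loadavg()` in Node.js, or your own metrics) is left open.

```javascript
// Returns the limit to apply right now, given a server load between 0 and 1.
function adaptiveLimit(baseLimit, load, threshold = 0.9) {
  // At or above the threshold, halve the limit to shed traffic.
  if (load >= threshold) return Math.floor(baseLimit / 2);
  return baseLimit;
}

console.log(adaptiveLimit(100, 0.5));  // 100
console.log(adaptiveLimit(100, 0.95)); // 50
```

The returned value can then be fed into any of the algorithms above as the current maximum.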

✅ Putting It All Together

Different systems use different algorithms depending on needs:

  • Fixed Window → Good for simple apps.
  • Sliding Window → Best when fairness matters.
  • Token Bucket → Perfect for APIs with bursty traffic (short bursts are allowed).
  • Leaky Bucket → Good when the backend needs a steady, predictable flow of requests.
  • Concurrency Limit → Useful for heavy operations (like file uploads).
  • Adaptive → Smart choice for scaling apps.

👉 Example in practice:

  • Login page → Fixed Window (5 attempts/minute).
  • Public API → Token Bucket (100 requests/hour, refill 1 every 36 seconds).
  • File uploads → Concurrency Limit (max 2 uploads per user at once).
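
👉 The "refill 1 every 36 seconds" figure is just the window divided by the limit. A tiny helper (the name `refillIntervalSeconds` is made up here) shows the arithmetic:

```javascript
// Seconds between token refills so that `limit` tokens are added per window.
function refillIntervalSeconds(limit, windowSeconds) {
  return windowSeconds / limit;
}

console.log(refillIntervalSeconds(100, 3600)); // 36 (100 requests per hour)
console.log(refillIntervalSeconds(5, 60));     // 12 (5 requests per minute)
```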

🧑‍💻 Implementing Rate Limiting in Node.js

We’ll explore 3 approaches:

  1. Using express-rate-limit (Fixed Window)
  2. Custom Token Bucket Implementation
  3. Custom Sliding Window Implementation

1️⃣ Fixed Window (using express-rate-limit)

This is the most common and easiest way.

const express = require("express");
const rateLimit = require("express-rate-limit");

const app = express();

// Fixed Window: 5 requests per minute per IP
const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 5, // max 5 requests per IP per window (newer versions of express-rate-limit also accept this option as `limit`)
  message: "⚠️ Too many requests! Please try again later.",
});

app.use(limiter);

app.get("/", (req, res) => {
  res.send("Welcome! You are protected by Fixed Window Rate Limiting 🚦");
});

app.listen(3000, () => console.log("Server running on port 3000"));

👉 Best for: Simple login systems, small apps.

2️⃣ Token Bucket Implementation 🎟️

Let’s manually implement a Token Bucket:

const express = require("express");
const app = express();

const buckets = {}; // Store tokens for each IP

// Config
const MAX_TOKENS = 10;      // Bucket capacity = largest burst allowed
const REFILL_RATE = 1;      // 1 token per second

// Middleware
function tokenBucket(req, res, next) {
  const ip = req.ip;
  const now = Date.now();

  if (!buckets[ip]) {
    buckets[ip] = { tokens: MAX_TOKENS, lastRefill: now };
  }

  const bucket = buckets[ip];

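  // Refill tokens (whole tokens only)
  const elapsed = (now - bucket.lastRefill) / 1000; // in seconds
  const refill = Math.floor(elapsed * REFILL_RATE);
  if (refill > 0) {
    bucket.tokens = Math.min(MAX_TOKENS, bucket.tokens + refill);
    // Advance the clock only by the time that produced whole tokens.
    // Resetting it to `now` on every request would discard partial progress,
    // so a client polling faster than the refill rate would never get a token back.
    bucket.lastRefill += (refill / REFILL_RATE) * 1000;
  }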

  // Check if tokens available
  if (bucket.tokens > 0) {
    bucket.tokens -= 1;
    next(); // allow request
  } else {
    res.status(429).send("⚠️ Too many requests. Please wait...");
  }
}

app.use(tokenBucket);

app.get("/", (req, res) => {
  res.send("Welcome! You are protected by Token Bucket Rate Limiting 🎟️");
});

app.listen(3000, () => console.log("Server running on port 3000"));

👉 How it works:

  • Each IP gets a “bucket” of tokens.
  • Every request consumes 1 token.
  • Tokens refill at a fixed rate.
  • If empty → requests are denied until refill.

👉 Best for: APIs with bursty traffic (e.g., users can make quick bursts but not abuse long-term).

3️⃣ Sliding Window Implementation ⏳➡️⏳

This method ensures fairness by looking at requests over a rolling period.

const express = require("express");
const app = express();

const requests = {}; // Store timestamps per IP

// Config
const WINDOW_SIZE = 60 * 1000; // 1 minute
const MAX_REQUESTS = 5;

function slidingWindow(req, res, next) {
  const ip = req.ip;
  const now = Date.now();

  if (!requests[ip]) {
    requests[ip] = [];
  }

  // Keep only recent requests within window
  requests[ip] = requests[ip].filter(ts => now - ts < WINDOW_SIZE);

  if (requests[ip].length >= MAX_REQUESTS) {
    return res.status(429).send("⚠️ Too many requests. Please slow down.");
  }

  requests[ip].push(now); // record new request
  next();
}

app.use(slidingWindow);

app.get("/", (req, res) => {
  res.send("Welcome! You are protected by Sliding Window Rate Limiting ⏳");
});

app.listen(3000, () => console.log("Server running on port 3000"));

👉 How it works:

  • Keeps track of request timestamps per IP.
  • Removes old ones outside the time window.
  • Only allows new requests if under the limit.

👉 Best for: Fair usage (avoids “burst at reset” problem of Fixed Window).

🔑 Key Takeaways

  • Fixed Window (express-rate-limit): Easy, quick, good for small apps.
  • Token Bucket: Best for APIs with bursts of requests.
  • Sliding Window: More precise + fair, avoids reset abuse.

In real-world apps:

  • You might store counters in Redis instead of in-memory (for scalability).
  • Combine multiple strategies → e.g., Token Bucket for API, Fixed Window for login.
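
👉 To show what "counters in Redis" could look like, here is a hedged sketch of a fixed window check written against a small store interface. `FakeSharedStore` is an in-memory stand-in invented for this example; its `incr` and `expire` methods mirror the Redis INCR and EXPIRE commands, but the exact method names on a real Redis client depend on the library you use. `isAllowed` and the `ratelimit:` key prefix are also assumptions.

```javascript
// In-memory stand-in for a shared store such as Redis (hypothetical class).
class FakeSharedStore {
  constructor(clock = Date.now) {
    this.clock = clock;    // injectable clock, in milliseconds
    this.data = new Map(); // key -> { value, expiresAt }
  }

  // Like Redis INCR: creates the key at 1 or increments it, returns the new value.
  async incr(key) {
    const entry = this.data.get(key);
    const expired = entry && entry.expiresAt !== null && entry.expiresAt <= this.clock();
    if (!entry || expired) {
      this.data.set(key, { value: 1, expiresAt: null });
      return 1;
    }
    entry.value += 1;
    return entry.value;
  }

  // Like Redis EXPIRE: the key disappears after `seconds`.
  async expire(key, seconds) {
    const entry = this.data.get(key);
    if (entry) entry.expiresAt = this.clock() + seconds * 1000;
  }
}

// Fixed window check that works with any store exposing incr/expire.
async function isAllowed(store, key, limit, windowSeconds) {
  const storeKey = `ratelimit:${key}`;
  const count = await store.incr(storeKey);
  if (count === 1) {
    // First request of the window: start the countdown for this counter.
    await store.expire(storeKey, windowSeconds);
  }
  return count <= limit;
}

// Demo with a fake clock: limit of 2 requests per 60 seconds.
let fakeNow = 0;
const store = new FakeSharedStore(() => fakeNow);
async function demo() {
  const results = [];
  for (let i = 0; i < 3; i++) results.push(await isAllowed(store, "1.2.3.4", 2, 60));
  fakeNow = 61_000; // the window has expired
  results.push(await isAllowed(store, "1.2.3.4", 2, 60));
  return results;
}
demo().then((results) => console.log(results)); // [ true, true, false, true ]
```

Because every app instance talks to the same store, they all share one counter per client. Note that the increment and the expiry are two separate steps here; a production setup would likely make them atomic so a counter can never be left without an expiry.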

📊 Best Practices for Rate Limiting

  • ✅ Apply different limits for different routes (e.g., stricter on /login, lenient on /public).
  • ✅ Always return HTTP status 429 Too Many Requests with a clear error message, ideally with a Retry-After header so clients know when to try again.
  • ✅ Use rate limiting with other security tools (like WAF, CAPTCHA, monitoring).
  • ✅ Log blocked requests for later analysis.
  • ✅ For APIs, include headers (X-RateLimit-Limit, X-RateLimit-Remaining) so users know their usage.
  • ✅ Use a shared store like Redis/MongoDB for distributed apps (instead of just in-memory), so every server instance sees the same counters.
  • ✅ Whitelist trusted internal services so they aren’t blocked.
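
👉 As a sketch of the status code and header advice above, this helper builds the values a limiter could attach to a response. The function name `rateLimitResponse` and its parameters are hypothetical; `used` is assumed to count the current request as well.

```javascript
// Builds the status code and headers for a rate-limited API response.
function rateLimitResponse(limit, used, resetAtMs, now = Date.now()) {
  const remaining = Math.max(0, limit - used);
  const headers = {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(remaining),
  };
  const blocked = used > limit;
  if (blocked) {
    // Retry-After is given in whole seconds until the window resets.
    const seconds = Math.max(0, Math.ceil((resetAtMs - now) / 1000));
    headers["Retry-After"] = String(seconds);
  }
  return { status: blocked ? 429 : 200, headers };
}

const ok = rateLimitResponse(100, 40, 60_000, 0);
console.log(ok.status, ok.headers["X-RateLimit-Remaining"]); // 200 60

const tooMany = rateLimitResponse(100, 101, 60_000, 45_500);
console.log(tooMany.status, tooMany.headers["Retry-After"]); // 429 15
```

In Express, something like `res.status(result.status).set(result.headers)` would apply these values before sending the body.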

🚀 Final Words

Rate Limiting is like traffic control for your servers.

  • Without it → chaos, slowdowns, and attacks.
  • With it → smooth, safe, and fair access for everyone.

If you’re building apps, APIs, or websites, rate limiting is a must-have security layer.

💬 Have Questions or Suggestions?

Drop a comment below or connect with me on LinkedIn or GitHub. Let’s make apps safer and faster together! 🚀
