About this tool
What Is API Rate Limiting?
API rate limiting is a mechanism that restricts how many requests a client can make to an API within a specified time window. When you exceed the limit, the server returns an HTTP 429 "Too Many Requests" response, temporarily blocking further requests until the window resets.
Rate limits protect API infrastructure from being overwhelmed by excessive traffic — whether from legitimate high-volume applications, automated scrapers, or denial-of-service attacks. Nearly every public and private API implements rate limiting: Stripe (100 requests/second), OpenAI (varies by model and plan), GitHub (5,000 requests/hour), Twitter/X (varies by endpoint), and Shopify (2 requests/second on some endpoints).
Leaky Bucket vs. Token Bucket Algorithms
API providers most commonly implement rate limiting with one of two algorithms:
Token Bucket (used by Stripe, GitHub, OpenAI): Your account has a "bucket" filled with tokens. Each request consumes one token. Tokens replenish at a constant rate. This allows burst traffic — you can send many requests quickly until the bucket empties, then must wait for tokens to refill.
Leaky Bucket (used by Shopify, some legacy APIs): Requests enter a queue and are processed at a fixed rate, like water dripping from a bucket. There is no burst capacity — if you send 10 requests instantly, they are processed one at a time at the allowed rate, and excess requests may be dropped.
The safest client-side strategy is to pace your requests at a steady interval (Leaky Bucket style) regardless of which algorithm the server uses. Steady pacing stays within the limits of both algorithms, so it avoids triggering rate limit penalties under either scheme.
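As a sketch, steady pacing just means dividing the time window by the request limit and sleeping that interval between calls. The `send` callback below is a placeholder for whatever actually makes your HTTP request:

```python
import time

def steady_interval(limit: int, window_seconds: float) -> float:
    """Seconds to wait between requests for leaky-bucket-style pacing."""
    return window_seconds / limit

def paced_calls(items, limit, window_seconds, send):
    """Send one request per item, sleeping a fixed interval after each call."""
    interval = steady_interval(limit, window_seconds)
    results = []
    for item in items:
        results.append(send(item))
        time.sleep(interval)  # steady pacing: never burst, never exceed the window
    return results
```

For example, a 1,000-requests-per-minute limit gives `steady_interval(1000, 60)`, i.e. 0.06 seconds (60ms) between requests.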
The Hidden Concurrency Problem
The most common reason developers unexpectedly hit rate limits is hidden concurrency. Consider this scenario:
- API limit: 10 requests per second
- Your script runs at a safe 3 requests per second
- You deploy 5 parallel workers
- Actual global rate: 5 × 3 = 15 requests per second
- Result: HTTP 429 errors
When using parallel workers, containers, or serverless functions, you must divide the total allowed rate by the number of concurrent workers. Each worker gets its own proportional share of the global limit. This calculator handles the division automatically.
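The division above can be expressed directly. This hypothetical helper returns the millisecond delay each worker should use so the combined fleet stays under the global limit:

```python
def per_worker_delay_ms(limit: int, window_seconds: float, workers: int) -> float:
    """Millisecond delay between requests for each of `workers` parallel clients."""
    per_worker_rate = limit / window_seconds / workers  # requests/sec per worker
    return 1000.0 / per_worker_rate
```

In the scenario above (10 requests per second, 5 workers), each worker gets 2 requests per second, i.e. a 500ms delay, instead of the 333ms delay that looked "safe" in isolation.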
Handling 429 Responses Correctly
When you receive an HTTP 429 response, follow this protocol:
- Read the Retry-After header. Many APIs include this header specifying how many seconds to wait before retrying.
- Implement exponential backoff. If no Retry-After header exists, wait 1 second, then 2, then 4, then 8 seconds between retries.
- Set a maximum retry count. Do not retry indefinitely — after 5-10 attempts, log the error and skip the request.
- Never retry immediately. Instant retries make the problem worse and may result in longer bans or IP-level blocking.
Common API Rate Limits by Provider
| API Provider | Default Limit | Time Window | Algorithm |
|---|---|---|---|
| Stripe | 100 requests | Per second | Token Bucket |
| OpenAI GPT-4 | Varies by plan | Per minute (RPM & TPM) | Token Bucket |
| GitHub | 5,000 requests | Per hour | Token Bucket |
| Shopify | 2 requests | Per second | Leaky Bucket |
| Twitter/X | Varies by endpoint | Per 15 minutes | Sliding Window |
| Google Maps | 50 requests | Per second | Per-key limit |
Always check the specific API documentation for your endpoint and plan tier, as limits vary.
Rate Limiting in System Design
For developers building APIs rather than consuming them, rate limiting is a critical system design decision. Implementation options include:
- Fixed Window: Count requests in fixed time blocks (e.g., 0:00-0:59, 1:00-1:59). Simple but allows burst traffic at window boundaries.
- Sliding Window Log: Track the timestamp of every request and count requests in a rolling time window. More accurate but memory-intensive.
- Sliding Window Counter: Hybrid approach using weighted averages of current and previous windows. Balances accuracy and efficiency.
- Token Bucket with Redis: Store token counts in Redis for distributed rate limiting across multiple application servers. The most common production implementation.
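As an illustration of the last option's core logic, here is a minimal single-process token bucket; a production Redis-backed version would keep the `tokens` and `last` state in Redis so all application servers share one bucket. The injectable `now` clock is just to make the refill behavior easy to verify:

```python
import time

class TokenBucket:
    """Minimal in-memory token bucket (sketch of the single-server case)."""

    def __init__(self, capacity: int, refill_per_second: float, now=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # bucket starts full, allowing an initial burst
        self.now = now
        self.last = now()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        t = self.now()
        # Replenish at a constant rate, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.refill_per_second)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With capacity 2 and a refill rate of 1 token/second, two requests succeed immediately, a third is rejected, and a fourth succeeds once a second has passed.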
Practical Usage Examples
Single Worker
API allows 1000 requests per minute, 1 worker.
Safe interval: 60ms between requests. Maximum daily capacity: ~1,440,000 requests.
Distributed Workers
API allows 3,000 req/min across 10 parallel pods.
Each pod: 5 req/sec, i.e. a 200ms delay per pod. Total: 50 req/sec globally.
Step-by-Step Instructions
Step 1: Enter the API Rate Limit. Check your API provider's documentation for the request limit (e.g., Stripe allows 100/sec, OpenAI has tokens per minute). Enter the number.
Step 2: Select the Time Window. Choose whether the limit applies per second, per minute, per hour, or per day. Most APIs specify this in their rate limit headers or documentation.
Step 3: Set Concurrent Workers. If you run multiple parallel threads, containers, or serverless functions hitting the same API, enter the count. The calculator divides the safe rate across all workers.
Step 4: Get Your Delay Interval. The calculator outputs the exact millisecond delay each worker should wait between requests to stay within limits using Leaky Bucket pacing.
Step 5: Copy the Code. Use the generated JavaScript, Python, or Go code snippet showing exactly where to add the delay in your request loop.
Core Benefits
Prevents HTTP 429 Errors: Calculate the exact delay needed to stay within API rate limits, avoiding the "Too Many Requests" response that halts your application.
Multi-Thread Safety: If you run 10 parallel workers each making requests to the same API, the calculator distributes the allowed rate evenly across all workers to prevent exceeding the global limit.
Ready-to-Use Code Snippets: Get copy-paste delay code for JavaScript (await/setTimeout), Python (time.sleep), and Go (time.Sleep) — no need to manually calculate millisecond values.
ETL Duration Planning: If you need to migrate 1 million records through a rate-limited API, instantly calculate how long the entire process will take at the safe request rate.
All-Browser Processing: Your API architecture details and rate limit configurations stay in your browser. No data is sent to any server.
Frequently Asked Questions
What does HTTP 429 mean?
HTTP 429 is the standard status code returned when a client exceeds the API rate limit. The response typically includes a Retry-After header indicating how many seconds to wait before making another request. Your application should handle 429 responses with retry logic and backoff delays.
What is the difference between Token Bucket and Leaky Bucket?
Token Bucket allows burst traffic — you can send many requests quickly until tokens run out, then wait for replenishment. Leaky Bucket processes requests at a constant rate with no burst capacity. As a client, pacing your requests at a steady interval (Leaky Bucket style) works safely against both server-side algorithms.
How do I calculate a safe rate for multiple parallel workers?
Divide the total allowed rate by the number of parallel workers. If the API allows 100 requests per second and you have 5 workers, each worker can send 20 requests per second, requiring a 50ms delay between requests per worker. This calculator does this math automatically.
What is Stripe's API rate limit?
Stripe allows approximately 100 requests per second in live mode for most endpoints. Some endpoints have lower limits. For safe bulk operations like data migrations, use a delay of at least 40ms between requests to stay comfortably within the limit.
How should I handle an HTTP 429 response?
When you receive an HTTP 429 response, check for the Retry-After header value (in seconds). Pause your request loop for that duration before retrying. If the header is absent, implement exponential backoff: wait 1s, then 2s, then 4s between retry attempts, up to a maximum of 5-10 retries.
Why am I hitting rate limits even though my request rate looks safe?
The most common cause is hidden concurrency — other workers, containers, or scripts sharing the same API key are consuming part of your quota. Also check how you account for network latency: applying the delay after receiving each response, rather than before sending each request, keeps your effective rate conservatively below the limit.
How do OpenAI's rate limits work?
OpenAI uses dual rate limits: RPM (requests per minute) and TPM (tokens per minute). Limits vary by model and pricing tier. For example, GPT-4 may allow 500 RPM and 40,000 TPM on a standard paid plan. You must stay within both limits simultaneously.
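Staying within both limits means taking whichever limit forces the longer delay. This sketch assumes you can estimate an average token count per request (the `tokens_per_request` parameter is a hypothetical input, not an OpenAI API field):

```python
def dual_limit_delay_ms(rpm: int, tpm: int, tokens_per_request: int) -> float:
    """Delay that satisfies both a requests-per-minute and a tokens-per-minute limit."""
    delay_for_rpm = 60_000.0 / rpm
    # A request consuming N tokens uses N units of the per-minute token budget.
    delay_for_tpm = 60_000.0 * tokens_per_request / tpm
    return max(delay_for_rpm, delay_for_tpm)
```

Using the example figures above (500 RPM, 40,000 TPM) with ~100 tokens per request, the RPM limit alone would allow a request every 120ms, but the TPM limit requires 150ms, so 150ms is the safe interval.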
Should I implement rate limiting on my own API?
Yes. Rate limiting protects your API from abuse, prevents individual clients from consuming disproportionate resources, and ensures fair access for all users. Common implementations use Redis-backed token buckets for distributed systems or in-memory counters for single-server setups.
How long will a bulk migration through a rate-limited API take?
Divide your total records by the requests-per-second rate. For example, 1,000,000 records at 10 requests per second = 100,000 seconds ≈ 27.8 hours. This calculator shows the daily maximum throughput, which you can use to estimate migration duration.
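That arithmetic is a one-liner worth keeping in your migration scripts:

```python
def migration_duration_hours(records: int, requests_per_second: float) -> float:
    """Hours needed to push `records` requests at a fixed safe rate."""
    return records / requests_per_second / 3600.0
```

For the example above, `migration_duration_hours(1_000_000, 10)` gives roughly 27.8 hours.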
Does this calculator work for WebSocket connections?
This calculator is designed for HTTP REST API rate limits. WebSocket connections have different throttling mechanisms based on message frequency rather than request count. However, if your WebSocket provider specifies a messages-per-second limit, you can use this calculator to determine the appropriate delay between messages.