Handling 429 Too Many Requests from Push Services

A 429 from a browser push service means the upstream gateway has refused the request at the protocol level — dispatching again immediately will compound the violation, burn quota faster, and potentially trigger a temporary endpoint suspension.

Quick Answer

Parse the Retry-After header from the 429 response; it may be an integer (seconds) or an HTTP-date string. Convert it to a delay in milliseconds, take the larger of that value and a full-jitter backoff delay, cap the result at your maximum delay, then re-enqueue the job with that delay and increment the attempt counter. Never synchronously re-dispatch on a 429; the response is authoritative and the service expects silence until the indicated window expires.

[Figure: 429 Response Handling Pipeline. The worker sends, the service refuses with a Retry-After value (seconds or date), and the job is re-enqueued (BullMQ / SQS) for dispatch after the window. A refused dispatch is never synchronously retried: the Retry-After header, combined with jitter and capped at max_delay, drives the delay calculation before the job re-enters the queue. All push service 429s must flow through the delay queue, never directly back to dispatch.]

What a 429 Means in the Web Push Context

RFC 8030 anticipates rate limiting on push message delivery: its denial-of-service considerations (§8.4) permit a push service to return 429 Too Many Requests, ideally with a Retry-After header, when an application server exceeds its rate limit. In practice that limit may apply to a given subscription, a sender identity (VAPID subject), or a global connection pool. The response signals active throttling, not a transient network fault.

This is distinct from a 503 Service Unavailable, which indicates upstream capacity degradation. A 429 means the service is healthy but the sender is being rate-limited by policy. Treating them identically (as some naive backoff implementations do) under-counts 429s in your metrics and can mask per-sender quota violations that compound over time.

Push services operate per-subscription quotas, per-sender-ID quotas, and global per-IP quotas simultaneously. A single 429 may be signaling exhaustion of any one of those layers. The Retry-After value, if present, reflects when that specific layer is expected to recover. It does not guarantee capacity will be available immediately after the window — it is a minimum wait, not a reservation.

Within the broader Backend Delivery Architecture & Queue Management model, 429 handling belongs in the retry tier, not the dispatch tier. The dispatch worker should detect the 429, hand off to the retry scheduler, and release the worker thread immediately.

Parsing the Retry-After Header

The Retry-After header is defined in RFC 9110 §10.2.3 and can appear in two formats:

Numeric (seconds delay):

Retry-After: 30

HTTP-date (absolute timestamp):

Retry-After: Sat, 20 Jun 2026 14:35:00 GMT

Both forms are used in practice. FCM tends to return numeric values. Mozilla Autopush may return either. Any production parser must handle both without assumptions.

When the header is absent, which happens on some 429s from intermediary proxies or misconfigured CDN layers, fall back to a conservative default. A floor of 5 seconds is too aggressive for a service that is actively rate limiting you; use at least 15 seconds as the default, then combine it with your exponential backoff and jitter as you would a real Retry-After value.

/**
 * Parse the Retry-After header from a push service 429 response.
 * Returns the delay in milliseconds to wait before the next attempt.
 *
 * @param headers  - The Headers object from the fetch() Response.
 * @param fallbackMs - Default delay (ms) when Retry-After is absent. Default: 15_000.
 * @returns Milliseconds to delay before re-dispatch.
 */
export function parseRetryAfter(
  headers: Headers,
  fallbackMs = 15_000
): number {
  const raw = headers.get('Retry-After');

  if (!raw) {
    return fallbackMs;
  }

  const trimmed = raw.trim();

  // Numeric form: "Retry-After: 30"
  if (/^\d+$/.test(trimmed)) {
    const seconds = parseInt(trimmed, 10);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return seconds * 1_000;
    }
    return fallbackMs;
  }

  // HTTP-date form: "Retry-After: Sat, 20 Jun 2026 14:35:00 GMT"
  const parsed = Date.parse(trimmed);
  if (!Number.isNaN(parsed)) {
    const delayMs = parsed - Date.now();
    // Negative means the date is in the past; treat as immediate but still jitter
    return Math.max(0, delayMs);
  }

  return fallbackMs;
}

The function is intentionally defensive: malformed values (e.g. a header containing a locale-specific date that Date.parse cannot handle) silently fall back to the safe default rather than throwing. Log malformed values separately so you can detect if a specific push service endpoint is returning garbage.
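
One way to make the "log malformed values separately" advice concrete is to classify the raw header before parsing it, so the log line can carry the form that was seen. The sketch below is illustrative only: classifyRetryAfter and the RetryAfterForm labels are hypothetical names, not part of parseRetryAfter, and the classification simply mirrors the same numeric and Date.parse checks used above.

```typescript
// Hypothetical helper: label the Retry-After form so it can be logged
// alongside the endpoint that produced it. Not part of parseRetryAfter.
type RetryAfterForm = 'absent' | 'numeric' | 'http-date' | 'malformed';

function classifyRetryAfter(raw: string | null): RetryAfterForm {
  if (raw === null || raw.trim() === '') {
    return 'absent';
  }
  const trimmed = raw.trim();

  // Same numeric test as parseRetryAfter: digits only.
  if (/^\d+$/.test(trimmed)) {
    return 'numeric';
  }

  // Same date test as parseRetryAfter: anything Date.parse accepts.
  if (!Number.isNaN(Date.parse(trimmed))) {
    return 'http-date';
  }

  // Neither form: parseRetryAfter would fall back to the default here,
  // which is the case worth logging with the raw value attached.
  return 'malformed';
}
```

A caller could log at WARN whenever the result is 'absent' or 'malformed', tagged with the push service host, and keep the parsing path itself unchanged.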

Exponential Backoff with Full Jitter

The base formula for exponential backoff is:

exponential_cap = min(MAX_DELAY_MS, BASE_DELAY_MS × MULTIPLIER^attempt)
actual_delay    = random(0, exponential_cap)

Full jitter, drawing uniformly from [0, exponential_cap], is a strong default for distributed systems because it desynchronizes retries across workers and flattens the spikes that fixed schedules produce. See Implementing exponential backoff for failed push deliveries for the full algorithmic breakdown and Python/TypeScript implementations.
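
To see how the cap grows, the formula can be evaluated for a few attempts. This is a minimal sketch, assuming the same constants as the BACKOFF configuration used later in this article (2-second base, multiplier of 2, 5-minute cap); the function names are hypothetical, and the random source is injectable only so the behavior can be checked deterministically.

```typescript
// Assumed constants, matching the BACKOFF config used later in the article.
const BASE_DELAY_MS = 2_000;
const MULTIPLIER = 2;
const MAX_DELAY_MS = 300_000;

// exponential_cap = min(MAX_DELAY_MS, BASE_DELAY_MS × MULTIPLIER^attempt)
function exponentialCapMs(attempt: number): number {
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(MULTIPLIER, attempt));
}

// actual_delay = random(0, exponential_cap); the upper bound is exclusive here.
function fullJitterMs(attempt: number, random: () => number = Math.random): number {
  return Math.floor(random() * exponentialCapMs(attempt));
}

// Caps for attempts 0 through 8:
// 2000, 4000, 8000, 16000, 32000, 64000, 128000, 256000, 300000
const capSchedule = [0, 1, 2, 3, 4, 5, 6, 7, 8].map(exponentialCapMs);
```

With these values the cap only reaches MAX_DELAY_MS on attempt 8, so a job limited to 6 attempts never draws from the full 5-minute range.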

For 429-specific handling, the Retry-After value must act as a floor, not a replacement for backoff. The calculated delay is:

delay = max(parseRetryAfter(headers), exponential_cap_with_jitter)
delay = min(delay, MAX_DELAY_MS)

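This respects the service's stated minimum wait while still spreading retries across your worker pool via jitter. One caveat: the final clamp means a Retry-After larger than MAX_DELAY_MS is cut short, so set MAX_DELAY_MS at least as high as the longest Retry-After you intend to honor, or route such jobs to the DLQ instead of retrying early.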
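
The two-line formula can be checked against a few concrete inputs. The sketch below is a hypothetical pure function (composeDelayMs is not a name used elsewhere in this article) that takes an already-jittered backoff value so the arithmetic is easy to follow.

```typescript
// delay = min(MAX_DELAY_MS, max(retryAfterMs, jitteredMs))
function composeDelayMs(
  retryAfterMs: number,
  jitteredMs: number,
  maxDelayMs: number
): number {
  return Math.min(maxDelayMs, Math.max(retryAfterMs, jitteredMs));
}

const FIVE_MINUTES_MS = 300_000;

// Retry-After: 30 dominates a small jitter draw.
const serviceFloorWins = composeDelayMs(30_000, 1_200, FIVE_MINUTES_MS); // 30000

// A late-attempt jitter draw exceeds a short Retry-After.
const backoffWins = composeDelayMs(1_000, 41_000, FIVE_MINUTES_MS); // 41000

// Retry-After: 3600 is clamped to the 5-minute cap, i.e. below what the
// service asked for. This is the case to watch for in production.
const clampedBelowFloor = composeDelayMs(3_600_000, 41_000, FIVE_MINUTES_MS); // 300000
```

The third case shows why MAX_DELAY_MS needs to be chosen with the longest expected Retry-After in mind: the clamp wins over the floor.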

The following BullMQ handler integrates parseRetryAfter with this formula and re-enqueues the job with the computed delay:

import { Queue, Job } from 'bullmq';
import { randomInt } from 'crypto';
import { parseRetryAfter } from './parseRetryAfter';

const PUSH_RETRY_QUEUE = new Queue('push-retry', {
  connection: { host: process.env.REDIS_HOST ?? 'localhost', port: 6379 },
});

const BACKOFF = {
  BASE_DELAY_MS:  2_000,
  MULTIPLIER:     2.0,
  MAX_DELAY_MS:   300_000, // 5 minutes hard cap
  MAX_ATTEMPTS:   6,
  FALLBACK_MS:    15_000,
};

interface PushJobData {
  subscriptionEndpoint: string;
  encryptedPayload:     string;
  vapidSubject:         string;
  attempt:              number;
  originalTimestamp:    number;
  ttlSeconds:           number;
}

/**
 * Handle a 429 response from a push service.
 * Calculates the delay, validates TTL, and re-enqueues the job.
 */
export async function handle429AndRequeue(
  job: Job<PushJobData>,
  response: Response
): Promise<void> {
  const data = job.data;
  const attempt = data.attempt ?? 0;

  // Hard stop: too many attempts regardless of 429s
  if (attempt >= BACKOFF.MAX_ATTEMPTS) {
    await PUSH_RETRY_QUEUE.add('push-dlq', {
      ...data,
      reason: 'max_attempts_exceeded',
      finalStatusCode: 429,
    });
    return;
  }

  // 1. Parse the service-mandated minimum wait
  const retryAfterMs = parseRetryAfter(response.headers, BACKOFF.FALLBACK_MS);

  // 2. Calculate exponential cap for this attempt
  const exponentialCap = Math.min(
    BACKOFF.MAX_DELAY_MS,
    BACKOFF.BASE_DELAY_MS * Math.pow(BACKOFF.MULTIPLIER, attempt)
  );

  // 3. Full jitter within the exponential cap.
  //    randomInt needs integer bounds and treats the upper bound as exclusive.
  const jitteredDelay = randomInt(0, Math.max(1, Math.floor(exponentialCap)));

  // 4. Retry-After is a floor; take whichever is larger, then clamp to max
  const delayMs = Math.min(
    BACKOFF.MAX_DELAY_MS,
    Math.max(retryAfterMs, jitteredDelay)
  );

  // 5. Validate remaining TTL before re-enqueueing
  const elapsedMs       = Date.now() - data.originalTimestamp;
  const ttlRemainingMs  = data.ttlSeconds * 1_000 - elapsedMs;

  if (ttlRemainingMs <= 0 || delayMs >= ttlRemainingMs) {
    // Payload will be stale by the time we can re-dispatch
    await PUSH_RETRY_QUEUE.add('push-dlq', {
      ...data,
      reason: 'ttl_expired_during_backoff',
      delayMs,
      ttlRemainingMs,
    });
    return;
  }

  // 6. Re-enqueue with computed delay and incremented attempt counter
  await PUSH_RETRY_QUEUE.add(
    'push-delivery',
    { ...data, attempt: attempt + 1 },
    { delay: delayMs }
  );
}

The attempt counter increments on every 429, not just on non-429 errors. If a service sustains rate limiting across multiple windows it will still eventually route to the DLQ rather than looping indefinitely.

Per-Service Rate Limits: FCM vs Mozilla Autopush

Firebase Cloud Messaging (FCM)

FCM’s Web Push endpoint (https://fcm.googleapis.com/fcm/send for legacy, or the VAPID endpoint at https://fcm.googleapis.com/wp/... for RFC 8030) enforces limits at multiple layers:

Per-sender-ID (VAPID subject) quota: FCM enforces a per-project hourly message quota. Exceeding it returns 429 with a Retry-After in the range of 60–3600 seconds depending on the severity of the overage.
Per-subscription burst limit: Rapid successive sends to the same fcm.googleapis.com subscription endpoint can trigger a per-endpoint 429 in addition to the per-sender limit.
Concurrent connection limits: FCM HTTP/2 connections are subject to per-IP stream concurrency limits. Saturating these from a single server IP results in 429 at the connection level before the message is even evaluated.

FCM’s 429 responses include Retry-After as an integer seconds value. The header is generally reliable. When you receive a per-sender 429, the correct behavior is to pause all dispatches to FCM, not just the failing subscription — the quota applies globally across your sender ID.

Implement a global FCM pause flag in Redis: when a per-sender 429 is detected, set fcm:rate_limited with an expiry equal to Retry-After. Worker dispatch loops should check this flag before attempting any FCM send. This is queue-level throttling at the service layer.
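
The pause-flag behavior can be sketched without a Redis dependency. The class below is an in-memory stand-in, assuming a single process and hypothetical names (PauseFlags, pause, isPaused); in a real deployment the same two operations would map onto a Redis SET with an expiry and an existence check, shared by all workers.

```typescript
// In-memory stand-in for a key with an expiry. Illustrative only:
// a shared store such as Redis is needed for this to work across workers.
class PauseFlags {
  private readonly expiresAtMs = new Map<string, number>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Equivalent in spirit to setting the key with an expiry of retryAfterSeconds.
  pause(key: string, retryAfterSeconds: number): void {
    this.expiresAtMs.set(key, this.now() + retryAfterSeconds * 1_000);
  }

  // Workers call this before every send to the service the key represents.
  isPaused(key: string): boolean {
    const until = this.expiresAtMs.get(key);
    return until !== undefined && until > this.now();
  }
}
```

A dispatch loop would call isPaused('fcm:rate_limited') before each FCM send and, when it returns true, re-enqueue the job with a delay instead of dispatching.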

Mozilla Autopush

Mozilla Autopush (https://updates.push.services.mozilla.com/) enforces per-subscription quotas rather than per-sender quotas. Each subscription endpoint has an independent message counter that resets on a rolling window.

Key behavioral differences from FCM:

Per-subscription, not per-sender: You can be rate-limited on subscription A while subscription B accepts traffic normally. This means a single 429 does not require pausing all Mozilla dispatches.
Quota is message-count based, not bandwidth based. Autopush tracks the number of pending undelivered messages per subscription. If a device has been offline long enough that its queue is full, further sends to that endpoint return 429.
Retry-After is less consistently present. When absent, use a 60-second default for Mozilla endpoints specifically, since their quota windows are typically minute-scale.
429 vs 413: Mozilla Autopush returns 413 for payloads exceeding 4 KB, not 429. Do not conflate them.
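
The per-service defaults described above can be collected into one small lookup. This is a sketch under the assumptions stated in this article (15 seconds as the general fallback, 60 seconds for Mozilla endpoints and for suspected sender-wide throttling); fallbackDelayMs and its parameters are hypothetical names.

```typescript
const MOZILLA_PUSH_PREFIX = 'https://updates.push.services.mozilla.com/';

// Choose the fallback delay to use when a 429 carries no Retry-After header.
function fallbackDelayMs(endpoint: string, suspectedSenderWide: boolean): number {
  // Sender-wide exhaustion gets the longer floor regardless of service.
  if (suspectedSenderWide) {
    return 60_000;
  }
  // Autopush quota windows are described above as minute-scale.
  if (endpoint.startsWith(MOZILLA_PUSH_PREFIX)) {
    return 60_000;
  }
  // General default for per-subscription 429s.
  return 15_000;
}
```

The result can be passed as the fallbackMs argument of parseRetryAfter so the parser itself stays service-agnostic.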

For delivery tracking that integrates with Autopush acknowledgment behavior, see Delivery Tracking & Acknowledgment.

Queue-Level Throttling: Token Bucket and Adaptive Pacing

Individual job-level backoff is necessary but not sufficient. Without queue-level throttling, a burst of 10,000 messages that all arrive simultaneously will still spike your outbound rate, trigger 429s en masse, and fill your retry queue with jobs that all have similar delay timestamps — reproducing the thundering herd problem at the queue level.

Token Bucket

A token bucket allows bursts up to a defined capacity while enforcing a steady-state rate. The bucket refills at a constant rate (tokens per second); each dispatch attempt consumes one token. When the bucket is empty, the worker waits rather than dispatching.

bucket_capacity = 500 tokens
refill_rate     = 100 tokens/second
cost_per_send   = 1 token

Implement the token bucket in Redis using the standard Lua atomic decrement pattern to ensure correctness across distributed workers. The Message Batching & Throughput Optimization guide covers batch-level rate control that pairs directly with this approach.
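
Before moving the logic into Redis, it can help to see the refill arithmetic in a single process. The following is a minimal in-memory sketch, not the distributed Lua version mentioned above; the TokenBucket class and its method names are assumptions, and the clock is injectable so the refill math can be verified.

```typescript
// Single-process token bucket. Illustrative only: it does not coordinate
// across workers, which is what the Redis-based version is for.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = now();
  }

  // Returns true and deducts the cost if enough tokens are available.
  tryConsume(cost = 1): boolean {
    const nowMs = this.now();
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1_000;
    // Refill lazily based on elapsed time, never exceeding capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;

    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }
}
```

With a capacity of 500 and a refill rate of 100 tokens per second, the first 500 sends pass immediately, after which roughly 100 sends per second are admitted.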

Adaptive Pacing

Static token buckets cannot respond to signals from the push service. Adaptive pacing adjusts the effective dispatch rate in response to observed 429 rates:

Track a rolling 60-second 429 rate: rate_429 = count_429 / total_attempts.
If rate_429 > 0.05 (5% of attempts), reduce dispatch concurrency by 25%.
If rate_429 > 0.20, halt new dispatches for the current service and trigger the circuit breaker.
If rate_429 < 0.01 and the service has been stable for 120 seconds, restore concurrency incrementally (10% per 30-second window).

Adaptive pacing requires per-service metric counters. Segment counters by fcm and autopush (and any other service) to avoid cross-service interference. A 429 spike on FCM should not throttle Mozilla dispatches.
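
The thresholds above translate into a small decision function. This is a sketch only: pacingAction and nextConcurrency are hypothetical names, the counters are assumed to be kept separately per service, and "10% per window" is interpreted here as 10% of a baseline concurrency, which is an assumption rather than something the rules above specify.

```typescript
type PacingAction = 'halt' | 'reduce' | 'restore' | 'hold';

// Decide what to do for one service from its rolling 60-second counters.
function pacingAction(
  count429: number,
  totalAttempts: number,
  stableSeconds: number
): PacingAction {
  if (totalAttempts === 0) {
    return 'hold';
  }
  const rate429 = count429 / totalAttempts;
  if (rate429 > 0.20) return 'halt';    // trip the circuit breaker
  if (rate429 > 0.05) return 'reduce';  // cut concurrency by 25%
  if (rate429 < 0.01 && stableSeconds >= 120) return 'restore';
  return 'hold';
}

// Apply the action to the current concurrency for that service.
function nextConcurrency(current: number, baseline: number, action: PacingAction): number {
  switch (action) {
    case 'halt':
      return 0;
    case 'reduce':
      return Math.max(1, Math.floor(current * 0.75));
    case 'restore':
      // Assumed interpretation: step up by 10% of baseline per window.
      return Math.min(baseline, current + Math.ceil(baseline / 10));
    default:
      return current;
  }
}
```

Calling these once per evaluation window for each service, with that service's own counters, keeps an FCM spike from lowering Autopush concurrency.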

Diagnostic Steps for 429s in Production

Confirm the 429 layer. Log the full response headers including X-Request-ID or vendor-specific trace headers. Distinguish between per-subscription 429s and per-sender 429s by checking whether the 429 affects all endpoints to the same service or only a subset.
Audit your dispatch rate. Query your outbound HTTP metrics for the time window preceding the 429 spike. Identify if a scheduled campaign, a batch import of new subscriptions, or a queue flush after a downtime event caused a sudden rate surge.
Check your Retry-After parsing logs. Verify that your parser is correctly handling the header format returned by the specific service. FCM returns numeric seconds; confirm your parser is not interpreting that as an HTTP-date.
Verify the global pause flag is respected. For per-sender 429s on FCM, confirm that all dispatch workers have observed the Redis pause flag and have stopped sending. A single misbehaving worker that ignores the flag will continue triggering 429s and reset the quota window.
Inspect TTL expiry during backoff. If push-dlq is filling with ttl_expired_during_backoff reasons, your computed delays are exceeding the remaining message TTLs. Either accept the drops for messages that would be stale by the time they arrive (and consider shorter TTLs so they expire early), increase TTLs at dispatch time for messages that warrant longer retry windows, or reduce the burst rate to avoid hitting the 429 in the first place.
Review VAPID credentials. A 429 with Retry-After: 0 or with a WWW-Authenticate header present may indicate an authentication failure being proxied as a rate limit by some gateway layers. Validate your VAPID JWT expiry and sub claim format before assuming it is a quota issue.
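
The last diagnostic step can be turned into a pre-routing check. The sketch below is a heuristic under the assumptions in that step (a WWW-Authenticate header, or Retry-After: 0, on a 429); isSuspectedAuth429 is a hypothetical name, and headers are passed as a plain Map so the example does not depend on a particular HTTP client.

```typescript
// Heuristic: flag 429s that look like authentication failures in disguise.
// A positive result is a prompt to validate the VAPID JWT, not proof.
function isSuspectedAuth429(status: number, headers: Map<string, string>): boolean {
  if (status !== 429) {
    return false;
  }

  // Header names are case-insensitive, so normalize before lookup.
  const lower = new Map<string, string>();
  headers.forEach((value, key) => lower.set(key.toLowerCase(), value));

  if (lower.has('www-authenticate')) {
    return true;
  }

  const retryAfter = lower.get('retry-after');
  return retryAfter !== undefined && retryAfter.trim() === '0';
}
```

Jobs that match could be diverted to a credential check before entering the standard retry pipeline, since backoff alone will not resolve them.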

Gotchas and Edge Cases

Missing Retry-After is common, not exceptional. Both FCM and Mozilla Autopush omit Retry-After on some 429 variants (particularly intermediate gateway 429s). Your fallback default must be conservative — 15 seconds minimum for per-subscription 429s, 60 seconds for signals that suggest per-sender exhaustion. Treat an absent Retry-After as a signal to be cautious, not an invitation to retry quickly.
Per-subscription and per-sender quotas are additive, not exclusive. A send can trigger a per-subscription 429 even when your per-sender quota is healthy. Conversely, a per-sender 429 blocks all sends regardless of individual subscription health. Route these to different retry pools with different pause durations if your monitoring can distinguish the cause.
Campaign bursts overwhelm static token buckets. If you dispatch a broadcast to 500,000 subscriptions inside a 60-second window, even a token bucket set at 100/second will queue 494,000 messages. The last messages enqueued may arrive after the initial 429 storm has already triggered longer Retry-After windows for the earlier messages. Pre-flight large broadcasts through a rate estimator that computes expected queue drain time and checks it against your message TTLs before dispatch begins.
TTL expiry during extended backoff. A Retry-After: 3600 from a severely quota-exceeded FCM sender ID means the retried message will be 1+ hours old. If the original message TTL was 3600 seconds, the message expires just as the retry window opens. For messages that must survive quota exhaustion events, set the TTL comfortably above the expected maximum backoff window; a TTL merely equal to it is not enough.
429 from VAPID authentication failure vs endpoint quota. Some intermediary layers (Cloudflare, CDN WAF rules) return 429 when they detect anomalous VAPID JWT patterns — high signing rates from a single key, malformed aud claims, or expired exp values. These 429s will not include a meaningful Retry-After and will not resolve with backoff alone. Distinguish them by checking for WWW-Authenticate headers or vendor-specific error bodies before routing into the standard retry pipeline.
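
The pre-flight rate estimator mentioned in the campaign-burst item can start as simple arithmetic. The sketch below is a rough lower bound, assuming a token bucket that starts full and drains at a fixed refill rate with no 429-induced delays; estimateDrainSeconds and broadcastFitsTtl are hypothetical names.

```typescript
// Seconds needed to admit every message through a token bucket that starts full.
// Ignores retries and Retry-After pauses, so real drain time will be longer.
function estimateDrainSeconds(
  messageCount: number,
  bucketCapacity: number,
  refillPerSecond: number
): number {
  const afterInitialBurst = Math.max(0, messageCount - bucketCapacity);
  return afterInitialBurst / refillPerSecond;
}

// True when the last message should be dispatched before its TTL runs out.
function broadcastFitsTtl(
  messageCount: number,
  bucketCapacity: number,
  refillPerSecond: number,
  ttlSeconds: number
): boolean {
  return estimateDrainSeconds(messageCount, bucketCapacity, refillPerSecond) < ttlSeconds;
}

// 500,000 messages through a 500-token bucket refilling at 100/second:
// (500000 - 500) / 100 = 4995 seconds, about 83 minutes.
const campaignDrainSeconds = estimateDrainSeconds(500_000, 500, 100);
```

At roughly 83 minutes of drain time, a campaign of this size does not fit a 3600-second TTL even before any 429 is received, so it would need a longer TTL, a higher sustained rate, or staged dispatch.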

Implementing Exponential Backoff for Failed Push Deliveries — Full algorithm walkthrough with TTL-aware scheduling and DLQ routing for all retryable status codes.
Message Batching & Throughput Optimization — Batch dispatch patterns and concurrency tuning to reduce the likelihood of hitting service rate limits.
Delivery Tracking & Acknowledgment — Correlate 429-delayed deliveries with acknowledgment records to detect stale notifications before they re-dispatch.
Backend Delivery Architecture & Queue Management — System-level queue design, worker topology, and state machine modeling for reliable push delivery at scale.

FAQ

What should I use as the default delay when Retry-After is missing?

Use at minimum 15 seconds as a floor before applying your exponential backoff multiplier. For signals that suggest per-sender quota exhaustion (affecting all endpoints to a service, not just one), use 60 seconds as the floor. Do not use values below 5 seconds — they are too aggressive for a push service that is actively rate limiting you, and a rapid retry will either trigger another 429 or waste quota that you need for legitimate sends. Log every case where Retry-After is absent at WARN level so you can track which service variants omit the header and calibrate your defaults accordingly.

Should a single FCM 429 pause all FCM dispatches or only the failing subscription?

It depends on the 429 type. A per-subscription FCM 429 (where only that subscription endpoint is rate limited) should pause only that subscription — continue dispatching to other FCM endpoints normally. A per-sender-ID 429 (your VAPID subject or FCM project quota is exhausted) must pause all FCM dispatches for the duration of Retry-After. Distinguish them by checking whether the 429 is isolated to a single endpoint or is firing across multiple different subscription endpoints in the same time window. In practice, if more than 5% of FCM sends in a 10-second window return 429, treat it as a per-sender event and activate the global FCM pause flag.
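
The 5%-in-10-seconds rule of thumb can be sketched as a check over recent send outcomes. This is illustrative only: SendSample and looksSenderWide are hypothetical names, and requiring the 429s to span more than one endpoint is one possible reading of "firing across multiple different subscription endpoints".

```typescript
interface SendSample {
  atMs: number;      // when the response was received
  endpoint: string;  // subscription endpoint that was targeted
  was429: boolean;
}

// Heuristic: treat throttling as sender-wide when more than `threshold` of
// sends in the window returned 429 and the 429s hit more than one endpoint.
function looksSenderWide(
  samples: SendSample[],
  nowMs: number,
  windowMs = 10_000,
  threshold = 0.05
): boolean {
  const recent = samples.filter((s) => s.atMs > nowMs - windowMs && s.atMs <= nowMs);
  if (recent.length === 0) {
    return false;
  }
  const throttled = recent.filter((s) => s.was429);
  const distinctEndpoints = new Set(throttled.map((s) => s.endpoint)).size;
  return throttled.length / recent.length > threshold && distinctEndpoints > 1;
}
```

When this returns true for FCM traffic, the worker would set the global FCM pause flag; otherwise only the affected subscription is delayed.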

Can exponential backoff cause messages to expire before delivery?

Yes. A message with a 1-hour TTL that encounters a 429 with Retry-After: 3600 will expire before the retry fires. The solution is threefold: first, set message TTLs at dispatch time to be meaningfully longer than your maximum backoff window for messages that need retry resilience. Second, check ttl_remaining > delay before re-enqueueing and route to the DLQ with a ttl_expired_during_backoff reason if not — this prevents a zombie retry that will deliver a stale notification or be silently dropped by the push service. Third, reduce the upstream burst rate so you are less likely to exhaust per-sender quotas and trigger multi-hour Retry-After windows in the first place. For time-critical messages (2FA codes, payment confirmations), cap MAX_DELAY_MS at a value well below their TTL and route to DLQ immediately rather than waiting out an extended backoff.