Backend Delivery Architecture & Queue Management

This guide is the architectural reference for the server side of web push: how a trigger event becomes an encrypted, signed, rate-limited HTTP request to a browser push service, and how you track, retry, and expire that request at scale. Everything here builds on the same RFC 8030 transport that the Core Protocols & Browser Implementation guide covers from the client side.

[Figure: Backend web push delivery pipeline. Stages: Trigger Event (API gateway) → Message Broker (Redis / RabbitMQ) → Dispatch Workers (rate-limit + sign, aes128gcm encrypt, VAPID JWT) → Push Service (FCM / Autopush / APNs) → Browser SW (showNotification), with an Ack / Retry / TTL delivery ledger.]
The backend delivery path: ingest and normalize, queue, drain through rate-limited workers that sign and encrypt, dispatch to the push service, then reconcile acknowledgment, retry, and TTL state.

Core Delivery Pipeline Architecture

Modern web push delivery requires strict decoupling of ingestion and dispatch. An API gateway accepts trigger events, normalizes payloads, and routes them to a durable message broker. Broker selection depends on throughput and ordering guarantees. RabbitMQ handles complex routing, Kafka streams high-volume events, and AWS SQS provides managed durability. Decoupling prevents cascading failures during traffic spikes.

Payload normalization strips non-essential metadata before encryption. The techniques in Message Batching & Throughput Optimization reduce per-request overhead rather than the request count: Web Push still needs one HTTP POST per subscription, but grouping jobs by push-service origin and tenant lets workers reuse connections and improves dispatch velocity while maintaining strict isolation boundaries.

Secure payload preparation requires VAPID authentication and end-to-end encryption. Every payload is capped at a 4 KB ciphertext limit and must use the aes128gcm content-encoding mandated by RFC 8291 — the mechanics are detailed in the Push API Payload Encryption guide. In practice, use a standards-compliant library such as web-push (Node.js) or pywebpush (Python) that implements this encoding correctly, and sign each request with a VAPID key pair loaded from process.env.VAPID_PRIVATE_KEY — never hardcode keys in source. The following example shows TTL validation, which is a simpler but production-critical concern:

// FCM and the Web Push Protocol (RFC 8030) accept TTLs up to 2419200 s (28 days).
// APNs takes an absolute apns-expiration Unix timestamp instead of a duration,
// so it is not modeled in this table.
const ProviderLimits = {
  fcm:      { maxTTL: 2419200, defaultTTL: 86400 },
  webpush:  { maxTTL: 2419200, defaultTTL: 86400 },
} as const;

export const validateTTL = (
  provider: keyof typeof ProviderLimits,
  requestedTTL?: number
): number => {
  const limits = ProviderLimits[provider];
  const ttl = requestedTTL ?? limits.defaultTTL;
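  // The TTL header carries a non-negative integer number of seconds,
  // so reject NaN and fractional values as well as out-of-range ones.
  if (!Number.isInteger(ttl) || ttl < 0 || ttl > limits.maxTTL) {
    throw new Error(
      `Invalid TTL for ${provider}: must be an integer from 0 to ${limits.maxTTL} seconds`
    );
  }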
  return ttl;
};

Always prune invalid subscription endpoints during validation. Retaining dead endpoints increases dispatch latency and violates data minimization principles under GDPR.

The ingestion stage should also enforce idempotency at the boundary. Campaign APIs frequently fire the same trigger twice — a retried webhook, a double-clicked send button, a replayed event from an at-least-once stream. Deduplicate at the gateway using a deterministic key derived from the campaign and the recipient, and reject the duplicate before it ever reaches the broker. This keeps the queue clean and prevents user-facing notification fatigue downstream. The broker itself should be treated as the durable system of record between ingestion and dispatch: once an event is acknowledged into the queue, the gateway can return 202 Accepted to the caller, and all subsequent work — encryption, signing, rate limiting, retrying — happens asynchronously against that durable record.
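The gateway-side deduplication described above can be sketched as follows. This is a minimal illustration, not a production implementation: `dedupeKey`, `GatewayDeduplicator`, and the in-memory `Map` are hypothetical names, and the map stands in for a shared store (for example a Redis key written with NX and an expiry) that every gateway instance would need to consult.

```typescript
// Hypothetical key shape: one send per (campaign, recipient) pair.
export function dedupeKey(campaignId: string, recipientId: string): string {
  // Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
  return `${campaignId.length}:${campaignId}|${recipientId.length}:${recipientId}`;
}

// In-memory stand-in for a shared dedupe store. It never evicts old keys,
// which is acceptable for a sketch but not for a long-running gateway.
export class GatewayDeduplicator {
  private readonly seen = new Map<string, number>(); // key -> expiry (ms since epoch)

  constructor(private readonly windowMs: number) {}

  /** Returns true if the trigger is new and should be enqueued. */
  accept(campaignId: string, recipientId: string, nowMs: number = Date.now()): boolean {
    const key = dedupeKey(campaignId, recipientId);
    const expiry = this.seen.get(key);
    if (expiry !== undefined && expiry > nowMs) {
      return false; // duplicate inside the dedupe window: reject before the broker
    }
    this.seen.set(key, nowMs + this.windowMs);
    return true;
  }
}
```

A gateway handler would call `accept()` first and enqueue and answer 202 Accepted only when it returns true. The window length is a product decision: long enough to absorb webhook retries, short enough that a deliberate re-send of the same campaign later is still possible.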

Message Lifecycle & TTL Management

Time-sensitive notifications require strict lifecycle controls. Each message carries a metadata schema defining priority, expiration windows, and compliance tags. The sender requests a retention window through the TTL HTTP header (RFC 8030), and the push service may retain the message for a shorter time than requested. FCM and the Web Push Protocol both accept up to 2,419,200 seconds (28 days). APNs uses the apns-expiration Unix timestamp header instead of a duration.

Ignoring provider-specific TTL defaults leads to stale message delivery. Implementing TTL & Expiration Handling ensures expired payloads are purged before consuming queue resources. Middleware should validate and clamp TTL values based on provider constraints before dispatch.

TTL is not a single setting but a value that must be mirrored across three layers: the broker eviction policy, the TTL HTTP header, and a final freshness check inside the service worker. When these disagree, you get phantom deliveries — a message the push service would have dropped but the broker kept, or vice versa. The cleanest model assigns a TTL tier per notification class at ingestion (seconds for one-time passcodes, hours for system alerts, a day or more for promotional campaigns) and carries that value through every subsequent stage as immutable metadata. The trade-off between the two extremes of that range is explored in TTL 0 vs TTL 86400 delivery guarantees.
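One way to carry a single TTL value through all three layers is to stamp an immutable envelope at ingestion and derive everything else from it. The sketch below assumes hypothetical class names (`otp`, `systemAlert`, `promo`) and illustrative tier values; neither comes from a specification.

```typescript
// Illustrative tiers only: seconds for passcodes, hours for alerts, a day for promos.
const TTL_TIERS = {
  otp: 60,
  systemAlert: 4 * 3600,
  promo: 86400,
} as const;

type NotificationClass = keyof typeof TTL_TIERS;

interface TtlEnvelope {
  readonly ttlSeconds: number; // tier assigned at ingestion
  readonly expiresAtMs: number; // absolute deadline shared by every layer
}

// Ingestion: assign the tier once and freeze it.
export function assignTtl(cls: NotificationClass, createdAtMs: number): TtlEnvelope {
  const ttlSeconds = TTL_TIERS[cls];
  return Object.freeze({ ttlSeconds, expiresAtMs: createdAtMs + ttlSeconds * 1000 });
}

// Dispatch: the TTL header for a (re)send is whatever remains of the window.
export function remainingTtlSeconds(env: TtlEnvelope, nowMs: number): number {
  return Math.max(0, Math.floor((env.expiresAtMs - nowMs) / 1000));
}

// Broker eviction and the service-worker freshness check use the same deadline.
export function isFresh(env: TtlEnvelope, nowMs: number): boolean {
  return nowMs < env.expiresAtMs;
}
```

Because retries send the remaining window rather than the original tier, a message that waited in the queue cannot outlive its deadline at the push service. A job for which `isFresh` returns false should be dropped and logged as expired rather than dispatched with a TTL of 0.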

Reliability Patterns & Error Recovery

Network partitions and provider outages are inevitable. Dispatch pipelines must implement circuit breakers and dead-letter queues to isolate failures. Transient errors require automated recovery, but aggressive retries trigger provider bans.

Hardcoded retry intervals cause thundering herd effects. Applying Retry Logic & Backoff Strategies introduces jitter and caps maximum attempts. This stabilizes queue depth and prevents cascading overload during recovery windows.

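export function calculateBackoff(
  attempt: number,
  baseDelayMs: number,
  maxDelayMs: number
): number {
  // Exponential growth (attempt is zero-based), capped at maxDelayMs.
  const exponential = Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs);
  // Additive jitter of up to 30% so retries from many workers do not align.
  // Note: jitter is applied after the cap, so the result can reach 1.3 * maxDelayMs.
  const jitter = Math.random() * (exponential * 0.3);
  return Math.round(exponential + jitter);
}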

Idempotency guarantees are non-negotiable. Every dispatch must carry a unique correlation ID. Duplicate processing during retries causes user fatigue and compliance violations. Always verify message state before re-queuing.

Failure classification is the heart of a reliable retry layer. Not every non-2xx response should be retried: 400, 401, 403, 404, and 410 are terminal and retrying them only wastes compute and risks reputation damage, while 429 and the 5xx family are transient and recoverable. A 410 Gone is special — it is the push service telling you the subscription is permanently dead, which should immediately flip the subscription’s status and halt all further attempts; the cleanup pattern at scale is detailed in Handling 410 Gone responses at scale. Pair classification with per-region circuit breakers so a single degraded push gateway cannot drag the whole pipeline into a retry storm.
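The classification rule above maps naturally onto a small pure function. In this sketch the action names (`ack`, `retry`, `drop`, `mark_gone`) are hypothetical labels, and treating every unlisted status as terminal is a conservative assumption rather than something a push service mandates.

```typescript
type DispatchAction = 'ack' | 'retry' | 'drop' | 'mark_gone';

export function classifyPushResponse(status: number): DispatchAction {
  // Any 2xx means the push service accepted the message.
  if (status >= 200 && status < 300) return 'ack';

  // 410 Gone: the subscription is permanently dead; flip its status, stop retrying.
  if (status === 410) return 'mark_gone';

  // 429 and the 5xx family are transient and worth retrying with backoff.
  if (status === 429 || (status >= 500 && status < 600)) return 'retry';

  // 400, 401, 403, 404, 413 and anything unrecognized: terminal for this request.
  return 'drop';
}
```

Keeping the function free of I/O makes it trivial to unit test, and the caller decides what each action means operationally, such as pushing a `drop` to the dead-letter queue with an alert.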

Provider Constraints & Flow Control

Push providers enforce strict rate limits and concurrency quotas. FCM restricts concurrent connections, APNs limits HTTP/2 streams, and Web Push endpoints vary by browser vendor. Exceeding thresholds results in HTTP 429 or 503 responses, and the way you absorb a sustained 429 directly determines whether a campaign completes or stalls — see Handling 429 Too Many Requests from push services.

Unbounded queue growth during campaign bursts requires proactive backpressure. A token-bucket rate limiter maintains dispatch velocity within safe boundaries while adaptive pacing dynamically adjusts worker concurrency based on real-time provider feedback.

export class TokenBucketRateLimiter {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillRate: number // tokens per second
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  consume(count: number = 1): boolean {
    this.refill();
    if (this.tokens >= count) {
      this.tokens -= count;
      return true;
    }
    return false;
  }

  private refill(): void {
    const now = Date.now();
    // Date.now() is not monotonic; clamp so a backwards clock step cannot remove tokens.
    const elapsed = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }
}

Monitor queue depth and dispatch latency continuously. When provider quotas tighten, route lower-priority messages to delayed windows or batch them for off-peak delivery.
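Adaptive pacing needs a signal from the provider, and for a 429 the most direct one is the Retry-After header, which HTTP allows as either a number of seconds or an HTTP date. The helper below is a sketch of parsing both forms; `parseRetryAfterMs` and the one-hour default cap are assumptions, not part of any push service contract.

```typescript
/**
 * Converts a Retry-After header value into a delay in milliseconds.
 * Returns null when the header is missing or unparseable, so the caller
 * can fall back to its own backoff schedule.
 */
export function parseRetryAfterMs(
  header: string | null | undefined,
  nowMs: number = Date.now(),
  maxMs: number = 3600000 // assumed ceiling so a bad header cannot stall a worker for days
): number | null {
  if (!header) return null;
  const value = header.trim();

  // Form 1: delay-seconds, a non-negative integer.
  if (/^\d+$/.test(value)) {
    return Math.min(Number(value) * 1000, maxMs);
  }

  // Form 2: an HTTP date; the delay is the distance from now, never negative.
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return null;
  return Math.min(Math.max(0, dateMs - nowMs), maxMs);
}
```

A worker that receives a 429 can pause the limiter for that push-service origin for the returned delay and resume at a reduced refill rate, which avoids hammering an endpoint that has already asked for room.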

Observability & Delivery State Tracking

Async delivery requires deterministic state reconciliation. Parse each push service response at dispatch time, and implement listeners for the asynchronous events that the service worker forwards back to your backend. Map HTTP status codes and provider-specific error payloads to internal state machines. Track sent, delivered, clicked, and failed states using immutable correlation IDs.

Reconciling asynchronous callbacks with internal databases introduces race conditions. Applying Delivery Tracking & Acknowledgment ensures idempotent state transitions and accurate delivery attribution. Audit logs must capture telemetry without exposing sensitive user identifiers.

import { createClient } from 'redis';

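export async function processIdempotentDispatch(
  redis: ReturnType<typeof createClient>,
  messageId: string,
  dispatchFn: () => Promise<void>
): Promise<'processed' | 'duplicate'> {
  const lockKey = `dispatch:lock:${messageId}`;
  const statusKey = `dispatch:status:${messageId}`;

  // Short-lived lock: only one worker may dispatch this message at a time.
  const acquired = await redis.set(lockKey, '1', { NX: true, EX: 30 });
  if (!acquired) return 'duplicate';

  try {
    // The lock expires after 30 s, so also consult the 7-day status record;
    // otherwise a late redelivery of the same job would be sent a second time.
    if ((await redis.get(statusKey)) === 'sent') return 'duplicate';

    await dispatchFn();
    await redis.set(statusKey, 'sent', { EX: 604800 });
    return 'processed';
  } catch (error) {
    // Release the lock so a retry can pick the message up without waiting for expiry.
    await redis.del(lockKey);
    throw error;
  }
}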

GDPR and CCPA compliance mandate strict data minimization. Never log raw payloads, PII, or full subscription endpoints in DLQs or observability platforms. Hash identifiers and mask sensitive fields before telemetry ingestion.

The distinction worth internalizing here is between acknowledgment and delivery. The synchronous HTTP response from the push service confirms only that the encrypted payload was accepted into its queue — it says nothing about whether the device woke, decrypted the message, or showed it. True display confirmation arrives asynchronously, if at all, through events forwarded from the service worker back to your analytics endpoint. Treating the 201 as proof of receipt is the single most common attribution error in push systems, and it inflates “delivered” numbers that the Delivery analytics instrumentation guide shows how to separate into honest delivery-versus-display rates.

Scaling for Enterprise & Multi-Tenant Workloads

High-volume campaigns demand architectural isolation. Multi-tenant environments require strict data boundaries, sharded queues, and tenant-aware worker pools. Horizontal partitioning by region or tenant ID prevents noisy-neighbor degradation.

Cross-region failover requires active-passive or active-active queue replication. Dynamic worker scaling and cost-aware routing reduce latency and help comply with data residency mandates. Route traffic to the nearest provider edge. For infrastructure-level queue scaling details, see Scaling push queues with Redis or RabbitMQ.

Broker choice shapes the operational character of the whole system. Redis with a sorted-set delay queue is the lightest path to scheduled retries and TTL-bound jobs, and it doubles as the idempotency and rate-limiter store, which keeps the moving-part count low. RabbitMQ earns its weight when you need per-tenant exchanges, complex routing keys, and native dead-lettering without hand-rolling it. Kafka fits when the push event itself is part of a larger replayable stream and you want consumer groups to fan the same events into delivery, analytics, and CRM sync independently. Whatever the broker, the dispatch workers themselves stay stateless: they pull a durable job, load the VAPID keys from the secrets manager at runtime, encrypt and sign, and report the outcome back to the ledger. Statelessness is what makes horizontal scaling and spot-instance reclamation safe.

Concurrency control is the other half of scaling. Because Web Push (RFC 8030) is a per-endpoint protocol — one HTTP POST per subscription — “throughput” is really a question of how many of those POSTs you keep in flight without tripping a provider’s rate limit. A semaphore per worker, HTTP/2 connection reuse per push-service origin, and provider-aware pacing are covered end to end in Message Batching & Throughput Optimization, with the empirical sweet-spot tuning in Optimal batch size for web push throughput.
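The per-worker semaphore mentioned above can be approximated with a fixed number of "lanes" that pull from a shared list. This is a sketch under simple assumptions: `mapWithConcurrency` and the `Outcome` type are hypothetical, and the `worker` callback stands in for whatever function performs the actual HTTP POST.

```typescript
type Outcome<R> = { ok: true; value: R } | { ok: false; error: unknown };

export async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<Outcome<R>[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results: Outcome<R>[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all lanes

  // Each lane claims one item at a time, so at most `limit` calls are in flight.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(items[index] as T) };
      } catch (error) {
        // One failed send must not cancel the rest of the batch.
        results[index] = { ok: false, error };
      }
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Results keep the input order, so each outcome can be written back to the ledger against the subscription that produced it. The right `limit` is an empirical value per push-service origin, not a constant.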

Monitor cost per delivery and optimize routing logic. Use spot instances for non-critical batch processing. Maintain strict SLAs by isolating transactional alerts from marketing campaigns. Regularly audit queue retention policies and purge stale subscription data to maintain compliance and operational efficiency. The campaign-side counterpart to this infrastructure — segmentation, A/B testing, and engagement analytics — lives in the Notification Engagement & Campaign Optimization guide.

Subscription & Delivery Data Model

The delivery pipeline is only as reliable as the storage model behind it. Two tables anchor the system: a subscriptions table that holds the endpoint plus its p256dh/auth keys, and a delivery_log keyed by correlation ID that holds one row per message and is updated in place as attempts progress. Store the endpoint hash, not the raw endpoint, anywhere it joins to analytics.

CREATE TABLE subscriptions (
  id              BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  tenant_id       BIGINT NOT NULL,
  endpoint        TEXT NOT NULL UNIQUE,
  endpoint_hash   BYTEA NOT NULL,           -- HMAC-SHA256(endpoint, rotating_key)
  p256dh          TEXT NOT NULL,
  auth            TEXT NOT NULL,
  vapid_key_id    TEXT NOT NULL,            -- which VAPID key signed the most recent send
  status          TEXT NOT NULL DEFAULT 'active', -- active | gone | expired
  created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
  last_seen_at    TIMESTAMPTZ
);

CREATE TABLE delivery_log (
  correlation_id  UUID PRIMARY KEY,
  subscription_id BIGINT NOT NULL REFERENCES subscriptions(id),
  state           TEXT NOT NULL,            -- queued | sent | retrying | gone | expired | failed
  http_status     SMALLINT,
  ttl_seconds     INTEGER NOT NULL,
  attempt         SMALLINT NOT NULL DEFAULT 0,
  dispatched_at   TIMESTAMPTZ,
  updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_delivery_state ON delivery_log (state, updated_at);

State transitions in delivery_log must be idempotent — a retried dispatch reuses the same correlation_id rather than inserting a new row, which is the contract enforced by the Delivery Tracking & Acknowledgment layer. A 410 Gone flips subscriptions.status to gone and stops all further retries.
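The idempotent-transition contract can be expressed as a pure function over a row and an incoming event. The sketch below reuses the state names from the delivery_log schema; the `DeliveryEvent` shape and the rule that `sent`, `gone`, `expired`, and `failed` are terminal are assumptions made for illustration.

```typescript
type DeliveryState = 'queued' | 'sent' | 'retrying' | 'gone' | 'expired' | 'failed';

interface DeliveryRow {
  readonly correlationId: string;
  readonly state: DeliveryState;
  readonly attempt: number;
}

interface DeliveryEvent {
  readonly state: DeliveryState;
  readonly attempt: number; // attempt number the event belongs to
}

// Assumed terminal states: once reached, later events are ignored.
const TERMINAL: ReadonlySet<DeliveryState> = new Set<DeliveryState>([
  'sent', 'gone', 'expired', 'failed',
]);

export function applyTransition(row: DeliveryRow, event: DeliveryEvent): DeliveryRow {
  if (TERMINAL.has(row.state)) return row; // outcome already final
  if (event.attempt < row.attempt) return row; // stale event from an earlier attempt
  if (event.attempt === row.attempt && event.state === row.state) return row; // replay
  return { ...row, state: event.state, attempt: event.attempt };
}
```

Returning the same object for ignored events lets the caller skip the database write entirely. In SQL the equivalent guard would be an UPDATE whose WHERE clause restates these conditions, so concurrent workers cannot move a row backwards.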

Error Taxonomy

Push services communicate delivery outcomes through HTTP status codes. Mapping each code to a single deterministic action is the difference between a self-healing pipeline and a retry storm.

| Status | Meaning | Cause | Resolution |
| --- | --- | --- | --- |
| 400 Bad Request | Malformed request | Bad headers, invalid TTL, broken encryption framing | Drop, alert engineering; never retry |
| 401 Unauthorized | VAPID auth failed | Expired JWT or wrong signing key | Re-sign with current process.env.VAPID_PRIVATE_KEY; check key rotation |
| 403 Forbidden | VAPID identity mismatch | aud/sub claim or key does not match subscription | Verify the key pair that created the subscription is still in use |
| 404 Not Found | Unknown endpoint | Subscription never existed or was purged upstream | Flag for pruning; do not retry |
| 410 Gone | Endpoint permanently dead | User unsubscribed or browser expired it | Delete subscription immediately; see the 410 handling guide |
| 413 Payload Too Large | Ciphertext over 4 KB | Payload exceeds the RFC 8291 limit | Shrink payload below 4 KB; send IDs, fetch content client-side |
| 429 Too Many Requests | Rate limited | Concurrency or per-origin quota exceeded | Honor Retry-After, apply backoff |
| 502/503/504 | Transient outage | Push service degradation | Retry with exponential backoff + jitter, capped attempts |

Compliance, Transport & Privacy

Web push is HTTPS-only end to end: the service worker that receives the message must be served over a secure context, and your dispatch endpoints must reject any non-HTTPS subscription URL during validation (this also doubles as SSRF protection). Under GDPR and CCPA, the legal basis for sending is explicit opt-in, so log consent — timestamp, version of the prompt copy, and the action taken — at the moment of subscription rather than reconstructing it later. The UX side of capturing and honoring that consent is covered in the Frontend Permission UX & Subscription Flows guide.
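Rejecting non-HTTPS subscription URLs takes only a few lines. The sketch below goes one step further than the text and also checks the host against a caller-supplied allowlist, which is an added assumption: `isValidPushEndpoint` is a hypothetical helper, and the suffixes in the usage note are examples that should be verified against the push services you actually target.

```typescript
export function isValidPushEndpoint(
  raw: string,
  allowedHostSuffixes: readonly string[]
): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }

  // Web push endpoints are HTTPS-only; this also blocks file:, http:, and similar schemes.
  if (url.protocol !== 'https:') return false;

  // Embedded credentials have no place in a subscription endpoint.
  if (url.username !== '' || url.password !== '') return false;

  // Match the exact host or a true subdomain, never a lookalike prefix.
  const host = url.hostname.toLowerCase();
  return allowedHostSuffixes.some(
    (suffix) => host === suffix || host.endsWith(`.${suffix}`)
  );
}
```

For example, `isValidPushEndpoint(endpoint, ['fcm.googleapis.com', 'push.services.mozilla.com'])` accepts endpoints on those hosts and their subdomains while rejecting internal addresses such as a cloud metadata IP. Without an allowlist, the HTTPS check alone still narrows the SSRF surface but does not close it.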

Apply data minimization everywhere telemetry flows: never write raw payloads, PII, or full endpoints to dead-letter queues, observability platforms, or logs. Hash endpoints with a rotating HMAC key before they leave the dispatch boundary, and scrub payload bodies from aggregation pipelines once a message passes its TTL. A tight Content-Security-Policy on the pages that register the service worker reduces the blast radius of any injected script that could tamper with subscriptions.
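Hashing an endpoint with a keyed HMAC before it reaches telemetry might look like the sketch below. It uses Node's built-in `crypto` module; the `hashEndpoint` name and the `keyId:digest` output format are assumptions chosen so that hashes made under different rotating keys stay distinguishable.

```typescript
import { createHmac } from 'node:crypto';

/**
 * Returns a stable pseudonym for a subscription endpoint.
 * `key` is the current HMAC secret (loaded from a secrets manager in practice)
 * and `keyId` is a hypothetical label identifying which rotation produced it.
 */
export function hashEndpoint(endpoint: string, key: string, keyId: string): string {
  const digest = createHmac('sha256', key).update(endpoint).digest('hex');
  return `${keyId}:${digest}`;
}
```

A keyed HMAC, unlike a bare SHA-256, cannot be reversed by hashing a list of known endpoints unless the key also leaks. The trade-off of rotation is that hashes from different key periods no longer join, which is usually the desired privacy property for long-lived analytics.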


FAQ

Does a 201 response mean the user received the notification?

No. A 201 Created only confirms the push service accepted the encrypted payload into its queue. It does not guarantee device wake, network delivery, or that the user saw anything. Actual display is observable only through client-side events forwarded from the service worker, which the Delivery Tracking & Acknowledgment layer reconciles.

Should I use Redis or RabbitMQ for the push queue?

Redis (with streams or a sorted-set delay queue) is simplest for delayed retries and TTL-bound jobs and scales well for fan-out. RabbitMQ shines when you need complex routing, per-tenant exchanges, and built-in dead-lettering. Kafka fits very high-volume, replayable event streams. The full trade-off is in Scaling push queues with Redis or RabbitMQ.

What is the maximum payload size for a web push message?

The encrypted payload must not exceed 4 KB (4096 bytes of ciphertext) and must use the aes128gcm content-encoding per RFC 8291. Exceeding it returns 413 Payload Too Large. Send identifiers and fetch full content in the service worker rather than packing it into the payload.
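Because the 4096-byte limit applies to the encrypted body, the usable plaintext budget is smaller. The sketch below derives it from the aes128gcm framing used by Web Push (an 86-byte header plus a 16-byte authentication tag and a 1-byte padding delimiter for the single record) and assumes no extra padding is added; the function names are hypothetical.

```typescript
const MAX_CIPHERTEXT_BYTES = 4096;

// aes128gcm header as used by Web Push:
// 16-byte salt + 4-byte record size + 1-byte key-id length + 65-byte public key.
const HEADER_BYTES = 16 + 4 + 1 + 65;

// Single record overhead: 16-byte AEAD tag + 1-byte padding delimiter.
const RECORD_OVERHEAD_BYTES = 16 + 1;

// Largest plaintext that still fits once encrypted (no additional padding).
export const MAX_PLAINTEXT_BYTES =
  MAX_CIPHERTEXT_BYTES - HEADER_BYTES - RECORD_OVERHEAD_BYTES;

// Measure UTF-8 bytes, not string length: non-ASCII characters take more than one byte.
export function plaintextByteLength(payload: string): number {
  return new TextEncoder().encode(payload).length;
}

export function fitsInPushMessage(payload: string): boolean {
  return plaintextByteLength(payload) <= MAX_PLAINTEXT_BYTES;
}
```

Running this check at ingestion, before a job is queued, turns a 413 from the push service into an immediate validation error for the caller. Measuring bytes rather than characters matters for localized copy, where a short string can be several times larger on the wire.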

How many times should I retry a failed push?

Cap retries at 3–5 attempts within a wall-clock window that respects the message TTL. Only retry transient failures (429, 5xx); treat 400, 401, 403, 404, 410, and 413 as terminal for the request as sent. Use exponential backoff with jitter to avoid thundering-herd recovery, as detailed in Retry Logic & Backoff Strategies.

Where should the VAPID private key live?

In a secrets manager, injected at runtime as process.env.VAPID_PRIVATE_KEY. It must never appear hardcoded in server-side source or be committed to version control. The public key is shared with the browser during subscription; rotation is covered in VAPID key generation & rotation.

How do I keep a multi-tenant pipeline from letting one tenant starve others?

Shard queues per tenant or region, give each tenant its own worker pool budget with weighted fair queuing, and apply per-tenant token-bucket rate limiting. Isolate transactional alerts from marketing campaigns so a promotional burst can never delay a 2FA code.