Backend Delivery Architecture & Queue Management
This guide is the architectural reference for the server side of web push: how a trigger event becomes an encrypted, signed, rate-limited HTTP request to a browser push service, and how you track, retry, and expire that request at scale. Everything here builds on the same RFC 8030 transport that the Core Protocols & Browser Implementation guide covers from the client side.
Core Delivery Pipeline Architecture
Modern web push delivery requires strict decoupling of ingestion and dispatch. An API gateway accepts trigger events, normalizes payloads, and routes them to a durable message broker. Broker selection depends on throughput and ordering guarantees. RabbitMQ handles complex routing, Kafka streams high-volume events, and AWS SQS provides managed durability. Decoupling prevents cascading failures during traffic spikes.
Payload normalization strips non-essential metadata before encryption. Because each subscription still needs its own HTTP POST, the techniques in Message Batching & Throughput Optimization reduce connection overhead rather than the number of requests. Grouping payloads by push-service origin and tenant lets workers reuse connections and improves dispatch velocity while maintaining strict isolation boundaries.
Secure payload preparation requires VAPID authentication and end-to-end encryption. Every payload is capped at a 4 KB ciphertext limit and must use the aes128gcm content-encoding mandated by RFC 8291 — the mechanics are detailed in the Push API Payload Encryption guide. In practice, use a standards-compliant library such as web-push (Node.js) or pywebpush (Python) that implements this encoding correctly, and sign each request with a VAPID key pair loaded from process.env.VAPID_PRIVATE_KEY — never hardcode keys in source. The following example shows TTL validation, which is a simpler but production-critical concern:
// FCM caps TTL at 2419200 s (28 days). RFC 8030 sets no fixed maximum, so the
// same ceiling is applied to generic Web Push here as a conservative default.
// APNs takes apns-expiration (an absolute Unix timestamp) instead of a
// duration, so it is not modeled in this table.
const ProviderLimits = {
  fcm: { maxTTL: 2419200, defaultTTL: 86400 },
  webpush: { maxTTL: 2419200, defaultTTL: 86400 },
} as const;

export const validateTTL = (
  provider: keyof typeof ProviderLimits,
  requestedTTL?: number
): number => {
  const limits = ProviderLimits[provider];
  // Fall back to the provider default when the caller omits a TTL.
  const ttl = requestedTTL ?? limits.defaultTTL;
  // Number.isInteger also rejects NaN and fractional values, which the
  // range check alone would let through.
  if (!Number.isInteger(ttl) || ttl < 0 || ttl > limits.maxTTL) {
    throw new Error(
      `Invalid TTL for ${provider}: must be an integer in 0–${limits.maxTTL} seconds`
    );
  }
  return ttl;
};
Always prune invalid subscription endpoints during validation. Retaining dead endpoints increases dispatch latency and conflicts with the data minimization principle under GDPR.
The ingestion stage should also enforce idempotency at the boundary. Campaign APIs frequently fire the same trigger twice — a retried webhook, a double-clicked send button, a replayed event from an at-least-once stream. Deduplicate at the gateway using a deterministic key derived from the campaign and the recipient, and reject the duplicate before it ever reaches the broker. This keeps the queue clean and prevents user-facing notification fatigue downstream. The broker itself should be treated as the durable system of record between ingestion and dispatch: once an event is acknowledged into the queue, the gateway can return 202 Accepted to the caller, and all subsequent work — encryption, signing, rate limiting, retrying — happens asynchronously against that durable record.
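The gateway-side deduplication described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: `dedupeKey` and `DedupeWindow` are hypothetical names, and the in-memory map stands in for whatever shared store with expiry the gateway actually uses.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: derives a stable key from the campaign and the recipient.
export function dedupeKey(campaignId: string, recipientId: string): string {
  // Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
  const material =
    `${campaignId.length}:${campaignId}|${recipientId.length}:${recipientId}`;
  return createHash('sha256').update(material).digest('hex');
}

// In-memory stand-in for a shared store with expiry (illustrative only).
export class DedupeWindow {
  private readonly seen = new Map<string, number>(); // key -> expiry (ms epoch)

  constructor(private readonly windowMs: number) {}

  /** Returns true the first time a key is seen inside the window. */
  admit(key: string, now: number = Date.now()): boolean {
    const expiresAt = this.seen.get(key);
    if (expiresAt !== undefined && expiresAt > now) return false; // duplicate
    this.seen.set(key, now + this.windowMs);
    return true;
  }
}
```

A gateway handler would call `admit(dedupeKey(campaignId, recipientId))` before publishing to the broker and answer a rejected duplicate without enqueuing anything.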
Message Lifecycle & TTL Management
Time-sensitive notifications require strict lifecycle controls. Each message carries a metadata schema defining priority, expiration windows, and compliance tags. Web push expresses expiration through the TTL HTTP header (RFC 8030), a duration in seconds. The RFC sets no fixed maximum, but a push service may shorten the requested value; FCM caps it at 2,419,200 seconds (28 days), which is a safe ceiling to apply across web push endpoints. APNs uses the apns-expiration Unix timestamp header instead of a duration.
Ignoring provider-specific TTL defaults leads to stale message delivery. Implementing TTL & Expiration Handling ensures expired payloads are purged before consuming queue resources. Middleware should validate and clamp TTL values based on provider constraints before dispatch.
TTL is not a single setting but a value that must be mirrored across three layers: the broker eviction policy, the TTL HTTP header, and a final freshness check inside the service worker. When these disagree, you get phantom deliveries — a message the push service would have dropped but the broker kept, or vice versa. The cleanest model assigns a TTL tier per notification class at ingestion (seconds for one-time passcodes, hours for system alerts, a day or more for promotional campaigns) and carries that value through every subsequent stage as immutable metadata. The trade-off between the two extremes of that range is explored in TTL 0 vs TTL 86400 delivery guarantees.
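One way to keep the three layers in agreement is to compute a single absolute deadline at ingestion and derive every later value from it. The sketch below assumes hypothetical class names and tier values (`otp`, `system_alert`, `promotion`); the numbers are placeholders to tune, not recommendations from this guide.

```typescript
type NotificationClass = 'otp' | 'system_alert' | 'promotion';

// Illustrative tier values in seconds; adjust per product.
const TTL_TIERS: Record<NotificationClass, number> = {
  otp: 60,
  system_alert: 4 * 3600,
  promotion: 86400,
};

export interface TtlEnvelope {
  readonly ttlSeconds: number;  // tier assigned at ingestion
  readonly expiresAtMs: number; // absolute deadline shared by every layer
}

export function assignTtl(cls: NotificationClass, createdAtMs: number): TtlEnvelope {
  const ttlSeconds = TTL_TIERS[cls];
  // Frozen so later stages cannot silently change the deadline.
  return Object.freeze({ ttlSeconds, expiresAtMs: createdAtMs + ttlSeconds * 1000 });
}

/** Value for the TTL header at dispatch time, or null if the message already expired. */
export function remainingTtlSeconds(env: TtlEnvelope, nowMs: number): number | null {
  const remaining = Math.floor((env.expiresAtMs - nowMs) / 1000);
  return remaining > 0 ? remaining : null;
}
```

The broker eviction policy and the service-worker freshness check can both compare against `expiresAtMs`, while the dispatcher sends `remainingTtlSeconds` so that time already spent in the queue is not granted to the push service a second time.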
Reliability Patterns & Error Recovery
Network partitions and provider outages are inevitable. Dispatch pipelines must implement circuit breakers and dead-letter queues to isolate failures. Transient errors require automated recovery, but aggressive retries trigger provider bans.
Hardcoded retry intervals cause thundering herd effects. Applying Retry Logic & Backoff Strategies introduces jitter and caps maximum attempts. This stabilizes queue depth and prevents cascading overload during recovery windows.
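export function calculateBackoff(
  attempt: number,
  baseDelayMs: number,
  maxDelayMs: number
): number {
  // Exponential growth (base * 2^attempt), capped at maxDelayMs.
  const exponential = Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs);
  // Subtract up to 30% random jitter so workers desynchronize while the
  // result never exceeds maxDelayMs.
  const jitter = Math.random() * (exponential * 0.3);
  return Math.round(exponential - jitter);
}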
Idempotency guarantees are non-negotiable. Every dispatch must carry a unique correlation ID. Duplicate processing during retries causes user fatigue and compliance violations. Always verify message state before re-queuing.
Failure classification is the heart of a reliable retry layer. Not every non-2xx response should be retried: 400, 401, 403, 404, and 410 are terminal and retrying them only wastes compute and risks reputation damage, while 429 and the 5xx family are transient and recoverable. A 410 Gone is special — it is the push service telling you the subscription is permanently dead, which should immediately flip the subscription’s status and halt all further attempts; the cleanup pattern at scale is detailed in Handling 410 Gone responses at scale. Pair classification with per-region circuit breakers so a single degraded push gateway cannot drag the whole pipeline into a retry storm.
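The classification rule above reduces to a small pure function. This is a sketch under stated assumptions: the action names are hypothetical, and treating every unlisted 4xx code as terminal is a simplification that a real pipeline may want to refine.

```typescript
// Hypothetical action names for the dispatch worker to switch on.
export type DispatchAction = 'done' | 'retry' | 'drop' | 'mark_gone';

export function classifyStatus(status: number): DispatchAction {
  if (status >= 200 && status < 300) return 'done';
  // 410 Gone: the subscription is permanently dead, so stop all attempts.
  if (status === 410) return 'mark_gone';
  // 429 and the 5xx family are transient and recoverable.
  if (status === 429 || (status >= 500 && status < 600)) return 'retry';
  // 400, 401, 403, 404 and (by assumption) any other code are terminal.
  return 'drop';
}
```

Keeping the function free of I/O makes it easy to unit test and to share between the dispatcher and the dead-letter replay tooling.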
Provider Constraints & Flow Control
Push providers enforce strict rate limits and concurrency quotas. FCM restricts concurrent connections, APNs limits HTTP/2 streams, and Web Push endpoints vary by browser vendor. Exceeding thresholds results in HTTP 429 or 503 responses, and the way you absorb a sustained 429 directly determines whether a campaign completes or stalls — see Handling 429 Too Many Requests from push services.
Unbounded queue growth during campaign bursts requires proactive backpressure. A token-bucket rate limiter maintains dispatch velocity within safe boundaries while adaptive pacing dynamically adjusts worker concurrency based on real-time provider feedback.
export class TokenBucketRateLimiter {
private tokens: number;
private lastRefill: number;
constructor(
private readonly capacity: number,
private readonly refillRate: number // tokens per second
) {
this.tokens = capacity;
this.lastRefill = Date.now();
}
consume(count: number = 1): boolean {
this.refill();
if (this.tokens >= count) {
this.tokens -= count;
return true;
}
return false;
}
private refill(): void {
const now = Date.now();
const elapsed = (now - this.lastRefill) / 1000;
this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
this.lastRefill = now;
}
}
Monitor queue depth and dispatch latency continuously. When provider quotas tighten, route lower-priority messages to delayed windows or batch them for off-peak delivery.
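Adaptive pacing can take many forms; one common approach is additive-increase, multiplicative-decrease (AIMD), sketched below. The class name and the choice to halve on 429 or 503 and add one slot per success are assumptions for illustration, not behavior prescribed by any push provider.

```typescript
export class AdaptiveConcurrency {
  private limit: number;

  constructor(
    private readonly min: number,
    private readonly max: number,
    initial: number = min
  ) {
    // Clamp the starting point into [min, max].
    this.limit = Math.min(max, Math.max(min, initial));
  }

  /** Current number of requests a worker may keep in flight. */
  get current(): number {
    return this.limit;
  }

  /** Feed each provider response status back into the controller. */
  observe(status: number): void {
    if (status === 429 || status === 503) {
      // Multiplicative decrease: back off quickly when the provider pushes back.
      this.limit = Math.max(this.min, Math.floor(this.limit / 2));
    } else if (status >= 200 && status < 300) {
      // Additive increase: probe for headroom slowly.
      this.limit = Math.min(this.max, this.limit + 1);
    }
  }
}
```

In practice the increase step is often applied once per time window rather than once per response, which makes the ramp-up gentler at high request rates.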
Observability & Delivery State Tracking
Async delivery requires deterministic state reconciliation. Parse each push service response in the dispatch worker, and expose a callback endpoint for the display and click events forwarded from the service worker. Map HTTP status codes and provider-specific error payloads to internal state machines. Track sent, delivered, clicked, and failed states using immutable correlation IDs.
Reconciling asynchronous callbacks with internal databases introduces race conditions. Applying Delivery Tracking & Acknowledgment ensures idempotent state transitions and accurate delivery attribution. Audit logs must capture telemetry without exposing sensitive user identifiers.
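import { createClient } from 'redis';

export async function processIdempotentDispatch(
  redis: ReturnType<typeof createClient>,
  messageId: string,
  dispatchFn: () => Promise<void>
): Promise<'processed' | 'duplicate'> {
  const lockKey = `dispatch:lock:${messageId}`;
  const statusKey = `dispatch:status:${messageId}`;
  // NX + EX: only one worker holds the lock, and it self-expires after 30 s
  // if that worker dies mid-dispatch.
  const acquired = await redis.set(lockKey, '1', { NX: true, EX: 30 });
  if (!acquired) return 'duplicate';
  try {
    // The lock disappears after 30 s, so also consult the 7-day status
    // record before sending; otherwise a late retry would dispatch twice.
    if ((await redis.get(statusKey)) === 'sent') return 'duplicate';
    await dispatchFn();
    await redis.set(statusKey, 'sent', { EX: 604800 });
    return 'processed';
  } catch (error) {
    // Release the lock so a retry does not have to wait for it to expire.
    await redis.del(lockKey);
    throw error;
  }
}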
GDPR and CCPA compliance mandate strict data minimization. Never log raw payloads, PII, or full subscription endpoints in DLQs or observability platforms. Hash identifiers and mask sensitive fields before telemetry ingestion.
The distinction worth internalizing here is between acknowledgment and delivery. The synchronous HTTP response from the push service confirms only that the encrypted payload was accepted into its queue — it says nothing about whether the device woke, decrypted the message, or showed it. True display confirmation arrives asynchronously, if at all, through events forwarded from the service worker back to your analytics endpoint. Treating the 201 as proof of receipt is the single most common attribution error in push systems, and it inflates “delivered” numbers that the Delivery analytics instrumentation guide shows how to separate into honest delivery-versus-display rates.
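Keeping acknowledgment and display as separate states can be enforced with a small transition function. The sketch below uses hypothetical state and event names, and it treats any non-2xx response as a failure for brevity, leaving retries out of the picture.

```typescript
export type DeliveryState = 'queued' | 'sent' | 'displayed' | 'clicked' | 'failed';

export type DeliveryEvent =
  | { kind: 'push_service_response'; status: number } // synchronous HTTP result
  | { kind: 'sw_displayed' }                          // forwarded by the service worker
  | { kind: 'sw_clicked' };

export function nextState(state: DeliveryState, event: DeliveryEvent): DeliveryState {
  switch (event.kind) {
    case 'push_service_response':
      // A replayed response must not move the record backwards.
      if (state !== 'queued') return state;
      // A 2xx means "accepted by the push service", never "displayed".
      return event.status >= 200 && event.status < 300 ? 'sent' : 'failed';
    case 'sw_displayed':
      return state === 'sent' ? 'displayed' : state;
    case 'sw_clicked':
      return state === 'sent' || state === 'displayed' ? 'clicked' : state;
  }
}
```

Because only a service-worker event can produce `displayed`, a dashboard built on this ledger cannot count an accepted request as a shown notification.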
Scaling for Enterprise & Multi-Tenant Workloads
High-volume campaigns demand architectural isolation. Multi-tenant environments require strict data boundaries, sharded queues, and tenant-aware worker pools. Horizontal partitioning by region or tenant ID prevents noisy-neighbor degradation.
Cross-region failover requires active-passive or active-active queue replication. Dynamic worker scaling and cost-aware routing reduce latency and help comply with data residency mandates. Route traffic to the nearest provider edge. For infrastructure-level queue scaling details, see Scaling push queues with Redis or RabbitMQ.
Broker choice shapes the operational character of the whole system. Redis with a sorted-set delay queue is the lightest path to scheduled retries and TTL-bound jobs, and it doubles as the idempotency and rate-limiter store, which keeps the moving-part count low. RabbitMQ earns its weight when you need per-tenant exchanges, complex routing keys, and native dead-lettering without hand-rolling it. Kafka fits when the push event itself is part of a larger replayable stream and you want consumer groups to fan the same events into delivery, analytics, and CRM sync independently. Whatever the broker, the dispatch workers themselves stay stateless: they pull a durable job, load the VAPID keys from the secrets manager at runtime, encrypt and sign, and report the outcome back to the ledger. Statelessness is what makes horizontal scaling and spot-instance reclamation safe.
Concurrency control is the other half of scaling. Because Web Push (RFC 8030) is a per-endpoint protocol — one HTTP POST per subscription — “throughput” is really a question of how many of those POSTs you keep in flight without tripping a provider’s rate limit. A semaphore per worker, HTTP/2 connection reuse per push-service origin, and provider-aware pacing are covered end to end in Message Batching & Throughput Optimization, with the empirical sweet-spot tuning in Optimal batch size for web push throughput.
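A per-worker semaphore of the kind mentioned above can be as small as the following sketch. The `Semaphore` class is a hypothetical helper written for illustration; it bounds in-flight POSTs but does not handle connection reuse or pacing.

```typescript
export class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(permits: number) {
    this.available = permits;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No permit free: park until release() hands one over.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next();       // pass the permit straight to the next waiter
    else this.available++;  // nobody waiting: return the permit to the pool
  }

  /** Runs a task while holding a permit, releasing it even if the task throws. */
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}
```

A worker would wrap each per-subscription POST in `sem.run(...)`, so the permit count becomes the single knob for how many requests are in flight at once.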
Monitor cost per delivery and optimize routing logic. Use spot instances for non-critical batch processing. Maintain strict SLAs by isolating transactional alerts from marketing campaigns. Regularly audit queue retention policies and purge stale subscription data to maintain compliance and operational efficiency. The campaign-side counterpart to this infrastructure — segmentation, A/B testing, and engagement analytics — lives in the Notification Engagement & Campaign Optimization guide.
Subscription & Delivery Data Model
The delivery pipeline is only as reliable as the storage model behind it. Two tables anchor the system: a subscriptions table that holds the endpoint plus its p256dh/auth keys, and an append-only delivery_log keyed by correlation ID. Store the endpoint hash, not the raw endpoint, anywhere it joins to analytics.
CREATE TABLE subscriptions (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
tenant_id BIGINT NOT NULL,
endpoint TEXT NOT NULL UNIQUE,
endpoint_hash BYTEA NOT NULL, -- HMAC-SHA256(endpoint, rotating_key)
p256dh TEXT NOT NULL,
auth TEXT NOT NULL,
vapid_key_id TEXT NOT NULL, -- which VAPID key signed the most recent send
status TEXT NOT NULL DEFAULT 'active', -- active | gone | expired
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
last_seen_at TIMESTAMPTZ
);
CREATE TABLE delivery_log (
correlation_id UUID PRIMARY KEY,
subscription_id BIGINT NOT NULL REFERENCES subscriptions(id),
state TEXT NOT NULL, -- queued | sent | retrying | gone | expired | failed
http_status SMALLINT,
ttl_seconds INTEGER NOT NULL,
attempt SMALLINT NOT NULL DEFAULT 0,
dispatched_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_delivery_state ON delivery_log (state, updated_at);
State transitions in delivery_log must be idempotent — a retried dispatch reuses the same correlation_id rather than inserting a new row, which is the contract enforced by the Delivery Tracking & Acknowledgment layer. A 410 Gone flips subscriptions.status to gone and stops all further retries.
Error Taxonomy
Push services communicate delivery outcomes through HTTP status codes. Mapping each code to a single deterministic action is the difference between a self-healing pipeline and a retry storm.
| Status | Meaning | Cause | Resolution |
|---|---|---|---|
| 400 Bad Request | Malformed request | Bad headers, invalid TTL, broken encryption framing | Drop, alert engineering; never retry |
| 401 Unauthorized | VAPID auth failed | Expired JWT or wrong signing key | Do not retry unchanged; re-sign with the current process.env.VAPID_PRIVATE_KEY and check key rotation |
| 403 Forbidden | VAPID identity mismatch | aud/sub claim or key does not match subscription | Verify the key pair that created the subscription is still in use |
| 404 Not Found | Unknown endpoint | Subscription never existed or was purged upstream | Flag for pruning; do not retry |
| 410 Gone | Endpoint permanently dead | User unsubscribed or browser expired it | Mark the subscription gone immediately and stop retries; see the 410 handling guide |
| 413 Payload Too Large | Ciphertext over 4 KB | Payload exceeds the RFC 8291 limit | Shrink payload below 4 KB; send IDs, fetch content client-side |
| 429 Too Many Requests | Rate limited | Concurrency or per-origin quota exceeded | Honor Retry-After, apply backoff |
| 502/503/504 | Transient outage | Push service degradation | Retry with exponential backoff + jitter, capped attempts |
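Honoring Retry-After on a 429 means handling both forms the header can take: a number of seconds or an HTTP-date. The helper below is a sketch with a hypothetical name; it returns null when the header is missing or unparseable so the caller can fall back to its own backoff.

```typescript
export function parseRetryAfterMs(
  header: string | null | undefined,
  nowMs: number = Date.now()
): number | null {
  if (!header) return null;
  const value = header.trim();
  // Form 1: delay-seconds, e.g. "120".
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return null;
  // A date in the past means "retry now".
  return Math.max(0, dateMs - nowMs);
}
```

Taking the larger of this value and the locally computed backoff delay keeps the dispatcher from retrying earlier than the push service asked.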
Compliance, Transport & Privacy
Web push is HTTPS-only end to end: the service worker that receives the message must be served over a secure context, and your dispatch endpoints must reject any non-HTTPS subscription URL during validation (this also doubles as SSRF protection). Under GDPR and CCPA, the legal basis for sending is explicit opt-in, so log consent — timestamp, version of the prompt copy, and the action taken — at the moment of subscription rather than reconstructing it later. The UX side of capturing and honoring that consent is covered in the Frontend Permission UX & Subscription Flows guide.
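The HTTPS check on subscription URLs can be sketched as below. The function name is hypothetical, and the host allowlist goes one step beyond what this guide requires: it is an optional hardening assumption, and any list of push-service hosts you pass in should be treated as illustrative rather than exhaustive.

```typescript
export function isAcceptableEndpoint(
  raw: string,
  allowedHostSuffixes: readonly string[]
): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  // Reject anything that is not HTTPS.
  if (url.protocol !== 'https:') return false;
  // Embedded credentials have no place in a push endpoint.
  if (url.username || url.password) return false;
  const host = url.hostname.toLowerCase();
  // Match the host exactly or as a true subdomain, never as a substring.
  return allowedHostSuffixes.some((s) => host === s || host.endsWith(`.${s}`));
}
```

Matching on the parsed hostname rather than the raw string is what prevents a URL such as `https://fcm.googleapis.com.evil.example/` from slipping through.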
Apply data minimization everywhere telemetry flows: never write raw payloads, PII, or full endpoints to dead-letter queues, observability platforms, or logs. Hash endpoints with a rotating HMAC key before they leave the dispatch boundary, and scrub payload bodies from aggregation pipelines once a message passes its TTL. A tight Content-Security-Policy on the pages that register the service worker reduces the blast radius of any injected script that could tamper with subscriptions.
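Hashing endpoints before they reach telemetry can be done with Node's built-in HMAC support. In this sketch the function name and the `keyId` prefix are assumptions: the prefix simply records which rotating key produced the hash so that values from different key generations are not compared by mistake.

```typescript
import { createHmac } from 'node:crypto';

export function hashEndpoint(
  endpoint: string,
  key: Buffer | string, // current rotating HMAC key, loaded from the secrets manager
  keyId: string         // hypothetical label identifying that key generation
): string {
  const digest = createHmac('sha256', key).update(endpoint).digest('hex');
  return `${keyId}:${digest}`;
}
```

Unlike a plain SHA-256, the keyed hash cannot be reversed by hashing candidate endpoints unless the key itself leaks.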
Related
- Delivery Tracking & Acknowledgment: map push service HTTP responses to an idempotent delivery ledger.
- Message Batching & Throughput Optimization: concurrency control, connection pooling, and provider-aware pacing.
- Retry Logic & Backoff Strategies: exponential backoff, jitter, dead-letter queues, and circuit breakers.
- TTL & Expiration Handling: lifecycle control from the TTL header down to service-worker filtering.
- Core Protocols & Browser Implementation: the RFC 8030/8291/8292 transport this pipeline rides on.
FAQ
Does a 201 response mean the user received the notification?
No. A 201 Created only confirms the push service accepted the encrypted payload into its queue. It does not guarantee device wake, network delivery, or that the user saw anything. Actual display is observable only through client-side events forwarded from the service worker, which the Delivery Tracking & Acknowledgment layer reconciles.
Should I use Redis or RabbitMQ for the push queue?
Redis (with streams or a sorted-set delay queue) is simplest for delayed retries and TTL-bound jobs and scales well for fan-out. RabbitMQ shines when you need complex routing, per-tenant exchanges, and built-in dead-lettering. Kafka fits very high-volume, replayable event streams. The full trade-off is in Scaling push queues with Redis or RabbitMQ.
What is the maximum payload size for a web push message?
The encrypted payload must not exceed 4 KB (4096 bytes of ciphertext) and must use the aes128gcm content-encoding per RFC 8291. Exceeding it returns 413 Payload Too Large. Send identifiers and fetch full content in the service worker rather than packing it into the payload.
How many times should I retry a failed push?
Cap retries at 3–5 attempts within a wall-clock window that respects the message TTL. Only retry transient failures (429, 5xx); treat 400, 401, 403, 404, and 410 as terminal. Use exponential backoff with jitter to avoid thundering-herd recovery, as detailed in Retry Logic & Backoff Strategies.
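The two limits in that answer, an attempt cap and the TTL window, can be combined in one check. This is a minimal sketch with hypothetical parameter names; `expiresAtMs` is assumed to be the absolute deadline derived from the message TTL at ingestion.

```typescript
export interface RetryBudget {
  attempt: number;     // attempts already made
  maxAttempts: number; // e.g. somewhere in the 3 to 5 range
  nextDelayMs: number; // delay the backoff function proposes
  expiresAtMs: number; // absolute TTL deadline of the message
  nowMs: number;
}

export function shouldRetry(b: RetryBudget): boolean {
  // Stop once the attempt cap is reached.
  if (b.attempt >= b.maxAttempts) return false;
  // Stop if the next attempt would land at or after the TTL deadline.
  return b.nowMs + b.nextDelayMs < b.expiresAtMs;
}
```

A message that fails this check belongs in the dead-letter queue or an expired state rather than back in the retry queue.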
Where should the VAPID private key live?
In a secrets manager, injected at runtime as process.env.VAPID_PRIVATE_KEY. It must never appear hardcoded in server-side source or be committed to version control. The public key is shared with the browser during subscription; rotation is covered in VAPID key generation & rotation.
How do I keep a multi-tenant pipeline from letting one tenant starve others?
Shard queues per tenant or region, give each tenant its own worker pool budget with weighted fair queuing, and apply per-tenant token-bucket rate limiting. Isolate transactional alerts from marketing campaigns so a promotional burst can never delay a 2FA code.