Message Batching & Throughput Optimization for Web Push Delivery
High-volume web push campaigns demand precise orchestration to manage provider throttling, minimize latency, and guarantee secure delivery. Implementing an efficient batching strategy is foundational to the broader Backend Delivery Architecture & Queue Management framework. This guide provides production-ready patterns for payload consolidation, connection pooling, and secure scaling.
Prerequisites
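- A push-capable HTTP client (e.g., the `web-push` Node.js library) configured for connection reuse
- VAPID keys available at runtime via `process.env.VAPID_PRIVATE_KEY`
- Payloads encrypted with `aes128gcm` and verified under the 4 KB ciphertext limit
- Monitoring for `TIME_WAIT` socket counts and the per-provider `429` rate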
Architecting the Batching Engine
Push providers enforce strict rate limits and connection caps per origin. Unbatched, sequential dispatches exhaust connection pools and trigger 429 Too Many Requests responses. A deterministic batching engine groups subscriptions by push service, applies dynamic chunk sizing, and dispatches via a controlled worker pool.
Important: Web Push (RFC 8030) is a per-endpoint HTTP protocol — you send one HTTP POST per subscription. Batching in this context means controlling how many concurrent dispatches run simultaneously using a semaphore, not packaging multiple subscriptions into a single HTTP request.
Implementation Steps
- Endpoint Normalization & Grouping: Partition subscriptions by push service origin. Chrome endpoints route to FCM, Edge to WNS, Firefox to Mozilla Autopush, and Safari to APNs Web Push. Standards-compliant services accept the same VAPID (RFC 8292) `Authorization` header format, but the JWT's `aud` claim must match the origin being called, so each group needs its own signed token.
- Dynamic Concurrency Limits: Configure a semaphore limiting concurrent in-flight requests to 10–20 per worker process. FCM handles high concurrency well; APNs benefits from more conservative limits to avoid HTTP/2 stream errors. Choosing the right window is a measurable trade-off rather than a guess; see Optimal batch size for web push throughput.
- Priority Queue Dispatch: Deploy a worker pool that drains from a Redis-backed priority queue. High-priority campaigns (e.g., security alerts) bypass standard batching windows.
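The grouping step can be sketched as follows. This is a minimal illustration, not a prescribed implementation: `groupByOrigin` is a hypothetical helper name, and bucketing unparseable endpoints under an `"invalid"` key is an assumption made here so that they can be inspected rather than silently dropped.

```typescript
// Hypothetical helper: bucket subscriptions by push-service origin so each
// bucket can be given its own concurrency limit and credentials.
function groupByOrigin<T extends { endpoint: string }>(subs: T[]): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const sub of subs) {
    let origin: string;
    try {
      // e.g. "https://fcm.googleapis.com/fcm/send/abc" -> "https://fcm.googleapis.com"
      origin = new URL(sub.endpoint).origin;
    } catch {
      // Assumption: malformed endpoints are collected under one key for review.
      origin = "invalid";
    }
    const bucket = groups.get(origin);
    if (bucket) bucket.push(sub);
    else groups.set(origin, [sub]);
  }
  return groups;
}
```

Each resulting bucket can then be handed to its own semaphore, so a throttled provider does not slow down the others.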
Production-Ready Dispatch Implementation
const BATCH_SIZE = 200; // subscriptions per dispatch window
const MAX_CONCURRENCY = 10; // concurrent HTTP requests per worker
const DISPATCH_TIMEOUT_MS = 30_000;

interface PushSubscription {
  endpoint: string;
  keys: { p256dh: string; auth: string };
  metadata?: Record<string, unknown>;
}

// Counting semaphore: at most `permits` tasks run at once; the rest wait FIFO.
class Semaphore {
  private permits: number;
  private queue: (() => void)[] = [];

  constructor(permits: number) {
    this.permits = permits;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }

  private acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return Promise.resolve();
    }
    // No permit free: park until release() hands one over.
    return new Promise<void>((resolve) => this.queue.push(resolve));
  }

  private release(): void {
    const next = this.queue.shift();
    if (next) next(); // pass the permit straight to the next waiter
    else this.permits++;
  }
}

function chunkArray<T>(arr: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < arr.length; i += size) {
    chunks.push(arr.slice(i, i + size));
  }
  return chunks;
}

// Reject if `promise` has not settled within `ms`, and always clear the timer.
// Note: this stops waiting but does not abort the underlying HTTP request.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timeout`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function dispatchBatches(
  subscriptions: PushSubscription[],
  sendPush: (sub: PushSubscription) => Promise<void>
): Promise<void> {
  const semaphore = new Semaphore(MAX_CONCURRENCY);
  let offset = 0;
  // Drain one window at a time so only BATCH_SIZE promises are pending at once.
  for (const batch of chunkArray(subscriptions, BATCH_SIZE)) {
    const base = offset;
    await Promise.allSettled(
      batch.map((sub, i) =>
        semaphore.run(async () => {
          try {
            await withTimeout(sendPush(sub), DISPATCH_TIMEOUT_MS, `Dispatch ${base + i}`);
          } catch (error) {
            console.error(`[DISPATCH_ERROR] Sub ${base + i} failed:`, error);
            throw error;
          }
        })
      )
    );
    offset += batch.length;
  }
}
Architecture Trade-offs: Higher concurrency increases throughput but risks TCP connection exhaustion and provider-side IP reputation degradation. Monitor TIME_WAIT socket counts to ensure they stay within OS limits.
Optimizing Payload & Connection Throughput
Throughput bottlenecks frequently originate from redundant payload serialization and ephemeral HTTP connections. HTTP/2 multiplexing allows concurrent push requests over a single persistent TCP connection, drastically reducing TLS handshake overhead. Aligning expiration windows with TTL & Expiration Handling ensures stale payloads are discarded before consuming dispatch cycles.
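One way to picture connection reuse is a small cache holding one session per push-service origin. The sketch below is deliberately client-agnostic: `OriginSessionPool`, `open`, and `isUsable` are hypothetical names, and the session type is left generic so no particular HTTP/2 client API is assumed.

```typescript
// Hypothetical per-origin session cache. `open` and `isUsable` are injected,
// so the sketch does not depend on any specific HTTP/2 client.
class OriginSessionPool<S> {
  private sessions = new Map<string, S>();
  private readonly open: (origin: string) => S;
  private readonly isUsable: (session: S) => boolean;

  constructor(open: (origin: string) => S, isUsable: (session: S) => boolean) {
    this.open = open;
    this.isUsable = isUsable;
  }

  // Return the cached session for this endpoint's origin, opening one if
  // none exists or the cached one is no longer usable.
  acquire(endpoint: string): S {
    const origin = new URL(endpoint).origin;
    const cached = this.sessions.get(origin);
    if (cached !== undefined && this.isUsable(cached)) return cached;
    const fresh = this.open(origin);
    this.sessions.set(origin, fresh);
    return fresh;
  }

  // Number of distinct origins with a cached session.
  get size(): number {
    return this.sessions.size;
  }
}
```

With Node's built-in `node:http2` module, `open` could plausibly be `(origin) => connect(origin)` and `isUsable` could check `!session.closed && !session.destroyed`; that wiring is an assumption and is left out so the sketch stays free of network access.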
Implementation Steps
- HTTP/2 Persistence: Use an HTTP/2-capable client and reuse connections across requests to the same push service origin. The `web-push` Node.js library sends each request through Node's `https` module (HTTP/1.1), so pass a keep-alive `agent` in its options to reuse connections, or use a dedicated HTTP/2 client when you need multiplexing.
- Payload Minimization: Keep plaintext JSON payloads under 3 KB to stay within provider limits after encryption overhead. Send only IDs and action metadata; fetch full content client-side.
- Pre-Encryption Pipeline: Encrypt payloads server-side before queue insertion. Offloading cryptographic operations to a dedicated pre-processing stage prevents CPU contention during high-throughput dispatch.
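The payload budget can be checked before a message enters the queue. The sketch below assumes a single `aes128gcm` record with no extra padding, where the overhead is an 86-byte header, a 1-byte padding delimiter, and a 16-byte authentication tag. `checkPayloadSize` and the constant names are hypothetical.

```typescript
const MAX_CIPHERTEXT_BYTES = 4096; // the 4 KB ciphertext limit
// Assumption: one aes128gcm record, no extra padding.
// 86-byte header + 1-byte padding delimiter + 16-byte auth tag.
const AES128GCM_OVERHEAD_BYTES = 86 + 1 + 16;
const PLAINTEXT_BUDGET_BYTES = 3 * 1024; // the conservative 3 KB plaintext budget

function checkPayloadSize(payload: unknown) {
  // Measure UTF-8 bytes, not string length: multi-byte characters count extra.
  const plaintextBytes = new TextEncoder().encode(JSON.stringify(payload)).length;
  const ciphertextBytes = plaintextBytes + AES128GCM_OVERHEAD_BYTES;
  return {
    plaintextBytes,
    ciphertextBytes,
    withinBudget: plaintextBytes <= PLAINTEXT_BUDGET_BYTES,
    withinHardLimit: ciphertextBytes <= MAX_CIPHERTEXT_BYTES,
  };
}
```

Rejecting on `withinBudget` rather than `withinHardLimit` leaves headroom for padding or metadata added later in the pipeline.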
Security & Compliance Posture
- PII Stripping: Remove or hash personally identifiable information before encryption. Push payloads traverse multiple network hops.
- VAPID Validation: Verify that outgoing `Authorization: vapid t=<JWT>, k=<public key>` headers (the RFC 8292 format used with `aes128gcm`) are signed with a key from the active rotation schedule. Push services reject invalid signatures immediately, typically with `401 Unauthorized` or `403 Forbidden`.
- Regulatory Alignment: Enforce GDPR/CCPA consent flags at the batching layer. Subscriptions lacking explicit opt-in must be filtered before entering the dispatch pipeline.
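A minimal sketch of the consent filter and payload stripping described above. The `consent.optedIn` field and both function names are hypothetical; real consent records will have whatever shape your compliance model defines.

```typescript
// Hypothetical consent shape; adapt to the actual consent record in use.
interface ConsentRecord {
  optedIn: boolean;
}

// Split subscriptions into those with explicit opt-in and everything else.
// A missing consent record is treated as "not opted in".
function partitionByConsent<T extends { consent?: ConsentRecord }>(
  subs: T[]
): { allowed: T[]; rejected: T[] } {
  const allowed: T[] = [];
  const rejected: T[] = [];
  for (const sub of subs) {
    if (sub.consent?.optedIn === true) allowed.push(sub);
    else rejected.push(sub);
  }
  return { allowed, rejected };
}

// Keep only allowlisted payload fields so stray PII never reaches encryption.
function pickAllowedFields(
  payload: Record<string, unknown>,
  allowlist: readonly string[]
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of allowlist) {
    if (Object.prototype.hasOwnProperty.call(payload, key)) out[key] = payload[key];
  }
  return out;
}
```

An allowlist is used instead of a blocklist so that newly added payload fields are excluded by default.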
Monitoring & Acknowledgment Integration
Without granular tracking, a single 410 Gone response within a dispatch window can corrupt subscription health metrics. Implementing a correlation ID per request and mapping it to individual subscription receipts enables accurate state reconciliation. This workflow directly feeds into Delivery Tracking & Acknowledgment systems, allowing real-time throughput adjustments and failure isolation.
Implementation Steps
- Correlation Mapping: Attach a unique `request_id` (UUIDv4) and `subscription_index` to each HTTP dispatch. Store mappings in an ephemeral cache (e.g., Redis with 15-minute TTL).
- Response Parsing: Parse `201 Created`, `404 Not Found`, and `410 Gone` responses. Immediately flag `404`/`410` as invalid subscriptions and purge them from the active database.
- Dead-Letter Routing: Route failed dispatches to a dedicated DLQ. Implement exponential backoff for transient `5xx` errors as described in Retry Logic & Backoff Strategies, and retry `429` responses only after their `Retry-After` interval has elapsed. Permanently quarantine all other `4xx` client errors to prevent retry storms.
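The response handling above can be reduced to one classification function plus a backoff calculation. This is a sketch under stated assumptions: the names are hypothetical, `429` is treated as retryable (consistent with honoring `Retry-After` elsewhere in this guide), and the backoff uses full jitter with illustrative defaults of 1 s base and 60 s cap.

```typescript
type Disposition = "delivered" | "purge" | "retry" | "quarantine";

// Map a push-service HTTP status to what the pipeline should do next.
function classifyPushResponse(status: number): Disposition {
  if (status >= 200 && status < 300) return "delivered"; // e.g. 201 Created
  if (status === 404 || status === 410) return "purge"; // subscription is gone
  if (status === 429 || status >= 500) return "retry"; // throttled or transient
  return "quarantine"; // remaining client errors will not succeed on retry
}

// Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2^attempt)).
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  capMs: number = 60_000,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Keeping classification separate from transport means the raw status code can still be logged before it is mapped to an internal disposition.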
Debugging Checklist
- Verify provider responses contain expected headers before abstracting status codes into internal metrics.
- Audit batch sizes against provider-specific `Retry-After` headers on `429` responses.
- Ensure worker logs capture raw HTTP status codes before internal mapping.
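When auditing `Retry-After`, note that the header can carry either a number of seconds or an HTTP date. A small parser, sketched here with the hypothetical name `parseRetryAfterMs`, normalizes both forms to a delay in milliseconds:

```typescript
// Parse a Retry-After header value into a delay in milliseconds.
// Returns undefined when the header is absent or unparseable.
function parseRetryAfterMs(
  header: string | undefined,
  nowMs: number = Date.now()
): number | undefined {
  if (header === undefined) return undefined;
  const value = header.trim();
  // Form 1: delta-seconds, e.g. "120".
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return undefined;
  // A date in the past means the caller may retry immediately.
  return Math.max(0, dateMs - nowMs);
}
```

Logging the parsed value next to the raw header makes it easier to spot providers whose throttling windows are longer than the dispatcher's retry schedule.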
Production Deployment & Queue Scaling
Deploying a high-throughput dispatch system requires rigorous validation of connection limits, memory allocation during encryption, and circuit breaker thresholds. Align dispatch intervals with campaign velocity and enforce strict idempotency keys to prevent duplicate notifications. For infrastructure teams evaluating distributed message brokers, refer to Scaling push queues with Redis or RabbitMQ to select the optimal persistence and routing strategy.
Implementation Steps
- VAPID Key Rotation Compatibility: Implement hot-swappable VAPID keys. Workers must fetch the latest key from a centralized secrets manager before signing each dispatch window.
- Circuit Breaker Configuration: Set failure thresholds at 5% per provider. When exceeded, halt dispatch for 60 s, drain pending dispatches to a retry queue, and alert SRE teams.
- Backpressure Handling: Enforce queue depth limits. When exceeded, apply producer-side rate limiting and return `503 Service Unavailable` to upstream campaign APIs.
- Load Testing Protocol: Simulate 50,000 concurrent subscription dispatches with a 99.5% success target. Monitor GC pauses, heap allocation, and network I/O saturation during peak windows.
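The circuit breaker step can be sketched as a small state machine, one instance per provider. This is a simplified illustration: the class and option names are hypothetical, the `minSamples` guard (so a single early failure does not trip the breaker) is an assumption not stated above, and failure rate is counted since the last reset rather than over a rolling window.

```typescript
type BreakerState = "closed" | "open" | "half-open";

interface BreakerOptions {
  failureThreshold?: number; // failure rate that trips the breaker (default 5%)
  cooldownMs?: number; // pause before probing again (default 60 s)
  minSamples?: number; // assumption: minimum results before the rate is trusted
  now?: () => number; // injectable clock for testing
}

class CircuitBreaker {
  private failures = 0;
  private total = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly minSamples: number;
  private readonly now: () => number;

  constructor(opts: BreakerOptions = {}) {
    this.failureThreshold = opts.failureThreshold ?? 0.05;
    this.cooldownMs = opts.cooldownMs ?? 60_000;
    this.minSamples = opts.minSamples ?? 100;
    this.now = opts.now ?? Date.now;
  }

  // True when a dispatch may proceed. After the cooldown the breaker moves to
  // half-open and lets traffic through until the next result is recorded.
  allow(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  // Record the outcome of one dispatch.
  record(success: boolean): void {
    if (this.state === "open") return; // ignore late results while halted
    if (this.state === "half-open") {
      if (success) this.reset();
      else this.trip();
      return;
    }
    this.total++;
    if (!success) this.failures++;
    if (this.total >= this.minSamples && this.failures / this.total > this.failureThreshold) {
      this.trip();
    }
  }

  currentState(): BreakerState {
    return this.state;
  }

  private trip(): void {
    this.state = "open";
    this.openedAt = this.now();
    this.failures = 0;
    this.total = 0;
  }

  private reset(): void {
    this.state = "closed";
    this.failures = 0;
    this.total = 0;
  }
}
```

A worker would call `allow()` before each dispatch and route the message to the retry queue when it returns `false`.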
Security & Operational Hardening
- Immutable Audit Logging: Record every dispatched request with correlation IDs, timestamps, and VAPID key fingerprints. Logs must be append-only and retained per compliance mandates.
- Idempotency Enforcement: Generate deterministic idempotency keys using `SHA-256(campaign_id + subscription_endpoint_hash + timestamp_window)`. Reject duplicate dispatches within a 5-minute sliding window.
- Memory Safeguards: Cap worker heap usage at 70%. Implement stream-based payload processing to avoid loading entire subscription lists into memory simultaneously.
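The idempotency key formula can be sketched with Node's built-in `node:crypto` module. The function names, the `:` separator, and the in-memory `DuplicateGuard` are assumptions for illustration; a shared store such as Redis would be needed once more than one worker is involved.

```typescript
import { createHash } from "node:crypto";

const IDEMPOTENCY_WINDOW_MS = 5 * 60 * 1000; // 5-minute window

function sha256Hex(input: string): string {
  return createHash("sha256").update(input).digest("hex");
}

// Deterministic key: the same campaign, endpoint, and time bucket always
// produce the same value. The ":" separator is an assumption that keeps the
// concatenated fields unambiguous.
function idempotencyKey(campaignId: string, endpoint: string, timestampMs: number): string {
  const endpointHash = sha256Hex(endpoint);
  const windowIndex = Math.floor(timestampMs / IDEMPOTENCY_WINDOW_MS);
  return sha256Hex(`${campaignId}:${endpointHash}:${windowIndex}`);
}

// Hypothetical single-process guard: remembers keys until they expire.
class DuplicateGuard {
  private seen = new Map<string, number>(); // key -> expiry timestamp (ms)

  // Returns true the first time a key is seen within the window.
  firstSeen(key: string, nowMs: number): boolean {
    const expiry = this.seen.get(key);
    if (expiry !== undefined && expiry > nowMs) return false;
    this.seen.set(key, nowMs + IDEMPOTENCY_WINDOW_MS);
    return true;
  }
}
```

Bucketing the timestamp yields fixed windows rather than a truly sliding one: two dispatches a few seconds apart on either side of a bucket boundary get different keys. If that matters, the guard can be keyed on campaign and endpoint alone with a 5-minute expiry.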
Throughput Tuning Reference
These are the knobs that move throughput without tipping a provider into throttling. Start conservative and raise concurrency only while the 429 rate stays near zero.
| Parameter | Type | Default | Notes |
|---|---|---|---|
| `BATCH_SIZE` | integer | 200 | Subscriptions drained per dispatch window before yielding |
| `MAX_CONCURRENCY` | integer | 10 | In-flight requests per worker; lower for APNs (HTTP/2 stream limits) |
| `DISPATCH_TIMEOUT_MS` | integer | 30000 | Per-request timeout before the dispatch is treated as failed |
| `circuitFailurePct` | float | 0.05 | Per-provider failure rate that opens the circuit breaker |
| `circuitCooldownSec` | integer | 60 | Pause before half-open probing after the breaker trips |
| `correlationCacheTtlSec` | integer | 900 | TTL on the ephemeral `request_id` → subscription map in Redis |
| `idempotencyWindowSec` | integer | 300 | Sliding window that rejects duplicate dispatches |
Verification
Validate the dispatcher under load before trusting it in a campaign. Watch the socket table and the provider response mix while ramping concurrency:
# Watch TIME_WAIT sockets while a load test runs against the dispatcher
watch -n 2 "ss -Htan state time-wait | wc -l"
A healthy run holds TIME_WAIT well under the OS limit (HTTP/2 reuse keeps it low), shows a flat 429 rate, and reports a success ratio at or above your target (e.g. 99.5%). A climbing 429 rate means MAX_CONCURRENCY is too high for that provider — back it off and honor every Retry-After.
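The "raise while flat, back off when climbing" rule can be expressed as an additive-increase, multiplicative-decrease step evaluated once per measurement window. The function name, the 0.1% tolerated rate, and the 1 to 20 bounds below are illustrative assumptions, not measured values.

```typescript
interface WindowStats {
  sent: number; // dispatches completed in the measurement window
  throttled: number; // how many of them returned 429
}

interface TuningOptions {
  min?: number; // lowest concurrency to fall back to
  max?: number; // ceiling for concurrency per worker
  tolerated429Rate?: number; // 429 rate still considered "flat"
}

// Additive increase, multiplicative decrease: step up by one while the 429
// rate stays within tolerance, halve as soon as it climbs.
function nextConcurrency(
  current: number,
  stats: WindowStats,
  opts: TuningOptions = {}
): number {
  const min = opts.min ?? 1;
  const max = opts.max ?? 20;
  const tolerated = opts.tolerated429Rate ?? 0.001;
  if (stats.sent === 0) return current; // no data, no change
  const rate = stats.throttled / stats.sent;
  if (rate > tolerated) return Math.max(min, Math.floor(current / 2));
  return Math.min(max, current + 1);
}
```

Running this per provider keeps a throttled service from dragging down concurrency for the others.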
Related
- Scaling push queues with Redis or RabbitMQ — broker selection and persistence for the work queue behind these workers.
- Optimal batch size for web push throughput — how to measure the concurrency sweet spot per provider.
- Retry Logic & Backoff Strategies — backoff for the transient failures this dispatcher surfaces.
- Delivery Tracking & Acknowledgment — reconcile per-request correlation IDs into a delivery ledger.
- TTL & Expiration Handling — discard stale payloads before they consume dispatch cycles.
FAQ
Can I send multiple subscriptions in a single HTTP request?
No. Web Push (RFC 8030) is a per-endpoint protocol — one HTTP POST per subscription. “Batching” here means controlling how many of those POSTs run concurrently through a semaphore and reusing HTTP/2 connections, not packing recipients into one request.
What concurrency limit avoids 429 responses?
Start at 10–20 in-flight requests per worker. FCM tolerates higher concurrency; APNs needs lower limits to avoid HTTP/2 stream errors. Raise it only while the per-provider 429 rate stays flat, and measure the sweet spot empirically as covered in Optimal batch size for web push throughput.
How big can a push payload be?
The encrypted payload is capped at 4 KB of ciphertext with the aes128gcm encoding. Keep plaintext JSON under roughly 3 KB to leave room for encryption overhead. Send identifiers and fetch full content in the service worker.
Does HTTP/2 multiplexing really help throughput?
Yes. Reusing a single persistent HTTP/2 connection per push-service origin eliminates repeated TLS handshakes and keeps TIME_WAIT socket counts low, which is usually the first OS-level bottleneck under high fan-out.