Message Batching & Throughput Optimization for Web Push Delivery

High-volume web push campaigns demand precise orchestration to manage provider throttling, minimize latency, and guarantee secure delivery. Implementing an efficient batching strategy is foundational to the broader Backend Delivery Architecture & Queue Management framework. This guide provides production-ready patterns for payload consolidation, connection pooling, and secure scaling.

Prerequisites

  • An HTTP/2-capable push client (or the web-push Node.js library) for connection reuse
  • VAPID keys available at runtime via process.env.VAPID_PRIVATE_KEY
  • Payloads encrypted with aes128gcm and verified under the 4 KB ciphertext limit
  • Monitoring for TIME_WAIT socket counts and per-provider 429 rates

Figure: Concurrency-controlled batch dispatch. Subscriptions are grouped by push service (FCM, Autopush, APNs), then drained through a semaphore that caps in-flight requests at MAX_CONCURRENCY, reusing an HTTP/2 connection pool per provider origin while honoring per-origin 429 / Retry-After back-pressure.
Group by provider, drain through a concurrency-capping semaphore over reused HTTP/2 connections, and back off on per-origin 429s.

Architecting the Batching Engine

Push providers enforce strict rate limits and connection caps per origin. Firing every dispatch at once, with no bound on concurrency, exhausts connection pools and triggers 429 Too Many Requests responses, while strictly sequential sends leave most of the available throughput unused. A deterministic batching engine groups subscriptions by push service, applies dynamic chunk sizing, and dispatches via a controlled worker pool.

Important: Web Push (RFC 8030) is a per-endpoint HTTP protocol — you send one HTTP POST per subscription. Batching in this context means controlling how many concurrent dispatches run simultaneously using a semaphore, not packaging multiple subscriptions into a single HTTP request.

Implementation Steps

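  1. Endpoint Normalization & Grouping: Partition subscriptions by push service origin. Chrome endpoints route to FCM, Edge endpoints typically to Windows Push Notification Services (WNS), Firefox to Mozilla Autopush, and Safari to Apple's Web Push service (APNs). All of them accept the standard VAPID Authorization header (RFC 8292), but the JWT's aud claim must equal each service's origin, so tokens are minted per origin.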
  2. Dynamic Concurrency Limits: Configure a semaphore limiting concurrent in-flight requests to 10–20 per worker process. FCM handles high concurrency well; APNs benefits from more conservative limits to avoid HTTP/2 stream errors. Choosing the right window is a measurable trade-off rather than a guess — see Optimal batch size for web push throughput.
  3. Priority Queue Dispatch: Deploy a worker pool that drains from a Redis-backed priority queue. High-priority campaigns (e.g., security alerts) bypass standard batching windows.
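
The grouping step can be sketched in TypeScript as below. It assumes subscriptions expose only an `endpoint` URL. The provider labels, the hostname patterns (including Windows Push Notification Services hosts, which Microsoft Edge subscriptions commonly use), and the per-provider limits in `CONCURRENCY_BY_PROVIDER` are illustrative assumptions to check against your own subscription data, not values published by the providers.

```typescript
type PushProvider = "fcm" | "wns" | "autopush" | "apple" | "unknown";

interface SubscriptionLike {
  endpoint: string;
}

// Assumed hostname patterns for the major push services.
function providerForHost(host: string): PushProvider {
  if (host === "fcm.googleapis.com") return "fcm";                    // Chrome
  if (host.endsWith(".notify.windows.com")) return "wns";             // Edge
  if (host.endsWith(".push.services.mozilla.com")) return "autopush"; // Firefox
  if (host.endsWith(".push.apple.com")) return "apple";               // Safari
  return "unknown";
}

// Hypothetical per-provider in-flight caps: generous for FCM, conservative for Apple.
const CONCURRENCY_BY_PROVIDER: Record<PushProvider, number> = {
  fcm: 20,
  wns: 10,
  autopush: 10,
  apple: 5,
  unknown: 5,
};

interface OriginGroup<T> {
  provider: PushProvider;
  subs: T[];
}

function tryParseUrl(value: string): URL | undefined {
  try {
    return new URL(value);
  } catch {
    return undefined;
  }
}

// Groups subscriptions by endpoint origin (scheme + host + port).
function groupByOrigin<T extends SubscriptionLike>(subs: T[]): Map<string, OriginGroup<T>> {
  const groups = new Map<string, OriginGroup<T>>();
  for (const sub of subs) {
    const url = tryParseUrl(sub.endpoint);
    if (!url) continue; // malformed endpoint: skip it (or hand it to a cleanup job)
    let group = groups.get(url.origin);
    if (!group) {
      group = { provider: providerForHost(url.hostname), subs: [] };
      groups.set(url.origin, group);
    }
    group.subs.push(sub);
  }
  return groups;
}
```

Each origin group can then be drained through its own semaphore, sized from `CONCURRENCY_BY_PROVIDER`, so a throttled provider does not hold up the others. Grouping by origin rather than by provider label matters because connection reuse and the VAPID `aud` claim are both scoped to the origin.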

Production-Ready Dispatch Implementation

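const BATCH_SIZE = 200;       // subscriptions drained per dispatch window
const MAX_CONCURRENCY = 10;   // concurrent HTTP requests per worker
const DISPATCH_TIMEOUT_MS = 30_000;

interface PushSubscription {
  endpoint: string;
  keys: { p256dh: string; auth: string };
  metadata?: Record<string, unknown>;
}

// Counting semaphore: at most `permits` tasks run at once; the rest wait in FIFO order.
class Semaphore {
  private permits: number;
  private waiters: (() => void)[] = [];

  constructor(permits: number) {
    this.permits = permits;
  }

  private async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    // No permit free: park until release() hands one over.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) next();      // hand the permit straight to the next waiter
    else this.permits++;   // nobody waiting: return it to the pool
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

function chunkArray<T>(arr: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < arr.length; i += size) {
    chunks.push(arr.slice(i, i + size));
  }
  return chunks;
}

// Rejects if `promise` has not settled within `ms`, and always clears the timer.
// It stops waiting but does not abort the underlying request.
async function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

async function dispatchBatches(
  subscriptions: PushSubscription[],
  sendPush: (sub: PushSubscription) => Promise<void>
): Promise<PromiseSettledResult<void>[]> {
  const semaphore = new Semaphore(MAX_CONCURRENCY);
  const results: PromiseSettledResult<void>[] = [];
  const batches = chunkArray(subscriptions, BATCH_SIZE);

  // Drain one BATCH_SIZE window at a time so only that many promises exist at once.
  for (let b = 0; b < batches.length; b++) {
    const dispatches = batches[b].map((sub, i) => {
      const index = b * BATCH_SIZE + i;
      return semaphore.run(async () => {
        try {
          await withTimeout(sendPush(sub), DISPATCH_TIMEOUT_MS, `Dispatch ${index}`);
        } catch (error) {
          console.error(`[DISPATCH_ERROR] Sub ${index} failed:`, error);
          throw error; // recorded as a "rejected" entry in the results
        }
      });
    });
    // allSettled: one failed subscription never aborts the rest of the window.
    results.push(...(await Promise.allSettled(dispatches)));
  }
  return results;
}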

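Architecture Trade-offs: Higher concurrency increases throughput but risks TCP connection exhaustion and provider-side IP reputation degradation. Monitor TIME_WAIT socket counts: each actively closed connection holds a local ephemeral port for a while, so a count approaching the ephemeral port range (net.ipv4.ip_local_port_range on Linux) means connections are being opened and closed instead of reused.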

Optimizing Payload & Connection Throughput

Throughput bottlenecks frequently originate from redundant payload serialization and ephemeral HTTP connections. HTTP/2 multiplexing allows concurrent push requests over a single persistent TCP connection, drastically reducing TLS handshake overhead. Aligning expiration windows with TTL & Expiration Handling ensures stale payloads are discarded before consuming dispatch cycles.
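
For the HTTP/2 reuse described above, a minimal per-origin session cache built on Node's built-in `node:http2` module might look like this. `Http2SessionPool` is a hypothetical helper, not part of any library, and the `connect` parameter is injectable only so the pool can be exercised without network access. Sessions are evicted on `close`, `goaway`, or `error` so the next request opens a fresh connection.

```typescript
import * as http2 from "node:http2";

type Connect = (origin: string) => http2.ClientHttp2Session;

// One HTTP/2 session per push-service origin, reused until it closes, errors,
// or receives GOAWAY from the server.
class Http2SessionPool {
  private readonly sessions = new Map<string, http2.ClientHttp2Session>();

  // By default opens a real TLS connection; tests can pass a fake.
  constructor(private readonly connect: Connect = (origin) => http2.connect(origin)) {}

  get(origin: string): http2.ClientHttp2Session {
    const existing = this.sessions.get(origin);
    if (existing && !existing.closed && !existing.destroyed) return existing;

    const session = this.connect(origin);
    const evict = () => {
      // Only evict if the map still points at this session (not a newer one).
      if (this.sessions.get(origin) === session) this.sessions.delete(origin);
    };
    session.on("close", evict);
    session.on("goaway", evict);
    session.on("error", evict); // also keeps an unhandled 'error' from crashing the process
    this.sessions.set(origin, session);
    return session;
  }

  closeAll(): void {
    for (const session of this.sessions.values()) session.close();
    this.sessions.clear();
  }
}
```

A request then goes out as a stream on the shared session, for example `pool.get(url.origin).request({ ":method": "POST", ":path": url.pathname + url.search, ...headers })`, and the stream's `response` event carries the `:status` pseudo-header.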

Implementation Steps

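  1. HTTP/2 Persistence: Use an HTTP/2-capable client (for example Node's built-in http2 module) and reuse one session per push service origin. The web-push Node.js library sends each request through Node's https module (HTTP/1.1), so it does not multiplex; passing a keep-alive https.Agent through its agent option still avoids a new TLS handshake per request.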
  2. Payload Minimization: Keep plaintext JSON payloads under 3 KB to stay within provider limits after encryption overhead. Send only IDs and action metadata; fetch full content client-side.
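  3. Pre-Encryption Pipeline: Encrypt payloads server-side before queue insertion. Because each subscription has its own p256dh and auth keys, this produces one ciphertext per recipient; moving that cryptographic work into a dedicated pre-processing stage keeps it from competing for CPU with high-throughput dispatch.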
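
The 3 KB guidance leaves headroom below the hard limit. With aes128gcm (RFC 8188, as profiled for Web Push by RFC 8291), a single 4096-byte record carries an 86-byte header (16-byte salt, 4-byte record size, 1-byte key-id length, 65-byte sender public key), a 16-byte authentication tag, and at least one padding-delimiter byte, so the plaintext ceiling is 4096 - 86 - 16 - 1 = 3993 bytes. The sketch below checks UTF-8 byte length rather than string length, since non-ASCII characters take more than one byte; the `SlimPayload` shape and the 3 KB budget are illustrative assumptions.

```typescript
// Byte overhead of a single-record aes128gcm push message (RFC 8188 / RFC 8291).
const AES128GCM_HEADER_BYTES = 16 + 4 + 1 + 65; // salt + record size + keyid length + sender public key
const AEAD_TAG_BYTES = 16;
const PADDING_DELIMITER_BYTES = 1;
const MAX_CIPHERTEXT_BYTES = 4096;

const MAX_PLAINTEXT_BYTES =
  MAX_CIPHERTEXT_BYTES - AES128GCM_HEADER_BYTES - AEAD_TAG_BYTES - PADDING_DELIMITER_BYTES;

// Hypothetical soft budget (~3 KB) that leaves headroom under MAX_PLAINTEXT_BYTES.
const PLAINTEXT_BUDGET_BYTES = 3 * 1024;

// Hypothetical slim payload: identifiers and action metadata only.
interface SlimPayload {
  id: string;
  action?: string;
  url?: string;
}

// Serializes the payload and rejects it if its UTF-8 size exceeds the budget.
function encodePayload(payload: SlimPayload): Buffer {
  const bytes = Buffer.from(JSON.stringify(payload), "utf8");
  if (bytes.length > PLAINTEXT_BUDGET_BYTES) {
    throw new RangeError(`payload is ${bytes.length} bytes; budget is ${PLAINTEXT_BUDGET_BYTES}`);
  }
  return bytes;
}
```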

Security & Compliance Posture

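  • PII Stripping: Remove or hash personally identifiable information before encryption. Encryption protects the payload inside the push service, but the plaintext still passes through your own pipeline and logs and is decrypted on the recipient's device.
  • VAPID Validation: Sign the Authorization: vapid t=<JWT>, k=<public key> header (RFC 8292) with the private key matching the applicationServerKey each subscription was created with, and check it against active key rotation schedules. Push services typically reject invalid or mismatched signatures with 401 Unauthorized or 403 Forbidden.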
  • Regulatory Alignment: Enforce GDPR/CCPA consent flags at the batching layer. Subscriptions lacking explicit opt-in must be filtered before entering the dispatch pipeline.
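
For reference, RFC 8292 defines the header as `Authorization: vapid t=<JWT>, k=<public key>`. The JWT is ES256-signed, `aud` is the push service's origin, `exp` is at most 24 hours ahead, and `k` is the base64url-encoded uncompressed P-256 public key. The following sketch uses only `node:crypto`. `vapidAuthorization` and `rawPublicKey` are hypothetical helper names, the `mailto:` subject is a placeholder, and JWK export plus `base64url` encoding assume a recent Node.js release (16 or later). The `web-push` library can also build this header for you.

```typescript
import { createPublicKey, KeyObject, sign } from "node:crypto";

const b64url = (buf: Buffer): string => buf.toString("base64url");

// Uncompressed P-256 point (0x04 || X || Y) as base64url: the format RFC 8292 uses for `k=`.
function rawPublicKey(publicKey: KeyObject): string {
  const jwk = publicKey.export({ format: "jwk" });
  const x = Buffer.from(jwk.x as string, "base64url");
  const y = Buffer.from(jwk.y as string, "base64url");
  return b64url(Buffer.concat([Buffer.from([0x04]), x, y]));
}

// Builds the Authorization header value for one request. `privateKey` must be the
// key whose public half the subscription used as its applicationServerKey.
function vapidAuthorization(
  endpoint: string,
  subject: string,                  // contact URI, e.g. "mailto:ops@example.com" (placeholder)
  privateKey: KeyObject,
  ttlSec = 12 * 60 * 60,            // RFC 8292: exp no more than 24 h ahead
  nowSec = Math.floor(Date.now() / 1000)
): string {
  if (ttlSec > 24 * 60 * 60) throw new RangeError("VAPID exp must be within 24 hours");
  const header = b64url(Buffer.from(JSON.stringify({ typ: "JWT", alg: "ES256" })));
  const claims = b64url(
    Buffer.from(JSON.stringify({ aud: new URL(endpoint).origin, exp: nowSec + ttlSec, sub: subject }))
  );
  const signingInput = `${header}.${claims}`;
  // JWS ES256 uses the raw 64-byte r||s signature, not DER, hence ieee-p1363.
  const signature = sign("sha256", Buffer.from(signingInput), { key: privateKey, dsaEncoding: "ieee-p1363" });
  return `vapid t=${signingInput}.${b64url(signature)}, k=${rawPublicKey(createPublicKey(privateKey))}`;
}
```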

Monitoring & Acknowledgment Integration

Without granular tracking, a single 410 Gone response within a dispatch window can corrupt subscription health metrics. Implementing a correlation ID per request and mapping it to individual subscription receipts enables accurate state reconciliation. This workflow directly feeds into Delivery Tracking & Acknowledgment systems, allowing real-time throughput adjustments and failure isolation.

Implementation Steps

  1. Correlation Mapping: Attach a unique request_id (UUIDv4) and subscription_index to each HTTP dispatch. Store mappings in an ephemeral cache (e.g., Redis with 15-minute TTL).
  2. Response Parsing: Parse 201 Created, 404 Not Found, and 410 Gone responses. Immediately flag 404/410 as invalid subscriptions and purge them from the active database.
  3. Dead-Letter Routing: Route failed dispatches to a dedicated DLQ. Implement exponential backoff for transient 5xx errors as described in Retry Logic & Backoff Strategies, but permanently quarantine 4xx client errors to prevent retry storms.
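
The correlation, response-parsing, and dead-letter steps can be condensed into one classifier. This is a sketch under assumptions: the `Outcome` and `DispatchRecord` shapes, the 1 s base and 60 s cap for backoff, and the five-attempt budget are hypothetical, and exponential backoff with full jitter is one common choice, not the only one. A `Retry-After` header is honored first when present, in either of its two forms (delay-seconds or HTTP-date, per RFC 9110). 429 is treated as retryable even though it is a 4xx, because it signals throttling rather than a bad request.

```typescript
import { randomUUID } from "node:crypto";

type Outcome =
  | { kind: "delivered" }
  | { kind: "purge" }                   // 404/410: subscription is gone, delete it
  | { kind: "retry"; delayMs: number }  // transient: requeue after delayMs
  | { kind: "quarantine" };             // permanent 4xx or retries exhausted: DLQ

// Hypothetical correlation record, stored in the ephemeral cache keyed by requestId.
interface DispatchRecord {
  requestId: string;
  subscriptionIndex: number;
  attempt: number;
}

const MAX_ATTEMPTS = 5; // hypothetical retry budget

function newDispatchRecord(subscriptionIndex: number, attempt = 0): DispatchRecord {
  return { requestId: randomUUID(), subscriptionIndex, attempt };
}

// Retry-After is either delay-seconds ("120") or an HTTP-date.
function parseRetryAfter(value: string | null, nowMs: number): number | undefined {
  if (!value) return undefined;
  const seconds = Number(value);
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  const dateMs = Date.parse(value);
  return Number.isNaN(dateMs) ? undefined : Math.max(0, dateMs - nowMs);
}

// Exponential backoff with full jitter: uniform in [0, min(capMs, baseMs * 2^attempt)).
function backoffMs(attempt: number, baseMs = 1_000, capMs = 60_000): number {
  return Math.floor(Math.random() * Math.min(capMs, baseMs * 2 ** attempt));
}

function classifyResponse(
  status: number,
  retryAfter: string | null,
  attempt: number,
  nowMs = Date.now()
): Outcome {
  if (status >= 200 && status < 300) return { kind: "delivered" }; // push services reply 201 Created
  if (status === 404 || status === 410) return { kind: "purge" };
  if (status === 429 || status >= 500) {
    if (attempt >= MAX_ATTEMPTS) return { kind: "quarantine" };
    return { kind: "retry", delayMs: parseRetryAfter(retryAfter, nowMs) ?? backoffMs(attempt) };
  }
  return { kind: "quarantine" }; // e.g. 400, 401/403 (VAPID), 413 (payload too large)
}
```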

Debugging Checklist

  • Verify provider responses contain expected headers before abstracting status codes into internal metrics.
  • Audit batch sizes against provider-specific Retry-After headers on 429 responses.
  • Ensure worker logs capture raw HTTP status codes before internal mapping.

Production Deployment & Queue Scaling

Deploying a high-throughput dispatch system requires rigorous validation of connection limits, memory allocation during encryption, and circuit breaker thresholds. Align dispatch intervals with campaign velocity and enforce strict idempotency keys to prevent duplicate notifications. For infrastructure teams evaluating distributed message brokers, refer to Scaling push queues with Redis or RabbitMQ to select the optimal persistence and routing strategy.

Implementation Steps

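  1. VAPID Key Rotation Compatibility: Implement hot-swappable VAPID keys that workers fetch from a centralized secrets manager before signing each dispatch window. Because every subscription is bound to the applicationServerKey it was created with, record which key each subscription uses and sign with that key; a newly rotated key only applies to subscriptions created after the rotation.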
  2. Circuit Breaker Configuration: Set failure thresholds at 5% per provider. When exceeded, halt dispatch for 60 s, drain pending dispatches to a retry queue, and alert SRE teams.
  3. Backpressure Handling: Enforce queue depth limits. When exceeded, apply producer-side rate limiting and return 503 Service Unavailable to upstream campaign APIs.
  4. Load Testing Protocol: Simulate a 50,000-subscription fan-out (dispatched under your normal MAX_CONCURRENCY, not 50,000 simultaneous requests) with a 99.5% success target. Monitor GC pauses, heap allocation, and network I/O saturation during peak windows.
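
A per-provider breaker matching the thresholds above (5% failures, 60 s cooldown) might be sketched as follows. The rolling window of 200 outcomes and the 50-sample minimum are hypothetical additions so that a single early failure cannot trip the breaker, and the clock is injectable for testing. Draining to the retry queue and paging SRE are left to the caller: when `canDispatch()` returns false, requeue the work and raise an alert.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Trips when the failure rate over the most recent `window` outcomes exceeds
// `failurePct`, stays open for `cooldownMs`, then admits one half-open probe.
class CircuitBreaker {
  private outcomes: boolean[] = []; // true = failed dispatch, oldest first
  private state: BreakerState = "closed";
  private openedAt = 0;
  private probeInFlight = false;

  constructor(
    private readonly failurePct = 0.05,
    private readonly cooldownMs = 60_000,
    private readonly window = 200,     // hypothetical sample window
    private readonly minSamples = 50,  // hypothetical floor before the rate is trusted
    private readonly now: () => number = Date.now
  ) {}

  get current(): BreakerState {
    return this.state;
  }

  // Call before each dispatch; false means requeue the work instead of sending it.
  canDispatch(): boolean {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.cooldownMs) return false;
      this.state = "half-open";
      this.probeInFlight = false;
    }
    if (this.state === "half-open") {
      if (this.probeInFlight) return false;
      this.probeInFlight = true; // admit exactly one probe
    }
    return true;
  }

  // Call with the outcome of every dispatch that canDispatch() admitted.
  record(failed: boolean): void {
    if (this.state === "open") return; // late results from before the trip
    if (this.state === "half-open") {
      if (failed) this.trip();
      else this.reset();
      return;
    }
    this.outcomes.push(failed);
    if (this.outcomes.length > this.window) this.outcomes.shift();
    const failures = this.outcomes.filter(Boolean).length;
    if (this.outcomes.length >= this.minSamples && failures / this.outcomes.length > this.failurePct) {
      this.trip();
    }
  }

  private trip(): void {
    this.state = "open";
    this.openedAt = this.now();
    this.outcomes = [];
  }

  private reset(): void {
    this.state = "closed";
    this.outcomes = [];
    this.probeInFlight = false;
  }
}
```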

Security & Operational Hardening

  • Immutable Audit Logging: Record every dispatched request with correlation IDs, timestamps, and VAPID key fingerprints. Logs must be append-only and retained per compliance mandates.
  • Idempotency Enforcement: Generate deterministic idempotency keys using SHA-256(campaign_id + subscription_endpoint_hash + timestamp_window), where timestamp_window is the current 5-minute slot, and reject any dispatch whose key has already been recorded.
  • Memory Safeguards: Cap worker heap usage at roughly 70% of the memory available to the process (in Node.js, via --max-old-space-size). Implement stream-based payload processing to avoid loading entire subscription lists into memory simultaneously.
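
The idempotency key can be derived with `node:crypto` as below. The `|` separators, the use of SHA-256 for the endpoint hash, the function names, and the in-memory `SeenKeys` stand-in are assumptions; in production the claim would go to a shared store, for example Redis `SET key 1 NX EX 300`. Because `timestamp_window` buckets time into fixed 5-minute slots, two sends that straddle a slot boundary produce different keys; for a true sliding window, drop the bucket from the key and rely on the 300-second TTL alone.

```typescript
import { createHash } from "node:crypto";

const IDEMPOTENCY_WINDOW_SEC = 300;

const sha256Hex = (input: string): string => createHash("sha256").update(input, "utf8").digest("hex");

// SHA-256(campaign_id | SHA-256(endpoint) | timestamp_window). The "|" separators
// keep "ab"+"c" and "a"+"bc" from hashing identically.
function idempotencyKey(campaignId: string, endpoint: string, nowMs: number): string {
  const timestampWindow = Math.floor(nowMs / 1000 / IDEMPOTENCY_WINDOW_SEC);
  return sha256Hex(`${campaignId}|${sha256Hex(endpoint)}|${timestampWindow}`);
}

// In-process stand-in for a shared store. It never prunes expired keys;
// a real store such as Redis expires them itself.
class SeenKeys {
  private expiresAt = new Map<string, number>();

  // True the first time a key is claimed within its TTL; false for a duplicate.
  claim(key: string, nowMs: number, ttlMs = IDEMPOTENCY_WINDOW_SEC * 1000): boolean {
    const expiry = this.expiresAt.get(key);
    if (expiry !== undefined && expiry > nowMs) return false;
    this.expiresAt.set(key, nowMs + ttlMs);
    return true;
  }
}
```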

Throughput Tuning Reference

These are the knobs that move throughput without tipping a provider into throttling. Start conservative and raise concurrency only while the 429 rate stays near zero.

Parameter | Type | Default | Notes
--- | --- | --- | ---
BATCH_SIZE | integer | 200 | Subscriptions drained per dispatch window before yielding
MAX_CONCURRENCY | integer | 10 | In-flight requests per worker; lower for APNs (HTTP/2 stream limits)
DISPATCH_TIMEOUT_MS | integer (ms) | 30000 | Per-request timeout before the dispatch is treated as failed
circuitFailurePct | float (fraction) | 0.05 | Per-provider failure rate that opens the circuit breaker
circuitCooldownSec | integer (s) | 60 | Pause before half-open probing after the breaker trips
correlationCacheTtlSec | integer (s) | 900 | TTL on the ephemeral request_id → subscription map in Redis
idempotencyWindowSec | integer (s) | 300 | Window within which duplicate dispatches are rejected
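
The table maps naturally onto a typed configuration object. In this sketch the defaults come from the table, while the `PUSH_*` environment variable names and the validation rules (positive integers, a failure rate strictly between 0 and 1) are assumptions.

```typescript
interface DispatchConfig {
  batchSize: number;
  maxConcurrency: number;
  dispatchTimeoutMs: number;
  circuitFailurePct: number;
  circuitCooldownSec: number;
  correlationCacheTtlSec: number;
  idempotencyWindowSec: number;
}

type Env = Record<string, string | undefined>;

// Defaults copied from the tuning table.
const DISPATCH_DEFAULTS: DispatchConfig = {
  batchSize: 200,
  maxConcurrency: 10,
  dispatchTimeoutMs: 30_000,
  circuitFailurePct: 0.05,
  circuitCooldownSec: 60,
  correlationCacheTtlSec: 900,
  idempotencyWindowSec: 300,
};

// Hypothetical environment variable names for each knob.
const ENV_NAMES: Record<keyof DispatchConfig, string> = {
  batchSize: "PUSH_BATCH_SIZE",
  maxConcurrency: "PUSH_MAX_CONCURRENCY",
  dispatchTimeoutMs: "PUSH_DISPATCH_TIMEOUT_MS",
  circuitFailurePct: "PUSH_CIRCUIT_FAILURE_PCT",
  circuitCooldownSec: "PUSH_CIRCUIT_COOLDOWN_SEC",
  correlationCacheTtlSec: "PUSH_CORRELATION_CACHE_TTL_SEC",
  idempotencyWindowSec: "PUSH_IDEMPOTENCY_WINDOW_SEC",
};

// Applies overrides from the environment and rejects out-of-range values.
function loadDispatchConfig(env: Env = process.env): DispatchConfig {
  const cfg: DispatchConfig = { ...DISPATCH_DEFAULTS };
  for (const key of Object.keys(ENV_NAMES) as (keyof DispatchConfig)[]) {
    const raw = env[ENV_NAMES[key]];
    if (raw === undefined || raw.trim() === "") continue;
    const value = Number(raw);
    const valid =
      key === "circuitFailurePct"
        ? value > 0 && value < 1                // a fraction, e.g. 0.05 for 5%
        : Number.isInteger(value) && value > 0; // counts and durations
    if (!valid) throw new RangeError(`${ENV_NAMES[key]}=${raw} is out of range`);
    cfg[key] = value;
  }
  return cfg;
}
```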

Verification

Validate the dispatcher under load before trusting it in a campaign. Watch the socket table and the provider response mix while ramping concurrency:

# Watch TIME_WAIT sockets while a load test runs against the dispatcher
watch -n 2 "ss -tan state time-wait | wc -l"

A healthy run holds TIME_WAIT well under the OS limit (HTTP/2 reuse keeps it low), shows a flat 429 rate, and reports a success ratio at or above your target (e.g. 99.5%). A climbing 429 rate means MAX_CONCURRENCY is too high for that provider — back it off and honor every Retry-After.
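
Alongside the socket count, the pass/fail check can be computed from the raw status codes the workers log. In this sketch `summarizeRun` is hypothetical, 404/410 responses are excluded from the denominator on the assumption that expired subscriptions reflect list hygiene rather than dispatcher health, and the 0.1% ceiling for 429s is an illustrative stand-in for "flat".

```typescript
interface RunSummary {
  total: number;        // all responses, including expired subscriptions
  considered: number;   // responses counted toward the ratios (excludes 404/410)
  successRatio: number; // 2xx / considered
  rate429: number;      // 429 / considered
  passed: boolean;
}

function summarizeRun(statuses: number[], successTarget = 0.995, max429Rate = 0.001): RunSummary {
  const total = statuses.length;
  // Assumption: 404/410 mean the subscription expired, not that the dispatcher failed.
  const considered = statuses.filter((s) => s !== 404 && s !== 410).length;
  if (considered === 0) return { total, considered, successRatio: 0, rate429: 0, passed: false };
  const ok = statuses.filter((s) => s >= 200 && s < 300).length;
  const throttled = statuses.filter((s) => s === 429).length;
  const successRatio = ok / considered;
  const rate429 = throttled / considered;
  return {
    total,
    considered,
    successRatio,
    rate429,
    passed: successRatio >= successTarget && rate429 <= max429Rate,
  };
}
```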


FAQ

Can I send multiple subscriptions in a single HTTP request?

No. Web Push (RFC 8030) is a per-endpoint protocol — one HTTP POST per subscription. “Batching” here means controlling how many of those POSTs run concurrently through a semaphore and reusing HTTP/2 connections, not packing recipients into one request.

What concurrency limit avoids 429 responses?

Start at 10–20 in-flight requests per worker. FCM tolerates higher concurrency; APNs needs lower limits to avoid HTTP/2 stream errors. Raise it only while the per-provider 429 rate stays flat, and measure the sweet spot empirically as covered in Optimal batch size for web push throughput.

How big can a push payload be?

The encrypted payload is capped at 4 KB of ciphertext with the aes128gcm encoding. Keep plaintext JSON under roughly 3 KB to leave room for encryption overhead. Send identifiers and fetch full content in the service worker.

Does HTTP/2 multiplexing really help throughput?

Yes. Reusing a single persistent HTTP/2 connection per push-service origin eliminates repeated TLS handshakes and keeps TIME_WAIT socket counts low, which is usually the first OS-level bottleneck under high fan-out.