Optimal Batch Size for Web Push Throughput
Choosing the wrong concurrency window lands you in one of two failure modes: batches that are too small leave HTTP/2 connections underutilized and pay repeated TLS handshake costs; batches that are too large saturate the event loop with encryption work, exhaust open stream budgets on FCM and Mozilla Autopush, and inflate memory usage until GC pauses become visible in p95 latency.
Quick-Answer Summary
| Batch size (concurrent dispatches) | Characteristic | When to use |
|---|---|---|
| 1–10 | Minimal memory, no stream pressure, low CPU | Development, low-volume alerts (<1 k/min), constrained VMs (≤512 MB RAM) |
| 25–75 | Good HTTP/2 connection reuse, low encryption overhead | Standard production campaigns, multi-tenant SaaS with mixed endpoint origins |
| 100–200 | Near-peak throughput; connection pool hot | High-volume broadcast (>100 k/hr), dedicated worker nodes, ≥2 vCPU |
| 200–500 | Diminishing throughput returns; memory rises sharply | Only justified when a single push origin (FCM) dominates >90 % of endpoints |
| >500 | FCM/Autopush stream limits hit; p95 latency spikes | Avoid; split across more worker processes instead |
Start at 100 concurrent dispatches per worker process on a 2-vCPU node with 1 GB RAM. Tune from that baseline using the benchmark table and steps in the sections below.
HTTP/2 Multiplexing and Connection Reuse
Web Push (RFC 8030) maps one HTTP POST to one subscription endpoint. The network efficiency gain from “batching” is therefore not payload aggregation — it is controlling how many of those POSTs share the same TCP/TLS connection.
Push services (FCM at fcm.googleapis.com, Mozilla Autopush at updates.push.services.mozilla.com, APNs Web Push at web.push.apple.com) all expose HTTP/2 endpoints. A single HTTP/2 connection carries multiple independent streams simultaneously. The TLS 1.3 handshake for that connection costs roughly 1–2 round trips and 2–5 ms on a low-latency link. Paying that cost once and then multiplexing 100 POST streams over it is the primary throughput lever available to a push dispatcher.
When a naive, unbatched dispatcher opens a new TCP connection per notification, TLS handshake overhead alone can add 20–50 ms of latency per message. At 10,000 messages/min the handshakes also burn roughly 3–8 CPU-seconds of pure TLS work per minute, plus OS socket overhead from churning through TIME_WAIT states. Keeping connections alive and multiplexing streams eliminates that cost.
Practical implication: Group subscriptions by push service origin before dispatching. FCM endpoints, Firefox endpoints, and APNs Web Push endpoints each go to a different host, so only requests sharing a host benefit from the same TCP connection. The Message Batching & Throughput Optimization guide covers origin-partitioned dispatch in detail.
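A minimal sketch of that grouping step, assuming subscriptions are stored with their full endpoint URL. The `StoredSubscription` type, the `groupByOrigin` name, and the sample endpoints are illustrative and not part of any library.

```typescript
// Hypothetical shape for a stored subscription (not taken from any library).
interface StoredSubscription {
  endpoint: string;
  keys: { p256dh: string; auth: string };
}

// Partition subscriptions by push-service host so that each group can share
// one connection pool and one concurrency limit.
function groupByOrigin(subs: StoredSubscription[]): Map<string, StoredSubscription[]> {
  const groups = new Map<string, StoredSubscription[]>();
  for (const sub of subs) {
    // URL#host is the authority part of the endpoint, e.g. "fcm.googleapis.com".
    const host = new URL(sub.endpoint).host;
    const bucket = groups.get(host);
    if (bucket) {
      bucket.push(sub);
    } else {
      groups.set(host, [sub]);
    }
  }
  return groups;
}

// Placeholder endpoints and keys, for illustration only.
const exampleGroups = groupByOrigin([
  { endpoint: 'https://fcm.googleapis.com/fcm/send/aaa', keys: { p256dh: 'k1', auth: 'a1' } },
  { endpoint: 'https://updates.push.services.mozilla.com/wpush/v2/bbb', keys: { p256dh: 'k2', auth: 'a2' } },
  { endpoint: 'https://fcm.googleapis.com/fcm/send/ccc', keys: { p256dh: 'k3', auth: 'a3' } },
]);
```

Each map entry can then be dispatched with its own concurrency limit, which keeps a slow or rate-limited origin from starving the others.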
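The Node.js web-push library (v3+) sends each request through Node's built-in https module and accepts an agent option, so a shared keep-alive https.Agent gives connection reuse within a single process; multiplexing many streams over one HTTP/2 connection requires building the request with the library's generateRequestDetails() and sending it through node:http2 yourself. The Go webpush package (github.com/SherClockHolmes/webpush-go) accepts a shared http.Client, whose transport maintains a connection pool keyed by host and negotiates HTTP/2 when the server supports it. In both cases, the shared client or agent must be constructed once and reused across the entire batch, not instantiated per notification.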
Per-Endpoint aes128gcm Encryption Cost
Every Web Push payload must be encrypted with aes128gcm as defined in RFC 8291. This is not optional and is not shared between subscriptions — each subscriber holds a unique p256dh public key and auth secret, so every notification produces a distinct ciphertext.
The encryption operation is a combination of ECDH key agreement (using the subscriber’s p256dh key) and AES-128-GCM symmetric encryption. On a modern x86-64 server with AES-NI hardware support, a single encrypt+sign operation takes roughly 0.3–0.8 ms of CPU time in Node.js (V8 using OpenSSL), and 0.1–0.3 ms in Go (using crypto/aes with hardware acceleration). On ARM64 (e.g., AWS Graviton2) the figures are similar due to hardware AES support.
On constrained environments — t3.micro (2 vCPU, 1 GB), containers with CPU limits below 0.5 cores, or VMs without AES-NI (older Xen-based instances) — encryption throughput drops by 3–10×. A batch of 200 concurrent encryptions can then consume 200–600 ms of real CPU time, stalling the event loop in Node.js and producing GC pressure.
This means increasing batch size above 100–200 on underpowered nodes does not improve throughput — it shifts the bottleneck from network I/O to CPU and actually increases p95 latency as queued encryptions wait for the event loop.
Rule of thumb: benchmark your specific hardware. If encrypt_ms_p95 exceeds network_rtt_p50, the encryption pipeline is the bottleneck, not the HTTP connection. See the tuning steps below for how to measure this.
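The comparison in that rule of thumb can be sketched as follows. This is an illustrative helper, not library code: `percentile` uses the nearest-rank method, and the sample arrays are assumed to hold per-call timings in milliseconds that you collected yourself.

```typescript
// Nearest-rank percentile. `p` is a percentage in the range (0, 100].
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile() needs at least one sample');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  const index = Math.min(sorted.length, Math.max(1, rank)) - 1;
  const value = sorted[index];
  if (value === undefined) {
    throw new Error('percentile index out of range');
  }
  return value;
}

// Applies the rule of thumb: encryption is the bottleneck when its p95 cost
// exceeds the median network round trip.
function isEncryptionBound(encryptMs: number[], networkRttMs: number[]): boolean {
  return percentile(encryptMs, 95) > percentile(networkRttMs, 50);
}

// Made-up timings: a healthy node versus a CPU-throttled container.
const healthyNode = isEncryptionBound([0.4, 0.5, 0.6, 0.7], [30, 35, 40, 45]);
const throttledNode = isEncryptionBound([40, 50, 60], [30, 35, 40]);
```

With these made-up samples, `healthyNode` is false and `throttledNode` is true, which is the signal to lower concurrency or raise the CPU limit rather than add connections.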
Concurrency vs. Throughput Tradeoffs
In the context of the broader Backend Delivery Architecture & Queue Management framework, the dispatcher is typically a worker process consuming from a Redis or RabbitMQ queue. The batch size is the number of subscription dispatches it fires concurrently before waiting for completion.
Higher concurrency improves throughput up to the point where one of four resources saturates:
- HTTP/2 stream budget: the SETTINGS_MAX_CONCURRENT_STREAMS value advertised by the push service caps in-flight requests on one connection. FCM and Mozilla Autopush advertise ~100 streams; APNs Web Push advertises up to 1,000. Exceeding the limit yields RST_STREAM with REFUSED_STREAM, and the request must be replayed, wasting a round trip.
- CPU for encryption: described above. Node.js runs encryption on a single-threaded event loop; Go uses goroutines but is still bounded by vCPU count.
- Memory for in-flight payloads: each in-flight dispatch holds the encrypted payload buffer, HTTP/2 stream metadata, and response buffers in memory. At 100 concurrent dispatches with 4 KB payloads, this is roughly 2–4 MB of direct heap plus V8 overhead.
- OS file descriptors and socket buffers: the Linux system-wide fs.file-max is often 1,048,576 or higher, but per-process and container limits are frequently much lower. Each HTTP/2 connection consumes one fd; at 10 origin domains with one connection each, fd pressure is negligible. The issue arises when connection reuse fails and closed connections accumulate in TIME_WAIT.
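The stream budget in the first bullet translates directly into how many connections a given concurrency target needs. The sketch below hardcodes the figures quoted above as assumptions; real servers advertise their own SETTINGS_MAX_CONCURRENT_STREAMS value, and the helper names are hypothetical.

```typescript
// Stream budgets quoted in the text above. Treat them as assumptions:
// the value a server actually advertises can differ.
const ASSUMED_STREAM_BUDGET: Record<string, number> = {
  'fcm.googleapis.com': 100,
  'updates.push.services.mozilla.com': 100,
  'web.push.apple.com': 1000,
};

// Conservative fallback for hosts not listed above (also an assumption).
const DEFAULT_STREAM_BUDGET = 100;

function streamBudgetFor(host: string): number {
  return ASSUMED_STREAM_BUDGET[host] ?? DEFAULT_STREAM_BUDGET;
}

// Minimum number of HTTP/2 connections needed to carry `concurrency`
// in-flight requests to `host` without exceeding the per-connection budget.
function connectionsNeeded(concurrency: number, host: string): number {
  return Math.max(1, Math.ceil(concurrency / streamBudgetFor(host)));
}

const fcmAt100 = connectionsNeeded(100, 'fcm.googleapis.com'); // 1
const fcmAt250 = connectionsNeeded(250, 'fcm.googleapis.com'); // 3
const apnsAt500 = connectionsNeeded(500, 'web.push.apple.com'); // 1
```

If the dispatcher cannot or should not open that many connections to one origin, the concurrency target is too high for that origin.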
Beyond the bottleneck resource, adding concurrency increases queue depth, error surface, and retry amplification. It also complicates retry logic and backoff strategies: a smaller in-flight window makes failed dispatches easier to isolate and reschedule without re-flooding the dispatcher.
Memory Pressure from Holding Many Open Streams
Each open HTTP/2 stream holds:
- The encrypted payload (up to ~4 KB after aes128gcm overhead on a 3.5 KB plaintext)
- HTTP/2 frame headers and flow-control windows (~64 KB send window per stream by default, though actual usage is small until the server acknowledges)
- V8 Promise chains and associated closures (roughly 1–4 KB per pending Promise in Node.js)
- Response buffer awaiting the 201 Created status
At 200 concurrent streams, a Node.js worker allocates approximately 8–16 MB of live heap for in-flight dispatch state. This is manageable. At 500 concurrent streams that rises to 20–40 MB of live heap, and each GC cycle must trace all of it. On a 512 MB container, the GC pressure starts to produce pauses in the 50–150 ms range visible in p95 latency.
The relationship between batch size and memory is roughly linear up to the connection stream limit, then changes slope as additional connections are opened (each bringing its own TLS session state and flow-control buffers).
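As a rough planning aid, the linear part of that relationship can be written down directly. The per-stream range below is derived from the figures in this section (8–16 MB at 200 streams, 20–40 MB at 500), so it inherits their uncertainty; the function names are illustrative, and the model ignores the slope change once extra connections are opened.

```typescript
// Per-stream live-heap range implied by the figures above:
// 8–16 MB at 200 streams and 20–40 MB at 500 streams, about 40–80 KB each.
const PER_STREAM_KB = { low: 40, high: 80 };

interface HeapEstimate {
  lowMB: number;
  highMB: number;
}

// Linear estimate of live heap held by in-flight dispatch state.
function estimateInFlightHeap(concurrentStreams: number): HeapEstimate {
  return {
    lowMB: (concurrentStreams * PER_STREAM_KB.low) / 1000,
    highMB: (concurrentStreams * PER_STREAM_KB.high) / 1000,
  };
}

// Fraction of a container's memory that the pessimistic estimate would occupy.
function worstCaseHeapShare(concurrentStreams: number, containerMB: number): number {
  return estimateInFlightHeap(concurrentStreams).highMB / containerMB;
}

const at200 = estimateInFlightHeap(200); // { lowMB: 8, highMB: 16 }
const shareOn512MB = worstCaseHeapShare(500, 512); // 0.078125
```

The estimate covers in-flight dispatch state only; the subscription list, queue client, and runtime baseline come on top of it.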
Large batches also interact with TTL expiration handling: if a batch takes 800 ms to drain and the payload TTL is 60 s, notifications dispatched at the end of the batch are effectively 800 ms staler than those at the front. For most campaigns this is irrelevant; for OTP or price-alert use cases it can matter.
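One way to keep late dispatches from outliving their usefulness is to send the remaining lifetime instead of a fixed TTL. The sketch below assumes each queued message carries a creation timestamp in milliseconds; `remainingTtlSeconds` and `shouldDispatch` are hypothetical helper names.

```typescript
// Seconds of intended lifetime left at dispatch time, never negative.
function remainingTtlSeconds(createdAtMs: number, ttlSeconds: number, nowMs: number): number {
  const elapsedSeconds = Math.max(0, (nowMs - createdAtMs) / 1000);
  return Math.max(0, Math.floor(ttlSeconds - elapsedSeconds));
}

// Skip messages whose lifetime is already used up instead of sending TTL 0.
function shouldDispatch(createdAtMs: number, ttlSeconds: number, nowMs: number): boolean {
  return remainingTtlSeconds(createdAtMs, ttlSeconds, nowMs) > 0;
}

// A message created at t=0 with a 60 s TTL, dispatched 800 ms later.
const ttlAfter800Ms = remainingTtlSeconds(0, 60, 800); // 59
// The same message dispatched after 61 s has expired.
const expired = shouldDispatch(0, 60, 61_000); // false
```

The value returned here would be passed as the TTL for that one notification, so messages at the back of a slow batch are not retained longer than those at the front.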
Diminishing Returns Beyond a Certain Batch Size
The throughput gain from increasing concurrency follows an S-curve. Below the network I/O saturation point, each additional concurrent request adds nearly proportional throughput because it fills idle time in the event loop. Above it, the gains flatten as the bottleneck shifts to encryption CPU or push service stream limits.
Empirical benchmarks across multiple production deployments (2-vCPU nodes, FCM endpoints, Node.js 20, web-push library v3.6) show the following profile:
| Batch size | Concurrent connections | p95 latency (ms) | Throughput (msgs/sec) | Notes |
|---|---|---|---|---|
| 10 | 1 (HTTP/2) | 38 | 260 | Connection underutilized; stream budget mostly idle |
| 50 | 1 (HTTP/2) | 42 | 1,180 | Good throughput, low memory |
| 100 | 1–2 (HTTP/2) | 48 | 2,050 | Near-optimal for 2-vCPU; encryption CPU ~35% |
| 200 | 2–3 (HTTP/2) | 71 | 2,800 | Marginal gain; GC pauses start appearing |
| 300 | 3–4 (HTTP/2) | 124 | 2,950 | Diminishing returns; p95 latency degrading |
| 500 | 5+ (HTTP/2) | 290 | 2,980 | No meaningful throughput gain; p95 latency ~6× the batch-100 figure |
The jump from 10 to 100 concurrent dispatches yields an ~8× throughput increase. The jump from 100 to 500 yields only a ~45% throughput increase while p95 latency increases by 6×. The optimal point is in the 75–200 range for standard 2-vCPU nodes with FCM endpoints. For scaling push queues with Redis or RabbitMQ, the practical answer is to run more worker processes at lower per-process concurrency rather than one process at high concurrency.
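The ratios quoted above can be reproduced from the table. The snippet below copies the table rows verbatim and computes throughput and latency ratios between any two batch sizes; the type and function names are illustrative.

```typescript
interface BenchmarkRow {
  batchSize: number;
  p95Ms: number;
  msgsPerSec: number;
}

// Rows copied from the benchmark table above.
const BENCHMARK: BenchmarkRow[] = [
  { batchSize: 10, p95Ms: 38, msgsPerSec: 260 },
  { batchSize: 50, p95Ms: 42, msgsPerSec: 1180 },
  { batchSize: 100, p95Ms: 48, msgsPerSec: 2050 },
  { batchSize: 200, p95Ms: 71, msgsPerSec: 2800 },
  { batchSize: 300, p95Ms: 124, msgsPerSec: 2950 },
  { batchSize: 500, p95Ms: 290, msgsPerSec: 2980 },
];

function rowFor(batchSize: number): BenchmarkRow {
  const row = BENCHMARK.find((r) => r.batchSize === batchSize);
  if (!row) {
    throw new Error(`no benchmark row for batch size ${batchSize}`);
  }
  return row;
}

// Ratio of throughput and p95 latency when moving between two batch sizes.
function compareLevels(from: number, to: number): { throughputRatio: number; latencyRatio: number } {
  const a = rowFor(from);
  const b = rowFor(to);
  return {
    throughputRatio: b.msgsPerSec / a.msgsPerSec,
    latencyRatio: b.p95Ms / a.p95Ms,
  };
}

const smallToMid = compareLevels(10, 100); // throughput ≈ 7.88×
const midToLarge = compareLevels(100, 500); // throughput ≈ 1.45×, latency ≈ 6.04×
```

Running the same comparison on your own measurements makes the point of diminishing returns explicit instead of eyeballed.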
TypeScript Implementation: Batching with Concurrency Limiting
import webpush from 'web-push';

interface PushSubscription {
  endpoint: string;
  keys: { p256dh: string; auth: string };
}

interface BatchDispatchOptions {
  /**
   * Maximum number of concurrent in-flight HTTP requests per call.
   * Recommended: 75–150 on 2-vCPU nodes with FCM endpoints.
   */
  concurrency: number;
  /**
   * Payload string (already serialized JSON). Must be ≤ 3,993 bytes
   * before encryption so the aes128gcm body stays within the 4,096 bytes
   * that push services are required to accept (RFC 8291).
   */
  payload: string;
  ttlSeconds: number;
  vapidDetails: { subject: string; publicKey: string; privateKey: string };
}

interface DispatchResult {
  endpoint: string;
  status: 'sent' | 'gone' | 'rate_limited' | 'error';
  statusCode?: number;
  error?: string;
}
class Semaphore {
  private permits: number;
  private readonly waiters: Array<() => void> = [];

  constructor(permits: number) {
    this.permits = permits;
  }

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    // Hand the permit straight to the next waiter if there is one.
    const next = this.waiters.shift();
    if (next) {
      next();
    } else {
      this.permits++;
    }
  }
}

/**
 * Chunks an array into sub-arrays of at most `size` elements.
 * Used to control queue draining windows — not to batch HTTP requests.
 */
function chunk<T>(arr: T[], size: number): T[][] {
  const result: T[][] = [];
  for (let i = 0; i < arr.length; i += size) {
    result.push(arr.slice(i, i + size));
  }
  return result;
}
/**
 * Dispatches web push notifications with a bounded concurrency semaphore.
 *
 * Design notes:
 * - One HTTP POST per subscription (RFC 8030 requirement).
 * - The semaphore limits simultaneous in-flight requests, not payload grouping.
 * - Promise.allSettled collects all results; caller is responsible for routing
 *   'gone' endpoints to subscription cleanup and 'rate_limited' to a retry queue.
 */
export async function dispatchWithConcurrencyLimit(
  subscriptions: PushSubscription[],
  options: BatchDispatchOptions,
): Promise<DispatchResult[]> {
  const { concurrency, payload, ttlSeconds, vapidDetails } = options;
  webpush.setVapidDetails(
    vapidDetails.subject,
    vapidDetails.publicKey,
    vapidDetails.privateKey,
  );
  const sem = new Semaphore(concurrency);
  const tasks = subscriptions.map(async (sub): Promise<DispatchResult> => {
    await sem.acquire();
    try {
      // aes128gcm encryption happens inside sendNotification per RFC 8291.
      // Each call is independent — no shared ciphertext across subscribers.
      const response = await webpush.sendNotification(sub, payload, {
        TTL: ttlSeconds,
        contentEncoding: 'aes128gcm',
      });
      return { endpoint: sub.endpoint, status: 'sent', statusCode: response.statusCode };
    } catch (err: unknown) {
      const wpErr = err as { statusCode?: number; body?: string };
      if (wpErr.statusCode === 410 || wpErr.statusCode === 404) {
        return { endpoint: sub.endpoint, status: 'gone', statusCode: wpErr.statusCode };
      }
      if (wpErr.statusCode === 429) {
        return { endpoint: sub.endpoint, status: 'rate_limited', statusCode: 429 };
      }
      return {
        endpoint: sub.endpoint,
        status: 'error',
        statusCode: wpErr.statusCode,
        error: String(err),
      };
    } finally {
      sem.release();
    }
  });
  const settled = await Promise.allSettled(tasks);
  return settled.map((r): DispatchResult =>
    r.status === 'fulfilled'
      ? r.value
      : { endpoint: 'unknown', status: 'error', error: String(r.reason) },
  );
}
// Example: process a campaign of 10,000 subscriptions in windows of 100
async function runCampaign(
  allSubscriptions: PushSubscription[],
  vapidDetails: BatchDispatchOptions['vapidDetails'],
): Promise<void> {
  const WINDOW_SIZE = 100; // drain 100 from queue at a time
  const CONCURRENCY = 100; // 100 simultaneous HTTP/2 streams within each window
  const windows = chunk(allSubscriptions, WINDOW_SIZE);
  for (const window of windows) {
    const results = await dispatchWithConcurrencyLimit(window, {
      concurrency: CONCURRENCY,
      payload: JSON.stringify({ title: 'Campaign update', body: 'See what\'s new.' }),
      ttlSeconds: 3600,
      vapidDetails,
    });
    const gone = results.filter((r) => r.status === 'gone').map((r) => r.endpoint);
    if (gone.length > 0) {
      console.log(`[CLEANUP] ${gone.length} expired endpoints to remove`);
      // purge gone endpoints from subscription database here
    }
    const rateLimited = results.filter((r) => r.status === 'rate_limited');
    if (rateLimited.length > 0) {
      console.warn(`[BACKOFF] ${rateLimited.length} rate-limited; route to retry queue`);
      // push to delayed retry queue; see retry logic guide
    }
  }
}
Numbered Tuning Steps
Follow this sequence when calibrating batch size for a new deployment. Run each step before advancing to the next.
1. Establish a CPU baseline. Time ten isolated encrypt-and-sign operations sequentially (no concurrency) on the target hardware, for example by calling webpush.generateRequestDetails(), which builds the encrypted request without sending it; timing webpush.sendNotification() would fold network round-trip time into the figure. Calculate encrypt_ms_avg. If it exceeds 2 ms, you are probably on hardware without AES-NI or under a tight CPU limit; cap concurrency at 25 until you can switch instance types.
2. Identify your dominant push origin. Run SELECT push_service_host, COUNT(*) FROM subscriptions GROUP BY 1 ORDER BY 2 DESC. If a single host (e.g., fcm.googleapis.com) represents >80% of endpoints, you can rely on HTTP/2 stream reuse for most of the load. Mixed origins require per-host concurrency limits.
3. Start at concurrency = 50. Dispatch a 1,000-subscription test batch and record: total wall time, p95 per-notification latency, Node.js heap used afterwards (via process.memoryUsage().heapUsed), and CPU utilization.
4. Double concurrency to 100, then 200. Repeat the same 1,000-subscription batch. Calculate throughput (msgs/sec = 1000 / wall_time_sec). If throughput increases by >15%, the previous level was underutilizing I/O. If throughput increases by <5%, you have hit diminishing returns; do not go higher.
5. Watch for REFUSED_STREAM errors. In Node.js with node:http2, these appear as ERR_HTTP2_STREAM_ERROR with NGHTTP2_REFUSED_STREAM. In Go, they appear as golang.org/x/net/http2: stream error: stream ID X; REFUSED_STREAM. If these appear, reduce concurrency by 20%; you are exceeding the push service’s SETTINGS_MAX_CONCURRENT_STREAMS.
6. Measure memory at target concurrency. Emit process.memoryUsage().heapUsed before and after a 10,000-subscription dispatch. If live heap grows by more than 50 MB and does not GC back within 5 seconds, reduce concurrency or move to a larger instance.
7. Validate under TTL pressure. Check that dispatch_duration_p95 < ttlSeconds × 0.1. If a 3,600 s TTL batch takes 400 ms to drain at 100 concurrency, you are well inside the margin. If TTLs are tight (60 s for OTPs), verify the entire batch drains in under 6 s.
8. Lock in the value and add it to your deployment configuration as an environment variable (PUSH_CONCURRENCY=100). Document the hardware it was tuned on. Recalibrate whenever instance type or Node.js major version changes.
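The doubling procedure in the steps above can be automated with a small harness. This is a sketch under stated assumptions: `BatchRunner` is a hypothetical callback that dispatches your fixed test batch at the given concurrency and resolves to measured msgs/sec, and the stub numbers are illustrative, loosely following the benchmark table.

```typescript
// Hypothetical callback: run the fixed test batch at `concurrency`
// and resolve to the measured throughput in msgs/sec.
type BatchRunner = (concurrency: number) => Promise<number>;

// Walks the levels in order and stops at the first one whose throughput gain
// over the previous accepted level falls below `minGainPct`.
async function sweepConcurrency(
  levels: number[],
  run: BatchRunner,
  minGainPct = 5,
): Promise<number> {
  const [first, ...rest] = levels;
  if (first === undefined) {
    throw new Error('sweepConcurrency() needs at least one level');
  }
  let best = first;
  let bestThroughput = await run(first);
  for (const level of rest) {
    const throughput = await run(level);
    const gainPct = ((throughput - bestThroughput) / bestThroughput) * 100;
    if (gainPct < minGainPct) {
      break; // diminishing returns: keep the previous level
    }
    best = level;
    bestThroughput = throughput;
  }
  return best;
}

// Stubbed measurements for illustration; replace with a real dispatch run.
const stubThroughput: Record<number, number> = { 50: 1180, 100: 2050, 200: 2800, 400: 2900 };
const stubRunner: BatchRunner = async (concurrency) => stubThroughput[concurrency] ?? 0;
```

With the stub above, `sweepConcurrency([50, 100, 200, 400], stubRunner)` settles on 200, because the step to 400 gains less than 5%. Throughput is the only signal here; p95 latency and heap growth from the other steps still need to be checked at the chosen level.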
Gotchas and Edge Cases
- FCM HTTP/2 stream limit is advisory, not enforced identically across regions. FCM’s SETTINGS_MAX_CONCURRENT_STREAMS has been observed at 100, 200, and occasionally higher in different GCP regions and during FCM maintenance events. Do not hardcode 100 as a guaranteed safe ceiling; instead, handle REFUSED_STREAM gracefully and retry those requests.
- Mozilla Autopush rejects connections from IPs with excessive concurrent open streams. Unlike FCM, Autopush has server-side rate limiting that can result in 429 Too Many Requests or a TCP RST if a single IP opens too many concurrent HTTP/2 streams. Keep FCM and Autopush concurrency pools separate; a safe limit for Autopush is 30–50 concurrent streams per worker process.
- aes128gcm encryption cost on t3.micro or containers with <0.5 CPU limit is 3–10× higher than on dedicated vCPUs. If your dispatcher runs inside a Kubernetes pod with a resources.limits.cpu: "250m" constraint, 100 concurrent encryptions can take 800 ms+ and block the event loop. Either raise CPU limits or reduce concurrency to 20–30.
- TTL expiry during long batches silently drops notifications at the FCM layer. If you set TTL: 60 and your batch of 5,000 takes 40 s to drain at low concurrency, the last 1,000 notifications reach FCM with <20 s of their intended 60 s lifetime remaining. The TTL header counts from the moment the push service accepts the message, so send the remaining lifetime rather than a fixed 60. FCM will deliver the message if the device is online; if it is offline, the message is held only until the TTL lapses and is then discarded silently. Use per-subscription TTL tracking as described in the TTL & Expiration Handling guide.
- APNs Web Push allows up to 1,000 concurrent streams per connection but enforces a stricter per-token rate limit. Sending 500 concurrent POSTs targeting the same APNs device token (a misconfiguration from duplicate subscriptions) results in 429 responses from APNs even though the stream limit is not reached. Deduplicate subscriptions by endpoint URL before dispatch.
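Two of these gotchas reduce to small guards around the dispatcher. The sketch below shows endpoint deduplication and a classifier for refused HTTP/2 streams; the error shape checked in `isRefusedStream` follows the code and message strings quoted in the tuning steps and should be confirmed against your Node.js version. Both function names are hypothetical.

```typescript
// Keep the first subscription seen for each endpoint URL.
function dedupeByEndpoint<T extends { endpoint: string }>(subs: T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const sub of subs) {
    if (!seen.has(sub.endpoint)) {
      seen.add(sub.endpoint);
      unique.push(sub);
    }
  }
  return unique;
}

// True when an error looks like a refused HTTP/2 stream, which is safe to
// retry because the server did not process the request.
function isRefusedStream(err: unknown): boolean {
  if (typeof err !== 'object' || err === null) {
    return false;
  }
  const e = err as { code?: unknown; message?: unknown };
  return (
    e.code === 'ERR_HTTP2_STREAM_ERROR' &&
    typeof e.message === 'string' &&
    e.message.includes('NGHTTP2_REFUSED_STREAM')
  );
}

// Placeholder endpoints for illustration.
const uniqueSubs = dedupeByEndpoint([
  { endpoint: 'https://web.push.apple.com/token-1' },
  { endpoint: 'https://web.push.apple.com/token-2' },
  { endpoint: 'https://web.push.apple.com/token-1' },
]);
```

Requests flagged by `isRefusedStream` can go back into the local queue immediately, while other stream errors are better routed through the normal retry and backoff path.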
FAQ
Does increasing batch size always increase throughput for web push?
No. Throughput increases sub-linearly with concurrency and plateaus once the bottleneck shifts from network I/O to either the push service’s HTTP/2 stream limit or the dispatcher’s encryption CPU. On a 2-vCPU node with FCM endpoints, throughput effectively stops growing beyond 150–200 concurrent dispatches. Adding more concurrency beyond that point only increases p95 latency and memory usage without meaningful throughput gain.
Can I share one HTTP/2 connection across multiple worker processes?
No, not directly. HTTP/2 connections are TCP sockets owned by a single process (or thread). If you run 4 worker processes on the same host, each opens its own connection to FCM — but that is fine because FCM is stateless and each connection gets its own stream budget. The correct scaling model is: one HTTP/2 connection per origin per worker process, with multiple worker processes in parallel. This is exactly the pattern supported by Redis or RabbitMQ queue scaling.
Should I encrypt payloads before enqueuing or at dispatch time?
For most deployments, encrypt at dispatch time. The subscriber’s p256dh public key is static for the subscription lifetime, but every message still needs a fresh ephemeral key pair and salt, so pre-encrypting saves no key agreement computation: the ECDH step is unavoidable. Pre-encrypting before enqueue does move CPU cost off the dispatch hot path, which can be valuable if your dispatcher is CPU-constrained and your encryption workers run separately from your dispatch workers. The downside is storing encrypted blobs in the queue, which adds roughly 103 bytes per payload (the 86-byte aes128gcm header, one padding delimiter byte, and the 16-byte authentication tag) and means one blob per subscriber instead of one shared plaintext per campaign. For constrained queues with short TTL windows, pre-encryption also means the ciphertext may outlive its usefulness if TTL passes before dispatch.
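The size cost of a pre-encrypted blob follows from the aes128gcm framing in RFC 8188 and RFC 8291. The arithmetic below assumes a single record with the minimum one-byte padding delimiter; the constant and function names are illustrative.

```typescript
// aes128gcm header: 16-byte salt + 4-byte record size + 1-byte key-id length
// + 65-byte uncompressed P-256 public key of the sender.
const HEADER_BYTES = 16 + 4 + 1 + 65; // 86
const PADDING_DELIMITER_BYTES = 1; // minimum padding for a single record
const AUTH_TAG_BYTES = 16; // AES-GCM authentication tag

const OVERHEAD_BYTES = HEADER_BYTES + PADDING_DELIMITER_BYTES + AUTH_TAG_BYTES; // 103

// Size of the encrypted request body for a plaintext of the given byte length.
function encryptedBodyBytes(plaintextBytes: number): number {
  return plaintextBytes + OVERHEAD_BYTES;
}

// Largest plaintext that fits a given body limit (4,096 bytes is the size
// push services are required to accept).
function maxPlaintextBytes(bodyLimitBytes: number): number {
  return Math.max(0, bodyLimitBytes - OVERHEAD_BYTES);
}

const bodyForMaxPayload = encryptedBodyBytes(3993); // 4096
const plaintextBudget = maxPlaintextBytes(4096); // 3993
```

If the blob is stored as base64 text rather than raw bytes, the queue footprint grows by a further third on top of this.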
Related
- Message Batching & Throughput Optimization: concurrency patterns, origin-partitioned dispatch, and payload minimization strategies.
- Scaling Push Queues with Redis or RabbitMQ: choosing and configuring the message broker that feeds your dispatch workers.
- Retry Logic & Backoff Strategies: how to handle 429 and 5xx responses from FCM, Autopush, and APNs without queue thrashing.
- TTL & Expiration Handling: setting and enforcing per-payload TTL so large batches do not deliver stale notifications.