Setting Optimal TTL Values for Time-Sensitive Alerts
Sending a one-time passcode or flash-sale trigger with the wrong TTL means users either never see it or see it hours after it expired — both outcomes destroy trust and waste delivery quota.
Quick Answer: TTL by Alert Type
| Alert type | Recommended TTL | Rationale |
|---|---|---|
| OTP / security code | 0 s | Deliver immediately or discard; never queue |
| Critical system alert | 60–300 s | Context expires with the incident |
| Flash sale / limited offer | 300–900 s | Aligns with active session window |
| Re-engagement campaign | 3600–86400 s | Latency-tolerant; see TTL 0 vs 86400 delivery guarantees |
| Batch marketing | 86400–604800 s | Low urgency; maximize reach across time zones |
The formula:
TTL_optimal = relevance_window − p95_queue_latency − gateway_handshake_time
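As a worked example of the formula, here is a minimal sketch. The function name `optimalTtlSeconds` and the sample numbers are illustrative assumptions, not measured values, and clamping a negative budget to 0 is one possible design choice rather than a rule from any specification.

```javascript
// Hypothetical helper: derive a transport TTL (in seconds) from the formula above.
// All three inputs are expected in seconds.
function optimalTtlSeconds(relevanceWindowS, p95QueueLatencyS, gatewayHandshakeS) {
  const budget = relevanceWindowS - p95QueueLatencyS - gatewayHandshakeS;
  // A non-positive budget means the alert cannot fit inside its relevance window;
  // clamp to 0 (deliver immediately or discard) and round down to whole seconds.
  return Math.max(0, Math.floor(budget));
}

// Flash sale relevant for 600 s, 45 s p95 queue latency, 0.2 s gateway handshake.
console.log(optimalTtlSeconds(600, 45, 0.2)); // 554
```

A result of 0 is a signal that the queue is too slow for this alert type, so it is worth alerting on rather than silently sending.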
Root Cause Analysis: Why Default TTLs Fail
Vendor defaults are optimized for batch marketing campaigns, not real-time alerts. FCM uses a 4-week TTL if no TTL header is sent. APNs Web Push uses the apns-expiration header (a Unix timestamp); omitting it defaults to 0, which means deliver immediately or discard — not store indefinitely. When security codes, flash sale triggers, or system outage warnings exceed their relevance window, they trigger user fatigue and violate data retention policies. Misalignment between payload expiration and queue processing latency is the primary driver of stale delivery and gateway rejection within a broader Backend Delivery Architecture & Queue Management stack.
The three transport protocols each implement TTL differently, and this divergence is a common source of misconfiguration:
- Web Push Protocol (RFC 8030): The TTL request header is in seconds. Valid range: 0–2419200 (28 days). The push service MUST respect it; it is not advisory.
- FCM: Respects the TTL header per RFC 8030 for Web Push messages. For the FCM HTTP v1 API (native Android), ttl is a duration string like "3600s".
- APNs Web Push: Uses apns-expiration as a Unix epoch timestamp, not a duration. 0 means discard if not immediately deliverable; a future timestamp means queue until then.
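The three representations described above can be derived from a single duration. The sketch below assumes a hypothetical helper named `ttlForTransports`; the returned field names are illustrative and are not part of any vendor SDK.

```javascript
// Hypothetical helper: express one TTL duration in each transport's own format.
// nowMs is injectable so the output is reproducible.
function ttlForTransports(ttlSeconds, nowMs = Date.now()) {
  if (!Number.isInteger(ttlSeconds) || ttlSeconds < 0 || ttlSeconds > 2419200) {
    throw new RangeError('ttlSeconds must be an integer between 0 and 2419200');
  }
  return {
    // Web Push (RFC 8030): TTL request header, a duration in seconds.
    webPushTtlHeader: String(ttlSeconds),
    // FCM HTTP v1 (native Android): duration string such as "3600s".
    fcmV1Ttl: `${ttlSeconds}s`,
    // APNs: absolute Unix epoch seconds; 0 keeps "deliver now or discard".
    apnsExpiration: ttlSeconds === 0
      ? '0'
      : String(Math.floor(nowMs / 1000) + ttlSeconds),
  };
}

console.log(ttlForTransports(300, 1700000000000));
// { webPushTtlHeader: '300', fcmV1Ttl: '300s', apnsExpiration: '1700000300' }
```

Keeping the conversion in one place means a duration cannot reach the apns-expiration header by accident.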
Implementing Transport-Layer TTL in JavaScript
The web-push library sends the TTL header automatically when you pass the option. Store VAPID credentials in environment variables — never hardcode them.
import webpush from 'web-push';

webpush.setVapidDetails(
  'mailto:ops@example.com',
  process.env.VAPID_PUBLIC_KEY,
  process.env.VAPID_PRIVATE_KEY
);

async function sendAlert(subscription, payload, ttlSeconds) {
  const elapsed = Math.floor((Date.now() - payload.enqueuedAt) / 1000);
  const ttlRemaining = ttlSeconds - elapsed;

  if (ttlRemaining <= 0) {
    console.warn({ msg: 'TTL expired before dispatch', endpointHash: payload.endpointHash });
    return; // route to analytics sink, not retry queue
  }

  await webpush.sendNotification(
    subscription,
    JSON.stringify({ title: payload.title, body: payload.body }),
    {
      TTL: ttlRemaining, // RFC 8030 header, in seconds
      urgency: ttlSeconds <= 60 ? 'high' : 'normal'
    }
  );
}
Key details:
- urgency: 'high' instructs the push service to wake the device immediately rather than waiting for the next heartbeat; pair it with TTL: 0–60 for OTPs.
- The TTL you pass should be ttlRemaining, not the original configured TTL, to account for time already spent in the queue.
- Payloads are limited to 4 KB after aes128gcm encryption (RFC 8291). Alert copy and metadata must fit within that envelope.
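The size limit can be checked before dispatch. This is a rough sketch, not part of the web-push library: `payloadFits` is a hypothetical helper, and the 103-byte overhead (an 86-byte aes128gcm header, a 16-byte authentication tag, and a 1-byte padding delimiter) is an assumption to verify against RFC 8291 and your push library.

```javascript
// Assumed limits; confirm against RFC 8291 and the push service you target.
const MAX_ENCRYPTED_BYTES = 4096;
const ASSUMED_OVERHEAD_BYTES = 103;

// Hypothetical pre-flight check: does the serialized payload fit after encryption?
function payloadFits(payloadObject) {
  // Measure UTF-8 bytes, not string length, so multi-byte characters count correctly.
  const plaintextBytes = new TextEncoder().encode(JSON.stringify(payloadObject)).length;
  return {
    plaintextBytes,
    fits: plaintextBytes + ASSUMED_OVERHEAD_BYTES <= MAX_ENCRYPTED_BYTES,
  };
}

console.log(payloadFits({ title: 'Code', body: '123456' }).fits); // true
```

Rejecting oversized payloads at enqueue time avoids spending a gateway request on a message the push service would refuse.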
Queue-Level Expiration & Retry Backoff
Decoupling TTL from retry logic prevents wasted compute cycles. Configure your message broker to enforce expiration at the infrastructure layer so stale messages never reach delivery workers.
# RabbitMQ queue declaration — x-message-ttl is in milliseconds
# Matches a transport-layer TTL of 300 s
arguments:
  x-message-ttl: 300000
  x-dead-letter-exchange: push.dlx.expired
  x-max-length: 50000

# Redis Streams: MAXLEN limits queue depth, not per-message TTL.
# Use XADD with approximate trimming to bound memory footprint.
XADD alerts_stream MAXLEN "~" 10000 "*" payload '{"ttl":300,"enqueuedAt":1700000000000}'
# For per-job TTL in Redis, attach it to the job key with SETEX,
# not to the stream field — stream entries have no native per-entry expiry.
SETEX push:job:a1b2c3 300 '{"endpoint":"...","ttlSeconds":300}'
AWS SQS note: VisibilityTimeout must be ≤ TTL to prevent reprocessing a message the gateway already discarded as stale. Set MessageRetentionPeriod at the queue level (60 s minimum, 14 days maximum) and never rely on SQS retention as a substitute for transport TTL.
Implement exponential backoff capped at TTL × 0.5. This ensures the final retry attempt still has a valid delivery window. Configure max_retries=3 with full jitter to mitigate thundering herd effects. Route expired payloads to an analytics sink with discard_reason: ttl_exceeded — monitoring these discards in delivery analytics reveals queue latency regressions before they impact SLAs.
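The backoff rule above might be sketched as follows. The function name, the 1-second base delay, and the zero-based attempt counter are illustrative assumptions; `random` is injectable only to make the jitter testable.

```javascript
const MAX_RETRIES = 3;

// Hypothetical retry-delay calculator: exponential backoff with full jitter,
// capped at half the configured TTL. attempt is zero-based (0, 1, 2).
// Returns a delay in seconds, or null once the retry budget is exhausted.
function retryDelaySeconds(attempt, ttlSeconds, baseDelaySeconds = 1, random = Math.random) {
  if (attempt >= MAX_RETRIES) return null;
  const cap = ttlSeconds * 0.5;
  const exponential = Math.min(cap, baseDelaySeconds * 2 ** attempt);
  // Full jitter: pick uniformly in [0, exponential) so retries do not synchronize.
  return random() * exponential;
}

// With a fixed "random" of 0.5, the third attempt against a 300 s TTL waits 2 s.
console.log(retryDelaySeconds(2, 300, 1, () => 0.5)); // 2
```

A caller would still compare the returned delay against the TTL remaining at that moment, since time already spent in the queue shrinks the window.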
Diagnostic Steps for TTL Misalignment
Follow this resolution path when alerts arrive stale or fail to deliver at all:
- Audit queue depth. Compare backlog depth against consumer throughput. Identify bottlenecks where queue_age_p95 approaches TTL limits, a symptom of queue saturation under campaign spikes.
- Verify header propagation. Confirm the TTL header is not stripped by middleware, load balancers, or API gateways. Log the outbound HTTP request to the push service endpoint and check the raw headers.
- Validate fallback routing. Confirm expired messages route to a dead-letter queue or analytics sink; review the TTL & Expiration Handling section for canonical routing patterns.
- Recalculate optimal threshold. Apply the formula: TTL_optimal = relevance_window − p95_queue_latency − gateway_handshake_time. Gateway handshake adds 50–200 ms per vendor; p95 queue latency should come from your Prometheus or Datadog metrics.
- Run a synthetic load test. Deploy a canary with TTL=60 under peak concurrency to measure actual discard rates and validate gateway handshake latency. Compare against expired_before_delivery_rate.
- Automate calibration. Implement dynamic TTL adjustment based on real-time queue depth and device wake-state metrics. Reduce TTL when queue_age_p95 > TTL × 0.75.
Monitoring Metrics
Track these KPIs to maintain compliance and delivery integrity:
| Metric | Target | Alert threshold |
|---|---|---|
| expired_before_delivery_rate | < 2% for critical alerts | > 5% → investigate queue latency |
| queue_age_p95 | < 50% of configured TTL | > 75% of TTL → scale consumers |
| ttl_discard_count | Stable baseline | Sudden spike → upstream latency degradation |
| retry_attempts_post_ttl | 0 | Any nonzero value → backoff cap misconfigured |
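The queue_age_p95 thresholds could be evaluated with a check like the sketch below. The function name and the returned labels are hypothetical, and the middle "watch" band (between the 50% target and the 75% alert threshold) is an assumption added for illustration.

```javascript
// Hypothetical classifier: compare queue_age_p95 against the configured TTL.
function classifyQueueAge(queueAgeP95Seconds, ttlSeconds) {
  if (!(ttlSeconds > 0)) {
    throw new RangeError('ttlSeconds must be positive to compute a ratio');
  }
  const ratio = queueAgeP95Seconds / ttlSeconds;
  if (ratio > 0.75) return 'scale_consumers'; // above the alert threshold
  if (ratio >= 0.5) return 'watch';           // target missed, not yet alerting
  return 'ok';                                // under 50% of the configured TTL
}

console.log(classifyQueueAge(120, 300)); // ok
console.log(classifyQueueAge(240, 300)); // scale_consumers
```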
Implement structured logging to capture enqueued_at, dispatched_at, and ttl_remaining for post-mortem analysis. Integrate delivery acknowledgment tracking to auto-adjust TTL baselines per device cluster.
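One possible shape for such a log record is sketched below. Only enqueued_at, dispatched_at, and ttl_remaining come from the text; the function name, the endpoint_hash field, and the sample values are illustrative assumptions.

```javascript
// Hypothetical builder for a structured delivery log entry.
// Timestamps are epoch milliseconds; ttl_remaining is in whole seconds.
function buildDeliveryLog(endpointHash, enqueuedAtMs, dispatchedAtMs, ttlSeconds) {
  const elapsedSeconds = Math.floor((dispatchedAtMs - enqueuedAtMs) / 1000);
  return {
    endpoint_hash: endpointHash,
    enqueued_at: new Date(enqueuedAtMs).toISOString(),
    dispatched_at: new Date(dispatchedAtMs).toISOString(),
    ttl_remaining: ttlSeconds - elapsedSeconds,
  };
}

// A message that waited 42 s in the queue against a 300 s TTL.
console.log(JSON.stringify(buildDeliveryLog('a1b2c3', 1700000000000, 1700000042000, 300)));
```

Logging a hash rather than the raw subscription endpoint keeps the record useful for post-mortems without storing the endpoint itself.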
Gotchas & Edge Cases
- APNs epoch vs. duration confusion. apns-expiration: 300 means January 1, 1970 + 300 seconds, which is effectively immediate expiry. You must pass Math.floor(Date.now() / 1000) + ttlSeconds as the value, not the TTL duration itself.
- FCM collapses under TTL=0. When multiple messages with TTL=0 arrive while the device is offline, FCM delivers only the last one (collapse key behavior). Use a unique collapse key or accept that TTL=0 means best-effort single delivery.
- Redis Streams have no per-entry TTL. XADD does not support per-message expiry. You must enforce TTL at the consumer by comparing enqueuedAt against Date.now() before dispatch, or store job state in a separate SETEX key.
- Middleware TTL stripping. Some reverse proxies and API gateways normalize or drop unknown request headers. Audit your proxy config: the TTL header must reach the push service endpoint intact.
- Backoff exceeding TTL. If base_delay × 2^attempt > ttl_remaining, the retry will fire after the push service has already discarded the queued message. Cap max_delay at TTL × 0.5 and discard the retry early rather than wasting a gateway request.
Related
- Implementing Exponential Backoff for Failed Push Deliveries: integrate TTL checks into your retry scheduler to avoid firing retries against already-expired messages.
- TTL 0 vs TTL 86400: Delivery Guarantees: deep comparison of the two extremes and their impact on offline device delivery.
- Scaling Push Queues with Redis or RabbitMQ: queue saturation is the primary cause of queue_age_p95 approaching TTL thresholds.
FAQ
Should I set TTL: 0 for all time-sensitive notifications?
Only for messages where a delayed delivery is worse than no delivery — OTPs, 2FA codes, and real-time incident alerts. TTL: 0 means the push service discards the message if the device is not reachable right now. For flash sales or limited-time offers with a 5–15 minute window, a TTL of 300–900 s allows delivery to users who come online shortly after the send, without risking stale messages reaching users hours later.
How do I prevent retries firing after the TTL has expired?
Before scheduling any retry, compute the remaining delivery window in milliseconds: ttl_remaining = (originalTTLSeconds * 1000) - (Date.now() - enqueuedAt), where enqueuedAt is an epoch-millisecond timestamp. If ttl_remaining <= 0, or the calculated backoff delay (also in milliseconds) exceeds ttl_remaining, skip the retry and route the payload to a dead-letter queue or analytics sink with discard_reason: ttl_exceeded. The exponential backoff implementation guide contains a reference TypeScript implementation of this check.
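A minimal JavaScript sketch of that gate follows. It is not the reference implementation from the backoff guide; the function name and return shape are hypothetical, and all time values are assumed to be in milliseconds except originalTtlSeconds.

```javascript
// Hypothetical retry gate: decide whether a retry still fits inside the TTL window.
// enqueuedAtMs and nowMs are epoch milliseconds; backoffDelayMs is a duration in ms.
function retryDecision(originalTtlSeconds, enqueuedAtMs, backoffDelayMs, nowMs = Date.now()) {
  const ttlRemainingMs = originalTtlSeconds * 1000 - (nowMs - enqueuedAtMs);
  if (ttlRemainingMs <= 0 || backoffDelayMs > ttlRemainingMs) {
    // Caller routes the payload to a dead-letter queue or analytics sink.
    return { retry: false, discard_reason: 'ttl_exceeded' };
  }
  return { retry: true, ttlRemainingMs };
}

// 100 s into a 300 s TTL, a 5 s backoff still fits.
console.log(retryDecision(300, 1000, 5000, 101000)); // { retry: true, ttlRemainingMs: 200000 }
```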
Does the TTL header affect FCM and APNs the same way?
No. FCM honors the TTL header (in seconds) per RFC 8030 for Web Push messages. APNs Web Push uses apns-expiration, which is a Unix timestamp — you must pass currentEpochSeconds + desiredTTL, not the duration. Passing the duration directly as apns-expiration results in near-immediate expiry (epoch + a few hundred seconds ≈ January 1970). Always validate outbound headers against each vendor’s specification before deploying TTL changes.