Breaking-News Push Alert Architecture
Breaking-news alerts invert every other push playbook. The message is worthless minutes after the event, so it needs maximum urgency, a very low TTL, and a collapse key — and it must fan out to millions of subscribers near-simultaneously. This guide covers the headers, the keys, and the throughput architecture that make that possible.
Quick answer
For breaking news, send with urgency: high, a low TTL (60–600 s) so stale alerts are dropped rather than delivered late, and a topic/collapse key so a correction replaces an undelivered original instead of stacking. The dominant engineering problem is fan-out throughput, not the payload: a single story can require millions of sends in seconds, so the architecture is a fast queue feeding batched, rate-limited workers. Keep the payload under the 4 KB limit with aes128gcm encoding: carry a headline and a deep link, and fetch the article on click.
Why this playbook is different
The three axes flip. Where cart and SaaS messages tolerate latency, breaking news cannot — a TTL of minutes is a feature, not a limitation, because a late alert is misinformation. Where the others fan out per user or per segment, news fans out to your entire eligible audience at once. The hard constraint moves from payload size to raw throughput. This is one of three playbooks in the use-case playbooks reference, sitting at the extreme high-urgency, low-TTL, high-fan-out corner.
Headers and keys
Three send-time settings define a breaking-news alert:
- urgency: high tells the push service to deliver immediately and wake the device if needed.
- Low TTL (60–600 s) so an alert that can’t be delivered promptly is discarded rather than arriving stale.
- topic (collapse key) shared across an event so a follow-up or correction replaces an undelivered original; an offline device that reconnects gets only the latest, not a pile.
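One detail worth sketching is the topic value itself. RFC 8030 restricts the Topic header to at most 32 characters from the URL-safe Base64 alphabet, so a long or free-form story identifier may need normalizing before it is used as a collapse key. The helper below is a hypothetical illustration (topicForStory is not part of web-push), and it assumes a Node.js version that supports the base64url digest encoding.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: derive a stable, spec-safe Topic value from any story ID.
// Hashing keeps the result deterministic, so a correction for the same story
// produces the same key and can replace the pending original.
function topicForStory(storyId) {
  return crypto
    .createHash('sha256')
    .update(String(storyId))
    .digest('base64url') // URL-safe alphabet, no padding (43 chars for SHA-256)
    .slice(0, 32);       // Topic allows at most 32 characters
}
```

Truncating the hash shortens the key space, which should be acceptable for the small number of stories live at once, but that is a judgment call rather than a guarantee.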
Implementation
The fan-out enqueues one job; workers pull batches and send with the right headers. Read VAPID details from the environment, and never hardcode the private key.
const webpush = require('web-push');

webpush.setVapidDetails(
  'mailto:alerts@yourdomain.com',
  process.env.VAPID_PUBLIC_KEY,
  process.env.VAPID_PRIVATE_KEY
);

async function sendAlert(sub, story) {
  const payload = JSON.stringify({
    title: story.headline, // keep it short
    body: story.summary,
    data: { url: `/news/${story.slug}?src=push` } // article fetched on click
  });
  const subscription = {
    endpoint: sub.endpoint,
    keys: { p256dh: sub.p256dh_key, auth: sub.auth_secret }
  };
  return webpush.sendNotification(subscription, payload, {
    TTL: 300, // 5 minutes: drop if not delivered promptly
    urgency: 'high',
    // collapse: a correction supersedes this alert while it is still undelivered
    // (Topic must be at most 32 URL-safe Base64 characters, so keep story.id short)
    topic: `story-${story.id}`
  });
}
A worker pulls a batch off the queue and sends concurrently within a rate budget:
async function processBatch(queue, story, concurrency = 200) {
  const batch = await queue.pull(concurrency); // pull from the fan-out queue
  await Promise.allSettled(batch.map(async (sub) => {
    try {
      await sendAlert(sub, story);
    } catch (err) {
      if (err.statusCode === 410 || err.statusCode === 404) {
        await queue.prune(sub); // dead endpoint
      } else if (err.statusCode === 429 || err.statusCode >= 500) {
        await queue.requeueWithBackoff(sub); // transient: retry later
      } else {
        // unexpected failure (bad request, network error): log it instead of swallowing it
        console.error('push send failed', sub.endpoint, err);
      }
    }
  }));
}
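The worker above leaves queue.requeueWithBackoff undefined. A minimal sketch of the delay calculation it might use follows, assuming exponential backoff with full jitter and a Retry-After value given in seconds (the HTTP-date form is not handled here). Both function names and all defaults are hypothetical. Because the alert itself expires, the second helper skips retries that would land after the TTL window.

```javascript
// Hypothetical helper: how long to wait before retrying a 429/5xx send.
// `attempt` is 0 for the first retry; `retryAfter` is the raw header value, if any.
function retryDelayMs(attempt, retryAfter, opts = {}) {
  const { baseMs = 1000, capMs = 60000, random = Math.random } = opts;
  const seconds = Number(retryAfter);
  const hasHeader = retryAfter != null && retryAfter !== '';
  if (hasHeader && Number.isFinite(seconds) && seconds >= 0) {
    return seconds * 1000; // honor the push service's explicit instruction
  }
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling); // full jitter spreads retries out
}

// Hypothetical helper: a retry is only worthwhile if it fires before the alert's TTL ends.
function shouldRetry(delayMs, sentAtMs, ttlSeconds, nowMs = Date.now()) {
  return nowMs + delayMs < sentAtMs + ttlSeconds * 1000;
}
```

Jitter matters more than usual here: every worker hits the rate limit at roughly the same moment, so unjittered retries would arrive as a second synchronized burst.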
Steps to architect breaking-news alerts
- Decouple trigger from delivery. An editorial action enqueues one fan-out job; it never sends inline.
- Fan out through a fast queue sized for burst load. Choose the backing store and partitioning per scaling push queues with Redis or RabbitMQ.
- Batch and parallelize sends across worker pools, tuning batch size and concurrency against push-service rate limits — see message batching and throughput optimization.
- Set headers per send: urgency: high, low TTL, shared topic key.
- Handle failures inline: prune 410 Gone, requeue 429/5xx with backoff so a rate-limit spike doesn’t drop the audience.
- Use the collapse key for corrections: reuse the same topic so an update replaces the undelivered original.
- Monitor time-to-deliver and CTR as the success metrics; for news, speed is the product.
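The first three steps can be sketched as follows: the editorial trigger enqueues exactly one job, and a worker later expands it into per-batch send jobs. This is an illustration under assumptions, not a prescribed design. The queue object with an async enqueue method, the job shapes, and the offset/limit paging over the subscriber store are all hypothetical.

```javascript
// Split a fan-out over `totalSubscribers` into batch descriptors.
function* planBatches(totalSubscribers, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError('batchSize must be a positive integer');
  }
  for (let offset = 0; offset < totalSubscribers; offset += batchSize) {
    yield { offset, limit: Math.min(batchSize, totalSubscribers - offset) };
  }
}

// Editorial trigger: enqueue one fan-out job and return; nothing is sent inline.
async function onPublish(queue, story) {
  await queue.enqueue({ type: 'fanout', storyId: story.id });
}

// Worker side: expand the single fan-out job into per-batch send jobs.
async function expandFanOut(queue, job, totalSubscribers, batchSize = 1000) {
  let count = 0;
  for (const range of planBatches(totalSubscribers, batchSize)) {
    await queue.enqueue({ type: 'send-batch', storyId: job.storyId, ...range });
    count += 1;
  }
  return count; // number of batch jobs created
}
```

Keeping the trigger down to a single enqueue means the editorial action returns in roughly constant time regardless of audience size; the expansion cost moves to the worker pool.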
Gotchas and edge cases
- High TTL on a time-critical alert. A long TTL means a stale headline lands hours later as misinformation. Keep TTL to minutes so undelivered alerts expire.
- No collapse key. Without a shared topic, a correction stacks on top of the wrong original. Reuse the story-scoped key so the latest supersedes the rest.
- Sending inline from the trigger. Blocking the editorial action on millions of sends guarantees a timeout. Always enqueue and fan out asynchronously.
- Ignoring 429 under burst. A breaking story is exactly when you’ll hit push-service rate limits. Requeue with backoff rather than dropping; see retry logic and backoff.
- Fat payloads. Embedding article text blows past the 4 KB aes128gcm limit. Send a headline and deep link; fetch the story on click.
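To guard against the fat-payload case, a send path can check the serialized size before encrypting. The sketch below assumes a plaintext budget of 3993 bytes, which is what remains of a 4096-byte message after the aes128gcm header, authentication tag, and padding delimiter; treat it as a ceiling, since individual push services may differ. buildPayload and its trimming strategy are hypothetical, and the payload fields mirror the ones used earlier in this guide.

```javascript
// Assumed budget: 4096-byte message minus aes128gcm overhead
// (86-byte header + 16-byte auth tag + 1-byte padding delimiter).
const MAX_PLAINTEXT_BYTES = 3993;

// Hypothetical helper: serialize the alert, trimming the summary until it fits.
function buildPayload(story, maxBytes = MAX_PLAINTEXT_BYTES) {
  const chars = Array.from(story.summary || ''); // code points, so emoji are not split
  const render = (body) => JSON.stringify({
    title: story.headline,
    body,
    data: { url: `/news/${story.slug}?src=push` },
  });
  let payload = render(chars.join(''));
  while (Buffer.byteLength(payload, 'utf8') > maxBytes && chars.length > 0) {
    chars.length = Math.floor(chars.length * 0.9); // drop the last 10% of the summary
    payload = render(chars.join('') + '…');
  }
  if (Buffer.byteLength(payload, 'utf8') > maxBytes) {
    throw new RangeError('headline and link alone exceed the payload budget');
  }
  return payload;
}
```

Measuring with Buffer.byteLength rather than string length matters for non-ASCII headlines, where one character can occupy several bytes.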
Related
- Back to Use-Case Playbooks — the three playbooks and how their urgency/TTL/frequency profiles differ.
- Message Batching & Throughput Optimization — tuning batch size and concurrency for massive fan-out.
- Scaling Push Queues with Redis or RabbitMQ — the queue architecture behind burst-load delivery.
FAQ
What TTL should breaking-news alerts use?
A low TTL of roughly 60–600 seconds. The value of a news alert decays in minutes, so a long TTL just delivers a stale headline late, which reads as misinformation. A short TTL tells the push service to discard any alert it can’t deliver promptly, which is the correct behavior for time-critical news.
How do collapse keys help with breaking news?
A shared topic/collapse key lets a follow-up or correction replace an undelivered original carrying the same key. A device that was offline and reconnects then receives only the latest version of the story rather than a stack of superseded alerts, which keeps corrections accurate and avoids notification spam.
What's the main bottleneck in breaking-news delivery?
Fan-out throughput. A single story can require millions of sends within seconds, so the constraint is queueing and send concurrency against push-service rate limits, not payload size. The architecture decouples the trigger from delivery via a fast queue feeding batched, rate-limited worker pools.