Segmenting Push Subscribers by Behaviour

Behavioural segmentation groups subscribers by what they actually do — recent activity, how often they engage, and which notifications they click — rather than by static profile fields.

Quick answer

Score every subscriber on three behavioural axes: recency (how recently they engaged), frequency (how often they engage), and engagement (how often they click your pushes). Then bucket the scores into named cohorts such as champions, at-risk, and dormant. Drive the scores from an event stream, recompute them on a schedule, and target each cohort with copy and cadence tuned to its behaviour. This RFM-style model (recency, frequency, monetary value, adapted here as recency, frequency, engagement) typically outperforms one-size-fits-all broadcasts on both click-through rate and subscriber retention, meaning fewer opt-outs.

Why behaviour beats attributes alone

Static attributes — locale, plan, country — tell you who someone is, but not whether they still care. A premium subscriber who has not opened a session in 40 days needs a different message than one who logs in daily. Behavioural signals capture that difference. This guide is the behavioural deep-dive for the broader push personalization and segmentation reference; pair the cohorts you build here with the templating and frequency-capping mechanics described there.

The raw material for behavioural scoring is an accurate event funnel. If you are not yet distinguishing delivered from displayed from clicked, instrument that first via delivery analytics instrumentation — behavioural segments are only as good as the events feeding them.

RFM is a decades-old retail framework — recency, frequency, and monetary value — and it transfers cleanly to notifications if you swap the monetary axis for engagement, because clicks and sessions are the currency of a push channel. The insight that makes RFM durable is that it does not try to predict the future from a single signal. A subscriber who bought once last year (high monetary, terrible recency) and one who browses daily but has never converted (great recency, no monetary) are different problems, and a single score would average away exactly the distinction you need to act on. Keeping the three axes separate lets you address each problem with the right message: re-activation for the lapsed buyer, conversion nudges for the engaged browser.

Step 1 — Define the three scores

Decide what each axis measures and what time window it spans. A common starting point:

Recency — days since last_seen_at or last_clicked_at. Lower is better.
Frequency — distinct active days or sessions in the last 30 days.
Engagement — push clicks divided by pushes received in the last 30 days.
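
The three definitions above can be sketched as a small function. This is a minimal illustration, not a prescribed implementation: the field names (lastClickedAt, activeDays, clicks, pushesReceived) are hypothetical stand-ins for whatever your event pipeline aggregates over the scoring window.

```javascript
// Sketch only: all field names on `subscriber` are assumptions for illustration.
const DAY_MS = 24 * 60 * 60 * 1000;

function rawScores(subscriber, now = new Date()) {
  const { lastClickedAt, activeDays, clicks, pushesReceived } = subscriber;

  // Recency: whole days since the last click; null when there is no click yet.
  const recencyDays = lastClickedAt
    ? Math.floor((now - new Date(lastClickedAt)) / DAY_MS)
    : null;

  // Frequency: distinct active days inside the scoring window.
  const frequency = activeDays;

  // Engagement: clicks divided by pushes received; null (not 0 or NaN)
  // when nothing was sent, so the caller can decide how to rank it.
  const engagement = pushesReceived > 0 ? clicks / pushesReceived : null;

  return { recencyDays, frequency, engagement };
}
```

Returning null rather than zero for missing values keeps "never clicked" and "never sent anything" distinguishable from a genuinely poor score until the bucketing step decides how to rank them.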

Pick the window deliberately. Thirty days is a sensible default for most products: long enough to smooth out a quiet week, short enough that the score still reflects current behaviour. A high-frequency product like a news app might use seven or fourteen days so a lapse registers quickly; a low-frequency product like a travel site might use ninety so seasonal users are not misclassified as dormant. Whatever window you choose, the frequency and engagement axes must use the same one, or the buckets describe different time periods and the cohort labels stop meaning anything coherent.

Step 2 — Compute scores in SQL

Use NTILE to bucket each axis into quintiles so the thresholds adapt to your actual distribution instead of hardcoded guesses. Quintiles are self-calibrating: a fixed threshold like “5+ sessions is frequent” ages badly as your base grows or your product’s natural cadence shifts, whereas NTILE(5) always splits the population into fifths, so “top quintile” means the same thing this quarter as last. The cost is that the buckets are relative: a quintile-5 subscriber in a low-engagement base may be less active than a quintile-3 subscriber in a high-engagement base. Use quintile labels for internal targeting, never for reporting absolute engagement to stakeholders.

-- RFM-style behavioural scoring (PostgreSQL)
WITH scored AS (
  SELECT
    id,
    -- recency: 5 = most recent
    NTILE(5) OVER (ORDER BY last_clicked_at ASC NULLS FIRST)        AS r,
    -- frequency: 5 = most sessions
    NTILE(5) OVER (ORDER BY session_count_30d ASC)                  AS f,
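    -- engagement: 5 = highest click ratio
    -- NULLS FIRST: a NULL ratio (no pushes received) sorts last by default in
    -- ascending order, which would wrongly place it in the top bucket
    NTILE(5) OVER (
      ORDER BY (click_count_30d::float / NULLIF(pushes_received_30d, 0)) ASC NULLS FIRST
    )                                                               AS e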
  FROM push_subscribers
  WHERE status = 'active'
)
SELECT id, r, f, e,
  CASE
    WHEN r >= 4 AND f >= 4 AND e >= 4 THEN 'champion'
    WHEN r >= 4 AND e >= 3            THEN 'engaged'
    WHEN r <= 2 AND f >= 3            THEN 'at_risk'
    WHEN r <= 2                       THEN 'dormant'
    ELSE 'casual'
  END AS cohort
FROM scored;
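
To build intuition for what the query does, here is a rough JavaScript mirror of the two steps: quintile bucketing and the cohort rules. It is a sketch for reasoning and unit-testing the rules, not a replacement for running the scoring in the database. The function names ntile and cohortFor are made up for this example.

```javascript
// Illustrative only: mimics NTILE(n) over an already-sorted list of ids.
// When the count does not divide evenly, the earlier buckets take the extra rows.
function ntile(sortedIds, n) {
  const total = sortedIds.length;
  const base = Math.floor(total / n);
  const extra = total % n;
  const buckets = new Map();
  let index = 0;
  for (let bucket = 1; bucket <= n; bucket++) {
    const size = base + (bucket <= extra ? 1 : 0);
    for (let i = 0; i < size; i++) buckets.set(sortedIds[index++], bucket);
  }
  return buckets;
}

// Same precedence as the CASE expression: the first matching rule wins.
function cohortFor({ r, f, e }) {
  if (r >= 4 && f >= 4 && e >= 4) return 'champion';
  if (r >= 4 && e >= 3) return 'engaged';
  if (r <= 2 && f >= 3) return 'at_risk';
  if (r <= 2) return 'dormant';
  return 'casual';
}
```

Writing the rules out this way makes one property of the CASE expression easy to see: a subscriber with r = 3 always falls through to casual, whatever their frequency and engagement scores are. Whether that is what you want is a product decision worth making explicitly.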

Step 3 — Persist cohorts and target them

Write the computed cohort back to the subscriber row (or a side table) so campaign queries are a single indexed lookup. Persisting the label rather than recomputing it on every campaign query is the difference between a segment send that resolves in milliseconds and one that re-scans your whole base each time a marketer previews an audience. Store the label alongside a cohort_computed_at timestamp so you always know how stale a classification is and can decide whether a given campaign needs a fresh recompute first. Then build event-driven cohorts on top — subscribers who did a specific thing — by querying the event stream directly.

// Node.js: event-driven cohort — "viewed pricing, never upgraded, last 14 days"
async function pricingPageNonConverters(db) {
  const { rows } = await db.query(`
    SELECT s.id, s.endpoint, s.locale
    FROM push_subscribers s
    JOIN events e ON e.user_id = s.user_id
    WHERE e.name = 'view_pricing'
      AND e.occurred_at > NOW() - INTERVAL '14 days'
      AND s.plan_tier = 'free'
      AND s.status = 'active'
      AND NOT EXISTS (
        SELECT 1 FROM events u
        WHERE u.user_id = s.user_id AND u.name = 'upgrade'
      )
    GROUP BY s.id, s.endpoint, s.locale
  `);
  return rows;
}

Combine the standing RFM cohort with event-driven triggers: an at_risk subscriber who viewed pricing is a far stronger re-engagement target than either signal alone. Route these into re-engagement campaign strategies, and remember every send still goes through encryption — payloads must stay under the 4 KB aes128gcm limit and use process.env.VAPID_PRIVATE_KEY server-side.
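
One way to combine the two signals and guard the payload size before sending is sketched below. The names atRiskPricingViewers and payloadFits are hypothetical. The byte budget is an assumption as well: it is set conservatively below 4096 to leave headroom for encryption framing, and you should confirm the exact figure against your push library.

```javascript
// Assumption: a conservative plaintext budget below the 4096-byte limit,
// leaving room for aes128gcm framing. Tune this for your own stack.
const MAX_PLAINTEXT_BYTES = 3900;

// cohortById: Map of subscriber id -> stored RFM cohort label.
// pricingViewers: rows returned by the event-driven query.
function atRiskPricingViewers(cohortById, pricingViewers) {
  return pricingViewers.filter((s) => cohortById.get(s.id) === 'at_risk');
}

// Measures the serialized payload in bytes, not characters, because
// multi-byte characters count more than once.
function payloadFits(payload) {
  const bytes = Buffer.byteLength(JSON.stringify(payload), 'utf8');
  return bytes <= MAX_PLAINTEXT_BYTES;
}
```

Checking byte length rather than string length matters for localized copy: a message that looks short in characters can still exceed the budget once encoded as UTF-8.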

Notice the difference in cost and freshness between the two query styles. The RFM query touches every active subscriber and is therefore something you run on a schedule and cache; the event-driven query touches only the subscribers who did one specific thing recently and is cheap enough to run at send time. This is why the standing cohort is stored and the triggered cohort is computed on demand — running RFM at send time would not scale, and storing every possible event-driven cohort would be a combinatorial mess.
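
The stored half of that split could look roughly like this: a single bulk UPDATE that writes every label from one scoring run. This sketch assumes a node-postgres style client that accepts a { text, values } query object, a bigint id column, and cohort and cohort_computed_at columns on push_subscribers. Adjust the casts and names to your schema.

```javascript
// Builds one parameterized bulk update from the scoring output.
// rows: [{ id, cohort }, ...] as returned by the scoring query.
// Column names and the bigint cast are assumptions about the schema.
function buildCohortUpdate(rows, computedAt = new Date()) {
  return {
    text: `
      UPDATE push_subscribers AS s
      SET cohort = v.cohort,
          cohort_computed_at = $3::timestamptz
      FROM unnest($1::bigint[], $2::text[]) AS v(id, cohort)
      WHERE s.id = v.id
    `,
    values: [rows.map((r) => r.id), rows.map((r) => r.cohort), computedAt],
  };
}
```

Inside an async function this would be used as `await db.query(buildCohortUpdate(rows))`. Because the statement overwrites labels from the same input rather than incrementing anything, running it twice with the same scoring output leaves the same labels in place.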

Step 4 — Recompute on a schedule

Behavioural scores decay. Run the scoring query on a cron (hourly for high-velocity products, daily for most) and let event-driven cohorts resolve at send time since they are cheap, targeted, and time-sensitive.

How you wire the schedule depends on your stack, but two properties matter regardless. The job should be idempotent — running it twice in a row produces the same labels — so a retried or overlapping run never corrupts state. And it should be observable: log how many subscribers landed in each cohort on each run, because a sudden swing (every subscriber suddenly “dormant”) almost always means an upstream pipeline broke and stopped writing last_seen_at, not that your audience genuinely went quiet overnight. Treat the distribution of cohort sizes as a health metric in its own right.
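
The observability half can be as simple as counting labels per run and comparing shares with the previous run. The sketch below is one possible shape. The function names and the 20-percentage-point default threshold are arbitrary choices for illustration, not recommended values.

```javascript
// Counts how many subscribers landed in each cohort on this run.
function cohortCounts(rows) {
  const counts = {};
  for (const { cohort } of rows) counts[cohort] = (counts[cohort] || 0) + 1;
  return counts;
}

// Returns the cohorts whose share of the base moved by more than maxShift
// between two runs (0.2 means 20 percentage points; an assumed default).
function cohortSwings(previous, current, maxShift = 0.2) {
  const total = (c) => Object.values(c).reduce((a, b) => a + b, 0);
  const prevTotal = total(previous) || 1;
  const currTotal = total(current) || 1;
  const names = new Set([...Object.keys(previous), ...Object.keys(current)]);
  return [...names].filter((name) => {
    const before = (previous[name] || 0) / prevTotal;
    const after = (current[name] || 0) / currTotal;
    return Math.abs(after - before) > maxShift;
  });
}
```

Comparing shares rather than raw counts keeps the check stable as the subscriber base grows. A non-empty result is a prompt to inspect the upstream pipeline before any campaign trusts the new labels.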

Finally, connect the cohorts to action. A behavioural segment that no campaign ever targets is just a column taking up space. The point of classifying subscribers is to send each group the message its behaviour calls for: champions get early access and advocacy asks, engaged users get conversion nudges, at-risk users get a reason to come back before they lapse, and dormant users enter a deliberate win-back sequence. The mechanics of those messages — templating, frequency caps, timezone gating — all live in the parent guide, so behavioural scoring and personalization are designed to be used together rather than in isolation.

Gotchas & edge cases

New subscribers skew quintiles. Someone subscribed yesterday has no history and lands in the lowest frequency bucket. Exclude subscribers younger than your scoring window or give them a neutral “onboarding” cohort.
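Zero-division on engagement. Subscribers who received no pushes yet have an undefined click ratio. Guard with NULLIF(pushes_received_30d, 0) and treat null as the lowest engagement. PostgreSQL sorts nulls last in ascending order by default, so the ORDER BY needs an explicit NULLS FIRST or those subscribers land in the top bucket.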
Recency and delivery are not the same. A long recency gap can mean a dead endpoint, not disinterest. Cross-check against 410 Gone bounces before classifying someone dormant — see handling 410 Gone responses at scale.
Cohorts drift between recompute runs. A subscriber can act between your nightly job and the send. For high-stakes campaigns, re-validate the critical predicate at send time rather than trusting a stale label.
Over-segmenting fragments reach. Slicing into dozens of micro-cohorts leaves each too small to test or learn from. Start with five or six cohorts and split only when one is large enough to justify it.
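
Two of these guards, new subscribers and dead endpoints, can be applied as a thin override on top of the stored label. The sketch below assumes hypothetical fields subscribedAt and lastGoneAt (the time of the most recent 410 response, if any). The 'unreachable' label is invented for this example and is not one of the cohorts defined earlier.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the label to act on, overriding the stored cohort where a gotcha applies.
// subscribedAt, lastGoneAt, and the 'unreachable' label are assumptions.
function guardCohort(subscriber, { now = new Date(), windowDays = 30 } = {}) {
  const ageDays = (now - new Date(subscriber.subscribedAt)) / MS_PER_DAY;

  // Too new to have a full window of history: keep them out of the quintiles.
  if (ageDays < windowDays) return 'onboarding';

  // A 410 Gone means the endpoint is dead, which is not the same as disinterest.
  if (subscriber.lastGoneAt) return 'unreachable';

  return subscriber.cohort;
}
```

Keeping the override separate from the scoring query means the quintile logic stays simple, and the exceptions are visible in one place.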


FAQ

How often should I recompute behavioural scores?

Match the cadence to your product’s velocity. Daily recomputation is enough for most apps; high-velocity products (news, marketplaces) benefit from hourly. Event-driven cohorts that depend on a recent specific action should resolve at send time rather than from a stored label.

What is the difference between an RFM cohort and an event-driven cohort?

An RFM-style cohort is a standing classification based on aggregated recency, frequency, and engagement scores, recomputed on a schedule. An event-driven cohort is resolved on demand by querying the event stream for a specific action (“viewed pricing in the last 14 days”). The strongest targeting combines both.