Engineering Debt
17 min read

From Cron Jobs to Event Streams: The Architecture Shift That Kills Billing Tech Debt

Batch billing is an autopsy. Event streaming is a live monitor. This deep-dive covers the three-phase migration from cron-job billing to event-driven settlement architecture.

If your billing pipeline starts with a cron job querying a PostgreSQL table at midnight on the 1st of the month, you have already lost.

Not in the philosophical sense. In the engineering sense. You have lost the ability to bill in real-time. You cannot detect an overage mid-cycle because nothing in your system evaluates usage until that cron job fires. You cannot support prepaid wallets because there is no mechanism to decrement a balance at the moment consumption occurs. You cannot offer committed-spend plans with real-time burn-down visibility because the "burn" is invisible until the batch runs. You cannot hold escrow against a wallet for a metered session in progress because the concept of "in progress" does not exist in a system that only sees the world in monthly snapshots.

And the worst part: you know all of this. Your team has known it for two years. But the migration keeps getting deprioritized because it is "just billing infrastructure" and the product roadmap always wins the sprint.

That deprioritization is the most expensive architectural decision your team is making. Not because the cron job will fail catastrophically — it probably won't. But because every feature that depends on real-time usage awareness — overage alerts, wallet debits, mid-cycle upgrades, quota enforcement, prepaid escrow, usage-based trial conversion — is architecturally impossible until you make this shift. The cron job is not technical debt. It is a feature ceiling.


The Contrarian Insight Nobody Wants to Hear at the Architecture Review

Batch processing is an autopsy. Event streaming is a live monitor.

An autopsy is useful. It tells you what happened. But it tells you after the fact, when the only available action is forensic — dispute resolution, retroactive credit, apologetic customer communication. A live monitor tells you what is happening now, when intervention is still possible — throttle the customer approaching their quota, debit the wallet before the balance goes negative, flag the anomalous usage spike before it becomes a $40,000 invoice surprise that your customer disputes and your finance team writes off.

The gap between these two modes is not a performance optimization. It is a fundamentally different relationship between your billing system and your business logic. Batch billing is a reporting function. Event-driven billing is a control plane.

Every engineering team that has attempted to bolt real-time capabilities onto a batch-oriented billing system — adding a "usage dashboard" that queries the same table the cron job reads, or building a "wallet" that gets reconciled during the monthly run — has discovered the same thing: you cannot get real-time semantics out of a batch architecture without rebuilding the architecture. The dashboard is always stale. The wallet balance is always approximate. The overage alert always fires too late.

Deferring the shift from batch-oriented to event-driven billing is the single most expensive architectural delay a SaaS engineering team can accept. Not because the migration is hard (it is, but it is bounded). Because every month you defer it, you accumulate another month of features built on the wrong foundation — features that will need to be rewritten or abandoned when the migration finally happens.


Architecture Reality Check

If a customer disputes an invoice, can your engineering team trace the exact raw usage event — with its original timestamp, metadata, and dimensions — through the rating logic to the final line item on the invoice? Or does it vanish into a black-box SQL aggregation query that produces a number nobody can decompose?

If tracing a single line item back to its source events requires an engineer with production database access running ad-hoc queries across three tables, your billing system is not auditable. It is a liability.


The Three Phases of Migration (And Where Every Team Stalls)

The shift from batch to event-driven billing is well-understood in theory. In practice, it is a three-phase migration, and most teams stall permanently at Phase 1 — not because the remaining phases are technically harder, but because Phase 1 delivers just enough improvement to relieve the political pressure that funded the migration in the first place.

Phase 1: Instrument the Source — API Gateways Emit to Kafka

The first phase is deceptively simple in concept: instead of accumulating usage in a database table and reading it at month-end, you instrument your API gateways to emit a usage event to Kafka (or Kinesis, or Pulsar — the specific broker matters less than the commitment to an append-only event log) at the moment the API call occurs.

Each event carries the raw dimensions: tenant ID, product ID, metric name, quantity, timestamp, and whatever business-specific metadata your rating logic needs (tool name, agent ID, geographic region, response tier). The event is immutable. It is appended to a topic. It is never updated in place.
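A minimal sketch of what such an event might look like at emission time. The field names here are illustrative assumptions, not a prescribed schema — the essential properties are a unique event ID (so at-least-once delivery can be deduplicated downstream), the raw dimensions, and the timestamp captured at the moment of the call:

```python
import json
import time
import uuid

def build_usage_event(tenant_id, product_id, metric, quantity, metadata=None):
    """Build an immutable, append-only metering event.

    Illustrative field names: the key properties are a unique event ID
    (dedupe key for at-least-once delivery), the raw dimensions, and the
    emission timestamp. The event is never updated in place.
    """
    return {
        "event_id": str(uuid.uuid4()),
        "tenant_id": tenant_id,
        "product_id": product_id,
        "metric": metric,
        "quantity": quantity,
        "timestamp": time.time(),    # epoch seconds at the moment of the call
        "metadata": metadata or {},  # tool name, agent ID, region, tier...
    }

# The serialized payload is what would be appended to the broker topic,
# e.g. with a Kafka producer: producer.produce("usage-events", value=payload)
payload = json.dumps(build_usage_event(
    "acme-corp", "search-api", "api_calls", 1,
    metadata={"tool_name": "search", "region": "us-east-1"},
))
```

Keying the topic by tenant ID (not shown) is a common choice so that all of one tenant's events land on the same partition and are consumed in order.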

Where teams get this wrong: They treat the Kafka topic as a staging area for the same batch process. Events accumulate in Kafka, and then a consumer runs on a schedule — say, hourly — to aggregate them into the same PostgreSQL table the cron job used to read. This is not event-driven billing. This is batch billing with a message queue in front of it. The latency improved from 30 days to 1 hour, but the architecture is unchanged. You still cannot hold a wallet escrow in real-time. You still cannot enforce a quota mid-request. The event stream exists, but nothing is reacting to it.

The correct implementation is a continuously running consumer that processes each event (or micro-batch of events, for throughput) as it arrives, with sub-second latency between emission and processing. The event is the trigger, not the payload for a later batch.
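The shape of that consumer is worth making concrete. A hedged sketch, with an in-memory list standing in for the broker iterator: the point is that state (here, a running total) is updated per event as it arrives, not rebuilt by a later batch job:

```python
from collections import defaultdict

# Running totals keyed by (tenant, metric): updated per event, not per batch.
running_totals = defaultdict(float)

def process_event(event):
    """React to a single event as it arrives -- the event IS the trigger."""
    key = (event["tenant_id"], event["metric"])
    running_totals[key] += event["quantity"]
    return running_totals[key]

# In production this sits inside a long-lived consumer loop, e.g.:
#   for msg in consumer:              # broker consumer iterator
#       process_event(json.loads(msg.value))
# Simulated here with an in-memory stream:
stream = [
    {"tenant_id": "acme", "metric": "api_calls", "quantity": 1},
    {"tenant_id": "acme", "metric": "api_calls", "quantity": 1},
]
for e in stream:
    process_event(e)
```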

What this requires at the gateway layer: Your API gateway — whether it is Kong, Apigee, AWS API Gateway, Azure APIM, or MuleSoft — needs a plugin or policy that intercepts every request and emits a structured metering event. This is not a logging integration. Logs are unstructured, best-effort, and typically processed for observability, not billing. Metering events are structured, guaranteed-delivery, and processed for revenue. The difference in reliability requirements is the difference between "we can tolerate losing 0.1% of log entries" and "we cannot tolerate losing a single billable event."

Phase 2: Real-Time Enrichment — Resolving Hierarchies via Redis

Raw usage events from the gateway carry identifiers, not context. The event knows that API key sk_live_a8f3... made a request. It does not know that this key belongs to Agent "SmartSearch-Prod," which belongs to Team "Data Engineering," which belongs to Customer "Acme Corp," which is on the "Growth" offering with graduated pricing and a 10,000-call monthly commitment.

Phase 2 is the enrichment layer: resolving the billing hierarchy at event time so the rated event carries the full context needed for settlement.

The hierarchy resolution chain: API Key → App/Agent → Team → Customer → Subscription → Offering → Rate Plan → Rate Plan Version → Metric Config (pricing model, rate, tiers, limits, overage behavior).

This chain involves data from multiple services. The API key lives in the key management service. The agent and team live in the customer service. The subscription, offering, and rate plan live in the pricing service. Calling these services synchronously for every event is a non-starter — you would add 50-200ms of latency to every API call and create a hard dependency between your metering hot path and your CRUD services.

The solution is a denormalized cache layer. Redis (or any low-latency key-value store) holds a pre-computed mapping from API key to the full billing context. The cache is populated by a background sync job that watches for changes in the upstream services — new subscriptions, plan changes, key rotations — and updates the Redis entries. The cache entry has a TTL (typically 30-60 seconds) to bound staleness, and a background refresh job ensures cache entries are warm before they are needed.

When a usage event arrives, the enrichment step is a single Redis lookup: key → full billing context. The enriched event now carries everything the rating engine needs: the pricing model, the rate, the tier boundaries, the overage behavior, the billing mode. It can be rated immediately.
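A sketch of that single-lookup enrichment step, with a plain dict standing in for the Redis GET and an invented cache-entry shape (the real denormalized structure depends on your data model). The cached value is the pre-computed billing context written by the background sync job:

```python
import json

# A dict stands in for Redis here; in production this is a single GET.
# The value is the denormalized billing context, written by the sync job
# whenever an upstream entity (subscription, plan, key) changes.
cache = {
    "apikey:sk_live_a8f3": json.dumps({
        "customer": "Acme Corp",
        "team": "Data Engineering",
        "agent": "SmartSearch-Prod",
        "rate_plan_version": "growth-v4",
        "pricing_model": "graduated",
        "rate": 0.003,
        "included_quota": 10_000,
    })
}

def enrich(event):
    """One cache lookup: raw event -> event + full billing context."""
    raw = cache.get("apikey:" + event["api_key"])
    if raw is None:
        # Unknown key: route to a dead-letter topic rather than drop it.
        raise LookupError("no billing context for key")
    return {**event, "billing_context": json.loads(raw)}

enriched = enrich({"api_key": "sk_live_a8f3", "metric": "api_calls", "quantity": 1})
```

The hot path never calls the upstream CRUD services; a cache miss is an error-handling path (dead-letter and reconcile), not a synchronous fallback.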

Where teams stall at Phase 2: The hierarchy resolution is where the complexity of your data model becomes visible. If your subscription model is 1:1 (one customer, one plan, one set of rules), the cache is trivial. If your model supports M:N relationships — multiple rate plans per offering, multiple metrics per rate plan, per-metric pricing configurations with independent tier structures — the cache entry is a nested structure that must be kept consistent across multiple upstream change events. Most teams underestimate this complexity, build a naive cache, and then spend months debugging stale-cache bugs that cause incorrect billing. The enrichment cache is not a performance optimization. It is a core billing component that must be as correct as your invoice renderer.

Phase 3: Settlement Decoupling — Rating Is Immediate, Invoicing Is Asynchronous

This is the phase that most teams never reach, and it is the phase that unlocks everything.

The insight is simple but architecturally profound: rating an event and settling an event are two different operations that should happen at two different times.

Rating means: "This event consumed 1 unit of the api_calls metric, which is priced at $0.003/unit on the customer's current rate plan version, so the charge is $0.003." Rating should happen at event time, immediately, with the full billing context from the enrichment cache.

Settlement means: "Collect all rated charges for this customer for this billing period, apply the committed minimum (true-up if usage is below the commitment), apply discounts, compute tax, and either generate an invoice (postpaid), debit a wallet (prepaid), or split across both (hybrid)." Settlement should happen asynchronously, on a schedule or on a trigger, and it should be idempotent — running settlement twice for the same period must produce the same result.
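The separation can be sketched in a few lines, assuming per-unit pricing for simplicity. Rating is a per-event computation with the enriched context; settlement is a pure function over the rated events, which is what makes it idempotent — running it twice over the same inputs must yield the same total:

```python
def rate_event(event, context):
    """Rating: price one event immediately, using the enriched context."""
    charge = round(event["quantity"] * context["rate"], 6)
    return {**event, "charge": charge, "rate_plan": context["rate_plan_version"]}

def settle(rated_events, committed_minimum=0.0):
    """Settlement: aggregate rated charges for a period and apply policy.

    Idempotent by construction: a pure function of its inputs, so running
    it twice over the same rated events yields the same result.
    """
    subtotal = sum(e["charge"] for e in rated_events)
    return max(subtotal, committed_minimum)  # true-up to the commitment floor

ctx = {"rate": 0.003, "rate_plan_version": "growth-v4"}
rated = [rate_event({"quantity": 1}, ctx) for _ in range(1000)]

# 1,000 calls at $0.003 = $3.00 of usage against a $10 committed minimum,
# so settlement trues up to the floor -- and does so deterministically.
first = settle(rated, committed_minimum=10.0)
second = settle(rated, committed_minimum=10.0)
```

Real settlement also applies discounts, tax, and routing, but the shape holds: every policy stage consumes already-rated events and remains a deterministic function of them.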

Why this decoupling matters:

When rating is immediate, you can maintain a real-time running total of charges for any customer, for any metric, for any billing period. This running total is what powers mid-cycle dashboards ("You have used $347.28 of your $500 commitment"), overage alerts ("You are at 92% of your included quota"), and wallet balance visibility ("Your prepaid balance is $152.72 after today's usage"). None of these features require settlement to have run. They read from the rated-event stream directly.

When settlement is asynchronous, you can support different billing cadences for different customers without changing the rating logic. Monthly settlement, weekly settlement, threshold-triggered settlement ("generate an invoice when charges exceed $1,000"), or event-triggered settlement ("debit the wallet immediately for each rated event") — these are all settlement policies, not rating changes. The same rated events feed all of them.

And critically, when a customer disputes an invoice, the audit trail is complete: every line item on the invoice can be decomposed into the individual rated events that produced it, each of which can be traced back to the raw usage event with its original timestamp, dimensions, and gateway metadata. The black box is gone. The SQL aggregation query is replaced by a deterministic, reproducible pipeline.

Where teams get this catastrophically wrong: They rate and settle in the same operation. The monthly cron job reads raw events, rates them, aggregates them, applies discounts, computes tax, and generates the invoice — all in one monolithic transaction. This means you cannot have real-time visibility without running the full pipeline. You cannot support prepaid wallets without synchronously blocking the API call while the rating-and-settlement transaction completes. And you cannot audit a disputed invoice without re-running the entire pipeline with debug logging and hoping the inputs haven't changed since the original run.



The Aforo Architecture: This Problem Is Already Solved

Everything described in the three phases above — gateway instrumentation, hierarchy enrichment via Redis, decoupled rating and settlement — is the native architecture of Aforo's billing pipeline. Not a migration target. Not a roadmap item. The production architecture, running today.

Here is how the pieces map:

Ingestion. Aforo provides pre-built metering plugins for the five major API gateways: Kong (Lua plugin), Apigee (Shared Flow with KVM), AWS API Gateway (Lambda-based with CloudWatch integration), Azure APIM (outbound policy fragment), and MuleSoft (Anypoint custom policy). Each plugin emits a structured usage event to Kafka with the full dimensionality — tenant, product, metric, quantity, timestamp, and product-type-specific fields (tool name and agent ID for MCP servers, session ID for agentic APIs). For teams that prefer middleware-level instrumentation over gateway plugins, SDKs are available for Node.js, Python, Java, and Go — each with decorator/middleware patterns that wrap existing request handlers with zero application code changes.

Events are validated at ingestion time. The usage-ingestor service rejects events with timestamps more than 5 minutes in the future (clock skew tolerance) or more than 90 days in the past (stale data). Late arrivals — events that are valid but older than 24 hours — are flagged for reconciliation but not rejected. This is a conscious design choice: real-world metering data is messy, and a system that rejects late-arriving valid events will underbill customers and generate revenue leakage.
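A sketch of that accept/flag/reject policy as a pure function (the thresholds mirror the ones above; the function name and return values are illustrative, not Aforo's actual API):

```python
import time

FUTURE_SKEW_S = 5 * 60            # reject: more than 5 minutes in the future
STALE_CUTOFF_S = 90 * 24 * 3600   # reject: more than 90 days in the past
LATE_CUTOFF_S = 24 * 3600         # flag: valid but older than 24 hours

def classify_event(event_ts, now=None):
    """Return 'accept', 'late' (valid but flagged), or 'reject'."""
    now = time.time() if now is None else now
    age = now - event_ts              # negative age = timestamp in the future
    if age < -FUTURE_SKEW_S or age > STALE_CUTOFF_S:
        return "reject"
    if age > LATE_CUTOFF_S:
        return "late"                 # still billed, flagged for reconciliation
    return "accept"
```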

Enrichment. The BillingHierarchyEnricher resolves the full chain — API key to agent to team to customer to subscription to offering to rate plan version to per-metric configuration — via a Redis cache with a 10-minute TTL. An EntitlementCacheSyncJob runs every 30 seconds to keep the cache warm. The enriched event carries everything the rating engine needs: the pricing model, the per-unit rate, the tier boundaries, the included-free quota, the overage behavior, and the billing mode of the offering.

For MCP Server products (the newest product type in the platform), the enrichment also resolves the tool registry — mapping the tool_name dimension from the raw event to the tool's cost tier and dimension pricing overrides from the rate plan. This means per-tool pricing — charging $0.01 for a search tool invocation but $0.05 for a generate tool invocation — is resolved at enrichment time, not at settlement time.

Rating. Aforo's pricing engine supports six pricing models, all evaluated at event time: per-unit (billable units multiplied by rate, after subtracting the included-free allowance), flat-rate (fixed charge regardless of usage), percentage (raw units multiplied by rate divided by 100, with an optional minimum fee floor), included-quota (free up to the quota, then overage at the configured rate), graduated (each tier charged at its own rate — the staircase model), and volume-tiered (the entire volume charged at the rate of the tier where the total falls). The rated charge is written to a rated-events stream and the running total for the customer-metric-period combination is updated in real-time.

This is what enables the dashboard to show "You have consumed $347.28 of your $500 commitment this cycle" without waiting for settlement. The running total exists because rating happened at event time. It is not an estimate. It is the sum of individually rated events, each traceable to its source.

Settlement. Aforo's billing pipeline is a 10-stage, compositional pipeline: QuotaCheck, Rollover, Aggregate, Allowance, Rate, Commit, Discount, Tax, Route, Settle. Each stage is a discrete, testable unit. The Commit stage enforces minimum-spend commitments (true-up if actual charges fall below the floor) and maximum-spend caps (block or alert if charges exceed the ceiling). The Discount stage applies percentage or fixed-amount discounts, capped at the subtotal so charges can never go negative. The Route stage is the billing-mode dispatcher: postpaid charges go to invoice generation, prepaid charges go to wallet debit, hybrid charges split — wallet first until depleted, then the remainder to invoice.

Settlement is asynchronous and idempotent. It can run on any cadence — monthly, weekly, on-demand, or threshold-triggered. Running it twice for the same period produces the same result. And because the input is rated events (not raw events), the settlement pipeline does not need access to the pricing configuration or the enrichment cache. The rating decision is already embedded in the event. Settlement is aggregation, policy application, and financial routing — not computation.

Wallet holds. For prepaid and hybrid offerings, the system supports pessimistic wallet holds — escrow reservations against a wallet balance for in-progress sessions. A hold is created when a metered session begins (e.g., an MCP server session or a long-running API call), decrementing the available balance without finalizing the charge. When the session ends, the hold is either committed (converted to a real charge) or released (balance restored). Holds that exceed their TTL are automatically expired by the HoldExpiryScheduler. The wallet uses SELECT FOR UPDATE pessimistic locking to prevent double-spend race conditions.
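The hold lifecycle (create → commit | release | expire) can be sketched in miniature. A `threading.Lock` stands in here for the database's SELECT FOR UPDATE row lock — this is an in-memory illustration of the semantics, not Aforo's implementation:

```python
import threading
import time
import uuid

class Wallet:
    """Prepaid wallet with pessimistic holds (escrow reservations).

    A threading.Lock stands in for a SELECT FOR UPDATE row lock; the hold
    lifecycle (create -> commit | release | expire) is the same either way.
    """
    def __init__(self, balance):
        self.balance = balance    # settled balance
        self.holds = {}           # hold_id -> (amount, expires_at)
        self._lock = threading.Lock()

    def available(self):
        """Settled balance minus all outstanding escrow."""
        return self.balance - sum(a for a, _ in self.holds.values())

    def create_hold(self, amount, ttl_s=300):
        with self._lock:          # serialize concurrent holds: no double-spend
            if self.available() < amount:
                raise ValueError("insufficient available balance")
            hold_id = str(uuid.uuid4())
            self.holds[hold_id] = (amount, time.time() + ttl_s)
            return hold_id

    def commit_hold(self, hold_id, final_amount=None):
        with self._lock:          # session ended: convert escrow to a charge
            amount, _ = self.holds.pop(hold_id)
            self.balance -= amount if final_amount is None else final_amount

    def release_hold(self, hold_id):
        with self._lock:          # session aborted: restore the balance
            self.holds.pop(hold_id)

w = Wallet(balance=100.0)
h = w.create_hold(30.0)               # session starts: $30 escrowed
# available() is now 70.0 even though the settled balance is still 100.0
w.commit_hold(h, final_amount=12.5)   # session ends: actual charge was $12.50
```

An expiry sweep (the HoldExpiryScheduler's role) would periodically release any hold whose `expires_at` has passed, so abandoned sessions cannot strand a customer's balance.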

This is impossible in a batch architecture. A wallet hold requires the system to react to a session-start event in real-time, check the balance, reserve funds, and make the reservation visible to concurrent requests — all within the latency budget of the originating API call. If your billing system only knows about usage after the monthly cron job runs, the concept of a "hold" does not exist.


The Compounding Cost of Deferral

The argument for deferring this migration is always the same: "We'll get to it when we have bandwidth." But the cost of deferral is not linear. It compounds.

Every month you operate on batch billing, your product team builds features that assume batch semantics. The dashboard queries the aggregation table. The alerts run on a schedule. The wallet reconciliation happens at month-end. Each of these features is a dependency on the batch architecture — and each one will need to be rewritten or shimmed when you eventually migrate.

At six months of deferral, you have a handful of features to rewrite. At two years, you have an entire product surface built on the wrong foundation. The migration is no longer three phases of infrastructure work. It is three phases of infrastructure work plus a full product rebuild. And at that point, the cost-benefit analysis that originally deprioritized the migration is inverted — but the migration is now three times as expensive as it would have been if you had started when you first knew you needed it.

The alternative is to not build the infrastructure at all. Use a billing platform that provides the event-driven architecture natively — where the gateway plugins, the enrichment cache, the real-time rating engine, and the decoupled settlement pipeline are the starting point, not the destination of a multi-quarter migration.


Audit Yourself: Three Questions for Your Next Architecture Review

1. What is the maximum latency between a customer making an API call and your billing system knowing about it? If the answer is measured in hours or days — because events accumulate in a table and the cron job hasn't run yet — you are operating blind for the majority of every billing cycle. You cannot enforce quotas in real-time. You cannot show accurate usage dashboards. You cannot hold wallet escrow for in-progress sessions. Every "real-time" feature your product team has asked for is architecturally blocked by this latency, and no amount of caching or clever queries will fix it without changing the underlying event flow.

2. Can your system place a hold against a prepaid wallet balance at the moment a metered session begins — and release or commit that hold when the session ends? This is not a theoretical feature. Prepaid and hybrid billing models are increasingly demanded by enterprise customers who want budget predictability with usage-based flexibility. Wallet holds require sub-second event processing, pessimistic locking on the wallet balance, and a hold-expiry mechanism for abandoned sessions. If your architecture processes usage in batches, wallet holds are not a feature you can add. They are a feature that requires a different architecture.

3. When a customer disputes a $12,847.33 line item on their invoice, can your team trace it — without production database access — to the individual rated events that produced it, and from each rated event back to the raw usage event with its original timestamp and gateway metadata? If the audit trail is a SQL aggregation query that sums raw events into a number, you have a black box. You can tell the customer the total, but you cannot show them why. In a batch system, the intermediate state — the individual rated events — typically does not exist as a first-class entity. It is computed on the fly during the batch run and discarded. In an event-driven system, every rated event is persisted, immutable, and traceable. The audit trail is the architecture, not a report someone has to build after the fact.


The shift from batch to event-driven billing is not a performance upgrade. It is a capability unlock. Real-time dashboards, mid-cycle quota enforcement, prepaid wallet holds, per-event audit trails, multi-cadence settlement — none of these are possible in a batch architecture, and all of them are table stakes for a modern usage-based billing platform.

The question is not whether your team will make this shift. It is whether you will make it proactively — as a deliberate architectural investment — or reactively, after two years of feature requests that all end with the same answer: "We can't do that with the current billing system."

One of those paths takes a quarter. The other takes a year. Choose now.

Jay Bodicherla
Founder & CEO, Aforo

Product leader building Aforo, the production-grade enterprise monetization platform for SaaS teams scaling usage-based billing.

Ready to ship outcome-based pricing?

Deploy an Intercom-style billing model in 5 minutes.
No custom middleware required.

Try the sandbox free, or talk to our solutions team for a 1:1 enterprise architecture review. No credit card required.