The $99 per user per month pricing model is obsolete. Not metaphorically. Not in five years. Right now.
Your most sophisticated customer just deployed an AI agent that processes 500 tokens per execution across 10,000 monthly invocations. That's 5 million tokens ingested by a single "user" in a single month—potentially $150 in compute cost on your infrastructure, yet your billing system sends them the same invoice as the customer with 2,000 monthly API calls. The math is broken. The architecture is broken. The pricing model itself is broken.
When your product's value isn't measured in "seats occupied" but in "signals processed"—tokens ingested, tools invoked, agent sessions completed, queries executed—the fundamental assumption of subscription billing collapses. A seat assumes passive consumption. An AI agent's tool invocation is active, variable, and explosively high-cardinality. One deployment choice (batch vs. streaming, synchronous vs. asynchronous) can shift a customer from $99/month to $4,000/month in compute costs. Your billing system needs to capture that signal in real-time and route the margin difference to the right settlement path before your invoice ships—or you're leaving money on the table, misaligning incentives, and watching churn accelerate as customers game your pricing structure.
The AI economy has made legacy subscription billing architecturally obsolete. This isn't a feature problem. It's not about adding "usage tiers" to a Stripe integration. It's about rethinking how usage events flow into your system, how they're enriched with billing context, how they're rated against multi-dimensional pricing rules, and how they're routed to settlement without ever touching your transactional database. The infrastructure that worked for seat-based SaaS doesn't scale for signal-based AI products. And if you're still using it, your competitors who aren't—they're already pricing smarter, shipping faster, and capturing margins you don't even know you're leaving behind.
The Signal-Based Reality
One AI agent deployment can shift a customer from $99/month to $4,000/month in compute costs. Your billing system needs to capture that signal in real-time — not at the next cron job — or you are subsidizing your most expensive customers with your smallest.
The $99 Illusion: When a "Seat" Stops Meaning Anything
The subscription model was born from a simple, elegant idea: charge per user, per month, in advance. Email providers do it. Project management tools do it. CRM vendors do it. The math is straightforward. Fixed cost per seat. Predictable revenue. Easy to forecast. Your billing system is a simple table with a user ID, a plan ID, and a monthly price. A cron job runs at 11 PM every night, counts the seats, generates an invoice, and you sleep soundly knowing the revenue is locked in.
Then AI products entered the picture, and that simplicity evaporated.
A seat-based model assumes that every user consumes roughly the same amount of your product. A user in Slack sends maybe 50 messages a day. Another user sends 500 messages a day. Both are paying the same $12.50/month. Slack's infrastructure absorbs the 10x variance without breaking a sweat because message storage and delivery are cheap. The revenue from a thousand seats covers the compute cost easily.
But here's where AI products diverge radically. An AI agent deployed with streaming prompt evaluation might invoke your tools 100,000 times per month. A second agent using the exact same APIs, deployed with batching and caching, invokes tools 5,000 times per month. The infrastructure cost difference is massive. The first agent might cost you $5,000 in compute. The second costs you $250. Yet if both customers are paying $99/month for "access to our AI platform," you're subsidizing the heavy users and underpricing the light users simultaneously. You're also destroying the economic signal that should encourage customers toward efficient usage patterns. You're misaligned. Your product team has no visibility into what the infrastructure actually costs for each customer. Your sales team is discounting based on gut feel, not economics. Your CFO has no idea what your gross margin really is.
This isn't speculation. Companies building AI products have already hit this wall. A founding team from a Y Combinator batch shipped an AI assistant SaaS with seat-based pricing. Six months in, their largest customer was processing 50x the token volume of their median customer. The customer was paying $4,995/month for the "Enterprise" tier. The actual infrastructure cost to serve them was $18,000/month. The company was losing $13,000 a month on that one customer. They had no visibility into it. No dashboard that showed "this API key is consuming 47% of our monthly compute budget." No mechanism to either (a) migrate that customer to a usage-based model that captures the true cost, or (b) optimize their implementation to bring the cost down.
The billing system didn't see signals. It saw seats. And because it saw seats, the company saw revenue, but their finance team saw a different number: negative gross margin.
This is the $99 illusion. It feels simple until your product's value isn't the seat—it's the signal. The token. The invocation. The API call. The agent session. The tool execution. Once that happens, charging per seat doesn't even measure the right thing anymore. You're charging for the container and ignoring what's inside it.
The Hotel Room vs. The Power Grid: Why This Metaphor Matters
Here's a mental model that separates seat-based pricing from signal-based pricing so clearly that your billing team will never confuse the two again.
Imagine you own a hotel. Your traditional pricing model is: $150 per room per night. A guest checks in, occupies the room for eight hours, checks out. The room is booked. Revenue is locked. Your cost structure is mostly fixed: one housekeeping staff per 20 rooms, depreciation on the building, property taxes. The occupancy drives your utilization rate, not the actual cost per night. Whether the guest turns on 2 lightbulbs or 40 lightbulbs, leaves the AC running at 60°F or 78°F, they pay the same $150. The incremental cost difference is negligible. The hotel room works under a seat model because occupancy, not consumption, is what drives your cost.
Now imagine you also own a power company. Your traditional pricing model would be: $0.12 per kilowatt-hour consumed. A household that runs the AC all day in summer might use 60 kWh. A household that doesn't could use 15 kWh. You're not charging them per "connected household." You're measuring actual consumption and billing for it. You can't charge both households $150/month for "access to the grid." One household consumes four times the electricity. The infrastructure cost is four times higher. If you charged them the same, you'd be destroying the pricing signal that should encourage conservation and would be underpricing the heavy users.
Seat-based billing is the hotel model. Signal-based billing is the power grid model.
For decades, SaaS products have tried to squeeze themselves into the hotel model because subscription billing is predictable and simple. It works great when consumption variance is low (most users consume roughly the same amount) and when the cost of consumption is low relative to the fixed cost of operating the product. Slack works because message storage is cheap. Notion works because document storage is cheap. The variance in consumption across users is real, but the cost of serving high-consumption users isn't catastrophic enough to require per-unit pricing.
AI products broke that assumption. An AI agent's token consumption isn't cheap. It's the dominant cost driver. A customer who deploys a heavily-used agent with streaming prompt evaluation might consume 50x the compute of a customer with a minimally-used agent. That's not a 2x or 3x variance like in traditional SaaS. That's 50x. If you're charging both customers per "seat"—meaning per agent, or per API key, or per "deployment"—then you're running the power company on a hotel billing model. You're charging the customer who uses 50x the compute the same rate as the customer using minimal compute. You're misaligned on price signals. You're misaligned on incentives. You're leaving money on the table on one side and handing out a subsidy on the other.
The move from seat-based to signal-based billing isn't a feature you add to your subscription tool. It's a fundamental architectural shift in how your product measures and monetizes value.
Here's why this metaphor matters: It's not about being "usage-based"—that's just a pricing label. It's about whether your billing system is actually measuring and capturing the variance that drives your cost. A $99/month product with a "usage tier" that charges extra after 10,000 API calls is still seat-based thinking. You're charging for occupancy (the $99 tier) and then a little bit for consumption on top (the overage). You're not measuring the actual driver of cost. You're layering usage meters on top of a seat model instead of replacing the seat model with a signal model.
Aforo's approach is different. Instead of "seats + overages," it's all signals. From the moment an event enters the system—a tool invocation, a token count, a session created—it's metered, rated, and routed to settlement based on the actual cost to serve it and the actual contract terms for that customer. There's no concept of a "seat" that sits idle and costs nothing. There's no "overage" tacked on top. There's just the signal, measured and monetized in real-time.
The Architecture That Broke: Batch Processing in a Real-Time World
Legacy subscription billing is built on batch processing. Here's how it works in most SaaS companies today:
Your product writes usage data to a PostgreSQL table called usage_events. Each row represents a single event: an API call, a document created, a file uploaded. At midnight (or some scheduled time), a cron job wakes up, queries the table for all events from the past 24 hours, aggregates them by user and plan, applies some simple logic (maybe a price per unit, maybe a tiered model), and writes invoice line items to a second table. An accountant might review the results. A few hours later, invoices are sent. Revenue is recognized. Done.
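To make the pattern concrete, here's a minimal sketch of that nightly job in Python. The table names, schema, and flat per-unit price are illustrative, and SQLite stands in for PostgreSQL so the snippet stays self-contained:

# Legacy nightly billing run: aggregate yesterday's usage, apply a flat
# per-unit price, write invoice line items. The 24-hour lag lives here.
# Assumes usage_events(user_id, created_at) and
# invoice_line_items(user_id, amount) tables already exist.
import sqlite3

def nightly_billing_run(conn: sqlite3.Connection, price_per_unit: float) -> None:
    rows = conn.execute(
        """SELECT user_id, COUNT(*) AS events
           FROM usage_events
           WHERE created_at >= datetime('now', '-1 day')
           GROUP BY user_id"""
    ).fetchall()
    for user_id, events in rows:
        conn.execute(
            "INSERT INTO invoice_line_items (user_id, amount) VALUES (?, ?)",
            (user_id, events * price_per_unit),
        )
    conn.commit()  # revenue "locked in" -- tomorrow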
This architecture works fine when:
- You have time to wait until tomorrow to discover how much a customer owes.
- Usage variance across customers is low enough that you don't need to react to outliers in real-time.
- You don't have regulatory or contractual requirements to enforce spending limits instantaneously.
- The number of unique dimensions in your usage (user ID, product type, location, etc.) is small and predictable.
None of those assumptions hold in AI products.
First, high-cardinality dimensions explode. An AI agent might have 10,000 unique tools. Each tool might be invoked with a different cost model. You need to know not just "this agent consumed 1M tokens" but "tool-A-consumed-750K-tokens, tool-B-consumed-200K-tokens, tool-C-consumed-50K-tokens, and we have per-tool pricing overrides in the rate plan." That's not a simple dimension. That's a multi-dimensional signal space that changes per customer, per plan, per tool, per day.
Second, batch processing creates a lag. If a customer hits a spending cap at 3 PM on a Tuesday, you don't find out until the cron job runs at midnight. Your API continues processing requests from that customer. By 11 AM Wednesday, they've spent another $2,000 over their cap. Now what? Do you refund it? Do you dispute the overcharge? Do you let them off? The window between "the event happened" and "we measured the event" is long enough for bad things to happen. In regulated industries (financial services, healthcare), this lag is unacceptable. In competitive AI markets, it's operationally sloppy.
Third, the cron job itself becomes a bottleneck. If you have 10,000 customers and each customer needs 50 different aggregations of usage data (per team, per API key, per tool, per time window), that's 500,000 aggregation queries running at midnight. Your PostgreSQL database sees a 50x spike in load for 30 minutes. Your other applications (API requests, writes from your product) get throttled. This is why many companies move usage analytics to a separate "analytics database" (usually ClickHouse or Redshift)—but that creates a new problem: now you have two databases that need to stay in sync, and the billing-critical reads from your transactional database still bottleneck around the midnight cron job.
LEGACY (24-hour lag)                AFORO (sub-second)

subscriptions table                 Usage Events
┌──────────────────┐                ┌──────────────────┐
│ id │ plan │ price│                │ tool_name        │
└────────┬─────────┘                │ agent_id         │
         │                          │ session_id       │
  midnight cron ⏰                  │ token_count      │
         │                          └────────┬─────────┘
         ▼                                   │ (real-time)
┌──────────────────┐                         ▼
│ invoices table   │                Kafka → Enricher (Redis)
└──────────────────┘                         │
                                             ▼
"We'll find out tomorrow"           10-Stage Pipeline
                                             │
                                    ┌────────┼────────┐
                                    ▼        ▼        ▼
                                 Invoice   Wallet   Hybrid
The deeper issue is architectural: batch processing assumes that measurements and actions can be decoupled. You measure everything at night, then you act on it in the morning. But in AI products, measurement and action need to happen together. The moment a customer hits a spending cap, you need to know about it. The moment an anomalous usage pattern appears (a customer burning through tokens at 10x their normal rate), you need to flag it. The moment a new customer signs up with a different pricing model than the last one, you need to apply the right rate to their events immediately. Batch processing makes all of this hard because there's a gap between "event happened" and "we know about it."
The Event-Stream Shift: From Cron to Kafka
The companies that are winning in AI products have already made the shift. They're not running cron jobs at midnight. They're flowing usage events into a streaming architecture the moment they happen.
Here's what a real-time event-stream billing architecture looks like:
Stage 1: Ingest & Durability
Usage events—tool invocations, token counts, API calls, agent sessions—hit your API and are immediately written to Kafka with transactional guarantees. At-least-once delivery. Not fire-and-forget. A producer waits for Kafka to acknowledge the write to at least three brokers before returning success to the client. This guarantees that even if your application crashes, the event is durable. No usage data is ever lost. This is non-negotiable for billing.
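As a sketch of what Stage 1 looks like in code (the topic name, config, and field names are assumptions, not Aforo's actual schema), a confluent-kafka producer configured for durability might look like this:

# Durable ingest: produce() succeeds only after the in-sync replicas
# acknowledge the write, so an application crash cannot lose the event.
import json
from confluent_kafka import Producer, KafkaException

producer = Producer({
    "bootstrap.servers": "kafka:9092",
    "acks": "all",               # wait for all in-sync replicas
    "enable.idempotence": True,  # retries cannot create duplicates
})

def _delivery(err, msg):
    # Surface broker-side failures instead of silently dropping usage data.
    if err is not None:
        raise KafkaException(err)

def ingest(event: dict) -> None:
    producer.produce(
        "usage.events.raw",                  # hypothetical topic name
        key=event["api_key"].encode(),
        value=json.dumps(event).encode(),
        on_delivery=_delivery,
    )
    producer.flush()  # block until the broker confirms durability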
Stage 2: Enrichment at Ingest Time
The raw event is bare-bones: { tool_name: "search", token_count: 512, timestamp: ... }. But to rate it, you need to know: What customer does this tool belong to? What team? What API key was used? What subscription does that customer have? What rate plan? In a legacy architecture, you'd enrich this information during the batch process at night by joining against your customer, team, and subscription tables. But that creates the lag problem: if a subscription changed at 3 PM, the batch job won't know about it until midnight.
The event-stream approach enriches at ingest time. A BillingHierarchyEnricher service listens to the raw event stream and looks up the billing context: API key → team → customer → subscription → rate plan. This lookup happens in microseconds because it's cached in Redis with a 10-minute TTL. The event is re-emitted with all the enrichment added: { tool_name: "search", token_count: 512, customer_id: "cust_xyz", subscription_id: "sub_123", rate_plan_id: "plan_premium", timestamp: ... }. Now every downstream service has the full billing context available immediately. No joins needed later.
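A minimal sketch of that enricher's cache-aside lookup (key naming, TTL handling, and the Postgres stub are illustrative assumptions):

# Cache-aside enrichment: API key -> full billing context, served from
# Redis with a 10-minute TTL so the hot path never touches PostgreSQL.
import json
import redis

cache = redis.Redis(host="redis", port=6379)
TTL_SECONDS = 600  # the 10-minute TTL described above

def load_from_postgres(api_key: str) -> dict:
    # Stand-in for the real lookup chain:
    # API key -> team -> customer -> subscription -> rate plan.
    return {"customer_id": "cust_xyz", "subscription_id": "sub_123",
            "rate_plan_id": "plan_premium"}

def billing_context(api_key: str) -> dict:
    cached = cache.get(f"billing:{api_key}")
    if cached is not None:
        return json.loads(cached)
    ctx = load_from_postgres(api_key)
    cache.setex(f"billing:{api_key}", TTL_SECONDS, json.dumps(ctx))
    return ctx

def enrich(raw_event: dict) -> dict:
    # Re-emit the raw event with the full billing context attached.
    return {**raw_event, **billing_context(raw_event["api_key"])}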
Stage 3: Real-Time Rating (The 10-Stage Pipeline)
The enriched event flows into Aforo's 10-stage billing pipeline:
- QuotaCheck: Does this customer have a spending cap? If they're already at 80% of it, flag the event as "near-limit."
- Rollover: Does this rate plan have a monthly rollover provision (e.g., 1000 free tokens per month that reset on the 1st)? Decrement the customer's available rollover.
- Aggregate: Combine related events into a single billable unit (e.g., multiple tool invocations in a single agent session become one "session event").
- Allowance: Apply any included limits or free tiers (e.g., "first 1M tokens per month are free").
- Rate: Apply the pricing model. Is it per-unit? Graduated? Volume-tiered? Percentage-based? Included quota with overage? Flat rate? Apply the right model based on the rate plan, and emit a MetricCharge with the actual cost.
- Commit: Apply minimum spend guarantees and maximum spend caps. If the customer is under their minimum spend, mark as "true-up." If they hit their cap, either block further processing or alert the customer.
- Discount: Apply any discount rules (percentage discount, fixed-amount discount). Cap at subtotal (you can't have negative charges).
- Tax: Apply tax rules if applicable (VAT, sales tax, jurisdiction-based rules). For now, most AI products set taxAmount = 0.
- Route: Route the charge to the right settlement path. Is this a postpaid customer (bill later)? Route to invoice. Prepaid customer (wallet)? Debit the wallet. Hybrid customer? Wallet first, remainder to invoice.
- Settle: Settle the transaction. For invoice path: create a line item and add to the next invoice. For wallet path: create a hold and debit immediately (or batch-settle every hour).
This entire pipeline executes in single-digit milliseconds, per event, without ever touching your transactional database. The rated charge is written to Kafka on a billing.charges topic where your invoicing system picks it up asynchronously.
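One way to picture the pipeline's shape: each stage is a function over a charge-in-progress, so the whole thing is a fold. Here's a toy reduction over three of the ten stages, with assumed field names (not Aforo's implementation):

# Toy stage chaining: each stage takes and returns a charge-in-progress.
from functools import reduce

def rate(charge: dict) -> dict:
    # Per-tool overrides and tiering would be resolved here.
    charge["amount"] = charge["units"] * charge["unit_price"]
    return charge

def discount(charge: dict) -> dict:
    # Capped at the subtotal: a charge can never go negative.
    charge["amount"] = max(charge["amount"] - charge.get("discount", 0.0), 0.0)
    return charge

def tax(charge: dict) -> dict:
    charge["tax_amount"] = 0.0  # most AI products today, per the Tax stage
    return charge

STAGES = [rate, discount, tax]  # the real pipeline chains all ten

def run(charge: dict) -> dict:
    return reduce(lambda c, stage: stage(c), STAGES, charge)

print(run({"units": 500, "unit_price": 0.0001}))  # amount: 0.05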
Stage 4: Persistence & Analytics Separation
Aforo separates analytics from billing at the database level. Billing-critical data (subscriptions, rates, line items, holds) lives in PostgreSQL with strong consistency guarantees and ACID transactions. High-cardinality analytics (per-customer, per-tool, per-window usage aggregates) flow into ClickHouse, a columnar analytics database designed for this workload.
Why the split? Because the two have fundamentally different performance characteristics. PostgreSQL is optimized for small, consistent writes and reads (billing state changes). ClickHouse is optimized for massive fan-out analytics (answers like "give me token consumption per tool per customer per day for 10,000 customers in the past 90 days" in milliseconds). If you try to run both workloads on the same database, they interfere with each other. The analytics query locks the billing table while it scans. The billing write waits. Your invoice generation slows down. Now customers are complaining that they don't see usage dashboards, and your finance team is complaining that invoices are shipping late. The separation solves both.
💡 Architecture Reality Check: If your most active user deploys an AI agent that consumes 50x the compute of your median user, does your billing system automatically capture that margin difference in real-time, or are they both sitting in your system paying the same $99/month until tomorrow's cron job runs? Can you add a new rate plan at 3 PM and have it apply to events ingested at 3:15 PM, or does the new plan only kick in tomorrow? Can you see "this customer hit 80% of their spending cap, and we've already alerted them 3 minutes after the event"? If you're still running batch processing for any of these, you're flying blind.
Four Product Types, Four Metering Surfaces
Here's where the architecture becomes powerful: it doesn't force every product into the same metering model.
┌─────────────────────────┬──────────────────────────────┐
│ STANDARD API            │ AGENTIC API                  │
│                         │                              │
│ Customer → Team → App   │ Customer → Team → App        │
│   → API Key             │   → API Key                  │
│                         │   + agent attribution        │
│ Metered: per request    │ Metered: per request         │
├─────────────────────────┼──────────────────────────────┤
│ AI AGENT                │ MCP SERVER                   │
│                         │                              │
│ Customer → Team         │ Customer → Team              │
│   → Agent → Key         │   → Agent → Key              │
│                         │                              │
│ Metered: per session    │ Metered: per tool call       │
│  + tool-call granularity│  + per-tool pricing          │
│                         │  + session lifecycle         │
│                         │  + token attribution         │
└─────────────────────────┴──────────────────────────────┘

Complexity →→→ (simple) ─────────────────→ (complex)

Your system must handle ALL four simultaneously.
Aforo natively supports four product types, each with a different surface for monetization:
Product Type 1: Standard API
The customer owns an API endpoint. Usage is metered at the API endpoint level. Events: requests, response codes, latency. Billing hierarchy: Customer → Team → App → API Key (usage at app-instance level). This is your traditional API metering. Same as Stripe, Plaid, or any API provider.
The customer deploys two applications. App A makes 10,000 requests/month. App B makes 100,000 requests/month. You have two API keys, two rate limits, two usage streams. Billing is straightforward: App A costs less because it uses less.
Product Type 2: Agentic API
The customer owns an AI agent that uses your APIs internally. Usage is metered at the agent invocation level. Events: agent executions, token processing, tool calls within the agent. Billing hierarchy: Customer → Team → App → API Key (usage at app-instance level). But the granularity is different from Standard API: you're not counting individual HTTP requests, you're counting higher-level agent behaviors (the agent invoked my search tool 47 times during this execution).
The customer runs a search agent during business hours (high invocation rate) and a batch analytics agent overnight (fewer invocations, more tokens per invocation). Both use the same API key, but Aforo meters the distinction: search agent invocations are rated differently than analytics agent invocations because they have different tool signatures and cost profiles.
Product Type 3: AI Agent
Now the customer isn't using your APIs directly. Instead, they're deploying an AI agent on your platform. You're hosting and running the agent. Usage is metered at the agent session level. Events: agent sessions created, duration, tools invoked within sessions, tokens processed. Billing hierarchy: Customer → Team → Agent → API Key (usage at agent level).
This is where it gets interesting. A customer might own 10 agents. Some are light (used sporadically, cost $50/month in compute). Some are heavy (used by 1000 customers, cost $50,000/month in compute). You need to see that variance. Aforo meters at the agent level so you can see "Agent A cost $50, Agent B cost $50,000, so we're charging Agent B's customer differently."
Product Type 4: MCP Server
The customer owns a Model Context Protocol (MCP) server—a tool that AI agents invoke. You're metering tool-level invocations. Events: tool invocations, tool execution status, execution duration, input/output token pairs. Billing hierarchy: Customer → Team → Agent → API Key (usage at agent/session level).
This is the highest-cardinality metering case. A single customer might have 10,000 tools. Each tool might be invoked 100,000 times per month. Each invocation has multiple dimensions: tool name, execution status (success/error), execution duration, token counts. If you're still running a cron job to aggregate this, you're building a query that fans out across millions of rows. With event-stream processing, each tool invocation is a single event that flows through the pipeline in milliseconds.
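For a sense of the dimensions involved, here's the illustrative shape of a single tool-invocation event (field names are assumptions, not Aforo's schema). Each invocation flows through the pipeline as one event, so no end-of-month query ever has to reconstruct this fan-out:

# One MCP tool invocation = one event, carrying every billable dimension.
from dataclasses import dataclass

@dataclass
class ToolInvocationEvent:
    tool_name: str     # one of potentially 10,000 tools
    agent_id: str
    session_id: str
    status: str        # "success" or "error"
    duration_ms: int
    input_tokens: int
    output_tokens: int
    timestamp_ms: int

event = ToolInvocationEvent(
    tool_name="search", agent_id="agt_42", session_id="sess_9",
    status="success", duration_ms=180,
    input_tokens=512, output_tokens=256, timestamp_ms=1_718_000_000_000,
)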
The Aforo Edge: Multi-Dimensional Rating Before Settlement
Here's the critical differentiator that separates event-stream architecture from what legacy billing vendors do: rating happens before settlement, across multiple dimensions, with no joins needed.
In a traditional billing system, you have a simple pricing model. Maybe you charge $0.001 per API request. On the 1st of the month, the cron job runs, sums up all requests for each customer, multiplies by $0.001, and generates an invoice. The pricing model is static and rigid. It's the same for every customer.
Aforo's architecture allows for multi-dimensional pricing that's customer-specific, time-specific, and signal-specific. Here's what that looks like:
Per-Tool Pricing Overrides
A customer buys a rate plan for your MCP Server product. The base rate is $0.0001 per tool invocation. But the customer negotiates a special rate: Tool A (search engine) costs $0.00005 per invocation (it's cheap, they use it a lot), while Tool B (advanced analytics) costs $0.0003 per invocation (it's expensive, specialized). This pricing variance is encoded in the rate plan as dimensionPricing: { "Tool A": 0.00005, "Tool B": 0.0003 } in a JSONB column.
In legacy billing, this would require custom scripting. You'd have to add a dimension to your invoice line items, then post-process invoices to apply the custom rates. It's a one-off hack. In Aforo's architecture, the rate stage of the pipeline checks the dimensionPricing JSONB for tool-specific overrides, applies them in-line, and emits the correct charge. No post-processing. No hacks. It's just a data structure.
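The lookup itself is trivially small, which is the point. A sketch (the dict mirrors the dimensionPricing example above; the function shape and base_rate field are otherwise assumptions):

# Rate-stage override lookup: tool-specific price if present, else base.
def unit_price(rate_plan: dict, tool_name: str) -> float:
    overrides = rate_plan.get("dimensionPricing", {})
    return overrides.get(tool_name, rate_plan["base_rate"])

plan = {"base_rate": 0.0001,
        "dimensionPricing": {"Tool A": 0.00005, "Tool B": 0.0003}}

assert unit_price(plan, "Tool A") == 0.00005  # negotiated override
assert unit_price(plan, "Tool C") == 0.0001   # falls back to the base rate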
Graduated Pricing with Real-Time Calculations
Another customer's rate plan has graduated pricing: first 1M tool invocations cost $0.0001 each, next 5M cost $0.00005 each, anything over 6M costs $0.000025 each. This is volumetric—the tier changes based on cumulative monthly consumption. In a batch system, you'd have to calculate this after the month ends. In Aforo's real-time system, the rate stage receives a MetricUsage object that includes tier information calculated from the customer's current month-to-date consumption (updated every time an event is enriched). The rate stage applies the staircase calculation in-line: if the customer is at 950K invocations and this event adds another 500.5K, the first 50K of this event is charged at $0.0001 (filling out the 1M tier), the remaining 450.5K at $0.00005. Charge is calculated correctly, in real-time, without any batch processing.
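Here's a worked sketch of that staircase (tier boundaries are from the example above; the function is an illustrative reconstruction, not Aforo's code):

# Graduated rating: split an event's units across the cumulative tiers.
TIERS = [                     # (cumulative upper bound, price per unit)
    (1_000_000, 0.0001),
    (6_000_000, 0.00005),
    (float("inf"), 0.000025),
]

def graduated_charge(month_to_date: int, event_units: int) -> float:
    charge, position, remaining = 0.0, month_to_date, event_units
    for bound, price in TIERS:
        if remaining == 0:
            break
        in_tier = max(min(bound - position, remaining), 0)
        charge += in_tier * price
        position += in_tier
        remaining -= in_tier
    return charge

# 950K month-to-date, 500.5K-unit event:
# 50K at $0.0001 + 450.5K at $0.00005 = $27.525
print(round(graduated_charge(950_000, 500_500), 3))  # 27.525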
Session-Based Minimums
Some customers negotiate minimum spend guarantees: "We're committing to at least $5,000/month in compute costs." In a batch system, you'd calculate this at the end of the month and add a line item for the shortfall. In Aforo's system, the commit stage of the pipeline maintains a running tally of charges for the month and compares against the minimum in real-time. If the customer hits the minimum on day 15, Aforo can flag "true-up satisfied" and route subsequent charges differently (maybe they tier down to a lower plan). If they don't hit the minimum by day 20, Aforo can proactively alert the sales team: "This customer is on pace to owe a $3,200 true-up. They might churn if we don't intervene."
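A sketch of the pace check behind that alert (the linear extrapolation is an assumption; the $5,000 minimum and day-20 figures mirror the example above):

# Commit-stage true-up projection: extrapolate month-to-date spend and
# compare against the contractual minimum.
def projected_trueup(minimum: float, month_to_date: float,
                     day_of_month: int, days_in_month: int = 30) -> float:
    projected_spend = month_to_date / day_of_month * days_in_month
    return max(minimum - projected_spend, 0.0)

# $1,200 spent by day 20 -> on pace for $1,800 -> a $3,200 true-up
print(projected_trueup(5_000.0, 1_200.0, day_of_month=20))  # 3200.0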
Spend Caps with Instantaneous Enforcement
Another customer caps their monthly spend at $10,000. They're a startup that's afraid of surprise bills. In a batch system, they'd hit the cap on day 25, keep using the API until the cron job runs on day 30, and then you'd have to refund them the overages. In Aforo's system, the quota check stage monitors the cap in real-time. At 2:47 PM on day 20, the customer's cumulative charges hit $9,995. The next event comes in. The quota check stage sees the customer is 99.95% of cap and emits a flag. The API gateway sees the flag and either (a) starts throttling that customer's requests to prevent overshooting, or (b) rejects new requests with a 429 "too many requests" status. The customer is never overbilled. They're never surprised. The system enforces the cap at the event level, not at the end-of-month reconciliation.
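The quota check reduces to a per-event utilization comparison (thresholds and return values here are illustrative):

# Quota-check stage: flag or block the moment utilization crosses a line.
def check_cap(month_to_date: float, cap: float, warn_at: float = 0.8) -> str:
    utilization = month_to_date / cap
    if utilization >= 1.0:
        return "block"       # gateway rejects further requests with 429
    if utilization >= warn_at:
        return "near-limit"  # alert the customer, start throttling
    return "ok"

assert check_cap(9_995.0, 10_000.0) == "near-limit"  # 99.95% of the cap
assert check_cap(10_000.0, 10_000.0) == "block"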
Multi-Path Settlement in Parallel
Here's the real power: the route stage makes settlement decisions in real-time, and different customers can have different settlement models simultaneously.
Customer A is postpaid (bill me at the end of the month). Events flow through the pipeline, charges accumulate, and on the 1st of the month, Aforo generates a single invoice with all the line items.
Customer B is prepaid (I've loaded $1,000 of credits into my wallet). Events flow through the pipeline, charges are emitted, and the wallet service immediately debits the customer's wallet. When the wallet hits $100, an automation alerts the customer that they're low on credits.
Customer C is hybrid (I'll prepay $500 and have the overflow billed to my credit card). Events flow through the pipeline, the wallet service debits first, and when the wallet is exhausted, the route stage switches to invoice mode. The remainder of the charge flows to an invoice.
All three customers are using the same infrastructure, the same pipeline, the same 10 stages. But the route stage makes a decision based on the subscription's billingMode and routes accordingly. No special casing. No custom code. Just a data-driven decision at the right place in the pipeline.
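A sketch of that decision (the billingMode values follow the three customers above; the function shape is an assumption):

# Route stage: a data-driven settlement decision, no special-casing.
def route(charge: float, billing_mode: str, wallet_balance: float = 0.0):
    if billing_mode == "postpaid":
        return [("invoice", charge)]
    if billing_mode == "prepaid":
        return [("wallet", charge)]
    # hybrid: debit the wallet first, overflow lands on the invoice
    from_wallet = min(wallet_balance, charge)
    paths = [("wallet", from_wallet)]
    if charge > from_wallet:
        paths.append(("invoice", charge - from_wallet))
    return paths

print(route(120.0, "hybrid", wallet_balance=80.0))
# [('wallet', 80.0), ('invoice', 40.0)]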
Why Batch Thinking Will Kill Your Product Roadmap
Here's the hard truth that companies running batch billing systems don't want to admit: your product roadmap is constrained by your billing architecture.
You want to offer a new pricing feature—say, billing per token instead of per request. In a batch system, you'd have to:
- Add a new column to the usage_events table: token_count.
- Update your ingestion code to track token counts.
- Update the cron job to read the new column.
- Update your pricing logic to handle both "request-based" and "token-based" rates.
- Update your invoice generation to show the new line items separately.
- Run a historical backfill to recalculate invoices for the past month (because you realized the new feature should apply retroactively).
- Pray your finance team doesn't get upset about the invoice corrections.
This takes weeks. It's risky. By the time you ship it, your competitors have already shipped the same feature.
In Aforo's event-stream architecture, you'd:
- Add a new field to the raw event: token_count.
- Add a new line to the dimensionPricing JSONB in the rate plan: token_based_model: true.
- Tell the rate stage to check for this flag and use token-based pricing if present.
- Deploy.
The feature is live within hours. All existing events flowing through Kafka are immediately rated against the new model. You don't have to reprocess history because each event is processed once, at the moment it's ingested. Historical events are already settled. New events going forward use the new model.
This is why the companies building the best AI products (the ones that ship feature updates monthly, not quarterly) have already moved to event-stream architectures. Not because they're trying to be "cutting edge." But because the batch-processing constraint makes them unable to move fast enough to compete.
The Talent Problem: Where Batch Architects Can't Reach
There's another cost to batch thinking that most companies don't talk about: it locks in certain architectural patterns and makes it harder to hire the right talent.
A traditional billing system needs: a backend engineer who understands SQL well, a data analyst who can write complex aggregation queries, maybe a finance person who understands invoice rules. It's a well-mapped domain. You can hire from a big talent pool.
An event-stream billing system needs: an engineer who understands Kafka and streaming aggregation, someone who can think about distributed consensus and exactly-once delivery semantics, someone who's comfortable with columnar databases like ClickHouse, someone who understands how to enrich events with contextual lookups without creating bottlenecks. That's a smaller talent pool. It's also more expensive. But here's the trade-off: once you have that person, they unblock your entire product roadmap. They're not a specialist in "billing." They're a generalist in distributed systems who happens to be applying it to billing. And they can ship 10x faster.
The companies with the best billing systems didn't hire "billing engineers." They hired platform engineers and data engineers and gave them a problem: "We need to rate usage events at real-time scale." The solution they built happens to be excellent for billing, but it's actually just excellent infrastructure.
Audit Yourself: 3 Questions Your Billing Architecture Can't Answer
If you're still running batch processing for your billing system, ask yourself these three questions. If you can't answer them with confidence, your architecture is broken.
1. Do you know, in real-time, whether a customer has exceeded their spending cap?
If a customer has a $10,000/month budget, and they're currently at $9,950, can you tell them with a 30-second latency that they have $50 left? Or do they discover it when the cron job runs at midnight and they've already spent $11,000? Real-time spend tracking is non-negotiable in regulated industries. It's also the difference between a customer who trusts your platform and a customer who's terrified of unexpected bills and eventually churns. If your system can't answer this question without waiting for a batch job, you're operating in the dark. You need to fix this.
2. Can you ship a new pricing model for a subset of your customers without impacting existing customers?
A new customer wants to buy at a custom rate (maybe they negotiated per-tool pricing overrides, or graduated pricing with different tier breakpoints). Can you add their custom pricing model to your system and have it apply to events ingested at 3 PM today? Or does it only apply tomorrow after the cron job runs? If it's the latter, you can't do same-day onboarding without a manual reconciliation step. You can't move fast. If it's the former, you have the flexibility that the best SaaS companies have: ship custom pricing models without blocking on batch processing.
3. Can you correlate a sudden spike in usage with the actual product changes that caused it?
A customer's token consumption suddenly jumps 50x for three days, then drops back to normal. Why? Did they deploy a new agent? Did they enable a feature flag? Did they optimize their prompts? In a batch system, you see the spike in the aggregated numbers at the end of the day, but you have no way to correlate it back to the actual events that caused it. You have to ask the customer. In a real-time system, you can query the event stream and see the exact product behaviors that led to the spike (which agents were active, which tools were invoked, which execution patterns changed). You can learn from your customers' usage patterns and recommend optimizations. You can intervene proactively if something looks wrong. If you can't see into the event stream in real-time, you can't do any of this.
The Cost of Waiting: What Happens If You Don't Migrate
Batch billing isn't just slow. It's actively harmful to your product strategy when you're competing in AI markets.
Your competitor has migrated to event-stream architecture. They ship a new pricing model every two weeks. They can offer custom per-tool pricing to enterprise customers in 24 hours. They see usage anomalies in real-time and proactively optimize customer implementations. They charge accurately based on actual compute cost, so their gross margins are predictable and their pricing is defensible.
You're still running cron jobs. You ship pricing updates quarterly (because every new model requires QA, invoice testing, and a month of validation). Custom pricing for enterprise customers takes three weeks of engineering work. You discover usage anomalies after they've burned through $20,000 of your compute. You're charging some customers $99/month for the same compute that costs you $5,000 to serve. You have no idea. Your competitor has already raised their Series B on the back of a pricing model that captures 10x the margin you're leaving on the table. And by the time you realize you need to migrate, they've already picked all the low-hanging fruit customers, and the remaining market assumes your company is a laggard.
This isn't hypothetical. This is happening right now in AI products. Companies that migrated to event-stream architectures six months ago are now raising Series B and C with unit economics that legacy billing vendors can't match. Companies that are still running batch processing are starting to feel the competitive pressure.
The good news: you can migrate. Aforo exists exactly for this reason. We built the architecture once so you don't have to. Your event stream flows into our pipeline. Your pricing rules are configured via API. Your customers' usage is metered accurately in real-time. You get the benefits of event-stream processing without hiring a distributed systems engineer.
The cost of not migrating is steeper than the cost of migrating.
What Real-Time Billing Actually Requires
Let's be concrete about what you're actually building (or buying) when you move to event-stream architecture.
Infrastructure:
- A distributed message queue (Kafka is industry standard, but RabbitMQ or AWS Kinesis work too). You need at-least-once delivery guarantees and the ability to replay messages.
- A stateful stream processor (Kafka Streams, Flink, or a custom service listening to Kafka topics). This is where the 10-stage pipeline runs.
- A fast KV cache (Redis) for billing hierarchy lookups (API key → team → customer → subscription at 10-minute TTL).
- A columnar analytics database (ClickHouse, Redshift, BigQuery) for high-cardinality analytics. Analytics queries cannot run against your transactional database.
- PostgreSQL for transactional billing state (subscriptions, rates, holds, invoice items). Must support ACID transactions and optimistic locking.
Operations:
- Distributed tracing (OpenTelemetry + Jaeger or similar). When an event flows through the pipeline, you need to see every stage it touched, how long each stage took, and where failures happened.
- Dead letter queue monitoring. Events that fail at any stage need to be retried, and if they fail permanently, they need to be persisted in a dead letter topic so you can replay them later.
- Real-time alerting. If the rate stage starts rejecting events because a pricing rule is malformed, you need to know about it in the next 60 seconds, not the next morning.
Operations (Human):
- Someone on your team needs to understand how to configure pricing rules in your system. For Aforo, this is API-driven: a rate plan is defined as a JSON document with all the pricing logic embedded. For a custom system, you need a DSL (domain-specific language) for encoding pricing rules in a way that's both expressive (you can capture any pricing model) and safe (the DSL prevents you from introducing billing errors).
- Someone needs to monitor the pipeline health. What's the latency from "event ingested" to "charge emitted"? What's the error rate? Are there any backpressures in the pipeline? Are we keeping up with event throughput?
- Someone needs to understand the data model. When a customer says "my invoice looks wrong," you need to be able to trace back from the invoice line item to the original event that created it, and verify the rating was correct at each stage of the pipeline.
This isn't trivial. It's a real engineering effort. But if you're building an AI product, you need to do this eventually. The only question is whether you do it now (while you're still small and the number of customers is manageable) or later (when you're big and the technical debt compounds).
The Choice Ahead
You can keep running batch billing. Cron jobs at midnight. Aggregation queries that take 30 minutes to run. Customers getting charged $99/month for compute that costs you $1,000 or $10. No visibility into spending caps until tomorrow. Inability to ship new pricing models without a quarter of engineering work. Churn accelerating as customers realize your platform doesn't work for high-velocity AI workloads.
Or you can migrate to event-stream architecture. Real-time metering. Multi-dimensional pricing. Accurate capture of compute cost variance. Same-day custom pricing. Spending caps enforced instantaneously. Product roadmap unblocked. Gross margins predictable and defensible.
The choice is yours. But the cost of delaying the choice is getting higher every month. Your competitors who migrated six months ago are now the ones defining what's possible in AI product monetization. The companies that wait until next year will be playing catch-up against incumbents who've already optimized their pricing and margins.
The AI economy doesn't run on $99/month per seat. It runs on signals—tokens, invocations, sessions, tools. And the companies that built their billing systems to capture signals, not seats, will be the ones that dominate the decade ahead.
What's your architecture? Is it built for signals, or is it still clinging to seats?