Platform Operations · Governance + Continuity

Protect the Platform.
Keep It Running.

Governance (audit trails, RBAC, SSO, tenant isolation) AND operational continuity (uptime monitor, incident manager, event log, system health, public status page), all on one console, under one audit trail.

Most billing platforms ship one or the other. Aforo ships both. The controls your CISO requires AND the dashboards your SRE needs, production-ready on day one.

/governance/audit-trail
LIVE
Identity Verified
SSO + RBAC authenticated
Access Enforced
Role-based permissions applied
Change Logged
Immutable audit record created
Data Isolated
Tenant boundary enforced
The Atomic Unit

Control + Continuity. One Operations Console.

Every operational concern in Aforo resolves to two inseparable halves. Control (governance, RBAC, SSO, audit trails, tenant isolation, encryption) is how you protect the platform from misuse. Continuity (uptime, incidents, event log, system health, dead-letter recovery) is how you keep it running. Most billing platforms ship one or the other. Aforo ships both.

Protect the Platform
Control

The governance half. Audit trails, role-based access, tenant isolation, approval workflows, encryption at rest + in transit. The controls your CISO requires before signing the procurement contract.

Immutable audit log
Every pricing / refund / billing config change recorded with actor + timestamp + previous value + new value · append-only
SSO + RBAC
SAML 2.0 + OIDC integration · 6 default roles (Admin, Billing Mgr, Pricing Editor, Read-Only Auditor, +) · per-tenant overrides
Multi-tenant isolation
Schema-level + cache-namespace + Kafka-topic separation · per-tenant dead-letter queues · cross-tenant access returns 404 (Pattern #9)
Approval workflows
Discount > 15% → Finance queue · Refund > $5K → VP sign-off · separation of duties enforced by platform, not policy memo
Encryption everywhere
AES-256-GCM at rest (payment credentials, OAuth tokens, repository PATs) · TLS 1.3 in transit · GDPR-ready data handling
= Operations Console
Keep It Running
Continuity

The observability half. Uptime monitoring, incident management, event log, system health, dead-letter recovery. The dashboards your SRE team needs before the first 3am page.

Uptime Monitor
12+ services tracked · HTTP / TCP / DNS / Cert check types · 90-day uptime bars · ClickHouse-backed uptime_checks
Incident Manager
Declare manually OR auto-create from uptime alerts · structured timeline · public status page integration · postmortem editor
Event Log (ClickHouse)
15+ event types · platform_events MergeTree · full-text search · SSE live streaming · 365-day TTL · per-tenant concurrent-stream cap
System Health dashboard
Service topology · Kafka lag · Redis health · ClickHouse status · PostgreSQL connection pool · 4 sub-detectors
Dead-letter recovery
Failed Kafka events → DLT topic → DeadLetterTopicMonitor persists to dead_letter_records table · Control Tower replay UI · zero event loss

Generic SaaS billing platforms force you to bolt on Datadog (uptime), PagerDuty (incidents), Splunk (event log), Auth0 (SSO + RBAC), and a custom audit-log schema. Aforo ships all of it natively, under one tenant boundary.

The Surface

Public Status Page. Built In, Not Statuspage.io.

Auto-published from the same Uptime Monitor that pages your on-call. 12 services tracked. 90-day uptime bars. Active incidents + scheduled maintenance in one place. Customers subscribe by email or webhook. No third-party status-page vendor, no manual updates, no separate audit trail.

status.acme-api.com
1 SERVICE DEGRADED
Partial Service Degradation
Analytics Service experiencing elevated latency · investigating
Overall uptime: 99.97% (last 90 days · 12 services)
Services (12) · 90-day uptime
API Gateway · 99.99%
Billing Engine · 99.99%
Metering Service · 99.97%
Pricing Studio · 99.99%
Catalog Service · 100.0%
Customer Service · 99.99%
Storefront Service · 99.95%
Organization Service · 99.99%
Analytics Service · 99.89%
AI Service · 99.94%
Ingestion Service · 99.99%
Export Service · 100.0%
Recent incidents + maintenance
Analytics Service degraded performance
Started 2 days ago · resolved in 47 min · MAJOR · RESOLVED
Scheduled maintenance: Postgres major version upgrade
Started 6 days ago · resolved in 2 hr 14 min · MAINTENANCE · COMPLETED

Customers see the same data your SRE team operates on. The status page is auto-published from the Uptime Monitor table, not maintained in a separate Statuspage.io account that drifts out of sync.

Core Capabilities

Enterprise-grade controls built into every transaction.

Security is not an add-on. Every pricing change, every billing event, and every configuration modification flows through the same governance framework: auditable, access-controlled, and tenant-isolated by default.

Immutable Audit Trails

Every modification to pricing rules, billing configurations, catalog entries, and subscription states is recorded in an append-only audit log. Each record captures the actor, timestamp, previous value, and new value. Required for SOC 2 compliance, internal financial accountability, and regulatory examinations.
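
As a rough illustration of the record shape described above, here is a minimal in-memory sketch. The class names, the `field` attribute, and the sample values are assumptions made for this example; they are not Aforo's schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, List, Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class AuditRecord:
    actor: str
    timestamp: datetime
    field: str  # hypothetical name for the setting that changed
    previous_value: Any
    new_value: Any


class AppendOnlyAuditLog:
    """Hypothetical log that exposes append and read, and nothing else."""

    def __init__(self) -> None:
        self._records: List[AuditRecord] = []

    def append(self, actor: str, field: str, previous_value: Any, new_value: Any) -> AuditRecord:
        record = AuditRecord(actor, datetime.now(timezone.utc), field, previous_value, new_value)
        self._records.append(record)
        return record

    def records(self) -> Tuple[AuditRecord, ...]:
        # Hand back an immutable copy so callers cannot edit history in place.
        return tuple(self._records)


log = AppendOnlyAuditLog()
entry = log.append("jane@example.com", "plan.pro.monthly_price", 49, 59)
```

There is deliberately no update or delete method: a correction is a new record, never an edit to an old one.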

Strict Access Control (SSO & RBAC)

SAML 2.0 and OIDC integration with your existing identity provider. Role-based access control determines exactly which team members can modify pricing, issue refunds, adjust subscriptions, or access financial reports. No shared credentials. No ambiguous permissions. Full attribution on every action.
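
A deny-by-default permission check is one way to picture role-based access. The four role names below come from this page; the permission strings and the grants assigned to each role are assumptions for illustration only.

```python
from enum import Enum


class Role(Enum):
    ADMINISTRATOR = "Administrator"
    BILLING_MANAGER = "Billing Manager"
    PRICING_EDITOR = "Pricing Editor"
    READ_ONLY_AUDITOR = "Read-Only Auditor"


# Hypothetical permission names and grants, chosen for illustration only.
PERMISSIONS = {
    Role.ADMINISTRATOR: {"pricing.edit", "refund.issue", "subscription.adjust", "reports.read"},
    Role.BILLING_MANAGER: {"refund.issue", "subscription.adjust", "reports.read"},
    Role.PRICING_EDITOR: {"pricing.edit"},
    Role.READ_ONLY_AUDITOR: {"reports.read"},
}


def is_allowed(role: Role, permission: str) -> bool:
    """Deny by default: only explicitly granted permissions pass."""
    return permission in PERMISSIONS.get(role, set())
```

Under these assumed grants, `is_allowed(Role.PRICING_EDITOR, "refund.issue")` is false, so a pricing editor cannot issue a refund.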

Multi-Tenant Data Isolation

Complete data separation at the schema, cache, and event-stream level. Large enterprises operating multiple business units, brands, or subsidiaries manage each independently under a single administrative umbrella, without risk of cross-tenant data exposure.
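
The sketch below shows two small pieces of tenant scoping: a tenant-prefixed cache key, and a lookup that answers 404 for another tenant's record (the cross-tenant behavior described elsewhere on this page). The `Invoice` type, the store, and the key format are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    tenant_id: str
    amount_cents: int


# Hypothetical store keyed by invoice id.
INVOICES = {
    "inv_1": Invoice("inv_1", "brand-a", 12_000),
    "inv_2": Invoice("inv_2", "brand-b", 8_500),
}


def cache_key(tenant_id: str, key: str) -> str:
    """Prefix every cache key with the tenant so namespaces never collide."""
    return f"tenant:{tenant_id}:{key}"


def get_invoice(caller_tenant_id: str, invoice_id: str) -> Tuple[int, Optional[Invoice]]:
    invoice = INVOICES.get(invoice_id)
    # A missing record and another tenant's record look identical to the caller.
    if invoice is None or invoice.tenant_id != caller_tenant_id:
        return 404, None
    return 200, invoice
```

Returning 404 rather than 403 means a probe from another tenant cannot even confirm that the record exists.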

Operational Observability

Real-time uptime monitoring across 12+ services, ClickHouse-backed event log with 15+ event types, SSE live streaming, and full-text search. System Health dashboard surfaces Kafka lag, Redis health, ClickHouse status, and PostgreSQL connection pool, all on one operations console.

Incident Lifecycle Management

Declare incidents manually or auto-create from uptime alerts. Structured timeline updates flow to the public status page. Postmortem editor with root cause + impact + remediation + prevention sections. Scheduled maintenance windows with customer notifications. MTTR captured per incident.

Compliance Checklist

The checklist your IT team is looking for.

Every item below is production-verified, not a roadmap promise; the one exception, the SOC 2 Type II audit, is marked as in progress. Hand this page directly to your security review committee.

SOC 2 Type II (audit in progress)
Architected for 99.99% availability
Dead-Letter Recovery (Zero Event Loss)
Real-Time Uptime Monitoring
AES-256 Encryption at Rest
TLS 1.3 Encryption in Transit
GDPR-Ready Data Handling
Immutable Financial Audit Logs
SAML 2.0 / OIDC SSO
Per-Tenant Data Isolation
Role-Based Access Control (RBAC)
Incident Management & Postmortem
The Operational Suite

Five Surfaces. One Operations Console.

Aforo ships the full operational stack natively: no Datadog for uptime, no PagerDuty for incidents, no Splunk for event log, no separate Statuspage.io vendor. All five surfaces share the same tenant boundary, the same audit log, the same on-call workflow.

Uptime Monitor

12+ services tracked

HTTP / TCP / DNS / Cert check types per service. 90-day uptime bars + daily summaries + SLA report generator. Auto-publishes to the public status page. ClickHouse-backed uptime_checks table with hourly + daily summary materialized views.

4 check types · 90-day bars · SLA reports · Public status page
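
To make the uptime bars concrete, here is a small sketch that turns raw check results into per-day and overall percentages. The `(day, ok)` input shape and the sample dates are assumptions; the production summaries are described above as ClickHouse materialized views, which this plain-Python version does not attempt to reproduce.

```python
from collections import defaultdict
from datetime import date


def daily_uptime(checks):
    """Map each day to the percentage of its checks that passed.

    `checks` is a list of (day, ok) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # day -> [passed, total]
    for day, ok in checks:
        totals[day][0] += 1 if ok else 0
        totals[day][1] += 1
    return {day: 100.0 * passed / total for day, (passed, total) in totals.items()}


def overall_uptime(checks):
    """Percentage of all checks in the window that passed."""
    results = [ok for _, ok in checks]
    return 100.0 * sum(results) / len(results)


# Illustrative data: two checks per day over two days, one failure.
checks = [
    (date(2025, 1, 1), True), (date(2025, 1, 1), True),
    (date(2025, 1, 2), True), (date(2025, 1, 2), False),
]
bars = daily_uptime(checks)
```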

Incident Manager

Auto-create + postmortem

Declare manually OR auto-create from uptime alerts (3 consecutive failures over 5 min). Structured timeline updates flow to the public status page. Postmortem editor with root cause + impact + remediation + prevention sections. Scheduled maintenance windows with customer notifications.

Auto-create from uptime · Timeline updates · Postmortem editor · MTTR captured
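
The auto-create rule can be sketched as follows. This reads "3 consecutive failures over 5 min" as three back-to-back failed checks whose timestamps span at most five minutes; that interpretation, the function name, and the input shape are assumptions.

```python
from datetime import datetime, timedelta

FAILURE_THRESHOLD = 3
WINDOW = timedelta(minutes=5)


def should_open_incident(checks):
    """Return True once FAILURE_THRESHOLD consecutive failures fall inside WINDOW.

    `checks` is a list of (timestamp, ok) pairs sorted oldest first.
    """
    streak = []  # timestamps of the current run of failures
    for timestamp, ok in checks:
        if ok:
            streak = []  # any success breaks the run
            continue
        streak.append(timestamp)
        streak = streak[-FAILURE_THRESHOLD:]  # keep only the most recent failures
        if len(streak) == FAILURE_THRESHOLD and streak[-1] - streak[0] <= WINDOW:
            return True
    return False


t0 = datetime(2025, 1, 1, 3, 0)
minute = timedelta(minutes=1)
outage = [(t0, False), (t0 + minute, False), (t0 + 2 * minute, False)]
flapping = [(t0, False), (t0 + minute, True), (t0 + 2 * minute, False), (t0 + 3 * minute, False)]
```

Here `outage` trips the rule and `flapping` does not, because the successful check in the middle resets the streak.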

Event Log

ClickHouse · 15+ event types

Every API, billing, auth, system, usage, and support event captured to ClickHouse platform_events table (MergeTree, 365-day TTL). Full-text search + structured filters + saved presets. Live SSE streaming with per-tenant concurrent-stream cap. Auto-dedupe via event_id idempotency.

Full-text search · Live SSE stream · 15+ event types · 365-day retention
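
Idempotent ingestion keyed on `event_id` can be pictured with the application-level sketch below. The `EventLog` class and the event payloads are hypothetical, and this is not a description of how the ClickHouse table itself deduplicates.

```python
class EventLog:
    """Hypothetical ingest path that accepts each event_id at most once."""

    def __init__(self):
        self._seen_ids = set()
        self.events = []

    def ingest(self, event: dict) -> bool:
        """Store the event and return True, or return False for a duplicate."""
        event_id = event["event_id"]
        if event_id in self._seen_ids:
            return False  # a retried delivery of an event already stored
        self._seen_ids.add(event_id)
        self.events.append(event)
        return True


event_log = EventLog()
first = event_log.ingest({"event_id": "evt_1", "type": "payment.failed"})
retry = event_log.ingest({"event_id": "evt_1", "type": "payment.failed"})
```

A producer that retries after a timeout can resend safely: the second delivery is acknowledged but not stored twice.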

System Health

Service topology + infra

Service map (topology), per-service resource gauges (CPU / memory / disk / connections), Apdex + p50/p95/p99 latency. Infrastructure health cards for Kafka (lag + consumer-group health), Redis (memory + key count), PostgreSQL (pool + slow queries), and ClickHouse (query queue + storage).

Service topology · Kafka + Redis + PG + CH · Apdex + p95/p99 · Top-5 slowest endpoints
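
For readers unfamiliar with the latency metrics named above, this sketch computes an Apdex score using the standard definition (satisfied requests plus half of the tolerating ones, divided by the total, with "tolerating" meaning up to four times the threshold) and nearest-rank percentiles. The 500 ms threshold and the sample latencies are made up for the example.

```python
import math


def apdex(latencies_ms, threshold_ms):
    """Apdex = (satisfied + tolerating / 2) / total."""
    satisfied = sum(1 for x in latencies_ms if x <= threshold_ms)
    tolerating = sum(1 for x in latencies_ms if threshold_ms < x <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(latencies_ms)


def percentile(latencies_ms, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


samples = [100, 200, 300, 400, 500, 600, 700, 800, 900, 3000]
score = apdex(samples, threshold_ms=500)  # 5 satisfied, 4 tolerating, 1 frustrated -> 0.7
p50 = percentile(samples, 50)             # 500
p95 = percentile(samples, 95)             # 3000
```

The single 3000 ms outlier barely moves the median but dominates p95, which is why the tail percentiles sit next to Apdex on the dashboard.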

Notifications

Webhook + in-portal inbox

Per-customer webhook delivery with HMAC SHA-256 signing + retry-with-backoff. In-portal notification inbox for customer-facing events (invoice.delivered, payment.failed, subscription.cancelled, etc). Event definitions catalog. Delivery log feedback loop (F1+F2 closure 2026-05-09) so operators see per-invoice delivery state.

HMAC SHA-256 · Retry + backoff · In-portal inbox · Delivery feedback
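
Signing and verifying a webhook body with HMAC SHA-256 might look like the sketch below, using only Python's standard library. The secret, the payload, the hex encoding of the signature, and the doubling backoff schedule are assumptions; the page does not specify the header name, encoding, or retry timings Aforo uses.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC SHA-256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time comparison avoids leaking the signature through timing."""
    return hmac.compare_digest(sign(secret, body), signature)


def backoff_delays(max_attempts: int, base_seconds: float = 1.0, cap_seconds: float = 60.0):
    """Exponential backoff: base, 2x base, 4x base, ... capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * 2 ** attempt) for attempt in range(max_attempts)]


secret = b"whsec_example"  # hypothetical shared secret
body = json.dumps({"type": "payment.failed", "invoice_id": "inv_1"}).encode()
signature = sign(secret, body)
```

The receiver recomputes the signature over the raw bytes it received and rejects the delivery if `verify` returns false.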

Plus Knowledge Base, Documentation Hub, API Playground, Community Center, Changelog, and Public API Status (covered on the Developer Console page). 11 operational features total, all wired into the same admin console.

How It Works

From Provisioning to Continuous Assurance

Four layers of governance that operate from day one, with no additional configuration required after initial setup.

01 · Provision with Identity

Connect your existing identity provider via SAML 2.0 or OIDC. Assign an RBAC role (Administrator, Billing Manager, Pricing Editor, Read-Only Auditor) to each team member. Zero shared credentials. Every action is attributed to a named individual.

02 · Enforce Financial Controls

Every pricing change, every refund issued, every subscription modification, and every invoice adjustment is recorded in an immutable audit log. Define approval workflows for high-value operations. Restrict refund authority to designated roles. Maintain a complete chain of custody over revenue-impacting decisions.

03 · Isolate Tenant Data

Each tenant operates within a fully separated data boundary: separate database schemas, separate cache namespaces, separate Kafka event streams. Business units within the same enterprise share administrative tooling without sharing transactional data, billing records, or customer information.

04 · Monitor and Recover

Real-time uptime dashboards track service availability across all endpoints. Incident management provides structured escalation, postmortem documentation, and status page updates. Dead-letter recovery ensures that no billing event is ever permanently lost: failed events are captured, inspected, and replayed.
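
The capture-then-replay idea behind dead-letter recovery can be sketched in a few lines. Plain lists stand in for the Kafka dead-letter topic and the persisted records here, and the handler and event shapes are hypothetical.

```python
def process(events, handler, dead_letters):
    """Run handler on each event; park failures instead of dropping them."""
    for event in events:
        try:
            handler(event)
        except Exception as exc:
            dead_letters.append({"event": event, "error": str(exc)})


def replay(dead_letters, handler):
    """Retry parked events; return the records that still fail."""
    remaining = []
    for record in dead_letters:
        try:
            handler(record["event"])
        except Exception as exc:
            remaining.append({"event": record["event"], "error": str(exc)})
    return remaining


# Demo: the downstream dependency is down for the first pass, then recovers.
state = {"downstream_up": False}
processed = []


def handler(event):
    if not state["downstream_up"]:
        raise RuntimeError("downstream unavailable")
    processed.append(event["id"])


dead_letters = []
process([{"id": "evt_1"}, {"id": "evt_2"}], handler, dead_letters)
state["downstream_up"] = True
remaining = replay(dead_letters, handler)
```

Each parked record keeps the error message next to the original event, which is what makes the inspect step possible before a replay.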

The Buying Committee

One Console. Four Stakeholders. Zero Friction.

Security, Finance, Engineering, and SRE each get the controls they need to protect the platform AND keep it running, without stepping on each other. One audit trail. One tenant boundary. One incident console.

Security / CISO

CISO · Head of Security · GRC Lead

Hand the audit committee a self-serve report.

The Workflow
  1. Quarterly review hits. Audit committee needs full chain of custody on every pricing / refund / billing change for Q1.
  2. Open the Audit Log. Filter by financial ops + date range + actor.
  3. Export to CSV. Every record carries actor + timestamp + previous value + new value + IP.
  4. Auditor walks the log. Review wraps in 2 hours, not 2 weeks of reconstruction.
Audit-ready evidence on demand, zero spreadsheet archaeology
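
The filter-and-export step in the workflow above could be approximated like this. The column names follow the fields listed in step 3; the record dictionaries, function name, and sample data are assumptions rather than Aforo's export format.

```python
import csv
import io
from datetime import date, datetime

FIELDS = ["actor", "timestamp", "previous_value", "new_value", "ip"]


def export_audit_csv(records, start, end, actor=None):
    """Return CSV text for records dated start..end (inclusive), optionally for one actor."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in records:
        in_range = start <= record["timestamp"].date() <= end
        if in_range and (actor is None or record["actor"] == actor):
            writer.writerow({**record, "timestamp": record["timestamp"].isoformat()})
    return buffer.getvalue()


# Illustrative records only.
records = [
    {"actor": "jane@example.com", "timestamp": datetime(2025, 2, 3, 9, 30),
     "previous_value": "49", "new_value": "59", "ip": "203.0.113.7"},
    {"actor": "raj@example.com", "timestamp": datetime(2025, 5, 1, 14, 0),
     "previous_value": "10", "new_value": "12", "ip": "203.0.113.9"},
]
q1 = export_audit_csv(records, date(2025, 1, 1), date(2025, 3, 31))
```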

Finance / Compliance

VP Finance · Controller · Compliance Lead

Approval workflows enforced at the platform layer.

The Workflow
  1. Finance configures: refund > $5K requires VP sign-off. Discount > 15% routes to Finance queue.
  2. Sales submits a $24K refund request. Auto-routes to Finance approval queue.
  3. Finance reviews + approves with comment. Refund issues. Audit trail updated.
  4. SOC 2 evidence: separation of duties enforced by the platform, not by policy memo.
Zero unapproved high-value transactions, separation of duties auditable
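
The two thresholds from step 1 can be expressed as a small routing sketch. The thresholds come from this page; the route labels, the function names, and the rule that a requester cannot approve their own request are assumptions used to illustrate separation of duties.

```python
from decimal import Decimal

REFUND_SIGN_OFF_ABOVE_USD = Decimal("5000")
DISCOUNT_REVIEW_ABOVE_PERCENT = Decimal("15")


def route_refund(amount_usd: Decimal) -> str:
    """Refunds strictly above the threshold need VP sign-off."""
    return "vp_sign_off" if amount_usd > REFUND_SIGN_OFF_ABOVE_USD else "no_approval_needed"


def route_discount(percent: Decimal) -> str:
    """Discounts strictly above the threshold go to the Finance queue."""
    return "finance_queue" if percent > DISCOUNT_REVIEW_ABOVE_PERCENT else "no_approval_needed"


def can_approve(requester: str, approver: str) -> bool:
    """Separation of duties: nobody approves their own request."""
    return requester != approver
```

Both comparisons are strict, matching the "> $5K" and "> 15%" wording, so a refund of exactly $5,000 passes without sign-off in this sketch.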

CTO / VP Engineering

CTO · VP Engineering · Architecture Lead

Six business units, one platform, zero cross-contamination.

The Workflow
  1. Acme Holdings operates 6 sub-brands on one Aforo deployment.
  2. Each tenant gets its own schema, cache namespace, Kafka topic, dead-letter queue.
  3. Configuration error in Brand A pricing config → contained to Brand A tenant.
  4. System Health dashboard shows per-tenant health independently. Cross-tenant probe returns 404.
6 tenants, 6 isolated boundaries, 1 admin console

SRE / Platform Engineering

Head of Platform · SRE Lead · On-Call

Full incident lifecycle in one console.

The Workflow
  1. Uptime Monitor detects 3 consecutive failures on /v1/translate. Status flips to DEGRADED.
  2. PagerDuty integration fires; on-call SRE pages in.
  3. SRE declares Incident; public status page auto-updates with structured timeline.
  4. Event Log + System Health surface root cause (DB connection pool exhausted) in 4 clicks. Postmortem published, customers notified, MTTR captured.
Full incident lifecycle in one console, MTTR drops 60%
12 Services Monitored
API Gateway · Billing · Metering · Pricing · Catalog · Customer · Storefront · Org · Analytics · AI · Ingestion · Export
11 Operational Features
Uptime · Incidents · Event Log · Health · Notifications · Support · KB · Docs · Playground · Community · Changelog
4 Detection Engines
Uptime degradation · Incident auto-create · SLA breach · System anomaly
1 Unified Operations Console
One audit trail · One tenant boundary · One incident lifecycle

Protect the platform. Keep it running. One console.

Immutable audit trail + RBAC + tenant isolation + encryption on the control side. Uptime monitor + incident manager + event log + system health on the continuity side. Every signal lands in one operations console, with one audit trail covering all of it.