Platform Operations · Governance + Continuity

Protect the Platform.
Keep It Running.

Governance (audit trails, RBAC, SSO, tenant isolation) AND operational continuity (uptime monitor, incident manager, event log, system health, public status page), all on one console, under one audit trail.

Most billing platforms ship one or the other. Aforo ships both. The controls your CISO requires AND the dashboards your SRE needs, production-ready on day one.

Request Security Review · View Security Documentation

/governance/audit-trail

LIVE

Identity Verified

SSO + RBAC authenticated

Access Enforced

Role-based permissions applied

Change Logged

Immutable audit record created

Data Isolated

Tenant boundary enforced

The Atomic Unit

Control + Continuity. One Operations Console.

Every operational concern in Aforo resolves to two inseparable halves. Control (governance, RBAC, SSO, audit trails, tenant isolation, encryption) is how you protect the platform from misuse. Continuity (uptime, incidents, event log, system health, dead-letter recovery) is how you keep it running. Most billing platforms ship one or the other. Aforo ships both.

Protect the Platform

Control

The governance half. Audit trails, role-based access, tenant isolation, approval workflows, encryption at rest + in transit. The controls your CISO requires before signing the procurement contract.

Immutable audit log

Every pricing / refund / billing config change recorded with actor + timestamp + previous value + new value · append-only

SSO + RBAC

SAML 2.0 + OIDC integration · 6 default roles (Admin, Billing Mgr, Pricing Editor, Read-Only Auditor, and more) · per-tenant overrides

Multi-tenant isolation

Schema-level + cache-namespace + Kafka-topic separation · per-tenant dead-letter queues · cross-tenant access returns 404 (Pattern #9)

Approval workflows

Discount > 15% → Finance queue · Refund > $5K → VP sign-off · separation of duties enforced by platform, not policy memo

Encryption everywhere

AES-256-GCM at rest (payment credentials, OAuth tokens, repository PATs) · TLS 1.3 in transit · GDPR-ready data handling

= Operations Console

Keep It Running

Continuity

The observability half. Uptime monitoring, incident management, event log, system health, dead-letter recovery. The dashboards your SRE team needs before the first 3am page.

Uptime Monitor

12+ services tracked · HTTP / TCP / DNS / Cert check types · 90-day uptime bars · ClickHouse-backed uptime_checks

Incident Manager

Declare manually OR auto-create from uptime alerts · structured timeline · public status page integration · postmortem editor

Event Log (ClickHouse)

15+ event types · platform_events MergeTree · full-text search · SSE live streaming · 365-day TTL · per-tenant concurrent-stream cap

System Health dashboard

Service topology · Kafka lag · Redis health · ClickHouse status · PostgreSQL connection pool · 4 sub-detectors

Dead-letter recovery

Failed Kafka events → DLT topic → DeadLetterTopicMonitor persists to dead_letter_records table · Control Tower replay UI · zero event loss

Generic SaaS billing platforms force you to bolt on Datadog (uptime), PagerDuty (incidents), Splunk (event log), Auth0 (SSO + RBAC), and a custom audit-log schema. Aforo ships all of it natively, under one tenant boundary.

The Surface

Public Status Page. Built In, Not Statuspage.io.

Auto-published from the same Uptime Monitor that pages your on-call. 12 services tracked. 90-day uptime bars. Active incidents + scheduled maintenance in one place. Customers subscribe by email or webhook. No third-party status-page vendor, no manual updates, no separate audit trail.

status.acme-api.com

1 SERVICE DEGRADED

Services (12) · 90-day uptime

API Gateway · 99.99%

Billing Engine · 99.99%

Metering Service · 99.97%

Pricing Studio · 99.99%

Catalog Service · 100.0%

Customer Service · 99.99%

Storefront Service · 99.95%

Organization Service · 99.99%

Analytics Service · 99.89%

AI Service · 99.94%

Ingestion Service · 99.99%

Export Service · 100.0%

Recent incidents + maintenance

Analytics Service degraded performance

Started 2 days ago · resolved in 47 min

MAJOR · RESOLVED

Scheduled maintenance — Postgres major version upgrade

Started 6 days ago · resolved in 2 hr 14 min

MAINTENANCE · COMPLETED

Customers see the same data your SRE team operates on. The status page is auto-published from the Uptime Monitor table, not maintained in a separate Statuspage.io account that drifts out of sync.

Core Capabilities

Enterprise-grade controls built into every transaction.

Security is not an add-on. Every pricing change, every billing event, and every configuration modification flows through the same governance framework: auditable, access-controlled, and tenant-isolated by default.

Immutable Audit Trails

Every modification to pricing rules, billing configurations, catalog entries, and subscription states is recorded in an append-only audit log. Each record captures the actor, timestamp, previous value, and new value. Required for SOC 2 compliance, internal financial accountability, and regulatory examinations.
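To make the record shape concrete, here is a minimal sketch of an append-only audit log. The class names, the `field` attribute, and the in-memory list are assumptions for illustration; Aforo's actual schema and storage are not described on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, List, Tuple


@dataclass(frozen=True)  # frozen: a record cannot be mutated after creation
class AuditRecord:
    actor: str
    timestamp: datetime
    field: str  # hypothetical: which setting was changed
    previous_value: Any
    new_value: Any


class AuditLog:
    """In-memory stand-in for an append-only store (hypothetical)."""

    def __init__(self) -> None:
        self._records: List[AuditRecord] = []

    def append(self, actor: str, field: str, previous_value: Any, new_value: Any) -> AuditRecord:
        record = AuditRecord(actor, datetime.now(timezone.utc), field, previous_value, new_value)
        self._records.append(record)
        return record

    def records(self) -> Tuple[AuditRecord, ...]:
        # Hand back a tuple so callers cannot edit or delete history.
        return tuple(self._records)


log = AuditLog()
log.append("alice@example.com", "plan.pro.price_usd", 49, 59)
log.append("bob@example.com", "refund.policy.max_days", 30, 14)
```

The only write path is `append`; there is deliberately no update or delete method, which is the property "append-only" refers to.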

Strict Access Control (SSO & RBAC)

SAML 2.0 and OIDC integration with your existing identity provider. Role-based access control determines exactly which team members can modify pricing, issue refunds, adjust subscriptions, or access financial reports. No shared credentials. No ambiguous permissions. Full attribution on every action.
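A role check of this kind can be sketched as a lookup from role to permitted actions. The role names come from this page; the permission names and the specific grants per role are hypothetical, since the default grants are not listed here.

```python
from enum import Enum


class Permission(Enum):
    MODIFY_PRICING = "modify_pricing"
    ISSUE_REFUND = "issue_refund"
    ADJUST_SUBSCRIPTION = "adjust_subscription"
    VIEW_FINANCIAL_REPORTS = "view_financial_reports"


# Hypothetical grants: the real defaults are not documented on this page.
ROLE_PERMISSIONS = {
    "Administrator": set(Permission),
    "Billing Manager": {
        Permission.ISSUE_REFUND,
        Permission.ADJUST_SUBSCRIPTION,
        Permission.VIEW_FINANCIAL_REPORTS,
    },
    "Pricing Editor": {Permission.MODIFY_PRICING},
    "Read-Only Auditor": {Permission.VIEW_FINANCIAL_REPORTS},
}


def is_allowed(role: str, permission: Permission) -> bool:
    # Unknown roles receive no permissions (deny by default).
    return permission in ROLE_PERMISSIONS.get(role, set())
```

Denying by default means a misspelled or unassigned role can never gain access by accident.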

Multi-Tenant Data Isolation

Complete data separation at the schema, cache, and event-stream level. Large enterprises operating multiple business units, brands, or subsidiaries manage each independently under a single administrative umbrella, without risk of cross-tenant data exposure.

Operational Observability

Real-time uptime monitoring across 12+ services, a ClickHouse-backed event log with 15+ event types, SSE live streaming, and full-text search. The System Health dashboard surfaces Kafka lag, Redis health, ClickHouse status, and PostgreSQL connection pool, all on one operations console.

Incident Lifecycle Management

Declare incidents manually or auto-create from uptime alerts. Structured timeline updates flow to the public status page. Postmortem editor with root cause + impact + remediation + prevention sections. Scheduled maintenance windows with customer notifications. MTTR captured per incident.

Compliance Checklist

The checklist your IT team is looking for.

Every item below is production-verified, not a roadmap promise; where an audit is still in progress, the item says so. Hand this page directly to your security review committee.

SOC 2 Type II (audit in progress)

Architected for 99.99% availability

Dead-Letter Recovery (Zero Event Loss)

Real-Time Uptime Monitoring

AES-256 Encryption at Rest

TLS 1.3 Encryption in Transit

GDPR-Ready Data Handling

Immutable Financial Audit Logs

SAML 2.0 / OIDC SSO

Per-Tenant Data Isolation

Role-Based Access Control (RBAC)

Incident Management & Postmortem

The Operational Suite

Five Surfaces. One Operations Console.

Aforo ships the full operational stack natively: no Datadog for uptime, no PagerDuty for incidents, no Splunk for event log, no separate Statuspage.io vendor. All five surfaces share the same tenant boundary, the same audit log, the same on-call workflow.

Uptime Monitor

12+ services tracked

HTTP / TCP / DNS / Cert check types per service. 90-day uptime bars + daily summaries + SLA report generator. Auto-publishes to the public status page. ClickHouse-backed uptime_checks table with hourly + daily summary materialized views.

4 check types · 90-day bars · SLA reports · Public status page
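The arithmetic behind an uptime bar and an SLA report can be sketched in a few lines. The function names are hypothetical and the sketch treats every check as equally weighted, which may differ from how the hourly and daily summaries are actually computed.

```python
from typing import List


def uptime_percent(checks: List[bool]) -> float:
    """Share of successful checks, as a percentage."""
    if not checks:
        raise ValueError("no checks recorded")
    return 100.0 * sum(checks) / len(checks)


def allowed_downtime_minutes(target_percent: float, days: int) -> float:
    """Downtime budget implied by an availability target over a window."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - target_percent) / 100.0


# 10,000 checks with a single failure
checks = [True] * 9_999 + [False]
availability = uptime_percent(checks)

# Budget for a 99.99% target over a 90-day window
budget = allowed_downtime_minutes(99.99, 90)
```

For scale: a 99.99% target over 90 days leaves a budget of roughly 13 minutes of downtime.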

Incident Manager

Auto-create + postmortem

Declare manually OR auto-create from uptime alerts (3 consecutive failures over 5 min). Structured timeline updates flow to the public status page. Postmortem editor with root cause + impact + remediation + prevention sections. Scheduled maintenance windows with customer notifications.

Auto-create from uptime · Timeline updates · Postmortem editor · MTTR captured
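The auto-create trigger can be sketched as follows. This reads "3 consecutive failures over 5 min" as "the three most recent checks all failed and fall within a five-minute span"; that reading, and every name in the code, is an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Check = Tuple[datetime, bool]  # (checked_at, succeeded)


def should_open_incident(
    checks: List[Check],
    failures_needed: int = 3,
    window: timedelta = timedelta(minutes=5),
) -> bool:
    """Decide whether to auto-create an incident. `checks` must be oldest-first."""
    if len(checks) < failures_needed:
        return False
    recent = checks[-failures_needed:]
    all_failed = all(not succeeded for _, succeeded in recent)
    within_window = recent[-1][0] - recent[0][0] <= window
    return all_failed and within_window


t0 = datetime(2025, 1, 1, 3, 0)
outage = [(t0, False), (t0 + timedelta(minutes=2), False), (t0 + timedelta(minutes=4), False)]
blip = [(t0, False), (t0 + timedelta(minutes=2), True), (t0 + timedelta(minutes=4), False)]
```

Here `should_open_incident(outage)` is true, while `blip` does not trigger because the failures are not consecutive.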

Event Log

ClickHouse · 15+ event types

Every API, billing, auth, system, usage, and support event captured to ClickHouse platform_events table (MergeTree, 365-day TTL). Full-text search + structured filters + saved presets. Live SSE streaming with per-tenant concurrent-stream cap. Auto-dedupe via event_id idempotency.

Full-text search · Live SSE stream · 15+ event types · 365-day retention
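Deduplication by `event_id` can be illustrated with a small in-memory sink. This is a sketch of the idempotency idea only; how the platform_events table enforces it is not stated on this page, and the `EventSink` class is hypothetical.

```python
from typing import Dict, List, Set


class EventSink:
    """Accepts each event_id once; repeated deliveries are ignored."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.events: List[Dict] = []

    def ingest(self, event: Dict) -> bool:
        """Return True if the event was stored, False if it was a duplicate."""
        event_id = event["event_id"]
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        self.events.append(event)
        return True


sink = EventSink()
first = sink.ingest({"event_id": "evt-1", "type": "payment.failed"})
retry = sink.ingest({"event_id": "evt-1", "type": "payment.failed"})  # redelivery
other = sink.ingest({"event_id": "evt-2", "type": "invoice.delivered"})
```

Because a redelivered event is a no-op, producers can retry safely without inflating the log.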

System Health

Service topology + infra

Service map (topology), per-service resource gauges (CPU / memory / disk / connections), Apdex + p50/p95/p99 latency. Infrastructure health cards for Kafka (lag + consumer-group health), Redis (memory + key count), PostgreSQL (pool + slow queries), and ClickHouse (query queue + storage).

Service topology · Kafka + Redis + PG + CH · Apdex + p95/p99 · Top-5 slowest endpoints
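For readers new to the latency metrics named above, here is a minimal sketch of how they are commonly computed. It uses the standard Apdex formula (satisfied plus half of tolerating, over total, with the tolerating band up to four times the threshold) and the nearest-rank percentile method. The 500 ms threshold and the sample data are assumptions, and the dashboard may use a different percentile method.

```python
import math
from typing import List


def apdex(latencies_ms: List[float], threshold_ms: float) -> float:
    """Apdex = (satisfied + tolerating / 2) / total."""
    satisfied = sum(1 for x in latencies_ms if x <= threshold_ms)
    tolerating = sum(1 for x in latencies_ms if threshold_ms < x <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(latencies_ms)


def percentile(latencies_ms: List[float], p: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(p/100 * N)."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


samples = [100, 200, 300, 400, 500, 600, 700, 800, 900, 2500]
score = apdex(samples, threshold_ms=500)  # 5 satisfied, 4 tolerating, 1 frustrated
p50 = percentile(samples, 50)
p95 = percentile(samples, 95)
```

With this sample the Apdex score is 0.7, the median is 500 ms, and the single slow request dominates p95, which is why tail percentiles are shown alongside the average-like score.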

Notifications

Webhook + in-portal inbox

Per-customer webhook delivery with HMAC SHA-256 signing + retry-with-backoff. In-portal notification inbox for customer-facing events (invoice.delivered, payment.failed, subscription.cancelled, etc). Event definitions catalog. Delivery log feedback loop (F1+F2 closure 2026-05-09) so operators see per-invoice delivery state.

HMAC SHA-256 · Retry + backoff · In-portal inbox · Delivery feedback
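Webhook signing and retry scheduling can be sketched with the Python standard library. The hex encoding of the signature, the doubling schedule, and the 60-second cap are assumptions; the actual header format and retry policy are not documented on this page.

```python
import hashlib
import hmac
from typing import List


def sign_payload(secret: bytes, body: bytes) -> str:
    """HMAC SHA-256 over the raw request body, hex-encoded."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Receiver side: recompute and compare in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature)


def backoff_delays(max_attempts: int, base_seconds: float = 1.0, cap_seconds: float = 60.0) -> List[float]:
    """Exponential backoff: base * 2^attempt, capped."""
    return [min(cap_seconds, base_seconds * 2 ** attempt) for attempt in range(max_attempts)]


secret = b"hypothetical-shared-secret"
body = b'{"event": "payment.failed", "invoice_id": "inv_123"}'
signature = sign_payload(secret, body)
delays = backoff_delays(5)  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

The receiver should verify against the raw bytes it received, not a re-serialized copy, since any change in whitespace or key order changes the signature.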

Plus Knowledge Base, Documentation Hub, API Playground, Community Center, Changelog, and Public API Status (covered on the Developer Console page). 11 operational features total, all wired into the same admin console.

How It Works

From Provisioning to Continuous Assurance

Four layers of governance that operate from day one, with no additional configuration required after initial setup.

Provision with Identity

Connect your existing identity provider via SAML 2.0 or OIDC. Assign RBAC roles (Administrator, Billing Manager, Pricing Editor, Read-Only Auditor) to each team member. Zero shared credentials. Every action is attributed to a named individual.

Enforce Financial Controls

Every pricing change, every refund issued, every subscription modification, and every invoice adjustment is recorded in an immutable audit log. Define approval workflows for high-value operations. Restrict refund authority to designated roles. Maintain a complete chain of custody over revenue-impacting decisions.

Isolate Tenant Data

Each tenant operates within a fully separated data boundary: separate database schemas, separate cache namespaces, separate Kafka event streams. Business units within the same enterprise share administrative tooling without sharing transactional data, billing records, or customer information.
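One way to picture the boundary is a per-tenant namespace that derives every resource name from the tenant ID, plus a lookup that answers 404 for anything outside it. All naming schemes below are hypothetical; only the separation by schema, cache namespace, and Kafka topic, and the 404 on cross-tenant access, come from this page.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TenantNamespace:
    tenant_id: str

    @property
    def schema(self) -> str:
        return f"tenant_{self.tenant_id}"  # hypothetical schema naming

    def cache_key(self, key: str) -> str:
        return f"{self.tenant_id}:{key}"  # hypothetical cache prefix

    def topic(self, stream: str) -> str:
        return f"{self.tenant_id}.{stream}"  # hypothetical topic naming

    def dead_letter_topic(self, stream: str) -> str:
        return f"{self.topic(stream)}.DLT"


def fetch_invoice(requesting_tenant: str, invoices: Dict[str, dict], invoice_id: str) -> Tuple[int, Optional[dict]]:
    """Return (status, body). Another tenant's invoice looks the same as a missing one."""
    invoice = invoices.get(invoice_id)
    if invoice is None or invoice["tenant_id"] != requesting_tenant:
        return 404, None
    return 200, invoice


brand_a = TenantNamespace("brand_a")
invoices = {"inv_1": {"tenant_id": "brand_a", "total": 120}}
own = fetch_invoice("brand_a", invoices, "inv_1")
probe = fetch_invoice("brand_b", invoices, "inv_1")
```

Answering 404 rather than 403 is a common choice because it does not confirm to the caller that the record exists in another tenant.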

Monitor and Recover

Real-time uptime dashboards track service availability across all endpoints. Incident management provides structured escalation, postmortem documentation, and status page updates. Dead-letter recovery ensures that no billing event is ever permanently lost: failed events are captured, inspected, and replayed.
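The capture, inspect, and replay cycle can be sketched with an in-memory dead-letter store. The class and function names are hypothetical, and a real deployment would persist records durably rather than hold them in a list.

```python
from typing import Callable, Dict, List


class DeadLetterStore:
    """Holds failed events with their error until they are replayed."""

    def __init__(self) -> None:
        self.records: List[Dict] = []

    def capture(self, event: Dict, error: str) -> None:
        self.records.append({"event": event, "error": error})

    def replay(self, handler: Callable[[Dict], None]) -> int:
        """Retry every stored event; keep the ones that still fail."""
        still_failing: List[Dict] = []
        replayed = 0
        for record in self.records:
            try:
                handler(record["event"])
                replayed += 1
            except Exception as exc:
                still_failing.append({"event": record["event"], "error": str(exc)})
        self.records = still_failing
        return replayed


def process_all(events: List[Dict], handler: Callable[[Dict], None], dead_letters: DeadLetterStore) -> None:
    for event in events:
        try:
            handler(event)
        except Exception as exc:
            dead_letters.capture(event, str(exc))  # the event is parked, not dropped


gateway = {"up": False}
delivered: List[str] = []


def handler(event: Dict) -> None:
    if not gateway["up"]:
        raise RuntimeError("payment gateway unavailable")
    delivered.append(event["id"])


store = DeadLetterStore()
process_all([{"id": "evt-1"}, {"id": "evt-2"}], handler, store)
captured = len(store.records)  # both events failed and were captured
gateway["up"] = True           # the dependency recovers
replayed = store.replay(handler)
```

After the replay the store is empty and both events have been delivered in their original order.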

The Buying Committee

One Console. Four Stakeholders. Zero Friction.

Security, Finance, Engineering, and SRE each get the controls they need to protect the platform AND keep it running, without stepping on each other. One audit trail. One tenant boundary. One incident console.

Security / CISO

CISO · Head of Security · GRC Lead

Hand the audit committee a self-serve report.

The Workflow

1. Quarterly review hits. Audit committee needs full chain of custody on every pricing / refund / billing change for Q1.
2. Open the Audit Log. Filter by financial ops + date range + actor.
3. Export to CSV. Every record carries actor + timestamp + previous value + new value + IP.
4. Auditor walks the log. Review wraps in 2 hours, not 2 weeks of reconstruction.

Audit-ready evidence on demand, zero spreadsheet archaeology
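The filter-and-export step in the workflow above can be sketched with Python's csv module. The column names mirror the fields listed in step 3; the record layout, the ISO 8601 timestamps, and the function name are assumptions, and the real export format may differ.

```python
import csv
import io
from datetime import date
from typing import Dict, List

FIELDS = ["actor", "timestamp", "previous_value", "new_value", "ip"]


def export_audit_csv(records: List[Dict], start: date, end: date) -> str:
    """Write records whose date falls within [start, end] as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        day = date.fromisoformat(record["timestamp"][:10])  # assumes ISO 8601 timestamps
        if start <= day <= end:
            writer.writerow({name: record[name] for name in FIELDS})
    return buffer.getvalue()


records = [
    {"actor": "alice@example.com", "timestamp": "2025-02-10T14:03:00Z",
     "previous_value": 49, "new_value": 59, "ip": "203.0.113.7"},
    {"actor": "bob@example.com", "timestamp": "2025-05-02T09:30:00Z",
     "previous_value": 30, "new_value": 14, "ip": "203.0.113.9"},
]
q1_csv = export_audit_csv(records, date(2025, 1, 1), date(2025, 3, 31))
```

Only the February change falls inside the Q1 range, so the export contains the header plus one row.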

Finance / Compliance

VP Finance · Controller · Compliance Lead

Approval workflows enforced at the platform layer.

The Workflow

1. Finance configures: refund > $5K requires VP sign-off. Discount > 15% routes to Finance queue.
2. Sales submits a $24K refund request. Auto-routes to Finance approval queue.
3. Finance reviews + approves with comment. Refund issues. Audit trail updated.
4. SOC 2 evidence: separation of duties enforced by the platform, not by policy memo.

Zero unapproved high-value transactions, separation of duties auditable
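The thresholds in this workflow can be sketched as a small routing function. The function name and return values are hypothetical; the two thresholds (refund above $5K, discount above 15%) are the ones quoted on this page, read as strict "greater than".

```python
from typing import Optional

REFUND_LIMIT_USD = 5_000
DISCOUNT_LIMIT_PCT = 15


def required_approval(kind: str, amount_usd: float = 0.0, discount_pct: float = 0.0) -> Optional[str]:
    """Return the approval step a request must pass, or None if it can proceed."""
    if kind == "refund" and amount_usd > REFUND_LIMIT_USD:
        return "VP sign-off"
    if kind == "discount" and discount_pct > DISCOUNT_LIMIT_PCT:
        return "Finance queue"
    return None


large_refund = required_approval("refund", amount_usd=24_000)
small_refund = required_approval("refund", amount_usd=5_000)
deep_discount = required_approval("discount", discount_pct=20)
```

Enforcing the rule in code at submission time, rather than in a policy document, is what makes the separation of duties auditable: a request that skips approval cannot be created in the first place.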

CTO / VP Engineering

CTO · VP Engineering · Architecture Lead

Six business units, one platform, zero cross-contamination.

The Workflow

1. Acme Holdings operates 6 sub-brands on one Aforo deployment.
2. Each tenant gets its own schema, cache namespace, Kafka topic, dead-letter queue.
3. Configuration error in Brand A pricing config → contained to Brand A tenant.
4. System Health dashboard shows per-tenant health independently. Cross-tenant probe returns 404.

6 tenants, 6 isolated boundaries, 1 admin console

SRE / Platform Engineering

Head of Platform · SRE Lead · On-Call

Full incident lifecycle in one console.

The Workflow

1. Uptime Monitor detects 3 consecutive failures on /v1/translate. Status flips to DEGRADED.
2. PagerDuty integration fires; the on-call SRE is paged.
3. SRE declares Incident; public status page auto-updates with structured timeline.
4. Event Log + System Health surface root cause (DB connection pool exhausted) in 4 clicks. Postmortem published, customers notified, MTTR captured.

Full incident lifecycle in one console, MTTR drops 60%

Services Monitored

API Gateway · Billing · Metering · Pricing · Catalog · Customer · Storefront · Org · Analytics · AI · Ingestion · Export

Operational Features

Uptime · Incidents · Event Log · Health · Notifications · Support · KB · Docs · Playground · Community · Changelog

Detection Engines

Uptime degradation · Incident auto-create · SLA breach · System anomaly

Unified Operations Console

One audit trail · One tenant boundary · One incident lifecycle