Observability & Logging

Two distinct logging tiers, each serving a different audience.

Tier 1 — Customer-facing audit & activity logs (product feature)

Customers see their own activity and audit data inside their tenant. This data lives in the application database and is never mixed with operational logs.

Implementation:

audit_events table inside each tenant Postgres schema — schema-per-tenant model means a customer can only see their own data, enforced at the database boundary
NestJS audit interceptor captures every mutating request: actor, action, target entity, tenant context, timestamp, IP, user-agent
Append-only writes: DELETE/UPDATE permissions are explicitly revoked at the database role level (tamper resistance)
Sensitive-data redaction: passwords, raw evidence contents, and secret values are scrubbed from payloads before they are written
7-year retention by default (a typical horizon for SOC 2 and regulatory audits)
Queryable via S.E.T's UI — Settings → Activity / Audit page
Exportable as CSV or JSON, or via the tenant's REST API
Optional outbound webhook — customers can stream their own audit events into their own SIEM (Splunk, Microsoft Sentinel, Datadog Security, etc.) in real time
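
A minimal sketch of the capture, redaction, and append-only steps listed above. It is framework-free so it runs on its own; in the real system the logic would sit inside the NestJS interceptor. The field names, the `SENSITIVE_KEYS` deny-list, and the schema and role names are all assumptions for illustration, not S.E.T's actual schema.

```typescript
// Illustrative shapes only: field names are assumptions, not S.E.T's actual schema.
interface RequestContext {
  method: string;
  actorId: string;
  tenantId: string;
  action: string;
  targetEntity: string;
  ip: string;
  userAgent: string;
  body: unknown;
}

interface AuditEvent {
  actorId: string;
  action: string;
  targetEntity: string;
  tenantId: string;
  occurredAt: string; // ISO-8601 timestamp
  ip: string;
  userAgent: string;
  payload: unknown; // request body after redaction
}

// Assumed deny-list: values under these keys never reach audit_events.
const SENSITIVE_KEYS = new Set(["password", "secret", "token", "apikey", "evidencecontent"]);

// Only state-changing HTTP methods produce an audit row.
function isMutating(method: string): boolean {
  return ["POST", "PUT", "PATCH", "DELETE"].includes(method.toUpperCase());
}

// Recursively replace the values of sensitive keys with a fixed marker.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}

// Build the row an interceptor would insert, or null for read-only requests.
function buildAuditEvent(req: RequestContext, now: Date = new Date()): AuditEvent | null {
  if (!isMutating(req.method)) return null;
  return {
    actorId: req.actorId,
    action: req.action,
    targetEntity: req.targetEntity,
    tenantId: req.tenantId,
    occurredAt: now.toISOString(),
    ip: req.ip,
    userAgent: req.userAgent,
    payload: redact(req.body),
  };
}

// Statements that leave the application role with INSERT and SELECT only.
function appendOnlyGrantSql(schema: string, role: string): string[] {
  const ident = /^[a-z_][a-z0-9_]*$/; // reject anything that is not a plain identifier
  if (!ident.test(schema) || !ident.test(role)) throw new Error("unsafe identifier");
  return [
    `REVOKE UPDATE, DELETE, TRUNCATE ON ${schema}.audit_events FROM ${role};`,
    `GRANT INSERT, SELECT ON ${schema}.audit_events TO ${role};`,
  ];
}
```

Revoking privileges does not constrain the table owner or a superuser, so this sketch assumes the application role does not own `audit_events`.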

What gets captured:

Activity (broad): logins, evidence uploads, finding triages, integration connections, questionnaire submissions, report exports, member invites
Audit (security-specific): permission changes, MFA events, IdP connection changes, role assignments, password resets, API key rotations, tenant settings changes, schema migrations applied
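
One way to keep the two feeds separate is a lookup from action name to category. The sketch below covers a subset of the events listed above; the dotted action names are hypothetical, and defaulting unknown actions to `audit` is an assumption chosen so that an unclassified security event is not hidden in the broad feed.

```typescript
type LogCategory = "activity" | "audit";

// Hypothetical action names covering a subset of the two lists above.
const CATEGORY_BY_ACTION: Record<string, LogCategory> = {
  "auth.login": "activity",
  "evidence.upload": "activity",
  "finding.triage": "activity",
  "integration.connect": "activity",
  "report.export": "activity",
  "member.invite": "activity",
  "permission.change": "audit",
  "mfa.event": "audit",
  "idp.connection.change": "audit",
  "role.assign": "audit",
  "password.reset": "audit",
  "apikey.rotate": "audit",
  "tenant.settings.change": "audit",
  "schema.migration.apply": "audit",
};

// Unknown actions fall back to "audit" (an assumption, see the note above).
function categorize(action: string): LogCategory {
  return CATEGORY_BY_ACTION[action] ?? "audit";
}
```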

Tier 2 — Internal operational logging → Logz.io

S.E.T's own operational observability. Logz.io is the destination; nothing customer-specific lives here.

Why Logz.io:

Israeli company (HQ Tel Aviv) — aligned with Amendment 13 + Nimbus posture; Israeli vendor for Israeli-regulated workloads is a real advantage
Managed ELK-compatible stack (Logz.io's platform is now OpenSearch-based): familiar query language, Kibana-style visualizations, no Elasticsearch ops burden
Cloud SIEM features included — primary alert pane for security events (GuardDuty findings + app errors + auth anomalies in one place)
APM via OpenTelemetry — distributed tracing across NestJS BFF + workers
Frontend error tracking — Logz.io browser SDK replaces a separate Sentry/Bugsnag vendor
Default region: EU (Ireland) — covers global + EU + most Israeli customers under Israel-EU adequacy
SOC 2 Type II + ISO 27001 + GDPR-compliant — listed as sub-processor in customer DPAs

Shipping architecture (Pattern B — CloudWatch transit → Lambda forwarder → Logz.io)

ECS Fargate logs ──┐
ALB access logs ───┤
CloudTrail ────────┼──► CloudWatch ──► Lambda shipper ──► LOGZ.IO
VPC Flow Logs ─────┤        (transit,                       (logs +
RDS Postgres logs ─┤         7-day buffer)                    APM +
GuardDuty findings ┤                                          SIEM +
ECS task events ───┤                                          frontend
GitHub Actions ────┘                                          errors)
                                                                ▲
                       OpenTelemetry traces ───────────────────┤
                       Frontend error SDK (React SPA) ─────────┘

Why this pattern (not direct FireLens sidecars):

Single pipeline for app logs + AWS-service logs (no two-mechanism complexity)
CloudWatch acts as a 7-day buffer if Logz.io is briefly unavailable
Slightly higher latency (~10–30s): acceptable for ops dashboards and alerting, too slow for live debugging
One Lambda forwarder, configured once, ships every CloudWatch log group that has a subscription filter pointing at it
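
The core of such a forwarder is decoding the CloudWatch Logs subscription payload, which arrives as base64-encoded, gzip-compressed JSON under `awslogs.data`. The sketch below shows that decoding step plus a conversion to newline-delimited JSON. The output field names and the `encodeSubscription` test helper are assumptions, and the HTTPS POST to the Logz.io listener is deliberately left out.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

// Decoded CloudWatch Logs subscription payload, as documented by AWS.
interface CloudWatchLogsData {
  messageType: string; // "DATA_MESSAGE" or "CONTROL_MESSAGE"
  owner: string;
  logGroup: string;
  logStream: string;
  subscriptionFilters: string[];
  logEvents: { id: string; timestamp: number; message: string }[];
}

// Event a Lambda receives from a log-group subscription filter.
interface SubscriptionEvent {
  awslogs: { data: string }; // base64-encoded, gzip-compressed JSON
}

function decodeSubscription(event: SubscriptionEvent): CloudWatchLogsData {
  const compressed = Buffer.from(event.awslogs.data, "base64");
  return JSON.parse(gunzipSync(compressed).toString("utf8")) as CloudWatchLogsData;
}

// Inverse of decodeSubscription, useful only for building local test fixtures.
function encodeSubscription(data: CloudWatchLogsData): SubscriptionEvent {
  const json = Buffer.from(JSON.stringify(data), "utf8");
  return { awslogs: { data: gzipSync(json).toString("base64") } };
}

// Convert one payload into newline-delimited JSON, one document per log event.
// The output field names are assumptions, not a documented Logz.io schema.
function toNdjson(data: CloudWatchLogsData): string {
  if (data.messageType !== "DATA_MESSAGE") return ""; // control messages carry no logs
  return data.logEvents
    .map((e) =>
      JSON.stringify({
        "@timestamp": new Date(e.timestamp).toISOString(),
        message: e.message,
        logGroup: data.logGroup,
        logStream: data.logStream,
      }),
    )
    .join("\n");
}
```

Because the log group and stream travel with every document, a single forwarder can serve all subscribed groups without per-source configuration.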

Sources flowing through CloudWatch:

NestJS API logs (BFF stdout/stderr)
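Logs from all three worker task families (api-workers, scanner-workers, and ai-workers in Phase 2)
ALB access logs (ALB writes these to S3 only, so an S3-triggered forwarder is needed to bring them into CloudWatch)
CloudTrail (AWS API activity: management events by default, data events only where enabled)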
VPC Flow Logs (network audit trail)
GuardDuty findings (via EventBridge → CloudWatch)
RDS Postgres logs (slow queries, errors, connections)
ECS task events (OOM, deploy failures, restarts)
BCP / product availability data (workers write structured JSON)
GitHub Actions deployment events (webhook → Lambda → CloudWatch)

Sources going directly to Logz.io (bypass CloudWatch):

OpenTelemetry traces — high-volume, latency-sensitive; Logz.io has a native OTel collector endpoint
Frontend errors — Logz.io browser SDK ships errors from the user's browser (CloudWatch doesn't accept browser-side traffic)
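
The split between the two shipping paths can be written down as a small routing table, which is handy for checking that every source has exactly one path. The source keys below are shorthand for the two lists above, not real resource identifiers.

```typescript
type Route = "cloudwatch" | "direct";

// Shorthand source names (hypothetical) mapped to the path described above.
const ROUTE_BY_SOURCE: Record<string, Route> = {
  "nestjs-api": "cloudwatch",
  "workers": "cloudwatch",
  "alb-access": "cloudwatch",
  "cloudtrail": "cloudwatch",
  "vpc-flow": "cloudwatch",
  "guardduty": "cloudwatch",
  "rds-postgres": "cloudwatch",
  "ecs-task-events": "cloudwatch",
  "bcp-availability": "cloudwatch",
  "github-actions": "cloudwatch",
  "otel-traces": "direct",
  "frontend-errors": "direct",
};

// List the sources that use a given path, in declaration order.
function sourcesFor(route: Route): string[] {
  return Object.entries(ROUTE_BY_SOURCE)
    .filter(([, r]) => r === route)
    .map(([source]) => source);
}
```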

Security alerting: GuardDuty as detection layer, Logz.io as primary SIEM

GuardDuty acts like an EDR for AWS infrastructure (anomalous API calls, compromised credentials, crypto-mining detection). Its findings — alongside app errors, failed-login bursts at WorkOS, RDS anomalies, and custom rule violations — funnel into Logz.io as the unified human-facing alert pane (analogous to EDR feeding into Splunk).

GuardDuty findings                        ┐
RDS / Postgres anomalies                  │
Failed-login bursts at WorkOS             ├──► CloudWatch ──► Lambda ──► Logz.io ──► Alert
Application errors / 5xx spikes           │                   (single pane of glass)   │
Custom rule violations                    ┘                                            ▼
                                                                        PagerDuty / Slack / on-call SMS

AWS Security Hub is not used: Logz.io is the primary alert pane, and AWS-native detection (GuardDuty) feeds into it through the CloudWatch pipeline, with no intermediate aggregator.
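
To make "failed-login burst" concrete, here is a sliding-window check of the kind such an alert rule encodes. In practice the rule would be configured inside Logz.io rather than in application code; the function name and any threshold or window values passed to it are assumptions.

```typescript
// Returns true if any window of `windowMs` milliseconds contains
// at least `threshold` failure timestamps.
function hasBurst(timestampsMs: number[], threshold: number, windowMs: number): boolean {
  const sorted = [...timestampsMs].sort((a, b) => a - b);
  let start = 0;
  for (let end = 0; end < sorted.length; end++) {
    // Shrink the window from the left until it spans at most windowMs.
    while (sorted[end] - sorted[start] > windowMs) start++;
    if (end - start + 1 >= threshold) return true;
  }
  return false;
}
```

For example, `hasBurst(failures, 5, 60_000)` would flag five or more failed logins within any one-minute span.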

MITRE ATT&CK tagging: every detection rule in Logz.io is tagged with the matching MITRE ATT&CK technique ID(s) where applicable, e.g. T1078 (Valid Accounts), T1190 (Exploit Public-Facing Application), T1059 (Command and Scripting Interpreter), T1486 (Data Encrypted for Impact). This makes findings traceable to real-world attacker behavior and feeds the Incident Response record classification. The complementary design-phase view (STRIDE) is owned by the Security Requirements & Threat Modeling Policy.
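
A sketch of how rule tags could be validated and indexed before they are relied on for incident classification. The rule names are hypothetical; the technique IDs are the four cited above, and the pattern accepts the standard `T` plus four digits form with an optional three-digit sub-technique suffix.

```typescript
interface DetectionRule {
  name: string;
  mitreTechniques: string[]; // e.g. ["T1078"] or a sub-technique such as "T1059.001"
}

// "T" plus four digits, optionally followed by a three-digit sub-technique.
const TECHNIQUE_ID = /^T\d{4}(\.\d{3})?$/;

// Return the tags on a rule that are not well-formed technique IDs.
function invalidTechniqueIds(rule: DetectionRule): string[] {
  return rule.mitreTechniques.filter((id) => !TECHNIQUE_ID.test(id));
}

// Hypothetical rule names tagged with the technique IDs cited above.
const exampleRules: DetectionRule[] = [
  { name: "anomalous-console-login", mitreTechniques: ["T1078"] },
  { name: "public-endpoint-exploit-attempt", mitreTechniques: ["T1190"] },
  { name: "unexpected-shell-spawn", mitreTechniques: ["T1059"] },
  { name: "mass-object-encryption", mitreTechniques: ["T1486"] },
];

// Group rule names by technique so an incident record can be classified quickly.
function rulesByTechnique(all: DetectionRule[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const rule of all) {
    for (const id of rule.mitreTechniques) {
      index.set(id, [...(index.get(id) ?? []), rule.name]);
    }
  }
  return index;
}
```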

Compliance posture

SOC 2 Type II: audit log evidence (both tiers), access controls, and monitoring + alerting controls are all covered
Amendment 13 (PPL): customer audit logs stay in their tenant schema (no cross-border concerns); internal logs ship to Logz.io EU region (Israel-EU adequacy applies); breach notification supported by alert pipeline
Nimbus Phase 2: a Logz.io Israel offering (if available) or self-hosted ELK in Tel Aviv will be evaluated when the first Nimbus customer materializes
ISO 27001 Annex A.12 (2013 numbering; logging and monitoring are A.8.15 and A.8.16 in the 2022 edition): operations security + logging requirements satisfied
GDPR: data subject access requests served from tenant-scoped audit log; Logz.io listed in customer DPAs

Retention

Data	Where	Retention
Customer audit/activity (tenant-facing)	Tenant Postgres schema	7 years
Internal app logs	Logz.io (hot)	30 days
Internal app logs (archive)	S3 Glacier	7 years
CloudTrail	S3 with Object Lock	7 years
VPC Flow Logs	CloudWatch → S3 Glacier	30 days hot, 1 year cold
GuardDuty findings	CloudWatch + Logz.io	90 days hot, 7 years archived
Distributed traces	Logz.io APM	14 days
Frontend errors	Logz.io	30 days
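
The table can also be kept as data so that purge jobs and lifecycle policies read from one place. The sketch below transcribes the single-store rows plus the hot and archive tiers for internal app logs; the `data` keys are hypothetical, and the GuardDuty and VPC Flow Logs rows are omitted because the table does not pin each tier to a single store.

```typescript
type Duration = { days: number } | { years: number };

interface RetentionRule {
  data: string; // hypothetical key for the data class
  store: string;
  keep: Duration;
}

// Transcribed from the retention table (subset, see the note above).
const RETENTION: RetentionRule[] = [
  { data: "customer-audit", store: "Tenant Postgres schema", keep: { years: 7 } },
  { data: "app-logs-hot", store: "Logz.io", keep: { days: 30 } },
  { data: "app-logs-archive", store: "S3 Glacier", keep: { years: 7 } },
  { data: "cloudtrail", store: "S3 with Object Lock", keep: { years: 7 } },
  { data: "traces", store: "Logz.io APM", keep: { days: 14 } },
  { data: "frontend-errors", store: "Logz.io", keep: { days: 30 } },
];

// Compute the moment a record created at `createdAt` leaves retention (UTC).
function expiresAt(createdAt: Date, keep: Duration): Date {
  const d = new Date(createdAt.getTime());
  if ("days" in keep) d.setUTCDate(d.getUTCDate() + keep.days);
  else d.setUTCFullYear(d.getUTCFullYear() + keep.years);
  return d;
}

// True once `now` has reached or passed the expiry moment.
function isExpired(createdAt: Date, keep: Duration, now: Date): boolean {
  return now.getTime() >= expiresAt(createdAt, keep).getTime();
}
```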
