
Observability & Logging

Two distinct logging tiers, each serving a different audience.

Tier 1 — Customer-facing audit & activity logs (product feature)

Customers see their own activity and audit data inside their tenant. This data lives in the application database and is never mixed with operational logs.

Implementation:

  • audit_events table inside each tenant's Postgres schema: under the schema-per-tenant model, a customer can only see their own data, and that isolation is enforced at the database boundary
  • NestJS audit interceptor captures every mutating request: actor, action, target entity, tenant context, timestamp, IP, user-agent
  • Append-only writes: DELETE/UPDATE permissions are explicitly revoked at the database role level, so the application cannot alter or remove recorded events (tamper resistance)
  • PII redaction: passwords, raw evidence contents, and secret values are scrubbed from payloads before storage
  • 7-year retention by default (a common choice for SOC 2 and regulatory evidence)
  • Queryable via S.E.T's UI — Settings → Activity / Audit page
  • Exportable as CSV, JSON, or via the tenant's REST API
  • Optional outbound webhook — customers can stream their own audit events into their own SIEM (Splunk, Microsoft Sentinel, Datadog Security, etc.) in real time
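The interceptor's core logic (decide whether a request is auditable, build the event, redact the payload) can be sketched without NestJS-specific wiring. This is a minimal TypeScript sketch, assuming JSON-like request bodies; the `AuditEvent` shape, the `REDACTED_KEYS` list, and the helper names are hypothetical, not S.E.T's actual code.

```typescript
// Hypothetical request shape: only the fields the audit event needs.
interface RequestLike {
  method: string;
  path: string;
  ip: string;
  headers: Record<string, string | undefined>;
  body?: unknown;
}

// Hypothetical row shape for the tenant's audit_events table.
interface AuditEvent {
  actorId: string;
  tenantId: string;
  action: string;       // e.g. "POST /evidence"
  targetEntity: string; // e.g. "evidence:123"
  occurredAt: string;   // ISO-8601 timestamp
  ip: string;
  userAgent: string;
  payload: unknown;     // redacted copy of the request body
}

// Assumed list of key names whose values must never reach the audit log (lowercase).
const REDACTED_KEYS = new Set(["password", "secret", "apikey", "token", "evidencecontent"]);
const MUTATING_METHODS = new Set(["POST", "PUT", "PATCH", "DELETE"]);

// Recursively replaces values of sensitive keys (case-insensitive match on the key name).
// Assumes JSON-like payloads: plain objects, arrays, and primitives.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value as Record<string, unknown>)) {
      out[key] = REDACTED_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(v);
    }
    return out;
  }
  return value;
}

// Returns null for read-only requests, since only mutating requests are audited.
function buildAuditEvent(
  req: RequestLike,
  actorId: string,
  tenantId: string,
  targetEntity: string,
  now: Date = new Date(),
): AuditEvent | null {
  const method = req.method.toUpperCase();
  if (!MUTATING_METHODS.has(method)) return null;
  return {
    actorId,
    tenantId,
    action: `${method} ${req.path}`,
    targetEntity,
    occurredAt: now.toISOString(),
    ip: req.ip,
    userAgent: req.headers["user-agent"] ?? "unknown",
    payload: redact(req.body),
  };
}
```

In a NestJS `NestInterceptor`, the request would come from `context.switchToHttp().getRequest()`, and the event would be written to the tenant's `audit_events` table once `next.handle()` completes. Matching on key names is a deliberately simple redaction strategy; free-text fields that might contain secrets would need additional handling.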

What gets captured:

  • Activity (broad): logins, evidence uploads, finding triages, integration connections, questionnaire submissions, report exports, member invites
  • Audit (security-specific): permission changes, MFA events, IdP connection changes, role assignments, password resets, API key rotations, tenant settings changes, schema migrations applied
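The Activity/Audit split can be made explicit with a small lookup so that every recorded event lands in exactly one category. A sketch, assuming hypothetical dot-separated event type names derived from the two lists above:

```typescript
type EventTier = "activity" | "audit";

// Hypothetical identifiers for the security-specific events listed above.
const AUDIT_EVENT_TYPES = new Set<string>([
  "permission.changed",
  "mfa.enrolled",
  "mfa.challenge_failed",
  "idp_connection.changed",
  "role.assigned",
  "password.reset",
  "api_key.rotated",
  "tenant_settings.changed",
  "schema_migration.applied",
]);

// Hypothetical identifiers for the broad activity events listed above.
const ACTIVITY_EVENT_TYPES = new Set<string>([
  "user.login",
  "evidence.uploaded",
  "finding.triaged",
  "integration.connected",
  "questionnaire.submitted",
  "report.exported",
  "member.invited",
]);

// Audit is checked first; unknown types throw so every new event must be classified explicitly.
function classifyEvent(eventType: string): EventTier {
  if (AUDIT_EVENT_TYPES.has(eventType)) return "audit";
  if (ACTIVITY_EVENT_TYPES.has(eventType)) return "activity";
  throw new Error(`Unclassified event type: ${eventType}`);
}
```

Throwing on unknown types is a fail-closed choice: a new event type cannot ship without being assigned to a category.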

Tier 2 — Internal operational logging → Logz.io

This tier covers S.E.T's own operational observability. Logz.io is the destination; nothing customer-specific lives here.

Why Logz.io:

  • Israeli company (HQ Tel Aviv) — aligned with Amendment 13 + Nimbus posture; Israeli vendor for Israeli-regulated workloads is a real advantage
  • Managed ELK-style stack as a service: familiar query language, Kibana-style visualizations, no Elasticsearch operations burden
  • Cloud SIEM features included — primary alert pane for security events (GuardDuty findings + app errors + auth anomalies in one place)
  • APM via OpenTelemetry — distributed tracing across NestJS BFF + workers
  • Frontend error tracking — Logz.io browser SDK replaces a separate Sentry/Bugsnag vendor
  • Default region: EU (Ireland) — covers global + EU + most Israeli customers under Israel-EU adequacy
  • SOC 2 Type II + ISO 27001 + GDPR-compliant — listed as sub-processor in customer DPAs

Shipping architecture (Pattern B — CloudWatch transit → Lambda forwarder → Logz.io)

ECS Fargate logs ───┐
ALB access logs ────┤
CloudTrail ─────────┼──► CloudWatch ──────► Lambda shipper ──► LOGZ.IO
VPC Flow Logs ──────┤    (transit,                             (logs + APM +
RDS Postgres logs ──┤     7-day buffer)                        SIEM + frontend
GuardDuty findings ─┤                                          errors)
ECS task events ────┤                                             ▲
GitHub Actions ─────┘                                             │
                                                                  │
OpenTelemetry traces ─────────────────────────────────────────────┤
Frontend error SDK (React SPA) ───────────────────────────────────┘

Why this pattern (not direct FireLens sidecars):

  • Single pipeline for app logs + AWS-service logs (no two-mechanism complexity)
  • CloudWatch keeps 7 days of every log group, so events can be replayed to Logz.io after a brief outage
  • Trade-off: slightly higher latency (~10–30s), acceptable for operational monitoring but not for live debugging
  • A single Lambda forwarder, subscribed to every CloudWatch log group, ships all of them
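CloudWatch Logs delivers subscription data to a Lambda function as base64-encoded, gzip-compressed JSON under `event.awslogs.data`, containing `messageType`, `logGroup`, `logStream`, and a `logEvents` array. The sketch below decodes that payload and reshapes it into newline-delimited JSON. The output field names (`@timestamp`, `awsAccount`, etc.) are assumptions, and the actual upload to Logz.io (listener URL, account token, retries) is deliberately omitted; it should follow Logz.io's shipping documentation.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";
import { Buffer } from "node:buffer";

// Shape of the event CloudWatch Logs passes to a subscribed Lambda function.
interface CloudWatchLogsEvent {
  awslogs: { data: string }; // base64-encoded, gzip-compressed JSON
}

interface LogEvent { id: string; timestamp: number; message: string; }

interface DecodedPayload {
  messageType: "DATA_MESSAGE" | "CONTROL_MESSAGE";
  owner: string; // AWS account ID
  logGroup: string;
  logStream: string;
  subscriptionFilters: string[];
  logEvents: LogEvent[];
}

// base64 -> gunzip -> JSON
function decodePayload(event: CloudWatchLogsEvent): DecodedPayload {
  const compressed = Buffer.from(event.awslogs.data, "base64");
  return JSON.parse(gunzipSync(compressed).toString("utf8")) as DecodedPayload;
}

// Inverse of decodePayload; mirrors CloudWatch's encoding, which is handy in unit tests.
function encodePayload(payload: DecodedPayload): CloudWatchLogsEvent {
  const compressed = gzipSync(Buffer.from(JSON.stringify(payload), "utf8"));
  return { awslogs: { data: compressed.toString("base64") } };
}

// One JSON document per log event, joined by newlines.
// Field names below are assumptions, not a confirmed Logz.io schema.
function toNdjson(payload: DecodedPayload): string {
  // CONTROL_MESSAGE batches are CloudWatch reachability checks and carry no log data.
  if (payload.messageType !== "DATA_MESSAGE") return "";
  return payload.logEvents
    .map((e) =>
      JSON.stringify({
        "@timestamp": new Date(e.timestamp).toISOString(),
        message: e.message,
        logGroup: payload.logGroup,
        logStream: payload.logStream,
        awsAccount: payload.owner,
      }),
    )
    .join("\n");
}
```

A handler built on this would call `decodePayload(event)` and POST the non-empty `toNdjson` result to the Logz.io listener for the EU region.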

Sources flowing through CloudWatch:

  • NestJS API logs (BFF stdout/stderr)
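  • All 3 worker task families (api-workers, scanner-workers, and ai-workers from Phase 2)
  • ALB access logs (ALB writes access logs to S3 rather than CloudWatch, so they need an extra forwarding step to enter this pipeline)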
  • CloudTrail (every AWS API call)
  • VPC Flow Logs (network audit trail)
  • GuardDuty findings (via EventBridge → CloudWatch)
  • RDS Postgres logs (slow queries, errors, connections)
  • ECS task events (OOM, deploy failures, restarts)
  • BCP / product availability data (workers write structured JSON)
  • GitHub Actions deployment events (webhook → Lambda → CloudWatch)
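Because one forwarder ships every log group, it needs a way to tell sources apart so each can be parsed and filtered separately in Logz.io. A sketch, assuming the forwarder derives a log `type` field from the log group name. Apart from the RDS pattern, which follows AWS's default `/aws/rds/instance/<instance>/postgresql` naming, every log group name here is hypothetical:

```typescript
// Ordered rules: the first matching pattern wins. Names other than the RDS one are hypothetical.
const LOG_TYPE_RULES: ReadonlyArray<readonly [RegExp, string]> = [
  [/^\/aws\/rds\/instance\/[^\/]+\/postgresql$/, "rds-postgres"],
  [/^\/ecs\/(api|api-workers|scanner-workers|ai-workers)$/, "ecs-app"],
  [/^\/set\/cloudtrail$/, "cloudtrail"],
  [/^\/set\/vpc-flow-logs$/, "vpcflow"],
  [/^\/set\/guardduty-findings$/, "guardduty"],
  [/^\/set\/ecs-task-events$/, "ecs-events"],
  [/^\/set\/bcp-availability$/, "bcp"],
  [/^\/set\/github-actions$/, "github-actions"],
];

function logTypeFor(logGroup: string): string {
  for (const [pattern, type] of LOG_TYPE_RULES) {
    if (pattern.test(logGroup)) return type;
  }
  // Unknown groups are still shipped, but tagged so they stand out in Logz.io.
  return "unclassified";
}
```

Falling back to `unclassified` instead of dropping the batch keeps a newly added log group visible until a rule is written for it.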

Sources going directly to Logz.io (bypass CloudWatch):

  • OpenTelemetry traces — high-volume, latency-sensitive; Logz.io has a native OTel collector endpoint
  • Frontend errors: the Logz.io browser SDK ships errors directly from the user's browser, because CloudWatch Logs does not accept unauthenticated browser-side traffic

Security alerting: GuardDuty as detection layer, Logz.io as primary SIEM

GuardDuty acts as an EDR-like detection layer for AWS infrastructure, flagging anomalous API calls, compromised credentials, and crypto-mining activity. Its findings, together with application errors, failed-login bursts at WorkOS, RDS anomalies, and custom rule violations, flow into Logz.io, which serves as the unified human-facing alert pane (analogous to an EDR feeding into Splunk).

GuardDuty findings ──────────────┐
RDS / Postgres anomalies ────────┤
Failed-login bursts at WorkOS ───┼──► CloudWatch ──► Lambda ──► Logz.io ──► Alert
Application errors / 5xx spikes ─┤                           (single pane     │
Custom rule violations ──────────┘                            of glass)       ▼
                                                               PagerDuty / Slack / on-call SMS
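The failed-login burst rule amounts to a sliding-window count per key. In production it would be expressed as a Logz.io alert query; the TypeScript sketch below only illustrates the windowing logic, and the threshold, window length, and choice of key (user or source IP) are hypothetical parameters:

```typescript
// Hypothetical rule parameters, e.g. { threshold: 10, windowMs: 5 * 60_000 }.
interface BurstRule {
  threshold: number; // failures needed to fire
  windowMs: number;  // sliding window length in milliseconds
}

class FailedLoginBurstDetector {
  private readonly failures = new Map<string, number[]>(); // key -> failure timestamps (ms)
  private readonly rule: BurstRule;

  constructor(rule: BurstRule) {
    this.rule = rule;
  }

  // Records one failed login for `key` and returns true when the number of
  // failures inside the window ending at `timestampMs` reaches the threshold.
  record(key: string, timestampMs: number): boolean {
    const cutoff = timestampMs - this.rule.windowMs;
    const recent = (this.failures.get(key) ?? []).filter((t) => t > cutoff);
    recent.push(timestampMs);
    this.failures.set(key, recent);
    return recent.length >= this.rule.threshold;
  }
}
```

Keys that stop receiving failures are never evicted in this sketch; a long-running version would prune idle keys periodically.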

AWS Security Hub is not used: Logz.io is the primary alert pane, and AWS-native detection (GuardDuty) feeds into it through the same CloudWatch → Lambda pipeline.

MITRE ATT&CK tagging: every detection rule in Logz.io is tagged with the matching MITRE ATT&CK technique ID(s) where applicable, e.g. T1078 (Valid Accounts), T1190 (Exploit Public-Facing Application), T1059 (Command and Scripting Interpreter), T1486 (Data Encrypted for Impact). This makes findings traceable to real-world attacker behavior and feeds the Incident Response record classification. The complementary design-time view, STRIDE threat modeling, is owned by the Security Requirements & Threat Modeling Policy.
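Tagging stays consistent when each rule carries its technique IDs as structured metadata and a check rejects malformed IDs. A sketch, assuming a hypothetical `DetectionRule` shape (the `query` field stands in for the Logz.io alert query); the pattern accepts technique IDs such as `T1078` and sub-technique IDs such as `T1078.004`:

```typescript
// Hypothetical metadata shape for a Logz.io detection rule.
interface DetectionRule {
  name: string;
  query: string;              // placeholder for the alert query text
  attackTechniques: string[]; // empty when no technique applies
}

// Technique: T + 4 digits; sub-technique adds "." + 3 digits.
const TECHNIQUE_ID = /^T\d{4}(\.\d{3})?$/;

// Returns one error message per malformed ID; an empty array means the rule is valid.
function validateRule(rule: DetectionRule): string[] {
  const errors: string[] = [];
  for (const id of rule.attackTechniques) {
    if (!TECHNIQUE_ID.test(id)) {
      errors.push(`${rule.name}: malformed ATT&CK technique ID "${id}"`);
    }
  }
  return errors;
}
```

Running a check like this in CI over the rule definitions keeps the IDs usable for Incident Response classification.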

Compliance posture

  • SOC 2 Type II: audit log evidence (both tiers), access controls, monitoring + alerting controls all satisfied
  • Amendment 13 (PPL): customer audit logs stay in their tenant schema (no cross-border concerns); internal logs ship to Logz.io EU region (Israel-EU adequacy applies); breach notification supported by alert pipeline
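  • Nimbus Phase 2: a Logz.io Israel offering (if available) or self-hosted ELK in Tel Aviv, to be evaluated when the first Nimbus customer materializes
  • ISO 27001 Annex A.12 (ISO/IEC 27001:2013 numbering; A.8.15 Logging and A.8.16 Monitoring activities in the 2022 revision): operations security and logging requirements satisfied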
  • GDPR: data subject access requests served from tenant-scoped audit log; Logz.io listed in customer DPAs

Retention

Data | Where | Retention
Customer audit/activity (tenant-facing) | Tenant Postgres schema | 7 years
Internal app logs | Logz.io (hot) | 30 days
Internal app logs (archive) | S3 Glacier | 7 years
CloudTrail | S3 with Object Lock | 7 years
VPC Flow Logs | CloudWatch → S3 Glacier | 30 days hot, 1 year cold
GuardDuty findings | CloudWatch + Logz.io | 90 days hot, 7 years archived
Distributed traces | Logz.io APM | 14 days
Frontend errors | Logz.io | 30 days
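The retention table can also be expressed as data, so lifecycle jobs and audits read a single definition. A sketch with hypothetical keys, assuming a 365-day year (leap days ignored) and reading "30 days hot, 1 year cold" for VPC Flow Logs as one year in total:

```typescript
type StorageTier = "primary" | "archive" | "expired";

interface RetentionPolicy {
  primaryDays: number; // days in the first (hot or primary) store
  totalDays: number;   // days until the data may be deleted
}

const DAY_MS = 24 * 60 * 60 * 1000;
const SEVEN_YEARS = 7 * 365; // leap days ignored in this sketch

// Keys are hypothetical identifiers for the rows of the retention table.
const RETENTION: Record<string, RetentionPolicy> = {
  customerAudit: { primaryDays: SEVEN_YEARS, totalDays: SEVEN_YEARS },
  internalAppLogs: { primaryDays: 30, totalDays: SEVEN_YEARS },
  cloudTrail: { primaryDays: SEVEN_YEARS, totalDays: SEVEN_YEARS },
  vpcFlowLogs: { primaryDays: 30, totalDays: 365 }, // assumption: 1 year in total
  guardDutyFindings: { primaryDays: 90, totalDays: SEVEN_YEARS },
  distributedTraces: { primaryDays: 14, totalDays: 14 },
  frontendErrors: { primaryDays: 30, totalDays: 30 },
};

// Where a record created at `createdAt` should live at time `now`.
function tierFor(dataClass: string, createdAt: Date, now: Date): StorageTier {
  const policy = RETENTION[dataClass];
  if (!policy) throw new Error(`No retention policy for ${dataClass}`);
  const ageDays = (now.getTime() - createdAt.getTime()) / DAY_MS;
  if (ageDays < policy.primaryDays) return "primary";
  if (ageDays < policy.totalDays) return "archive";
  return "expired";
}
```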