Observability & Logging
Two distinct logging tiers, each serving a different audience.
Tier 1 — Customer-facing audit & activity logs (product feature)
Customers see their own activity and audit data inside their tenant. This data lives in the application database and is never mixed with operational logs.
Implementation:
- `audit_events` table inside each tenant Postgres schema: the schema-per-tenant model means a customer can only see their own data, enforced at the database boundary
- NestJS audit interceptor captures every mutating request: actor, action, target entity, tenant context, timestamp, IP, user-agent
- Append-only writes: `DELETE`/`UPDATE` permissions explicitly revoked at the database role level (tamper-evidence)
- PII redaction: passwords, raw evidence contents, and secret values are scrubbed from payloads
- 7-year retention by default (SOC 2 / regulatory typical)
- Queryable via S.E.T's UI — Settings → Activity / Audit page
- Exportable as CSV, JSON, or via the tenant's REST API
- Optional outbound webhook — customers can stream their own audit events into their own SIEM (Splunk, Microsoft Sentinel, Datadog Security, etc.) in real time
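The capture and redaction steps listed above could look roughly like the following. This is a minimal sketch, not the actual interceptor: the record field names, the `SENSITIVE_KEY` pattern, and the helper names `redactPayload` and `buildAuditEvent` are all assumptions made for illustration. It is written as plain TypeScript so the logic can be read independently of NestJS.

```typescript
// Hypothetical record shape: the attributes match the list above,
// but the field names are assumptions, not the real schema.
interface AuditEventRecord {
  actorId: string;
  action: string;
  targetEntity: string;
  tenantId: string;
  occurredAt: string; // ISO 8601 timestamp
  ip: string;
  userAgent: string;
  payload: unknown;
}

// Assumed pattern for keys whose values must never be stored.
const SENSITIVE_KEY = /password|secret|evidence[_-]?content/i;
const REDACTED = "[REDACTED]";

// Recursively copy a payload, replacing values stored under sensitive keys.
function redactPayload(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactPayload);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEY.test(key) ? REDACTED : redactPayload(inner);
    }
    return out;
  }
  return value;
}

// Build the row that would be appended to the tenant's audit_events table.
function buildAuditEvent(
  input: Omit<AuditEventRecord, "occurredAt">,
  now: Date = new Date(),
): AuditEventRecord {
  return {
    ...input,
    occurredAt: now.toISOString(),
    payload: redactPayload(input.payload),
  };
}
```

Redacting by key name is a deliberate simplification; it misses secrets embedded in free-text values, so a real implementation would likely pair it with per-endpoint allowlists.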
What gets captured:
- Activity (broad): logins, evidence uploads, finding triages, integration connections, questionnaire submissions, report exports, member invites
- Audit (security-specific): permission changes, MFA events, IdP connection changes, role assignments, password resets, API key rotations, tenant settings changes, schema migrations applied
Tier 2 — Internal operational logging → Logz.io
S.E.T's own operational observability. Logz.io is the destination; nothing customer-specific lives here.
Why Logz.io:
- Israeli company (HQ Tel Aviv) — aligned with Amendment 13 + Nimbus posture; Israeli vendor for Israeli-regulated workloads is a real advantage
- Native ELK stack as-a-service — familiar query language, Kibana visualizations, no Elasticsearch ops burden
- Cloud SIEM features included — primary alert pane for security events (GuardDuty findings + app errors + auth anomalies in one place)
- APM via OpenTelemetry — distributed tracing across NestJS BFF + workers
- Frontend error tracking — Logz.io browser SDK replaces a separate Sentry/Bugsnag vendor
- Default region: EU (Ireland) — covers global + EU + most Israeli customers under Israel-EU adequacy
- SOC 2 Type II + ISO 27001 + GDPR-compliant — listed as sub-processor in customer DPAs
Shipping architecture (Pattern B — CloudWatch transit → Lambda forwarder → Logz.io)
ECS Fargate logs ──┐
ALB access logs ───┤
CloudTrail ────────┼──► CloudWatch ──► Lambda shipper ──► LOGZ.IO
VPC Flow Logs ─────┤    (transit,                         (logs +
RDS Postgres logs ─┤    7-day buffer)                     APM +
GuardDuty findings ┤                                      SIEM +
ECS task events ───┤                                      frontend
GitHub Actions ────┘                                      errors)
                                                             ▲
OpenTelemetry traces ────────────────────────────────────────┤
Frontend error SDK (React SPA) ──────────────────────────────┘
Why this pattern (not direct FireLens sidecars):
- Single pipeline for app logs + AWS-service logs (no two-mechanism complexity)
- CloudWatch acts as a 7-day buffer if Logz.io is briefly unavailable
- Trade-off: slightly higher latency (~10–30s), acceptable for ops but not for live debugging
- One Lambda forwarder configured once, ships every CloudWatch log group
Sources flowing through CloudWatch:
- NestJS API logs (BFF stdout/stderr)
- All 3 worker task family logs: `api-workers`, `scanner-workers`, `ai-workers` (Phase 2)
- ALB access logs
- CloudTrail (every AWS API call)
- VPC Flow Logs (network audit trail)
- GuardDuty findings (via EventBridge → CloudWatch)
- RDS Postgres logs (slow queries, errors, connections)
- ECS task events (OOM, deploy failures, restarts)
- BCP / product availability data (workers write structured JSON)
- GitHub Actions deployment events (webhook → Lambda → CloudWatch)
Sources going directly to Logz.io (bypass CloudWatch):
- OpenTelemetry traces — high-volume, latency-sensitive; Logz.io has a native OTel collector endpoint
- Frontend errors — Logz.io browser SDK ships errors from the user's browser (CloudWatch doesn't accept browser-side traffic)
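For the CloudWatch path, the Lambda forwarder receives log batches through a CloudWatch Logs subscription, which delivers them as base64-encoded, gzip-compressed JSON under `awslogs.data`. The sketch below shows only the decode and reshape step; the output field names, the sample log group name, and the function names are assumptions, and the actual HTTPS delivery to Logz.io is left out.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

// Decoded CloudWatch Logs subscription payload.
interface CloudWatchLogsData {
  messageType: string; // "DATA_MESSAGE" or "CONTROL_MESSAGE"
  owner: string;
  logGroup: string;
  logStream: string;
  subscriptionFilters: string[];
  logEvents: { id: string; timestamp: number; message: string }[];
}

// Event shape handed to a Lambda by a CloudWatch Logs subscription filter.
interface SubscriptionEvent {
  awslogs: { data: string };
}

// base64 -> gzip -> JSON
function decodeSubscriptionEvent(event: SubscriptionEvent): CloudWatchLogsData {
  const compressed = Buffer.from(event.awslogs.data, "base64");
  return JSON.parse(gunzipSync(compressed).toString("utf8")) as CloudWatchLogsData;
}

// One JSON document per line, ready for a bulk HTTP upload.
// Output field names are illustrative, not a Logz.io requirement.
function toNdjson(data: CloudWatchLogsData): string {
  if (data.messageType !== "DATA_MESSAGE") return ""; // skip control messages
  return data.logEvents
    .map((e) =>
      JSON.stringify({
        "@timestamp": new Date(e.timestamp).toISOString(),
        message: e.message,
        logGroup: data.logGroup,
        logStream: data.logStream,
      }),
    )
    .join("\n");
}

// Usage: encode a sample batch the same way CloudWatch Logs does, then decode it.
const sample: CloudWatchLogsData = {
  messageType: "DATA_MESSAGE",
  owner: "123456789012",
  logGroup: "/ecs/api-workers", // hypothetical log group name
  logStream: "task/abc",
  subscriptionFilters: ["ship-to-logzio"],
  logEvents: [
    { id: "1", timestamp: 1704067200000, message: '{"level":"info","msg":"started"}' },
    { id: "2", timestamp: 1704067201000, message: '{"level":"error","msg":"boom"}' },
  ],
};
const sampleEvent: SubscriptionEvent = {
  awslogs: { data: gzipSync(JSON.stringify(sample)).toString("base64") },
};
const ndjson = toNdjson(decodeSubscriptionEvent(sampleEvent));
```

Keeping `logGroup` on every shipped line is what lets a single forwarder serve every log group while still allowing per-source filtering downstream.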
Security alerting: GuardDuty as detection layer, Logz.io as primary SIEM
GuardDuty acts like an EDR for AWS infrastructure (anomalous API calls, compromised credentials, crypto-mining detection). Its findings — alongside app errors, failed-login bursts at WorkOS, RDS anomalies, and custom rule violations — funnel into Logz.io as the unified human-facing alert pane (analogous to EDR feeding into Splunk).
GuardDuty findings              ┐
RDS / Postgres anomalies        │
Failed-login bursts at WorkOS   ├──► CloudWatch ──► Lambda ──► Logz.io ──► Alert
Application errors / 5xx spikes │                              (single pane of glass)
Custom rule violations          ┘
                                                                  │
                                                                  ▼
                                                   PagerDuty / Slack / on-call SMS
AWS Security Hub is not used: Logz.io is the primary alert pane, and GuardDuty findings reach it through the CloudWatch pipeline (EventBridge → CloudWatch → Lambda) with no intermediate AWS aggregator.
MITRE ATT&CK tagging: every detection rule in Logz.io is tagged with the matching MITRE ATT&CK technique ID(s) where applicable — e.g. T1078 (valid accounts), T1190 (exploit public-facing app), T1059 (command-and-scripting interpreter), T1486 (data encrypted for impact). This makes findings traceable to real-world attacker behavior and feeds the Incident Response record classification. The complementary design-phase view — STRIDE at the design stage — is owned by the Security Requirements & Threat Modeling Policy.
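One way to keep the tagging consistent is to validate technique IDs wherever rule definitions are kept under version control. The sketch below assumes such a rule inventory exists; the `DetectionRule` shape and the rule names are hypothetical, and only the technique IDs come from the examples above. ATT&CK technique IDs follow the form `T1078`, with sub-techniques written like `T1059.001`.

```typescript
// Hypothetical shape for a rule inventory kept alongside the Logz.io rules.
interface DetectionRule {
  name: string;
  attackTechniques: string[]; // empty where no technique applies
}

// Technique IDs: "T" plus four digits, optional ".NNN" sub-technique suffix.
const TECHNIQUE_ID = /^T\d{4}(\.\d{3})?$/;

// Return the malformed IDs on a rule, if any.
function invalidTechniqueIds(rule: DetectionRule): string[] {
  return rule.attackTechniques.filter((id) => !TECHNIQUE_ID.test(id));
}

// Index rule names by technique, e.g. for Incident Response classification.
function rulesByTechnique(rules: DetectionRule[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const rule of rules) {
    for (const id of rule.attackTechniques) {
      index.set(id, [...(index.get(id) ?? []), rule.name]);
    }
  }
  return index;
}

// Illustrative rules only; the names are invented for this example.
const exampleRules: DetectionRule[] = [
  { name: "anomalous-console-login", attackTechniques: ["T1078"] },
  { name: "public-endpoint-exploit-attempt", attackTechniques: ["T1190"] },
  { name: "unexpected-shell-in-task", attackTechniques: ["T1059", "T1078"] },
];
const techniqueIndex = rulesByTechnique(exampleRules);
```

A format check like this only catches typos; it does not confirm that an ID exists in the current ATT&CK matrix.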
Compliance posture
- SOC 2 Type II: audit log evidence (both tiers), access controls, monitoring + alerting controls all satisfied
- Amendment 13 (PPL): customer audit logs stay in their tenant schema (no cross-border concerns); internal logs ship to Logz.io EU region (Israel-EU adequacy applies); breach notification supported by alert pipeline
- Nimbus Phase 2: Logz.io Israel offering (if available) or self-hosted ELK in Tel Aviv to be evaluated when first Nimbus customer materializes
- ISO 27001 Annex A.12 (2013 numbering; logging and monitoring are controls A.8.15 and A.8.16 in the 2022 edition): operations security + logging requirements satisfied
- GDPR: data subject access requests served from tenant-scoped audit log; Logz.io listed in customer DPAs
Retention
| Data | Where | Retention |
|---|---|---|
| Customer audit/activity (tenant-facing) | Tenant Postgres schema | 7 years |
| Internal app logs | Logz.io (hot) | 30 days |
| Internal app logs (archive) | S3 Glacier | 7 years |
| CloudTrail | S3 with Object Lock | 7 years |
| VPC Flow Logs | CloudWatch → S3 Glacier | 30 days hot, 1 year cold |
| GuardDuty findings | CloudWatch + Logz.io | 90 days hot, 7 years archived |
| Distributed traces | Logz.io APM | 14 days |
| Frontend errors | Logz.io | 30 days |
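The retention table can also be expressed as data, which makes it easier to check lifecycle policies against it. This is a sketch under assumptions: the key names and the helpers `retentionCutoff` and `isExpired` are invented for illustration, while the durations mirror the table.

```typescript
type Retention = { amount: number; unit: "days" | "years" };

const days = (amount: number): Retention => ({ amount, unit: "days" });
const years = (amount: number): Retention => ({ amount, unit: "years" });

// Keys are hypothetical labels; values mirror the retention table.
const RETENTION = {
  tenantAuditActivity: years(7),
  internalAppLogsHot: days(30),
  internalAppLogsArchive: years(7),
  cloudTrail: years(7),
  vpcFlowLogsHot: days(30),
  vpcFlowLogsCold: years(1),
  guardDutyHot: days(90),
  guardDutyArchive: years(7),
  distributedTraces: days(14),
  frontendErrors: days(30),
};

// Oldest timestamp that must still be kept, computed in UTC.
function retentionCutoff(now: Date, retention: Retention): Date {
  const cutoff = new Date(now.getTime());
  if (retention.unit === "days") {
    cutoff.setUTCDate(cutoff.getUTCDate() - retention.amount);
  } else {
    cutoff.setUTCFullYear(cutoff.getUTCFullYear() - retention.amount);
  }
  return cutoff;
}

// True when a record is older than its retention window.
function isExpired(recordedAt: Date, now: Date, retention: Retention): boolean {
  return recordedAt.getTime() < retentionCutoff(now, retention).getTime();
}
```

For example, `isExpired(someDate, new Date(), RETENTION.distributedTraces)` would report whether a trace has aged out of the 14-day window.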