cold_display_guard/docs/superpowers/specs/2026-06-09-webhook-case-management-design.md

Webhook Case Management Design

Goal: Add outbound webhooks plus a local case-management layer so the project can both push runtime facts to external systems and independently track pending/handled cases in the local management console.

Architecture: Keep the existing runtime event stream as the source of operational facts. Add a separate case-state layer that consumes selected runtime events, persists case state transitions, exposes management APIs, and emits case webhooks without mutating the underlying batch facts. Integrate manual handling and external callback handling through the same case-state model.

Tech Stack: Python 3.11+ standard library backend, JSONL persistence, Vite + vanilla JavaScript frontend, existing unittest and Node test suites.


Scope

This design extends the current project in four focused areas:

  1. Add outbound webhook delivery for runtime batch events.
  2. Add a local case model for operator workflow.
  3. Add management APIs for listing, summarizing, manually handling, and externally updating cases.
  4. Add frontend views and actions for local case operations.

The runtime batch engine remains the producer of factual detection events. Case handling is a downstream interpretation layer.

Current Constraints

  • The current runtime writes facts to logs/events.jsonl and diagnostics to logs/runtime_diagnostics.jsonl.
  • The management API is a small standard-library HTTP server and should stay that way.
  • The frontend already renders runtime metrics and runtime events and should continue to do so.
  • The user-selected workflow requires both manual handling and external callback handling.
  • The user-selected workflow requires both event webhooks and case webhooks.
  • The events that should enter the local pending-case flow are time_alarm, batch_pending_disposal, and warning_escalated.

Design Summary

The system is split into three cooperating layers:

  1. Batch event layer: Produces facts such as batch_started, time_alarm, batch_pending_disposal, batch_discarded, and warning_escalated. These remain append-only runtime facts.

  2. Case state layer: Consumes selected batch events and maintains a separate per-batch local case state. The case layer owns the pending/handled workflow and does not rewrite prior runtime facts.

  3. Integration layer: Delivers outbound event and case webhooks, accepts external case callbacks, and records webhook delivery attempts for audit and debugging.

Persistence Model

  • logs/events.jsonl: Existing runtime fact log. No schema removals.
  • logs/cases.jsonl: New append-only case transition log. Each line records a case snapshot after a state change.
  • logs/webhook_delivery.jsonl: New append-only webhook delivery audit log. Each line records the result of one attempted outbound delivery.

events.jsonl remains the source of factual batch history. cases.jsonl is the source of case workflow state. webhook_delivery.jsonl is operational telemetry only.
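
The append-only snapshot model above can be illustrated with a minimal sketch. The function names append_snapshot and restore_latest are hypothetical, and skipping unparseable lines during restore is an assumption rather than a stated requirement. The idea is that the last snapshot written for a case_id is the current state of that case.

```python
import json
from pathlib import Path


def append_snapshot(path: Path, snapshot: dict) -> None:
    """Append one case snapshot as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(snapshot, sort_keys=True) + "\n")


def restore_latest(path: Path) -> dict[str, dict]:
    """Replay the log from the top; the last snapshot per case_id wins."""
    latest: dict[str, dict] = {}
    if not path.exists():
        return latest
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                snapshot = json.loads(line)
            except json.JSONDecodeError:
                # Assumption: a torn or partial line is skipped, not fatal.
                continue
            latest[snapshot["case_id"]] = snapshot
    return latest
```

Because a handled snapshot is simply the most recent line for its case, replaying the file after a restart yields the handled state again without any extra bookkeeping.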

Case Model

Each batch can own at most one local case. A case is created or updated from selected batch events and then independently handled by a local operator or external callback.

Case fields

  • case_id
  • batch_id
  • camera_id
  • zone_id
  • zone_label
  • case_type
  • case_status
  • source_event
  • created_at
  • updated_at
  • handled_at
  • handled_by
  • handled_source
  • last_event_ts
  • payload

Case type values

  • time_alarm
  • pending_disposal
  • warning_escalated

Case status values

  • open
  • handled

Handled source values

  • manual
  • webhook_callback
  • auto_closed
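
As a rough illustration, the fields and allowed values above could be captured in a dataclass. The class name Case, the float epoch timestamps, the None defaults for the handled_* fields, and the validation in __post_init__ are all assumptions made for this sketch; the design does not fix field types.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional

CASE_TYPES = ("time_alarm", "pending_disposal", "warning_escalated")
CASE_STATUSES = ("open", "handled")
HANDLED_SOURCES = ("manual", "webhook_callback", "auto_closed")


@dataclass
class Case:
    case_id: str
    batch_id: str
    camera_id: str
    zone_id: str
    zone_label: str
    case_type: str
    case_status: str
    source_event: str
    created_at: float  # assumption: epoch seconds
    updated_at: float
    last_event_ts: float
    handled_at: Optional[float] = None
    handled_by: Optional[str] = None
    handled_source: Optional[str] = None
    payload: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject values outside the enumerations listed in the design.
        if self.case_type not in CASE_TYPES:
            raise ValueError(f"unknown case_type: {self.case_type!r}")
        if self.case_status not in CASE_STATUSES:
            raise ValueError(f"unknown case_status: {self.case_status!r}")
        if self.handled_source not in (None, *HANDLED_SOURCES):
            raise ValueError(f"unknown handled_source: {self.handled_source!r}")

    def to_snapshot(self) -> dict[str, Any]:
        """Return a plain dict suitable for one line of cases.jsonl."""
        return asdict(self)
```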

Case State Flow

  1. time_alarm: Create a case if one does not exist for the batch. If a case already exists, keep it open and refresh timestamps.

  2. batch_pending_disposal: Create a case if one does not exist. If one exists, update it in place and upgrade case_type to pending_disposal.

  3. warning_escalated: Update the same case in place and upgrade case_type to warning_escalated.

  4. Manual handling: Mark the case as handled, set handled_source=manual, record handled_by, and append the new snapshot to cases.jsonl.

  5. External callback handling: Mark the case as handled, set handled_source=webhook_callback, optionally record handled_by and source_ref, and append the new snapshot to cases.jsonl.

  6. batch_discarded: If the related case is still open, close it automatically with handled_source=auto_closed.

Handled cases must not reopen when stale older events are replayed or re-read. Only new event processing in forward time may mutate an existing case. Restore logic must preserve handled status across runtime/API restarts.
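
The event-driven part of this flow can be sketched as a pure function over case snapshots. Several details here are assumptions and not part of the design: the function name apply_event, numeric ts values, the case-{batch_id} identifier format, creating a case when warning_escalated arrives with no existing case, and leaving a handled case untouched by any later event.

```python
from typing import Optional

EVENT_TO_CASE_TYPE = {
    "time_alarm": "time_alarm",
    "batch_pending_disposal": "pending_disposal",
    "warning_escalated": "warning_escalated",
}


def apply_event(case: Optional[dict], event: dict) -> Optional[dict]:
    """Return a new snapshot, or the input object unchanged if nothing applies."""
    name, ts = event["event"], event["ts"]
    if case is not None:
        # Handled cases never reopen; stale or replayed events never mutate.
        if case["case_status"] == "handled" or ts <= case["last_event_ts"]:
            return case
    if name == "batch_discarded":
        if case is None:
            return None
        return {**case, "case_status": "handled", "handled_source": "auto_closed",
                "handled_at": ts, "updated_at": ts, "last_event_ts": ts}
    case_type = EVENT_TO_CASE_TYPE.get(name)
    if case_type is None:
        return case  # event is outside the case flow
    if case is None:
        return {
            "case_id": f"case-{event['batch_id']}",  # hypothetical id format
            "batch_id": event["batch_id"],
            "camera_id": event.get("camera_id"),
            "zone_id": event.get("zone_id"),
            "zone_label": event.get("zone_label"),
            "case_type": case_type,
            "case_status": "open",
            "source_event": name,
            "created_at": ts,
            "updated_at": ts,
            "handled_at": None,
            "handled_by": None,
            "handled_source": None,
            "last_event_ts": ts,
            "payload": dict(event),
        }
    updated = {**case, "updated_at": ts, "last_event_ts": ts}
    if name != "time_alarm":
        # time_alarm only refreshes timestamps on an existing case.
        updated["case_type"] = case_type
    return updated
```

A caller could compare the result with the input using `is` and append to cases.jsonl only when a new snapshot object comes back, which keeps replayed events from producing duplicate log lines.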

Backend Components

  • Create src/cold_display_guard/cases.py for case transition logic, persistence, restore, and summary helpers.
  • Create src/cold_display_guard/webhooks.py for webhook config parsing, payload building, synchronous delivery, and delivery audit logging.
  • Extend src/cold_display_guard/config.py for webhook configuration and case/log sink paths.
  • Extend src/cold_display_guard/main.py to feed runtime events into case persistence and webhook delivery.
  • Extend src/cold_display_guard/manage_api.py to expose case listing, case summary, manual handling, and token-protected callback handling.

API Design

All new endpoints stay under /api/manage/*.

  • GET /api/manage/cases: Query parameters: status=open|handled (optional), limit (optional).
  • GET /api/manage/cases/summary: Returns case counts and the latest update time.
  • POST /api/manage/cases/{case_id}/handle: Body: handled_by (required), note (optional).
  • POST /api/manage/webhooks/case-update: Body: case_id (required), status (required, must equal handled), handled_by (optional), source_ref (optional).

The callback endpoint must require the configured shared token in the X-Webhook-Token header and must reject unauthenticated updates.
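
A minimal sketch of the callback checks, assuming the handler has access to the request headers through a mapping with a get method (as http.server request handlers do). The helper names is_authorized and validate_callback_body are hypothetical, and rejecting every callback when no token is configured is an assumption. hmac.compare_digest is used so the comparison time does not leak how much of the token matched.

```python
import hmac


def is_authorized(headers, expected_token: str) -> bool:
    """Check the X-Webhook-Token header against the configured shared token."""
    if not expected_token:
        # Assumption: with no configured token, all callbacks are rejected.
        return False
    supplied = headers.get("X-Webhook-Token") or ""
    return hmac.compare_digest(
        supplied.encode("utf-8"), expected_token.encode("utf-8")
    )


def validate_callback_body(body: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the body is valid."""
    errors = []
    if not body.get("case_id"):
        errors.append("case_id is required")
    if body.get("status") != "handled":
        errors.append("status must equal handled")
    return errors
```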

Webhook Configuration

[webhooks]
enabled = true
event_url = "https://example.com/runtime-events"
case_url = "https://example.com/case-events"
callback_token = "shared-secret"
connect_timeout_seconds = 3
read_timeout_seconds = 5

Outbound Webhook Delivery

Event webhook payload core fields:

  • kind = "batch_event"
  • event
  • ts
  • batch_id
  • camera_id
  • zone_id
  • zone_label
  • severity
  • state

Case webhook payload core fields:

  • kind = "case_event"
  • action = "created" | "updated" | "handled"
  • case_id
  • case_type
  • case_status
  • batch_id
  • source_event
  • handled_source
  • updated_at
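
The two payload shapes above could be assembled from an event dict and a case snapshot roughly as follows. The function names build_event_payload and build_case_payload are hypothetical, and emitting None for a missing source field is an assumption; the design lists only the core field names.

```python
EVENT_FIELDS = (
    "event", "ts", "batch_id", "camera_id",
    "zone_id", "zone_label", "severity", "state",
)
CASE_FIELDS = (
    "case_id", "case_type", "case_status", "batch_id",
    "source_event", "handled_source", "updated_at",
)
CASE_ACTIONS = ("created", "updated", "handled")


def build_event_payload(event: dict) -> dict:
    """Project a runtime event onto the event webhook core fields."""
    payload = {"kind": "batch_event"}
    payload.update({key: event.get(key) for key in EVENT_FIELDS})
    return payload


def build_case_payload(case: dict, action: str) -> dict:
    """Project a case snapshot onto the case webhook core fields."""
    if action not in CASE_ACTIONS:
        raise ValueError(f"unknown case action: {action!r}")
    payload = {"kind": "case_event", "action": action}
    payload.update({key: case.get(key) for key in CASE_FIELDS})
    return payload
```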

Delivery rules:

  • Local runtime facts and case state must be persisted before webhook failure can affect control flow.
  • Webhook failure must append a line to logs/webhook_delivery.jsonl.
  • Webhook failure must not stop local event persistence or local case persistence.
  • This iteration of work does not add a retry queue.
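
A sketch of synchronous delivery that follows these rules, using urllib.request from the standard library. The function name deliver, the audit record fields, and the injectable opener parameter are assumptions. Note that urllib.request.urlopen accepts a single timeout value, so this sketch does not distinguish the connect and read timeouts from the configuration.

```python
import json
import time
import urllib.error
import urllib.request
from pathlib import Path


def deliver(url: str, payload: dict, audit_path: Path,
            timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """POST the payload once, record the attempt, and never raise on failure."""
    record = {"ts": time.time(), "url": url, "kind": payload.get("kind"),
              "ok": False, "status": None, "error": None}
    try:
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with opener(request, timeout=timeout) as response:
            record["status"] = response.status
            record["ok"] = 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # HTTP errors, refused connections, timeouts, and malformed URLs
        # are all recorded in the audit line instead of propagating.
        record["error"] = repr(exc)
    audit_path.parent.mkdir(parents=True, exist_ok=True)
    with audit_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record["ok"]
```

Calling deliver only after the event and case snapshot have been written keeps a webhook failure from affecting local persistence, and the boolean return value lets the caller ignore the outcome entirely.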

Frontend Changes

  • Keep the current runtime event table for factual runtime events only.
  • Add a separate case table with:
    • case_id
    • case_type
    • case_status
    • zone_label
    • batch_id
    • created_at
    • updated_at
    • handled_source
  • Add manual-handle UI for open cases with handled_by required and note optional.
  • Add summary cards for:
    • open_case_count
    • handled_case_count
    • time_alarm_case_count
    • pending_disposal_case_count
    • warning_escalated_case_count
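
On the backend side, the values behind these summary cards could be computed from the current case snapshots along the following lines. The function name summarize and the latest_updated_at key are hypothetical, and counting the per-type cards across all cases regardless of status is an assumption; the design does not say whether they cover open cases only.

```python
from typing import Optional

COUNT_KEYS = (
    "open_case_count",
    "handled_case_count",
    "time_alarm_case_count",
    "pending_disposal_case_count",
    "warning_escalated_case_count",
)


def summarize(cases: list[dict]) -> dict:
    """Count cases by status and by type, and track the newest updated_at."""
    summary: dict = {key: 0 for key in COUNT_KEYS}
    latest: Optional[float] = None
    for case in cases:
        status_key = f"{case['case_status']}_case_count"
        type_key = f"{case['case_type']}_case_count"
        for key in (status_key, type_key):
            if key in summary:
                summary[key] += 1
        updated = case.get("updated_at")
        if updated is not None and (latest is None or updated > latest):
            latest = updated
    summary["latest_updated_at"] = latest  # hypothetical field name
    return summary
```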

Testing Plan

  • Preserve existing batch engine behavior tests.
  • Add case tests for create, escalate, manual handle, callback handle, auto-close, and non-reopen behavior.
  • Add webhook tests for payloads, delivery success, and failure audit logging.
  • Add API tests for new case and callback endpoints.
  • Add frontend tests for case rendering, case summary mapping, and manual-handle request flow.

Verification commands:

  • eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest discover -s tests -v
  • node --test web/test/zone-state.test.js
  • cd web && pnpm build