feat: add webhook case management
@@ -0,0 +1,216 @@
# Webhook Case Management Design

**Goal:** Add outbound webhooks plus a local case-management layer so the project can both push runtime facts to external systems and independently track pending/handled cases in the local management console.

**Architecture:** Keep the existing runtime event stream as the source of operational facts. Add a separate case-state layer that consumes selected runtime events, persists case state transitions, exposes management APIs, and emits case webhooks without mutating the underlying batch facts. Integrate manual handling and external callback handling through the same case-state model.

**Tech Stack:** Python 3.11+ standard library backend, JSONL persistence, Vite + vanilla JavaScript frontend, existing unittest and Node test suites.

---

## Scope

This design extends the current project in four focused areas:

1. Add outbound webhook delivery for runtime batch events.
2. Add a local case model for operator workflow.
3. Add management APIs for listing, summarizing, manually handling, and externally updating cases.
4. Add frontend views and actions for local case operations.

The runtime batch engine remains the producer of factual detection events. Case handling is a downstream interpretation layer.

## Current Constraints

- The current runtime writes facts to `logs/events.jsonl` and diagnostics to `logs/runtime_diagnostics.jsonl`.
- The management API is a small standard-library HTTP server and should stay that way.
- The frontend already renders runtime metrics and runtime events and should continue to do so.
- The user-selected workflow requires both manual handling and external callback handling.
- The user-selected workflow requires both event webhooks and case webhooks.
- The events that should enter the local pending-case flow are `time_alarm`, `batch_pending_disposal`, and `warning_escalated`.

## Design Summary

The system is split into three cooperating layers:

1. **Batch event layer**
   Produces facts such as `batch_started`, `time_alarm`, `batch_pending_disposal`, `batch_discarded`, and `warning_escalated`. These remain append-only runtime facts.

2. **Case state layer**
   Consumes selected batch events and maintains a separate per-batch local case state. The case layer owns pending/handled workflow and does not rewrite prior runtime facts.

3. **Integration layer**
   Delivers outbound event and case webhooks, accepts external case callbacks, and records webhook delivery attempts for audit and debugging.

## Persistence Model

- `logs/events.jsonl`
  Existing runtime fact log. No schema removals.
- `logs/cases.jsonl`
  New append-only case transition log. Each line records a case snapshot after a state change.
- `logs/webhook_delivery.jsonl`
  New append-only webhook delivery audit log. Each line records an attempted outbound delivery result.

`events.jsonl` remains the source of factual batch history. `cases.jsonl` is the source of case workflow state. `webhook_delivery.jsonl` is operational telemetry only.

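A minimal sketch of how an append-only snapshot log like `cases.jsonl` could be written and restored, assuming the latest snapshot per `case_id` wins on replay. The function names `append_snapshot` and `restore_cases` are hypothetical and are not taken from the project; skipping unreadable lines is also an assumption.

```python
import json
from pathlib import Path


def append_snapshot(path: Path, snapshot: dict) -> None:
    """Append one case snapshot as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(snapshot, ensure_ascii=False) + "\n")


def restore_cases(path: Path) -> dict[str, dict]:
    """Replay the log in order; the last snapshot per case_id wins."""
    cases: dict[str, dict] = {}
    if not path.exists():
        return cases
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                snapshot = json.loads(line)
            except json.JSONDecodeError:
                continue  # assumption: a torn or corrupt line is skipped
            if isinstance(snapshot, dict) and "case_id" in snapshot:
                cases[snapshot["case_id"]] = snapshot
    return cases
```

Because each line is a full snapshot rather than a delta, restore needs no knowledge of the transition rules: a handled case stays handled after a restart simply because its last line says so.
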
## Case Model

Each batch can own at most one local case. A case is created or updated from selected batch events and then independently handled by a local operator or external callback.

### Case fields

- `case_id`
- `batch_id`
- `camera_id`
- `zone_id`
- `zone_label`
- `case_type`
- `case_status`
- `source_event`
- `created_at`
- `updated_at`
- `handled_at`
- `handled_by`
- `handled_source`
- `last_event_ts`
- `payload`

### Case type values

- `time_alarm`
- `pending_disposal`
- `warning_escalated`

### Case status values

- `open`
- `handled`

### Handled source values

- `manual`
- `webhook_callback`
- `auto_closed`

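The field and value lists above could be captured in a small dataclass. This is a sketch only: the class name `Case`, the use of float epoch seconds for timestamps, and the choice of which fields default to `None` are assumptions, since the design lists field names but not their types.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional

CASE_TYPES = ("time_alarm", "pending_disposal", "warning_escalated")
CASE_STATUSES = ("open", "handled")
HANDLED_SOURCES = ("manual", "webhook_callback", "auto_closed")


@dataclass
class Case:
    case_id: str
    batch_id: str
    camera_id: str
    zone_id: str
    zone_label: str
    case_type: str
    case_status: str
    source_event: str
    created_at: float  # assumption: epoch seconds
    updated_at: float
    last_event_ts: float
    handled_at: Optional[float] = None
    handled_by: Optional[str] = None
    handled_source: Optional[str] = None
    payload: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject values outside the enumerations listed in the design.
        if self.case_type not in CASE_TYPES:
            raise ValueError(f"unknown case_type: {self.case_type!r}")
        if self.case_status not in CASE_STATUSES:
            raise ValueError(f"unknown case_status: {self.case_status!r}")
        if self.handled_source is not None and self.handled_source not in HANDLED_SOURCES:
            raise ValueError(f"unknown handled_source: {self.handled_source!r}")

    def to_snapshot(self) -> dict[str, Any]:
        """Return a JSON-serializable dict suitable for one cases.jsonl line."""
        return asdict(self)
```
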
## Case State Flow

1. `time_alarm`
   Create a case if one does not exist for the batch. If a case already exists, keep it open and refresh timestamps.

2. `batch_pending_disposal`
   Create a case if one does not exist. If one exists, update it in place and upgrade `case_type` to `pending_disposal`.

3. `warning_escalated`
   Update the same case in place and upgrade `case_type` to `warning_escalated`.

4. Manual handling
   Mark the case as `handled`, set `handled_source=manual`, record `handled_by`, and append the new snapshot to `cases.jsonl`.

5. External callback handling
   Mark the case as `handled`, set `handled_source=webhook_callback`, optionally record `handled_by` and `source_ref`, and append the new snapshot to `cases.jsonl`.

6. `batch_discarded`
   If the related case is still `open`, close it automatically with `handled_source=auto_closed`.

Handled cases must not reopen when stale older events are replayed or re-read. Only new event processing in forward time may mutate an existing case. Restore logic must preserve handled status across runtime/API restarts.

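The transitions above can be sketched as a pure function over plain dict snapshots. Several details here are assumptions rather than stated design: the `case-<batch_id>` id scheme, the ranking used to interpret "upgrade", creating a case when `warning_escalated` arrives with none present, treating an equal timestamp as stale, and treating `handled` as terminal even for newer events (the design only says stale events must not reopen a case). The names `apply_event` and `mark_handled` are hypothetical.

```python
from typing import Optional

# Assumption: "upgrade" means case_type only ever moves up this ranking.
TYPE_RANK = {"time_alarm": 0, "pending_disposal": 1, "warning_escalated": 2}
EVENT_TO_TYPE = {
    "time_alarm": "time_alarm",
    "batch_pending_disposal": "pending_disposal",
    "warning_escalated": "warning_escalated",
}


def apply_event(case: Optional[dict], event: dict) -> Optional[dict]:
    """Return the new case snapshot, or None when the event changes nothing."""
    name, ts = event["event"], event["ts"]
    if case is not None:
        if case["case_status"] == "handled":
            return None  # assumption: handled is terminal in this sketch
        if ts <= case["last_event_ts"]:
            return None  # stale or replayed event; only forward time mutates
    if name == "batch_discarded":
        if case is None:
            return None
        return {**case, "case_status": "handled", "handled_source": "auto_closed",
                "handled_at": ts, "updated_at": ts, "last_event_ts": ts}
    new_type = EVENT_TO_TYPE.get(name)
    if new_type is None:
        return None  # not one of the events that enter the case flow
    if case is None:
        return {
            "case_id": f"case-{event['batch_id']}",  # hypothetical id scheme
            "batch_id": event["batch_id"],
            "camera_id": event.get("camera_id"),
            "zone_id": event.get("zone_id"),
            "zone_label": event.get("zone_label"),
            "case_type": new_type, "case_status": "open", "source_event": name,
            "created_at": ts, "updated_at": ts, "last_event_ts": ts,
            "handled_at": None, "handled_by": None, "handled_source": None,
            "payload": {},
        }
    updated = {**case, "updated_at": ts, "last_event_ts": ts}
    if TYPE_RANK[new_type] > TYPE_RANK[case["case_type"]]:
        updated["case_type"] = new_type
        updated["source_event"] = name  # assumption: track the upgrading event
    return updated


def mark_handled(case: dict, source: str, now: float,
                 handled_by: Optional[str] = None) -> Optional[dict]:
    """Manual or callback handling; returns None if already handled."""
    if case["case_status"] == "handled":
        return None
    return {**case, "case_status": "handled", "handled_source": source,
            "handled_by": handled_by, "handled_at": now, "updated_at": now}
```

Returning `None` for a no-op lets the caller skip both the `cases.jsonl` append and the case webhook when nothing changed.
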
## Backend Components

- Create `src/cold_display_guard/cases.py` for case transition logic, persistence, restore, and summary helpers.
- Create `src/cold_display_guard/webhooks.py` for webhook config parsing, payload building, synchronous delivery, and delivery audit logging.
- Extend `src/cold_display_guard/config.py` for webhook configuration and case/log sink paths.
- Extend `src/cold_display_guard/main.py` to feed runtime events into case persistence and webhook delivery.
- Extend `src/cold_display_guard/manage_api.py` to expose case listing, case summary, manual handling, and token-protected callback handling.

## API Design

All new endpoints stay under `/api/manage/*`.

- `GET /api/manage/cases`
  Query: `status=open|handled` optional, `limit` optional.
- `GET /api/manage/cases/summary`
  Returns case counts and latest update time.
- `POST /api/manage/cases/{case_id}/handle`
  Body: `handled_by` required, `note` optional.
- `POST /api/manage/webhooks/case-update`
  Body: `case_id` required, `status` required and must equal `handled`, `handled_by` optional, `source_ref` optional.

The callback endpoint must require the configured shared token in the `X-Webhook-Token` header and must reject unauthenticated updates.

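A sketch of the token check and body validation for the callback endpoint, assuming the handler passes in its request headers and the parsed JSON body. The helper names `is_authorized` and `validate_callback_body` are hypothetical, and rejecting every request when no token is configured is an assumption. `hmac.compare_digest` is used so the comparison does not leak timing information.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_token: Optional[str]) -> bool:
    """Check the X-Webhook-Token header against the configured shared token."""
    if not expected_token:
        return False  # assumption: no configured token means reject everything
    supplied = headers.get("X-Webhook-Token") or ""
    return hmac.compare_digest(supplied.encode("utf-8"), expected_token.encode("utf-8"))


def validate_callback_body(body: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the body is valid."""
    errors: list[str] = []
    if not isinstance(body.get("case_id"), str) or not body["case_id"]:
        errors.append("case_id is required")
    if body.get("status") != "handled":
        errors.append("status must equal 'handled'")
    for key in ("handled_by", "source_ref"):
        if key in body and not isinstance(body[key], str):
            errors.append(f"{key} must be a string")
    return errors
```

With the standard-library `http.server`, `self.headers` can be passed directly as `headers`; its `get` is case-insensitive, whereas a plain dict lookup is not.
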
## Webhook Configuration

```toml
[webhooks]
enabled = true
event_url = "https://example.com/runtime-events"
case_url = "https://example.com/case-events"
callback_token = "shared-secret"
connect_timeout_seconds = 3
read_timeout_seconds = 5
```

## Outbound Webhook Delivery

Event webhook payload core fields:

- `kind = "batch_event"`
- `event`
- `ts`
- `batch_id`
- `camera_id`
- `zone_id`
- `zone_label`
- `severity`
- `state`

Case webhook payload core fields:

- `kind = "case_event"`
- `action = "created" | "updated" | "handled"`
- `case_id`
- `case_type`
- `case_status`
- `batch_id`
- `source_event`
- `handled_source`
- `updated_at`

Delivery rules:

- Local runtime facts and case state must be persisted before any webhook delivery is attempted, so a webhook failure cannot affect local control flow.
- Webhook failure must append a line to `logs/webhook_delivery.jsonl`.
- Webhook failure must not stop local event persistence or local case persistence.
- This change does not add a retry queue.

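A minimal sketch of synchronous delivery that follows the rules above: it never raises on network errors and always writes an audit line. The function name `deliver` and the audit record fields (`ts`, `url`, `kind`, `ok`, `status`, `error`) are assumptions. Note that `urllib.request.urlopen` accepts a single `timeout`, so how the separate connect and read timeouts from the configuration map onto it is left open here.

```python
import json
import time
import urllib.error
import urllib.request
from pathlib import Path


def deliver(url: str, payload: dict, audit_path: Path, timeout: float = 5.0) -> bool:
    """POST payload as JSON; record the attempt; return True on a 2xx response."""
    record = {"ts": time.time(), "url": url, "kind": payload.get("kind"),
              "ok": False, "status": None, "error": None}
    try:
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            method="POST",
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            record["status"] = response.status
            record["ok"] = 200 <= response.status < 300
    except urllib.error.HTTPError as exc:
        # The server answered with an error status (4xx/5xx).
        record["status"] = exc.code
        record["error"] = str(exc)
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Connection failure, timeout, or malformed URL.
        record["error"] = str(exc)
    # Audit every attempt, successful or not (hypothetical record layout).
    audit_path.parent.mkdir(parents=True, exist_ok=True)
    with audit_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record["ok"]
```

The caller would persist the event and case snapshot first and call `deliver` afterwards, ignoring a `False` result apart from the audit line, since no retry queue exists.
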
## Frontend Changes

- Keep the current runtime event table for factual runtime events only.
- Add a separate case table with:
  - `case_id`
  - `case_type`
  - `case_status`
  - `zone_label`
  - `batch_id`
  - `created_at`
  - `updated_at`
  - `handled_source`
- Add manual-handle UI for `open` cases with `handled_by` required and `note` optional.
- Add summary cards for:
  - `open_case_count`
  - `handled_case_count`
  - `time_alarm_case_count`
  - `pending_disposal_case_count`
  - `warning_escalated_case_count`

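On the backend side, the summary cards could be fed by a helper along these lines. This is a sketch under assumptions: the function name `summarize_cases` and the `latest_updated_at` key are hypothetical, and the per-type counts here include both open and handled cases, which the design does not specify.

```python
from collections import Counter
from typing import Iterable


def summarize_cases(cases: Iterable[dict]) -> dict:
    """Count cases by status and type and report the latest update time."""
    cases = list(cases)
    by_status = Counter(c["case_status"] for c in cases)
    by_type = Counter(c["case_type"] for c in cases)
    return {
        "open_case_count": by_status["open"],
        "handled_case_count": by_status["handled"],
        "time_alarm_case_count": by_type["time_alarm"],
        "pending_disposal_case_count": by_type["pending_disposal"],
        "warning_escalated_case_count": by_type["warning_escalated"],
        # Hypothetical key name for "latest update time".
        "latest_updated_at": max((c["updated_at"] for c in cases), default=None),
    }
```
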
## Testing Plan

- Preserve existing batch engine behavior tests.
- Add case tests for create, escalate, manual handle, callback handle, auto-close, and non-reopen behavior.
- Add webhook tests for payloads, delivery success, and failure audit logging.
- Add API tests for new case and callback endpoints.
- Add frontend tests for case rendering, case summary mapping, and manual-handle request flow.

Verification commands:

- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest discover -s tests -v`
- `node --test web/test/zone-state.test.js`
- `cd web && pnpm build`