fix: preserve handled display cabinet cases
@@ -1,5 +1,37 @@
# Task Todo
## Current Task: Runtime/API Case State Reopen Fix
**Goal:** When the management API marks a display-cabinet case as handled, the runtime process must not later append a newer `open` snapshot for the same case from stale in-memory state.
- [x] Add a failing regression test verifying that an API-written `handled` state is preserved when the runtime persists later events.
- [x] Fix runtime case persistence to reconcile with the latest JSONL snapshots before applying new events.
- [x] Run targeted case/runtime tests.
- [x] Record remote chain verification and deployment status.
### Findings
- On `xiaozheng@10.8.0.23`, `case_batch_000911` was marked `handled` at `2026-06-15T07:27:12Z` (`15:27:12+08:00`); the runtime then appended a newer `open` snapshot for the same case at `2026-06-15T15:38:03+08:00`, about 11 minutes later.
- The API and runtime are separate processes sharing `logs/cases.jsonl`. The runtime keeps a long-lived `CaseStore` loaded at startup, so it never saw the API-written `handled` snapshot.
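The fix of reconciling with the latest JSONL snapshots before applying new events can be illustrated with a simplified guard. This is a minimal sketch, not the code in `main.py`: the record fields `case_id` and `status`, the helper names `load_latest_snapshots` and `persist_case_update`, and the "never downgrade `handled` to `open`" rule are assumptions. The only facts taken from the findings are that both processes append to a shared `logs/cases.jsonl` and that the runtime's in-memory state can be stale.

```python
import json
import tempfile
from pathlib import Path


def load_latest_snapshots(path: Path) -> dict[str, dict]:
    """Return the last snapshot per case_id; later JSONL lines supersede earlier ones."""
    latest: dict[str, dict] = {}
    if not path.exists():
        return latest
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                record = json.loads(line)
                latest[record["case_id"]] = record
    return latest


def persist_case_update(path: Path, snapshot: dict) -> bool:
    """Append `snapshot` unless the on-disk state says the case is already handled.

    The shared file is re-read right before writing, so a `handled` snapshot
    appended by another process (the API) wins over stale in-memory state.
    Returns True when a line was appended.
    """
    on_disk = load_latest_snapshots(path).get(snapshot["case_id"])
    if on_disk and on_disk.get("status") == "handled" and snapshot.get("status") == "open":
        return False  # assumed rule: a stale runtime update never reopens a handled case
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(snapshot) + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cases = Path(tmp) / "cases.jsonl"
        case_id = "case_batch_000911"
        persist_case_update(cases, {"case_id": case_id, "status": "open"})  # runtime opens the case
        persist_case_update(cases, {"case_id": case_id, "status": "handled", "handled_source": "manual"})  # API
        appended = persist_case_update(cases, {"case_id": case_id, "status": "open"})  # stale runtime copy
        print(appended, load_latest_snapshots(cases)[case_id]["status"])  # False handled
```

Re-reading before each append narrows the window but does not make read-then-append atomic across processes; if the API and runtime can write at the same moment, an advisory lock around the read and the append (for example `fcntl.flock` on POSIX) would also be needed.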
### Verification
- RED:
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests.test_main.RuntimeRestoreTests.test_persist_case_updates_preserves_api_handled_snapshot -v`
  - Result before fix: failed because runtime appended a later `open` snapshot.
- Local targeted verification:
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests.test_main.RuntimeRestoreTests.test_persist_case_updates_preserves_api_handled_snapshot -v`
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_cases.py -v`
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_main.py -v`
  - Result: all passed.
- Remote deployment:
  - Synced only `src/cold_display_guard/main.py` to `xiaozheng@10.8.0.23:/home/xiaozheng/cold_display_guard/src/cold_display_guard/main.py`.
  - Ran `docker compose --env-file deploy/cold-display-guard.env -f deploy/docker-compose.yml up -d --build cold-display-guard-runtime`.
  - Compose recreated `cold-display-guard-api` and `cold-display-guard-runtime`; health check returned `status=ok`.
- Remote behavior check:
  - Ran the same API-handled/runtime-later-event scenario inside `cold-display-guard-runtime` using a temp JSONL file.
  - Result: `{"handled_source": "manual", "latest_status": "handled", "new_snapshots": 0}`.

- [x] Review the current project instructions and check for task-relevant lessons.
- [x] Inspect the OTA upload API document and current runtime/webhook capture path.
- [x] Create an isolated worktree for alarm snapshot upload implementation.
@@ -339,3 +371,51 @@
- `GET /api/manage/health` returned `status=ok` and `runtime_status=running`.
- Running container uses `fontsize=13`, `boxcolor=black@0.34`, and `boxborderw=2` for region labels.
- `cold-display-guard-runtime` logs show normal startup after restart.
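The label settings reported for the running container (`fontsize=13`, `boxcolor=black@0.34`, `boxborderw=2`) match option names of FFmpeg's `drawtext` filter. Assuming region labels are rendered with `drawtext` (this log does not confirm it), a filter string could be assembled as below. `drawtext_label` is a hypothetical helper, the `x`/`y` values are illustrative, and `box=1` is added because `drawtext` only draws the background box when it is enabled.

```python
def drawtext_label(text: str, x: int, y: int) -> str:
    """Build an FFmpeg drawtext filter for one region label (hypothetical helper).

    Label text is restricted to characters that need no filtergraph escaping,
    since drawtext escaping rules are easy to get wrong.
    """
    if any(ch in text for ch in ":'\\,;[]="):
        raise ValueError(f"label needs escaping, not handled here: {text!r}")
    options = [
        ("text", text),
        ("x", x),
        ("y", y),
        ("fontsize", 13),
        ("box", 1),  # draw a background box behind the text
        ("boxcolor", "black@0.34"),  # black at 0.34 opacity
        ("boxborderw", 2),  # width of the box border around the text, in pixels
    ]
    return "drawtext=" + ":".join(f"{key}={value}" for key, value in options)


if __name__ == "__main__":
    print(drawtext_label("zone1", 10, 20))
    # drawtext=text=zone1:x=10:y=20:fontsize=13:box=1:boxcolor=black@0.34:boxborderw=2
```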
## Current Task: Limit Alert Snapshot Overlay To Event Zones
**Goal:** Uploaded warning/alarm screenshots should draw the cold-display region polygons and names only for the zones that actually triggered the warning/alarm event. Other configured zones and the trash ROI should not be drawn on those uploaded screenshots.

**Plan:** Keep the full calibration overlay helper available for tests and general use, but pass the alert event zone IDs from `capture_alert_snapshot` into the overlay loader and disable trash ROI drawing for alert uploads.
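A minimal sketch of the selection step in this plan, assuming overlay regions are dicts with `zone_id`, `name`, and `polygon` keys and that alert events carry a `zone_id`. `select_overlay_regions` and `alert_overlay_regions` are hypothetical names, not the actual functions in `alarm_snapshots.py`; calling the selector without a filter keeps the full calibration overlay.

```python
from typing import Iterable, Optional


def select_overlay_regions(
    regions: list[dict],
    trash_roi: Optional[dict],
    zone_ids: Optional[Iterable[str]] = None,
    draw_trash: bool = True,
) -> tuple[list[dict], Optional[dict]]:
    """Pick which polygons to draw.

    zone_ids=None keeps every configured region (full calibration overlay);
    otherwise only regions whose zone_id is in zone_ids are kept.
    """
    if zone_ids is None:
        selected = list(regions)
    else:
        wanted = set(zone_ids)
        selected = [region for region in regions if region["zone_id"] in wanted]
    return selected, (trash_roi if draw_trash else None)


def alert_overlay_regions(events: list[dict], regions: list[dict], trash_roi: Optional[dict]):
    """Alert uploads: only the zones that triggered the events, and never the trash ROI."""
    zone_ids = {event["zone_id"] for event in events if event.get("zone_id")}
    return select_overlay_regions(regions, trash_roi, zone_ids=zone_ids, draw_trash=False)
```

Keeping the filter as optional arguments on one selector, rather than a separate alert-only loader, is what lets the unfiltered call path behave exactly as before.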
- [x] Add a regression test proving alert snapshot upload only annotates the triggering event zone.
- [x] Filter snapshot overlay regions by event `zone_id` during alert upload.
- [x] Preserve full overlay behavior when `apply_calibration_overlay` is called without filters.
- [x] Run full local Python verification.
- [x] Deploy `alarm_snapshots.py` to `xiaozheng@10.8.0.23`.
- [x] Verify remote API/runtime health and deployed filtered-overlay behavior.
### Review
- Local verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`104` tests)
- Deployed only `src/cold_display_guard/alarm_snapshots.py` to `xiaozheng@10.8.0.23` after backing up the previous remote file; live config was not overwritten.
- Rebuilt `cold-display-guard:dev` and restarted `cold-display-guard-api` plus `cold-display-guard-runtime`.
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok` and `runtime_status=running`.
  - Container-side smoke test for a zone-1 alert returned `zone1_changed=True`, `zone2_unchanged=True`, and `trash_unchanged=True`.
  - API/runtime logs show normal startup after restart.
## Current Task: Check Webhook Duplicate Delivery
**Goal:** Verify whether `cold_display_guard` is sending duplicate Webhook requests to `video-recognition` on `xiaozheng@10.8.0.23`.

**Investigation:** Compare the sending code path, remote webhook delivery audit, retry queue state, cold-display event/case logs, `video-recognition` HTTP logs, and the receiver-side JSONL payloads.
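Several of these comparisons come down to counting repeated keys across JSONL records. A stdlib sketch, assuming one JSON object per line; `read_jsonl` and `duplicate_keys` are hypothetical helpers, and the sample records are invented for illustration (loosely modeled on the `batch_000898` case, with a made-up `zone_id`). The fine-grained key mirrors the `(batch_id, event, ts, zone_id)` tuple used for `events.jsonl`; the coarse key shows how two distinct events from one frame can look like a duplicate.

```python
import json
from collections import Counter
from typing import Callable, Hashable, Iterable


def read_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse one JSON object per non-blank line (accepts an open file or a list of strings)."""
    return [json.loads(line) for line in lines if line.strip()]


def duplicate_keys(records: Iterable[dict], key: Callable[[dict], Hashable]) -> dict:
    """Return {key: count} for every key that occurs more than once."""
    counts = Counter(key(record) for record in records)
    return {k: n for k, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Invented sample: two different events from the same batch and frame time.
    sample = [
        '{"batch_id": "batch_000898", "event": "time_pre_warning", "ts": "13:20:26", "zone_id": "zone1"}',
        '{"batch_id": "batch_000898", "event": "pre_warning_handled", "ts": "13:20:26", "zone_id": "zone1"}',
    ]
    events = read_jsonl(sample)
    fine = duplicate_keys(events, lambda e: (e["batch_id"], e["event"], e["ts"], e["zone_id"]))
    coarse = duplicate_keys(events, lambda e: (e["batch_id"], e["ts"]))
    print(fine)    # {}
    print(coarse)  # {('batch_000898', '13:20:26'): 2}
```

Against the real sender audit, the same helper could be run over `read_jsonl(open(path, encoding="utf-8"))` with `lambda r: r["task_id"]`, assuming audit records store the receiver-returned ID under that field name.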
- [x] Inspect the sender code path for direct event/case delivery and retry drain behavior.
- [x] Confirm that the remote Webhook config uses the same URL for `event_url` and `case_url`.
- [x] Check the sender delivery audit for duplicate receiver `task_id` values.
- [x] Check the retry queue for pending retries that could redeliver an already-successful request.
- [x] Check the receiver-side cold-display JSONL for duplicate payloads and duplicate business keys.
- [x] Trace the only coarse duplicate-looking case around `batch_000898`.
### Review
- The current remote config sends both `batch_event` and `case_event` to `http://10.8.0.23:8080/api/webhook/cold-display-guard`, so one business transition can produce two HTTP POSTs to the same endpoint with different `kind` values.
- The sender audit `logs/webhook_delivery.jsonl` contains `3056` records in total; recent valid deliveries consist of `321` direct `ok` records and `0` retry `ok` records.
- Receiver-returned `task_id` values are unique: `321` unique task IDs and `0` duplicates.
- The retry queue's latest state covers `547` items, all `dead_letter`; no retries are pending.
- The receiver-side `video-recognition` cold-display files for `2026-06-15` contain `181` business payloads, with `0` exact payload duplicates and `0` fine-grained business-key duplicates.
- The sender `events.jsonl` contains `3325` events, with `0` duplicate `(batch_id, event, ts, zone_id)` keys.
- The only coarse duplicate-looking receiver entry was `batch_000898` at `13:20:26`: the same frame emitted both `time_pre_warning` and `pre_warning_handled`, which produced separate `case_event` actions (`created` and `handled`). This is not a repeated Webhook request.
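The retry-queue finding (`547` latest items, all `dead_letter`, none pending) implies collapsing an append-only queue log to the most recent record per item before counting statuses. A sketch under stated assumptions: each record has an `id` and a `status`, later records supersede earlier ones, and `pending` is an assumed status name (only `dead_letter` comes from the review). `latest_by_id` and `status_summary` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable


def latest_by_id(records: Iterable[dict], id_field: str = "id") -> dict[str, dict]:
    """Keep only the most recent record for each retry item (later records win)."""
    latest: dict[str, dict] = {}
    for record in records:
        latest[record[id_field]] = record
    return latest


def status_summary(records: Iterable[dict]) -> Counter:
    """Count statuses over the latest record of every retry item."""
    return Counter(record["status"] for record in latest_by_id(records).values())


if __name__ == "__main__":
    queue_log = [
        {"id": "r1", "status": "pending"},
        {"id": "r2", "status": "pending"},
        {"id": "r1", "status": "dead_letter"},
        {"id": "r2", "status": "dead_letter"},
    ]
    summary = status_summary(queue_log)
    print(summary)                    # Counter({'dead_letter': 2})
    print(summary.get("pending", 0))  # 0: nothing left that could be redelivered
```

Counting only the latest record per item matters here: counting every line would report stale `pending` entries for items that have since been dead-lettered.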