feat: draw calibration overlay on alarm snapshots

Before JPEG encoding and OTA upload, paint the configured [[zones]]
polygons (yellow) and the [trash].roi (red) directly onto a copied
Frame.rgb so uploaded alarm snapshots visually carry the calibrated
regions. Normalized coordinates are clamped to image bounds, the source
frame stays untouched for downstream runtime processing, and
non-alert/disabled paths are unchanged. Adds stdlib-only polygon
fill/outline helpers plus focused unit tests.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-15 12:34:46 +08:00
parent 547fb6290f
commit 46889c0621
3 changed files with 424 additions and 1 deletions


@@ -64,3 +64,168 @@
- `PYTHONPATH=src python3 -m unittest discover -s tests -v`
- Deployed updated code to `xiaozheng@10.8.0.11` without overwriting the remote `config/example.toml`, rebuilt `cold-display-guard:dev`, and restarted only `cold-display-guard-api` plus `cold-display-guard-runtime`.
- Natural post-deploy traffic did not arrive during the 2-minute observation window, so final runtime verification used the deployed container to build representative batch/case webhook payloads with the live remote config and confirmed `camera_ip = 192.168.3.4` plus all new downstream fields were present.
## Current Task: Deploy To 192.168.5.103
- [x] Inspect the existing deployment layout and active containers on `xiaozheng@192.168.5.103`.
- [x] Verify the exact webhook route on that host before writing config.
- [x] Sync the current project code to the remote deployment directory without overwriting the live RTSP and calibration config.
- [x] Configure the remote webhook settings for the local `video-recognition` receiver.
- [x] Rebuild and restart the remote API/runtime containers, then verify health and outbound webhook configuration.
### Deployment Findings
- Existing deployment path on `192.168.5.103` is `/home/xiaozheng/cold_display_guard`, not `~/apps/cold-display-guard/app`.
- The host already runs `cold-display-guard-api`, `cold-display-guard-runtime`, and `cold-display-guard-web` on ports `19080` and `23000`.
- The same host also runs `video-recognition`, and a direct probe to `http://127.0.0.1:8080/api/webhook/cold-display-guard` returned `200 OK`, so this is the verified webhook target for this environment.
### Deployment Verification
- From inside the running `cold-display-guard-api` container on `192.168.5.103`:
  - `http://host.docker.internal:8080/api/webhook/cold-display-guard` failed DNS resolution.
  - `http://172.17.0.1:8080/api/webhook/cold-display-guard` returned `200 OK`.
  - `http://192.168.5.103:8080/api/webhook/cold-display-guard` returned `200 OK`.
- The configured webhook target was set to `http://192.168.5.103:8080/api/webhook/cold-display-guard` for both `event_url` and `case_url`.
- Remote config was enriched to include:
  - `case_sink`
  - `alarm_snapshot_upload`
  - `webhook_retry_sink`
  - `webhook_delivery_sink`
  - `webhooks`
- Code sync used `rsync` with `config/example.toml` excluded so the live RTSP URL and calibration polygons were preserved.
- Remote rebuild/restart completed for `cold-display-guard-api` and `cold-display-guard-runtime`.
- Verified after restart:
  - `GET http://127.0.0.1:19080/api/manage/health` returned `status=ok`
  - `GET http://127.0.0.1:19080/api/manage/config` showed `webhooks.enabled=true`
  - `event_url` and `case_url` both active on `http://192.168.5.103:8080/api/webhook/cold-display-guard`
  - `alarm_snapshot_upload.enabled=true`
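
The reachability check in this section (try several candidate webhook URLs from inside the container, keep one that answers `200 OK`) could be scripted roughly as follows. This is a minimal sketch, not the procedure that was actually run: the helper names `http_status` and `first_reachable` are hypothetical, and using a plain `GET` is an assumption, since the log does not record which HTTP method the probe used.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_status(url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status of a GET on url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def first_reachable(
    candidates: Iterable[str],
    probe: Callable[[str], Optional[int]] = http_status,
) -> Optional[str]:
    """Return the first candidate whose probe yields 200, or None."""
    for url in candidates:
        if probe(url) == 200:
            return url
    return None
```

Candidates are checked in the order given, so the preferred address (here, the host's LAN address rather than the Docker bridge address) would be listed first. The `probe` parameter exists only so the selection logic can be exercised without network access.
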
## Current Task: Alarm Snapshot Calibration Overlay
**Goal:** Webhook-linked uploaded alarm snapshots should visually include the calibrated cold display zones and trash confirmation ROI from the current config.
**Design:** Keep the existing runtime flow intact: capture current RTSP frame, process events, then upload an alarm snapshot only for warning/alarm events. Before JPEG encoding, build overlay regions from `[[zones]]` plus `[trash].roi`, clamp normalized polygon coordinates to the image bounds, draw a semi-transparent fill and visible outline directly onto a copied `Frame.rgb`, and pass that annotated frame to the existing encoder/uploader. Do not change `BatchEngine`, Webhook payload shape, OTA upload protocol, or management snapshot capture.
- [x] Review task-relevant lessons and current dirty worktree.
- [x] Inspect `alarm_snapshots.py`, `main.py`, config polygon shape, and existing tests.
- [x] Write a failing unit test proving alert snapshot upload encodes an annotated frame when zones/trash ROI are configured.
- [x] Write focused unit tests for polygon overlay behavior using a tiny RGB frame.
- [x] Run targeted tests and confirm the new tests fail for the expected missing overlay behavior.
- [x] Implement the smallest standard-library overlay helper in `src/cold_display_guard/alarm_snapshots.py`.
- [x] Wire `capture_alert_snapshot` to apply configured overlays before JPEG encoding.
- [x] Run targeted snapshot/runtime tests.
- [x] Run the full Python test suite.
### Review
- Added `apply_calibration_overlay` in `src/cold_display_guard/alarm_snapshots.py` to draw configured food-zone polygons in yellow and the trash ROI in red onto a copied frame before JPEG encoding and OTA upload.
- The overlay clamps normalized coordinates to image bounds, draws semi-transparent fills plus outlines, and leaves the original `Frame.rgb` unchanged for downstream runtime processing.
- `capture_alert_snapshot` now encodes the annotated frame when warning/alarm events trigger snapshot upload; non-alert events and disabled upload behavior are unchanged.
- Targeted verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`
  - `PYTHONPATH=src python3 -m unittest tests/test_main.py -v`
- Full verification passed:
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v`
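
To make the overlay behavior described in this review concrete, here is a stdlib-only sketch of the core idea: clamp normalized polygon points to the image, then alpha-blend a scanline fill into a copy of the pixel buffer. It is an illustration under stated assumptions, not the code in `alarm_snapshots.py`: it assumes the frame is a flat `bytes` buffer of RGB triplets, the function names `to_pixel_space` and `fill_polygon` are hypothetical, the default `alpha` is arbitrary, and the outline pass is omitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_pixel_space(points: Sequence[Point], width: int, height: int) -> List[Point]:
    """Clamp normalized (x, y) to [0, 1] and scale to pixel units."""
    return [
        (min(max(x, 0.0), 1.0) * width, min(max(y, 0.0), 1.0) * height)
        for x, y in points
    ]


def fill_polygon(
    rgb: bytes,
    width: int,
    height: int,
    points: Sequence[Point],
    color: Tuple[int, int, int],
    alpha: float = 0.3,  # assumed opacity, not taken from the source
) -> bytes:
    """Return a copy of rgb with the polygon blended in; rgb itself is untouched."""
    out = bytearray(rgb)  # work on a copy so the source frame stays intact
    poly = to_pixel_space(points, width, height)
    if len(poly) < 3:
        return bytes(out)
    for y in range(height):
        yc = y + 0.5  # sample each row at its pixel centre
        crossings = []
        for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
            # Half-open test skips horizontal edges and avoids double counting.
            if (y0 <= yc < y1) or (y1 <= yc < y0):
                crossings.append(x0 + (yc - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        for left, right in zip(crossings[0::2], crossings[1::2]):
            first = max(0, math.ceil(left - 0.5))
            last = min(width, math.ceil(right - 0.5))
            for x in range(first, last):
                i = (y * width + x) * 3
                for c in range(3):
                    out[i + c] = round(out[i + c] * (1 - alpha) + color[c] * alpha)
    return bytes(out)
```

Under these assumptions, a zone would be drawn with something like `fill_polygon(rgb, w, h, zone_points, (255, 255, 0))` and the trash ROI with `(255, 0, 0)`; the exact RGB values for "yellow" and "red" are also assumptions.
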
## Current Task: Deploy Overlay Update To 10.8.0.23
**Goal:** Deploy the alarm snapshot calibration overlay change to `xiaozheng@10.8.0.23` without overwriting live RTSP/calibration config or unrelated local changes.
**Plan:** Inspect the remote deployment layout first, confirm which containers are active, sync only the runtime source file required for the overlay change, rebuild/restart the API/runtime services that use the Python image, and verify both service health and the deployed source code.
- [x] Inspect remote deployment directory, Docker/Compose files, and active containers on `xiaozheng@10.8.0.23`.
- [x] Confirm the remote config file remains present and is not overwritten.
- [x] Sync `src/cold_display_guard/alarm_snapshots.py` to the remote deployment path.
- [x] Rebuild and restart only the affected `cold-display-guard-api` and `cold-display-guard-runtime` services when Compose is available.
- [x] Verify management API health after restart.
- [x] Verify the deployed remote source contains `apply_calibration_overlay`.
### Deployment Review
- Remote deployment path confirmed as `/home/xiaozheng/cold_display_guard`.
- Active services before deployment: `cold-display-guard-api`, `cold-display-guard-runtime`, and `cold-display-guard-web`.
- Remote live `config/example.toml` was checked before and after deployment and was not overwritten.
- Synced only `src/cold_display_guard/alarm_snapshots.py` to avoid deploying unrelated local `web/nginx.conf` changes.
- Created a timestamped backup of the previous remote `alarm_snapshots.py` beside the source file before syncing.
- Rebuilt `cold-display-guard:dev` with `docker compose --env-file deploy/cold-display-guard.env -f deploy/docker-compose.yml build cold-display-guard-api`.
- Restarted only `cold-display-guard-api` and `cold-display-guard-runtime` with Compose; `cold-display-guard-web` remained untouched.
- Verification passed:
  - `curl http://127.0.0.1:19080/api/manage/health` returned `status=ok` and `runtime_status=running`.
  - `docker exec cold-display-guard-api python3 -c ...` confirmed `apply_calibration_overlay` exists in the running image with signature `(frame, config) -> Frame`.
  - API and runtime logs show normal startup after restart.
## Current Task: Update Timing Parameters On 10.8.0.23
**Goal:** Adjust the live timing settings on `xiaozheng@10.8.0.23` per operator request.
**Applied mapping:** The current application has no separate pre-warning threshold. It supports `max_dwell_seconds` for the time alarm/overdue threshold and `trash_confirmation_seconds` for the disposal confirmation window before warning escalation. Applied `max_dwell_seconds = 120` and `trash_confirmation_seconds = 30`.
- [x] Back up `/home/xiaozheng/cold_display_guard/config/example.toml`.
- [x] Update `[thresholds].max_dwell_seconds` from `300` to `120`.
- [x] Update `[thresholds].trash_confirmation_seconds` from `120` to `30`.
- [x] Restart `cold-display-guard-api` and `cold-display-guard-runtime`.
- [x] Verify `/api/manage/health`.
- [x] Verify `/api/manage/config` returns `{"max_dwell_seconds": 120, "trash_confirmation_seconds": 30}`.
### Timing Update Review
- Remote config was edited in place after creating a timestamped backup.
- `cold-display-guard-api` and `cold-display-guard-runtime` were explicitly restarted with Docker Compose.
- `cold-display-guard-web` was not restarted.
- Verification passed:
  - `GET http://127.0.0.1:19080/api/manage/health` returned `status=ok` and `runtime_status=running`.
  - `GET http://127.0.0.1:19080/api/manage/config` returned `max_dwell_seconds = 120` and `trash_confirmation_seconds = 30`.
  - Container status showed `cold-display-guard-api` healthy and `cold-display-guard-runtime` running after restart.
- Note: the requested `pre-warning duration = 1 min` is not independently configurable in the current codebase; supporting a distinct pre-warning at 60 seconds and overdue alarm at 120 seconds would require a code change.
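
The post-restart config check can be expressed as a small comparison between the expected thresholds and the values the management API reports. The sketch below assumes the threshold keys are available as a flat JSON object, as in the checklist item above; the real `/api/manage/config` response may nest them differently, and `threshold_mismatches` is a hypothetical helper name.

```python
import json
from typing import Dict, Mapping, Tuple


def threshold_mismatches(
    actual: Mapping[str, object],
    expected: Mapping[str, int],
) -> Dict[str, Tuple[int, object]]:
    """Map each mismatching key to (expected, actual); empty dict means all match."""
    return {
        key: (want, actual.get(key))
        for key, want in expected.items()
        if actual.get(key) != want
    }


expected = {"max_dwell_seconds": 120, "trash_confirmation_seconds": 30}
# Stand-in for the response body; in practice this would come from the API.
reported = json.loads('{"max_dwell_seconds": 120, "trash_confirmation_seconds": 30}')
print(threshold_mismatches(reported, expected))
```

An empty result means the live values match; a missing key shows up with `None` as the actual value rather than raising.
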
## Current Task: Pre-Warning Alarm Flow And Full Webhook/MQTT Chain
**Goal:** Implement the requested camera-side timing flow, deploy it to `xiaozheng@10.8.0.23`, and verify the Webhook -> `video_recognition_local` -> MQTT -> `store_data_platform` chain.
**Design:** Keep all timing decisions inside `cold_display_guard.BatchEngine`. Add separate thresholds for pre-warning, alarm, and alarm-removal timeout; emit explicit lifecycle events so downstream services do not infer camera-side timers. Keep `video_recognition_local` as a transparent Webhook/MQTT bridge, and update `store_data_platform` only where event names map to notifications, case types, and CRM penalty submission.
- [x] Review task-relevant instructions, lessons, and dirty worktree.
- [x] Inspect the current cold-display engine, case store, webhook payload, and tests.
- [x] Inspect `video_recognition_local` cold-display Webhook receiver and MQTT publisher.
- [x] Inspect `store_data_platform` cold-display MQTT consumer, notification mapping, and CRM submission trigger.
- [x] Inspect `xiaozheng@10.8.0.23` active containers and deployment paths.
- [x] Add failing cold-display engine/case/config/webhook tests for `time_pre_warning`, `pre_warning_handled`, `time_alarm`, and `alarm_removal_timeout`.
- [x] Implement the camera-side state machine and config fields.
- [x] Add/adjust `video_recognition_local` passthrough tests for the new event names.
- [x] Add/adjust `store_data_platform` tests and mappings for new event semantics.
- [x] Run local targeted and full relevant verification.
- [x] Deploy changed services to `xiaozheng@10.8.0.23` without overwriting live RTSP/calibration secrets.
- [x] Update the remote timing config to `pre_warning_seconds=60`, `max_dwell_seconds=120`, `alarm_removal_seconds=30`, `trash_confirmation_seconds=30`.
- [x] Verify remote Webhook target reachability from the cold-display container to local `video-recognition`.
- [x] Observe cold-display, video-recognition, MQTT, and platform logs; record the result.
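
The timing flow in the design above can be pictured as a per-batch timer that emits each lifecycle event once when its deadline passes. This is a simplified sketch, not the `BatchEngine` implementation: the class names are hypothetical, it assumes `alarm_removal_timeout` is measured from the moment `time_alarm` becomes due (that is, at `max_dwell_seconds + alarm_removal_seconds`), and it leaves out `pre_warning_handled` and the disposal events.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Thresholds:
    pre_warning_seconds: int = 60
    max_dwell_seconds: int = 120
    alarm_removal_seconds: int = 30


@dataclass
class BatchTimer:
    thresholds: Thresholds
    emitted: Set[str] = field(default_factory=set)

    def tick(self, dwell_seconds: float) -> List[str]:
        """Return events that became due at this dwell time, each at most once."""
        t = self.thresholds
        schedule = [
            ("time_pre_warning", t.pre_warning_seconds),
            ("time_alarm", t.max_dwell_seconds),
            # Assumption: the removal timeout counts from the alarm deadline.
            ("alarm_removal_timeout", t.max_dwell_seconds + t.alarm_removal_seconds),
        ]
        due = []
        for name, deadline in schedule:
            if dwell_seconds >= deadline and name not in self.emitted:
                self.emitted.add(name)
                due.append(name)
        return due
```

Because events are emitted explicitly and only once, a downstream consumer can react to event names alone and does not need to track camera-side timers itself, which is the intent stated in the design.
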
### Current Findings
- `cold_display_guard` currently has only `max_dwell_seconds` and `trash_confirmation_seconds`; it cannot independently represent 1-minute pre-warning, 2-minute alarm, and 30-second alarm-removal timeout.
- `video_recognition_local` receives `/api/webhook/cold-display-guard` payloads as generic JSON and forwards them to MQTT; new event names should remain transparent, but tests should lock this behavior.
- `store_data_platform` currently treats `time_alarm` and `batch_pending_disposal` as warning notifications, and only `warning_escalated` triggers CRM penalty submission. This must change so `time_pre_warning` is the warning, `time_alarm` is the alert reminder, and `alarm_removal_timeout` triggers CRM submission.
- On `10.8.0.23`, active containers include `cold-display-guard-*`, `video-recognition`, and `mosquitto`; `video-recognition` runs with host networking, while `cold-display-guard-api` runs on its Compose network.
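
The intended platform-side mapping from the third finding can be summarized as a lookup from event name to action. The sketch is written in Python for consistency with this repository, although `store_data_platform` is tested with `go test`; the `Action` names and both helper functions are hypothetical, and whether `warning_escalated` keeps triggering CRM submission is not stated here, so it is left out.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    WARNING_NOTIFICATION = "warning_notification"
    ALERT_REMINDER = "alert_reminder"
    CRM_PENALTY_SUBMISSION = "crm_penalty_submission"


# Event names come from the findings above; action names are illustrative only.
EVENT_ACTIONS = {
    "time_pre_warning": Action.WARNING_NOTIFICATION,
    "time_alarm": Action.ALERT_REMINDER,
    "alarm_removal_timeout": Action.CRM_PENALTY_SUBMISSION,
}


def action_for(event_name: str) -> Optional[Action]:
    """Return the mapped action, or None for events with no platform action."""
    return EVENT_ACTIONS.get(event_name)


def should_submit_penalty(event_name: str) -> bool:
    """True only for the event that should trigger CRM penalty submission."""
    return action_for(event_name) is Action.CRM_PENALTY_SUBMISSION
```
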
### Local Verification
- Cold-display full Python suite passed: `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`98` tests).
- `video_recognition_local` cold-display focused tests passed: `go test ./internal/server ./internal/mqtt ./cmd -run 'TestColdDisplayGuard|Test.*ColdDisplayGuard' -count=1`.
- `store_data_platform` display-cabinet service focused tests passed: `go test ./store_data/service -run 'Test.*StoreDisplayCabinet|TestResolveStoreDisplayCabinet.*|TestShouldSubmitStoreDisplayCabinetPenalty|TestBuildStoreDisplayCabinet.*' -count=1`.
### Deployment Review
- Synced only these cold-display source files to `xiaozheng@10.8.0.23:/home/xiaozheng/cold_display_guard/src/cold_display_guard/`: `models.py`, `config.py`, `engine.py`, `cases.py`, `webhooks.py`.
- Backed up the remote source files and live `config/example.toml` before deployment.
- Updated the live remote thresholds to `pre_warning_seconds=60`, `max_dwell_seconds=120`, `alarm_removal_seconds=30`, and `trash_confirmation_seconds=30`.
- Updated the live remote Webhook target from the unreachable old host to `http://10.8.0.23:8080/api/webhook/cold-display-guard`.
- Rebuilt `cold-display-guard:dev` and restarted only `cold-display-guard-api` and `cold-display-guard-runtime`.
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok` and `runtime_status=running`.
  - `GET /api/manage/config` returned the four expected threshold values and the new Webhook target.
  - Container-side synthetic engine run emitted `batch_started`, `time_pre_warning`, `time_alarm`, `alarm_removal_timeout`, then `batch_pending_disposal` plus `batch_discarded`.
  - Natural runtime log emitted `alarm_removal_timeout` for `batch_000881` at `2026-06-15T11:52:20+08:00`.
  - Webhook delivery for that event returned HTTP `200` from `video-recognition`.
  - `video_recognition_local` result JSONL recorded both `alarm_removal_timeout` batch and case events.
  - MQTT probe confirmed `video-recognition` published to `video/cold-display-guard/result/cold-display-guard` with `device_identifier=cold-display-guard`.
- `store_data_platform` is not deployed on `10.8.0.23` under that repository name or as an identifiable container; platform handling changes were completed and verified in the local repository.
- The cold-display retry queue has no pending entries; old `192.168.5.103` failures are already dead-letter history.