# Task Todo

## Current Task: Runtime/API Case State Reopen Fix

**Goal:** When the management API marks a display-cabinet case as handled, the runtime process must not later append a newer `open` snapshot for the same case from stale in-memory state.

- [x] Add a failing regression test for API-written `handled` state being preserved when runtime persists later events.
- [x] Fix runtime case persistence to reconcile with the latest JSONL snapshots before applying new events.
- [x] Run targeted case/runtime tests.
- [x] Record remote chain verification and deployment status.

### Findings

- On `xiaozheng@10.8.0.23`, `case_batch_000911` was marked `handled` at `2026-06-15T07:27:12Z` (`15:27:12+08:00`); about 11 minutes later, runtime appended a newer `open` snapshot for the same case at `2026-06-15T15:38:03+08:00`.
- The API and runtime are separate processes sharing `logs/cases.jsonl`; runtime keeps a long-lived `CaseStore` loaded at startup and did not see the API-written handled snapshot.
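
The reconciliation idea behind the fix can be sketched as follows. This is a minimal illustration, not the code in `main.py`: the helper names `latest_snapshots` and `persist_case_update` and the `case_id`/`status` field names are assumptions made for the example.

```python
import json
from pathlib import Path


def latest_snapshots(jsonl_path: Path) -> dict[str, dict]:
    """Return the last snapshot written for each case_id, in file order."""
    latest: dict[str, dict] = {}
    if not jsonl_path.exists():
        return latest
    with jsonl_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            snapshot = json.loads(line)
            # Later lines win, so the dict ends up holding the newest snapshot.
            latest[snapshot["case_id"]] = snapshot  # field name is an assumption
    return latest


def persist_case_update(jsonl_path: Path, update: dict) -> bool:
    """Append `update` unless the case is already handled on disk.

    Returns True when a snapshot was appended.
    """
    on_disk = latest_snapshots(jsonl_path).get(update["case_id"])
    if on_disk is not None and on_disk.get("status") == "handled":
        # Another process (for example the API) closed the case; keep it closed.
        return False
    with jsonl_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(update, ensure_ascii=False) + "\n")
    return True
```

Re-reading the file before every append trades some I/O for correctness across processes. The sketch does not lock the file, so a write landing between the read and the append could still race.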

### Verification

- RED:
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests.test_main.RuntimeRestoreTests.test_persist_case_updates_preserves_api_handled_snapshot -v`
  - Result before fix: failed because runtime appended a later `open` snapshot.
- Local targeted verification:
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests.test_main.RuntimeRestoreTests.test_persist_case_updates_preserves_api_handled_snapshot -v`
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_cases.py -v`
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_main.py -v`
  - Result: all passed.
- Remote deployment:
  - Synced only `src/cold_display_guard/main.py` to `xiaozheng@10.8.0.23:/home/xiaozheng/cold_display_guard/src/cold_display_guard/main.py`.
  - Ran `docker compose --env-file deploy/cold-display-guard.env -f deploy/docker-compose.yml up -d --build cold-display-guard-runtime`.
  - Compose recreated `cold-display-guard-api` and `cold-display-guard-runtime`; health check returned `status=ok`.
- Remote behavior check:
  - Ran the same API-handled/runtime-later-event scenario inside `cold-display-guard-runtime` using a temp JSONL file.
  - Result: `{"handled_source": "manual", "latest_status": "handled", "new_snapshots": 0}`.

## Current Task: Alarm Snapshot Upload

- [x] Review the current project instructions and check for task-relevant lessons.
- [x] Inspect the OTA upload API document and current runtime/webhook capture path.
- [x] Create an isolated worktree for alarm snapshot upload implementation.
- [x] Write the detailed implementation plan to `docs/superpowers/plans/2026-06-09-alarm-snapshot-upload.md`.
- [x] Execute alarm snapshot upload client TDD cycle.
- [x] Execute runtime and webhook payload integration TDD cycle.
- [x] Update config surface, docs, and verification notes.
- [x] Run targeted verification and final full verification.

## Notes

- `tasks/lessons.md` is absent in this repository/worktree, so there were no prior session lessons to review.
- Upload API reference: `/Users/glo/code/go/wenma/ai_manager/zd-ai-manager/chunk-upload-oss-service/UPLOAD_API.md`
- User-provided upload target: `https://ota.zhengxinshipin.com`
- User-provided token secret: `change-me-in-production`

## Review

- Plan saved to `docs/superpowers/plans/2026-06-09-alarm-snapshot-upload.md`.
- Chosen implementation keeps snapshot upload entirely outside `BatchEngine` and enriches webhook payloads from the runtime side using the already captured frame.
- Implemented `src/cold_display_guard/alarm_snapshots.py` for JPEG encoding plus OTA chunk-upload orchestration, runtime integration in `src/cold_display_guard/main.py`, webhook payload enrichment in `src/cold_display_guard/webhooks.py`, config exposure/secret stripping in `src/cold_display_guard/config.py` and `src/cold_display_guard/manage_api.py`, and config/doc updates in `config/example.toml` and `README_zh.md`.
- Targeted verification passed:
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_alarm_snapshots.py -v`
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_main.py -v`
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_webhooks.py tests/test_config.py tests/test_manage_api.py -v`
- Final verification passed:
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest discover -s tests -v`
  - `cd web && pnpm install --frozen-lockfile && pnpm build`

## Current Task: Webhook Payload Field Gap Check

- [x] Pull the actual payload currently received by `video-recognition` and compare it against the required event list fields.
- [x] Patch webhook payload builders to include the missing non-store fields required by the downstream table.
- [x] Add or update focused webhook tests for the enriched payload shape.
- [x] Run targeted verification and record the result here.

### Current Findings

- Current received payload only includes `batch_id`, `camera_id`, `event`, `kind`, `severity`, `source_id`, `state`, `ts`, `zone_id`, and `zone_label`.
- Missing or not explicitly populated for the downstream event table: event code, camera IP, batch start time, removal time, dwell duration, discard flag, discard time, create time, alarm time, and update time.

### Field Gap Verification

- Actual receiver payload before the fix, from `video-recognition` result JSONL on `10.8.0.11`, confirmed only the base fields above and did not include the downstream table time/discard/IP fields.
- Updated `src/cold_display_guard/webhooks.py` so both `batch_event` and `case_event` now include:
  - `event_code`
  - `camera_ip`
  - `started_at`
  - `ended_at`
  - `removed_at`
  - `dwell_seconds`
  - `is_discarded`
  - `discarded_at`
  - `created_at`
  - `alerted_at`
  - `alarm_at`
  - `updated_at`
- `case_event` also now carries the missing contextual fields `camera_id`, `zone_id`, and `zone_label`.
- Verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_webhooks.py -v`
  - `PYTHONPATH=src python3 -m unittest tests/test_main.py -v`
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v`
- Deployed updated code to `xiaozheng@10.8.0.11` without overwriting the remote `config/example.toml`, rebuilt `cold-display-guard:dev`, and restarted only `cold-display-guard-api` plus `cold-display-guard-runtime`.
- Natural post-deploy traffic did not arrive during the 2-minute observation window, so final runtime verification used the deployed container to build representative batch/case webhook payloads with the live remote config and confirmed `camera_ip = 192.168.3.4` plus all new downstream fields were present.
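
As a rough sketch of the enriched payload shape, the helper below copies a base payload and adds every downstream field listed above, filling in `None` when a value is unknown so the key is always present. The function name `enrich_payload` and the `extra` argument are hypothetical; the real builders in `webhooks.py` may populate these fields differently.

```python
from typing import Any

# Field names taken from the list above.
DOWNSTREAM_FIELDS = (
    "event_code", "camera_ip", "started_at", "ended_at", "removed_at",
    "dwell_seconds", "is_discarded", "discarded_at", "created_at",
    "alerted_at", "alarm_at", "updated_at",
)


def enrich_payload(base: dict[str, Any], extra: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `base` with every downstream field present."""
    payload = dict(base)  # copy so the caller's dict is not mutated
    for name in DOWNSTREAM_FIELDS:
        payload[name] = extra.get(name)  # None when the value is unknown
    return payload
```

Always emitting the key, even with a null value, lets the receiver tell a missing field from an empty one.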

## Current Task: Deploy To 192.168.5.103

- [x] Inspect the existing deployment layout and active containers on `xiaozheng@192.168.5.103`.
- [x] Verify the exact webhook route on that host before writing config.
- [x] Sync the current project code to the remote deployment directory without overwriting the live RTSP and calibration config.
- [x] Configure the remote webhook settings for the local `video-recognition` receiver.
- [x] Rebuild and restart the remote API/runtime containers, then verify health and outbound webhook configuration.

### Deployment Findings

- Existing deployment path on `192.168.5.103` is `/home/xiaozheng/cold_display_guard`, not `~/apps/cold-display-guard/app`.
- The host already runs `cold-display-guard-api`, `cold-display-guard-runtime`, and `cold-display-guard-web` on ports `19080` and `23000`.
- The same host also runs `video-recognition`, and a direct probe to `http://127.0.0.1:8080/api/webhook/cold-display-guard` returned `200 OK`, so this is the verified webhook target for this environment.

### Deployment Verification

- From inside the running `cold-display-guard-api` container on `192.168.5.103`:
  - `http://host.docker.internal:8080/api/webhook/cold-display-guard` failed DNS resolution.
  - `http://172.17.0.1:8080/api/webhook/cold-display-guard` returned `200 OK`.
  - `http://192.168.5.103:8080/api/webhook/cold-display-guard` returned `200 OK`.
- The configured webhook target was set to `http://192.168.5.103:8080/api/webhook/cold-display-guard` for both `event_url` and `case_url`.
- Remote config was enriched to include:
  - `case_sink`
  - `alarm_snapshot_upload`
  - `webhook_retry_sink`
  - `webhook_delivery_sink`
  - `webhooks`
- Code sync used `rsync` with `config/example.toml` excluded so the live RTSP URL and calibration polygons were preserved.
- Remote rebuild/restart completed for `cold-display-guard-api` and `cold-display-guard-runtime`.
- Verified after restart:
  - `GET http://127.0.0.1:19080/api/manage/health` returned `status=ok`
  - `GET http://127.0.0.1:19080/api/manage/config` showed `webhooks.enabled=true`
  - `event_url` and `case_url` both active on `http://192.168.5.103:8080/api/webhook/cold-display-guard`
  - `alarm_snapshot_upload.enabled=true`
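
The reachability check above was done by hand. A small sketch of the same idea, assuming a plain GET that must answer with HTTP 200, could look like this. `http_probe` and `first_reachable` are illustrative names, and the candidate order expresses preference.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True when a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts, and HTTP errors.
        return False


def first_reachable(
    candidates: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Return the first candidate URL that the probe accepts, else None."""
    for url in candidates:
        if probe(url):
            return url
    return None
```

Passing the probe as a parameter keeps the selection logic testable without network access.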

## Current Task: Alarm Snapshot Calibration Overlay

**Goal:** Webhook-linked uploaded alarm snapshots should visually include the calibrated cold display zones and trash confirmation ROI from the current config.

**Design:** Keep the existing runtime flow intact: capture current RTSP frame, process events, then upload an alarm snapshot only for warning/alarm events. Before JPEG encoding, build overlay regions from `[[zones]]` plus `[trash].roi`, clamp normalized polygon coordinates to the image bounds, draw a semi-transparent fill and visible outline directly onto a copied `Frame.rgb`, and pass that annotated frame to the existing encoder/uploader. Do not change `BatchEngine`, Webhook payload shape, OTA upload protocol, or management snapshot capture.
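
A standard-library sketch of the clamp-and-blend step described in the design, assuming the frame is a packed RGB byte buffer with known width and height. The function names are illustrative, and outline drawing is left out for brevity.

```python
def clamp_unit(value: float) -> float:
    """Clamp a normalized coordinate to the range [0, 1]."""
    return min(1.0, max(0.0, value))


def to_pixels(polygon, width: int, height: int):
    """Map normalized (x, y) points to pixel-space coordinates inside the image."""
    return [(clamp_unit(x) * width, clamp_unit(y) * height) for x, y in polygon]


def point_in_polygon(px: float, py: float, points) -> bool:
    """Even-odd ray casting test."""
    inside = False
    j = len(points) - 1
    for i in range(len(points)):
        xi, yi = points[i]
        xj, yj = points[j]
        if (yi > py) != (yj > py):
            x_cross = xi + (py - yi) * (xj - xi) / (yj - yi)
            if px < x_cross:
                inside = not inside
        j = i
    return inside


def fill_polygon(rgb: bytes, width: int, height: int, polygon, color, alpha: float = 0.3) -> bytes:
    """Return a copy of `rgb` with `color` blended into the polygon interior."""
    out = bytearray(rgb)  # copy, so the caller's frame stays untouched
    points = to_pixels(polygon, width, height)
    for y in range(height):
        for x in range(width):
            # Test the pixel center against the polygon.
            if point_in_polygon(x + 0.5, y + 0.5, points):
                offset = (y * width + x) * 3
                for channel in range(3):
                    old = out[offset + channel]
                    out[offset + channel] = round(old * (1 - alpha) + color[channel] * alpha)
    return bytes(out)
```

Scanning every pixel is fine for a tiny test frame; a real frame would likely restrict the loop to the polygon's bounding box.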

- [x] Review task-relevant lessons and current dirty worktree.
- [x] Inspect `alarm_snapshots.py`, `main.py`, config polygon shape, and existing tests.
- [x] Write a failing unit test proving alert snapshot upload encodes an annotated frame when zones/trash ROI are configured.
- [x] Write focused unit tests for polygon overlay behavior using a tiny RGB frame.
- [x] Run targeted tests and confirm the new tests fail for the expected missing overlay behavior.
- [x] Implement the smallest standard-library overlay helper in `src/cold_display_guard/alarm_snapshots.py`.
- [x] Wire `capture_alert_snapshot` to apply configured overlays before JPEG encoding.
- [x] Run targeted snapshot/runtime tests.
- [x] Run the full Python test suite.

### Review

- Added `apply_calibration_overlay` in `src/cold_display_guard/alarm_snapshots.py` to draw configured food-zone polygons in yellow and the trash ROI in red onto a copied frame before JPEG encoding and OTA upload.
- The overlay clamps normalized coordinates to image bounds, draws semi-transparent fills plus outlines, and leaves the original `Frame.rgb` unchanged for downstream runtime processing.
- `capture_alert_snapshot` now encodes the annotated frame when warning/alarm events trigger snapshot upload; non-alert events and disabled upload behavior are unchanged.
- Targeted verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`
  - `PYTHONPATH=src python3 -m unittest tests/test_main.py -v`
- Full verification passed:
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v`

## Current Task: Deploy Overlay Update To 10.8.0.23

**Goal:** Deploy the alarm snapshot calibration overlay change to `xiaozheng@10.8.0.23` without overwriting live RTSP/calibration config or unrelated local changes.

**Plan:** Inspect the remote deployment layout first, confirm which containers are active, sync only the runtime source file required for the overlay change, rebuild/restart the API/runtime services that use the Python image, and verify both service health and the deployed source code.

- [x] Inspect remote deployment directory, Docker/Compose files, and active containers on `xiaozheng@10.8.0.23`.
- [x] Confirm the remote config file remains present and is not overwritten.
- [x] Sync `src/cold_display_guard/alarm_snapshots.py` to the remote deployment path.
- [x] Rebuild and restart only the affected `cold-display-guard-api` and `cold-display-guard-runtime` services when Compose is available.
- [x] Verify management API health after restart.
- [x] Verify the deployed remote source contains `apply_calibration_overlay`.

### Deployment Review

- Remote deployment path confirmed as `/home/xiaozheng/cold_display_guard`.
- Active services before deployment: `cold-display-guard-api`, `cold-display-guard-runtime`, and `cold-display-guard-web`.
- Remote live `config/example.toml` was checked before and after deployment and was not overwritten.
- Synced only `src/cold_display_guard/alarm_snapshots.py` to avoid deploying unrelated local `web/nginx.conf` changes.
- Created a timestamped backup of the previous remote `alarm_snapshots.py` beside the source file before syncing.
- Rebuilt `cold-display-guard:dev` with `docker compose --env-file deploy/cold-display-guard.env -f deploy/docker-compose.yml build cold-display-guard-api`.
- Restarted only `cold-display-guard-api` and `cold-display-guard-runtime` with Compose; `cold-display-guard-web` remained untouched.
- Verification passed:
  - `curl http://127.0.0.1:19080/api/manage/health` returned `status=ok` and `runtime_status=running`.
  - `docker exec cold-display-guard-api python3 -c ...` confirmed `apply_calibration_overlay` exists in the running image with signature `(frame, config) -> Frame`.
  - API and runtime logs show normal startup after restart.

## Current Task: Update Timing Parameters On 10.8.0.23

**Goal:** Adjust the live timing settings on `xiaozheng@10.8.0.23` per operator request.

**Applied mapping:** The current application has no separate pre-warning threshold. It supports `max_dwell_seconds` for the time alarm/overdue threshold and `trash_confirmation_seconds` for the disposal confirmation window before warning escalation. Applied `max_dwell_seconds = 120` and `trash_confirmation_seconds = 30`.

- [x] Back up `/home/xiaozheng/cold_display_guard/config/example.toml`.
- [x] Update `[thresholds].max_dwell_seconds` from `300` to `120`.
- [x] Update `[thresholds].trash_confirmation_seconds` from `120` to `30`.
- [x] Restart `cold-display-guard-api` and `cold-display-guard-runtime`.
- [x] Verify `/api/manage/health`.
- [x] Verify `/api/manage/config` returns `{"max_dwell_seconds": 120, "trash_confirmation_seconds": 30}`.

### Timing Update Review

- Remote config was edited in place after creating a timestamped backup.
- `cold-display-guard-api` and `cold-display-guard-runtime` were explicitly restarted with Docker Compose.
- `cold-display-guard-web` was not restarted.
- Verification passed:
  - `GET http://127.0.0.1:19080/api/manage/health` returned `status=ok` and `runtime_status=running`.
  - `GET http://127.0.0.1:19080/api/manage/config` returned `max_dwell_seconds = 120` and `trash_confirmation_seconds = 30`.
  - Container status showed `cold-display-guard-api` healthy and `cold-display-guard-runtime` running after restart.
- Note: the requested `预警时长 = 1min` (pre-warning duration of 1 minute) is not independently configurable in the current codebase; supporting a distinct pre-warning at 60 seconds and overdue at 120 seconds would require a code change.

## Current Task: Pre-Warning Alarm Flow And Full Webhook/MQTT Chain

**Goal:** Implement the requested camera-side timing flow, deploy it to `xiaozheng@10.8.0.23`, and verify the Webhook -> `video_recognition_local` -> MQTT -> `store_data_platform` chain.

**Design:** Keep all timing decisions inside `cold_display_guard.BatchEngine`. Add separate thresholds for pre-warning, alarm, and alarm-removal timeout; emit explicit lifecycle events so downstream services do not infer camera-side timers. Keep `video_recognition_local` as a transparent Webhook/MQTT bridge, and update `store_data_platform` only where event names map to notifications, case types, and CRM penalty submission.
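
One way to picture the three thresholds is a schedule of lifecycle events keyed by dwell time. The sketch below is not the `BatchEngine` implementation: `Thresholds` and `due_events` are hypothetical names, the defaults mirror the remote timing values in the checklist below, and it assumes the removal timeout is counted from the alarm instant. `pre_warning_handled` is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    pre_warning_seconds: float = 60
    max_dwell_seconds: float = 120
    alarm_removal_seconds: float = 30


def due_events(dwell_seconds: float, already_emitted: set[str], t: Thresholds) -> list[str]:
    """Return lifecycle events that are due and have not been emitted yet."""
    schedule = [
        ("time_pre_warning", t.pre_warning_seconds),
        ("time_alarm", t.max_dwell_seconds),
        # Assumption: the removal timeout is measured from the alarm instant.
        ("alarm_removal_timeout", t.max_dwell_seconds + t.alarm_removal_seconds),
    ]
    return [
        name
        for name, due_at in schedule
        if dwell_seconds >= due_at and name not in already_emitted
    ]
```

Emitting explicit event names from one place means downstream services only map names to actions and never re-derive timers.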

- [x] Review task-relevant instructions, lessons, and dirty worktree.
- [x] Inspect the current cold-display engine, case store, webhook payload, and tests.
- [x] Inspect `video_recognition_local` cold-display Webhook receiver and MQTT publisher.
- [x] Inspect `store_data_platform` cold-display MQTT consumer, notification mapping, and CRM submission trigger.
- [x] Inspect `xiaozheng@10.8.0.23` active containers and deployment paths.
- [x] Add failing cold-display engine/case/config/webhook tests for `time_pre_warning`, `pre_warning_handled`, `time_alarm`, and `alarm_removal_timeout`.
- [x] Implement the camera-side state machine and config fields.
- [x] Add/adjust `video_recognition_local` passthrough tests for the new event names.
- [x] Add/adjust `store_data_platform` tests and mappings for new event semantics.
- [x] Run local targeted and full relevant verification.
- [x] Deploy changed services to `xiaozheng@10.8.0.23` without overwriting live RTSP/calibration secrets.
- [x] Update the remote timing config to `pre_warning_seconds=60`, `max_dwell_seconds=120`, `alarm_removal_seconds=30`, `trash_confirmation_seconds=30`.
- [x] Verify remote Webhook target reachability from the cold-display container to local `video-recognition`.
- [x] Observe cold-display, video-recognition, MQTT, and platform logs; record the result.

### Current Findings

- `cold_display_guard` currently has only `max_dwell_seconds` and `trash_confirmation_seconds`; it cannot independently represent 1-minute pre-warning, 2-minute alarm, and 30-second alarm-removal timeout.
- `video_recognition_local` receives `/api/webhook/cold-display-guard` payloads as generic JSON and forwards them to MQTT; new event names should remain transparent, but tests should lock this behavior.
- `store_data_platform` currently treats `time_alarm` and `batch_pending_disposal` as warning notifications, and only `warning_escalated` triggers CRM penalty submission. This must change so `time_pre_warning` is the warning, `time_alarm` is the alert reminder, and `alarm_removal_timeout` triggers CRM submission.
- On `10.8.0.23`, active containers include `cold-display-guard-*`, `video-recognition`, and `mosquitto`; `video-recognition` runs with host networking, while `cold-display-guard-api` runs on its Compose network.
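
The intended platform-side semantics can be summarized as a lookup table. The platform itself is a Go service, so this Python sketch only illustrates the mapping; the action names on the right are made up for the example.

```python
# Event name -> downstream action. Action names are hypothetical labels.
EVENT_ACTIONS = {
    "time_pre_warning": "warning_notification",
    "time_alarm": "alert_reminder",
    "alarm_removal_timeout": "crm_penalty_submission",
}


def should_submit_penalty(event: str) -> bool:
    """Only the alarm-removal timeout triggers CRM penalty submission."""
    return EVENT_ACTIONS.get(event) == "crm_penalty_submission"
```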

### Local Verification

- Cold-display full Python suite passed: `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`98` tests).
- `video_recognition_local` cold-display focused tests passed: `go test ./internal/server ./internal/mqtt ./cmd -run 'TestColdDisplayGuard|Test.*ColdDisplayGuard' -count=1`.
- `store_data_platform` display-cabinet service focused tests passed: `go test ./store_data/service -run 'Test.*StoreDisplayCabinet|TestResolveStoreDisplayCabinet.*|TestShouldSubmitStoreDisplayCabinetPenalty|TestBuildStoreDisplayCabinet.*' -count=1`.

### Deployment Review

- Synced only these cold-display source files to `xiaozheng@10.8.0.23:/home/xiaozheng/cold_display_guard/src/cold_display_guard/`: `models.py`, `config.py`, `engine.py`, `cases.py`, `webhooks.py`.
- Backed up the remote source files and live `config/example.toml` before deployment.
- Updated the live remote thresholds to `pre_warning_seconds=60`, `max_dwell_seconds=120`, `alarm_removal_seconds=30`, and `trash_confirmation_seconds=30`.
- Updated the live remote Webhook target from the unreachable old host to `http://10.8.0.23:8080/api/webhook/cold-display-guard`.
- Rebuilt `cold-display-guard:dev` and restarted only `cold-display-guard-api` and `cold-display-guard-runtime`.
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok` and `runtime_status=running`.
  - `GET /api/manage/config` returned the four expected threshold values and the new Webhook target.
  - Container-side synthetic engine run emitted `batch_started`, `time_pre_warning`, `time_alarm`, `alarm_removal_timeout`, then `batch_pending_disposal` plus `batch_discarded`.
  - Natural runtime log emitted `alarm_removal_timeout` for `batch_000881` at `2026-06-15T11:52:20+08:00`.
  - Webhook delivery for that event returned HTTP `200` from `video-recognition`.
  - `video_recognition_local` result JSONL recorded both `alarm_removal_timeout` batch and case events.
  - MQTT probe confirmed `video-recognition` published to `video/cold-display-guard/result/cold-display-guard` with `device_identifier=cold-display-guard`.
- `store_data_platform` is not deployed on `10.8.0.23` under that repository name or as an identifiable container; platform handling changes were completed and verified in the local repository.
- The cold-display retry queue has no pending entries; old `192.168.5.103` failures are already dead-letter history.

## Current Task: Alarm Snapshot Labels And Zone Colors

**Goal:** Uploaded alarm screenshots should show each calibrated region name directly on the image, and different cold-display zones should use different overlay colors.

**Design:** Extend the existing standard-library overlay path. Keep drawing configured polygons before JPEG upload, but carry a display label for each region, choose a stable color from a fixed palette by zone order, and draw a small high-contrast text label inside the polygon. Keep trash ROI red and labeled separately.
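
A minimal sketch of choosing a stable color by zone order. The palette values and the `zone_colors` name are placeholders, not the colors used in `alarm_snapshots.py`.

```python
# Hypothetical palette; red is reserved for the trash ROI and kept out of it.
ZONE_PALETTE = [
    (255, 214, 0),
    (0, 200, 255),
    (0, 220, 120),
    (255, 128, 0),
    (180, 100, 255),
]
TRASH_COLOR = (255, 0, 0)


def zone_colors(zone_ids: list[str]) -> dict[str, tuple[int, int, int]]:
    """Assign palette colors by configuration order, wrapping if zones outnumber colors."""
    return {
        zone_id: ZONE_PALETTE[index % len(ZONE_PALETTE)]
        for index, zone_id in enumerate(zone_ids)
    }
```

Because the color depends only on the zone's position in the config, the same zone keeps its color across snapshots as long as the zone order does not change.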

- [x] Inspect the current calibration overlay helper and tests.
- [x] Add failing tests for per-zone colors and visible region labels.
- [x] Implement labels and stable zone color palette.
- [x] Run snapshot tests and full Python tests.
- [x] Deploy the overlay update to `xiaozheng@10.8.0.23`.
- [x] Verify remote API/runtime health and deployed overlay helper.

### Review

- `apply_calibration_overlay` now assigns each cold-display zone a stable color from a fixed palette and keeps the trash ROI red.
- Each overlay region now carries a label and draws a small high-contrast label box directly on the frame before JPEG encoding/upload.
- The built-in label renderer covers common on-site labels such as `区域 1` ("Zone 1", with any digits) and `垃圾区` ("trash area"), plus basic ASCII for custom numeric/English labels.
- Verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`99` tests)
- Deployed `src/cold_display_guard/alarm_snapshots.py` to `xiaozheng@10.8.0.23` after backing up the previous remote file.
- Rebuilt `cold-display-guard:dev` and restarted `cold-display-guard-api` plus `cold-display-guard-runtime`.
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok` and `runtime_status=running`.
  - Container-side overlay smoke test confirmed two zones render different RGB values and label text pixels are present.

## Current Task: Alarm Snapshot Chinese Label Rendering Fix

**Goal:** Fix unreadable/garbled Chinese region names on uploaded alarm screenshots while keeping per-zone colors and fallback labeling robust.

**Design:** Use a real CJK font renderer for Chinese labels in the alarm snapshot overlay path. Install Noto CJK fonts in the runtime image, render labels through ffmpeg `drawtext` when the font is available, and fall back to readable ASCII labels if the font renderer is unavailable.
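
The fallback decision can be sketched like this. It is an illustration under assumptions: `find_font`, `ascii_fallback_label`, and `label_text` are hypothetical helpers, and the rule "fall back only when the label is non-ASCII and no CJK font is found" is one plausible reading of the design. The font path is the one reported by the remote container in the review below.

```python
import os
from typing import Optional, Sequence

CJK_FONT_CANDIDATES = (
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",
)


def find_font(candidates: Sequence[str] = CJK_FONT_CANDIDATES) -> Optional[str]:
    """Return the first existing font file, or None when none is installed."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def ascii_fallback_label(kind: str, index: int) -> str:
    """Return `TRASH` for the trash ROI and `R<n>` for the n-th zone (1-based)."""
    return "TRASH" if kind == "trash" else f"R{index}"


def label_text(label: str, kind: str, index: int, font_path: Optional[str]) -> str:
    """Keep the configured label when it can be rendered, else use ASCII."""
    if font_path is not None or label.isascii():
        return label
    return ascii_fallback_label(kind, index)
```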

- [x] Reproduce and identify the likely root cause: remote container only matched DejaVu for `zh-cn`, so Chinese labels had no real CJK font path.
- [x] Add regression tests for Docker CJK font installation and readable ASCII fallback labels.
- [x] Update `Dockerfile` to install `fonts-noto-cjk`.
- [x] Update `alarm_snapshots.py` to prefer CJK font rendering and use `R1`/`TRASH` fallback text when needed.
- [x] Run focused and full local Python verification.
- [x] Deploy `Dockerfile` and `alarm_snapshots.py` to `xiaozheng@10.8.0.23` without overwriting live config.
- [x] Rebuild/restart `cold-display-guard-api` and `cold-display-guard-runtime`.
- [x] Verify remote API/runtime health, CJK font availability, overlay smoke behavior, and runtime logs.

### Review

- Root cause was the screenshot overlay path not having a real Chinese font renderer in the deployed image; the container matched DejaVu before this fix.
- The rebuilt remote container now reports `NotoSansCJK-Regular.ttc: "Noto Sans CJK SC" "Regular"` for `fc-match :lang=zh-cn`.
- Remote overlay smoke test confirmed `find_cjk_font_file()` returns `/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc`, Chinese labels change the frame, bright label pixels are present, and different regions retain distinct colors.
- Local verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`101` tests)
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok`, `runtime_status=running`, and version `dev`.
  - `cold-display-guard-api` is healthy and `cold-display-guard-runtime` is running after restart.
  - Runtime logs show normal startup after the restart.

## Current Task: Investigate False Normal Consumption Events On 10.8.0.23

**Goal:** Determine why the live system records a normal consumption event about every two minutes with a dwell time near 13 seconds even when no one touched the cold display cabinet.

**Debug plan:** Inspect remote runtime/event/case/diagnostic logs first, correlate `batch_started` and `batch_consumed` pairs by zone and dwell time, then trace the vision metrics for those timestamps to identify whether the source is occupancy flicker, runtime restart state restoration, config thresholds, or downstream display interpretation.
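
The correlation step in the debug plan can be sketched as a small pairing pass over parsed event records. It assumes each record carries `batch_id`, `event`, `ts` (ISO 8601 with an explicit offset), and `zone_id`; the `short_batches` name and the 40-second cutoff are illustrative choices.

```python
from datetime import datetime


def short_batches(events: list[dict], max_dwell: float = 40.0) -> list[tuple[str, str, float]]:
    """Return (batch_id, zone_id, dwell_seconds) for batches consumed within `max_dwell`."""
    started: dict[str, datetime] = {}
    result = []
    for event in events:
        ts = datetime.fromisoformat(event["ts"])
        if event["event"] == "batch_started":
            started[event["batch_id"]] = ts
        elif event["event"] == "batch_consumed" and event["batch_id"] in started:
            dwell = (ts - started.pop(event["batch_id"])).total_seconds()
            if dwell <= max_dwell:
                result.append((event["batch_id"], event["zone_id"], dwell))
    return result
```

Grouping the output by `zone_id` then shows whether the short batches cluster in one zone, which points toward a per-zone vision threshold rather than a global cause.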

- [x] Inspect recent remote events and confirm the exact event names, zones, dwell seconds, and cadence.
- [x] Inspect runtime diagnostics around those timestamps for occupancy and vision metric flicker.
- [x] Inspect live config and runtime logs for sampling/stabilization settings and restarts.
- [x] Form and test a root-cause hypothesis before changing code or live thresholds.
- [x] Record findings, fix if needed, and verify with logs/tests.

### Findings And Fix

- The repeated records were real `batch_started` -> `batch_consumed` events from the camera-side engine, not a downstream display issue.
- Before the fix, recent events showed repeated zone 1 batches ending after 13-33 seconds, matching the two-frame confirmation cadence at the current sampling rate.
- Root cause had two parts:
  - Zone 1 was genuinely occupied, but its vision signal hovered around the old relative dark threshold, so short raw-occupancy dips were interpreted as item removal.
  - Zone 2 was occupied before or during baseline learning, so its relative difference from baseline stayed near zero and it was not detected as occupied.
- Added `occupancy_absolute_dark_fraction` in `src/cold_display_guard/vision.py`, defaulting to `0.0` so existing configs are unchanged unless they opt in.
- Updated the live config on `xiaozheng@10.8.0.23`:
  - `occupancy_dark_fraction = 0.12`
  - `occupancy_absolute_dark_fraction = 0.085`
  - `empty_confirm_frames = 6`
- Rebuilt and restarted `cold-display-guard-api` and `cold-display-guard-runtime`.
- Verification:
  - Local full Python suite passed: `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`102` tests).
  - Remote health returned `status=ok` and `runtime_status=running`.
  - Remote container config shows the new thresholds.
  - After deployment, latest diagnostics stabilized at `zone_counts = {"1": 1, "2": 1, "6": 1}`.
- During a two-minute observation window after `13:25`, no new `batch_consumed` events were emitted; only expected pre-warning/alarm lifecycle events appeared for the occupied zones.
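
To make the two-part fix concrete, here is a sketch of how a relative threshold, an absolute threshold, and an empty-frame confirmation count could interact. It is not the logic in `vision.py`: the function and class names are hypothetical, and the exact comparison rules are assumptions based on the findings above.

```python
def raw_occupied(
    dark_fraction: float,
    baseline_dark_fraction: float,
    relative_threshold: float = 0.12,
    absolute_threshold: float = 0.085,
) -> bool:
    """Occupied when darker than baseline by the relative threshold, or when
    the dark fraction alone reaches the absolute threshold (0.0 disables it)."""
    if dark_fraction - baseline_dark_fraction >= relative_threshold:
        return True
    return absolute_threshold > 0.0 and dark_fraction >= absolute_threshold


class EmptyDebouncer:
    """Confirm removal only after N consecutive non-occupied frames."""

    def __init__(self, empty_confirm_frames: int = 6) -> None:
        self.empty_confirm_frames = empty_confirm_frames
        self._empty_streak = 0

    def update(self, occupied: bool) -> bool:
        """Feed one frame; return True once removal is confirmed."""
        self._empty_streak = 0 if occupied else self._empty_streak + 1
        return self._empty_streak >= self.empty_confirm_frames
```

The absolute check covers a zone that was already occupied while the baseline was learned, and the longer confirmation window absorbs short dips in the raw signal.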

## Current Task: Reduce Alarm Snapshot Label Visual Obstruction

**Goal:** Region labels on uploaded alarm screenshots should be smaller and more transparent so operators can inspect the food/display image underneath.

**Design:** Keep the existing label content, placement, CJK font rendering, and per-zone colors. Only reduce the visual weight of the label layer by lowering font size, black label-box opacity, border width, and fallback label-box opacity.

- [x] Inspect current alarm snapshot label rendering style.
- [x] Add a regression test for smaller ffmpeg drawtext label style.
- [x] Reduce drawtext font size and label-box opacity.
- [x] Keep fallback label renderer visually consistent with the ffmpeg path.
- [x] Run full local verification.
- [x] Deploy the updated snapshot overlay style to `xiaozheng@10.8.0.23`.
- [x] Verify remote runtime health and deployed label style.

### Notes

- Targeted snapshot test passed: `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`.
- Full local verification passed: `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`103` tests).
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok` and `runtime_status=running`.
  - Running container uses `fontsize=13`, `boxcolor=black@0.34`, and `boxborderw=2` for region labels.
  - `cold-display-guard-runtime` logs show normal startup after restart.

## Current Task: Limit Alert Snapshot Overlay To Event Zones

**Goal:** Uploaded warning/alarm screenshots should only draw the cold-display region polygons and names for the zones that actually triggered the warning/alarm event. Other configured zones and the trash ROI should not be drawn on those uploaded screenshots.

**Plan:** Keep the full calibration overlay helper available for tests and general use, but pass alert event zone IDs from `capture_alert_snapshot` into the overlay loader and disable trash ROI drawing for alert uploads.
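
A sketch of the filtering step, assuming zones are dictionaries with `id`, `label`, and `polygon` keys. Those key names and the `select_overlay_regions` helper are illustrative, not the actual overlay loader.

```python
def select_overlay_regions(zones, trash_roi, event_zone_ids=None, include_trash=True):
    """Return (kind, label, polygon) tuples to draw.

    With `event_zone_ids=None` every configured zone is kept, which preserves
    the unfiltered behavior.
    """
    wanted = None if event_zone_ids is None else {str(z) for z in event_zone_ids}
    regions = [
        ("zone", zone["label"], zone["polygon"])
        for zone in zones
        if wanted is None or str(zone["id"]) in wanted
    ]
    if include_trash and trash_roi:
        regions.append(("trash", "TRASH", trash_roi))
    return regions
```

An alert upload would call it with the triggering zone IDs and `include_trash=False`, while other callers keep the defaults and get the full overlay.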

- [x] Add a regression test proving alert snapshot upload only annotates the triggering event zone.
- [x] Filter snapshot overlay regions by event `zone_id` during alert upload.
- [x] Preserve full overlay behavior when `apply_calibration_overlay` is called without filters.
- [x] Run full local Python verification.
- [x] Deploy `alarm_snapshots.py` to `xiaozheng@10.8.0.23`.
- [x] Verify remote API/runtime health and deployed filtered-overlay behavior.

### Review

- Local verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`104` tests)
- Deployed only `src/cold_display_guard/alarm_snapshots.py` to `xiaozheng@10.8.0.23` after backing up the previous remote file; live config was not overwritten.
- Rebuilt `cold-display-guard:dev` and restarted `cold-display-guard-api` plus `cold-display-guard-runtime`.
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok` and `runtime_status=running`.
  - Container-side smoke test for a zone-1 alert returned `zone1_changed=True`, `zone2_unchanged=True`, and `trash_unchanged=True`.
  - API/runtime logs show normal startup after restart.

## Current Task: Check Webhook Duplicate Delivery

**Goal:** Verify whether `cold_display_guard` is sending duplicate Webhook requests to `video-recognition` on `xiaozheng@10.8.0.23`.

**Investigation:** Compare the sending code path, remote webhook delivery audit, retry queue state, cold-display event/case logs, `video-recognition` HTTP logs, and the receiver-side JSONL payloads.

- [x] Inspect sender code path for direct event/case delivery and retry drain behavior.
- [x] Confirm remote Webhook config uses the same URL for `event_url` and `case_url`.
- [x] Check sender delivery audit for duplicate receiver `task_id` values.
- [x] Check the retry queue for pending entries that could still be redelivered and create duplicates.
- [x] Check receiver-side cold-display JSONL for duplicate payloads and duplicate business keys.
- [x] Trace the only coarse duplicate-looking case around `batch_000898`.

### Review

- Current remote config sends both `batch_event` and `case_event` to `http://10.8.0.23:8080/api/webhook/cold-display-guard`, so one business transition can produce two HTTP POSTs to the same endpoint with different `kind` values.
- Sender audit `logs/webhook_delivery.jsonl` contains `3056` records total; recent valid delivery has `321` direct `ok` records and `0` retry `ok` records.
- Receiver-returned `task_id` values are unique: `321` unique task IDs and `0` duplicate task IDs.
- Retry queue has `547` latest retry items, all `dead_letter`; there are no pending retries.
- Receiver-side `video-recognition` cold-display files for `2026-06-15` contain `181` business payloads; exact payload duplicates are `0`, and fine-grained business key duplicates are `0`.
- Sender `events.jsonl` contains `3325` events; duplicate `(batch_id, event, ts, zone_id)` keys are `0`.
- The only coarse duplicate-looking receiver entry was `batch_000898` at `13:20:26`: the same frame emitted `time_pre_warning` and `pre_warning_handled`, which produced separate `case_event` actions `created` and `handled`. This is not the same Webhook request repeated.
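
The duplicate-key checks above can be reproduced with a short counting pass over JSONL lines. The sketch assumes one JSON object per line; `duplicate_keys` is an illustrative helper, and the default key matches the `(batch_id, event, ts, zone_id)` tuple used for `events.jsonl`.

```python
import json
from collections import Counter


def duplicate_keys(lines, fields=("batch_id", "event", "ts", "zone_id")):
    """Return {key: count} for business keys that occur more than once."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            record = json.loads(line)
            counts[tuple(record.get(name) for name in fields)] += 1
    return {key: count for key, count in counts.items() if count > 1}
```

An empty result corresponds to the `0` duplicates reported above. Passing a coarser `fields` tuple, such as `("batch_id", "ts")`, would surface cases like `batch_000898` where two different events share a frame.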