# Task Todo

- [x] Review the current project instructions and check for task-relevant lessons.
- [x] Inspect the OTA upload API document and current runtime/webhook capture path.
- [x] Create an isolated worktree for alarm snapshot upload implementation.
- [x] Write the detailed implementation plan to `docs/superpowers/plans/2026-06-09-alarm-snapshot-upload.md`.
- [x] Execute alarm snapshot upload client TDD cycle.
- [x] Execute runtime and webhook payload integration TDD cycle.
- [x] Update config surface, docs, and verification notes.
- [x] Run targeted verification and final full verification.

## Notes

- `tasks/lessons.md` is absent in this repository/worktree, so there were no prior session lessons to review.
- Upload API reference: `/Users/glo/code/go/wenma/ai_manager/zd-ai-manager/chunk-upload-oss-service/UPLOAD_API.md`
- User-provided upload target: `https://ota.zhengxinshipin.com`
- User-provided token secret: `change-me-in-production`

## Review

- Plan saved to `docs/superpowers/plans/2026-06-09-alarm-snapshot-upload.md`.
- Chosen implementation keeps snapshot upload entirely outside `BatchEngine` and enriches webhook payloads from the runtime side using the already captured frame.
- Implemented `src/cold_display_guard/alarm_snapshots.py` for JPEG encoding plus OTA chunk-upload orchestration, runtime integration in `src/cold_display_guard/main.py`, webhook payload enrichment in `src/cold_display_guard/webhooks.py`, config exposure/secret stripping in `src/cold_display_guard/config.py` and `src/cold_display_guard/manage_api.py`, and config/doc updates in `config/example.toml` and `README_zh.md`.
- Targeted verification passed:
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_alarm_snapshots.py -v`
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_main.py -v`
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_webhooks.py tests/test_config.py tests/test_manage_api.py -v`
- Final verification passed:
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest discover -s tests -v`
  - `cd web && pnpm install --frozen-lockfile && pnpm build`

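The secret stripping mentioned for the config surface can be pictured with a small sketch. This is an illustration only: the key names in `SECRET_KEYS` and the `strip_secrets` helper are assumptions, not the actual contents of `config.py` or `manage_api.py`.

```python
from typing import Any

# Hypothetical secret key names; the real project may use different ones.
SECRET_KEYS = {"token_secret", "password", "api_key"}


def strip_secrets(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of config with secret keys dropped at every dict level.

    Lists are passed through unchanged in this sketch.
    """
    cleaned: dict[str, Any] = {}
    for key, value in config.items():
        if key in SECRET_KEYS:
            continue  # drop the secret instead of masking it
        if isinstance(value, dict):
            cleaned[key] = strip_secrets(value)
        else:
            cleaned[key] = value
    return cleaned


example = {
    "alarm_snapshot_upload": {"enabled": True, "token_secret": "change-me-in-production"},
    "webhooks": {"enabled": True},
}
print(strip_secrets(example))
```

Dropping the key rather than masking it keeps the management API response free of any hint about the secret's length or format; the input dictionary is left untouched.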
## Current Task: Webhook Payload Field Gap Check

- [x] Pull the actual payload currently received by `video-recognition` and compare it against the required event list fields.
- [x] Patch webhook payload builders to include the missing non-store fields required by the downstream table.
- [x] Add or update focused webhook tests for the enriched payload shape.
- [x] Run targeted verification and record the result here.

### Current Findings

- Current received payload only includes `batch_id`, `camera_id`, `event`, `kind`, `severity`, `source_id`, `state`, `ts`, `zone_id`, and `zone_label`.
- Missing or not explicitly populated for the downstream event table: event code, camera IP, batch start time, removal time, dwell duration, discard flag, discard time, create time, alarm time, and update time.

### Field Gap Verification

- Actual receiver payload before the fix, from `video-recognition` result JSONL on `10.8.0.11`, confirmed only the base fields above and did not include the downstream table time/discard/IP fields.
- Updated `src/cold_display_guard/webhooks.py` so both `batch_event` and `case_event` now include:
  - `event_code`
  - `camera_ip`
  - `started_at`
  - `ended_at`
  - `removed_at`
  - `dwell_seconds`
  - `is_discarded`
  - `discarded_at`
  - `created_at`
  - `alerted_at`
  - `alarm_at`
  - `updated_at`
- `case_event` also now carries the missing contextual fields `camera_id`, `zone_id`, and `zone_label`.
- Verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_webhooks.py -v`
  - `PYTHONPATH=src python3 -m unittest tests/test_main.py -v`
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v`
- Deployed updated code to `xiaozheng@10.8.0.11` without overwriting the remote `config/example.toml`, rebuilt `cold-display-guard:dev`, and restarted only `cold-display-guard-api` plus `cold-display-guard-runtime`.
- Natural post-deploy traffic did not arrive during the 2-minute observation window, so final runtime verification used the deployed container to build representative batch/case webhook payloads with the live remote config and confirmed `camera_ip = 192.168.3.4` plus all new downstream fields were present.

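The field gap check in this task amounts to comparing a received payload against the list of required keys. A minimal sketch follows; the field names come from the list in this section, while `missing_fields` and the sample payload values are hypothetical and not taken from `webhooks.py` or a real capture.

```python
# Field names as listed in this section for batch_event and case_event.
REQUIRED_FIELDS = (
    "event_code", "camera_ip", "started_at", "ended_at", "removed_at",
    "dwell_seconds", "is_discarded", "discarded_at", "created_at",
    "alerted_at", "alarm_at", "updated_at",
)


def missing_fields(payload: dict) -> list[str]:
    """Return the required field names that are absent from payload."""
    return [name for name in REQUIRED_FIELDS if name not in payload]


# Illustrative payload with only the base fields seen before the fix
# (values here are made up for the example).
before_fix = {
    "batch_id": "batch_000001", "camera_id": "cam-1", "event": "time_alarm",
    "kind": "batch_event", "severity": "alarm", "source_id": "src-1",
    "state": "active", "ts": "2026-06-09T10:00:00+08:00",
    "zone_id": "zone-1", "zone_label": "zone 1",
}
print(missing_fields(before_fix))
```

A check like this only confirms key presence; whether a field such as `removed_at` may legitimately be `null` for a still-active batch is a separate contract question for the downstream table.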
## Current Task: Deploy To 192.168.5.103

- [x] Inspect the existing deployment layout and active containers on `xiaozheng@192.168.5.103`.
- [x] Verify the exact webhook route on that host before writing config.
- [x] Sync the current project code to the remote deployment directory without overwriting the live RTSP and calibration config.
- [x] Configure the remote webhook settings for the local `video-recognition` receiver.
- [x] Rebuild and restart the remote API/runtime containers, then verify health and outbound webhook configuration.

### Deployment Findings

- Existing deployment path on `192.168.5.103` is `/home/xiaozheng/cold_display_guard`, not `~/apps/cold-display-guard/app`.
- The host already runs `cold-display-guard-api`, `cold-display-guard-runtime`, and `cold-display-guard-web` on ports `19080` and `23000`.
- The same host also runs `video-recognition`, and a direct probe to `http://127.0.0.1:8080/api/webhook/cold-display-guard` returned `200 OK`, so this is the verified webhook target for this environment.

### Deployment Verification

- From inside the running `cold-display-guard-api` container on `192.168.5.103`:
  - `http://host.docker.internal:8080/api/webhook/cold-display-guard` failed DNS resolution.
  - `http://172.17.0.1:8080/api/webhook/cold-display-guard` returned `200 OK`.
  - `http://192.168.5.103:8080/api/webhook/cold-display-guard` returned `200 OK`.
- The configured webhook target was set to `http://192.168.5.103:8080/api/webhook/cold-display-guard` for both `event_url` and `case_url`.
- Remote config was enriched to include:
  - `case_sink`
  - `alarm_snapshot_upload`
  - `webhook_retry_sink`
  - `webhook_delivery_sink`
  - `webhooks`
- Code sync used `rsync` with `config/example.toml` excluded so the live RTSP URL and calibration polygons were preserved.
- Remote rebuild/restart completed for `cold-display-guard-api` and `cold-display-guard-runtime`.
- Verified after restart:
  - `GET http://127.0.0.1:19080/api/manage/health` returned `status=ok`
  - `GET http://127.0.0.1:19080/api/manage/config` showed `webhooks.enabled=true`
  - `event_url` and `case_url` both active on `http://192.168.5.103:8080/api/webhook/cold-display-guard`
  - `alarm_snapshot_upload.enabled=true`

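The candidate-by-candidate reachability check from inside the container could be scripted roughly as below. This is a sketch under assumptions: it treats a plain GET returning HTTP 200 as "reachable" (the notes do not record which method the original probe used), and `http_probe` and `probe_all` are hypothetical helper names.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True when a GET to url answers with HTTP 200 (assumed criterion)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts, and HTTP errors.
        return False


def probe_all(candidates: Iterable[str],
              probe: Callable[[str], bool] = http_probe) -> dict[str, bool]:
    """Probe every candidate and report reachability per URL."""
    return {url: probe(url) for url in candidates}


if __name__ == "__main__":
    path = "/api/webhook/cold-display-guard"
    hosts = ["host.docker.internal", "172.17.0.1", "192.168.5.103"]
    for url, ok in probe_all(f"http://{host}:8080{path}" for host in hosts).items():
        print(("reachable  " if ok else "unreachable"), url)
```

Reporting all candidates instead of stopping at the first success leaves the final choice to the operator, which matches the decision above to configure the host IP even though the bridge address also answered.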
## Current Task: Alarm Snapshot Calibration Overlay

**Goal:** Webhook-linked uploaded alarm snapshots should visually include the calibrated cold display zones and trash confirmation ROI from the current config.

**Design:** Keep the existing runtime flow intact: capture current RTSP frame, process events, then upload an alarm snapshot only for warning/alarm events. Before JPEG encoding, build overlay regions from `[[zones]]` plus `[trash].roi`, clamp normalized polygon coordinates to the image bounds, draw a semi-transparent fill and visible outline directly onto a copied `Frame.rgb`, and pass that annotated frame to the existing encoder/uploader. Do not change `BatchEngine`, Webhook payload shape, OTA upload protocol, or management snapshot capture.

- [x] Review task-relevant lessons and current dirty worktree.
- [x] Inspect `alarm_snapshots.py`, `main.py`, config polygon shape, and existing tests.
- [x] Write a failing unit test proving alert snapshot upload encodes an annotated frame when zones/trash ROI are configured.
- [x] Write focused unit tests for polygon overlay behavior using a tiny RGB frame.
- [x] Run targeted tests and confirm the new tests fail for the expected missing overlay behavior.
- [x] Implement the smallest standard-library overlay helper in `src/cold_display_guard/alarm_snapshots.py`.
- [x] Wire `capture_alert_snapshot` to apply configured overlays before JPEG encoding.
- [x] Run targeted snapshot/runtime tests.
- [x] Run the full Python test suite.

### Review

- Added `apply_calibration_overlay` in `src/cold_display_guard/alarm_snapshots.py` to draw configured food-zone polygons in yellow and the trash ROI in red onto a copied frame before JPEG encoding and OTA upload.
- The overlay clamps normalized coordinates to image bounds, draws semi-transparent fills plus outlines, and leaves the original `Frame.rgb` unchanged for downstream runtime processing.
- `capture_alert_snapshot` now encodes the annotated frame when warning/alarm events trigger snapshot upload; non-alert events and disabled upload behavior are unchanged.
- Targeted verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`
  - `PYTHONPATH=src python3 -m unittest tests/test_main.py -v`
- Full verification passed:
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v`

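The clamp-then-blend behavior described for the overlay can be sketched with the standard library alone. This is not the project's `apply_calibration_overlay`: the flat RGB `bytes` layout, the helper names, and the default `alpha` are assumptions, and the outline pass is omitted to keep the example short.

```python
from typing import Sequence

Point = tuple[float, float]


def clamp_polygon(polygon: Sequence[Point], width: int, height: int) -> list[Point]:
    """Clamp normalized (0..1) points and scale them to pixel space."""
    return [
        (min(max(x, 0.0), 1.0) * width, min(max(y, 0.0), 1.0) * height)
        for x, y in polygon
    ]


def point_in_polygon(px: float, py: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: count edge crossings to the right of the point."""
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        if (y1 > py) != (y2 > py):  # edge spans the horizontal ray
            cross_x = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < cross_x:
                inside = not inside
    return inside


def fill_overlay(rgb: bytes, width: int, height: int,
                 polygon: Sequence[Point], color: tuple[int, int, int],
                 alpha: float = 0.35) -> bytes:
    """Blend color into every pixel whose center lies inside the polygon.

    Works on a copy, so the caller's frame bytes stay unchanged.
    """
    out = bytearray(rgb)
    pixels = clamp_polygon(polygon, width, height)
    for y in range(height):
        for x in range(width):
            if point_in_polygon(x + 0.5, y + 0.5, pixels):
                offset = (y * width + x) * 3
                for channel in range(3):
                    blended = out[offset + channel] * (1 - alpha) + color[channel] * alpha
                    out[offset + channel] = int(blended)
    return bytes(out)


frame = bytes(4 * 4 * 3)  # tiny black 4x4 RGB frame
left_half = [(-0.5, 0.0), (0.5, 0.0), (0.5, 1.5), (-0.5, 1.5)]  # out-of-range on purpose
annotated = fill_overlay(frame, 4, 4, left_half, (255, 255, 0), alpha=0.5)
```

A per-pixel scan is slow for full-resolution frames; a real implementation would likely restrict the loop to the polygon's bounding box or fill scanline by scanline.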
## Current Task: Deploy Overlay Update To 10.8.0.23

**Goal:** Deploy the alarm snapshot calibration overlay change to `xiaozheng@10.8.0.23` without overwriting live RTSP/calibration config or unrelated local changes.

**Plan:** Inspect the remote deployment layout first, confirm which containers are active, sync only the runtime source file required for the overlay change, rebuild/restart the API/runtime services that use the Python image, and verify both service health and the deployed source code.

- [x] Inspect remote deployment directory, Docker/Compose files, and active containers on `xiaozheng@10.8.0.23`.
- [x] Confirm the remote config file remains present and is not overwritten.
- [x] Sync `src/cold_display_guard/alarm_snapshots.py` to the remote deployment path.
- [x] Rebuild and restart only the affected `cold-display-guard-api` and `cold-display-guard-runtime` services when Compose is available.
- [x] Verify management API health after restart.
- [x] Verify the deployed remote source contains `apply_calibration_overlay`.

### Deployment Review

- Remote deployment path confirmed as `/home/xiaozheng/cold_display_guard`.
- Active services before deployment: `cold-display-guard-api`, `cold-display-guard-runtime`, and `cold-display-guard-web`.
- Remote live `config/example.toml` was checked before and after deployment and was not overwritten.
- Synced only `src/cold_display_guard/alarm_snapshots.py` to avoid deploying unrelated local `web/nginx.conf` changes.
- Created a timestamped backup of the previous remote `alarm_snapshots.py` beside the source file before syncing.
- Rebuilt `cold-display-guard:dev` with `docker compose --env-file deploy/cold-display-guard.env -f deploy/docker-compose.yml build cold-display-guard-api`.
- Restarted only `cold-display-guard-api` and `cold-display-guard-runtime` with Compose; `cold-display-guard-web` remained untouched.
- Verification passed:
  - `curl http://127.0.0.1:19080/api/manage/health` returned `status=ok` and `runtime_status=running`.
  - `docker exec cold-display-guard-api python3 -c ...` confirmed `apply_calibration_overlay` exists in the running image with signature `(frame, config) -> Frame`.
  - API and runtime logs show normal startup after restart.

## Current Task: Update Timing Parameters On 10.8.0.23

**Goal:** Adjust the live timing settings on `xiaozheng@10.8.0.23` per operator request.

**Applied mapping:** The current application has no separate pre-warning threshold. It supports `max_dwell_seconds` for the time alarm/overdue threshold and `trash_confirmation_seconds` for the disposal confirmation window before warning escalation. Applied `max_dwell_seconds = 120` and `trash_confirmation_seconds = 30`.

- [x] Back up `/home/xiaozheng/cold_display_guard/config/example.toml`.
- [x] Update `[thresholds].max_dwell_seconds` from `300` to `120`.
- [x] Update `[thresholds].trash_confirmation_seconds` from `120` to `30`.
- [x] Restart `cold-display-guard-api` and `cold-display-guard-runtime`.
- [x] Verify `/api/manage/health`.
- [x] Verify `/api/manage/config` returns `{"max_dwell_seconds": 120, "trash_confirmation_seconds": 30}`.

### Timing Update Review

- Remote config was edited in place after creating a timestamped backup.
- `cold-display-guard-api` and `cold-display-guard-runtime` were explicitly restarted with Docker Compose.
- `cold-display-guard-web` was not restarted.
- Verification passed:
  - `GET http://127.0.0.1:19080/api/manage/health` returned `status=ok` and `runtime_status=running`.
  - `GET http://127.0.0.1:19080/api/manage/config` returned `max_dwell_seconds = 120` and `trash_confirmation_seconds = 30`.
  - Container status showed `cold-display-guard-api` healthy and `cold-display-guard-runtime` running after restart.
- Note: the requested pre-warning duration (`预警时长`) of 1 minute is not independently configurable in the current codebase; supporting a distinct pre-warning at 60 seconds and an overdue alarm at 120 seconds would require a code change.

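The post-restart threshold check can be expressed as a small comparison helper. This sketch assumes the threshold values have already been extracted from the `/api/manage/config` response into a flat mapping; the exact response shape is not recorded here, and `threshold_mismatches` is a hypothetical name.

```python
from typing import Any, Mapping

# Values applied in this task.
EXPECTED_THRESHOLDS = {"max_dwell_seconds": 120, "trash_confirmation_seconds": 30}


def threshold_mismatches(actual: Mapping[str, Any],
                         expected: Mapping[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Map each differing key to (found, wanted); empty dict means all match."""
    mismatches: dict[str, tuple[Any, Any]] = {}
    for name, wanted in expected.items():
        found = actual.get(name)  # None when the key is missing
        if found != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


# Pre-change values from the checklist above would be flagged like this:
print(threshold_mismatches(
    {"max_dwell_seconds": 300, "trash_confirmation_seconds": 120},
    EXPECTED_THRESHOLDS,
))
```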
## Current Task: Pre-Warning Alarm Flow And Full Webhook/MQTT Chain

**Goal:** Implement the requested camera-side timing flow, deploy it to `xiaozheng@10.8.0.23`, and verify the Webhook -> `video_recognition_local` -> MQTT -> `store_data_platform` chain.

**Design:** Keep all timing decisions inside `cold_display_guard.BatchEngine`. Add separate thresholds for pre-warning, alarm, and alarm-removal timeout; emit explicit lifecycle events so downstream services do not infer camera-side timers. Keep `video_recognition_local` as a transparent Webhook/MQTT bridge, and update `store_data_platform` only where event names map to notifications, case types, and CRM penalty submission.

- [x] Review task-relevant instructions, lessons, and dirty worktree.
- [x] Inspect the current cold-display engine, case store, webhook payload, and tests.
- [x] Inspect `video_recognition_local` cold-display Webhook receiver and MQTT publisher.
- [x] Inspect `store_data_platform` cold-display MQTT consumer, notification mapping, and CRM submission trigger.
- [x] Inspect `xiaozheng@10.8.0.23` active containers and deployment paths.
- [x] Add failing cold-display engine/case/config/webhook tests for `time_pre_warning`, `pre_warning_handled`, `time_alarm`, and `alarm_removal_timeout`.
- [x] Implement the camera-side state machine and config fields.
- [x] Add/adjust `video_recognition_local` passthrough tests for the new event names.
- [x] Add/adjust `store_data_platform` tests and mappings for new event semantics.
- [x] Run local targeted and full relevant verification.
- [x] Deploy changed services to `xiaozheng@10.8.0.23` without overwriting live RTSP/calibration secrets.
- [x] Update the remote timing config to `pre_warning_seconds=60`, `max_dwell_seconds=120`, `alarm_removal_seconds=30`, `trash_confirmation_seconds=30`.
- [x] Verify remote Webhook target reachability from the cold-display container to local `video-recognition`.
- [x] Observe cold-display, video-recognition, MQTT, and platform logs; record the result.

### Current Findings

- `cold_display_guard` currently has only `max_dwell_seconds` and `trash_confirmation_seconds`; it cannot independently represent 1-minute pre-warning, 2-minute alarm, and 30-second alarm-removal timeout.
- `video_recognition_local` receives `/api/webhook/cold-display-guard` payloads as generic JSON and forwards them to MQTT; new event names should remain transparent, but tests should lock this behavior.
- `store_data_platform` currently treats `time_alarm` and `batch_pending_disposal` as warning notifications, and only `warning_escalated` triggers CRM penalty submission. This must change so `time_pre_warning` is the warning, `time_alarm` is the alert reminder, and `alarm_removal_timeout` triggers CRM submission.
- On `10.8.0.23`, active containers include `cold-display-guard-*`, `video-recognition`, and `mosquitto`; `video-recognition` runs with host networking, while `cold-display-guard-api` runs on its Compose network.

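The target event semantics from the findings above can be written down as a lookup table. The sketch below is in Python for consistency with this repository even though `store_data_platform` is a Go service; the `Handling` enum and helper name are hypothetical, and how `warning_escalated` and `batch_pending_disposal` are treated after the change is not stated here, so they are left out.

```python
from enum import Enum


class Handling(Enum):
    WARNING = "warning"
    ALERT_REMINDER = "alert_reminder"
    CRM_PENALTY = "crm_penalty"


# Target mapping as described in the findings above.
EVENT_HANDLING = {
    "time_pre_warning": Handling.WARNING,
    "time_alarm": Handling.ALERT_REMINDER,
    "alarm_removal_timeout": Handling.CRM_PENALTY,
}


def should_submit_penalty(event: str) -> bool:
    """True only for events mapped to CRM penalty submission in this sketch."""
    return EVENT_HANDLING.get(event) is Handling.CRM_PENALTY


for name in ("time_pre_warning", "time_alarm", "alarm_removal_timeout"):
    print(name, "->", EVENT_HANDLING[name].value)
```

Keeping the mapping in one table makes the passthrough expectation explicit: the bridge forwards event names untouched, and only the platform side assigns meaning to them.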
### Local Verification

- Cold-display full Python suite passed: `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`98` tests).
- `video_recognition_local` cold-display focused tests passed: `go test ./internal/server ./internal/mqtt ./cmd -run 'TestColdDisplayGuard|Test.*ColdDisplayGuard' -count=1`.
- `store_data_platform` display-cabinet service focused tests passed: `go test ./store_data/service -run 'Test.*StoreDisplayCabinet|TestResolveStoreDisplayCabinet.*|TestShouldSubmitStoreDisplayCabinetPenalty|TestBuildStoreDisplayCabinet.*' -count=1`.

### Deployment Review

- Synced only these cold-display source files to `xiaozheng@10.8.0.23:/home/xiaozheng/cold_display_guard/src/cold_display_guard/`: `models.py`, `config.py`, `engine.py`, `cases.py`, `webhooks.py`.
- Backed up the remote source files and live `config/example.toml` before deployment.
- Updated the live remote thresholds to `pre_warning_seconds=60`, `max_dwell_seconds=120`, `alarm_removal_seconds=30`, and `trash_confirmation_seconds=30`.
- Updated the live remote Webhook target from the unreachable old host to `http://10.8.0.23:8080/api/webhook/cold-display-guard`.
- Rebuilt `cold-display-guard:dev` and restarted only `cold-display-guard-api` and `cold-display-guard-runtime`.
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok` and `runtime_status=running`.
  - `GET /api/manage/config` returned the four expected threshold values and the new Webhook target.
  - Container-side synthetic engine run emitted `batch_started`, `time_pre_warning`, `time_alarm`, `alarm_removal_timeout`, then `batch_pending_disposal` plus `batch_discarded`.
  - Natural runtime log emitted `alarm_removal_timeout` for `batch_000881` at `2026-06-15T11:52:20+08:00`.
  - Webhook delivery for that event returned HTTP `200` from `video-recognition`.
  - `video_recognition_local` result JSONL recorded both `alarm_removal_timeout` batch and case events.
  - MQTT probe confirmed `video-recognition` published to `video/cold-display-guard/result/cold-display-guard` with `device_identifier=cold-display-guard`.
- `store_data_platform` is not deployed on `10.8.0.23` under that repository name or as an identifiable container; platform handling changes were completed and verified in the local repository.
- The cold-display retry queue has no pending entries; old `192.168.5.103` failures are already dead-letter history.

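The camera-side timing flow in this task can be approximated by a pure function over elapsed dwell time. This is a simplified sketch, not the `BatchEngine` state machine: it assumes the batch is never removed or handled, and it assumes `alarm_removal_seconds` is counted from the moment `time_alarm` fires, which the notes imply but do not state outright. `Thresholds` and `timing_events` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Defaults mirror the values deployed in this task.
    pre_warning_seconds: int = 60
    max_dwell_seconds: int = 120
    alarm_removal_seconds: int = 30


def timing_events(elapsed_seconds: float, thresholds: Thresholds) -> list[str]:
    """List the timing events that have fired by elapsed_seconds.

    Assumes the batch stays in the zone the whole time and that the
    removal timeout starts when the alarm threshold is reached.
    """
    events: list[str] = []
    if elapsed_seconds >= thresholds.pre_warning_seconds:
        events.append("time_pre_warning")
    if elapsed_seconds >= thresholds.max_dwell_seconds:
        events.append("time_alarm")
    removal_deadline = thresholds.max_dwell_seconds + thresholds.alarm_removal_seconds
    if elapsed_seconds >= removal_deadline:
        events.append("alarm_removal_timeout")
    return events


for seconds in (30, 60, 120, 150):
    print(seconds, timing_events(seconds, Thresholds()))
```

Deriving events from elapsed time alone keeps the example deterministic; the real engine also has to track which events were already emitted and handle `pre_warning_handled`, removal, and disposal transitions.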
## Current Task: Alarm Snapshot Labels And Zone Colors

**Goal:** Uploaded alarm screenshots should show each calibrated region name directly on the image, and different cold-display zones should use different overlay colors.

**Design:** Extend the existing standard-library overlay path. Keep drawing configured polygons before JPEG upload, but carry a display label for each region, choose a stable color from a fixed palette by zone order, and draw a small high-contrast text label inside the polygon. Keep trash ROI red and labeled separately.

- [x] Inspect the current calibration overlay helper and tests.
- [x] Add failing tests for per-zone colors and visible region labels.
- [x] Implement labels and stable zone color palette.
- [x] Run snapshot tests and full Python tests.
- [x] Deploy the overlay update to `xiaozheng@10.8.0.23`.
- [x] Verify remote API/runtime health and deployed overlay helper.

### Review

- `apply_calibration_overlay` now assigns each cold-display zone a stable color from a fixed palette and keeps the trash ROI red.
- Each overlay region now carries a label and draws a small high-contrast label box directly on the frame before JPEG encoding/upload.
- The built-in label renderer covers common on-site labels such as `区域 1` (zone 1, with any digit) and `垃圾区` (trash zone), plus basic ASCII for custom numeric/English labels.
- Verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`99` tests)
- Deployed `src/cold_display_guard/alarm_snapshots.py` to `xiaozheng@10.8.0.23` after backing up the previous remote file.
- Rebuilt `cold-display-guard:dev` and restarted `cold-display-guard-api` plus `cold-display-guard-runtime`.
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok` and `runtime_status=running`.
  - Container-side overlay smoke test confirmed two zones render different RGB values and label text pixels are present.

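Choosing a stable color by zone order is essentially an index into a fixed palette with wraparound. A minimal sketch follows; the specific RGB values and the `zone_color` helper are assumptions for illustration, not the palette shipped in `alarm_snapshots.py`.

```python
# Hypothetical palette; only "trash ROI stays red" comes from the notes above.
ZONE_PALETTE = [
    (255, 220, 0),    # yellow
    (0, 200, 255),    # cyan
    (0, 220, 120),    # green
    (200, 120, 255),  # violet
]
TRASH_COLOR = (255, 0, 0)


def zone_color(zone_index: int) -> tuple[int, int, int]:
    """Pick a palette color by zone order, wrapping when zones outnumber colors."""
    return ZONE_PALETTE[zone_index % len(ZONE_PALETTE)]


for index in range(5):
    print(index, zone_color(index))
```

Indexing by config order keeps a zone's color the same across snapshots as long as the `[[zones]]` order does not change; keeping red out of the palette avoids confusion with the trash ROI.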
## Current Task: Alarm Snapshot Chinese Label Rendering Fix

**Goal:** Fix unreadable/garbled Chinese region names on uploaded alarm screenshots while keeping per-zone colors and fallback labeling robust.

**Design:** Use a real CJK font renderer for Chinese labels in the alarm snapshot overlay path. Install Noto CJK fonts in the runtime image, render labels through ffmpeg `drawtext` when the font is available, and fall back to readable ASCII labels if the font renderer is unavailable.

- [x] Reproduce and identify the likely root cause: remote container only matched DejaVu for `zh-cn`, so Chinese labels had no real CJK font path.
- [x] Add regression tests for Docker CJK font installation and readable ASCII fallback labels.
- [x] Update `Dockerfile` to install `fonts-noto-cjk`.
- [x] Update `alarm_snapshots.py` to prefer CJK font rendering and use `R1`/`TRASH` fallback text when needed.
- [x] Run focused and full local Python verification.
- [x] Deploy `Dockerfile` and `alarm_snapshots.py` to `xiaozheng@10.8.0.23` without overwriting live config.
- [x] Rebuild/restart `cold-display-guard-api` and `cold-display-guard-runtime`.
- [x] Verify remote API/runtime health, CJK font availability, overlay smoke behavior, and runtime logs.

### Review

- Root cause was the screenshot overlay path not having a real Chinese font renderer in the deployed image; the container matched DejaVu before this fix.
- The rebuilt remote container now reports `NotoSansCJK-Regular.ttc: "Noto Sans CJK SC" "Regular"` for `fc-match :lang=zh-cn`.
- Remote overlay smoke test confirmed `find_cjk_font_file()` returns `/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc`, Chinese labels change the frame, bright label pixels are present, and different regions retain distinct colors.
- Local verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`101` tests)
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok`, `runtime_status=running`, and version `dev`.
  - `cold-display-guard-api` is healthy and `cold-display-guard-runtime` is running after restart.
  - Runtime logs show normal startup after the restart.

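The "prefer a CJK font, otherwise fall back to ASCII" decision can be sketched as follows. This is an illustration under assumptions, not the code in `alarm_snapshots.py`: `find_font` and `display_label` are hypothetical names, the candidate list contains only the path recorded above, and the `R<n>` numbering by zone order is assumed from the `R1`/`TRASH` fallback mentioned in the checklist.

```python
import os
from typing import Optional, Sequence

# Only this path is confirmed by the notes; other images may differ.
CJK_FONT_CANDIDATES = (
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",
)


def find_font(candidates: Sequence[str] = CJK_FONT_CANDIDATES) -> Optional[str]:
    """Return the first candidate font file that exists, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def display_label(label: str, zone_index: int, is_trash: bool,
                  font_path: Optional[str]) -> str:
    """Keep the configured label when it can be rendered, else use ASCII.

    ASCII-only labels need no CJK font, so they are returned unchanged.
    """
    if font_path is not None or label.isascii():
        return label
    return "TRASH" if is_trash else f"R{zone_index + 1}"


font = find_font()
print(display_label("区域 1", 0, False, font))
print(display_label("垃圾区", 0, True, font))
```

Resolving the font once and passing the result into the label function keeps the fallback decision testable without touching the filesystem.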