Task Todo
Current Task: Runtime/API Case State Reopen Fix
Goal: When the management API marks a display-cabinet case as handled, the runtime process must not later append a newer open snapshot for the same case from stale in-memory state.
- Add a failing regression test for the API-written `handled` state being preserved when runtime persists later events.
- Fix runtime case persistence to reconcile with the latest JSONL snapshots before applying new events.
- Run targeted case/runtime tests.
- Record remote chain verification and deployment status.
Findings
- On `xiaozheng@10.8.0.23`, `case_batch_000911` was marked `handled` at `2026-06-15T07:27:12Z`, then runtime appended a newer `open` snapshot for the same case at `2026-06-15T15:38:03+08:00`.
- The API and runtime are separate processes sharing `logs/cases.jsonl`; runtime keeps a long-lived `CaseStore` loaded at startup and did not see the API-written handled snapshot.
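The reconciliation idea in the goal above can be sketched roughly as follows. This is a minimal illustration, not the project's `CaseStore`: it assumes each JSONL line is an object with hypothetical `case_id` and `status` keys, and the helper names `latest_snapshots` and `append_if_not_handled` are invented for the example.

```python
import json
from pathlib import Path


def latest_snapshots(path: Path) -> dict:
    """Return the last snapshot per case_id, re-read from disk on every call."""
    latest = {}
    if not path.exists():
        return latest
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            snap = json.loads(line)
            latest[snap["case_id"]] = snap  # later lines win
    return latest


def append_if_not_handled(path: Path, snapshot: dict) -> bool:
    """Append snapshot unless the newest on-disk state for the case is handled."""
    current = latest_snapshots(path).get(snapshot["case_id"])
    if current is not None and current.get("status") == "handled":
        return False  # another process closed the case; do not reopen it
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(snapshot) + "\n")
    return True
```

Re-reading the file before each append trades some I/O for correctness when two processes share one append-only log; a long-lived in-memory cache alone cannot see the other writer's lines.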
Verification
- RED:
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests.test_main.RuntimeRestoreTests.test_persist_case_updates_preserves_api_handled_snapshot -v`
  - Result before fix: failed because runtime appended a later `open` snapshot.
- Local targeted verification:
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests.test_main.RuntimeRestoreTests.test_persist_case_updates_preserves_api_handled_snapshot -v`
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_cases.py -v`
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_main.py -v`
  - Result: all passed.
- Remote deployment:
  - Synced only `src/cold_display_guard/main.py` to `xiaozheng@10.8.0.23:/home/xiaozheng/cold_display_guard/src/cold_display_guard/main.py`.
  - Ran `docker compose --env-file deploy/cold-display-guard.env -f deploy/docker-compose.yml up -d --build cold-display-guard-runtime`.
  - Compose recreated `cold-display-guard-api` and `cold-display-guard-runtime`; health check returned `status=ok`.
- Remote behavior check:
  - Ran the same API-handled/runtime-later-event scenario inside `cold-display-guard-runtime` using a temp JSONL file.
  - Result: `{"handled_source": "manual", "latest_status": "handled", "new_snapshots": 0}`.
- Review the current project instructions and check for task-relevant lessons.
- Inspect the OTA upload API document and current runtime/webhook capture path.
- Create an isolated worktree for alarm snapshot upload implementation.
- Write the detailed implementation plan to `docs/superpowers/plans/2026-06-09-alarm-snapshot-upload.md`.
- Execute alarm snapshot upload client TDD cycle.
- Execute runtime and webhook payload integration TDD cycle.
- Update config surface, docs, and verification notes.
- Run targeted verification and final full verification.
Notes
- `tasks/lessons.md` is absent in this repository/worktree, so there were no prior session lessons to review.
- Upload API reference: `/Users/glo/code/go/wenma/ai_manager/zd-ai-manager/chunk-upload-oss-service/UPLOAD_API.md`
- User-provided upload target: `https://ota.zhengxinshipin.com`
- User-provided token secret: `change-me-in-production`
Review
- Plan saved to `docs/superpowers/plans/2026-06-09-alarm-snapshot-upload.md`.
- Chosen implementation keeps snapshot upload entirely outside `BatchEngine` and enriches webhook payloads from the runtime side using the already captured frame.
- Implemented `src/cold_display_guard/alarm_snapshots.py` for JPEG encoding plus OTA chunk-upload orchestration, runtime integration in `src/cold_display_guard/main.py`, webhook payload enrichment in `src/cold_display_guard/webhooks.py`, config exposure/secret stripping in `src/cold_display_guard/config.py` and `src/cold_display_guard/manage_api.py`, and config/doc updates in `config/example.toml` and `README_zh.md`.
- Targeted verification passed:
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_alarm_snapshots.py -v`
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_main.py -v`
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_webhooks.py tests/test_config.py tests/test_manage_api.py -v`
- Final verification passed:
  - `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest discover -s tests -v`
  - `cd web && pnpm install --frozen-lockfile && pnpm build`
Current Task: Webhook Payload Field Gap Check
- Pull the actual payload currently received by `video-recognition` and compare it against the required event list fields.
- Patch webhook payload builders to include the missing non-store fields required by the downstream table.
- Add or update focused webhook tests for the enriched payload shape.
- Run targeted verification and record the result here.
Current Findings
- Current received payload only includes `batch_id`, `camera_id`, `event`, `kind`, `severity`, `source_id`, `state`, `ts`, `zone_id`, and `zone_label`.
- Missing or not explicitly populated for the downstream event table: event code, camera IP, batch start time, removal time, dwell duration, discard flag, discard time, create time, alarm time, and update time.
Field Gap Verification
- Actual receiver payload before the fix, from `video-recognition` result JSONL on `10.8.0.11`, confirmed only the base fields above and did not include the downstream table time/discard/IP fields.
- Updated `src/cold_display_guard/webhooks.py` so both `batch_event` and `case_event` now include:
  - `event_code`
  - `camera_ip`
  - `started_at`
  - `ended_at`
  - `removed_at`
  - `dwell_seconds`
  - `is_discarded`
  - `discarded_at`
  - `created_at`
  - `alerted_at`
  - `alarm_at`
  - `updated_at`
- `case_event` also now carries the missing contextual fields `camera_id`, `zone_id`, and `zone_label`.
- Verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_webhooks.py -v`
  - `PYTHONPATH=src python3 -m unittest tests/test_main.py -v`
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v`
- Deployed updated code to `xiaozheng@10.8.0.11` without overwriting the remote `config/example.toml`, rebuilt `cold-display-guard:dev`, and restarted only `cold-display-guard-api` plus `cold-display-guard-runtime`.
- Natural post-deploy traffic did not arrive during the 2-minute observation window, so final runtime verification used the deployed container to build representative batch/case webhook payloads with the live remote config and confirmed `camera_ip = 192.168.3.4` plus all new downstream fields were present.
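A field-gap check like the one recorded above could be automated with a small helper. The sketch below only reuses the field names listed in these notes; the function name `missing_downstream_fields` is hypothetical, and it treats a key that is present with a null value (for example `discarded_at` on a batch that was never discarded) as populated, which is an assumption about what the downstream table accepts.

```python
# Fields the receiver already saw before the fix.
BASE_FIELDS = (
    "batch_id", "camera_id", "event", "kind", "severity",
    "source_id", "state", "ts", "zone_id", "zone_label",
)

# Fields added for the downstream event table.
DOWNSTREAM_FIELDS = (
    "event_code", "camera_ip", "started_at", "ended_at", "removed_at",
    "dwell_seconds", "is_discarded", "discarded_at", "created_at",
    "alerted_at", "alarm_at", "updated_at",
)


def missing_downstream_fields(payload: dict) -> list:
    """List downstream field names that are absent from a webhook payload."""
    return [name for name in DOWNSTREAM_FIELDS if name not in payload]
```

Running it against a captured receiver payload before and after a deploy gives a quick pass/fail signal without waiting for natural traffic.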
Current Task: Deploy To 192.168.5.103
- Inspect the existing deployment layout and active containers on `xiaozheng@192.168.5.103`.
- Verify the exact webhook route on that host before writing config.
- Sync the current project code to the remote deployment directory without overwriting the live RTSP and calibration config.
- Configure the remote webhook settings for the local `video-recognition` receiver.
- Rebuild and restart the remote API/runtime containers, then verify health and outbound webhook configuration.
Deployment Findings
- Existing deployment path on `192.168.5.103` is `/home/xiaozheng/cold_display_guard`, not `~/apps/cold-display-guard/app`.
- The host already runs `cold-display-guard-api`, `cold-display-guard-runtime`, and `cold-display-guard-web` on ports `19080` and `23000`.
- The same host also runs `video-recognition`, and a direct probe to `http://127.0.0.1:8080/api/webhook/cold-display-guard` returned `200 OK`, so this is the verified webhook target for this environment.
Deployment Verification
- From inside the running `cold-display-guard-api` container on `192.168.5.103`:
  - `http://host.docker.internal:8080/api/webhook/cold-display-guard` failed DNS resolution.
  - `http://172.17.0.1:8080/api/webhook/cold-display-guard` returned `200 OK`.
  - `http://192.168.5.103:8080/api/webhook/cold-display-guard` returned `200 OK`.
- The configured webhook target was set to `http://192.168.5.103:8080/api/webhook/cold-display-guard` for both `event_url` and `case_url`.
- Remote config was enriched to include:
  - `case_sink`
  - `alarm_snapshot_upload`
  - `webhook_retry_sink`
  - `webhook_delivery_sink`
  - `webhooks`
- Code sync used `rsync` with `config/example.toml` excluded so the live RTSP URL and calibration polygons were preserved.
- Remote rebuild/restart completed for `cold-display-guard-api` and `cold-display-guard-runtime`.
- Verified after restart:
  - `GET http://127.0.0.1:19080/api/manage/health` returned `status=ok`
  - `GET http://127.0.0.1:19080/api/manage/config` showed `webhooks.enabled=true`
  - `event_url` and `case_url` both active on `http://192.168.5.103:8080/api/webhook/cold-display-guard`
  - `alarm_snapshot_upload.enabled=true`
Current Task: Alarm Snapshot Calibration Overlay
Goal: Webhook-linked uploaded alarm snapshots should visually include the calibrated cold display zones and trash confirmation ROI from the current config.
Design: Keep the existing runtime flow intact: capture current RTSP frame, process events, then upload an alarm snapshot only for warning/alarm events. Before JPEG encoding, build overlay regions from [[zones]] plus [trash].roi, clamp normalized polygon coordinates to the image bounds, draw a semi-transparent fill and visible outline directly onto a copied Frame.rgb, and pass that annotated frame to the existing encoder/uploader. Do not change BatchEngine, Webhook payload shape, OTA upload protocol, or management snapshot capture.
- Review task-relevant lessons and current dirty worktree.
- Inspect `alarm_snapshots.py`, `main.py`, config polygon shape, and existing tests.
- Write a failing unit test proving alert snapshot upload encodes an annotated frame when zones/trash ROI are configured.
- Write focused unit tests for polygon overlay behavior using a tiny RGB frame.
- Run targeted tests and confirm the new tests fail for the expected missing overlay behavior.
- Implement the smallest standard-library overlay helper in `src/cold_display_guard/alarm_snapshots.py`.
- Wire `capture_alert_snapshot` to apply configured overlays before JPEG encoding.
- Run targeted snapshot/runtime tests.
- Run the full Python test suite.
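The clamping and semi-transparent fill described in the design above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's helper: the frame is modeled as a list of rows of `(r, g, b)` tuples (the real `Frame.rgb` layout is not shown in these notes), the function names are hypothetical, and outline drawing is omitted for brevity.

```python
def to_pixels(polygon, width, height):
    """Clamp normalized (x, y) points to [0, 1] and scale to pixel coordinates."""
    points = []
    for x, y in polygon:
        cx = min(max(x, 0.0), 1.0)
        cy = min(max(y, 0.0), 1.0)
        points.append((cx * width, cy * height))
    return points


def inside(px, py, points):
    """Even-odd ray-casting point-in-polygon test."""
    hit = False
    j = len(points) - 1
    for i in range(len(points)):
        xi, yi = points[i]
        xj, yj = points[j]
        if (yi > py) != (yj > py):
            x_cross = xi + (py - yi) * (xj - xi) / (yj - yi)
            if px < x_cross:
                hit = not hit
        j = i
    return hit


def fill_polygon(rows, polygon, color, alpha=0.25):
    """Return a copy of rows with the polygon alpha-blended over it."""
    height, width = len(rows), len(rows[0])
    points = to_pixels(polygon, width, height)
    out = [list(row) for row in rows]  # copy so the source frame stays untouched
    for y in range(height):
        for x in range(width):
            if inside(x + 0.5, y + 0.5, points):  # test the pixel center
                out[y][x] = tuple(
                    round((1 - alpha) * old + alpha * new)
                    for old, new in zip(out[y][x], color)
                )
    return out
```

Testing every pixel is quadratic in image size; for full-resolution frames a real implementation would likely restrict the scan to the polygon's bounding box.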
Review
- Added `apply_calibration_overlay` in `src/cold_display_guard/alarm_snapshots.py` to draw configured food-zone polygons in yellow and the trash ROI in red onto a copied frame before JPEG encoding and OTA upload.
- The overlay clamps normalized coordinates to image bounds, draws semi-transparent fills plus outlines, and leaves the original `Frame.rgb` unchanged for downstream runtime processing.
- `capture_alert_snapshot` now encodes the annotated frame when warning/alarm events trigger snapshot upload; non-alert events and disabled upload behavior are unchanged.
- Targeted verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`
  - `PYTHONPATH=src python3 -m unittest tests/test_main.py -v`
- Full verification passed:
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v`
Current Task: Deploy Overlay Update To 10.8.0.23
Goal: Deploy the alarm snapshot calibration overlay change to xiaozheng@10.8.0.23 without overwriting live RTSP/calibration config or unrelated local changes.
Plan: Inspect the remote deployment layout first, confirm which containers are active, sync only the runtime source file required for the overlay change, rebuild/restart the API/runtime services that use the Python image, and verify both service health and the deployed source code.
- Inspect remote deployment directory, Docker/Compose files, and active containers on `xiaozheng@10.8.0.23`.
- Confirm the remote config file remains present and is not overwritten.
- Sync `src/cold_display_guard/alarm_snapshots.py` to the remote deployment path.
- Rebuild and restart only the affected `cold-display-guard-api` and `cold-display-guard-runtime` services when Compose is available.
- Verify management API health after restart.
- Verify the deployed remote source contains `apply_calibration_overlay`.
Deployment Review
- Remote deployment path confirmed as `/home/xiaozheng/cold_display_guard`.
- Active services before deployment: `cold-display-guard-api`, `cold-display-guard-runtime`, and `cold-display-guard-web`.
- Remote live `config/example.toml` was checked before and after deployment and was not overwritten.
- Synced only `src/cold_display_guard/alarm_snapshots.py` to avoid deploying unrelated local `web/nginx.conf` changes.
- Created a timestamped backup of the previous remote `alarm_snapshots.py` beside the source file before syncing.
- Rebuilt `cold-display-guard:dev` with `docker compose --env-file deploy/cold-display-guard.env -f deploy/docker-compose.yml build cold-display-guard-api`.
- Restarted only `cold-display-guard-api` and `cold-display-guard-runtime` with Compose; `cold-display-guard-web` remained untouched.
- Verification passed:
  - `curl http://127.0.0.1:19080/api/manage/health` returned `status=ok` and `runtime_status=running`.
  - `docker exec cold-display-guard-api python3 -c ...` confirmed `apply_calibration_overlay` exists in the running image with signature `(frame, config) -> Frame`.
  - API and runtime logs show normal startup after restart.
Current Task: Update Timing Parameters On 10.8.0.23
Goal: Adjust the live timing settings on xiaozheng@10.8.0.23 per operator request.
Applied mapping: The current application has no separate pre-warning threshold. It supports max_dwell_seconds for the time alarm/overdue threshold and trash_confirmation_seconds for the disposal confirmation window before warning escalation. Applied max_dwell_seconds = 120 and trash_confirmation_seconds = 30.
- Back up `/home/xiaozheng/cold_display_guard/config/example.toml`.
- Update `[thresholds].max_dwell_seconds` from `300` to `120`.
- Update `[thresholds].trash_confirmation_seconds` from `120` to `30`.
- Restart `cold-display-guard-api` and `cold-display-guard-runtime`.
- Verify `/api/manage/health`.
- Verify `/api/manage/config` returns `{"max_dwell_seconds": 120, "trash_confirmation_seconds": 30}`.
Timing Update Review
- Remote config was edited in place after creating a timestamped backup.
- `cold-display-guard-api` and `cold-display-guard-runtime` were explicitly restarted with Docker Compose.
- `cold-display-guard-web` was not restarted.
- Verification passed:
  - `GET http://127.0.0.1:19080/api/manage/health` returned `status=ok` and `runtime_status=running`.
  - `GET http://127.0.0.1:19080/api/manage/config` returned `max_dwell_seconds = 120` and `trash_confirmation_seconds = 30`.
  - Container status showed `cold-display-guard-api` healthy and `cold-display-guard-runtime` running after restart.
- Note: the requested pre-warning duration (`预警时长 = 1min`) is not independently configurable in the current codebase; supporting distinct pre-warning at 60 seconds and overdue at 120 seconds would require a code change.
Current Task: Pre-Warning Alarm Flow And Full Webhook/MQTT Chain
Goal: Implement the requested camera-side timing flow, deploy it to xiaozheng@10.8.0.23, and verify the Webhook -> video_recognition_local -> MQTT -> store_data_platform chain.
Design: Keep all timing decisions inside cold_display_guard.BatchEngine. Add separate thresholds for pre-warning, alarm, and alarm-removal timeout; emit explicit lifecycle events so downstream services do not infer camera-side timers. Keep video_recognition_local as a transparent Webhook/MQTT bridge, and update store_data_platform only where event names map to notifications, case types, and CRM penalty submission.
- Review task-relevant instructions, lessons, and dirty worktree.
- Inspect the current cold-display engine, case store, webhook payload, and tests.
- Inspect `video_recognition_local` cold-display Webhook receiver and MQTT publisher.
- Inspect `store_data_platform` cold-display MQTT consumer, notification mapping, and CRM submission trigger.
- Inspect `xiaozheng@10.8.0.23` active containers and deployment paths.
- Add failing cold-display engine/case/config/webhook tests for `time_pre_warning`, `pre_warning_handled`, `time_alarm`, and `alarm_removal_timeout`.
- Implement the camera-side state machine and config fields.
- Add/adjust `video_recognition_local` passthrough tests for the new event names.
- Add/adjust `store_data_platform` tests and mappings for new event semantics.
- Run local targeted and full relevant verification.
- Deploy changed services to `xiaozheng@10.8.0.23` without overwriting live RTSP/calibration secrets.
- Update the remote timing config to `pre_warning_seconds=60`, `max_dwell_seconds=120`, `alarm_removal_seconds=30`, `trash_confirmation_seconds=30`.
- Verify remote Webhook target reachability from the cold-display container to local `video-recognition`.
- Observe cold-display, video-recognition, MQTT, and platform logs; record the result.
Current Findings
- `cold_display_guard` currently has only `max_dwell_seconds` and `trash_confirmation_seconds`; it cannot independently represent 1-minute pre-warning, 2-minute alarm, and 30-second alarm-removal timeout.
- `video_recognition_local` receives `/api/webhook/cold-display-guard` payloads as generic JSON and forwards them to MQTT; new event names should remain transparent, but tests should lock this behavior.
- `store_data_platform` currently treats `time_alarm` and `batch_pending_disposal` as warning notifications, and only `warning_escalated` triggers CRM penalty submission. This must change so `time_pre_warning` is the warning, `time_alarm` is the alert reminder, and `alarm_removal_timeout` triggers CRM submission.
- On `10.8.0.23`, active containers include `cold-display-guard-*`, `video-recognition`, and `mosquitto`; `video-recognition` runs with host networking, while `cold-display-guard-api` runs on its Compose network.
Local Verification
- Cold-display full Python suite passed: `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`98` tests).
- `video_recognition_local` cold-display focused tests passed: `go test ./internal/server ./internal/mqtt ./cmd -run 'TestColdDisplayGuard|Test.*ColdDisplayGuard' -count=1`.
- `store_data_platform` display-cabinet service focused tests passed: `go test ./store_data/service -run 'Test.*StoreDisplayCabinet|TestResolveStoreDisplayCabinet.*|TestShouldSubmitStoreDisplayCabinetPenalty|TestBuildStoreDisplayCabinet.*' -count=1`.
Deployment Review
- Synced only these cold-display source files to `xiaozheng@10.8.0.23:/home/xiaozheng/cold_display_guard/src/cold_display_guard/`: `models.py`, `config.py`, `engine.py`, `cases.py`, `webhooks.py`.
- Backed up the remote source files and live `config/example.toml` before deployment.
- Updated the live remote thresholds to `pre_warning_seconds=60`, `max_dwell_seconds=120`, `alarm_removal_seconds=30`, and `trash_confirmation_seconds=30`.
- Updated the live remote Webhook target from the unreachable old host to `http://10.8.0.23:8080/api/webhook/cold-display-guard`.
- Rebuilt `cold-display-guard:dev` and restarted only `cold-display-guard-api` and `cold-display-guard-runtime`.
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok` and `runtime_status=running`.
  - `GET /api/manage/config` returned the four expected threshold values and the new Webhook target.
  - Container-side synthetic engine run emitted `batch_started`, `time_pre_warning`, `time_alarm`, `alarm_removal_timeout`, then `batch_pending_disposal` plus `batch_discarded`.
  - Natural runtime log emitted `alarm_removal_timeout` for `batch_000881` at `2026-06-15T11:52:20+08:00`.
  - Webhook delivery for that event returned HTTP `200` from `video-recognition`.
  - `video_recognition_local` result JSONL recorded both `alarm_removal_timeout` batch and case events.
  - MQTT probe confirmed `video-recognition` published to `video/cold-display-guard/result/cold-display-guard` with `device_identifier=cold-display-guard`.
- `store_data_platform` is not deployed on `10.8.0.23` under that repository name or as an identifiable container; platform handling changes were completed and verified in the local repository.
- The cold-display retry queue has no pending entries; old `192.168.5.103` failures are already dead-letter history.
Current Task: Alarm Snapshot Labels And Zone Colors
Goal: Uploaded alarm screenshots should show each calibrated region name directly on the image, and different cold-display zones should use different overlay colors.
Design: Extend the existing standard-library overlay path. Keep drawing configured polygons before JPEG upload, but carry a display label for each region, choose a stable color from a fixed palette by zone order, and draw a small high-contrast text label inside the polygon. Keep trash ROI red and labeled separately.
- Inspect the current calibration overlay helper and tests.
- Add failing tests for per-zone colors and visible region labels.
- Implement labels and stable zone color palette.
- Run snapshot tests and full Python tests.
- Deploy the overlay update to `xiaozheng@10.8.0.23`.
- Verify remote API/runtime health and deployed overlay helper.
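Choosing a stable color by zone order, as the design above describes, can be sketched as an index into a fixed palette. The palette values and the `zone_colors` name are hypothetical; only the rule (fixed palette, keyed by position, trash ROI always red) comes from these notes.

```python
# Hypothetical palette; the real colors are not listed in these notes.
PALETTE = [
    (255, 215, 0),
    (0, 200, 255),
    (0, 220, 120),
    (255, 120, 0),
    (200, 100, 255),
]
TRASH_COLOR = (255, 0, 0)  # trash ROI stays red


def zone_colors(zone_ids):
    """Map each zone id to a palette color by its position in the config order."""
    return {
        zone_id: PALETTE[index % len(PALETTE)]
        for index, zone_id in enumerate(zone_ids)
    }
```

Because the color depends only on position, the same config always produces the same colors across restarts; with more zones than palette entries the colors repeat.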
Review
- `apply_calibration_overlay` now assigns each cold-display zone a stable color from a fixed palette and keeps the trash ROI red.
- Each overlay region now carries a label and draws a small high-contrast label box directly on the frame before JPEG encoding/upload.
- The built-in label renderer covers common on-site labels such as `区域 1` ("Zone 1", with any digits) and `垃圾区` ("Trash area"), plus basic ASCII for custom numeric/English labels.
- Verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`99` tests)
- Deployed `src/cold_display_guard/alarm_snapshots.py` to `xiaozheng@10.8.0.23` after backing up the previous remote file.
- Rebuilt `cold-display-guard:dev` and restarted `cold-display-guard-api` plus `cold-display-guard-runtime`.
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok` and `runtime_status=running`.
  - Container-side overlay smoke test confirmed two zones render different RGB values and label text pixels are present.
Current Task: Alarm Snapshot Chinese Label Rendering Fix
Goal: Fix unreadable/garbled Chinese region names on uploaded alarm screenshots while keeping per-zone colors and fallback labeling robust.
Design: Use a real CJK font renderer for Chinese labels in the alarm snapshot overlay path. Install Noto CJK fonts in the runtime image, render labels through ffmpeg drawtext when the font is available, and fall back to readable ASCII labels if the font renderer is unavailable.
- Reproduce and identify the likely root cause: remote container only matched DejaVu for `zh-cn`, so Chinese labels had no real CJK font path.
- Add regression tests for Docker CJK font installation and readable ASCII fallback labels.
- Update `Dockerfile` to install `fonts-noto-cjk`.
- Update `alarm_snapshots.py` to prefer CJK font rendering and use `R1`/`TRASH` fallback text when needed.
- Run focused and full local Python verification.
- Deploy `Dockerfile` and `alarm_snapshots.py` to `xiaozheng@10.8.0.23` without overwriting live config.
- Rebuild/restart `cold-display-guard-api` and `cold-display-guard-runtime`.
- Verify remote API/runtime health, CJK font availability, overlay smoke behavior, and runtime logs.
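The fallback rule in the design above (render the configured label when a CJK font is available, otherwise use readable ASCII such as `R1` or `TRASH`) can be sketched as a pure function. The name `label_for_render` and its parameters are hypothetical; `cjk_font` stands for whatever a font lookup such as `find_cjk_font_file()` returns, and the 1-based zone index is an assumption based on the `R1` example.

```python
from typing import Optional


def label_for_render(label: str, index: int, is_trash: bool,
                     cjk_font: Optional[str]) -> str:
    """Pick the text to draw for a region label.

    ASCII labels are always safe to draw. Non-ASCII labels are kept only
    when a CJK-capable font file is available; otherwise a readable ASCII
    fallback is used so the snapshot never shows garbled glyphs.
    """
    if label.isascii() or cjk_font:
        return label
    return "TRASH" if is_trash else f"R{index}"
```

Keeping this decision separate from the drawing code makes the fallback easy to unit-test without a font or `ffmpeg` installed.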
Review
- Root cause was the screenshot overlay path not having a real Chinese font renderer in the deployed image; the container matched DejaVu before this fix.
- The rebuilt remote container now reports `NotoSansCJK-Regular.ttc: "Noto Sans CJK SC" "Regular"` for `fc-match :lang=zh-cn`.
- Remote overlay smoke test confirmed `find_cjk_font_file()` returns `/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc`, Chinese labels change the frame, bright label pixels are present, and different regions retain distinct colors.
- Local verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`101` tests)
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok`, `runtime_status=running`, and version `dev`.
  - `cold-display-guard-api` is healthy and `cold-display-guard-runtime` is running after restart.
  - Runtime logs show normal startup after the restart.
Current Task: Investigate False Normal Consumption Events On 10.8.0.23
Goal: Determine why the live system records a normal consumption event about every two minutes with a dwell time near 13 seconds even when no one touched the cold display cabinet.
Debug plan: Inspect remote runtime/event/case/diagnostic logs first, correlate batch_started and batch_consumed pairs by zone and dwell time, then trace the vision metrics for those timestamps to identify whether the source is occupancy flicker, runtime restart state restoration, config thresholds, or downstream display interpretation.
- Inspect recent remote events and confirm the exact event names, zones, dwell seconds, and cadence.
- Inspect runtime diagnostics around those timestamps for occupancy and vision metric flicker.
- Inspect live config and runtime logs for sampling/stabilization settings and restarts.
- Form and test a root-cause hypothesis before changing code or live thresholds.
- Record findings, fix if needed, and verify with logs/tests.
Findings And Fix
- The repeated records were real `batch_started` -> `batch_consumed` events from the camera-side engine, not a downstream display issue.
- Before the fix, recent events showed repeated zone 1 batches ending after 13-33 seconds, matching the two-frame confirmation cadence at the current sampling rate.
- Root cause had two parts:
  - Zone 1 was genuinely occupied, but its vision signal hovered around the old relative dark threshold, so short raw-occupancy dips were interpreted as item removal.
  - Zone 2 was occupied before or during baseline learning, so its relative difference from baseline stayed near zero and it was not detected as occupied.
- Added `occupancy_absolute_dark_fraction` in `src/cold_display_guard/vision.py`, defaulting to `0.0` so existing configs are unchanged unless they opt in.
- Updated the live config on `xiaozheng@10.8.0.23`:
  - `occupancy_dark_fraction = 0.12`
  - `occupancy_absolute_dark_fraction = 0.085`
  - `empty_confirm_frames = 6`
- Rebuilt and restarted `cold-display-guard-api` and `cold-display-guard-runtime`.
- Verification:
  - Local full Python suite passed: `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`102` tests).
  - Remote health returned `status=ok` and `runtime_status=running`.
  - Remote container config shows the new thresholds.
  - After deployment, latest diagnostics stabilized at `zone_counts = {"1": 1, "2": 1, "6": 1}`.
- During a two-minute observation window after `13:25`, no new `batch_consumed` events were emitted; only expected pre-warning/alarm lifecycle events appeared for the occupied zones.
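The two-part fix above can be illustrated with a rough sketch. The exact formula in `vision.py` is not shown in these notes, so the code below is an assumption: a zone counts as raw-occupied when its dark-pixel fraction exceeds the learned baseline by `occupancy_dark_fraction`, or when it exceeds an opt-in absolute floor; a zone only flips to empty after `empty_confirm_frames` consecutive raw-empty frames. The names `raw_occupied` and `EmptyDebouncer` are hypothetical, and occupancy-side confirmation is simplified to a single frame.

```python
def raw_occupied(dark, baseline_dark, rel_threshold, abs_threshold=0.0):
    """Per-frame occupancy guess from a zone's dark-pixel fraction."""
    relative_hit = (dark - baseline_dark) >= rel_threshold
    # An absolute threshold of 0.0 means the absolute check is disabled.
    absolute_hit = abs_threshold > 0.0 and dark >= abs_threshold
    return relative_hit or absolute_hit


class EmptyDebouncer:
    """Hold the occupied state until enough consecutive empty frames arrive."""

    def __init__(self, empty_confirm_frames):
        self.empty_confirm_frames = empty_confirm_frames
        self.occupied = False
        self._empty_run = 0

    def update(self, raw):
        if raw:
            self.occupied = True
            self._empty_run = 0
        elif self.occupied:
            self._empty_run += 1
            if self._empty_run >= self.empty_confirm_frames:
                self.occupied = False
                self._empty_run = 0
        return self.occupied
```

Under these assumptions the absolute floor addresses the zone 2 case (item already present while the baseline was learned), and the longer empty confirmation absorbs the short dips seen in zone 1.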
Current Task: Reduce Alarm Snapshot Label Visual Obstruction
Goal: Region labels on uploaded alarm screenshots should be smaller and more transparent so operators can inspect the food/display image underneath.
Design: Keep the existing label content, placement, CJK font rendering, and per-zone colors. Only reduce the visual weight of the label layer by lowering font size, black label-box opacity, border width, and fallback label-box opacity.
- Inspect current alarm snapshot label rendering style.
- Add a regression test for smaller ffmpeg drawtext label style.
- Reduce drawtext font size and label-box opacity.
- Keep fallback label renderer visually consistent with the ffmpeg path.
- Run full local verification.
- Deploy the updated snapshot overlay style to `xiaozheng@10.8.0.23`.
- Verify remote runtime health and deployed label style.
Notes
- Targeted snapshot test passed: `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`.
- Full local verification passed: `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`103` tests).
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok` and `runtime_status=running`.
  - Running container uses `fontsize=13`, `boxcolor=black@0.34`, and `boxborderw=2` for region labels.
  - `cold-display-guard-runtime` logs show normal startup after restart.
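For reference, the label style values recorded above map onto standard `ffmpeg` `drawtext` options. The builder below is a hedged sketch, not the project's code: the function name, the white font color, and the argument layout are assumptions, and instead of implementing ffmpeg's multi-level escaping it simply rejects values containing characters that are special in filter or text-expansion syntax.

```python
_UNSAFE = set("':\\%")


def drawtext_filter(text, x, y, fontfile,
                    fontsize=13, box_opacity=0.34, border=2):
    """Build an ffmpeg drawtext filter string for one region label."""
    for value in (text, fontfile):
        if _UNSAFE & set(value):
            # Escaping is out of scope for this sketch; fail loudly instead.
            raise ValueError(f"unsupported character in {value!r}")
    return (
        f"drawtext=fontfile='{fontfile}':text='{text}':x={x}:y={y}"
        f":fontsize={fontsize}:fontcolor=white"
        f":box=1:boxcolor=black@{box_opacity}:boxborderw={border}"
    )
```

The resulting string would be passed to `ffmpeg` through `-vf`; a lower `boxcolor` opacity and a smaller `fontsize` are what reduce how much of the underlying image the label hides.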
Current Task: Limit Alert Snapshot Overlay To Event Zones
Goal: Uploaded warning/alarm screenshots should only draw the cold-display region polygons and names for the zones that actually triggered the warning/alarm event. Other configured zones and the trash ROI should not be drawn on those uploaded screenshots.
Plan: Keep the full calibration overlay helper available for tests and general use, but pass alert event zone IDs from capture_alert_snapshot into the overlay loader and disable trash ROI drawing for alert uploads.
- Add a regression test proving alert snapshot upload only annotates the triggering event zone.
- Filter snapshot overlay regions by event `zone_id` during alert upload.
- Preserve full overlay behavior when `apply_calibration_overlay` is called without filters.
- Run full local Python verification.
- Deploy `alarm_snapshots.py` to `xiaozheng@10.8.0.23`.
- Verify remote API/runtime health and deployed filtered-overlay behavior.
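The filtering rule in the plan above can be sketched as a small selection step in front of the drawing code. The `Region` dataclass and `select_regions` function are hypothetical stand-ins; the only behavior taken from these notes is that no filter means "draw everything", while alert uploads pass the triggering zone IDs and disable the trash ROI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    label: str
    zone_id: Optional[str] = None  # None for the trash ROI
    is_trash: bool = False


def select_regions(regions, zone_ids=None, include_trash=True):
    """Keep only the regions that should be drawn on a snapshot.

    zone_ids=None keeps every zone (full calibration overlay); a set of
    ids keeps only the zones that triggered the alert.
    """
    selected = []
    for region in regions:
        if region.is_trash:
            if include_trash:
                selected.append(region)
        elif zone_ids is None or region.zone_id in zone_ids:
            selected.append(region)
    return selected
```

For a zone-1 alert the call would look like `select_regions(regions, zone_ids={"1"}, include_trash=False)`, leaving other zones and the trash ROI undrawn.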
Review
- Local verification passed:
  - `PYTHONPATH=src python3 -m unittest tests/test_alarm_snapshots.py -v`
  - `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`104` tests)
- Deployed only `src/cold_display_guard/alarm_snapshots.py` to `xiaozheng@10.8.0.23` after backing up the previous remote file; live config was not overwritten.
- Rebuilt `cold-display-guard:dev` and restarted `cold-display-guard-api` plus `cold-display-guard-runtime`.
- Remote verification passed:
  - `GET /api/manage/health` returned `status=ok` and `runtime_status=running`.
  - Container-side smoke test for a zone-1 alert returned `zone1_changed=True`, `zone2_unchanged=True`, and `trash_unchanged=True`.
  - API/runtime logs show normal startup after restart.
Current Task: Check Webhook Duplicate Delivery
Goal: Verify whether cold_display_guard is sending duplicate Webhook requests to video-recognition on xiaozheng@10.8.0.23.
Investigation: Compare the sending code path, remote webhook delivery audit, retry queue state, cold-display event/case logs, video-recognition HTTP logs, and the receiver-side JSONL payloads.
- Inspect sender code path for direct event/case delivery and retry drain behavior.
- Confirm remote Webhook config uses the same URL for `event_url` and `case_url`.
- Check sender delivery audit for duplicate receiver `task_id` values.
- Check retry queue for pending successful redelivery risk.
- Check receiver-side cold-display JSONL for duplicate payloads and duplicate business keys.
- Trace the only coarse duplicate-looking case around `batch_000898`.
Review
- Current remote config sends both `batch_event` and `case_event` to `http://10.8.0.23:8080/api/webhook/cold-display-guard`, so one business transition can produce two HTTP POSTs to the same endpoint with different `kind` values.
- Sender audit `logs/webhook_delivery.jsonl` contains `3056` records total; recent valid delivery has `321` direct `ok` records and `0` retry `ok` records.
- Receiver-returned `task_id` values are unique: `321` unique task IDs and `0` duplicate task IDs.
- Retry queue has `547` latest retry items, all `dead_letter`; there are no pending retries.
- Receiver-side `video-recognition` cold-display files for `2026-06-15` contain `181` business payloads; exact payload duplicates are `0`, and fine-grained business key duplicates are `0`.
- Sender `events.jsonl` contains `3325` events; duplicate `(batch_id, event, ts, zone_id)` keys are `0`.
- The only coarse duplicate-looking receiver entry was `batch_000898` at `13:20:26`: the same frame emitted `time_pre_warning` and `pre_warning_handled`, which produced separate `case_event` actions `created` and `handled`. This is not the same Webhook request repeated.
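The business-key duplicate check used in this review can be reproduced with a short script. This is a sketch of the counting step only; the function name `duplicate_keys` is hypothetical, the key fields are the `(batch_id, event, ts, zone_id)` tuple named above, and it assumes one JSON object per line.

```python
import json
from collections import Counter


def duplicate_keys(lines, fields=("batch_id", "event", "ts", "zone_id")):
    """Return business keys that occur more than once, with their counts."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        counts[tuple(record.get(name) for name in fields)] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

Two events from the same frame with different `event` names, like the `batch_000898` pair, produce distinct keys and are therefore not reported; only a repeated key would indicate a true duplicate delivery at this granularity.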