fix: stabilize cold display occupancy detection

2026-06-15 13:40:20 +08:00
parent 1059850378
commit fa2c90e250
5 changed files with 68 additions and 1 deletion

@@ -284,3 +284,35 @@
- `GET /api/manage/health` returned `status=ok`, `runtime_status=running`, and version `dev`.
- `cold-display-guard-api` is healthy and `cold-display-guard-runtime` is running after restart.
- Runtime logs show normal startup after the restart.
## Current Task: Investigate False Normal Consumption Events On 10.8.0.23
**Goal:** Determine why the live system records a normal consumption event about every two minutes with a dwell time near 13 seconds even when no one touched the cold display cabinet.
**Debug plan:** Inspect remote runtime/event/case/diagnostic logs first, correlate `batch_started` and `batch_consumed` pairs by zone and dwell time, then trace the vision metrics for those timestamps to identify whether the source is occupancy flicker, runtime restart state restoration, config thresholds, or downstream display interpretation.
- [ ] Inspect recent remote events and confirm the exact event names, zones, dwell seconds, and cadence.
- [ ] Inspect runtime diagnostics around those timestamps for occupancy and vision metric flicker.
- [ ] Inspect live config and runtime logs for sampling/stabilization settings and restarts.
- [x] Form and test a root-cause hypothesis before changing code or live thresholds.
- [x] Record findings, fix if needed, and verify with logs/tests.
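The correlation step in the debug plan can be sketched as below. This is only an illustration: the event record shape (`event`, `zone`, `ts` in epoch seconds) and the helper name `pair_batches` are assumptions, not the project's real log schema, and the sample events are made-up values chosen to resemble the reported symptom.

```python
from collections import defaultdict


def pair_batches(events):
    """Pair each batch_started with the next batch_consumed in the same zone.

    Returns {zone: [dwell_seconds, ...]}. The record keys used here are
    hypothetical; adapt them to the actual event log format.
    """
    open_start = {}             # zone -> timestamp of the unmatched batch_started
    dwell = defaultdict(list)   # zone -> dwell times of completed batches
    for ev in sorted(events, key=lambda e: e["ts"]):
        zone = ev["zone"]
        if ev["event"] == "batch_started":
            open_start[zone] = ev["ts"]
        elif ev["event"] == "batch_consumed" and zone in open_start:
            dwell[zone].append(ev["ts"] - open_start.pop(zone))
    return dict(dwell)


# Synthetic events (not taken from the live logs): one zone, a batch roughly
# every two minutes, each ending after about 13 seconds.
events = [
    {"event": "batch_started", "zone": 1, "ts": 0},
    {"event": "batch_consumed", "zone": 1, "ts": 13},
    {"event": "batch_started", "zone": 1, "ts": 120},
    {"event": "batch_consumed", "zone": 1, "ts": 133},
]
dwell = pair_batches(events)
```

A cluster of short, near-identical dwell times in a single zone points toward occupancy flicker rather than real customer activity, which is what the findings below confirm.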
### Findings And Fix
- The repeated records were real `batch_started` -> `batch_consumed` events from the camera-side engine, not a downstream display issue.
- Before the fix, recent events showed repeated zone 1 batches ending after 13-33 seconds, matching the two-frame confirmation cadence at the current sampling rate.
- Root cause had two parts:
  - Zone 1 was genuinely occupied, but its vision signal hovered around the old relative dark threshold, so short raw-occupancy dips were interpreted as item removal.
  - Zone 2 was occupied before or during baseline learning, so its relative difference from baseline stayed near zero and it was not detected as occupied.
- Added `occupancy_absolute_dark_fraction` in `src/cold_display_guard/vision.py`, defaulting to `0.0` so existing configs are unchanged unless they opt in.
- Updated the live config on `xiaozheng@10.8.0.23`:
  - `occupancy_dark_fraction = 0.12`
  - `occupancy_absolute_dark_fraction = 0.085`
  - `empty_confirm_frames = 6`
- Rebuilt and restarted `cold-display-guard-api` and `cold-display-guard-runtime`.
- Verification:
  - Local full Python suite passed: `PYTHONPATH=src python3 -m unittest discover -s tests -v` (`102` tests).
  - Remote health returned `status=ok` and `runtime_status=running`.
  - Remote container config shows the new thresholds.
  - After deployment, latest diagnostics stabilized at `zone_counts = {"1": 1, "2": 1, "6": 1}`.
  - During a two-minute observation window after `13:25`, no new `batch_consumed` events were emitted; only expected pre-warning/alarm lifecycle events appeared for the occupied zones.
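How the three settings might interact can be sketched as follows. This is not the code in `src/cold_display_guard/vision.py`. It assumes the relative threshold is compared against the difference from the learned baseline, that the absolute threshold is compared against the raw dark fraction, and that a zone is reported empty only after `empty_confirm_frames` consecutive empty frames. `OccupancyConfig`, `raw_occupied`, and `ZoneState` are hypothetical names, and the dark-fraction inputs are illustrative.

```python
from dataclasses import dataclass


@dataclass
class OccupancyConfig:
    occupancy_dark_fraction: float                  # relative threshold vs. baseline
    empty_confirm_frames: int                       # empty frames needed to confirm removal
    occupancy_absolute_dark_fraction: float = 0.0   # 0.0 means "not opted in"


def raw_occupied(dark_fraction, baseline_dark_fraction, cfg):
    """Per-frame occupancy decision (assumed semantics, see lead-in)."""
    relative = (dark_fraction - baseline_dark_fraction) >= cfg.occupancy_dark_fraction
    absolute = (
        cfg.occupancy_absolute_dark_fraction > 0.0
        and dark_fraction >= cfg.occupancy_absolute_dark_fraction
    )
    return relative or absolute


class ZoneState:
    """Debounce raw occupancy so short dips do not end a batch."""

    def __init__(self, cfg):
        self.cfg = cfg
        self.occupied = False
        self.empty_streak = 0

    def update(self, raw):
        if raw:
            self.occupied = True
            self.empty_streak = 0
        elif self.occupied:
            self.empty_streak += 1
            if self.empty_streak >= self.cfg.empty_confirm_frames:
                self.occupied = False
                self.empty_streak = 0
        return self.occupied


old_cfg = OccupancyConfig(occupancy_dark_fraction=0.12, empty_confirm_frames=2)
new_cfg = OccupancyConfig(
    occupancy_dark_fraction=0.12,
    empty_confirm_frames=6,
    occupancy_absolute_dark_fraction=0.085,
)

# Zone 2 case: the item was present while the baseline was learned, so the
# relative difference is zero and only the absolute check can see it.
zone2_old = raw_occupied(0.20, 0.20, old_cfg)
zone2_new = raw_occupied(0.20, 0.20, new_cfg)

# Zone 1 case: a three-frame dip in raw occupancy no longer ends the batch.
state = ZoneState(new_cfg)
history = [state.update(raw) for raw in [True, False, False, False, True]]
```

Under these assumptions, a short dip in raw occupancy leaves the zone marked occupied with `empty_confirm_frames = 6`, whereas a two-frame confirmation would have closed the batch and emitted `batch_consumed`.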