Cold Display Guard Design

Date: 2026-04-27

Goal

Build a single-camera refrigerated display monitoring service that records how long each food batch remains in each display zone and raises alerts when over-threshold food is removed without confirmed disposal or is placed back into the display.

Scope

  • One camera sees both the refrigerated display cabinet and the trash bin.
  • The initial cabinet layout is 4 columns by 2 rows.
  • The layout must be configurable because the physical arrangement may change.
  • Each zone may contain multiple food items.
  • Items in one zone are treated as one batch.
  • Mixed batches are not allowed.
  • A new batch can only start after a zone becomes empty.

Architecture

The project separates business state from computer vision:

  1. Vision adapters detect display-zone occupancy and trash-bin deposit events.
  2. The batch engine receives normalized observations with timestamps.
  3. The engine maintains zone and batch state.
  4. The notifier layer emits JSONL events; webhooks and UI alerts can be added later.

The first implementation focuses on the batch engine and event contract. Camera inference can be added later without changing compliance rules.
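The layering above can be sketched as three small interfaces and one loop. This is only an illustration: the names `VisionAdapter`, `BatchEngine`, `Notifier`, `ReplayAdapter`, and `run_pipeline` are hypothetical and do not come from the project.

```python
from typing import Dict, Iterable, List, Protocol


class VisionAdapter(Protocol):
    """Hypothetical interface: yields normalized observations with timestamps."""
    def observations(self) -> Iterable[Dict]: ...


class BatchEngine(Protocol):
    """Hypothetical interface: updates zone/batch state, returns events."""
    def process(self, observation: Dict) -> List[Dict]: ...


class Notifier(Protocol):
    """Hypothetical interface: delivers one event (JSONL, webhook, UI)."""
    def emit(self, event: Dict) -> None: ...


class ReplayAdapter:
    """Stand-in adapter that replays recorded observations, so the engine
    can be exercised before any camera inference exists."""
    def __init__(self, recorded: List[Dict]) -> None:
        self._recorded = recorded

    def observations(self) -> Iterable[Dict]:
        return iter(self._recorded)


def run_pipeline(adapter: VisionAdapter, engine: BatchEngine, notifier: Notifier) -> int:
    """Feed every observation to the engine and forward the resulting events.

    Returns the number of events emitted.
    """
    emitted = 0
    for observation in adapter.observations():
        for event in engine.process(observation):
            notifier.emit(event)
            emitted += 1
    return emitted
```

Because the engine only sees normalized observations, a replay adapter and a real camera adapter are interchangeable from its point of view.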

Zone Model

Each zone has:

  • zone_id
  • configured polygon or bounding box
  • current observed item count
  • optional active batch

The initial default zones are r1c1 through r2c4.
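A minimal sketch of the zone model, assuming normalized polygon coordinates in the range 0 to 1. The `Zone` class, its field names beyond `zone_id`, and the evenly spaced placeholder grid are illustrative assumptions; real polygons would come from calibration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # assumed normalized (x, y), each in [0, 1]


@dataclass
class Zone:
    zone_id: str
    polygon: List[Point]                   # configured polygon or box corners
    item_count: int = 0                    # current observed item count
    active_batch_id: Optional[str] = None  # set only while a batch is active


def default_zone_ids(rows: int = 2, cols: int = 4) -> List[str]:
    """Return ids in row-major order: r1c1, r1c2, ... up to r{rows}c{cols}."""
    return [f"r{r}c{c}" for r in range(1, rows + 1) for c in range(1, cols + 1)]


def placeholder_grid(rows: int = 2, cols: int = 4) -> List[Zone]:
    """Build evenly spaced rectangular zones as a stand-in until calibrated."""
    zones = []
    for r in range(rows):
        for c in range(cols):
            x0, x1 = c / cols, (c + 1) / cols
            y0, y1 = r / rows, (r + 1) / rows
            corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
            zones.append(Zone(zone_id=f"r{r + 1}c{c + 1}", polygon=corners))
    return zones
```

Keeping `rows` and `cols` as parameters reflects the requirement that the layout stay configurable.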

Batch State Machine

active

Food is currently visible in the zone. The batch has started_at, zone_id, and current observed count.

pending_disposal

The batch exceeded the maximum dwell time and the zone then became empty. The system waits for a trash-bin deposit event within a configurable confirmation window.

discarded

The over-threshold batch was removed and a trash-bin deposit was observed within the confirmation window.

consumed

The batch was removed before the maximum dwell threshold. No trash confirmation is required.

violation

The system observed one of these conditions:

  • food was added to an already occupied zone, which indicates a mixed batch
  • a removed over-threshold batch was put back into any display zone before disposal was confirmed
  • a removed over-threshold batch was not followed by a trash-bin deposit before the confirmation deadline
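The five states can be written down as an enum with an explicit transition table. This is a sketch under assumptions: the design does not say whether `violation` is terminal or whether a mixed-batch violation moves the batch out of `active`, so the table below treats `discarded`, `consumed`, and `violation` as terminal. The names `BatchState`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical.

```python
from enum import Enum


class BatchState(str, Enum):
    ACTIVE = "active"
    PENDING_DISPOSAL = "pending_disposal"
    DISCARDED = "discarded"
    CONSUMED = "consumed"
    VIOLATION = "violation"


# Assumed transition table; terminal states map to an empty set.
ALLOWED_TRANSITIONS = {
    BatchState.ACTIVE: {
        BatchState.CONSUMED,          # removed before the dwell threshold
        BatchState.PENDING_DISPOSAL,  # removed after the dwell threshold
        BatchState.VIOLATION,         # mixed batch detected
    },
    BatchState.PENDING_DISPOSAL: {
        BatchState.DISCARDED,         # trash deposit seen in the window
        BatchState.VIOLATION,         # returned to display, or deadline missed
    },
    BatchState.DISCARDED: set(),
    BatchState.CONSUMED: set(),
    BatchState.VIOLATION: set(),
}


def transition(current: BatchState, target: BatchState) -> BatchState:
    """Return the target state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Rejecting illegal moves at one choke point keeps compliance rules out of the vision and notifier layers.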

Timing Rules

  • Default maximum dwell time: 3 hours (10800 seconds).
  • Default trash confirmation window: 2 minutes (120 seconds).
  • A zone changing from 0 to >0 starts a new batch.
  • A zone changing from >0 to 0 ends the visible dwell period.
  • Count decreases while still >0 do not end the batch.
  • Count increases while already >0 produce a mixed-batch violation.
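The rules above map directly onto a small classifier over consecutive counts for one zone. The function names and the strict "greater than" comparison at exactly the threshold are assumptions; the returned strings reuse the event names from the event contract.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_DWELL_SECONDS = 10800      # default maximum dwell time: 3 hours
TRASH_CONFIRM_SECONDS = 120    # default trash confirmation window: 2 minutes


def classify_count_change(prev: int, curr: int,
                          started_at: Optional[datetime], now: datetime,
                          max_dwell: int = MAX_DWELL_SECONDS) -> Optional[str]:
    """Map one zone's count change to an event name, or None if nothing changed."""
    if prev == 0 and curr > 0:
        return "batch_started"
    if prev > 0 and curr == 0:
        # The visible dwell period ends here. A batch with no recorded start
        # is treated as having zero dwell (an assumption).
        dwell = (now - started_at).total_seconds() if started_at else 0.0
        # Assumption: only dwell strictly above the limit counts as over-threshold.
        return "batch_pending_disposal" if dwell > max_dwell else "batch_consumed"
    if prev > 0 and curr > prev:
        return "mixed_batch_violation"
    if 0 < curr < prev:
        return "batch_count_changed"   # partial removal, batch continues
    return None


def resolve_pending_disposal(ended_at: datetime, now: datetime,
                             trash_deposit_at: Optional[datetime] = None,
                             window: int = TRASH_CONFIRM_SECONDS) -> Optional[str]:
    """Decide the outcome of a pending_disposal batch, or None while still waiting."""
    deadline = ended_at + timedelta(seconds=window)
    if trash_deposit_at is not None and ended_at <= trash_deposit_at <= deadline:
        return "batch_discarded"
    if now > deadline:
        return "missing_disposal_violation"
    return None
```

For example, a zone that empties four hours after its batch started yields `batch_pending_disposal`, and it becomes `missing_disposal_violation` if no deposit is seen within the following 120 seconds.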

Event Contract

The engine emits JSON-compatible events:

  • batch_started
  • batch_count_changed
  • batch_consumed
  • batch_pending_disposal
  • batch_discarded
  • mixed_batch_violation
  • overdue_return_violation
  • missing_disposal_violation

Each event includes:

  • event
  • ts
  • camera_id
  • zone_id when applicable
  • batch_id when applicable
  • timing fields such as started_at, ended_at, and dwell_seconds
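As an illustration of the contract, the sketch below serializes one event as a single JSONL line. The helper name `make_event` and the sample values `cam-1` and `b-0001` are hypothetical; the field names and event types are those listed above.

```python
import json
from typing import Any, Optional

EVENT_TYPES = {
    "batch_started", "batch_count_changed", "batch_consumed",
    "batch_pending_disposal", "batch_discarded",
    "mixed_batch_violation", "overdue_return_violation",
    "missing_disposal_violation",
}


def make_event(event: str, ts: str, camera_id: str,
               zone_id: Optional[str] = None,
               batch_id: Optional[str] = None,
               **timing: Any) -> str:
    """Serialize one event as a single JSON line (no trailing newline)."""
    if event not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event}")
    record = {"event": event, "ts": ts, "camera_id": camera_id}
    # zone_id and batch_id are included only when applicable.
    if zone_id is not None:
        record["zone_id"] = zone_id
    if batch_id is not None:
        record["batch_id"] = batch_id
    # Timing fields such as started_at, ended_at, dwell_seconds.
    record.update({k: v for k, v in timing.items() if v is not None})
    return json.dumps(record)


line = make_event(
    "batch_consumed", "2026-04-27T10:30:00+08:00", "cam-1",
    zone_id="r1c1", batch_id="b-0001",
    started_at="2026-04-27T10:00:00+08:00",
    ended_at="2026-04-27T10:30:00+08:00",
    dwell_seconds=1800,
)
```

Appending each returned line plus a newline to a file gives a JSONL log that downstream tools can read one record at a time.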

Future Vision Integration

The vision layer should output normalized observations:

{
  "ts": "2026-04-27T10:00:00+08:00",
  "zone_counts": {
    "r1c1": 3,
    "r1c2": 0
  },
  "trash_deposit": false
}

Trash disposal confirmation should use motion/object evidence inside the trash ROI, not merely a person standing near the bin.
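On the engine side, an observation in the shape shown above could be parsed and validated as follows. The `Observation` class and `parse_observation` function are hypothetical names, and the validation rules (timezone-aware timestamp, non-negative integer counts, boolean flag) are assumptions rather than documented requirements.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass(frozen=True)
class Observation:
    ts: datetime
    zone_counts: Dict[str, int]
    trash_deposit: bool


def parse_observation(raw: str) -> Observation:
    """Parse one normalized observation from its JSON text."""
    data = json.loads(raw)
    ts = datetime.fromisoformat(data["ts"])
    if ts.tzinfo is None:
        raise ValueError("ts must carry a UTC offset")
    counts: Dict[str, int] = {}
    for zone_id, count in data["zone_counts"].items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(count, bool) or not isinstance(count, int) or count < 0:
            raise ValueError(f"invalid count for {zone_id}: {count!r}")
        counts[zone_id] = count
    deposit = data["trash_deposit"]
    if not isinstance(deposit, bool):
        raise ValueError("trash_deposit must be true or false")
    return Observation(ts=ts, zone_counts=counts, trash_deposit=deposit)
```

Validating at the boundary means the batch engine can assume clean input regardless of which vision adapter produced it.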

Calibration Tool

The project includes a management web UI. The frontend is served on port 23000 and the backend on port 19080.

The backend exposes:

  • GET /api/manage/config
  • PUT /api/manage/config
  • POST /api/manage/snapshot
  • PUT /api/manage/calibration
  • GET /api/manage/summary
  • GET /api/manage/events
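A client could address these endpoints with the standard library alone. The sketch below only builds requests and sends nothing; the host `127.0.0.1`, the use of JSON bodies, and the helper name `build_manage_request` are assumptions, since the design lists the routes but not their payloads.

```python
import json
from typing import Any, Optional
from urllib.request import Request

# Port 19080 comes from the design; the loopback host is an assumption.
BACKEND_BASE = "http://127.0.0.1:19080"


def build_manage_request(path: str, method: str = "GET",
                         payload: Optional[Any] = None) -> Request:
    """Build (but do not send) a request for /api/manage/<path>."""
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        # Assumption: the backend accepts JSON request bodies.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    url = f"{BACKEND_BASE}/api/manage/{path}"
    return Request(url, data=data, headers=headers, method=method)


summary_request = build_manage_request("summary")
events_request = build_manage_request("events")
```

Passing a built request to `urllib.request.urlopen` would perform the call once the backend is running.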

The frontend can pull one RTSP snapshot, draw polygons for r1c1 through r2c4 and trash, then save calibration directly to the project TOML config.

The project also keeps a lightweight local RTSP snapshot calibration tool under tools/calibrator.

The calibrator runs a small standard-library HTTP server. The browser submits an RTSP URL to /api/capture; the server calls ffmpeg to extract a single JPEG frame and returns it to the browser. The page then lets the operator draw normalized polygons for r1c1 through r2c4 and for the trash bin.
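The capture and normalization steps might look like the sketch below. The function names, the choice of TCP transport, and the 15-second timeout are assumptions; the ffmpeg options used (`-rtsp_transport`, `-i`, `-frames:v`, `-f image2pipe`, `-vcodec mjpeg`, `pipe:1`) are standard ffmpeg options, but the project's actual command line is not shown in this document.

```python
import subprocess
from typing import List, Sequence, Tuple


def build_snapshot_command(rtsp_url: str) -> List[str]:
    """Build an ffmpeg command that writes one JPEG frame to stdout."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-rtsp_transport", "tcp",   # assumption: TCP for fewer dropped packets
        "-i", rtsp_url,
        "-frames:v", "1",           # stop after a single video frame
        "-f", "image2pipe", "-vcodec", "mjpeg",
        "pipe:1",                   # write the encoded frame to stdout
    ]


def capture_snapshot(rtsp_url: str, timeout_s: float = 15.0) -> bytes:
    """Run ffmpeg and return the JPEG bytes. Requires ffmpeg on PATH."""
    result = subprocess.run(
        build_snapshot_command(rtsp_url),
        capture_output=True, timeout=timeout_s, check=True,
    )
    return result.stdout


def normalize_polygon(points_px: Sequence[Tuple[float, float]],
                      width: int, height: int) -> List[Tuple[float, float]]:
    """Convert pixel coordinates to fractions of the frame size."""
    if width <= 0 or height <= 0:
        raise ValueError("frame size must be positive")
    return [(x / width, y / height) for x, y in points_px]
```

Storing polygons as fractions of the frame keeps a calibration valid if the stream resolution changes while the camera position stays fixed.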

This intentionally uses a single captured frame rather than a live preview. Calibration only needs a representative camera view, and a snapshot avoids browser RTSP limitations and live stream transcoding.