managed-portal/tasks/lessons.md
skye.yue 4e2ca3cff7 feat: improve webhook filtering, worker status startup handling, and timestamp parsing
- Skip half_hour_report events from webhook posts in people_flow
- Handle pre-existing stale worker status files during startup gracefully
- Make store_dwell_alert timestamp parsing robust against invalid/empty values
- Update lessons learned and todo documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-10 17:05:31 +08:00


Lessons

2026-05-12

  • Trigger: the user corrected the execution workflow for non-trivial tasks and required persistent task tracking.

  • Rule: for any non-trivial task, create or update tasks/todo.md before substantive implementation, keep progress current, and do not mark done without review evidence.

  • Preventive action: check for tasks/todo.md, tasks/lessons.md, and repository guidance files before editing code; if the user corrects process expectations, record the rule immediately.

  • Trigger: the user required corrections to be persisted for future sessions.

  • Rule: any user correction must be recorded in tasks/lessons.md as trigger -> rule -> preventive action.

  • Preventive action: after any correction, update lessons before closing the task and mention the recorded rule in the final verification summary.

  • Trigger: the user clarified that this repository is meant to run in mainland China environments.

  • Rule: future code, build, deployment, and integration changes must consider mainland China network accessibility and should prefer China-friendly defaults where practical.

  • Preventive action: when adding dependencies, mirrors, external endpoints, or download flows, explicitly check whether the default path works reliably in mainland China and add configuration or fallback when needed.

  • Trigger: the user required deployment to use docker compose only and explicitly disallowed host environment changes.

  • Rule: for remote rollout tasks in this repo, prefer repository-contained docker compose changes and do not install packages, edit host configs, or mutate global environment state unless the user explicitly approves it.

  • Preventive action: when a deployment is blocked, first fix Dockerfiles, compose files, env files, and mounted paths inside the repo before considering any host-level workaround.

2026-05-15

  • Trigger: the .11 OTA bundle host did not have a zip executable, and the current Python containers no longer exposed the historical lap overlay paths.
  • Rule: OTA bundle publication must not assume host archive tools or historical runtime overlay paths are present.
  • Preventive action: when cutting a release, package the ZIP with Python stdlib if zip is unavailable, treat overlay extraction as optional unless the paths are verified live in containers, and validate the final archive contents before upload.
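
The stdlib fallback and the archive check described above could look roughly like the sketch below. It is a minimal illustration, not the release script used in this repo: the function names `build_bundle` and `missing_members` are hypothetical, and the list of required members is whatever the release actually needs.

```python
import zipfile
from pathlib import Path


def build_bundle(src_dir: Path, zip_path: Path) -> list[str]:
    """Zip every file under src_dir without a host `zip` binary.

    zip_path is assumed to live outside src_dir so the archive does not
    try to include itself. Returns the member names of the final archive.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store POSIX-style relative names so the archive is portable.
                zf.write(path, path.relative_to(src_dir).as_posix())

    # Re-open the archive and check member CRCs before it is uploaded.
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()
        if bad is not None:
            raise RuntimeError(f"corrupt member in archive: {bad}")
        return zf.namelist()


def missing_members(names: list[str], required: list[str]) -> list[str]:
    """Return the required member names that are absent from the archive."""
    present = set(names)
    return [name for name in required if name not in present]
```

Calling `missing_members(build_bundle(src, out), required)` before upload gives an explicit list of anything the bundle lacks, instead of relying on the archive tool's exit code.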

2026-05-18

  • Trigger: the user clarified that OTA installer updates should not keep repackaging and uploading the whole repository tree or fixed people_flow_project weights.
  • Rule: managed-portal OTA releases should ship a minimal ZIP with deploy metadata and managed config only; people_flow_project weights should be reused from a stable host location unless the weights themselves changed or the host is new.
  • Preventive action: when preparing OTA artifacts, use the minimal packaging script, exclude managed/people_flow_project/weights by default, and only publish a weights-bearing bundle for first-time installs or actual weight updates.
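
As a rough sketch of the default exclusion above, the filter below drops anything under `managed/people_flow_project/weights` unless weights are explicitly requested. The helper names and the `include_weights` flag are hypothetical and do not describe the actual minimal packaging script.

```python
from pathlib import PurePosixPath

# Path taken from the rule above; paths are relative to the repository root.
WEIGHTS_DIR = PurePosixPath("managed/people_flow_project/weights")


def is_weights_path(rel_path: str) -> bool:
    """True if rel_path is the weights directory or anything inside it."""
    return PurePosixPath(rel_path).is_relative_to(WEIGHTS_DIR)


def select_bundle_files(rel_paths: list[str], include_weights: bool = False) -> list[str]:
    """Keep every path except weights, unless include_weights is set
    (first-time installs or an actual weights update)."""
    return [p for p in rel_paths if include_weights or not is_weights_path(p)]
```

The comparison is component-based, so a sibling such as `managed/people_flow_project/weights_notes.txt` is not excluded by accident.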

2026-05-19

  • Trigger: the user corrected the OTA publication login for 10.8.0.1.

  • Rule: publication to the OTA web host 10.8.0.1 must use the root account, not xiaozheng.

  • Preventive action: for future managed-portal OTA rollouts, verify publication access against root@10.8.0.1:/var/www/html/ai_deploy before treating upload as blocked.

  • Trigger: the user clarified that all new installation targets are Ubuntu machines and asked for missing unzip to be handled automatically, with weights delivered separately.

  • Rule: the managed-portal OTA installer should treat Ubuntu as the first-install baseline, auto-install unzip via apt-get when needed, and use a separate people-flow weights archive instead of forcing weights into the main ZIP.

  • Preventive action: keep the main OTA ZIP minimal, publish people-flow-weights-<RELEASE_VERSION>.tar.gz alongside each release when weights are available, and validate that the installer still reuses shared weights on upgrades.

  • Trigger: the user corrected the YOLO weight repair strategy after a host had DeepFace weights but lacked only yolo11n.pt.

  • Rule: OTA recovery for a missing small model must not force a full 1 GB+ weights archive download or fall back to public GitHub downloads.

  • Preventive action: publish a small people-flow-yolo11n-<RELEASE_VERSION>.tar.gz artifact and make the installer download it when only people_flow_project/weights/yolo11n.pt is missing.
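
The artifact choice described in the last two lessons can be sketched as follows. The artifact name patterns come from the rules above; the function name and the heuristic "any other file in the weights directory means the large weights are already present" are assumptions made for illustration, and the real installer may decide differently.

```python
from pathlib import Path
from typing import Optional

YOLO_NAME = "yolo11n.pt"


def pick_weights_artifact(weights_dir: Path, release_version: str) -> Optional[str]:
    """Return the artifact file name to download, or None if nothing is needed.

    Assumption: any file other than yolo11n.pt under weights_dir counts as
    the large weights (for example DeepFace) already being in place.
    """
    yolo_present = (weights_dir / YOLO_NAME).is_file()
    others_present = weights_dir.is_dir() and any(
        p.is_file() and p.name != YOLO_NAME for p in weights_dir.rglob("*")
    )
    if yolo_present and others_present:
        return None  # upgrade path: reuse the shared weights as they are
    if others_present:
        # Only the small model is missing: fetch the small artifact.
        return f"people-flow-yolo11n-{release_version}.tar.gz"
    # New host or empty weights directory: fetch the full weights archive.
    return f"people-flow-weights-{release_version}.tar.gz"
```

Neither branch falls back to a public download; both names resolve against the OTA host that publishes the release.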

2026-06-04

  • Trigger: the user corrected the OTA Docker registry address for the video-recognition rollout on 10.8.0.14.

  • Rule: when updating OTA-hosted Docker images, use the exact registry host and port provided by the user; ota.zhengxinshipin.com and ota.zhengxinshipin.com:5443 are not interchangeable.

  • Preventive action: before concluding a remote image reference is missing, verify whether the intended registry includes a non-default port and test the exact host:port/repo:tag reference.

  • Trigger: the user clarified that the managed-portal four-service rollout must follow the published installer on root@10.8.0.1:/var/www/html/ai_deploy.

  • Rule: for managed-portal release updates, treat the published installer bundle and its embedded Compose/env files as the deployment source of truth instead of reverse-engineering the current host state.

  • Preventive action: before updating the managed-portal stack on a target host, inspect install-managed-portal-*.sh, release-manifest.env, and the bundled docker-compose.ota-release.yml under /var/www/html/ai_deploy.

  • Trigger: the user redirected a live service investigation from 10.8.0.14 to 10.8.0.15.

  • Rule: when continuing operational debugging across multiple hosts, do not assume the previously investigated host is still the active target after the user switches machines.

  • Preventive action: restate the target host before diagnosis or remediation, and refresh runtime evidence from that exact machine instead of carrying over prior-host conclusions.
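
To illustrate why the two registry addresses in the first lesson of this date are not interchangeable, the sketch below splits an image reference into registry, repository, and tag. It follows the common Docker convention that the first path component is a registry when it contains a dot, a colon, or is `localhost`. The function name and the repository and tag in the usage note are hypothetical.

```python
def split_image_ref(ref: str) -> tuple[str, str, str]:
    """Split 'host[:port]/repo[:tag][@digest]' into (registry, repo, tag).

    The registry is returned with its port, because host and host:port
    identify different registry endpoints. The digest, if any, is dropped.
    """
    name, _, _digest = ref.partition("@")
    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, rest
    else:
        registry, path = "", name  # no explicit registry in the reference

    # A colon after the last slash separates the tag from the repository.
    colon = path.rfind(":")
    if colon > path.rfind("/"):
        return registry, path[:colon], path[colon + 1:]
    return registry, path, ""
```

For example, `split_image_ref("ota.zhengxinshipin.com:5443/video-recognition:v1")` yields the registry `ota.zhengxinshipin.com:5443`, while the same reference without the port yields `ota.zhengxinshipin.com`. Comparing the registry component before declaring an image missing catches the port mismatch early.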

2026-06-09

  • Trigger: the user corrected the intended people-flow RTSP source on 10.8.0.22.

  • Rule: when validating or repairing managed child-service deployments, treat the user-provided live RTSP URL as the source of truth and verify that the running container environment matches it exactly.

  • Preventive action: after any host-specific stream correction, inspect both the release env file and the container's effective RTSP_URL; if they differ, recreate only the affected service with the repository Compose/env inputs and record the exact URL used.

  • Trigger: the user corrected the intended store_dwell_alert RTSP source on 10.8.0.15.

  • Rule: for host-specific store_dwell_alert stream changes, verify both RTSP_URL and any derived identifiers such as CAMERA_ID in the deployed release env and the running container before concluding the rollout is correct.

  • Preventive action: after changing a store_dwell_alert stream on a target host, inspect the release env, render docker compose config, and recreate only store-dwell-alert so the effective RTSP_URL and CAMERA_ID match the intended source.

  • Trigger: the user corrected the intended store_dwell_alert RTSP source on 10.8.0.22.

  • Rule: even when the deployed release env on a host already has the intended store_dwell_alert stream, do not assume the running container picked it up; verify the live container environment separately.

  • Preventive action: on host-specific store_dwell_alert changes, compare deploy/managed-portal.release.env with docker inspect store-dwell-alert; if the env is already correct but the container is stale, force-recreate only store-dwell-alert.
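
The env-versus-container comparison in the lessons above can be sketched as below. It assumes the release env file uses plain `KEY=VALUE` lines (no quoting or interpolation is handled), that the compose file passes those variables through to the container unchanged, and that the second input is the JSON printed by `docker inspect <container>`. The function names are hypothetical.

```python
import json


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def container_env(inspect_json: str) -> dict[str, str]:
    """Read Config.Env from `docker inspect` output (a JSON array)."""
    entries = json.loads(inspect_json)[0]["Config"]["Env"] or []
    return dict(entry.partition("=")[::2] for entry in entries)


def env_drift(release_env_text: str, inspect_json: str,
              keys: tuple[str, ...] = ("RTSP_URL", "CAMERA_ID")) -> dict:
    """Return {key: (release_value, live_value)} for every key that differs."""
    wanted = parse_env_text(release_env_text)
    live = container_env(inspect_json)
    return {k: (wanted.get(k), live.get(k))
            for k in keys if wanted.get(k) != live.get(k)}
```

An empty result means the running container matches the release env for the checked keys. A non-empty result is the signal to force-recreate only the affected service.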

2026-06-10

  • Trigger: the user clarified during the .14 webhook repair that video-recognition input_mode is dedicated to the RTSP recognition path and must not be changed for webhook integration.

  • Rule: when repairing store-dwell-alert to video-recognition webhook delivery on a host that already runs RTSP recognition, keep the main video-recognition input_mode unchanged unless the user explicitly requests a recognition-mode switch.

  • Preventive action: before mirroring a reference host's webhook setup, check whether that host's input_mode differs from the target and, if it does, design the fix around a separate receiver path or image rather than changing the target's main recognition mode.

  • Trigger: the user redirected the .11 image reuse plan to go through the shared OTA registry tag instead of a host-local sidecar-only image.

  • Rule: when a working image on one host needs to be reused by other machines, publish the exact validated image content to the user-specified OTA registry tag first, then update targets by pulling that registry tag rather than relying on host-local image transfer alone.

  • Preventive action: before rolling a host-specific image fix to a single machine, check whether the user expects the image to become the shared registry baseline; if yes, validate the source image digest and publish it to the exact registry path before updating consumers.

  • Trigger: the user clarified that the live .14 deployment fix may use sudo on the target host.

  • Rule: when host-owned deployment files block a required live fix and the user explicitly grants sudo, prefer the direct sudo path over indirect container-side file mutation.

  • Preventive action: if a remote deployment edit fails on file ownership, check whether the user has authorized sudo; when authorized, switch to sudo for the host-side config edit and service recreation commands.