diff --git a/tasks/lessons.md b/tasks/lessons.md
index d417bed..d6b9c96 100644
--- a/tasks/lessons.md
+++ b/tasks/lessons.md
@@ -39,3 +39,17 @@
 - Trigger: the user clarified that all new installation targets are Ubuntu machines and asked for missing `unzip` to be handled automatically, with weights delivered separately.
 - Rule: the managed-portal OTA installer should treat Ubuntu as the first-install baseline, auto-install `unzip` via `apt-get` when needed, and use a separate people-flow weights archive instead of forcing weights into the main ZIP.
 - Preventive action: keep the main OTA ZIP minimal, publish `people-flow-weights-.tar.gz` alongside each release when weights are available, and validate that the installer still reuses shared weights on upgrades.
+
+- Trigger: the user corrected the YOLO weight repair strategy after a host had DeepFace weights but lacked only `yolo11n.pt`.
+- Rule: OTA recovery for a missing small model must not force a full 1GB+ weights archive download or fall back to public GitHub downloads.
+- Preventive action: publish a small `people-flow-yolo11n-.tar.gz` artifact and make the installer download it when only `people_flow_project/weights/yolo11n.pt` is missing.
+
+## 2026-06-04
+
+- Trigger: the user corrected the OTA Docker registry address for the video-recognition rollout on `10.8.0.14`.
+- Rule: when updating OTA-hosted Docker images, use the exact registry host and port provided by the user; `ota.zhengxinshipin.com` and `ota.zhengxinshipin.com:5443` are not interchangeable.
+- Preventive action: before concluding a remote image reference is missing, verify whether the intended registry includes a non-default port and test the exact `host:port/repo:tag` reference.
+
+- Trigger: the user clarified that the managed-portal four-service rollout must follow the published installer on `root@10.8.0.1:/var/www/html/ai_deploy`.
+- Rule: for managed-portal release updates, treat the published installer bundle and its embedded Compose/env files as the deployment source of truth instead of reverse-engineering the current host state.
+- Preventive action: before updating the managed-portal stack on a target host, inspect `install-managed-portal-*.sh`, `release-manifest.env`, and the bundled `docker-compose.ota-release.yml` under `/var/www/html/ai_deploy`.
diff --git a/tasks/todo.md b/tasks/todo.md
index ebd5ecf..43b20b3 100644
--- a/tasks/todo.md
+++ b/tasks/todo.md
@@ -2,45 +2,54 @@
 
 ## Checklist
 
-- [ ] Reuse the already-published `managed-portal-20260519-f3f40b5-11.zip` main bundle and cut updated installer/weights artifacts for the same tag.
-- [ ] Publish the updated installer and separate weights archive to `10.8.0.1` and verify the HTTP endpoints.
-- [ ] Commit and push the repository changes for the split-weights Ubuntu installer flow.
+- [x] Inspect the published managed-portal installer and release manifest under `root@10.8.0.1:/var/www/html/ai_deploy`.
+- [x] Confirm the registry tags currently published for `managed-portal`, `managed-portal-web`, `people-flow-project`, and `store-dwell-alert`.
+- [x] Prepare `10.8.0.14` for an installer-aligned rollout of the four-service managed-portal stack.
+- [x] Recreate the four target containers on `10.8.0.14` using the published release version and corresponding Compose layout.
+- [x] Verify the running stack on `10.8.0.14` uses the published registry images and the installer-managed Compose project.
 
 ## Scope And Risks
 
-- Scope: keep the existing OTA application ZIP for `20260519-f3f40b5-11`, generate a refreshed installer plus separate people-flow weights archive for that same release tag, publish them to `10.8.0.1`, and push the supporting repository changes to Git.
-- Expected touch points: `.11` release artifacts, `/var/www/html/ai_deploy` on `10.8.0.1`, `deploy/package-managed-portal-ota.sh`, `deploy/install-managed-portal-ota.sh`, `README.md`, `.gitignore`, and task tracking files.
-- Risk: reusing the existing main ZIP means the installer and weights archive must remain compatible with the already-published `managed-portal-20260519-f3f40b5-11.zip`.
-- Risk: the current local repository does not contain real weights payload files, so the separate weights archive may need to be generated from the `.11` host release workspace or a stable host weights directory instead of local source control.
-- Risk: the commit must exclude local artifact files and only capture the intended repo changes.
+- Scope: use the published managed-portal release artifacts on `10.8.0.1` as the source of truth for image names, tags, and Compose topology.
+- Scope: update the four-service managed-portal group on `10.8.0.14`: `managed-portal`, `managed-portal-web`, `people-flow-project`, and `store-dwell-alert`.
+- Scope: keep unrelated stacks, especially the `iot-main` video-recognition project, untouched.
+- Risk: the current four containers on `10.8.0.14` are not managed by one Compose project, so installer-based recreation will conflict on fixed container names unless the old containers are replaced cleanly.
+- Risk: the published installer seeds config, data, outputs, and weights under `/opt/managed-portal-releases`; switching to it changes the runtime paths from the current ad hoc directories.
+- Risk: service recreation causes a brief interruption for the portal and both child-service APIs.
 
 ## Validation Intent
 
-- Prove the refreshed installer and separate weights archive exist for tag `20260519-f3f40b5-11`.
-- Prove both artifacts are downloadable from `10.8.0.1/ai_deploy`.
-- Prove the Git commit/push contains only the intended repository changes.
+- Read the published installer and `release-manifest.env` to confirm the exact release version and image references.
+- Verify the registry exposes the four target tags before rollout.
+- Use the installer-aligned Compose files and environment from the published bundle, not a hand-built local variant.
+- Confirm the final containers are recreated from the published registry images and are running under the installer-managed release directory.
 
 ## Review
 
 - Status: complete.
-- Reused OTA application bundle:
-  - Kept the already-published `managed-portal-20260519-f3f40b5-11.zip` as-is because the main program contents did not change.
-  - Regenerated only the installer and the separate weights archive for the same release tag.
-- `.11` artifact refresh:
-  - Synced the updated packaging and installer scripts to `/home/xiaozheng/managed-portal`.
-  - Reused `release_build/release-manifest-20260519-f3f40b5-11.env`.
-  - Generated `release_build/install-managed-portal-20260519-f3f40b5-11.sh`.
-  - Generated `release_build/people-flow-weights-20260519-f3f40b5-11.tar.gz` from `/home/xiaozheng/people_flow_project/weights`.
-- OTA publication result on `10.8.0.1`:
-  - Published `install-managed-portal-20260519-f3f40b5-11.sh`.
-  - Published `people-flow-weights-20260519-f3f40b5-11.tar.gz`.
-  - Preserved the existing `managed-portal-20260519-f3f40b5-11.zip`.
-  - Confirmed `install-managed-portal-latest.sh` still resolves to `install-managed-portal-20260519-f3f40b5-11.sh`.
-  - Confirmed `managed-portal-latest.zip` still resolves to `managed-portal-20260519-f3f40b5-11.zip`.
-- HTTP verification:
-  - `http://10.8.0.1/ai_deploy/people-flow-weights-20260519-f3f40b5-11.tar.gz` returns `200 OK` with `Content-Length: 1135171626`.
-  - `http://10.8.0.1/ai_deploy/install-managed-portal-20260519-f3f40b5-11.sh` returns `200 OK` with `Content-Length: 8081`.
-  - `http://10.8.0.1/ai_deploy/managed-portal-20260519-f3f40b5-11.zip` returns `200 OK` with `Content-Length: 4880`.
-- Git result:
-  - Committed the repo changes as `d1c4b77` with message `Split OTA weights for Ubuntu installs`.
-  - Pushed `main` to `origin`.
+- Result:
+  - Confirmed the published managed-portal installer source of truth is `root@10.8.0.1:/var/www/html/ai_deploy/install-managed-portal-20260519-f3f40b5-11.sh`.
+  - Confirmed the published registry images in `release-manifest.env` are:
+    - `ota.zhengxinshipin.com:5443/managed-portal:20260519-f3f40b5-11`
+    - `ota.zhengxinshipin.com:5443/managed-portal-web:20260519-f3f40b5-11`
+    - `ota.zhengxinshipin.com:5443/people-flow-project:20260519-f3f40b5-11`
+    - `ota.zhengxinshipin.com:5443/store-dwell-alert:20260519-f3f40b5-11`
+  - Confirmed all four image tags exist in the registry on `10.8.0.14`.
+  - Extracted the published release bundle to `/opt/managed-portal-releases/managed-portal-20260519-f3f40b5-11`.
+  - Generated `/opt/managed-portal-releases/managed-portal-20260519-f3f40b5-11/deploy/managed-portal.runtime.env` from the published release, while keeping the host-specific data and output directories on `10.8.0.14`.
+  - Replaced the ad hoc `managed-portal`, `managed-portal-web`, `people-flow-project`, and `store-dwell-alert` containers with the installer-managed Compose project under `/opt/managed-portal-releases/managed-portal-20260519-f3f40b5-11/deploy/docker-compose.ota-release.yml`.
+  - Migrated `store-dwell-alert` to a schema-compatible config under `/opt/managed-portal-releases/managed-portal-20260519-f3f40b5-11/managed/store_dwell_alert/config/local.yaml` while preserving the current `192.168.0.5` RTSP source.
+  - Kept the current people-flow config/output/weights directories, but removed `gpus: all` from the release Compose file because the host currently fails `nvidia-container-cli` startup with an NVML driver/library mismatch. The new image falls back to CPU at runtime and still reports healthy.
+- Verification:
+  - Published release manifest from `10.8.0.1` resolves to the four `20260519-f3f40b5-11` image tags above.
+  - Registry presence checks from `10.8.0.14` succeeded for all four image tags via `docker manifest inspect`.
+  - `sudo docker compose --env-file /opt/managed-portal-releases/managed-portal-20260519-f3f40b5-11/deploy/managed-portal.runtime.env -f /opt/managed-portal-releases/managed-portal-20260519-f3f40b5-11/deploy/docker-compose.ota-release.yml ps` showed:
+    - `managed-portal` -> `ota.zhengxinshipin.com:5443/managed-portal:20260519-f3f40b5-11`, `Up`
+    - `managed-portal-web` -> `ota.zhengxinshipin.com:5443/managed-portal-web:20260519-f3f40b5-11`, `Up`
+    - `people-flow-project` -> `ota.zhengxinshipin.com:5443/people-flow-project:20260519-f3f40b5-11`, `Up (healthy)`
+    - `store-dwell-alert` -> `ota.zhengxinshipin.com:5443/store-dwell-alert:20260519-f3f40b5-11`, `Up (healthy)`
+  - `sudo docker inspect` confirmed the four containers use the published registry image references; `people-flow-project` and `store-dwell-alert` report healthy.
+  - `curl -fsS http://127.0.0.1:13000` succeeded.
+  - `curl -fsS http://127.0.0.1:13000/api/managed-services` returned both managed services with `status: "running"`.
+  - `curl -fsS http://127.0.0.1:13000/api/managed-services/store_dwell_alert` returned the `192.168.0.5` RTSP source and `status: "running"`.
+  - `curl -fsS http://127.0.0.1:13000/api/managed-services/people_flow_project` returned the `192.168.0.4` RTSP source and `status: "running"`.
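
The registry presence check recorded under Verification, together with the lesson about testing the exact `host:port/repo:tag` reference, could be sketched as a small script. This is a minimal illustration and not part of the published installer: the function names `image_refs` and `check_images` are hypothetical, and the `REGISTRY`, `TAG`, and service values are simply copied from the Review section as defaults.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; default values mirror the
# registry host:port, release tag, and service names quoted in the Review.
set -eu

REGISTRY="${REGISTRY:-ota.zhengxinshipin.com:5443}"  # exact host:port, not the bare host
TAG="${TAG:-20260519-f3f40b5-11}"
SERVICES="managed-portal managed-portal-web people-flow-project store-dwell-alert"

# Print one fully qualified host:port/repo:tag reference per line.
image_refs() {
  for svc in $SERVICES; do
    printf '%s/%s:%s\n' "$REGISTRY" "$svc" "$TAG"
  done
}

# Ask the registry for each manifest; return non-zero if any lookup fails.
check_images() {
  missing=0
  for ref in $(image_refs); do
    if docker manifest inspect "$ref" >/dev/null 2>&1; then
      echo "ok       $ref"
    else
      echo "MISSING  $ref" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_images` on the target host before recreating containers would surface a wrong port or missing tag early; overriding `REGISTRY` or `TAG` in the environment checks a different release without editing the script.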