docs: record managed portal rollout runbook
@@ -39,3 +39,17 @@
 - Trigger: the user clarified that all new installation targets are Ubuntu machines and asked for missing `unzip` to be handled automatically, with weights delivered separately.
 - Rule: the managed-portal OTA installer should treat Ubuntu as the first-install baseline, auto-install `unzip` via `apt-get` when needed, and use a separate people-flow weights archive instead of forcing weights into the main ZIP.
 - Preventive action: keep the main OTA ZIP minimal, publish `people-flow-weights-<RELEASE_VERSION>.tar.gz` alongside each release when weights are available, and validate that the installer still reuses shared weights on upgrades.
+
+- Trigger: the user corrected the YOLO weight repair strategy after a host had DeepFace weights but lacked only `yolo11n.pt`.
+- Rule: OTA recovery for a missing small model must not force a full 1GB+ weights archive download or fall back to public GitHub downloads.
+- Preventive action: publish a small `people-flow-yolo11n-<RELEASE_VERSION>.tar.gz` artifact and make the installer download it when only `people_flow_project/weights/yolo11n.pt` is missing.
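The artifact choice described in the YOLO rule above can be sketched as a small decision function. This is only an illustration of the rule, not the installer's actual logic: the function name `select_weights_artifact` and the `missing` argument (a list of weight paths found absent on the host) are hypothetical, while the two artifact name patterns and the `yolo11n.pt` path come from the notes.

```python
from typing import Iterable, Optional

# Relative path named in the notes above.
YOLO_WEIGHT = "people_flow_project/weights/yolo11n.pt"


def select_weights_artifact(release_version: str, missing: Iterable[str]) -> Optional[str]:
    """Return the artifact file name to download, or None if nothing is missing.

    `missing` is a hypothetical list of weight paths that were found absent.
    """
    missing_set = set(missing)
    if not missing_set:
        # Shared weights are already complete, so an upgrade can reuse them.
        return None
    if missing_set == {YOLO_WEIGHT}:
        # Only the small YOLO model is absent: fetch the small artifact.
        return f"people-flow-yolo11n-{release_version}.tar.gz"
    # Anything else is missing: fall back to the full weights archive.
    return f"people-flow-weights-{release_version}.tar.gz"
```

Keeping the "only `yolo11n.pt` is missing" case as an exact set comparison means any additional missing file still routes to the full archive, which seems the safer default.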
+
+## 2026-06-04
+
+- Trigger: the user corrected the OTA Docker registry address for the video-recognition rollout on `10.8.0.14`.
+- Rule: when updating OTA-hosted Docker images, use the exact registry host and port provided by the user; `ota.zhengxinshipin.com` and `ota.zhengxinshipin.com:5443` are not interchangeable.
+- Preventive action: before concluding a remote image reference is missing, verify whether the intended registry includes a non-default port and test the exact `host:port/repo:tag` reference.
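To make the `host:port/repo:tag` rule above concrete, here is a minimal sketch that splits an image reference and compares registries including the port. The helper names `parse_image_ref` and `same_registry` are hypothetical, and the parser is deliberately simplified: it assumes the first path component is always the registry host and it ignores digests.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    host: str
    port: Optional[int]
    repo: str
    tag: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'host[:port]/repo[:tag]' into its parts (simplified, no digests)."""
    registry, _, remainder = ref.partition("/")
    if not remainder:
        raise ValueError(f"no registry host in {ref!r}")
    host, _, port_text = registry.partition(":")
    repo, _, tag = remainder.rpartition(":")
    if not repo:
        # No ':' in the remainder, so the reference carries no tag.
        repo, tag = remainder, ""
    return ImageRef(host, int(port_text) if port_text else None, repo, tag or None)


def same_registry(a: str, b: str) -> bool:
    """True only when both host and port match exactly."""
    ra, rb = parse_image_ref(a), parse_image_ref(b)
    return (ra.host, ra.port) == (rb.host, rb.port)
```

With this comparison, `ota.zhengxinshipin.com/managed-portal:x` and `ota.zhengxinshipin.com:5443/managed-portal:x` are reported as different registries, which matches the rule recorded above.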
+
+- Trigger: the user clarified that the managed-portal four-service rollout must follow the published installer on `root@10.8.0.1:/var/www/html/ai_deploy`.
+- Rule: for managed-portal release updates, treat the published installer bundle and its embedded Compose/env files as the deployment source of truth instead of reverse-engineering the current host state.
+- Preventive action: before updating the managed-portal stack on a target host, inspect `install-managed-portal-*.sh`, `release-manifest.env`, and the bundled `docker-compose.ota-release.yml` under `/var/www/html/ai_deploy`.
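Inspecting `release-manifest.env` before a rollout amounts to reading simple `KEY=VALUE` lines. The sketch below shows one way to do that; `parse_env_text` is a hypothetical helper, and the key names in the sample are invented for illustration because the notes do not show the real manifest keys.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment, ignore it
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Hypothetical manifest content; the real key names are not shown in the notes.
sample = """\
# release manifest (illustrative)
EXAMPLE_RELEASE_VERSION=20260519-f3f40b5-11
EXAMPLE_PORTAL_IMAGE="ota.zhengxinshipin.com:5443/managed-portal:20260519-f3f40b5-11"
"""
manifest = parse_env_text(sample)
```

Reading the manifest this way keeps the published bundle as the source of truth: image references are taken from the file rather than typed by hand.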
@@ -2,45 +2,54 @@
 
 ## Checklist
 
-- [ ] Reuse the already-published `managed-portal-20260519-f3f40b5-11.zip` main bundle and cut updated installer/weights artifacts for the same tag.
-- [ ] Publish the updated installer and separate weights archive to `10.8.0.1` and verify the HTTP endpoints.
-- [ ] Commit and push the repository changes for the split-weights Ubuntu installer flow.
+- [x] Inspect the published managed-portal installer and release manifest under `root@10.8.0.1:/var/www/html/ai_deploy`.
+- [x] Confirm the registry tags currently published for `managed-portal`, `managed-portal-web`, `people-flow-project`, and `store-dwell-alert`.
+- [x] Prepare `10.8.0.14` for an installer-aligned rollout of the four-service managed-portal stack.
+- [x] Recreate the four target containers on `10.8.0.14` using the published release version and corresponding Compose layout.
+- [x] Verify the running stack on `10.8.0.14` uses the published registry images and the installer-managed Compose project.
 
 ## Scope And Risks
 
-- Scope: keep the existing OTA application ZIP for `20260519-f3f40b5-11`, generate a refreshed installer plus separate people-flow weights archive for that same release tag, publish them to `10.8.0.1`, and push the supporting repository changes to Git.
-- Expected touch points: `.11` release artifacts, `/var/www/html/ai_deploy` on `10.8.0.1`, `deploy/package-managed-portal-ota.sh`, `deploy/install-managed-portal-ota.sh`, `README.md`, `.gitignore`, and task tracking files.
-- Risk: reusing the existing main ZIP means the installer and weights archive must remain compatible with the already-published `managed-portal-20260519-f3f40b5-11.zip`.
-- Risk: the current local repository does not contain real weights payload files, so the separate weights archive may need to be generated from the `.11` host release workspace or a stable host weights directory instead of local source control.
-- Risk: the commit must exclude local artifact files and only capture the intended repo changes.
+- Scope: use the published managed-portal release artifacts on `10.8.0.1` as the source of truth for image names, tags, and Compose topology.
+- Scope: update the four-service managed-portal group on `10.8.0.14`: `managed-portal`, `managed-portal-web`, `people-flow-project`, and `store-dwell-alert`.
+- Scope: keep unrelated stacks, especially the `iot-main` video-recognition project, untouched.
+- Risk: the current four containers on `10.8.0.14` are not managed by one Compose project, so installer-based recreation will conflict on fixed container names unless the old containers are replaced cleanly.
+- Risk: the published installer seeds config, data, outputs, and weights under `/opt/managed-portal-releases`; switching to it changes the runtime paths from the current ad hoc directories.
+- Risk: service recreation causes a brief interruption for the portal and both child-service APIs.
 
 ## Validation Intent
 
-- Prove the refreshed installer and separate weights archive exist for tag `20260519-f3f40b5-11`.
-- Prove both artifacts are downloadable from `10.8.0.1/ai_deploy`.
-- Prove the Git commit/push contains only the intended repository changes.
+- Read the published installer and `release-manifest.env` to confirm the exact release version and image references.
+- Verify the registry exposes the four target tags before rollout.
+- Use the installer-aligned Compose files and environment from the published bundle, not a hand-built local variant.
+- Confirm the final containers are recreated from the published registry images and are running under the installer-managed release directory.
 
 ## Review
 
 - Status: complete.
-- Reused OTA application bundle:
-- Kept the already-published `managed-portal-20260519-f3f40b5-11.zip` as-is because the main program contents did not change.
-- Regenerated only the installer and the separate weights archive for the same release tag.
-- `.11` artifact refresh:
-- Synced the updated packaging and installer scripts to `/home/xiaozheng/managed-portal`.
-- Reused `release_build/release-manifest-20260519-f3f40b5-11.env`.
-- Generated `release_build/install-managed-portal-20260519-f3f40b5-11.sh`.
-- Generated `release_build/people-flow-weights-20260519-f3f40b5-11.tar.gz` from `/home/xiaozheng/people_flow_project/weights`.
-- OTA publication result on `10.8.0.1`:
-- Published `install-managed-portal-20260519-f3f40b5-11.sh`.
-- Published `people-flow-weights-20260519-f3f40b5-11.tar.gz`.
-- Preserved the existing `managed-portal-20260519-f3f40b5-11.zip`.
-- Confirmed `install-managed-portal-latest.sh` still resolves to `install-managed-portal-20260519-f3f40b5-11.sh`.
-- Confirmed `managed-portal-latest.zip` still resolves to `managed-portal-20260519-f3f40b5-11.zip`.
-- HTTP verification:
-- `http://10.8.0.1/ai_deploy/people-flow-weights-20260519-f3f40b5-11.tar.gz` returns `200 OK` with `Content-Length: 1135171626`.
-- `http://10.8.0.1/ai_deploy/install-managed-portal-20260519-f3f40b5-11.sh` returns `200 OK` with `Content-Length: 8081`.
-- `http://10.8.0.1/ai_deploy/managed-portal-20260519-f3f40b5-11.zip` returns `200 OK` with `Content-Length: 4880`.
-- Git result:
-- Committed the repo changes as `d1c4b77` with message `Split OTA weights for Ubuntu installs`.
-- Pushed `main` to `origin`.
+- Result:
+- Confirmed the published managed-portal installer source of truth is `root@10.8.0.1:/var/www/html/ai_deploy/install-managed-portal-20260519-f3f40b5-11.sh`.
+- Confirmed the published registry images in `release-manifest.env` are:
+- `ota.zhengxinshipin.com:5443/managed-portal:20260519-f3f40b5-11`
+- `ota.zhengxinshipin.com:5443/managed-portal-web:20260519-f3f40b5-11`
+- `ota.zhengxinshipin.com:5443/people-flow-project:20260519-f3f40b5-11`
+- `ota.zhengxinshipin.com:5443/store-dwell-alert:20260519-f3f40b5-11`
+- Confirmed all four image tags exist in the registry on `10.8.0.14`.
+- Extracted the published release bundle to `/opt/managed-portal-releases/managed-portal-20260519-f3f40b5-11`.
+- Generated `/opt/managed-portal-releases/managed-portal-20260519-f3f40b5-11/deploy/managed-portal.runtime.env` from the published release, while keeping the host-specific data and output directories on `10.8.0.14`.
+- Replaced the ad hoc `managed-portal`, `managed-portal-web`, `people-flow-project`, and `store-dwell-alert` containers with the installer-managed Compose project under `/opt/managed-portal-releases/managed-portal-20260519-f3f40b5-11/deploy/docker-compose.ota-release.yml`.
+- Migrated `store-dwell-alert` to a schema-compatible config under `/opt/managed-portal-releases/managed-portal-20260519-f3f40b5-11/managed/store_dwell_alert/config/local.yaml` while preserving the current `192.168.0.5` RTSP source.
+- Kept the current people-flow config/output/weights directories, but removed `gpus: all` from the release Compose file because the host currently fails `nvidia-container-cli` startup with an NVML driver/library mismatch. The new image falls back to CPU at runtime and still reports healthy.
+- Verification:
+- Published release manifest from `10.8.0.1` resolves to the four `20260519-f3f40b5-11` image tags above.
+- Registry presence checks from `10.8.0.14` succeeded for all four image tags via `docker manifest inspect`.
+- `sudo docker compose --env-file /opt/managed-portal-releases/managed-portal-20260519-f3f40b5-11/deploy/managed-portal.runtime.env -f /opt/managed-portal-releases/managed-portal-20260519-f3f40b5-11/deploy/docker-compose.ota-release.yml ps` showed:
+- `managed-portal` -> `ota.zhengxinshipin.com:5443/managed-portal:20260519-f3f40b5-11`, `Up`
+- `managed-portal-web` -> `ota.zhengxinshipin.com:5443/managed-portal-web:20260519-f3f40b5-11`, `Up`
+- `people-flow-project` -> `ota.zhengxinshipin.com:5443/people-flow-project:20260519-f3f40b5-11`, `Up (healthy)`
+- `store-dwell-alert` -> `ota.zhengxinshipin.com:5443/store-dwell-alert:20260519-f3f40b5-11`, `Up (healthy)`
+- `sudo docker inspect` confirmed the four containers use the published registry image references; `people-flow-project` and `store-dwell-alert` report healthy.
+- `curl -fsS http://127.0.0.1:13000` succeeded.
+- `curl -fsS http://127.0.0.1:13000/api/managed-services` returned both managed services with `status: "running"`.
+- `curl -fsS http://127.0.0.1:13000/api/managed-services/store_dwell_alert` returned the `192.168.0.5` RTSP source and `status: "running"`.
+- `curl -fsS http://127.0.0.1:13000/api/managed-services/people_flow_project` returned the `192.168.0.4` RTSP source and `status: "running"`.
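The image check recorded in the verification notes above (each service running the published `host:port/repo:tag` reference) can be expressed as a small comparison. This is a sketch under assumptions: `expected_images` and `find_mismatches` are hypothetical helpers, and the `running` mapping stands in for whatever was read from `docker compose ps` or `docker inspect`; no Docker call is made here.

```python
from typing import Dict, List

# Registry and service names as recorded in the notes above.
REGISTRY = "ota.zhengxinshipin.com:5443"
SERVICES = ["managed-portal", "managed-portal-web", "people-flow-project", "store-dwell-alert"]


def expected_images(release_version: str) -> Dict[str, str]:
    """Build the expected image reference for each of the four services."""
    return {name: f"{REGISTRY}/{name}:{release_version}" for name in SERVICES}


def find_mismatches(expected: Dict[str, str], running: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means everything matches."""
    problems = []
    for service, image in expected.items():
        actual = running.get(service)
        if actual is None:
            problems.append(f"{service}: not running")
        elif actual != image:
            problems.append(f"{service}: expected {image}, found {actual}")
    return problems
```

For example, calling `find_mismatches(expected_images("20260519-f3f40b5-11"), running)` with a `running` mapping that uses `ota.zhengxinshipin.com` without the `:5443` port would flag every service, because the comparison is an exact string match.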