feat: add deployment configuration and scripts for managed-portal, including Dockerfiles and environment settings

2026-05-13 16:49:21 +08:00
parent 330373b8f1
commit f8a6d9803d
13 changed files with 563 additions and 71 deletions

@@ -13,3 +13,7 @@
- Trigger: the user clarified that this repository is meant to run in mainland China environments.
- Rule: future code, build, deployment, and integration changes must consider mainland China network accessibility and should prefer China-friendly defaults where practical.
- Preventive action: when adding dependencies, mirrors, external endpoints, or download flows, explicitly check whether the default path works reliably in mainland China and add configuration or fallback when needed.
- Trigger: the user required deployment to use `docker compose` only and explicitly disallowed host environment changes.
- Rule: for remote rollout tasks in this repo, prefer repository-contained `docker compose` changes and do not install packages, edit host configs, or mutate global environment state unless the user explicitly approves it.
- Preventive action: when a deployment is blocked, first fix Dockerfiles, compose files, env files, and mounted paths inside the repo before considering any host-level workaround.

@@ -2,44 +2,64 @@
## Checklist
- [x] Confirm the changed `people_flow_project` slice is locally validated before deploy.
- [x] Verify the plan covers remote sync, service rebuild, health verification, and post-deploy output inspection.
- [x] Sync the updated `people_flow_project` runtime files to `10.8.0.11` and verify remote hashes.
- [x] Rebuild and restart only the `people-flow-project` service on the remote host.
- [x] Verify the remote container is healthy after deployment.
- [x] Print the actual new output structure from the deployed remote code path and note any limitation versus waiting for the next live half-hour webhook.
- [x] Record deployment and verification evidence in the Review section.
- [x] Audit the current `.11` deployment state, image tags, and runtime container diffs.
- [x] Identify the minimal release payload: pushed images, compose/env/config assets, weights, and runtime-added files not present in the base images.
- [x] Push the `.11` images to `ota.zhengxinshipin.com:5443` with stable release tags.
- [x] Build a ZIP bundle containing compose files and all required non-image runtime assets.
- [x] Publish the ZIP bundle and an install script under `/var/www/html/ai_deploy` on `10.8.0.1`.
- [x] Verify the published artifacts are downloadable and the install flow is internally consistent.
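
The image-publishing step in this checklist can be sketched as a small command builder. This is a minimal illustration, not the script that was actually run: it assumes the local images are named after their repositories with a `:dev` tag, and that they are pushed to the registry root without a namespace. The helper name `build_push_commands` is hypothetical.

```python
REGISTRY = "ota.zhengxinshipin.com:5443"
RELEASE_TAG = "20260513-330373b-11"
IMAGES = [
    "managed-portal",
    "managed-portal-web",
    "people-flow-project",
    "store-dwell-alert",
]


def build_push_commands(images, registry=REGISTRY, release_tag=RELEASE_TAG,
                        source_tag="dev"):
    """Return docker argv lists that retag each local image and push it.

    Assumption: each local image is `<name>:<source_tag>` and the target
    repository is `<registry>/<name>` (no namespace).
    """
    commands = []
    for name in images:
        source = f"{name}:{source_tag}"
        target = f"{registry}/{name}:{release_tag}"
        # `docker tag SOURCE TARGET` adds the release tag to the same image ID.
        commands.append(["docker", "tag", source, target])
        # `docker push TARGET` uploads the retagged image to the registry.
        commands.append(["docker", "push", target])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in build_push_commands(IMAGES):
        print(" ".join(argv))
```

Printing the commands rather than executing them keeps the sketch side-effect free; each argv list could be passed to `subprocess.run(argv, check=True)` once push access is confirmed.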
## Scope And Risks
- Scope: deploy the `people_flow_project` output-label changes to `10.8.0.11` and inspect the newly available output structure from the remote deployed code.
- Expected touch points: `managed/people_flow_project/src/people_flow/queue_analytics.py`, `managed/people_flow_project/src/people_flow/manage_api.py`, remote deployment under `/home/xiaozheng/managed-portal`, and the `people-flow-project` docker compose service.
- Risk: the currently saved live webhook/window JSON files on the remote host will not gain the new label fields until the next real half-hour window is emitted after restart, so immediate inspection may need to use a direct code-path sample or manage API response rather than a freshly emitted live webhook file.
- Risk: restarting `people-flow-project` resets the current rolling half-hour window boundary; that is acceptable for deployment but should be stated explicitly.
- Scope: publish the current managed-portal deployment that is running on `10.8.0.11` by pushing its images to `ota.zhengxinshipin.com:5443`, generating a downloadable install script on `10.8.0.1`, and uploading a ZIP bundle with compose/runtime assets required for the stack to run correctly elsewhere.
- Expected touch points: remote Docker images on `.11`, runtime asset directories under `managed/`, deployment compose/env files under `deploy/`, and installer artifacts on `/var/www/html/ai_deploy` on `10.8.0.1`.
- Risk: the running `.11` containers use local `:dev` images and also contain runtime-added files such as `lap` inside `people-flow-project`; pushing only the local images will not fully reproduce the running state unless those extras are separately bundled or the install path reapplies them.
- Risk: required assets may live outside the image as mounted files, especially configs, outputs, weights, and managed data. Missing any of these will produce an install that starts but does not behave like `.11`.
- Risk: registry push may require credentials that are not currently cached for user `xiaozheng`; confirm push access before finalizing the artifact layout.
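
The runtime-added-files risk above could be audited by parsing `docker diff <container>` output, which prefixes each path with `A` (added), `C` (changed), or `D` (deleted). The following is a rough sketch under that assumption; the function name and the sample paths in the usage note are hypothetical, and `/tmp` and `/run` are skipped as ephemeral.

```python
EPHEMERAL_PREFIXES = ("/tmp", "/run")


def runtime_added_paths(diff_output, exclude_prefixes=EPHEMERAL_PREFIXES):
    """Return sorted paths that `docker diff` reports as added ("A").

    Paths equal to or nested under an excluded prefix are dropped.
    """
    added = []
    for line in diff_output.splitlines():
        # Each line looks like "A /some/path"; split on the first space only
        # so paths containing spaces stay intact.
        kind, _, path = line.strip().partition(" ")
        if kind != "A" or not path:
            continue
        if any(path == p or path.startswith(p + "/") for p in exclude_prefixes):
            continue
        added.append(path)
    return sorted(added)
```

For example, given the hypothetical output `"C /app\nA /app/example-extra\nA /tmp/scratch.txt"`, only `/app/example-extra` is returned as a candidate for the runtime overlay.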
## Validation Intent
- Verify remote file parity before rebuilding.
- Check container health and startup logs after deployment.
- Print an actual structure from the deployed remote code path immediately, and distinguish it from the next live webhook file that will only appear after the next rollover.
- Prove the exact `.11` images were retagged and pushed to `ota.zhengxinshipin.com:5443`.
- Prove the ZIP bundle includes compose/env/config/runtime assets needed by the current `.11` deployment.
- Prove the install script on `10.8.0.1` references the published URLs, downloads the ZIP, unpacks it, and pulls the registry images expected by the compose file.
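
The ZIP-content check above can be approximated with the standard-library `zipfile` module. This sketch assumes the bundle stores its entries under the relative paths named in these notes, with no wrapping top-level directory; the helper name `missing_from_bundle` is hypothetical, and the mounted runtime assets and overlays are not covered by this short list.

```python
import zipfile

# Entry names taken from these notes; the archive layout is an assumption.
REQUIRED_ENTRIES = (
    "deploy/docker-compose.yml",
    "deploy/docker-compose.ota-release.yml",
    "deploy/managed-portal.release.env",
    "deploy/Dockerfile.runtime-overlay",
    "managed_services.yaml",
    "release-manifest.env",
)


def missing_from_bundle(bundle, required=REQUIRED_ENTRIES):
    """Return the required entries that are absent from the ZIP archive.

    `bundle` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(bundle) as archive:
        present = set(archive.namelist())
    return [name for name in required if name not in present]
```

An empty return value means every listed entry is present; running the same check at the source and after upload covers both sides of the transfer.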
## Review
- Status: completed.
- Result: the updated `people_flow_project` code is deployed on `10.8.0.11`, the rebuilt `people-flow-project` container is healthy, and the deployed remote code path now exposes the new human-readable queue level and change labels. The currently saved live window/webhook files were generated before the next post-restart half-hour rollover, so the most immediate proof comes from the deployed manage API response and a direct runtime-code simulation inside the container.
- Result: published the current `.11` managed-portal stack as release `20260513-330373b-11`, including pushed registry images, a runtime-asset ZIP, and an install script under `/var/www/html/ai_deploy` on `10.8.0.1`.
- Release payload:
  - Registry images pushed to `ota.zhengxinshipin.com:5443`:
    - `managed-portal:20260513-330373b-11`
    - `managed-portal-web:20260513-330373b-11`
    - `people-flow-project:20260513-330373b-11`
    - `store-dwell-alert:20260513-330373b-11`
  - ZIP bundle: `/var/www/html/ai_deploy/managed-portal-20260513-330373b-11.zip`
  - Installer script: `/var/www/html/ai_deploy/install-managed-portal-20260513-330373b-11.sh`
  - Latest symlinks:
    - `/var/www/html/ai_deploy/managed-portal-latest.zip`
    - `/var/www/html/ai_deploy/install-managed-portal-latest.sh`
  - ZIP contents include:
    - `deploy/docker-compose.yml`
    - `deploy/docker-compose.ota-release.yml`
    - `deploy/managed-portal.release.env`
    - `deploy/Dockerfile.runtime-overlay`
    - `managed_services.yaml`
    - mounted runtime assets from `.11`: people-flow config/outputs/weights and store-dwell config/data
    - runtime overlays extracted from running containers for `lap` in both Python services and `/app/logs/events.jsonl` from `store-dwell-alert`
- Verification:
  - synced `managed/people_flow_project/src/people_flow/queue_analytics.py` and `managed/people_flow_project/src/people_flow/manage_api.py` to `/home/xiaozheng/managed-portal/managed/people_flow_project/src/people_flow/` on `10.8.0.11` and verified SHA256 parity with local files:
    - `queue_analytics.py`: `dd12c0a7af2d7c1bf68e3496560fe2ea0fb5c1d582bea7c4dada0caf105711c8`
    - `manage_api.py`: `c723fd570a29b43cd055dfaca4a5fc9ce1459b55754d2dbd0b8edcdef7da4cf1`
  - rebuilt and restarted only `people-flow-project` with `docker compose --env-file managed-portal.10.8.0.11.env up -d --build people-flow-project` on the remote host;
  - confirmed remote status after deploy: `people-flow-project` is `Up` and `healthy`;
  - queried the deployed manage API summary endpoint inside the container and observed these actual metrics keys/values from the live response: `{ "queue_level": "normal", "queue_level_label": "人数正常", "previous_queue_level": "few", "previous_queue_level_label": "人少", "status_change": "queue_normalized", "status_change_label": "人数变正常" }`;
  - executed a direct simulation inside the deployed container using the updated `QueueWindowTracker` code path and printed the actual new `queue_metrics` JSON:
    - `queue_level`: `crowded`
    - `queue_level_label`: `人多`
    - `previous_queue_level`: `null`
    - `previous_queue_level_label`: `""`
    - `status_change`: `initial`
    - `status_change_label`: `初始`
    - plus the existing `queue_time_threshold_seconds`, `over_threshold_count`, `under_threshold_count`, and `people[]` fields;
  - noted deployment side effect: restarting `people-flow-project` resets the current rolling 1800-second window, so the next real live `half_hour_report` file/webhook emitted after this restart will be the first persisted artifact that contains the new label fields.
  - Registry push succeeded for all four images. Observed repo digests:
    - `managed-portal@sha256:589f699edce8271c80516030eae81abed95d8e62804976955eb86bf211d98f4e`
    - `managed-portal-web@sha256:f2e99c4745a3c16118a74084585f0a455e4f5295d9eb4cbabf2689b841966d9b`
    - `people-flow-project@sha256:963ecd41ee8a3f986c581b5330ce7163614571427711d524b936f05c3e84ec96`
    - `store-dwell-alert@sha256:d324cb2653ef25f6984a12b0cfa92064bf2c86b2946462001d14d254818d243d`
  - Source and published ZIP sizes match exactly: `1261636056` bytes on `.11` and `.1`.
  - HTTP validation succeeded:
    - `http://10.8.0.1/ai_deploy/managed-portal-20260513-330373b-11.zip` => `200 OK`, `Content-Length: 1261636056`
    - `http://10.8.0.1/ai_deploy/install-managed-portal-20260513-330373b-11.sh` => `200 OK`
  - ZIP content validation succeeded both at the source and after upload, including `release-manifest.env`, `deploy/docker-compose.ota-release.yml`, and runtime overlay files under `runtime-overlays/.../lap/...`.
  - Local release asset validation passed:
    - `sh -n deploy/install-managed-portal-ota.sh`
    - compose config expansion for `deploy/docker-compose.ota-release.yml` with the `.11` env file and placeholder image refs
- Residual risk:
  - The published installer was validated for syntax and asset consistency, but it was not executed end-to-end on a fresh target host in this task.
  - The bundle intentionally excludes ephemeral `/tmp`, `/run`, and NVIDIA runtime-injected host libraries; reproducing GPU runtime behavior still depends on the target host having a working NVIDIA container runtime when `gpus: all` is used.
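
The hash and size parity checks recorded under Verification could be repeated on a target host with a helper along these lines. This is an illustrative sketch using only `hashlib` and `os`; the function names are hypothetical and it is not the tooling used during this task.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_sha256=None, expected_size=None):
    """Check a file against an expected size and/or SHA256 digest.

    The cheap size comparison runs first so a truncated download is
    rejected without hashing the whole file.
    """
    if expected_size is not None and os.path.getsize(path) != expected_size:
        return False
    if expected_sha256 is not None and file_sha256(path) != expected_sha256.lower():
        return False
    return True
```

For the published bundle, `matches_expected("managed-portal-20260513-330373b-11.zip", expected_size=1261636056)` would confirm the byte count noted above; no SHA256 for the ZIP is recorded in these notes, so only the size can be checked against them.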