Initial video AI analysis project
This commit is contained in:
719
docs/project.md
Normal file
719
docs/project.md
Normal file
@@ -0,0 +1,719 @@
|
||||
# Project Documentation
|
||||
|
||||
## Goal
|
||||
|
||||
本项目是在 `/Users/yoilun/AI-train/video-ai-analysis-poc` 中实现视频离线批处理分析 PoC。`v1.0` 已支持本地视频文件夹;`v1.1` 新增海康云存储录像下载作为视频来源,下载完成后复用现有抽帧、clip、VLM 推理和聚合流程。
|
||||
|
||||
必须支持:
|
||||
|
||||
- 选择一个本地视频文件夹。
|
||||
- 直接调用海康云存储录像下载 API 获取录像下载地址并下载视频。
|
||||
- AccessToken 通过 config 或环境变量配置,不写入测试夹具和文档样例。
|
||||
- 设备序列号和通道可配置,并支持多设备。
|
||||
- 分析时间段包含年月日,支持 `YYYY-MM-DD HH:MM:SS` 配置。
|
||||
- 海康 API 单次最多下载 1 小时,超过 1 小时的时间段必须拆成多个不超过 3600 秒的请求;默认示例使用 600 秒分片,真实 smoke 中比 3600 秒更稳定。
|
||||
- 自动发现文件夹内所有常见视频文件。
|
||||
- 对每个视频按 1 FPS 抽帧,按 10-20 秒 clip 组织输入。
|
||||
- 使用已有 4B VLM 模型能力,兼容 `memai-zhengxin-v3-20260413` 的 OpenAI-compatible vLLM 接口。
|
||||
- prompt 通过 config 调整。
|
||||
- 输出结构化 JSON/JSONL。
|
||||
- 输出中必须包含监控画面的时间轴,包括视频、clip、frame 和事件的时间定位。
|
||||
|
||||
## v1.1 Hik Cloud Storage Source
|
||||
|
||||
海康文档 `录像下载流程_1.pdf` 的“2、获取录像下载地址”定义:
|
||||
|
||||
```text
|
||||
POST https://api2.hik-cloud.com/v1/carrier/cstorage/open/play/download
|
||||
Authorization: bearer <AccessToken>
|
||||
Content-Type: application/json
|
||||
```
|
||||
|
||||
请求 body:
|
||||
|
||||
```json
|
||||
{
|
||||
"deviceSerial": "EXAMPLE_DEVICE_SERIAL",
|
||||
"channelNo": 1,
|
||||
"timeBegin": 1764856787,
|
||||
"timeEnd": 1764856978
|
||||
}
|
||||
```
|
||||
|
||||
成功返回 `data.url`、`actualBeginTime`、`actualEndTime`。错误码 `80430002` 包含起止时间大于 3600 秒的参数错误,错误码 `80438027` 表示起始时间内没有录像。
|
||||
|
||||
配置示例:
|
||||
|
||||
```yaml
|
||||
source:
|
||||
mode: hik_cloud # local | hik_cloud
|
||||
|
||||
hik_cloud:
|
||||
api_base_url: https://api2.hik-cloud.com
|
||||
download_path: /v1/carrier/cstorage/open/play/download
|
||||
access_token: null
|
||||
access_token_env: HIK_CLOUD_ACCESS_TOKEN
|
||||
chunk_seconds: 600
|
||||
timeout_seconds: 60
|
||||
download_timeout_seconds: 600
|
||||
devices:
|
||||
- device_serial: EXAMPLE_DEVICE_SERIAL
|
||||
channel_no: 1
|
||||
name: store-front
|
||||
time_ranges:
|
||||
- begin: "2026-02-03 09:00:00"
|
||||
end: "2026-02-03 11:30:00"
|
||||
```
|
||||
|
||||
云下载输出:
|
||||
|
||||
- `hik_cloud_download_manifest.jsonl`:每个设备/通道/时间分片的请求、实际时间、状态和错误。`--dry-run` 云模式只请求下载地址并写入 `address_ok` / failure 状态,不下载 mp4,不 probe。
|
||||
- `downloads/hik_cloud/<device_serial>/ch<channel_no>/*.mp4`:下载后供现有分析链路消费的视频文件。
|
||||
- `video_manifest.jsonl`:保留现有契约,并附加云来源元数据。
|
||||
|
||||
运行本地文件夹模式:
|
||||
|
||||
```bash
|
||||
python3 -B -m video_ai_analysis_poc.cli \
|
||||
--config config/local_batch.yaml \
|
||||
--input-dir /path/to/local/videos \
|
||||
--output-dir ./outputs/local-batch
|
||||
```
|
||||
|
||||
运行海康云存储模式时,复制配置文件并设置 `source.mode: hik_cloud`,AccessToken 优先通过环境变量提供:
|
||||
|
||||
```bash
|
||||
export HIK_CLOUD_ACCESS_TOKEN='<redacted>'
|
||||
python3 -B -m video_ai_analysis_poc.cli \
|
||||
--config /path/to/hik-cloud.yaml \
|
||||
--output-dir ./outputs/hik-cloud
|
||||
```
|
||||
|
||||
`--dry-run` 会请求海康下载地址并写 `hik_cloud_download_manifest.jsonl`,但不会下载视频文件、probe、抽帧、推理或聚合。`--until clips` 会在下载、探测、抽帧和 clip manifest 后停止;`--until inference` 会继续运行模型推理并写入 `clip_results.jsonl`。
|
||||
|
||||
真实远端 smoke 观察到同一 1 小时时间段直接按 3600 秒下载时,云端返回的 MP4 缺少 `moov` atom,`ffprobe` 无法解析;改用 600 秒分片后 6 个分片均可探测并进入抽帧。抽帧阶段会根据云下载记录的 `actual_begin/actual_end` 或 `requested_begin/requested_end` 给 FFmpeg 加输出帧数上限,避免海康 MP4 异常时间戳导致 `fps=1` 复制出过量帧。
|
||||
|
||||
海康云存储安全规则:
|
||||
|
||||
- 不提交真实 AccessToken。
|
||||
- 优先使用 `hik_cloud.access_token_env: HIK_CLOUD_ACCESS_TOKEN`。
|
||||
- 不记录 Authorization header。
|
||||
- 不持久化签名下载 URL query,例如 `sign`、`sig`、`token`、`access_token`。
|
||||
- `access_token.md` 是敏感验证文件,只能用于远端真实 smoke,不复制进文档、测试或输出样例。
|
||||
|
||||
## Directory Boundaries
|
||||
|
||||
```text
|
||||
/Users/yoilun/AI-train/video-ai-analysis-poc
|
||||
本次 PoC 项目目录,后续代码、配置、计划、文档都放这里。
|
||||
|
||||
/Users/yoilun/AI-train/zhengxin-vlm-0413
|
||||
外部模型和参考实现目录,不是本次项目目录。
|
||||
```
|
||||
|
||||
硬性边界:
|
||||
|
||||
- 不在 `zhengxin-vlm-0413` 中创建本项目文件。
|
||||
- 不修改 `zhengxin-vlm-0413/models/**`。
|
||||
- 不修改 `zhengxin-vlm-0413/service/config.yaml`、`service/config.yaml-bk`、`docker/.env`。
|
||||
- 不把参考项目真实 RTSP、Webhook、token、Cookie、密码写入本项目示例配置、测试夹具、文档或输出样例。
|
||||
- 输出目录只能是用户显式传入目录,或本项目内 `outputs/`。
|
||||
- 不覆盖用户原始视频文件。
|
||||
|
||||
## Inference Architecture Decision
|
||||
|
||||
本 PoC 明确选择:
|
||||
|
||||
```text
|
||||
OpenAI-compatible vLLM API
|
||||
```
|
||||
|
||||
不在 PoC 第一版中直接加载 PyTorch + Transformers + PEFT。原因:
|
||||
|
||||
- 用户说明测试环境已有模型。
|
||||
- 参考项目已经使用 vLLM OpenAI-compatible API。
|
||||
- 本地视频批处理的主要目标是打通工程链路,而不是重新实现模型服务。
|
||||
|
||||
配置字段固定为:
|
||||
|
||||
```yaml
|
||||
vlm:
|
||||
api_base_url: http://localhost:8679
|
||||
chat_completions_path: /v1/chat/completions
|
||||
```
|
||||
|
||||
代码拼接规则:
|
||||
|
||||
```text
|
||||
chat_url = api_base_url.rstrip("/") + chat_completions_path
|
||||
```
|
||||
|
||||
不要在配置中同时传完整 endpoint 和 base URL,避免出现 `/v1/chat/completions/v1/chat/completions` 之类的双拼路径。
|
||||
|
||||
## Target File Structure
|
||||
|
||||
```text
|
||||
video-ai-analysis-poc/
|
||||
agent.md
|
||||
task_plan.md
|
||||
findings.md
|
||||
progress.md
|
||||
memories.md
|
||||
video_ai_analysis_system_plan.md
|
||||
config/
|
||||
local_batch.yaml
|
||||
video_ai_analysis_poc/
|
||||
__init__.py
|
||||
cli.py
|
||||
config.py
|
||||
paths.py
|
||||
discovery.py
|
||||
probe.py
|
||||
ffmpeg_sampler.py
|
||||
frames.py
|
||||
clips.py
|
||||
vlm_client.py
|
||||
result_parser.py
|
||||
aggregator.py
|
||||
manifest.py
|
||||
logging_utils.py
|
||||
schemas/
|
||||
clip_result.schema.json
|
||||
video_result.schema.json
|
||||
folder_summary.schema.json
|
||||
tests/
|
||||
test_config.py
|
||||
test_discovery.py
|
||||
test_probe.py
|
||||
test_clips.py
|
||||
test_result_parser.py
|
||||
test_aggregator.py
|
||||
outputs/
|
||||
.gitkeep
|
||||
```
|
||||
|
||||
## Module Boundaries
|
||||
|
||||
### `config.py`
|
||||
|
||||
- 加载 `config/local_batch.yaml`。
|
||||
- 合并 CLI 参数覆盖项。
|
||||
- 校验必填字段、数值范围、路径安全。
|
||||
- 不访问视频、不调用 FFmpeg、不调用模型。
|
||||
|
||||
### `paths.py`
|
||||
|
||||
- 生成稳定 `video_id`、`clip_id`。
|
||||
- 生成输出目录结构。
|
||||
- 防止输出目录指向参考模型目录或覆盖输入视频目录。
|
||||
|
||||
### `discovery.py`
|
||||
|
||||
- 只负责按 `input.dir`、`recursive`、`extensions` 发现视频。
|
||||
- 输出 `video_manifest.jsonl`。
|
||||
- 不做 ffprobe,不做抽帧,不调用模型。
|
||||
|
||||
### `probe.py`
|
||||
|
||||
- 包装 `ffprobe`。
|
||||
- 输出 `duration_seconds`、`codec_name`、`width`、`height`、`fps`、`format_name`、`start_time`。
|
||||
- 损坏或不支持视频标记 `probe_failed`,记录 `last_error`,不阻塞其他视频。
|
||||
|
||||
### `ffmpeg_sampler.py`
|
||||
|
||||
- 使用 FFmpeg + NVDEC 做 1 FPS 抽帧。
|
||||
- 根据 codec 选择 `h264_cuvid` / `hevc_cuvid`。
|
||||
- 默认 `allow_cpu_fallback: false`。
|
||||
- 输出 JPEG 和 `frame_manifest.jsonl`。
|
||||
- 保存 FFmpeg stderr 摘要,作为实际使用 GPU 解码的证据。
|
||||
|
||||
### `frames.py`
|
||||
|
||||
- 计算 frame 的相对秒数和 timecode。
|
||||
- 维护 frame 文件路径、offset、timecode。
|
||||
- 优先使用可获得的 `pts_time`,否则使用抽帧序号按 FPS 推导相对时间。
|
||||
|
||||
### `clips.py`
|
||||
|
||||
- 读取 `frame_manifest.jsonl`。
|
||||
- 按 `clip.length_seconds` 和 `clip.stride_seconds` 构建 clip。
|
||||
- 从 1 FPS 帧中均匀采样 `frames_per_clip`。
|
||||
- 输出 `clip_manifest.jsonl`,必须包含参与推理的实际帧时间。
|
||||
|
||||
### `vlm_client.py`
|
||||
|
||||
- 调用 OpenAI-compatible `/v1/chat/completions`。
|
||||
- 多帧使用 `image_url`,默认 `data:image/jpeg;base64`。
|
||||
- prompt 来自 config,不硬编码。
|
||||
- 不解析业务事件,只返回 raw response、latency 和 HTTP 状态。
|
||||
- 阶段 4 实现使用 Python 标准库 `urllib`,并暴露可注入 HTTP 函数以便测试 mock;默认 URL 拼接为 `vlm.api_base_url.rstrip("/") + vlm.chat_completions_path`。
|
||||
|
||||
### `result_parser.py`
|
||||
|
||||
- 从 raw response 中提取严格 JSON。
|
||||
- 校验 `schema_version`、`events`、`screen_time`、事件枚举等字段。
|
||||
- 解析失败触发一次严格 prompt 重试。
|
||||
- 仍失败写 `parse_failed`,保留 `raw_response`。
|
||||
- 阶段 4 实现支持 raw JSON、markdown/prose 中嵌入 JSON,输出 clip 级 `monitoring_timeline`、`events`、`raw_response`、`processing` 和 `error` 字段。
|
||||
|
||||
### `aggregator.py`
|
||||
|
||||
- 消费 `video_manifest.jsonl`、`clip_manifest.jsonl` 和 `clip_results.jsonl`。
|
||||
- 聚合为 `videos/<video_id>/video_result.json` 和输出根目录下的 `folder_summary.json`。
|
||||
- 按 `merge_gap_seconds` 合并同视频、同类型、相邻时间范围接近的事件。
|
||||
- 保留事件相对时间轴、screen_time、clip evidence 和 frame evidence。
|
||||
- 统计 `parse_failed` / `inference_failed` clip 数量。
|
||||
|
||||
### `manifest.py`
|
||||
|
||||
- 负责 JSONL 读写和状态字段。
|
||||
- 支持断点续跑。
|
||||
- 每条记录包含 `status`、`retry_count`、`last_error`。
|
||||
|
||||
## Config Schema
|
||||
|
||||
`config/local_batch.yaml` 建议字段:
|
||||
|
||||
```yaml
|
||||
input:
|
||||
dir: /path/to/videos
|
||||
recursive: true
|
||||
extensions: [".mp4", ".mov", ".mkv", ".avi", ".flv", ".ts", ".m4v"]
|
||||
|
||||
source:
|
||||
mode: local
|
||||
|
||||
output:
|
||||
dir: ./outputs/local-batch
|
||||
overwrite: false
|
||||
resume: true
|
||||
keep_frames: true
|
||||
|
||||
hik_cloud:
|
||||
api_base_url: https://api2.hik-cloud.com
|
||||
download_path: /v1/carrier/cstorage/open/play/download
|
||||
access_token: null
|
||||
access_token_env: HIK_CLOUD_ACCESS_TOKEN
|
||||
chunk_seconds: 600
|
||||
timeout_seconds: 60
|
||||
download_timeout_seconds: 600
|
||||
devices:
|
||||
- device_serial: EXAMPLE_DEVICE_SERIAL
|
||||
channel_no: 1
|
||||
name: example-device
|
||||
time_ranges:
|
||||
- begin: "2026-02-03 09:00:00"
|
||||
end: "2026-02-03 10:00:00"
|
||||
|
||||
ffprobe:
|
||||
timeout_seconds: 30
|
||||
|
||||
ffmpeg:
|
||||
prefer_nvdec: true
|
||||
allow_cpu_fallback: false
|
||||
hwaccel: cuda
|
||||
codec_decoders:
|
||||
h264: h264_cuvid
|
||||
hevc: hevc_cuvid
|
||||
frame_fps: 1
|
||||
frame_width: 640
|
||||
jpeg_quality: 4
|
||||
timeout_seconds_per_video: 3600
|
||||
|
||||
clip:
|
||||
length_seconds: 10
|
||||
stride_seconds: 10
|
||||
frames_per_clip: 8
|
||||
min_frames_per_clip: 4
|
||||
|
||||
vlm:
|
||||
api_base_url: http://localhost:8679
|
||||
chat_completions_path: /v1/chat/completions
|
||||
model: memai-zhengxin-v3-20260413
|
||||
timeout_seconds: 120
|
||||
max_tokens: 512
|
||||
temperature: 0
|
||||
batch_size: 1
|
||||
image_transport: data_uri
|
||||
retries: 1
|
||||
|
||||
prompt:
|
||||
system: "You are a store video analysis assistant. Return strict JSON only."
|
||||
user: "Analyze this clip. Return events and screen_time. If no event, return events: []."
|
||||
|
||||
schema:
|
||||
version: local-batch-v1
|
||||
event_types:
|
||||
- customer_enter
|
||||
- customer_leave
|
||||
- queue_detected
|
||||
- staff_absent
|
||||
- staff_present
|
||||
- area_crowded
|
||||
- abnormal_behavior
|
||||
- unknown
|
||||
require_strict_json: true
|
||||
parse_retry: 1
|
||||
merge_gap_seconds: 30
|
||||
|
||||
runtime:
|
||||
timezone: Asia/Shanghai
|
||||
log_level: INFO
|
||||
```
|
||||
|
||||
## File Contracts
|
||||
|
||||
### `video_manifest.jsonl`
|
||||
|
||||
One line per discovered video:
|
||||
|
||||
```json
|
||||
{
|
||||
"video_id": "stable_hash_or_slug",
|
||||
"source_path": "/path/to/video.mp4",
|
||||
"status": "pending",
|
||||
"probe": null,
|
||||
"retry_count": 0,
|
||||
"last_error": null
|
||||
}
|
||||
```
|
||||
|
||||
### `frame_manifest.jsonl`
|
||||
|
||||
One line per sampled frame:
|
||||
|
||||
```json
|
||||
{
|
||||
"video_id": "stable_hash_or_slug",
|
||||
"frame_id": "stable_hash_or_slug_f000120",
|
||||
"frame_path": "frames/stable_hash_or_slug/000120.jpg",
|
||||
"offset_seconds": 120.0,
|
||||
"timecode": "00:02:00",
|
||||
"pts_time": 120.0,
|
||||
"status": "sampled"
|
||||
}
|
||||
```
|
||||
|
||||
### `clip_manifest.jsonl`
|
||||
|
||||
One line per clip:
|
||||
|
||||
```json
|
||||
{
|
||||
"video_id": "stable_hash_or_slug",
|
||||
"clip_id": "stable_hash_or_slug_c000012",
|
||||
"clip_start_seconds": 120.0,
|
||||
"clip_end_seconds": 130.0,
|
||||
"clip_start_timecode": "00:02:00",
|
||||
"clip_end_timecode": "00:02:10",
|
||||
"frame_times": [
|
||||
{
|
||||
"frame_path": "frames/stable_hash_or_slug/000120.jpg",
|
||||
"offset_seconds": 120.0,
|
||||
"timecode": "00:02:00"
|
||||
}
|
||||
],
|
||||
"status": "pending",
|
||||
"retry_count": 0,
|
||||
"last_error": null
|
||||
}
|
||||
```
|
||||
|
||||
### `clip_results.jsonl`
|
||||
|
||||
One line per inferred clip:
|
||||
|
||||
```json
|
||||
{
|
||||
"schema_version": "local-batch-v1",
|
||||
"video_id": "stable_hash_or_slug",
|
||||
"video_path": "/path/to/video.mp4",
|
||||
"clip_id": "stable_hash_or_slug_c000012",
|
||||
"status": "ok",
|
||||
"monitoring_timeline": {
|
||||
"timezone": "Asia/Shanghai",
|
||||
"video_start_time": null,
|
||||
"clip_start_seconds": 120.0,
|
||||
"clip_end_seconds": 130.0,
|
||||
"clip_start_timecode": "00:02:00",
|
||||
"clip_end_timecode": "00:02:10",
|
||||
"frame_times": [
|
||||
{
|
||||
"frame_path": "frames/stable_hash_or_slug/000120.jpg",
|
||||
"offset_seconds": 120.0,
|
||||
"timecode": "00:02:00"
|
||||
}
|
||||
],
|
||||
"screen_time": "2026-06-14 12:31:20"
|
||||
},
|
||||
"events": [
|
||||
{
|
||||
"event_type": "queue_detected",
|
||||
"start_time": null,
|
||||
"end_time": null,
|
||||
"start_offset_seconds": 120.0,
|
||||
"end_offset_seconds": 130.0,
|
||||
"confidence": 0.86,
|
||||
"severity": "medium",
|
||||
"attributes": {},
|
||||
"evidence": {
|
||||
"clip_id": "stable_hash_or_slug_c000012",
|
||||
"frame_paths": ["frames/stable_hash_or_slug/000120.jpg"]
|
||||
}
|
||||
}
|
||||
],
|
||||
"raw_response": null,
|
||||
"processing": {
|
||||
"started_at": "2026-06-15T10:00:00+08:00",
|
||||
"finished_at": "2026-06-15T10:00:02+08:00",
|
||||
"latency_ms": 1800
|
||||
},
|
||||
"error": null
|
||||
}
|
||||
```
|
||||
|
||||
### `video_result.json`
|
||||
|
||||
Written to:
|
||||
|
||||
```text
|
||||
videos/<video_id>/video_result.json
|
||||
```
|
||||
|
||||
Required top-level fields:
|
||||
|
||||
```text
|
||||
schema_version
|
||||
video_id
|
||||
video_path
|
||||
probe
|
||||
monitoring_timeline.video_start_time
|
||||
monitoring_timeline.video_duration_seconds
|
||||
clip_count
|
||||
failed_clip_count
|
||||
event_counts
|
||||
events
|
||||
outputs.clip_results_jsonl
|
||||
processing
|
||||
```
|
||||
|
||||
### `folder_summary.json`
|
||||
|
||||
Required top-level fields:
|
||||
|
||||
```text
|
||||
schema_version
|
||||
input_dir
|
||||
video_count
|
||||
processed_video_count
|
||||
failed_video_count
|
||||
event_counts
|
||||
videos
|
||||
processing
|
||||
```
|
||||
|
||||
## Timeline Rules
|
||||
|
||||
时间轴必须区分三类时间:
|
||||
|
||||
- 视频相对时间:`offset_seconds`、`timecode`。
|
||||
- 画面 OCR 时间:`screen_time` 或模型输出里的 `画面时间`。
|
||||
- 处理时间:`processing.started_at`、`processing.finished_at`。
|
||||
|
||||
本地视频没有可靠业务开始时间时:
|
||||
|
||||
- `video_start_time` 必须为 `null`。
|
||||
- 不允许伪造绝对时间。
|
||||
- 事件必须保留 `start_offset_seconds` 和 `end_offset_seconds`。
|
||||
|
||||
参与推理的实际帧时间必须写入 `frame_times`。不能只写 clip 起止时间。
|
||||
|
||||
## Reference Code Usage
|
||||
|
||||
可以参考:
|
||||
|
||||
- `zhengxin-vlm-0413/shared/vlm_client.py` 的 OpenAI-compatible payload 结构。
|
||||
- `zhengxin-vlm-0413/shared/frame_utils.py` 的 base64 data URI 处理方式。
|
||||
- `zhengxin-vlm-0413/service/config.yaml` 的 prompt 配置风格。
|
||||
|
||||
不能直接复用为核心实现:
|
||||
|
||||
- `frame_utils.extract_frames_from_video`,因为它是整段均匀抽 8 帧,不满足 1 FPS、clip manifest、时间轴要求。
|
||||
- `vlm_client.extract_action`,因为它只解析 `Action`,不能覆盖本项目完整事件和时间轴 schema。
|
||||
- `rtsp_service.py` 主循环,因为它服务实时 RTSP,不适合离线文件夹批处理。
|
||||
|
||||
## Validation Matrix
|
||||
|
||||
### Phase 1 Architecture Validation
|
||||
|
||||
阶段 1 complete 条件:
|
||||
|
||||
- `docs/project.md` 固化模块边界、文件输出契约、config schema、时间轴 schema、安全边界和验证矩阵。
|
||||
- 推理接口选择已明确为 OpenAI-compatible vLLM。
|
||||
- API URL 字段语义已固定为 `api_base_url` + `chat_completions_path`。
|
||||
- 已声明参考 `frame_utils.py` / `vlm_client.py` 哪些可借鉴、哪些不能直接复用。
|
||||
- 已列出阶段 2-6 的 smoke test 输入、命令、期望输出字段和失败判定标准。
|
||||
- 子 agent 审查结论记录到 `progress.md`。
|
||||
|
||||
### Phase 2 Validation
|
||||
|
||||
目标:本地视频发现、ffprobe、manifest、CLI 骨架。
|
||||
|
||||
命令:
|
||||
|
||||
```bash
|
||||
python3 -m py_compile video_ai_analysis_poc/*.py
|
||||
python3 -m video_ai_analysis_poc.cli --config config/local_batch.yaml --input-dir /path/to/videos --output-dir ./outputs/local-batch --dry-run
|
||||
```
|
||||
|
||||
期望:
|
||||
|
||||
- 生成 `video_manifest.jsonl`。
|
||||
- 损坏/不支持视频被标记失败,不阻塞其他视频。
|
||||
- 不读取或写入参考模型目录。
|
||||
|
||||
### Phase 3 Validation
|
||||
|
||||
目标:FFmpeg/NVDEC 1 FPS 抽帧和 clip 构建。
|
||||
|
||||
命令:
|
||||
|
||||
```bash
|
||||
ffmpeg -hwaccels
|
||||
ffmpeg -decoders | grep cuvid
|
||||
python3 -m video_ai_analysis_poc.cli --config config/local_batch.yaml --input-dir /path/to/short-videos --output-dir ./outputs/local-batch --until clips
|
||||
```
|
||||
|
||||
期望:
|
||||
|
||||
- 对一个样例视频实际运行带 `-hwaccel cuda` 和 `h264_cuvid` 或 `hevc_cuvid` 的抽帧命令。
|
||||
- 保存 FFmpeg stderr 或日志中的解码器证据。
|
||||
- 生成 `frame_manifest.jsonl` 和 `clip_manifest.jsonl`。
|
||||
- `clip_manifest.jsonl` 包含 `frame_times`。
|
||||
|
||||
### Phase 4 Validation
|
||||
|
||||
目标:vLLM OpenAI-compatible API、prompt 配置、JSON 解析重试。
|
||||
|
||||
命令:
|
||||
|
||||
```bash
|
||||
curl http://localhost:8679/v1/models
|
||||
python3 -m video_ai_analysis_poc.cli --config config/local_batch.yaml --input-dir /path/to/short-videos --output-dir ./outputs/local-batch --until inference --limit-clips 3
|
||||
```
|
||||
|
||||
期望:
|
||||
|
||||
- prompt 从 config 读取。
|
||||
- 请求 URL 使用 `api_base_url + chat_completions_path`。
|
||||
- 生成 `clip_results.jsonl`。
|
||||
- 每条结果包含 `monitoring_timeline.frame_times` 和 `screen_time` 字段。
|
||||
|
||||
### Phase 5 Validation
|
||||
|
||||
目标:clip/video/folder 聚合和 schema 校验。
|
||||
|
||||
命令:
|
||||
|
||||
```bash
|
||||
python3 -m video_ai_analysis_poc.cli --config config/local_batch.yaml --input-dir /path/to/short-videos --output-dir ./outputs/local-batch
|
||||
python3 -m json.tool ./outputs/local-batch/folder_summary.json >/dev/null
|
||||
```
|
||||
|
||||
期望:
|
||||
|
||||
- 默认 CLI 运行不传 `--dry-run` 或 `--until` 时,会执行到 inference 并继续 aggregation。
|
||||
- `--until clips` 和 `--until inference` 仍停在各自阶段,不写聚合输出。
|
||||
- 生成 `videos/<video_id>/video_result.json`。
|
||||
- 生成 `folder_summary.json`。
|
||||
- 事件聚合保留相对时间轴。
|
||||
- JSON 可被标准工具解析。
|
||||
|
||||
### Phase 6 Validation
|
||||
|
||||
目标:测试环境 smoke test 与文档更新。
|
||||
|
||||
远端环境:
|
||||
|
||||
```text
|
||||
ssh xiaozheng@192.168.5.100
|
||||
/home/xiaozheng/video-ai-analysis-poc
|
||||
```
|
||||
|
||||
模型服务:
|
||||
|
||||
```bash
|
||||
ssh xiaozheng@192.168.5.100 'curl http://localhost:8679/v1/models'
|
||||
```
|
||||
|
||||
当前服务状态:
|
||||
|
||||
- 容器:`zhengxin-vllm`
|
||||
- 镜像:`vllm/vllm-openai:v0.14.1`
|
||||
- 端口:`8679`
|
||||
- 模型:`memai-zhengxin-v3-20260413`
|
||||
- 模型目录挂载:`/home/xiaozheng/zhengxin-vlm-0413/models:/models:ro`
|
||||
|
||||
远端能力验证命令:
|
||||
|
||||
```bash
|
||||
ssh xiaozheng@192.168.5.100 'nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader'
|
||||
ssh xiaozheng@192.168.5.100 'ffmpeg -hwaccels'
|
||||
ssh xiaozheng@192.168.5.100 'ffmpeg -decoders'
|
||||
```
|
||||
|
||||
已验证:
|
||||
|
||||
- GPU: `NVIDIA GeForce RTX 3080`, `20480 MiB`, driver `595.71.05`。
|
||||
- FFmpeg 6.1.1 支持 `cuda` hwaccel。
|
||||
- FFmpeg decoders 包含 `h264_cuvid` 和 `hevc_cuvid`。
|
||||
- `/v1/models` 返回模型 id `memai-zhengxin-v3-20260413`。
|
||||
- `/v1/chat/completions` 安全 quoted health check 返回 `OK`。
|
||||
|
||||
远端 smoke 输入:
|
||||
|
||||
```text
|
||||
/tmp/video-ai-analysis-poc-smoke.h1cZUR/input/sample_h264.mp4
|
||||
```
|
||||
|
||||
远端 smoke 输出:
|
||||
|
||||
```text
|
||||
/tmp/video-ai-analysis-poc-smoke.h1cZUR/output
|
||||
```
|
||||
|
||||
远端批处理命令:
|
||||
|
||||
```bash
|
||||
ssh xiaozheng@192.168.5.100 'PYTHONPATH=/home/xiaozheng/video-ai-analysis-poc python3 -B -m unittest discover -s /home/xiaozheng/video-ai-analysis-poc/tests -v'
|
||||
ssh xiaozheng@192.168.5.100 'python3 -B -m compileall -q /home/xiaozheng/video-ai-analysis-poc/video_ai_analysis_poc'
|
||||
ssh xiaozheng@192.168.5.100 'PYTHONPATH=/home/xiaozheng/video-ai-analysis-poc python3 -B -m video_ai_analysis_poc.cli --config /home/xiaozheng/video-ai-analysis-poc/config/local_batch.yaml --input-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/input --output-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/output --until clips'
|
||||
ssh xiaozheng@192.168.5.100 'PYTHONPATH=/home/xiaozheng/video-ai-analysis-poc python3 -B -m video_ai_analysis_poc.cli --config /home/xiaozheng/video-ai-analysis-poc/config/local_batch.yaml --input-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/input --output-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/output --until inference --limit-clips 1'
|
||||
ssh xiaozheng@192.168.5.100 'PYTHONPATH=/home/xiaozheng/video-ai-analysis-poc python3 -B -m video_ai_analysis_poc.cli --config /home/xiaozheng/video-ai-analysis-poc/config/local_batch.yaml --input-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/input --output-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/output'
|
||||
```
|
||||
|
||||
已验证输出:
|
||||
|
||||
- `video_manifest.jsonl`: 1 条视频记录。
|
||||
- `frame_manifest.jsonl`: 12 条 sampled frame 记录。
|
||||
- `clip_manifest.jsonl`: 1 条 clip 记录。
|
||||
- frame manifest 中持久化 `hwaccel: cuda`、`decoder: h264_cuvid`、`ffmpeg_command` 和 FFmpeg stderr 摘要。
|
||||
- `clip_results.jsonl`: 1 条记录,`status: ok`,包含 `monitoring_timeline.frame_times`。
|
||||
- `videos/<video_id>/video_result.json`: JSON 可解析,`failed_clip_count: 0`。
|
||||
- `folder_summary.json`: JSON 可解析,`video_count: 1`、`processed_video_count: 1`。
|
||||
- 本地视频没有可靠业务开始时间时,`monitoring_timeline.video_start_time` 输出 `null`;ffprobe 的 `start_time: 0.0` 只保留在 `probe`。
|
||||
|
||||
远端验证约束:
|
||||
|
||||
- 只写入明确输出目录。
|
||||
- 不覆盖远端已有模型、配置和视频。
|
||||
- 不复制真实凭据到日志或文档。
|
||||
|
||||
## Known Risks
|
||||
|
||||
- HEVC decoder 可用性已验证,但实际 smoke 只覆盖 H.264 样例视频。
|
||||
- 24 小时真实门店视频吞吐量尚未压测。
|
||||
- 海康云眸云录像/RTSP 接入仍在当前本地文件夹 PoC 范围之外。
|
||||
- 本地视频可能没有画面内时间戳,必须同时保留相对时间。
|
||||
- 模型事件质量尚未用真实门店素材验收;合成测试图没有业务事件,输出空事件是合理结果。
|
||||
- 远端 vLLM 容器当前为手工启动,不是生产级 systemd/compose 托管。
|
||||
190
docs/superpowers/plans/2026-06-16-hik-cloud-download-analysis.md
Normal file
190
docs/superpowers/plans/2026-06-16-hik-cloud-download-analysis.md
Normal file
@@ -0,0 +1,190 @@
|
||||
# Hik Cloud Download Analysis Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Add Hik Cloud Storage recording download as a configurable multi-device source, then feed downloaded videos into the existing model analysis pipeline.
|
||||
|
||||
**Architecture:** Keep the current local-folder pipeline intact. Add a cloud acquisition module that plans one-hour chunks, calls the Hik download-address API, downloads videos to local output storage, records a download manifest, and returns local file records for the existing probe/frame/clip/inference/aggregate stages.
|
||||
|
||||
**Tech Stack:** Python standard library, existing `unittest` suite, existing JSONL manifest helpers, FFmpeg/vLLM pipeline already in `video_ai_analysis_poc`.
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Config Schema And Time Chunking
|
||||
|
||||
**Files:**
|
||||
- Modify: `video_ai_analysis_poc/config.py`
|
||||
- Create: `video_ai_analysis_poc/hik_cloud.py`
|
||||
- Modify: `tests/test_config.py`
|
||||
- Create: `tests/test_hik_cloud.py`
|
||||
|
||||
- [ ] **Step 1: Write failing config tests**
|
||||
|
||||
Add tests that load:
|
||||
|
||||
```yaml
|
||||
source:
|
||||
mode: hik_cloud
|
||||
hik_cloud:
|
||||
access_token_env: HIK_CLOUD_ACCESS_TOKEN
|
||||
devices:
|
||||
- device_serial: EXAMPLE_DEVICE_SERIAL
|
||||
channel_no: 1
|
||||
name: front
|
||||
time_ranges:
|
||||
- begin: "2026-02-03 09:00:00"
|
||||
end: "2026-02-03 10:30:00"
|
||||
```
|
||||
|
||||
Expected: `source.mode == "hik_cloud"`, `devices` is a list of dicts, and `time_ranges` is a list of dicts.
|
||||
|
||||
- [ ] **Step 2: Write failing chunk tests**
|
||||
|
||||
Test that `build_download_chunks(...)` converts the range above into chunks with `timeEnd - timeBegin <= 3600`.
|
||||
|
||||
- [ ] **Step 3: Run red tests**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python3 -B -m unittest tests.test_config tests.test_hik_cloud -v
|
||||
```
|
||||
|
||||
Expected: fail because list-of-mapping parsing and `hik_cloud.py` do not exist yet.
|
||||
|
||||
- [ ] **Step 4: Implement minimal parser/defaults/chunking**
|
||||
|
||||
Extend the simple YAML parser only enough for list items shaped as mappings. Add defaults for `source` and `hik_cloud`. Implement date-time parsing with `zoneinfo.ZoneInfo`.
|
||||
|
||||
- [ ] **Step 5: Run green tests**
|
||||
|
||||
Run the same unittest command. Expected: pass.
|
||||
|
||||
### Task 2: Hik Download Address API Client
|
||||
|
||||
**Files:**
|
||||
- Modify: `video_ai_analysis_poc/hik_cloud.py`
|
||||
- Modify: `tests/test_hik_cloud.py`
|
||||
|
||||
- [ ] **Step 1: Write failing API client tests**
|
||||
|
||||
Mock the HTTP function and verify:
|
||||
|
||||
- URL is `api_base_url.rstrip("/") + download_path`.
|
||||
- Headers include `Authorization: bearer TOKEN`.
|
||||
- JSON body includes `deviceSerial`, `channelNo`, `timeBegin`, `timeEnd`.
|
||||
- Success returns URL and actual begin/end.
|
||||
- Code `80438027` returns a structured `no_recording` result.
|
||||
- Other non-zero codes return `address_failed`.
|
||||
|
||||
- [ ] **Step 2: Run red tests**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python3 -B -m unittest tests.test_hik_cloud -v
|
||||
```
|
||||
|
||||
Expected: fail because the client is missing.
|
||||
|
||||
- [ ] **Step 3: Implement client**
|
||||
|
||||
Use `urllib.request` and injectable callables for tests. Do not log or persist the token.
|
||||
|
||||
- [ ] **Step 4: Run green tests**
|
||||
|
||||
Run the same command. Expected: pass.
|
||||
|
||||
### Task 3: Download Files And Manifest
|
||||
|
||||
**Files:**
|
||||
- Modify: `video_ai_analysis_poc/hik_cloud.py`
|
||||
- Modify: `video_ai_analysis_poc/paths.py`
|
||||
- Modify: `tests/test_hik_cloud.py`
|
||||
|
||||
- [ ] **Step 1: Write failing downloader tests**
|
||||
|
||||
Mock address results and download bytes. Verify downloaded files are written under `downloads/hik_cloud/<device>/ch<channel>/`, filenames contain requested timestamps, manifest rows are written, token/query signatures are not in filenames, and resume skips already downloaded files.
|
||||
|
||||
- [ ] **Step 2: Run red tests**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python3 -B -m unittest tests.test_hik_cloud -v
|
||||
```
|
||||
|
||||
Expected: fail because downloader/manifest behavior is missing.
|
||||
|
||||
- [ ] **Step 3: Implement downloader**
|
||||
|
||||
Write `download_hik_cloud_recordings(config, output_dir, *, address_client=None, download_url=None)` returning downloaded video records with cloud metadata.
|
||||
|
||||
- [ ] **Step 4: Run green tests**
|
||||
|
||||
Run the same command. Expected: pass.
|
||||
|
||||
### Task 4: CLI Cloud Source Integration
|
||||
|
||||
**Files:**
|
||||
- Modify: `video_ai_analysis_poc/cli.py`
|
||||
- Modify: `tests/test_cli.py`
|
||||
|
||||
- [ ] **Step 1: Write failing CLI tests**
|
||||
|
||||
Add tests that:
|
||||
|
||||
- `source.mode: local` still uses `discover_videos`.
|
||||
- `source.mode: hik_cloud` calls the cloud downloader and probes returned downloaded paths.
|
||||
- `--dry-run` in cloud mode requests download addresses and writes the download manifest, but does not download video files, probe, call FFmpeg, call VLM, or aggregate.
|
||||
- `--until clips` in cloud mode produces video/frame/clip manifests from mocked downloaded video records.
|
||||
|
||||
- [ ] **Step 2: Run red tests**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python3 -B -m unittest tests.test_cli -v
|
||||
```
|
||||
|
||||
Expected: fail because CLI has no source mode branch.
|
||||
|
||||
- [ ] **Step 3: Implement CLI branch**
|
||||
|
||||
Keep local behavior unchanged. In cloud mode, call downloader before probe and carry cloud metadata into `video_manifest.jsonl`.
|
||||
|
||||
- [ ] **Step 4: Run green tests**
|
||||
|
||||
Run the same command. Expected: pass.
|
||||
|
||||
### Task 5: Docs, Example Config, And Full Verification
|
||||
|
||||
**Files:**
|
||||
- Modify: `config/local_batch.yaml`
|
||||
- Modify: `docs/project.md`
|
||||
- Modify: `findings.md`
|
||||
- Modify: `progress.md`
|
||||
- Modify: `memories.md`
|
||||
|
||||
- [ ] **Step 1: Update docs/config**
|
||||
|
||||
Add a commented or safe example for `source.mode: hik_cloud`, token env var, devices, and time ranges. Do not include a real token.
|
||||
|
||||
- [ ] **Step 2: Run full tests**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python3 -B -m unittest discover -s tests -v
|
||||
python3 -B -m py_compile video_ai_analysis_poc/*.py
|
||||
```
|
||||
|
||||
Expected: all pass.
|
||||
|
||||
- [ ] **Step 3: Run local mock smoke**
|
||||
|
||||
Use test mocks or a temporary local HTTP fixture to verify cloud mode can produce downloaded files and continue to `--until clips` without a real Hik token.
|
||||
|
||||
- [ ] **Step 4: Record results**
|
||||
|
||||
Update `progress.md` with commands, results, files changed, and remaining risk. Real Hik API verification is skipped until a real AccessToken/device/time range is provided.
|
||||
@@ -0,0 +1,151 @@
|
||||
# Hik Cloud Download Analysis Design
|
||||
|
||||
## Goal
|
||||
|
||||
Add Hik Cloud Storage recording download as a first-class video source for the existing video analysis pipeline. The implementation must support configurable AccessToken, multiple devices, configurable date-time ranges, one-hour API slicing, video downloads, and reuse the existing local analysis pipeline.
|
||||
|
||||
## Source Model
|
||||
|
||||
The pipeline keeps the existing local mode and adds a cloud mode:
|
||||
|
||||
```yaml
|
||||
source:
|
||||
mode: local # local | hik_cloud
|
||||
```
|
||||
|
||||
`local` keeps the current folder discovery behavior. `hik_cloud` runs a download stage first, then analyzes the downloaded files exactly like local files.
|
||||
|
||||
## Hik Cloud Configuration
|
||||
|
||||
The config should allow a literal token for controlled testing and an environment variable for normal use:
|
||||
|
||||
```yaml
|
||||
hik_cloud:
|
||||
api_base_url: https://api2.hik-cloud.com
|
||||
download_path: /v1/carrier/cstorage/open/play/download
|
||||
access_token: null
|
||||
access_token_env: HIK_CLOUD_ACCESS_TOKEN
|
||||
chunk_seconds: 600
|
||||
timeout_seconds: 60
|
||||
download_timeout_seconds: 600
|
||||
devices:
|
||||
- device_serial: EXAMPLE_DEVICE_SERIAL
|
||||
channel_no: 1
|
||||
name: store-front
|
||||
time_ranges:
|
||||
- begin: "2026-02-03 09:00:00"
|
||||
end: "2026-02-03 11:30:00"
|
||||
```
|
||||
|
||||
The implementation must not print or persist the token. Manifest entries may record the API URL path, device serial, channel, requested times, actual times, and status, but not the Authorization header.
|
||||
|
||||
## Time Handling
|
||||
|
||||
The user-facing time range includes year, month, day, hour, minute, and second. The config supports both `YYYY-MM-DD HH:MM:SS` strings and integer epoch seconds. String parsing uses `runtime.timezone`, defaulting to `Asia/Shanghai`, and converts to Unix seconds for `timeBegin` and `timeEnd`.
|
||||
|
||||
Ranges are split into chunks with `end - begin <= 3600` because the PDF documents error `80430002` when the requested interval exceeds 3600 seconds. The example default uses 600 seconds because real remote smoke found that shorter chunks produced valid, probeable MP4 files for the provided test range.
|
||||
|
||||
## API Contract
|
||||
|
||||
Use the PDF section “2、获取录像下载地址”:
|
||||
|
||||
```text
|
||||
POST https://api2.hik-cloud.com/v1/carrier/cstorage/open/play/download
|
||||
Authorization: bearer <AccessToken>
|
||||
Content-Type: application/json
|
||||
```
|
||||
|
||||
Request body:
|
||||
|
||||
```json
|
||||
{
|
||||
"deviceSerial": "EXAMPLE_DEVICE_SERIAL",
|
||||
"channelNo": 1,
|
||||
"timeBegin": 1764856787,
|
||||
"timeEnd": 1764856978
|
||||
}
|
||||
```
|
||||
|
||||
Successful response:
|
||||
|
||||
```json
|
||||
{
|
||||
"code": 0,
|
||||
"data": {
|
||||
"url": "https://...",
|
||||
"actualBeginTime": "1764856787",
|
||||
"actualEndTime": "1764856978"
|
||||
},
|
||||
"success": true
|
||||
}
|
||||
```
|
||||
|
||||
Non-zero codes become structured failures. `80438027` is treated as `no_recording` so one empty chunk does not stop the batch.
|
||||
|
||||
## Output Contract
|
||||
|
||||
Cloud downloads write a dedicated manifest:
|
||||
|
||||
```text
|
||||
<output.dir>/hik_cloud_download_manifest.jsonl
|
||||
```
|
||||
|
||||
Each row contains:
|
||||
|
||||
- `source: hik_cloud`
|
||||
- `device_serial`
|
||||
- `channel_no`
|
||||
- `requested_begin`, `requested_end`
|
||||
- `actual_begin`, `actual_end`
|
||||
- `download_url_host` or no URL at all if avoiding host persistence is preferred
|
||||
- `path` for downloaded video
|
||||
- `status`: `address_ok`, `downloaded`, `no_recording`, `address_failed`, `download_failed`
|
||||
- `retry_count`, `last_error`
|
||||
|
||||
Downloaded videos go under:
|
||||
|
||||
```text
|
||||
<output.dir>/downloads/hik_cloud/<device_serial>/ch<channel_no>/
|
||||
```
|
||||
|
||||
Filenames use device/channel/requested timestamps and never include URL query signatures or tokens.
|
||||
|
||||
## Pipeline Integration
|
||||
|
||||
`cli.py` should branch only at source acquisition:
|
||||
|
||||
```text
|
||||
local mode:
|
||||
discover local videos -> probe -> frames -> clips -> inference -> aggregate
|
||||
|
||||
hik_cloud mode:
|
||||
build chunks -> request download URLs -> download videos -> probe -> frames -> clips -> inference -> aggregate
|
||||
```
|
||||
|
||||
After downloads complete, the rest of the pipeline should consume downloaded file paths and preserve cloud metadata in `video_manifest.jsonl`.
|
||||
|
||||
FFmpeg sampling caps output frames from the requested/actual cloud chunk duration. This prevents malformed or irregular Hik MP4 timestamps from making the `fps=1` filter duplicate tens of thousands of frames for a 10-minute chunk.
|
||||
|
||||
Cloud `--dry-run` stops at download-address planning: it requests addresses and writes `hik_cloud_download_manifest.jsonl`, but does not download video files, run ffprobe, sample frames, infer, or aggregate.
|
||||
|
||||
## Error Handling
|
||||
|
||||
- Missing token: fail fast with a clear config error in `hik_cloud` mode.
|
||||
- Invalid range: fail fast if `end <= begin`.
|
||||
- API code 80438027: record `no_recording`, continue.
|
||||
- Other API non-zero code: record `address_failed`, continue other chunks.
|
||||
- Download HTTP/IO failure: record `download_failed`, continue other chunks.
|
||||
- Existing downloaded file with manifest status `downloaded`: skip on resume.
|
||||
|
||||
## Testing
|
||||
|
||||
Use TDD with standard-library mocks:
|
||||
|
||||
- config parser loads `devices` as list of dicts.
|
||||
- time parser accepts date-time strings and epoch integers.
|
||||
- splitter produces max-3600-second chunks.
|
||||
- API client builds correct URL, body, bearer header, and parses success/failure.
|
||||
- downloader writes bytes and manifest without persisting token.
|
||||
- CLI cloud mode uses downloaded files and keeps local mode unchanged.
|
||||
|
||||
Real Hik API smoke uses the sensitive `access_token.md` file provided by the user on the remote test environment. Do not copy values from that file into docs, tests, logs, or final responses.
|
||||
Reference in New Issue
Block a user