Initial video AI analysis project

yangyl
2026-06-17 11:33:54 +08:00
commit ef0047af6d
35 changed files with 8613 additions and 0 deletions

.gitignore (new file, 37 lines)

@@ -0,0 +1,37 @@
# Secrets and local credentials
access_token.md
.env
.env.*
*.pem
*.key
# Runtime inputs and generated outputs
outputs/
videos/
downloads/
frames/
codex_records/
# Agent working notes, not project source
findings.md
memories.md
progress.md
task_plan.md
# Python caches and test artifacts
__pycache__/
*.py[cod]
.pytest_cache/
.coverage
htmlcov/
# Local indexes, editor, and OS files
.codegraph/
.DS_Store
.idea/
.vscode/
# Logs and temporary files
*.log
*.pid
*.tmp

agent.md (new file, 250 lines)

@@ -0,0 +1,250 @@
# Video AI Analysis PoC Agent Instructions
This file governs how AI agents develop, review, test, and maintain documentation in `/Users/yoilun/AI-train/video-ai-analysis-poc`. Read and follow it before modifying any business code.
## Repository Snapshot
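- Project name: `video-ai-analysis-poc`
- Project directory: `/Users/yoilun/AI-train/video-ai-analysis-poc`
- Project goal: implement a PoC for offline batch analysis of a local video folder.
- External model / reference implementation directory: `/Users/yoilun/AI-train/zhengxin-vlm-0413`
- Reference VLM model: `/Users/yoilun/AI-train/zhengxin-vlm-0413/models/memai-zhengxin-v3-20260413`
- Test environment: `ssh xiaozheng@192.168.5.100`; per the user, the model is already present there.
- Runtime target: Ubuntu 24, a single machine with an NVIDIA RTX 3080 20GB; offline batch processing that prioritizes throughput over low latency.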
## Hard Directory Boundary
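`video-ai-analysis-poc` is the project directory for this work. All subsequent code, configuration, plans, documentation, tests, and output templates must live here.
`zhengxin-vlm-0413` is not the project directory. It may be used only as:
- An existing model directory.
- A reference implementation directory.
- A reference for the VLM API, prompts, output parsing, and deployment approach.
By default, do not create project files or modify business code inside `zhengxin-vlm-0413`. In particular, do not change, move, delete, or copy:
- `zhengxin-vlm-0413/models/**`
- `zhengxin-vlm-0413/service/config.yaml`
- `zhengxin-vlm-0413/service/config.yaml-bk`
- `zhengxin-vlm-0413/docker/.env`
If logic does need to be copied from the reference project, copy it into this project directory and note its source and the differences.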
## Repository Map
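Current project files:
- `video_ai_analysis_system_plan.md`: the earlier system implementation plan.
- `agent.md`: this file, which governs subsequent AI work.
- `task_plan.md`: the phase plan for the current goal.
- `findings.md`: code-reading notes, constraints, and key findings.
- `progress.md`: execution log, test results, and bug loops.
- `docs/project.md`: project goals, architecture, configuration, how to run, and risks.
- `memories.md`: the main agent's long-term memory of user requirements and key decisions.
Key files in the reference directory:
- `/Users/yoilun/AI-train/zhengxin-vlm-0413/service/rtsp_service.py`: entry point of the real-time RTSP service.
- `/Users/yoilun/AI-train/zhengxin-vlm-0413/service/config.yaml`: existing inference, camera, prompt, service, and YOLO configuration; contains sensitive information.
- `/Users/yoilun/AI-train/zhengxin-vlm-0413/shared/vlm_client.py`: VLM request construction, OpenAI-compatible API calls, and Action parsing.
- `/Users/yoilun/AI-train/zhengxin-vlm-0413/shared/frame_utils.py`: existing helpers for sampling frames from local video; they do not meet the full offline batch requirements of this project.
- `/Users/yoilun/AI-train/zhengxin-vlm-0413/docker/docker-compose.yml`: container orchestration for vLLM and the RTSP service.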
## Current Workflow Batch
```text
[项目: /Users/yoilun/AI-train/video-ai-analysis-poc]
[工作流批次: v1.0 本地视频批处理PoC]
```
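When dispatching any sub-agent, the first paragraph of the task must contain the following header (the fields are, in order: project, workflow batch, phase, role, and sub-agent name):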
```text
[项目: /Users/yoilun/AI-train/video-ai-analysis-poc]
[工作流批次: v1.0 本地视频批处理PoC]
[阶段: 阶段 x <阶段名>]
[角色: <角色名>]
[子agent名称: <从指定名单中选择>]
```
The sub-agent name must be chosen from the following list:
```text
huzenan, jiangzhiyou, linjiayu, hujiarui, wangchiheng, niwenhao,
caiziquan, yepeijun, lizheng, zhengchenda, chenruihao, yangyilun, donglele
```
## Required Workflow
### 1. Agent Rules Before Code
Do not modify business code until `agent.md` is finalized.
Phase 0 may modify:
- `agent.md`
- `task_plan.md`
- `findings.md`
- `progress.md`
- `docs/project.md`
- `memories.md`
### 2. File-Based Planning Is Mandatory
Non-trivial tasks must maintain:
- `task_plan.md`
- `findings.md`
- `progress.md`
- `docs/project.md`
- `memories.md`
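Before each phase starts, the main agent must read these files and confirm the current goal and next step. After each phase completes, it must update the phase status, verification records, key files, and remaining risks.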
### 3. Sub Agent Workflow
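Every implementation phase uses at least:
- coding agent: implements only the current phase, does not work on future phases, and does no unrelated refactoring.
- testing/review agent: only tests, reviews, reproduces issues, and reports bugs; it does not modify code directly.
If the testing/review agent finds a bug:
1. The main agent records the bug report in `progress.md`.
2. The main agent forwards the bug report to the coding agent for the current phase.
3. The coding agent fixes only the issues in the report.
4. The testing/review agent retests.
5. Each issue gets at most 3 rounds; if it still fails, pause and report to the user.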
### 4. TDD And Verification
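For new features and bugfixes, write a test or a minimal reproducible check first, then the implementation. For GPU, video, or environment behavior that cannot be tested automatically, document the smoke test command, the sample input, and the manual pass/fail criteria.
Before any phase is declared complete, there must be fresh verification evidence:
- Unit tests.
- CLI smoke test.
- FFmpeg command check.
- vLLM health check.
- Output JSON schema/field check.
Completion must not be claimed from code reading alone.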
## Local Batch PoC Requirements
### 1. Input
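This PoC supports local video folders first:
- The input directory is selected through a CLI argument or the config.
- Recursive versus non-recursive discovery must be configurable.
- Common video formats are supported, for example `.mp4`, `.mov`, `.mkv`, `.avi`, `.flv`, `.ts`, and `.m4v`.
- Unsupported or corrupted videos must have their failure reason recorded and must not block the rest of the folder.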
### 2. Video Processing
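- FFmpeg with NVDEC GPU decoding must be the preferred path.
- Frames are sampled at 1 FPS by default.
- The default clip length is 10 seconds, configurable between 10 and 20 seconds.
- Per-frame LLM inference is prohibited; inference must be clip-level.
- Each clip must feed only a small number of frames, 8-10 by default, to avoid OOM on the RTX 3080 20GB.
- The output directory stores the manifests, intermediate sampled frames, clip results, and summary JSON.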
### 3. Prompt Configuration
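- Prompts must be read from config and must not be hard-coded in business logic.
- `prompt.system` and `prompt.user` are supported.
- The prompt must require the model to output strict JSON.
- The prompt must require an on-screen time field in the output; if the on-screen time is unreadable, the clip's video-relative time must still be kept.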
### 4. Timeline Output
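Output results must include the timeline of the surveillance footage. At minimum:
- `video_id` or the video file path.
- `video_start_time`: filled in if it can be identified from the file or the picture, otherwise `null`.
- `clip_start_seconds`
- `clip_end_seconds`
- `clip_start_timecode`, formatted as `HH:MM:SS`
- `clip_end_timecode`, formatted as `HH:MM:SS`
- `frame_times`: the relative seconds or timecode of each frame in the clip that took part in inference.
- `screen_time` or `画面时间`: the time the model read (OCR) from the surveillance picture; empty if it cannot be read.
- Event-level `start_time` / `end_time`, or the corresponding clip range.
Service processing time such as `datetime.now()` must not be the only time in the output.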
### 5. Model Inference
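- Compatibility with the OpenAI-compatible `/v1/chat/completions` endpoint comes first.
- Default model name: `memai-zhengxin-v3-20260413`
- The default config uses `api_base_url: http://localhost:8679` and `chat_completions_path: /v1/chat/completions`; the code joins them into the full request URL.
- On the RTX 3080 20GB, start conservatively with batch size 1, then try 2-4 step by step.
- The vLLM dtype and GPU memory parameters must be verified in the test environment; if BF16 is unstable, prefer FP16.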
## Security And Data Rules
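- Do not copy real RTSP passwords, tokens, Webhook secrets, or Cookies into new documents, test fixtures, or output examples.
- The reference project's `service/config.yaml` contains real intranet RTSP URLs and passwords. Reading it is fine; anything passed on must be redacted.
- Local videos, sampled frame images, and model output may contain in-store footage and are treated as sensitive data by default.
- Do not bulk-copy video frames, logs, or output samples outside the repository.
- Do not delete the user's existing video or model files.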
## Implementation Rules
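- All new code lives inside this project directory.
- Do not modify the reference project's real-time RTSP main path.
- The interface design of `shared/vlm_client.py` may be used as a reference, but the new implementation belongs in this project.
- Do not introduce an unnecessary distributed system.
- Do not pull in a large dependency to solve a small problem.
- Keep configuration, run commands, and documentation consistent.
- All output file names must be stable so that interrupted runs can resume.
- JSON output must be machine-parseable; the model's raw response may be kept but must not be the only structured result.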
## Validation Matrix
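Phase 0, documentation and agent rules:
- Check that these files exist:
  - `agent.md`
  - `task_plan.md`
  - `findings.md`
  - `progress.md`
  - `docs/project.md`
  - `memories.md`
- Check that no workflow files for this project were mistakenly placed under `zhengxin-vlm-0413`.
After changes to the local batch code:
- Python syntax check:
  - `python3 -m py_compile <changed python files>`
- Unit tests:
  - If tests were added, run the corresponding `python3 -m unittest ...` or `pytest ...`.
- FFmpeg/NVDEC check:
  - `ffmpeg -hwaccels`
  - `ffmpeg -decoders | grep cuvid`
- vLLM health check:
  - `curl http://localhost:8679/v1/models`
- Minimal video smoke test:
  - Run the local batch entry point on a directory containing one short video.
  - Check that the output contains the clip-level timeline and the summary JSON.
Test environment verification:
- Before running anything through `ssh xiaozheng@192.168.5.100`, confirm paths, dependencies, and GPU status.
- Remote commands should, as far as possible, be read-only or write only to an explicit output directory.
- Do not overwrite existing models or configuration on the remote host.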
## Definition Of Done
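The PoC is complete only when all of the following hold:
1. Every video in a local folder can be analyzed.
2. There is no dependency on Hik-Cloud (海康云眸) cloud storage.
3. The model prompt can be adjusted through config.
4. The output contains the surveillance timeline at the video, clip, and event levels.
5. The 4B VLM uses the existing model path or the model already present in the test environment.
6. Resume after interruption and failure recording have basic support.
7. The documentation is updated with run commands, config options, and the output structure.
8. The required verification commands have been run and recorded.
9. The sub-agent review conclusion for each phase is recorded in `progress.md`.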

config/local_batch.yaml (new file, 173 lines)

@@ -0,0 +1,173 @@
input:
dir: ./videos
recursive: true
extensions: [".mp4", ".mov", ".mkv", ".avi", ".flv", ".ts", ".m4v"]
source:
mode: local
output:
dir: ./outputs/local-batch
overwrite: false
resume: true
keep_frames: true
hik_cloud:
api_base_url: https://api2.hik-cloud.com
download_path: /v1/carrier/cstorage/open/play/download
access_token: null
access_token_env: HIK_CLOUD_ACCESS_TOKEN
chunk_seconds: 600
timeout_seconds: 60
download_timeout_seconds: 600
devices:
- device_serial: EXAMPLE_DEVICE_SERIAL
channel_no: 1
name: example-device
time_ranges:
- begin: "2026-02-03 09:00:00"
end: "2026-02-03 10:00:00"
ffprobe:
timeout_seconds: 30
ffmpeg:
prefer_nvdec: true
allow_cpu_fallback: false
hwaccel: cuda
codec_decoders:
h264: h264_cuvid
hevc: hevc_cuvid
frame_fps: 1
frame_width: 640
jpeg_quality: 4
timeout_seconds_per_video: 3600
clip:
length_seconds: 10
stride_seconds: 10
frames_per_clip: 8
min_frames_per_clip: 4
vlm:
api_base_url: http://localhost:8679
chat_completions_path: /v1/chat/completions
model: memai-zhengxin-v3-20260413
timeout_seconds: 120
max_tokens: 512
temperature: 0
batch_size: 1
image_transport: data_uri
retries: 1
prompt:
system: >-
You are an AI quality inspector and store monitoring assistant for a fried chicken cutlet (鸡排) production line and storefront.
Your task is to analyze a short video clip and output a structured JSON describing actions, quality statuses, errors, safety hazards, personnel (employees/guests), and the frame timestamp.
All 9 top-level keys below are REQUIRED in every response. Use the specified empty-value convention when a field does not apply — never omit a key.
### 1. Action (REQUIRED)
Identify the primary action. Use the "Action_" prefix on every label except End_Frying. If no action is detected, output "Action_Idle".
Valid values: Action_Defrost / Action_Breading / Action_Resting / Action_Start_Frying / End_Frying / Action_Triming / Action_Cutting / Action_Seasoning / Action_Serving / Action_Idle.
### 2. quality_status (REQUIRED — "" if not applicable)
Choose based on the action:
- Action_Breading → fully_covered | uneven
- Action_Resting → stacked | qualified
- Action_Start_Frying / End_Frying → standard_time | early_retrieval | overcooked | double_fried
- Action_Cutting → complete_cut | linked | dusted_before_cut
- Action_Seasoning → coverage_high | missed | single_side_dusted
- Other actions → qualified
If no ingredient is visible or the action has no applicable status, output "".
### 3. error_type (REQUIRED — "" if no error)
Short description of any anomaly. Examples: "smoking", "dusted_before_cut", "single_side_dusted", "double_fried". If the operation is normal, output "".
### 4. 安全隐患 (REQUIRED — "" if no hazard)
Chinese description of any safety hazard visible in the scene (e.g., "油锅附近有易燃物"). If none, output "".
### 5. 人物位置 (REQUIRED — "" if no people)
Descriptive Chinese sentence of where people are and how they are moving. Example: "员工在油锅边". If no one is in the frame, output "".
### 6. 总结 (REQUIRED — "无" if no people)
Descriptive Chinese sentence summarizing the scene with the exact person count. Example: "员工在油锅边炸鸡,顾客在收银台前等待". If no one is in the frame, output "无".
### 7. 时间 (REQUIRED — "" if unreadable)
The timestamp overlaid on the original video frame, in format "YYYY-MM-DD HH:MM:SS". If the timestamp is not visible or cannot be read, output "".
### 8. employees (REQUIRED — [] if none)
Array of employee objects. Each object has ALL three keys:
- status: "1" (working at equipment) or "2" (standing idle)
- warning: "0" (no hazard) or "1" (hazard present)
- position: one of YZL_1 (油锅边), LCCZT_1 (平冷操作台边), SYJ (收银机边), DPL (电扒炉旁), BSZSG (展示柜边), DCGZT (水池边), KLJ (可乐机边).
If no employees are in the frame, output [].
### 9. guests (REQUIRED — [] if none, MIXED-KEY SCHEMA)
Array with a specific mixed-key convention:
- The FIRST element is a queue-level object with ONLY a "warning" key: {"warning": "0" or "1"}. "1" means the queue has ≥ 3 people; "0" means < 3.
- Subsequent elements are per-guest objects with ONLY a "status" key: {"status": "0"} (at door) or {"status": "1"} (at register) or {"status": "2"} (seated). One such object per visible guest.
If there are no guests at all, output []. If only the queue header is known, output [{"warning": "0 or 1"}].
Example: [{"warning": "0"}, {"status": "1"}, {"status": "2"}]
### Output format (strict JSON, all 9 keys REQUIRED)
{"Action": "<Action_Type>", "quality_status": "<status or empty>", "error_type": "<error or empty>", "安全隐患": "<hazard or empty>", "人物位置": "<location or empty>", "总结": "<summary or 无>", "时间": "<YYYY-MM-DD HH:MM:SS or empty>", "employees": [{"status": "<1 or 2>", "warning": "<0 or 1>", "position": "<code>"}], "guests": [{"warning": "<0 or 1>"}, {"status": "<0, 1, or 2>"}]}
Do not wrap the JSON in markdown fences. Do not add any prose before or after the JSON.
user: 'Analyze the video clip and return the required JSON with all 9 keys. Read the timestamp from the frame overlay into "时间".'
schema:
version: local-batch-v1
event_types:
- customer_enter
- customer_leave
- queue_detected
- staff_absent
- staff_present
- area_crowded
- abnormal_behavior
- unknown
require_strict_json: true
parse_retry: 1
merge_gap_seconds: 30
runtime:
timezone: Asia/Shanghai
log_level: INFO

docs/project.md (new file, 719 lines)

@@ -0,0 +1,719 @@
# Project Documentation
## Goal
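This project implements a PoC for offline batch video analysis in `/Users/yoilun/AI-train/video-ai-analysis-poc`. `v1.0` supports local video folders. `v1.1` adds Hik Cloud Storage recording download as a video source; once downloaded, the videos reuse the existing frame sampling, clip, VLM inference, and aggregation pipeline.
It must support:
- Selecting a local video folder.
- Calling the Hik Cloud Storage recording download API directly to obtain a download address and download the video.
- Configuring the AccessToken through config or an environment variable, never writing it into test fixtures or documentation samples.
- Configurable device serial numbers and channels, with multi-device support.
- Analysis time ranges that include year, month, and day, configured as `YYYY-MM-DD HH:MM:SS`.
- The Hik API downloads at most 1 hour per request, so any range longer than 1 hour must be split into multiple requests of no more than 3600 seconds each. The default example uses 600-second chunks, which proved more stable than 3600 seconds in the real smoke test.
- Automatic discovery of all common video files in a folder.
- Sampling each video at 1 FPS and organizing the input into 10-20 second clips.
- Using the existing 4B VLM model, compatible with the OpenAI-compatible vLLM interface of `memai-zhengxin-v3-20260413`.
- Adjusting the prompt through config.
- Structured JSON/JSONL output.
- Output that includes the surveillance timeline, locating the video, clip, frame, and event in time.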
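The 3600-second request limit in the list above can be illustrated with a small chunking sketch. The implementation plan later in this commit names a `build_download_chunks(...)` helper but does not show its signature, so the parameters, the returned dict shape, and the fixed UTC+8 offset used here are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone, tzinfo

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
MAX_CHUNK_SECONDS = 3600  # per-request limit described above


def build_download_chunks(begin: str, end: str, chunk_seconds: int, tz: tzinfo) -> list[dict]:
    """Split [begin, end) into epoch-second chunks no longer than chunk_seconds.

    Hypothetical signature: the real helper in hik_cloud.py may differ.
    """
    if not 0 < chunk_seconds <= MAX_CHUNK_SECONDS:
        raise ValueError("chunk_seconds must be between 1 and 3600")
    start = int(datetime.strptime(begin, TIME_FORMAT).replace(tzinfo=tz).timestamp())
    stop = int(datetime.strptime(end, TIME_FORMAT).replace(tzinfo=tz).timestamp())
    if stop <= start:
        raise ValueError("end must be later than begin")
    chunks = []
    cursor = start
    while cursor < stop:
        chunk_end = min(cursor + chunk_seconds, stop)
        chunks.append({"timeBegin": cursor, "timeEnd": chunk_end})
        cursor = chunk_end
    return chunks


# A fixed UTC+8 offset keeps the sketch free of tz database lookups.
CST = timezone(timedelta(hours=8))
chunks = build_download_chunks("2026-02-03 09:00:00", "2026-02-03 11:30:00", 600, CST)
print(len(chunks), chunks[0])
```

In the project itself, `zoneinfo.ZoneInfo("Asia/Shanghai")` (the zone from `runtime.timezone`) could presumably be passed as `tz` instead of the fixed offset.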
## v1.1 Hik Cloud Storage Source
Section "2、获取录像下载地址" (2. Obtain the recording download address) of the Hik document `录像下载流程_1.pdf` defines:
```text
POST https://api2.hik-cloud.com/v1/carrier/cstorage/open/play/download
Authorization: bearer <AccessToken>
Content-Type: application/json
```
Request body:
```json
{
  "deviceSerial": "EXAMPLE_DEVICE_SERIAL",
  "channelNo": 1,
  "timeBegin": 1764856787,
  "timeEnd": 1764856978
}
```
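A successful response returns `data.url`, `actualBeginTime`, and `actualEndTime`. Error code `80430002` covers the parameter error in which the span between start and end exceeds 3600 seconds; error code `80438027` means there is no recording at the requested start time.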
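As a rough sketch of how these response cases might be turned into manifest statuses: the function name, the top-level `code` field, and the placement of the actual times under `data` are assumptions, since the source only lists the field names and the two error codes. The status strings `address_ok`, `no_recording`, and `address_failed` are the ones used elsewhere in this commit.

```python
NO_RECORDING_CODE = "80438027"
RANGE_TOO_LONG_CODE = "80430002"


def classify_address_response(payload: dict) -> dict:
    """Map a decoded response body to a manifest-friendly status record.

    Hypothetical helper; the `code` key and the layout of `data` are assumed.
    """
    data = payload.get("data")
    data = data if isinstance(data, dict) else {}
    code = str(payload.get("code", ""))
    if data.get("url"):
        return {
            "status": "address_ok",
            "url": data["url"],
            "actual_begin": data.get("actualBeginTime"),
            "actual_end": data.get("actualEndTime"),
            "error": None,
        }
    if code == NO_RECORDING_CODE:
        status, error = "no_recording", "no recording for the requested time"
    elif code == RANGE_TOO_LONG_CODE:
        status, error = "address_failed", "parameter error, e.g. range longer than 3600 seconds"
    else:
        status, error = "address_failed", f"unexpected response code {code!r}"
    return {"status": status, "url": None, "actual_begin": None, "actual_end": None, "error": error}
```

Keeping `no_recording` separate from `address_failed` lets a resumed run skip empty time ranges without retrying them as errors, which is one possible reading of the plan's test list.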
Configuration example:
```yaml
source:
mode: hik_cloud # local | hik_cloud
hik_cloud:
api_base_url: https://api2.hik-cloud.com
download_path: /v1/carrier/cstorage/open/play/download
access_token: null
access_token_env: HIK_CLOUD_ACCESS_TOKEN
chunk_seconds: 600
timeout_seconds: 60
download_timeout_seconds: 600
devices:
- device_serial: EXAMPLE_DEVICE_SERIAL
channel_no: 1
name: store-front
time_ranges:
- begin: "2026-02-03 09:00:00"
end: "2026-02-03 11:30:00"
```
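Cloud download outputs:
- `hik_cloud_download_manifest.jsonl`: the request, actual times, status, and error for each device/channel/time chunk. In cloud mode, `--dry-run` only requests download addresses and writes `address_ok` / failure statuses; it does not download the mp4 and does not probe.
- `downloads/hik_cloud/<device_serial>/ch<channel_no>/*.mp4`: downloaded video files consumed by the existing analysis pipeline.
- `video_manifest.jsonl`: keeps the existing contract and adds cloud source metadata.
Run local folder mode: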
```bash
python3 -B -m video_ai_analysis_poc.cli \
--config config/local_batch.yaml \
--input-dir /path/to/local/videos \
--output-dir ./outputs/local-batch
```
To run Hik Cloud Storage mode, copy the config file and set `source.mode: hik_cloud`. Prefer supplying the AccessToken through an environment variable:
```bash
export HIK_CLOUD_ACCESS_TOKEN='<redacted>'
python3 -B -m video_ai_analysis_poc.cli \
--config /path/to/hik-cloud.yaml \
--output-dir ./outputs/hik-cloud
```
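`--dry-run` requests Hik download addresses and writes `hik_cloud_download_manifest.jsonl`, but does not download video files, probe, sample frames, run inference, or aggregate. `--until clips` stops after download, probe, frame sampling, and the clip manifest; `--until inference` continues through model inference and writes `clip_results.jsonl`.
In the real remote smoke test, downloading the same 1-hour range as a single 3600-second request returned an MP4 with no `moov` atom, which `ffprobe` could not parse. With 600-second chunks, all 6 chunks could be probed and went on to frame sampling. The frame sampling stage uses the `actual_begin/actual_end` or `requested_begin/requested_end` values recorded by the cloud download to cap the number of frames FFmpeg outputs, so that abnormal timestamps in Hik MP4 files do not make `fps=1` duplicate an excessive number of frames.
Hik Cloud Storage security rules:
- Do not commit a real AccessToken.
- Prefer `hik_cloud.access_token_env: HIK_CLOUD_ACCESS_TOKEN`.
- Do not log the Authorization header.
- Do not persist the query string of signed download URLs, for example `sign`, `sig`, `token`, or `access_token`.
- `access_token.md` is a sensitive verification file. It may be used only for the real remote smoke test and must not be copied into documentation, tests, or output samples.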
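One simple way to honor the rule about signed URL queries is to drop the query string entirely before a URL is written to a manifest. The helper name below is hypothetical, and stripping the whole query (rather than selected keys) is an assumption chosen for safety.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_signed_query(url: str) -> str:
    """Return the URL without query or fragment so signatures are never persisted."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Illustrative URL only; not a real Hik download address.
print(strip_signed_query("https://example.invalid/rec/a.mp4?sign=abc&token=xyz"))
```

Removing the entire query avoids having to maintain a list of sensitive parameter names, at the cost of losing any harmless parameters as well.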
## Directory Boundaries
```text
/Users/yoilun/AI-train/video-ai-analysis-poc
The PoC project directory; all subsequent code, configuration, plans, and documentation go here.
/Users/yoilun/AI-train/zhengxin-vlm-0413
External model and reference implementation directory; not the project directory.
```
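Hard boundaries:
- Do not create project files inside `zhengxin-vlm-0413`.
- Do not modify `zhengxin-vlm-0413/models/**`.
- Do not modify `zhengxin-vlm-0413/service/config.yaml`, `service/config.yaml-bk`, or `docker/.env`.
- Do not write the reference project's real RTSP URLs, Webhooks, tokens, Cookies, or passwords into this project's example config, test fixtures, documentation, or output samples.
- The output directory may only be a directory the user passes explicitly, or `outputs/` inside this project.
- Do not overwrite the user's original video files.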
## Inference Architecture Decision
This PoC explicitly chooses:
```text
OpenAI-compatible vLLM API
```
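The first version of the PoC does not load PyTorch + Transformers + PEFT directly. Reasons:
- Per the user, the test environment already has the model.
- The reference project already uses the vLLM OpenAI-compatible API.
- The main goal of local video batch processing is to get the engineering pipeline working end to end, not to reimplement model serving.
The config fields are fixed as: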
```yaml
vlm:
  api_base_url: http://localhost:8679
  chat_completions_path: /v1/chat/completions
```
URL joining rule in code:
```text
chat_url = api_base_url.rstrip("/") + chat_completions_path
```
Do not put both the full endpoint and the base URL in the config; this avoids doubled paths such as `/v1/chat/completions/v1/chat/completions`.
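A minimal sketch of the joining rule, with a guard against the doubled path. The function name is hypothetical, and both the guard and the leading-slash normalization of the path go slightly beyond the one-line rule stated above.

```python
def build_chat_url(api_base_url: str, chat_completions_path: str) -> str:
    """Join the base URL and the path, rejecting a base that already ends with the path."""
    base = api_base_url.rstrip("/")
    path = "/" + chat_completions_path.strip("/")
    if base.endswith(path):
        raise ValueError("api_base_url already contains the chat completions path")
    return base + path


print(build_chat_url("http://localhost:8679/", "/v1/chat/completions"))
```

Failing fast at config validation time is arguably easier to debug than a 404 from the model server after frames have already been sampled.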
## Target File Structure
```text
video-ai-analysis-poc/
agent.md
task_plan.md
findings.md
progress.md
memories.md
video_ai_analysis_system_plan.md
config/
local_batch.yaml
video_ai_analysis_poc/
__init__.py
cli.py
config.py
paths.py
discovery.py
probe.py
ffmpeg_sampler.py
frames.py
clips.py
vlm_client.py
result_parser.py
aggregator.py
manifest.py
logging_utils.py
schemas/
clip_result.schema.json
video_result.schema.json
folder_summary.schema.json
tests/
test_config.py
test_discovery.py
test_probe.py
test_clips.py
test_result_parser.py
test_aggregator.py
outputs/
.gitkeep
```
## Module Boundaries
### `config.py`
- Loads `config/local_batch.yaml`.
- Merges CLI argument overrides.
- Validates required fields, numeric ranges, and path safety.
- Does not touch videos, call FFmpeg, or call the model.
### `paths.py`
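- Generates stable `video_id` and `clip_id` values.
- Generates the output directory structure.
- Prevents the output directory from pointing at the reference model directory or overwriting the input video directory.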
### `discovery.py`
- Only discovers videos according to `input.dir`, `recursive`, and `extensions`.
- Writes `video_manifest.jsonl`.
- Does not run ffprobe, sample frames, or call the model.
### `probe.py`
- Wraps `ffprobe`.
- Outputs `duration_seconds`, `codec_name`, `width`, `height`, `fps`, `format_name`, and `start_time`.
- Marks corrupted or unsupported videos as `probe_failed` and records `last_error` without blocking other videos.
### `ffmpeg_sampler.py`
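- Samples frames at 1 FPS using FFmpeg + NVDEC.
- Selects `h264_cuvid` / `hevc_cuvid` based on the codec.
- Defaults to `allow_cpu_fallback: false`.
- Outputs JPEG files and `frame_manifest.jsonl`.
- Saves a summary of FFmpeg stderr as evidence that GPU decoding was actually used.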
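The sampling behavior above could be assembled into an FFmpeg argument list roughly as follows. This is a sketch, not the project's actual command: the function name, the `scale={width}:-2` filter, the `%06d.jpg` output pattern, and the optional frame cap are assumptions, although every flag used (`-hwaccel`, `-c:v`, `-vf`, `-q:v`, `-frames:v`) is a standard FFmpeg option.

```python
import math

# Mirrors ffmpeg.codec_decoders from the config.
CODEC_DECODERS = {"h264": "h264_cuvid", "hevc": "hevc_cuvid"}


def build_sampling_command(video_path, frames_dir, codec_name, *, fps=1, width=640,
                           jpeg_quality=4, expected_seconds=None):
    """Build an FFmpeg argv for NVDEC decoding and fixed-rate JPEG sampling."""
    decoder = CODEC_DECODERS.get(codec_name)
    if decoder is None:
        # With allow_cpu_fallback: false there is no software path to fall back to.
        raise ValueError(f"no NVDEC decoder configured for codec {codec_name!r}")
    command = [
        "ffmpeg", "-hide_banner", "-nostdin",
        "-hwaccel", "cuda", "-c:v", decoder,  # input options must precede -i
        "-i", str(video_path),
        "-vf", f"fps={fps},scale={width}:-2",
        "-q:v", str(jpeg_quality),
    ]
    if expected_seconds is not None:
        # Cap output frames so abnormal timestamps cannot produce excess frames.
        command += ["-frames:v", str(math.ceil(expected_seconds * fps))]
    command.append(f"{frames_dir}/%06d.jpg")
    return command


print(" ".join(build_sampling_command("in.mp4", "frames/v1", "h264", expected_seconds=600)))
```

Returning a list rather than a shell string means the command can be passed to `subprocess.run` without quoting issues, and the same list can be stored in the frame manifest as evidence.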
### `frames.py`
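- Computes each frame's relative seconds and timecode.
- Maintains the frame file path, offset, and timecode.
- Prefers `pts_time` when available; otherwise derives the relative time from the frame sequence number and the FPS.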
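The time calculations described for `frames.py` might look like the sketch below. Both function names are hypothetical, and a zero-based frame index is assumed.

```python
def seconds_to_timecode(seconds: float) -> str:
    """Format a non-negative offset as HH:MM:SS, truncating fractional seconds."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def frame_offset_seconds(frame_index: int, fps: float, pts_time=None) -> float:
    """Prefer the decoder's pts_time; otherwise derive the offset from index and FPS."""
    if pts_time is not None:
        return float(pts_time)
    return frame_index / fps


offset = frame_offset_seconds(120, 1)
print(offset, seconds_to_timecode(offset))
```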
### `clips.py`
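- Reads `frame_manifest.jsonl`.
- Builds clips according to `clip.length_seconds` and `clip.stride_seconds`.
- Uniformly samples `frames_per_clip` frames from the 1 FPS frames.
- Writes `clip_manifest.jsonl`, which must include the actual times of the frames used for inference.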
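Clip construction with uniform frame sampling can be sketched as follows. The function names and the returned dict shape are assumptions; only the config keys (`length_seconds`, `stride_seconds`, `frames_per_clip`, `min_frames_per_clip`) come from the source. Frame offsets are assumed to be sorted in ascending order.

```python
def sample_indices(count: int, wanted: int) -> list[int]:
    """Pick up to `wanted` evenly spaced indices from range(count)."""
    if count <= wanted:
        return list(range(count))
    if wanted == 1:
        return [0]
    return [round(i * (count - 1) / (wanted - 1)) for i in range(wanted)]


def build_clips(frame_offsets, length_seconds=10.0, stride_seconds=10.0,
                frames_per_clip=8, min_frames_per_clip=4):
    """Group sorted frame offsets into fixed windows and sample frames per clip."""
    if stride_seconds <= 0:
        raise ValueError("stride_seconds must be positive")
    clips = []
    if not frame_offsets:
        return clips
    start = 0.0
    while start <= frame_offsets[-1]:
        end = start + length_seconds
        window = [t for t in frame_offsets if start <= t < end]
        if len(window) >= min_frames_per_clip:  # skip clips with too few frames
            picked = [window[i] for i in sample_indices(len(window), frames_per_clip)]
            clips.append({
                "clip_start_seconds": start,
                "clip_end_seconds": end,
                "frame_times": picked,
            })
        start += stride_seconds
    return clips


clips = build_clips([float(i) for i in range(25)])
print(len(clips), clips[0]["frame_times"])
```

Recording `frame_times` from the sampled frames, rather than from the window bounds, is what keeps the clip manifest honest about which frames the model actually saw.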
### `vlm_client.py`
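- Calls the OpenAI-compatible `/v1/chat/completions` endpoint.
- Sends multiple frames as `image_url` entries, by default as `data:image/jpeg;base64` URIs.
- Takes prompts from config; nothing is hard-coded.
- Does not parse business events; returns only the raw response, latency, and HTTP status.
- The Phase 4 implementation uses the Python standard library `urllib` and exposes an injectable HTTP function so tests can mock it. The default URL is built as `vlm.api_base_url.rstrip("/") + vlm.chat_completions_path`.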
### `result_parser.py`
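- Extracts strict JSON from the raw response.
- Validates fields such as `schema_version`, `events`, `screen_time`, and the event enum.
- A parse failure triggers one retry with a strict prompt.
- If it still fails, writes `parse_failed` and keeps `raw_response`.
- The Phase 4 implementation accepts raw JSON as well as JSON embedded in markdown or prose, and outputs the clip-level `monitoring_timeline`, `events`, `raw_response`, `processing`, and `error` fields.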
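Extracting a JSON object that is embedded in markdown or prose can be done with the standard library alone. The sketch below scans for the first position at which `json.JSONDecoder.raw_decode` yields a dict; the function name is hypothetical and the project's actual parser may use a different strategy.

```python
import json


def extract_json_object(raw: str):
    """Return the first JSON object found in `raw`, or None if there is none."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(raw):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `index` and ignores the rest.
            value, _ = decoder.raw_decode(raw, index)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None


fenced = '```json\n{"events": [], "screen_time": ""}\n```'
print(extract_json_object(fenced))
```

A `None` result would correspond to the `parse_failed` path described above, where the raw response is kept for inspection.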
### `aggregator.py`
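- Consumes `video_manifest.jsonl`, `clip_manifest.jsonl`, and `clip_results.jsonl`.
- Aggregates into `videos/<video_id>/video_result.json` and `folder_summary.json` in the output root.
- Merges events of the same video and type whose time ranges are adjacent or close, according to `merge_gap_seconds`.
- Keeps each event's relative timeline, screen_time, clip evidence, and frame evidence.
- Counts `parse_failed` / `inference_failed` clips.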
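The gap-based merge can be illustrated with a short sketch. The function name and the flat event shape (`video_id`, `event_type`, offsets, `clip_id`) are assumptions made to keep the example small; the real events carry more fields.

```python
def merge_events(events, merge_gap_seconds=30.0):
    """Merge events of the same video and type whose gap is within merge_gap_seconds."""
    ordered = sorted(
        events,
        key=lambda e: (e["video_id"], e["event_type"], e["start_offset_seconds"]),
    )
    merged = []
    for event in ordered:
        last = merged[-1] if merged else None
        if (
            last is not None
            and last["video_id"] == event["video_id"]
            and last["event_type"] == event["event_type"]
            and event["start_offset_seconds"] - last["end_offset_seconds"] <= merge_gap_seconds
        ):
            # Extend the previous event and keep every contributing clip as evidence.
            last["end_offset_seconds"] = max(last["end_offset_seconds"], event["end_offset_seconds"])
            last["clip_ids"].append(event["clip_id"])
        else:
            merged.append({
                "video_id": event["video_id"],
                "event_type": event["event_type"],
                "start_offset_seconds": event["start_offset_seconds"],
                "end_offset_seconds": event["end_offset_seconds"],
                "clip_ids": [event["clip_id"]],
            })
    return merged


sample = [
    {"video_id": "v1", "event_type": "queue_detected", "start_offset_seconds": 0.0, "end_offset_seconds": 10.0, "clip_id": "c0"},
    {"video_id": "v1", "event_type": "queue_detected", "start_offset_seconds": 30.0, "end_offset_seconds": 40.0, "clip_id": "c3"},
    {"video_id": "v1", "event_type": "queue_detected", "start_offset_seconds": 100.0, "end_offset_seconds": 110.0, "clip_id": "c10"},
]
print(merge_events(sample))
```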
### `manifest.py`
- Handles JSONL reading and writing and the status fields.
- Supports resuming interrupted runs.
- Every record contains `status`, `retry_count`, and `last_error`.
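A minimal sketch of JSONL helpers with resume support follows. The function names, the write-to-temp-then-rename approach, and the choice of `ok` as the only finished status are assumptions; the source only states that each record carries `status`, `retry_count`, and `last_error`.

```python
import json
import os
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Read a JSONL file into a list of dicts; a missing file yields an empty list."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def write_jsonl(path, records):
    """Write records via a temporary file so an interrupted run leaves the old file intact."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = path.with_suffix(path.suffix + ".tmp")
    with tmp_path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    os.replace(tmp_path, path)


def records_to_process(records, done_statuses=("ok",)):
    """On resume, keep only records that have not reached a finished status."""
    return [r for r in records if r.get("status") not in done_statuses]


with tempfile.TemporaryDirectory() as tmp_dir:
    manifest = Path(tmp_dir) / "clip_manifest.jsonl"
    write_jsonl(manifest, [
        {"clip_id": "c0", "status": "ok", "retry_count": 0, "last_error": None},
        {"clip_id": "c1", "status": "pending", "retry_count": 0, "last_error": None},
    ])
    remaining = records_to_process(read_jsonl(manifest))
print([r["clip_id"] for r in remaining])
```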
## Config Schema
Suggested fields for `config/local_batch.yaml`:
```yaml
input:
dir: /path/to/videos
recursive: true
extensions: [".mp4", ".mov", ".mkv", ".avi", ".flv", ".ts", ".m4v"]
source:
mode: local
output:
dir: ./outputs/local-batch
overwrite: false
resume: true
keep_frames: true
hik_cloud:
api_base_url: https://api2.hik-cloud.com
download_path: /v1/carrier/cstorage/open/play/download
access_token: null
access_token_env: HIK_CLOUD_ACCESS_TOKEN
chunk_seconds: 600
timeout_seconds: 60
download_timeout_seconds: 600
devices:
- device_serial: EXAMPLE_DEVICE_SERIAL
channel_no: 1
name: example-device
time_ranges:
- begin: "2026-02-03 09:00:00"
end: "2026-02-03 10:00:00"
ffprobe:
timeout_seconds: 30
ffmpeg:
prefer_nvdec: true
allow_cpu_fallback: false
hwaccel: cuda
codec_decoders:
h264: h264_cuvid
hevc: hevc_cuvid
frame_fps: 1
frame_width: 640
jpeg_quality: 4
timeout_seconds_per_video: 3600
clip:
length_seconds: 10
stride_seconds: 10
frames_per_clip: 8
min_frames_per_clip: 4
vlm:
api_base_url: http://localhost:8679
chat_completions_path: /v1/chat/completions
model: memai-zhengxin-v3-20260413
timeout_seconds: 120
max_tokens: 512
temperature: 0
batch_size: 1
image_transport: data_uri
retries: 1
prompt:
system: "You are a store video analysis assistant. Return strict JSON only."
user: "Analyze this clip. Return events and screen_time. If no event, return events: []."
schema:
version: local-batch-v1
event_types:
- customer_enter
- customer_leave
- queue_detected
- staff_absent
- staff_present
- area_crowded
- abnormal_behavior
- unknown
require_strict_json: true
parse_retry: 1
merge_gap_seconds: 30
runtime:
timezone: Asia/Shanghai
log_level: INFO
```
## File Contracts
### `video_manifest.jsonl`
One line per discovered video:
```json
{
"video_id": "stable_hash_or_slug",
"source_path": "/path/to/video.mp4",
"status": "pending",
"probe": null,
"retry_count": 0,
"last_error": null
}
```
### `frame_manifest.jsonl`
One line per sampled frame:
```json
{
"video_id": "stable_hash_or_slug",
"frame_id": "stable_hash_or_slug_f000120",
"frame_path": "frames/stable_hash_or_slug/000120.jpg",
"offset_seconds": 120.0,
"timecode": "00:02:00",
"pts_time": 120.0,
"status": "sampled"
}
```
### `clip_manifest.jsonl`
One line per clip:
```json
{
"video_id": "stable_hash_or_slug",
"clip_id": "stable_hash_or_slug_c000012",
"clip_start_seconds": 120.0,
"clip_end_seconds": 130.0,
"clip_start_timecode": "00:02:00",
"clip_end_timecode": "00:02:10",
"frame_times": [
{
"frame_path": "frames/stable_hash_or_slug/000120.jpg",
"offset_seconds": 120.0,
"timecode": "00:02:00"
}
],
"status": "pending",
"retry_count": 0,
"last_error": null
}
```
### `clip_results.jsonl`
One line per inferred clip:
```json
{
"schema_version": "local-batch-v1",
"video_id": "stable_hash_or_slug",
"video_path": "/path/to/video.mp4",
"clip_id": "stable_hash_or_slug_c000012",
"status": "ok",
"monitoring_timeline": {
"timezone": "Asia/Shanghai",
"video_start_time": null,
"clip_start_seconds": 120.0,
"clip_end_seconds": 130.0,
"clip_start_timecode": "00:02:00",
"clip_end_timecode": "00:02:10",
"frame_times": [
{
"frame_path": "frames/stable_hash_or_slug/000120.jpg",
"offset_seconds": 120.0,
"timecode": "00:02:00"
}
],
"screen_time": "2026-06-14 12:31:20"
},
"events": [
{
"event_type": "queue_detected",
"start_time": null,
"end_time": null,
"start_offset_seconds": 120.0,
"end_offset_seconds": 130.0,
"confidence": 0.86,
"severity": "medium",
"attributes": {},
"evidence": {
"clip_id": "stable_hash_or_slug_c000012",
"frame_paths": ["frames/stable_hash_or_slug/000120.jpg"]
}
}
],
"raw_response": null,
"processing": {
"started_at": "2026-06-15T10:00:00+08:00",
"finished_at": "2026-06-15T10:00:02+08:00",
"latency_ms": 1800
},
"error": null
}
```
### `video_result.json`
Written to:
```text
videos/<video_id>/video_result.json
```
Required top-level fields:
```text
schema_version
video_id
video_path
probe
monitoring_timeline.video_start_time
monitoring_timeline.video_duration_seconds
clip_count
failed_clip_count
event_counts
events
outputs.clip_results_jsonl
processing
```
### `folder_summary.json`
Required top-level fields:
```text
schema_version
input_dir
video_count
processed_video_count
failed_video_count
event_counts
videos
processing
```
## Timeline Rules
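The timeline must distinguish three kinds of time:
- Video-relative time: `offset_seconds` and `timecode`.
- On-screen OCR time: `screen_time`, or `画面时间` in the model output.
- Processing time: `processing.started_at` and `processing.finished_at`.
When a local video has no reliable business start time:
- `video_start_time` must be `null`.
- Absolute times must not be fabricated.
- Events must keep `start_offset_seconds` and `end_offset_seconds`.
The actual times of the frames used for inference must be written to `frame_times`. Writing only the clip start and end is not enough.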
## Reference Code Usage
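May be used as reference:
- The OpenAI-compatible payload structure in `zhengxin-vlm-0413/shared/vlm_client.py`.
- The base64 data URI handling in `zhengxin-vlm-0413/shared/frame_utils.py`.
- The prompt configuration style in `zhengxin-vlm-0413/service/config.yaml`.
Must not be reused directly as the core implementation:
- `frame_utils.extract_frames_from_video`, because it samples 8 frames uniformly across the whole video and does not meet the 1 FPS, clip manifest, or timeline requirements.
- `vlm_client.extract_action`, because it parses only `Action` and cannot cover this project's full event and timeline schema.
- The main loop of `rtsp_service.py`, because it serves real-time RTSP and is not suited to offline folder batch processing.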
## Validation Matrix
### Phase 1 Architecture Validation
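Phase 1 is complete when:
- `docs/project.md` fixes the module boundaries, file output contracts, config schema, timeline schema, security boundaries, and validation matrix.
- The inference interface is explicitly chosen as OpenAI-compatible vLLM.
- The API URL field semantics are fixed as `api_base_url` + `chat_completions_path`.
- It is stated which parts of the reference `frame_utils.py` / `vlm_client.py` may be borrowed and which may not be reused directly.
- The smoke test inputs, commands, expected output fields, and failure criteria for Phases 2-6 are listed.
- The sub-agent review conclusion is recorded in `progress.md`.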
### Phase 2 Validation
Goal: local video discovery, ffprobe, manifest, and the CLI skeleton.
Commands:
```bash
python3 -m py_compile video_ai_analysis_poc/*.py
python3 -m video_ai_analysis_poc.cli --config config/local_batch.yaml --input-dir /path/to/videos --output-dir ./outputs/local-batch --dry-run
```
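Expected:
- `video_manifest.jsonl` is generated.
- Corrupted or unsupported videos are marked as failed without blocking other videos.
- The reference model directory is neither read nor written.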
### Phase 3 Validation
Goal: FFmpeg/NVDEC 1 FPS frame sampling and clip construction.
Commands:
```bash
ffmpeg -hwaccels
ffmpeg -decoders | grep cuvid
python3 -m video_ai_analysis_poc.cli --config config/local_batch.yaml --input-dir /path/to/short-videos --output-dir ./outputs/local-batch --until clips
```
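Expected:
- A sampling command with `-hwaccel cuda` and `h264_cuvid` or `hevc_cuvid` is actually run on a sample video.
- Decoder evidence from FFmpeg stderr or the logs is saved.
- `frame_manifest.jsonl` and `clip_manifest.jsonl` are generated.
- `clip_manifest.jsonl` contains `frame_times`.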
### Phase 4 Validation
Goal: vLLM OpenAI-compatible API, prompt configuration, and JSON parse retry.
Commands:
```bash
curl http://localhost:8679/v1/models
python3 -m video_ai_analysis_poc.cli --config config/local_batch.yaml --input-dir /path/to/short-videos --output-dir ./outputs/local-batch --until inference --limit-clips 3
```
Expected:
- The prompt is read from config.
- The request URL is `api_base_url + chat_completions_path`.
- `clip_results.jsonl` is generated.
- Every result contains the `monitoring_timeline.frame_times` and `screen_time` fields.
### Phase 5 Validation
Goal: clip/video/folder aggregation and schema validation.
Commands:
```bash
python3 -m video_ai_analysis_poc.cli --config config/local_batch.yaml --input-dir /path/to/short-videos --output-dir ./outputs/local-batch
python3 -m json.tool ./outputs/local-batch/folder_summary.json >/dev/null
```
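Expected:
- A default CLI run, with neither `--dry-run` nor `--until`, proceeds through inference and on to aggregation.
- `--until clips` and `--until inference` still stop at their respective stages and write no aggregation output.
- `videos/<video_id>/video_result.json` is generated.
- `folder_summary.json` is generated.
- Event aggregation keeps the relative timeline.
- The JSON can be parsed by standard tools.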
### Phase 6 Validation
Goal: smoke test in the test environment and documentation update.
Remote environment:
```text
ssh xiaozheng@192.168.5.100
/home/xiaozheng/video-ai-analysis-poc
```
Model service:
```bash
ssh xiaozheng@192.168.5.100 'curl http://localhost:8679/v1/models'
```
Current service status:
- Container: `zhengxin-vllm`
- Image: `vllm/vllm-openai:v0.14.1`
- Port: `8679`
- Model: `memai-zhengxin-v3-20260413`
- Model directory mount: `/home/xiaozheng/zhengxin-vlm-0413/models:/models:ro`
Remote capability verification commands:
```bash
ssh xiaozheng@192.168.5.100 'nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader'
ssh xiaozheng@192.168.5.100 'ffmpeg -hwaccels'
ssh xiaozheng@192.168.5.100 'ffmpeg -decoders'
```
Verified:
- GPU: `NVIDIA GeForce RTX 3080`, `20480 MiB`, driver `595.71.05`.
- FFmpeg 6.1.1 supports the `cuda` hwaccel.
- The FFmpeg decoders include `h264_cuvid` and `hevc_cuvid`.
- `/v1/models` returns model id `memai-zhengxin-v3-20260413`.
- A safely quoted health check against `/v1/chat/completions` returns `OK`.
Remote smoke input:
```text
/tmp/video-ai-analysis-poc-smoke.h1cZUR/input/sample_h264.mp4
```
Remote smoke output:
```text
/tmp/video-ai-analysis-poc-smoke.h1cZUR/output
```
Remote batch commands:
```bash
ssh xiaozheng@192.168.5.100 'PYTHONPATH=/home/xiaozheng/video-ai-analysis-poc python3 -B -m unittest discover -s /home/xiaozheng/video-ai-analysis-poc/tests -v'
ssh xiaozheng@192.168.5.100 'python3 -B -m compileall -q /home/xiaozheng/video-ai-analysis-poc/video_ai_analysis_poc'
ssh xiaozheng@192.168.5.100 'PYTHONPATH=/home/xiaozheng/video-ai-analysis-poc python3 -B -m video_ai_analysis_poc.cli --config /home/xiaozheng/video-ai-analysis-poc/config/local_batch.yaml --input-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/input --output-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/output --until clips'
ssh xiaozheng@192.168.5.100 'PYTHONPATH=/home/xiaozheng/video-ai-analysis-poc python3 -B -m video_ai_analysis_poc.cli --config /home/xiaozheng/video-ai-analysis-poc/config/local_batch.yaml --input-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/input --output-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/output --until inference --limit-clips 1'
ssh xiaozheng@192.168.5.100 'PYTHONPATH=/home/xiaozheng/video-ai-analysis-poc python3 -B -m video_ai_analysis_poc.cli --config /home/xiaozheng/video-ai-analysis-poc/config/local_batch.yaml --input-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/input --output-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/output'
```
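Verified output:
- `video_manifest.jsonl`: 1 video record.
- `frame_manifest.jsonl`: 12 sampled frame records.
- `clip_manifest.jsonl`: 1 clip record.
- The frame manifest persists `hwaccel: cuda`, `decoder: h264_cuvid`, `ffmpeg_command`, and the FFmpeg stderr summary.
- `clip_results.jsonl`: 1 record with `status: ok`, containing `monitoring_timeline.frame_times`.
- `videos/<video_id>/video_result.json`: parseable JSON with `failed_clip_count: 0`.
- `folder_summary.json`: parseable JSON with `video_count: 1` and `processed_video_count: 1`.
- When a local video has no reliable business start time, `monitoring_timeline.video_start_time` is `null`; ffprobe's `start_time: 0.0` is kept only under `probe`.
Remote verification constraints:
- Write only to an explicit output directory.
- Do not overwrite existing models, configuration, or videos on the remote host.
- Do not copy real credentials into logs or documentation.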
## Known Risks
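- HEVC decoder availability has been verified, but the actual smoke test covered only an H.264 sample video.
- Throughput on 24 hours of real store video has not been load-tested.
- Hik-Cloud (海康云眸) cloud recording / RTSP ingestion remains outside the scope of the current local folder PoC.
- Local videos may have no on-screen timestamp, so the relative time must always be kept as well.
- Model event quality has not been accepted against real store footage; the synthetic test images contain no business events, so empty event output is a reasonable result.
- The remote vLLM container is currently started by hand and is not managed by production-grade systemd or compose.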


@@ -0,0 +1,190 @@
# Hik Cloud Download Analysis Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Add Hik Cloud Storage recording download as a configurable multi-device source, then feed downloaded videos into the existing model analysis pipeline.
**Architecture:** Keep the current local-folder pipeline intact. Add a cloud acquisition module that plans one-hour chunks, calls the Hik download-address API, downloads videos to local output storage, records a download manifest, and returns local file records for the existing probe/frame/clip/inference/aggregate stages.
**Tech Stack:** Python standard library, existing `unittest` suite, existing JSONL manifest helpers, FFmpeg/vLLM pipeline already in `video_ai_analysis_poc`.
---
### Task 1: Config Schema And Time Chunking
**Files:**
- Modify: `video_ai_analysis_poc/config.py`
- Create: `video_ai_analysis_poc/hik_cloud.py`
- Modify: `tests/test_config.py`
- Create: `tests/test_hik_cloud.py`
- [ ] **Step 1: Write failing config tests**
Add tests that load:
```yaml
source:
mode: hik_cloud
hik_cloud:
access_token_env: HIK_CLOUD_ACCESS_TOKEN
devices:
- device_serial: EXAMPLE_DEVICE_SERIAL
channel_no: 1
name: front
time_ranges:
- begin: "2026-02-03 09:00:00"
end: "2026-02-03 10:30:00"
```
Expected: `source.mode == "hik_cloud"`, `devices` is a list of dicts, and `time_ranges` is a list of dicts.
- [ ] **Step 2: Write failing chunk tests**
Test that `build_download_chunks(...)` converts the range above into chunks with `timeEnd - timeBegin <= 3600`.
- [ ] **Step 3: Run red tests**
Run:
```bash
python3 -B -m unittest tests.test_config tests.test_hik_cloud -v
```
Expected: fail because list-of-mapping parsing and `hik_cloud.py` do not exist yet.
- [ ] **Step 4: Implement minimal parser/defaults/chunking**
Extend the simple YAML parser only enough for list items shaped as mappings. Add defaults for `source` and `hik_cloud`. Implement date-time parsing with `zoneinfo.ZoneInfo`.
- [ ] **Step 5: Run green tests**
Run the same unittest command. Expected: pass.
### Task 2: Hik Download Address API Client
**Files:**
- Modify: `video_ai_analysis_poc/hik_cloud.py`
- Modify: `tests/test_hik_cloud.py`
- [ ] **Step 1: Write failing API client tests**
Mock the HTTP function and verify:
- URL is `api_base_url.rstrip("/") + download_path`.
- Headers include `Authorization: bearer TOKEN`.
- JSON body includes `deviceSerial`, `channelNo`, `timeBegin`, `timeEnd`.
- Success returns URL and actual begin/end.
- Code `80438027` returns a structured `no_recording` result.
- Other non-zero codes return `address_failed`.
- [ ] **Step 2: Run red tests**
Run:
```bash
python3 -B -m unittest tests.test_hik_cloud -v
```
Expected: fail because the client is missing.
- [ ] **Step 3: Implement client**
Use `urllib.request` and injectable callables for tests. Do not log or persist the token.
- [ ] **Step 4: Run green tests**
Run the same command. Expected: pass.
### Task 3: Download Files And Manifest
**Files:**
- Modify: `video_ai_analysis_poc/hik_cloud.py`
- Modify: `video_ai_analysis_poc/paths.py`
- Modify: `tests/test_hik_cloud.py`
- [ ] **Step 1: Write failing downloader tests**
Mock address results and download bytes. Verify downloaded files are written under `downloads/hik_cloud/<device>/ch<channel>/`, filenames contain requested timestamps, manifest rows are written, token/query signatures are not in filenames, and resume skips already downloaded files.
- [ ] **Step 2: Run red tests**
Run:
```bash
python3 -B -m unittest tests.test_hik_cloud -v
```
Expected: fail because downloader/manifest behavior is missing.
- [ ] **Step 3: Implement downloader**
Write `download_hik_cloud_recordings(config, output_dir, *, address_client=None, download_url=None)` returning downloaded video records with cloud metadata.
- [ ] **Step 4: Run green tests**
Run the same command. Expected: pass.
### Task 4: CLI Cloud Source Integration
**Files:**
- Modify: `video_ai_analysis_poc/cli.py`
- Modify: `tests/test_cli.py`
- [ ] **Step 1: Write failing CLI tests**
Add tests that:
- `source.mode: local` still uses `discover_videos`.
- `source.mode: hik_cloud` calls the cloud downloader and probes returned downloaded paths.
- `--dry-run` in cloud mode requests download addresses and writes the download manifest, but does not download video files, probe, call FFmpeg, call VLM, or aggregate.
- `--until clips` in cloud mode produces video/frame/clip manifests from mocked downloaded video records.
- [ ] **Step 2: Run red tests**
Run:
```bash
python3 -B -m unittest tests.test_cli -v
```
Expected: fail because CLI has no source mode branch.
- [ ] **Step 3: Implement CLI branch**
Keep local behavior unchanged. In cloud mode, call downloader before probe and carry cloud metadata into `video_manifest.jsonl`.
- [ ] **Step 4: Run green tests**
Run the same command. Expected: pass.
### Task 5: Docs, Example Config, And Full Verification
**Files:**
- Modify: `config/local_batch.yaml`
- Modify: `docs/project.md`
- Modify: `findings.md`
- Modify: `progress.md`
- Modify: `memories.md`
- [ ] **Step 1: Update docs/config**
Add a commented or safe example for `source.mode: hik_cloud`, token env var, devices, and time ranges. Do not include a real token.
- [ ] **Step 2: Run full tests**
Run:
```bash
python3 -B -m unittest discover -s tests -v
python3 -B -m py_compile video_ai_analysis_poc/*.py
```
Expected: all pass.
- [ ] **Step 3: Run local mock smoke**
Use test mocks or a temporary local HTTP fixture to verify cloud mode can produce downloaded files and continue to `--until clips` without a real Hik token.
- [ ] **Step 4: Record results**
Update `progress.md` with commands, results, files changed, and remaining risk. Real Hik API verification is skipped until a real AccessToken/device/time range is provided.


@@ -0,0 +1,151 @@
# Hik Cloud Download Analysis Design
## Goal
Add Hik Cloud Storage recording download as a first-class video source for the existing video analysis pipeline. The implementation must support a configurable AccessToken, multiple devices, configurable date-time ranges, one-hour API slicing, and video downloads, and it must reuse the existing local analysis pipeline.
## Source Model
The pipeline keeps the existing local mode and adds a cloud mode:
```yaml
source:
  mode: local # local | hik_cloud
```
`local` keeps the current folder discovery behavior. `hik_cloud` runs a download stage first, then analyzes the downloaded files exactly like local files.
## Hik Cloud Configuration
The config should allow a literal token for controlled testing and an environment variable for normal use:
```yaml
hik_cloud:
  api_base_url: https://api2.hik-cloud.com
  download_path: /v1/carrier/cstorage/open/play/download
  access_token: null
  access_token_env: HIK_CLOUD_ACCESS_TOKEN
  chunk_seconds: 600
  timeout_seconds: 60
  download_timeout_seconds: 600
  devices:
    - device_serial: EXAMPLE_DEVICE_SERIAL
      channel_no: 1
      name: store-front
  time_ranges:
    - begin: "2026-02-03 09:00:00"
      end: "2026-02-03 11:30:00"
```
The implementation must not print or persist the token. Manifest entries may record the API URL path, device serial, channel, requested times, actual times, and status, but not the Authorization header.
## Time Handling
The user-facing time range includes year, month, day, hour, minute, and second. The config supports both `YYYY-MM-DD HH:MM:SS` strings and integer epoch seconds. String parsing uses `runtime.timezone`, defaulting to `Asia/Shanghai`, and converts to Unix seconds for `timeBegin` and `timeEnd`.
Ranges are split into chunks with `end - begin <= 3600` because the PDF documents error `80430002` when the requested interval exceeds 3600 seconds. The example default is 600 seconds because a real remote smoke test found that shorter chunks produced valid, probeable MP4 files for the provided test range.
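The time handling above can be sketched as follows. This is a minimal illustration, not the project's implementation: the names `to_epoch_seconds` and `split_range` are hypothetical, and clamping `chunk_seconds` to 3600 is an assumption about how the limit might be enforced.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

MAX_CHUNK_SECONDS = 3600  # interval limit documented for the download-address API


def to_epoch_seconds(value, timezone="Asia/Shanghai"):
    """Accept an integer epoch or a 'YYYY-MM-DD HH:MM:SS' string (hypothetical helper)."""
    if isinstance(value, int):
        return value
    naive = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    # Interpret the wall-clock string in the configured timezone.
    return int(naive.replace(tzinfo=ZoneInfo(timezone)).timestamp())


def split_range(begin, end, chunk_seconds=600):
    """Split [begin, end) into consecutive chunks of at most chunk_seconds."""
    if end <= begin:
        raise ValueError("end must be greater than begin")
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    step = min(chunk_seconds, MAX_CHUNK_SECONDS)  # assumption: clamp rather than fail
    chunks = []
    cursor = begin
    while cursor < end:
        chunk_end = min(cursor + step, end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end
    return chunks
```

With the example range `2026-02-03 09:00:00` to `2026-02-03 11:30:00` and `chunk_seconds: 600`, this sketch yields 15 chunks of 600 seconds each.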
## API Contract
Use the PDF section “2、获取录像下载地址” ("2. Get the recording download address"):
```text
POST https://api2.hik-cloud.com/v1/carrier/cstorage/open/play/download
Authorization: bearer <AccessToken>
Content-Type: application/json
```
Request body:
```json
{
  "deviceSerial": "EXAMPLE_DEVICE_SERIAL",
  "channelNo": 1,
  "timeBegin": 1764856787,
  "timeEnd": 1764856978
}
```
Successful response:
```json
{
  "code": 0,
  "data": {
    "url": "https://...",
    "actualBeginTime": "1764856787",
    "actualEndTime": "1764856978"
  },
  "success": true
}
```
Non-zero codes become structured failures. `80438027` is treated as `no_recording` so one empty chunk does not stop the batch.
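A rough sketch of the contract above, using only `urllib.request` from the standard library. The function names `build_download_request` and `classify_response` and the shape of the returned dictionaries are illustrative assumptions; only the URL, headers, body keys, and status names come from this design.

```python
import json
import urllib.request

NO_RECORDING_CODE = "80438027"  # treated as an empty chunk, not a fatal error


def build_download_request(api_base_url, download_path, token,
                           device_serial, channel_no, time_begin, time_end):
    """Build (but do not send) the POST request for one chunk."""
    body = {
        "deviceSerial": device_serial,
        "channelNo": channel_no,
        "timeBegin": time_begin,
        "timeEnd": time_end,
    }
    return urllib.request.Request(
        api_base_url.rstrip("/") + download_path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",  # never log or persist this value
            "Content-Type": "application/json",
        },
        method="POST",
    )


def classify_response(payload):
    """Map a decoded JSON response to a structured result (hypothetical shape)."""
    code = str(payload.get("code"))
    if code == "0":
        data = payload.get("data") or {}
        return {
            "status": "address_ok",
            "url": data.get("url"),
            "actual_begin": int(data["actualBeginTime"]),
            "actual_end": int(data["actualEndTime"]),
        }
    if code == NO_RECORDING_CODE:
        return {"status": "no_recording", "code": code}
    return {"status": "address_failed", "code": code}
```

Keeping request construction separate from sending makes it straightforward to inject a fake HTTP callable in tests, which is what the plan asks for.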
## Output Contract
Cloud downloads write a dedicated manifest:
```text
<output.dir>/hik_cloud_download_manifest.jsonl
```
Each row contains:
- `source: hik_cloud`
- `device_serial`
- `channel_no`
- `requested_begin`, `requested_end`
- `actual_begin`, `actual_end`
- `download_url_host` (optional; omit the URL entirely if even the host should not be persisted)
- `path` for downloaded video
- `status`: `address_ok`, `downloaded`, `no_recording`, `address_failed`, `download_failed`
- `retry_count`, `last_error`
Downloaded videos go under:
```text
<output.dir>/downloads/hik_cloud/<device_serial>/ch<channel_no>/
```
Filenames use device/channel/requested timestamps and never include URL query signatures or tokens.
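One way the output contract could look in code is sketched below. The exact filename pattern, the `.mp4` extension, and the helper names `chunk_video_path`, `manifest_row`, and `append_manifest_row` are assumptions for illustration; the design only fixes the directory layout, the row fields, and the rule that signatures and tokens never reach disk.

```python
import json
from pathlib import Path
from urllib.parse import urlsplit


def chunk_video_path(output_dir, device_serial, channel_no, requested_begin, requested_end):
    """Target path built only from device, channel, and requested timestamps."""
    folder = Path(output_dir) / "downloads" / "hik_cloud" / device_serial / f"ch{channel_no}"
    name = f"{device_serial}_ch{channel_no}_{requested_begin}_{requested_end}.mp4"
    return folder / name


def manifest_row(device_serial, channel_no, requested_begin, requested_end,
                 status, path=None, url=None, last_error=None):
    """Build one manifest row; only the URL host is kept, never the query string."""
    return {
        "source": "hik_cloud",
        "device_serial": device_serial,
        "channel_no": channel_no,
        "requested_begin": requested_begin,
        "requested_end": requested_end,
        "download_url_host": urlsplit(url).hostname if url else None,
        "path": str(path) if path else None,
        "status": status,
        "retry_count": 0,
        "last_error": last_error,
    }


def append_manifest_row(manifest_path, row):
    """Append one JSON line to the download manifest."""
    with Path(manifest_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(row, sort_keys=True) + "\n")
```

Appending one line per chunk keeps the manifest usable for resume even if the process stops partway through a batch.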
## Pipeline Integration
`cli.py` should branch only at source acquisition:
```text
local mode:
  discover local videos -> probe -> frames -> clips -> inference -> aggregate
hik_cloud mode:
  build chunks -> request download URLs -> download videos -> probe -> frames -> clips -> inference -> aggregate
```
After downloads complete, the rest of the pipeline should consume downloaded file paths and preserve cloud metadata in `video_manifest.jsonl`.
FFmpeg sampling caps the number of output frames based on the requested or actual cloud chunk duration. This prevents malformed or irregular Hik MP4 timestamps from making the `fps=1` filter duplicate tens of thousands of frames for a 10-minute chunk.
Cloud `--dry-run` stops at download-address planning: it requests addresses and writes `hik_cloud_download_manifest.jsonl`, but does not download video files, run ffprobe, sample frames, infer, or aggregate.
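The frame cap can be illustrated with a small calculation. This is a sketch under assumptions: the `+ 1` is inferred from the sampler test in this commit, which expects `-frames:v 601` for a 600-second requested window at `frame_fps: 1`; the real rounding rule and the helper name `frame_cap_args` are not taken from the implementation.

```python
import math


def frame_cap_args(begin_seconds, end_seconds, frame_fps):
    """Return FFmpeg arguments limiting output frames for one cloud chunk."""
    duration = end_seconds - begin_seconds
    if duration <= 0 or frame_fps <= 0:
        return []  # no usable duration: leave the command uncapped
    # One frame per sampling interval, plus one for the frame at offset zero.
    cap = math.floor(duration * frame_fps) + 1
    return ["-frames:v", str(cap)]
```

For a 10-minute chunk sampled at 1 fps this yields `["-frames:v", "601"]`, so irregular timestamps cannot produce more than 601 JPEG files.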
## Error Handling
- Missing token: fail fast with a clear config error in `hik_cloud` mode.
- Invalid range: fail fast if `end <= begin`.
- API code 80438027: record `no_recording`, continue.
- Other API non-zero code: record `address_failed`, continue other chunks.
- Download HTTP/IO failure: record `download_failed`, continue other chunks.
- Existing downloaded file with manifest status `downloaded`: skip on resume.
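The continue-on-failure rules above might be wired together roughly as follows. The function `run_chunks`, its callable parameters, and the row shape are hypothetical; the sketch only shows that one failed chunk is recorded and the loop moves on.

```python
def run_chunks(chunks, request_address, download, already_downloaded=frozenset()):
    """Process (begin, end) chunks, recording a status per chunk without aborting."""
    rows = []
    for begin, end in chunks:
        row = {"requested_begin": begin, "requested_end": end,
               "status": None, "last_error": None}
        if (begin, end) in already_downloaded:
            row["status"] = "downloaded"  # resume: skip work already done
        else:
            address = request_address(begin, end)
            if address["status"] != "address_ok":
                # Covers both no_recording and address_failed.
                row["status"] = address["status"]
                row["last_error"] = f"api code {address.get('code')}"
            else:
                try:
                    row["path"] = str(download(address["url"], begin, end))
                    row["status"] = "downloaded"
                except OSError as exc:  # urllib.error.URLError is an OSError subclass
                    row["status"] = "download_failed"
                    # Store only the exception type so signed URLs cannot leak.
                    row["last_error"] = type(exc).__name__
        rows.append(row)
    return rows
```

Missing tokens and invalid ranges are deliberately not handled here, since the design has them fail fast before any chunk is processed.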
## Testing
Use TDD with standard-library mocks:
- config parser loads `devices` as list of dicts.
- time parser accepts date-time strings and epoch integers.
- splitter produces max-3600-second chunks.
- API client builds correct URL, body, bearer header, and parses success/failure.
- downloader writes bytes and manifest without persisting token.
- CLI cloud mode uses downloaded files and keeps local mode unchanged.
The real Hik API smoke test uses the sensitive `access_token.md` file provided by the user on the remote test environment. Do not copy values from that file into docs, tests, logs, or final responses.

309
tests/test_aggregator.py Normal file

@@ -0,0 +1,309 @@
import json
import tempfile
import unittest
from datetime import datetime, timedelta
from pathlib import Path
from video_ai_analysis_poc.aggregator import aggregate_outputs
class AggregatorTests(unittest.TestCase):
def test_aggregates_video_results_folder_summary_and_merges_adjacent_events(self):
with tempfile.TemporaryDirectory() as tmp:
output_dir = Path(tmp)
video_a = {
"video_id": "video-a",
"path": "/videos/a.mp4",
"status": "probed",
"duration_seconds": 40.0,
"codec_name": "h264",
"width": 1920,
"height": 1080,
}
video_b = {
"video_id": "video-b",
"path": "/videos/b.mp4",
"status": "probe_failed",
"last_error": "bad file",
}
self._write_jsonl(output_dir / "video_manifest.jsonl", [video_a, video_b])
clips = [
self._clip("video-a", "video-a_c000001", 0.0, 10.0),
self._clip("video-a", "video-a_c000002", 12.0, 20.0),
self._clip("video-a", "video-a_c000003", 21.0, 30.0),
self._clip("video-b", "video-b_c000001", 0.0, 10.0),
]
self._write_jsonl(output_dir / "clip_manifest.jsonl", clips)
results = [
self._result(
"video-a",
"video-a_c000001",
"/videos/a.mp4",
0.0,
10.0,
"09:00:01",
[{"event_type": "queue_detected", "start_offset_seconds": 1.0, "end_offset_seconds": 10.0}],
),
self._result(
"video-a",
"video-a_c000002",
"/videos/a.mp4",
12.0,
20.0,
"09:00:13",
[{"event_type": "queue_detected", "start_offset_seconds": 12.0, "end_offset_seconds": 16.0}],
),
self._result(
"video-a",
"video-a_c000003",
"/videos/a.mp4",
21.0,
30.0,
"09:00:22",
[{"event_type": "staff_absent", "start_offset_seconds": 21.0, "end_offset_seconds": 25.0}],
),
{
"schema_version": "local-batch-v1",
"video_id": "video-b",
"video_path": "/videos/b.mp4",
"clip_id": "video-b_c000001",
"status": "inference_failed",
"monitoring_timeline": {
"video_start_time": None,
"clip_start_seconds": 0.0,
"clip_end_seconds": 10.0,
"frame_times": [],
"screen_time": "",
},
"events": [],
"raw_response": "",
"processing": {},
"error": "offline",
},
]
self._write_jsonl(output_dir / "clip_results.jsonl", results)
aggregate_outputs(
output_dir,
{
"input": {"dir": "/videos"},
"schema": {"version": "local-batch-v1", "merge_gap_seconds": 3},
"runtime": {"timezone": "Asia/Shanghai"},
},
)
video_result_path = output_dir / "videos" / "video-a" / "video_result.json"
self.assertTrue(video_result_path.exists())
video_result = json.loads(video_result_path.read_text(encoding="utf-8"))
self.assertEqual(video_result["schema_version"], "local-batch-v1")
self.assertEqual(video_result["video_id"], "video-a")
self.assertEqual(video_result["video_path"], "/videos/a.mp4")
self.assertEqual(video_result["probe"]["codec_name"], "h264")
self.assertIsNone(video_result["monitoring_timeline"]["video_start_time"])
self.assertEqual(video_result["monitoring_timeline"]["video_duration_seconds"], 40.0)
self.assertEqual(video_result["clip_count"], 3)
self.assertEqual(video_result["failed_clip_count"], 0)
self.assertEqual(video_result["event_counts"], {"queue_detected": 1, "staff_absent": 1})
self.assertEqual(len(video_result["events"]), 2)
merged = video_result["events"][0]
self.assertEqual(merged["event_type"], "queue_detected")
self.assertEqual(merged["start_offset_seconds"], 1.0)
self.assertEqual(merged["end_offset_seconds"], 16.0)
self.assertEqual(merged["screen_times"], ["09:00:01", "09:00:13"])
self.assertEqual(merged["evidence"]["clip_ids"], ["video-a_c000001", "video-a_c000002"])
self.assertEqual(
[
clip["clip_start_beijing_time"]
for clip in merged["evidence"]["clips"]
],
["2026-06-15 07:00:00", "2026-06-15 07:00:12"],
)
self.assertEqual(
[
clip["clip_end_beijing_time"]
for clip in merged["evidence"]["clips"]
],
["2026-06-15 07:00:10", "2026-06-15 07:00:20"],
)
self.assertEqual(video_result["outputs"]["clip_results_jsonl"], "clip_results.jsonl")
self.assertIn("started_at", video_result["processing"])
self.assertIn("finished_at", video_result["processing"])
failed_video_result = json.loads(
(output_dir / "videos" / "video-b" / "video_result.json").read_text(
encoding="utf-8"
)
)
self.assertEqual(failed_video_result["clip_count"], 1)
self.assertEqual(failed_video_result["failed_clip_count"], 1)
self.assertEqual(failed_video_result["event_counts"], {})
folder_summary = json.loads(
(output_dir / "folder_summary.json").read_text(encoding="utf-8")
)
self.assertEqual(folder_summary["schema_version"], "local-batch-v1")
self.assertEqual(folder_summary["input_dir"], "/videos")
self.assertEqual(folder_summary["video_count"], 2)
self.assertEqual(folder_summary["processed_video_count"], 1)
self.assertEqual(folder_summary["failed_video_count"], 1)
self.assertEqual(folder_summary["event_counts"], {"queue_detected": 1, "staff_absent": 1})
self.assertEqual(
[video["video_id"] for video in folder_summary["videos"]],
["video-a", "video-b"],
)
self.assertIn("processing", folder_summary)
def test_ffprobe_start_time_is_not_treated_as_monitoring_timeline_start(self):
with tempfile.TemporaryDirectory() as tmp:
output_dir = Path(tmp)
self._write_jsonl(
output_dir / "video_manifest.jsonl",
[
{
"video_id": "video-local",
"path": "/videos/local.mp4",
"status": "probed",
"duration_seconds": 12.0,
"start_time": 0.0,
}
],
)
self._write_jsonl(
output_dir / "clip_manifest.jsonl",
[self._clip("video-local", "video-local_c000001", 0.0, 10.0)],
)
self._write_jsonl(output_dir / "clip_results.jsonl", [])
aggregate_outputs(
output_dir,
{
"input": {"dir": "/videos"},
"schema": {"version": "local-batch-v1", "merge_gap_seconds": 3},
},
)
video_result = json.loads(
(output_dir / "videos" / "video-local" / "video_result.json").read_text(
encoding="utf-8"
)
)
self.assertEqual(video_result["probe"]["start_time"], 0.0)
self.assertIsNone(video_result["monitoring_timeline"]["video_start_time"])
def test_does_not_merge_different_event_types_videos_or_large_gaps(self):
with tempfile.TemporaryDirectory() as tmp:
output_dir = Path(tmp)
self._write_jsonl(
output_dir / "video_manifest.jsonl",
[
{"video_id": "video-a", "path": "/videos/a.mp4", "status": "probed"},
{"video_id": "video-b", "path": "/videos/b.mp4", "status": "probed"},
],
)
self._write_jsonl(
output_dir / "clip_manifest.jsonl",
[
self._clip("video-a", "a1", 0.0, 10.0),
self._clip("video-a", "a2", 40.0, 50.0),
self._clip("video-a", "a3", 51.0, 60.0),
self._clip("video-b", "b1", 0.0, 10.0),
],
)
self._write_jsonl(
output_dir / "clip_results.jsonl",
[
self._result("video-a", "a1", "/videos/a.mp4", 0.0, 10.0, "", [{"event_type": "queue_detected", "start_offset_seconds": 1.0, "end_offset_seconds": 5.0}]),
self._result("video-a", "a2", "/videos/a.mp4", 40.0, 50.0, "", [{"event_type": "queue_detected", "start_offset_seconds": 40.0, "end_offset_seconds": 45.0}]),
self._result("video-a", "a3", "/videos/a.mp4", 51.0, 60.0, "", [{"event_type": "staff_absent", "start_offset_seconds": 51.0, "end_offset_seconds": 55.0}]),
self._result("video-b", "b1", "/videos/b.mp4", 0.0, 10.0, "", [{"event_type": "queue_detected", "start_offset_seconds": 1.0, "end_offset_seconds": 5.0}]),
],
)
aggregate_outputs(
output_dir,
{
"input": {"dir": "/videos"},
"schema": {"version": "local-batch-v1", "merge_gap_seconds": 3},
},
)
video_a = json.loads(
(output_dir / "videos" / "video-a" / "video_result.json").read_text(
encoding="utf-8"
)
)
video_b = json.loads(
(output_dir / "videos" / "video-b" / "video_result.json").read_text(
encoding="utf-8"
)
)
self.assertEqual(len(video_a["events"]), 3)
self.assertEqual(video_a["event_counts"], {"queue_detected": 2, "staff_absent": 1})
self.assertEqual(len(video_b["events"]), 1)
self.assertEqual(video_b["event_counts"], {"queue_detected": 1})
def _clip(self, video_id, clip_id, start, end):
return {
"video_id": video_id,
"clip_id": clip_id,
"clip_start_seconds": start,
"clip_end_seconds": end,
"clip_start_timecode": "00:00:00",
"clip_end_timecode": "00:00:10",
"frame_times": [
{
"frame_path": f"frames/{video_id}/{clip_id}.jpg",
"offset_seconds": start,
"timecode": "00:00:00",
}
],
"status": "pending",
}
def _result(self, video_id, clip_id, video_path, start, end, screen_time, events):
base = datetime(2026, 6, 15, 7, 0, 0)
clip_start_beijing_time = (base + timedelta(seconds=start)).strftime(
"%Y-%m-%d %H:%M:%S"
)
clip_end_beijing_time = (base + timedelta(seconds=end)).strftime(
"%Y-%m-%d %H:%M:%S"
)
return {
"schema_version": "local-batch-v1",
"video_id": video_id,
"video_path": video_path,
"clip_id": clip_id,
"status": "ok",
"monitoring_timeline": {
"video_start_time": None,
"clip_start_seconds": start,
"clip_end_seconds": end,
"clip_start_timecode": "00:00:00",
"clip_end_timecode": "00:00:10",
"clip_start_beijing_time": clip_start_beijing_time,
"clip_end_beijing_time": clip_end_beijing_time,
"frame_times": [
{
"frame_path": f"frames/{video_id}/{clip_id}.jpg",
"offset_seconds": start,
"timecode": "00:00:00",
"beijing_time": clip_start_beijing_time,
}
],
"screen_time": screen_time,
},
"events": events,
"raw_response": "{}",
"processing": {},
"error": None,
}
def _write_jsonl(self, path, records):
path.write_text(
"".join(json.dumps(record, sort_keys=True) + "\n" for record in records),
encoding="utf-8",
)
if __name__ == "__main__":
unittest.main()

1275
tests/test_cli.py Normal file

File diff suppressed because it is too large.

167
tests/test_clips.py Normal file

@@ -0,0 +1,167 @@
import json
import tempfile
import unittest
from pathlib import Path
from video_ai_analysis_poc.clips import build_clip_records, build_clip_records_from_manifest
class ClipTests(unittest.TestCase):
def test_build_clip_records_uniformly_samples_frames_per_clip(self):
frames = [
{
"video_id": "video-abc",
"frame_id": f"video-abc_f{index + 1:06d}",
"frame_path": f"frames/video-abc/{index + 1:06d}.jpg",
"offset_seconds": float(index),
"timecode": f"00:00:{index:02d}",
"pts_time": float(index),
"status": "sampled",
}
for index in range(10)
]
clips = build_clip_records(
frames,
{
"length_seconds": 10,
"stride_seconds": 10,
"frames_per_clip": 4,
"min_frames_per_clip": 2,
},
)
self.assertEqual(len(clips), 1)
self.assertEqual(clips[0]["clip_id"], "video-abc_c000001")
self.assertEqual(clips[0]["clip_start_seconds"], 0.0)
self.assertEqual(clips[0]["clip_end_seconds"], 10.0)
self.assertEqual(
[frame["offset_seconds"] for frame in clips[0]["frame_times"]],
[0.0, 3.0, 6.0, 9.0],
)
self.assertEqual(clips[0]["status"], "pending")
self.assertEqual(clips[0]["retry_count"], 0)
self.assertIsNone(clips[0]["last_error"])
def test_tail_clip_end_is_truncated_to_last_frame_interval(self):
frames = [
{
"video_id": "video-abc",
"frame_id": f"video-abc_f{index + 1:06d}",
"frame_path": f"frames/video-abc/{index + 1:06d}.jpg",
"offset_seconds": float(index),
"timecode": f"00:00:{index:02d}",
"pts_time": float(index),
"status": "sampled",
}
for index in range(15)
]
clips = build_clip_records(
frames,
{
"length_seconds": 10,
"stride_seconds": 10,
"frames_per_clip": 8,
"min_frames_per_clip": 4,
},
)
self.assertEqual(len(clips), 2)
self.assertEqual(clips[1]["clip_start_seconds"], 10.0)
self.assertEqual(clips[1]["clip_end_seconds"], 15.0)
self.assertEqual(clips[1]["clip_end_timecode"], "00:00:15")
def test_build_clip_records_adds_beijing_time_range_and_frame_times(self):
frames = [
{
"video_id": "video-abc",
"frame_id": f"video-abc_f{index + 1:06d}",
"frame_path": f"frames/video-abc/{index + 1:06d}.jpg",
"offset_seconds": float(index),
"timecode": f"00:00:{index:02d}",
"pts_time": float(index),
"beijing_time": f"2026-06-15 07:00:{index:02d}",
"status": "sampled",
}
for index in range(10)
]
clips = build_clip_records(
frames,
{
"length_seconds": 10,
"stride_seconds": 10,
"frames_per_clip": 4,
"min_frames_per_clip": 2,
},
)
self.assertEqual(clips[0]["clip_start_beijing_time"], "2026-06-15 07:00:00")
self.assertEqual(clips[0]["clip_end_beijing_time"], "2026-06-15 07:00:10")
self.assertEqual(
[frame["beijing_time"] for frame in clips[0]["frame_times"]],
[
"2026-06-15 07:00:00",
"2026-06-15 07:00:03",
"2026-06-15 07:00:06",
"2026-06-15 07:00:09",
],
)
def test_build_clip_records_from_manifest_skips_failed_frames_and_writes_jsonl(self):
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
frame_manifest = root / "frame_manifest.jsonl"
clip_manifest = root / "clip_manifest.jsonl"
records = [
{
"video_id": "video-abc",
"frame_id": f"video-abc_f{index + 1:06d}",
"frame_path": f"frames/video-abc/{index + 1:06d}.jpg",
"offset_seconds": float(index),
"timecode": f"00:00:{index:02d}",
"pts_time": float(index),
"status": "sampled",
}
for index in range(4)
]
records.append(
{
"video_id": "video-abc",
"frame_id": None,
"frame_path": None,
"offset_seconds": None,
"timecode": None,
"pts_time": None,
"status": "sample_failed",
"last_error": "bad decode",
}
)
frame_manifest.write_text(
"\n".join(json.dumps(record, sort_keys=True) for record in records) + "\n",
encoding="utf-8",
)
clips = build_clip_records_from_manifest(
frame_manifest,
clip_manifest,
{
"length_seconds": 10,
"stride_seconds": 10,
"frames_per_clip": 8,
"min_frames_per_clip": 4,
},
)
self.assertEqual(len(clips), 1)
self.assertEqual(len(clips[0]["frame_times"]), 4)
persisted = [
json.loads(line)
for line in clip_manifest.read_text(encoding="utf-8").splitlines()
]
self.assertEqual(persisted, clips)
if __name__ == "__main__":
unittest.main()

240
tests/test_config.py Normal file

@@ -0,0 +1,240 @@
import tempfile
import unittest
from pathlib import Path
from video_ai_analysis_poc.config import load_config
class ConfigTests(unittest.TestCase):
def test_loads_local_batch_yaml_and_applies_cli_overrides(self):
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
input_dir = root / "videos"
output_dir = root / "out"
override_input = root / "override-videos"
override_output = root / "override-out"
input_dir.mkdir()
override_input.mkdir()
config_path = root / "local_batch.yaml"
config_path.write_text(
"\n".join(
[
"input:",
f" dir: {input_dir}",
" recursive: false",
' extensions: [".mp4", ".mov"]',
"output:",
f" dir: {output_dir}",
" overwrite: false",
"ffprobe:",
" timeout_seconds: 5",
]
),
encoding="utf-8",
)
config = load_config(
config_path,
input_dir=override_input,
output_dir=override_output,
)
self.assertEqual(config["input"]["dir"], str(override_input.resolve()))
self.assertEqual(config["output"]["dir"], str(override_output.resolve()))
self.assertFalse(config["input"]["recursive"])
self.assertEqual(config["input"]["extensions"], [".mp4", ".mov"])
self.assertEqual(config["ffprobe"]["timeout_seconds"], 5)
def test_rejects_output_dir_equal_to_input_dir(self):
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
input_dir = root / "videos"
input_dir.mkdir()
config_path = root / "local_batch.yaml"
config_path.write_text(
"\n".join(
[
"input:",
f" dir: {input_dir}",
"output:",
f" dir: {input_dir}",
]
),
encoding="utf-8",
)
with self.assertRaisesRegex(ValueError, "output dir must not equal input dir"):
load_config(config_path)
def test_rejects_output_dir_inside_reference_project(self):
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
input_dir = root / "videos"
input_dir.mkdir()
forbidden_output = (
Path("/Users/yoilun/AI-train/zhengxin-vlm-0413")
/ "outputs"
/ "local-batch"
)
config_path = root / "local_batch.yaml"
config_path.write_text(
"\n".join(
[
"input:",
f" dir: {input_dir}",
"output:",
f" dir: {forbidden_output}",
]
),
encoding="utf-8",
)
with self.assertRaisesRegex(
ValueError, "output dir must not be inside forbidden reference dir"
):
load_config(config_path)
def test_loads_nested_mapping_values(self):
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
input_dir = root / "videos"
output_dir = root / "output"
input_dir.mkdir()
config_path = root / "local_batch.yaml"
config_path.write_text(
"\n".join(
[
"input:",
f" dir: {input_dir}",
"output:",
f" dir: {output_dir}",
"ffmpeg:",
" codec_decoders:",
" h264: h264_cuvid",
" hevc: hevc_cuvid",
]
),
encoding="utf-8",
)
config = load_config(config_path)
self.assertEqual(
config["ffmpeg"]["codec_decoders"],
{"h264": "h264_cuvid", "hevc": "hevc_cuvid"},
)
def test_loads_prompt_block_scalar_values(self):
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
input_dir = root / "videos"
output_dir = root / "output"
input_dir.mkdir()
config_path = root / "local_batch.yaml"
config_path.write_text(
"\n".join(
[
"input:",
f" dir: {input_dir}",
"output:",
f" dir: {output_dir}",
"prompt:",
" system: >-",
" First instruction.",
" Second instruction.",
"",
" Final instruction.",
" user: 'Return strict JSON.'",
]
),
encoding="utf-8",
)
config = load_config(config_path)
self.assertEqual(
config["prompt"]["system"],
"First instruction.\nSecond instruction.\n\nFinal instruction.",
)
self.assertEqual(config["prompt"]["user"], "Return strict JSON.")
def test_defaults_source_mode_to_local_and_hik_cloud_section(self):
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
input_dir = root / "videos"
output_dir = root / "output"
input_dir.mkdir()
config_path = root / "local_batch.yaml"
config_path.write_text(
"\n".join(
[
"input:",
f" dir: {input_dir}",
"output:",
f" dir: {output_dir}",
]
),
encoding="utf-8",
)
config = load_config(config_path)
self.assertEqual(config["source"]["mode"], "local")
self.assertIn("devices", config["hik_cloud"])
self.assertIn("time_ranges", config["hik_cloud"])
def test_loads_hik_cloud_devices_and_time_ranges_as_list_of_mappings(self):
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
input_dir = root / "videos"
output_dir = root / "output"
input_dir.mkdir()
config_path = root / "local_batch.yaml"
config_path.write_text(
"\n".join(
[
"input:",
f" dir: {input_dir}",
"output:",
f" dir: {output_dir}",
"source:",
" mode: hik_cloud",
"hik_cloud:",
" devices:",
" - device_serial: EXAMPLE_DEVICE_SERIAL",
" channel_no: 1",
" name: front",
" time_ranges:",
' - begin: "2026-02-03 09:00:00"',
' end: "2026-02-03 10:30:00"',
]
),
encoding="utf-8",
)
config = load_config(config_path)
self.assertEqual(config["source"]["mode"], "hik_cloud")
self.assertEqual(
config["hik_cloud"]["devices"],
[
{
"device_serial": "EXAMPLE_DEVICE_SERIAL",
"channel_no": 1,
"name": "front",
}
],
)
self.assertEqual(
config["hik_cloud"]["time_ranges"],
[
{
"begin": "2026-02-03 09:00:00",
"end": "2026-02-03 10:30:00",
}
],
)
if __name__ == "__main__":
unittest.main()

41
tests/test_discovery.py Normal file

@@ -0,0 +1,41 @@
import tempfile
import unittest
from pathlib import Path

from video_ai_analysis_poc.discovery import discover_videos


class DiscoveryTests(unittest.TestCase):
    def test_discovers_supported_extensions_without_recursion(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            nested = root / "nested"
            nested.mkdir()
            supported = root / "a.MP4"
            unsupported = root / "notes.txt"
            nested_video = nested / "b.mov"
            supported.write_text("not a real video", encoding="utf-8")
            unsupported.write_text("ignore me", encoding="utf-8")
            nested_video.write_text("not a real video", encoding="utf-8")

            videos = discover_videos(root, [".mp4", ".mov"], recursive=False)

            self.assertEqual(videos, [supported])

    def test_discovers_supported_extensions_recursively_sorted(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            nested = root / "nested"
            nested.mkdir()
            first = root / "a.mp4"
            second = nested / "b.mov"
            first.write_text("x", encoding="utf-8")
            second.write_text("x", encoding="utf-8")

            videos = discover_videos(root, [".mp4", ".mov"], recursive=True)

            self.assertEqual(videos, [first, second])


if __name__ == "__main__":
    unittest.main()


@@ -0,0 +1,357 @@
import json
import subprocess
import tempfile
import unittest
from pathlib import Path
from unittest.mock import patch
from video_ai_analysis_poc.ffmpeg_sampler import (
build_sample_command,
sample_video_frames,
)
class FfmpegSamplerTests(unittest.TestCase):
def test_build_sample_command_uses_nvdec_decoder_for_h264(self):
with tempfile.TemporaryDirectory() as tmp:
output_dir = Path(tmp) / "output"
command = build_sample_command(
Path("/tmp/input.mp4"),
output_dir,
"video-abc",
{
"prefer_nvdec": True,
"allow_cpu_fallback": False,
"hwaccel": "cuda",
"codec_decoders": {"h264": "h264_cuvid", "hevc": "hevc_cuvid"},
"frame_fps": 1,
"frame_width": 640,
"jpeg_quality": 4,
},
codec_name="h264",
)
self.assertIn("-hwaccel", command)
self.assertIn("cuda", command)
self.assertIn("-c:v", command)
self.assertIn("h264_cuvid", command)
self.assertEqual(command[-1], str(output_dir / "frames" / "video-abc" / "%06d.jpg"))
def test_build_sample_command_uses_nvdec_decoder_for_hevc(self):
with tempfile.TemporaryDirectory() as tmp:
command = build_sample_command(
Path("/tmp/input.mp4"),
Path(tmp) / "output",
"video-abc",
{
"prefer_nvdec": True,
"allow_cpu_fallback": False,
"hwaccel": "cuda",
"codec_decoders": {"h264": "h264_cuvid", "hevc": "hevc_cuvid"},
"frame_fps": 1,
"frame_width": 640,
"jpeg_quality": 4,
},
codec_name="hevc",
)
self.assertIn("-hwaccel", command)
self.assertIn("cuda", command)
self.assertIn("-c:v", command)
self.assertIn("hevc_cuvid", command)
def test_build_sample_command_refuses_cpu_fallback_by_default(self):
with tempfile.TemporaryDirectory() as tmp:
with self.assertRaisesRegex(ValueError, "NVDEC decoder is required"):
build_sample_command(
Path("/tmp/input.mp4"),
Path(tmp),
"video-abc",
{
"prefer_nvdec": True,
"allow_cpu_fallback": False,
"codec_decoders": {"h264": "h264_cuvid", "hevc": "hevc_cuvid"},
},
codec_name="vp9",
)
def test_sample_video_frames_writes_structured_failure_record(self):
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
manifest_path = root / "frame_manifest.jsonl"
failure = subprocess.CalledProcessError(
returncode=1,
cmd=["ffmpeg"],
stderr="No decoder h264_cuvid",
)
with patch("subprocess.run", side_effect=failure):
records = sample_video_frames(
{
"video_id": "video-abc",
"path": str(root / "input.mp4"),
"codec_name": "h264",
},
root,
{
"prefer_nvdec": True,
"allow_cpu_fallback": False,
"hwaccel": "cuda",
"codec_decoders": {"h264": "h264_cuvid"},
"frame_fps": 1,
"frame_width": 640,
"jpeg_quality": 4,
"timeout_seconds_per_video": 30,
},
manifest_path=manifest_path,
)
self.assertEqual(len(records), 1)
self.assertEqual(records[0]["video_id"], "video-abc")
self.assertEqual(records[0]["status"], "sample_failed")
self.assertIn("h264_cuvid", records[0]["last_error"])
persisted = [
json.loads(line)
for line in manifest_path.read_text(encoding="utf-8").splitlines()
]
self.assertEqual(persisted, records)
def test_sample_video_frames_persists_success_nvdec_evidence(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            manifest_path = root / "frame_manifest.jsonl"
            video_id = "video-abc"
            frame_dir = root / "frames" / video_id

            def run_success(*args, **kwargs):
                frame_dir.mkdir(parents=True, exist_ok=True)
                (frame_dir / "000001.jpg").write_bytes(b"jpg")
                return subprocess.CompletedProcess(
                    args=args[0],
                    returncode=0,
                    stdout="",
                    stderr="Using decoder h264_cuvid with hwaccel cuda",
                )

            with patch("subprocess.run", side_effect=run_success):
                records = sample_video_frames(
                    {
                        "video_id": video_id,
                        "path": str(root / "input.mp4"),
                        "codec_name": "h264",
                    },
                    root,
                    {
                        "prefer_nvdec": True,
                        "allow_cpu_fallback": False,
                        "hwaccel": "cuda",
                        "codec_decoders": {"h264": "h264_cuvid"},
                        "frame_fps": 1,
                        "frame_width": 640,
                        "jpeg_quality": 4,
                        "timeout_seconds_per_video": 30,
                    },
                    manifest_path=manifest_path,
                )

            self.assertEqual(records[0]["status"], "sampled")
            self.assertEqual(records[0]["decoder"], "h264_cuvid")
            self.assertEqual(records[0]["hwaccel"], "cuda")
            self.assertIn("h264_cuvid", records[0]["ffmpeg_command"])
            self.assertIn("Using decoder h264_cuvid", records[0]["stderr_summary"])
            persisted = [
                json.loads(line)
                for line in manifest_path.read_text(encoding="utf-8").splitlines()
            ]
            self.assertEqual(persisted, records)

    def test_sample_video_frames_adds_beijing_time_from_hik_actual_begin(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            video_id = "video-abc"
            frame_dir = root / "frames" / video_id

            def run_success(command, *args, **kwargs):
                frame_dir.mkdir(parents=True, exist_ok=True)
                (frame_dir / "000001.jpg").write_bytes(b"jpg")
                (frame_dir / "000002.jpg").write_bytes(b"jpg")
                return subprocess.CompletedProcess(
                    args=command,
                    returncode=0,
                    stdout="",
                    stderr="",
                )

            with patch("subprocess.run", side_effect=run_success):
                records = sample_video_frames(
                    {
                        "video_id": video_id,
                        "path": str(root / "input.mp4"),
                        "codec_name": "h264",
                        "actual_begin": 1781478000,
                        "actual_end": 1781478600,
                    },
                    root,
                    {
                        "prefer_nvdec": True,
                        "allow_cpu_fallback": False,
                        "hwaccel": "cuda",
                        "codec_decoders": {"h264": "h264_cuvid"},
                        "frame_fps": 1,
                        "frame_width": 640,
                        "jpeg_quality": 4,
                        "timeout_seconds_per_video": 30,
                        "timezone": "Asia/Shanghai",
                    },
                )

            self.assertEqual(records[0]["beijing_time"], "2026-06-15 07:00:00")
            self.assertEqual(records[1]["beijing_time"], "2026-06-15 07:00:01")

    def test_sample_video_frames_caps_output_frames_to_requested_duration(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            video_id = "video-abc"
            frame_dir = root / "frames" / video_id
            captured_command = []

            def run_success(command, *args, **kwargs):
                captured_command.extend(command)
                frame_dir.mkdir(parents=True, exist_ok=True)
                (frame_dir / "000001.jpg").write_bytes(b"jpg")
                return subprocess.CompletedProcess(
                    args=command,
                    returncode=0,
                    stdout="",
                    stderr="",
                )

            with patch("subprocess.run", side_effect=run_success):
                sample_video_frames(
                    {
                        "video_id": video_id,
                        "path": str(root / "input.mp4"),
                        "codec_name": "hevc",
                        "requested_begin": 1000,
                        "requested_end": 1600,
                    },
                    root,
                    {
                        "prefer_nvdec": True,
                        "allow_cpu_fallback": False,
                        "hwaccel": "cuda",
                        "codec_decoders": {"hevc": "hevc_cuvid"},
                        "frame_fps": 1,
                        "frame_width": 640,
                        "jpeg_quality": 4,
                        "timeout_seconds_per_video": 30,
                    },
                )

            self.assertIn("-frames:v", captured_command)
            frames_flag_index = captured_command.index("-frames:v")
            self.assertEqual(captured_command[frames_flag_index + 1], "601")

    def test_sample_video_frames_limits_decode_window_to_requested_duration(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            video_id = "video-abc"
            frame_dir = root / "frames" / video_id
            captured_command = []

            def run_success(command, *args, **kwargs):
                captured_command.extend(command)
                frame_dir.mkdir(parents=True, exist_ok=True)
                (frame_dir / "000001.jpg").write_bytes(b"jpg")
                return subprocess.CompletedProcess(
                    args=command,
                    returncode=0,
                    stdout="",
                    stderr="",
                )

            with patch("subprocess.run", side_effect=run_success):
                sample_video_frames(
                    {
                        "video_id": video_id,
                        "path": str(root / "input.mp4"),
                        "codec_name": "hevc",
                        "requested_begin": 1000,
                        "requested_end": 1600,
                        "duration_seconds": 104259.921,
                    },
                    root,
                    {
                        "prefer_nvdec": True,
                        "allow_cpu_fallback": False,
                        "hwaccel": "cuda",
                        "codec_decoders": {"hevc": "hevc_cuvid"},
                        "frame_fps": 1,
                        "frame_width": 640,
                        "jpeg_quality": 4,
                        "timeout_seconds_per_video": 30,
                    },
                )

            self.assertIn("-t", captured_command)
            input_index = captured_command.index("-i")
            t_flag_index = captured_command.index("-t")
            vf_index = captured_command.index("-vf")
            self.assertLess(input_index, t_flag_index)
            self.assertLess(t_flag_index, vf_index)
            self.assertEqual(captured_command[t_flag_index + 1], "600")

    def test_sample_video_frames_uses_complete_frames_when_ffmpeg_exits_nonzero(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            video_id = "video-abc"
            frame_dir = root / "frames" / video_id
            manifest_path = root / "frame_manifest.jsonl"

            def run_with_nonzero_exit(command, *args, **kwargs):
                frame_dir.mkdir(parents=True, exist_ok=True)
                for index in range(1, 602):
                    (frame_dir / f"{index:06d}.jpg").write_bytes(b"jpg")
                raise subprocess.CalledProcessError(
                    returncode=1,
                    cmd=command,
                    stderr="trailing decoder error after requested frames",
                )

            with patch("subprocess.run", side_effect=run_with_nonzero_exit):
                records = sample_video_frames(
                    {
                        "video_id": video_id,
                        "path": str(root / "input.mp4"),
                        "codec_name": "hevc",
                        "requested_begin": 1000,
                        "requested_end": 1600,
                    },
                    root,
                    {
                        "prefer_nvdec": True,
                        "allow_cpu_fallback": False,
                        "hwaccel": "cuda",
                        "codec_decoders": {"hevc": "hevc_cuvid"},
                        "frame_fps": 1,
                        "frame_width": 640,
                        "jpeg_quality": 4,
                        "timeout_seconds_per_video": 30,
                    },
                    manifest_path=manifest_path,
                )

            self.assertEqual(len(records), 601)
            self.assertEqual({record["status"] for record in records}, {"sampled"})
            self.assertIn("-t", records[0]["ffmpeg_command"])
            self.assertIn("trailing decoder error", records[0]["stderr_summary"])
            persisted = [
                json.loads(line)
                for line in manifest_path.read_text(encoding="utf-8").splitlines()
            ]
            self.assertEqual(persisted, records)


if __name__ == "__main__":
    unittest.main()
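
The sampler tests above constrain the shape of the ffmpeg command: the decoder and hwaccel appear in it, `-t` sits between `-i` and `-vf`, and a 600-second window at 1 fps is capped with `-frames:v 601`. A minimal sketch of a command builder that would satisfy those assertions follows. The helper name `build_sampling_command`, the `fps=...,scale=...` filter string, and the `+ 1` frame cap are assumptions inferred from the expected values in the tests, not the project's actual `sample_video_frames` implementation.

```python
from pathlib import Path


def build_sampling_command(video, frame_dir, sampling):
    """Hypothetical helper: assemble an ffmpeg argv list for NVDEC frame sampling."""
    decoder = sampling["codec_decoders"][video["codec_name"]]
    command = [
        "ffmpeg", "-hide_banner", "-y",
        "-hwaccel", sampling["hwaccel"],  # input option, must precede -i
        "-c:v", decoder,                  # e.g. h264_cuvid or hevc_cuvid
        "-i", video["path"],
    ]

    begin = video.get("requested_begin")
    end = video.get("requested_end")
    window = None if begin is None or end is None else int(end - begin)

    # As an output option, -t limits how much of the stream is written out.
    if window is not None:
        command += ["-t", str(window)]

    command += [
        "-vf", f"fps={sampling['frame_fps']},scale={sampling['frame_width']}:-2",
        "-q:v", str(sampling["jpeg_quality"]),
    ]

    # Assumed cap: one frame per sampled second plus the frame at offset 0.
    if window is not None:
        frame_cap = int(window * sampling["frame_fps"]) + 1
        command += ["-frames:v", str(frame_cap)]

    command.append(str(Path(frame_dir) / "%06d.jpg"))
    return command
```

Returning the argv as a list rather than a shell string keeps paths with spaces safe when the list is handed to `subprocess.run`.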

61
tests/test_frames.py Normal file

@@ -0,0 +1,61 @@
import tempfile
import unittest
from pathlib import Path

from video_ai_analysis_poc.frames import build_frame_records, seconds_to_timecode


class FrameTests(unittest.TestCase):
    def test_seconds_to_timecode_formats_relative_offsets(self):
        self.assertEqual(seconds_to_timecode(0), "00:00:00")
        self.assertEqual(seconds_to_timecode(65.2), "00:01:05")
        self.assertEqual(seconds_to_timecode(3661), "01:01:01")

    def test_build_frame_records_uses_stable_paths_and_offsets(self):
        with tempfile.TemporaryDirectory() as tmp:
            frame_dir = Path(tmp) / "frames" / "video-abc"
            frame_dir.mkdir(parents=True)
            first = frame_dir / "000001.jpg"
            second = frame_dir / "000002.jpg"
            first.write_bytes(b"jpg")
            second.write_bytes(b"jpg")

            records = build_frame_records(
                "video-abc",
                Path(tmp),
                [first, second],
                frame_fps=1,
            )

        self.assertEqual(records[0]["frame_id"], "video-abc_f000001")
        self.assertEqual(records[0]["frame_path"], "frames/video-abc/000001.jpg")
        self.assertEqual(records[0]["offset_seconds"], 0.0)
        self.assertEqual(records[0]["timecode"], "00:00:00")
        self.assertEqual(records[0]["pts_time"], 0.0)
        self.assertEqual(records[0]["status"], "sampled")
        self.assertEqual(records[1]["offset_seconds"], 1.0)

    def test_build_frame_records_adds_beijing_time_from_timeline_epoch(self):
        with tempfile.TemporaryDirectory() as tmp:
            frame_dir = Path(tmp) / "frames" / "video-abc"
            frame_dir.mkdir(parents=True)
            first = frame_dir / "000001.jpg"
            second = frame_dir / "000002.jpg"
            first.write_bytes(b"jpg")
            second.write_bytes(b"jpg")

            records = build_frame_records(
                "video-abc",
                Path(tmp),
                [first, second],
                frame_fps=1,
                timeline_start_epoch=1781478000,
                timezone_name="Asia/Shanghai",
            )

        self.assertEqual(records[0]["beijing_time"], "2026-06-15 07:00:00")
        self.assertEqual(records[1]["beijing_time"], "2026-06-15 07:00:01")


if __name__ == "__main__":
    unittest.main()
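
The frame tests above fix two conversions: a relative offset in seconds becomes an `HH:MM:SS` timecode with the fractional part truncated, and a timeline epoch plus a frame offset becomes a Beijing wall-clock string. A minimal sketch of both is given below. The function `frame_beijing_time` and the fixed UTC+8 offset are assumptions made for a dependency-free illustration; the project code takes a `timezone_name` such as `Asia/Shanghai` and may resolve it differently.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Beijing time modelled as a fixed UTC+8 offset (no DST is observed).
BEIJING = timezone(timedelta(hours=8))


def seconds_to_timecode(seconds):
    """Format a relative offset as HH:MM:SS, truncating fractional seconds."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def frame_beijing_time(timeline_start_epoch, frame_index, frame_fps):
    """Hypothetical helper: wall-clock time of the frame at a zero-based index."""
    offset_seconds = frame_index / frame_fps
    moment = datetime.fromtimestamp(timeline_start_epoch + offset_seconds, tz=BEIJING)
    return moment.strftime("%Y-%m-%d %H:%M:%S")
```

With `timeline_start_epoch=1781478000` and `frame_fps=1`, frame index 0 maps to `2026-06-15 07:00:00`, which is the value the tests above expect for the first record.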

554
tests/test_hik_cloud.py Normal file

@@ -0,0 +1,554 @@
import os
import tempfile
import unittest
from datetime import datetime
from pathlib import Path
from unittest.mock import patch
from zoneinfo import ZoneInfo

from video_ai_analysis_poc import hik_cloud
from video_ai_analysis_poc.hik_cloud import (
    build_download_chunks,
    request_download_address,
    resolve_access_token,
)
from video_ai_analysis_poc.manifest import read_jsonl, write_manifest


class HikCloudTests(unittest.TestCase):
    def test_build_download_chunks_defaults_to_600_second_chunks(self):
        config = {
            "runtime": {"timezone": "Asia/Shanghai"},
            "hik_cloud": {
                "devices": [
                    {
                        "device_serial": "EXAMPLE_DEVICE_SERIAL",
                        "channel_no": 1,
                        "name": "front",
                    }
                ],
                "time_ranges": [
                    {
                        "begin": "2026-02-03 09:00:00",
                        "end": "2026-02-03 10:30:00",
                    }
                ],
            },
        }

        chunks = build_download_chunks(config)

        requested_begin = int(
            datetime(2026, 2, 3, 9, 0, 0, tzinfo=ZoneInfo("Asia/Shanghai")).timestamp()
        )
        requested_end = int(
            datetime(2026, 2, 3, 10, 30, 0, tzinfo=ZoneInfo("Asia/Shanghai")).timestamp()
        )
        self.assertEqual(len(chunks), 9)
        self.assertEqual(chunks[0]["time_begin"], requested_begin)
        self.assertEqual(chunks[0]["time_end"], requested_begin + 600)
        self.assertEqual(chunks[-1]["time_begin"], requested_begin + 4800)
        self.assertEqual(chunks[-1]["time_end"], requested_end)
        for chunk in chunks:
            self.assertLessEqual(chunk["time_end"] - chunk["time_begin"], 600)

    def test_build_download_chunks_allows_explicit_3600_second_chunks(self):
        config = {
            "runtime": {"timezone": "Asia/Shanghai"},
            "hik_cloud": {
                "chunk_seconds": 3600,
                "devices": [{"device_serial": "EXAMPLE_DEVICE_SERIAL", "channel_no": 1}],
                "time_ranges": [
                    {
                        "begin": "2026-02-03 09:00:00",
                        "end": "2026-02-03 10:30:00",
                    }
                ],
            },
        }

        chunks = build_download_chunks(config)

        requested_begin = int(
            datetime(2026, 2, 3, 9, 0, 0, tzinfo=ZoneInfo("Asia/Shanghai")).timestamp()
        )
        requested_end = int(
            datetime(2026, 2, 3, 10, 30, 0, tzinfo=ZoneInfo("Asia/Shanghai")).timestamp()
        )
        self.assertEqual(len(chunks), 2)
        self.assertEqual(chunks[0]["time_begin"], requested_begin)
        self.assertEqual(chunks[0]["time_end"], requested_begin + 3600)
        self.assertEqual(chunks[1]["time_begin"], requested_begin + 3600)
        self.assertEqual(chunks[1]["time_end"], requested_end)
        for chunk in chunks:
            self.assertLessEqual(chunk["time_end"] - chunk["time_begin"], 3600)

    def test_build_download_chunks_accepts_epoch_time_ranges(self):
        config = {
            "hik_cloud": {
                "devices": [{"device_serial": "EXAMPLE_DEVICE_SERIAL", "channel_no": 1}],
                "time_ranges": [{"begin": 1770080400, "end": 1770084000.0}],
            }
        }

        chunks = build_download_chunks(config)

        self.assertEqual(len(chunks), 6)
        self.assertEqual(chunks[0]["time_begin"], 1770080400)
        self.assertEqual(chunks[0]["time_end"], 1770081000)
        self.assertEqual(chunks[-1]["time_begin"], 1770083400)
        self.assertEqual(chunks[-1]["time_end"], 1770084000)

    def test_build_download_chunks_rejects_end_before_begin(self):
        config = {
            "hik_cloud": {
                "devices": [{"device_serial": "EXAMPLE_DEVICE_SERIAL", "channel_no": 1}],
                "time_ranges": [
                    {
                        "begin": "2026-02-03 10:30:00",
                        "end": "2026-02-03 09:00:00",
                    }
                ],
            },
        }

        with self.assertRaisesRegex(ValueError, "end must be after begin"):
            build_download_chunks(config)

    def test_build_download_chunks_rejects_chunk_seconds_over_3600(self):
        config = {
            "hik_cloud": {
                "chunk_seconds": 7200,
                "devices": [{"device_serial": "EXAMPLE_DEVICE_SERIAL", "channel_no": 1}],
                "time_ranges": [
                    {
                        "begin": "2026-02-03 09:00:00",
                        "end": "2026-02-03 11:30:00",
                    }
                ],
            },
        }

        with self.assertRaisesRegex(
            ValueError, "chunk_seconds must be less than or equal to 3600"
        ):
            build_download_chunks(config)

    def test_resolve_access_token_prefers_literal_token_over_environment(self):
        config = {
            "hik_cloud": {
                "access_token": "DIRECT_TOKEN",
                "access_token_env": "HIK_CLOUD_ACCESS_TOKEN",
            }
        }

        with patch.dict(os.environ, {"HIK_CLOUD_ACCESS_TOKEN": "ENV_TOKEN"}):
            token = resolve_access_token(config)

        self.assertEqual(token, "DIRECT_TOKEN")

    def test_resolve_access_token_reads_configured_environment_variable(self):
        hik_config = {"access_token_env": "HIK_CLOUD_ACCESS_TOKEN"}

        with patch.dict(os.environ, {"HIK_CLOUD_ACCESS_TOKEN": "ENV_TOKEN"}):
            token = resolve_access_token(hik_config)

        self.assertEqual(token, "ENV_TOKEN")

    def test_resolve_access_token_raises_without_leaking_secret_values(self):
        hik_config = {"access_token_env": "HIK_CLOUD_ACCESS_TOKEN"}

        with patch.dict(os.environ, {}, clear=True):
            with self.assertRaises(ValueError) as raised:
                resolve_access_token(hik_config)

        message = str(raised.exception)
        self.assertIn("access_token", message)
        self.assertNotIn("TOKEN", message)

    def test_request_download_address_posts_expected_request_and_returns_success(self):
        chunk = {
            "device_serial": "EXAMPLE_DEVICE_SERIAL",
            "channel_no": 1,
            "requested_begin": 1764856787,
            "requested_end": 1764856978,
            "time_begin": 1764856787,
            "time_end": 1764856978,
        }
        hik_config = {
            "api_base_url": "https://api2.hik-cloud.com/",
            "download_path": "/v1/carrier/cstorage/open/play/download",
            "access_token": "TOKEN",
            "timeout_seconds": 12,
        }
        calls = []

        def fake_http_post(url, json_body, headers, timeout_seconds):
            calls.append(
                {
                    "url": url,
                    "json_body": json_body,
                    "headers": headers,
                    "timeout_seconds": timeout_seconds,
                }
            )
            return {
                "code": 0,
                "success": True,
                "data": {
                    "url": "https://download.example/video.mp4?sig=abc",
                    "actualBeginTime": "1764856787",
                    "actualEndTime": "1764856978",
                },
            }

        result = request_download_address(chunk, hik_config, http_post=fake_http_post)

        self.assertEqual(len(calls), 1)
        self.assertEqual(
            calls[0]["url"],
            "https://api2.hik-cloud.com/v1/carrier/cstorage/open/play/download",
        )
        self.assertEqual(calls[0]["headers"]["Authorization"], "bearer TOKEN")
        self.assertEqual(calls[0]["headers"]["Content-Type"], "application/json")
        self.assertEqual(
            calls[0]["json_body"],
            {
                "deviceSerial": "EXAMPLE_DEVICE_SERIAL",
                "channelNo": 1,
                "timeBegin": 1764856787,
                "timeEnd": 1764856978,
            },
        )
        self.assertEqual(calls[0]["timeout_seconds"], 12)
        self.assertEqual(result["status"], "address_ok")
        self.assertEqual(result["url"], "https://download.example/video.mp4?sig=abc")
        self.assertEqual(result["actual_begin"], 1764856787)
        self.assertEqual(result["actual_end"], 1764856978)
        self.assertEqual(result["device_serial"], "EXAMPLE_DEVICE_SERIAL")
        self.assertEqual(result["channel_no"], 1)
        self.assertEqual(result["requested_begin"], 1764856787)
        self.assertEqual(result["requested_end"], 1764856978)

    def test_request_download_address_returns_no_recording_for_known_empty_code(self):
        chunk = {
            "device_serial": "EXAMPLE_DEVICE_SERIAL",
            "channel_no": 1,
            "requested_begin": 1764856787,
            "requested_end": 1764856978,
            "time_begin": 1764856787,
            "time_end": 1764856978,
        }
        hik_config = {
            "api_base_url": "https://api2.hik-cloud.com",
            "download_path": "/v1/carrier/cstorage/open/play/download",
            "access_token": "TOKEN",
        }

        def fake_http_post(url, json_body, headers, timeout_seconds):
            return {"code": 80438027, "msg": "no recording"}

        result = request_download_address(chunk, hik_config, http_post=fake_http_post)

        self.assertEqual(result["status"], "no_recording")
        self.assertEqual(result["code"], 80438027)
        self.assertEqual(result["device_serial"], "EXAMPLE_DEVICE_SERIAL")
        self.assertNotIn("url", result)

    def test_request_download_address_returns_sanitized_failure_for_other_codes(self):
        chunk = {
            "device_serial": "EXAMPLE_DEVICE_SERIAL",
            "channel_no": 1,
            "requested_begin": 1764856787,
            "requested_end": 1764856978,
            "time_begin": 1764856787,
            "time_end": 1764856978,
        }
        hik_config = {
            "api_base_url": "https://api2.hik-cloud.com",
            "download_path": "/v1/carrier/cstorage/open/play/download",
            "access_token": "TOKEN",
        }

        def fake_http_post(url, json_body, headers, timeout_seconds):
            return {"code": 80430002, "msg": "bad TOKEN Authorization request"}

        result = request_download_address(chunk, hik_config, http_post=fake_http_post)

        self.assertEqual(result["status"], "address_failed")
        self.assertEqual(result["code"], 80430002)
        self.assertIn("last_error", result)
        self.assertNotIn("TOKEN", str(result))
        self.assertNotIn("Authorization", str(result))

    def test_download_hik_cloud_recordings_writes_file_records_and_manifest(self):
        with tempfile.TemporaryDirectory() as tmp:
            output_dir = Path(tmp)
            config = _download_config()
            address_calls = []
            download_calls = []

            def fake_address_client(chunk, hik_config):
                address_calls.append((chunk, hik_config))
                return {
                    **chunk,
                    "status": "address_ok",
                    "url": (
                        "https://download.example/video.mp4?"
                        "sign=SECRET&sig=SECRET&TOKEN=SECRET"
                    ),
                    "actual_begin": chunk["time_begin"] + 1,
                    "actual_end": chunk["time_end"] - 1,
                }

            def fake_download_url(url, timeout_seconds=None):
                download_calls.append((url, timeout_seconds))
                return b"fake mp4 bytes"

            records = hik_cloud.download_hik_cloud_recordings(
                config,
                output_dir,
                address_client=fake_address_client,
                download_url=fake_download_url,
            )

            self.assertEqual(len(address_calls), 1)
            self.assertEqual(len(download_calls), 1)
            self.assertEqual(download_calls[0][1], 600)
            expected_path = (
                output_dir
                / "downloads"
                / "hik_cloud"
                / "EXAMPLE_DEVICE_SERIAL"
                / "ch1"
                / "EXAMPLE_DEVICE_SERIAL_ch1_1764856787_1764856978.mp4"
            ).resolve(strict=False)
            self.assertEqual(expected_path.read_bytes(), b"fake mp4 bytes")
            self.assertEqual(len(records), 1)
            self.assertEqual(records[0]["path"], str(expected_path))
            self.assertEqual(records[0]["source"], "hik_cloud")
            self.assertEqual(records[0]["source_path"], "hik_cloud://EXAMPLE_DEVICE_SERIAL/ch1/1764856787-1764856978")
            self.assertEqual(records[0]["device_serial"], "EXAMPLE_DEVICE_SERIAL")
            self.assertEqual(records[0]["channel_no"], 1)
            self.assertEqual(records[0]["requested_begin"], 1764856787)
            self.assertEqual(records[0]["requested_end"], 1764856978)
            self.assertEqual(records[0]["actual_begin"], 1764856788)
            self.assertEqual(records[0]["actual_end"], 1764856977)
            self.assertEqual(records[0]["status"], "downloaded")

            manifest = read_jsonl(output_dir / "hik_cloud_download_manifest.jsonl")
            self.assertEqual(len(manifest), 1)
            self.assertEqual(manifest[0]["status"], "downloaded")
            self.assertIsNone(manifest[0]["last_error"])
            self.assertEqual(manifest[0]["download_url_host"], "download.example")
            self.assertEqual(manifest[0]["path"], str(expected_path))
            serialized_path = expected_path.name
            serialized_manifest = str(manifest)
            self.assertNotIn("sign=", serialized_path)
            self.assertNotIn("sig=", serialized_path)
            self.assertNotIn("TOKEN", serialized_path)
            self.assertNotIn("sign=", serialized_manifest)
            self.assertNotIn("sig=", serialized_manifest)
            self.assertNotIn("TOKEN", serialized_manifest)

    def test_download_hik_cloud_recordings_can_plan_without_downloading(self):
        with tempfile.TemporaryDirectory() as tmp:
            output_dir = Path(tmp)
            config = _download_config()
            download_calls = []

            def fake_address_client(chunk, hik_config):
                return {
                    **chunk,
                    "status": "address_ok",
                    "url": (
                        "https://download.example/video.mp4?"
                        "sign=SECRET&sig=SECRET&TOKEN=SECRET"
                    ),
                    "actual_begin": chunk["time_begin"],
                    "actual_end": chunk["time_end"],
                }

            def fake_download_url(url, timeout_seconds=None):
                download_calls.append(url)
                return b"unexpected"

            records = hik_cloud.download_hik_cloud_recordings(
                config,
                output_dir,
                address_client=fake_address_client,
                download_url=fake_download_url,
                download=False,
            )

            self.assertEqual(records, [])
            self.assertEqual(download_calls, [])
            manifest = read_jsonl(output_dir / "hik_cloud_download_manifest.jsonl")
            self.assertEqual(len(manifest), 1)
            self.assertEqual(manifest[0]["status"], "address_ok")
            self.assertIsNone(manifest[0]["path"])
            self.assertEqual(manifest[0]["download_url_host"], "download.example")
            self.assertNotIn("sign=", str(manifest))
            self.assertNotIn("sig=", str(manifest))
            self.assertNotIn("TOKEN", str(manifest))

    def test_download_hik_cloud_recordings_records_empty_and_address_failures(self):
        with tempfile.TemporaryDirectory() as tmp:
            output_dir = Path(tmp)
            config = _download_config(
                time_ranges=[
                    {"begin": 1764856787, "end": 1764856978},
                    {"begin": 1764857000, "end": 1764857100},
                ]
            )
            statuses = ["no_recording", "address_failed"]
            download_calls = []

            def fake_address_client(chunk, hik_config):
                status = statuses.pop(0)
                return {
                    **chunk,
                    "status": status,
                    "actual_begin": None,
                    "actual_end": None,
                    "last_error": None if status == "no_recording" else "api failed",
                }

            def fake_download_url(url, timeout_seconds=None):
                download_calls.append(url)
                return b"unexpected"

            records = hik_cloud.download_hik_cloud_recordings(
                config,
                output_dir,
                address_client=fake_address_client,
                download_url=fake_download_url,
            )

            self.assertEqual(records, [])
            self.assertEqual(download_calls, [])
            manifest = read_jsonl(output_dir / "hik_cloud_download_manifest.jsonl")
            self.assertEqual([record["status"] for record in manifest], ["no_recording", "address_failed"])

    def test_download_hik_cloud_recordings_records_download_failure_and_continues(self):
        with tempfile.TemporaryDirectory() as tmp:
            output_dir = Path(tmp)
            config = _download_config(
                time_ranges=[
                    {"begin": 1764856787, "end": 1764856978},
                    {"begin": 1764857000, "end": 1764857100},
                ]
            )
            download_calls = []

            def fake_address_client(chunk, hik_config):
                return {
                    **chunk,
                    "status": "address_ok",
                    "url": (
                        "https://download.example/video.mp4?"
                        "sign=SECRET&sig=SECRET&TOKEN=SECRET"
                    ),
                    "actual_begin": chunk["time_begin"],
                    "actual_end": chunk["time_end"],
                }

            def fake_download_url(url, timeout_seconds=None):
                download_calls.append(url)
                if len(download_calls) == 1:
                    raise RuntimeError(
                        "download failed for query sign=SECRET&sig=SECRET&TOKEN=SECRET"
                    )
                return b"second chunk"

            records = hik_cloud.download_hik_cloud_recordings(
                config,
                output_dir,
                address_client=fake_address_client,
                download_url=fake_download_url,
            )

            self.assertEqual(len(download_calls), 2)
            self.assertEqual(len(records), 1)
            self.assertEqual(records[0]["status"], "downloaded")
            manifest = read_jsonl(output_dir / "hik_cloud_download_manifest.jsonl")
            self.assertEqual([record["status"] for record in manifest], ["download_failed", "downloaded"])
            self.assertIn("last_error", manifest[0])
            self.assertNotIn("sign=", str(manifest))
            self.assertNotIn("sig=", str(manifest))
            self.assertNotIn("TOKEN", str(manifest))
            self.assertNotIn("SECRET", str(manifest))

    def test_download_hik_cloud_recordings_resume_skips_existing_downloaded_file(self):
        with tempfile.TemporaryDirectory() as tmp:
            output_dir = Path(tmp)
            config = _download_config(resume=True)
            downloaded_path = (
                output_dir
                / "downloads"
                / "hik_cloud"
                / "EXAMPLE_DEVICE_SERIAL"
                / "ch1"
                / "EXAMPLE_DEVICE_SERIAL_ch1_1764856787_1764856978.mp4"
            )
            downloaded_path.parent.mkdir(parents=True, exist_ok=True)
            downloaded_path.write_bytes(b"existing")
            existing_record = {
                "source": "hik_cloud",
                "path": str(downloaded_path),
                "device_serial": "EXAMPLE_DEVICE_SERIAL",
                "channel_no": 1,
                "requested_begin": 1764856787,
                "requested_end": 1764856978,
                "actual_begin": 1764856787,
                "actual_end": 1764856978,
                "status": "downloaded",
                "retry_count": 0,
                "last_error": None,
            }
            write_manifest(
                output_dir / "hik_cloud_download_manifest.jsonl",
                [existing_record],
            )

            def failing_address_client(chunk, hik_config):
                raise AssertionError("resume should skip address lookup")

            def failing_download_url(url, timeout_seconds=None):
                raise AssertionError("resume should skip download")

            records = hik_cloud.download_hik_cloud_recordings(
                config,
                output_dir,
                address_client=failing_address_client,
                download_url=failing_download_url,
            )

            expected_video_record = {
                **existing_record,
                "source_path": "hik_cloud://EXAMPLE_DEVICE_SERIAL/ch1/1764856787-1764856978",
            }
            self.assertEqual(records, [expected_video_record])
            manifest = read_jsonl(output_dir / "hik_cloud_download_manifest.jsonl")
            self.assertEqual(manifest, [existing_record])


def _download_config(
    *,
    time_ranges=None,
    resume: bool = False,
):
    return {
        "output": {"resume": resume},
        "hik_cloud": {
            "access_token": "TOKEN",
            "download_timeout_seconds": 600,
            "devices": [{"device_serial": "EXAMPLE_DEVICE_SERIAL", "channel_no": 1}],
            "time_ranges": time_ranges
            or [{"begin": 1764856787, "end": 1764856978}],
        },
    }


if __name__ == "__main__":
    unittest.main()
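
The chunking tests above describe a simple contract: a requested range is cut into consecutive windows of at most `chunk_seconds` (600 by default), the last window is shortened to end exactly at the requested end, values above 3600 are rejected, and the end must come after the begin. The sketch below illustrates that contract for a single epoch range. The function name `split_time_range` and its parameters are hypothetical; the project's `build_download_chunks` works on a full config with devices and time strings and may differ in detail.

```python
def split_time_range(begin, end, chunk_seconds=600, max_chunk_seconds=3600):
    """Hypothetical helper: cut [begin, end) into consecutive epoch windows."""
    begin, end = int(begin), int(end)
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    if chunk_seconds > max_chunk_seconds:
        raise ValueError(
            f"chunk_seconds must be less than or equal to {max_chunk_seconds}"
        )
    if end <= begin:
        raise ValueError("end must be after begin")

    chunks = []
    cursor = begin
    while cursor < end:
        # The final window is clipped so that it never runs past the requested end.
        chunk_end = min(cursor + chunk_seconds, end)
        chunks.append({"time_begin": cursor, "time_end": chunk_end})
        cursor = chunk_end
    return chunks
```

For a 90-minute range this yields nine 600-second windows by default, or two windows (3600 s and 1800 s) when `chunk_seconds=3600`, matching the counts asserted in the tests above.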

30
tests/test_manifest.py Normal file

@@ -0,0 +1,30 @@
import json
import tempfile
import unittest
from pathlib import Path

from video_ai_analysis_poc.manifest import read_jsonl, write_manifest


class ManifestTests(unittest.TestCase):
    def test_write_manifest_writes_status_retry_and_error_fields(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "video_manifest.jsonl"
            records = [
                {"path": "/tmp/a.mp4", "status": "probed"},
                {"path": "/tmp/b.mp4", "status": "probe_failed", "last_error": "bad data"},
            ]

            write_manifest(path, records)

            lines = path.read_text(encoding="utf-8").splitlines()
            decoded = [json.loads(line) for line in lines]
            self.assertEqual(decoded[0]["retry_count"], 0)
            self.assertIsNone(decoded[0]["last_error"])
            self.assertEqual(decoded[1]["status"], "probe_failed")
            self.assertEqual(decoded[1]["last_error"], "bad data")
            self.assertEqual(read_jsonl(path), decoded)


if __name__ == "__main__":
    unittest.main()
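
The manifest test above implies that every written row carries `retry_count` and `last_error`, with defaults of `0` and `None` when the caller omits them, and that reading the file back returns the same rows. One possible way to get that behaviour is sketched here. The names `write_jsonl_manifest` and `load_jsonl` are stand-ins chosen for this illustration; they are not the project's `write_manifest` and `read_jsonl`, whose internals are not shown in this commit excerpt.

```python
import json
from pathlib import Path


def write_jsonl_manifest(path, records):
    """Hypothetical stand-in: write one JSON object per line with default fields."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            # Defaults come first so that values supplied by the caller win.
            row = {"retry_count": 0, "last_error": None, **record}
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")


def load_jsonl(path):
    """Hypothetical stand-in: read a JSONL file, treating a missing file as empty."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Returning an empty list for a missing file is a design choice of this sketch; it keeps resume logic simple because a first run and a resumed run can share the same read path.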

51
tests/test_probe.py Normal file

@@ -0,0 +1,51 @@
import subprocess
import unittest
from pathlib import Path
from unittest.mock import patch

from video_ai_analysis_poc.probe import probe_video


class ProbeTests(unittest.TestCase):
    def test_probe_video_returns_structured_metadata(self):
        payload = (
            '{"streams":[{"codec_type":"video","codec_name":"h264",'
            '"width":1920,"height":1080,"avg_frame_rate":"30000/1001"}],'
            '"format":{"duration":"12.5","format_name":"mov,mp4,m4a,3gp,3g2,mj2",'
            '"start_time":"0.000000"}}'
        )
        completed = subprocess.CompletedProcess(
            args=["ffprobe"],
            returncode=0,
            stdout=payload,
            stderr="",
        )

        with patch("subprocess.run", return_value=completed):
            result = probe_video(Path("/tmp/video.mp4"), timeout_seconds=3)

        self.assertEqual(result["status"], "probed")
        self.assertEqual(result["codec_name"], "h264")
        self.assertEqual(result["width"], 1920)
        self.assertEqual(result["height"], 1080)
        self.assertAlmostEqual(result["fps"], 29.97002997)
        self.assertEqual(result["duration_seconds"], 12.5)
        self.assertIsNone(result["last_error"])

    def test_probe_video_returns_structured_failure(self):
        failure = subprocess.CalledProcessError(
            returncode=1,
            cmd=["ffprobe"],
            stderr="Invalid data found when processing input",
        )

        with patch("subprocess.run", side_effect=failure):
            result = probe_video(Path("/tmp/bad.mp4"), timeout_seconds=3)

        self.assertEqual(result["status"], "probe_failed")
        self.assertEqual(result["retry_count"], 0)
        self.assertIn("Invalid data", result["last_error"])


if __name__ == "__main__":
    unittest.main()
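
The probe test above expects the ffprobe rational `avg_frame_rate` value `"30000/1001"` to surface as a float close to 29.97. A small sketch of that conversion using the standard library is shown below. The helper name `parse_frame_rate` and its handling of `"0/0"` or `"N/A"` as "unknown" are assumptions for illustration; how `probe_video` treats such values is not visible in this excerpt.

```python
from fractions import Fraction


def parse_frame_rate(value):
    """Hypothetical helper: convert an ffprobe rational such as '30000/1001' to float."""
    if not value:
        return None
    try:
        rate = Fraction(value)  # accepts 'numerator/denominator' and plain numbers
    except (ValueError, ZeroDivisionError):
        # Non-numeric placeholders raise ValueError; '0/0' raises ZeroDivisionError.
        return None
    return float(rate) if rate > 0 else None
```

Using `Fraction` avoids manual string splitting and keeps the division exact until the final conversion to `float`.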

135
tests/test_result_parser.py Normal file

@@ -0,0 +1,135 @@
import unittest

from video_ai_analysis_poc.result_parser import build_clip_result, extract_json_payload


class ResultParserTests(unittest.TestCase):
    def test_extract_json_payload_handles_markdown_and_prose(self):
        payload = extract_json_payload(
            "analysis follows\n```json\n{\"screen_time\":\"12:31:20\",\"events\":[]}\n```"
        )

        self.assertEqual(payload, {"screen_time": "12:31:20", "events": []})

    def test_build_clip_result_preserves_timeline_screen_time_and_events(self):
        clip_record = {
            "video_id": "video-abc",
            "clip_id": "video-abc_c000001",
            "clip_start_seconds": 120.0,
            "clip_end_seconds": 130.0,
            "clip_start_timecode": "00:02:00",
            "clip_end_timecode": "00:02:10",
            "clip_start_beijing_time": "2026-06-15 07:02:00",
            "clip_end_beijing_time": "2026-06-15 07:02:10",
            "frame_times": [
                {
                    "frame_path": "frames/video-abc/000120.jpg",
                    "offset_seconds": 120.0,
                    "timecode": "00:02:00",
                    "beijing_time": "2026-06-15 07:02:00",
                }
            ],
        }
        raw_response = (
            "Here is the result: "
            "{\"画面时间\":\"2026-06-14 12:31:20\","
            "\"events\":[{\"event_type\":\"queue_detected\",\"confidence\":0.86}]}"
        )

        result = build_clip_result(
            raw_response,
            clip_record,
            {"path": "/videos/a.mp4"},
            {
                "schema": {"version": "local-batch-v1"},
                "runtime": {"timezone": "Asia/Shanghai"},
            },
            processing={"latency_ms": 1800},
        )

        self.assertEqual(result["schema_version"], "local-batch-v1")
        self.assertEqual(result["video_id"], "video-abc")
        self.assertEqual(result["video_path"], "/videos/a.mp4")
        self.assertEqual(result["clip_id"], "video-abc_c000001")
        self.assertEqual(result["status"], "ok")
        self.assertEqual(result["monitoring_timeline"]["timezone"], "Asia/Shanghai")
        self.assertIsNone(result["monitoring_timeline"]["video_start_time"])
        self.assertEqual(
            result["monitoring_timeline"]["clip_start_beijing_time"],
            "2026-06-15 07:02:00",
        )
        self.assertEqual(
            result["monitoring_timeline"]["clip_end_beijing_time"],
            "2026-06-15 07:02:10",
        )
        self.assertEqual(result["monitoring_timeline"]["frame_times"], clip_record["frame_times"])
        self.assertEqual(
            result["monitoring_timeline"]["screen_time"],
            "2026-06-14 12:31:20",
        )
        self.assertEqual(result["events"][0]["event_type"], "queue_detected")
        self.assertEqual(result["events"][0]["start_offset_seconds"], 120.0)
        self.assertEqual(result["events"][0]["end_offset_seconds"], 130.0)
        self.assertEqual(result["raw_response"], raw_response)
        self.assertEqual(result["processing"]["latency_ms"], 1800)
        self.assertIsNone(result["error"])

    def test_build_clip_result_reads_zhengxin_time_key(self):
        result = build_clip_result(
            (
                '{"Action":"Action_Idle","quality_status":"qualified",'
                '"error_type":"","安全隐患":"","人物位置":"","总结":"",'
                '"时间":"2026-06-14 12:31:20","employees":[],"guests":[]}'
            ),
            {
                "video_id": "video-abc",
                "clip_id": "video-abc_c000001",
                "clip_start_seconds": 0.0,
                "clip_end_seconds": 10.0,
                "clip_start_timecode": "00:00:00",
                "clip_end_timecode": "00:00:10",
                "frame_times": [],
            },
            {"path": "/videos/a.mp4"},
            {
                "schema": {"version": "local-batch-v1"},
                "runtime": {"timezone": "Asia/Shanghai"},
            },
            processing={},
        )

        self.assertEqual(result["status"], "ok")
        self.assertEqual(
            result["monitoring_timeline"]["screen_time"],
            "2026-06-14 12:31:20",
        )

    def test_build_clip_result_records_parse_failure_without_crashing(self):
        result = build_clip_result(
            "not json",
            {
                "video_id": "video-abc",
                "clip_id": "video-abc_c000001",
                "clip_start_seconds": 0.0,
                "clip_end_seconds": 10.0,
                "clip_start_timecode": "00:00:00",
                "clip_end_timecode": "00:00:10",
                "frame_times": [],
            },
            {"path": "/videos/a.mp4"},
            {
                "schema": {"version": "local-batch-v1"},
                "runtime": {"timezone": "Asia/Shanghai"},
            },
            processing={},
        )

        self.assertEqual(result["status"], "parse_failed")
        self.assertEqual(result["events"], [])
        self.assertEqual(result["monitoring_timeline"]["screen_time"], "")
        self.assertEqual(result["raw_response"], "not json")
        self.assertIn("JSON", result["error"])


if __name__ == "__main__":
    unittest.main()
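
The parser tests above require that a JSON object can be recovered from model output that wraps it in a Markdown fence or leads with prose, and that unparseable output produces an error mentioning "JSON" rather than a crash. One way to do this with the standard library is sketched below. The function name `extract_first_json_object` and the scan-for-`{` strategy are assumptions for illustration; the project's `extract_json_payload` may use a different approach.

```python
import json


def extract_first_json_object(text):
    """Hypothetical helper: return the first decodable JSON object embedded in text."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at index and ignores the rest.
            payload, _end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict):
            return payload
    raise ValueError("no JSON object found in model response")
```

Because `raw_decode` stops at the end of the first complete value, trailing text such as a closing code fence does not need to be stripped beforehand.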

85
tests/test_vlm_client.py Normal file

@@ -0,0 +1,85 @@
import base64
import json
import tempfile
import unittest
from pathlib import Path

from video_ai_analysis_poc.vlm_client import infer_clip


class VlmClientTests(unittest.TestCase):
    def test_infer_clip_uses_config_prompt_url_and_data_uri_images(self):
        with tempfile.TemporaryDirectory() as tmp:
            output_dir = Path(tmp)
            frame_path = output_dir / "frames" / "video-abc" / "000001.jpg"
            frame_path.parent.mkdir(parents=True)
            frame_path.write_bytes(b"jpg-bytes")
            calls = []

            def http_post(url, payload, timeout_seconds):
                calls.append((url, payload, timeout_seconds))
                return {
                    "status": 200,
                    "body": {
                        "choices": [
                            {
                                "message": {
                                    "content": json.dumps(
                                        {"screen_time": "10:00:01", "events": []}
                                    )
                                }
                            }
                        ]
                    },
                }

            result = infer_clip(
                {
                    "clip_id": "video-abc_c000001",
                    "frame_times": [
                        {
                            "frame_path": "frames/video-abc/000001.jpg",
                            "offset_seconds": 0.0,
                            "timecode": "00:00:00",
                        }
                    ],
                },
                output_dir,
                {
                    "api_base_url": "http://localhost:8679/",
                    "chat_completions_path": "/v1/chat/completions",
                    "model": "memai-zhengxin-v3-20260413",
                    "timeout_seconds": 17,
                    "max_tokens": 256,
                    "temperature": 0,
                    "image_transport": "data_uri",
                },
                {
                    "system": "system prompt from config",
                    "user": "user prompt from config",
                },
                http_post=http_post,
            )

        self.assertEqual(result["raw_response"], '{"screen_time": "10:00:01", "events": []}')
        self.assertEqual(len(calls), 1)
        url, payload, timeout_seconds = calls[0]
        self.assertEqual(url, "http://localhost:8679/v1/chat/completions")
        self.assertEqual(timeout_seconds, 17)
        self.assertEqual(payload["model"], "memai-zhengxin-v3-20260413")
        self.assertEqual(payload["messages"][0]["role"], "system")
        self.assertEqual(payload["messages"][0]["content"], "system prompt from config")
        user_content = payload["messages"][1]["content"]
        self.assertEqual(user_content[0], {"type": "text", "text": "user prompt from config"})
        self.assertEqual(user_content[1]["type"], "image_url")
        expected_data = base64.b64encode(b"jpg-bytes").decode("ascii")
        self.assertEqual(
            user_content[1]["image_url"]["url"],
            f"data:image/jpeg;base64,{expected_data}",
        )
        self.assertEqual(result["http_status"], 200)
        self.assertIsInstance(result["latency_ms"], int)


if __name__ == "__main__":
    unittest.main()
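
The client test above checks the request body shape: a system message, then a user message whose content starts with a text part followed by one `image_url` part per frame, each frame embedded as a base64 `data:image/jpeg` URI. A minimal sketch of building that payload is shown below. The helper name `build_chat_payload` and its parameter list are hypothetical; the message layout follows the assertions in the test, not a verified reading of `infer_clip`.

```python
import base64
from pathlib import Path


def build_chat_payload(frame_paths, model, system_prompt, user_prompt,
                       max_tokens=256, temperature=0):
    """Hypothetical helper: assemble a chat-completions body with inline JPEG frames."""
    user_content = [{"type": "text", "text": user_prompt}]
    for frame_path in frame_paths:
        encoded = base64.b64encode(Path(frame_path).read_bytes()).decode("ascii")
        user_content.append(
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
            }
        )
    return {
        "model": model,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    }
```

Embedding frames as data URIs keeps the request self-contained, at the cost of roughly a one-third size increase from base64 encoding.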


@@ -0,0 +1,9 @@
"""Local video batch analysis PoC."""

__all__ = [
    "config",
    "discovery",
    "manifest",
    "paths",
    "probe",
]


@@ -0,0 +1,403 @@
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

from .manifest import read_jsonl


def aggregate_outputs(
    output_dir: str | Path,
    config: dict[str, Any],
) -> dict[str, Any]:
    root = Path(output_dir).expanduser().resolve(strict=False)
    started_at = _now_iso()
    video_records = read_jsonl(root / "video_manifest.jsonl")
    clip_records = read_jsonl(root / "clip_manifest.jsonl")
    clip_results = read_jsonl(root / "clip_results.jsonl")
    schema_version = str(config.get("schema", {}).get("version", "local-batch-v1"))
    merge_gap_seconds = float(config.get("schema", {}).get("merge_gap_seconds", 30))

    clips_by_video = _group_by_video(clip_records)
    results_by_video = _group_by_video(clip_results)
    videos_summary = []
    folder_event_counts: dict[str, int] = {}
    processed_video_count = 0
    failed_video_count = 0

    for video_record in video_records:
        video_id = str(video_record.get("video_id") or "")
        if not video_id:
            continue
        video_clips = clips_by_video.get(video_id, [])
        video_results = results_by_video.get(video_id, [])
        video_result = _build_video_result(
            video_record,
            video_clips,
            video_results,
            schema_version=schema_version,
            merge_gap_seconds=merge_gap_seconds,
            started_at=started_at,
        )
        result_path = root / "videos" / video_id / "video_result.json"
        _write_json(result_path, video_result)

        failed_clip_count = int(video_result["failed_clip_count"])
        video_failed = video_record.get("status") != "probed" or failed_clip_count > 0
        if video_failed:
            failed_video_count += 1
        else:
            processed_video_count += 1
        for event_type, count in video_result["event_counts"].items():
            folder_event_counts[event_type] = folder_event_counts.get(event_type, 0) + int(count)

        videos_summary.append(
            {
                "video_id": video_id,
                "video_path": video_result["video_path"],
                "status": "failed" if video_failed else "processed",
                "clip_count": video_result["clip_count"],
                "failed_clip_count": failed_clip_count,
                "failed_clip_counts": video_result["failed_clip_counts"],
                "event_counts": video_result["event_counts"],
                "outputs": {"video_result_json": f"videos/{video_id}/video_result.json"},
                "error": video_record.get("last_error"),
            }
        )

    folder_summary = {
        "schema_version": schema_version,
        "input_dir": str(config.get("input", {}).get("dir")),
        "video_count": len(video_records),
        "processed_video_count": processed_video_count,
        "failed_video_count": failed_video_count,
        "event_counts": dict(sorted(folder_event_counts.items())),
        "videos": videos_summary,
        "processing": {
            "started_at": started_at,
            "finished_at": _now_iso(),
        },
    }
    _write_json(root / "folder_summary.json", folder_summary)
    return folder_summary


def _build_video_result(
    video_record: dict[str, Any],
    clip_records: list[dict[str, Any]],
    clip_results: list[dict[str, Any]],
    *,
    schema_version: str,
    merge_gap_seconds: float,
    started_at: str,
) -> dict[str, Any]:
    video_id = str(video_record.get("video_id"))
    failed_clip_counts = _failed_clip_counts(clip_results)
    merged_events = _merge_events(_event_records(clip_results), merge_gap_seconds)
    event_counts = _event_counts(merged_events)
    video_duration = _first_present(
        video_record,
        ("duration_seconds", "video_duration_seconds", "duration"),
    )
    video_start_time = _video_start_time(video_record, clip_results)
    return {
"schema_version": schema_version,
"video_id": video_id,
"video_path": _video_path(video_record, clip_results),
"probe": _probe(video_record),
"monitoring_timeline": {
"video_start_time": video_start_time,
"video_duration_seconds": video_duration,
},
"clip_count": len(clip_records),
"failed_clip_count": sum(failed_clip_counts.values()),
"failed_clip_counts": failed_clip_counts,
"event_counts": event_counts,
"events": merged_events,
"outputs": {"clip_results_jsonl": "clip_results.jsonl"},
"processing": {
"started_at": started_at,
"finished_at": _now_iso(),
},
}
def _event_records(clip_results: list[dict[str, Any]]) -> list[dict[str, Any]]:
records = []
for result in clip_results:
if result.get("status") != "ok":
continue
timeline = result.get("monitoring_timeline") or {}
if not isinstance(timeline, dict):
timeline = {}
for event in result.get("events") or []:
if not isinstance(event, dict):
continue
event_record = _normalize_event(event, result, timeline)
records.append(event_record)
return sorted(
records,
key=lambda event: (
str(event.get("video_id")),
str(event.get("event_type")),
float(event.get("start_offset_seconds") or 0),
float(event.get("end_offset_seconds") or 0),
),
)
def _normalize_event(
event: dict[str, Any],
result: dict[str, Any],
timeline: dict[str, Any],
) -> dict[str, Any]:
clip_id = str(result.get("clip_id"))
frame_times = [
dict(frame)
for frame in timeline.get("frame_times", [])
if isinstance(frame, dict)
]
frame_paths = [
str(frame.get("frame_path"))
for frame in frame_times
if frame.get("frame_path") is not None
]
start = event.get("start_offset_seconds", timeline.get("clip_start_seconds"))
end = event.get("end_offset_seconds", timeline.get("clip_end_seconds"))
screen_time = str(timeline.get("screen_time") or "")
normalized = {
"video_id": str(result.get("video_id")),
"event_type": str(event.get("event_type") or "unknown"),
"start_time": event.get("start_time"),
"end_time": event.get("end_time"),
"start_offset_seconds": _float_or_none(start),
"end_offset_seconds": _float_or_none(end),
"confidence": event.get("confidence"),
"severity": event.get("severity"),
"attributes": event.get("attributes") if isinstance(event.get("attributes"), dict) else {},
"screen_times": [screen_time] if screen_time else [],
"evidence": {
"clip_ids": [clip_id],
"frame_paths": frame_paths,
"frame_times": frame_times,
"clips": [
{
"clip_id": clip_id,
"clip_start_seconds": timeline.get("clip_start_seconds"),
"clip_end_seconds": timeline.get("clip_end_seconds"),
"clip_start_timecode": timeline.get("clip_start_timecode"),
"clip_end_timecode": timeline.get("clip_end_timecode"),
"clip_start_beijing_time": timeline.get("clip_start_beijing_time"),
"clip_end_beijing_time": timeline.get("clip_end_beijing_time"),
"screen_time": screen_time,
}
],
},
"source_event_count": 1,
}
original_evidence = event.get("evidence")
if isinstance(original_evidence, dict):
original_clip_id = original_evidence.get("clip_id")
if original_clip_id:
normalized["evidence"]["clip_ids"] = _unique(
[*normalized["evidence"]["clip_ids"], str(original_clip_id)]
)
original_frame_paths = original_evidence.get("frame_paths")
if isinstance(original_frame_paths, list):
normalized["evidence"]["frame_paths"] = _unique(
[*normalized["evidence"]["frame_paths"], *map(str, original_frame_paths)]
)
return normalized
def _merge_events(
events: list[dict[str, Any]],
merge_gap_seconds: float,
) -> list[dict[str, Any]]:
merged: list[dict[str, Any]] = []
for event in events:
if not merged or not _can_merge(merged[-1], event, merge_gap_seconds):
merged.append(_copy_event(event))
continue
_merge_into(merged[-1], event)
for event in merged:
event.pop("video_id", None)
return merged
def _can_merge(
previous: dict[str, Any],
current: dict[str, Any],
merge_gap_seconds: float,
) -> bool:
if previous.get("video_id") != current.get("video_id"):
return False
if previous.get("event_type") != current.get("event_type"):
return False
previous_end = _float_or_none(previous.get("end_offset_seconds"))
current_start = _float_or_none(current.get("start_offset_seconds"))
if previous_end is None or current_start is None:
return False
return current_start - previous_end <= merge_gap_seconds
def _merge_into(target: dict[str, Any], event: dict[str, Any]) -> None:
    # Widen the merged span so it covers both events.
    target["start_offset_seconds"] = _min_number(
        target.get("start_offset_seconds"),
        event.get("start_offset_seconds"),
    )
    target["end_offset_seconds"] = _max_number(
        target.get("end_offset_seconds"),
        event.get("end_offset_seconds"),
    )
    target["screen_times"] = _unique(
        [*target.get("screen_times", []), *event.get("screen_times", [])]
    )
    target["source_event_count"] = int(target.get("source_event_count", 1)) + int(
        event.get("source_event_count", 1)
    )
    # Accumulate evidence from both events, dropping duplicate ids and paths.
    target["evidence"]["clip_ids"] = _unique(
        [*target["evidence"].get("clip_ids", []), *event["evidence"].get("clip_ids", [])]
    )
    target["evidence"]["frame_paths"] = _unique(
        [
            *target["evidence"].get("frame_paths", []),
            *event["evidence"].get("frame_paths", []),
        ]
    )
    target["evidence"]["frame_times"].extend(event["evidence"].get("frame_times", []))
    target["evidence"]["clips"].extend(event["evidence"].get("clips", []))
    # Keep the highest confidence. The value is passed through from the clip
    # result, so compare only when both sides parse as numbers instead of
    # letting float() raise on an unexpected value.
    target_confidence = _float_or_none(target.get("confidence"))
    event_confidence = _float_or_none(event.get("confidence"))
    if target.get("confidence") is None:
        target["confidence"] = event.get("confidence")
    elif target_confidence is not None and event_confidence is not None:
        target["confidence"] = max(target_confidence, event_confidence)
def _copy_event(event: dict[str, Any]) -> dict[str, Any]:
copied = dict(event)
copied["screen_times"] = list(event.get("screen_times", []))
copied["attributes"] = dict(event.get("attributes", {}))
copied["evidence"] = {
"clip_ids": list(event["evidence"].get("clip_ids", [])),
"frame_paths": list(event["evidence"].get("frame_paths", [])),
"frame_times": [dict(frame) for frame in event["evidence"].get("frame_times", [])],
"clips": [dict(clip) for clip in event["evidence"].get("clips", [])],
}
return copied
def _group_by_video(records: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
grouped: dict[str, list[dict[str, Any]]] = {}
for record in records:
video_id = record.get("video_id")
if video_id:
grouped.setdefault(str(video_id), []).append(record)
return grouped
def _failed_clip_counts(clip_results: list[dict[str, Any]]) -> dict[str, int]:
counts = {"parse_failed": 0, "inference_failed": 0}
for result in clip_results:
status = result.get("status")
if status in counts:
counts[str(status)] += 1
return counts
def _event_counts(events: list[dict[str, Any]]) -> dict[str, int]:
counts: dict[str, int] = {}
for event in events:
event_type = str(event.get("event_type") or "unknown")
counts[event_type] = counts.get(event_type, 0) + 1
return dict(sorted(counts.items()))
def _probe(video_record: dict[str, Any]) -> dict[str, Any]:
excluded = {"video_id", "path", "source_path", "status", "retry_count", "last_error"}
probe = {
key: value
for key, value in video_record.items()
if key not in excluded
}
probe["status"] = video_record.get("status")
if video_record.get("last_error") is not None:
probe["last_error"] = video_record.get("last_error")
return probe
def _video_path(
video_record: dict[str, Any],
clip_results: list[dict[str, Any]],
) -> str | None:
path = video_record.get("path") or video_record.get("source_path")
if path is not None:
return str(path)
for result in clip_results:
if result.get("video_path") is not None:
return str(result["video_path"])
return None
def _video_start_time(
video_record: dict[str, Any],
clip_results: list[dict[str, Any]],
) -> Any:
if video_record.get("video_start_time") is not None:
return video_record.get("video_start_time")
for result in clip_results:
timeline = result.get("monitoring_timeline")
if isinstance(timeline, dict) and timeline.get("video_start_time") is not None:
return timeline.get("video_start_time")
return None
def _first_present(record: dict[str, Any], keys: tuple[str, ...]) -> Any:
for key in keys:
if record.get(key) is not None:
return record.get(key)
return None
def _float_or_none(value: Any) -> float | None:
if value is None:
return None
try:
return float(value)
except (TypeError, ValueError):
return None
def _min_number(left: Any, right: Any) -> float | None:
values = [value for value in (_float_or_none(left), _float_or_none(right)) if value is not None]
return min(values) if values else None
def _max_number(left: Any, right: Any) -> float | None:
values = [value for value in (_float_or_none(left), _float_or_none(right)) if value is not None]
return max(values) if values else None
def _unique(values: list[Any]) -> list[Any]:
seen = set()
unique_values = []
for value in values:
marker = json.dumps(value, sort_keys=True) if isinstance(value, dict) else value
if marker in seen:
continue
seen.add(marker)
unique_values.append(value)
return unique_values
def _write_json(path: Path, payload: dict[str, Any]) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(
json.dumps(payload, ensure_ascii=False, indent=2, sort_keys=True) + "\n",
encoding="utf-8",
)
def _now_iso() -> str:
return datetime.now(timezone.utc).isoformat()
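
The merge step in this module sorts events by type and start offset, then folds an event into its predecessor when both share a type and the gap between them is at most `merge_gap_seconds`. The following is a reduced sketch of that rule, assuming events always carry numeric offsets. `merge_adjacent_events` is a hypothetical name, and the evidence and confidence handling of the real code is left out.

```python
from typing import Any


def merge_adjacent_events(
    events: list[dict[str, Any]],
    merge_gap_seconds: float = 30.0,
) -> list[dict[str, Any]]:
    """Merge same-type events whose gap is at most merge_gap_seconds."""
    ordered = sorted(
        events,
        key=lambda e: (
            e["event_type"],
            e["start_offset_seconds"],
            e["end_offset_seconds"],
        ),
    )
    merged: list[dict[str, Any]] = []
    for event in ordered:
        previous = merged[-1] if merged else None
        gap_ok = (
            previous is not None
            and previous["event_type"] == event["event_type"]
            and event["start_offset_seconds"] - previous["end_offset_seconds"]
            <= merge_gap_seconds
        )
        if gap_ok:
            # Extend the previous event instead of emitting a new one.
            previous["end_offset_seconds"] = max(
                previous["end_offset_seconds"], event["end_offset_seconds"]
            )
            previous["source_event_count"] += 1
            continue
        merged.append({**event, "source_event_count": 1})
    return merged


sample = [
    {"event_type": "queue_detected", "start_offset_seconds": 0.0, "end_offset_seconds": 10.0},
    {"event_type": "queue_detected", "start_offset_seconds": 30.0, "end_offset_seconds": 40.0},
    {"event_type": "queue_detected", "start_offset_seconds": 100.0, "end_offset_seconds": 110.0},
    {"event_type": "staff_absent", "start_offset_seconds": 5.0, "end_offset_seconds": 15.0},
]
merged_sample = merge_adjacent_events(sample)
```

With the default 30 second gap, the first two `queue_detected` events collapse into one span, while the third stays separate because it starts 60 seconds after the merged span ends.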


@@ -0,0 +1,424 @@
from __future__ import annotations
import argparse
import json
from pathlib import Path
from typing import Sequence
from .aggregator import aggregate_outputs
from .clips import build_clip_records
from .config import DEFAULT_CONFIG_PATH, load_config
from .discovery import discover_videos
from .ffmpeg_sampler import sample_video_frames
from .hik_cloud import download_hik_cloud_recordings
from .manifest import read_jsonl, write_manifest
from .paths import stable_video_id
from .probe import probe_video
from .result_parser import build_clip_result
from .timeline import DEFAULT_TIMEZONE, format_beijing_time, timeline_start_epoch
from .vlm_client import infer_clip
def main(argv: Sequence[str] | None = None) -> int:
parser = argparse.ArgumentParser(
description="Local video batch analysis PoC entrypoint."
)
parser.add_argument("--config", default=str(DEFAULT_CONFIG_PATH))
parser.add_argument("--input-dir")
parser.add_argument("--output-dir")
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("--until", choices=["clips", "inference"])
parser.add_argument("--limit-clips", type=int)
args = parser.parse_args(argv)
config = load_config(
args.config,
input_dir=args.input_dir,
output_dir=args.output_dir,
)
if args.dry_run and args.until:
parser.error("--dry-run cannot be combined with --until")
if args.limit_clips is not None and args.limit_clips < 0:
parser.error("--limit-clips must be non-negative")
output_dir = Path(config["output"]["dir"])
output_dir.mkdir(parents=True, exist_ok=True)
video_manifest_path = output_dir / "video_manifest.jsonl"
resume_enabled = bool(config.get("output", {}).get("resume", False))
records = _load_resume_records(
video_manifest_path,
resume=resume_enabled,
)
record_indexes = {
_record_key(record): index
for index, record in enumerate(records)
if _record_key(record) is not None
}
try:
_acquire_source_records(
config,
output_dir,
records,
record_indexes,
download_source=not args.dry_run,
)
except ValueError as exc:
parser.error(str(exc))
write_manifest(video_manifest_path, records)
if args.dry_run:
return 0
clip_manifest_path = output_dir / "clip_manifest.jsonl"
existing_clip_records = read_jsonl(clip_manifest_path) if resume_enabled else []
existing_clip_video_ids = {
str(record.get("video_id"))
for record in existing_clip_records
if record.get("video_id")
}
frame_manifest_path = output_dir / "frame_manifest.jsonl"
frame_records = read_jsonl(frame_manifest_path) if resume_enabled else []
timezone_name = str(config.get("runtime", {}).get("timezone", DEFAULT_TIMEZONE))
backfilled_frame_video_ids = _backfill_frame_beijing_times(
frame_records,
records,
timezone_name=timezone_name,
)
existing_sampled_video_ids = {
str(record.get("video_id"))
for record in frame_records
if record.get("status") == "sampled" and record.get("video_id")
}
changed_frame_video_ids: set[str] = set(backfilled_frame_video_ids)
for record in records:
if record.get("status") != "probed":
continue
video_id = str(record.get("video_id"))
if args.until == "inference" and video_id in existing_clip_video_ids:
continue
if video_id in existing_sampled_video_ids:
continue
frame_records = _without_video_records(frame_records, video_id)
ffmpeg_config = dict(config["ffmpeg"])
ffmpeg_config["timezone"] = timezone_name
frame_records.extend(
sample_video_frames(
record,
output_dir,
ffmpeg_config,
manifest_path=None,
)
)
changed_frame_video_ids.add(video_id)
write_manifest(frame_manifest_path, frame_records)
sampled_video_ids = {
str(record.get("video_id"))
for record in frame_records
if record.get("status") == "sampled" and record.get("video_id")
}
clip_rebuild_video_ids = changed_frame_video_ids | (
sampled_video_ids - existing_clip_video_ids
)
clip_records = [
record
for record in existing_clip_records
if str(record.get("video_id")) not in clip_rebuild_video_ids
]
frames_to_build = [
record
for record in frame_records
if str(record.get("video_id")) in clip_rebuild_video_ids
]
clip_records.extend(build_clip_records(frames_to_build, config["clip"]))
write_manifest(output_dir / "clip_manifest.jsonl", clip_records)
if args.until == "clips":
return 0
_run_inference(
clip_records,
records,
output_dir,
config,
limit_clips=args.limit_clips,
resume=resume_enabled,
)
if args.until == "inference":
return 0
aggregate_outputs(output_dir, config)
return 0
def _load_resume_records(path: Path, *, resume: bool) -> list[dict[str, object]]:
if not resume:
return []
return read_jsonl(path)
def _record_key(record: dict[str, object]) -> str | None:
video_id = record.get("video_id")
if video_id:
return str(video_id)
path = record.get("path")
if path:
return stable_video_id(str(path))
return None
def _acquire_source_records(
config: dict[str, object],
output_dir: Path,
records: list[dict[str, object]],
record_indexes: dict[str, int],
*,
download_source: bool = True,
) -> None:
for source_record in _source_video_records(
config,
output_dir,
download_source=download_source,
):
path = source_record.get("path")
if not path:
continue
video_id = stable_video_id(str(path))
existing_index = record_indexes.get(video_id)
if (
existing_index is not None
and records[existing_index].get("status") == "probed"
):
continue
probe_record = probe_video(
str(path),
timeout_seconds=config["ffprobe"]["timeout_seconds"],
)
record = {**source_record, **probe_record, "video_id": video_id}
if existing_index is None:
record_indexes[video_id] = len(records)
records.append(record)
else:
records[existing_index] = record
def _source_video_records(
config: dict[str, object],
output_dir: Path,
*,
download_source: bool = True,
) -> list[dict[str, object]]:
source_config = config.get("source", {})
source_mode = "local"
if isinstance(source_config, dict):
source_mode = str(source_config.get("mode", "local"))
if source_mode == "local":
videos = discover_videos(
config["input"]["dir"],
config["input"]["extensions"],
recursive=config["input"]["recursive"],
)
return [{"path": path} for path in videos]
if source_mode == "hik_cloud":
return [
record
for record in download_hik_cloud_recordings(
config,
output_dir,
download=download_source,
)
if record.get("status") == "downloaded"
]
raise ValueError(f"unsupported source.mode: {source_mode}")
def _without_video_records(
records: list[dict[str, object]],
video_id: str,
) -> list[dict[str, object]]:
return [record for record in records if str(record.get("video_id")) != video_id]
def _backfill_frame_beijing_times(
frame_records: list[dict[str, object]],
video_records: list[dict[str, object]],
*,
timezone_name: str,
) -> set[str]:
video_by_id = {
str(record.get("video_id")): record
for record in video_records
if record.get("video_id")
}
changed_video_ids: set[str] = set()
for frame_record in frame_records:
if frame_record.get("status") != "sampled" or frame_record.get("beijing_time"):
continue
video_id = str(frame_record.get("video_id") or "")
start_epoch = timeline_start_epoch(video_by_id.get(video_id, {}))
beijing_time = format_beijing_time(
start_epoch,
offset_seconds=float(frame_record.get("offset_seconds") or 0),
timezone_name=timezone_name,
)
if beijing_time is None:
continue
frame_record["beijing_time"] = beijing_time
changed_video_ids.add(video_id)
return changed_video_ids
def _run_inference(
clip_records: list[dict[str, object]],
video_records: list[dict[str, object]],
output_dir: Path,
config: dict[str, object],
*,
limit_clips: int | None,
resume: bool,
) -> None:
results_path = output_dir / "clip_results.jsonl"
result_records = read_jsonl(results_path) if resume else []
clip_by_id = {
str(record.get("clip_id")): record
for record in clip_records
if record.get("clip_id")
}
result_records = [
_refresh_result_timeline(record, clip_by_id, config)
for record in result_records
]
ok_clip_ids = {
str(record.get("clip_id"))
for record in result_records
if record.get("status") == "ok" and record.get("clip_id")
}
video_by_id = {
str(record.get("video_id")): record
for record in video_records
if record.get("video_id")
}
processed = 0
for clip_record in clip_records:
clip_id = str(clip_record.get("clip_id"))
if clip_id in ok_clip_ids:
continue
if limit_clips is not None and processed >= limit_clips:
break
result_records = [
record for record in result_records if str(record.get("clip_id")) != clip_id
]
video_record = video_by_id.get(str(clip_record.get("video_id")), {})
result = _infer_and_parse_clip(clip_record, video_record, output_dir, config)
result_records.append(result)
_write_jsonl_exact(results_path, result_records)
processed += 1
_write_jsonl_exact(results_path, result_records)
def _refresh_result_timeline(
result_record: dict[str, object],
clip_by_id: dict[str, dict[str, object]],
config: dict[str, object],
) -> dict[str, object]:
clip_record = clip_by_id.get(str(result_record.get("clip_id")))
if not clip_record:
return result_record
if not _clip_has_beijing_timing(clip_record):
return result_record
timeline = dict(result_record.get("monitoring_timeline") or {})
timeline.update(
{
"timezone": config.get("runtime", {}).get("timezone", DEFAULT_TIMEZONE),
"clip_start_seconds": clip_record.get("clip_start_seconds"),
"clip_end_seconds": clip_record.get("clip_end_seconds"),
"clip_start_timecode": clip_record.get("clip_start_timecode"),
"clip_end_timecode": clip_record.get("clip_end_timecode"),
"clip_start_beijing_time": clip_record.get("clip_start_beijing_time"),
"clip_end_beijing_time": clip_record.get("clip_end_beijing_time"),
"frame_times": clip_record.get("frame_times", []),
}
)
refreshed = dict(result_record)
refreshed["monitoring_timeline"] = timeline
return refreshed
def _clip_has_beijing_timing(clip_record: dict[str, object]) -> bool:
if clip_record.get("clip_start_beijing_time") or clip_record.get("clip_end_beijing_time"):
return True
for frame in clip_record.get("frame_times", []) or []:
if isinstance(frame, dict) and frame.get("beijing_time"):
return True
return False
def _infer_and_parse_clip(
    clip_record: dict[str, object],
    video_record: dict[str, object],
    output_dir: Path,
    config: dict[str, object],
) -> dict[str, object]:
    schema_config = config.get("schema", {})
    parse_retry = 0
    if isinstance(schema_config, dict):
        # Clamp to zero so a negative parse_retry still yields one attempt.
        parse_retry = max(0, int(schema_config.get("parse_retry", 0)))
    attempts = parse_retry + 1
    result: dict[str, object] | None = None
    for attempt in range(attempts):
        try:
            inference = infer_clip(
                clip_record,
                output_dir,
                config["vlm"],
                config["prompt"],
            )
        except Exception as exc:
            # An inference failure is not retried at this level; it is
            # recorded as a failed clip result.
            return build_clip_result(
                "",
                clip_record,
                video_record,
                config,
                processing={},
                status="inference_failed",
                error=str(exc),
            )
        result = build_clip_result(
            str(inference.get("raw_response", "")),
            clip_record,
            video_record,
            config,
            processing={
                "latency_ms": inference.get("latency_ms"),
                "http_status": inference.get("http_status"),
                "attempt": attempt + 1,
            },
        )
        # Only a parse failure triggers another attempt.
        if result.get("status") != "parse_failed":
            return result
    if result is None:
        raise RuntimeError("unreachable inference state")
    return result
def _write_jsonl_exact(
path: Path,
records: list[dict[str, object]],
) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
with path.open("w", encoding="utf-8") as handle:
for record in records:
handle.write(json.dumps(record, ensure_ascii=False, sort_keys=True) + "\n")
if __name__ == "__main__":
raise SystemExit(main())
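
The inference loop above resumes by skipping clips that already have an `ok` result, re-running everything else, and stopping once `--limit-clips` new clips have been processed. Here is a minimal sketch of that control flow under simplified assumptions: `run_with_resume` and `fake_infer` are hypothetical names, and the per-clip checkpoint write to `clip_results.jsonl` is omitted.

```python
from typing import Any, Callable, Optional


def run_with_resume(
    clip_ids: list[str],
    previous_results: list[dict[str, Any]],
    infer: Callable[[str], dict[str, Any]],
    limit: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Re-run clips without an 'ok' result, up to `limit` new attempts."""
    results = list(previous_results)
    ok_ids = {r["clip_id"] for r in results if r.get("status") == "ok"}
    processed = 0
    for clip_id in clip_ids:
        if clip_id in ok_ids:
            continue  # already done in an earlier run
        if limit is not None and processed >= limit:
            break
        # Drop any stale failed record before appending the fresh one.
        results = [r for r in results if r["clip_id"] != clip_id]
        results.append(infer(clip_id))
        processed += 1
    return results


calls: list[str] = []


def fake_infer(clip_id: str) -> dict[str, Any]:
    # Stand-in for the real model call; always succeeds.
    calls.append(clip_id)
    return {"clip_id": clip_id, "status": "ok"}


previous = [
    {"clip_id": "c1", "status": "ok"},
    {"clip_id": "c2", "status": "parse_failed"},
]
resumed = run_with_resume(["c1", "c2", "c3"], previous, fake_infer, limit=1)
```

Here only `c2` is re-run: `c1` is skipped as already successful and `c3` is left for a later run because the limit of one was reached.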


@@ -0,0 +1,158 @@
from __future__ import annotations
from pathlib import Path
from typing import Any
from .frames import seconds_to_timecode
from .manifest import read_jsonl, write_manifest
from .timeline import derive_time_from_reference
def build_clip_records(
frame_records: list[dict[str, Any]],
clip_config: dict[str, Any],
) -> list[dict[str, Any]]:
sampled_frames = [
record for record in frame_records if record.get("status") == "sampled"
]
by_video: dict[str, list[dict[str, Any]]] = {}
for frame in sampled_frames:
by_video.setdefault(str(frame["video_id"]), []).append(frame)
clips = []
for video_id, frames in sorted(by_video.items()):
clips.extend(_build_video_clips(video_id, frames, clip_config))
return clips
def build_clip_records_from_manifest(
frame_manifest_path: str | Path,
clip_manifest_path: str | Path,
clip_config: dict[str, Any],
) -> list[dict[str, Any]]:
clips = build_clip_records(read_jsonl(frame_manifest_path), clip_config)
write_manifest(clip_manifest_path, clips)
return clips
def _build_video_clips(
    video_id: str,
    frames: list[dict[str, Any]],
    clip_config: dict[str, Any],
) -> list[dict[str, Any]]:
    sorted_frames = sorted(frames, key=lambda frame: float(frame["offset_seconds"]))
    if not sorted_frames:
        return []
    length_seconds = float(clip_config.get("length_seconds", 10))
    stride_seconds = float(clip_config.get("stride_seconds", length_seconds))
    if stride_seconds <= 0:
        # A non-positive stride would never advance the window below.
        raise ValueError("clip.stride_seconds must be positive")
    frames_per_clip = int(clip_config.get("frames_per_clip", 8))
    min_frames_per_clip = int(clip_config.get("min_frames_per_clip", 4))
    max_offset = max(float(frame["offset_seconds"]) for frame in sorted_frames)
    timeline_end = _estimated_timeline_end(sorted_frames)
    clips = []
    clip_index = 1
    start = 0.0
    while start <= max_offset:
        # Each window is half-open: [start, end), capped at the timeline end.
        end = min(start + length_seconds, timeline_end)
        in_window = [
            frame
            for frame in sorted_frames
            if start <= float(frame["offset_seconds"]) < end
        ]
        # Windows with too few sampled frames are skipped, not emitted.
        if len(in_window) >= min_frames_per_clip:
            selected_frames = _uniform_sample(in_window, frames_per_clip)
            start_beijing_time, end_beijing_time = _clip_beijing_time_range(
                in_window,
                start,
                end,
            )
            clip = {
                "video_id": video_id,
                "clip_id": f"{video_id}_c{clip_index:06d}",
                "clip_start_seconds": round(start, 6),
                "clip_end_seconds": round(end, 6),
                "clip_start_timecode": seconds_to_timecode(start),
                "clip_end_timecode": seconds_to_timecode(end),
                "frame_times": [_frame_time(frame) for frame in selected_frames],
                "status": "pending",
                "retry_count": 0,
                "last_error": None,
            }
            if start_beijing_time is not None:
                clip["clip_start_beijing_time"] = start_beijing_time
            if end_beijing_time is not None:
                clip["clip_end_beijing_time"] = end_beijing_time
            clips.append(clip)
            clip_index += 1
        start += stride_seconds
    return clips
def _estimated_timeline_end(frames: list[dict[str, Any]]) -> float:
offsets = [float(frame["offset_seconds"]) for frame in frames]
if len(offsets) < 2:
return offsets[-1]
intervals = [
current - previous
for previous, current in zip(offsets, offsets[1:])
if current > previous
]
if not intervals:
return offsets[-1]
return offsets[-1] + min(intervals)
def _uniform_sample(
frames: list[dict[str, Any]],
frames_per_clip: int,
) -> list[dict[str, Any]]:
if len(frames) <= frames_per_clip:
return frames
if frames_per_clip <= 1:
return [frames[0]]
last_index = len(frames) - 1
indexes = [
round(position * last_index / (frames_per_clip - 1))
for position in range(frames_per_clip)
]
return [frames[index] for index in indexes]
def _frame_time(frame: dict[str, Any]) -> dict[str, Any]:
record = {
"frame_id": frame.get("frame_id"),
"frame_path": frame.get("frame_path"),
"offset_seconds": frame.get("offset_seconds"),
"timecode": frame.get("timecode"),
"pts_time": frame.get("pts_time"),
}
if frame.get("beijing_time") is not None:
record["beijing_time"] = frame.get("beijing_time")
return record
def _clip_beijing_time_range(
frames: list[dict[str, Any]],
start: float,
end: float,
) -> tuple[str | None, str | None]:
for frame in frames:
reference_time = frame.get("beijing_time")
if not reference_time:
continue
reference_offset = frame.get("offset_seconds")
return (
derive_time_from_reference(
str(reference_time),
reference_offset_seconds=reference_offset,
target_offset_seconds=start,
),
derive_time_from_reference(
str(reference_time),
reference_offset_seconds=reference_offset,
target_offset_seconds=end,
),
)
return None, None
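
Clip construction in this module combines two ideas: fixed-length, half-open time windows advanced by a stride, and uniform index sampling when a window holds more frames than `frames_per_clip`. The sketch below isolates both on plain offset lists. `fixed_windows` and `uniform_indexes` are hypothetical names for illustration, and the stride guard is an added assumption rather than behavior taken from the source.

```python
def uniform_indexes(count: int, frames_per_clip: int) -> list[int]:
    """Pick evenly spaced indexes, always keeping the first and last frame."""
    if count <= frames_per_clip:
        return list(range(count))
    if frames_per_clip <= 1:
        return [0]
    last_index = count - 1
    return [
        round(position * last_index / (frames_per_clip - 1))
        for position in range(frames_per_clip)
    ]


def fixed_windows(
    offsets: list[float],
    length_seconds: float,
    stride_seconds: float,
    min_frames: int,
) -> list[tuple[float, float, int]]:
    """Return (start, end, frame_count) for each window with enough frames."""
    if stride_seconds <= 0:
        # Assumption for this sketch: reject strides that never advance.
        raise ValueError("stride_seconds must be positive")
    ordered = sorted(offsets)
    if not ordered:
        return []
    # Estimate the timeline end as last offset plus the smallest frame gap.
    intervals = [b - a for a, b in zip(ordered, ordered[1:]) if b > a]
    timeline_end = ordered[-1] + (min(intervals) if intervals else 0.0)
    windows = []
    start = 0.0
    while start <= ordered[-1]:
        end = min(start + length_seconds, timeline_end)
        count = sum(1 for offset in ordered if start <= offset < end)
        if count >= min_frames:
            windows.append((start, end, count))
        start += stride_seconds
    return windows


# 25 frames sampled at 1 fps, 10 second windows with a 10 second stride.
windows = fixed_windows([float(i) for i in range(25)], 10.0, 10.0, 4)
picked = uniform_indexes(10, 4)
```

For this input the last window is shortened to the estimated timeline end and still qualifies because it holds five frames, which is above the minimum of four.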


@@ -0,0 +1,278 @@
from __future__ import annotations
import ast
from pathlib import Path
from typing import Any
from .paths import resolve_path, validate_output_dir
DEFAULT_CONFIG_PATH = Path(__file__).resolve().parent.parent / "config" / "local_batch.yaml"
def load_config(
config_path: str | Path = DEFAULT_CONFIG_PATH,
*,
input_dir: str | Path | None = None,
output_dir: str | Path | None = None,
) -> dict[str, Any]:
path = Path(config_path).expanduser().resolve(strict=False)
raw_config = _parse_simple_yaml(path)
config = _with_defaults(raw_config)
base_dir = path.parent.parent if path.parent.name == "config" else path.parent
if input_dir is not None:
config["input"]["dir"] = str(input_dir)
if output_dir is not None:
config["output"]["dir"] = str(output_dir)
config["input"]["dir"] = str(resolve_path(config["input"]["dir"], base_dir=base_dir))
config["output"]["dir"] = str(
resolve_path(config["output"]["dir"], base_dir=base_dir)
)
validate_output_dir(config["input"]["dir"], config["output"]["dir"])
extensions = config["input"].get("extensions", [])
config["input"]["extensions"] = _normalize_extensions(extensions)
config["input"]["recursive"] = bool(config["input"].get("recursive", True))
config.setdefault("ffprobe", {})
config["ffprobe"]["timeout_seconds"] = int(
config["ffprobe"].get("timeout_seconds", 30)
)
return config
def _with_defaults(config: dict[str, Any]) -> dict[str, Any]:
merged: dict[str, Any] = {
"input": {
"dir": "./videos",
"recursive": True,
"extensions": [".mp4", ".mov", ".mkv", ".avi", ".flv", ".ts", ".m4v"],
},
"output": {
"dir": "./outputs/local-batch",
"overwrite": False,
"resume": True,
"keep_frames": True,
},
"source": {"mode": "local"},
"hik_cloud": {
"api_base_url": "https://api2.hik-cloud.com",
"download_path": "/v1/carrier/cstorage/open/play/download",
"access_token": None,
"access_token_env": "HIK_CLOUD_ACCESS_TOKEN",
"devices": [],
"time_ranges": [],
"chunk_seconds": 600,
"timeout_seconds": 60,
"download_timeout_seconds": 600,
},
"ffprobe": {"timeout_seconds": 30},
"ffmpeg": {
"prefer_nvdec": True,
"allow_cpu_fallback": False,
"hwaccel": "cuda",
"codec_decoders": {"h264": "h264_cuvid", "hevc": "hevc_cuvid"},
"frame_fps": 1,
"frame_width": 640,
"jpeg_quality": 4,
"timeout_seconds_per_video": 3600,
},
"clip": {
"length_seconds": 10,
"stride_seconds": 10,
"frames_per_clip": 8,
"min_frames_per_clip": 4,
},
"vlm": {
"api_base_url": "http://localhost:8679",
"chat_completions_path": "/v1/chat/completions",
"model": "memai-zhengxin-v3-20260413",
"timeout_seconds": 120,
"max_tokens": 512,
"temperature": 0,
"batch_size": 1,
"image_transport": "data_uri",
"retries": 1,
},
"prompt": {
"system": "You are a store video analysis assistant. Return strict JSON only.",
"user": "Analyze this clip. Return events and screen_time. If no event, return events: [].",
},
"schema": {
"version": "local-batch-v1",
"event_types": [
"customer_enter",
"customer_leave",
"queue_detected",
"staff_absent",
"staff_present",
"area_crowded",
"abnormal_behavior",
"unknown",
],
"require_strict_json": True,
"parse_retry": 1,
"merge_gap_seconds": 30,
},
"runtime": {"timezone": "Asia/Shanghai", "log_level": "INFO"},
}
for section, values in config.items():
if isinstance(values, dict) and isinstance(merged.get(section), dict):
merged[section].update(values)
else:
merged[section] = values
return merged
def _normalize_extensions(extensions: list[str]) -> list[str]:
normalized = []
for extension in extensions:
value = str(extension).lower()
if not value.startswith("."):
value = f".{value}"
normalized.append(value)
return normalized
def _parse_simple_yaml(path: Path) -> dict[str, Any]:
    if not path.exists():
        raise FileNotFoundError(f"config file not found: {path}")

    root: dict[str, Any] = {}
    stack: list[tuple[int, dict[str, Any] | list[Any]]] = [(-1, root)]
    lines = path.read_text(encoding="utf-8").splitlines()
    index = 0
    while index < len(lines):
        raw_line = lines[index].rstrip()
        stripped = raw_line.strip()
        if not stripped or raw_line.lstrip().startswith("#"):
            index += 1
            continue

        indent = len(raw_line) - len(raw_line.lstrip(" "))
        while indent <= stack[-1][0]:
            stack.pop()
        parent = stack[-1][1]

        if stripped.startswith("- "):
            if not isinstance(parent, list):
                raise ValueError(f"list item without list parent: {raw_line}")
            item = stripped[2:].strip()
            if ":" in item:
                key, value = item.split(":", 1)
                mapping: dict[str, Any] = {}
                parent.append(mapping)
                key = key.strip()
                value = value.strip()
                if not value:
                    next_stripped = _next_stripped(lines, index)
                    child: dict[str, Any] | list[Any]
                    child = [] if next_stripped and next_stripped.startswith("- ") else {}
                    mapping[key] = child
                    stack.append((indent, mapping))
                    stack.append((indent + 2, child))
                else:
                    mapping[key] = _parse_scalar(value)
                    stack.append((indent, mapping))
            else:
                parent.append(_parse_scalar(item))
            index += 1
            continue

        if not isinstance(parent, dict):
            raise ValueError(f"mapping entry inside list is not supported: {raw_line}")
        if ":" not in stripped:
            raise ValueError(f"unsupported config line: {raw_line}")

        key, value = stripped.split(":", 1)
        key = key.strip()
        value = value.strip()
        if _is_block_scalar(value):
            parent[key], index = _parse_block_scalar(lines, index, indent, value)
            continue
        if not value:
            next_stripped = _next_stripped(lines, index)
            child: dict[str, Any] | list[Any]
            child = [] if next_stripped and next_stripped.startswith("- ") else {}
            parent[key] = child
            stack.append((indent, child))
        else:
            parent[key] = _parse_scalar(value)
        index += 1
    return root


def _next_stripped(lines: list[str], current_index: int) -> str | None:
    for raw_line in lines[current_index + 1 :]:
        stripped = raw_line.strip()
        if stripped and not raw_line.lstrip().startswith("#"):
            return stripped
    return None


def _is_block_scalar(value: str) -> bool:
    return value in {">", ">-", "|", "|-"}


def _parse_block_scalar(
    lines: list[str],
    start_index: int,
    parent_indent: int,
    marker: str,
) -> tuple[str, int]:
    content_lines: list[str] = []
    content_indent: int | None = None
    index = start_index + 1
    while index < len(lines):
        raw_line = lines[index].rstrip()
        stripped = raw_line.strip()
        if not stripped:
            content_lines.append("")
            index += 1
            continue
        indent = len(raw_line) - len(raw_line.lstrip(" "))
        if indent <= parent_indent:
            break
        if content_indent is None:
            content_indent = indent
        content_lines.append(raw_line[content_indent:])
        index += 1
    if marker.endswith("-"):
        while content_lines and content_lines[-1] == "":
            content_lines.pop()
    return "\n".join(content_lines), index


def _parse_scalar(value: str) -> Any:
    lower = value.lower()
    if lower == "true":
        return True
    if lower == "false":
        return False
    if lower in {"null", "none"}:
        return None
    if value.startswith("[") and value.endswith("]"):
        parsed = ast.literal_eval(value)
        if not isinstance(parsed, list):
            raise ValueError(f"expected list value: {value}")
        return parsed
    if (value.startswith('"') and value.endswith('"')) or (
        value.startswith("'") and value.endswith("'")
    ):
        return ast.literal_eval(value)
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value
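
The order of checks in `_parse_scalar` determines what a bare config value becomes: booleans and nulls are matched first, then numbers, and anything else stays a string. The following is a minimal standalone sketch of that ordering, not the project's parser; `coerce` is a hypothetical name used only for illustration, and it deliberately omits the inline-list and quoted-string branches.

```python
from typing import Any


def coerce(value: str) -> Any:
    """Hypothetical restatement of the bool -> null -> int -> float -> str order."""
    lower = value.lower()
    if lower == "true":
        return True
    if lower == "false":
        return False
    if lower in {"null", "none"}:
        return None
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    # Nothing matched, so the raw text is kept as a string.
    return value


print([coerce(text) for text in ("True", "none", "640", "0.5", "cuda")])
```

Because integers are tried before floats, a value such as `640` stays an `int`, which matters for options like `frame_width` that are interpolated into an ffmpeg filter string.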

@@ -0,0 +1,27 @@
from __future__ import annotations

from pathlib import Path


def discover_videos(
    input_dir: str | Path,
    extensions: list[str],
    *,
    recursive: bool,
) -> list[Path]:
    root = Path(input_dir).expanduser()
    if not root.exists():
        raise FileNotFoundError(f"input dir not found: {root}")
    if not root.is_dir():
        raise NotADirectoryError(f"input path is not a directory: {root}")

    allowed = {
        extension.lower() if extension.startswith(".") else f".{extension.lower()}"
        for extension in extensions
    }
    iterator = root.rglob("*") if recursive else root.iterdir()
    return sorted(
        path
        for path in iterator
        if path.is_file() and path.suffix.lower() in allowed
    )
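
The effect of the `recursive` flag and the case-insensitive suffix match can be sketched with a temporary directory. This is an illustrative, self-contained example that repeats the filter inline rather than importing the project module; the file names are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    for name in ("a.MP4", "b.txt", "nested/c.mkv"):
        (root / name).touch()

    allowed = {".mp4", ".mkv"}
    # Non-recursive: only direct children of the input directory are considered.
    flat = sorted(
        p.name for p in root.iterdir() if p.is_file() and p.suffix.lower() in allowed
    )
    # Recursive: rglob("*") also walks into subdirectories.
    deep = sorted(
        p.name for p in root.rglob("*") if p.is_file() and p.suffix.lower() in allowed
    )

print(flat, deep)
```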

@@ -0,0 +1,243 @@
from __future__ import annotations

import math
import subprocess
from pathlib import Path
from typing import Any

from .frames import build_frame_records
from .manifest import read_jsonl, write_manifest
from .timeline import DEFAULT_TIMEZONE, timeline_start_epoch

NVDEC_CODECS = {"h264", "hevc"}


def build_sample_command(
    video_path: str | Path,
    output_dir: str | Path,
    video_id: str,
    ffmpeg_config: dict[str, Any],
    *,
    codec_name: str | None,
    max_frames: int | None = None,
    max_duration_seconds: float | None = None,
) -> list[str]:
    frame_dir = Path(output_dir).expanduser() / "frames" / video_id
    frame_pattern = frame_dir / "%06d.jpg"
    command = ["ffmpeg", "-hide_banner", "-y"]

    codec = (codec_name or "").lower()
    prefer_nvdec = bool(ffmpeg_config.get("prefer_nvdec", True))
    allow_cpu_fallback = bool(ffmpeg_config.get("allow_cpu_fallback", False))
    decoders = ffmpeg_config.get("codec_decoders", {})
    decoder = decoders.get(codec) if isinstance(decoders, dict) else None

    if prefer_nvdec and codec in NVDEC_CODECS and decoder:
        command.extend(
            [
                "-hwaccel",
                str(ffmpeg_config.get("hwaccel", "cuda")),
                "-c:v",
                str(decoder),
            ]
        )
    elif not allow_cpu_fallback:
        raise ValueError(
            f"NVDEC decoder is required for codec {codec_name!r}; CPU fallback is disabled"
        )

    frame_fps = ffmpeg_config.get("frame_fps", 1)
    frame_width = ffmpeg_config.get("frame_width", 640)
    jpeg_quality = ffmpeg_config.get("jpeg_quality", 4)
    command.extend(
        [
            "-i",
            str(Path(video_path).expanduser()),
        ]
    )
    if max_duration_seconds is not None and max_duration_seconds > 0:
        command.extend(["-t", f"{max_duration_seconds:g}"])
    command.extend(
        [
            "-vf",
            f"fps={frame_fps},scale={frame_width}:-2",
            "-q:v",
            str(jpeg_quality),
        ]
    )
    if max_frames is not None and max_frames > 0:
        command.extend(["-frames:v", str(max_frames)])
    command.append(str(frame_pattern))
    return command
def sample_video_frames(
    video_record: dict[str, Any],
    output_dir: str | Path,
    ffmpeg_config: dict[str, Any],
    *,
    manifest_path: str | Path | None = None,
) -> list[dict[str, Any]]:
    video_id = str(video_record["video_id"])
    output_root = Path(output_dir).expanduser().resolve(strict=False)
    frame_dir = output_root / "frames" / video_id
    frame_dir.mkdir(parents=True, exist_ok=True)

    try:
        max_frames = _max_output_frames(video_record, ffmpeg_config)
        timezone_name = str(ffmpeg_config.get("timezone", DEFAULT_TIMEZONE))
        start_epoch = timeline_start_epoch(video_record)
        command = build_sample_command(
            video_record.get("path") or video_record.get("source_path"),
            output_root,
            video_id,
            ffmpeg_config,
            codec_name=video_record.get("codec_name"),
            max_frames=max_frames,
            max_duration_seconds=_record_duration_seconds(video_record),
        )
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            check=True,
            timeout=int(ffmpeg_config.get("timeout_seconds_per_video", 3600)),
        )
        records = build_frame_records(
            video_id,
            output_root,
            frame_dir.glob("*.jpg"),
            frame_fps=float(ffmpeg_config.get("frame_fps", 1)),
            timeline_start_epoch=start_epoch,
            timezone_name=timezone_name,
        )
        _attach_success_evidence(
            records,
            command,
            stderr=completed.stderr,
        )
    except subprocess.CalledProcessError as exc:
        records = build_frame_records(
            video_id,
            output_root,
            frame_dir.glob("*.jpg"),
            frame_fps=float(ffmpeg_config.get("frame_fps", 1)),
            timeline_start_epoch=start_epoch,
            timezone_name=timezone_name,
        )
        if records and (max_frames is None or len(records) >= max_frames):
            _attach_success_evidence(
                records,
                command,
                stderr=exc.stderr,
            )
        else:
            records = [_failure_record(video_id, exc)]
    except (subprocess.TimeoutExpired, ValueError) as exc:
        records = [_failure_record(video_id, exc)]

    if manifest_path is not None:
        _replace_video_records(Path(manifest_path), video_id, records)
    return records
def _replace_video_records(
    manifest_path: Path,
    video_id: str,
    new_records: list[dict[str, Any]],
) -> None:
    existing = [
        record
        for record in read_jsonl(manifest_path)
        if str(record.get("video_id")) != video_id
    ]
    write_manifest(manifest_path, [*existing, *new_records])


def _failure_record(video_id: str, exc: BaseException) -> dict[str, Any]:
    return {
        "video_id": video_id,
        "frame_id": None,
        "frame_path": None,
        "offset_seconds": None,
        "timecode": None,
        "pts_time": None,
        "status": "sample_failed",
        "retry_count": 0,
        "last_error": _error_text(exc),
    }


def _attach_success_evidence(
    records: list[dict[str, Any]],
    command: list[str],
    *,
    stderr: str | None,
) -> None:
    evidence = {
        "ffmpeg_command": command,
        "decoder": _command_value_after(command, "-c:v"),
        "hwaccel": _command_value_after(command, "-hwaccel"),
        "stderr_summary": _stderr_summary(stderr),
    }
    for record in records:
        record.update(evidence)


def _command_value_after(command: list[str], flag: str) -> str | None:
    try:
        index = command.index(flag)
    except ValueError:
        return None
    if index + 1 >= len(command):
        return None
    return command[index + 1]


def _stderr_summary(stderr: str | None, *, limit: int = 2000) -> str:
    if not stderr:
        return ""
    text = stderr.strip()
    if len(text) <= limit:
        return text
    return text[:limit]


def _error_text(exc: BaseException) -> str:
    if isinstance(exc, subprocess.CalledProcessError):
        return str(exc.stderr or exc.stdout or exc)
    if isinstance(exc, subprocess.TimeoutExpired):
        return f"ffmpeg timed out after {exc.timeout}s"
    return str(exc)


def _max_output_frames(
    video_record: dict[str, Any],
    ffmpeg_config: dict[str, Any],
) -> int | None:
    frame_fps = _optional_float(ffmpeg_config.get("frame_fps", 1))
    if frame_fps is None or frame_fps <= 0:
        return None
    duration_seconds = _record_duration_seconds(video_record)
    if duration_seconds is None or duration_seconds <= 0:
        return None
    return max(1, math.ceil(duration_seconds * frame_fps) + 1)


def _record_duration_seconds(video_record: dict[str, Any]) -> float | None:
    for begin_key, end_key in (
        ("actual_begin", "actual_end"),
        ("requested_begin", "requested_end"),
    ):
        begin = _optional_float(video_record.get(begin_key))
        end = _optional_float(video_record.get(end_key))
        if begin is not None and end is not None and end > begin:
            return end - begin
    return _optional_float(video_record.get("duration_seconds"))


def _optional_float(value: Any) -> float | None:
    if value is None or value == "":
        return None
    return float(value)
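
The frame cap passed to ffmpeg as `-frames:v` comes from `_max_output_frames`: the duration multiplied by the sampling rate, rounded up, plus one frame of slack. A small worked sketch of that arithmetic follows; `max_output_frames` is a hypothetical standalone copy and the durations are example values only.

```python
import math


def max_output_frames(duration_seconds: float, frame_fps: float) -> int:
    """Hypothetical copy of the cap: ceil(duration * fps) plus one extra frame."""
    return max(1, math.ceil(duration_seconds * frame_fps) + 1)


# A 10-minute chunk sampled at 1 frame per second.
print(max_output_frames(600, 1))
# A 12.4-second clip sampled at one frame every two seconds.
print(max_output_frames(12.4, 0.5))
```

The same cap is reused in the `CalledProcessError` branch of `sample_video_frames`: when ffmpeg exits non-zero but at least that many frames are already on disk, the run is still recorded as sampled.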

@@ -0,0 +1,59 @@
from __future__ import annotations

from pathlib import Path
from typing import Any, Iterable

from .timeline import DEFAULT_TIMEZONE, format_beijing_time


def seconds_to_timecode(seconds: float | int | None) -> str | None:
    if seconds is None:
        return None
    total_seconds = int(float(seconds))
    hours = total_seconds // 3600
    minutes = (total_seconds % 3600) // 60
    remaining_seconds = total_seconds % 60
    return f"{hours:02d}:{minutes:02d}:{remaining_seconds:02d}"


def build_frame_records(
    video_id: str,
    output_dir: str | Path,
    frame_paths: Iterable[str | Path],
    *,
    frame_fps: float,
    timeline_start_epoch: float | int | str | None = None,
    timezone_name: str = DEFAULT_TIMEZONE,
) -> list[dict[str, Any]]:
    base_dir = Path(output_dir).expanduser().resolve(strict=False)
    records = []
    for index, frame_path in enumerate(sorted(Path(path) for path in frame_paths), start=1):
        offset_seconds = round((index - 1) / frame_fps, 6)
        record = {
            "video_id": video_id,
            "frame_id": f"{video_id}_f{index:06d}",
            "frame_path": _relative_frame_path(frame_path, base_dir),
            "offset_seconds": offset_seconds,
            "timecode": seconds_to_timecode(offset_seconds),
            "pts_time": offset_seconds,
            "status": "sampled",
            "retry_count": 0,
            "last_error": None,
        }
        beijing_time = format_beijing_time(
            timeline_start_epoch,
            offset_seconds=offset_seconds,
            timezone_name=timezone_name,
        )
        if beijing_time is not None:
            record["beijing_time"] = beijing_time
        records.append(record)
    return records


def _relative_frame_path(frame_path: Path, base_dir: Path) -> str:
    resolved = frame_path.expanduser().resolve(strict=False)
    try:
        return resolved.relative_to(base_dir).as_posix()
    except ValueError:
        return resolved.as_posix()
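
Frame offsets in `build_frame_records` are derived from the sorted frame index and the sampling rate, not read back from ffmpeg, so `pts_time` assumes evenly spaced output frames. The mapping from index to offset to timecode can be sketched as follows; `frame_offset` and `to_timecode` are hypothetical standalone helpers that mirror the arithmetic above.

```python
def frame_offset(index: int, frame_fps: float) -> float:
    """Offset in seconds of the 1-based frame index at the given sampling rate."""
    return round((index - 1) / frame_fps, 6)


def to_timecode(seconds: float) -> str:
    """Format whole seconds as HH:MM:SS, truncating any fractional part."""
    total = int(seconds)
    return f"{total // 3600:02d}:{(total % 3600) // 60:02d}:{total % 60:02d}"


for index, fps in ((1, 1.0), (3, 0.5), (3662, 1.0)):
    offset = frame_offset(index, fps)
    print(index, fps, offset, to_timecode(offset))
```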

@@ -0,0 +1,450 @@
from __future__ import annotations

import json
import os
import re
from datetime import datetime
from pathlib import Path
from typing import Any
from urllib.parse import urlparse, urlunparse
import urllib.request
from zoneinfo import ZoneInfo

from .manifest import read_jsonl, write_manifest
from .paths import hik_cloud_download_path

DEFAULT_TIMEZONE = "Asia/Shanghai"
DEFAULT_CHUNK_SECONDS = 600
MAX_CHUNK_SECONDS = 3600
DEFAULT_API_BASE_URL = "https://api2.hik-cloud.com"
DEFAULT_DOWNLOAD_PATH = "/v1/carrier/cstorage/open/play/download"
DEFAULT_TIMEOUT_SECONDS = 60
DEFAULT_DOWNLOAD_TIMEOUT_SECONDS = 600
DOWNLOAD_MANIFEST_NAME = "hik_cloud_download_manifest.jsonl"
NO_RECORDING_CODE = 80438027
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_hik_time(value: str | int | float, timezone: str = DEFAULT_TIMEZONE) -> int:
    if isinstance(value, bool):
        raise ValueError(f"unsupported time value: {value!r}")
    if isinstance(value, int | float):
        return int(value)
    if isinstance(value, str):
        parsed = datetime.strptime(value, TIME_FORMAT)
        return int(parsed.replace(tzinfo=ZoneInfo(timezone)).timestamp())
    raise ValueError(f"unsupported time value: {value!r}")


def build_download_chunks(config: dict[str, Any]) -> list[dict[str, Any]]:
    hik_config = config.get("hik_cloud", {})
    runtime_config = config.get("runtime", {})
    timezone = runtime_config.get("timezone", DEFAULT_TIMEZONE)
    chunk_seconds = int(hik_config.get("chunk_seconds", DEFAULT_CHUNK_SECONDS))
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be greater than 0")
    if chunk_seconds > MAX_CHUNK_SECONDS:
        raise ValueError("chunk_seconds must be less than or equal to 3600")

    chunks: list[dict[str, Any]] = []
    devices = hik_config.get("devices", [])
    time_ranges = hik_config.get("time_ranges", [])
    for device in devices:
        for time_range in time_ranges:
            requested_begin = parse_hik_time(time_range["begin"], timezone)
            requested_end = parse_hik_time(time_range["end"], timezone)
            if requested_end <= requested_begin:
                raise ValueError("time range end must be after begin")
            time_begin = requested_begin
            while time_begin < requested_end:
                time_end = min(time_begin + chunk_seconds, requested_end)
                chunks.append(
                    {
                        "device_serial": device["device_serial"],
                        "channel_no": device["channel_no"],
                        "requested_begin": requested_begin,
                        "requested_end": requested_end,
                        "time_begin": time_begin,
                        "time_end": time_end,
                    }
                )
                time_begin = time_end
    return chunks


def resolve_access_token(config_or_hik_config: dict[str, Any]) -> str:
    hik_config = _hik_config(config_or_hik_config)
    access_token = hik_config.get("access_token")
    if access_token:
        return str(access_token)
    access_token_env = hik_config.get("access_token_env")
    if access_token_env:
        env_token = os.environ.get(str(access_token_env))
        if env_token:
            return env_token
    raise ValueError(
        "missing hik_cloud access_token; configure access_token or access_token_env"
    )


def request_download_address(
    chunk: dict[str, Any],
    hik_config: dict[str, Any],
    *,
    http_post: Any | None = None,
) -> dict[str, Any]:
    token = resolve_access_token(hik_config)
    api_base_url = str(hik_config.get("api_base_url") or DEFAULT_API_BASE_URL)
    download_path = str(hik_config.get("download_path") or DEFAULT_DOWNLOAD_PATH)
    url = api_base_url.rstrip("/") + download_path
    headers = {
        "Authorization": f"bearer {token}",
        "Content-Type": "application/json",
    }
    json_body = {
        "deviceSerial": chunk["device_serial"],
        "channelNo": chunk["channel_no"],
        "timeBegin": chunk["time_begin"],
        "timeEnd": chunk["time_end"],
    }
    timeout_seconds = int(hik_config.get("timeout_seconds", DEFAULT_TIMEOUT_SECONDS))
    post = http_post or _post_json
    try:
        response = post(url, json_body, headers, timeout_seconds)
    except Exception as exc:  # pragma: no cover - exact urllib failures vary.
        return {
            **_chunk_metadata(chunk),
            "status": "address_failed",
            "code": None,
            "last_error": _sanitize_error(exc, token),
        }

    code = _optional_int(response.get("code"))
    if code == 0:
        data = response.get("data") or {}
        return {
            **_chunk_metadata(chunk),
            "status": "address_ok",
            "code": code,
            "url": data.get("url"),
            "actual_begin": _optional_int(data.get("actualBeginTime")),
            "actual_end": _optional_int(data.get("actualEndTime")),
        }

    status = "no_recording" if code == NO_RECORDING_CODE else "address_failed"
    result = {
        **_chunk_metadata(chunk),
        "status": status,
        "code": code,
        "last_error": _api_error_message(response, token),
    }
    return result
def download_hik_cloud_recordings(
    config: dict[str, Any],
    output_dir: str | Path,
    *,
    address_client: Any | None = None,
    download_url: Any | None = None,
    download: bool = True,
) -> list[dict[str, Any]]:
    output_path = Path(output_dir).expanduser().resolve(strict=False)
    manifest_path = output_path / DOWNLOAD_MANIFEST_NAME
    hik_config = _hik_config(config)
    chunks = build_download_chunks(config)
    resume = bool(config.get("output", {}).get("resume", False))
    manifest_records = read_jsonl(manifest_path) if resume else []
    existing_downloads = {
        _manifest_key(record): record
        for record in manifest_records
        if _is_resumable_download(record)
    }
    get_address = address_client or request_download_address
    fetch = download_url or _download_url
    download_timeout_seconds = int(
        hik_config.get("download_timeout_seconds", DEFAULT_DOWNLOAD_TIMEOUT_SECONDS)
    )
    token = _redaction_token(hik_config)
    video_records: list[dict[str, Any]] = []

    for chunk in chunks:
        key = _chunk_key(chunk)
        existing_record = existing_downloads.get(key)
        if download and existing_record is not None:
            video_records.append(_video_record_from_manifest(existing_record))
            continue

        address_result = get_address(chunk, hik_config)
        status = address_result.get("status")
        if status != "address_ok":
            _upsert_manifest_record(
                manifest_records,
                _manifest_record(
                    chunk,
                    address_result,
                    status=str(status or "address_failed"),
                    token=token,
                ),
            )
            continue

        if not download:
            _upsert_manifest_record(
                manifest_records,
                _manifest_record(
                    chunk,
                    address_result,
                    status="address_ok",
                    token=token,
                ),
            )
            continue

        url = str(address_result.get("url") or "")
        target_path = hik_cloud_download_path(
            output_path,
            str(chunk["device_serial"]),
            chunk["channel_no"],
            int(chunk["time_begin"]),
            int(chunk["time_end"]),
        )
        try:
            payload = fetch(url, timeout_seconds=download_timeout_seconds)
            target_path.parent.mkdir(parents=True, exist_ok=True)
            target_path.write_bytes(payload)
        except Exception as exc:  # pragma: no cover - concrete network failures vary.
            _upsert_manifest_record(
                manifest_records,
                _manifest_record(
                    chunk,
                    address_result,
                    status="download_failed",
                    path=target_path,
                    last_error=_sanitize_error(exc, token),
                    token=token,
                ),
            )
            continue

        record = _downloaded_video_record(chunk, address_result, target_path)
        video_records.append(record)
        _upsert_manifest_record(
            manifest_records,
            _manifest_record(
                chunk,
                address_result,
                status="downloaded",
                path=target_path,
                token=token,
            ),
        )

    write_manifest(manifest_path, manifest_records)
    return video_records
def _post_json(
    url: str,
    json_body: dict[str, Any],
    headers: dict[str, str],
    timeout_seconds: int,
) -> dict[str, Any]:
    request = urllib.request.Request(
        url,
        data=json.dumps(json_body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))


def _download_url(url: str, *, timeout_seconds: int | None = None) -> bytes:
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.read()


def _hik_config(config_or_hik_config: dict[str, Any]) -> dict[str, Any]:
    hik_config = config_or_hik_config.get("hik_cloud")
    if isinstance(hik_config, dict):
        return hik_config
    return config_or_hik_config


def _chunk_metadata(chunk: dict[str, Any]) -> dict[str, Any]:
    return {
        "device_serial": chunk["device_serial"],
        "channel_no": chunk["channel_no"],
        "requested_begin": chunk.get("requested_begin"),
        "requested_end": chunk.get("requested_end"),
        "time_begin": chunk["time_begin"],
        "time_end": chunk["time_end"],
    }


def _optional_int(value: Any) -> int | None:
    if value is None or value == "":
        return None
    return int(value)


def _api_error_message(response: dict[str, Any], token: str) -> str:
    code = response.get("code")
    message = response.get("msg") or response.get("message") or "hik api error"
    return _sanitize_error(f"hik api code {code}: {message}", token)


def _sanitize_error(value: Any, token: str = "") -> str | None:
    if value is None:
        return None
    message = str(value)
    for raw_url in re.findall(r"https?://[^\s'\"<>]+", message):
        parsed = urlparse(raw_url)
        sanitized_url = urlunparse(
            (parsed.scheme, parsed.netloc, parsed.path, "", "", "")
        )
        message = message.replace(raw_url, sanitized_url)
    message = re.sub(
        r"\b(?:sign|sig|token|access_token)=[^&\s'\"<>]+",
        "[redacted-query]",
        message,
        flags=re.IGNORECASE,
    )
    if token:
        message = message.replace(token, "[redacted]")
    message = message.replace("Authorization", "[redacted-header]")
    return message


def _downloaded_video_record(
    chunk: dict[str, Any],
    address_result: dict[str, Any],
    path: Path,
) -> dict[str, Any]:
    return {
        "source": "hik_cloud",
        "path": str(path),
        "source_path": _source_path(chunk),
        "device_serial": chunk["device_serial"],
        "channel_no": chunk["channel_no"],
        "requested_begin": chunk["time_begin"],
        "requested_end": chunk["time_end"],
        "actual_begin": address_result.get("actual_begin"),
        "actual_end": address_result.get("actual_end"),
        "status": "downloaded",
        "retry_count": 0,
        "last_error": None,
    }


def _manifest_record(
    chunk: dict[str, Any],
    address_result: dict[str, Any],
    *,
    status: str,
    token: str,
    path: Path | None = None,
    last_error: str | None = None,
) -> dict[str, Any]:
    url = address_result.get("url")
    record = {
        "source": "hik_cloud",
        "device_serial": chunk["device_serial"],
        "channel_no": chunk["channel_no"],
        "requested_begin": chunk["time_begin"],
        "requested_end": chunk["time_end"],
        "actual_begin": address_result.get("actual_begin"),
        "actual_end": address_result.get("actual_end"),
        "path": str(path) if path is not None else None,
        "status": status,
        "retry_count": 0,
        "last_error": _sanitize_error(last_error or address_result.get("last_error"), token),
    }
    if url:
        record["download_url_host"] = urlparse(str(url)).netloc
    if "code" in address_result:
        record["code"] = address_result.get("code")
    if status == "downloaded":
        record["source_path"] = _source_path(chunk)
    return record


def _source_path(chunk: dict[str, Any]) -> str:
    time_begin = chunk.get("time_begin", chunk.get("requested_begin"))
    time_end = chunk.get("time_end", chunk.get("requested_end"))
    return (
        f"hik_cloud://{chunk['device_serial']}/ch{chunk['channel_no']}/"
        f"{int(time_begin)}-{int(time_end)}"
    )


def _is_resumable_download(record: dict[str, Any]) -> bool:
    path = record.get("path")
    return (
        record.get("status") == "downloaded"
        and isinstance(path, str)
        and Path(path).exists()
    )


def _video_record_from_manifest(record: dict[str, Any]) -> dict[str, Any]:
    return {
        "source": "hik_cloud",
        "path": record["path"],
        "source_path": record.get("source_path") or _source_path(record),
        "device_serial": record["device_serial"],
        "channel_no": record["channel_no"],
        "requested_begin": record["requested_begin"],
        "requested_end": record["requested_end"],
        "actual_begin": record.get("actual_begin"),
        "actual_end": record.get("actual_end"),
        "status": "downloaded",
        "retry_count": record.get("retry_count", 0),
        "last_error": record.get("last_error"),
    }


def _upsert_manifest_record(
    records: list[dict[str, Any]],
    new_record: dict[str, Any],
) -> None:
    new_key = _manifest_key(new_record)
    for index, record in enumerate(records):
        if _manifest_key(record) == new_key:
            records[index] = new_record
            return
    records.append(new_record)


def _chunk_key(chunk: dict[str, Any]) -> tuple[Any, Any, Any, Any]:
    return (
        chunk.get("device_serial"),
        chunk.get("channel_no"),
        chunk.get("time_begin"),
        chunk.get("time_end"),
    )


def _manifest_key(record: dict[str, Any]) -> tuple[Any, Any, Any, Any]:
    return (
        record.get("device_serial"),
        record.get("channel_no"),
        record.get("requested_begin"),
        record.get("requested_end"),
    )


def _redaction_token(hik_config: dict[str, Any]) -> str:
    token = hik_config.get("access_token")
    if token:
        return str(token)
    token_env = hik_config.get("access_token_env")
    if token_env:
        return os.environ.get(str(token_env), "")
    return ""

@@ -0,0 +1,35 @@
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Iterable


def write_manifest(path: str | Path, records: Iterable[dict[str, Any]]) -> None:
    manifest_path = Path(path).expanduser().resolve(strict=False)
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    with manifest_path.open("w", encoding="utf-8") as handle:
        for record in records:
            normalized = _normalize_record(record)
            handle.write(
                json.dumps(normalized, ensure_ascii=False, sort_keys=True) + "\n"
            )


def read_jsonl(path: str | Path) -> list[dict[str, Any]]:
    jsonl_path = Path(path).expanduser().resolve(strict=False)
    if not jsonl_path.exists():
        return []
    records = []
    for line in jsonl_path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            records.append(json.loads(line))
    return records


def _normalize_record(record: dict[str, Any]) -> dict[str, Any]:
    normalized = dict(record)
    normalized.setdefault("status", "pending")
    normalized.setdefault("retry_count", 0)
    normalized.setdefault("last_error", None)
    return normalized
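
The manifest format is one JSON object per line, with `status`, `retry_count`, and `last_error` filled in when a record omits them. A self-contained round-trip sketch follows; it repeats the write and read logic inline rather than importing the module, and the record contents are invented for illustration.

```python
import json
import tempfile
from pathlib import Path

records = [{"video_id": "video-1", "status": "sampled"}, {"video_id": "video-2"}]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "manifest.jsonl"
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            # Defaults come first so that explicit fields in the record win.
            row = {"status": "pending", "retry_count": 0, "last_error": None, **record}
            handle.write(json.dumps(row, ensure_ascii=False, sort_keys=True) + "\n")
    loaded = [
        json.loads(line)
        for line in path.read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]

print(loaded)
```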

@@ -0,0 +1,71 @@
from __future__ import annotations

import hashlib
from pathlib import Path

FORBIDDEN_REFERENCE_ROOT = Path("/Users/yoilun/AI-train/zhengxin-vlm-0413")


def resolve_path(path: str | Path, *, base_dir: Path | None = None) -> Path:
    candidate = Path(path).expanduser()
    if not candidate.is_absolute() and base_dir is not None:
        candidate = base_dir / candidate
    return candidate.resolve(strict=False)


def _is_relative_to(path: Path, parent: Path) -> bool:
    try:
        path.relative_to(parent)
        return True
    except ValueError:
        return False


def validate_output_dir(
    input_dir: str | Path,
    output_dir: str | Path,
    *,
    forbidden_root: Path = FORBIDDEN_REFERENCE_ROOT,
) -> Path:
    resolved_input = resolve_path(input_dir)
    resolved_output = resolve_path(output_dir)
    resolved_forbidden = resolve_path(forbidden_root)
    if resolved_output == resolved_input:
        raise ValueError("output dir must not equal input dir")
    if _is_relative_to(resolved_output, resolved_forbidden):
        raise ValueError(
            f"output dir must not be inside forbidden reference dir: {resolved_forbidden}"
        )
    return resolved_output


def stable_video_id(path: str | Path) -> str:
    resolved = str(resolve_path(path))
    digest = hashlib.sha1(resolved.encode("utf-8")).hexdigest()[:16]
    return f"video-{digest}"


def hik_cloud_download_path(
    output_dir: str | Path,
    device_serial: str,
    channel_no: int | str,
    time_begin: int,
    time_end: int,
) -> Path:
    safe_device = _safe_path_component(device_serial)
    safe_channel = _safe_path_component(str(channel_no))
    filename = f"{safe_device}_ch{safe_channel}_{int(time_begin)}_{int(time_end)}.mp4"
    return (
        resolve_path(output_dir)
        / "downloads"
        / "hik_cloud"
        / safe_device
        / f"ch{safe_channel}"
        / filename
    )


def _safe_path_component(value: str) -> str:
    return "".join(char if char.isalnum() or char in "._-" else "_" for char in value)

@@ -0,0 +1,99 @@
from __future__ import annotations

import json
import subprocess
from pathlib import Path
from typing import Any


def probe_video(path: str | Path, *, timeout_seconds: int = 30) -> dict[str, Any]:
    video_path = Path(path).expanduser().resolve(strict=False)
    base_record: dict[str, Any] = {
        "path": str(video_path),
        "status": "probe_failed",
        "retry_count": 0,
        "last_error": None,
    }
    command = [
        "ffprobe",
        "-v",
        "error",
        "-print_format",
        "json",
        "-show_format",
        "-show_streams",
        str(video_path),
    ]

    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            check=True,
            timeout=timeout_seconds,
        )
        payload = json.loads(completed.stdout or "{}")
        video_stream = _first_video_stream(payload)
        format_info = payload.get("format", {})
        return {
            **base_record,
            "status": "probed",
            "duration_seconds": _optional_float(format_info.get("duration")),
            "codec_name": video_stream.get("codec_name"),
            "width": _optional_int(video_stream.get("width")),
            "height": _optional_int(video_stream.get("height")),
            "fps": _parse_frame_rate(
                video_stream.get("avg_frame_rate") or video_stream.get("r_frame_rate")
            ),
            "format_name": format_info.get("format_name"),
            "start_time": _optional_float(format_info.get("start_time")),
        }
    except subprocess.TimeoutExpired as exc:
        base_record["last_error"] = f"ffprobe timed out after {timeout_seconds}s"
        if exc.stderr:
            # On timeout, stderr can arrive as bytes on some Python versions even
            # with text=True, so normalize it before appending.
            base_record["last_error"] += f": {_error_text(exc.stderr)}"
        return base_record
    except subprocess.CalledProcessError as exc:
        base_record["last_error"] = _error_text(exc.stderr or exc.stdout or str(exc))
        return base_record
    except (json.JSONDecodeError, ValueError) as exc:
        base_record["last_error"] = f"ffprobe parse failed: {exc}"
        return base_record


def _first_video_stream(payload: dict[str, Any]) -> dict[str, Any]:
    for stream in payload.get("streams", []):
        if stream.get("codec_type") == "video":
            return stream
    raise ValueError("ffprobe output did not contain a video stream")


def _parse_frame_rate(value: str | None) -> float | None:
    if not value or value == "0/0":
        return None
    if "/" in value:
        numerator, denominator = value.split("/", 1)
        denominator_value = float(denominator)
        if denominator_value == 0:
            return None
        return float(numerator) / denominator_value
    return float(value)


def _optional_float(value: Any) -> float | None:
    if value is None or value == "":
        return None
    return float(value)


def _optional_int(value: Any) -> int | None:
    if value is None or value == "":
        return None
    return int(value)


def _error_text(value: Any) -> str:
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace").strip()
    return str(value).strip()

@@ -0,0 +1,138 @@
from __future__ import annotations

import json
from typing import Any


def extract_json_payload(raw_response: str) -> dict[str, Any]:
    text = raw_response.strip()
    if not text:
        raise ValueError("JSON payload is empty")
    try:
        payload = json.loads(text)
        if isinstance(payload, dict):
            return payload
    except json.JSONDecodeError:
        pass

    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char != "{":
            continue
        try:
            payload, _ = decoder.raw_decode(text[index:])
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict):
            return payload
    raise ValueError("JSON object not found in model response")


def build_clip_result(
    raw_response: str,
    clip_record: dict[str, Any],
    video_record: dict[str, Any] | None,
    config: dict[str, Any],
    *,
    processing: dict[str, Any] | None = None,
    status: str | None = None,
    error: str | None = None,
) -> dict[str, Any]:
    processing_record = dict(processing or {})
    if status is not None:
        payload: dict[str, Any] = {}
        result_status = status
        result_error = error
    else:
        try:
            payload = extract_json_payload(raw_response)
            result_status = "ok"
            result_error = None
        except ValueError as exc:
            payload = {}
            result_status = "parse_failed"
            result_error = str(exc)

    timeline = _timeline(clip_record, config, payload)
    return {
        "schema_version": config.get("schema", {}).get("version", "local-batch-v1"),
        "video_id": str(clip_record.get("video_id")),
        "video_path": _video_path(video_record),
        "clip_id": str(clip_record.get("clip_id")),
        "status": result_status,
        "monitoring_timeline": timeline,
        "events": _events(payload, clip_record) if result_status == "ok" else [],
        "raw_response": raw_response,
        "processing": processing_record,
        "error": result_error,
    }


def _timeline(
    clip_record: dict[str, Any],
    config: dict[str, Any],
    payload: dict[str, Any],
) -> dict[str, Any]:
    return {
        "timezone": config.get("runtime", {}).get("timezone", "Asia/Shanghai"),
        "video_start_time": clip_record.get("video_start_time"),
        "clip_start_seconds": clip_record.get("clip_start_seconds"),
        "clip_end_seconds": clip_record.get("clip_end_seconds"),
        "clip_start_timecode": clip_record.get("clip_start_timecode"),
        "clip_end_timecode": clip_record.get("clip_end_timecode"),
        "clip_start_beijing_time": clip_record.get("clip_start_beijing_time"),
        "clip_end_beijing_time": clip_record.get("clip_end_beijing_time"),
        "frame_times": clip_record.get("frame_times", []),
        "screen_time": str(
            payload.get("screen_time") or payload.get("画面时间") or payload.get("时间") or ""
        ),
    }


def _events(
    payload: dict[str, Any],
    clip_record: dict[str, Any],
) -> list[dict[str, Any]]:
    raw_events = payload.get("events") or []
    if not isinstance(raw_events, list):
        return []
    return [
        _event(event, clip_record)
        for event in raw_events
        if isinstance(event, dict)
    ]


def _event(
    event: dict[str, Any],
    clip_record: dict[str, Any],
) -> dict[str, Any]:
    normalized = dict(event)
    normalized.setdefault("event_type", "unknown")
    normalized.setdefault("start_time", None)
    normalized.setdefault("end_time", None)
    normalized.setdefault("start_offset_seconds", clip_record.get("clip_start_seconds"))
    normalized.setdefault("end_offset_seconds", clip_record.get("clip_end_seconds"))
    normalized.setdefault("confidence", None)
    normalized.setdefault("severity", None)
    normalized.setdefault("attributes", {})
    normalized.setdefault(
        "evidence",
        {
            "clip_id": clip_record.get("clip_id"),
            "frame_paths": [
                frame.get("frame_path")
                for frame in clip_record.get("frame_times", [])
                if frame.get("frame_path")
            ],
        },
    )
    return normalized


def _video_path(video_record: dict[str, Any] | None) -> str | None:
    if not video_record:
        return None
    value = video_record.get("path") or video_record.get("source_path")
    return str(value) if value is not None else None
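
Model replies often wrap the JSON object in extra prose, which is the case the fallback scan in `extract_json_payload` handles: it tries `json.JSONDecoder.raw_decode` at each `{` until one position yields a dict. A self-contained sketch of that scan follows; `first_json_object` is a hypothetical name and the reply strings are invented examples.

```python
import json
from typing import Any


def first_json_object(text: str) -> dict[str, Any]:
    """Return the first JSON object embedded anywhere in text."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one value from the start of the slice and
            # ignores whatever trails it.
            payload, _ = decoder.raw_decode(text[index:])
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict):
            return payload
    raise ValueError("no JSON object found")


reply = 'Result: {"events": [{"event_type": "fall"}]} (end of answer)'
print(first_json_object(reply))
print(first_json_object('note {not json} then {"a": 1}'))
```

Scanning from every `{` is quadratic in the worst case, which seems acceptable for short model replies but may be worth bounding if responses grow large.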

@@ -0,0 +1,67 @@
from __future__ import annotations
from datetime import datetime, timedelta, timezone
from typing import Any
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
DEFAULT_TIMEZONE = "Asia/Shanghai"


def format_beijing_time(
    epoch_seconds: float | int | str | None,
    *,
    offset_seconds: float | int = 0,
    timezone_name: str = DEFAULT_TIMEZONE,
) -> str | None:
    epoch = _optional_float(epoch_seconds)
    if epoch is None:
        return None
    zone = _zone(timezone_name)
    timestamp = epoch + float(offset_seconds)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).astimezone(zone).strftime(
        TIME_FORMAT
    )


def derive_time_from_reference(
    reference_time: str | None,
    *,
    reference_offset_seconds: float | int | None,
    target_offset_seconds: float | int | None,
) -> str | None:
    if not reference_time:
        return None
    reference_offset = _optional_float(reference_offset_seconds)
    target_offset = _optional_float(target_offset_seconds)
    if reference_offset is None or target_offset is None:
        return None
    try:
        reference = datetime.strptime(reference_time, TIME_FORMAT)
    except ValueError:
        return None
    return (reference + timedelta(seconds=target_offset - reference_offset)).strftime(
        TIME_FORMAT
    )


def timeline_start_epoch(record: dict[str, Any]) -> float | None:
    for key in ("actual_begin", "requested_begin"):
        value = _optional_float(record.get(key))
        if value is not None:
            return value
    return None


def _zone(timezone_name: str) -> ZoneInfo:
    try:
        return ZoneInfo(timezone_name)
    except ZoneInfoNotFoundError:
        return ZoneInfo(DEFAULT_TIMEZONE)


def _optional_float(value: Any) -> float | None:
    if value is None or value == "":
        return None
    return float(value)
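
The two time helpers above convert an epoch plus an in-video offset into a Beijing wall-clock string, and shift an already formatted reference time by the difference between two offsets. A minimal sketch of both calculations follows. The names `beijing_time` and `shift_time` are hypothetical, and the sketch assumes a fixed UTC+8 offset in place of `ZoneInfo("Asia/Shanghai")` so that it runs without a time zone database; the two agree for modern timestamps because China has not observed daylight saving time since 1991.

```python
from datetime import datetime, timedelta, timezone

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
# Assumption: a fixed offset stands in for ZoneInfo("Asia/Shanghai").
BEIJING = timezone(timedelta(hours=8))


def beijing_time(epoch_seconds: float, offset_seconds: float = 0) -> str:
    """Format epoch + offset as a Beijing wall-clock string."""
    moment = datetime.fromtimestamp(epoch_seconds + offset_seconds, tz=timezone.utc)
    return moment.astimezone(BEIJING).strftime(TIME_FORMAT)


def shift_time(reference_time: str, reference_offset: float, target_offset: float) -> str:
    """Move a formatted time by the gap between two in-video offsets."""
    reference = datetime.strptime(reference_time, TIME_FORMAT)
    delta = timedelta(seconds=target_offset - reference_offset)
    return (reference + delta).strftime(TIME_FORMAT)


# Recording starts at epoch 1_700_000_000; the clip begins 90 s into the video.
clip_start = beijing_time(1_700_000_000, offset_seconds=90)
# A frame 95.5 s into the video, derived from the clip start time.
frame_time = shift_time(clip_start, reference_offset=90, target_offset=95.5)
```

Because the format string has second resolution, fractional seconds are dropped on output, so a frame at 95.5 s is reported at the same second as one at 95.0 s.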


@@ -0,0 +1,134 @@
from __future__ import annotations

import base64
import json
import time
import urllib.request
from pathlib import Path
from typing import Any, Callable

HttpPost = Callable[[str, dict[str, Any], int], dict[str, Any]]


def infer_clip(
    clip_record: dict[str, Any],
    output_dir: str | Path,
    vlm_config: dict[str, Any],
    prompt_config: dict[str, Any],
    *,
    http_post: HttpPost | None = None,
) -> dict[str, Any]:
    start = time.monotonic()
    client = http_post or _post_json
    url = build_chat_url(vlm_config)
    payload = build_payload(clip_record, output_dir, vlm_config, prompt_config)
    response = client(url, payload, int(vlm_config.get("timeout_seconds", 120)))
    latency_ms = int((time.monotonic() - start) * 1000)
    return {
        "raw_response": _extract_message_content(response.get("body")),
        "http_status": response.get("status"),
        "latency_ms": latency_ms,
    }


def build_chat_url(vlm_config: dict[str, Any]) -> str:
    return (
        str(vlm_config["api_base_url"]).rstrip("/")
        + str(vlm_config["chat_completions_path"])
    )


def build_payload(
    clip_record: dict[str, Any],
    output_dir: str | Path,
    vlm_config: dict[str, Any],
    prompt_config: dict[str, Any],
) -> dict[str, Any]:
    content: list[dict[str, Any]] = [
        {"type": "text", "text": str(prompt_config.get("user", ""))}
    ]
    for frame in clip_record.get("frame_times", []):
        frame_path = frame.get("frame_path")
        if not frame_path:
            continue
        content.append(
            {
                "type": "image_url",
                "image_url": {
                    "url": _image_url(
                        frame_path,
                        output_dir,
                        str(vlm_config.get("image_transport", "data_uri")),
                    )
                },
            }
        )
    return {
        "model": vlm_config.get("model"),
        "messages": [
            {"role": "system", "content": str(prompt_config.get("system", ""))},
            {"role": "user", "content": content},
        ],
        "temperature": vlm_config.get("temperature", 0),
        "max_tokens": vlm_config.get("max_tokens", 512),
    }


def _image_url(
    frame_path: str | Path,
    output_dir: str | Path,
    image_transport: str,
) -> str:
    if image_transport != "data_uri":
        return str(frame_path)
    path = Path(frame_path).expanduser()
    if not path.is_absolute():
        path = Path(output_dir).expanduser() / path
    data = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{_mime_type(path)};base64,{data}"


def _mime_type(path: Path) -> str:
    suffix = path.suffix.lower()
    if suffix in {".jpg", ".jpeg"}:
        return "image/jpeg"
    if suffix == ".png":
        return "image/png"
    if suffix == ".webp":
        return "image/webp"
    return "application/octet-stream"


def _post_json(
    url: str,
    payload: dict[str, Any],
    timeout_seconds: int,
) -> dict[str, Any]:
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        response_body = response.read().decode("utf-8")
        return {
            "status": response.status,
            "body": json.loads(response_body) if response_body else {},
        }


def _extract_message_content(body: Any) -> str:
    if not isinstance(body, dict):
        return ""
    choices = body.get("choices")
    if not choices:
        return ""
    message = choices[0].get("message", {}) if isinstance(choices[0], dict) else {}
    content = message.get("content", "")
    if isinstance(content, str):
        return content
    return json.dumps(content, ensure_ascii=False)
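
Because `infer_clip` accepts an `http_post` callable, the client can be exercised without a running VLM server. The sketch below shows one way that might look: it builds a chat payload containing a single data-URI frame and passes it to a stub with the same call shape as `HttpPost`. The helper names `image_part` and `fake_post`, the model name, the prompts, and the URL are all hypothetical and used only for illustration.

```python
import base64
import json
import tempfile
from pathlib import Path
from typing import Any


def image_part(frame_path: Path) -> dict[str, Any]:
    """Encode one JPEG frame as an image_url content part with a data URI."""
    data = base64.b64encode(frame_path.read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{data}"}}


def fake_post(url: str, payload: dict[str, Any], timeout_seconds: int) -> dict[str, Any]:
    """Hypothetical stub matching the HttpPost call shape; performs no network I/O."""
    parts = payload["messages"][-1]["content"]
    images = [part for part in parts if part["type"] == "image_url"]
    reply = json.dumps({"events": [], "frames_seen": len(images)})
    return {"status": 200, "body": {"choices": [{"message": {"content": reply}}]}}


with tempfile.TemporaryDirectory() as tmp:
    frame = Path(tmp) / "frame_0001.jpg"
    frame.write_bytes(b"\xff\xd8\xff\xd9")  # placeholder bytes, not a decodable image
    payload = {
        "model": "example-model",  # hypothetical model name
        "messages": [
            {"role": "system", "content": "You are a video analyst."},
            {
                "role": "user",
                "content": [{"type": "text", "text": "Describe the clip."}, image_part(frame)],
            },
        ],
        "temperature": 0,
        "max_tokens": 512,
    }
    # Hypothetical endpoint; the stub never contacts it.
    response = fake_post("http://localhost:8000/v1/chat/completions", payload, 120)

# The first choice carries the model's reply as a JSON string.
content = response["body"]["choices"][0]["message"]["content"]
```

Injecting the transport this way keeps tests offline and deterministic. One trade-off of the data-URI transport is request size, since base64 encoding inflates each frame by roughly one third.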

File diff suppressed because it is too large

BIN
录像下载流程_1.pdf Normal file

Binary file not shown.