Initial video AI analysis project

2026-06-17 11:33:54 +08:00
commit ef0047af6d
35 changed files with 8613 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,37 @@
+# Secrets and local credentials
+access_token.md
+.env
+.env.*
+*.pem
+*.key
+
+# Runtime inputs and generated outputs
+outputs/
+videos/
+downloads/
+frames/
+codex_records/
+
+# Agent working notes, not project source
+findings.md
+memories.md
+progress.md
+task_plan.md
+
+# Python caches and test artifacts
+__pycache__/
+*.py[cod]
+.pytest_cache/
+.coverage
+htmlcov/
+
+# Local indexes, editor, and OS files
+.codegraph/
+.DS_Store
+.idea/
+.vscode/
+
+# Logs and temporary files
+*.log
+*.pid
+*.tmp
--- a/agent.md
+++ b/agent.md
@@ -0,0 +1,250 @@
+# Video AI Analysis PoC Agent Instructions
+
+This file governs how subsequent AI agents develop, review, test, and maintain documentation in `/Users/yoilun/AI-train/video-ai-analysis-poc`. Read and follow it before modifying any business code.
+
+## Repository Snapshot
+
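+- Project name: `video-ai-analysis-poc`.
+- Project directory: `/Users/yoilun/AI-train/video-ai-analysis-poc`.
+- Project goal: implement an offline batch-analysis PoC for a local video folder.
+- External model / reference implementation directory: `/Users/yoilun/AI-train/zhengxin-vlm-0413`.
+- Reference VLM model: `/Users/yoilun/AI-train/zhengxin-vlm-0413/models/memai-zhengxin-v3-20260413`.
+- Test environment: `ssh xiaozheng@192.168.5.100`; per the user, the model is already present there.
+- Runtime target: Ubuntu 24, a single machine with an NVIDIA RTX 3080 20GB; offline batch processing prioritizes throughput over low latency.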
+
+## Hard Directory Boundary
+
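+`video-ai-analysis-poc` is the project directory for this work. All subsequent code, configuration, plans, documentation, tests, and output templates must live here.
+
+`zhengxin-vlm-0413` is not the project directory. It may serve only as:
+
+- The existing model directory.
+- A reference implementation directory.
+- A reference for the VLM API, prompts, output parsing, and deployment.
+
+By default, do not create project files or modify business code in `zhengxin-vlm-0413`. In particular, do not change, move, delete, or copy:
+
+- `zhengxin-vlm-0413/models/**`
+- `zhengxin-vlm-0413/service/config.yaml`
+- `zhengxin-vlm-0413/service/config.yaml-bk`
+- `zhengxin-vlm-0413/docker/.env`
+
+If logic must later be copied from the reference project, copy it into this project directory and note its source and any differences.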
+
+## Repository Map
+
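+Current project files:
+
+- `video_ai_analysis_system_plan.md`: the initial system implementation plan.
+- `agent.md`: this file; governs subsequent AI work.
+- `task_plan.md`: the phase plan for the current goal.
+- `findings.md`: code-reading notes, constraints, and key findings.
+- `progress.md`: execution log, test results, and bug loops.
+- `docs/project.md`: project goal, architecture, configuration, how to run, and risks.
+- `memories.md`: the main agent's long-term memory of user requirements and key decisions.
+
+Key files in the reference directory:
+
+- `/Users/yoilun/AI-train/zhengxin-vlm-0413/service/rtsp_service.py`: entry point of the real-time RTSP service.
+- `/Users/yoilun/AI-train/zhengxin-vlm-0413/service/config.yaml`: existing inference, camera, prompt, service, and YOLO configuration; contains sensitive information.
+- `/Users/yoilun/AI-train/zhengxin-vlm-0413/shared/vlm_client.py`: VLM request construction, OpenAI-compatible API calls, and Action parsing.
+- `/Users/yoilun/AI-train/zhengxin-vlm-0413/shared/frame_utils.py`: existing helper functions for extracting frames from local video; they do not meet the full offline batch requirements of this project.
+- `/Users/yoilun/AI-train/zhengxin-vlm-0413/docker/docker-compose.yml`: container orchestration for vLLM and the RTSP service.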
+
+## Current Workflow Batch
+
+```text
+[项目: /Users/yoilun/AI-train/video-ai-analysis-poc]
+[工作流批次: v1.0 本地视频批处理PoC]
+```
+
+When dispatching any sub agent, the first paragraph of the task must include:
+
+```text
+[项目: /Users/yoilun/AI-train/video-ai-analysis-poc]
+[工作流批次: v1.0 本地视频批处理PoC]
+[阶段: 阶段 x <阶段名>]
+[角色: <角色名>]
+[子agent名称: <从指定名单中选择>]
+```
+
+Sub agent names must be chosen from the following list:
+
+```text
+huzenan, jiangzhiyou, linjiayu, hujiarui, wangchiheng, niwenhao,
+caiziquan, yepeijun, lizheng, zhengchenda, chenruihao, yangyilun, donglele
+```
+
+## Required Workflow
+
+### 1. Agent Rules Before Code
+
+Business code must not be modified until `agent.md` has been finalized.
+
+Phase 0 may modify:
+
+- `agent.md`
+- `task_plan.md`
+- `findings.md`
+- `progress.md`
+- `docs/project.md`
+- `memories.md`
+
+### 2. File-Based Planning Is Mandatory
+
+Non-trivial tasks must maintain:
+
+- `task_plan.md`
+- `findings.md`
+- `progress.md`
+- `docs/project.md`
+- `memories.md`
+
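+Before each phase begins, the main agent must read these files and confirm the current goal and next step. After each phase completes, it must update the phase status, validation records, key files, and remaining risks.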
+
+### 3. Sub Agent Workflow
+
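+Each implementation phase uses at least:
+
+- coding agent: implements only the current phase; does not handle future phases or do unrelated refactoring.
+- testing/review agent: only tests, reviews, reproduces issues, and reports bugs; does not modify code directly.
+
+If the testing/review agent finds a bug:
+
+1. The main agent records the bug report in `progress.md`.
+2. The main agent forwards the bug report to the current phase's coding agent.
+3. The coding agent fixes only the reported issues.
+4. The testing/review agent retests.
+5. The same issue gets at most 3 rounds; if it still fails, pause and report to the user.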
+
+### 4. TDD And Verification
+
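+New features and bugfixes must start with a test or a minimal reproducible check, followed by the implementation. For GPU, video, or environment behavior that cannot be tested automatically, document the smoke test command, the input sample, and the manual pass/fail criteria.
+
+Before any phase is declared complete, there must be fresh validation evidence:
+
+- Unit tests.
+- CLI smoke test.
+- FFmpeg command checks.
+- vLLM health check.
+- Output JSON schema/field checks.
+
+Completion must not be claimed from code reading alone.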
+
+## Local Batch PoC Requirements
+
+### 1. Input
+
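+This PoC prioritizes support for a local video folder:
+
+- Select the input directory via a CLI argument or config.
+- Recursive versus non-recursive behavior must be configurable.
+- Support common video formats such as `.mp4`, `.mov`, `.mkv`, `.avi`, `.flv`, `.ts`, and `.m4v`.
+- Unsupported or corrupted videos must have their failure reason recorded and must not block the rest of the folder.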
+
+### 2. Video Processing
+
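+- FFmpeg with NVDEC GPU decoding must be the preferred path.
+- Sample frames at 1 FPS by default.
+- Default clip length is 10 seconds, configurable within 10-20 seconds.
+- Per-frame LLM inference is prohibited; inference must run at clip level.
+- Keep the number of input frames per clip small, 8-10 by default, to avoid OOM on the RTX 3080 20GB.
+- The output directory stores manifests, intermediate sampled frames, clip results, and the summary JSON.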
+
+### 3. Prompt Configuration
+
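+- Prompts must be read from config and must not be hardcoded in business logic.
+- Support `prompt.system` and `prompt.user`.
+- The prompt must require the model to output strict JSON.
+- The prompt must require a screen-time field in the output; if the on-screen time is unreadable, keep the clip's video-relative time.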
+
+### 4. Timeline Output
+
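+Output must include the timeline of the surveillance footage, with at least:
+
+- `video_id` or the video file path.
+- `video_start_time`: filled in when it can be identified from the file or the footage, otherwise `null`.
+- `clip_start_seconds`.
+- `clip_end_seconds`.
+- `clip_start_timecode`, formatted as `HH:MM:SS`.
+- `clip_end_timecode`, formatted as `HH:MM:SS`.
+- `frame_times`: relative seconds or timecodes of the frames in the clip that take part in inference.
+- `screen_time` or `画面时间`: the time the model reads (OCR) from the surveillance footage; empty if unreadable.
+- Event-level `start_time` / `end_time`, or the corresponding clip range.
+
+Outputting only a service processing time such as `datetime.now()` is not acceptable.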
+
+### 5. Model Inference
+
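+- Prefer compatibility with the OpenAI-compatible `/v1/chat/completions` endpoint.
+- Default model name: `memai-zhengxin-v3-20260413`.
+- The default configuration uses `api_base_url: http://localhost:8679` and `chat_completions_path: /v1/chat/completions`; the code joins them into the full request URL.
+- On the RTX 3080 20GB, start batching conservatively: batch size 1 first, then try 2-4 step by step.
+- vLLM dtype and GPU-memory parameters must be validated in the test environment; if BF16 is unstable, prefer FP16.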
+
+## Security And Data Rules
+
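+- Do not copy real RTSP passwords, tokens, Webhook secrets, or cookies into new documents, test fixtures, or output examples.
+- The reference project's `service/config.yaml` contains real intranet RTSP URLs and passwords; reading it is fine, but redact anything that is shared.
+- Local videos, sampled frames, and model output may contain in-store footage and are treated as sensitive data by default.
+- Do not bulk-copy video frames, logs, or output samples outside the repository.
+- Do not delete the user's existing video or model files.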
+
+## Implementation Rules
+
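+- All new code lives inside this project directory.
+- Do not modify the reference project's real-time RTSP main path.
+- The interface design of `shared/vlm_client.py` may be used as a reference, but the new implementation belongs in this project.
+- Do not introduce unnecessary distributed systems.
+- Do not pull in large dependencies to solve small problems.
+- Keep configuration, run commands, and documentation consistent.
+- All output file names must be stable and support resuming from a checkpoint.
+- JSON output must be machine-parseable; the model's raw response may be kept but must not be the only structured result.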
+
+## Validation Matrix
+
+Phase 0, documentation/agent rules:
+
+- Check that these files exist:
+  - `agent.md`
+  - `task_plan.md`
+  - `findings.md`
+  - `progress.md`
+  - `docs/project.md`
+  - `memories.md`
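+- Check that no workflow files from this project were mistakenly placed under `zhengxin-vlm-0413`.
+
+After local batch-processing code changes:
+
+- Python syntax check:
+  - `python3 -m py_compile <changed python files>`
+- Unit tests:
+  - If tests were added, run the corresponding `python3 -m unittest ...` or `pytest ...`
+- FFmpeg/NVDEC check:
+  - `ffmpeg -hwaccels`
+  - `ffmpeg -decoders | grep cuvid`
+- vLLM health check:
+  - `curl http://localhost:8679/v1/models`
+- Minimal video smoke test:
+  - Run the local batch entry point against a directory of short videos.
+  - Check that the output contains the clip-level timeline and the summary JSON.
+
+Test environment validation:
+
+- Before running anything via `ssh xiaozheng@192.168.5.100`, first confirm paths, dependencies, and GPU status.
+- Remote commands should be read-only where possible, or write only to an explicit output directory.
+- Do not overwrite existing models or configuration on the remote host.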
+
+## Definition Of Done
+
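+This PoC is complete only when all of the following hold:
+
+1. All videos in a local folder can be analyzed.
+2. There is no dependency on Hik-Cloud (海康云眸) cloud storage.
+3. The model prompt can be adjusted via config.
+4. Output includes the surveillance timeline at the video, clip, and event levels.
+5. The 4B VLM uses the existing model path or the model already present in the test environment.
+6. Resume-from-checkpoint and failure recording have basic support.
+7. Documentation is updated with run commands, configuration options, and output structure.
+8. The necessary validation commands have been run and recorded.
+9. Each phase's sub-agent review conclusion is recorded in `progress.md`.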
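The timeline fields that `agent.md` requires (`clip_start_timecode`, `clip_end_timecode`, `frame_times`, `screen_time`) can be illustrated with a small sketch. This is not the project's implementation: the function names `to_timecode` and `build_clip_timeline` are hypothetical, and representing an unreadable `screen_time` as `None` is an assumption made here for illustration.

```python
from typing import Optional, Sequence


def to_timecode(offset_seconds: float) -> str:
    """Format a non-negative video-relative offset as HH:MM:SS (fractions dropped)."""
    if offset_seconds < 0:
        raise ValueError("offset_seconds must be non-negative")
    hours, remainder = divmod(int(offset_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def build_clip_timeline(
    video_path: str,
    clip_start_seconds: float,
    clip_end_seconds: float,
    frame_offsets: Sequence[float],
    screen_time: Optional[str] = None,
) -> dict:
    """Assemble clip-level timeline fields (hypothetical helper, not project code)."""
    return {
        "video_path": video_path,
        # No reliable absolute start time is known in this sketch, so it stays null.
        "video_start_time": None,
        "clip_start_seconds": float(clip_start_seconds),
        "clip_end_seconds": float(clip_end_seconds),
        "clip_start_timecode": to_timecode(clip_start_seconds),
        "clip_end_timecode": to_timecode(clip_end_seconds),
        # Only the frames actually sent to the model belong here.
        "frame_times": [
            {"offset_seconds": float(t), "timecode": to_timecode(t)}
            for t in frame_offsets
        ],
        # Time read from the footage by the model; None when unreadable (assumption).
        "screen_time": screen_time,
    }


timeline = build_clip_timeline("/path/to/video.mp4", 120, 130, [120, 125, 129])
print(timeline["clip_start_timecode"], timeline["clip_end_timecode"])
# prints: 00:02:00 00:02:10
```

Note that nothing in this record comes from `datetime.now()`: every value is either video-relative or read from the footage, which is the distinction the timeline requirements draw.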
--- a/config/local_batch.yaml
+++ b/config/local_batch.yaml
@@ -0,0 +1,173 @@
+input:
+  dir: ./videos
+  recursive: true
+  extensions: [".mp4", ".mov", ".mkv", ".avi", ".flv", ".ts", ".m4v"]
+
+source:
+  mode: local
+
+output:
+  dir: ./outputs/local-batch
+  overwrite: false
+  resume: true
+  keep_frames: true
+
+hik_cloud:
+  api_base_url: https://api2.hik-cloud.com
+  download_path: /v1/carrier/cstorage/open/play/download
+  access_token: null
+  access_token_env: HIK_CLOUD_ACCESS_TOKEN
+  chunk_seconds: 600
+  timeout_seconds: 60
+  download_timeout_seconds: 600
+  devices:
+    - device_serial: EXAMPLE_DEVICE_SERIAL
+      channel_no: 1
+      name: example-device
+  time_ranges:
+    - begin: "2026-02-03 09:00:00"
+      end: "2026-02-03 10:00:00"
+
+ffprobe:
+  timeout_seconds: 30
+
+ffmpeg:
+  prefer_nvdec: true
+  allow_cpu_fallback: false
+  hwaccel: cuda
+  codec_decoders:
+    h264: h264_cuvid
+    hevc: hevc_cuvid
+  frame_fps: 1
+  frame_width: 640
+  jpeg_quality: 4
+  timeout_seconds_per_video: 3600
+
+clip:
+  length_seconds: 10
+  stride_seconds: 10
+  frames_per_clip: 8
+  min_frames_per_clip: 4
+
+vlm:
+  api_base_url: http://localhost:8679
+  chat_completions_path: /v1/chat/completions
+  model: memai-zhengxin-v3-20260413
+  timeout_seconds: 120
+  max_tokens: 512
+  temperature: 0
+  batch_size: 1
+  image_transport: data_uri
+  retries: 1
+
+prompt:
+  system: >-
+    You are an AI quality inspector and store monitoring assistant for a fried chicken cutlet (鸡排) production line and storefront.
+    Your task is to analyze a short video clip and output a structured JSON describing actions, quality statuses, errors, safety hazards, personnel (employees/guests), and the frame timestamp.
+
+
+    All 9 top-level keys below are REQUIRED in every response. Use the specified empty-value convention when a field does not apply — never omit a key.
+
+
+    ### 1. Action (REQUIRED)
+
+    Identify the primary action. Use the "Action_" prefix on every label except End_Frying. If no action is detected, output "Action_Idle".
+
+    Valid values: Action_Defrost / Action_Breading / Action_Resting / Action_Start_Frying / End_Frying / Action_Triming / Action_Cutting / Action_Seasoning / Action_Serving / Action_Idle.
+
+
+    ### 2. quality_status (REQUIRED — "" if not applicable)
+
+    Choose based on the action:
+
+    - Action_Breading → fully_covered | uneven
+
+    - Action_Resting → stacked | qualified
+
+    - Action_Start_Frying / End_Frying → standard_time | early_retrieval | overcooked | double_fried
+
+    - Action_Cutting → complete_cut | linked | dusted_before_cut
+
+    - Action_Seasoning → coverage_high | missed | single_side_dusted
+
+    - Other actions → qualified
+
+    If no ingredient is visible or the action has no applicable status, output "".
+
+
+    ### 3. error_type (REQUIRED — "" if no error)
+
+    Short description of any anomaly. Examples: "smoking", "dusted_before_cut", "single_side_dusted", "double_fried". If the operation is normal, output "".
+
+
+    ### 4. 安全隐患 (REQUIRED — "" if no hazard)
+
+    Chinese description of any safety hazard visible in the scene (e.g., "油锅附近有易燃物"). If none, output "".
+
+
+    ### 5. 人物位置 (REQUIRED — "" if no people)
+
+    Descriptive Chinese sentence of where people are and how they are moving. Example: "员工在油锅边". If no one is in the frame, output "".
+
+
+    ### 6. 总结 (REQUIRED — "无" if no people)
+
+    Descriptive Chinese sentence summarizing the scene with the exact person count. Example: "员工在油锅边炸鸡，顾客在收银台前等待". If no one is in the frame, output "无".
+
+
+    ### 7. 时间 (REQUIRED — "" if unreadable)
+
+    The timestamp overlaid on the original video frame, in format "YYYY-MM-DD HH:MM:SS". If the timestamp is not visible or cannot be read, output "".
+
+
+    ### 8. employees (REQUIRED — [] if none)
+
+    Array of employee objects. Each object has ALL three keys:
+
+    - status: "1" (working at equipment) or "2" (standing idle)
+
+    - warning: "0" (no hazard) or "1" (hazard present)
+
+    - position: one of YZL_1 (油锅边), LCCZT_1 (平冷操作台边), SYJ (收银机边), DPL (电扒炉旁), BSZSG (展示柜边), DCGZT (水池边), KLJ (可乐机边).
+
+    If no employees are in the frame, output [].
+
+
+    ### 9. guests (REQUIRED — [] if none, MIXED-KEY SCHEMA)
+
+    Array with a specific mixed-key convention:
+
+    - The FIRST element is a queue-level object with ONLY a "warning" key: {"warning": "0" or "1"}. "1" means the queue has ≥ 3 people; "0" means < 3.
+
+    - Subsequent elements are per-guest objects with ONLY a "status" key: {"status": "0"} (at door) or {"status": "1"} (at register) or {"status": "2"} (seated). One such object per visible guest.
+
+    If there are no guests at all, output []. If only the queue header is known, output [{"warning": "0 or 1"}].
+
+    Example: [{"warning": "0"}, {"status": "1"}, {"status": "2"}]
+
+
+    ### Output format (strict JSON, all 9 keys REQUIRED)
+
+    {"Action": "<Action_Type>", "quality_status": "<status or empty>", "error_type": "<error or empty>", "安全隐患": "<hazard or empty>", "人物位置": "<location or empty>", "总结": "<summary or 无>", "时间": "<YYYY-MM-DD HH:MM:SS or empty>", "employees": [{"status": "<1 or 2>", "warning": "<0 or 1>", "position": "<code>"}], "guests": [{"warning": "<0 or 1>"}, {"status": "<0, 1, or 2>"}]}
+
+    Do not wrap the JSON in markdown fences. Do not add any prose before or after the JSON.
+  user: 'Analyze the video clip and return the required JSON with all 9 keys. Read the timestamp from the frame overlay into "时间".'
+
+schema:
+  version: local-batch-v1
+  event_types:
+    - customer_enter
+    - customer_leave
+    - queue_detected
+    - staff_absent
+    - staff_present
+    - area_crowded
+    - abnormal_behavior
+    - unknown
+  require_strict_json: true
+  parse_retry: 1
+  merge_gap_seconds: 30
+
+runtime:
+  timezone: Asia/Shanghai
+  log_level: INFO
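How the `clip` settings above (`length_seconds`, `stride_seconds`, `frames_per_clip`, `min_frames_per_clip`) might interact can be sketched as follows. The code is an illustration under stated assumptions, not the contents of `clips.py`: the names `sample_evenly` and `build_clips` are hypothetical, windows are assumed to be half-open `[start, end)`, and windows with fewer than `min_frames_per_clip` frames are assumed to be dropped.

```python
from typing import Dict, List, Sequence


def sample_evenly(items: Sequence[float], count: int) -> List[float]:
    """Pick `count` items spread evenly across `items`, keeping first and last."""
    if count <= 0:
        raise ValueError("count must be positive")
    if len(items) <= count:
        return list(items)
    if count == 1:
        return [items[0]]
    # step > 1 here, so the rounded indices are strictly increasing.
    step = (len(items) - 1) / (count - 1)
    return [items[round(i * step)] for i in range(count)]


def build_clips(
    frame_offsets: Sequence[float],
    length_seconds: float = 10.0,
    stride_seconds: float = 10.0,
    frames_per_clip: int = 8,
    min_frames_per_clip: int = 4,
) -> List[Dict]:
    """Group 1 FPS frame offsets into clip windows (hypothetical sketch)."""
    if length_seconds <= 0 or stride_seconds <= 0:
        raise ValueError("length_seconds and stride_seconds must be positive")
    offsets = sorted(float(t) for t in frame_offsets)
    clips: List[Dict] = []
    if not offsets:
        return clips
    index = 0
    while index * stride_seconds <= offsets[-1]:
        start = index * stride_seconds
        end = start + length_seconds
        # Half-open window [start, end): an assumption of this sketch.
        window = [t for t in offsets if start <= t < end]
        if len(window) >= min_frames_per_clip:
            clips.append({
                "clip_start_seconds": start,
                "clip_end_seconds": end,
                "frame_times": sample_evenly(window, frames_per_clip),
            })
        index += 1
    return clips


clips = build_clips(range(12))  # 12 frames sampled at 1 FPS
print(len(clips), len(clips[0]["frame_times"]))
```

With the defaults, 12 one-second frames give one full 10-second window of 10 frames, reduced to 8, and a trailing 2-frame window that falls below `min_frames_per_clip`. Whether the real pipeline drops or keeps such short tails is not stated in the config, so treat that behavior as an assumption.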
--- a/docs/project.md
+++ b/docs/project.md
@@ -0,0 +1,719 @@
+# Project Documentation
+
+## Goal
+
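+This project implements an offline batch video-analysis PoC in `/Users/yoilun/AI-train/video-ai-analysis-poc`. `v1.0` already supports local video folders; `v1.1` adds Hik Cloud Storage recording download as a video source and, once the download finishes, reuses the existing frame sampling, clip, VLM inference, and aggregation pipeline.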
+
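+It must support:
+
+- Selecting a local video folder.
+- Calling the Hik Cloud Storage recording download API directly to obtain the recording download URL and download the video.
+- Configuring the AccessToken via config or an environment variable, never writing it into test fixtures or documentation samples.
+- Configurable device serial numbers and channels, with multi-device support.
+- Analysis time ranges that include the full date, configured as `YYYY-MM-DD HH:MM:SS`.
+- Splitting long ranges: the Hik API downloads at most 1 hour per request, so a range longer than 1 hour must be split into multiple requests of at most 3600 seconds each. The default example uses 600-second chunks, which proved more stable than 3600 seconds in the real smoke test.
+- Automatic discovery of all common video files in the folder.
+- Sampling each video at 1 FPS and organizing input into 10-20 second clips.
+- Using the existing 4B VLM model through the OpenAI-compatible vLLM interface of `memai-zhengxin-v3-20260413`.
+- Adjusting the prompt via config.
+- Structured JSON/JSONL output.
+- Output that includes the surveillance timeline, locating the video, clip, frame, and event in time.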
+
+## v1.1 Hik Cloud Storage Source
+
+Section 2, "Get the recording download URL" (获取录像下载地址), of the Hik document `录像下载流程_1.pdf` defines:
+
+```text
+POST https://api2.hik-cloud.com/v1/carrier/cstorage/open/play/download
+Authorization: bearer <AccessToken>
+Content-Type: application/json
+```
+
+Request body:
+
+```json
+{
+  "deviceSerial": "EXAMPLE_DEVICE_SERIAL",
+  "channelNo": 1,
+  "timeBegin": 1764856787,
+  "timeEnd": 1764856978
+}
+```
+
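+A successful response returns `data.url`, `actualBeginTime`, and `actualEndTime`. Error code `80430002` covers parameter errors, including a begin-to-end span longer than 3600 seconds; error code `80438027` means there is no recording at the requested start time.
+
+Configuration example: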
+
+```yaml
+source:
+  mode: hik_cloud  # local | hik_cloud
+
+hik_cloud:
+  api_base_url: https://api2.hik-cloud.com
+  download_path: /v1/carrier/cstorage/open/play/download
+  access_token: null
+  access_token_env: HIK_CLOUD_ACCESS_TOKEN
+  chunk_seconds: 600
+  timeout_seconds: 60
+  download_timeout_seconds: 600
+  devices:
+    - device_serial: EXAMPLE_DEVICE_SERIAL
+      channel_no: 1
+      name: store-front
+  time_ranges:
+    - begin: "2026-02-03 09:00:00"
+      end: "2026-02-03 11:30:00"
+```
+
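+Cloud download output:
+
+- `hik_cloud_download_manifest.jsonl`: the request, actual times, status, and error for each device/channel/time chunk. In cloud mode, `--dry-run` only requests the download URL and writes an `address_ok` / failure status; it does not download the mp4 or probe it.
+- `downloads/hik_cloud/<device_serial>/ch<channel_no>/*.mp4`: downloaded video files consumed by the existing analysis pipeline.
+- `video_manifest.jsonl`: keeps the existing contract and adds cloud-source metadata.
+
+Run local-folder mode: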
+
+```bash
+python3 -B -m video_ai_analysis_poc.cli \
+  --config config/local_batch.yaml \
+  --input-dir /path/to/local/videos \
+  --output-dir ./outputs/local-batch
+```
+
+To run Hik Cloud Storage mode, copy the config file and set `source.mode: hik_cloud`; provide the AccessToken via the environment variable where possible:
+
+```bash
+export HIK_CLOUD_ACCESS_TOKEN='<redacted>'
+python3 -B -m video_ai_analysis_poc.cli \
+  --config /path/to/hik-cloud.yaml \
+  --output-dir ./outputs/hik-cloud
+```
+
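+`--dry-run` requests the Hik download URL and writes `hik_cloud_download_manifest.jsonl`, but does not download video files, probe, sample frames, run inference, or aggregate. `--until clips` stops after download, probe, frame sampling, and the clip manifest; `--until inference` goes on to run model inference and write `clip_results.jsonl`.
+
+In the real remote smoke test, downloading a 1-hour range as a single 3600-second request returned an MP4 that lacked the `moov` atom, so `ffprobe` could not parse it; with 600-second chunks, all 6 chunks could be probed and went on to frame sampling. The frame-sampling stage uses the `actual_begin/actual_end` or `requested_begin/requested_end` recorded for the cloud download to cap the number of frames FFmpeg outputs, which keeps abnormal timestamps in Hik MP4 files from making `fps=1` duplicate an excessive number of frames.
+
+Hik Cloud Storage security rules:
+
+- Do not commit a real AccessToken.
+- Prefer `hik_cloud.access_token_env: HIK_CLOUD_ACCESS_TOKEN`.
+- Do not log the Authorization header.
+- Do not persist the query string of signed download URLs, for example `sign`, `sig`, `token`, `access_token`.
+- `access_token.md` is a sensitive verification file used only for the real remote smoke test; do not copy it into documentation, tests, or output samples.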
+
+## Directory Boundaries
+
+```text
+/Users/yoilun/AI-train/video-ai-analysis-poc
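+  The project directory for this PoC; all subsequent code, configuration, plans, and documentation go here.
+
+/Users/yoilun/AI-train/zhengxin-vlm-0413
+  External model and reference implementation directory; not the project directory for this work.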
+```
+
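+Hard boundaries:
+
+- Do not create project files in `zhengxin-vlm-0413`.
+- Do not modify `zhengxin-vlm-0413/models/**`.
+- Do not modify `zhengxin-vlm-0413/service/config.yaml`, `service/config.yaml-bk`, or `docker/.env`.
+- Do not write the reference project's real RTSP URLs, Webhooks, tokens, cookies, or passwords into this project's example configuration, test fixtures, documentation, or output samples.
+- The output directory may only be a directory the user passes explicitly, or `outputs/` inside this project.
+- Do not overwrite the user's original video files.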
+
+## Inference Architecture Decision
+
+This PoC explicitly chooses:
+
+```text
+OpenAI-compatible vLLM API
+```
+
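+The first version of the PoC does not load PyTorch + Transformers + PEFT directly. Reasons:
+
+- Per the user, the test environment already has the model.
+- The reference project already uses the vLLM OpenAI-compatible API.
+- The main goal of local video batch processing is to get the engineering pipeline working end to end, not to re-implement model serving.
+
+The configuration fields are fixed as: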
+
+```yaml
+vlm:
+  api_base_url: http://localhost:8679
+  chat_completions_path: /v1/chat/completions
+```
+
+Join rule in code:
+
+```text
+chat_url = api_base_url.rstrip("/") + chat_completions_path
+```
+
+Do not pass both a full endpoint and a base URL in the configuration; this avoids doubled paths such as `/v1/chat/completions/v1/chat/completions`.
+
+## Target File Structure
+
+```text
+video-ai-analysis-poc/
+  agent.md
+  task_plan.md
+  findings.md
+  progress.md
+  memories.md
+  video_ai_analysis_system_plan.md
+  config/
+    local_batch.yaml
+  video_ai_analysis_poc/
+    __init__.py
+    cli.py
+    config.py
+    paths.py
+    discovery.py
+    probe.py
+    ffmpeg_sampler.py
+    frames.py
+    clips.py
+    vlm_client.py
+    result_parser.py
+    aggregator.py
+    manifest.py
+    logging_utils.py
+  schemas/
+    clip_result.schema.json
+    video_result.schema.json
+    folder_summary.schema.json
+  tests/
+    test_config.py
+    test_discovery.py
+    test_probe.py
+    test_clips.py
+    test_result_parser.py
+    test_aggregator.py
+  outputs/
+    .gitkeep
+```
+
+## Module Boundaries
+
+### `config.py`
+
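+- Loads `config/local_batch.yaml`.
+- Merges CLI argument overrides.
+- Validates required fields, numeric ranges, and path safety.
+- Does not access videos, call FFmpeg, or call the model.
+
+### `paths.py`
+
+- Generates stable `video_id` and `clip_id` values.
+- Generates the output directory structure.
+- Prevents the output directory from pointing at the reference model directory or overwriting the input video directory.
+
+### `discovery.py`
+
+- Only discovers videos according to `input.dir`, `recursive`, and `extensions`.
+- Writes `video_manifest.jsonl`.
+- Does not run ffprobe, sample frames, or call the model.
+
+### `probe.py`
+
+- Wraps `ffprobe`.
+- Outputs `duration_seconds`, `codec_name`, `width`, `height`, `fps`, `format_name`, and `start_time`.
+- Marks corrupted or unsupported videos as `probe_failed` and records `last_error` without blocking other videos.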
+
+### `ffmpeg_sampler.py`
+
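+- Samples frames at 1 FPS using FFmpeg + NVDEC.
+- Selects `h264_cuvid` / `hevc_cuvid` based on the codec.
+- Defaults to `allow_cpu_fallback: false`.
+- Outputs JPEG files and `frame_manifest.jsonl`.
+- Saves a summary of FFmpeg stderr as evidence that GPU decoding was actually used.
+
+### `frames.py`
+
+- Computes each frame's relative seconds and timecode.
+- Tracks the frame file path, offset, and timecode.
+- Prefers `pts_time` when available; otherwise derives the relative time from the sample index and the FPS.
+
+### `clips.py`
+
+- Reads `frame_manifest.jsonl`.
+- Builds clips according to `clip.length_seconds` and `clip.stride_seconds`.
+- Uniformly samples `frames_per_clip` frames from the 1 FPS frames.
+- Writes `clip_manifest.jsonl`, which must include the actual times of the frames used for inference.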
+
+### `vlm_client.py`
+
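+- Calls the OpenAI-compatible `/v1/chat/completions` endpoint.
+- Sends multiple frames as `image_url` entries, using `data:image/jpeg;base64` by default.
+- Takes the prompt from config; nothing is hardcoded.
+- Does not parse business events; returns only the raw response, latency, and HTTP status.
+- The phase 4 implementation uses the Python standard library `urllib` and exposes an injectable HTTP function so tests can mock it; the default URL is built as `vlm.api_base_url.rstrip("/") + vlm.chat_completions_path`.
+
+### `result_parser.py`
+
+- Extracts strict JSON from the raw response.
+- Validates fields such as `schema_version`, `events`, `screen_time`, and the event enum.
+- A parse failure triggers one retry with a strict prompt.
+- If it still fails, writes `parse_failed` and keeps `raw_response`.
+- The phase 4 implementation accepts raw JSON as well as JSON embedded in markdown/prose, and outputs the clip-level `monitoring_timeline`, `events`, `raw_response`, `processing`, and `error` fields.
+
+### `aggregator.py`
+
+- Consumes `video_manifest.jsonl`, `clip_manifest.jsonl`, and `clip_results.jsonl`.
+- Aggregates them into `videos/<video_id>/video_result.json` and `folder_summary.json` in the output root directory.
+- Merges events of the same video and type whose time ranges lie within `merge_gap_seconds` of each other.
+- Keeps each event's relative timeline, screen_time, clip evidence, and frame evidence.
+- Counts `parse_failed` / `inference_failed` clips.
+
+### `manifest.py`
+
+- Handles JSONL reading and writing and the status fields.
+- Supports resuming from a checkpoint.
+- Every record contains `status`, `retry_count`, and `last_error`.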
+
+## Config Schema
+
+Suggested fields for `config/local_batch.yaml`:
+
+```yaml
+input:
+  dir: /path/to/videos
+  recursive: true
+  extensions: [".mp4", ".mov", ".mkv", ".avi", ".flv", ".ts", ".m4v"]
+
+source:
+  mode: local
+
+output:
+  dir: ./outputs/local-batch
+  overwrite: false
+  resume: true
+  keep_frames: true
+
+hik_cloud:
+  api_base_url: https://api2.hik-cloud.com
+  download_path: /v1/carrier/cstorage/open/play/download
+  access_token: null
+  access_token_env: HIK_CLOUD_ACCESS_TOKEN
+  chunk_seconds: 600
+  timeout_seconds: 60
+  download_timeout_seconds: 600
+  devices:
+    - device_serial: EXAMPLE_DEVICE_SERIAL
+      channel_no: 1
+      name: example-device
+  time_ranges:
+    - begin: "2026-02-03 09:00:00"
+      end: "2026-02-03 10:00:00"
+
+ffprobe:
+  timeout_seconds: 30
+
+ffmpeg:
+  prefer_nvdec: true
+  allow_cpu_fallback: false
+  hwaccel: cuda
+  codec_decoders:
+    h264: h264_cuvid
+    hevc: hevc_cuvid
+  frame_fps: 1
+  frame_width: 640
+  jpeg_quality: 4
+  timeout_seconds_per_video: 3600
+
+clip:
+  length_seconds: 10
+  stride_seconds: 10
+  frames_per_clip: 8
+  min_frames_per_clip: 4
+
+vlm:
+  api_base_url: http://localhost:8679
+  chat_completions_path: /v1/chat/completions
+  model: memai-zhengxin-v3-20260413
+  timeout_seconds: 120
+  max_tokens: 512
+  temperature: 0
+  batch_size: 1
+  image_transport: data_uri
+  retries: 1
+
+prompt:
+  system: "You are a store video analysis assistant. Return strict JSON only."
+  user: "Analyze this clip. Return events and screen_time. If no event, return events: []."
+
+schema:
+  version: local-batch-v1
+  event_types:
+    - customer_enter
+    - customer_leave
+    - queue_detected
+    - staff_absent
+    - staff_present
+    - area_crowded
+    - abnormal_behavior
+    - unknown
+  require_strict_json: true
+  parse_retry: 1
+  merge_gap_seconds: 30
+
+runtime:
+  timezone: Asia/Shanghai
+  log_level: INFO
+```
+
+## File Contracts
+
+### `video_manifest.jsonl`
+
+One line per discovered video:
+
+```json
+{
+  "video_id": "stable_hash_or_slug",
+  "source_path": "/path/to/video.mp4",
+  "status": "pending",
+  "probe": null,
+  "retry_count": 0,
+  "last_error": null
+}
+```
+
+### `frame_manifest.jsonl`
+
+One line per sampled frame:
+
+```json
+{
+  "video_id": "stable_hash_or_slug",
+  "frame_id": "stable_hash_or_slug_f000120",
+  "frame_path": "frames/stable_hash_or_slug/000120.jpg",
+  "offset_seconds": 120.0,
+  "timecode": "00:02:00",
+  "pts_time": 120.0,
+  "status": "sampled"
+}
+```
+
+### `clip_manifest.jsonl`
+
+One line per clip:
+
+```json
+{
+  "video_id": "stable_hash_or_slug",
+  "clip_id": "stable_hash_or_slug_c000012",
+  "clip_start_seconds": 120.0,
+  "clip_end_seconds": 130.0,
+  "clip_start_timecode": "00:02:00",
+  "clip_end_timecode": "00:02:10",
+  "frame_times": [
+    {
+      "frame_path": "frames/stable_hash_or_slug/000120.jpg",
+      "offset_seconds": 120.0,
+      "timecode": "00:02:00"
+    }
+  ],
+  "status": "pending",
+  "retry_count": 0,
+  "last_error": null
+}
+```
+
+### `clip_results.jsonl`
+
+One line per inferred clip:
+
+```json
+{
+  "schema_version": "local-batch-v1",
+  "video_id": "stable_hash_or_slug",
+  "video_path": "/path/to/video.mp4",
+  "clip_id": "stable_hash_or_slug_c000012",
+  "status": "ok",
+  "monitoring_timeline": {
+    "timezone": "Asia/Shanghai",
+    "video_start_time": null,
+    "clip_start_seconds": 120.0,
+    "clip_end_seconds": 130.0,
+    "clip_start_timecode": "00:02:00",
+    "clip_end_timecode": "00:02:10",
+    "frame_times": [
+      {
+        "frame_path": "frames/stable_hash_or_slug/000120.jpg",
+        "offset_seconds": 120.0,
+        "timecode": "00:02:00"
+      }
+    ],
+    "screen_time": "2026-06-14 12:31:20"
+  },
+  "events": [
+    {
+      "event_type": "queue_detected",
+      "start_time": null,
+      "end_time": null,
+      "start_offset_seconds": 120.0,
+      "end_offset_seconds": 130.0,
+      "confidence": 0.86,
+      "severity": "medium",
+      "attributes": {},
+      "evidence": {
+        "clip_id": "stable_hash_or_slug_c000012",
+        "frame_paths": ["frames/stable_hash_or_slug/000120.jpg"]
+      }
+    }
+  ],
+  "raw_response": null,
+  "processing": {
+    "started_at": "2026-06-15T10:00:00+08:00",
+    "finished_at": "2026-06-15T10:00:02+08:00",
+    "latency_ms": 1800
+  },
+  "error": null
+}
+```
+
+### `video_result.json`
+
+Written to:
+
+```text
+videos/<video_id>/video_result.json
+```
+
+Required top-level fields:
+
+```text
+schema_version
+video_id
+video_path
+probe
+monitoring_timeline.video_start_time
+monitoring_timeline.video_duration_seconds
+clip_count
+failed_clip_count
+event_counts
+events
+outputs.clip_results_jsonl
+processing
+```
+
+### `folder_summary.json`
+
+Required top-level fields:
+
+```text
+schema_version
+input_dir
+video_count
+processed_video_count
+failed_video_count
+event_counts
+videos
+processing
+```
+
+## Timeline Rules
+
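+The timeline must distinguish three kinds of time:
+
+- Video-relative time: `offset_seconds`, `timecode`.
+- On-screen OCR time: `screen_time`, or `画面时间` in the model output.
+- Processing time: `processing.started_at`, `processing.finished_at`.
+
+When a local video has no reliable business start time:
+
+- `video_start_time` must be `null`.
+- Absolute times must not be fabricated.
+- Events must keep `start_offset_seconds` and `end_offset_seconds`.
+
+The actual times of the frames used for inference must be written to `frame_times`; recording only the clip start and end times is not enough.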
+
+## Reference Code Usage
+
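+May be used as reference:
+
+- The OpenAI-compatible payload structure in `zhengxin-vlm-0413/shared/vlm_client.py`.
+- The base64 data URI handling in `zhengxin-vlm-0413/shared/frame_utils.py`.
+- The prompt configuration style in `zhengxin-vlm-0413/service/config.yaml`.
+
+Must not be reused directly as the core implementation:
+
+- `frame_utils.extract_frames_from_video`, because it uniformly samples 8 frames across the whole video and does not meet the 1 FPS, clip manifest, and timeline requirements.
+- `vlm_client.extract_action`, because it parses only `Action` and cannot cover this project's full event and timeline schema.
+- The main loop of `rtsp_service.py`, because it serves real-time RTSP and does not suit offline folder batch processing.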
+
+## Validation Matrix
+
+### Phase 1 Architecture Validation
+
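+Phase 1 is complete when:
+
+- `docs/project.md` fixes the module boundaries, file output contracts, config schema, timeline schema, security boundaries, and validation matrix.
+- The inference interface has been explicitly chosen as OpenAI-compatible vLLM.
+- The API URL field semantics are fixed as `api_base_url` + `chat_completions_path`.
+- It is stated which parts of `frame_utils.py` / `vlm_client.py` may be borrowed and which must not be reused directly.
+- The smoke test inputs, commands, expected output fields, and failure criteria for phases 2-6 are listed.
+- The sub-agent review conclusion is recorded in `progress.md`.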
+
+### Phase 2 Validation
+
+Goal: local video discovery, ffprobe, manifest, and CLI skeleton.
+
+Commands:
+
+```bash
+python3 -m py_compile video_ai_analysis_poc/*.py
+python3 -m video_ai_analysis_poc.cli --config config/local_batch.yaml --input-dir /path/to/videos --output-dir ./outputs/local-batch --dry-run
+```
+
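+Expected:
+
+- `video_manifest.jsonl` is generated.
+- Corrupted/unsupported videos are marked as failed and do not block other videos.
+- The reference model directory is neither read nor written.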
+
+### Phase 3 Validation
+
+Goal: FFmpeg/NVDEC 1 FPS frame sampling and clip construction.
+
+Commands:
+
+```bash
+ffmpeg -hwaccels
+ffmpeg -decoders | grep cuvid
+python3 -m video_ai_analysis_poc.cli --config config/local_batch.yaml --input-dir /path/to/short-videos --output-dir ./outputs/local-batch --until clips
+```
+
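+Expected:
+
+- A frame-sampling command with `-hwaccel cuda` and `h264_cuvid` or `hevc_cuvid` is actually run against one sample video.
+- Decoder evidence from FFmpeg stderr or the logs is saved.
+- `frame_manifest.jsonl` and `clip_manifest.jsonl` are generated.
+- `clip_manifest.jsonl` contains `frame_times`.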
+
+### Phase 4 Validation
+
+Goal: vLLM OpenAI-compatible API, prompt configuration, and JSON parse retry.
+
+Commands:
+
+```bash
+curl http://localhost:8679/v1/models
+python3 -m video_ai_analysis_poc.cli --config config/local_batch.yaml --input-dir /path/to/short-videos --output-dir ./outputs/local-batch --until inference --limit-clips 3
+```
+
+Expected:
+
+- The prompt is read from config.
+- The request URL uses `api_base_url + chat_completions_path`.
+- `clip_results.jsonl` is generated.
+- Every result contains the `monitoring_timeline.frame_times` and `screen_time` fields.
+
+### Phase 5 Validation
+
+Goal: clip/video/folder aggregation and schema validation.
+
+Commands:
+
+```bash
+python3 -m video_ai_analysis_poc.cli --config config/local_batch.yaml --input-dir /path/to/short-videos --output-dir ./outputs/local-batch
+python3 -m json.tool ./outputs/local-batch/folder_summary.json >/dev/null
+```
+
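+Expected:
+
+- A default CLI run, with neither `--dry-run` nor `--until`, runs through inference and continues into aggregation.
+- `--until clips` and `--until inference` still stop at their respective stages and write no aggregation output.
+- `videos/<video_id>/video_result.json` is generated.
+- `folder_summary.json` is generated.
+- Event aggregation keeps the relative timeline.
+- The JSON can be parsed by standard tools.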
+
+### Phase 6 Validation
+
+Goal: test-environment smoke test and documentation update.
+
+Remote environment:
+
+```text
+ssh xiaozheng@192.168.5.100
+/home/xiaozheng/video-ai-analysis-poc
+```
+
+Model service:
+
+```bash
+ssh xiaozheng@192.168.5.100 'curl http://localhost:8679/v1/models'
+```
+
+Current service status:
+
+- Container: `zhengxin-vllm`
+- Image: `vllm/vllm-openai:v0.14.1`
+- Port: `8679`
+- Model: `memai-zhengxin-v3-20260413`
+- Model directory mount: `/home/xiaozheng/zhengxin-vlm-0413/models:/models:ro`
+
+Remote capability checks:
+
+```bash
+ssh xiaozheng@192.168.5.100 'nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader'
+ssh xiaozheng@192.168.5.100 'ffmpeg -hwaccels'
+ssh xiaozheng@192.168.5.100 'ffmpeg -decoders'
+```
+
+Verified:
+
+- GPU: `NVIDIA GeForce RTX 3080`, `20480 MiB`, driver `595.71.05`.
+- FFmpeg 6.1.1 supports the `cuda` hwaccel.
+- The FFmpeg decoders include `h264_cuvid` and `hevc_cuvid`.
+- `/v1/models` returns the model id `memai-zhengxin-v3-20260413`.
+- A safely quoted health check against `/v1/chat/completions` returns `OK`.
+
+Remote smoke input:
+
+```text
+/tmp/video-ai-analysis-poc-smoke.h1cZUR/input/sample_h264.mp4
+```
+
+Remote smoke output:
+
+```text
+/tmp/video-ai-analysis-poc-smoke.h1cZUR/output
+```
+
+Remote batch commands:
+
+```bash
+ssh xiaozheng@192.168.5.100 'PYTHONPATH=/home/xiaozheng/video-ai-analysis-poc python3 -B -m unittest discover -s /home/xiaozheng/video-ai-analysis-poc/tests -v'
+ssh xiaozheng@192.168.5.100 'python3 -B -m compileall -q /home/xiaozheng/video-ai-analysis-poc/video_ai_analysis_poc'
+ssh xiaozheng@192.168.5.100 'PYTHONPATH=/home/xiaozheng/video-ai-analysis-poc python3 -B -m video_ai_analysis_poc.cli --config /home/xiaozheng/video-ai-analysis-poc/config/local_batch.yaml --input-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/input --output-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/output --until clips'
+ssh xiaozheng@192.168.5.100 'PYTHONPATH=/home/xiaozheng/video-ai-analysis-poc python3 -B -m video_ai_analysis_poc.cli --config /home/xiaozheng/video-ai-analysis-poc/config/local_batch.yaml --input-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/input --output-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/output --until inference --limit-clips 1'
+ssh xiaozheng@192.168.5.100 'PYTHONPATH=/home/xiaozheng/video-ai-analysis-poc python3 -B -m video_ai_analysis_poc.cli --config /home/xiaozheng/video-ai-analysis-poc/config/local_batch.yaml --input-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/input --output-dir /tmp/video-ai-analysis-poc-smoke.h1cZUR/output'
+```
+
+Verified output:
+
+- `video_manifest.jsonl`: 1 video record.
+- `frame_manifest.jsonl`: 12 sampled frame records.
+- `clip_manifest.jsonl`: 1 clip record.
+- The frame manifest persists `hwaccel: cuda`, `decoder: h264_cuvid`, `ffmpeg_command`, and the FFmpeg stderr summary.
+- `clip_results.jsonl`: 1 record with `status: ok`, containing `monitoring_timeline.frame_times`.
+- `videos/<video_id>/video_result.json`: parseable JSON with `failed_clip_count: 0`.
+- `folder_summary.json`: parseable JSON with `video_count: 1` and `processed_video_count: 1`.
+- When a local video has no reliable business start time, `monitoring_timeline.video_start_time` is `null`; ffprobe's `start_time: 0.0` is kept only in `probe`.
+
+Remote validation constraints:
+
+- Write only to an explicit output directory.
+- Do not overwrite existing models, configuration, or videos on the remote host.
+- Do not copy real credentials into logs or documentation.
+
+## Known Risks
+
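+- HEVC decoder availability has been verified, but the actual smoke test covered only an H.264 sample video.
+- Throughput on 24 hours of real store video has not been load-tested.
+- Hik-Cloud (海康云眸) cloud recording/RTSP ingestion remains outside the scope of the current local-folder PoC.
+- Local videos may have no on-screen timestamp, so the relative time must always be kept as well.
+- Model event quality has not been accepted against real store footage; the synthetic test images contain no business events, so empty event output is a reasonable result.
+- The remote vLLM container is currently started by hand, not managed by production-grade systemd/compose.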
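The chunking rule described in `docs/project.md` (each Hik request spans at most 3600 seconds, with 600-second chunks as the default) can be sketched as below. This is an illustration only: `to_epoch_seconds` and `split_time_range` are hypothetical names, and the sketch uses a fixed UTC+8 offset to stay self-contained, whereas the implementation plan mentions `zoneinfo.ZoneInfo` with the configured `runtime.timezone`. The `timeBegin` / `timeEnd` keys follow the request body shown in the documentation.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

MAX_REQUEST_SECONDS = 3600  # per-request limit stated in docs/project.md
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_epoch_seconds(text: str, utc_offset_hours: int = 8) -> int:
    """Parse 'YYYY-MM-DD HH:MM:SS' as local time at a fixed UTC offset (assumption)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return int(datetime.strptime(text, TIME_FORMAT).replace(tzinfo=tz).timestamp())


def split_time_range(
    begin: str,
    end: str,
    chunk_seconds: int = 600,
    utc_offset_hours: int = 8,
) -> List[Dict[str, int]]:
    """Split [begin, end) into contiguous chunks no longer than chunk_seconds."""
    if not 0 < chunk_seconds <= MAX_REQUEST_SECONDS:
        raise ValueError("chunk_seconds must be in 1..3600")
    begin_ts = to_epoch_seconds(begin, utc_offset_hours)
    end_ts = to_epoch_seconds(end, utc_offset_hours)
    if end_ts <= begin_ts:
        raise ValueError("end must be after begin")
    chunks: List[Dict[str, int]] = []
    cursor = begin_ts
    while cursor < end_ts:
        chunk_end = min(cursor + chunk_seconds, end_ts)  # last chunk may be shorter
        chunks.append({"timeBegin": cursor, "timeEnd": chunk_end})
        cursor = chunk_end
    return chunks


chunks = split_time_range("2026-02-03 09:00:00", "2026-02-03 11:30:00")
print(len(chunks), chunks[0]["timeEnd"] - chunks[0]["timeBegin"])
# prints: 15 600
```

Each resulting dict could be merged with `deviceSerial` and `channelNo` to form one request body per device, channel, and chunk. How the real module handles overlap between chunks or partially recorded ranges is not specified in the source, so this sketch leaves that out.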
--- a/docs/superpowers/plans/2026-06-16-hik-cloud-download-analysis.md
+++ b/docs/superpowers/plans/2026-06-16-hik-cloud-download-analysis.md
@@ -0,0 +1,190 @@
+# Hik Cloud Download Analysis Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add Hik Cloud Storage recording download as a configurable multi-device source, then feed downloaded videos into the existing model analysis pipeline.
+
+**Architecture:** Keep the current local-folder pipeline intact. Add a cloud acquisition module that plans chunks of at most one hour each, calls the Hik download-address API, downloads videos to local output storage, records a download manifest, and returns local file records for the existing probe/frame/clip/inference/aggregate stages.
+
+**Tech Stack:** Python standard library, existing `unittest` suite, existing JSONL manifest helpers, FFmpeg/vLLM pipeline already in `video_ai_analysis_poc`.
+
+---
+
+### Task 1: Config Schema And Time Chunking
+
+**Files:**
+- Modify: `video_ai_analysis_poc/config.py`
+- Create: `video_ai_analysis_poc/hik_cloud.py`
+- Modify: `tests/test_config.py`
+- Create: `tests/test_hik_cloud.py`
+
+- [ ] **Step 1: Write failing config tests**
+
+Add tests that load:
+
+```yaml
+source:
+  mode: hik_cloud
+hik_cloud:
+  access_token_env: HIK_CLOUD_ACCESS_TOKEN
+  devices:
+    - device_serial: EXAMPLE_DEVICE_SERIAL
+      channel_no: 1
+      name: front
+  time_ranges:
+    - begin: "2026-02-03 09:00:00"
+      end: "2026-02-03 10:30:00"
+```
+
+Expected: `source.mode == "hik_cloud"`, `devices` is a list of dicts, and `time_ranges` is a list of dicts.
+
+- [ ] **Step 2: Write failing chunk tests**
+
+Test that `build_download_chunks(...)` converts the range above into chunks with `timeEnd - timeBegin <= 3600`.
+
+- [ ] **Step 3: Run red tests**
+
+Run:
+
+```bash
+python3 -B -m unittest tests.test_config tests.test_hik_cloud -v
+```
+
+Expected: fail because list-of-mapping parsing and `hik_cloud.py` do not exist yet.
+
+- [ ] **Step 4: Implement minimal parser/defaults/chunking**
+
+Extend the simple YAML parser only enough for list items shaped as mappings. Add defaults for `source` and `hik_cloud`. Implement date-time parsing with `zoneinfo.ZoneInfo`.
+
+- [ ] **Step 5: Run green tests**
+
+Run the same unittest command. Expected: pass.
+
+### Task 2: Hik Download Address API Client
+
+**Files:**
+- Modify: `video_ai_analysis_poc/hik_cloud.py`
+- Modify: `tests/test_hik_cloud.py`
+
+- [ ] **Step 1: Write failing API client tests**
+
+Mock the HTTP function and verify:
+
+- URL is `api_base_url.rstrip("/") + download_path`.
+- Headers include `Authorization: bearer TOKEN`.
+- JSON body includes `deviceSerial`, `channelNo`, `timeBegin`, `timeEnd`.
+- Success returns URL and actual begin/end.
+- Code `80438027` returns a structured `no_recording` result.
+- Other non-zero codes return `address_failed`.
+
+- [ ] **Step 2: Run red tests**
+
+Run:
+
+```bash
+python3 -B -m unittest tests.test_hik_cloud -v
+```
+
+Expected: fail because the client is missing.
+
+- [ ] **Step 3: Implement client**
+
+Use `urllib.request` and injectable callables for tests. Do not log or persist the token.
+
+- [ ] **Step 4: Run green tests**
+
+Run the same command. Expected: pass.
+
+### Task 3: Download Files And Manifest
+
+**Files:**
+- Modify: `video_ai_analysis_poc/hik_cloud.py`
+- Modify: `video_ai_analysis_poc/paths.py`
+- Modify: `tests/test_hik_cloud.py`
+
+- [ ] **Step 1: Write failing downloader tests**
+
+Mock address results and download bytes. Verify downloaded files are written under `downloads/hik_cloud/<device>/ch<channel>/`, filenames contain requested timestamps, manifest rows are written, token/query signatures are not in filenames, and resume skips already downloaded files.
+
+- [ ] **Step 2: Run red tests**
+
+Run:
+
+```bash
+python3 -B -m unittest tests.test_hik_cloud -v
+```
+
+Expected: fail because downloader/manifest behavior is missing.
+
+- [ ] **Step 3: Implement downloader**
+
+Write `download_hik_cloud_recordings(config, output_dir, *, address_client=None, download_url=None)` returning downloaded video records with cloud metadata.
+
+- [ ] **Step 4: Run green tests**
+
+Run the same command. Expected: pass.
+
+### Task 4: CLI Cloud Source Integration
+
+**Files:**
+- Modify: `video_ai_analysis_poc/cli.py`
+- Modify: `tests/test_cli.py`
+
+- [ ] **Step 1: Write failing CLI tests**
+
+Add tests that:
+
+- `source.mode: local` still uses `discover_videos`.
+- `source.mode: hik_cloud` calls the cloud downloader and probes returned downloaded paths.
+- `--dry-run` in cloud mode requests download addresses and writes the download manifest, but does not download video files, probe, call FFmpeg, call VLM, or aggregate.
+- `--until clips` in cloud mode produces video/frame/clip manifests from mocked downloaded video records.
+
+- [ ] **Step 2: Run red tests**
+
+Run:
+
+```bash
+python3 -B -m unittest tests.test_cli -v
+```
+
+Expected: fail because CLI has no source mode branch.
+
+- [ ] **Step 3: Implement CLI branch**
+
+Keep local behavior unchanged. In cloud mode, call downloader before probe and carry cloud metadata into `video_manifest.jsonl`.
+
+- [ ] **Step 4: Run green tests**
+
+Run the same command. Expected: pass.
+
+### Task 5: Docs, Example Config, And Full Verification
+
+**Files:**
+- Modify: `config/local_batch.yaml`
+- Modify: `docs/project.md`
+- Modify: `findings.md`
+- Modify: `progress.md`
+- Modify: `memories.md`
+
+- [ ] **Step 1: Update docs/config**
+
+Add a commented or safe example for `source.mode: hik_cloud`, token env var, devices, and time ranges. Do not include a real token.
+
+- [ ] **Step 2: Run full tests**
+
+Run:
+
+```bash
+python3 -B -m unittest discover -s tests -v
+python3 -B -m py_compile video_ai_analysis_poc/*.py
+```
+
+Expected: all pass.
+
+- [ ] **Step 3: Run local mock smoke**
+
+Use test mocks or a temporary local HTTP fixture to verify cloud mode can produce downloaded files and continue to `--until clips` without a real Hik token.
+
+- [ ] **Step 4: Record results**
+
+Update `progress.md` with commands, results, files changed, and remaining risk. Real Hik API verification is skipped until a real AccessToken/device/time range is provided.
--- a/docs/superpowers/specs/2026-06-16-hik-cloud-download-analysis-design.md
+++ b/docs/superpowers/specs/2026-06-16-hik-cloud-download-analysis-design.md
@@ -0,0 +1,151 @@
+# Hik Cloud Download Analysis Design
+
+## Goal
+
+Add Hik Cloud Storage recording download as a first-class video source for the existing video analysis pipeline. The implementation must support a configurable AccessToken, multiple devices, configurable date-time ranges, API slicing into chunks of at most one hour, and video downloads, and it must reuse the existing local analysis pipeline.
+
+## Source Model
+
+The pipeline keeps the existing local mode and adds a cloud mode:
+
+```yaml
+source:
+  mode: local  # local | hik_cloud
+```
+
+`local` keeps the current folder discovery behavior. `hik_cloud` runs a download stage first, then analyzes the downloaded files exactly like local files.
+
+## Hik Cloud Configuration
+
+The config should allow a literal token for controlled testing and an environment variable for normal use:
+
+```yaml
+hik_cloud:
+  api_base_url: https://api2.hik-cloud.com
+  download_path: /v1/carrier/cstorage/open/play/download
+  access_token: null
+  access_token_env: HIK_CLOUD_ACCESS_TOKEN
+  chunk_seconds: 600
+  timeout_seconds: 60
+  download_timeout_seconds: 600
+  devices:
+    - device_serial: EXAMPLE_DEVICE_SERIAL
+      channel_no: 1
+      name: store-front
+  time_ranges:
+    - begin: "2026-02-03 09:00:00"
+      end: "2026-02-03 11:30:00"
+```
+
+The implementation must not print or persist the token. Manifest entries may record the API URL path, device serial, channel, requested times, actual times, and status, but not the Authorization header.
+
+## Time Handling
+
+The user-facing time range includes year, month, day, hour, minute, and second. The config supports both `YYYY-MM-DD HH:MM:SS` strings and integer epoch seconds. String parsing uses `runtime.timezone`, defaulting to `Asia/Shanghai`, and converts to Unix seconds for `timeBegin` and `timeEnd`.
+
+Ranges are split into chunks with `end - begin <= 3600` because the PDF documents error `80430002` when the requested interval exceeds 3600 seconds. The example default uses 600 seconds because the real remote smoke test found that shorter chunks produced valid, probeable MP4 files for the provided test range.
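
As a minimal sketch of the parsing and slicing rules above: the helper names `to_epoch_seconds` and `split_range` are hypothetical and used only for illustration, since the actual `build_download_chunks` signature is not shown in this design. The sketch assumes the time zone database is available to `zoneinfo`.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def to_epoch_seconds(value, tz_name="Asia/Shanghai"):
    """Accept an integer epoch or a 'YYYY-MM-DD HH:MM:SS' string (hypothetical helper)."""
    if isinstance(value, int):
        return value
    local = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    # Attach the configured zone before converting to Unix seconds.
    return int(local.replace(tzinfo=ZoneInfo(tz_name)).timestamp())


def split_range(begin, end, chunk_seconds=600, max_seconds=3600):
    """Split [begin, end) into consecutive chunks no longer than the API limit."""
    if end <= begin:
        raise ValueError("end must be greater than begin")
    step = min(chunk_seconds, max_seconds)
    if step <= 0:
        raise ValueError("chunk_seconds must be positive")
    chunks = []
    cursor = begin
    while cursor < end:
        chunk_end = min(cursor + step, end)  # the last chunk may be shorter
        chunks.append((cursor, chunk_end))
        cursor = chunk_end
    return chunks
```

With the 600-second default, a 2.5-hour range (9000 seconds) yields 15 chunks, and clamping the step to `max_seconds` keeps a misconfigured `chunk_seconds` from triggering error `80430002`.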
+
+## API Contract
+
+Use the PDF section “2、获取录像下载地址”:
+
+```text
+POST https://api2.hik-cloud.com/v1/carrier/cstorage/open/play/download
+Authorization: bearer <AccessToken>
+Content-Type: application/json
+```
+
+Request body:
+
+```json
+{
+  "deviceSerial": "EXAMPLE_DEVICE_SERIAL",
+  "channelNo": 1,
+  "timeBegin": 1764856787,
+  "timeEnd": 1764856978
+}
+```
+
+Successful response:
+
+```json
+{
+  "code": 0,
+  "data": {
+    "url": "https://...",
+    "actualBeginTime": "1764856787",
+    "actualEndTime": "1764856978"
+  },
+  "success": true
+}
+```
+
+Non-zero codes become structured failures. `80438027` is treated as `no_recording` so one empty chunk does not stop the batch.
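
The request and response handling described in this section could be sketched as follows. The function names `build_address_request` and `classify_address_response` are hypothetical, and normalizing `code` to a string is a defensive assumption, not something the PDF contract states. No network call is made here; sending the request is left to an injectable callable.

```python
import json
import urllib.request

NO_RECORDING_CODE = "80438027"


def build_address_request(api_base_url, download_path, token, device_serial,
                          channel_no, time_begin, time_end):
    """Build the POST request. Callers must never log or persist its headers."""
    body = {
        "deviceSerial": device_serial,
        "channelNo": channel_no,
        "timeBegin": time_begin,
        "timeEnd": time_end,
    }
    return urllib.request.Request(
        api_base_url.rstrip("/") + download_path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def classify_address_response(payload):
    """Map a decoded JSON response to a structured result dict."""
    code = str(payload.get("code"))  # assumption: tolerate int or string codes
    if code == "0":
        data = payload.get("data") or {}
        return {
            "status": "address_ok",
            "url": data.get("url"),
            "actual_begin": int(data["actualBeginTime"]),
            "actual_end": int(data["actualEndTime"]),
        }
    status = "no_recording" if code == NO_RECORDING_CODE else "address_failed"
    return {"status": status, "code": code}
```

Keeping request construction separate from the HTTP call lets tests assert on the URL, body, and bearer header without a real token.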
+
+## Output Contract
+
+Cloud downloads write a dedicated manifest:
+
+```text
+<output.dir>/hik_cloud_download_manifest.jsonl
+```
+
+Each row contains:
+
+- `source: hik_cloud`
+- `device_serial`
+- `channel_no`
+- `requested_begin`, `requested_end`
+- `actual_begin`, `actual_end`
+- `download_url_host` (host only; omit the field entirely if even the host should not be persisted)
+- `path` for downloaded video
+- `status`: `address_ok`, `downloaded`, `no_recording`, `address_failed`, `download_failed`
+- `retry_count`, `last_error`
+
+Downloaded videos go under:
+
+```text
+<output.dir>/downloads/hik_cloud/<device_serial>/ch<channel_no>/
+```
+
+Filenames use device/channel/requested timestamps and never include URL query signatures or tokens.
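
A minimal sketch of the output contract above, under stated assumptions: the helper names `download_target` and `manifest_row` and the exact filename pattern are hypothetical. Only the directory layout and the field names come from this design.

```python
from pathlib import Path
from urllib.parse import urlsplit


def download_target(output_dir, device_serial, channel_no, begin, end):
    """Return the local path for one chunk; built from request fields only."""
    folder = (Path(output_dir) / "downloads" / "hik_cloud"
              / device_serial / f"ch{channel_no}")
    # Hypothetical pattern: device, channel, and requested epoch seconds.
    return folder / f"{device_serial}_ch{channel_no}_{begin}_{end}.mp4"


def manifest_row(device_serial, channel_no, begin, end, address, path, status):
    """Build one manifest row without the signed URL or the token."""
    url = address.get("url") or ""
    return {
        "source": "hik_cloud",
        "device_serial": device_serial,
        "channel_no": channel_no,
        "requested_begin": begin,
        "requested_end": end,
        "actual_begin": address.get("actual_begin"),
        "actual_end": address.get("actual_end"),
        # Keep only the host so query signatures never reach disk.
        "download_url_host": urlsplit(url).hostname,
        "path": str(path),
        "status": status,
        "retry_count": 0,
        "last_error": None,
    }
```

Deriving the filename from the requested times, not from the download URL, keeps the path stable across retries even if the signed URL changes.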
+
+## Pipeline Integration
+
+`cli.py` should branch only at source acquisition:
+
+```text
+local mode:
+  discover local videos -> probe -> frames -> clips -> inference -> aggregate
+
+hik_cloud mode:
+  build chunks -> request download URLs -> download videos -> probe -> frames -> clips -> inference -> aggregate
+```
+
+After downloads complete, the rest of the pipeline should consume downloaded file paths and preserve cloud metadata in `video_manifest.jsonl`.
+
+FFmpeg sampling caps the number of output frames using the requested/actual cloud chunk duration. This prevents malformed or irregular Hik MP4 timestamps from making the `fps=1` filter duplicate tens of thousands of frames for a 10-minute chunk.
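
One way to express such a cap is sketched below. This design does not say how the limit is applied, so using FFmpeg's `-frames:v` output option is an assumption, and the helper names `frame_cap` and `sampling_command` are hypothetical.

```python
import math


def frame_cap(chunk_begin, chunk_end, fps=1.0, slack_frames=1):
    """Upper bound on sampled frames for a chunk of known duration."""
    duration = max(0, chunk_end - chunk_begin)
    return math.ceil(duration * fps) + slack_frames


def sampling_command(video_path, frame_pattern, chunk_begin, chunk_end, fps=1.0):
    """Build an FFmpeg argument list that stops after the capped frame count."""
    return [
        "ffmpeg", "-i", str(video_path),
        "-vf", f"fps={fps:g}",
        # Assumed mechanism: stop writing once the cap is reached.
        "-frames:v", str(frame_cap(chunk_begin, chunk_end, fps)),
        str(frame_pattern),
    ]
```

For a 600-second chunk at `fps=1` the cap is 601 frames, so a file with broken timestamps cannot expand into tens of thousands of duplicated frames.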
+
+Cloud `--dry-run` stops at download-address planning: it requests addresses and writes `hik_cloud_download_manifest.jsonl`, but does not download video files, run ffprobe, sample frames, infer, or aggregate.
+
+## Error Handling
+
+- Missing token: fail fast with a clear config error in `hik_cloud` mode.
+- Invalid range: fail fast if `end <= begin`.
+- API code 80438027: record `no_recording`, continue.
+- Other API non-zero code: record `address_failed`, continue other chunks.
+- Download HTTP/IO failure: record `download_failed`, continue other chunks.
+- Existing downloaded file with manifest status `downloaded`: skip on resume.
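
The resume rule in the last bullet could be sketched as follows. The helper names `downloaded_paths` and `should_skip` are hypothetical, and requiring the file to still exist on disk is an assumption layered on top of the manifest check.

```python
import json
from pathlib import Path


def downloaded_paths(manifest_path):
    """Collect paths that the manifest marks as already downloaded."""
    manifest = Path(manifest_path)
    if not manifest.exists():
        return set()
    done = set()
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        if row.get("status") == "downloaded" and row.get("path"):
            done.add(row["path"])
    return done


def should_skip(target_path, done):
    """Skip only when the manifest says downloaded and the file is present."""
    return str(target_path) in done and Path(target_path).exists()
```

Checking both conditions means a deleted or never-finished file is downloaded again, while failed chunks stay eligible for retry.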
+
+## Testing
+
+Use TDD with standard-library mocks:
+
+- config parser loads `devices` as list of dicts.
+- time parser accepts date-time strings and epoch integers.
+- splitter produces max-3600-second chunks.
+- API client builds correct URL, body, bearer header, and parses success/failure.
+- downloader writes bytes and manifest without persisting token.
+- CLI cloud mode uses downloaded files and keeps local mode unchanged.
+
+The real Hik API smoke test uses the sensitive `access_token.md` file provided by the user on the remote test environment. Do not copy values from that file into docs, tests, logs, or final responses.
--- a/tests/test_aggregator.py
+++ b/tests/test_aggregator.py
@@ -0,0 +1,309 @@
+import json
+import tempfile
+import unittest
+from datetime import datetime, timedelta
+from pathlib import Path
+
+from video_ai_analysis_poc.aggregator import aggregate_outputs
+
+
+class AggregatorTests(unittest.TestCase):
+    def test_aggregates_video_results_folder_summary_and_merges_adjacent_events(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            output_dir = Path(tmp)
+            video_a = {
+                "video_id": "video-a",
+                "path": "/videos/a.mp4",
+                "status": "probed",
+                "duration_seconds": 40.0,
+                "codec_name": "h264",
+                "width": 1920,
+                "height": 1080,
+            }
+            video_b = {
+                "video_id": "video-b",
+                "path": "/videos/b.mp4",
+                "status": "probe_failed",
+                "last_error": "bad file",
+            }
+            self._write_jsonl(output_dir / "video_manifest.jsonl", [video_a, video_b])
+            clips = [
+                self._clip("video-a", "video-a_c000001", 0.0, 10.0),
+                self._clip("video-a", "video-a_c000002", 12.0, 20.0),
+                self._clip("video-a", "video-a_c000003", 21.0, 30.0),
+                self._clip("video-b", "video-b_c000001", 0.0, 10.0),
+            ]
+            self._write_jsonl(output_dir / "clip_manifest.jsonl", clips)
+            results = [
+                self._result(
+                    "video-a",
+                    "video-a_c000001",
+                    "/videos/a.mp4",
+                    0.0,
+                    10.0,
+                    "09:00:01",
+                    [{"event_type": "queue_detected", "start_offset_seconds": 1.0, "end_offset_seconds": 10.0}],
+                ),
+                self._result(
+                    "video-a",
+                    "video-a_c000002",
+                    "/videos/a.mp4",
+                    12.0,
+                    20.0,
+                    "09:00:13",
+                    [{"event_type": "queue_detected", "start_offset_seconds": 12.0, "end_offset_seconds": 16.0}],
+                ),
+                self._result(
+                    "video-a",
+                    "video-a_c000003",
+                    "/videos/a.mp4",
+                    21.0,
+                    30.0,
+                    "09:00:22",
+                    [{"event_type": "staff_absent", "start_offset_seconds": 21.0, "end_offset_seconds": 25.0}],
+                ),
+                {
+                    "schema_version": "local-batch-v1",
+                    "video_id": "video-b",
+                    "video_path": "/videos/b.mp4",
+                    "clip_id": "video-b_c000001",
+                    "status": "inference_failed",
+                    "monitoring_timeline": {
+                        "video_start_time": None,
+                        "clip_start_seconds": 0.0,
+                        "clip_end_seconds": 10.0,
+                        "frame_times": [],
+                        "screen_time": "",
+                    },
+                    "events": [],
+                    "raw_response": "",
+                    "processing": {},
+                    "error": "offline",
+                },
+            ]
+            self._write_jsonl(output_dir / "clip_results.jsonl", results)
+
+            aggregate_outputs(
+                output_dir,
+                {
+                    "input": {"dir": "/videos"},
+                    "schema": {"version": "local-batch-v1", "merge_gap_seconds": 3},
+                    "runtime": {"timezone": "Asia/Shanghai"},
+                },
+            )
+
+            video_result_path = output_dir / "videos" / "video-a" / "video_result.json"
+            self.assertTrue(video_result_path.exists())
+            video_result = json.loads(video_result_path.read_text(encoding="utf-8"))
+            self.assertEqual(video_result["schema_version"], "local-batch-v1")
+            self.assertEqual(video_result["video_id"], "video-a")
+            self.assertEqual(video_result["video_path"], "/videos/a.mp4")
+            self.assertEqual(video_result["probe"]["codec_name"], "h264")
+            self.assertIsNone(video_result["monitoring_timeline"]["video_start_time"])
+            self.assertEqual(video_result["monitoring_timeline"]["video_duration_seconds"], 40.0)
+            self.assertEqual(video_result["clip_count"], 3)
+            self.assertEqual(video_result["failed_clip_count"], 0)
+            self.assertEqual(video_result["event_counts"], {"queue_detected": 1, "staff_absent": 1})
+            self.assertEqual(len(video_result["events"]), 2)
+            merged = video_result["events"][0]
+            self.assertEqual(merged["event_type"], "queue_detected")
+            self.assertEqual(merged["start_offset_seconds"], 1.0)
+            self.assertEqual(merged["end_offset_seconds"], 16.0)
+            self.assertEqual(merged["screen_times"], ["09:00:01", "09:00:13"])
+            self.assertEqual(merged["evidence"]["clip_ids"], ["video-a_c000001", "video-a_c000002"])
+            self.assertEqual(
+                [
+                    clip["clip_start_beijing_time"]
+                    for clip in merged["evidence"]["clips"]
+                ],
+                ["2026-06-15 07:00:00", "2026-06-15 07:00:12"],
+            )
+            self.assertEqual(
+                [
+                    clip["clip_end_beijing_time"]
+                    for clip in merged["evidence"]["clips"]
+                ],
+                ["2026-06-15 07:00:10", "2026-06-15 07:00:20"],
+            )
+            self.assertEqual(video_result["outputs"]["clip_results_jsonl"], "clip_results.jsonl")
+            self.assertIn("started_at", video_result["processing"])
+            self.assertIn("finished_at", video_result["processing"])
+
+            failed_video_result = json.loads(
+                (output_dir / "videos" / "video-b" / "video_result.json").read_text(
+                    encoding="utf-8"
+                )
+            )
+            self.assertEqual(failed_video_result["clip_count"], 1)
+            self.assertEqual(failed_video_result["failed_clip_count"], 1)
+            self.assertEqual(failed_video_result["event_counts"], {})
+
+            folder_summary = json.loads(
+                (output_dir / "folder_summary.json").read_text(encoding="utf-8")
+            )
+            self.assertEqual(folder_summary["schema_version"], "local-batch-v1")
+            self.assertEqual(folder_summary["input_dir"], "/videos")
+            self.assertEqual(folder_summary["video_count"], 2)
+            self.assertEqual(folder_summary["processed_video_count"], 1)
+            self.assertEqual(folder_summary["failed_video_count"], 1)
+            self.assertEqual(folder_summary["event_counts"], {"queue_detected": 1, "staff_absent": 1})
+            self.assertEqual(
+                [video["video_id"] for video in folder_summary["videos"]],
+                ["video-a", "video-b"],
+            )
+            self.assertIn("processing", folder_summary)
+
+    def test_ffprobe_start_time_is_not_treated_as_monitoring_timeline_start(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            output_dir = Path(tmp)
+            self._write_jsonl(
+                output_dir / "video_manifest.jsonl",
+                [
+                    {
+                        "video_id": "video-local",
+                        "path": "/videos/local.mp4",
+                        "status": "probed",
+                        "duration_seconds": 12.0,
+                        "start_time": 0.0,
+                    }
+                ],
+            )
+            self._write_jsonl(
+                output_dir / "clip_manifest.jsonl",
+                [self._clip("video-local", "video-local_c000001", 0.0, 10.0)],
+            )
+            self._write_jsonl(output_dir / "clip_results.jsonl", [])
+
+            aggregate_outputs(
+                output_dir,
+                {
+                    "input": {"dir": "/videos"},
+                    "schema": {"version": "local-batch-v1", "merge_gap_seconds": 3},
+                },
+            )
+
+            video_result = json.loads(
+                (output_dir / "videos" / "video-local" / "video_result.json").read_text(
+                    encoding="utf-8"
+                )
+            )
+            self.assertEqual(video_result["probe"]["start_time"], 0.0)
+            self.assertIsNone(video_result["monitoring_timeline"]["video_start_time"])
+
+    def test_does_not_merge_different_event_types_videos_or_large_gaps(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            output_dir = Path(tmp)
+            self._write_jsonl(
+                output_dir / "video_manifest.jsonl",
+                [
+                    {"video_id": "video-a", "path": "/videos/a.mp4", "status": "probed"},
+                    {"video_id": "video-b", "path": "/videos/b.mp4", "status": "probed"},
+                ],
+            )
+            self._write_jsonl(
+                output_dir / "clip_manifest.jsonl",
+                [
+                    self._clip("video-a", "a1", 0.0, 10.0),
+                    self._clip("video-a", "a2", 40.0, 50.0),
+                    self._clip("video-a", "a3", 51.0, 60.0),
+                    self._clip("video-b", "b1", 0.0, 10.0),
+                ],
+            )
+            self._write_jsonl(
+                output_dir / "clip_results.jsonl",
+                [
+                    self._result("video-a", "a1", "/videos/a.mp4", 0.0, 10.0, "", [{"event_type": "queue_detected", "start_offset_seconds": 1.0, "end_offset_seconds": 5.0}]),
+                    self._result("video-a", "a2", "/videos/a.mp4", 40.0, 50.0, "", [{"event_type": "queue_detected", "start_offset_seconds": 40.0, "end_offset_seconds": 45.0}]),
+                    self._result("video-a", "a3", "/videos/a.mp4", 51.0, 60.0, "", [{"event_type": "staff_absent", "start_offset_seconds": 51.0, "end_offset_seconds": 55.0}]),
+                    self._result("video-b", "b1", "/videos/b.mp4", 0.0, 10.0, "", [{"event_type": "queue_detected", "start_offset_seconds": 1.0, "end_offset_seconds": 5.0}]),
+                ],
+            )
+
+            aggregate_outputs(
+                output_dir,
+                {
+                    "input": {"dir": "/videos"},
+                    "schema": {"version": "local-batch-v1", "merge_gap_seconds": 3},
+                },
+            )
+
+            video_a = json.loads(
+                (output_dir / "videos" / "video-a" / "video_result.json").read_text(
+                    encoding="utf-8"
+                )
+            )
+            video_b = json.loads(
+                (output_dir / "videos" / "video-b" / "video_result.json").read_text(
+                    encoding="utf-8"
+                )
+            )
+            self.assertEqual(len(video_a["events"]), 3)
+            self.assertEqual(video_a["event_counts"], {"queue_detected": 2, "staff_absent": 1})
+            self.assertEqual(len(video_b["events"]), 1)
+            self.assertEqual(video_b["event_counts"], {"queue_detected": 1})
+
+    def _clip(self, video_id, clip_id, start, end):
+        return {
+            "video_id": video_id,
+            "clip_id": clip_id,
+            "clip_start_seconds": start,
+            "clip_end_seconds": end,
+            "clip_start_timecode": "00:00:00",
+            "clip_end_timecode": "00:00:10",
+            "frame_times": [
+                {
+                    "frame_path": f"frames/{video_id}/{clip_id}.jpg",
+                    "offset_seconds": start,
+                    "timecode": "00:00:00",
+                }
+            ],
+            "status": "pending",
+        }
+
+    def _result(self, video_id, clip_id, video_path, start, end, screen_time, events):
+        base = datetime(2026, 6, 15, 7, 0, 0)
+        clip_start_beijing_time = (base + timedelta(seconds=start)).strftime(
+            "%Y-%m-%d %H:%M:%S"
+        )
+        clip_end_beijing_time = (base + timedelta(seconds=end)).strftime(
+            "%Y-%m-%d %H:%M:%S"
+        )
+        return {
+            "schema_version": "local-batch-v1",
+            "video_id": video_id,
+            "video_path": video_path,
+            "clip_id": clip_id,
+            "status": "ok",
+            "monitoring_timeline": {
+                "video_start_time": None,
+                "clip_start_seconds": start,
+                "clip_end_seconds": end,
+                "clip_start_timecode": "00:00:00",
+                "clip_end_timecode": "00:00:10",
+                "clip_start_beijing_time": clip_start_beijing_time,
+                "clip_end_beijing_time": clip_end_beijing_time,
+                "frame_times": [
+                    {
+                        "frame_path": f"frames/{video_id}/{clip_id}.jpg",
+                        "offset_seconds": start,
+                        "timecode": "00:00:00",
+                        "beijing_time": clip_start_beijing_time,
+                    }
+                ],
+                "screen_time": screen_time,
+            },
+            "events": events,
+            "raw_response": "{}",
+            "processing": {},
+            "error": None,
+        }
+
+    def _write_jsonl(self, path, records):
+        path.write_text(
+            "".join(json.dumps(record, sort_keys=True) + "\n" for record in records),
+            encoding="utf-8",
+        )
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
--- a/tests/test_clips.py
+++ b/tests/test_clips.py
@@ -0,0 +1,167 @@
+import json
+import tempfile
+import unittest
+from pathlib import Path
+
+from video_ai_analysis_poc.clips import build_clip_records, build_clip_records_from_manifest
+
+
+class ClipTests(unittest.TestCase):
+    def test_build_clip_records_uniformly_samples_frames_per_clip(self):
+        frames = [
+            {
+                "video_id": "video-abc",
+                "frame_id": f"video-abc_f{index + 1:06d}",
+                "frame_path": f"frames/video-abc/{index + 1:06d}.jpg",
+                "offset_seconds": float(index),
+                "timecode": f"00:00:{index:02d}",
+                "pts_time": float(index),
+                "status": "sampled",
+            }
+            for index in range(10)
+        ]
+
+        clips = build_clip_records(
+            frames,
+            {
+                "length_seconds": 10,
+                "stride_seconds": 10,
+                "frames_per_clip": 4,
+                "min_frames_per_clip": 2,
+            },
+        )
+
+        self.assertEqual(len(clips), 1)
+        self.assertEqual(clips[0]["clip_id"], "video-abc_c000001")
+        self.assertEqual(clips[0]["clip_start_seconds"], 0.0)
+        self.assertEqual(clips[0]["clip_end_seconds"], 10.0)
+        self.assertEqual(
+            [frame["offset_seconds"] for frame in clips[0]["frame_times"]],
+            [0.0, 3.0, 6.0, 9.0],
+        )
+        self.assertEqual(clips[0]["status"], "pending")
+        self.assertEqual(clips[0]["retry_count"], 0)
+        self.assertIsNone(clips[0]["last_error"])
+
+    def test_tail_clip_end_is_truncated_to_last_frame_interval(self):
+        frames = [
+            {
+                "video_id": "video-abc",
+                "frame_id": f"video-abc_f{index + 1:06d}",
+                "frame_path": f"frames/video-abc/{index + 1:06d}.jpg",
+                "offset_seconds": float(index),
+                "timecode": f"00:00:{index:02d}",
+                "pts_time": float(index),
+                "status": "sampled",
+            }
+            for index in range(15)
+        ]
+
+        clips = build_clip_records(
+            frames,
+            {
+                "length_seconds": 10,
+                "stride_seconds": 10,
+                "frames_per_clip": 8,
+                "min_frames_per_clip": 4,
+            },
+        )
+
+        self.assertEqual(len(clips), 2)
+        self.assertEqual(clips[1]["clip_start_seconds"], 10.0)
+        self.assertEqual(clips[1]["clip_end_seconds"], 15.0)
+        self.assertEqual(clips[1]["clip_end_timecode"], "00:00:15")
+
+    def test_build_clip_records_adds_beijing_time_range_and_frame_times(self):
+        frames = [
+            {
+                "video_id": "video-abc",
+                "frame_id": f"video-abc_f{index + 1:06d}",
+                "frame_path": f"frames/video-abc/{index + 1:06d}.jpg",
+                "offset_seconds": float(index),
+                "timecode": f"00:00:{index:02d}",
+                "pts_time": float(index),
+                "beijing_time": f"2026-06-15 07:00:{index:02d}",
+                "status": "sampled",
+            }
+            for index in range(10)
+        ]
+
+        clips = build_clip_records(
+            frames,
+            {
+                "length_seconds": 10,
+                "stride_seconds": 10,
+                "frames_per_clip": 4,
+                "min_frames_per_clip": 2,
+            },
+        )
+
+        self.assertEqual(clips[0]["clip_start_beijing_time"], "2026-06-15 07:00:00")
+        self.assertEqual(clips[0]["clip_end_beijing_time"], "2026-06-15 07:00:10")
+        self.assertEqual(
+            [frame["beijing_time"] for frame in clips[0]["frame_times"]],
+            [
+                "2026-06-15 07:00:00",
+                "2026-06-15 07:00:03",
+                "2026-06-15 07:00:06",
+                "2026-06-15 07:00:09",
+            ],
+        )
+
+    def test_build_clip_records_from_manifest_skips_failed_frames_and_writes_jsonl(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            frame_manifest = root / "frame_manifest.jsonl"
+            clip_manifest = root / "clip_manifest.jsonl"
+            records = [
+                {
+                    "video_id": "video-abc",
+                    "frame_id": f"video-abc_f{index + 1:06d}",
+                    "frame_path": f"frames/video-abc/{index + 1:06d}.jpg",
+                    "offset_seconds": float(index),
+                    "timecode": f"00:00:{index:02d}",
+                    "pts_time": float(index),
+                    "status": "sampled",
+                }
+                for index in range(4)
+            ]
+            records.append(
+                {
+                    "video_id": "video-abc",
+                    "frame_id": None,
+                    "frame_path": None,
+                    "offset_seconds": None,
+                    "timecode": None,
+                    "pts_time": None,
+                    "status": "sample_failed",
+                    "last_error": "bad decode",
+                }
+            )
+            frame_manifest.write_text(
+                "\n".join(json.dumps(record, sort_keys=True) for record in records) + "\n",
+                encoding="utf-8",
+            )
+
+            clips = build_clip_records_from_manifest(
+                frame_manifest,
+                clip_manifest,
+                {
+                    "length_seconds": 10,
+                    "stride_seconds": 10,
+                    "frames_per_clip": 8,
+                    "min_frames_per_clip": 4,
+                },
+            )
+
+            self.assertEqual(len(clips), 1)
+            self.assertEqual(len(clips[0]["frame_times"]), 4)
+            persisted = [
+                json.loads(line)
+                for line in clip_manifest.read_text(encoding="utf-8").splitlines()
+            ]
+            self.assertEqual(persisted, clips)
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/tests/test_config.py
+++ b/tests/test_config.py
@@ -0,0 +1,240 @@
+import tempfile
+import unittest
+from pathlib import Path
+
+from video_ai_analysis_poc.config import load_config
+
+
+class ConfigTests(unittest.TestCase):
+    def test_loads_local_batch_yaml_and_applies_cli_overrides(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            input_dir = root / "videos"
+            output_dir = root / "out"
+            override_input = root / "override-videos"
+            override_output = root / "override-out"
+            input_dir.mkdir()
+            override_input.mkdir()
+            config_path = root / "local_batch.yaml"
+            config_path.write_text(
+                "\n".join(
+                    [
+                        "input:",
+                        f"  dir: {input_dir}",
+                        "  recursive: false",
+                        '  extensions: [".mp4", ".mov"]',
+                        "output:",
+                        f"  dir: {output_dir}",
+                        "  overwrite: false",
+                        "ffprobe:",
+                        "  timeout_seconds: 5",
+                    ]
+                ),
+                encoding="utf-8",
+            )
+
+            config = load_config(
+                config_path,
+                input_dir=override_input,
+                output_dir=override_output,
+            )
+
+            self.assertEqual(config["input"]["dir"], str(override_input.resolve()))
+            self.assertEqual(config["output"]["dir"], str(override_output.resolve()))
+            self.assertFalse(config["input"]["recursive"])
+            self.assertEqual(config["input"]["extensions"], [".mp4", ".mov"])
+            self.assertEqual(config["ffprobe"]["timeout_seconds"], 5)
+
+    def test_rejects_output_dir_equal_to_input_dir(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            input_dir = root / "videos"
+            input_dir.mkdir()
+            config_path = root / "local_batch.yaml"
+            config_path.write_text(
+                "\n".join(
+                    [
+                        "input:",
+                        f"  dir: {input_dir}",
+                        "output:",
+                        f"  dir: {input_dir}",
+                    ]
+                ),
+                encoding="utf-8",
+            )
+
+            with self.assertRaisesRegex(ValueError, "output dir must not equal input dir"):
+                load_config(config_path)
+
+    def test_rejects_output_dir_inside_reference_project(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            input_dir = root / "videos"
+            input_dir.mkdir()
+            forbidden_output = (
+                Path("/Users/yoilun/AI-train/zhengxin-vlm-0413")
+                / "outputs"
+                / "local-batch"
+            )
+            config_path = root / "local_batch.yaml"
+            config_path.write_text(
+                "\n".join(
+                    [
+                        "input:",
+                        f"  dir: {input_dir}",
+                        "output:",
+                        f"  dir: {forbidden_output}",
+                    ]
+                ),
+                encoding="utf-8",
+            )
+
+            with self.assertRaisesRegex(
+                ValueError, "output dir must not be inside forbidden reference dir"
+            ):
+                load_config(config_path)
+
+    def test_loads_nested_mapping_values(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            input_dir = root / "videos"
+            output_dir = root / "output"
+            input_dir.mkdir()
+            config_path = root / "local_batch.yaml"
+            config_path.write_text(
+                "\n".join(
+                    [
+                        "input:",
+                        f"  dir: {input_dir}",
+                        "output:",
+                        f"  dir: {output_dir}",
+                        "ffmpeg:",
+                        "  codec_decoders:",
+                        "    h264: h264_cuvid",
+                        "    hevc: hevc_cuvid",
+                    ]
+                ),
+                encoding="utf-8",
+            )
+
+            config = load_config(config_path)
+
+            self.assertEqual(
+                config["ffmpeg"]["codec_decoders"],
+                {"h264": "h264_cuvid", "hevc": "hevc_cuvid"},
+            )
+
+    def test_loads_prompt_block_scalar_values(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            input_dir = root / "videos"
+            output_dir = root / "output"
+            input_dir.mkdir()
+            config_path = root / "local_batch.yaml"
+            config_path.write_text(
+                "\n".join(
+                    [
+                        "input:",
+                        f"  dir: {input_dir}",
+                        "output:",
+                        f"  dir: {output_dir}",
+                        "prompt:",
+                        "  system: >-",  # asserted below to keep line breaks
+                        "    First instruction.",
+                        "    Second instruction.",
+                        "",
+                        "    Final instruction.",
+                        "  user: 'Return strict JSON.'",
+                    ]
+                ),
+                encoding="utf-8",
+            )
+
+            config = load_config(config_path)
+
+            self.assertEqual(
+                config["prompt"]["system"],
+                "First instruction.\nSecond instruction.\n\nFinal instruction.",
+            )
+            self.assertEqual(config["prompt"]["user"], "Return strict JSON.")
+
+    def test_defaults_source_mode_to_local_and_hik_cloud_section(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            input_dir = root / "videos"
+            output_dir = root / "output"
+            input_dir.mkdir()
+            config_path = root / "local_batch.yaml"
+            config_path.write_text(
+                "\n".join(
+                    [
+                        "input:",
+                        f"  dir: {input_dir}",
+                        "output:",
+                        f"  dir: {output_dir}",
+                    ]
+                ),
+                encoding="utf-8",
+            )
+
+            config = load_config(config_path)
+
+            self.assertEqual(config["source"]["mode"], "local")
+            self.assertIn("devices", config["hik_cloud"])
+            self.assertIn("time_ranges", config["hik_cloud"])
+
+    def test_loads_hik_cloud_devices_and_time_ranges_as_list_of_mappings(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            input_dir = root / "videos"
+            output_dir = root / "output"
+            input_dir.mkdir()
+            config_path = root / "local_batch.yaml"
+            config_path.write_text(
+                "\n".join(
+                    [
+                        "input:",
+                        f"  dir: {input_dir}",
+                        "output:",
+                        f"  dir: {output_dir}",
+                        "source:",
+                        "  mode: hik_cloud",
+                        "hik_cloud:",
+                        "  devices:",
+                        "    - device_serial: EXAMPLE_DEVICE_SERIAL",
+                        "      channel_no: 1",
+                        "      name: front",
+                        "  time_ranges:",
+                        '    - begin: "2026-02-03 09:00:00"',
+                        '      end: "2026-02-03 10:30:00"',
+                    ]
+                ),
+                encoding="utf-8",
+            )
+
+            config = load_config(config_path)
+
+            self.assertEqual(config["source"]["mode"], "hik_cloud")
+            self.assertEqual(
+                config["hik_cloud"]["devices"],
+                [
+                    {
+                        "device_serial": "EXAMPLE_DEVICE_SERIAL",
+                        "channel_no": 1,
+                        "name": "front",
+                    }
+                ],
+            )
+            self.assertEqual(
+                config["hik_cloud"]["time_ranges"],
+                [
+                    {
+                        "begin": "2026-02-03 09:00:00",
+                        "end": "2026-02-03 10:30:00",
+                    }
+                ],
+            )
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/tests/test_discovery.py
+++ b/tests/test_discovery.py
@@ -0,0 +1,41 @@
+import tempfile
+import unittest
+from pathlib import Path
+
+from video_ai_analysis_poc.discovery import discover_videos
+
+
+class DiscoveryTests(unittest.TestCase):
+    def test_discovers_supported_extensions_without_recursion(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            nested = root / "nested"
+            nested.mkdir()
+            supported = root / "a.MP4"
+            unsupported = root / "notes.txt"
+            nested_video = nested / "b.mov"
+            supported.write_text("not a real video", encoding="utf-8")
+            unsupported.write_text("ignore me", encoding="utf-8")
+            nested_video.write_text("not a real video", encoding="utf-8")
+
+            videos = discover_videos(root, [".mp4", ".mov"], recursive=False)
+
+            self.assertEqual(videos, [supported])
+
+    def test_discovers_supported_extensions_recursively_sorted(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            nested = root / "nested"
+            nested.mkdir()
+            first = root / "a.mp4"
+            second = nested / "b.mov"
+            first.write_text("x", encoding="utf-8")
+            second.write_text("x", encoding="utf-8")
+
+            videos = discover_videos(root, [".mp4", ".mov"], recursive=True)
+
+            self.assertEqual(videos, [first, second])
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/tests/test_ffmpeg_sampler.py
+++ b/tests/test_ffmpeg_sampler.py
@@ -0,0 +1,357 @@
+import json
+import subprocess
+import tempfile
+import unittest
+from pathlib import Path
+from unittest.mock import patch
+
+from video_ai_analysis_poc.ffmpeg_sampler import (
+    build_sample_command,
+    sample_video_frames,
+)
+
+
+class FfmpegSamplerTests(unittest.TestCase):
+    def test_build_sample_command_uses_nvdec_decoder_for_h264(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            output_dir = Path(tmp) / "output"
+
+            command = build_sample_command(
+                Path("/tmp/input.mp4"),
+                output_dir,
+                "video-abc",
+                {
+                    "prefer_nvdec": True,
+                    "allow_cpu_fallback": False,
+                    "hwaccel": "cuda",
+                    "codec_decoders": {"h264": "h264_cuvid", "hevc": "hevc_cuvid"},
+                    "frame_fps": 1,
+                    "frame_width": 640,
+                    "jpeg_quality": 4,
+                },
+                codec_name="h264",
+            )
+
+        self.assertIn("-hwaccel", command)
+        self.assertIn("cuda", command)
+        self.assertIn("-c:v", command)
+        self.assertIn("h264_cuvid", command)
+        self.assertEqual(command[-1], str(output_dir / "frames" / "video-abc" / "%06d.jpg"))
+
+    def test_build_sample_command_uses_nvdec_decoder_for_hevc(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            command = build_sample_command(
+                Path("/tmp/input.mp4"),
+                Path(tmp) / "output",
+                "video-abc",
+                {
+                    "prefer_nvdec": True,
+                    "allow_cpu_fallback": False,
+                    "hwaccel": "cuda",
+                    "codec_decoders": {"h264": "h264_cuvid", "hevc": "hevc_cuvid"},
+                    "frame_fps": 1,
+                    "frame_width": 640,
+                    "jpeg_quality": 4,
+                },
+                codec_name="hevc",
+            )
+
+        self.assertIn("-hwaccel", command)
+        self.assertIn("cuda", command)
+        self.assertIn("-c:v", command)
+        self.assertIn("hevc_cuvid", command)
+
+    def test_build_sample_command_refuses_cpu_fallback_by_default(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            with self.assertRaisesRegex(ValueError, "NVDEC decoder is required"):
+                build_sample_command(
+                    Path("/tmp/input.mp4"),
+                    Path(tmp),
+                    "video-abc",
+                    {
+                        "prefer_nvdec": True,
+                        "allow_cpu_fallback": False,
+                        "codec_decoders": {"h264": "h264_cuvid", "hevc": "hevc_cuvid"},
+                    },
+                    codec_name="vp9",
+                )
+
+    def test_sample_video_frames_writes_structured_failure_record(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            manifest_path = root / "frame_manifest.jsonl"
+            failure = subprocess.CalledProcessError(
+                returncode=1,
+                cmd=["ffmpeg"],
+                stderr="No decoder h264_cuvid",
+            )
+
+            with patch("subprocess.run", side_effect=failure):
+                records = sample_video_frames(
+                    {
+                        "video_id": "video-abc",
+                        "path": str(root / "input.mp4"),
+                        "codec_name": "h264",
+                    },
+                    root,
+                    {
+                        "prefer_nvdec": True,
+                        "allow_cpu_fallback": False,
+                        "hwaccel": "cuda",
+                        "codec_decoders": {"h264": "h264_cuvid"},
+                        "frame_fps": 1,
+                        "frame_width": 640,
+                        "jpeg_quality": 4,
+                        "timeout_seconds_per_video": 30,
+                    },
+                    manifest_path=manifest_path,
+                )
+
+            self.assertEqual(len(records), 1)
+            self.assertEqual(records[0]["video_id"], "video-abc")
+            self.assertEqual(records[0]["status"], "sample_failed")
+            self.assertIn("h264_cuvid", records[0]["last_error"])
+            persisted = [
+                json.loads(line)
+                for line in manifest_path.read_text(encoding="utf-8").splitlines()
+            ]
+            self.assertEqual(persisted, records)
+
+    def test_sample_video_frames_persists_success_nvdec_evidence(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            manifest_path = root / "frame_manifest.jsonl"
+            video_id = "video-abc"
+            frame_dir = root / "frames" / video_id
+
+            def run_success(*args, **kwargs):
+                frame_dir.mkdir(parents=True, exist_ok=True)
+                (frame_dir / "000001.jpg").write_bytes(b"jpg")
+                return subprocess.CompletedProcess(
+                    args=args[0],
+                    returncode=0,
+                    stdout="",
+                    stderr="Using decoder h264_cuvid with hwaccel cuda",
+                )
+
+            with patch("subprocess.run", side_effect=run_success):
+                records = sample_video_frames(
+                    {
+                        "video_id": video_id,
+                        "path": str(root / "input.mp4"),
+                        "codec_name": "h264",
+                    },
+                    root,
+                    {
+                        "prefer_nvdec": True,
+                        "allow_cpu_fallback": False,
+                        "hwaccel": "cuda",
+                        "codec_decoders": {"h264": "h264_cuvid"},
+                        "frame_fps": 1,
+                        "frame_width": 640,
+                        "jpeg_quality": 4,
+                        "timeout_seconds_per_video": 30,
+                    },
+                    manifest_path=manifest_path,
+                )
+
+            self.assertEqual(records[0]["status"], "sampled")
+            self.assertEqual(records[0]["decoder"], "h264_cuvid")
+            self.assertEqual(records[0]["hwaccel"], "cuda")
+            self.assertIn("h264_cuvid", records[0]["ffmpeg_command"])
+            self.assertIn("Using decoder h264_cuvid", records[0]["stderr_summary"])
+            persisted = [
+                json.loads(line)
+                for line in manifest_path.read_text(encoding="utf-8").splitlines()
+            ]
+            self.assertEqual(persisted, records)
+
+    def test_sample_video_frames_adds_beijing_time_from_hik_actual_begin(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            video_id = "video-abc"
+            frame_dir = root / "frames" / video_id
+
+            def run_success(command, *args, **kwargs):
+                frame_dir.mkdir(parents=True, exist_ok=True)
+                (frame_dir / "000001.jpg").write_bytes(b"jpg")
+                (frame_dir / "000002.jpg").write_bytes(b"jpg")
+                return subprocess.CompletedProcess(
+                    args=command,
+                    returncode=0,
+                    stdout="",
+                    stderr="",
+                )
+
+            with patch("subprocess.run", side_effect=run_success):
+                records = sample_video_frames(
+                    {
+                        "video_id": video_id,
+                        "path": str(root / "input.mp4"),
+                        "codec_name": "h264",
+                        "actual_begin": 1781478000,
+                        "actual_end": 1781478600,
+                    },
+                    root,
+                    {
+                        "prefer_nvdec": True,
+                        "allow_cpu_fallback": False,
+                        "hwaccel": "cuda",
+                        "codec_decoders": {"h264": "h264_cuvid"},
+                        "frame_fps": 1,
+                        "frame_width": 640,
+                        "jpeg_quality": 4,
+                        "timeout_seconds_per_video": 30,
+                        "timezone": "Asia/Shanghai",
+                    },
+                )
+
+            self.assertEqual(records[0]["beijing_time"], "2026-06-15 07:00:00")
+            self.assertEqual(records[1]["beijing_time"], "2026-06-15 07:00:01")
+
+    def test_sample_video_frames_caps_output_frames_to_requested_duration(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            video_id = "video-abc"
+            frame_dir = root / "frames" / video_id
+            captured_command = []
+
+            def run_success(command, *args, **kwargs):
+                captured_command.extend(command)
+                frame_dir.mkdir(parents=True, exist_ok=True)
+                (frame_dir / "000001.jpg").write_bytes(b"jpg")
+                return subprocess.CompletedProcess(
+                    args=command,
+                    returncode=0,
+                    stdout="",
+                    stderr="",
+                )
+
+            with patch("subprocess.run", side_effect=run_success):
+                sample_video_frames(
+                    {
+                        "video_id": video_id,
+                        "path": str(root / "input.mp4"),
+                        "codec_name": "hevc",
+                        "requested_begin": 1000,
+                        "requested_end": 1600,
+                    },
+                    root,
+                    {
+                        "prefer_nvdec": True,
+                        "allow_cpu_fallback": False,
+                        "hwaccel": "cuda",
+                        "codec_decoders": {"hevc": "hevc_cuvid"},
+                        "frame_fps": 1,
+                        "frame_width": 640,
+                        "jpeg_quality": 4,
+                        "timeout_seconds_per_video": 30,
+                    },
+                )
+
+            self.assertIn("-frames:v", captured_command)
+            frames_flag_index = captured_command.index("-frames:v")
+            self.assertEqual(captured_command[frames_flag_index + 1], "601")
+
+    def test_sample_video_frames_limits_decode_window_to_requested_duration(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            video_id = "video-abc"
+            frame_dir = root / "frames" / video_id
+            captured_command = []
+
+            def run_success(command, *args, **kwargs):
+                captured_command.extend(command)
+                frame_dir.mkdir(parents=True, exist_ok=True)
+                (frame_dir / "000001.jpg").write_bytes(b"jpg")
+                return subprocess.CompletedProcess(
+                    args=command,
+                    returncode=0,
+                    stdout="",
+                    stderr="",
+                )
+
+            with patch("subprocess.run", side_effect=run_success):
+                sample_video_frames(
+                    {
+                        "video_id": video_id,
+                        "path": str(root / "input.mp4"),
+                        "codec_name": "hevc",
+                        "requested_begin": 1000,
+                        "requested_end": 1600,
+                        "duration_seconds": 104259.921,
+                    },
+                    root,
+                    {
+                        "prefer_nvdec": True,
+                        "allow_cpu_fallback": False,
+                        "hwaccel": "cuda",
+                        "codec_decoders": {"hevc": "hevc_cuvid"},
+                        "frame_fps": 1,
+                        "frame_width": 640,
+                        "jpeg_quality": 4,
+                        "timeout_seconds_per_video": 30,
+                    },
+                )
+
+            self.assertIn("-t", captured_command)
+            input_index = captured_command.index("-i")
+            t_flag_index = captured_command.index("-t")
+            vf_index = captured_command.index("-vf")
+            self.assertLess(input_index, t_flag_index)
+            self.assertLess(t_flag_index, vf_index)
+            self.assertEqual(captured_command[t_flag_index + 1], "600")
+
+    def test_sample_video_frames_uses_complete_frames_when_ffmpeg_exits_nonzero(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            video_id = "video-abc"
+            frame_dir = root / "frames" / video_id
+            manifest_path = root / "frame_manifest.jsonl"
+
+            def run_with_nonzero_exit(command, *args, **kwargs):
+                frame_dir.mkdir(parents=True, exist_ok=True)
+                for index in range(1, 602):
+                    (frame_dir / f"{index:06d}.jpg").write_bytes(b"jpg")
+                raise subprocess.CalledProcessError(
+                    returncode=1,
+                    cmd=command,
+                    stderr="trailing decoder error after requested frames",
+                )
+
+            with patch("subprocess.run", side_effect=run_with_nonzero_exit):
+                records = sample_video_frames(
+                    {
+                        "video_id": video_id,
+                        "path": str(root / "input.mp4"),
+                        "codec_name": "hevc",
+                        "requested_begin": 1000,
+                        "requested_end": 1600,
+                    },
+                    root,
+                    {
+                        "prefer_nvdec": True,
+                        "allow_cpu_fallback": False,
+                        "hwaccel": "cuda",
+                        "codec_decoders": {"hevc": "hevc_cuvid"},
+                        "frame_fps": 1,
+                        "frame_width": 640,
+                        "jpeg_quality": 4,
+                        "timeout_seconds_per_video": 30,
+                    },
+                    manifest_path=manifest_path,
+                )
+
+            self.assertEqual(len(records), 601)  # frames 000001..000601
+            self.assertEqual({record["status"] for record in records}, {"sampled"})
+            self.assertIn("-t", records[0]["ffmpeg_command"])
+            self.assertIn("trailing decoder error", records[0]["stderr_summary"])
+            persisted = [
+                json.loads(line)
+                for line in manifest_path.read_text(encoding="utf-8").splitlines()
+            ]
+            self.assertEqual(persisted, records)
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/tests/test_frames.py
+++ b/tests/test_frames.py
@@ -0,0 +1,61 @@
+import tempfile
+import unittest
+from pathlib import Path
+
+from video_ai_analysis_poc.frames import build_frame_records, seconds_to_timecode
+
+
+class FrameTests(unittest.TestCase):
+    def test_seconds_to_timecode_formats_relative_offsets(self):
+        self.assertEqual(seconds_to_timecode(0), "00:00:00")
+        self.assertEqual(seconds_to_timecode(65.2), "00:01:05")
+        self.assertEqual(seconds_to_timecode(3661), "01:01:01")
+
+    def test_build_frame_records_uses_stable_paths_and_offsets(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            frame_dir = Path(tmp) / "frames" / "video-abc"
+            frame_dir.mkdir(parents=True)
+            first = frame_dir / "000001.jpg"
+            second = frame_dir / "000002.jpg"
+            first.write_bytes(b"jpg")
+            second.write_bytes(b"jpg")
+
+            records = build_frame_records(
+                "video-abc",
+                Path(tmp),
+                [first, second],
+                frame_fps=1,
+            )
+
+        self.assertEqual(records[0]["frame_id"], "video-abc_f000001")
+        self.assertEqual(records[0]["frame_path"], "frames/video-abc/000001.jpg")
+        self.assertEqual(records[0]["offset_seconds"], 0.0)
+        self.assertEqual(records[0]["timecode"], "00:00:00")
+        self.assertEqual(records[0]["pts_time"], 0.0)
+        self.assertEqual(records[0]["status"], "sampled")
+        self.assertEqual(records[1]["offset_seconds"], 1.0)
+
+    def test_build_frame_records_adds_beijing_time_from_timeline_epoch(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            frame_dir = Path(tmp) / "frames" / "video-abc"
+            frame_dir.mkdir(parents=True)
+            first = frame_dir / "000001.jpg"
+            second = frame_dir / "000002.jpg"
+            first.write_bytes(b"jpg")
+            second.write_bytes(b"jpg")
+
+            records = build_frame_records(
+                "video-abc",
+                Path(tmp),
+                [first, second],
+                frame_fps=1,
+                timeline_start_epoch=1781478000,
+                timezone_name="Asia/Shanghai",
+            )
+
+        self.assertEqual(records[0]["beijing_time"], "2026-06-15 07:00:00")
+        self.assertEqual(records[1]["beijing_time"], "2026-06-15 07:00:01")
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/tests/test_hik_cloud.py
+++ b/tests/test_hik_cloud.py
@@ -0,0 +1,554 @@
+import os
+import tempfile
+import unittest
+from datetime import datetime
+from pathlib import Path
+from unittest.mock import patch
+from zoneinfo import ZoneInfo
+
+from video_ai_analysis_poc import hik_cloud
+from video_ai_analysis_poc.hik_cloud import (
+    build_download_chunks,
+    request_download_address,
+    resolve_access_token,
+)
+from video_ai_analysis_poc.manifest import read_jsonl, write_manifest
+
+
+class HikCloudTests(unittest.TestCase):
+    def test_build_download_chunks_defaults_to_600_second_chunks(self):
+        config = {
+            "runtime": {"timezone": "Asia/Shanghai"},
+            "hik_cloud": {
+                "devices": [
+                    {
+                        "device_serial": "EXAMPLE_DEVICE_SERIAL",
+                        "channel_no": 1,
+                        "name": "front",
+                    }
+                ],
+                "time_ranges": [
+                    {
+                        "begin": "2026-02-03 09:00:00",
+                        "end": "2026-02-03 10:30:00",
+                    }
+                ],
+            },
+        }
+
+        chunks = build_download_chunks(config)
+
+        requested_begin = int(
+            datetime(2026, 2, 3, 9, 0, 0, tzinfo=ZoneInfo("Asia/Shanghai")).timestamp()
+        )
+        requested_end = int(
+            datetime(2026, 2, 3, 10, 30, 0, tzinfo=ZoneInfo("Asia/Shanghai")).timestamp()
+        )
+        self.assertEqual(len(chunks), 9)  # 5400 s range / 600 s default chunk
+        self.assertEqual(chunks[0]["time_begin"], requested_begin)
+        self.assertEqual(chunks[0]["time_end"], requested_begin + 600)
+        self.assertEqual(chunks[-1]["time_begin"], requested_begin + 4800)
+        self.assertEqual(chunks[-1]["time_end"], requested_end)
+        for chunk in chunks:
+            self.assertLessEqual(chunk["time_end"] - chunk["time_begin"], 600)
+
+    def test_build_download_chunks_allows_explicit_3600_second_chunks(self):
+        config = {
+            "runtime": {"timezone": "Asia/Shanghai"},
+            "hik_cloud": {
+                "chunk_seconds": 3600,
+                "devices": [{"device_serial": "EXAMPLE_DEVICE_SERIAL", "channel_no": 1}],
+                "time_ranges": [
+                    {
+                        "begin": "2026-02-03 09:00:00",
+                        "end": "2026-02-03 10:30:00",
+                    }
+                ],
+            },
+        }
+
+        chunks = build_download_chunks(config)
+
+        requested_begin = int(
+            datetime(2026, 2, 3, 9, 0, 0, tzinfo=ZoneInfo("Asia/Shanghai")).timestamp()
+        )
+        requested_end = int(
+            datetime(2026, 2, 3, 10, 30, 0, tzinfo=ZoneInfo("Asia/Shanghai")).timestamp()
+        )
+        self.assertEqual(len(chunks), 2)  # one 3600 s chunk + 1800 s remainder
+        self.assertEqual(chunks[0]["time_begin"], requested_begin)
+        self.assertEqual(chunks[0]["time_end"], requested_begin + 3600)
+        self.assertEqual(chunks[1]["time_begin"], requested_begin + 3600)
+        self.assertEqual(chunks[1]["time_end"], requested_end)
+        for chunk in chunks:
+            self.assertLessEqual(chunk["time_end"] - chunk["time_begin"], 3600)
+
+    def test_build_download_chunks_accepts_epoch_time_ranges(self):
+        config = {
+            "hik_cloud": {
+                "devices": [{"device_serial": "EXAMPLE_DEVICE_SERIAL", "channel_no": 1}],
+                "time_ranges": [{"begin": 1770080400, "end": 1770084000.0}],
+            }
+        }
+
+        chunks = build_download_chunks(config)
+
+        self.assertEqual(len(chunks), 6)  # 3600 s range / 600 s default chunk
+        self.assertEqual(chunks[0]["time_begin"], 1770080400)
+        self.assertEqual(chunks[0]["time_end"], 1770081000)
+        self.assertEqual(chunks[-1]["time_begin"], 1770083400)
+        self.assertEqual(chunks[-1]["time_end"], 1770084000)
+
+    def test_build_download_chunks_rejects_end_before_begin(self):
+        config = {
+            "hik_cloud": {
+                "devices": [{"device_serial": "EXAMPLE_DEVICE_SERIAL", "channel_no": 1}],
+                "time_ranges": [
+                    {
+                        "begin": "2026-02-03 10:30:00",
+                        "end": "2026-02-03 09:00:00",
+                    }
+                ],
+            },
+        }
+
+        with self.assertRaisesRegex(ValueError, "end must be after begin"):
+            build_download_chunks(config)
+
+    def test_build_download_chunks_rejects_chunk_seconds_over_3600(self):
+        config = {
+            "hik_cloud": {
+                "chunk_seconds": 7200,
+                "devices": [{"device_serial": "EXAMPLE_DEVICE_SERIAL", "channel_no": 1}],
+                "time_ranges": [
+                    {
+                        "begin": "2026-02-03 09:00:00",
+                        "end": "2026-02-03 11:30:00",
+                    }
+                ],
+            },
+        }
+
+        with self.assertRaisesRegex(
+            ValueError, "chunk_seconds must be less than or equal to 3600"
+        ):
+            build_download_chunks(config)
+
+    def test_resolve_access_token_prefers_literal_token_over_environment(self):
+        config = {
+            "hik_cloud": {
+                "access_token": "DIRECT_TOKEN",
+                "access_token_env": "HIK_CLOUD_ACCESS_TOKEN",
+            }
+        }
+
+        with patch.dict(os.environ, {"HIK_CLOUD_ACCESS_TOKEN": "ENV_TOKEN"}):
+            token = resolve_access_token(config)
+
+        self.assertEqual(token, "DIRECT_TOKEN")
+
+    def test_resolve_access_token_reads_configured_environment_variable(self):
+        hik_config = {"access_token_env": "HIK_CLOUD_ACCESS_TOKEN"}
+
+        with patch.dict(os.environ, {"HIK_CLOUD_ACCESS_TOKEN": "ENV_TOKEN"}):
+            token = resolve_access_token(hik_config)
+
+        self.assertEqual(token, "ENV_TOKEN")
+
+    def test_resolve_access_token_raises_without_leaking_secret_values(self):
+        hik_config = {"access_token_env": "HIK_CLOUD_ACCESS_TOKEN"}
+
+        with patch.dict(os.environ, {}, clear=True):
+            with self.assertRaises(ValueError) as raised:
+                resolve_access_token(hik_config)
+
+        message = str(raised.exception)
+        self.assertIn("access_token", message)
+        self.assertNotIn("TOKEN", message)
+
+    def test_request_download_address_posts_expected_request_and_returns_success(self):
+        chunk = {
+            "device_serial": "EXAMPLE_DEVICE_SERIAL",
+            "channel_no": 1,
+            "requested_begin": 1764856787,
+            "requested_end": 1764856978,
+            "time_begin": 1764856787,
+            "time_end": 1764856978,
+        }
+        hik_config = {
+            "api_base_url": "https://api2.hik-cloud.com/",
+            "download_path": "/v1/carrier/cstorage/open/play/download",
+            "access_token": "TOKEN",
+            "timeout_seconds": 12,
+        }
+        calls = []
+
+        def fake_http_post(url, json_body, headers, timeout_seconds):
+            calls.append(
+                {
+                    "url": url,
+                    "json_body": json_body,
+                    "headers": headers,
+                    "timeout_seconds": timeout_seconds,
+                }
+            )
+            return {
+                "code": 0,
+                "success": True,
+                "data": {
+                    "url": "https://download.example/video.mp4?sig=abc",
+                    "actualBeginTime": "1764856787",
+                    "actualEndTime": "1764856978",
+                },
+            }
+
+        result = request_download_address(chunk, hik_config, http_post=fake_http_post)
+
+        self.assertEqual(len(calls), 1)
+        self.assertEqual(
+            calls[0]["url"],
+            "https://api2.hik-cloud.com/v1/carrier/cstorage/open/play/download",
+        )
+        self.assertEqual(calls[0]["headers"]["Authorization"], "bearer TOKEN")
+        self.assertEqual(calls[0]["headers"]["Content-Type"], "application/json")
+        self.assertEqual(
+            calls[0]["json_body"],
+            {
+                "deviceSerial": "EXAMPLE_DEVICE_SERIAL",
+                "channelNo": 1,
+                "timeBegin": 1764856787,
+                "timeEnd": 1764856978,
+            },
+        )
+        self.assertEqual(calls[0]["timeout_seconds"], 12)
+        self.assertEqual(result["status"], "address_ok")
+        self.assertEqual(result["url"], "https://download.example/video.mp4?sig=abc")
+        self.assertEqual(result["actual_begin"], 1764856787)
+        self.assertEqual(result["actual_end"], 1764856978)
+        self.assertEqual(result["device_serial"], "EXAMPLE_DEVICE_SERIAL")
+        self.assertEqual(result["channel_no"], 1)
+        self.assertEqual(result["requested_begin"], 1764856787)
+        self.assertEqual(result["requested_end"], 1764856978)
+
+    def test_request_download_address_returns_no_recording_for_known_empty_code(self):
+        chunk = {
+            "device_serial": "EXAMPLE_DEVICE_SERIAL",
+            "channel_no": 1,
+            "requested_begin": 1764856787,
+            "requested_end": 1764856978,
+            "time_begin": 1764856787,
+            "time_end": 1764856978,
+        }
+        hik_config = {
+            "api_base_url": "https://api2.hik-cloud.com",
+            "download_path": "/v1/carrier/cstorage/open/play/download",
+            "access_token": "TOKEN",
+        }
+
+        def fake_http_post(url, json_body, headers, timeout_seconds):
+            return {"code": 80438027, "msg": "no recording"}
+
+        result = request_download_address(chunk, hik_config, http_post=fake_http_post)
+
+        self.assertEqual(result["status"], "no_recording")
+        self.assertEqual(result["code"], 80438027)
+        self.assertEqual(result["device_serial"], "EXAMPLE_DEVICE_SERIAL")
+        self.assertNotIn("url", result)
+
+    def test_request_download_address_returns_sanitized_failure_for_other_codes(self):
+        chunk = {
+            "device_serial": "EXAMPLE_DEVICE_SERIAL",
+            "channel_no": 1,
+            "requested_begin": 1764856787,
+            "requested_end": 1764856978,
+            "time_begin": 1764856787,
+            "time_end": 1764856978,
+        }
+        hik_config = {
+            "api_base_url": "https://api2.hik-cloud.com",
+            "download_path": "/v1/carrier/cstorage/open/play/download",
+            "access_token": "TOKEN",
+        }
+
+        def fake_http_post(url, json_body, headers, timeout_seconds):
+            return {"code": 80430002, "msg": "bad TOKEN Authorization request"}
+
+        result = request_download_address(chunk, hik_config, http_post=fake_http_post)
+
+        self.assertEqual(result["status"], "address_failed")
+        self.assertEqual(result["code"], 80430002)
+        self.assertIn("last_error", result)
+        self.assertNotIn("TOKEN", str(result))
+        self.assertNotIn("Authorization", str(result))
+
+    def test_download_hik_cloud_recordings_writes_file_records_and_manifest(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            output_dir = Path(tmp)
+            config = _download_config()
+            address_calls = []
+            download_calls = []
+
+            def fake_address_client(chunk, hik_config):
+                address_calls.append((chunk, hik_config))
+                return {
+                    **chunk,
+                    "status": "address_ok",
+                    "url": (
+                        "https://download.example/video.mp4?"
+                        "sign=SECRET&sig=SECRET&TOKEN=SECRET"
+                    ),
+                    "actual_begin": chunk["time_begin"] + 1,
+                    "actual_end": chunk["time_end"] - 1,
+                }
+
+            def fake_download_url(url, timeout_seconds=None):
+                download_calls.append((url, timeout_seconds))
+                return b"fake mp4 bytes"
+
+            records = hik_cloud.download_hik_cloud_recordings(
+                config,
+                output_dir,
+                address_client=fake_address_client,
+                download_url=fake_download_url,
+            )
+
+            self.assertEqual(len(address_calls), 1)
+            self.assertEqual(len(download_calls), 1)
+            self.assertEqual(download_calls[0][1], 600)
+            expected_path = (
+                output_dir
+                / "downloads"
+                / "hik_cloud"
+                / "EXAMPLE_DEVICE_SERIAL"
+                / "ch1"
+                / "EXAMPLE_DEVICE_SERIAL_ch1_1764856787_1764856978.mp4"
+            ).resolve(strict=False)
+            self.assertEqual(expected_path.read_bytes(), b"fake mp4 bytes")
+            self.assertEqual(len(records), 1)
+            self.assertEqual(records[0]["path"], str(expected_path))
+            self.assertEqual(records[0]["source"], "hik_cloud")
+            self.assertEqual(records[0]["source_path"], "hik_cloud://EXAMPLE_DEVICE_SERIAL/ch1/1764856787-1764856978")
+            self.assertEqual(records[0]["device_serial"], "EXAMPLE_DEVICE_SERIAL")
+            self.assertEqual(records[0]["channel_no"], 1)
+            self.assertEqual(records[0]["requested_begin"], 1764856787)
+            self.assertEqual(records[0]["requested_end"], 1764856978)
+            self.assertEqual(records[0]["actual_begin"], 1764856788)
+            self.assertEqual(records[0]["actual_end"], 1764856977)
+            self.assertEqual(records[0]["status"], "downloaded")
+
+            manifest = read_jsonl(output_dir / "hik_cloud_download_manifest.jsonl")
+            self.assertEqual(len(manifest), 1)
+            self.assertEqual(manifest[0]["status"], "downloaded")
+            self.assertIsNone(manifest[0]["last_error"])
+            self.assertEqual(manifest[0]["download_url_host"], "download.example")
+            self.assertEqual(manifest[0]["path"], str(expected_path))
+            serialized_path = expected_path.name
+            serialized_manifest = str(manifest)
+            self.assertNotIn("sign=", serialized_path)
+            self.assertNotIn("sig=", serialized_path)
+            self.assertNotIn("TOKEN", serialized_path)
+            self.assertNotIn("sign=", serialized_manifest)
+            self.assertNotIn("sig=", serialized_manifest)
+            self.assertNotIn("TOKEN", serialized_manifest)
+
+    def test_download_hik_cloud_recordings_can_plan_without_downloading(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            output_dir = Path(tmp)
+            config = _download_config()
+            download_calls = []
+
+            def fake_address_client(chunk, hik_config):
+                return {
+                    **chunk,
+                    "status": "address_ok",
+                    "url": (
+                        "https://download.example/video.mp4?"
+                        "sign=SECRET&sig=SECRET&TOKEN=SECRET"
+                    ),
+                    "actual_begin": chunk["time_begin"],
+                    "actual_end": chunk["time_end"],
+                }
+
+            def fake_download_url(url, timeout_seconds=None):
+                download_calls.append(url)
+                return b"unexpected"
+
+            records = hik_cloud.download_hik_cloud_recordings(
+                config,
+                output_dir,
+                address_client=fake_address_client,
+                download_url=fake_download_url,
+                download=False,
+            )
+
+            self.assertEqual(records, [])
+            self.assertEqual(download_calls, [])
+            manifest = read_jsonl(output_dir / "hik_cloud_download_manifest.jsonl")
+            self.assertEqual(len(manifest), 1)
+            self.assertEqual(manifest[0]["status"], "address_ok")
+            self.assertIsNone(manifest[0]["path"])
+            self.assertEqual(manifest[0]["download_url_host"], "download.example")
+            self.assertNotIn("sign=", str(manifest))
+            self.assertNotIn("sig=", str(manifest))
+            self.assertNotIn("TOKEN", str(manifest))
+
+    def test_download_hik_cloud_recordings_records_empty_and_address_failures(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            output_dir = Path(tmp)
+            config = _download_config(
+                time_ranges=[
+                    {"begin": 1764856787, "end": 1764856978},
+                    {"begin": 1764857000, "end": 1764857100},
+                ]
+            )
+            statuses = ["no_recording", "address_failed"]
+            download_calls = []
+
+            def fake_address_client(chunk, hik_config):
+                status = statuses.pop(0)
+                return {
+                    **chunk,
+                    "status": status,
+                    "actual_begin": None,
+                    "actual_end": None,
+                    "last_error": None if status == "no_recording" else "api failed",
+                }
+
+            def fake_download_url(url, timeout_seconds=None):
+                download_calls.append(url)
+                return b"unexpected"
+
+            records = hik_cloud.download_hik_cloud_recordings(
+                config,
+                output_dir,
+                address_client=fake_address_client,
+                download_url=fake_download_url,
+            )
+
+            self.assertEqual(records, [])
+            self.assertEqual(download_calls, [])
+            manifest = read_jsonl(output_dir / "hik_cloud_download_manifest.jsonl")
+            self.assertEqual([record["status"] for record in manifest], ["no_recording", "address_failed"])
+
+    def test_download_hik_cloud_recordings_records_download_failure_and_continues(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            output_dir = Path(tmp)
+            config = _download_config(
+                time_ranges=[
+                    {"begin": 1764856787, "end": 1764856978},
+                    {"begin": 1764857000, "end": 1764857100},
+                ]
+            )
+            download_calls = []
+
+            def fake_address_client(chunk, hik_config):
+                return {
+                    **chunk,
+                    "status": "address_ok",
+                    "url": (
+                        "https://download.example/video.mp4?"
+                        "sign=SECRET&sig=SECRET&TOKEN=SECRET"
+                    ),
+                    "actual_begin": chunk["time_begin"],
+                    "actual_end": chunk["time_end"],
+                }
+
+            def fake_download_url(url, timeout_seconds=None):
+                download_calls.append(url)
+                if len(download_calls) == 1:
+                    raise RuntimeError(
+                        "download failed for query sign=SECRET&sig=SECRET&TOKEN=SECRET"
+                    )
+                return b"second chunk"
+
+            records = hik_cloud.download_hik_cloud_recordings(
+                config,
+                output_dir,
+                address_client=fake_address_client,
+                download_url=fake_download_url,
+            )
+
+            self.assertEqual(len(download_calls), 2)
+            self.assertEqual(len(records), 1)
+            self.assertEqual(records[0]["status"], "downloaded")
+            manifest = read_jsonl(output_dir / "hik_cloud_download_manifest.jsonl")
+            self.assertEqual([record["status"] for record in manifest], ["download_failed", "downloaded"])
+            self.assertIsNotNone(manifest[0]["last_error"])
+            self.assertNotIn("sign=", str(manifest))
+            self.assertNotIn("sig=", str(manifest))
+            self.assertNotIn("TOKEN", str(manifest))
+            self.assertNotIn("SECRET", str(manifest))
+
+    def test_download_hik_cloud_recordings_resume_skips_existing_downloaded_file(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            output_dir = Path(tmp)
+            config = _download_config(resume=True)
+            downloaded_path = (
+                output_dir
+                / "downloads"
+                / "hik_cloud"
+                / "EXAMPLE_DEVICE_SERIAL"
+                / "ch1"
+                / "EXAMPLE_DEVICE_SERIAL_ch1_1764856787_1764856978.mp4"
+            )
+            downloaded_path.parent.mkdir(parents=True, exist_ok=True)
+            downloaded_path.write_bytes(b"existing")
+            existing_record = {
+                "source": "hik_cloud",
+                "path": str(downloaded_path),
+                "device_serial": "EXAMPLE_DEVICE_SERIAL",
+                "channel_no": 1,
+                "requested_begin": 1764856787,
+                "requested_end": 1764856978,
+                "actual_begin": 1764856787,
+                "actual_end": 1764856978,
+                "status": "downloaded",
+                "retry_count": 0,
+                "last_error": None,
+            }
+            write_manifest(
+                output_dir / "hik_cloud_download_manifest.jsonl",
+                [existing_record],
+            )
+
+            def failing_address_client(chunk, hik_config):
+                raise AssertionError("resume should skip address lookup")
+
+            def failing_download_url(url, timeout_seconds=None):
+                raise AssertionError("resume should skip download")
+
+            records = hik_cloud.download_hik_cloud_recordings(
+                config,
+                output_dir,
+                address_client=failing_address_client,
+                download_url=failing_download_url,
+            )
+
+            expected_video_record = {
+                **existing_record,
+                "source_path": "hik_cloud://EXAMPLE_DEVICE_SERIAL/ch1/1764856787-1764856978",
+            }
+            self.assertEqual(records, [expected_video_record])
+            manifest = read_jsonl(output_dir / "hik_cloud_download_manifest.jsonl")
+            self.assertEqual(manifest, [existing_record])
+
+
+def _download_config(
+    *,
+    time_ranges=None,
+    resume: bool = False,
+):
+    return {
+        "output": {"resume": resume},
+        "hik_cloud": {
+            "access_token": "TOKEN",
+            "download_timeout_seconds": 600,
+            "devices": [{"device_serial": "EXAMPLE_DEVICE_SERIAL", "channel_no": 1}],
+            "time_ranges": time_ranges
+            or [{"begin": 1764856787, "end": 1764856978}],
+        },
+    }
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/tests/test_manifest.py
+++ b/tests/test_manifest.py
@@ -0,0 +1,30 @@
+import json
+import tempfile
+import unittest
+from pathlib import Path
+
+from video_ai_analysis_poc.manifest import read_jsonl, write_manifest
+
+
+class ManifestTests(unittest.TestCase):
+    def test_write_manifest_writes_status_retry_and_error_fields(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            path = Path(tmp) / "video_manifest.jsonl"
+            records = [
+                {"path": "/tmp/a.mp4", "status": "probed"},
+                {"path": "/tmp/b.mp4", "status": "probe_failed", "last_error": "bad data"},
+            ]
+
+            write_manifest(path, records)
+
+            lines = path.read_text(encoding="utf-8").splitlines()
+            decoded = [json.loads(line) for line in lines]
+            self.assertEqual(decoded[0]["retry_count"], 0)
+            self.assertIsNone(decoded[0]["last_error"])
+            self.assertEqual(decoded[1]["status"], "probe_failed")
+            self.assertEqual(decoded[1]["last_error"], "bad data")
+            self.assertEqual(read_jsonl(path), decoded)
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/tests/test_probe.py
+++ b/tests/test_probe.py
@@ -0,0 +1,51 @@
+import subprocess
+import unittest
+from pathlib import Path
+from unittest.mock import patch
+
+from video_ai_analysis_poc.probe import probe_video
+
+
+class ProbeTests(unittest.TestCase):
+    def test_probe_video_returns_structured_metadata(self):
+        payload = (
+            '{"streams":[{"codec_type":"video","codec_name":"h264",'
+            '"width":1920,"height":1080,"avg_frame_rate":"30000/1001"}],'
+            '"format":{"duration":"12.5","format_name":"mov,mp4,m4a,3gp,3g2,mj2",'
+            '"start_time":"0.000000"}}'
+        )
+        completed = subprocess.CompletedProcess(
+            args=["ffprobe"],
+            returncode=0,
+            stdout=payload,
+            stderr="",
+        )
+
+        with patch("subprocess.run", return_value=completed):
+            result = probe_video(Path("/tmp/video.mp4"), timeout_seconds=3)
+
+        self.assertEqual(result["status"], "probed")
+        self.assertEqual(result["codec_name"], "h264")
+        self.assertEqual(result["width"], 1920)
+        self.assertEqual(result["height"], 1080)
+        self.assertAlmostEqual(result["fps"], 29.97002997)
+        self.assertEqual(result["duration_seconds"], 12.5)
+        self.assertIsNone(result["last_error"])
+
+    def test_probe_video_returns_structured_failure(self):
+        failure = subprocess.CalledProcessError(
+            returncode=1,
+            cmd=["ffprobe"],
+            stderr="Invalid data found when processing input",
+        )
+
+        with patch("subprocess.run", side_effect=failure):
+            result = probe_video(Path("/tmp/bad.mp4"), timeout_seconds=3)
+
+        self.assertEqual(result["status"], "probe_failed")
+        self.assertEqual(result["retry_count"], 0)
+        self.assertIn("Invalid data", result["last_error"])
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/tests/test_result_parser.py
+++ b/tests/test_result_parser.py
@@ -0,0 +1,135 @@
+import unittest
+
+from video_ai_analysis_poc.result_parser import build_clip_result, extract_json_payload
+
+
+class ResultParserTests(unittest.TestCase):
+    def test_extract_json_payload_handles_markdown_and_prose(self):
+        payload = extract_json_payload(
+            "analysis follows\n```json\n{\"screen_time\":\"12:31:20\",\"events\":[]}\n```"
+        )
+
+        self.assertEqual(payload, {"screen_time": "12:31:20", "events": []})
+
+    def test_build_clip_result_preserves_timeline_screen_time_and_events(self):
+        clip_record = {
+            "video_id": "video-abc",
+            "clip_id": "video-abc_c000001",
+            "clip_start_seconds": 120.0,
+            "clip_end_seconds": 130.0,
+            "clip_start_timecode": "00:02:00",
+            "clip_end_timecode": "00:02:10",
+            "clip_start_beijing_time": "2026-06-15 07:02:00",
+            "clip_end_beijing_time": "2026-06-15 07:02:10",
+            "frame_times": [
+                {
+                    "frame_path": "frames/video-abc/000120.jpg",
+                    "offset_seconds": 120.0,
+                    "timecode": "00:02:00",
+                    "beijing_time": "2026-06-15 07:02:00",
+                }
+            ],
+        }
+        raw_response = (
+            "Here is the result: "
+            "{\"画面时间\":\"2026-06-14 12:31:20\","
+            "\"events\":[{\"event_type\":\"queue_detected\",\"confidence\":0.86}]}"
+        )
+
+        result = build_clip_result(
+            raw_response,
+            clip_record,
+            {"path": "/videos/a.mp4"},
+            {
+                "schema": {"version": "local-batch-v1"},
+                "runtime": {"timezone": "Asia/Shanghai"},
+            },
+            processing={"latency_ms": 1800},
+        )
+
+        self.assertEqual(result["schema_version"], "local-batch-v1")
+        self.assertEqual(result["video_id"], "video-abc")
+        self.assertEqual(result["video_path"], "/videos/a.mp4")
+        self.assertEqual(result["clip_id"], "video-abc_c000001")
+        self.assertEqual(result["status"], "ok")
+        self.assertEqual(result["monitoring_timeline"]["timezone"], "Asia/Shanghai")
+        self.assertIsNone(result["monitoring_timeline"]["video_start_time"])
+        self.assertEqual(
+            result["monitoring_timeline"]["clip_start_beijing_time"],
+            "2026-06-15 07:02:00",
+        )
+        self.assertEqual(
+            result["monitoring_timeline"]["clip_end_beijing_time"],
+            "2026-06-15 07:02:10",
+        )
+        self.assertEqual(result["monitoring_timeline"]["frame_times"], clip_record["frame_times"])
+        self.assertEqual(
+            result["monitoring_timeline"]["screen_time"],
+            "2026-06-14 12:31:20",
+        )
+        self.assertEqual(result["events"][0]["event_type"], "queue_detected")
+        self.assertEqual(result["events"][0]["start_offset_seconds"], 120.0)
+        self.assertEqual(result["events"][0]["end_offset_seconds"], 130.0)
+        self.assertEqual(result["raw_response"], raw_response)
+        self.assertEqual(result["processing"]["latency_ms"], 1800)
+        self.assertIsNone(result["error"])
+
+    def test_build_clip_result_reads_zhengxin_time_key(self):
+        result = build_clip_result(
+            (
+                '{"Action":"Action_Idle","quality_status":"qualified",'
+                '"error_type":"","安全隐患":"","人物位置":"","总结":"无",'
+                '"时间":"2026-06-14 12:31:20","employees":[],"guests":[]}'
+            ),
+            {
+                "video_id": "video-abc",
+                "clip_id": "video-abc_c000001",
+                "clip_start_seconds": 0.0,
+                "clip_end_seconds": 10.0,
+                "clip_start_timecode": "00:00:00",
+                "clip_end_timecode": "00:00:10",
+                "frame_times": [],
+            },
+            {"path": "/videos/a.mp4"},
+            {
+                "schema": {"version": "local-batch-v1"},
+                "runtime": {"timezone": "Asia/Shanghai"},
+            },
+            processing={},
+        )
+
+        self.assertEqual(result["status"], "ok")
+        self.assertEqual(
+            result["monitoring_timeline"]["screen_time"],
+            "2026-06-14 12:31:20",
+        )
+
+    def test_build_clip_result_records_parse_failure_without_crashing(self):
+        result = build_clip_result(
+            "not json",
+            {
+                "video_id": "video-abc",
+                "clip_id": "video-abc_c000001",
+                "clip_start_seconds": 0.0,
+                "clip_end_seconds": 10.0,
+                "clip_start_timecode": "00:00:00",
+                "clip_end_timecode": "00:00:10",
+                "frame_times": [],
+            },
+            {"path": "/videos/a.mp4"},
+            {
+                "schema": {"version": "local-batch-v1"},
+                "runtime": {"timezone": "Asia/Shanghai"},
+            },
+            processing={},
+        )
+
+        self.assertEqual(result["status"], "parse_failed")
+        self.assertEqual(result["events"], [])
+        self.assertEqual(result["monitoring_timeline"]["screen_time"], "")
+        self.assertEqual(result["raw_response"], "not json")
+        self.assertIn("JSON", result["error"])
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/tests/test_vlm_client.py
+++ b/tests/test_vlm_client.py
@@ -0,0 +1,85 @@
+import base64
+import json
+import tempfile
+import unittest
+from pathlib import Path
+
+from video_ai_analysis_poc.vlm_client import infer_clip
+
+
+class VlmClientTests(unittest.TestCase):
+    def test_infer_clip_uses_config_prompt_url_and_data_uri_images(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            output_dir = Path(tmp)
+            frame_path = output_dir / "frames" / "video-abc" / "000001.jpg"
+            frame_path.parent.mkdir(parents=True)
+            frame_path.write_bytes(b"jpg-bytes")
+            calls = []
+
+            def http_post(url, payload, timeout_seconds):
+                calls.append((url, payload, timeout_seconds))
+                return {
+                    "status": 200,
+                    "body": {
+                        "choices": [
+                            {
+                                "message": {
+                                    "content": json.dumps(
+                                        {"screen_time": "10:00:01", "events": []}
+                                    )
+                                }
+                            }
+                        ]
+                    },
+                }
+
+            result = infer_clip(
+                {
+                    "clip_id": "video-abc_c000001",
+                    "frame_times": [
+                        {
+                            "frame_path": "frames/video-abc/000001.jpg",
+                            "offset_seconds": 0.0,
+                            "timecode": "00:00:00",
+                        }
+                    ],
+                },
+                output_dir,
+                {
+                    "api_base_url": "http://localhost:8679/",
+                    "chat_completions_path": "/v1/chat/completions",
+                    "model": "memai-zhengxin-v3-20260413",
+                    "timeout_seconds": 17,
+                    "max_tokens": 256,
+                    "temperature": 0,
+                    "image_transport": "data_uri",
+                },
+                {
+                    "system": "system prompt from config",
+                    "user": "user prompt from config",
+                },
+                http_post=http_post,
+            )
+
+        self.assertEqual(result["raw_response"], '{"screen_time": "10:00:01", "events": []}')
+        self.assertEqual(len(calls), 1)
+        url, payload, timeout_seconds = calls[0]
+        self.assertEqual(url, "http://localhost:8679/v1/chat/completions")
+        self.assertEqual(timeout_seconds, 17)
+        self.assertEqual(payload["model"], "memai-zhengxin-v3-20260413")
+        self.assertEqual(payload["messages"][0]["role"], "system")
+        self.assertEqual(payload["messages"][0]["content"], "system prompt from config")
+        user_content = payload["messages"][1]["content"]
+        self.assertEqual(user_content[0], {"type": "text", "text": "user prompt from config"})
+        self.assertEqual(user_content[1]["type"], "image_url")
+        expected_data = base64.b64encode(b"jpg-bytes").decode("ascii")
+        self.assertEqual(
+            user_content[1]["image_url"]["url"],
+            f"data:image/jpeg;base64,{expected_data}",
+        )
+        self.assertEqual(result["http_status"], 200)
+        self.assertIsInstance(result["latency_ms"], int)
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/video_ai_analysis_poc/__init__.py
+++ b/video_ai_analysis_poc/__init__.py
@@ -0,0 +1,9 @@
+"""Local video batch analysis PoC."""
+
+__all__ = [
+    "config",
+    "discovery",
+    "manifest",
+    "paths",
+    "probe",
+]
--- a/video_ai_analysis_poc/aggregator.py
+++ b/video_ai_analysis_poc/aggregator.py
@@ -0,0 +1,403 @@
+from __future__ import annotations
+
+import json
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any
+
+from .manifest import read_jsonl
+
+
+def aggregate_outputs(
+    output_dir: str | Path,
+    config: dict[str, Any],
+) -> dict[str, Any]:
+    root = Path(output_dir).expanduser().resolve(strict=False)
+    started_at = _now_iso()
+    video_records = read_jsonl(root / "video_manifest.jsonl")
+    clip_records = read_jsonl(root / "clip_manifest.jsonl")
+    clip_results = read_jsonl(root / "clip_results.jsonl")
+
+    schema_version = str(config.get("schema", {}).get("version", "local-batch-v1"))
+    merge_gap_seconds = float(config.get("schema", {}).get("merge_gap_seconds", 30))
+    clips_by_video = _group_by_video(clip_records)
+    results_by_video = _group_by_video(clip_results)
+
+    videos_summary = []
+    folder_event_counts: dict[str, int] = {}
+    processed_video_count = 0
+    failed_video_count = 0
+
+    for video_record in video_records:
+        video_id = str(video_record.get("video_id") or "")
+        if not video_id:
+            continue
+        video_clips = clips_by_video.get(video_id, [])
+        video_results = results_by_video.get(video_id, [])
+        video_result = _build_video_result(
+            video_record,
+            video_clips,
+            video_results,
+            schema_version=schema_version,
+            merge_gap_seconds=merge_gap_seconds,
+            started_at=started_at,
+        )
+        result_path = root / "videos" / video_id / "video_result.json"
+        _write_json(result_path, video_result)
+
+        failed_clip_count = int(video_result["failed_clip_count"])
+        video_failed = video_record.get("status") != "probed" or failed_clip_count > 0
+        if video_failed:
+            failed_video_count += 1
+        else:
+            processed_video_count += 1
+        for event_type, count in video_result["event_counts"].items():
+            folder_event_counts[event_type] = folder_event_counts.get(event_type, 0) + int(count)
+        videos_summary.append(
+            {
+                "video_id": video_id,
+                "video_path": video_result["video_path"],
+                "status": "failed" if video_failed else "processed",
+                "clip_count": video_result["clip_count"],
+                "failed_clip_count": failed_clip_count,
+                "failed_clip_counts": video_result["failed_clip_counts"],
+                "event_counts": video_result["event_counts"],
+                "outputs": {"video_result_json": f"videos/{video_id}/video_result.json"},
+                "error": video_record.get("last_error"),
+            }
+        )
+
+    folder_summary = {
+        "schema_version": schema_version,
+        "input_dir": str(config.get("input", {}).get("dir")),
+        "video_count": len(video_records),
+        "processed_video_count": processed_video_count,
+        "failed_video_count": failed_video_count,
+        "event_counts": dict(sorted(folder_event_counts.items())),
+        "videos": videos_summary,
+        "processing": {
+            "started_at": started_at,
+            "finished_at": _now_iso(),
+        },
+    }
+    _write_json(root / "folder_summary.json", folder_summary)
+    return folder_summary
+
+
+def _build_video_result(
+    video_record: dict[str, Any],
+    clip_records: list[dict[str, Any]],
+    clip_results: list[dict[str, Any]],
+    *,
+    schema_version: str,
+    merge_gap_seconds: float,
+    started_at: str,
+) -> dict[str, Any]:
+    video_id = str(video_record.get("video_id"))
+    failed_clip_counts = _failed_clip_counts(clip_results)
+    merged_events = _merge_events(_event_records(clip_results), merge_gap_seconds)
+    event_counts = _event_counts(merged_events)
+    video_duration = _first_present(
+        video_record,
+        ("duration_seconds", "video_duration_seconds", "duration"),
+    )
+    video_start_time = _video_start_time(video_record, clip_results)
+    return {
+        "schema_version": schema_version,
+        "video_id": video_id,
+        "video_path": _video_path(video_record, clip_results),
+        "probe": _probe(video_record),
+        "monitoring_timeline": {
+            "video_start_time": video_start_time,
+            "video_duration_seconds": video_duration,
+        },
+        "clip_count": len(clip_records),
+        "failed_clip_count": sum(failed_clip_counts.values()),
+        "failed_clip_counts": failed_clip_counts,
+        "event_counts": event_counts,
+        "events": merged_events,
+        "outputs": {"clip_results_jsonl": "clip_results.jsonl"},
+        "processing": {
+            "started_at": started_at,
+            "finished_at": _now_iso(),
+        },
+    }
+
+
+def _event_records(clip_results: list[dict[str, Any]]) -> list[dict[str, Any]]:
+    records = []
+    for result in clip_results:
+        if result.get("status") != "ok":
+            continue
+        timeline = result.get("monitoring_timeline") or {}
+        if not isinstance(timeline, dict):
+            timeline = {}
+        for event in result.get("events") or []:
+            if not isinstance(event, dict):
+                continue
+            event_record = _normalize_event(event, result, timeline)
+            records.append(event_record)
+    return sorted(
+        records,
+        key=lambda event: (
+            str(event.get("video_id")),
+            str(event.get("event_type")),
+            float(event.get("start_offset_seconds") or 0),
+            float(event.get("end_offset_seconds") or 0),
+        ),
+    )
+
+
+def _normalize_event(
+    event: dict[str, Any],
+    result: dict[str, Any],
+    timeline: dict[str, Any],
+) -> dict[str, Any]:
+    clip_id = str(result.get("clip_id"))
+    frame_times = [
+        dict(frame)
+        for frame in timeline.get("frame_times", [])
+        if isinstance(frame, dict)
+    ]
+    frame_paths = [
+        str(frame.get("frame_path"))
+        for frame in frame_times
+        if frame.get("frame_path") is not None
+    ]
+    start = event.get("start_offset_seconds", timeline.get("clip_start_seconds"))
+    end = event.get("end_offset_seconds", timeline.get("clip_end_seconds"))
+    screen_time = str(timeline.get("screen_time") or "")
+    normalized = {
+        "video_id": str(result.get("video_id")),
+        "event_type": str(event.get("event_type") or "unknown"),
+        "start_time": event.get("start_time"),
+        "end_time": event.get("end_time"),
+        "start_offset_seconds": _float_or_none(start),
+        "end_offset_seconds": _float_or_none(end),
+        "confidence": event.get("confidence"),
+        "severity": event.get("severity"),
+        "attributes": event.get("attributes") if isinstance(event.get("attributes"), dict) else {},
+        "screen_times": [screen_time] if screen_time else [],
+        "evidence": {
+            "clip_ids": [clip_id],
+            "frame_paths": frame_paths,
+            "frame_times": frame_times,
+            "clips": [
+                {
+                    "clip_id": clip_id,
+                    "clip_start_seconds": timeline.get("clip_start_seconds"),
+                    "clip_end_seconds": timeline.get("clip_end_seconds"),
+                    "clip_start_timecode": timeline.get("clip_start_timecode"),
+                    "clip_end_timecode": timeline.get("clip_end_timecode"),
+                    "clip_start_beijing_time": timeline.get("clip_start_beijing_time"),
+                    "clip_end_beijing_time": timeline.get("clip_end_beijing_time"),
+                    "screen_time": screen_time,
+                }
+            ],
+        },
+        "source_event_count": 1,
+    }
+    original_evidence = event.get("evidence")
+    if isinstance(original_evidence, dict):
+        original_clip_id = original_evidence.get("clip_id")
+        if original_clip_id:
+            normalized["evidence"]["clip_ids"] = _unique(
+                [*normalized["evidence"]["clip_ids"], str(original_clip_id)]
+            )
+        original_frame_paths = original_evidence.get("frame_paths")
+        if isinstance(original_frame_paths, list):
+            normalized["evidence"]["frame_paths"] = _unique(
+                [*normalized["evidence"]["frame_paths"], *map(str, original_frame_paths)]
+            )
+    return normalized
+
+
+def _merge_events(
+    events: list[dict[str, Any]],
+    merge_gap_seconds: float,
+) -> list[dict[str, Any]]:
+    merged: list[dict[str, Any]] = []
+    for event in events:
+        if not merged or not _can_merge(merged[-1], event, merge_gap_seconds):
+            merged.append(_copy_event(event))
+            continue
+        _merge_into(merged[-1], event)
+    for event in merged:
+        event.pop("video_id", None)
+    return merged
+
+
+def _can_merge(
+    previous: dict[str, Any],
+    current: dict[str, Any],
+    merge_gap_seconds: float,
+) -> bool:
+    if previous.get("video_id") != current.get("video_id"):
+        return False
+    if previous.get("event_type") != current.get("event_type"):
+        return False
+    previous_end = _float_or_none(previous.get("end_offset_seconds"))
+    current_start = _float_or_none(current.get("start_offset_seconds"))
+    if previous_end is None or current_start is None:
+        return False
+    return current_start - previous_end <= merge_gap_seconds
+
+
+def _merge_into(target: dict[str, Any], event: dict[str, Any]) -> None:
+    target["start_offset_seconds"] = _min_number(
+        target.get("start_offset_seconds"),
+        event.get("start_offset_seconds"),
+    )
+    target["end_offset_seconds"] = _max_number(
+        target.get("end_offset_seconds"),
+        event.get("end_offset_seconds"),
+    )
+    target["screen_times"] = _unique(
+        [*target.get("screen_times", []), *event.get("screen_times", [])]
+    )
+    target["source_event_count"] = int(target.get("source_event_count", 1)) + int(
+        event.get("source_event_count", 1)
+    )
+    target["evidence"]["clip_ids"] = _unique(
+        [*target["evidence"].get("clip_ids", []), *event["evidence"].get("clip_ids", [])]
+    )
+    target["evidence"]["frame_paths"] = _unique(
+        [
+            *target["evidence"].get("frame_paths", []),
+            *event["evidence"].get("frame_paths", []),
+        ]
+    )
+    for key in ("frame_times", "clips"):
+        target["evidence"][key] = _unique([*target["evidence"][key], *event["evidence"].get(key, [])])
+    if target.get("confidence") is None:
+        target["confidence"] = event.get("confidence")
+    elif event.get("confidence") is not None:
+        target["confidence"] = max(float(target["confidence"]), float(event["confidence"]))
+
+
+def _copy_event(event: dict[str, Any]) -> dict[str, Any]:
+    copied = dict(event)
+    copied["screen_times"] = list(event.get("screen_times", []))
+    copied["attributes"] = dict(event.get("attributes", {}))
+    copied["evidence"] = {
+        "clip_ids": list(event["evidence"].get("clip_ids", [])),
+        "frame_paths": list(event["evidence"].get("frame_paths", [])),
+        "frame_times": [dict(frame) for frame in event["evidence"].get("frame_times", [])],
+        "clips": [dict(clip) for clip in event["evidence"].get("clips", [])],
+    }
+    return copied
+
+
+def _group_by_video(records: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
+    grouped: dict[str, list[dict[str, Any]]] = {}
+    for record in records:
+        video_id = record.get("video_id")
+        if video_id:
+            grouped.setdefault(str(video_id), []).append(record)
+    return grouped
+
+
+def _failed_clip_counts(clip_results: list[dict[str, Any]]) -> dict[str, int]:
+    counts = {"parse_failed": 0, "inference_failed": 0}
+    for result in clip_results:
+        status = result.get("status")
+        if status in counts:
+            counts[str(status)] += 1
+    return counts
+
+
+def _event_counts(events: list[dict[str, Any]]) -> dict[str, int]:
+    counts: dict[str, int] = {}
+    for event in events:
+        event_type = str(event.get("event_type") or "unknown")
+        counts[event_type] = counts.get(event_type, 0) + 1
+    return dict(sorted(counts.items()))
+
+
+def _probe(video_record: dict[str, Any]) -> dict[str, Any]:
+    excluded = {"video_id", "path", "source_path", "status", "retry_count", "last_error"}
+    probe = {
+        key: value
+        for key, value in video_record.items()
+        if key not in excluded
+    }
+    probe["status"] = video_record.get("status")
+    if video_record.get("last_error") is not None:
+        probe["last_error"] = video_record.get("last_error")
+    return probe
+
+
+def _video_path(
+    video_record: dict[str, Any],
+    clip_results: list[dict[str, Any]],
+) -> str | None:
+    path = video_record.get("path") or video_record.get("source_path")
+    if path is not None:
+        return str(path)
+    for result in clip_results:
+        if result.get("video_path") is not None:
+            return str(result["video_path"])
+    return None
+
+
+def _video_start_time(
+    video_record: dict[str, Any],
+    clip_results: list[dict[str, Any]],
+) -> Any:
+    if video_record.get("video_start_time") is not None:
+        return video_record.get("video_start_time")
+    for result in clip_results:
+        timeline = result.get("monitoring_timeline")
+        if isinstance(timeline, dict) and timeline.get("video_start_time") is not None:
+            return timeline.get("video_start_time")
+    return None
+
+
+def _first_present(record: dict[str, Any], keys: tuple[str, ...]) -> Any:
+    for key in keys:
+        if record.get(key) is not None:
+            return record.get(key)
+    return None
+
+
+def _float_or_none(value: Any) -> float | None:
+    if value is None:
+        return None
+    try:
+        return float(value)
+    except (TypeError, ValueError):
+        return None
+
+
+def _min_number(left: Any, right: Any) -> float | None:
+    values = [value for value in (_float_or_none(left), _float_or_none(right)) if value is not None]
+    return min(values) if values else None
+
+
+def _max_number(left: Any, right: Any) -> float | None:
+    values = [value for value in (_float_or_none(left), _float_or_none(right)) if value is not None]
+    return max(values) if values else None
+
+
+def _unique(values: list[Any]) -> list[Any]:
+    seen = set()
+    unique_values = []
+    for value in values:
+        marker = json.dumps(value, sort_keys=True) if isinstance(value, dict) else value
+        if marker in seen:
+            continue
+        seen.add(marker)
+        unique_values.append(value)
+    return unique_values
+
+
+def _write_json(path: Path, payload: dict[str, Any]) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text(
+        json.dumps(payload, ensure_ascii=False, indent=2, sort_keys=True) + "\n",
+        encoding="utf-8",
+    )
+
+
+def _now_iso() -> str:
+    return datetime.now(timezone.utc).isoformat()
--- a/video_ai_analysis_poc/cli.py
+++ b/video_ai_analysis_poc/cli.py
@@ -0,0 +1,424 @@
+from __future__ import annotations
+
+import argparse
+import json
+from pathlib import Path
+from typing import Sequence
+
+from .aggregator import aggregate_outputs
+from .clips import build_clip_records
+from .config import DEFAULT_CONFIG_PATH, load_config
+from .discovery import discover_videos
+from .ffmpeg_sampler import sample_video_frames
+from .hik_cloud import download_hik_cloud_recordings
+from .manifest import read_jsonl, write_manifest
+from .paths import stable_video_id
+from .probe import probe_video
+from .result_parser import build_clip_result
+from .timeline import DEFAULT_TIMEZONE, format_beijing_time, timeline_start_epoch
+from .vlm_client import infer_clip
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(
+        description="Local video batch analysis PoC entrypoint."
+    )
+    parser.add_argument("--config", default=str(DEFAULT_CONFIG_PATH))
+    parser.add_argument("--input-dir")
+    parser.add_argument("--output-dir")
+    parser.add_argument("--dry-run", action="store_true")
+    parser.add_argument("--until", choices=["clips", "inference"])
+    parser.add_argument("--limit-clips", type=int)
+    args = parser.parse_args(argv)
+
+    config = load_config(
+        args.config,
+        input_dir=args.input_dir,
+        output_dir=args.output_dir,
+    )
+    if args.dry_run and args.until:
+        parser.error("--dry-run cannot be combined with --until")
+    if args.limit_clips is not None and args.limit_clips < 0:
+        parser.error("--limit-clips must be non-negative")
+
+    output_dir = Path(config["output"]["dir"])
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    video_manifest_path = output_dir / "video_manifest.jsonl"
+    resume_enabled = bool(config.get("output", {}).get("resume", False))
+    records = _load_resume_records(
+        video_manifest_path,
+        resume=resume_enabled,
+    )
+    record_indexes = {
+        _record_key(record): index
+        for index, record in enumerate(records)
+        if _record_key(record) is not None
+    }
+
+    try:
+        _acquire_source_records(
+            config,
+            output_dir,
+            records,
+            record_indexes,
+            download_source=not args.dry_run,
+        )
+    except ValueError as exc:
+        parser.error(str(exc))
+
+    write_manifest(video_manifest_path, records)
+    if args.dry_run:
+        return 0
+
+    clip_manifest_path = output_dir / "clip_manifest.jsonl"
+    existing_clip_records = read_jsonl(clip_manifest_path) if resume_enabled else []
+    existing_clip_video_ids = {
+        str(record.get("video_id"))
+        for record in existing_clip_records
+        if record.get("video_id")
+    }
+
+    frame_manifest_path = output_dir / "frame_manifest.jsonl"
+    frame_records = read_jsonl(frame_manifest_path) if resume_enabled else []
+    timezone_name = str(config.get("runtime", {}).get("timezone", DEFAULT_TIMEZONE))
+    backfilled_frame_video_ids = _backfill_frame_beijing_times(
+        frame_records,
+        records,
+        timezone_name=timezone_name,
+    )
+    existing_sampled_video_ids = {
+        str(record.get("video_id"))
+        for record in frame_records
+        if record.get("status") == "sampled" and record.get("video_id")
+    }
+    changed_frame_video_ids: set[str] = set(backfilled_frame_video_ids)
+    for record in records:
+        if record.get("status") != "probed":
+            continue
+        video_id = str(record.get("video_id"))
+        if args.until == "inference" and video_id in existing_clip_video_ids:
+            continue
+        if video_id in existing_sampled_video_ids:
+            continue
+        frame_records = _without_video_records(frame_records, video_id)
+        ffmpeg_config = dict(config["ffmpeg"])
+        ffmpeg_config["timezone"] = timezone_name
+        frame_records.extend(
+            sample_video_frames(
+                record,
+                output_dir,
+                ffmpeg_config,
+                manifest_path=None,
+            )
+        )
+        changed_frame_video_ids.add(video_id)
+    write_manifest(frame_manifest_path, frame_records)
+
+    sampled_video_ids = {
+        str(record.get("video_id"))
+        for record in frame_records
+        if record.get("status") == "sampled" and record.get("video_id")
+    }
+    clip_rebuild_video_ids = changed_frame_video_ids | (
+        sampled_video_ids - existing_clip_video_ids
+    )
+    clip_records = [
+        record
+        for record in existing_clip_records
+        if str(record.get("video_id")) not in clip_rebuild_video_ids
+    ]
+    frames_to_build = [
+        record
+        for record in frame_records
+        if str(record.get("video_id")) in clip_rebuild_video_ids
+    ]
+    clip_records.extend(build_clip_records(frames_to_build, config["clip"]))
+    write_manifest(clip_manifest_path, clip_records)
+    if args.until == "clips":
+        return 0
+
+    _run_inference(
+        clip_records,
+        records,
+        output_dir,
+        config,
+        limit_clips=args.limit_clips,
+        resume=resume_enabled,
+    )
+    if args.until == "inference":
+        return 0
+    aggregate_outputs(output_dir, config)
+    return 0
+
+
+def _load_resume_records(path: Path, *, resume: bool) -> list[dict[str, object]]:
+    if not resume:
+        return []
+    return read_jsonl(path)
+
+
+def _record_key(record: dict[str, object]) -> str | None:
+    video_id = record.get("video_id")
+    if video_id:
+        return str(video_id)
+    path = record.get("path")
+    if path:
+        return stable_video_id(str(path))
+    return None
+
+
+def _acquire_source_records(
+    config: dict[str, object],
+    output_dir: Path,
+    records: list[dict[str, object]],
+    record_indexes: dict[str, int],
+    *,
+    download_source: bool = True,
+) -> None:
+    for source_record in _source_video_records(
+        config,
+        output_dir,
+        download_source=download_source,
+    ):
+        path = source_record.get("path")
+        if not path:
+            continue
+        video_id = stable_video_id(str(path))
+        existing_index = record_indexes.get(video_id)
+        if (
+            existing_index is not None
+            and records[existing_index].get("status") == "probed"
+        ):
+            continue
+
+        probe_record = probe_video(
+            str(path),
+            timeout_seconds=config["ffprobe"]["timeout_seconds"],
+        )
+        record = {**source_record, **probe_record, "video_id": video_id}
+        if existing_index is None:
+            record_indexes[video_id] = len(records)
+            records.append(record)
+        else:
+            records[existing_index] = record
+
+
+def _source_video_records(
+    config: dict[str, object],
+    output_dir: Path,
+    *,
+    download_source: bool = True,
+) -> list[dict[str, object]]:
+    source_config = config.get("source", {})
+    source_mode = "local"
+    if isinstance(source_config, dict):
+        source_mode = str(source_config.get("mode", "local"))
+
+    if source_mode == "local":
+        videos = discover_videos(
+            config["input"]["dir"],
+            config["input"]["extensions"],
+            recursive=config["input"]["recursive"],
+        )
+        return [{"path": path} for path in videos]
+
+    if source_mode == "hik_cloud":
+        return [
+            record
+            for record in download_hik_cloud_recordings(
+                config,
+                output_dir,
+                download=download_source,
+            )
+            if record.get("status") == "downloaded"
+        ]
+
+    raise ValueError(f"unsupported source.mode: {source_mode}")
+
+
+def _without_video_records(
+    records: list[dict[str, object]],
+    video_id: str,
+) -> list[dict[str, object]]:
+    return [record for record in records if str(record.get("video_id")) != video_id]
+
+
+def _backfill_frame_beijing_times(
+    frame_records: list[dict[str, object]],
+    video_records: list[dict[str, object]],
+    *,
+    timezone_name: str,
+) -> set[str]:
+    video_by_id = {
+        str(record.get("video_id")): record
+        for record in video_records
+        if record.get("video_id")
+    }
+    changed_video_ids: set[str] = set()
+    for frame_record in frame_records:
+        if frame_record.get("status") != "sampled" or frame_record.get("beijing_time"):
+            continue
+        video_id = str(frame_record.get("video_id") or "")
+        start_epoch = timeline_start_epoch(video_by_id.get(video_id, {}))
+        beijing_time = format_beijing_time(
+            start_epoch,
+            offset_seconds=float(frame_record.get("offset_seconds") or 0),
+            timezone_name=timezone_name,
+        )
+        if beijing_time is None:
+            continue
+        frame_record["beijing_time"] = beijing_time
+        changed_video_ids.add(video_id)
+    return changed_video_ids
+
+
+def _run_inference(
+    clip_records: list[dict[str, object]],
+    video_records: list[dict[str, object]],
+    output_dir: Path,
+    config: dict[str, object],
+    *,
+    limit_clips: int | None,
+    resume: bool,
+) -> None:
+    results_path = output_dir / "clip_results.jsonl"
+    result_records = read_jsonl(results_path) if resume else []
+    clip_by_id = {
+        str(record.get("clip_id")): record
+        for record in clip_records
+        if record.get("clip_id")
+    }
+    result_records = [
+        _refresh_result_timeline(record, clip_by_id, config)
+        for record in result_records
+    ]
+    ok_clip_ids = {
+        str(record.get("clip_id"))
+        for record in result_records
+        if record.get("status") == "ok" and record.get("clip_id")
+    }
+    video_by_id = {
+        str(record.get("video_id")): record
+        for record in video_records
+        if record.get("video_id")
+    }
+    processed = 0
+    for clip_record in clip_records:
+        clip_id = str(clip_record.get("clip_id"))
+        if clip_id in ok_clip_ids:
+            continue
+        if limit_clips is not None and processed >= limit_clips:
+            break
+
+        result_records = [
+            record for record in result_records if str(record.get("clip_id")) != clip_id
+        ]
+        video_record = video_by_id.get(str(clip_record.get("video_id")), {})
+        result = _infer_and_parse_clip(clip_record, video_record, output_dir, config)
+        result_records.append(result)
+        _write_jsonl_exact(results_path, result_records)
+        processed += 1
+
+    _write_jsonl_exact(results_path, result_records)
+
+
+def _refresh_result_timeline(
+    result_record: dict[str, object],
+    clip_by_id: dict[str, dict[str, object]],
+    config: dict[str, object],
+) -> dict[str, object]:
+    clip_record = clip_by_id.get(str(result_record.get("clip_id")))
+    if not clip_record:
+        return result_record
+    if not _clip_has_beijing_timing(clip_record):
+        return result_record
+    timeline = dict(result_record.get("monitoring_timeline") or {})
+    timeline.update(
+        {
+            "timezone": config.get("runtime", {}).get("timezone", DEFAULT_TIMEZONE),
+            "clip_start_seconds": clip_record.get("clip_start_seconds"),
+            "clip_end_seconds": clip_record.get("clip_end_seconds"),
+            "clip_start_timecode": clip_record.get("clip_start_timecode"),
+            "clip_end_timecode": clip_record.get("clip_end_timecode"),
+            "clip_start_beijing_time": clip_record.get("clip_start_beijing_time"),
+            "clip_end_beijing_time": clip_record.get("clip_end_beijing_time"),
+            "frame_times": clip_record.get("frame_times", []),
+        }
+    )
+    refreshed = dict(result_record)
+    refreshed["monitoring_timeline"] = timeline
+    return refreshed
+
+
+def _clip_has_beijing_timing(clip_record: dict[str, object]) -> bool:
+    if clip_record.get("clip_start_beijing_time") or clip_record.get("clip_end_beijing_time"):
+        return True
+    for frame in clip_record.get("frame_times", []) or []:
+        if isinstance(frame, dict) and frame.get("beijing_time"):
+            return True
+    return False
+
+
+def _infer_and_parse_clip(
+    clip_record: dict[str, object],
+    video_record: dict[str, object],
+    output_dir: Path,
+    config: dict[str, object],
+) -> dict[str, object]:
+    schema_config = config.get("schema", {})
+    parse_retry = 0
+    if isinstance(schema_config, dict):
+        parse_retry = int(schema_config.get("parse_retry", 0))
+
+    attempts = parse_retry + 1
+    result: dict[str, object] | None = None
+    for attempt in range(attempts):
+        try:
+            inference = infer_clip(
+                clip_record,
+                output_dir,
+                config["vlm"],
+                config["prompt"],
+            )
+        except Exception as exc:
+            return build_clip_result(
+                "",
+                clip_record,
+                video_record,
+                config,
+                processing={},
+                status="inference_failed",
+                error=str(exc),
+            )
+
+        result = build_clip_result(
+            str(inference.get("raw_response", "")),
+            clip_record,
+            video_record,
+            config,
+            processing={
+                "latency_ms": inference.get("latency_ms"),
+                "http_status": inference.get("http_status"),
+                "attempt": attempt + 1,
+            },
+        )
+        if result.get("status") != "parse_failed":
+            return result
+    if result is None:
+        raise RuntimeError("unreachable inference state")
+    return result
+
+
+def _write_jsonl_exact(
+    path: Path,
+    records: list[dict[str, object]],
+) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    with path.open("w", encoding="utf-8") as handle:
+        for record in records:
+            handle.write(json.dumps(record, ensure_ascii=False, sort_keys=True) + "\n")
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
--- a/video_ai_analysis_poc/clips.py
+++ b/video_ai_analysis_poc/clips.py
@@ -0,0 +1,158 @@
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Any
+
+from .frames import seconds_to_timecode
+from .manifest import read_jsonl, write_manifest
+from .timeline import derive_time_from_reference
+
+
+def build_clip_records(
+    frame_records: list[dict[str, Any]],
+    clip_config: dict[str, Any],
+) -> list[dict[str, Any]]:
+    sampled_frames = [
+        record for record in frame_records if record.get("status") == "sampled"
+    ]
+    by_video: dict[str, list[dict[str, Any]]] = {}
+    for frame in sampled_frames:
+        by_video.setdefault(str(frame["video_id"]), []).append(frame)
+
+    clips = []
+    for video_id, frames in sorted(by_video.items()):
+        clips.extend(_build_video_clips(video_id, frames, clip_config))
+    return clips
+
+
+def build_clip_records_from_manifest(
+    frame_manifest_path: str | Path,
+    clip_manifest_path: str | Path,
+    clip_config: dict[str, Any],
+) -> list[dict[str, Any]]:
+    clips = build_clip_records(read_jsonl(frame_manifest_path), clip_config)
+    write_manifest(clip_manifest_path, clips)
+    return clips
+
+
+def _build_video_clips(
+    video_id: str,
+    frames: list[dict[str, Any]],
+    clip_config: dict[str, Any],
+) -> list[dict[str, Any]]:
+    sorted_frames = sorted(frames, key=lambda frame: float(frame["offset_seconds"]))
+    if not sorted_frames:
+        return []
+
+    length_seconds = float(clip_config.get("length_seconds", 10))
+    stride_seconds = float(clip_config.get("stride_seconds", length_seconds))
+    frames_per_clip = int(clip_config.get("frames_per_clip", 8))
+    min_frames_per_clip = int(clip_config.get("min_frames_per_clip", 4))
+    max_offset = max(float(frame["offset_seconds"]) for frame in sorted_frames)
+    timeline_end = _estimated_timeline_end(sorted_frames)
+
+    clips = []
+    clip_index = 1
+    start = 0.0
+    while stride_seconds > 0 and start <= max_offset:  # a non-positive stride would never terminate
+        end = min(start + length_seconds, timeline_end)
+        in_window = [
+            frame
+            for frame in sorted_frames
+            if start <= float(frame["offset_seconds"]) < end
+        ]
+        if len(in_window) >= min_frames_per_clip:
+            selected_frames = _uniform_sample(in_window, frames_per_clip)
+            start_beijing_time, end_beijing_time = _clip_beijing_time_range(
+                in_window,
+                start,
+                end,
+            )
+            clip = {
+                "video_id": video_id,
+                "clip_id": f"{video_id}_c{clip_index:06d}",
+                "clip_start_seconds": round(start, 6),
+                "clip_end_seconds": round(end, 6),
+                "clip_start_timecode": seconds_to_timecode(start),
+                "clip_end_timecode": seconds_to_timecode(end),
+                "frame_times": [_frame_time(frame) for frame in selected_frames],
+                "status": "pending",
+                "retry_count": 0,
+                "last_error": None,
+            }
+            if start_beijing_time is not None:
+                clip["clip_start_beijing_time"] = start_beijing_time
+            if end_beijing_time is not None:
+                clip["clip_end_beijing_time"] = end_beijing_time
+            clips.append(clip)
+            clip_index += 1
+        start += stride_seconds
+    return clips
+
+
+def _estimated_timeline_end(frames: list[dict[str, Any]]) -> float:
+    offsets = [float(frame["offset_seconds"]) for frame in frames]
+    if len(offsets) < 2:
+        return offsets[-1]
+    intervals = [
+        current - previous
+        for previous, current in zip(offsets, offsets[1:])
+        if current > previous
+    ]
+    if not intervals:
+        return offsets[-1]
+    return offsets[-1] + min(intervals)
+
+
+def _uniform_sample(
+    frames: list[dict[str, Any]],
+    frames_per_clip: int,
+) -> list[dict[str, Any]]:
+    if len(frames) <= frames_per_clip:
+        return frames
+    if frames_per_clip <= 1:
+        return [frames[0]]
+    last_index = len(frames) - 1
+    indexes = [
+        round(position * last_index / (frames_per_clip - 1))
+        for position in range(frames_per_clip)
+    ]
+    return [frames[index] for index in indexes]
+
+
+def _frame_time(frame: dict[str, Any]) -> dict[str, Any]:
+    record = {
+        "frame_id": frame.get("frame_id"),
+        "frame_path": frame.get("frame_path"),
+        "offset_seconds": frame.get("offset_seconds"),
+        "timecode": frame.get("timecode"),
+        "pts_time": frame.get("pts_time"),
+    }
+    if frame.get("beijing_time") is not None:
+        record["beijing_time"] = frame.get("beijing_time")
+    return record
+
+
+def _clip_beijing_time_range(
+    frames: list[dict[str, Any]],
+    start: float,
+    end: float,
+) -> tuple[str | None, str | None]:
+    for frame in frames:
+        reference_time = frame.get("beijing_time")
+        if not reference_time:
+            continue
+        reference_offset = frame.get("offset_seconds")
+        return (
+            derive_time_from_reference(
+                str(reference_time),
+                reference_offset_seconds=reference_offset,
+                target_offset_seconds=start,
+            ),
+            derive_time_from_reference(
+                str(reference_time),
+                reference_offset_seconds=reference_offset,
+                target_offset_seconds=end,
+            ),
+        )
+    return None, None
--- a/video_ai_analysis_poc/config.py
+++ b/video_ai_analysis_poc/config.py
@@ -0,0 +1,278 @@
+from __future__ import annotations
+
+import ast
+from pathlib import Path
+from typing import Any
+
+from .paths import resolve_path, validate_output_dir
+
+
+DEFAULT_CONFIG_PATH = Path(__file__).resolve().parent.parent / "config" / "local_batch.yaml"
+
+
+def load_config(
+    config_path: str | Path = DEFAULT_CONFIG_PATH,
+    *,
+    input_dir: str | Path | None = None,
+    output_dir: str | Path | None = None,
+) -> dict[str, Any]:
+    path = Path(config_path).expanduser().resolve(strict=False)
+    raw_config = _parse_simple_yaml(path)
+    config = _with_defaults(raw_config)
+
+    base_dir = path.parent.parent if path.parent.name == "config" else path.parent
+
+    if input_dir is not None:
+        config["input"]["dir"] = str(input_dir)
+    if output_dir is not None:
+        config["output"]["dir"] = str(output_dir)
+
+    config["input"]["dir"] = str(resolve_path(config["input"]["dir"], base_dir=base_dir))
+    config["output"]["dir"] = str(
+        resolve_path(config["output"]["dir"], base_dir=base_dir)
+    )
+    validate_output_dir(config["input"]["dir"], config["output"]["dir"])
+
+    extensions = config["input"].get("extensions", [])
+    config["input"]["extensions"] = _normalize_extensions(extensions)
+    config["input"]["recursive"] = bool(config["input"].get("recursive", True))
+    config.setdefault("ffprobe", {})
+    config["ffprobe"]["timeout_seconds"] = int(
+        config["ffprobe"].get("timeout_seconds", 30)
+    )
+    return config
+
+
+def _with_defaults(config: dict[str, Any]) -> dict[str, Any]:
+    merged: dict[str, Any] = {
+        "input": {
+            "dir": "./videos",
+            "recursive": True,
+            "extensions": [".mp4", ".mov", ".mkv", ".avi", ".flv", ".ts", ".m4v"],
+        },
+        "output": {
+            "dir": "./outputs/local-batch",
+            "overwrite": False,
+            "resume": True,
+            "keep_frames": True,
+        },
+        "source": {"mode": "local"},
+        "hik_cloud": {
+            "api_base_url": "https://api2.hik-cloud.com",
+            "download_path": "/v1/carrier/cstorage/open/play/download",
+            "access_token": None,
+            "access_token_env": "HIK_CLOUD_ACCESS_TOKEN",
+            "devices": [],
+            "time_ranges": [],
+            "chunk_seconds": 600,
+            "timeout_seconds": 60,
+            "download_timeout_seconds": 600,
+        },
+        "ffprobe": {"timeout_seconds": 30},
+        "ffmpeg": {
+            "prefer_nvdec": True,
+            "allow_cpu_fallback": False,
+            "hwaccel": "cuda",
+            "codec_decoders": {"h264": "h264_cuvid", "hevc": "hevc_cuvid"},
+            "frame_fps": 1,
+            "frame_width": 640,
+            "jpeg_quality": 4,
+            "timeout_seconds_per_video": 3600,
+        },
+        "clip": {
+            "length_seconds": 10,
+            "stride_seconds": 10,
+            "frames_per_clip": 8,
+            "min_frames_per_clip": 4,
+        },
+        "vlm": {
+            "api_base_url": "http://localhost:8679",
+            "chat_completions_path": "/v1/chat/completions",
+            "model": "memai-zhengxin-v3-20260413",
+            "timeout_seconds": 120,
+            "max_tokens": 512,
+            "temperature": 0,
+            "batch_size": 1,
+            "image_transport": "data_uri",
+            "retries": 1,
+        },
+        "prompt": {
+            "system": "You are a store video analysis assistant. Return strict JSON only.",
+            "user": "Analyze this clip. Return events and screen_time. If no event, return events: [].",
+        },
+        "schema": {
+            "version": "local-batch-v1",
+            "event_types": [
+                "customer_enter",
+                "customer_leave",
+                "queue_detected",
+                "staff_absent",
+                "staff_present",
+                "area_crowded",
+                "abnormal_behavior",
+                "unknown",
+            ],
+            "require_strict_json": True,
+            "parse_retry": 1,
+            "merge_gap_seconds": 30,
+        },
+        "runtime": {"timezone": "Asia/Shanghai", "log_level": "INFO"},
+    }
+    for section, values in config.items():
+        if isinstance(values, dict) and isinstance(merged.get(section), dict):
+            merged[section].update(values)
+        else:
+            merged[section] = values
+    return merged
+
+
+def _normalize_extensions(extensions: list[str]) -> list[str]:
+    normalized = []
+    for extension in extensions:
+        value = str(extension).lower()
+        if not value.startswith("."):
+            value = f".{value}"
+        normalized.append(value)
+    return normalized
+
+
+def _parse_simple_yaml(path: Path) -> dict[str, Any]:
+    if not path.exists():
+        raise FileNotFoundError(f"config file not found: {path}")
+
+    root: dict[str, Any] = {}
+    stack: list[tuple[int, dict[str, Any] | list[Any]]] = [(-1, root)]
+    lines = path.read_text(encoding="utf-8").splitlines()
+
+    index = 0
+    while index < len(lines):
+        raw_line = lines[index].rstrip()
+        stripped = raw_line.strip()
+        if not stripped or raw_line.lstrip().startswith("#"):
+            index += 1
+            continue
+
+        indent = len(raw_line) - len(raw_line.lstrip(" "))
+        while indent <= stack[-1][0]:
+            stack.pop()
+        parent = stack[-1][1]
+
+        if stripped.startswith("- "):
+            if not isinstance(parent, list):
+                raise ValueError(f"list item without list parent: {raw_line}")
+            item = stripped[2:].strip()
+            if ":" in item:
+                key, value = item.split(":", 1)
+                mapping: dict[str, Any] = {}
+                parent.append(mapping)
+                key = key.strip()
+                value = value.strip()
+                if not value:
+                    next_stripped = _next_stripped(lines, index)
+                    child: dict[str, Any] | list[Any]
+                    child = [] if next_stripped and next_stripped.startswith("- ") else {}
+                    mapping[key] = child
+                    stack.append((indent, mapping))
+                    stack.append((indent + 2, child))
+                else:
+                    mapping[key] = _parse_scalar(value)
+                    stack.append((indent, mapping))
+            else:
+                parent.append(_parse_scalar(item))
+            index += 1
+            continue
+
+        if not isinstance(parent, dict):
+            raise ValueError(f"mapping entry inside list is not supported: {raw_line}")
+
+        if ":" not in stripped:
+            raise ValueError(f"unsupported config line: {raw_line}")
+
+        key, value = stripped.split(":", 1)
+        key = key.strip()
+        value = value.strip()
+        if _is_block_scalar(value):
+            parent[key], index = _parse_block_scalar(lines, index, indent, value)
+            continue
+        if not value:
+            next_stripped = _next_stripped(lines, index)
+            child: dict[str, Any] | list[Any]
+            child = [] if next_stripped and next_stripped.startswith("- ") else {}
+            parent[key] = child
+            stack.append((indent, child))
+        else:
+            parent[key] = _parse_scalar(value)
+        index += 1
+
+    return root
+
+
+def _next_stripped(lines: list[str], current_index: int) -> str | None:
+    for raw_line in lines[current_index + 1 :]:
+        stripped = raw_line.strip()
+        if stripped and not raw_line.lstrip().startswith("#"):
+            return stripped
+    return None
+
+
+def _is_block_scalar(value: str) -> bool:
+    return value in {">", ">-", "|", "|-"}
+
+
+def _parse_block_scalar(
+    lines: list[str],
+    start_index: int,
+    parent_indent: int,
+    marker: str,
+) -> tuple[str, int]:
+    content_lines: list[str] = []
+    content_indent: int | None = None
+    index = start_index + 1
+
+    while index < len(lines):
+        raw_line = lines[index].rstrip()
+        stripped = raw_line.strip()
+        if not stripped:
+            content_lines.append("")
+            index += 1
+            continue
+
+        indent = len(raw_line) - len(raw_line.lstrip(" "))
+        if indent <= parent_indent:
+            break
+        if content_indent is None:
+            content_indent = indent
+        content_lines.append(raw_line[content_indent:])
+        index += 1
+
+    if marker.endswith("-"):
+        while content_lines and content_lines[-1] == "":
+            content_lines.pop()
+    return "\n".join(content_lines), index
+
+
+def _parse_scalar(value: str) -> Any:
+    lower = value.lower()
+    if lower == "true":
+        return True
+    if lower == "false":
+        return False
+    if lower in {"null", "none"}:
+        return None
+    if value.startswith("[") and value.endswith("]"):
+        parsed = ast.literal_eval(value)
+        if not isinstance(parsed, list):
+            raise ValueError(f"expected list value: {value}")
+        return parsed
+    if (value.startswith('"') and value.endswith('"')) or (
+        value.startswith("'") and value.endswith("'")
+    ):
+        return ast.literal_eval(value)
+    try:
+        return int(value)
+    except ValueError:
+        pass
+    try:
+        return float(value)
+    except ValueError:
+        return value
--- a/video_ai_analysis_poc/discovery.py
+++ b/video_ai_analysis_poc/discovery.py
@@ -0,0 +1,27 @@
+from __future__ import annotations
+
+from pathlib import Path
+
+
+def discover_videos(
+    input_dir: str | Path,
+    extensions: list[str],
+    *,
+    recursive: bool,
+) -> list[Path]:
+    root = Path(input_dir).expanduser()
+    if not root.exists():
+        raise FileNotFoundError(f"input dir not found: {root}")
+    if not root.is_dir():
+        raise NotADirectoryError(f"input path is not a directory: {root}")
+
+    allowed = {
+        extension.lower() if extension.startswith(".") else f".{extension.lower()}"
+        for extension in extensions
+    }
+    iterator = root.rglob("*") if recursive else root.iterdir()
+    return sorted(
+        path
+        for path in iterator
+        if path.is_file() and path.suffix.lower() in allowed
+    )
--- a/video_ai_analysis_poc/ffmpeg_sampler.py
+++ b/video_ai_analysis_poc/ffmpeg_sampler.py
@@ -0,0 +1,243 @@
+from __future__ import annotations
+
+import math
+import subprocess
+from pathlib import Path
+from typing import Any
+
+from .frames import build_frame_records
+from .manifest import read_jsonl, write_manifest
+from .timeline import DEFAULT_TIMEZONE, timeline_start_epoch
+
+
+NVDEC_CODECS = {"h264", "hevc"}
+
+
+def build_sample_command(
+    video_path: str | Path,
+    output_dir: str | Path,
+    video_id: str,
+    ffmpeg_config: dict[str, Any],
+    *,
+    codec_name: str | None,
+    max_frames: int | None = None,
+    max_duration_seconds: float | None = None,
+) -> list[str]:
+    frame_dir = Path(output_dir).expanduser() / "frames" / video_id
+    frame_pattern = frame_dir / "%06d.jpg"
+    command = ["ffmpeg", "-hide_banner", "-y"]
+
+    codec = (codec_name or "").lower()
+    prefer_nvdec = bool(ffmpeg_config.get("prefer_nvdec", True))
+    allow_cpu_fallback = bool(ffmpeg_config.get("allow_cpu_fallback", False))
+    decoders = ffmpeg_config.get("codec_decoders", {})
+    decoder = decoders.get(codec) if isinstance(decoders, dict) else None
+
+    if prefer_nvdec and codec in NVDEC_CODECS and decoder:
+        command.extend(
+            [
+                "-hwaccel",
+                str(ffmpeg_config.get("hwaccel", "cuda")),
+                "-c:v",
+                str(decoder),
+            ]
+        )
+    elif not allow_cpu_fallback:
+        raise ValueError(
+            f"NVDEC decoder is required for codec {codec_name!r}; CPU fallback is disabled"
+        )
+
+    frame_fps = ffmpeg_config.get("frame_fps", 1)
+    frame_width = ffmpeg_config.get("frame_width", 640)
+    jpeg_quality = ffmpeg_config.get("jpeg_quality", 4)
+    command.extend(
+        [
+            "-i",
+            str(Path(video_path).expanduser()),
+        ]
+    )
+    if max_duration_seconds is not None and max_duration_seconds > 0:
+        command.extend(["-t", f"{max_duration_seconds:g}"])
+    command.extend(
+        [
+            "-vf",
+            f"fps={frame_fps},scale={frame_width}:-2",
+            "-q:v",
+            str(jpeg_quality),
+        ]
+    )
+    if max_frames is not None and max_frames > 0:
+        command.extend(["-frames:v", str(max_frames)])
+    command.append(str(frame_pattern))
+    return command
+
+
+def sample_video_frames(
+    video_record: dict[str, Any],
+    output_dir: str | Path,
+    ffmpeg_config: dict[str, Any],
+    *,
+    manifest_path: str | Path | None = None,
+) -> list[dict[str, Any]]:
+    video_id = str(video_record["video_id"])
+    output_root = Path(output_dir).expanduser().resolve(strict=False)
+    frame_dir = output_root / "frames" / video_id
+    frame_dir.mkdir(parents=True, exist_ok=True)
+
+    try:
+        max_frames = _max_output_frames(video_record, ffmpeg_config)
+        timezone_name = str(ffmpeg_config.get("timezone", DEFAULT_TIMEZONE))
+        start_epoch = timeline_start_epoch(video_record)
+        command = build_sample_command(
+            video_record.get("path") or video_record.get("source_path"),
+            output_root,
+            video_id,
+            ffmpeg_config,
+            codec_name=video_record.get("codec_name"),
+            max_frames=max_frames,
+            max_duration_seconds=_record_duration_seconds(video_record),
+        )
+        completed = subprocess.run(
+            command,
+            capture_output=True,
+            text=True,
+            check=True,
+            timeout=int(ffmpeg_config.get("timeout_seconds_per_video", 3600)),
+        )
+        records = build_frame_records(
+            video_id,
+            output_root,
+            frame_dir.glob("*.jpg"),
+            frame_fps=float(ffmpeg_config.get("frame_fps", 1)),
+            timeline_start_epoch=start_epoch,
+            timezone_name=timezone_name,
+        )
+        _attach_success_evidence(
+            records,
+            command,
+            stderr=completed.stderr,
+        )
+    except subprocess.CalledProcessError as exc:
+        records = build_frame_records(
+            video_id,
+            output_root,
+            frame_dir.glob("*.jpg"),
+            frame_fps=float(ffmpeg_config.get("frame_fps", 1)),
+            timeline_start_epoch=start_epoch,
+            timezone_name=timezone_name,
+        )
+        if records and (max_frames is None or len(records) >= max_frames):
+            _attach_success_evidence(
+                records,
+                command,
+                stderr=exc.stderr,
+            )
+        else:
+            records = [_failure_record(video_id, exc)]
+    except (subprocess.TimeoutExpired, ValueError, OSError) as exc:
+        records = [_failure_record(video_id, exc)]
+
+    if manifest_path is not None:
+        _replace_video_records(Path(manifest_path), video_id, records)
+    return records
+
+
+def _replace_video_records(
+    manifest_path: Path,
+    video_id: str,
+    new_records: list[dict[str, Any]],
+) -> None:
+    existing = [
+        record
+        for record in read_jsonl(manifest_path)
+        if str(record.get("video_id")) != video_id
+    ]
+    write_manifest(manifest_path, [*existing, *new_records])
+
+
+def _failure_record(video_id: str, exc: BaseException) -> dict[str, Any]:
+    return {
+        "video_id": video_id,
+        "frame_id": None,
+        "frame_path": None,
+        "offset_seconds": None,
+        "timecode": None,
+        "pts_time": None,
+        "status": "sample_failed",
+        "retry_count": 0,
+        "last_error": _error_text(exc),
+    }
+
+
+def _attach_success_evidence(
+    records: list[dict[str, Any]],
+    command: list[str],
+    *,
+    stderr: str | None,
+) -> None:
+    evidence = {
+        "ffmpeg_command": command,
+        "decoder": _command_value_after(command, "-c:v"),
+        "hwaccel": _command_value_after(command, "-hwaccel"),
+        "stderr_summary": _stderr_summary(stderr),
+    }
+    for record in records:
+        record.update(evidence)
+
+
+def _command_value_after(command: list[str], flag: str) -> str | None:
+    try:
+        index = command.index(flag)
+    except ValueError:
+        return None
+    if index + 1 >= len(command):
+        return None
+    return command[index + 1]
+
+
+def _stderr_summary(stderr: str | None, *, limit: int = 2000) -> str:
+    if not stderr:
+        return ""
+    text = stderr.strip()
+    if len(text) <= limit:
+        return text
+    return text[:limit]
+
+
+def _error_text(exc: BaseException) -> str:
+    if isinstance(exc, subprocess.CalledProcessError):
+        return str(exc.stderr or exc.stdout or exc)
+    if isinstance(exc, subprocess.TimeoutExpired):
+        return f"ffmpeg timed out after {exc.timeout}s"
+    return str(exc)
+
+
+def _max_output_frames(
+    video_record: dict[str, Any],
+    ffmpeg_config: dict[str, Any],
+) -> int | None:
+    frame_fps = _optional_float(ffmpeg_config.get("frame_fps", 1))
+    if frame_fps is None or frame_fps <= 0:
+        return None
+    duration_seconds = _record_duration_seconds(video_record)
+    if duration_seconds is None or duration_seconds <= 0:
+        return None
+    return max(1, math.ceil(duration_seconds * frame_fps) + 1)
+
+
+def _record_duration_seconds(video_record: dict[str, Any]) -> float | None:
+    for begin_key, end_key in (
+        ("actual_begin", "actual_end"),
+        ("requested_begin", "requested_end"),
+    ):
+        begin = _optional_float(video_record.get(begin_key))
+        end = _optional_float(video_record.get(end_key))
+        if begin is not None and end is not None and end > begin:
+            return end - begin
+    return _optional_float(video_record.get("duration_seconds"))
+
+
+def _optional_float(value: Any) -> float | None:
+    if value is None or value == "":
+        return None
+    return float(value)
--- a/video_ai_analysis_poc/frames.py
+++ b/video_ai_analysis_poc/frames.py
@@ -0,0 +1,59 @@
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Any, Iterable
+
+from .timeline import DEFAULT_TIMEZONE, format_beijing_time
+
+
+def seconds_to_timecode(seconds: float | int | None) -> str | None:
+    if seconds is None:
+        return None
+    total_seconds = int(float(seconds))
+    hours = total_seconds // 3600
+    minutes = (total_seconds % 3600) // 60
+    remaining_seconds = total_seconds % 60
+    return f"{hours:02d}:{minutes:02d}:{remaining_seconds:02d}"
+
+
+def build_frame_records(
+    video_id: str,
+    output_dir: str | Path,
+    frame_paths: Iterable[str | Path],
+    *,
+    frame_fps: float,
+    timeline_start_epoch: float | int | str | None = None,
+    timezone_name: str = DEFAULT_TIMEZONE,
+) -> list[dict[str, Any]]:
+    base_dir = Path(output_dir).expanduser().resolve(strict=False)
+    records = []
+    for index, frame_path in enumerate(sorted(Path(path) for path in frame_paths), start=1):
+        offset_seconds = round((index - 1) / frame_fps, 6)
+        record = {
+            "video_id": video_id,
+            "frame_id": f"{video_id}_f{index:06d}",
+            "frame_path": _relative_frame_path(frame_path, base_dir),
+            "offset_seconds": offset_seconds,
+            "timecode": seconds_to_timecode(offset_seconds),
+            "pts_time": offset_seconds,
+            "status": "sampled",
+            "retry_count": 0,
+            "last_error": None,
+        }
+        beijing_time = format_beijing_time(
+            timeline_start_epoch,
+            offset_seconds=offset_seconds,
+            timezone_name=timezone_name,
+        )
+        if beijing_time is not None:
+            record["beijing_time"] = beijing_time
+        records.append(record)
+    return records
+
+
+def _relative_frame_path(frame_path: Path, base_dir: Path) -> str:
+    resolved = frame_path.expanduser().resolve(strict=False)
+    try:
+        return resolved.relative_to(base_dir).as_posix()
+    except ValueError:
+        return resolved.as_posix()
--- a/video_ai_analysis_poc/hik_cloud.py
+++ b/video_ai_analysis_poc/hik_cloud.py
@@ -0,0 +1,450 @@
+from __future__ import annotations
+
+import json
+import os
+import re
+from datetime import datetime
+from pathlib import Path
+from typing import Any
+from urllib.parse import urlparse, urlunparse
+import urllib.request
+from zoneinfo import ZoneInfo
+
+from .manifest import read_jsonl, write_manifest
+from .paths import hik_cloud_download_path
+
+
+DEFAULT_TIMEZONE = "Asia/Shanghai"
+DEFAULT_CHUNK_SECONDS = 600
+MAX_CHUNK_SECONDS = 3600
+DEFAULT_API_BASE_URL = "https://api2.hik-cloud.com"
+DEFAULT_DOWNLOAD_PATH = "/v1/carrier/cstorage/open/play/download"
+DEFAULT_TIMEOUT_SECONDS = 60
+DEFAULT_DOWNLOAD_TIMEOUT_SECONDS = 600
+DOWNLOAD_MANIFEST_NAME = "hik_cloud_download_manifest.jsonl"
+NO_RECORDING_CODE = 80438027
+TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
+
+
+def parse_hik_time(value: str | int | float, timezone: str = DEFAULT_TIMEZONE) -> int:
+    if isinstance(value, bool):
+        raise ValueError(f"unsupported time value: {value!r}")
+    if isinstance(value, int | float):
+        return int(value)
+    if isinstance(value, str):
+        parsed = datetime.strptime(value, TIME_FORMAT)
+        return int(parsed.replace(tzinfo=ZoneInfo(timezone)).timestamp())
+    raise ValueError(f"unsupported time value: {value!r}")
+
+
+def build_download_chunks(config: dict[str, Any]) -> list[dict[str, Any]]:
+    hik_config = config.get("hik_cloud", {})
+    runtime_config = config.get("runtime", {})
+    timezone = runtime_config.get("timezone", DEFAULT_TIMEZONE)
+    chunk_seconds = int(hik_config.get("chunk_seconds", DEFAULT_CHUNK_SECONDS))
+    if chunk_seconds <= 0:
+        raise ValueError("chunk_seconds must be greater than 0")
+    if chunk_seconds > MAX_CHUNK_SECONDS:
+        raise ValueError(f"chunk_seconds must be less than or equal to {MAX_CHUNK_SECONDS}")
+
+    chunks: list[dict[str, Any]] = []
+    devices = hik_config.get("devices", [])
+    time_ranges = hik_config.get("time_ranges", [])
+    for device in devices:
+        for time_range in time_ranges:
+            requested_begin = parse_hik_time(time_range["begin"], timezone)
+            requested_end = parse_hik_time(time_range["end"], timezone)
+            if requested_end <= requested_begin:
+                raise ValueError("time range end must be after begin")
+
+            time_begin = requested_begin
+            while time_begin < requested_end:
+                time_end = min(time_begin + chunk_seconds, requested_end)
+                chunks.append(
+                    {
+                        "device_serial": device["device_serial"],
+                        "channel_no": device["channel_no"],
+                        "requested_begin": requested_begin,
+                        "requested_end": requested_end,
+                        "time_begin": time_begin,
+                        "time_end": time_end,
+                    }
+                )
+                time_begin = time_end
+    return chunks
+
+
+def resolve_access_token(config_or_hik_config: dict[str, Any]) -> str:
+    hik_config = _hik_config(config_or_hik_config)
+    access_token = hik_config.get("access_token")
+    if access_token:
+        return str(access_token)
+
+    access_token_env = hik_config.get("access_token_env")
+    if access_token_env:
+        env_token = os.environ.get(str(access_token_env))
+        if env_token:
+            return env_token
+
+    raise ValueError(
+        "missing hik_cloud access_token; configure access_token or access_token_env"
+    )
+
+
+def request_download_address(
+    chunk: dict[str, Any],
+    hik_config: dict[str, Any],
+    *,
+    http_post: Any | None = None,
+) -> dict[str, Any]:
+    token = resolve_access_token(hik_config)
+    api_base_url = str(hik_config.get("api_base_url") or DEFAULT_API_BASE_URL)
+    download_path = str(hik_config.get("download_path") or DEFAULT_DOWNLOAD_PATH)
+    url = api_base_url.rstrip("/") + download_path
+    headers = {
+        "Authorization": f"bearer {token}",
+        "Content-Type": "application/json",
+    }
+    json_body = {
+        "deviceSerial": chunk["device_serial"],
+        "channelNo": chunk["channel_no"],
+        "timeBegin": chunk["time_begin"],
+        "timeEnd": chunk["time_end"],
+    }
+    timeout_seconds = int(hik_config.get("timeout_seconds", DEFAULT_TIMEOUT_SECONDS))
+    post = http_post or _post_json
+
+    try:
+        response = post(url, json_body, headers, timeout_seconds)
+    except Exception as exc:  # pragma: no cover - exact urllib failures vary.
+        return {
+            **_chunk_metadata(chunk),
+            "status": "address_failed",
+            "code": None,
+            "last_error": _sanitize_error(exc, token),
+        }
+
+    code = _optional_int(response.get("code"))
+    if code == 0:
+        data = response.get("data") or {}
+        return {
+            **_chunk_metadata(chunk),
+            "status": "address_ok",
+            "code": code,
+            "url": data.get("url"),
+            "actual_begin": _optional_int(data.get("actualBeginTime")),
+            "actual_end": _optional_int(data.get("actualEndTime")),
+        }
+
+    status = "no_recording" if code == NO_RECORDING_CODE else "address_failed"
+    result = {
+        **_chunk_metadata(chunk),
+        "status": status,
+        "code": code,
+        "last_error": _api_error_message(response, token),
+    }
+    return result
+
+
+def download_hik_cloud_recordings(
+    config: dict[str, Any],
+    output_dir: str | Path,
+    *,
+    address_client: Any | None = None,
+    download_url: Any | None = None,
+    download: bool = True,
+) -> list[dict[str, Any]]:
+    output_path = Path(output_dir).expanduser().resolve(strict=False)
+    manifest_path = output_path / DOWNLOAD_MANIFEST_NAME
+    hik_config = _hik_config(config)
+    chunks = build_download_chunks(config)
+    resume = bool(config.get("output", {}).get("resume", False))
+    manifest_records = read_jsonl(manifest_path) if resume else []
+    existing_downloads = {
+        _manifest_key(record): record
+        for record in manifest_records
+        if _is_resumable_download(record)
+    }
+    get_address = address_client or request_download_address
+    fetch = download_url or _download_url
+    download_timeout_seconds = int(
+        hik_config.get("download_timeout_seconds", DEFAULT_DOWNLOAD_TIMEOUT_SECONDS)
+    )
+    token = _redaction_token(hik_config)
+
+    video_records: list[dict[str, Any]] = []
+    for chunk in chunks:
+        key = _chunk_key(chunk)
+        existing_record = existing_downloads.get(key)
+        if download and existing_record is not None:
+            video_records.append(_video_record_from_manifest(existing_record))
+            continue
+
+        address_result = get_address(chunk, hik_config)
+        status = address_result.get("status")
+        if status != "address_ok":
+            _upsert_manifest_record(
+                manifest_records,
+                _manifest_record(
+                    chunk,
+                    address_result,
+                    status=str(status or "address_failed"),
+                    token=token,
+                ),
+            )
+            continue
+
+        if not download:
+            _upsert_manifest_record(
+                manifest_records,
+                _manifest_record(
+                    chunk,
+                    address_result,
+                    status="address_ok",
+                    token=token,
+                ),
+            )
+            continue
+
+        url = str(address_result.get("url") or "")
+        target_path = hik_cloud_download_path(
+            output_path,
+            str(chunk["device_serial"]),
+            chunk["channel_no"],
+            int(chunk["time_begin"]),
+            int(chunk["time_end"]),
+        )
+        try:
+            payload = fetch(url, timeout_seconds=download_timeout_seconds)
+            target_path.parent.mkdir(parents=True, exist_ok=True)
+            target_path.write_bytes(payload)
+        except Exception as exc:  # pragma: no cover - concrete network failures vary.
+            _upsert_manifest_record(
+                manifest_records,
+                _manifest_record(
+                    chunk,
+                    address_result,
+                    status="download_failed",
+                    path=target_path,
+                    last_error=_sanitize_error(exc, token),
+                    token=token,
+                ),
+            )
+            continue
+
+        record = _downloaded_video_record(chunk, address_result, target_path)
+        video_records.append(record)
+        _upsert_manifest_record(
+            manifest_records,
+            _manifest_record(
+                chunk,
+                address_result,
+                status="downloaded",
+                path=target_path,
+                token=token,
+            ),
+        )
+
+    write_manifest(manifest_path, manifest_records)
+    return video_records
+
+
+def _post_json(
+    url: str,
+    json_body: dict[str, Any],
+    headers: dict[str, str],
+    timeout_seconds: int,
+) -> dict[str, Any]:
+    request = urllib.request.Request(
+        url,
+        data=json.dumps(json_body).encode("utf-8"),
+        headers=headers,
+        method="POST",
+    )
+    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
+        return json.loads(response.read().decode("utf-8"))
+
+
+def _download_url(url: str, *, timeout_seconds: int | None = None) -> bytes:
+    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
+        return response.read()
+
+
+def _hik_config(config_or_hik_config: dict[str, Any]) -> dict[str, Any]:
+    hik_config = config_or_hik_config.get("hik_cloud")
+    if isinstance(hik_config, dict):
+        return hik_config
+    return config_or_hik_config
+
+
+def _chunk_metadata(chunk: dict[str, Any]) -> dict[str, Any]:
+    return {
+        "device_serial": chunk["device_serial"],
+        "channel_no": chunk["channel_no"],
+        "requested_begin": chunk.get("requested_begin"),
+        "requested_end": chunk.get("requested_end"),
+        "time_begin": chunk["time_begin"],
+        "time_end": chunk["time_end"],
+    }
+
+
+def _optional_int(value: Any) -> int | None:
+    if value is None or value == "":
+        return None
+    return int(value)
+
+
+def _api_error_message(response: dict[str, Any], token: str) -> str:
+    code = response.get("code")
+    message = response.get("msg") or response.get("message") or "hik api error"
+    return _sanitize_error(f"hik api code {code}: {message}", token)
+
+
+def _sanitize_error(value: Any, token: str = "") -> str | None:
+    if value is None:
+        return None
+    message = str(value)
+    for raw_url in re.findall(r"https?://[^\s'\"<>]+", message):
+        parsed = urlparse(raw_url)
+        sanitized_url = urlunparse(
+            (parsed.scheme, parsed.netloc, parsed.path, "", "", "")
+        )
+        message = message.replace(raw_url, sanitized_url)
+    message = re.sub(
+        r"\b(?:sign|sig|token|access_token)=[^&\s'\"<>]+",
+        "[redacted-query]",
+        message,
+        flags=re.IGNORECASE,
+    )
+    if token:
+        message = message.replace(token, "[redacted]")
+    message = message.replace("Authorization", "[redacted-header]")
+    return message
+
+
+def _downloaded_video_record(
+    chunk: dict[str, Any],
+    address_result: dict[str, Any],
+    path: Path,
+) -> dict[str, Any]:
+    return {
+        "source": "hik_cloud",
+        "path": str(path),
+        "source_path": _source_path(chunk),
+        "device_serial": chunk["device_serial"],
+        "channel_no": chunk["channel_no"],
+        "requested_begin": chunk["time_begin"],
+        "requested_end": chunk["time_end"],
+        "actual_begin": address_result.get("actual_begin"),
+        "actual_end": address_result.get("actual_end"),
+        "status": "downloaded",
+        "retry_count": 0,
+        "last_error": None,
+    }
+
+
+def _manifest_record(
+    chunk: dict[str, Any],
+    address_result: dict[str, Any],
+    *,
+    status: str,
+    token: str,
+    path: Path | None = None,
+    last_error: str | None = None,
+) -> dict[str, Any]:
+    url = address_result.get("url")
+    record = {
+        "source": "hik_cloud",
+        "device_serial": chunk["device_serial"],
+        "channel_no": chunk["channel_no"],
+        "requested_begin": chunk["time_begin"],
+        "requested_end": chunk["time_end"],
+        "actual_begin": address_result.get("actual_begin"),
+        "actual_end": address_result.get("actual_end"),
+        "path": str(path) if path is not None else None,
+        "status": status,
+        "retry_count": 0,
+        "last_error": _sanitize_error(last_error or address_result.get("last_error"), token),
+    }
+    if url:
+        record["download_url_host"] = urlparse(str(url)).netloc
+    if "code" in address_result:
+        record["code"] = address_result.get("code")
+    if status == "downloaded":
+        record["source_path"] = _source_path(chunk)
+    return record
+
+
+def _source_path(chunk: dict[str, Any]) -> str:
+    time_begin = chunk.get("time_begin", chunk.get("requested_begin"))
+    time_end = chunk.get("time_end", chunk.get("requested_end"))
+    return (
+        f"hik_cloud://{chunk['device_serial']}/ch{chunk['channel_no']}/"
+        f"{int(time_begin)}-{int(time_end)}"
+    )
+
+
+def _is_resumable_download(record: dict[str, Any]) -> bool:
+    path = record.get("path")
+    return (
+        record.get("status") == "downloaded"
+        and isinstance(path, str)
+        and Path(path).exists()
+    )
+
+
+def _video_record_from_manifest(record: dict[str, Any]) -> dict[str, Any]:
+    return {
+        "source": "hik_cloud",
+        "path": record["path"],
+        "source_path": record.get("source_path") or _source_path(record),
+        "device_serial": record["device_serial"],
+        "channel_no": record["channel_no"],
+        "requested_begin": record["requested_begin"],
+        "requested_end": record["requested_end"],
+        "actual_begin": record.get("actual_begin"),
+        "actual_end": record.get("actual_end"),
+        "status": "downloaded",
+        "retry_count": record.get("retry_count", 0),
+        "last_error": record.get("last_error"),
+    }
+
+
+def _upsert_manifest_record(
+    records: list[dict[str, Any]],
+    new_record: dict[str, Any],
+) -> None:
+    new_key = _manifest_key(new_record)
+    for index, record in enumerate(records):
+        if _manifest_key(record) == new_key:
+            records[index] = new_record
+            return
+    records.append(new_record)
+
+
+def _chunk_key(chunk: dict[str, Any]) -> tuple[Any, Any, Any, Any]:
+    return (
+        chunk.get("device_serial"),
+        chunk.get("channel_no"),
+        chunk.get("time_begin"),
+        chunk.get("time_end"),
+    )
+
+
+def _manifest_key(record: dict[str, Any]) -> tuple[Any, Any, Any, Any]:
+    return (
+        record.get("device_serial"),
+        record.get("channel_no"),
+        record.get("requested_begin"),
+        record.get("requested_end"),
+    )
+
+
+def _redaction_token(hik_config: dict[str, Any]) -> str:
+    token = hik_config.get("access_token")
+    if token:
+        return str(token)
+    token_env = hik_config.get("access_token_env")
+    if token_env:
+        return os.environ.get(str(token_env), "")
+    return ""
--- a/video_ai_analysis_poc/manifest.py
+++ b/video_ai_analysis_poc/manifest.py
@@ -0,0 +1,35 @@
+from __future__ import annotations
+
+import json
+from pathlib import Path
+from typing import Any, Iterable
+
+
+def write_manifest(path: str | Path, records: Iterable[dict[str, Any]]) -> None:
+    manifest_path = Path(path).expanduser().resolve(strict=False)
+    manifest_path.parent.mkdir(parents=True, exist_ok=True)
+    with manifest_path.open("w", encoding="utf-8") as handle:
+        for record in records:
+            normalized = _normalize_record(record)
+            handle.write(
+                json.dumps(normalized, ensure_ascii=False, sort_keys=True) + "\n"
+            )
+
+
+def read_jsonl(path: str | Path) -> list[dict[str, Any]]:
+    jsonl_path = Path(path).expanduser().resolve(strict=False)
+    if not jsonl_path.exists():
+        return []
+    records = []
+    for line in jsonl_path.read_text(encoding="utf-8").splitlines():
+        if line.strip():
+            records.append(json.loads(line))
+    return records
+
+
+def _normalize_record(record: dict[str, Any]) -> dict[str, Any]:
+    normalized = dict(record)
+    normalized.setdefault("status", "pending")
+    normalized.setdefault("retry_count", 0)
+    normalized.setdefault("last_error", None)
+    return normalized
--- a/video_ai_analysis_poc/paths.py
+++ b/video_ai_analysis_poc/paths.py
@@ -0,0 +1,71 @@
+from __future__ import annotations
+
+import hashlib
+from pathlib import Path
+
+
+FORBIDDEN_REFERENCE_ROOT = Path("/Users/yoilun/AI-train/zhengxin-vlm-0413")
+
+
+def resolve_path(path: str | Path, *, base_dir: Path | None = None) -> Path:
+    candidate = Path(path).expanduser()
+    if not candidate.is_absolute() and base_dir is not None:
+        candidate = base_dir / candidate
+    return candidate.resolve(strict=False)
+
+
+def _is_relative_to(path: Path, parent: Path) -> bool:
+    try:
+        path.relative_to(parent)
+        return True
+    except ValueError:
+        return False
+
+
+def validate_output_dir(
+    input_dir: str | Path,
+    output_dir: str | Path,
+    *,
+    forbidden_root: Path = FORBIDDEN_REFERENCE_ROOT,
+) -> Path:
+    resolved_input = resolve_path(input_dir)
+    resolved_output = resolve_path(output_dir)
+    resolved_forbidden = resolve_path(forbidden_root)
+
+    if resolved_output == resolved_input:
+        raise ValueError("output dir must not equal input dir")
+    if _is_relative_to(resolved_output, resolved_forbidden):
+        raise ValueError(
+            f"output dir must not be inside forbidden reference dir: {resolved_forbidden}"
+        )
+    return resolved_output
+
+
+def stable_video_id(path: str | Path) -> str:
+    resolved = str(resolve_path(path))
+    digest = hashlib.sha1(resolved.encode("utf-8")).hexdigest()[:16]
+    return f"video-{digest}"
+
+
+def hik_cloud_download_path(
+    output_dir: str | Path,
+    device_serial: str,
+    channel_no: int | str,
+    time_begin: int,
+    time_end: int,
+) -> Path:
+    safe_device = _safe_path_component(device_serial)
+    safe_channel = _safe_path_component(str(channel_no))
+    filename = f"{safe_device}_ch{safe_channel}_{int(time_begin)}_{int(time_end)}.mp4"
+    return (
+        resolve_path(output_dir)
+        / "downloads"
+        / "hik_cloud"
+        / safe_device
+        / f"ch{safe_channel}"
+        / filename
+    )
+
+
+def _safe_path_component(value: str) -> str:
+    return "".join(char if char.isalnum() or char in "._-" else "_" for char in value)
--- a/video_ai_analysis_poc/probe.py
+++ b/video_ai_analysis_poc/probe.py
@@ -0,0 +1,99 @@
+from __future__ import annotations
+
+import json
+import subprocess
+from pathlib import Path
+from typing import Any
+
+
+def probe_video(path: str | Path, *, timeout_seconds: int = 30) -> dict[str, Any]:
+    video_path = Path(path).expanduser().resolve(strict=False)
+    base_record: dict[str, Any] = {
+        "path": str(video_path),
+        "status": "probe_failed",
+        "retry_count": 0,
+        "last_error": None,
+    }
+    command = [
+        "ffprobe",
+        "-v",
+        "error",
+        "-print_format",
+        "json",
+        "-show_format",
+        "-show_streams",
+        str(video_path),
+    ]
+
+    try:
+        completed = subprocess.run(
+            command,
+            capture_output=True,
+            text=True,
+            check=True,
+            timeout=timeout_seconds,
+        )
+        payload = json.loads(completed.stdout or "{}")
+        video_stream = _first_video_stream(payload)
+        format_info = payload.get("format", {})
+        return {
+            **base_record,
+            "status": "probed",
+            "duration_seconds": _optional_float(format_info.get("duration")),
+            "codec_name": video_stream.get("codec_name"),
+            "width": _optional_int(video_stream.get("width")),
+            "height": _optional_int(video_stream.get("height")),
+            "fps": _parse_frame_rate(video_stream.get("avg_frame_rate"))
+            # Fall back to r_frame_rate when avg_frame_rate is absent or "0/0".
+            or _parse_frame_rate(video_stream.get("r_frame_rate")),
+            "format_name": format_info.get("format_name"),
+            "start_time": _optional_float(format_info.get("start_time")),
+        }
+    except subprocess.TimeoutExpired as exc:
+        base_record["last_error"] = f"ffprobe timed out after {timeout_seconds}s"
+        if exc.stderr:
+            base_record["last_error"] += f": {_error_text(exc.stderr)}"
+        return base_record
+    except subprocess.CalledProcessError as exc:
+        base_record["last_error"] = _error_text(exc.stderr or exc.stdout or str(exc))
+        return base_record
+    except (json.JSONDecodeError, ValueError) as exc:
+        base_record["last_error"] = f"ffprobe parse failed: {exc}"
+        return base_record
+
+
+def _first_video_stream(payload: dict[str, Any]) -> dict[str, Any]:
+    for stream in payload.get("streams", []):
+        if stream.get("codec_type") == "video":
+            return stream
+    raise ValueError("ffprobe output did not contain a video stream")
+
+
+def _parse_frame_rate(value: str | None) -> float | None:
+    if not value or value == "0/0":
+        return None
+    if "/" in value:
+        numerator, denominator = value.split("/", 1)
+        denominator_value = float(denominator)
+        if denominator_value == 0:
+            return None
+        return float(numerator) / denominator_value
+    return float(value)
+
+
+def _optional_float(value: Any) -> float | None:
+    if value is None or value == "":
+        return None
+    return float(value)
+
+
+def _optional_int(value: Any) -> int | None:
+    if value is None or value == "":
+        return None
+    return int(value)
+
+
+def _error_text(value: Any) -> str:
+    if isinstance(value, bytes):
+        return value.decode("utf-8", errors="replace").strip()
+    return str(value).strip()
--- a/video_ai_analysis_poc/result_parser.py
+++ b/video_ai_analysis_poc/result_parser.py
@@ -0,0 +1,138 @@
+from __future__ import annotations
+
+import json
+from typing import Any
+
+
+def extract_json_payload(raw_response: str) -> dict[str, Any]:
+    text = raw_response.strip()
+    if not text:
+        raise ValueError("JSON payload is empty")
+
+    try:
+        payload = json.loads(text)
+        if isinstance(payload, dict):
+            return payload
+    except json.JSONDecodeError:
+        pass
+
+    decoder = json.JSONDecoder()
+    for index, char in enumerate(text):
+        if char != "{":
+            continue
+        try:
+            payload, _ = decoder.raw_decode(text[index:])
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict):
+            return payload
+    raise ValueError("JSON object not found in model response")
+
+
+def build_clip_result(
+    raw_response: str,
+    clip_record: dict[str, Any],
+    video_record: dict[str, Any] | None,
+    config: dict[str, Any],
+    *,
+    processing: dict[str, Any] | None = None,
+    status: str | None = None,
+    error: str | None = None,
+) -> dict[str, Any]:
+    processing_record = dict(processing or {})
+    if status is not None:
+        payload: dict[str, Any] = {}
+        result_status = status
+        result_error = error
+    else:
+        try:
+            payload = extract_json_payload(raw_response)
+            result_status = "ok"
+            result_error = None
+        except ValueError as exc:
+            payload = {}
+            result_status = "parse_failed"
+            result_error = str(exc)
+
+    timeline = _timeline(clip_record, config, payload)
+    return {
+        "schema_version": config.get("schema", {}).get("version", "local-batch-v1"),
+        "video_id": str(clip_record.get("video_id")),
+        "video_path": _video_path(video_record),
+        "clip_id": str(clip_record.get("clip_id")),
+        "status": result_status,
+        "monitoring_timeline": timeline,
+        "events": _events(payload, clip_record) if result_status == "ok" else [],
+        "raw_response": raw_response,
+        "processing": processing_record,
+        "error": result_error,
+    }
+
+
+def _timeline(
+    clip_record: dict[str, Any],
+    config: dict[str, Any],
+    payload: dict[str, Any],
+) -> dict[str, Any]:
+    return {
+        "timezone": config.get("runtime", {}).get("timezone", "Asia/Shanghai"),
+        "video_start_time": clip_record.get("video_start_time"),
+        "clip_start_seconds": clip_record.get("clip_start_seconds"),
+        "clip_end_seconds": clip_record.get("clip_end_seconds"),
+        "clip_start_timecode": clip_record.get("clip_start_timecode"),
+        "clip_end_timecode": clip_record.get("clip_end_timecode"),
+        "clip_start_beijing_time": clip_record.get("clip_start_beijing_time"),
+        "clip_end_beijing_time": clip_record.get("clip_end_beijing_time"),
+        "frame_times": clip_record.get("frame_times", []),
+        "screen_time": str(
+            payload.get("screen_time") or payload.get("画面时间") or payload.get("时间") or ""
+        ),
+    }
+
+
+def _events(
+    payload: dict[str, Any],
+    clip_record: dict[str, Any],
+) -> list[dict[str, Any]]:
+    raw_events = payload.get("events") or []
+    if not isinstance(raw_events, list):
+        return []
+    return [
+        _event(event, clip_record)
+        for event in raw_events
+        if isinstance(event, dict)
+    ]
+
+
+def _event(
+    event: dict[str, Any],
+    clip_record: dict[str, Any],
+) -> dict[str, Any]:
+    normalized = dict(event)
+    normalized.setdefault("event_type", "unknown")
+    normalized.setdefault("start_time", None)
+    normalized.setdefault("end_time", None)
+    normalized.setdefault("start_offset_seconds", clip_record.get("clip_start_seconds"))
+    normalized.setdefault("end_offset_seconds", clip_record.get("clip_end_seconds"))
+    normalized.setdefault("confidence", None)
+    normalized.setdefault("severity", None)
+    normalized.setdefault("attributes", {})
+    normalized.setdefault(
+        "evidence",
+        {
+            "clip_id": clip_record.get("clip_id"),
+            "frame_paths": [
+                frame.get("frame_path")
+                for frame in clip_record.get("frame_times", [])
+                if frame.get("frame_path")
+            ],
+        },
+    )
+    return normalized
+
+
+def _video_path(video_record: dict[str, Any] | None) -> str | None:
+    if not video_record:
+        return None
+    value = video_record.get("path") or video_record.get("source_path")
+    return str(value) if value is not None else None
--- a/video_ai_analysis_poc/timeline.py
+++ b/video_ai_analysis_poc/timeline.py
@@ -0,0 +1,67 @@
+from __future__ import annotations
+
+from datetime import datetime, timedelta, timezone
+from typing import Any
+from zoneinfo import ZoneInfo, ZoneInfoNotFoundError
+
+
+TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
+DEFAULT_TIMEZONE = "Asia/Shanghai"
+
+
+def format_beijing_time(
+    epoch_seconds: float | int | str | None,
+    *,
+    offset_seconds: float | int = 0,
+    timezone_name: str = DEFAULT_TIMEZONE,
+) -> str | None:
+    epoch = _optional_float(epoch_seconds)
+    if epoch is None:
+        return None
+    zone = _zone(timezone_name)
+    timestamp = epoch + float(offset_seconds)
+    return datetime.fromtimestamp(timestamp, tz=timezone.utc).astimezone(zone).strftime(
+        TIME_FORMAT
+    )
+
+
+def derive_time_from_reference(
+    reference_time: str | None,
+    *,
+    reference_offset_seconds: float | int | None,
+    target_offset_seconds: float | int | None,
+) -> str | None:
+    if not reference_time:
+        return None
+    reference_offset = _optional_float(reference_offset_seconds)
+    target_offset = _optional_float(target_offset_seconds)
+    if reference_offset is None or target_offset is None:
+        return None
+    try:
+        reference = datetime.strptime(reference_time, TIME_FORMAT)
+    except ValueError:
+        return None
+    return (reference + timedelta(seconds=target_offset - reference_offset)).strftime(
+        TIME_FORMAT
+    )
+
+
+def timeline_start_epoch(record: dict[str, Any]) -> float | None:
+    for key in ("actual_begin", "requested_begin"):
+        value = _optional_float(record.get(key))
+        if value is not None:
+            return value
+    return None
+
+
+def _zone(timezone_name: str) -> ZoneInfo:
+    try:
+        return ZoneInfo(timezone_name)
+    except ZoneInfoNotFoundError:
+        return ZoneInfo(DEFAULT_TIMEZONE)
+
+
+def _optional_float(value: Any) -> float | None:
+    if value is None or value == "":
+        return None
+    return float(value)
--- a/video_ai_analysis_poc/vlm_client.py
+++ b/video_ai_analysis_poc/vlm_client.py
@@ -0,0 +1,134 @@
+from __future__ import annotations
+
+import base64
+import json
+import time
+import urllib.request
+from pathlib import Path
+from typing import Any, Callable
+
+
+HttpPost = Callable[[str, dict[str, Any], int], dict[str, Any]]
+
+
+def infer_clip(
+    clip_record: dict[str, Any],
+    output_dir: str | Path,
+    vlm_config: dict[str, Any],
+    prompt_config: dict[str, Any],
+    *,
+    http_post: HttpPost | None = None,
+) -> dict[str, Any]:
+    start = time.monotonic()
+    client = http_post or _post_json
+    url = build_chat_url(vlm_config)
+    payload = build_payload(clip_record, output_dir, vlm_config, prompt_config)
+    response = client(url, payload, int(vlm_config.get("timeout_seconds", 120)))
+    latency_ms = int((time.monotonic() - start) * 1000)
+    return {
+        "raw_response": _extract_message_content(response.get("body")),
+        "http_status": response.get("status"),
+        "latency_ms": latency_ms,
+    }
+
+
+def build_chat_url(vlm_config: dict[str, Any]) -> str:
+    return (
+        str(vlm_config["api_base_url"]).rstrip("/")
+        + str(vlm_config["chat_completions_path"])
+    )
+
+
+def build_payload(
+    clip_record: dict[str, Any],
+    output_dir: str | Path,
+    vlm_config: dict[str, Any],
+    prompt_config: dict[str, Any],
+) -> dict[str, Any]:
+    content: list[dict[str, Any]] = [
+        {"type": "text", "text": str(prompt_config.get("user", ""))}
+    ]
+    for frame in clip_record.get("frame_times", []):
+        frame_path = frame.get("frame_path")
+        if not frame_path:
+            continue
+        content.append(
+            {
+                "type": "image_url",
+                "image_url": {
+                    "url": _image_url(
+                        frame_path,
+                        output_dir,
+                        str(vlm_config.get("image_transport", "data_uri")),
+                    )
+                },
+            }
+        )
+
+    return {
+        "model": vlm_config.get("model"),
+        "messages": [
+            {"role": "system", "content": str(prompt_config.get("system", ""))},
+            {"role": "user", "content": content},
+        ],
+        "temperature": vlm_config.get("temperature", 0),
+        "max_tokens": vlm_config.get("max_tokens", 512),
+    }
+
+
+def _image_url(
+    frame_path: str | Path,
+    output_dir: str | Path,
+    image_transport: str,
+) -> str:
+    if image_transport != "data_uri":
+        return str(frame_path)
+    path = Path(frame_path).expanduser()
+    if not path.is_absolute():
+        path = Path(output_dir).expanduser() / path
+    data = base64.b64encode(path.read_bytes()).decode("ascii")
+    return f"data:{_mime_type(path)};base64,{data}"
+
+
+def _mime_type(path: Path) -> str:
+    suffix = path.suffix.lower()
+    if suffix in {".jpg", ".jpeg"}:
+        return "image/jpeg"
+    if suffix == ".png":
+        return "image/png"
+    if suffix == ".webp":
+        return "image/webp"
+    return "application/octet-stream"
+
+
+def _post_json(
+    url: str,
+    payload: dict[str, Any],
+    timeout_seconds: int,
+) -> dict[str, Any]:
+    body = json.dumps(payload).encode("utf-8")
+    request = urllib.request.Request(
+        url,
+        data=body,
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
+        response_body = response.read().decode("utf-8")
+        return {
+            "status": response.status,
+            "body": json.loads(response_body) if response_body else {},
+        }
+
+
+def _extract_message_content(body: Any) -> str:
+    if not isinstance(body, dict):
+        return ""
+    choices = body.get("choices")
+    if not choices:
+        return ""
+    message = choices[0].get("message", {}) if isinstance(choices[0], dict) else {}
+    content = message.get("content") or ""  # treat a null content field as empty
+    if isinstance(content, str):
+        return content
+    return json.dumps(content, ensure_ascii=False)
--- a/video_ai_analysis_system_plan.md
+++ b/video_ai_analysis_system_plan.md
--- a/录像下载流程_1.pdf
+++ b/录像下载流程_1.pdf