Initial video AI analysis project

yangyl
2026-06-17 11:33:54 +08:00
commit ef0047af6d
35 changed files with 8613 additions and 0 deletions

@@ -0,0 +1,151 @@
# Hik Cloud Download Analysis Design
## Goal
Add Hik Cloud Storage recording download as a first-class video source for the existing video analysis pipeline. The implementation must support a configurable AccessToken, multiple devices, configurable date-time ranges, API requests sliced into chunks of at most one hour, and video downloads, and it must reuse the existing local analysis pipeline.
## Source Model
The pipeline keeps the existing local mode and adds a cloud mode:
```yaml
source:
  mode: local  # local | hik_cloud
```
`local` keeps the current folder discovery behavior. `hik_cloud` runs a download stage first, then analyzes the downloaded files exactly like local files.
## Hik Cloud Configuration
The config should allow a literal token for controlled testing and an environment variable for normal use:
```yaml
hik_cloud:
  api_base_url: https://api2.hik-cloud.com
  download_path: /v1/carrier/cstorage/open/play/download
  access_token: null
  access_token_env: HIK_CLOUD_ACCESS_TOKEN
  chunk_seconds: 600
  timeout_seconds: 60
  download_timeout_seconds: 600
  devices:
    - device_serial: EXAMPLE_DEVICE_SERIAL
      channel_no: 1
      name: store-front
  time_ranges:
    - begin: "2026-02-03 09:00:00"
      end: "2026-02-03 11:30:00"
```
The implementation must not print or persist the token. Manifest entries may record the API URL path, device serial, channel, requested times, actual times, and status, but not the Authorization header.
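The token lookup described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `resolve_access_token`, the `ConfigError` class, and the choice to prefer the literal `access_token` over the environment variable are all assumptions.

```python
import os
from typing import Mapping, Optional


class ConfigError(ValueError):
    """Raised when the hik_cloud config is unusable (hypothetical name)."""


def resolve_access_token(hik_cfg: Mapping[str, object],
                         environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the literal token if set, else the value of the configured env var."""
    environ = os.environ if environ is None else environ
    literal = hik_cfg.get("access_token")
    if literal:
        # Assumption: a literal token (controlled testing) wins over the env var.
        return str(literal)
    env_name = str(hik_cfg.get("access_token_env") or "HIK_CLOUD_ACCESS_TOKEN")
    token = environ.get(env_name, "").strip()
    if not token:
        # The message names the variable but never echoes a token value.
        raise ConfigError(
            f"hik_cloud mode requires access_token or the {env_name} environment variable"
        )
    return token
```

Passing `environ` explicitly keeps the function testable without touching the real process environment.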
## Time Handling
The user-facing time range includes year, month, day, hour, minute, and second. The config supports both `YYYY-MM-DD HH:MM:SS` strings and integer epoch seconds. String parsing uses `runtime.timezone`, defaulting to `Asia/Shanghai`, and converts to Unix seconds for `timeBegin` and `timeEnd`.
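A minimal sketch of the conversion, assuming the standard-library `zoneinfo` module is available (Python 3.9+, with time zone data installed). The function name `to_epoch_seconds` is hypothetical.

```python
from datetime import datetime
from typing import Union
from zoneinfo import ZoneInfo


def to_epoch_seconds(value: Union[int, str], tz_name: str = "Asia/Shanghai") -> int:
    """Convert a config time value to Unix seconds for timeBegin/timeEnd."""
    if isinstance(value, bool):
        # bool is a subclass of int; reject it so `true` is not read as 1.
        raise TypeError("time value must be an epoch integer or a date-time string")
    if isinstance(value, int):
        return value  # already epoch seconds
    # Interpret the wall-clock string in the configured runtime timezone.
    local = datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=ZoneInfo(tz_name))
    return int(local.timestamp())
```

With the default timezone, `"2026-02-03 09:00:00"` is 01:00:00 UTC on the same day, which is `1770080400`.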
Ranges are split into chunks with `end - begin <= 3600` because the PDF documents error `80430002` when the requested interval exceeds 3600 seconds. The example config defaults `chunk_seconds` to 600 because a smoke test against the real remote API found that shorter chunks produced valid, probeable MP4 files for the provided test range.
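The splitting rule can be sketched as below. The name `split_range` is hypothetical, and treating adjacent chunks as sharing their boundary second (one chunk's end equals the next chunk's begin) is an assumption that the design does not spell out.

```python
from typing import List, Tuple

MAX_API_SPAN_SECONDS = 3600  # longer requests fail with error 80430002


def split_range(begin: int, end: int, chunk_seconds: int = 600) -> List[Tuple[int, int]]:
    """Split [begin, end) epoch seconds into consecutive chunks of at most chunk_seconds."""
    if end <= begin:
        raise ValueError("invalid range: end must be greater than begin")
    if not 0 < chunk_seconds <= MAX_API_SPAN_SECONDS:
        raise ValueError("chunk_seconds must be between 1 and 3600")
    chunks = []
    cursor = begin
    while cursor < end:
        chunk_end = min(cursor + chunk_seconds, end)  # last chunk may be shorter
        chunks.append((cursor, chunk_end))
        cursor = chunk_end
    return chunks
```

For the example range 09:00:00 to 11:30:00 (9000 seconds), the 600-second default yields 15 chunks, while `chunk_seconds=3600` yields three chunks, the last one 1800 seconds long.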
## API Contract
Use the PDF section “2、获取录像下载地址” ("2. Get the recording download address"):
```text
POST https://api2.hik-cloud.com/v1/carrier/cstorage/open/play/download
Authorization: bearer <AccessToken>
Content-Type: application/json
```
Request body:
```json
{
  "deviceSerial": "EXAMPLE_DEVICE_SERIAL",
  "channelNo": 1,
  "timeBegin": 1764856787,
  "timeEnd": 1764856978
}
```
Successful response:
```json
{
  "code": 0,
  "data": {
    "url": "https://...",
    "actualBeginTime": "1764856787",
    "actualEndTime": "1764856978"
  },
  "success": true
}
```
Non-zero codes become structured failures. `80438027` is treated as `no_recording` so one empty chunk does not stop the batch.
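One way to turn a parsed response into a structured result is sketched here. The function name `classify_address_response` and the shape of the returned dict are illustrative assumptions; only the codes and status names come from this design.

```python
from typing import Any, Dict

NO_RECORDING_CODE = "80438027"


def classify_address_response(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Map a download-address response to a status plus the fields worth keeping."""
    code = str(payload.get("code"))  # compare as text in case the code arrives as a string
    if code == "0":
        data = payload.get("data") or {}
        return {
            "status": "address_ok",
            "url": data.get("url"),
            # The API returns the actual times as strings; store them as integers.
            "actual_begin": int(data["actualBeginTime"]),
            "actual_end": int(data["actualEndTime"]),
        }
    status = "no_recording" if code == NO_RECORDING_CODE else "address_failed"
    return {"status": status, "last_error": f"api code {code}"}
```

Because failures are returned rather than raised, the caller can record the row and move on to the next chunk.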
## Output Contract
Cloud downloads write a dedicated manifest:
```text
<output.dir>/hik_cloud_download_manifest.jsonl
```
Each row contains:
- `source: hik_cloud`
- `device_serial`
- `channel_no`
- `requested_begin`, `requested_end`
- `actual_begin`, `actual_end`
- `download_url_host`, or no URL field at all if persisting the host is undesirable
- `path` for downloaded video
- `status`: `address_ok`, `downloaded`, `no_recording`, `address_failed`, `download_failed`
- `retry_count`, `last_error`
Downloaded videos go under:
```text
<output.dir>/downloads/hik_cloud/<device_serial>/ch<channel_no>/
```
Filenames use device/channel/requested timestamps and never include URL query signatures or tokens.
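The path and manifest rules above could look like this. Both helper names and the exact filename pattern are assumptions made for illustration; the directory layout and field names follow this section.

```python
import json
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit


def chunk_video_path(output_dir: str, device_serial: str, channel_no: int,
                     requested_begin: int, requested_end: int) -> Path:
    """Build the download path from device, channel, and requested times only."""
    # Hypothetical naming scheme; nothing from the signed URL is used.
    name = f"{device_serial}_ch{channel_no}_{requested_begin}_{requested_end}.mp4"
    return (Path(output_dir) / "downloads" / "hik_cloud"
            / device_serial / f"ch{channel_no}" / name)


def manifest_line(device_serial: str, channel_no: int, requested_begin: int,
                  requested_end: int, status: str, *,
                  actual_begin: Optional[int] = None, actual_end: Optional[int] = None,
                  download_url: Optional[str] = None, path: Optional[str] = None,
                  retry_count: int = 0, last_error: Optional[str] = None) -> str:
    """Serialize one JSONL row; only the URL host is kept, never the query string."""
    row = {
        "source": "hik_cloud",
        "device_serial": device_serial,
        "channel_no": channel_no,
        "requested_begin": requested_begin,
        "requested_end": requested_end,
        "actual_begin": actual_begin,
        "actual_end": actual_end,
        "path": path,
        "status": status,
        "retry_count": retry_count,
        "last_error": last_error,
    }
    if download_url:
        row["download_url_host"] = urlsplit(download_url).hostname
    return json.dumps(row, ensure_ascii=False)
```

Keeping the URL out of the row entirely means a signed query string cannot leak through the manifest by accident.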
## Pipeline Integration
`cli.py` should branch only at source acquisition:
```text
local mode:
  discover local videos -> probe -> frames -> clips -> inference -> aggregate
hik_cloud mode:
  build chunks -> request download URLs -> download videos -> probe -> frames -> clips -> inference -> aggregate
```
After downloads complete, the rest of the pipeline should consume downloaded file paths and preserve cloud metadata in `video_manifest.jsonl`.
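A sketch of the single branch point, assuming the two acquisition steps are passed in as callables. The names `acquire_videos`, `discover_local`, and `download_cloud` are hypothetical, as is falling back to `local` when `source.mode` is absent.

```python
from typing import Any, Callable, Dict, List


def acquire_videos(config: Dict[str, Any],
                   discover_local: Callable[[Dict[str, Any]], List[str]],
                   download_cloud: Callable[[Dict[str, Any]], List[str]]) -> List[str]:
    """Return the video paths that the shared probe/frames/clips stages will consume."""
    mode = (config.get("source") or {}).get("mode", "local")
    if mode == "local":
        return discover_local(config)
    if mode == "hik_cloud":
        # The download stage runs first; later stages see ordinary file paths.
        return download_cloud(config)
    raise ValueError(f"unknown source.mode: {mode!r}")
```

Everything after this call stays identical for both modes, which is what keeps the local behavior unchanged.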
FFmpeg sampling caps the number of output frames at a limit derived from the requested/actual cloud chunk duration. This prevents malformed or irregular Hik MP4 timestamps from making the `fps=1` filter duplicate tens of thousands of frames for a 10-minute chunk.
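The cap can be expressed with FFmpeg's `-frames:v` output option, as in the sketch below. The helper name `sampling_args` and the exact cap formula (chunk duration times sampling rate, rounded up) are assumptions; the design only states that the cap is derived from the chunk duration.

```python
import math
from typing import List


def sampling_args(input_path: str, output_pattern: str,
                  chunk_seconds: float, fps: float = 1.0) -> List[str]:
    """Build an ffmpeg command that samples at `fps` and stops after a bounded frame count."""
    # Assumed formula: a chunk cannot legitimately yield more than duration * fps frames.
    max_frames = max(1, math.ceil(chunk_seconds * fps))
    return [
        "ffmpeg", "-i", input_path,
        "-vf", f"fps={fps:g}",
        "-frames:v", str(max_frames),  # hard stop even if timestamps are irregular
        output_pattern,
    ]
```

For a 600-second chunk sampled at `fps=1`, the command stops after at most 600 frames.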
Cloud `--dry-run` stops at download-address planning: it requests addresses and writes `hik_cloud_download_manifest.jsonl`, but does not download video files, run ffprobe, sample frames, infer, or aggregate.
## Error Handling
- Missing token: fail fast with a clear config error in `hik_cloud` mode.
- Invalid range: fail fast if `end <= begin`.
- API code 80438027: record `no_recording`, continue.
- Other API non-zero code: record `address_failed`, continue other chunks.
- Download HTTP/IO failure: record `download_failed`, continue other chunks.
- Existing downloaded file with manifest status `downloaded`: skip on resume.
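The resume rule in the last bullet can be sketched as a lookup over existing manifest rows. The function name `completed_chunks` and the key made of device, channel, and requested times are illustrative assumptions.

```python
import os
from typing import Any, Callable, Dict, Iterable, Set, Tuple

ChunkKey = Tuple[str, int, int, int]


def completed_chunks(rows: Iterable[Dict[str, Any]],
                     exists: Callable[[str], bool] = os.path.exists) -> Set[ChunkKey]:
    """Return keys of chunks that are marked downloaded and whose file is still on disk."""
    done: Set[ChunkKey] = set()
    for row in rows:
        if row.get("status") != "downloaded":
            continue  # failed and no_recording chunks remain eligible for retry
        path = row.get("path")
        if path and exists(path):
            done.add((row["device_serial"], row["channel_no"],
                      row["requested_begin"], row["requested_end"]))
    return done
```

A chunk whose key is in the returned set is skipped; any other chunk is requested again.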
## Testing
Use TDD with standard-library mocks:
- config parser loads `devices` as list of dicts.
- time parser accepts date-time strings and epoch integers.
- splitter produces max-3600-second chunks.
- API client builds correct URL, body, bearer header, and parses success/failure.
- downloader writes bytes and manifest without persisting token.
- CLI cloud mode uses downloaded files and keeps local mode unchanged.
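As an illustration of the API client bullet, the test below drives a small client with `unittest.mock` instead of the network. The function `request_download_address` and its `opener` parameter are hypothetical stand-ins for whatever the real client exposes, and the token is a dummy value.

```python
import json
import unittest
import urllib.request
from unittest import mock

API_URL = "https://api2.hik-cloud.com/v1/carrier/cstorage/open/play/download"


def request_download_address(token, body, opener=urllib.request.urlopen, timeout=60):
    """Hypothetical client under test: POST the JSON body and return the parsed reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with opener(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


class RequestDownloadAddressTest(unittest.TestCase):
    def test_builds_request_and_parses_success(self):
        # MagicMock supports the context-manager protocol used by `with opener(...)`.
        fake_response = mock.MagicMock()
        fake_response.__enter__.return_value.read.return_value = (
            b'{"code": 0, "data": {"url": "https://example.invalid/v.mp4"}}')
        opener = mock.Mock(return_value=fake_response)
        body = {"deviceSerial": "EXAMPLE_DEVICE_SERIAL", "channelNo": 1,
                "timeBegin": 1764856787, "timeEnd": 1764856978}

        payload = request_download_address("dummy-token", body, opener=opener)

        sent = opener.call_args.args[0]  # the Request object passed to the opener
        self.assertEqual(sent.get_method(), "POST")
        self.assertEqual(sent.full_url, API_URL)
        self.assertEqual(sent.get_header("Authorization"), "bearer dummy-token")
        self.assertEqual(json.loads(sent.data), body)
        self.assertEqual(payload["code"], 0)
```

Injecting the opener keeps the test offline and lets a failure case reuse the same pattern with a non-zero `code` in the fake body.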
The real Hik API smoke test uses the sensitive `access_token.md` file provided by the user on the remote test environment. Do not copy values from that file into docs, tests, logs, or final responses.