Initial video AI analysis project
config/local_batch.yaml (new file, 173 lines)
@@ -0,0 +1,173 @@
input:
  dir: ./videos
  recursive: true
  extensions: [".mp4", ".mov", ".mkv", ".avi", ".flv", ".ts", ".m4v"]

source:
  mode: local

output:
  dir: ./outputs/local-batch
  overwrite: false
  resume: true
  keep_frames: true

hik_cloud:
  api_base_url: https://api2.hik-cloud.com
  download_path: /v1/carrier/cstorage/open/play/download
  access_token: null
  access_token_env: HIK_CLOUD_ACCESS_TOKEN
  chunk_seconds: 600
  timeout_seconds: 60
  download_timeout_seconds: 600
  devices:
    - device_serial: EXAMPLE_DEVICE_SERIAL
      channel_no: 1
      name: example-device
      time_ranges:
        - begin: "2026-02-03 09:00:00"
          end: "2026-02-03 10:00:00"

ffprobe:
  timeout_seconds: 30

ffmpeg:
  prefer_nvdec: true
  allow_cpu_fallback: false
  hwaccel: cuda
  codec_decoders:
    h264: h264_cuvid
    hevc: hevc_cuvid
  frame_fps: 1
  frame_width: 640
  jpeg_quality: 4
  timeout_seconds_per_video: 3600

clip:
  length_seconds: 10
  stride_seconds: 10
  frames_per_clip: 8
  min_frames_per_clip: 4

vlm:
  api_base_url: http://localhost:8679
  chat_completions_path: /v1/chat/completions
  model: memai-zhengxin-v3-20260413
  timeout_seconds: 120
  max_tokens: 512
  temperature: 0
  batch_size: 1
  image_transport: data_uri
  retries: 1

prompt:
  system: >-
    You are an AI quality inspector and store monitoring assistant for a fried chicken cutlet (鸡排) production line and storefront.
    Your task is to analyze a short video clip and output a structured JSON describing actions, quality statuses, errors, safety hazards, personnel (employees/guests), and the frame timestamp.


    All 9 top-level keys below are REQUIRED in every response. Use the specified empty-value convention when a field does not apply — never omit a key.


    ### 1. Action (REQUIRED)

    Identify the primary action. Use the "Action_" prefix on every label except End_Frying. If no action is detected, output "Action_Idle".

    Valid values: Action_Defrost / Action_Breading / Action_Resting / Action_Start_Frying / End_Frying / Action_Triming / Action_Cutting / Action_Seasoning / Action_Serving / Action_Idle.


    ### 2. quality_status (REQUIRED — "" if not applicable)

    Choose based on the action:

    - Action_Breading → fully_covered | uneven

    - Action_Resting → stacked | qualified

    - Action_Start_Frying / End_Frying → standard_time | early_retrieval | overcooked | double_fried

    - Action_Cutting → complete_cut | linked | dusted_before_cut

    - Action_Seasoning → coverage_high | missed | single_side_dusted

    - Other actions → qualified

    If no ingredient is visible or the action has no applicable status, output "".


    ### 3. error_type (REQUIRED — "" if no error)

    Short description of any anomaly. Examples: "smoking", "dusted_before_cut", "single_side_dusted", "double_fried". If the operation is normal, output "".


    ### 4. 安全隐患 (REQUIRED — "" if no hazard)

    Chinese description of any safety hazard visible in the scene (e.g., "油锅附近有易燃物"). If none, output "".


    ### 5. 人物位置 (REQUIRED — "" if no people)

    Descriptive Chinese sentence of where people are and how they are moving. Example: "员工在油锅边". If no one is in the frame, output "".


    ### 6. 总结 (REQUIRED — "无" if no people)

    Descriptive Chinese sentence summarizing the scene with the exact person count. Example: "员工在油锅边炸鸡,顾客在收银台前等待". If no one is in the frame, output "无".


    ### 7. 时间 (REQUIRED — "" if unreadable)

    The timestamp overlaid on the original video frame, in format "YYYY-MM-DD HH:MM:SS". If the timestamp is not visible or cannot be read, output "".


    ### 8. employees (REQUIRED — [] if none)

    Array of employee objects. Each object has ALL three keys:

    - status: "1" (working at equipment) or "2" (standing idle)

    - warning: "0" (no hazard) or "1" (hazard present)

    - position: one of YZL_1 (油锅边), LCCZT_1 (平冷操作台边), SYJ (收银机边), DPL (电扒炉旁), BSZSG (展示柜边), DCGZT (水池边), KLJ (可乐机边).

    If no employees are in the frame, output [].


    ### 9. guests (REQUIRED — [] if none, MIXED-KEY SCHEMA)

    Array with a specific mixed-key convention:

    - The FIRST element is a queue-level object with ONLY a "warning" key: {"warning": "0" or "1"}. "1" means the queue has ≥ 3 people; "0" means < 3.

    - Subsequent elements are per-guest objects with ONLY a "status" key: {"status": "0"} (at door) or {"status": "1"} (at register) or {"status": "2"} (seated). One such object per visible guest.

    If there are no guests at all, output []. If only the queue-level warning is known, output the queue object alone, e.g. [{"warning": "0"}] or [{"warning": "1"}].

    Example: [{"warning": "0"}, {"status": "1"}, {"status": "2"}]


    ### Output format (strict JSON, all 9 keys REQUIRED)

    {"Action": "<Action_Type>", "quality_status": "<status or empty>", "error_type": "<error or empty>", "安全隐患": "<hazard or empty>", "人物位置": "<location or empty>", "总结": "<summary or 无>", "时间": "<YYYY-MM-DD HH:MM:SS or empty>", "employees": [{"status": "<1 or 2>", "warning": "<0 or 1>", "position": "<code>"}], "guests": [{"warning": "<0 or 1>"}, {"status": "<0, 1, or 2>"}]}

    Do not wrap the JSON in markdown fences. Do not add any prose before or after the JSON.
  user: 'Analyze the video clip and return the required JSON with all 9 keys. Read the timestamp from the frame overlay into "时间".'

schema:
  version: local-batch-v1
  event_types:
    - customer_enter
    - customer_leave
    - queue_detected
    - staff_absent
    - staff_present
    - area_crowded
    - abnormal_behavior
    - unknown
  require_strict_json: true
  parse_retry: 1
  merge_gap_seconds: 30

runtime:
  timezone: Asia/Shanghai
  log_level: INFO
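The `input`, `source`, and `hik_cloud` sections only declare settings. The loader that consumes them is not part of this commit. The sketch below makes three assumptions. First, `source.mode: local` means "walk `input.dir`". Second, an inline `access_token` takes precedence over the variable named by `access_token_env`. Third, each `hik_cloud` time range is fetched in windows of at most `chunk_seconds`. All function names are hypothetical. The request parameters of the Hik-Cloud download endpoint are deliberately not modeled, because the file does not show them.

```python
import os
from datetime import datetime, timedelta
from pathlib import Path

# Matches the "YYYY-MM-DD HH:MM:SS" strings used in hik_cloud.devices[].time_ranges.
TIME_FMT = "%Y-%m-%d %H:%M:%S"


def discover_local_videos(root: str, recursive: bool, extensions: list[str]) -> list[Path]:
    """Hypothetical helper for source.mode == "local": list video files under input.dir."""
    wanted = {ext.lower() for ext in extensions}
    base = Path(root)
    candidates = base.rglob("*") if recursive else base.glob("*")
    # Case-insensitive suffix match, so "CLIP.MP4" is found as well as "clip.mp4".
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in wanted)


def resolve_access_token(hik_cfg: dict, environ=os.environ) -> str:
    """Hypothetical: prefer an inline access_token, else read the env var named by access_token_env."""
    token = hik_cfg.get("access_token") or environ.get(hik_cfg.get("access_token_env", ""))
    if not token:
        raise RuntimeError("no Hik-Cloud access token configured")
    return token


def split_time_range(begin: str, end: str, chunk_seconds: int) -> list[tuple[str, str]]:
    """Split one [begin, end) range into consecutive windows of at most chunk_seconds.

    Times are naive; this sketch assumes they are in runtime.timezone.
    """
    start = datetime.strptime(begin, TIME_FMT)
    stop = datetime.strptime(end, TIME_FMT)
    if stop <= start:
        raise ValueError(f"empty or inverted time range: {begin} .. {end}")
    step = timedelta(seconds=chunk_seconds)
    windows = []
    while start < stop:
        nxt = min(start + step, stop)  # the last window may be shorter than chunk_seconds
        windows.append((start.strftime(TIME_FMT), nxt.strftime(TIME_FMT)))
        start = nxt
    return windows


if __name__ == "__main__":
    for a, b in split_time_range("2026-02-03 09:00:00", "2026-02-03 10:00:00", 600):
        print(a, "->", b)
```

With the example device range in this file and `chunk_seconds: 600`, `split_time_range` yields six 10-minute windows, from 09:00:00 to 10:00:00.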
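The `ffprobe` and `ffmpeg` sections do not say how their keys become command-line flags. The sketch below shows one plausible mapping, which is not taken from the project's code. It uses the codec-specific NVDEC decoder from `codec_decoders` when `prefer_nvdec` is set. It refuses codecs without an entry when `allow_cpu_fallback` is false. It turns `frame_fps`, `frame_width`, and `jpeg_quality` into the `fps` and `scale` filters and the `-q:v` option. The function names are hypothetical. `hwaccel: cuda` is left unused here; it could instead map to `-hwaccel cuda` for FFmpeg's built-in decoders.

```python
import subprocess


def probe_codec(path: str, timeout_s: float) -> str:
    """Return the codec name of the first video stream (e.g. "h264") using ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True, timeout=timeout_s,
    )
    return result.stdout.strip()


def build_ffmpeg_argv(src: str, out_pattern: str, codec: str, cfg: dict, overwrite: bool) -> list[str]:
    """Hypothetical mapping of the ffmpeg section onto an ffmpeg command line."""
    argv = ["ffmpeg", "-hide_banner", "-loglevel", "error"]
    # -y overwrites existing frame files and -n refuses to (mirrors output.overwrite).
    argv.append("-y" if overwrite else "-n")

    decoder = cfg.get("codec_decoders", {}).get(codec)
    if cfg.get("prefer_nvdec", False):
        if decoder:
            # Decoder selection is an input option, so it must precede -i. Assumption: without
            # -hwaccel_output_format cuda the decoded frames reach the filter graph in system
            # memory, so the CPU fps/scale filters below can consume them.
            argv += ["-c:v", decoder]
        elif not cfg.get("allow_cpu_fallback", False):
            raise ValueError(f"no NVDEC decoder configured for {codec!r} and CPU fallback is disabled")
    # Otherwise FFmpeg picks its default software decoder.

    # fps=N keeps N frames per second; scale=W:-2 preserves aspect ratio with an even height.
    vf = f"fps={cfg['frame_fps']},scale={cfg['frame_width']}:-2"
    # For JPEG output, a lower -q:v value means higher quality.
    argv += ["-i", src, "-vf", vf, "-q:v", str(cfg["jpeg_quality"]), out_pattern]
    return argv


def extract_frames(argv: list[str], timeout_s: float) -> None:
    # timeout_seconds_per_video bounds the whole run; check=True raises on a non-zero exit code.
    subprocess.run(argv, check=True, timeout=timeout_s)
```

Under these assumptions, an H.264 input with the values in this file produces `ffmpeg -hide_banner -loglevel error -n -c:v h264_cuvid -i <src> -vf fps=1,scale=640:-2 -q:v 4 <pattern>`.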
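The `clip` section leaves open how clips are cut from the sampled frames. Assume frame *i* is captured at *i* / `frame_fps` seconds. Then a 10 s window at `frame_fps: 1` holds 10 frames, and `frames_per_clip: 8` of them are kept. A trailing window with fewer than `min_frames_per_clip: 4` frames is dropped. The hypothetical `plan_clips` below picks evenly spaced frames within each window. This is one possible sampler under those assumptions, not necessarily the project's.

```python
def plan_clips(num_frames: int, frame_fps: float, length_s: float, stride_s: float,
               frames_per_clip: int, min_frames: int) -> list[tuple[float, float, list[int]]]:
    """Hypothetical: group extracted frame indices into (start_s, end_s, indices) clips."""
    if frames_per_clip < 2 or stride_s <= 0:
        raise ValueError("need frames_per_clip >= 2 and stride_s > 0")
    duration = num_frames / frame_fps
    clips = []
    start = 0.0
    while start < duration:
        end = min(start + length_s, duration)
        # Frame i is assumed to be captured at i / frame_fps seconds.
        idx = [i for i in range(num_frames) if start <= i / frame_fps < end]
        if len(idx) > frames_per_clip:
            # Keep frames_per_clip evenly spaced frames, always including the first and last.
            step = (len(idx) - 1) / (frames_per_clip - 1)
            idx = [idx[round(k * step)] for k in range(frames_per_clip)]
        if len(idx) >= min_frames:
            clips.append((start, end, idx))
        start += stride_s  # stride == length gives back-to-back, non-overlapping clips
    return clips
```

For a 25-frame extraction with this file's values, `plan_clips(25, 1, 10, 10, 8, 4)` yields three clips. The first keeps frames `[0, 1, 3, 4, 5, 6, 8, 9]`. The last covers 20 s to 25 s and keeps all five of its frames.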
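The system prompt defines a nine-key output contract, and `schema.require_strict_json: true` suggests that responses are checked before use. The validator below is a minimal sketch of that check. The enumerations are copied from the prompt, including the `Action_Triming` spelling. `validate_vlm_response` and its messages are hypothetical, and the validator does not judge the free-text Chinese fields.

```python
import json
from datetime import datetime

REQUIRED_KEYS = ("Action", "quality_status", "error_type", "安全隐患", "人物位置",
                 "总结", "时间", "employees", "guests")
FRYING = {"standard_time", "early_retrieval", "overcooked", "double_fried"}
# Allowed quality_status values per action, as listed in the prompt; "" is always accepted.
QUALITY_BY_ACTION = {
    "Action_Breading": {"fully_covered", "uneven"},
    "Action_Resting": {"stacked", "qualified"},
    "Action_Start_Frying": FRYING,
    "End_Frying": FRYING,
    "Action_Cutting": {"complete_cut", "linked", "dusted_before_cut"},
    "Action_Seasoning": {"coverage_high", "missed", "single_side_dusted"},
}
OTHER_ACTIONS = {"Action_Defrost", "Action_Triming", "Action_Serving", "Action_Idle"}
POSITIONS = {"YZL_1", "LCCZT_1", "SYJ", "DPL", "BSZSG", "DCGZT", "KLJ"}


def validate_vlm_response(text: str) -> list[str]:
    """Hypothetical: return a list of contract violations (an empty list means valid)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not strict JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    errors = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in data]
    if errors:
        return errors

    action, qs = data["Action"], data["quality_status"]
    allowed = QUALITY_BY_ACTION.get(action, {"qualified"} if action in OTHER_ACTIONS else None)
    if allowed is None:
        errors.append(f"unknown Action: {action!r}")
    elif not isinstance(qs, str) or qs not in allowed | {""}:
        errors.append(f"quality_status {qs!r} not valid for {action}")

    if data["时间"]:
        try:
            datetime.strptime(data["时间"], "%Y-%m-%d %H:%M:%S")
        except (TypeError, ValueError):
            errors.append(f"bad 时间: {data['时间']!r}")

    emps = data["employees"]
    if not isinstance(emps, list):
        errors.append("employees must be an array")
    else:
        for i, emp in enumerate(emps):
            if not isinstance(emp, dict) or set(emp) != {"status", "warning", "position"}:
                errors.append(f"employees[{i}] must have exactly status, warning, position")
            elif emp["status"] not in ("1", "2") or emp["warning"] not in ("0", "1") \
                    or emp["position"] not in POSITIONS:
                errors.append(f"employees[{i}] has an out-of-range value")

    guests = data["guests"]
    if not isinstance(guests, list):
        errors.append("guests must be an array")
    elif guests:
        # Mixed-key schema: a queue-level warning object first, then one status object per guest.
        if guests[0] not in ({"warning": "0"}, {"warning": "1"}):
            errors.append('guests[0] must be {"warning": "0"} or {"warning": "1"}')
        for i, g in enumerate(guests[1:], start=1):
            if g not in ({"status": "0"}, {"status": "1"}, {"status": "2"}):
                errors.append(f"guests[{i}] must be a single status object")
    return errors
```

A non-empty result could drive the single retry implied by `schema.parse_retry: 1`.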
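`schema.merge_gap_seconds: 30` implies that per-clip detections are merged into longer events. The file does not say how the nine-key VLM output is mapped onto `schema.event_types`. The sketch below therefore starts from already-typed `(event_type, start_s, end_s)` tuples, a hypothetical intermediate format. It merges same-type intervals whose gap is at most the configured threshold.

```python
from itertools import groupby


def merge_events(events: list[tuple[str, float, float]], gap_s: float) -> list[tuple[str, float, float]]:
    """Hypothetical: merge same-type (event_type, start_s, end_s) intervals separated by <= gap_s."""
    merged = []
    by_type = sorted(events, key=lambda e: (e[0], e[1]))
    for etype, group in groupby(by_type, key=lambda e: e[0]):
        cur_start = cur_end = None
        for _, start, end in group:
            if cur_end is not None and start - cur_end <= gap_s:
                cur_end = max(cur_end, end)  # close enough: extend the open event
            else:
                if cur_end is not None:
                    merged.append((etype, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((etype, cur_start, cur_end))  # every group has at least one event
    # Report events in chronological order across all types.
    return sorted(merged, key=lambda e: (e[1], e[0]))
```

With a 30 s gap, two `queue_detected` clips at 0 to 10 s and 30 to 40 s merge into one 0 to 40 s event. A third clip starting at 100 s remains separate.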