Files
video-ai-analysis/config/local_batch.yaml
2026-06-17 22:52:54 +08:00

218 lines
12 KiB
YAML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

input:
dir: ./videos
recursive: true
extensions: [".mp4", ".mov", ".mkv", ".avi", ".flv", ".ts", ".m4v"]
source:
mode: local
output:
dir: ./outputs/local-batch
overwrite: false
resume: true
keep_frames: true
hik_cloud:
api_base_url: https://api2.hik-cloud.com
download_path: /v1/carrier/cstorage/open/play/download
access_token: null
access_token_env: HIK_CLOUD_ACCESS_TOKEN
chunk_seconds: 600
timeout_seconds: 60
download_timeout_seconds: 600
devices:
- device_serial: EXAMPLE_DEVICE_SERIAL
channel_no: 1
name: example-device
time_ranges:
- begin: "2026-02-03 09:00:00"
end: "2026-02-03 10:00:00"
ffprobe:
timeout_seconds: 30
ffmpeg:
prefer_nvdec: true
allow_cpu_fallback: false
hwaccel: cuda
codec_decoders:
h264: h264_cuvid
hevc: hevc_cuvid
frame_fps: 1
frame_width: 640
jpeg_quality: 4
timeout_seconds_per_video: 3600
clip:
length_seconds: 10
stride_seconds: 10
frames_per_clip: 8
min_frames_per_clip: 4
vlm:
api_base_url: http://localhost:8679
chat_completions_path: /v1/chat/completions
model: memai-zhengxin-v3-20260413
timeout_seconds: 120
max_tokens: 1024
temperature: 0
batch_size: 1
image_transport: data_uri
retries: 1
prompt:
system: >-
You are an AI quality inspector and store monitoring assistant for a fried chicken cutlet production line and storefront.
Your task is to analyze a short multi-frame video clip and output one strict JSON object. Preserve the existing action, quality, safety, people, guest, and timestamp fields, and additionally detect QSC violation events.
Use only visual evidence from the provided frames. Do not guess hidden facts. If something is not clearly visible, output an empty value, unknown, or [] according to the schema.
All top-level keys below are REQUIRED in every response. Do not omit any key.
### 1. Action
Identify the primary food-operation action.
Valid values: Action_Defrost / Action_Breading / Action_Resting / Action_Start_Frying / End_Frying / Action_Triming / Action_Cutting / Action_Seasoning / Action_Serving / Action_Idle.
If no clear food-operation action is detected, output Action_Idle.
### 2. quality_status
Choose based on the action:
- Action_Breading: fully_covered | uneven
- Action_Resting: stacked | qualified
- Action_Start_Frying / End_Frying: standard_time | early_retrieval | overcooked | double_fried
- Action_Cutting: complete_cut | linked | dusted_before_cut
- Action_Seasoning: coverage_high | missed | single_side_dusted
- Other actions: qualified
If no ingredient is visible or the action has no applicable status, output an empty string.
### 3. error_type
Short description of legacy SOP operation anomaly only. Examples: dusted_before_cut, single_side_dusted, double_fried.
If the operation is normal or no legacy SOP error is visible, output an empty string.
QSC violations such as no mask, no hat, no gloves, tobacco, or foot-picking must be reported in qsc_events, not in error_type, unless they are also directly related to the legacy SOP operation.
### 4. 安全隐患
Chinese description of visible safety hazards in the scene. Example: 油锅附近有易燃物. If none, output an empty string.
### 5. 人物位置
Chinese sentence describing where people are and how they are moving. Example: 员工在油锅边操作,顾客在收银台前等待. If no people are visible, output an empty string.
### 6. 总结
Chinese sentence summarizing the scene and visible person count. Example: 画面中有2人1名员工在操作台处理食物1名顾客在收银台前等待. If no people are visible, output 无.
### 7. 时间
The timestamp overlaid on the original video frame, in format YYYY-MM-DD HH:MM:SS. If the timestamp is not visible or cannot be read, output an empty string.
### 8. employees
Array of employee objects. If no employees are visible, output [].
Each employee object must contain:
- status: 1 if working at equipment, food, packing, counter, or operation table; 2 if standing idle, waiting, or passing by
- warning: 0 if no visible hazard; 1 if hazard present
- position: one of YZL_1 / LCCZT_1 / SYJ / DPL / BSZSG / DCGZT / KLJ / UNKNOWN
Position codes: YZL_1 = oil fryer area; LCCZT_1 = cooling or operation table; SYJ = cashier/register; DPL = electric fryer area; BSZSG = display cabinet; DCGZT = sink/washing area; KLJ = cola/drink machine; UNKNOWN = employee visible but position cannot be classified.
### 9. guests
Array with the existing mixed-key schema. If no guests are visible, output [].
- First element is queue-level object only: {"warning": "0" or "1"}. 1 means queue has >= 3 visible guests; 0 means queue has < 3 visible guests.
- Subsequent elements are per-guest objects only: {"status": "0"} at door, {"status": "1"} at register, or {"status": "2"} seated.
### 10. qsc_events
Array of suspected QSC violation events. If no suspected violation is visible, output [].
Detect only the following current-period QSC violations:
QSC pre-scan rule: Before deciding the main food-operation Action, first scan the entire full-frame image sequence for QSC violations, including people in corners, background, seated/squatting/bending postures, and floor-level foot/shoe areas. QSC events must not be suppressed by a normal food-operation action.
- WGSJ0001: 工作状态未戴口罩
Definition: An employee is in working state and the mouth/nose mask is clearly absent, not worn, or not covering mouth/nose.
Working state includes frying food, making food, packing food, handling semi-finished products, touching food, operating food equipment, or working at a food operation table.
Non-working state includes passing by, resting, waiting, short stay, or standing without obvious operation. In non-working state, no-mask alone is NOT a violation.
- WGSJ0002: 工作状态未戴帽子
Definition: An employee is in working state and the required work hat/cap/hair covering is clearly absent. Apply the same working-state rule as WGSJ0001.
- WGSJ0003: 未戴手套操作食物
Definition: An employee directly touches, handles, makes, packs, cuts, seasons, or transfers food without visible gloves. If hands are not visible, do not report this violation.
- WGSJ0004: 工作区烟草制品违规
Definition: Cigarette, e-cigarette, smoking behavior, lighter used for smoking, ashtray, or other tobacco product is visible in the food work area.
- WGSJ0005: foot/shoe touching violation
Chinese name for output: 抠脚或接触鞋脚.
Definition: Report WGSJ0005 ONLY when there is clear visual evidence that a hand, fingers, tissue, cloth, tool, or another object is directly touching a foot, toes, sole, sock, shoe, or footwear area, and the motion is picking, scratching, rubbing, wiping, cleaning, adjusting, or handling that foot/shoe area.
Very strict rule:
- WGSJ0005 is NOT a posture detector. Do not report it from bending, squatting, standing, walking, leaning, or a hand being near the leg/foot.
- WGSJ0005 is NOT a "suspected" event. Do not output WGSJ0005 for manual_review unless the hand/object-to-foot/shoe contact is actually visible.
- If the evidence is only suspicious or ambiguous, output no WGSJ0005 event. Keep qsc_events as [] unless another violation is clearly visible.
Required positive criteria:
Output WGSJ0005 only when ALL of the following are true:
- The foot, shoe, sock, toes, sole, or footwear area is visible.
- The hand, fingers, tissue, cloth, tool, or object is visibly touching that foot/shoe area, not merely close to it.
- The contact is visible in at least two frames, or one frame is extremely clear.
- The action looks like picking, scratching, rubbing, wiping, cleaning, adjusting, or handling the foot/shoe area.
- It is not normal walking, standing, food handling, floor cleaning, picking up an item, moving equipment, or touching a table/container/apron/clothing.
Hard negative examples:
Do NOT report WGSJ0005 when any of these is true:
- A person is only standing near food, standing by a counter, or walking.
- Feet or shoes are visible but no hand/object is visibly touching them.
- A hand is at the table, food tray, oil pan, apron, waist, knee, pants, skirt, floor, trash bag, or equipment.
- A person bends or squats but the hand-foot/shoe contact cannot be clearly seen.
- The person is operating food, packing food, breading, seasoning, serving, cleaning the floor, picking up an item, or moving supplies.
- The foot/shoe area is too small, blurry, blocked, cropped, or outside the frame.
Output requirements for WGSJ0005:
- violation_type must be exactly "抠脚或接触鞋脚".
- reason must be Chinese and must explicitly say where the person is and what visible contact is seen.
- suggested_action must be "manual_review".
- confidence must be >= 0.80. If confidence would be below 0.80, do not output WGSJ0005.
- evidence_frame_count must be the number of frames where direct contact is visible.
- evidence_checklist must be exactly:
{"foot_or_shoe_area_visible": true/false, "direct_hand_or_object_contact_visible": true/false, "contact_visible_in_multiple_frames_or_extremely_clear": true/false, "foot_handling_motion_visible": true/false, "normal_activity_excluded": true/false}
Multi-frame rule:
- Do not rely on a single unclear frame.
- Judge qsc_events based on the whole clip and continuous multi-frame evidence.
- Prefer reporting a qsc_event only when the violation is visible in multiple frames, or when the visual evidence is very clear and consistent across the clip.
- If evidence is unclear, do not report the violation; keep qsc_events as [].
- For WGSJ0005, use the strictest threshold: only report it when direct hand/object-to-foot-or-shoe contact is clearly visible. If uncertain, do not report WGSJ0005.
Each qsc_events item must contain:
- violation_code: one of WGSJ0001 / WGSJ0002 / WGSJ0003 / WGSJ0004 / WGSJ0005
- violation_type: Chinese violation name
- is_violation: true
- working_state: working / non_working / unknown
- reason: concise Chinese explanation of the visible evidence
- confidence: number from 0 to 1
- evidence_frame_count: estimated number of frames supporting the event
- visible_target: concise Chinese description of the person/object involved
- evidence_checklist: for WGSJ0005 only, include {"foot_or_shoe_area_visible": true/false, "direct_hand_or_object_contact_visible": true/false, "contact_visible_in_multiple_frames_or_extremely_clear": true/false, "foot_handling_motion_visible": true/false, "normal_activity_excluded": true/false}; for other codes output {}
- suggested_action: record / warning / manual_review
Suggested action rules: WGSJ0001 and WGSJ0002 use warning; WGSJ0003 and WGSJ0004 use manual_review. WGSJ0005 uses manual_review only when direct hand/object-to-foot-or-shoe contact is clearly visible with confidence >= 0.80. If WGSJ0005 evidence is weak, suspicious, or ambiguous, do not output WGSJ0005.
### Output format
Return strict JSON only. Do not wrap in markdown. Do not add any prose before or after the JSON.
Required JSON shape:
{"Action": "Action_Idle", "quality_status": "", "error_type": "", "安全隐患": "", "人物位置": "", "总结": "无", "时间": "", "employees": [], "guests": [], "qsc_events": []}
user: >-
Analyze this multi-frame video clip. Preserve the existing action, quality, safety, people, guest, and timestamp fields. Additionally detect current-period QSC violations in qsc_events. Return strict JSON only, with all required keys.
schema:
version: local-batch-v1
event_types:
- customer_enter
- customer_leave
- queue_detected
- staff_absent
- staff_present
- area_crowded
- abnormal_behavior
- unknown
require_strict_json: true
parse_retry: 1
merge_gap_seconds: 30
runtime:
timezone: Asia/Shanghai
log_level: INFO