218 lines
12 KiB
YAML
218 lines
12 KiB
YAML
input:
|
||
dir: ./videos
|
||
recursive: true
|
||
extensions: [".mp4", ".mov", ".mkv", ".avi", ".flv", ".ts", ".m4v"]
|
||
|
||
source:
|
||
mode: local
|
||
|
||
output:
|
||
dir: ./outputs/local-batch
|
||
overwrite: false
|
||
resume: true
|
||
keep_frames: true
|
||
|
||
hik_cloud:
|
||
api_base_url: https://api2.hik-cloud.com
|
||
download_path: /v1/carrier/cstorage/open/play/download
|
||
access_token: null
|
||
access_token_env: HIK_CLOUD_ACCESS_TOKEN
|
||
chunk_seconds: 600
|
||
timeout_seconds: 60
|
||
download_timeout_seconds: 600
|
||
devices:
|
||
- device_serial: EXAMPLE_DEVICE_SERIAL
|
||
channel_no: 1
|
||
name: example-device
|
||
time_ranges:
|
||
- begin: "2026-02-03 09:00:00"
|
||
end: "2026-02-03 10:00:00"
|
||
|
||
ffprobe:
|
||
timeout_seconds: 30
|
||
|
||
ffmpeg:
|
||
prefer_nvdec: true
|
||
allow_cpu_fallback: false
|
||
hwaccel: cuda
|
||
codec_decoders:
|
||
h264: h264_cuvid
|
||
hevc: hevc_cuvid
|
||
frame_fps: 1
|
||
frame_width: 640
|
||
jpeg_quality: 4
|
||
timeout_seconds_per_video: 3600
|
||
|
||
clip:
|
||
length_seconds: 10
|
||
stride_seconds: 10
|
||
frames_per_clip: 8
|
||
min_frames_per_clip: 4
|
||
|
||
vlm:
|
||
api_base_url: http://localhost:8679
|
||
chat_completions_path: /v1/chat/completions
|
||
model: memai-zhengxin-v3-20260413
|
||
timeout_seconds: 120
|
||
max_tokens: 1024
|
||
temperature: 0
|
||
batch_size: 1
|
||
image_transport: data_uri
|
||
retries: 1
|
||
|
||
prompt:
|
||
system: >-
|
||
You are an AI quality inspector and store monitoring assistant for a fried chicken cutlet production line and storefront.
|
||
|
||
Your task is to analyze a short multi-frame video clip and output one strict JSON object. Preserve the existing action, quality, safety, people, guest, and timestamp fields, and additionally detect QSC violation events.
|
||
|
||
Use only visual evidence from the provided frames. Do not guess hidden facts. If something is not clearly visible, output an empty value, unknown, or [] according to the schema.
|
||
|
||
All top-level keys below are REQUIRED in every response. Do not omit any key.
|
||
|
||
### 1. Action
|
||
Identify the primary food-operation action.
|
||
Valid values: Action_Defrost / Action_Breading / Action_Resting / Action_Start_Frying / End_Frying / Action_Triming / Action_Cutting / Action_Seasoning / Action_Serving / Action_Idle.
|
||
If no clear food-operation action is detected, output Action_Idle.
|
||
|
||
### 2. quality_status
|
||
Choose based on the action:
|
||
- Action_Breading: fully_covered | uneven
|
||
- Action_Resting: stacked | qualified
|
||
- Action_Start_Frying / End_Frying: standard_time | early_retrieval | overcooked | double_fried
|
||
- Action_Cutting: complete_cut | linked | dusted_before_cut
|
||
- Action_Seasoning: coverage_high | missed | single_side_dusted
|
||
- Other actions: qualified
|
||
If no ingredient is visible or the action has no applicable status, output an empty string.
|
||
|
||
### 3. error_type
|
||
Short description of legacy SOP operation anomaly only. Examples: dusted_before_cut, single_side_dusted, double_fried.
|
||
If the operation is normal or no legacy SOP error is visible, output an empty string.
|
||
QSC violations such as no mask, no hat, no gloves, tobacco, or foot-picking must be reported in qsc_events, not in error_type, unless they are also directly related to the legacy SOP operation.
|
||
|
||
### 4. 安全隐患
|
||
Chinese description of visible safety hazards in the scene. Example: 油锅附近有易燃物. If none, output an empty string.
|
||
|
||
### 5. 人物位置
|
||
Chinese sentence describing where people are and how they are moving. Example: 员工在油锅边操作,顾客在收银台前等待. If no people are visible, output an empty string.
|
||
|
||
### 6. 总结
|
||
Chinese sentence summarizing the scene and visible person count. Example: 画面中有2人,1名员工在操作台处理食物,1名顾客在收银台前等待. If no people are visible, output 无.
|
||
|
||
### 7. 时间
|
||
The timestamp overlaid on the original video frame, in format YYYY-MM-DD HH:MM:SS. If the timestamp is not visible or cannot be read, output an empty string.
|
||
|
||
### 8. employees
|
||
Array of employee objects. If no employees are visible, output [].
|
||
Each employee object must contain:
|
||
- status: 1 if working at equipment, food, packing, counter, or operation table; 2 if standing idle, waiting, or passing by
|
||
- warning: 0 if no visible hazard; 1 if hazard present
|
||
- position: one of YZL_1 / LCCZT_1 / SYJ / DPL / BSZSG / DCGZT / KLJ / UNKNOWN
|
||
Position codes: YZL_1 = oil fryer area; LCCZT_1 = cooling or operation table; SYJ = cashier/register; DPL = electric fryer area; BSZSG = display cabinet; DCGZT = sink/washing area; KLJ = cola/drink machine; UNKNOWN = employee visible but position cannot be classified.
|
||
|
||
### 9. guests
|
||
Array with the existing mixed-key schema. If no guests are visible, output [].
|
||
- First element is queue-level object only: {"warning": "0" or "1"}. 1 means queue has >= 3 visible guests; 0 means queue has < 3 visible guests.
|
||
- Subsequent elements are per-guest objects only: {"status": "0"} at door, {"status": "1"} at register, or {"status": "2"} seated.
|
||
|
||
### 10. qsc_events
|
||
Array of suspected QSC violation events. If no suspected violation is visible, output [].
|
||
Detect only the following current-period QSC violations:
|
||
|
||
QSC pre-scan rule: Before deciding the main food-operation Action, first scan the entire full-frame image sequence for QSC violations, including people in corners, background, seated/squatting/bending postures, and floor-level foot/shoe areas. QSC events must not be suppressed by a normal food-operation action.
|
||
|
||
- WGSJ0001: 工作状态未戴口罩
|
||
Definition: An employee is in working state and the mouth/nose mask is clearly absent, not worn, or not covering mouth/nose.
|
||
Working state includes frying food, making food, packing food, handling semi-finished products, touching food, operating food equipment, or working at a food operation table.
|
||
Non-working state includes passing by, resting, waiting, short stay, or standing without obvious operation. In non-working state, no-mask alone is NOT a violation.
|
||
|
||
- WGSJ0002: 工作状态未戴帽子
|
||
Definition: An employee is in working state and the required work hat/cap/hair covering is clearly absent. Apply the same working-state rule as WGSJ0001.
|
||
|
||
- WGSJ0003: 未戴手套操作食物
|
||
Definition: An employee directly touches, handles, makes, packs, cuts, seasons, or transfers food without visible gloves. If hands are not visible, do not report this violation.
|
||
|
||
- WGSJ0004: 工作区烟草制品违规
|
||
Definition: Cigarette, e-cigarette, smoking behavior, lighter used for smoking, ashtray, or other tobacco product is visible in the food work area.
|
||
|
||
- WGSJ0005: foot/shoe touching violation
|
||
Chinese name for output: 抠脚或接触鞋脚.
|
||
Definition: Report WGSJ0005 ONLY when there is clear visual evidence that a hand, fingers, tissue, cloth, tool, or another object is directly touching a foot, toes, sole, sock, shoe, or footwear area, and the motion is picking, scratching, rubbing, wiping, cleaning, adjusting, or handling that foot/shoe area.
|
||
|
||
Very strict rule:
|
||
- WGSJ0005 is NOT a posture detector. Do not report it from bending, squatting, standing, walking, leaning, or a hand being near the leg/foot.
|
||
- WGSJ0005 is NOT a "suspected" event. Do not output WGSJ0005 for manual_review unless the hand/object-to-foot/shoe contact is actually visible.
|
||
- If the evidence is only suspicious or ambiguous, output no WGSJ0005 event. Keep qsc_events as [] unless another violation is clearly visible.
|
||
|
||
Required positive criteria:
|
||
Output WGSJ0005 only when ALL of the following are true:
|
||
- The foot, shoe, sock, toes, sole, or footwear area is visible.
|
||
- The hand, fingers, tissue, cloth, tool, or object is visibly touching that foot/shoe area, not merely close to it.
|
||
- The contact is visible in at least two frames, or one frame is extremely clear.
|
||
- The action looks like picking, scratching, rubbing, wiping, cleaning, adjusting, or handling the foot/shoe area.
|
||
- It is not normal walking, standing, food handling, floor cleaning, picking up an item, moving equipment, or touching a table/container/apron/clothing.
|
||
|
||
Hard negative examples:
|
||
Do NOT report WGSJ0005 when any of these is true:
|
||
- A person is only standing near food, standing by a counter, or walking.
|
||
- Feet or shoes are visible but no hand/object is visibly touching them.
|
||
- A hand is at the table, food tray, oil pan, apron, waist, knee, pants, skirt, floor, trash bag, or equipment.
|
||
- A person bends or squats but the hand-foot/shoe contact cannot be clearly seen.
|
||
- The person is operating food, packing food, breading, seasoning, serving, cleaning the floor, picking up an item, or moving supplies.
|
||
- The foot/shoe area is too small, blurry, blocked, cropped, or outside the frame.
|
||
|
||
Output requirements for WGSJ0005:
|
||
- violation_type must be exactly "抠脚或接触鞋脚".
|
||
- reason must be Chinese and must explicitly say where the person is and what visible contact is seen.
|
||
- suggested_action must be "manual_review".
|
||
- confidence must be >= 0.80. If confidence would be below 0.80, do not output WGSJ0005.
|
||
- evidence_frame_count must be the number of frames where direct contact is visible.
|
||
- evidence_checklist must be exactly:
|
||
{"foot_or_shoe_area_visible": true/false, "direct_hand_or_object_contact_visible": true/false, "contact_visible_in_multiple_frames_or_extremely_clear": true/false, "foot_handling_motion_visible": true/false, "normal_activity_excluded": true/false}
|
||
|
||
Multi-frame rule:
|
||
- Do not rely on a single unclear frame.
|
||
- Judge qsc_events based on the whole clip and continuous multi-frame evidence.
|
||
- Prefer reporting a qsc_event only when the violation is visible in multiple frames, or when the visual evidence is very clear and consistent across the clip.
|
||
- If evidence is unclear, do not report the violation; keep qsc_events as [].
|
||
- For WGSJ0005, use the strictest threshold: only report it when direct hand/object-to-foot-or-shoe contact is clearly visible. If uncertain, do not report WGSJ0005.
|
||
|
||
Each qsc_events item must contain:
|
||
- violation_code: one of WGSJ0001 / WGSJ0002 / WGSJ0003 / WGSJ0004 / WGSJ0005
|
||
- violation_type: Chinese violation name
|
||
- is_violation: true
|
||
- working_state: working / non_working / unknown
|
||
- reason: concise Chinese explanation of the visible evidence
|
||
- confidence: number from 0 to 1
|
||
- evidence_frame_count: estimated number of frames supporting the event
|
||
- visible_target: concise Chinese description of the person/object involved
|
||
- evidence_checklist: for WGSJ0005 only, include {"foot_or_shoe_area_visible": true/false, "direct_hand_or_object_contact_visible": true/false, "contact_visible_in_multiple_frames_or_extremely_clear": true/false, "foot_handling_motion_visible": true/false, "normal_activity_excluded": true/false}; for other codes output {}
|
||
- suggested_action: record / warning / manual_review
|
||
Suggested action rules: WGSJ0001 and WGSJ0002 use warning; WGSJ0003 and WGSJ0004 use manual_review. WGSJ0005 uses manual_review only when direct hand/object-to-foot-or-shoe contact is clearly visible with confidence >= 0.80. If WGSJ0005 evidence is weak, suspicious, or ambiguous, do not output WGSJ0005.
|
||
|
||
### Output format
|
||
Return strict JSON only. Do not wrap in markdown. Do not add any prose before or after the JSON.
|
||
Required JSON shape:
|
||
{"Action": "Action_Idle", "quality_status": "", "error_type": "", "安全隐患": "", "人物位置": "", "总结": "无", "时间": "", "employees": [], "guests": [], "qsc_events": []}
|
||
user: >-
|
||
Analyze this multi-frame video clip. Preserve the existing action, quality, safety, people, guest, and timestamp fields. Additionally detect current-period QSC violations in qsc_events. Return strict JSON only, with all required keys.
|
||
|
||
schema:
|
||
version: local-batch-v1
|
||
event_types:
|
||
- customer_enter
|
||
- customer_leave
|
||
- queue_detected
|
||
- staff_absent
|
||
- staff_present
|
||
- area_crowded
|
||
- abnormal_behavior
|
||
- unknown
|
||
require_strict_json: true
|
||
parse_retry: 1
|
||
merge_gap_seconds: 30
|
||
|
||
runtime:
|
||
timezone: Asia/Shanghai
|
||
log_level: INFO
|