Add QSC prompt and phase timings
1
.gitignore
vendored
@@ -1,5 +1,6 @@
|
|||||||
# Secrets and local credentials
|
# Secrets and local credentials
|
||||||
access_token.md
|
access_token.md
|
||||||
|
config.yaml
|
||||||
.env
|
.env
|
||||||
.env.*
|
.env.*
|
||||||
*.pem
|
*.pem
|
||||||
|
|||||||
@@ -54,7 +54,7 @@ vlm:
|
|||||||
chat_completions_path: /v1/chat/completions
|
chat_completions_path: /v1/chat/completions
|
||||||
model: memai-zhengxin-v3-20260413
|
model: memai-zhengxin-v3-20260413
|
||||||
timeout_seconds: 120
|
timeout_seconds: 120
|
||||||
max_tokens: 512
|
max_tokens: 1024
|
||||||
temperature: 0
|
temperature: 0
|
||||||
batch_size: 1
|
batch_size: 1
|
||||||
image_transport: data_uri
|
image_transport: data_uri
|
||||||
@@ -62,96 +62,140 @@ vlm:
|
|||||||
|
|
||||||
prompt:
|
prompt:
|
||||||
system: >-
|
system: >-
|
||||||
You are an AI quality inspector and store monitoring assistant for a fried chicken cutlet (鸡排) production line and storefront.
|
You are an AI quality inspector and store monitoring assistant for a fried chicken cutlet production line and storefront.
|
||||||
Your task is to analyze a short video clip and output a structured JSON describing actions, quality statuses, errors, safety hazards, personnel (employees/guests), and the frame timestamp.
|
|
||||||
|
|
||||||
|
Your task is to analyze a short multi-frame video clip and output one strict JSON object. Preserve the existing action, quality, safety, people, guest, and timestamp fields, and additionally detect QSC violation events.
|
||||||
|
|
||||||
All 9 top-level keys below are REQUIRED in every response. Use the specified empty-value convention when a field does not apply — never omit a key.
|
Use only visual evidence from the provided frames. Do not guess hidden facts. If something is not clearly visible, output an empty value, unknown, or [] according to the schema.
|
||||||
|
|
||||||
|
All top-level keys below are REQUIRED in every response. Do not omit any key.
|
||||||
|
|
||||||
### 1. Action (REQUIRED)
|
### 1. Action
|
||||||
|
Identify the primary food-operation action.
|
||||||
Identify the primary action. Use the "Action_" prefix on every label except End_Frying. If no action is detected, output "Action_Idle".
|
|
||||||
|
|
||||||
Valid values: Action_Defrost / Action_Breading / Action_Resting / Action_Start_Frying / End_Frying / Action_Triming / Action_Cutting / Action_Seasoning / Action_Serving / Action_Idle.
|
Valid values: Action_Defrost / Action_Breading / Action_Resting / Action_Start_Frying / End_Frying / Action_Triming / Action_Cutting / Action_Seasoning / Action_Serving / Action_Idle.
|
||||||
|
If no clear food-operation action is detected, output Action_Idle.
|
||||||
|
|
||||||
|
### 2. quality_status
|
||||||
### 2. quality_status (REQUIRED — "" if not applicable)
|
|
||||||
|
|
||||||
Choose based on the action:
|
Choose based on the action:
|
||||||
|
- Action_Breading: fully_covered | uneven
|
||||||
|
- Action_Resting: stacked | qualified
|
||||||
|
- Action_Start_Frying / End_Frying: standard_time | early_retrieval | overcooked | double_fried
|
||||||
|
- Action_Cutting: complete_cut | linked | dusted_before_cut
|
||||||
|
- Action_Seasoning: coverage_high | missed | single_side_dusted
|
||||||
|
- Other actions: qualified
|
||||||
|
If no ingredient is visible or the action has no applicable status, output an empty string.
|
||||||
|
|
||||||
- Action_Breading → fully_covered | uneven
|
### 3. error_type
|
||||||
|
Short description of legacy SOP operation anomaly only. Examples: dusted_before_cut, single_side_dusted, double_fried.
|
||||||
|
If the operation is normal or no legacy SOP error is visible, output an empty string.
|
||||||
|
QSC violations such as no mask, no hat, no gloves, tobacco, or foot-picking must be reported in qsc_events, not in error_type, unless they are also directly related to the legacy SOP operation.
|
||||||
|
|
||||||
- Action_Resting → stacked | qualified
|
### 4. 安全隐患
|
||||||
|
Chinese description of visible safety hazards in the scene. Example: 油锅附近有易燃物. If none, output an empty string.
|
||||||
|
|
||||||
- Action_Start_Frying / End_Frying → standard_time | early_retrieval | overcooked | double_fried
|
### 5. 人物位置
|
||||||
|
Chinese sentence describing where people are and how they are moving. Example: 员工在油锅边操作,顾客在收银台前等待. If no people are visible, output an empty string.
|
||||||
|
|
||||||
- Action_Cutting → complete_cut | linked | dusted_before_cut
|
### 6. 总结
|
||||||
|
Chinese sentence summarizing the scene and visible person count. Example: 画面中有2人,1名员工在操作台处理食物,1名顾客在收银台前等待. If no people are visible, output 无.
|
||||||
|
|
||||||
- Action_Seasoning → coverage_high | missed | single_side_dusted
|
### 7. 时间
|
||||||
|
The timestamp overlaid on the original video frame, in format YYYY-MM-DD HH:MM:SS. If the timestamp is not visible or cannot be read, output an empty string.
|
||||||
|
|
||||||
- Other actions → qualified
|
### 8. employees
|
||||||
|
Array of employee objects. If no employees are visible, output [].
|
||||||
|
Each employee object must contain:
|
||||||
|
- status: 1 if working at equipment, food, packing, counter, or operation table; 2 if standing idle, waiting, or passing by
|
||||||
|
- warning: 0 if no visible hazard; 1 if hazard present
|
||||||
|
- position: one of YZL_1 / LCCZT_1 / SYJ / DPL / BSZSG / DCGZT / KLJ / UNKNOWN
|
||||||
|
Position codes: YZL_1 = oil fryer area; LCCZT_1 = cooling or operation table; SYJ = cashier/register; DPL = electric fryer area; BSZSG = display cabinet; DCGZT = sink/washing area; KLJ = cola/drink machine; UNKNOWN = employee visible but position cannot be classified.
|
||||||
|
|
||||||
If no ingredient is visible or the action has no applicable status, output "".
|
### 9. guests
|
||||||
|
Array with the existing mixed-key schema. If no guests are visible, output [].
|
||||||
|
- First element is queue-level object only: {"warning": "0" or "1"}. 1 means queue has >= 3 visible guests; 0 means queue has < 3 visible guests.
|
||||||
|
- Subsequent elements are per-guest objects only: {"status": "0"} at door, {"status": "1"} at register, or {"status": "2"} seated.
|
||||||
|
|
||||||
|
### 10. qsc_events
|
||||||
|
Array of suspected QSC violation events. If no suspected violation is visible, output [].
|
||||||
|
Detect only the following current-period QSC violations:
|
||||||
|
|
||||||
### 3. error_type (REQUIRED — "" if no error)
|
QSC pre-scan rule: Before deciding the main food-operation Action, first scan the entire full-frame image sequence for QSC violations, including people in corners, background, seated/squatting/bending postures, and floor-level foot/shoe areas. QSC events must not be suppressed by a normal food-operation action.
|
||||||
|
|
||||||
Short description of any anomaly. Examples: "smoking", "dusted_before_cut", "single_side_dusted", "double_fried". If the operation is normal, output "".
|
- WGSJ0001: 工作状态未戴口罩
|
||||||
|
Definition: An employee is in working state and the mouth/nose mask is clearly absent, not worn, or not covering mouth/nose.
|
||||||
|
Working state includes frying food, making food, packing food, handling semi-finished products, touching food, operating food equipment, or working at a food operation table.
|
||||||
|
Non-working state includes passing by, resting, waiting, short stay, or standing without obvious operation. In non-working state, no-mask alone is NOT a violation.
|
||||||
|
|
||||||
|
- WGSJ0002: 工作状态未戴帽子
|
||||||
|
Definition: An employee is in working state and the required work hat/cap/hair covering is clearly absent. Apply the same working-state rule as WGSJ0001.
|
||||||
|
|
||||||
### 4. 安全隐患 (REQUIRED — "" if no hazard)
|
- WGSJ0003: 未戴手套操作食物
|
||||||
|
Definition: An employee directly touches, handles, makes, packs, cuts, seasons, or transfers food without visible gloves. If hands are not visible, do not report this violation.
|
||||||
|
|
||||||
Chinese description of any safety hazard visible in the scene (e.g., "油锅附近有易燃物"). If none, output "".
|
- WGSJ0004: 工作区烟草制品违规
|
||||||
|
Definition: Cigarette, e-cigarette, smoking behavior, lighter used for smoking, ashtray, or other tobacco product is visible in the food work area.
|
||||||
|
|
||||||
|
- WGSJ0005: foot/shoe touching violation
|
||||||
|
Chinese name for output: 抠脚或接触鞋脚.
|
||||||
|
Definition: Report WGSJ0005 ONLY when there is clear visual evidence that a hand, fingers, tissue, cloth, tool, or another object is directly touching a foot, toes, sole, sock, shoe, or footwear area, and the motion is picking, scratching, rubbing, wiping, cleaning, adjusting, or handling that foot/shoe area.
|
||||||
|
|
||||||
### 5. 人物位置 (REQUIRED — "" if no people)
|
Very strict rule:
|
||||||
|
- WGSJ0005 is NOT a posture detector. Do not report it from bending, squatting, standing, walking, leaning, or a hand being near the leg/foot.
|
||||||
|
- WGSJ0005 is NOT a "suspected" event. Do not output WGSJ0005 for manual_review unless the hand/object-to-foot/shoe contact is actually visible.
|
||||||
|
- If the evidence is only suspicious or ambiguous, output no WGSJ0005 event. Keep qsc_events as [] unless another violation is clearly visible.
|
||||||
|
|
||||||
Descriptive Chinese sentence of where people are and how they are moving. Example: "员工在油锅边". If no one is in the frame, output "".
|
Required positive criteria:
|
||||||
|
Output WGSJ0005 only when ALL of the following are true:
|
||||||
|
- The foot, shoe, sock, toes, sole, or footwear area is visible.
|
||||||
|
- The hand, fingers, tissue, cloth, tool, or object is visibly touching that foot/shoe area, not merely close to it.
|
||||||
|
- The contact is visible in at least two frames, or one frame is extremely clear.
|
||||||
|
- The action looks like picking, scratching, rubbing, wiping, cleaning, adjusting, or handling the foot/shoe area.
|
||||||
|
- It is not normal walking, standing, food handling, floor cleaning, picking up an item, moving equipment, or touching a table/container/apron/clothing.
|
||||||
|
|
||||||
|
Hard negative examples:
|
||||||
|
Do NOT report WGSJ0005 when any of these is true:
|
||||||
|
- A person is only standing near food, standing by a counter, or walking.
|
||||||
|
- Feet or shoes are visible but no hand/object is visibly touching them.
|
||||||
|
- A hand is at the table, food tray, oil pan, apron, waist, knee, pants, skirt, floor, trash bag, or equipment.
|
||||||
|
- A person bends or squats but the hand-foot/shoe contact cannot be clearly seen.
|
||||||
|
- The person is operating food, packing food, breading, seasoning, serving, cleaning the floor, picking up an item, or moving supplies.
|
||||||
|
- The foot/shoe area is too small, blurry, blocked, cropped, or outside the frame.
|
||||||
|
|
||||||
### 6. 总结 (REQUIRED — "无" if no people)
|
Output requirements for WGSJ0005:
|
||||||
|
- violation_type must be exactly "抠脚或接触鞋脚".
|
||||||
|
- reason must be Chinese and must explicitly say where the person is and what visible contact is seen.
|
||||||
|
- suggested_action must be "manual_review".
|
||||||
|
- confidence must be >= 0.80. If confidence would be below 0.80, do not output WGSJ0005.
|
||||||
|
- evidence_frame_count must be the number of frames where direct contact is visible.
|
||||||
|
- evidence_checklist must be exactly:
|
||||||
|
{"foot_or_shoe_area_visible": true/false, "direct_hand_or_object_contact_visible": true/false, "contact_visible_in_multiple_frames_or_extremely_clear": true/false, "foot_handling_motion_visible": true/false, "normal_activity_excluded": true/false}
|
||||||
|
|
||||||
Descriptive Chinese sentence summarizing the scene with the exact person count. Example: "员工在油锅边炸鸡,顾客在收银台前等待". If no one is in the frame, output "无".
|
Multi-frame rule:
|
||||||
|
- Do not rely on a single unclear frame.
|
||||||
|
- Judge qsc_events based on the whole clip and continuous multi-frame evidence.
|
||||||
|
- Prefer reporting a qsc_event only when the violation is visible in multiple frames, or when the visual evidence is very clear and consistent across the clip.
|
||||||
|
- If evidence is unclear, do not report the violation; keep qsc_events as [].
|
||||||
|
- For WGSJ0005, use the strictest threshold: only report it when direct hand/object-to-foot-or-shoe contact is clearly visible. If uncertain, do not report WGSJ0005.
|
||||||
|
|
||||||
|
Each qsc_events item must contain:
|
||||||
|
- violation_code: one of WGSJ0001 / WGSJ0002 / WGSJ0003 / WGSJ0004 / WGSJ0005
|
||||||
|
- violation_type: Chinese violation name
|
||||||
|
- is_violation: true
|
||||||
|
- working_state: working / non_working / unknown
|
||||||
|
- reason: concise Chinese explanation of the visible evidence
|
||||||
|
- confidence: number from 0 to 1
|
||||||
|
- evidence_frame_count: estimated number of frames supporting the event
|
||||||
|
- visible_target: concise Chinese description of the person/object involved
|
||||||
|
- evidence_checklist: for WGSJ0005 only, include {"foot_or_shoe_area_visible": true/false, "direct_hand_or_object_contact_visible": true/false, "contact_visible_in_multiple_frames_or_extremely_clear": true/false, "foot_handling_motion_visible": true/false, "normal_activity_excluded": true/false}; for other codes output {}
|
||||||
|
- suggested_action: record / warning / manual_review
|
||||||
|
Suggested action rules: WGSJ0001 and WGSJ0002 use warning; WGSJ0003 and WGSJ0004 use manual_review. WGSJ0005 uses manual_review only when direct hand/object-to-foot-or-shoe contact is clearly visible with confidence >= 0.80. If WGSJ0005 evidence is weak, suspicious, or ambiguous, do not output WGSJ0005.
|
||||||
|
|
||||||
### 7. 时间 (REQUIRED — "" if unreadable)
|
### Output format
|
||||||
|
Return strict JSON only. Do not wrap in markdown. Do not add any prose before or after the JSON.
|
||||||
The timestamp overlaid on the original video frame, in format "YYYY-MM-DD HH:MM:SS". If the timestamp is not visible or cannot be read, output "".
|
Required JSON shape:
|
||||||
|
{"Action": "Action_Idle", "quality_status": "", "error_type": "", "安全隐患": "", "人物位置": "", "总结": "无", "时间": "", "employees": [], "guests": [], "qsc_events": []}
|
||||||
|
user: >-
|
||||||
### 8. employees (REQUIRED — [] if none)
|
Analyze this multi-frame video clip. Preserve the existing action, quality, safety, people, guest, and timestamp fields. Additionally detect current-period QSC violations in qsc_events. Return strict JSON only, with all required keys.
|
||||||
|
|
||||||
Array of employee objects. Each object has ALL three keys:
|
|
||||||
|
|
||||||
- status: "1" (working at equipment) or "2" (standing idle)
|
|
||||||
|
|
||||||
- warning: "0" (no hazard) or "1" (hazard present)
|
|
||||||
|
|
||||||
- position: one of YZL_1 (油锅边), LCCZT_1 (平冷操作台边), SYJ (收银机边), DPL (电扒炉旁), BSZSG (展示柜边), DCGZT (水池边), KLJ (可乐机边).
|
|
||||||
|
|
||||||
If no employees are in the frame, output [].
|
|
||||||
|
|
||||||
|
|
||||||
### 9. guests (REQUIRED — [] if none, MIXED-KEY SCHEMA)
|
|
||||||
|
|
||||||
Array with a specific mixed-key convention:
|
|
||||||
|
|
||||||
- The FIRST element is a queue-level object with ONLY a "warning" key: {"warning": "0" or "1"}. "1" means the queue has ≥ 3 people; "0" means < 3.
|
|
||||||
|
|
||||||
- Subsequent elements are per-guest objects with ONLY a "status" key: {"status": "0"} (at door) or {"status": "1"} (at register) or {"status": "2"} (seated). One such object per visible guest.
|
|
||||||
|
|
||||||
If there are no guests at all, output []. If only the queue header is known, output [{"warning": "0 or 1"}].
|
|
||||||
|
|
||||||
Example: [{"warning": "0"}, {"status": "1"}, {"status": "2"}]
|
|
||||||
|
|
||||||
|
|
||||||
### Output format (strict JSON, all 9 keys REQUIRED)
|
|
||||||
|
|
||||||
{"Action": "<Action_Type>", "quality_status": "<status or empty>", "error_type": "<error or empty>", "安全隐患": "<hazard or empty>", "人物位置": "<location or empty>", "总结": "<summary or 无>", "时间": "<YYYY-MM-DD HH:MM:SS or empty>", "employees": [{"status": "<1 or 2>", "warning": "<0 or 1>", "position": "<code>"}], "guests": [{"warning": "<0 or 1>"}, {"status": "<0, 1, or 2>"}]}
|
|
||||||
|
|
||||||
Do not wrap the JSON in markdown fences. Do not add any prose before or after the JSON.
|
|
||||||
user: 'Analyze the video clip and return the required JSON with all 9 keys. Read the timestamp from the frame overlay into "时间".'
|
|
||||||
|
|
||||||
schema:
|
schema:
|
||||||
version: local-batch-v1
|
version: local-batch-v1
|
||||||
|
|||||||
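The new prompt fixes ten required top-level keys, a `suggested_action` per violation code, and a 0.80 confidence floor for WGSJ0005. A minimal sketch of a check a caller could run on the raw model output follows. `check_response` and its message strings are hypothetical and are not part of this commit; the key list and the action table are copied from the prompt text above.

```python
import json

# Top-level keys taken from the "Required JSON shape" line of the prompt.
REQUIRED_KEYS = (
    "Action", "quality_status", "error_type", "安全隐患", "人物位置",
    "总结", "时间", "employees", "guests", "qsc_events",
)

# suggested_action per code, from the "Suggested action rules" line.
EXPECTED_ACTION = {
    "WGSJ0001": "warning",
    "WGSJ0002": "warning",
    "WGSJ0003": "manual_review",
    "WGSJ0004": "manual_review",
    "WGSJ0005": "manual_review",
}


def check_response(raw: str) -> list[str]:
    """Return the problems found in one model response (hypothetical helper)."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top-level value is not an object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in payload]
    events = payload.get("qsc_events")
    if not isinstance(events, list):
        return problems + ["qsc_events is not a list"]

    for index, event in enumerate(events):
        if not isinstance(event, dict):
            problems.append(f"qsc_events[{index}] is not an object")
            continue
        code = event.get("violation_code")
        if code not in EXPECTED_ACTION:
            problems.append(f"qsc_events[{index}] has unknown code {code!r}")
            continue
        if event.get("suggested_action") != EXPECTED_ACTION[code]:
            problems.append(f"qsc_events[{index}] has wrong suggested_action")
        if code == "WGSJ0005":
            # The prompt says WGSJ0005 must not be emitted below 0.80 confidence.
            confidence = event.get("confidence")
            if not isinstance(confidence, (int, float)) or confidence < 0.80:
                problems.append(f"qsc_events[{index}] WGSJ0005 confidence below 0.80")
    return problems
```

An empty list means the response satisfied the rules checked here. The sketch does not inspect `employees`, `guests`, or `evidence_checklist`, which would need their own checks.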
@@ -1269,6 +1269,20 @@ class CliTests(unittest.TestCase):
|
|||||||
self.assertEqual(folder_summary["processed_video_count"], 1)
|
self.assertEqual(folder_summary["processed_video_count"], 1)
|
||||||
self.assertEqual(folder_summary["failed_video_count"], 0)
|
self.assertEqual(folder_summary["failed_video_count"], 0)
|
||||||
self.assertEqual(folder_summary["event_counts"], {"queue_detected": 1})
|
self.assertEqual(folder_summary["event_counts"], {"queue_detected": 1})
|
||||||
|
phase_timings = json.loads(
|
||||||
|
(output_dir / "phase_timings.json").read_text(encoding="utf-8")
|
||||||
|
)
|
||||||
|
self.assertEqual(phase_timings["schema_version"], "phase-timings-v1")
|
||||||
|
for phase in (
|
||||||
|
"source_acquisition_seconds",
|
||||||
|
"video_probe_seconds",
|
||||||
|
"frame_sampling_seconds",
|
||||||
|
"clip_generation_seconds",
|
||||||
|
"inference_seconds",
|
||||||
|
"aggregation_seconds",
|
||||||
|
):
|
||||||
|
self.assertIn(phase, phase_timings["phases"])
|
||||||
|
self.assertGreaterEqual(phase_timings["phases"][phase], 0)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
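The test above only asserts that `phase_timings.json` carries `schema_version` equal to `phase-timings-v1` and a non-negative number for each phase. As a sketch of how that file might be consumed, the helper below picks the slowest phase from an already-loaded document. `slowest_phase` is a hypothetical name, and the numbers in `example` are made up for illustration.

```python
from __future__ import annotations

from typing import Any


def slowest_phase(timings: dict[str, Any]) -> tuple[str, float] | None:
    """Return (phase_name, seconds) for the largest recorded phase, or None."""
    if timings.get("schema_version") != "phase-timings-v1":
        raise ValueError("unexpected phase timings schema")
    phases = timings.get("phases") or {}
    if not phases:
        return None
    name = max(phases, key=phases.get)
    return name, float(phases[name])


# Illustrative values only; real numbers come from a pipeline run.
example = {
    "schema_version": "phase-timings-v1",
    "phases": {
        "video_probe_seconds": 0.41,
        "frame_sampling_seconds": 3.2,
        "inference_seconds": 12.5,
    },
}
result = slowest_phase(example)
```

To use it on a real run, load the document first with `json.loads((output_dir / "phase_timings.json").read_text(encoding="utf-8"))`, which is the same read the test performs.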
@@ -104,6 +104,41 @@ class ResultParserTests(unittest.TestCase):
|
|||||||
"2026-06-14 12:31:20",
|
"2026-06-14 12:31:20",
|
||||||
)
|
)
|
||||||
|
|
||||||
|
def test_build_clip_result_preserves_qsc_events(self):
|
||||||
|
result = build_clip_result(
|
||||||
|
(
|
||||||
|
'{"Action":"Action_Idle","quality_status":"","error_type":"",'
|
||||||
|
'"安全隐患":"","人物位置":"员工在操作台边","总结":"画面中有1人",'
|
||||||
|
'"时间":"2026-06-16 05:00:03","employees":[],"guests":[],'
|
||||||
|
'"qsc_events":[{"violation_code":"WGSJ0001",'
|
||||||
|
'"violation_type":"工作状态未戴口罩","is_violation":true,'
|
||||||
|
'"working_state":"working","reason":"员工在操作台处理食物时未见口罩",'
|
||||||
|
'"confidence":0.92,"evidence_frame_count":3,'
|
||||||
|
'"visible_target":"操作台边员工","evidence_checklist":{},'
|
||||||
|
'"suggested_action":"warning"}]}'
|
||||||
|
),
|
||||||
|
{
|
||||||
|
"video_id": "video-abc",
|
||||||
|
"clip_id": "video-abc_c000001",
|
||||||
|
"clip_start_seconds": 0.0,
|
||||||
|
"clip_end_seconds": 10.0,
|
||||||
|
"clip_start_timecode": "00:00:00",
|
||||||
|
"clip_end_timecode": "00:00:10",
|
||||||
|
"frame_times": [],
|
||||||
|
},
|
||||||
|
{"path": "/videos/a.mp4"},
|
||||||
|
{
|
||||||
|
"schema": {"version": "local-batch-v1"},
|
||||||
|
"runtime": {"timezone": "Asia/Shanghai"},
|
||||||
|
},
|
||||||
|
processing={},
|
||||||
|
)
|
||||||
|
|
||||||
|
self.assertEqual(result["status"], "ok")
|
||||||
|
self.assertEqual(len(result["qsc_events"]), 1)
|
||||||
|
self.assertEqual(result["qsc_events"][0]["violation_code"], "WGSJ0001")
|
||||||
|
self.assertEqual(result["qsc_events"][0]["suggested_action"], "warning")
|
||||||
|
|
||||||
def test_build_clip_result_records_parse_failure_without_crashing(self):
|
def test_build_clip_result_records_parse_failure_without_crashing(self):
|
||||||
result = build_clip_result(
|
result = build_clip_result(
|
||||||
"not json",
|
"not json",
|
||||||
@@ -126,6 +161,7 @@ class ResultParserTests(unittest.TestCase):
|
|||||||
|
|
||||||
self.assertEqual(result["status"], "parse_failed")
|
self.assertEqual(result["status"], "parse_failed")
|
||||||
self.assertEqual(result["events"], [])
|
self.assertEqual(result["events"], [])
|
||||||
|
self.assertEqual(result["qsc_events"], [])
|
||||||
self.assertEqual(result["monitoring_timeline"]["screen_time"], "")
|
self.assertEqual(result["monitoring_timeline"]["screen_time"], "")
|
||||||
self.assertEqual(result["raw_response"], "not json")
|
self.assertEqual(result["raw_response"], "not json")
|
||||||
self.assertIn("JSON", result["error"])
|
self.assertIn("JSON", result["error"])
|
||||||
|
|||||||
@@ -1,9 +1,12 @@
|
|||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import argparse
|
import argparse
|
||||||
|
from contextlib import contextmanager
|
||||||
import json
|
import json
|
||||||
|
import time
|
||||||
|
from datetime import datetime, timezone
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from typing import Sequence
|
from typing import Callable, Iterator, Sequence, TypeVar
|
||||||
|
|
||||||
from .aggregator import aggregate_outputs
|
from .aggregator import aggregate_outputs
|
||||||
from .clips import build_clip_records
|
from .clips import build_clip_records
|
||||||
@@ -18,6 +21,64 @@ from .result_parser import build_clip_result
|
|||||||
from .timeline import DEFAULT_TIMEZONE, format_beijing_time, timeline_start_epoch
|
from .timeline import DEFAULT_TIMEZONE, format_beijing_time, timeline_start_epoch
|
||||||
from .vlm_client import infer_clip
|
from .vlm_client import infer_clip
|
||||||
|
|
||||||
|
T = TypeVar("T")
|
||||||
|
|
||||||
|
|
||||||
|
def _new_phase_timings() -> dict[str, object]:
|
||||||
|
return {
|
||||||
|
"schema_version": "phase-timings-v1",
|
||||||
|
"started_at": _utc_now_iso(),
|
||||||
|
"updated_at": _utc_now_iso(),
|
||||||
|
"phases": {},
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _write_phase_timings(
|
||||||
|
output_dir: Path,
|
||||||
|
phase_timings: dict[str, object],
|
||||||
|
) -> None:
|
||||||
|
phase_timings["updated_at"] = _utc_now_iso()
|
||||||
|
(output_dir / "phase_timings.json").write_text(
|
||||||
|
json.dumps(phase_timings, ensure_ascii=False, sort_keys=True, indent=2) + "\n",
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _measure_phase(
|
||||||
|
phase_timings: dict[str, object] | None,
|
||||||
|
phase_name: str,
|
||||||
|
func: Callable[[], T],
|
||||||
|
) -> T:
|
||||||
|
with _timed_phase(phase_timings, phase_name):
|
||||||
|
return func()
|
||||||
|
|
||||||
|
|
||||||
|
@contextmanager
|
||||||
|
def _timed_phase(
|
||||||
|
phase_timings: dict[str, object] | None,
|
||||||
|
phase_name: str,
|
||||||
|
) -> Iterator[None]:
|
||||||
|
started = time.perf_counter()
|
||||||
|
try:
|
||||||
|
yield
|
||||||
|
finally:
|
||||||
|
if phase_timings is not None:
|
||||||
|
phases = phase_timings.get("phases")
|
||||||
|
if not isinstance(phases, dict):
|
||||||
|
phases = {}
|
||||||
|
phase_timings["phases"] = phases
|
||||||
|
previous = phases.get(phase_name, 0)
|
||||||
|
if not isinstance(previous, (int, float)):
|
||||||
|
previous = 0
|
||||||
|
phases[phase_name] = round(
|
||||||
|
float(previous) + time.perf_counter() - started,
|
||||||
|
6,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _utc_now_iso() -> str:
|
||||||
|
return datetime.now(timezone.utc).isoformat()
|
||||||
|
|
||||||
|
|
||||||
def main(argv: Sequence[str] | None = None) -> int:
|
def main(argv: Sequence[str] | None = None) -> int:
|
||||||
parser = argparse.ArgumentParser(
|
parser = argparse.ArgumentParser(
|
||||||
@@ -43,6 +104,7 @@ def main(argv: Sequence[str] | None = None) -> int:
|
|||||||
|
|
||||||
output_dir = Path(config["output"]["dir"])
|
output_dir = Path(config["output"]["dir"])
|
||||||
output_dir.mkdir(parents=True, exist_ok=True)
|
output_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
phase_timings = _new_phase_timings()
|
||||||
|
|
||||||
video_manifest_path = output_dir / "video_manifest.jsonl"
|
video_manifest_path = output_dir / "video_manifest.jsonl"
|
||||||
resume_enabled = bool(config.get("output", {}).get("resume", False))
|
resume_enabled = bool(config.get("output", {}).get("resume", False))
|
||||||
@@ -63,11 +125,13 @@ def main(argv: Sequence[str] | None = None) -> int:
|
|||||||
records,
|
records,
|
||||||
record_indexes,
|
record_indexes,
|
||||||
download_source=not args.dry_run,
|
download_source=not args.dry_run,
|
||||||
|
phase_timings=phase_timings,
|
||||||
)
|
)
|
||||||
except ValueError as exc:
|
except ValueError as exc:
|
||||||
parser.error(str(exc))
|
parser.error(str(exc))
|
||||||
|
|
||||||
write_manifest(video_manifest_path, records)
|
write_manifest(video_manifest_path, records)
|
||||||
|
_write_phase_timings(output_dir, phase_timings)
|
||||||
if args.dry_run:
|
if args.dry_run:
|
||||||
return 0
|
return 0
|
||||||
|
|
||||||
@@ -93,27 +157,29 @@ def main(argv: Sequence[str] | None = None) -> int:
|
|||||||
if record.get("status") == "sampled" and record.get("video_id")
|
if record.get("status") == "sampled" and record.get("video_id")
|
||||||
}
|
}
|
||||||
changed_frame_video_ids: set[str] = set(backfilled_frame_video_ids)
|
changed_frame_video_ids: set[str] = set(backfilled_frame_video_ids)
|
||||||
for record in records:
|
with _timed_phase(phase_timings, "frame_sampling_seconds"):
|
||||||
if record.get("status") != "probed":
|
for record in records:
|
||||||
continue
|
if record.get("status") != "probed":
|
||||||
video_id = str(record.get("video_id"))
|
continue
|
||||||
if args.until == "inference" and video_id in existing_clip_video_ids:
|
video_id = str(record.get("video_id"))
|
||||||
continue
|
if args.until == "inference" and video_id in existing_clip_video_ids:
|
||||||
if video_id in existing_sampled_video_ids:
|
continue
|
||||||
continue
|
if video_id in existing_sampled_video_ids:
|
||||||
frame_records = _without_video_records(frame_records, video_id)
|
continue
|
||||||
ffmpeg_config = dict(config["ffmpeg"])
|
frame_records = _without_video_records(frame_records, video_id)
|
||||||
ffmpeg_config["timezone"] = timezone_name
|
ffmpeg_config = dict(config["ffmpeg"])
|
||||||
frame_records.extend(
|
ffmpeg_config["timezone"] = timezone_name
|
||||||
sample_video_frames(
|
frame_records.extend(
|
||||||
record,
|
sample_video_frames(
|
||||||
output_dir,
|
record,
|
||||||
ffmpeg_config,
|
output_dir,
|
||||||
manifest_path=None,
|
ffmpeg_config,
|
||||||
|
manifest_path=None,
|
||||||
|
)
|
||||||
)
|
)
|
||||||
)
|
changed_frame_video_ids.add(video_id)
|
||||||
changed_frame_video_ids.add(video_id)
|
|
||||||
write_manifest(frame_manifest_path, frame_records)
|
write_manifest(frame_manifest_path, frame_records)
|
||||||
|
_write_phase_timings(output_dir, phase_timings)
|
||||||
|
|
||||||
sampled_video_ids = {
|
sampled_video_ids = {
|
||||||
str(record.get("video_id"))
|
str(record.get("video_id"))
|
||||||
@@ -133,22 +199,28 @@ def main(argv: Sequence[str] | None = None) -> int:
|
|||||||
for record in frame_records
|
for record in frame_records
|
||||||
if str(record.get("video_id")) in clip_rebuild_video_ids
|
if str(record.get("video_id")) in clip_rebuild_video_ids
|
||||||
]
|
]
|
||||||
clip_records.extend(build_clip_records(frames_to_build, config["clip"]))
|
with _timed_phase(phase_timings, "clip_generation_seconds"):
|
||||||
|
clip_records.extend(build_clip_records(frames_to_build, config["clip"]))
|
||||||
write_manifest(output_dir / "clip_manifest.jsonl", clip_records)
|
write_manifest(output_dir / "clip_manifest.jsonl", clip_records)
|
||||||
|
_write_phase_timings(output_dir, phase_timings)
|
||||||
if args.until == "clips":
|
if args.until == "clips":
|
||||||
return 0
|
return 0
|
||||||
|
|
||||||
_run_inference(
|
with _timed_phase(phase_timings, "inference_seconds"):
|
||||||
clip_records,
|
_run_inference(
|
||||||
records,
|
clip_records,
|
||||||
output_dir,
|
records,
|
||||||
config,
|
output_dir,
|
||||||
limit_clips=args.limit_clips,
|
config,
|
||||||
resume=resume_enabled,
|
limit_clips=args.limit_clips,
|
||||||
)
|
resume=resume_enabled,
|
||||||
|
)
|
||||||
|
_write_phase_timings(output_dir, phase_timings)
|
||||||
if args.until == "inference":
|
if args.until == "inference":
|
||||||
return 0
|
return 0
|
||||||
aggregate_outputs(output_dir, config)
|
with _timed_phase(phase_timings, "aggregation_seconds"):
|
||||||
|
aggregate_outputs(output_dir, config)
|
||||||
|
_write_phase_timings(output_dir, phase_timings)
|
||||||
return 0
|
return 0
|
||||||
|
|
||||||
|
|
||||||
@@ -175,33 +247,40 @@ def _acquire_source_records(
|
|||||||
record_indexes: dict[str, int],
|
record_indexes: dict[str, int],
|
||||||
*,
|
*,
|
||||||
download_source: bool = True,
|
download_source: bool = True,
|
||||||
|
phase_timings: dict[str, object] | None = None,
|
||||||
) -> None:
|
) -> None:
|
||||||
for source_record in _source_video_records(
|
source_records = _measure_phase(
|
||||||
config,
|
phase_timings,
|
||||||
output_dir,
|
"source_acquisition_seconds",
|
||||||
download_source=download_source,
|
lambda: _source_video_records(
|
||||||
):
|
config,
|
||||||
path = source_record.get("path")
|
output_dir,
|
||||||
if not path:
|
download_source=download_source,
|
||||||
continue
|
|
||||||
video_id = stable_video_id(str(path))
|
|
||||||
existing_index = record_indexes.get(video_id)
|
|
||||||
if (
|
|
||||||
existing_index is not None
|
|
||||||
and records[existing_index].get("status") == "probed"
|
|
||||||
):
|
|
||||||
continue
|
|
||||||
|
|
||||||
probe_record = probe_video(
|
|
||||||
str(path),
|
|
||||||
timeout_seconds=config["ffprobe"]["timeout_seconds"],
|
|
||||||
)
|
)
|
||||||
record = {**source_record, **probe_record, "video_id": video_id}
|
)
|
||||||
if existing_index is None:
|
with _timed_phase(phase_timings, "video_probe_seconds"):
|
||||||
record_indexes[video_id] = len(records)
|
for source_record in source_records:
|
||||||
records.append(record)
|
path = source_record.get("path")
|
||||||
else:
|
if not path:
|
||||||
records[existing_index] = record
|
continue
|
||||||
|
video_id = stable_video_id(str(path))
|
||||||
|
existing_index = record_indexes.get(video_id)
|
||||||
|
if (
|
||||||
|
existing_index is not None
|
||||||
|
and records[existing_index].get("status") == "probed"
|
||||||
|
):
|
||||||
|
continue
|
||||||
|
|
||||||
|
probe_record = probe_video(
|
||||||
|
str(path),
|
||||||
|
timeout_seconds=config["ffprobe"]["timeout_seconds"],
|
||||||
|
)
|
||||||
|
record = {**source_record, **probe_record, "video_id": video_id}
|
||||||
|
if existing_index is None:
|
||||||
|
record_indexes[video_id] = len(records)
|
||||||
|
records.append(record)
|
||||||
|
else:
|
||||||
|
records[existing_index] = record
|
||||||
|
|
||||||
|
|
||||||
def _source_video_records(
|
def _source_video_records(
|
||||||
|
|||||||
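The `_timed_phase` helper added in this file adds elapsed time to any value already stored for a phase, and does so in a `finally` clause. The standalone sketch below reproduces that behavior in reduced form to show the two consequences: repeated phases accumulate, and a phase that raises is still timed. `timed_phase` here is a simplified stand-in that takes the phases dictionary directly, not the committed function.

```python
from __future__ import annotations

import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed_phase(phases: dict[str, float], name: str) -> Iterator[None]:
    """Add the elapsed wall-clock time of the with-block to phases[name]."""
    started = time.perf_counter()
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions alike.
        phases[name] = round(phases.get(name, 0.0) + time.perf_counter() - started, 6)


phases: dict[str, float] = {}

with timed_phase(phases, "video_probe_seconds"):
    time.sleep(0.01)
first = phases["video_probe_seconds"]

with timed_phase(phases, "video_probe_seconds"):
    time.sleep(0.01)
second = phases["video_probe_seconds"]  # accumulated, not overwritten

try:
    with timed_phase(phases, "inference_seconds"):
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass  # the phase was still recorded in memory
```

One thing to keep in mind when reading the hunks above: `_write_phase_timings` is called after each `with` block, so if a phase raises and the exception propagates, the in-memory timing is updated but, as far as the code shown here goes, the file on disk is not rewritten for that phase.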
@@ -63,6 +63,7 @@ def build_clip_result(
|
|||||||
"status": result_status,
|
"status": result_status,
|
||||||
"monitoring_timeline": timeline,
|
"monitoring_timeline": timeline,
|
||||||
"events": _events(payload, clip_record) if result_status == "ok" else [],
|
"events": _events(payload, clip_record) if result_status == "ok" else [],
|
||||||
|
"qsc_events": _qsc_events(payload) if result_status == "ok" else [],
|
||||||
"raw_response": raw_response,
|
"raw_response": raw_response,
|
||||||
"processing": processing_record,
|
"processing": processing_record,
|
||||||
"error": result_error,
|
"error": result_error,
|
||||||
@@ -131,6 +132,17 @@ def _event(
|
|||||||
return normalized
|
return normalized
|
||||||
|
|
||||||
|
|
||||||
|
def _qsc_events(payload: dict[str, Any]) -> list[dict[str, Any]]:
|
||||||
|
raw_events = payload.get("qsc_events") or []
|
||||||
|
if not isinstance(raw_events, list):
|
||||||
|
return []
|
||||||
|
return [
|
||||||
|
dict(event)
|
||||||
|
for event in raw_events
|
||||||
|
if isinstance(event, dict)
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
def _video_path(video_record: dict[str, Any] | None) -> str | None:
|
def _video_path(video_record: dict[str, Any] | None) -> str | None:
|
||||||
if not video_record:
|
if not video_record:
|
||||||
return None
|
return None
|
||||||
|
|||||||
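The `_qsc_events` helper added above is deliberately permissive: it tolerates a missing or malformed `qsc_events` value and copies each well-formed event through unchanged. The sketch below restates the same logic under a different name so it can run standalone, and exercises the edge cases. The sample payloads are invented for illustration.

```python
from __future__ import annotations

from typing import Any


def qsc_events(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Same logic as `_qsc_events` in the hunk above, renamed for a standalone demo."""
    raw_events = payload.get("qsc_events") or []
    if not isinstance(raw_events, list):
        return []
    return [dict(event) for event in raw_events if isinstance(event, dict)]


source_event = {"violation_code": "WGSJ0001", "suggested_action": "warning"}

missing = qsc_events({})                                 # key absent
wrong_type = qsc_events({"qsc_events": "WGSJ0001"})      # not a list
mixed = qsc_events({"qsc_events": [source_event, "noise", None]})  # non-dict items dropped
```

Each kept event is a shallow copy, so mutating the result does not touch the parsed payload. The helper does not check `violation_code`, `confidence`, or `suggested_action`; the rules stated in the prompt, such as the 0.80 floor for WGSJ0005, are not enforced at this point in the code shown.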