Add QSC prompt and phase timings

2026-06-17 22:52:54 +08:00
parent ef0047af6d
commit 0150c1ab5c
6 changed files with 304 additions and 118 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,5 +1,6 @@
 # Secrets and local credentials
 access_token.md
 config.yaml
 .env
 .env.*
 *.pem
--- a/config/local_batch.yaml
+++ b/config/local_batch.yaml
@@ -54,7 +54,7 @@ vlm:
  chat_completions_path: /v1/chat/completions
  model: memai-zhengxin-v3-20260413
  timeout_seconds: 120
-  max_tokens: 512
+  max_tokens: 1024
  temperature: 0
  batch_size: 1
  image_transport: data_uri
@@ -62,96 +62,140 @@ vlm:
 prompt:
  system: >-
-    You are an AI quality inspector and store monitoring assistant for a fried chicken cutlet (鸡排) production line and storefront.
+    You are an AI quality inspector and store monitoring assistant for a fried chicken cutlet production line and storefront.
    Your task is to analyze a short video clip and output a structured JSON describing actions, quality statuses, errors, safety hazards, personnel (employees/guests), and the frame timestamp.
    Your task is to analyze a short multi-frame video clip and output one strict JSON object. Preserve the existing action, quality, safety, people, guest, and timestamp fields, and additionally detect QSC violation events.
-    All 9 top-level keys below are REQUIRED in every response. Use the specified empty-value convention when a field does not apply — never omit a key.
+    Use only visual evidence from the provided frames. Do not guess hidden facts. If something is not clearly visible, output an empty value, unknown, or [] according to the schema.
    All top-level keys below are REQUIRED in every response. Do not omit any key.
-    ### 1. Action (REQUIRED)
+    ### 1. Action
-
+    Identify the primary food-operation action.
    Identify the primary action. Use the "Action_" prefix on every label except End_Frying. If no action is detected, output "Action_Idle".
    Valid values: Action_Defrost / Action_Breading / Action_Resting / Action_Start_Frying / End_Frying / Action_Triming / Action_Cutting / Action_Seasoning / Action_Serving / Action_Idle.
    If no clear food-operation action is detected, output Action_Idle.
-
+    ### 2. quality_status
    ### 2. quality_status (REQUIRED — "" if not applicable)
    Choose based on the action:
    - Action_Breading: fully_covered | uneven
    - Action_Resting: stacked | qualified
    - Action_Start_Frying / End_Frying: standard_time | early_retrieval | overcooked | double_fried
    - Action_Cutting: complete_cut | linked | dusted_before_cut
    - Action_Seasoning: coverage_high | missed | single_side_dusted
    - Other actions: qualified
    If no ingredient is visible or the action has no applicable status, output an empty string.
-    - Action_Breading → fully_covered | uneven
+    ### 3. error_type
    Short description of legacy SOP operation anomaly only. Examples: dusted_before_cut, single_side_dusted, double_fried.
    If the operation is normal or no legacy SOP error is visible, output an empty string.
    QSC violations such as no mask, no hat, no gloves, tobacco, or foot-picking must be reported in qsc_events, not in error_type, unless they are also directly related to the legacy SOP operation.
-    - Action_Resting → stacked | qualified
+    ### 4. 安全隐患
    Chinese description of visible safety hazards in the scene. Example: 油锅附近有易燃物. If none, output an empty string.
-    - Action_Start_Frying / End_Frying → standard_time | early_retrieval | overcooked | double_fried
+    ### 5. 人物位置
    Chinese sentence describing where people are and how they are moving. Example: 员工在油锅边操作，顾客在收银台前等待. If no people are visible, output an empty string.
-    - Action_Cutting → complete_cut | linked | dusted_before_cut
+    ### 6. 总结
    Chinese sentence summarizing the scene and visible person count. Example: 画面中有2人，1名员工在操作台处理食物，1名顾客在收银台前等待. If no people are visible, output 无.
-    - Action_Seasoning → coverage_high | missed | single_side_dusted
+    ### 7. 时间
    The timestamp overlaid on the original video frame, in format YYYY-MM-DD HH:MM:SS. If the timestamp is not visible or cannot be read, output an empty string.
-    - Other actions → qualified
+    ### 8. employees
    Array of employee objects. If no employees are visible, output [].
    Each employee object must contain:
    - status: 1 if working at equipment, food, packing, counter, or operation table; 2 if standing idle, waiting, or passing by
    - warning: 0 if no visible hazard; 1 if hazard present
    - position: one of YZL_1 / LCCZT_1 / SYJ / DPL / BSZSG / DCGZT / KLJ / UNKNOWN
    Position codes: YZL_1 = oil fryer area; LCCZT_1 = cooling or operation table; SYJ = cashier/register; DPL = electric fryer area; BSZSG = display cabinet; DCGZT = sink/washing area; KLJ = cola/drink machine; UNKNOWN = employee visible but position cannot be classified.
-    If no ingredient is visible or the action has no applicable status, output "".
+    ### 9. guests
    Array with the existing mixed-key schema. If no guests are visible, output [].
    - First element is queue-level object only: {"warning": "0" or "1"}. 1 means queue has >= 3 visible guests; 0 means queue has < 3 visible guests.
    - Subsequent elements are per-guest objects only: {"status": "0"} at door, {"status": "1"} at register, or {"status": "2"} seated.
    ### 10. qsc_events
    Array of suspected QSC violation events. If no suspected violation is visible, output [].
    Detect only the following current-period QSC violations:
-    ### 3. error_type (REQUIRED — "" if no error)
+    QSC pre-scan rule: Before deciding the main food-operation Action, first scan the entire full-frame image sequence for QSC violations, including people in corners, background, seated/squatting/bending postures, and floor-level foot/shoe areas. QSC events must not be suppressed by a normal food-operation action.
-    Short description of any anomaly. Examples: "smoking", "dusted_before_cut", "single_side_dusted", "double_fried". If the operation is normal, output "".
+    - WGSJ0001: 工作状态未戴口罩
      Definition: An employee is in working state and the mouth/nose mask is clearly absent, not worn, or not covering mouth/nose.
      Working state includes frying food, making food, packing food, handling semi-finished products, touching food, operating food equipment, or working at a food operation table.
      Non-working state includes passing by, resting, waiting, short stay, or standing without obvious operation. In non-working state, no-mask alone is NOT a violation.
    - WGSJ0002: 工作状态未戴帽子
      Definition: An employee is in working state and the required work hat/cap/hair covering is clearly absent. Apply the same working-state rule as WGSJ0001.
-    ### 4. 安全隐患 (REQUIRED — "" if no hazard)
+    - WGSJ0003: 未戴手套操作食物
      Definition: An employee directly touches, handles, makes, packs, cuts, seasons, or transfers food without visible gloves. If hands are not visible, do not report this violation.
-    Chinese description of any safety hazard visible in the scene (e.g., "油锅附近有易燃物"). If none, output "".
+    - WGSJ0004: 工作区烟草制品违规
      Definition: Cigarette, e-cigarette, smoking behavior, lighter used for smoking, ashtray, or other tobacco product is visible in the food work area.
    - WGSJ0005: foot/shoe touching violation
      Chinese name for output: 抠脚或接触鞋脚.
      Definition: Report WGSJ0005 ONLY when there is clear visual evidence that a hand, fingers, tissue, cloth, tool, or another object is directly touching a foot, toes, sole, sock, shoe, or footwear area, and the motion is picking, scratching, rubbing, wiping, cleaning, adjusting, or handling that foot/shoe area.
-    ### 5. 人物位置 (REQUIRED — "" if no people)
+      Very strict rule:
      - WGSJ0005 is NOT a posture detector. Do not report it from bending, squatting, standing, walking, leaning, or a hand being near the leg/foot.
      - WGSJ0005 is NOT a "suspected" event. Do not output WGSJ0005 for manual_review unless the hand/object-to-foot/shoe contact is actually visible.
      - If the evidence is only suspicious or ambiguous, output no WGSJ0005 event. Keep qsc_events as [] unless another violation is clearly visible.
-    Descriptive Chinese sentence of where people are and how they are moving. Example: "员工在油锅边". If no one is in the frame, output "".
+      Required positive criteria:
      Output WGSJ0005 only when ALL of the following are true:
      - The foot, shoe, sock, toes, sole, or footwear area is visible.
      - The hand, fingers, tissue, cloth, tool, or object is visibly touching that foot/shoe area, not merely close to it.
      - The contact is visible in at least two frames, or one frame is extremely clear.
      - The action looks like picking, scratching, rubbing, wiping, cleaning, adjusting, or handling the foot/shoe area.
      - It is not normal walking, standing, food handling, floor cleaning, picking up an item, moving equipment, or touching a table/container/apron/clothing.
      Hard negative examples:
      Do NOT report WGSJ0005 when any of these is true:
      - A person is only standing near food, standing by a counter, or walking.
      - Feet or shoes are visible but no hand/object is visibly touching them.
      - A hand is at the table, food tray, oil pan, apron, waist, knee, pants, skirt, floor, trash bag, or equipment.
      - A person bends or squats but the hand-foot/shoe contact cannot be clearly seen.
      - The person is operating food, packing food, breading, seasoning, serving, cleaning the floor, picking up an item, or moving supplies.
      - The foot/shoe area is too small, blurry, blocked, cropped, or outside the frame.
-    ### 6. 总结 (REQUIRED — "无" if no people)
+      Output requirements for WGSJ0005:
      - violation_type must be exactly "抠脚或接触鞋脚".
      - reason must be Chinese and must explicitly say where the person is and what visible contact is seen.
      - suggested_action must be "manual_review".
      - confidence must be >= 0.80. If confidence would be below 0.80, do not output WGSJ0005.
      - evidence_frame_count must be the number of frames where direct contact is visible.
      - evidence_checklist must be exactly:
        {"foot_or_shoe_area_visible": true/false, "direct_hand_or_object_contact_visible": true/false, "contact_visible_in_multiple_frames_or_extremely_clear": true/false, "foot_handling_motion_visible": true/false, "normal_activity_excluded": true/false}
-    Descriptive Chinese sentence summarizing the scene with the exact person count. Example: "员工在油锅边炸鸡，顾客在收银台前等待". If no one is in the frame, output "无".
+    Multi-frame rule:
    - Do not rely on a single unclear frame.
    - Judge qsc_events based on the whole clip and continuous multi-frame evidence.
    - Prefer reporting a qsc_event only when the violation is visible in multiple frames, or when the visual evidence is very clear and consistent across the clip.
    - If evidence is unclear, do not report the violation; keep qsc_events as [].
    - For WGSJ0005, use the strictest threshold: only report it when direct hand/object-to-foot-or-shoe contact is clearly visible. If uncertain, do not report WGSJ0005.
    Each qsc_events item must contain:
    - violation_code: one of WGSJ0001 / WGSJ0002 / WGSJ0003 / WGSJ0004 / WGSJ0005
    - violation_type: Chinese violation name
    - is_violation: true
    - working_state: working / non_working / unknown
    - reason: concise Chinese explanation of the visible evidence
    - confidence: number from 0 to 1
    - evidence_frame_count: estimated number of frames supporting the event
    - visible_target: concise Chinese description of the person/object involved
    - evidence_checklist: for WGSJ0005 only, include {"foot_or_shoe_area_visible": true/false, "direct_hand_or_object_contact_visible": true/false, "contact_visible_in_multiple_frames_or_extremely_clear": true/false, "foot_handling_motion_visible": true/false, "normal_activity_excluded": true/false}; for other codes output {}
    - suggested_action: record / warning / manual_review
    Suggested action rules: WGSJ0001 and WGSJ0002 use warning; WGSJ0003 and WGSJ0004 use manual_review. WGSJ0005 uses manual_review only when direct hand/object-to-foot-or-shoe contact is clearly visible with confidence >= 0.80. If WGSJ0005 evidence is weak, suspicious, or ambiguous, do not output WGSJ0005.
-    ### 7. 时间 (REQUIRED — "" if unreadable)
+    ### Output format
-
+    Return strict JSON only. Do not wrap in markdown. Do not add any prose before or after the JSON.
-    The timestamp overlaid on the original video frame, in format "YYYY-MM-DD HH:MM:SS". If the timestamp is not visible or cannot be read, output "".
+    Required JSON shape:
-
+    {"Action": "Action_Idle", "quality_status": "", "error_type": "", "安全隐患": "", "人物位置": "", "总结": "无", "时间": "", "employees": [], "guests": [], "qsc_events": []}
-
+  user: >-
-    ### 8. employees (REQUIRED — [] if none)
+    Analyze this multi-frame video clip. Preserve the existing action, quality, safety, people, guest, and timestamp fields. Additionally detect current-period QSC violations in qsc_events. Return strict JSON only, with all required keys.
    Array of employee objects. Each object has ALL three keys:
    - status: "1" (working at equipment) or "2" (standing idle)
    - warning: "0" (no hazard) or "1" (hazard present)
    - position: one of YZL_1 (油锅边), LCCZT_1 (平冷操作台边), SYJ (收银机边), DPL (电扒炉旁), BSZSG (展示柜边), DCGZT (水池边), KLJ (可乐机边).
    If no employees are in the frame, output [].
    ### 9. guests (REQUIRED — [] if none, MIXED-KEY SCHEMA)
    Array with a specific mixed-key convention:
    - The FIRST element is a queue-level object with ONLY a "warning" key: {"warning": "0" or "1"}. "1" means the queue has ≥ 3 people; "0" means < 3.
    - Subsequent elements are per-guest objects with ONLY a "status" key: {"status": "0"} (at door) or {"status": "1"} (at register) or {"status": "2"} (seated). One such object per visible guest.
    If there are no guests at all, output []. If only the queue header is known, output [{"warning": "0 or 1"}].
    Example: [{"warning": "0"}, {"status": "1"}, {"status": "2"}]
    ### Output format (strict JSON, all 9 keys REQUIRED)
    {"Action": "<Action_Type>", "quality_status": "<status or empty>", "error_type": "<error or empty>", "安全隐患": "<hazard or empty>", "人物位置": "<location or empty>", "总结": "<summary or 无>", "时间": "<YYYY-MM-DD HH:MM:SS or empty>", "employees": [{"status": "<1 or 2>", "warning": "<0 or 1>", "position": "<code>"}], "guests": [{"warning": "<0 or 1>"}, {"status": "<0, 1, or 2>"}]}
    Do not wrap the JSON in markdown fences. Do not add any prose before or after the JSON.
  user: 'Analyze the video clip and return the required JSON with all 9 keys. Read the timestamp from the frame overlay into "时间".'
 schema:
  version: local-batch-v1
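Note: the qsc_events rules in the prompt above can be checked after the model response is parsed. The following is a minimal sketch, not part of this commit. The names `qsc_event_problems`, `EXPECTED_ACTION`, and `WGSJ0005_CHECKLIST_KEYS` are hypothetical, and treating any false checklist item as a rule breach is an assumption based on the prompt's "ALL of the following are true" wording.

```python
from typing import Any

# Suggested-action rules as stated in the prompt text of this diff.
EXPECTED_ACTION = {
    "WGSJ0001": "warning",
    "WGSJ0002": "warning",
    "WGSJ0003": "manual_review",
    "WGSJ0004": "manual_review",
    "WGSJ0005": "manual_review",
}

# Checklist keys the prompt requires for WGSJ0005 events.
WGSJ0005_CHECKLIST_KEYS = (
    "foot_or_shoe_area_visible",
    "direct_hand_or_object_contact_visible",
    "contact_visible_in_multiple_frames_or_extremely_clear",
    "foot_handling_motion_visible",
    "normal_activity_excluded",
)


def qsc_event_problems(event: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list the prompt rules that one qsc_events item breaks."""
    code = event.get("violation_code")
    if code not in EXPECTED_ACTION:
        return [f"unknown violation_code: {code!r}"]

    problems: list[str] = []
    if event.get("suggested_action") != EXPECTED_ACTION[code]:
        problems.append("suggested_action does not match the rule for this code")

    confidence = event.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0 <= confidence <= 1:
        problems.append("confidence must be a number from 0 to 1")
        return problems

    if code == "WGSJ0005":
        if confidence < 0.80:
            problems.append("WGSJ0005 requires confidence >= 0.80")
        checklist = event.get("evidence_checklist")
        # Assumption: every checklist item must be true for the event to stand.
        if not isinstance(checklist, dict) or not all(
            checklist.get(key) is True for key in WGSJ0005_CHECKLIST_KEYS
        ):
            problems.append("WGSJ0005 requires every evidence_checklist item to be true")
    return problems
```

An event that returns an empty list satisfies the rules checked here; the sketch does not cover every field the prompt requires (for example `reason` or `visible_target`).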
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@@ -1269,6 +1269,20 @@ class CliTests(unittest.TestCase):
            self.assertEqual(folder_summary["processed_video_count"], 1)
            self.assertEqual(folder_summary["failed_video_count"], 0)
            self.assertEqual(folder_summary["event_counts"], {"queue_detected": 1})
+            phase_timings = json.loads(
+                (output_dir / "phase_timings.json").read_text(encoding="utf-8")
+            )
+            self.assertEqual(phase_timings["schema_version"], "phase-timings-v1")
+            for phase in (
+                "source_acquisition_seconds",
+                "video_probe_seconds",
+                "frame_sampling_seconds",
+                "clip_generation_seconds",
+                "inference_seconds",
+                "aggregation_seconds",
+            ):
+                self.assertIn(phase, phase_timings["phases"])
+                self.assertGreaterEqual(phase_timings["phases"][phase], 0)
 if __name__ == "__main__":
--- a/tests/test_result_parser.py
+++ b/tests/test_result_parser.py
@@ -104,6 +104,41 @@ class ResultParserTests(unittest.TestCase):
            "2026-06-14 12:31:20",
        )
+    def test_build_clip_result_preserves_qsc_events(self):
+        result = build_clip_result(
+            (
+                '{"Action":"Action_Idle","quality_status":"","error_type":"",'
+                '"安全隐患":"","人物位置":"员工在操作台边","总结":"画面中有1人",'
+                '"时间":"2026-06-16 05:00:03","employees":[],"guests":[],'
+                '"qsc_events":[{"violation_code":"WGSJ0001",'
+                '"violation_type":"工作状态未戴口罩","is_violation":true,'
+                '"working_state":"working","reason":"员工在操作台处理食物时未见口罩",'
+                '"confidence":0.92,"evidence_frame_count":3,'
+                '"visible_target":"操作台边员工","evidence_checklist":{},'
+                '"suggested_action":"warning"}]}'
+            ),
+            {
+                "video_id": "video-abc",
+                "clip_id": "video-abc_c000001",
+                "clip_start_seconds": 0.0,
+                "clip_end_seconds": 10.0,
+                "clip_start_timecode": "00:00:00",
+                "clip_end_timecode": "00:00:10",
+                "frame_times": [],
+            },
+            {"path": "/videos/a.mp4"},
+            {
+                "schema": {"version": "local-batch-v1"},
+                "runtime": {"timezone": "Asia/Shanghai"},
+            },
+            processing={},
+        )
+        self.assertEqual(result["status"], "ok")
+        self.assertEqual(len(result["qsc_events"]), 1)
+        self.assertEqual(result["qsc_events"][0]["violation_code"], "WGSJ0001")
+        self.assertEqual(result["qsc_events"][0]["suggested_action"], "warning")
    def test_build_clip_result_records_parse_failure_without_crashing(self):
        result = build_clip_result(
            "not json",
@@ -126,6 +161,7 @@ class ResultParserTests(unittest.TestCase):
        self.assertEqual(result["status"], "parse_failed")
        self.assertEqual(result["events"], [])
+        self.assertEqual(result["qsc_events"], [])
        self.assertEqual(result["monitoring_timeline"]["screen_time"], "")
        self.assertEqual(result["raw_response"], "not json")
        self.assertIn("JSON", result["error"])
--- a/video_ai_analysis_poc/cli.py
+++ b/video_ai_analysis_poc/cli.py
@@ -1,9 +1,12 @@
 from __future__ import annotations
 import argparse
+from contextlib import contextmanager
 import json
+import time
+from datetime import datetime, timezone
 from pathlib import Path
-from typing import Sequence
+from typing import Callable, Iterator, Sequence, TypeVar
 from .aggregator import aggregate_outputs
 from .clips import build_clip_records
@@ -18,6 +21,64 @@ from .result_parser import build_clip_result
 from .timeline import DEFAULT_TIMEZONE, format_beijing_time, timeline_start_epoch
 from .vlm_client import infer_clip
+T = TypeVar("T")
+def _new_phase_timings() -> dict[str, object]:
+    return {
+        "schema_version": "phase-timings-v1",
+        "started_at": _utc_now_iso(),
+        "updated_at": _utc_now_iso(),
+        "phases": {},
+    }
+def _write_phase_timings(
+    output_dir: Path,
+    phase_timings: dict[str, object],
+) -> None:
+    phase_timings["updated_at"] = _utc_now_iso()
+    (output_dir / "phase_timings.json").write_text(
+        json.dumps(phase_timings, ensure_ascii=False, sort_keys=True, indent=2) + "\n",
+        encoding="utf-8",
+    )
+def _measure_phase(
+    phase_timings: dict[str, object] | None,
+    phase_name: str,
+    func: Callable[[], T],
+) -> T:
+    with _timed_phase(phase_timings, phase_name):
+        return func()
+@contextmanager
+def _timed_phase(
+    phase_timings: dict[str, object] | None,
+    phase_name: str,
+) -> Iterator[None]:
+    started = time.perf_counter()
+    try:
+        yield
+    finally:
+        if phase_timings is not None:
+            phases = phase_timings.get("phases")
+            if not isinstance(phases, dict):
+                phases = {}
+                phase_timings["phases"] = phases
+            previous = phases.get(phase_name, 0)
+            if not isinstance(previous, (int, float)):
+                previous = 0
+            phases[phase_name] = round(
+                float(previous) + time.perf_counter() - started,
+                6,
+            )
+def _utc_now_iso() -> str:
+    return datetime.now(timezone.utc).isoformat()
 def main(argv: Sequence[str] | None = None) -> int:
    parser = argparse.ArgumentParser(
@@ -43,6 +104,7 @@ def main(argv: Sequence[str] | None = None) -> int:
    output_dir = Path(config["output"]["dir"])
    output_dir.mkdir(parents=True, exist_ok=True)
+    phase_timings = _new_phase_timings()
    video_manifest_path = output_dir / "video_manifest.jsonl"
    resume_enabled = bool(config.get("output", {}).get("resume", False))
@@ -63,11 +125,13 @@ def main(argv: Sequence[str] | None = None) -> int:
            records,
            record_indexes,
            download_source=not args.dry_run,
+            phase_timings=phase_timings,
        )
    except ValueError as exc:
        parser.error(str(exc))
    write_manifest(video_manifest_path, records)
+    _write_phase_timings(output_dir, phase_timings)
    if args.dry_run:
        return 0
@@ -93,27 +157,29 @@ def main(argv: Sequence[str] | None = None) -> int:
        if record.get("status") == "sampled" and record.get("video_id")
    }
    changed_frame_video_ids: set[str] = set(backfilled_frame_video_ids)
-    for record in records:
-        if record.get("status") != "probed":
-            continue
-        video_id = str(record.get("video_id"))
-        if args.until == "inference" and video_id in existing_clip_video_ids:
-            continue
-        if video_id in existing_sampled_video_ids:
-            continue
-        frame_records = _without_video_records(frame_records, video_id)
-        ffmpeg_config = dict(config["ffmpeg"])
-        ffmpeg_config["timezone"] = timezone_name
-        frame_records.extend(
-            sample_video_frames(
-                record,
-                output_dir,
-                ffmpeg_config,
-                manifest_path=None,
+    with _timed_phase(phase_timings, "frame_sampling_seconds"):
+        for record in records:
+            if record.get("status") != "probed":
+                continue
+            video_id = str(record.get("video_id"))
+            if args.until == "inference" and video_id in existing_clip_video_ids:
+                continue
+            if video_id in existing_sampled_video_ids:
+                continue
+            frame_records = _without_video_records(frame_records, video_id)
+            ffmpeg_config = dict(config["ffmpeg"])
+            ffmpeg_config["timezone"] = timezone_name
+            frame_records.extend(
+                sample_video_frames(
+                    record,
+                    output_dir,
+                    ffmpeg_config,
+                    manifest_path=None,
+                )
            )
-        )
-        changed_frame_video_ids.add(video_id)
+            changed_frame_video_ids.add(video_id)
    write_manifest(frame_manifest_path, frame_records)
+    _write_phase_timings(output_dir, phase_timings)
    sampled_video_ids = {
        str(record.get("video_id"))
@@ -133,22 +199,28 @@ def main(argv: Sequence[str] | None = None) -> int:
        for record in frame_records
        if str(record.get("video_id")) in clip_rebuild_video_ids
    ]
-    clip_records.extend(build_clip_records(frames_to_build, config["clip"]))
+    with _timed_phase(phase_timings, "clip_generation_seconds"):
+        clip_records.extend(build_clip_records(frames_to_build, config["clip"]))
    write_manifest(output_dir / "clip_manifest.jsonl", clip_records)
+    _write_phase_timings(output_dir, phase_timings)
    if args.until == "clips":
        return 0
-    _run_inference(
-        clip_records,
-        records,
-        output_dir,
-        config,
-        limit_clips=args.limit_clips,
-        resume=resume_enabled,
-    )
+    with _timed_phase(phase_timings, "inference_seconds"):
+        _run_inference(
+            clip_records,
+            records,
+            output_dir,
+            config,
+            limit_clips=args.limit_clips,
+            resume=resume_enabled,
+        )
+    _write_phase_timings(output_dir, phase_timings)
    if args.until == "inference":
        return 0
-    aggregate_outputs(output_dir, config)
+    with _timed_phase(phase_timings, "aggregation_seconds"):
+        aggregate_outputs(output_dir, config)
+    _write_phase_timings(output_dir, phase_timings)
    return 0
@@ -175,33 +247,40 @@ def _acquire_source_records(
    record_indexes: dict[str, int],
    *,
    download_source: bool = True,
+    phase_timings: dict[str, object] | None = None,
 ) -> None:
-    for source_record in _source_video_records(
-        config,
-        output_dir,
-        download_source=download_source,
-    ):
-        path = source_record.get("path")
-        if not path:
-            continue
-        video_id = stable_video_id(str(path))
-        existing_index = record_indexes.get(video_id)
-        if (
-            existing_index is not None
-            and records[existing_index].get("status") == "probed"
-        ):
-            continue
-        probe_record = probe_video(
-            str(path),
-            timeout_seconds=config["ffprobe"]["timeout_seconds"],
+    source_records = _measure_phase(
+        phase_timings,
+        "source_acquisition_seconds",
+        lambda: _source_video_records(
+            config,
+            output_dir,
+            download_source=download_source,
        )
-        record = {**source_record, **probe_record, "video_id": video_id}
-        if existing_index is None:
-            record_indexes[video_id] = len(records)
-            records.append(record)
-        else:
-            records[existing_index] = record
+    )
+    with _timed_phase(phase_timings, "video_probe_seconds"):
+        for source_record in source_records:
+            path = source_record.get("path")
+            if not path:
+                continue
+            video_id = stable_video_id(str(path))
+            existing_index = record_indexes.get(video_id)
+            if (
+                existing_index is not None
+                and records[existing_index].get("status") == "probed"
+            ):
+                continue
+            probe_record = probe_video(
+                str(path),
+                timeout_seconds=config["ffprobe"]["timeout_seconds"],
+            )
+            record = {**source_record, **probe_record, "video_id": video_id}
+            if existing_index is None:
+                record_indexes[video_id] = len(records)
+                records.append(record)
+            else:
+                records[existing_index] = record
 def _source_video_records(
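Note: `_timed_phase` in the cli.py changes above adds elapsed time to any value already stored under the same phase name, and records it in a `finally` clause. The sketch below is a simplified stand-in written for illustration (`timed_phase` and the flat `phases` dict are hypothetical, and the `None` and type guards of the real helper are omitted). It shows the two consequences: repeated entries accumulate, and a phase that raises is still timed.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed_phase(phases: dict[str, float], name: str) -> Iterator[None]:
    """Simplified stand-in for _timed_phase: add elapsed seconds to phases[name]."""
    started = time.perf_counter()
    try:
        yield
    finally:
        # Runs even when the body raises, so failed phases are still recorded.
        elapsed = time.perf_counter() - started
        phases[name] = round(phases.get(name, 0.0) + elapsed, 6)


phases: dict[str, float] = {}

# Entering the same phase three times accumulates into one total.
for _ in range(3):
    with timed_phase(phases, "frame_sampling_seconds"):
        time.sleep(0.01)  # placeholder for real work

# A phase that fails still gets an entry.
try:
    with timed_phase(phases, "inference_seconds"):
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(sorted(phases))  # ['frame_sampling_seconds', 'inference_seconds']
```

Accumulating rather than overwriting matters for a caller that enters the same phase more than once in a run; the exact totals depend on the machine, so none are shown here.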
--- a/video_ai_analysis_poc/result_parser.py
+++ b/video_ai_analysis_poc/result_parser.py
@@ -63,6 +63,7 @@ def build_clip_result(
        "status": result_status,
        "monitoring_timeline": timeline,
        "events": _events(payload, clip_record) if result_status == "ok" else [],
        "qsc_events": _qsc_events(payload) if result_status == "ok" else [],
        "raw_response": raw_response,
        "processing": processing_record,
        "error": result_error,
@@ -131,6 +132,17 @@ def _event(
    return normalized
+def _qsc_events(payload: dict[str, Any]) -> list[dict[str, Any]]:
+    raw_events = payload.get("qsc_events") or []
+    if not isinstance(raw_events, list):
+        return []
+    return [
+        dict(event)
+        for event in raw_events
+        if isinstance(event, dict)
+    ]
 def _video_path(video_record: dict[str, Any] | None) -> str | None:
    if not video_record:
        return None