feat: add webhook retry queue
11
README_zh.md
@@ -122,8 +122,11 @@ http://127.0.0.1:19080
 - `GET /api/manage/cases/summary`
 - `POST /api/manage/cases/{case_id}/handle`
 - `POST /api/manage/webhooks/case-update`
+- `GET /api/manage/webhooks/retries`
+- `POST /api/manage/webhooks/retries/drain`

 `/api/manage/webhooks/case-update` requires the `X-Webhook-Token` request header, and the `status` field in the request body is currently fixed to `handled`.
+`/api/manage/webhooks/retries` shows the latest retry state, and `/api/manage/webhooks/retries/drain` manually triggers one compensation pass over the retries that are due.

 ## Running the recognition and timing process

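The two retry endpoints listed above can be exercised from a short script. The following is a minimal sketch, not the project's client: it only builds the requests, and it assumes the management API listens on `http://127.0.0.1:19080` (taken from the hunk context) and needs no extra authentication. The `limit` and `status` query parameters and the JSON `limit` body field mirror what the handler code in this commit reads; the helper names `build_list_request` and `build_drain_request` are hypothetical.

```python
import json
from urllib import parse, request

# Assumption: base URL of the management API; adjust to your deployment.
BASE_URL = "http://127.0.0.1:19080"


def build_list_request(limit: int = 200, status: str = "") -> request.Request:
    """Build a GET request that lists the latest retry state per item."""
    query = {"limit": str(limit)}
    if status:
        # e.g. "pending", "delivered", or "dead_letter"
        query["status"] = status
    url = f"{BASE_URL}/api/manage/webhooks/retries?{parse.urlencode(query)}"
    return request.Request(url, method="GET")


def build_drain_request(limit: int = 20) -> request.Request:
    """Build a POST request that triggers one manual drain of due retries."""
    body = json.dumps({"limit": limit}).encode("utf-8")
    return request.Request(
        f"{BASE_URL}/api/manage/webhooks/retries/drain",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` sends it. The sketch stops at building the request so it can be inspected without a running server.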
@@ -181,6 +184,9 @@ diagnostics_path = "logs/runtime_diagnostics.jsonl"
 [case_sink]
 path = "logs/cases.jsonl"

+[webhook_retry_sink]
+path = "logs/webhook_retry.jsonl"
+
 [webhooks]
 enabled = true
 event_url = "https://example.com/runtime-events"
@@ -188,11 +194,16 @@ case_url = "https://example.com/case-events"
 callback_token = "shared-secret"
 connect_timeout_seconds = 3
 read_timeout_seconds = 5
+retry_backoff_seconds = 30
+retry_batch_limit = 20
+retry_max_attempts = 5
+retry_max_backoff_seconds = 1800
 ```

 The runtime additionally records:

 - `logs/cases.jsonl`: local case status changes
+- `logs/webhook_retry.jsonl`: webhook retry queue state snapshots
 - `logs/webhook_delivery.jsonl`: webhook delivery result audit log

 ## Local testing
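To see what the `retry_*` settings above mean in practice, the sketch below computes the wait before each retry. It assumes the delay doubles after every failed attempt and is capped at `retry_max_backoff_seconds`, which is what the `schedule_retry` helper in this commit appears to do, and that the first direct send counts as attempt 1. The function name `retry_delays` is hypothetical.

```python
def retry_delays(
    backoff_seconds: float = 30.0,
    max_backoff_seconds: float = 1800.0,
    max_attempts: int = 5,
) -> list[float]:
    """Return the wait (in seconds) scheduled after each failed attempt.

    After the final allowed attempt fails, nothing more is scheduled and the
    item becomes a dead letter, so the list has max_attempts - 1 entries.
    """
    delays: list[float] = []
    for attempt_count in range(1, max_attempts):
        exponent = attempt_count - 1
        delays.append(min(max_backoff_seconds, backoff_seconds * (2**exponent)))
    return delays


if __name__ == "__main__":
    # With the example config values: 30, 60, 120, 240 seconds.
    print(retry_delays())
```

With the example values, an item that keeps failing is retried after 30, 60, 120, and 240 seconds and is marked `dead_letter` when the fifth attempt fails.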
@@ -54,3 +54,24 @@ trash_motion_cooldown_seconds = 3

 [event_sink]
 path = "logs/events.jsonl"
+
+[case_sink]
+path = "logs/cases.jsonl"
+
+[webhook_retry_sink]
+path = "logs/webhook_retry.jsonl"
+
+[webhook_delivery_sink]
+path = "logs/webhook_delivery.jsonl"
+
+[webhooks]
+enabled = false
+event_url = ""
+case_url = ""
+callback_token = ""
+connect_timeout_seconds = 3
+read_timeout_seconds = 5
+retry_backoff_seconds = 30
+retry_batch_limit = 20
+retry_max_attempts = 5
+retry_max_backoff_seconds = 1800
105
docs/superpowers/plans/2026-06-09-webhook-retry-queue.md
Normal file
@@ -0,0 +1,105 @@
+# Webhook Retry Queue Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add persistent webhook retry queue handling so failed outbound webhook deliveries are retried with backoff instead of being recorded only as one-shot failures.
+
+**Architecture:** Keep the current synchronous direct-send path as the first attempt, but persist failed outbound deliveries into a separate append-only retry-state JSONL log. Reconstruct the latest retry state from that log, retry due items from runtime and management API entry points, and expose queue visibility plus manual drain control through the existing management API.
+
+**Tech Stack:** Python 3.12 standard library backend, JSONL persistence, unittest, existing Vite frontend left unchanged for this phase.
+
+---
+
+### Task 1: Retry Queue Model And Delivery Semantics
+
+**Files:**
+- Modify: `src/cold_display_guard/webhooks.py`
+- Test: `tests/test_webhooks.py`
+
+- [ ] **Step 1: Write failing retry-queue tests**
+Add tests for:
+- non-2xx direct delivery is treated as failure rather than success
+- failed direct delivery appends a pending retry snapshot
+- due retry success marks the queued item delivered
+- repeated retry failure increments attempts and eventually becomes `dead_letter`
+
+- [ ] **Step 2: Run test to verify it fails**
+Run: `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_webhooks.py -v`
+Expected: FAIL because retry queue helpers and non-2xx handling do not exist yet.
+
+- [ ] **Step 3: Implement minimal retry queue support**
+In `src/cold_display_guard/webhooks.py`:
+- add webhook retry settings parsing
+- add retry snapshot load/append helpers
+- add in-memory retry store operations
+- treat only HTTP `2xx` as successful delivery
+- enqueue failed direct deliveries
+- retry due queued deliveries with bounded exponential backoff and dead-letter cutoff
+
+- [ ] **Step 4: Run test to verify it passes**
+Run: `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_webhooks.py -v`
+Expected: PASS
+
+### Task 2: Runtime And Manage API Integration
+
+**Files:**
+- Modify: `src/cold_display_guard/main.py`
+- Modify: `src/cold_display_guard/manage_api.py`
+- Test: `tests/test_main.py`
+- Test: `tests/test_manage_api.py`
+
+- [ ] **Step 1: Write failing integration tests**
+Add tests for:
+- runtime delivery enqueues failed outbound webhooks and drains due retries
+- manual case handling uses the queue-aware sender
+- management API can list queued retry items
+- management API can manually trigger a retry drain and report results
+
+- [ ] **Step 2: Run test to verify it fails**
+Run:
+- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_main.py -v`
+- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_manage_api.py -v`
+Expected: FAIL because runtime/API do not know about queue paths or drain actions yet.
+
+- [ ] **Step 3: Implement minimal integration**
+- add retry-queue path resolution to runtime and management API
+- make runtime direct sends queue-aware and drain due items each cycle
+- make case-handle callbacks/manual operations queue-aware
+- add `GET /api/manage/webhooks/retries`
+- add `POST /api/manage/webhooks/retries/drain`
+
+- [ ] **Step 4: Run test to verify it passes**
+Run:
+- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_main.py -v`
+- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_manage_api.py -v`
+Expected: PASS
+
+### Task 3: Config Surface, Docs, And Final Verification
+
+**Files:**
+- Modify: `src/cold_display_guard/config.py`
+- Modify: `config/example.toml`
+- Modify: `README_zh.md`
+- Test: `tests/test_config.py`
+
+- [ ] **Step 1: Write failing config/doc tests**
+Extend config tests so saved config output includes retry queue sink/settings.
+
+- [ ] **Step 2: Run test to verify it fails**
+Run: `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_config.py -v`
+Expected: FAIL because retry queue config formatting does not exist yet.
+
+- [ ] **Step 3: Implement config and docs updates**
+- add defaults for retry queue sink path and retry policy settings
+- expose the non-secret retry config in manage config payload
+- document retry queue behavior, new log file, and manual drain/list endpoints
+
+- [ ] **Step 4: Run targeted and full verification**
+Run:
+- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_config.py -v`
+- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_webhooks.py -v`
+- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_main.py -v`
+- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_manage_api.py -v`
+- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest discover -s tests -v`
+Expected: PASS
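The plan's architecture note says the latest retry state is reconstructed from an append-only JSONL log. A minimal sketch of that idea follows, assuming each line is a JSON object carrying a `retry_id` and that later lines supersede earlier ones for the same id. The function name `latest_by_retry_id` is hypothetical and is not the helper the commit adds.

```python
import json


def latest_by_retry_id(lines: list[str]) -> dict[str, dict]:
    """Fold an append-only snapshot log into the newest snapshot per retry_id."""
    latest: dict[str, dict] = {}
    for line in lines:
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            # Skip blank or corrupted lines instead of failing the whole load.
            continue
        if not isinstance(payload, dict):
            continue
        retry_id = str(payload.get("retry_id", "")).strip()
        if retry_id:
            # A later line for the same id overwrites the earlier snapshot.
            latest[retry_id] = payload
    return latest


if __name__ == "__main__":
    log = [
        '{"retry_id": "retry_a", "status": "pending", "attempt_count": 1}',
        "not json",
        '{"retry_id": "retry_a", "status": "delivered", "attempt_count": 2}',
    ]
    print(latest_by_retry_id(log)["retry_a"]["status"])
```

Because state is only ever appended, the log doubles as a history of every transition, at the cost of rereading the file whenever the current state is needed.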
@@ -199,6 +199,12 @@ def format_config_document(data: dict[str, Any]) -> str:
         lines.append(f'path = "{_escape(str(case_sink.get("path", "logs/cases.jsonl")))}"')
         lines.append("")

+    webhook_retry_sink = data.get("webhook_retry_sink", {})
+    if webhook_retry_sink:
+        lines.append("[webhook_retry_sink]")
+        lines.append(f'path = "{_escape(str(webhook_retry_sink.get("path", "logs/webhook_retry.jsonl")))}"')
+        lines.append("")
+
     webhooks = data.get("webhooks", {})
     if webhooks:
         lines.append("[webhooks]")
@@ -209,6 +215,10 @@ def format_config_document(data: dict[str, Any]) -> str:
             "enabled",
             "event_url",
             "read_timeout_seconds",
+            "retry_backoff_seconds",
+            "retry_batch_limit",
+            "retry_max_attempts",
+            "retry_max_backoff_seconds",
         ):
             if key not in webhooks:
                 continue
@@ -19,7 +19,7 @@ from cold_display_guard.vision import (
     load_runtime_vision_settings,
     metrics_indicate_occupied,
 )
-from cold_display_guard.webhooks import send_batch_event_webhooks, send_case_webhooks
+from cold_display_guard.webhooks import drain_webhook_retries, send_batch_event_webhooks, send_case_webhooks


 def main() -> int:
@@ -54,6 +54,7 @@ def run(config_path: str | Path, once: bool = False, max_iterations: int = 0) ->
     timezone = ZoneInfo(str(config.get("timezone", "Asia/Shanghai")))
     event_path = resolve_project_path(project_root, str(config.get("event_sink", {}).get("path", "logs/events.jsonl")))
     case_path = case_sink_path(project_root, config)
+    webhook_retry_path = webhook_retry_sink_path(project_root, config)
     webhook_delivery_path = resolve_project_path(
         project_root,
         str(config.get("webhook_delivery_sink", {}).get("path", "logs/webhook_delivery.jsonl")),
@@ -83,6 +84,7 @@ def run(config_path: str | Path, once: bool = False, max_iterations: int = 0) ->

     event_path.parent.mkdir(parents=True, exist_ok=True)
     case_path.parent.mkdir(parents=True, exist_ok=True)
+    webhook_retry_path.parent.mkdir(parents=True, exist_ok=True)
     webhook_delivery_path.parent.mkdir(parents=True, exist_ok=True)
     diagnostics_path.parent.mkdir(parents=True, exist_ok=True)
     print(f"Cold Display Guard runtime started")
@@ -102,7 +104,14 @@ def run(config_path: str | Path, once: bool = False, max_iterations: int = 0) ->
         events = engine.process(observation)
         append_jsonl(event_path, events)
         case_snapshots = persist_case_updates(case_store, case_path, events)
-        deliver_runtime_webhooks(events, case_snapshots, config, webhook_delivery_path)
+        deliver_runtime_webhooks(
+            events,
+            case_snapshots,
+            config,
+            webhook_delivery_path,
+            retry_path=webhook_retry_path,
+            now=when,
+        )
         append_jsonl(
             diagnostics_path,
             [
@@ -140,6 +149,11 @@ def case_sink_path(project_root: Path, config: dict) -> Path:
     return resolve_project_path(project_root, raw_path)


+def webhook_retry_sink_path(project_root: Path, config: dict) -> Path:
+    raw_path = str(config.get("webhook_retry_sink", {}).get("path", "logs/webhook_retry.jsonl"))
+    return resolve_project_path(project_root, raw_path)
+
+
 def append_jsonl(path: Path, payloads: list[dict]) -> None:
     if not payloads:
         return
@@ -165,10 +179,14 @@ def deliver_runtime_webhooks(
     config: dict[str, object],
     audit_path: Path,
     *,
+    retry_path: Path | None = None,
     http_post=None,
+    now: datetime | None = None,
 ) -> None:
-    send_batch_event_webhooks(events, config, audit_path, http_post=http_post)
-    send_case_webhooks(case_snapshots, config, audit_path, http_post=http_post)
+    send_batch_event_webhooks(events, config, audit_path, retry_path=retry_path, http_post=http_post, now=now)
+    send_case_webhooks(case_snapshots, config, audit_path, retry_path=retry_path, http_post=http_post, now=now)
+    if retry_path is not None:
+        drain_webhook_retries(config, retry_path, audit_path, http_post=http_post, now=now)


 def restore_runtime_state(diagnostics_path: Path, config: dict) -> tuple[dict[str, RegionMetrics], dict[str, int]]:
@@ -20,7 +20,7 @@ from cold_display_guard.config import (
     save_config_document,
 )
 from cold_display_guard.vision import load_runtime_vision_settings, metrics_indicate_occupied
-from cold_display_guard.webhooks import send_case_webhooks
+from cold_display_guard.webhooks import drain_webhook_retries, load_retry_snapshots, send_case_webhooks


 PROJECT_TYPE = "cold_display_guard"
@@ -77,6 +77,12 @@ def create_handler(ctx: ManageContext) -> type[BaseHTTPRequestHandler]:
             if parsed.path == "/api/manage/cases/summary":
                 self._send_json(build_case_summary(ctx))
                 return
+            if parsed.path == "/api/manage/webhooks/retries":
+                query = parse_qs(parsed.query)
+                limit = bounded_int(query.get("limit", ["200"])[0], 1, MAX_EVENT_LINES)
+                status = str(query.get("status", [""])[0]).strip().lower()
+                self._send_json({"items": load_webhook_retries(ctx, limit=limit, status=status), "limit": limit})
+                return
             if parsed.path == "/api/manage/diagnostics":
                 query = parse_qs(parsed.query)
                 limit = bounded_int(query.get("limit", ["50"])[0], 1, MAX_EVENT_LINES)
@@ -106,6 +112,9 @@ def create_handler(ctx: ManageContext) -> type[BaseHTTPRequestHandler]:
             if parsed.path == "/api/manage/webhooks/case-update":
                 self._handle_case_callback()
                 return
+            if parsed.path == "/api/manage/webhooks/retries/drain":
+                self._drain_webhook_retries()
+                return
             self.send_error(HTTPStatus.NOT_FOUND)

         def log_message(self, format: str, *args: object) -> None:
@@ -220,6 +229,11 @@ def create_handler(ctx: ManageContext) -> type[BaseHTTPRequestHandler]:
                 return
             self._send_json(snapshot)

+        def _drain_webhook_retries(self) -> None:
+            payload = self._read_json()
+            limit = bounded_int(payload.get("limit", 200), 1, MAX_EVENT_LINES)
+            self._send_json(drain_webhook_retry_queue(ctx, limit=limit))
+
         def _read_json(self) -> dict[str, Any]:
             length = int(self.headers.get("Content-Length", "0"))
             if length == 0:
@@ -296,6 +310,7 @@ def config_payload(ctx: ManageContext) -> dict[str, Any]:
     data = load_config_document(ctx.config_path)
     event_path = event_sink_path(ctx, data)
     case_path = case_sink_path(ctx, data)
+    retry_path = webhook_retry_sink_path(ctx, data)
     webhooks = dict(data.get("webhooks", {}) or {})
     webhooks.pop("callback_token", None)
     return {
@@ -313,6 +328,7 @@ def config_payload(ctx: ManageContext) -> dict[str, Any]:
         "trash": data.get("trash", {}),
         "event_sink": {"path": str(event_path)},
         "case_sink": {"path": str(case_path)},
+        "webhook_retry_sink": {"path": str(retry_path)},
         "webhooks": webhooks,
     }

@@ -386,6 +402,19 @@ def load_cases(ctx: ManageContext, limit: int, status: str = "") -> list[dict[st
     return cases[:limit]


+def load_webhook_retries(ctx: ManageContext, limit: int, status: str = "") -> list[dict[str, Any]]:
+    latest: dict[str, dict[str, Any]] = {}
+    for item in load_retry_snapshots(webhook_retry_sink_path(ctx)):
+        retry_id = str(item.get("retry_id", "")).strip()
+        if retry_id:
+            latest[retry_id] = item
+    items = list(latest.values())
+    if status:
+        items = [item for item in items if str(item.get("status", "")).lower() == status]
+    items.sort(key=lambda item: str(item.get("updated_at", "")), reverse=True)
+    return items[:limit]
+
+
 def build_case_summary(ctx: ManageContext) -> dict[str, Any]:
     cases = load_cases(ctx, limit=MAX_EVENT_LINES)
     summary = {
@@ -450,6 +479,16 @@ def case_sink_path(ctx: ManageContext, data: dict[str, Any] | None = None) -> Pa
     return path.resolve()


+def webhook_retry_sink_path(ctx: ManageContext, data: dict[str, Any] | None = None) -> Path:
+    if data is None:
+        data = load_config_document(ctx.config_path)
+    raw_path = str(data.get("webhook_retry_sink", {}).get("path", "logs/webhook_retry.jsonl"))
+    path = Path(raw_path).expanduser()
+    if not path.is_absolute():
+        path = ctx.project_root / path
+    return path.resolve()
+
+
 def webhook_delivery_path(ctx: ManageContext, data: dict[str, Any] | None = None) -> Path:
     if data is None:
         data = load_config_document(ctx.config_path)
@@ -481,23 +520,47 @@ def handle_case_update(
 ) -> dict[str, Any] | None:
     config = load_config_document(ctx.config_path)
     path = case_sink_path(ctx, config)
+    retry_path = webhook_retry_sink_path(ctx, config)
+    delivery_path = webhook_delivery_path(ctx, config)
     store = CaseStore(load_case_snapshots(path))
     matching = {item["case_id"] for item in store.latest_cases()}
     if case_id not in matching:
         return None
+    handled_at = datetime.now(timezone.utc)
     snapshot = store.mark_handled(
         case_id,
-        handled_at=datetime.now(timezone.utc),
+        handled_at=handled_at,
         handled_by=handled_by,
         handled_source=handled_source,
         note=note,
         source_ref=source_ref,
     )
     append_case_snapshots(path, [snapshot])
-    send_case_webhooks([snapshot], config, webhook_delivery_path(ctx, config))
+    send_case_webhooks([snapshot], config, delivery_path, retry_path=retry_path, now=handled_at)
+    drain_webhook_retries(config, retry_path, delivery_path, now=handled_at)
     return snapshot


+def drain_webhook_retry_queue(ctx: ManageContext, *, limit: int) -> dict[str, Any]:
+    config = load_config_document(ctx.config_path)
+    webhooks = dict(config.get("webhooks", {}) or {})
+    webhooks["retry_batch_limit"] = limit
+    config = dict(config)
+    config["webhooks"] = webhooks
+    updates = drain_webhook_retries(
+        config,
+        webhook_retry_sink_path(ctx, config),
+        webhook_delivery_path(ctx, config),
+    )
+    return {
+        "items": updates,
+        "retried_count": len(updates),
+        "delivered_count": sum(1 for item in updates if str(item.get("status", "")) == "delivered"),
+        "dead_letter_count": sum(1 for item in updates if str(item.get("status", "")) == "dead_letter"),
+        "pending_count": sum(1 for item in updates if str(item.get("status", "")) == "pending"),
+    }
+
+
 def latest_zone_counts(diagnostics: list[dict[str, Any]], config: dict[str, Any] | None = None) -> dict[str, int]:
     for item in reversed(diagnostics):
         stable_counts = stable_zone_counts_from_diagnostics(item)
@@ -1,8 +1,9 @@
 from __future__ import annotations

 import json
+import uuid
 from dataclasses import dataclass
-from datetime import datetime, timezone
+from datetime import datetime, timedelta, timezone
 from pathlib import Path
 from typing import Any, Callable
 from urllib import request
@@ -16,11 +17,104 @@ class WebhookSettings:
     callback_token: str = ""
     connect_timeout_seconds: float = 3.0
     read_timeout_seconds: float = 5.0
+    retry_max_attempts: int = 5
+    retry_backoff_seconds: float = 30.0
+    retry_max_backoff_seconds: float = 1_800.0
+    retry_batch_limit: int = 20


 HttpPost = Callable[[str, dict[str, object], tuple[float, float]], tuple[int, str]]


+class RetryStore:
+    def __init__(self, snapshots: list[dict[str, object]] | None = None) -> None:
+        self._entries: dict[str, dict[str, object]] = {}
+        for snapshot in snapshots or []:
+            retry_id = str(snapshot.get("retry_id", "")).strip()
+            if retry_id:
+                self._entries[retry_id] = normalize_retry_snapshot(snapshot)
+
+    def latest_items(self, *, limit: int = 200, status: str = "") -> list[dict[str, object]]:
+        items = list(self._entries.values())
+        if status:
+            items = [item for item in items if str(item.get("status", "")).lower() == status.lower()]
+        items.sort(key=retry_sort_key, reverse=True)
+        return [dict(item) for item in items[:limit]]
+
+    def due_items(self, now: datetime, *, limit: int) -> list[dict[str, object]]:
+        due: list[dict[str, object]] = []
+        for item in self._entries.values():
+            if str(item.get("status", "")) != "pending":
+                continue
+            next_attempt_at = parse_iso_datetime(item.get("next_attempt_at"))
+            if next_attempt_at is None or next_attempt_at <= now:
+                due.append(item)
+        due.sort(key=due_retry_sort_key)
+        return [dict(item) for item in due[:limit]]
+
+    def enqueue_failure(
+        self,
+        *,
+        target: str,
+        url: str,
+        payload: dict[str, object],
+        attempted_at: datetime,
+        settings: WebhookSettings,
+        status_code: int | None,
+        message: str,
+    ) -> dict[str, object]:
+        retry_id = f"retry_{uuid.uuid4().hex}"
+        attempt_count = 1
+        pending = attempt_count < settings.retry_max_attempts
+        snapshot = normalize_retry_snapshot(
+            {
+                "retry_id": retry_id,
+                "target": target,
+                "url": url,
+                "payload": payload,
+                "status": "pending" if pending else "dead_letter",
+                "attempt_count": attempt_count,
+                "created_at": attempted_at.isoformat(),
+                "updated_at": attempted_at.isoformat(),
+                "next_attempt_at": schedule_retry(attempted_at, settings, attempt_count).isoformat() if pending else "",
+                "delivered_at": "",
+                "last_status_code": status_code,
+                "last_message": message,
+            }
+        )
+        self._entries[retry_id] = snapshot
+        return dict(snapshot)
+
+    def record_retry_result(
+        self,
+        retry_id: str,
+        *,
+        attempted_at: datetime,
+        settings: WebhookSettings,
+        status: str,
+        status_code: int | None,
+        message: str,
+    ) -> dict[str, object]:
+        current = dict(self._entries[retry_id])
+        attempt_count = int(current.get("attempt_count", 0)) + 1
+        current["attempt_count"] = attempt_count
+        current["updated_at"] = attempted_at.isoformat()
+        current["last_status_code"] = status_code
+        current["last_message"] = message
+        if status == "ok":
+            current["status"] = "delivered"
+            current["next_attempt_at"] = ""
+            current["delivered_at"] = attempted_at.isoformat()
+        else:
+            pending = attempt_count < settings.retry_max_attempts
+            current["status"] = "pending" if pending else "dead_letter"
+            current["next_attempt_at"] = schedule_retry(attempted_at, settings, attempt_count).isoformat() if pending else ""
+            current["delivered_at"] = ""
+        snapshot = normalize_retry_snapshot(current)
+        self._entries[retry_id] = snapshot
+        return dict(snapshot)
+
+
 def load_webhook_settings(config: dict[str, Any]) -> WebhookSettings:
     payload = config.get("webhooks", {})
     if not isinstance(payload, dict):
@@ -32,6 +126,10 @@ def load_webhook_settings(config: dict[str, Any]) -> WebhookSettings:
         callback_token=str(payload.get("callback_token", "")),
         connect_timeout_seconds=float(payload.get("connect_timeout_seconds", 3.0)),
         read_timeout_seconds=float(payload.get("read_timeout_seconds", 5.0)),
+        retry_max_attempts=max(1, int(payload.get("retry_max_attempts", 5))),
+        retry_backoff_seconds=max(1.0, float(payload.get("retry_backoff_seconds", 30.0))),
+        retry_max_backoff_seconds=max(1.0, float(payload.get("retry_max_backoff_seconds", 1_800.0))),
+        retry_batch_limit=max(1, int(payload.get("retry_batch_limit", 20))),
     )


@@ -76,24 +174,44 @@ def send_batch_event_webhooks(
     config: dict[str, Any],
     audit_path: Path,
     *,
+    retry_path: Path | None = None,
     http_post: HttpPost | None = None,
+    now: datetime | None = None,
 ) -> list[dict[str, object]]:
     settings = load_webhook_settings(config)
     if not settings.enabled or not settings.event_url:
         return []
+    attempted_at = now or datetime.now(timezone.utc)
     deliveries: list[dict[str, object]] = []
+    retry_updates: list[dict[str, object]] = []
+    store = load_retry_store(retry_path) if retry_path is not None else None
     for event in events:
         payload = build_batch_event_payload(event)
-        deliveries.append(
-            deliver_webhook(
-                settings.event_url,
-                payload,
-                audit_path,
-                target="batch_event",
-                settings=settings,
-                http_post=http_post,
-            )
+        record = deliver_webhook(
+            settings.event_url,
+            payload,
+            audit_path,
+            target="batch_event",
+            settings=settings,
+            http_post=http_post,
+            attempted_at=attempted_at,
+            delivery_mode="direct",
         )
+        deliveries.append(record)
+        if store is not None and record["status"] == "error":
+            retry_updates.append(
+                store.enqueue_failure(
+                    target="batch_event",
+                    url=settings.event_url,
+                    payload=payload,
+                    attempted_at=attempted_at,
+                    settings=settings,
+                    status_code=optional_int(record.get("status_code")),
+                    message=str(record.get("message", "")),
+                )
+            )
+    if retry_path is not None:
+        append_retry_snapshots(retry_path, retry_updates)
     return deliveries


@@ -102,25 +220,118 @@ def send_case_webhooks(
     config: dict[str, Any],
     audit_path: Path,
     *,
+    retry_path: Path | None = None,
     http_post: HttpPost | None = None,
+    now: datetime | None = None,
 ) -> list[dict[str, object]]:
     settings = load_webhook_settings(config)
     if not settings.enabled or not settings.case_url:
         return []
+    attempted_at = now or datetime.now(timezone.utc)
     deliveries: list[dict[str, object]] = []
+    retry_updates: list[dict[str, object]] = []
+    store = load_retry_store(retry_path) if retry_path is not None else None
     for snapshot in snapshots:
         payload = build_case_event_payload(snapshot)
-        deliveries.append(
-            deliver_webhook(
-                settings.case_url,
-                payload,
-                audit_path,
-                target="case_event",
+        record = deliver_webhook(
+            settings.case_url,
+            payload,
+            audit_path,
+            target="case_event",
+            settings=settings,
+            http_post=http_post,
+            attempted_at=attempted_at,
+            delivery_mode="direct",
+        )
+        deliveries.append(record)
+        if store is not None and record["status"] == "error":
+            retry_updates.append(
+                store.enqueue_failure(
+                    target="case_event",
+                    url=settings.case_url,
+                    payload=payload,
+                    attempted_at=attempted_at,
+                    settings=settings,
+                    status_code=optional_int(record.get("status_code")),
+                    message=str(record.get("message", "")),
+                )
+            )
+    if retry_path is not None:
+        append_retry_snapshots(retry_path, retry_updates)
+    return deliveries
+
+
+def drain_webhook_retries(
+    config: dict[str, Any],
+    retry_path: Path,
+    audit_path: Path,
+    *,
+    http_post: HttpPost | None = None,
+    now: datetime | None = None,
+) -> list[dict[str, object]]:
+    settings = load_webhook_settings(config)
+    if not settings.enabled or not retry_path.exists():
+        return []
+    attempted_at = now or datetime.now(timezone.utc)
+    store = load_retry_store(retry_path)
+    updates: list[dict[str, object]] = []
+    for item in store.due_items(attempted_at, limit=settings.retry_batch_limit):
+        payload = dict(item.get("payload", {}) or {})
+        record = deliver_webhook(
+            str(item.get("url", "")),
+            payload,
+            audit_path,
+            target=str(item.get("target", "")),
+            settings=settings,
+            http_post=http_post,
+            attempted_at=attempted_at,
+            retry_id=str(item.get("retry_id", "")),
+            delivery_mode="retry",
+        )
+        updates.append(
+            store.record_retry_result(
+                str(item.get("retry_id", "")),
+                attempted_at=attempted_at,
                 settings=settings,
-                http_post=http_post,
+                status=str(record.get("status", "error")),
+                status_code=optional_int(record.get("status_code")),
+                message=str(record.get("message", "")),
             )
         )
-    return deliveries
+    append_retry_snapshots(retry_path, updates)
+    return updates


+def load_retry_snapshots(path: Path) -> list[dict[str, object]]:
+    if not path.exists():
+        return []
+    snapshots: list[dict[str, object]] = []
+    for line in path.read_text(encoding="utf-8").splitlines():
+        try:
+            payload = json.loads(line)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict):
+            snapshots.append(normalize_retry_snapshot(payload))
+    return snapshots
+
+
+def append_retry_snapshots(path: Path, snapshots: list[dict[str, object]]) -> None:
+    if not snapshots:
+        return
+    path.parent.mkdir(parents=True, exist_ok=True)
+    with path.open("a", encoding="utf-8") as handle:
+        if path.exists() and path.stat().st_size > 0 and not file_ends_with_newline(path):
+            handle.write("\n")
+        for snapshot in snapshots:
+            handle.write(json.dumps(snapshot, ensure_ascii=False, sort_keys=True))
+            handle.write("\n")
+
+
+def load_retry_store(path: Path | None) -> RetryStore:
+    if path is None:
+        return RetryStore()
+    return RetryStore(load_retry_snapshots(path))
+
+
 def deliver_webhook(
@@ -131,26 +342,46 @@ def deliver_webhook(
     target: str,
     settings: WebhookSettings,
     http_post: HttpPost | None = None,
+    attempted_at: datetime | None = None,
+    retry_id: str = "",
+    delivery_mode: str = "direct",
 ) -> dict[str, object]:
     post = http_post or post_json
     timeout = (settings.connect_timeout_seconds, settings.read_timeout_seconds)
+    recorded_at = attempted_at or datetime.now(timezone.utc)
     try:
         status_code, response_text = post(url, payload, timeout)
-        record = {
-            "ts": datetime.now(timezone.utc).isoformat(),
-            "target": target,
-            "url": url,
-            "status": "ok",
-            "status_code": status_code,
-            "message": response_text,
-        }
+        if 200 <= status_code < 300:
+            record = {
+                "ts": recorded_at.isoformat(),
+                "target": target,
+                "url": url,
+                "status": "ok",
+                "status_code": status_code,
+                "message": response_text,
+                "retry_id": retry_id,
+                "delivery_mode": delivery_mode,
+            }
+        else:
+            record = {
+                "ts": recorded_at.isoformat(),
+                "target": target,
+                "url": url,
+                "status": "error",
+                "status_code": status_code,
+                "message": response_text,
+                "retry_id": retry_id,
+                "delivery_mode": delivery_mode,
+            }
     except OSError as exc:
         record = {
-            "ts": datetime.now(timezone.utc).isoformat(),
+            "ts": recorded_at.isoformat(),
             "target": target,
             "url": url,
             "status": "error",
             "message": str(exc),
+            "retry_id": retry_id,
+            "delivery_mode": delivery_mode,
         }
     append_delivery_record(audit_path, record)
     return record
@@ -168,3 +399,63 @@ def append_delivery_record(path: Path, payload: dict[str, object]) -> None:
|
||||
with path.open("a", encoding="utf-8") as handle:
|
||||
handle.write(json.dumps(payload, ensure_ascii=False, sort_keys=True))
|
||||
handle.write("\n")
|
||||
|
||||
|
||||
def normalize_retry_snapshot(snapshot: dict[str, object]) -> dict[str, object]:
|
||||
payload = snapshot.get("payload", {})
|
||||
if not isinstance(payload, dict):
|
||||
payload = {}
|
||||
return {
|
||||
"retry_id": str(snapshot.get("retry_id", "")).strip(),
|
||||
"target": str(snapshot.get("target", "")).strip(),
|
||||
"url": str(snapshot.get("url", "")).strip(),
|
||||
"status": str(snapshot.get("status", "pending")).strip() or "pending",
|
||||
"attempt_count": max(0, int(snapshot.get("attempt_count", 0))),
|
||||
"payload": payload,
|
||||
"created_at": str(snapshot.get("created_at", "")).strip(),
|
||||
"updated_at": str(snapshot.get("updated_at", "")).strip(),
|
||||
"next_attempt_at": str(snapshot.get("next_attempt_at", "")).strip(),
|
||||
"delivered_at": str(snapshot.get("delivered_at", "")).strip(),
|
||||
"last_status_code": optional_int(snapshot.get("last_status_code")),
|
||||
"last_message": str(snapshot.get("last_message", "")).strip(),
|
||||
}
|
||||
|
||||
|
||||
def parse_iso_datetime(value: object) -> datetime | None:
|
||||
text = str(value or "").strip()
|
||||
if not text:
|
||||
return None
|
||||
try:
|
||||
parsed = datetime.fromisoformat(text)
|
||||
except ValueError:
|
||||
return None
|
||||
if parsed.tzinfo is None:
|
||||
return parsed.replace(tzinfo=timezone.utc)
|
||||
return parsed
|
||||
|
||||
|
||||
def schedule_retry(attempted_at: datetime, settings: WebhookSettings, attempt_count: int) -> datetime:
|
||||
exponent = max(0, attempt_count - 1)
|
||||
seconds = min(settings.retry_max_backoff_seconds, settings.retry_backoff_seconds * (2**exponent))
|
||||
return attempted_at + timedelta(seconds=seconds)
|
||||
|
||||
|
||||
def retry_sort_key(snapshot: dict[str, object]) -> tuple[str, str]:
|
||||
return str(snapshot.get("updated_at", "")), str(snapshot.get("retry_id", ""))
|
||||
|
||||
|
||||
def due_retry_sort_key(snapshot: dict[str, object]) -> tuple[str, str]:
|
||||
return str(snapshot.get("next_attempt_at", "")), str(snapshot.get("retry_id", ""))
|
||||
|
||||
|
||||
def optional_int(value: object) -> int | None:
|
||||
try:
|
||||
return int(value) if value is not None and value != "" else None
|
||||
except (TypeError, ValueError):
|
||||
return None
|
||||
|
||||
|
||||
def file_ends_with_newline(path: Path) -> bool:
|
||||
with path.open("rb") as handle:
|
||||
handle.seek(-1, 2)
|
||||
return handle.read(1) == b"\n"
|
||||
|
||||
@@ -1,51 +1,31 @@
|
||||
# Task Todo
|
||||
|
||||
- [x] Review the current project instructions and check for task-relevant lessons.
|
||||
- [x] Check repository status before writing the implementation plan.
|
||||
- [x] Inspect existing engine, CLI, docs, and frontend event handling for disposal-tracking impact.
|
||||
- [x] Write the design spec for webhook case management in an isolated worktree.
|
||||
- [x] Confirm the design with the user before implementation.
|
||||
- [x] Check repository status before starting retry-queue work.
|
||||
- [x] Re-verify that `main` includes webhook case management before layering retries on top.
|
||||
- [x] Inspect the current webhook delivery path, config surface, runtime integration point, and manage API hooks.
|
||||
- [x] Write the detailed retry-queue implementation plan to `docs/superpowers/plans/2026-06-09-webhook-retry-queue.md`.
|
||||
- [x] Execute webhook retry queue backend TDD cycle.
|
||||
- [x] Execute runtime/manage API retry integration TDD cycle.
|
||||
- [x] Update documentation/config formatting for retry queue settings and sinks.
|
||||
- [x] Run targeted verification and final full verification.
|
||||
|
||||
## Design Review
|
||||
## Notes
|
||||
|
||||
- Spec path: `docs/superpowers/specs/2026-06-09-webhook-case-management-design.md`
|
||||
- Scope fixed to local case management plus outbound and inbound webhook integration.
|
||||
- Confirmed behaviors:
|
||||
- manual handling and external callback handling are both supported
|
||||
- cases are created from `time_alarm`, `batch_pending_disposal`, and `warning_escalated`
|
||||
- both batch-event webhooks and case-state webhooks are required
|
||||
- callback `status` is exactly `handled`
|
||||
- callback-applied case handling must emit a `case_event` webhook
|
||||
- `tasks/lessons.md` is absent in this repository/worktree, so there were no prior session lessons to review.
|
||||
- Main branch merge result is available locally at `81f1709`; retry-queue work continues from branch `feat/webhook-retry-queue`.
|
||||
|
||||
## 2026-06-09 Implementation Plan
|
||||
## Review
|
||||
|
||||
- [x] Create isolated worktree for implementation on branch `feat/webhook-case-management`.
|
||||
- [x] Re-check runtime baseline in the worktree and note the local Python environment requirement.
|
||||
- [x] Write the detailed implementation plan to `docs/superpowers/plans/2026-06-09-webhook-case-management-implementation.md`.
|
||||
- [x] Execute backend case-state TDD cycle.
|
||||
- [x] Execute webhook integration TDD cycle.
|
||||
- [x] Execute management API TDD cycle.
|
||||
- [x] Execute frontend case-management TDD cycle.
|
||||
- [x] Run full verification and record outcomes.
|
||||
|
||||
## 2026-06-09 Implementation Review
|
||||
|
||||
- Worktree path: `/Users/glo/.config/superpowers/worktrees/cold_display_guard/webhook-case-management`
|
||||
- Baseline note: the default `python3` in this shell resolves to macOS system Python 3.9 and cannot import the repo's `dataclass(..., slots=True)` code. Python verification in this worktree must run through `eval "$(/opt/homebrew/bin/pyenv init -)" && python ...`, which resolves to Python 3.12.11.
|
||||
- Frontend baseline check in the worktree passed with `node --test web/test/zone-state.test.js`.
|
||||
- Implemented:
|
||||
- `src/cold_display_guard/cases.py` for case lifecycle and JSONL persistence
|
||||
- `src/cold_display_guard/webhooks.py` for outbound event/case webhook delivery and audit logging
|
||||
- runtime integration in `src/cold_display_guard/main.py`
|
||||
- case listing/summary/manual-handle/callback routes in `src/cold_display_guard/manage_api.py`
|
||||
- frontend case summary and manual-handle flow in `web/src/main.js` and `web/src/zone-state.js`
|
||||
- Targeted verification passed during implementation:
|
||||
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_cases.py -v`
|
||||
- Plan saved to `docs/superpowers/plans/2026-06-09-webhook-retry-queue.md`.
|
||||
- Chosen scope keeps the first outbound webhook attempt synchronous, then persists failures into a JSONL-backed retry queue with bounded backoff and dead-letter cutoff.
|
||||
- Retry queue observability and manual compensation will be exposed through the management API rather than the frontend in this phase.
|
||||
- Implemented queue-aware webhook delivery in `src/cold_display_guard/webhooks.py`, runtime retry draining in `src/cold_display_guard/main.py`, manage API retry list/drain endpoints in `src/cold_display_guard/manage_api.py`, and config/doc updates in `src/cold_display_guard/config.py`, `config/example.toml`, and `README_zh.md`.
|
||||
- Targeted verification passed:
|
||||
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_webhooks.py -v`
|
||||
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_main.py -v`
|
||||
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_manage_api.py -v`
|
||||
- `node --test web/test/zone-state.test.js`
|
||||
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_config.py -v`
|
||||
- Final verification passed:
|
||||
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest discover -s tests -v`
|
||||
- `cd web && pnpm build`
|
||||
- Frontend build note: the isolated worktree needed `cd web && pnpm install --frozen-lockfile` before `pnpm build` because `node_modules` are not shared into new worktrees.
|
||||
- `cd web && pnpm install --frozen-lockfile && pnpm build`
|
||||
|
||||
@@ -135,8 +135,13 @@ zone_ids = ["1", "2", "3"]
|
||||
"callback_token": "secret",
|
||||
"connect_timeout_seconds": 3,
|
||||
"read_timeout_seconds": 5,
|
||||
"retry_max_attempts": 4,
|
||||
"retry_backoff_seconds": 30,
|
||||
"retry_max_backoff_seconds": 300,
|
||||
"retry_batch_limit": 12,
|
||||
},
|
||||
"case_sink": {"path": "logs/cases.jsonl"},
|
||||
"webhook_retry_sink": {"path": "logs/webhook_retry.jsonl"},
|
||||
},
|
||||
)
|
||||
text = path.read_text(encoding="utf-8")
|
||||
@@ -145,8 +150,14 @@ zone_ids = ["1", "2", "3"]
|
||||
self.assertIn('event_url = "https://example.com/events"', text)
|
||||
self.assertIn('case_url = "https://example.com/cases"', text)
|
||||
self.assertIn('callback_token = "secret"', text)
|
||||
self.assertIn("retry_max_attempts = 4", text)
|
||||
self.assertIn("retry_backoff_seconds = 30", text)
|
||||
self.assertIn("retry_max_backoff_seconds = 300", text)
|
||||
self.assertIn("retry_batch_limit = 12", text)
|
||||
self.assertIn("[case_sink]", text)
|
||||
self.assertIn('path = "logs/cases.jsonl"', text)
|
||||
self.assertIn("[webhook_retry_sink]", text)
|
||||
self.assertIn('path = "logs/webhook_retry.jsonl"', text)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
||||
@@ -7,7 +7,14 @@ from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
from cold_display_guard.cases import CaseStore
|
||||
from cold_display_guard.main import case_sink_path, deliver_runtime_webhooks, persist_case_updates, restore_runtime_state
|
||||
from cold_display_guard.main import (
|
||||
case_sink_path,
|
||||
deliver_runtime_webhooks,
|
||||
persist_case_updates,
|
||||
restore_runtime_state,
|
||||
webhook_retry_sink_path,
|
||||
)
|
||||
from cold_display_guard.webhooks import load_retry_snapshots
|
||||
|
||||
|
||||
UTC = timezone.utc
|
||||
@@ -22,6 +29,14 @@ class RuntimeRestoreTests(unittest.TestCase):
|
||||
|
||||
self.assertEqual(path, (root / "logs" / "cases.jsonl").resolve())
|
||||
|
||||
def test_webhook_retry_sink_path_uses_default_logs_location(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
root = Path(tmpdir)
|
||||
|
||||
path = webhook_retry_sink_path(root, {})
|
||||
|
||||
self.assertEqual(path, (root / "logs" / "webhook_retry.jsonl").resolve())
|
||||
|
||||
def test_persist_case_updates_writes_case_snapshots(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
path = Path(tmpdir) / "cases.jsonl"
|
||||
@@ -99,6 +114,61 @@ class RuntimeRestoreTests(unittest.TestCase):
|
||||
self.assertEqual(deliveries[0][1]["kind"], "batch_event")
|
||||
self.assertEqual(deliveries[1][1]["kind"], "case_event")
|
||||
|
||||
def test_deliver_runtime_webhooks_enqueues_failure_and_drains_due_retry(self) -> None:
|
||||
attempts = {"count": 0}
|
||||
|
||||
def flaky_post(url: str, payload: dict[str, object], timeout: tuple[float, float]) -> tuple[int, str]:
|
||||
attempts["count"] += 1
|
||||
if attempts["count"] == 1:
|
||||
return 503, "down"
|
||||
return 200, "ok"
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
audit_path = Path(tmpdir) / "webhook_delivery.jsonl"
|
||||
retry_path = Path(tmpdir) / "webhook_retry.jsonl"
|
||||
config = {
|
||||
"webhooks": {
|
||||
"enabled": True,
|
||||
"event_url": "https://example.com/events",
|
||||
"retry_max_attempts": 3,
|
||||
"retry_backoff_seconds": 30,
|
||||
}
|
||||
}
|
||||
deliver_runtime_webhooks(
|
||||
[
|
||||
{
|
||||
"event": "time_alarm",
|
||||
"ts": datetime(2026, 6, 9, 9, 0, tzinfo=UTC).isoformat(),
|
||||
"batch_id": "batch_000001",
|
||||
"camera_id": "cam_01",
|
||||
"zone_id": "1",
|
||||
"zone_label": "区域 1",
|
||||
"severity": "alarm",
|
||||
"state": "alerted",
|
||||
}
|
||||
],
|
||||
[],
|
||||
config,
|
||||
audit_path,
|
||||
retry_path=retry_path,
|
||||
http_post=flaky_post,
|
||||
now=datetime(2026, 6, 9, 9, 0, tzinfo=UTC),
|
||||
)
|
||||
deliver_runtime_webhooks(
|
||||
[],
|
||||
[],
|
||||
config,
|
||||
audit_path,
|
||||
retry_path=retry_path,
|
||||
http_post=flaky_post,
|
||||
now=datetime(2026, 6, 9, 9, 1, tzinfo=UTC),
|
||||
)
|
||||
retries = load_retry_snapshots(retry_path)
|
||||
|
||||
self.assertEqual(attempts["count"], 2)
|
||||
self.assertEqual(retries[-1]["status"], "delivered")
|
||||
self.assertEqual(retries[-1]["attempt_count"], 2)
|
||||
|
||||
def test_restore_runtime_state_uses_stable_occupancy_when_raw_metrics_flicker(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
diagnostics_path = Path(tmpdir) / "runtime_diagnostics.jsonl"
|
||||
|
||||
@@ -7,6 +7,7 @@ import threading
|
||||
import unittest
|
||||
from http.server import ThreadingHTTPServer
|
||||
from pathlib import Path
|
||||
from unittest import mock
|
||||
|
||||
from cold_display_guard.config import load_config_document, merge_calibration, save_config_document
|
||||
from cold_display_guard.manage_api import ManageContext, build_summary, config_payload, create_handler
|
||||
@@ -390,11 +391,13 @@ class ManageApiTests(unittest.TestCase):
|
||||
config_path,
|
||||
{
|
||||
"case_sink": {"path": "logs/cases.jsonl"},
|
||||
"webhook_retry_sink": {"path": "logs/webhook_retry.jsonl"},
|
||||
"webhooks": {
|
||||
"enabled": True,
|
||||
"event_url": "https://example.com/events",
|
||||
"case_url": "https://example.com/cases",
|
||||
"callback_token": "secret",
|
||||
"retry_max_attempts": 4,
|
||||
},
|
||||
},
|
||||
)
|
||||
@@ -402,7 +405,9 @@ class ManageApiTests(unittest.TestCase):
|
||||
payload = config_payload(ManageContext(config_path=config_path, project_root=root))
|
||||
|
||||
self.assertEqual(payload["case_sink"]["path"], str((root / "logs" / "cases.jsonl").resolve()))
|
||||
self.assertEqual(payload["webhook_retry_sink"]["path"], str((root / "logs" / "webhook_retry.jsonl").resolve()))
|
||||
self.assertTrue(payload["webhooks"]["enabled"])
|
||||
self.assertEqual(payload["webhooks"]["retry_max_attempts"], 4)
|
||||
self.assertNotIn("callback_token", payload["webhooks"])
|
||||
|
||||
def test_cases_endpoint_returns_latest_snapshots(self) -> None:
|
||||
@@ -560,6 +565,144 @@ class ManageApiTests(unittest.TestCase):
|
||||
self.assertEqual(lines[-1]["handled_source"], "manual")
|
||||
self.assertEqual(lines[-1]["payload"]["note"], "checked")
|
||||
|
||||
def test_manual_handle_endpoint_enqueues_failed_case_webhook_for_retry(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
root = Path(tmpdir)
|
||||
config_path = root / "config" / "local.toml"
|
||||
save_config_document(
|
||||
config_path,
|
||||
{
|
||||
"case_sink": {"path": "logs/cases.jsonl"},
|
||||
"webhooks": {
|
||||
"enabled": True,
|
||||
"case_url": "https://example.com/cases",
|
||||
"retry_max_attempts": 3,
|
||||
"retry_backoff_seconds": 30,
|
||||
},
|
||||
"layout": {"zone_ids": ["1"]},
|
||||
},
|
||||
)
|
||||
cases_path = root / "logs" / "cases.jsonl"
|
||||
retry_path = root / "logs" / "webhook_retry.jsonl"
|
||||
cases_path.parent.mkdir()
|
||||
cases_path.write_text(
|
||||
json.dumps(
|
||||
{
|
||||
"case_id": "case_batch_000001",
|
||||
"batch_id": "batch_000001",
|
||||
"camera_id": "cam_01",
|
||||
"zone_id": "1",
|
||||
"zone_label": "区域 1",
|
||||
"case_type": "time_alarm",
|
||||
"case_status": "open",
|
||||
"source_event": "time_alarm",
|
||||
"created_at": "2026-06-09T09:00:00+08:00",
|
||||
"updated_at": "2026-06-09T09:00:00+08:00",
|
||||
"payload": {},
|
||||
}
|
||||
),
|
||||
encoding="utf-8",
|
||||
)
|
||||
ctx = ManageContext(config_path=config_path, project_root=root)
|
||||
server, thread = self._serve_once(ctx)
|
||||
try:
|
||||
with mock.patch("cold_display_guard.webhooks.post_json", side_effect=OSError("network down")):
|
||||
status, payload = self._request(
|
||||
server,
|
||||
"POST",
|
||||
"/api/manage/cases/case_batch_000001/handle",
|
||||
body={"handled_by": "alice"},
|
||||
)
|
||||
finally:
|
||||
self._stop_server(server, thread)
|
||||
|
||||
retries = [json.loads(line) for line in retry_path.read_text(encoding="utf-8").splitlines()]
|
||||
|
||||
self.assertEqual(status, 200)
|
||||
self.assertEqual(payload["case_status"], "handled")
|
||||
self.assertEqual(retries[-1]["status"], "pending")
|
||||
self.assertEqual(retries[-1]["target"], "case_event")
|
||||
self.assertEqual(retries[-1]["attempt_count"], 1)
|
||||
|
||||
def test_retry_queue_endpoint_returns_pending_items(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
root = Path(tmpdir)
|
||||
config_path = root / "config" / "local.toml"
|
||||
save_config_document(config_path, {"layout": {"zone_ids": ["1"]}})
|
||||
retry_path = root / "logs" / "webhook_retry.jsonl"
|
||||
retry_path.parent.mkdir()
|
||||
retry_path.write_text(
|
||||
json.dumps(
|
||||
{
|
||||
"retry_id": "retry_000001",
|
||||
"target": "case_event",
|
||||
"url": "https://example.com/cases",
|
||||
"status": "pending",
|
||||
"attempt_count": 1,
|
||||
"payload": {"kind": "case_event"},
|
||||
"created_at": "2026-06-09T09:00:00+08:00",
|
||||
"updated_at": "2026-06-09T09:00:00+08:00",
|
||||
"next_attempt_at": "2026-06-09T09:01:00+08:00",
|
||||
}
|
||||
),
|
||||
encoding="utf-8",
|
||||
)
|
||||
ctx = ManageContext(config_path=config_path, project_root=root)
|
||||
server, thread = self._serve_once(ctx)
|
||||
try:
|
||||
status, payload = self._request(server, "GET", "/api/manage/webhooks/retries?status=pending")
|
||||
finally:
|
||||
self._stop_server(server, thread)
|
||||
|
||||
self.assertEqual(status, 200)
|
||||
self.assertEqual(payload["items"][0]["retry_id"], "retry_000001")
|
||||
self.assertEqual(payload["items"][0]["status"], "pending")
|
||||
|
||||
def test_retry_drain_endpoint_retries_pending_item(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
root = Path(tmpdir)
|
||||
config_path = root / "config" / "local.toml"
|
||||
save_config_document(
|
||||
config_path,
|
||||
{
|
||||
"webhooks": {"enabled": True, "retry_max_attempts": 3, "retry_backoff_seconds": 30},
|
||||
"layout": {"zone_ids": ["1"]},
|
||||
},
|
||||
)
|
||||
retry_path = root / "logs" / "webhook_retry.jsonl"
|
||||
retry_path.parent.mkdir()
|
||||
retry_path.write_text(
|
||||
json.dumps(
|
||||
{
|
||||
"retry_id": "retry_000001",
|
||||
"target": "case_event",
|
||||
"url": "https://example.com/cases",
|
||||
"status": "pending",
|
||||
"attempt_count": 1,
|
||||
"payload": {"kind": "case_event", "case_id": "case_batch_000001"},
|
||||
"created_at": "2026-06-09T09:00:00+08:00",
|
||||
"updated_at": "2026-06-09T09:00:00+08:00",
|
||||
"next_attempt_at": "2026-06-09T09:01:00+08:00",
|
||||
}
|
||||
),
|
||||
encoding="utf-8",
|
||||
)
|
||||
ctx = ManageContext(config_path=config_path, project_root=root)
|
||||
server, thread = self._serve_once(ctx)
|
||||
try:
|
||||
with mock.patch("cold_display_guard.webhooks.post_json", return_value=(200, "ok")):
|
||||
status, payload = self._request(server, "POST", "/api/manage/webhooks/retries/drain", body={})
|
||||
finally:
|
||||
self._stop_server(server, thread)
|
||||
|
||||
lines = [json.loads(line) for line in retry_path.read_text(encoding="utf-8").splitlines()]
|
||||
|
||||
self.assertEqual(status, 200)
|
||||
self.assertEqual(payload["retried_count"], 1)
|
||||
self.assertEqual(payload["delivered_count"], 1)
|
||||
self.assertEqual(lines[-1]["status"], "delivered")
|
||||
self.assertEqual(lines[-1]["attempt_count"], 2)
|
||||
|
||||
def test_callback_endpoint_requires_token_and_handles_case(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
root = Path(tmpdir)
|
||||
|
||||
@@ -9,7 +9,9 @@ from pathlib import Path
|
||||
from cold_display_guard.webhooks import (
|
||||
build_batch_event_payload,
|
||||
build_case_event_payload,
|
||||
drain_webhook_retries,
|
||||
load_webhook_settings,
|
||||
load_retry_snapshots,
|
||||
send_batch_event_webhooks,
|
||||
send_case_webhooks,
|
||||
)
|
||||
@@ -29,6 +31,10 @@ class WebhookTests(unittest.TestCase):
|
||||
"callback_token": "secret",
|
||||
"connect_timeout_seconds": 4,
|
||||
"read_timeout_seconds": 6,
|
||||
"retry_max_attempts": 4,
|
||||
"retry_backoff_seconds": 15,
|
||||
"retry_max_backoff_seconds": 90,
|
||||
"retry_batch_limit": 8,
|
||||
}
|
||||
}
|
||||
)
|
||||
@@ -39,6 +45,10 @@ class WebhookTests(unittest.TestCase):
|
||||
self.assertEqual(settings.callback_token, "secret")
|
||||
self.assertEqual(settings.connect_timeout_seconds, 4)
|
||||
self.assertEqual(settings.read_timeout_seconds, 6)
|
||||
self.assertEqual(settings.retry_max_attempts, 4)
|
||||
self.assertEqual(settings.retry_backoff_seconds, 15)
|
||||
self.assertEqual(settings.retry_max_backoff_seconds, 90)
|
||||
self.assertEqual(settings.retry_batch_limit, 8)
|
||||
|
||||
def test_build_batch_event_payload_wraps_runtime_event(self) -> None:
|
||||
payload = build_batch_event_payload(
|
||||
@@ -182,6 +192,139 @@ class WebhookTests(unittest.TestCase):
|
||||
self.assertEqual(logged[0]["target"], "batch_event")
|
||||
self.assertIn("network down", logged[0]["message"])
|
||||
|
||||
def test_non_2xx_delivery_is_enqueued_for_retry(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
audit_path = Path(tmpdir) / "webhook_delivery.jsonl"
|
||||
retry_path = Path(tmpdir) / "webhook_retry.jsonl"
|
||||
send_batch_event_webhooks(
|
||||
[
|
||||
{
|
||||
"event": "time_alarm",
|
||||
"ts": datetime(2026, 6, 9, 9, 0, tzinfo=UTC).isoformat(),
|
||||
"batch_id": "batch_000001",
|
||||
"camera_id": "cam_01",
|
||||
"zone_id": "1",
|
||||
"zone_label": "区域 1",
|
||||
"severity": "alarm",
|
||||
"state": "alerted",
|
||||
}
|
||||
],
|
||||
{
|
||||
"webhooks": {
|
||||
"enabled": True,
|
||||
"event_url": "https://example.com/events",
|
||||
"retry_max_attempts": 3,
|
||||
"retry_backoff_seconds": 30,
|
||||
}
|
||||
},
|
||||
audit_path,
|
||||
retry_path=retry_path,
|
||||
http_post=lambda url, payload, timeout: (503, "service unavailable"),
|
||||
now=datetime(2026, 6, 9, 9, 0, tzinfo=UTC),
|
||||
)
|
||||
|
||||
retries = load_retry_snapshots(retry_path)
|
||||
logged = [json.loads(line) for line in audit_path.read_text(encoding="utf-8").splitlines()]
|
||||
|
||||
self.assertEqual(logged[0]["status"], "error")
|
||||
self.assertEqual(logged[0]["status_code"], 503)
|
||||
self.assertEqual(retries[-1]["status"], "pending")
|
||||
self.assertEqual(retries[-1]["attempt_count"], 1)
|
||||
self.assertEqual(retries[-1]["target"], "batch_event")
|
||||
self.assertEqual(retries[-1]["url"], "https://example.com/events")
|
||||
|
||||
def test_due_retry_is_marked_delivered_after_success(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
audit_path = Path(tmpdir) / "webhook_delivery.jsonl"
|
||||
retry_path = Path(tmpdir) / "webhook_retry.jsonl"
|
||||
config = {
|
||||
"webhooks": {
|
||||
"enabled": True,
|
||||
"event_url": "https://example.com/events",
|
||||
"retry_max_attempts": 3,
|
||||
"retry_backoff_seconds": 30,
|
||||
}
|
||||
}
|
||||
send_batch_event_webhooks(
|
||||
[
|
||||
{
|
||||
"event": "time_alarm",
|
||||
"ts": datetime(2026, 6, 9, 9, 0, tzinfo=UTC).isoformat(),
|
||||
"batch_id": "batch_000001",
|
||||
"camera_id": "cam_01",
|
||||
"zone_id": "1",
|
||||
"zone_label": "区域 1",
|
||||
"severity": "alarm",
|
||||
"state": "alerted",
|
||||
}
|
||||
],
|
||||
config,
|
||||
audit_path,
|
||||
retry_path=retry_path,
|
||||
http_post=lambda url, payload, timeout: (503, "service unavailable"),
|
||||
now=datetime(2026, 6, 9, 9, 0, tzinfo=UTC),
|
||||
)
|
||||
|
||||
drained = drain_webhook_retries(
|
||||
config,
|
||||
retry_path,
|
||||
audit_path,
|
||||
http_post=lambda url, payload, timeout: (200, "ok"),
|
||||
now=datetime(2026, 6, 9, 9, 1, tzinfo=UTC),
|
||||
)
|
||||
retries = load_retry_snapshots(retry_path)
|
||||
|
||||
self.assertEqual(len(drained), 1)
|
||||
self.assertEqual(retries[-1]["status"], "delivered")
|
||||
self.assertEqual(retries[-1]["attempt_count"], 2)
|
||||
self.assertEqual(retries[-1]["last_status_code"], 200)
|
||||
|
||||
def test_retry_reaches_dead_letter_after_attempt_limit(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
audit_path = Path(tmpdir) / "webhook_delivery.jsonl"
|
||||
retry_path = Path(tmpdir) / "webhook_retry.jsonl"
|
||||
config = {
|
||||
"webhooks": {
|
||||
"enabled": True,
|
||||
"event_url": "https://example.com/events",
|
||||
"retry_max_attempts": 2,
|
||||
"retry_backoff_seconds": 30,
|
||||
}
|
||||
}
|
||||
send_batch_event_webhooks(
|
||||
[
|
||||
{
|
||||
"event": "time_alarm",
|
||||
"ts": datetime(2026, 6, 9, 9, 0, tzinfo=UTC).isoformat(),
|
||||
"batch_id": "batch_000001",
|
||||
"camera_id": "cam_01",
|
||||
"zone_id": "1",
|
||||
"zone_label": "区域 1",
|
||||
"severity": "alarm",
|
||||
"state": "alerted",
|
||||
}
|
||||
],
|
||||
config,
|
||||
audit_path,
|
||||
retry_path=retry_path,
|
||||
http_post=lambda url, payload, timeout: (503, "service unavailable"),
|
||||
now=datetime(2026, 6, 9, 9, 0, tzinfo=UTC),
|
||||
)
|
||||
|
||||
drained = drain_webhook_retries(
|
||||
config,
|
||||
retry_path,
|
||||
audit_path,
|
||||
http_post=lambda url, payload, timeout: (503, "still down"),
|
||||
now=datetime(2026, 6, 9, 9, 1, tzinfo=UTC),
|
||||
)
|
||||
retries = load_retry_snapshots(retry_path)
|
||||
|
||||
self.assertEqual(len(drained), 1)
|
||||
self.assertEqual(retries[-1]["status"], "dead_letter")
|
||||
self.assertEqual(retries[-1]["attempt_count"], 2)
|
||||
self.assertEqual(retries[-1]["last_status_code"], 503)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
|
||||