feat: add webhook retry queue

2026-06-09 11:32:34 +08:00
parent 81f170924c
commit 8f516fdc01
12 changed files with 940 additions and 74 deletions

@@ -0,0 +1,105 @@
# Webhook Retry Queue Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Add persistent webhook retry queue handling so failed outbound webhook deliveries are retried with backoff instead of being recorded only as one-shot failures.
**Architecture:** Keep the current synchronous direct-send path as the first attempt, but persist failed outbound deliveries into a separate append-only retry-state JSONL log. Reconstruct the latest retry state from that log, retry due items from runtime and management API entry points, and expose queue visibility plus manual drain control through the existing management API.
**Tech Stack:** Python 3.12 standard library backend, JSONL persistence, unittest, existing Vite frontend left unchanged for this phase.
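The append-only log described under Architecture can be illustrated with a minimal sketch. It assumes each snapshot is a JSON object carrying an `id` field and that the last snapshot written for an `id` is the current state; the function names and field names here are hypothetical and not taken from `webhooks.py`.

```python
import json
from pathlib import Path


def append_snapshot(path: Path, snapshot: dict) -> None:
    """Append one retry-state snapshot as a single JSON line."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(snapshot, sort_keys=True) + "\n")


def load_latest_state(path: Path) -> dict[str, dict]:
    """Replay the log from the top; the last snapshot per id wins."""
    latest: dict[str, dict] = {}
    if not path.exists():
        return latest
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                snapshot = json.loads(line)
            except json.JSONDecodeError:
                # Skip a torn or corrupt line instead of failing the replay.
                continue
            # Assumption: every valid snapshot is an object with an "id".
            if isinstance(snapshot, dict) and "id" in snapshot:
                latest[snapshot["id"]] = snapshot
    return latest
```

Because entries are never rewritten, a state change such as `pending` to `delivered` is recorded by appending a new snapshot with the same `id`.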
---
### Task 1: Retry Queue Model And Delivery Semantics
**Files:**
- Modify: `src/cold_display_guard/webhooks.py`
- Test: `tests/test_webhooks.py`
- [ ] **Step 1: Write failing retry-queue tests**
Add tests for:
- non-2xx direct delivery is treated as failure rather than success
- failed direct delivery appends a pending retry snapshot
- due retry success marks the queued item delivered
- repeated retry failures increment the attempt count and eventually mark the item `dead_letter`
- [ ] **Step 2: Run test to verify it fails**
Run: `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_webhooks.py -v`
Expected: FAIL because retry queue helpers and non-2xx handling do not exist yet.
- [ ] **Step 3: Implement minimal retry queue support**
In `src/cold_display_guard/webhooks.py`:
- add webhook retry settings parsing
- add retry snapshot load/append helpers
- add in-memory retry store operations
- treat only HTTP `2xx` as successful delivery
- enqueue failed direct deliveries
- retry due queued deliveries with bounded exponential backoff and dead-letter cutoff
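The delivery rules listed in this step could look roughly like the following sketch. The policy constants, the function names, and the `attempts` / `next_attempt_at` fields are assumptions made for illustration; the plan does not specify concrete values or field names.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy values; the plan does not state the real ones.
BASE_DELAY_SECONDS = 30
MAX_DELAY_SECONDS = 3600
MAX_ATTEMPTS = 5


def is_success_status(status_code: int) -> bool:
    """Only HTTP 2xx counts as a successful delivery."""
    return 200 <= status_code < 300


def backoff_delay(attempts: int) -> int:
    """Seconds to wait after `attempts` failures (attempts >= 1), capped."""
    return min(BASE_DELAY_SECONDS * 2 ** (attempts - 1), MAX_DELAY_SECONDS)


def record_failure(item: dict, now: datetime) -> dict:
    """Return a new snapshot for an item whose latest attempt failed."""
    attempts = item.get("attempts", 0) + 1
    if attempts >= MAX_ATTEMPTS:
        # Dead-letter cutoff: stop scheduling further retries.
        return {**item, "attempts": attempts,
                "status": "dead_letter", "next_attempt_at": None}
    next_at = now + timedelta(seconds=backoff_delay(attempts))
    return {**item, "attempts": attempts,
            "status": "pending", "next_attempt_at": next_at.isoformat()}
```

Returning a new dict rather than mutating the item fits an append-only log, where each state change is written as a fresh snapshot.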
- [ ] **Step 4: Run test to verify it passes**
Run: `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_webhooks.py -v`
Expected: PASS
### Task 2: Runtime And Manage API Integration
**Files:**
- Modify: `src/cold_display_guard/main.py`
- Modify: `src/cold_display_guard/manage_api.py`
- Test: `tests/test_main.py`
- Test: `tests/test_manage_api.py`
- [ ] **Step 1: Write failing integration tests**
Add tests for:
- runtime delivery enqueues failed outbound webhooks and drains due retries
- manual case handling uses the queue-aware sender
- management API can list queued retry items
- management API can manually trigger a retry drain and report results
- [ ] **Step 2: Run tests to verify they fail**
Run:
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_main.py -v`
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_manage_api.py -v`
Expected: FAIL because runtime/API do not know about queue paths or drain actions yet.
- [ ] **Step 3: Implement minimal integration**
- add retry-queue path resolution to runtime and management API
- make runtime direct sends queue-aware and drain due items each cycle
- make case-handle callbacks/manual operations queue-aware
- add `GET /api/manage/webhooks/retries`
- add `POST /api/manage/webhooks/retries/drain`
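A drain pass, whether triggered by the runtime cycle or by the manual drain endpoint, might be shaped like the sketch below. It is an assumption-laden illustration: `drain_due`, the injected `send` callable, the summary keys, and the backoff constants are all hypothetical, and the real handler may report results differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def drain_due(
    items: dict[str, dict],
    send: Callable[[dict], int],  # hypothetical sender returning an HTTP status
    now: datetime,
    max_attempts: int = 5,        # hypothetical dead-letter cutoff
) -> dict:
    """Retry every pending item that is due and return a small summary."""
    summary = {"attempted": 0, "delivered": 0, "failed": 0, "dead_letter": 0}
    for item in items.values():
        if item["status"] != "pending":
            continue
        if datetime.fromisoformat(item["next_attempt_at"]) > now:
            continue  # not due yet
        summary["attempted"] += 1
        try:
            ok = 200 <= send(item) < 300
        except OSError:
            # Network-level errors (urllib.error.URLError is an OSError
            # subclass) count as failed attempts.
            ok = False
        item["attempts"] += 1
        if ok:
            item["status"] = "delivered"
            summary["delivered"] += 1
        elif item["attempts"] >= max_attempts:
            item["status"] = "dead_letter"
            summary["dead_letter"] += 1
        else:
            # Hypothetical backoff: 30 s doubling per attempt, capped at 1 h.
            delay = min(30 * 2 ** (item["attempts"] - 1), 3600)
            item["next_attempt_at"] = (now + timedelta(seconds=delay)).isoformat()
            summary["failed"] += 1
    return summary
```

Injecting `send` keeps the function testable with `unittest` without any network access, and the returned summary is the kind of payload a drain endpoint could report back to the caller.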
- [ ] **Step 4: Run tests to verify they pass**
Run:
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_main.py -v`
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_manage_api.py -v`
Expected: PASS
### Task 3: Config Surface, Docs, And Final Verification
**Files:**
- Modify: `src/cold_display_guard/config.py`
- Modify: `config/example.toml`
- Modify: `README_zh.md`
- Test: `tests/test_config.py`
- [ ] **Step 1: Write failing config tests**
Extend the config tests so that the saved config output must include the retry queue sink path and the retry policy settings.
- [ ] **Step 2: Run test to verify it fails**
Run: `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_config.py -v`
Expected: FAIL because retry queue config formatting does not exist yet.
- [ ] **Step 3: Implement config and docs updates**
- add defaults for retry queue sink path and retry policy settings
- expose the non-secret retry config in the management API config payload
- document retry queue behavior, new log file, and manual drain/list endpoints
- [ ] **Step 4: Run targeted and full verification**
Run:
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_config.py -v`
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_webhooks.py -v`
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_main.py -v`
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest tests/test_manage_api.py -v`
- `eval "$(/opt/homebrew/bin/pyenv init -)" && PYTHONPATH=src python -m unittest discover -s tests -v`
Expected: PASS