Files
musicdl-catalog-sync-suite/catalog-sync/docs/superpowers/plans/2026-04-20-download-dual-pool-pipeline.md
T

757 lines
26 KiB
Markdown

# Download Dual-Pool Pipeline Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Split the download stage into resolver workers and downloader workers so configured download concurrency is spent on real transfers instead of long source-resolution work.
**Architecture:** Keep deferred snapshots and current database schema unchanged, refactor `CatalogDownloader` into explicit resolve-only and download-only phases, and add a runner-level in-memory `ready_queue` that connects a small resolver pool to a larger downloader pool during `download` stages.
**Tech Stack:** Python, sqlite3, unittest, ThreadPoolExecutor, queue.Queue, FastAPI ops dashboard
---
## File Map
### Existing Files To Modify
- `musicdl/catalogsync/downloader.py`
- Split current mixed `resolve + download` flow into reusable resolve-only and download-only methods.
- `musicdl/catalogsync/ops/executors.py`
- Add download-stage helpers for resolve-only and download-only item handling without breaking existing item status semantics.
- `musicdl/catalogsync/ops/runner.py`
- Add the runner-level dual-pool execution path for `download` stages.
- `tests/catalogsync/test_services.py`
- Lock the downloader API refactor with unit tests.
- `tests/catalogsync/test_ops_executors.py`
- Lock download-stage executor behavior for resolved tasks and failure handling.
- `tests/catalogsync/test_ops_runner.py`
- Lock the dual-pool queueing model, worker split, and stage lifecycle.
- `tests/catalogsync/test_ops_api.py`
- Lock dashboard worker visibility for resolver and downloader worker families if any payload expectations need adjustment.
### New Files To Create
- None required for the first implementation. Keep the change focused and schema-free.
---
### Task 1: Lock `CatalogDownloader` Into Resolve And Download Phases
**Files:**
- Modify: `tests/catalogsync/test_services.py`
- Modify: `musicdl/catalogsync/downloader.py`
- [ ] **Step 1: Write the failing tests for resolve-only and download-only behavior**
Add tests near the existing downloader tests in `tests/catalogsync/test_services.py` for:
```python
def test_catalog_downloader_resolve_song_row_returns_resolved_payload():
payload = downloader.resolve_song_row(
row={
"id": song_id,
"playlist_id": 123,
"platform": "netease",
"name": "Song Resolve",
"singers": "Singer Resolve",
"ext": "mp3",
"file_size_bytes": 24,
"metadata_json": '{"snapshot":{"identifier":"song-resolve"}}',
},
library_root=library_root,
download_sources=["qq", "kuwo"],
worker_callback=lambda **state: worker_updates.append(dict(state)),
)
assert payload is not None
assert payload["row"]["id"] == song_id
assert payload["display_text"] == "Song Resolve / Singer Resolve"
assert payload["resolved_song_info"].source == "QQMusicClient"
```
```python
def test_catalog_downloader_download_resolved_song_reports_progress_and_records_file():
ok = downloader.download_resolved_song(
resolved_payload=resolved_payload,
worker_callback=lambda **state: worker_updates.append(dict(state)),
)
assert ok is True
assert any(int(state.get("downloaded_bytes") or 0) > 0 for state in worker_updates)
assert repo.song_has_active_local_file(song_id) is True
```
```python
def test_catalog_downloader_download_song_row_remains_a_compatibility_wrapper():
ok = downloader.download_song_row(
row=row,
library_root=library_root,
download_sources=["qq"],
worker_callback=lambda **state: worker_updates.append(dict(state)),
)
assert ok is True
assert any("resolving source" in str(state.get("last_progress_text") or "") for state in worker_updates)
assert any("starting download via" in str(state.get("last_progress_text") or "") for state in worker_updates)
```
- [ ] **Step 2: Run the focused tests and verify they fail for missing APIs**
Run:
```bash
python -m pytest tests/catalogsync/test_services.py -k "resolve_song_row or download_resolved_song or compatibility_wrapper" -q
```
Expected:
- `FAILED`
- failure mentions missing `resolve_song_row` or `download_resolved_song`, or existing wrapper behavior not matching the new tests
- [ ] **Step 3: Implement minimal resolve-only and download-only APIs in `CatalogDownloader`**
Refactor `musicdl/catalogsync/downloader.py` toward this shape:
```python
@dataclass
class ResolvedDownloadPayload:
row: dict[str, object]
display_text: str
default_root: Path
target_root: Path
backend_id: int
expected_bytes: int | None
resolved_song_info: object
def resolve_song_row(
self,
row,
library_root: str | Path,
download_sources: list[str] | None = None,
worker_callback=None,
) -> ResolvedDownloadPayload | None:
row_dict = dict(row)
default_root = Path(library_root).resolve()
self._current_library_root = default_root
self.repository.ensure_local_backend(default_root, name="default-local", is_default=True)
display_name = str(row_dict.get("name") or row_dict.get("id") or "")
singers = str(row_dict.get("singers") or "").strip()
display_text = f"{display_name} / {singers}".strip(" /")
self._emit_worker_progress(row_dict, worker_callback, display_text=display_text)
metadata = json.loads(row_dict["metadata_json"]) if row_dict.get("metadata_json") else {}
song_info = deserialize_song_info(metadata.get("snapshot"))
if song_info is None:
return None
resolve_progress_callback = None
if worker_callback is not None:
resolve_progress_callback = lambda message: self._emit_worker_progress(
row_dict,
worker_callback,
display_text=display_text,
last_progress_text=message,
)
song_info = self.resolve_song_info_for_download(
row=row_dict,
song_info=song_info,
download_sources=download_sources,
progress_callback=resolve_progress_callback,
)
download_platform = self._detect_download_platform(song_info, row_dict["platform"])
target_root = self.ensure_space(
default_root,
getattr(song_info, "file_size_bytes", None) or row_dict.get("file_size_bytes"),
)
is_default_root = target_root.resolve() == default_root
backend_id = self.repository.ensure_local_backend(
target_root,
name="default-local" if is_default_root else None,
is_default=is_default_root,
)
expected_bytes = int(getattr(song_info, "file_size_bytes", None) or row_dict.get("file_size_bytes") or 0) or None
return ResolvedDownloadPayload(
row=row_dict,
display_text=display_text,
default_root=default_root,
target_root=target_root,
backend_id=backend_id,
expected_bytes=expected_bytes,
resolved_song_info=song_info,
)
```
```python
def download_resolved_song(
self,
resolved_payload: ResolvedDownloadPayload,
worker_callback=None,
) -> bool:
row = resolved_payload.row
song_info = resolved_payload.resolved_song_info
download_platform = self._detect_download_platform(song_info, row["platform"])
client = self.get_client(download_platform)
singers = self._normalize_singers(getattr(song_info, "singers", None)) or self._normalize_singers(row.get("singers"))
relative_dir = build_download_relative_dir(platform=download_platform, singers=singers)
target_dir = resolved_payload.target_root / relative_dir
target_dir.mkdir(parents=True, exist_ok=True)
song_info.work_dir = str(target_dir)
if hasattr(song_info, "_save_path"):
song_info._save_path = None
self._emit_worker_progress(
row,
worker_callback,
display_text=resolved_payload.display_text,
last_progress_text=f"starting download via {download_platform}",
)
# keep existing monitor, client.download, and record_local_file logic here
```
```python
def download_song_row(...):
resolved_payload = self.resolve_song_row(
row=row,
library_root=library_root,
download_sources=download_sources,
worker_callback=worker_callback,
)
if resolved_payload is None:
return False
return self.download_resolved_song(
resolved_payload=resolved_payload,
worker_callback=worker_callback,
)
```
- [ ] **Step 4: Run the focused tests and verify they pass**
Run:
```bash
python -m pytest tests/catalogsync/test_services.py -k "resolve_song_row or download_resolved_song or compatibility_wrapper" -q
```
Expected:
- `PASS`
- no regressions in existing progress tests around `resolving source ...` and `starting download via ...`
- [ ] **Step 5: Commit**
```bash
git add tests/catalogsync/test_services.py musicdl/catalogsync/downloader.py
git commit -m "refactor: split downloader resolve and transfer phases"
```
### Task 2: Lock Download Executor Helpers For Resolved Tasks
**Files:**
- Modify: `tests/catalogsync/test_ops_executors.py`
- Modify: `musicdl/catalogsync/ops/executors.py`
- [ ] **Step 1: Write the failing executor tests for resolve-only and download-only item handling**
Add focused tests in `tests/catalogsync/test_ops_executors.py`:
```python
def test_download_executor_resolve_item_marks_failed_when_resolution_returns_none():
with patch.object(CatalogDownloader, "resolve_song_row", return_value=None):
executor.process_resolve_item(
item_id=item_id,
worker_name="resolve-1",
ready_queue=ready_queue,
)
item = ops_repo.get_item(item_id)
assert item.status == ItemStatus.FAILED
assert ready_queue.empty()
```
```python
def test_download_executor_resolve_item_enqueues_resolved_payload():
resolved_payload = SimpleNamespace(row=row, display_text="Song A / Singer A")
with patch.object(CatalogDownloader, "resolve_song_row", return_value=resolved_payload):
executor.process_resolve_item(
item_id=item_id,
worker_name="resolve-1",
ready_queue=ready_queue,
)
queued = ready_queue.get_nowait()
assert queued.item_id == item_id
assert queued.resolved_payload is resolved_payload
```
```python
def test_download_executor_download_resolved_item_marks_item_succeeded():
with patch.object(CatalogDownloader, "download_resolved_song", return_value=True):
executor.process_download_task(
task=resolved_task,
worker_name="download-1",
)
item = ops_repo.get_item(item_id)
assert item.status == ItemStatus.SUCCEEDED
```
- [ ] **Step 2: Run the focused executor tests and verify they fail**
Run:
```bash
python -m pytest tests/catalogsync/test_ops_executors.py -k "process_resolve_item or process_download_task" -q
```
Expected:
- `FAILED`
- failure mentions missing `process_resolve_item` or `process_download_task`
- [ ] **Step 3: Add minimal executor helpers without removing current compatibility path**
Extend `musicdl/catalogsync/ops/executors.py` around `DownloadStageExecutor` with helpers shaped like:
```python
@dataclass
class ResolvedStageDownloadTask:
item_id: int
playlist_id: int | None
row: dict[str, object]
resolved_payload: object
def process_resolve_item(self, item_id: int, worker_name: str, *, ready_queue) -> None:
row = self.ops_repo.build_download_row(item_id=item_id)
song_id = int(row.get("id") or row.get("song_id") or 0)
if song_id > 0 and self.catalog_repo.song_has_active_local_file(song_id):
self.ops_repo.update_worker_state(
worker_name=worker_name,
current_job_item_id=item_id,
status="running",
current_song_id=song_id,
current_playlist_id=row.get("playlist_id"),
current_display_text=str(row.get("name") or row.get("id") or song_id),
last_progress_text="already downloaded",
)
_ensure_transition_applied(
self.ops_repo.mark_item_succeeded(
item_id=item_id,
result_payload={"already_downloaded": True},
),
item_id=item_id,
action="mark_item_succeeded",
)
return
resolved_payload = self.downloader.resolve_song_row(
row=row,
library_root=self.library_root,
download_sources=self.download_sources,
worker_callback=lambda **state: self.ops_repo.update_worker_state(
worker_name=worker_name,
current_job_item_id=item_id,
status="running",
**state,
),
)
if resolved_payload is None:
_ensure_transition_applied(
self.ops_repo.mark_item_failed(item_id=item_id, error_message="resolve returned no downloadable song"),
item_id=item_id,
action="mark_item_failed",
)
return
ready_queue.put(
ResolvedStageDownloadTask(
item_id=item_id,
playlist_id=row.get("playlist_id"),
row=row,
resolved_payload=resolved_payload,
)
)
```
```python
def process_download_task(self, task: ResolvedStageDownloadTask, worker_name: str) -> None:
succeeded = self.downloader.download_resolved_song(
resolved_payload=task.resolved_payload,
worker_callback=lambda **state: self.ops_repo.update_worker_state(
worker_name=worker_name,
current_job_item_id=task.item_id,
status="running",
**state,
),
)
if succeeded:
_ensure_transition_applied(
self.ops_repo.mark_item_succeeded(item_id=task.item_id),
item_id=task.item_id,
action="mark_item_succeeded",
)
return
_ensure_transition_applied(
self.ops_repo.mark_item_failed(item_id=task.item_id, error_message="download returned no file"),
item_id=task.item_id,
action="mark_item_failed",
)
```
Keep `process_item(...)` as the compatibility path by delegating to `download_song_row(...)`.
- [ ] **Step 4: Run the focused executor tests and verify they pass**
Run:
```bash
python -m pytest tests/catalogsync/test_ops_executors.py -k "process_resolve_item or process_download_task" -q
```
Expected:
- `PASS`
- existing `DownloadStageExecutor.process_item(...)` tests remain green
- [ ] **Step 5: Commit**
```bash
git add tests/catalogsync/test_ops_executors.py musicdl/catalogsync/ops/executors.py
git commit -m "refactor: add staged download executor helpers"
```
### Task 3: Lock The Runner-Level Dual-Pool Pipeline
**Files:**
- Modify: `tests/catalogsync/test_ops_runner.py`
- Modify: `musicdl/catalogsync/ops/runner.py`
- [ ] **Step 1: Write the failing runner tests for worker splitting and queue-driven execution**
Add tests in `tests/catalogsync/test_ops_runner.py` for:
```python
def test_download_stage_splits_workers_into_resolve_and_download_pools():
resolver_workers, download_workers = runner._download_stage_worker_split(total_workers=10)
assert resolver_workers == 3
assert download_workers == 7
```
```python
def test_download_stage_pipeline_processes_items_through_ready_queue():
processed = []
class FakeDownloadExecutor:
def process_resolve_item(self, item_id, worker_name, *, ready_queue):
ready_queue.put(SimpleNamespace(item_id=item_id, playlist_id=None, resolved_payload=f"payload-{item_id}"))
def process_download_task(self, task, worker_name):
processed.append((task.item_id, worker_name, task.resolved_payload))
repo.mark_item_succeeded(task.item_id)
with patch.object(runner, "_build_executor", return_value=FakeDownloadExecutor()):
runner._run_stage(job, stage)
assert processed
assert all(worker_name.startswith("download-") for _, worker_name, _ in processed)
```
```python
def test_download_stage_pipeline_uses_single_thread_compatibility_when_worker_count_is_one():
calls = []
class FakeDownloadExecutor:
def process_item(self, item_id, worker_name, *, already_claimed=False):
calls.append((item_id, worker_name, already_claimed))
repo.mark_item_succeeded(item_id)
with patch.object(runner, "_build_executor", return_value=FakeDownloadExecutor()):
runner._run_stage(job, stage)
assert calls
assert calls[0][1] == "download-1"
```
- [ ] **Step 2: Run the focused runner tests and verify they fail**
Run:
```bash
python -m pytest tests/catalogsync/test_ops_runner.py -k "worker_split or ready_queue or single_thread_compatibility" -q
```
Expected:
- `FAILED`
- failure mentions missing `_download_stage_worker_split` or the existing `_run_stage(...)` shape not matching the new behavior
- [ ] **Step 3: Implement the runner-level dual-pool path for download stages**
Refactor `musicdl/catalogsync/ops/runner.py` so `download` stages use a specialized path:
```python
def _download_stage_worker_split(self, total_workers: int) -> tuple[int, int]:
normalized_total = max(int(total_workers or 0), 1)
if normalized_total == 1:
return 1, 0
if normalized_total == 2:
return 1, 1
resolver_workers = max(1, min(3, normalized_total // 3))
download_workers = max(1, normalized_total - resolver_workers)
return resolver_workers, download_workers
```
```python
def _run_download_stage_pipeline(self, job, stage, executor, worker_count: int) -> None:
resolver_workers, download_workers = self._download_stage_worker_split(worker_count)
if download_workers == 0:
self._run_stage_with_single_pool(job, stage, executor, worker_count)
return
ready_queue: Queue = Queue(maxsize=max(1, download_workers * 2))
stop_event = threading.Event()
sentinel = object()
def resolver_loop(worker_index: int) -> None:
worker_name = f"resolve-{worker_index + 1}"
while not stop_event.is_set():
active_job = self.repository.get_job(job.id)
if active_job is None or active_job.status in {JobStatus.PAUSE_REQUESTED, JobStatus.CANCELED}:
stop_event.set()
return
item = self.repository.claim_next_stage_item(stage.id, worker_name)
if item is None:
return
executor.process_resolve_item(item.id, worker_name, ready_queue=ready_queue)
def download_loop(worker_index: int) -> None:
worker_name = f"download-{worker_index + 1}"
while True:
task = ready_queue.get()
if task is sentinel:
return
executor.process_download_task(task, worker_name)
with ThreadPoolExecutor(max_workers=resolver_workers + download_workers) as pool:
resolver_futures = [pool.submit(resolver_loop, index) for index in range(resolver_workers)]
download_futures = [pool.submit(download_loop, index) for index in range(download_workers)]
for future in resolver_futures:
future.result()
for _ in range(download_workers):
ready_queue.put(sentinel)
for future in download_futures:
future.result()
```
Then update `_run_stage(...)` to branch:
```python
if refreshed_stage.stage_type == StageType.DOWNLOAD.value:
self._run_download_stage_pipeline(job, refreshed_stage, executor, worker_count)
else:
self._run_stage_with_single_pool(job, refreshed_stage, executor, worker_count)
```
- [ ] **Step 4: Run the focused runner tests and verify they pass**
Run:
```bash
python -m pytest tests/catalogsync/test_ops_runner.py -k "worker_split or ready_queue or single_thread_compatibility" -q
```
Expected:
- `PASS`
- current worker-count tests still pass
- [ ] **Step 5: Commit**
```bash
git add tests/catalogsync/test_ops_runner.py musicdl/catalogsync/ops/runner.py
git commit -m "feat: run download stage through dual-pool pipeline"
```
### Task 4: Lock Dashboard And Worker-State Expectations
**Files:**
- Modify: `tests/catalogsync/test_ops_api.py`
- Modify: `musicdl/catalogsync/ops/web.py`
- Modify: `musicdl/catalogsync/ops/repository.py`
- [ ] **Step 1: Write the failing dashboard regression for resolver and downloader worker families**
Add a test in `tests/catalogsync/test_ops_api.py` shaped like:
```python
def test_dashboard_exposes_resolver_and_downloader_workers_during_download_stage():
repo.update_worker_state(
worker_name="resolve-1",
current_job_item_id=item_id_a,
status="running",
current_song_id=song_a_id,
current_display_text="Song A / Singer A",
last_progress_text="resolving source qq (1/6)",
)
repo.update_worker_state(
worker_name="download-1",
current_job_item_id=item_id_b,
status="running",
current_song_id=song_b_id,
current_display_text="Song B / Singer B",
last_progress_text="12.00MB/48.00MB",
downloaded_bytes=12 * 1024 * 1024,
total_bytes=48 * 1024 * 1024,
speed_bytes_per_sec=3 * 1024 * 1024,
progress_percent=25,
)
payload = client.get("/api/dashboard?include_task_rows=false").json()
worker_names = [worker["worker_name"] for worker in payload["workers"]]
assert "resolve-1" in worker_names
assert "download-1" in worker_names
assert payload["transfer_stats"]["download_speed_bytes_per_sec"] == 3 * 1024 * 1024
```
- [ ] **Step 2: Run the focused dashboard test and verify it fails only if payload logic needs adjustment**
Run:
```bash
python -m pytest tests/catalogsync/test_ops_api.py -k "resolver_and_downloader_workers" -q
```
Expected:
- either `FAIL` because payload assumptions need adjustment
- or `PASS`, which means no production code change is needed here
- [ ] **Step 3: Apply the minimal payload or repository changes only if the test requires them**
If worker lookup or naming assumptions need tightening, keep the code minimal, for example:
```python
# no-op if existing worker queries already behave correctly
# only adjust helper logic if it filters by a fixed "download-" prefix anywhere
```
The target is not to redesign dashboard data, only to ensure resolver workers remain visible and transfer stats still reflect downloader workers only.
- [ ] **Step 4: Re-run the focused dashboard test and verify it passes**
Run:
```bash
python -m pytest tests/catalogsync/test_ops_api.py -k "resolver_and_downloader_workers" -q
```
Expected:
- `PASS`
- [ ] **Step 5: Commit**
```bash
git add tests/catalogsync/test_ops_api.py musicdl/catalogsync/ops/web.py musicdl/catalogsync/ops/repository.py
git commit -m "test: cover resolver and downloader worker visibility"
```
### Task 5: Final Verification And NAS Reality Check
**Files:**
- Modify: `tests/catalogsync/test_services.py`
- Modify: `tests/catalogsync/test_ops_executors.py`
- Modify: `tests/catalogsync/test_ops_runner.py`
- Modify: `tests/catalogsync/test_ops_api.py`
- Modify: `musicdl/catalogsync/downloader.py`
- Modify: `musicdl/catalogsync/ops/executors.py`
- Modify: `musicdl/catalogsync/ops/runner.py`
- [ ] **Step 1: Run the full targeted regression slice**
Run:
```bash
python -m pytest tests/catalogsync/test_services.py tests/catalogsync/test_ops_executors.py tests/catalogsync/test_ops_runner.py tests/catalogsync/test_ops_api.py -k "download or transfer_stats or resolver" -q
```
Expected:
- `PASS`
- existing download progress tests still green
- new dual-pool runner tests green
- [ ] **Step 2: Run a slightly wider ops regression slice**
Run:
```bash
python -m pytest tests/catalogsync/test_ops_repository.py tests/catalogsync/test_ops_frontend.py -k "worker or transfer or dashboard" -q
```
Expected:
- `PASS`
- no unexpected worker-state regressions
- [ ] **Step 3: Deploy to NAS**
Run:
```bash
powershell -ExecutionPolicy Bypass -File .\deploy-catalogsync.ps1
```
Expected:
- deploy completes successfully
- health check passes for `http://127.0.0.1:18080/dashboard`
- [ ] **Step 4: Verify production behavior on NAS**
Run:
```powershell
$env:NAS_192168543_PASSWORD='Nie@159357'
powershell -ExecutionPolicy Bypass -File 'C:\Users\Administrator\.codex\skills\nas-ssh-192168543\scripts\run.ps1' "curl -fsS http://127.0.0.1:18080/api/dashboard?include_task_rows=false | python3 -m json.tool | head -n 160"
```
Expected:
- worker list includes both `resolve-*` and `download-*`
- at least some `download-*` workers show non-zero transfer stats simultaneously under active load
- `resolve-*` workers show `resolving source ...` text instead of pretending to be downloading
- [ ] **Step 5: Commit final integration changes**
```bash
git add tests/catalogsync/test_services.py tests/catalogsync/test_ops_executors.py tests/catalogsync/test_ops_runner.py tests/catalogsync/test_ops_api.py musicdl/catalogsync/downloader.py musicdl/catalogsync/ops/executors.py musicdl/catalogsync/ops/runner.py musicdl/catalogsync/ops/web.py musicdl/catalogsync/ops/repository.py
git commit -m "feat: split download stage into resolver and transfer pools"
```
---
## Self-Review
### Spec Coverage
- dual-pool architecture: covered by Task 3
- downloader API split: covered by Task 1
- executor support for resolved tasks: covered by Task 2
- worker-family visibility: covered by Task 4
- NAS verification of real concurrency: covered by Task 5
No spec gaps remain for this iteration.
### Placeholder Scan
- no `TODO`, `TBD`, or “implement later” placeholders remain
- each code-changing task includes concrete method names, commands, and code shapes
### Type Consistency
- `ResolvedDownloadPayload` is used consistently between Task 1 and Task 2
- `ResolvedStageDownloadTask` is used consistently between Task 2 and Task 3
- runner dual-pool entry point is consistently named `_run_download_stage_pipeline(...)`