# Download Layout And NAS Deployment Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Change `musicdl.catalogsync` downloads to land under `LIBRARY_DIR/<platform>/<first_artist>/...`, preserve relative locators for later upload reuse, and add portable NAS/Linux deployment scripts plus a `.env`-driven runtime layout.

**Architecture:** Add a small runtime/layout helper module for path building, safe filename components, config defaults, and directory creation. Reuse the existing downloader and CLI, but route download destinations through the new path helper and add deploy/runtime scripts under `scripts/catalogsync` so target machines can be bootstrapped and then run from `catalogsync/bin` with `catalogsync.env`.

**Tech Stack:** Python stdlib (`pathlib`, `dataclasses`, `tempfile`, `re`), `click`, existing `musicdl.catalogsync` modules, PowerShell, POSIX shell, `unittest`

---
### Task 1: Add runtime/layout helper tests and implementation
**Files:**
- Create: `musicdl/catalogsync/runtime.py`
- Create: `tests/catalogsync/test_runtime.py`
- [ ] **Step 1: Write the failing runtime/layout tests**
```python
import tempfile
import unittest
from pathlib import Path


class RuntimeLayoutTests(unittest.TestCase):
    def test_runtime_config_builds_defaults_from_root_dir(self):
        from musicdl.catalogsync.runtime import CatalogSyncRuntimeConfig

        config = CatalogSyncRuntimeConfig.from_mapping(
            {
                "ROOT_DIR": "/volume4/Music_Cloud",
                "PYTHON_BIN": "python3",
            }
        )
        self.assertEqual(Path("/volume4/Music_Cloud/catalogsync"), config.app_home)
        self.assertEqual(Path("/volume4/Music_Cloud/library"), config.library_dir)
        self.assertEqual(Path("/volume4/Music_Cloud/catalogsync/data/catalogsync.db"), config.db_path)
        self.assertEqual("platform_first_artist", config.download_layout)

    def test_runtime_config_ensure_directories_creates_expected_tree(self):
        from musicdl.catalogsync.runtime import CatalogSyncRuntimeConfig

        with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
            root_dir = Path(tmpdir) / "Music_Cloud"
            config = CatalogSyncRuntimeConfig.from_mapping({"ROOT_DIR": str(root_dir)})
            config.ensure_directories()
            self.assertTrue((root_dir / "library").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "app").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "bin").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "config").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "data").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "inputs").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "logs").is_dir())

    def test_build_download_relative_dir_uses_platform_and_first_artist(self):
        from musicdl.catalogsync.runtime import build_download_relative_dir

        relative_dir = build_download_relative_dir(
            platform="qq",
            singers="Singer A / Singer B",
        )
        self.assertEqual(Path("qq") / "Singer A", relative_dir)

    def test_build_download_relative_dir_falls_back_to_unknown_artist(self):
        from musicdl.catalogsync.runtime import build_download_relative_dir

        relative_dir = build_download_relative_dir(
            platform="netease",
            singers="",
        )
        self.assertEqual(Path("netease") / "Unknown Artist", relative_dir)
```
- [ ] **Step 2: Run the focused runtime/layout tests to verify they fail**
Run: `python -m unittest tests.catalogsync.test_runtime -v`
Expected: FAIL with import error for `musicdl.catalogsync.runtime` or missing helper functions
- [ ] **Step 3: Implement the minimal runtime/layout helper module**
```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path

INVALID_PATH_CHARS_RE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_path_component(value: str, fallback: str) -> str:
    cleaned = INVALID_PATH_CHARS_RE.sub("_", (value or "").strip()).rstrip(". ")
    return cleaned or fallback


def pick_first_artist_name(singers: str | None) -> str:
    for candidate in re.split(r"\s*(?:/|,|&|\|)\s*", singers or ""):
        if candidate.strip():
            return sanitize_path_component(candidate, "Unknown Artist")
    return "Unknown Artist"


def build_download_relative_dir(platform: str, singers: str | None) -> Path:
    return Path(sanitize_path_component(platform, "unknown")) / pick_first_artist_name(singers)


@dataclass(slots=True)
class CatalogSyncRuntimeConfig:
    root_dir: Path
    app_home: Path
    library_dir: Path
    db_path: Path
    input_dir: Path
    log_dir: Path
    python_bin: str
    venv_dir: Path
    download_layout: str

    @classmethod
    def from_mapping(cls, mapping: dict[str, str]) -> "CatalogSyncRuntimeConfig":
        root_dir = Path(mapping["ROOT_DIR"]).resolve()
        app_home = Path(mapping.get("APP_HOME", root_dir / "catalogsync")).resolve()
        library_dir = Path(mapping.get("LIBRARY_DIR", root_dir / "library")).resolve()
        return cls(
            root_dir=root_dir,
            app_home=app_home,
            library_dir=library_dir,
            db_path=Path(mapping.get("DB_PATH", app_home / "data" / "catalogsync.db")).resolve(),
            input_dir=Path(mapping.get("INPUT_DIR", app_home / "inputs")).resolve(),
            log_dir=Path(mapping.get("LOG_DIR", app_home / "logs")).resolve(),
            python_bin=mapping.get("PYTHON_BIN", "python3"),
            venv_dir=Path(mapping.get("VENV_DIR", app_home / "app" / ".venv")).resolve(),
            download_layout=mapping.get("DOWNLOAD_LAYOUT", "platform_first_artist"),
        )

    def ensure_directories(self) -> None:
        for path in (
            self.root_dir,
            self.library_dir,
            self.app_home / "app",
            self.app_home / "bin",
            self.app_home / "config",
            self.app_home / "data",
            self.app_home / "inputs",
            self.app_home / "logs",
        ):
            path.mkdir(parents=True, exist_ok=True)
```
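To make the splitting and sanitizing rules concrete, the sketch below re-declares the regex-based helpers from the Step 3 listing so that it runs on its own, then prints what a few sample `singers` strings map to. The sample inputs are illustrative assumptions rather than values from a real catalog, and `PurePosixPath` stands in for `Path` only so the printed separators are the same on every OS.

```python
import re
from pathlib import PurePosixPath

# Same patterns as the Step 3 listing, repeated so this snippet is standalone.
INVALID_PATH_CHARS_RE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
ARTIST_SEPARATORS_RE = re.compile(r"\s*(?:/|,|&|\|)\s*")


def sanitize_path_component(value, fallback):
    # Replace characters that are unsafe in file names, then trim trailing dots/spaces.
    cleaned = INVALID_PATH_CHARS_RE.sub("_", (value or "").strip()).rstrip(". ")
    return cleaned or fallback


def pick_first_artist_name(singers):
    # The first non-empty piece after splitting on / , & | wins.
    for candidate in ARTIST_SEPARATORS_RE.split(singers or ""):
        if candidate.strip():
            return sanitize_path_component(candidate, "Unknown Artist")
    return "Unknown Artist"


def build_download_relative_dir(platform, singers):
    return PurePosixPath(sanitize_path_component(platform, "unknown")) / pick_first_artist_name(singers)


# Illustrative inputs only.
samples = ["Singer A / Singer B", "Singer A, Singer B", " / Singer B", "AC:DC", "", None]
for singers in samples:
    print(repr(singers), "->", build_download_relative_dir("qq", singers))
```

One consequence of splitting on `&` and `/` is that a name such as `Simon & Garfunkel` would be filed under `Simon`. Whether that is acceptable is a layout decision for the plan, not something this sketch settles.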
- [ ] **Step 4: Re-run the focused runtime/layout tests**
Run: `python -m unittest tests.catalogsync.test_runtime -v`
Expected: PASS
- [ ] **Step 5: Commit**
```bash
git add musicdl/catalogsync/runtime.py tests/catalogsync/test_runtime.py
git commit -m "feat: add runtime layout helpers"
```
### Task 2: Route downloader output through `platform/first_artist`
**Files:**
- Modify: `musicdl/catalogsync/downloader.py`
- Modify: `tests/catalogsync/test_services.py`
- [ ] **Step 1: Add failing downloader layout tests**
```python
# Add to CatalogServiceTests in tests/catalogsync/test_services.py.
# Relies on module-level imports: tempfile, pathlib.Path,
# types.SimpleNamespace, and unittest.mock.patch.
    def test_catalog_downloader_records_platform_first_artist_locator(self):
        from musicdl.catalogsync.db import initialize_database
        from musicdl.catalogsync.downloader import CatalogDownloader
        from musicdl.catalogsync.models import CatalogSong
        from musicdl.catalogsync.repository import CatalogRepository

        class FakeClient:
            def download(self, song_infos, num_threadings=1, auto_supplement_song=False):
                # Write a fake file into whatever work_dir the downloader assigned.
                save_path = Path(song_infos[0].work_dir) / "song-c.mp3"
                save_path.parent.mkdir(parents=True, exist_ok=True)
                save_path.write_bytes(b"fake-audio")
                return [SimpleNamespace(save_path=str(save_path))]

        with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
            db_path = Path(tmpdir) / "catalogsync.db"
            library_root = Path(tmpdir) / "library"
            initialize_database(db_path, default_library_root=library_root).close()
            repo = CatalogRepository(db_path)
            repo.upsert_song(
                CatalogSong(
                    platform="qq",
                    remote_song_id="song-c",
                    name="Song C",
                    singers="Singer A / Singer B",
                    ext="mp3",
                    file_size_bytes=80,
                    metadata={"snapshot": {"identifier": "song-c"}},
                )
            )
            downloader = CatalogDownloader(repository=repo)
            with patch(
                "musicdl.catalogsync.downloader.deserialize_song_info",
                return_value=SimpleNamespace(singers="Singer A / Singer B"),
            ):
                with patch.object(downloader, "get_client", return_value=FakeClient()):
                    downloader.download_pending(library_root=library_root, limit=1)
            location = repo._fetchone(
                "SELECT locator FROM file_locations ORDER BY id DESC LIMIT 1"
            )
            self.assertEqual("qq/Singer A/song-c.mp3", location["locator"])

    def test_catalog_downloader_uses_unknown_artist_fallback_directory(self):
        from musicdl.catalogsync.db import initialize_database
        from musicdl.catalogsync.downloader import CatalogDownloader
        from musicdl.catalogsync.models import CatalogSong
        from musicdl.catalogsync.repository import CatalogRepository

        class FakeClient:
            def download(self, song_infos, num_threadings=1, auto_supplement_song=False):
                save_path = Path(song_infos[0].work_dir) / "song-a.flac"
                save_path.parent.mkdir(parents=True, exist_ok=True)
                save_path.write_bytes(b"fake-audio")
                return [SimpleNamespace(save_path=str(save_path))]

        with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
            db_path = Path(tmpdir) / "catalogsync.db"
            library_root = Path(tmpdir) / "library"
            initialize_database(db_path, default_library_root=library_root).close()
            repo = CatalogRepository(db_path)
            repo.upsert_song(
                CatalogSong(
                    platform="netease",
                    remote_song_id="song-a",
                    name="Song A",
                    singers="",
                    ext="flac",
                    file_size_bytes=100,
                    metadata={"snapshot": {"identifier": "song-a"}},
                )
            )
            downloader = CatalogDownloader(repository=repo)
            with patch(
                "musicdl.catalogsync.downloader.deserialize_song_info",
                return_value=SimpleNamespace(singers=""),
            ):
                with patch.object(downloader, "get_client", return_value=FakeClient()):
                    downloader.download_pending(library_root=library_root, limit=1)
            location = repo._fetchone(
                "SELECT locator FROM file_locations ORDER BY id DESC LIMIT 1"
            )
            self.assertEqual("netease/Unknown Artist/song-a.flac", location["locator"])
```
- [ ] **Step 2: Run the focused downloader tests to verify they fail**
Run: `python -m unittest tests.catalogsync.test_services.CatalogServiceTests.test_catalog_downloader_records_platform_first_artist_locator tests.catalogsync.test_services.CatalogServiceTests.test_catalog_downloader_uses_unknown_artist_fallback_directory -v`
Expected: FAIL because the downloader still writes `platform/filename`
- [ ] **Step 3: Implement the downloader layout change**
```python
from .runtime import build_download_relative_dir
```
```python
relative_dir = build_download_relative_dir(
    platform=row["platform"],
    singers=getattr(song_info, "singers", None) or row.get("singers"),
)
target_dir = target_root / relative_dir
target_dir.mkdir(parents=True, exist_ok=True)
song_info.work_dir = str(target_dir)
```
Keep the locator writeback based on the actual saved file:
```python
saved_path = Path(saved_song.save_path)
relative_path = saved_path.relative_to(target_root).as_posix()
```
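A short sketch of why the locator is derived from the saved path with `relative_to(...).as_posix()`: the result does not depend on where the library is mounted, it uses forward slashes even when computed from Windows-style paths, and it fails loudly if a client saves a file outside the library root. `to_locator` and the sample paths are hypothetical names used only for this illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_locator(saved_path, library_root):
    """Return the forward-slash path of saved_path relative to library_root."""
    # relative_to raises ValueError when saved_path is not under library_root.
    return saved_path.relative_to(library_root).as_posix()


# The same file as seen from a NAS mount and from a Windows drive.
nas = to_locator(
    PurePosixPath("/volume4/Music_Cloud/library/qq/Singer A/song-c.mp3"),
    PurePosixPath("/volume4/Music_Cloud/library"),
)
win = to_locator(
    PureWindowsPath(r"D:\Music_Cloud\library\qq\Singer A\song-c.mp3"),
    PureWindowsPath(r"D:\Music_Cloud\library"),
)
print(nas)
print(win)
print(nas == win)

# A file saved outside the library root is rejected instead of silently recorded.
try:
    to_locator(PurePosixPath("/tmp/song-c.mp3"), PurePosixPath("/volume4/Music_Cloud/library"))
except ValueError as exc:
    print("rejected:", exc)
```

Because both calls yield the same string, a locator stored on one machine should remain usable after the library is moved, provided the consumer joins it onto its own `LIBRARY_DIR`.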
- [ ] **Step 4: Re-run the focused downloader tests**
Run: `python -m unittest tests.catalogsync.test_services.CatalogServiceTests.test_catalog_downloader_records_platform_first_artist_locator tests.catalogsync.test_services.CatalogServiceTests.test_catalog_downloader_uses_unknown_artist_fallback_directory -v`
Expected: PASS
- [ ] **Step 5: Run the broader catalogsync tests affected by downloader changes**
Run: `python -m unittest tests.catalogsync.test_services tests.catalogsync.test_cli -v`
Expected: PASS
- [ ] **Step 6: Commit**
```bash
git add musicdl/catalogsync/downloader.py tests/catalogsync/test_services.py
git commit -m "feat: store downloads under platform and first artist"
```
### Task 3: Add portable deployment and runtime script templates
**Files:**
- Create: `scripts/catalogsync/bootstrap_to_linux.ps1`
- Create: `scripts/catalogsync/templates/catalogsync.env.example`
- Create: `scripts/catalogsync/templates/download_all.sh`
- Create: `scripts/catalogsync/templates/download_from_file.sh`
- Modify: `tests/catalogsync/test_runtime.py`
- [ ] **Step 1: Add failing tests for deployment template content**
```python
    def test_catalogsync_env_example_contains_required_keys(self):
        template = Path("scripts/catalogsync/templates/catalogsync.env.example").read_text(encoding="utf-8")
        self.assertIn("ROOT_DIR=", template)
        self.assertIn("APP_HOME=", template)
        self.assertIn("LIBRARY_DIR=", template)
        self.assertIn("DB_PATH=", template)
        self.assertIn("INPUT_DIR=", template)
        self.assertIn("LOG_DIR=", template)
        self.assertIn("DOWNLOAD_LAYOUT=platform_first_artist", template)

    def test_runtime_script_template_uses_configured_library_dir(self):
        script = Path("scripts/catalogsync/templates/download_from_file.sh").read_text(encoding="utf-8")
        self.assertIn("LIBRARY_DIR", script)
        self.assertIn("INPUT_DIR", script)
        self.assertIn("musicdl.catalogsync.cli run", script)
```
- [ ] **Step 2: Run the focused runtime/template tests to verify they fail**
Run: `python -m unittest tests.catalogsync.test_runtime.RuntimeLayoutTests.test_catalogsync_env_example_contains_required_keys tests.catalogsync.test_runtime.RuntimeLayoutTests.test_runtime_script_template_uses_configured_library_dir -v`
Expected: FAIL because the template files do not exist yet
- [ ] **Step 3: Add the deployment and runtime script templates**
`scripts/catalogsync/templates/catalogsync.env.example`:
```bash
ROOT_DIR=/volume4/Music_Cloud
APP_HOME=/volume4/Music_Cloud/catalogsync
LIBRARY_DIR=/volume4/Music_Cloud/library
DB_PATH=/volume4/Music_Cloud/catalogsync/data/catalogsync.db
INPUT_DIR=/volume4/Music_Cloud/catalogsync/inputs
LOG_DIR=/volume4/Music_Cloud/catalogsync/logs
PYTHON_BIN=python3
VENV_DIR=/volume4/Music_Cloud/catalogsync/app/.venv
DOWNLOAD_LAYOUT=platform_first_artist
```
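The plan does not say how `catalogsync.env` would reach `CatalogSyncRuntimeConfig.from_mapping` on the Python side. One possible approach, sketched here under that assumption, is a small parser for plain `KEY=VALUE` lines. `parse_env_text` is a hypothetical helper, and it deliberately ignores shell features such as variable expansion that `source` would honor.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and # comments."""
    mapping = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        mapping[key.strip()] = value
    return mapping


# Illustrative file content: only ROOT_DIR is strictly required by from_mapping.
example = """
# minimal override: everything else derives from ROOT_DIR
ROOT_DIR=/volume4/Music_Cloud
PYTHON_BIN="python3"
"""
mapping = parse_env_text(example)
print(mapping)
```

The resulting dict could then be passed to `CatalogSyncRuntimeConfig.from_mapping(mapping)`, which fills in `APP_HOME`, `LIBRARY_DIR`, and the other paths from `ROOT_DIR` when they are absent.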
`scripts/catalogsync/templates/download_all.sh`:
```bash
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
APP_HOME="$(cd "${SCRIPT_DIR}/.." && pwd)"
CONFIG_FILE="${APP_HOME}/config/catalogsync.env"
source "${CONFIG_FILE}"

mkdir -p "${LIBRARY_DIR}" "${APP_HOME}/data" "${INPUT_DIR}" "${LOG_DIR}"

"${PYTHON_BIN}" -m musicdl.catalogsync.cli run \
  --db "${DB_PATH}" \
  --library-root "${LIBRARY_DIR}" \
  "$@"
```
`scripts/catalogsync/templates/download_from_file.sh`:
```bash
#!/usr/bin/env bash
set -euo pipefail

if [[ $# -lt 1 ]]; then
  echo "usage: $0 <playlist-file> [extra args...]"
  exit 1
fi

PLAYLIST_FILE="$1"
shift

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
APP_HOME="$(cd "${SCRIPT_DIR}/.." && pwd)"
CONFIG_FILE="${APP_HOME}/config/catalogsync.env"
source "${CONFIG_FILE}"

mkdir -p "${LIBRARY_DIR}" "${APP_HOME}/data" "${INPUT_DIR}" "${LOG_DIR}"

"${PYTHON_BIN}" -m musicdl.catalogsync.cli run \
  --db "${DB_PATH}" \
  --library-root "${LIBRARY_DIR}" \
  --playlist-file "${PLAYLIST_FILE}" \
  "$@"
```
`scripts/catalogsync/bootstrap_to_linux.ps1` should start with the following parameters and remote directory list:
```powershell
param(
    # Named $RemoteHost because $Host is a read-only automatic variable in PowerShell.
    [string]$RemoteHost,
    [int]$Port = 22,
    [string]$User,
    [string]$RootDir = "/volume4/Music_Cloud"
)

$AppHome = "$RootDir/catalogsync"
$RemoteDirs = @(
    $RootDir,
    "$RootDir/library",
    "$AppHome/app",
    "$AppHome/bin",
    "$AppHome/config",
    "$AppHome/data",
    "$AppHome/inputs",
    "$AppHome/logs"
)
```
Then use `ssh` and `scp` to:
- create the remote directories
- copy the application files into `$AppHome/app`
- copy the shell script templates into `$AppHome/bin`
- copy `catalogsync.env.example` into `$AppHome/config/catalogsync.env.example` if missing
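
As a rough illustration of the commands the bootstrap script would issue, the sketch below assembles `ssh` and `scp` argument lists in Python and prints them without running anything. The function name, the host and user values, the local source paths, and the choice to send a single `mkdir -p` command are all assumptions for the example; the real script is PowerShell and may structure these calls differently. The conditional copy of `catalogsync.env.example` is left out. Note that `ssh` takes the port as `-p` while `scp` takes it as `-P`.

```python
import shlex


def build_bootstrap_commands(host, user, port=22, root_dir="/volume4/Music_Cloud"):
    """Return argv lists for a dry run of the bootstrap steps (nothing is executed)."""
    app_home = f"{root_dir}/catalogsync"
    remote_dirs = [root_dir, f"{root_dir}/library"] + [
        f"{app_home}/{name}" for name in ("app", "bin", "config", "data", "inputs", "logs")
    ]
    target = f"{user}@{host}"
    # One remote mkdir -p call creates the whole tree; paths are shell-quoted.
    mkdir_cmd = "mkdir -p " + " ".join(shlex.quote(d) for d in remote_dirs)
    return [
        ["ssh", "-p", str(port), target, mkdir_cmd],
        # Hypothetical local source path; adjust to the repository layout.
        ["scp", "-P", str(port), "-r", "musicdl", f"{target}:{app_home}/app/"],
        [
            "scp", "-P", str(port),
            "scripts/catalogsync/templates/download_all.sh",
            "scripts/catalogsync/templates/download_from_file.sh",
            f"{target}:{app_home}/bin/",
        ],
    ]


# Placeholder host and user, printed as shell-ready command lines.
for argv in build_bootstrap_commands("nas.local", "admin"):
    print(shlex.join(argv))
```

Printing the commands first gives a cheap way to review the remote paths before any files are copied.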
- [ ] **Step 4: Re-run the focused runtime/template tests**
Run: `python -m unittest tests.catalogsync.test_runtime -v`
Expected: PASS
- [ ] **Step 5: Commit**
```bash
git add scripts/catalogsync tests/catalogsync/test_runtime.py
git commit -m "feat: add portable catalogsync deployment scripts"
```
### Task 4: Document the new layout and verify the full flow
**Files:**
- Modify: `docs/catalogsync.md`
- Modify: `README.md`
- [ ] **Step 1: Update user-facing docs with the new deployment layout**
Add:
- the `/volume4/Music_Cloud/library` versus `/volume4/Music_Cloud/catalogsync` split
- the `platform/first_artist` download layout
- the `catalogsync.env` example
- the `scripts/catalogsync/bootstrap_to_linux.ps1` usage
- the target-side `download_all.sh` and `download_from_file.sh` usage
- [ ] **Step 2: Run the full catalogsync unittest suite**
Run: `python -m unittest discover -s tests/catalogsync -v`
Expected: PASS
- [ ] **Step 3: Run a local smoke check for CLI help**
Run: `python -m musicdl.catalogsync.cli run --help`
Expected: output includes `--playlist-file`
- [ ] **Step 4: Inspect the generated diff**
Run: `git diff --stat`
Expected: only the planned runtime/layout/downloader/docs files changed
- [ ] **Step 5: Commit**
```bash
git add docs/catalogsync.md README.md
git commit -m "docs: describe NAS download layout workflow"
```