Download Layout And NAS Deployment Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Change musicdl.catalogsync downloads to land under LIBRARY_DIR/<platform>/<first_artist>/..., preserve relative locators for later upload reuse, and add portable NAS/Linux deployment scripts plus .env-driven runtime layout.

Architecture: Add a small runtime/layout helper module for path building, safe filename components, config defaults, and directory creation. Reuse the existing downloader and CLI, but route download destinations through the new path helper and add deploy/runtime scripts under scripts/catalogsync so target machines can be bootstrapped and then run from catalogsync/bin with catalogsync.env.

Tech Stack: Python stdlib (pathlib, dataclasses, tempfile, re), click, existing musicdl.catalogsync modules, PowerShell, POSIX shell, unittest
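
The goal of keeping locators relative can be illustrated with a short sketch. A stored locator such as qq/Singer A/song-c.mp3 (the value asserted in Task 2) can be joined onto whichever library root a given machine uses. The helper name resolve_locator and the Windows root below are assumptions made for illustration; neither is part of musicdl.catalogsync.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def resolve_locator(locator: str, library_root: PurePath) -> PurePath:
    """Join a POSIX-style relative locator onto a machine-specific library root.

    Hypothetical helper for illustration; not part of musicdl.catalogsync.
    """
    parts = PurePosixPath(locator).parts
    # Reject locators that could point outside the library root.
    if PurePosixPath(locator).is_absolute() or ".." in parts:
        raise ValueError(f"locator must stay inside the library root: {locator!r}")
    return library_root.joinpath(*parts)


locator = "qq/Singer A/song-c.mp3"

# The same stored locator resolves under the NAS root used in this plan...
nas_path = resolve_locator(locator, PurePosixPath("/volume4/Music_Cloud/library"))
# ...and under an invented Windows root, without rewriting the database row.
win_path = resolve_locator(locator, PureWindowsPath("D:/Music/library"))

print(nas_path)
print(win_path)
```

Because the locator never embeds the root, moving the library or uploading from another host only needs a different library_root argument.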


Task 1: Add runtime/layout helper tests and implementation

Files:

  • Create: musicdl/catalogsync/runtime.py

  • Create: tests/catalogsync/test_runtime.py

  • Step 1: Write the failing runtime/layout tests

import tempfile
import unittest
from pathlib import Path


class RuntimeLayoutTests(unittest.TestCase):
    def test_runtime_config_builds_defaults_from_root_dir(self):
        from musicdl.catalogsync.runtime import CatalogSyncRuntimeConfig

        config = CatalogSyncRuntimeConfig.from_mapping(
            {
                "ROOT_DIR": "/volume4/Music_Cloud",
                "PYTHON_BIN": "python3",
            }
        )

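        # Compare against resolved paths: from_mapping() calls .resolve(), which
        # prepends a drive letter on Windows, so bare POSIX literals would not match there.
        self.assertEqual(Path("/volume4/Music_Cloud/catalogsync").resolve(), config.app_home)
        self.assertEqual(Path("/volume4/Music_Cloud/library").resolve(), config.library_dir)
        self.assertEqual(Path("/volume4/Music_Cloud/catalogsync/data/catalogsync.db").resolve(), config.db_path)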
        self.assertEqual("platform_first_artist", config.download_layout)

    def test_runtime_config_ensure_directories_creates_expected_tree(self):
        from musicdl.catalogsync.runtime import CatalogSyncRuntimeConfig

        with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
            root_dir = Path(tmpdir) / "Music_Cloud"
            config = CatalogSyncRuntimeConfig.from_mapping({"ROOT_DIR": str(root_dir)})

            config.ensure_directories()

            self.assertTrue((root_dir / "library").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "app").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "bin").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "config").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "data").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "inputs").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "logs").is_dir())

    def test_build_download_relative_dir_uses_platform_and_first_artist(self):
        from musicdl.catalogsync.runtime import build_download_relative_dir

        relative_dir = build_download_relative_dir(
            platform="qq",
            singers="Singer A / Singer B",
        )

        self.assertEqual(Path("qq") / "Singer A", relative_dir)

    def test_build_download_relative_dir_falls_back_to_unknown_artist(self):
        from musicdl.catalogsync.runtime import build_download_relative_dir

        relative_dir = build_download_relative_dir(
            platform="netease",
            singers="",
        )

        self.assertEqual(Path("netease") / "Unknown Artist", relative_dir)
  • Step 2: Run the focused runtime/layout tests to verify they fail

Run: python -m unittest tests.catalogsync.test_runtime -v
Expected: FAIL with import error for musicdl.catalogsync.runtime or missing helper functions

  • Step 3: Implement the minimal runtime/layout helper module
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path


INVALID_PATH_CHARS_RE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_path_component(value: str, fallback: str) -> str:
    cleaned = INVALID_PATH_CHARS_RE.sub("_", (value or "").strip()).rstrip(". ")
    return cleaned or fallback


def pick_first_artist_name(singers: str | None) -> str:
    for candidate in re.split(r"\s*(?:/|,|&|\|)\s*", singers or ""):
        if candidate.strip():
            return sanitize_path_component(candidate, "Unknown Artist")
    return "Unknown Artist"


def build_download_relative_dir(platform: str, singers: str | None) -> Path:
    return Path(sanitize_path_component(platform, "unknown")) / pick_first_artist_name(singers)


@dataclass(slots=True)
class CatalogSyncRuntimeConfig:
    root_dir: Path
    app_home: Path
    library_dir: Path
    db_path: Path
    input_dir: Path
    log_dir: Path
    python_bin: str
    venv_dir: Path
    download_layout: str

    @classmethod
    def from_mapping(cls, mapping: dict[str, str]) -> "CatalogSyncRuntimeConfig":
        root_dir = Path(mapping["ROOT_DIR"]).resolve()
        app_home = Path(mapping.get("APP_HOME", root_dir / "catalogsync")).resolve()
        library_dir = Path(mapping.get("LIBRARY_DIR", root_dir / "library")).resolve()
        return cls(
            root_dir=root_dir,
            app_home=app_home,
            library_dir=library_dir,
            db_path=Path(mapping.get("DB_PATH", app_home / "data" / "catalogsync.db")).resolve(),
            input_dir=Path(mapping.get("INPUT_DIR", app_home / "inputs")).resolve(),
            log_dir=Path(mapping.get("LOG_DIR", app_home / "logs")).resolve(),
            python_bin=mapping.get("PYTHON_BIN", "python3"),
            venv_dir=Path(mapping.get("VENV_DIR", app_home / "app" / ".venv")).resolve(),
            download_layout=mapping.get("DOWNLOAD_LAYOUT", "platform_first_artist"),
        )

    def ensure_directories(self) -> None:
        for path in (
            self.root_dir,
            self.library_dir,
            self.app_home / "app",
            self.app_home / "bin",
            self.app_home / "config",
            self.app_home / "data",
            self.app_home / "inputs",
            self.app_home / "logs",
        ):
            path.mkdir(parents=True, exist_ok=True)
  • Step 4: Re-run the focused runtime/layout tests

Run: python -m unittest tests.catalogsync.test_runtime -v
Expected: PASS

  • Step 5: Commit
git add musicdl/catalogsync/runtime.py tests/catalogsync/test_runtime.py
git commit -m "feat: add runtime layout helpers"
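
Before moving on, it may help to see how the first-artist helper behaves on less tidy input than the tests cover. The sketch below copies sanitize_path_component and pick_first_artist_name from Step 3 so that it runs on its own; the sample strings are invented for illustration and are not taken from real catalog data.

```python
import re

# Copied from the Step 3 implementation so the sketch is self-contained.
INVALID_PATH_CHARS_RE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_path_component(value, fallback):
    cleaned = INVALID_PATH_CHARS_RE.sub("_", (value or "").strip()).rstrip(". ")
    return cleaned or fallback


def pick_first_artist_name(singers):
    for candidate in re.split(r"\s*(?:/|,|&|\|)\s*", singers or ""):
        if candidate.strip():
            return sanitize_path_component(candidate, "Unknown Artist")
    return "Unknown Artist"


# Illustrative inputs only.
samples = [
    "Singer A, Singer B",  # comma separator: first name wins
    " / Singer B",         # empty leading candidate is skipped
    "AC/DC",               # a slash inside a name is also treated as a separator
    "Who?",                # characters invalid in file names become "_"
    "...",                 # only dots: stripped to empty, so the fallback applies
    None,                  # missing value: fallback
]
for raw in samples:
    print(f"{raw!r:>24} -> {pick_first_artist_name(raw)}")
```

The "AC/DC" case shows a trade-off of splitting on "/": band names that contain a separator are truncated to their first segment, which the plan accepts in exchange for a simple rule.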

Task 2: Route downloader output through platform/first_artist

Files:

  • Modify: musicdl/catalogsync/downloader.py

  • Modify: tests/catalogsync/test_services.py

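  • Step 1: Add failing downloader layout tests (add these methods to the existing CatalogServiceTests class; they rely on tempfile, Path from pathlib, SimpleNamespace from types, and patch from unittest.mock being imported at the top of test_services.py)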

    def test_catalog_downloader_records_platform_first_artist_locator(self):
        from musicdl.catalogsync.db import initialize_database
        from musicdl.catalogsync.downloader import CatalogDownloader
        from musicdl.catalogsync.models import CatalogSong
        from musicdl.catalogsync.repository import CatalogRepository

        class FakeClient:
            def download(self, song_infos, num_threadings=1, auto_supplement_song=False):
                save_path = Path(song_infos[0].work_dir) / "song-c.mp3"
                save_path.parent.mkdir(parents=True, exist_ok=True)
                save_path.write_bytes(b"fake-audio")
                return [SimpleNamespace(save_path=str(save_path))]

        with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
            db_path = Path(tmpdir) / "catalogsync.db"
            library_root = Path(tmpdir) / "library"
            initialize_database(db_path, default_library_root=library_root).close()
            repo = CatalogRepository(db_path)
            repo.upsert_song(
                CatalogSong(
                    platform="qq",
                    remote_song_id="song-c",
                    name="Song C",
                    singers="Singer A / Singer B",
                    ext="mp3",
                    file_size_bytes=80,
                    metadata={"snapshot": {"identifier": "song-c"}},
                )
            )
            downloader = CatalogDownloader(repository=repo)

            with patch("musicdl.catalogsync.downloader.deserialize_song_info", return_value=SimpleNamespace(singers="Singer A / Singer B")):
                with patch.object(downloader, "get_client", return_value=FakeClient()):
                    downloader.download_pending(library_root=library_root, limit=1)

            location = repo._fetchone(
                "SELECT locator FROM file_locations ORDER BY id DESC LIMIT 1"
            )

        self.assertEqual("qq/Singer A/song-c.mp3", location["locator"])

    def test_catalog_downloader_uses_unknown_artist_fallback_directory(self):
        from musicdl.catalogsync.db import initialize_database
        from musicdl.catalogsync.downloader import CatalogDownloader
        from musicdl.catalogsync.models import CatalogSong
        from musicdl.catalogsync.repository import CatalogRepository

        class FakeClient:
            def download(self, song_infos, num_threadings=1, auto_supplement_song=False):
                save_path = Path(song_infos[0].work_dir) / "song-a.flac"
                save_path.parent.mkdir(parents=True, exist_ok=True)
                save_path.write_bytes(b"fake-audio")
                return [SimpleNamespace(save_path=str(save_path))]

        with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
            db_path = Path(tmpdir) / "catalogsync.db"
            library_root = Path(tmpdir) / "library"
            initialize_database(db_path, default_library_root=library_root).close()
            repo = CatalogRepository(db_path)
            repo.upsert_song(
                CatalogSong(
                    platform="netease",
                    remote_song_id="song-a",
                    name="Song A",
                    singers="",
                    ext="flac",
                    file_size_bytes=100,
                    metadata={"snapshot": {"identifier": "song-a"}},
                )
            )
            downloader = CatalogDownloader(repository=repo)

            with patch("musicdl.catalogsync.downloader.deserialize_song_info", return_value=SimpleNamespace(singers="")):
                with patch.object(downloader, "get_client", return_value=FakeClient()):
                    downloader.download_pending(library_root=library_root, limit=1)

            location = repo._fetchone(
                "SELECT locator FROM file_locations ORDER BY id DESC LIMIT 1"
            )

        self.assertEqual("netease/Unknown Artist/song-a.flac", location["locator"])
  • Step 2: Run the focused downloader tests to verify they fail

Run: python -m unittest tests.catalogsync.test_services.CatalogServiceTests.test_catalog_downloader_records_platform_first_artist_locator tests.catalogsync.test_services.CatalogServiceTests.test_catalog_downloader_uses_unknown_artist_fallback_directory -v
Expected: FAIL because the downloader still writes platform/filename

  • Step 3: Implement the downloader layout change
# Near the other imports in downloader.py:
from .runtime import build_download_relative_dir

            # Where song_info.work_dir is assigned:
            relative_dir = build_download_relative_dir(
                platform=row["platform"],
                singers=getattr(song_info, "singers", None) or row.get("singers"),
            )
            target_dir = target_root / relative_dir
            target_dir.mkdir(parents=True, exist_ok=True)
            song_info.work_dir = str(target_dir)

Keep the locator writeback based on the actual saved file:

            saved_path = Path(saved_song.save_path)
            relative_path = saved_path.relative_to(target_root).as_posix()
  • Step 4: Re-run the focused downloader tests

Run: python -m unittest tests.catalogsync.test_services.CatalogServiceTests.test_catalog_downloader_records_platform_first_artist_locator tests.catalogsync.test_services.CatalogServiceTests.test_catalog_downloader_uses_unknown_artist_fallback_directory -v
Expected: PASS

  • Step 5: Run the broader catalogsync tests affected by downloader changes

Run: python -m unittest tests.catalogsync.test_services tests.catalogsync.test_cli -v
Expected: PASS

  • Step 6: Commit
git add musicdl/catalogsync/downloader.py tests/catalogsync/test_services.py
git commit -m "feat: store downloads under platform and first artist"
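
The reason Step 3 derives the locator from the file that was actually saved can be sketched as follows. The helper name locator_for is hypothetical (the plan's downloader calls relative_to() directly), and returning None for a file outside the root is an assumption about how a caller might want to handle that case.

```python
from pathlib import PurePosixPath


def locator_for(saved_path, target_root):
    """Return the POSIX-style locator for saved_path, or None if it lies outside target_root.

    Hypothetical helper for illustration only.
    """
    try:
        return saved_path.relative_to(target_root).as_posix()
    except ValueError:
        # relative_to() raises ValueError when saved_path is not under target_root.
        return None


root = PurePosixPath("/volume4/Music_Cloud/library")

# The client honoured work_dir, so the locator mirrors the new layout.
inside = locator_for(root / "netease" / "Unknown Artist" / "song-a.flac", root)

# The client chose its own file name; the locator follows what was really written.
renamed = locator_for(root / "qq" / "Singer A" / "song-c (1).mp3", root)

# The client wrote somewhere else entirely; there is no valid relative locator.
outside = locator_for(PurePosixPath("/tmp/song-a.flac"), root)

print(inside, renamed, outside, sep="\n")
```

Without the try/except, the third case would surface as a ValueError inside the downloader, so it is worth deciding whether that should fail the download or be recorded separately.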

Task 3: Add portable deployment and runtime script templates

Files:

  • Create: scripts/catalogsync/bootstrap_to_linux.ps1

  • Create: scripts/catalogsync/templates/catalogsync.env.example

  • Create: scripts/catalogsync/templates/download_all.sh

  • Create: scripts/catalogsync/templates/download_from_file.sh

  • Modify: tests/catalogsync/test_runtime.py

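  • Step 1: Add failing tests for deployment template content (add these methods to RuntimeLayoutTests; the relative template paths assume the tests are run from the repository root)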

    def test_catalogsync_env_example_contains_required_keys(self):
        template = Path("scripts/catalogsync/templates/catalogsync.env.example").read_text(encoding="utf-8")
        self.assertIn("ROOT_DIR=", template)
        self.assertIn("APP_HOME=", template)
        self.assertIn("LIBRARY_DIR=", template)
        self.assertIn("DB_PATH=", template)
        self.assertIn("INPUT_DIR=", template)
        self.assertIn("LOG_DIR=", template)
        self.assertIn("DOWNLOAD_LAYOUT=platform_first_artist", template)

    def test_runtime_script_template_uses_configured_library_dir(self):
        script = Path("scripts/catalogsync/templates/download_from_file.sh").read_text(encoding="utf-8")
        self.assertIn("LIBRARY_DIR", script)
        self.assertIn("INPUT_DIR", script)
        self.assertIn("musicdl.catalogsync.cli run", script)
  • Step 2: Run the focused runtime/template tests to verify they fail

Run: python -m unittest tests.catalogsync.test_runtime.RuntimeLayoutTests.test_catalogsync_env_example_contains_required_keys tests.catalogsync.test_runtime.RuntimeLayoutTests.test_runtime_script_template_uses_configured_library_dir -v
Expected: FAIL because the template files do not exist yet

  • Step 3: Add the deployment and runtime script templates

scripts/catalogsync/templates/catalogsync.env.example:

ROOT_DIR=/volume4/Music_Cloud
APP_HOME=/volume4/Music_Cloud/catalogsync
LIBRARY_DIR=/volume4/Music_Cloud/library

DB_PATH=/volume4/Music_Cloud/catalogsync/data/catalogsync.db
INPUT_DIR=/volume4/Music_Cloud/catalogsync/inputs
LOG_DIR=/volume4/Music_Cloud/catalogsync/logs

PYTHON_BIN=python3
VENV_DIR=/volume4/Music_Cloud/catalogsync/app/.venv

DOWNLOAD_LAYOUT=platform_first_artist

scripts/catalogsync/templates/download_all.sh:

#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
APP_HOME="$(cd "${SCRIPT_DIR}/.." && pwd)"
CONFIG_FILE="${APP_HOME}/config/catalogsync.env"
source "${CONFIG_FILE}"

mkdir -p "${LIBRARY_DIR}" "${APP_HOME}/data" "${INPUT_DIR}" "${LOG_DIR}"

"${PYTHON_BIN}" -m musicdl.catalogsync.cli run \
  --db "${DB_PATH}" \
  --library-root "${LIBRARY_DIR}" \
  "$@"

scripts/catalogsync/templates/download_from_file.sh:

#!/usr/bin/env bash
set -euo pipefail

if [[ $# -lt 1 ]]; then
  echo "usage: $0 <playlist-file> [extra args...]" >&2
  exit 1
fi

PLAYLIST_FILE="$1"
shift

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
APP_HOME="$(cd "${SCRIPT_DIR}/.." && pwd)"
CONFIG_FILE="${APP_HOME}/config/catalogsync.env"
source "${CONFIG_FILE}"

mkdir -p "${LIBRARY_DIR}" "${APP_HOME}/data" "${INPUT_DIR}" "${LOG_DIR}"

"${PYTHON_BIN}" -m musicdl.catalogsync.cli run \
  --db "${DB_PATH}" \
  --library-root "${LIBRARY_DIR}" \
  --playlist-file "${PLAYLIST_FILE}" \
  "$@"

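scripts/catalogsync/bootstrap_to_linux.ps1 should start with:

param(
    # Named $RemoteHost because $Host is a read-only automatic variable in PowerShell
    # and cannot be redeclared as a parameter.
    [string]$RemoteHost,
    [int]$Port = 22,
    [string]$User,
    [string]$RootDir = "/volume4/Music_Cloud"
)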

$AppHome = "$RootDir/catalogsync"
$RemoteDirs = @(
    $RootDir,
    "$RootDir/library",
    "$AppHome/app",
    "$AppHome/bin",
    "$AppHome/config",
    "$AppHome/data",
    "$AppHome/inputs",
    "$AppHome/logs"
)

Then use ssh and scp to:

  • create the remote directories

  • copy the application files into $AppHome/app

  • copy the shell script templates into $AppHome/bin

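  • copy catalogsync.env.example into $AppHome/config/catalogsync.env.example if missing (the runtime scripts source $AppHome/config/catalogsync.env, so that file must still be created from the example before the first run)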
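
The remote steps listed above can be sketched as a dry run that only builds the command lines. This is written in Python to match the rest of the plan's code; the real script is PowerShell, and the function name build_bootstrap_commands, the host name, and the user name are all hypothetical. Copying the application files is left out because the plan does not list which files those are.

```python
import shlex


def build_bootstrap_commands(host, user, port=22, root_dir="/volume4/Music_Cloud"):
    """Return argv lists for the remote setup steps without running anything.

    Hypothetical dry-run helper for illustration only.
    """
    app_home = f"{root_dir}/catalogsync"
    target = f"{user}@{host}"
    remote_dirs = [root_dir, f"{root_dir}/library"] + [
        f"{app_home}/{name}"
        for name in ("app", "bin", "config", "data", "inputs", "logs")
    ]
    templates = "scripts/catalogsync/templates"
    env_example = f"{app_home}/config/catalogsync.env.example"
    return [
        # 1. Create every remote directory in one ssh call (ssh uses -p for the port).
        ["ssh", "-p", str(port), target,
         "mkdir -p " + " ".join(shlex.quote(d) for d in remote_dirs)],
        # 2. Copy the runtime scripts into bin (scp uses -P for the port).
        ["scp", "-P", str(port),
         f"{templates}/download_all.sh", f"{templates}/download_from_file.sh",
         f"{target}:{app_home}/bin/"],
        # 3. Exit status 0 means the example file already exists, so its copy can be skipped.
        ["ssh", "-p", str(port), target, "test -e " + shlex.quote(env_example)],
    ]


for argv in build_bootstrap_commands("nas.example.local", "admin"):
    print(shlex.join(argv))
```

Building the commands as lists first makes it easy to review them before anything touches the NAS; the PowerShell script can follow the same order.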

  • Step 4: Re-run the focused runtime/template tests

Run: python -m unittest tests.catalogsync.test_runtime -v
Expected: PASS

  • Step 5: Commit
git add scripts/catalogsync tests/catalogsync/test_runtime.py
git commit -m "feat: add portable catalogsync deployment scripts"
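
Task 1's CatalogSyncRuntimeConfig.from_mapping takes a plain mapping, but the plan does not say how catalogsync.env is turned into one on the Python side. A minimal sketch, assuming the file only contains simple KEY=VALUE lines like the example template, could look like this. The helper name parse_env_text is hypothetical, and it performs none of the variable expansion that a shell source would.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines into a dict.

    Hypothetical helper: skips blank lines and '#' comments and removes one
    pair of matching surrounding quotes. No shell expansion is performed.
    """
    mapping: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching quotes, e.g. "python3" -> python3.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        mapping[key.strip()] = value
    return mapping


example = """\
ROOT_DIR=/volume4/Music_Cloud
# comment lines and blank lines are skipped

PYTHON_BIN="python3"
DOWNLOAD_LAYOUT=platform_first_artist
"""
mapping = parse_env_text(example)
print(mapping)
```

The resulting dict has the shape that from_mapping expects, so the same catalogsync.env could drive both the shell scripts and any Python entry point.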

Task 4: Document the new layout and verify the full flow

Files:

  • Modify: docs/catalogsync.md

  • Modify: README.md

  • Step 1: Update user-facing docs with the new deployment layout

Add:

  • the /volume4/Music_Cloud/library versus /volume4/Music_Cloud/catalogsync split

  • the platform/first_artist download layout

  • the catalogsync.env example

  • the scripts/catalogsync/bootstrap_to_linux.ps1 usage

  • the target-side download_all.sh and download_from_file.sh usage

  • Step 2: Run the full catalogsync unittest suite

Run: python -m unittest discover -s tests/catalogsync -v
Expected: PASS

  • Step 3: Run a local smoke check for CLI help

Run: python -m musicdl.catalogsync.cli run --help
Expected: output includes --playlist-file

  • Step 4: Inspect the generated diff

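Run: git diff --stat
Expected: only the planned runtime/layout/downloader/docs files changed. Plain git diff --stat lists uncommitted changes only, so once Tasks 1-3 are committed it shows just the docs edits; diff against the commit the work started from to review the full set.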

  • Step 5: Commit
git add docs/catalogsync.md README.md
git commit -m "docs: describe NAS download layout workflow"