Catalog Sync Design

Goal

Build an independent catalog sync and download workflow that:

  • extracts playlist-square and toplist sources from NetEase, QQ Music, and Kuwo
  • stores two hierarchies: playlist pool -> playlist -> song, and a derived artist pool -> artist -> song
  • skips duplicate downloads by (platform, remote_song_id)
  • prefers highest available quality and falls back when needed
  • supports pausing on low disk space and continuing in a new local directory
  • keeps storage metadata compatible with local paths, cloud-drive paths, and bucket/key style object storage

Scope

In Scope

  • Independent Python CLI entrypoint
  • SQLite schema for catalog, file, and task state
  • Source collectors for:
    • NetEase playlist square + toplists
    • QQ playlist square + toplists
    • Kuwo playlist square + toplists
  • Reuse existing platform parseplaylist() and download logic where practical
  • Derived artist pool updates during playlist sync
  • Lazy artist enrichment metadata and hooks
  • Local download dedupe and disk-space prompts
  • Storage schema compatible with future uploads

Out of Scope

  • Full cross-platform song canonicalization
  • GUI integration
  • Production-ready 123 cloud upload implementation
  • Streaming upload while downloading

Constraints

  • Prefer reuse of existing source clients under musicdl.modules.sources
  • Avoid new mandatory dependencies where stdlib is sufficient
  • Keep first version recoverable and inspectable from local files and SQLite
  • Preserve compatibility with the existing musicdl package and console script

Architecture

The new workflow lives in a dedicated package under musicdl.catalogsync. Collectors fetch playlist candidates per source and pool kind, then a sync layer normalizes and persists them. Playlist parsing reuses the existing per-platform clients to resolve tracks into SongInfo objects, which are stored in the catalog tables and used to derive artist pool membership. A download planner reads songs that have not yet been dispatched for download, skips any song already backed by an active local file location, and otherwise delegates the actual media fetch to the existing source download logic.

Storage metadata is modeled with a logical file layer plus a location layer. file_assets describes the downloaded media version for a song, while file_locations records where that file lives. The first implementation only writes local locations, but the schema supports cloud-drive or bucket/key locations later without changing the song-level model.
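
A minimal sketch of how the two-layer storage model could look in SQLite. Only the table names and the container_name and locator fields come from this design; every other column name (kind, root, song_id, is_active, and so on) is a hypothetical placeholder, and the songs table is assumed to exist elsewhere.

```python
import sqlite3

# Hypothetical columns: the design fixes the table names and the
# container_name + locator pair, everything else is illustrative.
STORAGE_SCHEMA = """
CREATE TABLE IF NOT EXISTS storage_backends (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,   -- 'local', 'cloud_drive', or 'object_storage'
    root TEXT NOT NULL,   -- local directory, cloud root, or bucket name
    UNIQUE (kind, root)
);
CREATE TABLE IF NOT EXISTS file_assets (
    id INTEGER PRIMARY KEY,
    song_id INTEGER NOT NULL,   -- points at songs.id (table not shown here)
    quality TEXT,
    extension TEXT,
    size_bytes INTEGER,
    sha256 TEXT
);
CREATE TABLE IF NOT EXISTS file_locations (
    id INTEGER PRIMARY KEY,
    file_asset_id INTEGER NOT NULL REFERENCES file_assets(id),
    backend_id INTEGER NOT NULL REFERENCES storage_backends(id),
    container_name TEXT NOT NULL,
    locator TEXT NOT NULL,
    is_active INTEGER NOT NULL DEFAULT 1,
    UNIQUE (backend_id, container_name, locator)
);
"""


def create_storage_schema(conn):
    """Create the logical file layer and the location layer if missing."""
    conn.executescript(STORAGE_SCHEMA)
```

Because file_locations points at file_assets rather than at songs, a later upload would only add a row to file_locations and leave the song and asset rows untouched.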

Data Model

Catalog

  • playlist_pools
  • playlists
  • pool_playlists
  • artist_pools
  • artists
  • pool_artists
  • songs
  • playlist_songs
  • artist_songs

File and Storage

  • storage_backends
  • file_assets
  • file_locations
  • download_tasks

Key Behaviors

Playlist Sync

  1. Fetch playlist-square and toplist candidates for selected sources.
  2. Upsert pool rows and playlist rows.
  3. Link pools to playlists.
  4. For selected playlists, call platform parseplaylist() to resolve songs.
  5. Upsert song rows and playlist_songs.
  6. Extract artists from raw platform metadata when possible, otherwise from normalized singer strings.
  7. Upsert artists and attach them to derived artist pools and artist_songs.
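
Steps 5 and 6 can be sketched as follows. This is an illustration under stated assumptions, not the package's actual code: the songs columns, the upsert_song and split_singers helpers, and the list of singer separators are all hypothetical. The upsert is written as INSERT OR IGNORE followed by UPDATE so that it does not depend on a recent SQLite version.

```python
import re
import sqlite3

# Hypothetical minimal columns; the real table would carry more metadata.
SONGS_DDL = """
CREATE TABLE IF NOT EXISTS songs (
    id INTEGER PRIMARY KEY,
    platform TEXT NOT NULL,
    remote_song_id TEXT NOT NULL,
    title TEXT,
    singers TEXT,
    UNIQUE (platform, remote_song_id)
);
"""


def upsert_song(conn, platform, remote_song_id, title, singers):
    """Insert the song if new, refresh its metadata otherwise, return its id."""
    conn.execute(
        "INSERT OR IGNORE INTO songs (platform, remote_song_id) VALUES (?, ?)",
        (platform, remote_song_id),
    )
    conn.execute(
        "UPDATE songs SET title = ?, singers = ? "
        "WHERE platform = ? AND remote_song_id = ?",
        (title, singers, platform, remote_song_id),
    )
    row = conn.execute(
        "SELECT id FROM songs WHERE platform = ? AND remote_song_id = ?",
        (platform, remote_song_id),
    ).fetchone()
    return row[0]


# Assumed separators for normalized singer strings; real data may need more.
_SINGER_SEPARATORS = re.compile(r"\s*[,，、/&;]\s*")


def split_singers(singers):
    """Fallback artist extraction: split a singer string, keep first-seen order."""
    names = []
    for name in _SINGER_SEPARATORS.split(singers or ""):
        name = name.strip()
        if name and name not in names:
            names.append(name)
    return names
```

Splitting on "/" or "&" would break artist names that contain those characters, which is one reason raw platform metadata is the preferred source in step 6.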

Download Dedupe

  • A song is considered already owned when it has an active local file_location.
  • Dedupe key at song level is (platform, remote_song_id).
  • The first implementation keeps one preferred file asset per song. Future uploads add locations, not duplicate song rows.
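
The ownership check above could be expressed as a single query. This sketch assumes hypothetical column names (song_id, file_asset_id, backend_id, kind, is_active) on the tables listed in the data model; only the table names and the (platform, remote_song_id) key come from this design.

```python
import sqlite3

# Column names are assumptions; the join path follows
# songs -> file_assets -> file_locations -> storage_backends.
OWNED_SQL = """
SELECT 1
FROM songs AS s
JOIN file_assets AS fa ON fa.song_id = s.id
JOIN file_locations AS fl ON fl.file_asset_id = fa.id
JOIN storage_backends AS sb ON sb.id = fl.backend_id
WHERE s.platform = ? AND s.remote_song_id = ?
  AND sb.kind = 'local' AND fl.is_active = 1
LIMIT 1
"""


def is_song_owned(conn, platform, remote_song_id):
    """Return True when the song has at least one active local file location."""
    row = conn.execute(OWNED_SQL, (platform, remote_song_id)).fetchone()
    return row is not None
```

A download planner could call is_song_owned() for each candidate and skip the ones that return True, so inactive locations and non-local copies never suppress a local download.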

Quality Selection

  • Existing platform clients already attempt higher qualities first.
  • The workflow treats the returned file as the chosen asset and persists:
    • quality label
    • extension
    • file size
    • hash when available or computable
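
As a sketch of the "computable" case, the persisted fields could be derived from the downloaded file itself. The helper name describe_downloaded_file, the dictionary keys, and the choice of SHA-256 are assumptions for illustration; the quality label is taken as given by the caller because it comes from the platform client.

```python
import hashlib
import os


def describe_downloaded_file(path, quality_label, chunk_size=1 << 20):
    """Collect the asset fields to persist for a file already on disk."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large lossless files are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    return {
        "quality": quality_label,
        "extension": extension,
        "size_bytes": os.path.getsize(path),
        "sha256": digest.hexdigest(),
    }
```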

Low Disk Space

  • Before each download, check free space for the active local backend.
  • If insufficient, pause and prompt for a new local directory.
  • Upsert a new local backend row and continue subsequent downloads there.
  • Already downloaded files remain linked to their original backend.
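
The check-and-prompt loop above might look like the sketch below. shutil.disk_usage is a standard-library call; the function name resolve_download_dir and its parameters are hypothetical, and the prompt and free-space lookups are injectable so the loop can be exercised without filling a disk. Registering the new backend row is left to the caller.

```python
import os
import shutil


def _free_bytes(directory):
    """Free bytes on the filesystem that holds the given directory."""
    return shutil.disk_usage(directory).free


def resolve_download_dir(current_dir, required_bytes,
                         prompt=input, free_bytes=_free_bytes):
    """Return a directory with enough free space, prompting until one is found."""
    directory = current_dir
    while free_bytes(directory) < required_bytes:
        answer = prompt(
            f"Low disk space in {directory}. Enter a new download directory: "
        ).strip()
        if not answer:
            continue  # empty input: ask again
        os.makedirs(answer, exist_ok=True)
        directory = answer
    return directory
```

When the returned directory differs from current_dir, the caller would upsert a new local backend row for it and use that backend for subsequent downloads, leaving earlier file locations untouched.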

Future Upload Compatibility

  • storage_backends represents local FS, cloud-drive roots, or object-storage containers.
  • file_locations.container_name + locator can represent:
    • local root + relative path
    • cloud root + remote path
    • bucket + key
  • Future upload jobs can attach new non-local locations to an existing file_asset.
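
To illustrate how one (container_name, locator) pair can cover all three cases, here is a small sketch. The Location tuple, the helper names, and the backend kind labels are assumptions introduced for this example, and paths are handled as POSIX-style strings for simplicity.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class Location(NamedTuple):
    backend_kind: str     # hypothetical labels: local, cloud_drive, object_storage
    container_name: str   # local root, cloud root, or bucket
    locator: str          # relative path, remote path, or key


def local_location(root, file_path):
    """Local root + path relative to that root (raises ValueError if outside)."""
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(root))
    return Location("local", str(PurePosixPath(root)), str(relative))


def cloud_drive_location(cloud_root, remote_path):
    """Cloud root + remote path, stored without a leading slash."""
    return Location("cloud_drive", cloud_root, remote_path.lstrip("/"))


def object_storage_location(bucket, key):
    """Bucket + key, stored as given."""
    return Location("object_storage", bucket, key)
```

Storing the locator relative to its container means a local root can be moved or remounted by updating a single backend row rather than every file location.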

Acceptance Criteria

  • Selected source collectors can persist playlist-square and toplist rows into SQLite.
  • Playlist sync can populate songs and derived artists for at least the three supported sources (NetEase, QQ Music, and Kuwo).
  • Download command skips songs already backed by active local file locations.
  • Low-space prompt can switch to a new local directory and continue.
  • Tests cover schema creation, normalization, derived artist sync, dedupe checks, and collector parsing helpers.