Catalog Sync Design
Goal
Build an independent catalog sync and download workflow that:
- extracts playlist-square and toplist sources from NetEase, QQ Music, and Kuwo
- stores playlist pool -> playlist -> song and derived artist pool -> artist -> song
- skips duplicate downloads by (platform, remote_song_id)
- prefers highest available quality and falls back when needed
- supports pausing on low disk space and continuing in a new local directory
- keeps storage metadata compatible with local paths, cloud-drive paths, and bucket/key style object storage
Scope
In Scope
- Independent Python CLI entrypoint
- SQLite schema for catalog, file, and task state
- Source collectors for:
  - NetEase playlist square + toplists
  - QQ playlist square + toplists
  - Kuwo playlist square + toplists
- Reuse existing platform parseplaylist() and download logic where practical
- Derived artist pool updates during playlist sync
- Lazy artist enrichment metadata and hooks
- Local download dedupe and disk-space prompts
- Storage schema compatible with future uploads
Out of Scope
- Full cross-platform song canonicalization
- GUI integration
- Production-ready 123 cloud upload implementation
- Streaming upload while downloading
Constraints
- Prefer reuse of existing source clients under musicdl.modules.sources
- Avoid new mandatory dependencies where stdlib is sufficient
- Keep first version recoverable and inspectable from local files and SQLite
- Preserve compatibility with the existing musicdl package and console script
Architecture
The new workflow lives in a dedicated package under musicdl.catalogsync. Collectors fetch playlist candidates per source and pool kind, then a sync layer normalizes and persists them. Playlist parsing reuses the existing per-platform clients to resolve tracks into SongInfo objects, which are then stored into catalog tables and used to derive artist pool membership. A download planner reads undispatched songs from the database, skips anything already represented by an active local file asset, and otherwise delegates the actual media fetch to existing source download logic.
Storage metadata is modeled with a logical file layer plus a location layer. file_assets describes the downloaded media version for a song, while file_locations records where that file lives. The first implementation only writes local locations, but the schema supports cloud-drive or bucket/key locations later without changing the song-level model.
Data Model
Catalog
- playlist_pools
- playlists
- pool_playlists
- artist_pools
- artists
- pool_artists
- songs
- playlist_songs
- artist_songs
File and Storage
- storage_backends
- file_assets
- file_locations
- download_tasks
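As a rough illustration of the logical file layer and location layer, the storage tables could be created along the following lines. Only the table names and the container_name and locator fields come from this design; every other column name, the kind values, and the create_storage_schema helper are assumptions made for the sketch, and download_tasks is left out for brevity.

```python
import sqlite3

# Hypothetical column layout: only the table names and the
# container_name/locator pair are taken from the design text.
STORAGE_SCHEMA = """
CREATE TABLE IF NOT EXISTS songs (
    id INTEGER PRIMARY KEY,
    platform TEXT NOT NULL,
    remote_song_id TEXT NOT NULL,
    title TEXT,
    UNIQUE (platform, remote_song_id)
);
CREATE TABLE IF NOT EXISTS storage_backends (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,   -- assumed values: 'local', 'cloud_drive', 'object_storage'
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS file_assets (
    id INTEGER PRIMARY KEY,
    song_id INTEGER NOT NULL UNIQUE REFERENCES songs(id),
    quality TEXT,
    ext TEXT,
    size_bytes INTEGER,
    sha256 TEXT
);
CREATE TABLE IF NOT EXISTS file_locations (
    id INTEGER PRIMARY KEY,
    asset_id INTEGER NOT NULL REFERENCES file_assets(id),
    backend_id INTEGER NOT NULL REFERENCES storage_backends(id),
    container_name TEXT NOT NULL,  -- local root, cloud root, or bucket
    locator TEXT NOT NULL,         -- relative path, remote path, or key
    is_active INTEGER NOT NULL DEFAULT 1,
    UNIQUE (backend_id, container_name, locator)
);
"""


def create_storage_schema(conn: sqlite3.Connection) -> None:
    """Create the storage tables if they do not exist yet (safe to re-run)."""
    conn.executescript(STORAGE_SCHEMA)
```

The UNIQUE constraint on file_assets.song_id reflects the "one preferred file asset per song" rule of the first implementation; a later version that keeps several assets per song would relax it.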
Key Behaviors
Playlist Sync
- Fetch playlist-square and toplist candidates for selected sources.
- Upsert pool rows and playlist rows.
- Link pools to playlists.
- For selected playlists, call platform parseplaylist() to resolve songs.
- Upsert song rows and playlist_songs.
- Extract artists from raw platform metadata when possible, otherwise from normalized singer strings.
- Upsert artists and attach them to derived artist pools and artist_songs.
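The song upsert and the singer-string fallback from the steps above can be sketched as follows. This is a minimal illustration, not the project's implementation: the songs column names, the separator set, and the split_singers and upsert_song helpers are all assumptions, and the upsert relies on a UNIQUE constraint over (platform, remote_song_id).

```python
import re
import sqlite3
from typing import List

# Assumed separators in normalized singer strings: comma, semicolon,
# slash, and the ideographic comma (U+3001). Real data may need more.
_SINGER_SEPARATORS = re.compile(r"\s*[,;/\u3001]\s*")


def split_singers(singers: str) -> List[str]:
    """Split a normalized singer string into unique names, keeping order."""
    names: List[str] = []
    for name in _SINGER_SEPARATORS.split(singers.strip()):
        if name and name not in names:
            names.append(name)
    return names


def upsert_song(conn: sqlite3.Connection, platform: str,
                remote_song_id: str, title: str) -> int:
    """Insert the song if it is new, refresh its title, and return its row id."""
    conn.execute(
        "INSERT OR IGNORE INTO songs (platform, remote_song_id, title) "
        "VALUES (?, ?, ?)",
        (platform, remote_song_id, title),
    )
    conn.execute(
        "UPDATE songs SET title = ? WHERE platform = ? AND remote_song_id = ?",
        (title, platform, remote_song_id),
    )
    row = conn.execute(
        "SELECT id FROM songs WHERE platform = ? AND remote_song_id = ?",
        (platform, remote_song_id),
    ).fetchone()
    return row[0]
```

Splitting on "/" will mis-split artist names that contain a slash, which is one reason raw platform metadata is preferred when it is available.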
Download Dedupe
- A song is considered already owned when it has an active local file_location.
- Dedupe key at song level is (platform, remote_song_id).
- The first implementation keeps one preferred file asset per song. Future uploads add locations, not duplicate song rows.
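The ownership check could be a single query keyed on (platform, remote_song_id). The sketch below assumes hypothetical columns (file_assets.song_id, file_locations.asset_id, file_locations.backend_id, file_locations.is_active, storage_backends.kind); none of these names are fixed by this design, and is_song_owned is an illustrative helper.

```python
import sqlite3

# Assumed join path: songs -> file_assets -> file_locations -> storage_backends.
OWNED_SQL = """
SELECT 1
FROM songs AS s
JOIN file_assets AS a ON a.song_id = s.id
JOIN file_locations AS l ON l.asset_id = a.id
JOIN storage_backends AS b ON b.id = l.backend_id
WHERE s.platform = ?
  AND s.remote_song_id = ?
  AND b.kind = 'local'
  AND l.is_active = 1
LIMIT 1
"""


def is_song_owned(conn: sqlite3.Connection, platform: str,
                  remote_song_id: str) -> bool:
    """Return True when the song has at least one active local file location."""
    row = conn.execute(OWNED_SQL, (platform, remote_song_id)).fetchone()
    return row is not None
```

A download planner would call this before dispatching each song and skip the fetch when it returns True.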
Quality Selection
- Existing platform clients already attempt higher qualities first.
- The workflow treats the returned file as the chosen asset and persists:
  - quality label
  - extension
  - file size
  - hash when available or computable
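One way to gather the persisted fields from a downloaded file is shown below. It is a sketch under assumptions: the describe_asset helper, its dictionary keys, and the choice of SHA-256 are illustrative, and the quality label is taken as given by the caller because the existing platform clients decide which quality was fetched.

```python
import hashlib
import os


def describe_asset(path: str, quality: str, chunk_size: int = 1 << 20) -> dict:
    """Collect quality label, extension, size, and SHA-256 for a local file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large lossless files are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "quality": quality,
        "ext": os.path.splitext(path)[1].lstrip(".").lower(),
        "size_bytes": os.path.getsize(path),
        "sha256": digest.hexdigest(),
    }
```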
Low Disk Space
- Before each download, check free space for the active local backend.
- If insufficient, pause and prompt for a new local directory.
- Upsert a new local backend row and continue subsequent downloads there.
- Already downloaded files remain linked to their original backend.
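The pre-download check and the prompt loop could look roughly like this, using the standard library's shutil.disk_usage. The reserve margin, the function names, and the injectable ask and disk_usage parameters are assumptions added to keep the sketch testable; the expected download size would have to come from the caller.

```python
import os
import shutil

# Assumed safety margin kept free on top of the expected download size.
MIN_FREE_BYTES = 512 * 1024 * 1024


def has_enough_space(directory, needed_bytes, reserve=MIN_FREE_BYTES,
                     disk_usage=shutil.disk_usage):
    """Return True if the directory's volume can hold needed_bytes plus reserve."""
    return disk_usage(directory).free >= needed_bytes + reserve


def choose_download_dir(current_dir, needed_bytes, ask=input,
                        disk_usage=shutil.disk_usage, reserve=MIN_FREE_BYTES):
    """Keep prompting for a new local directory until one has enough space."""
    directory = current_dir
    while not has_enough_space(directory, needed_bytes, reserve, disk_usage):
        answer = ask(
            "Low disk space in {}. New download directory: ".format(directory)
        ).strip()
        if not answer:
            continue  # empty input: ask again
        directory = os.path.abspath(os.path.expanduser(answer))
        os.makedirs(directory, exist_ok=True)
    return directory
```

When the returned directory differs from the current one, the caller would upsert a new local backend row for it and route subsequent downloads there, leaving earlier file locations untouched.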
Future Upload Compatibility
- storage_backends represents local FS, cloud-drive roots, or object-storage containers.
- file_locations.container_name + locator can represent:
  - local root + relative path
  - cloud root + remote path
  - bucket + key
- Future upload jobs can attach new non-local locations to an existing file_asset.
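To illustrate how one container_name + locator pair can cover all three cases, the sketch below models a location as a small record and renders it as a readable string. The FileLocation class, the backend_kind values, and describe_location are hypothetical names; only container_name and locator come from this design.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileLocation:
    backend_kind: str    # assumed values: 'local', 'cloud_drive', 'object_storage'
    container_name: str  # local root, cloud root, or bucket
    locator: str         # relative path, remote path, or key


def describe_location(location: FileLocation) -> str:
    """Render a location as a single string for logs and inspection."""
    if location.backend_kind == "local":
        # Local files: join the root and the relative path with the OS separator.
        return os.path.join(location.container_name, location.locator)
    # Cloud-drive and object-storage locations always use forward slashes.
    return "{}/{}".format(
        location.container_name.rstrip("/"), location.locator.lstrip("/")
    )
```

Because the song and file_asset rows never store a path themselves, adding an upload target only means inserting another location record for the same asset.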
Acceptance Criteria
- Selected source collectors can persist playlist-square and toplist rows into SQLite.
- Playlist sync can populate songs and derived artists for each supported source.
- Download command skips songs already backed by active local file locations.
- Low-space prompt can switch to a new local directory and continue.
- Tests cover schema creation, normalization, derived artist sync, dedupe checks, and collector parsing helpers.