# Catalog Sync Design

## Goal

Build an independent catalog sync and download workflow that:

- extracts playlist-square and toplist sources from NetEase, QQ Music, and Kuwo
- stores `playlist pool -> playlist -> song` and derived `artist pool -> artist -> song`
- skips duplicate downloads by `(platform, remote_song_id)`
- prefers highest available quality and falls back when needed
- supports pausing on low disk space and continuing in a new local directory
- keeps storage metadata compatible with local paths, cloud-drive paths, and bucket/key style object storage

## Scope

### In Scope

- Independent Python CLI entrypoint
- SQLite schema for catalog, file, and task state
- Source collectors for:
  - NetEase playlist square + toplists
  - QQ playlist square + toplists
  - Kuwo playlist square + toplists
- Reuse existing platform `parseplaylist()` and download logic where practical
- Derived artist pool updates during playlist sync
- Lazy artist enrichment metadata and hooks
- Local download dedupe and disk-space prompts
- Storage schema compatible with future uploads

### Out of Scope

- Full cross-platform song canonicalization
- GUI integration
- Production-ready 123 cloud upload implementation
- Streaming upload while downloading

## Constraints

- Prefer reuse of existing source clients under `musicdl.modules.sources`
- Avoid new mandatory dependencies where stdlib is sufficient
- Keep first version recoverable and inspectable from local files and SQLite
- Preserve compatibility with the existing `musicdl` package and console script

## Architecture

The new workflow lives in a dedicated package under `musicdl.catalogsync`. Collectors fetch playlist candidates per source and pool kind, then a sync layer normalizes and persists them. Playlist parsing reuses the existing per-platform clients to resolve tracks into `SongInfo` objects, which are then stored into catalog tables and used to derive artist pool membership. A download planner reads undispatched songs from the database, skips anything already represented by an active local file asset, and otherwise delegates the actual media fetch to existing source download logic.

Storage metadata is modeled with a logical file layer plus a location layer. `file_assets` describes the downloaded media version for a song, while `file_locations` records where that file lives. The first implementation only writes local locations, but the schema supports cloud-drive or bucket/key locations later without changing the song-level model.

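
The collector-to-sync hand-off described above could look roughly like the sketch below. `PlaylistCandidate`, `Collector`, `StaticCollector`, and `normalize_candidates` are hypothetical names chosen for illustration, not existing `musicdl` APIs, and the `pool_kind` labels are assumptions. A real collector would query the platform rather than return fixed rows.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Protocol, Tuple


@dataclass(frozen=True)
class PlaylistCandidate:
    """One playlist discovered by a collector (hypothetical shape)."""
    platform: str
    pool_kind: str  # assumed labels: "playlist_square" or "toplist"
    remote_playlist_id: str
    name: str


class Collector(Protocol):
    """Interface each per-source collector is assumed to satisfy."""
    platform: str

    def collect(self, pool_kind: str) -> Iterable[PlaylistCandidate]: ...


class StaticCollector:
    """Stand-in collector that yields fixed rows instead of calling a platform."""

    def __init__(self, platform: str, rows: List[Tuple[str, str]]):
        self.platform = platform
        self._rows = rows

    def collect(self, pool_kind: str) -> Iterator[PlaylistCandidate]:
        for remote_id, name in self._rows:
            yield PlaylistCandidate(self.platform, pool_kind, remote_id, name)


def normalize_candidates(candidates: Iterable[PlaylistCandidate]) -> List[PlaylistCandidate]:
    """Trim whitespace and keep the first candidate per (platform, pool_kind, id)."""
    seen = set()
    result = []
    for cand in candidates:
        remote_id = str(cand.remote_playlist_id).strip()
        key = (cand.platform, cand.pool_kind, remote_id)
        if key in seen:
            continue
        seen.add(key)
        result.append(
            PlaylistCandidate(cand.platform, cand.pool_kind, remote_id, cand.name.strip())
        )
    return result


collector: Collector = StaticCollector(
    "netease", [("1", " Hot Songs "), ("1", "Hot Songs"), ("2", "New Songs")]
)
candidates = normalize_candidates(collector.collect("toplist"))
```

Keeping normalization separate from collection means the persistence layer only ever sees one candidate shape, whichever source produced it.
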
## Data Model

### Catalog

- `playlist_pools`
- `playlists`
- `pool_playlists`
- `artist_pools`
- `artists`
- `pool_artists`
- `songs`
- `playlist_songs`
- `artist_songs`

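
As an illustration of how a few of these tables might be declared, here is a minimal sketch. Only the table names and the `(platform, remote_song_id)` key come from this design; every other column name is an assumption, and `upsert_song` is a hypothetical helper. The `ON CONFLICT ... DO UPDATE` clause needs SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical columns; the unique key on songs mirrors the dedupe key in this design.
CATALOG_SCHEMA = """
CREATE TABLE songs (
    id INTEGER PRIMARY KEY,
    platform TEXT NOT NULL,
    remote_song_id TEXT NOT NULL,
    title TEXT NOT NULL,
    UNIQUE (platform, remote_song_id)
);
CREATE TABLE playlists (
    id INTEGER PRIMARY KEY,
    platform TEXT NOT NULL,
    remote_playlist_id TEXT NOT NULL,
    name TEXT NOT NULL,
    UNIQUE (platform, remote_playlist_id)
);
CREATE TABLE playlist_songs (
    playlist_id INTEGER NOT NULL REFERENCES playlists(id),
    song_id INTEGER NOT NULL REFERENCES songs(id),
    position INTEGER,
    PRIMARY KEY (playlist_id, song_id)
);
"""


def upsert_song(conn, platform, remote_song_id, title):
    """Insert a song or refresh its title, then return the local row id."""
    conn.execute(
        "INSERT INTO songs (platform, remote_song_id, title) VALUES (?, ?, ?) "
        "ON CONFLICT (platform, remote_song_id) DO UPDATE SET title = excluded.title",
        (platform, remote_song_id, title),
    )
    row = conn.execute(
        "SELECT id FROM songs WHERE platform = ? AND remote_song_id = ?",
        (platform, remote_song_id),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(CATALOG_SCHEMA)
first_id = upsert_song(conn, "netease", "1001", "Old Title")
second_id = upsert_song(conn, "netease", "1001", "New Title")
song_count = conn.execute("SELECT COUNT(*) FROM songs").fetchone()[0]
stored_title = conn.execute("SELECT title FROM songs WHERE id = ?", (first_id,)).fetchone()[0]
```

Syncing the same song twice leaves a single row with a stable id, so link tables such as `playlist_songs` can be rewritten safely on every run.
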
### File and Storage

- `storage_backends`
- `file_assets`
- `file_locations`
- `download_tasks`

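
A possible shape for the file and location layers is sketched below. The table names and the `container_name` and `locator` columns appear in this design; the remaining columns (`kind`, `root`, `is_active`, `sha256`, and so on) are assumptions made for the example, and `download_tasks` is omitted for brevity.

```python
import sqlite3

# Hypothetical columns, except container_name and locator which this design names.
STORAGE_SCHEMA = """
CREATE TABLE storage_backends (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,   -- assumed labels: 'local', 'cloud_drive', 'object_storage'
    root TEXT NOT NULL
);
CREATE TABLE file_assets (
    id INTEGER PRIMARY KEY,
    song_id INTEGER NOT NULL,  -- would reference songs(id) in the full schema
    quality TEXT,
    ext TEXT,
    size_bytes INTEGER,
    sha256 TEXT
);
CREATE TABLE file_locations (
    id INTEGER PRIMARY KEY,
    file_asset_id INTEGER NOT NULL REFERENCES file_assets(id),
    backend_id INTEGER NOT NULL REFERENCES storage_backends(id),
    container_name TEXT NOT NULL,
    locator TEXT NOT NULL,
    is_active INTEGER NOT NULL DEFAULT 1,
    UNIQUE (backend_id, container_name, locator)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(STORAGE_SCHEMA)

# The first implementation would only ever write a local location like this one.
conn.execute("INSERT INTO storage_backends (id, kind, root) VALUES (1, 'local', '/data/music')")
conn.execute(
    "INSERT INTO file_assets (id, song_id, quality, ext, size_bytes) "
    "VALUES (1, 42, 'lossless', 'flac', 31457280)"
)
conn.execute(
    "INSERT INTO file_locations (file_asset_id, backend_id, container_name, locator) "
    "VALUES (1, 1, '/data/music', 'netease/42.flac')"
)
location = conn.execute(
    "SELECT container_name, locator, is_active FROM file_locations WHERE file_asset_id = 1"
).fetchone()
```
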
## Key Behaviors

### Playlist Sync

1. Fetch playlist-square and toplist candidates for selected sources.
2. Upsert pool rows and playlist rows.
3. Link pools to playlists.
4. For selected playlists, call platform `parseplaylist()` to resolve songs.
5. Upsert song rows and `playlist_songs`.
6. Extract artists from raw platform metadata when possible, otherwise from normalized singer strings.
7. Upsert artists and attach them to derived artist pools and `artist_songs`.

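
The fallback in step 6 could be sketched as follows. `split_singers` is a hypothetical helper and the separator set is an assumption; the platforms may join artist names differently.

```python
import re

# Assumed separators between artist names in a normalized singer string.
_SEPARATORS = re.compile(r"\s*(?:/|、|,|,|&|;|;)\s*")


def split_singers(singers):
    """Split a singer string into unique artist names, preserving order."""
    seen = set()
    names = []
    for part in _SEPARATORS.split(singers.strip()):
        name = part.strip()
        key = name.casefold()
        if name and key not in seen:
            seen.add(key)
            names.append(name)
    return names
```

A string split like this cannot tell a separator from part of a name, so an act such as "Simon & Garfunkel" would be broken in two. That is one reason to prefer structured artist lists from raw platform metadata whenever they are available.
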
### Download Dedupe

- A song is considered already owned when it has an active local `file_location`.
- Dedupe key at song level is `(platform, remote_song_id)`.
- The first implementation keeps one preferred file asset per song. Future uploads add locations, not duplicate song rows.

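
One way to express the ownership rule above is a single `NOT EXISTS` query. This is a sketch against a deliberately reduced schema: the `is_active` and `kind` columns and the `songs_needing_download` helper are assumptions for illustration.

```python
import sqlite3

# Reduced hypothetical schema: only the columns the ownership query needs.
SCHEMA = """
CREATE TABLE songs (
    id INTEGER PRIMARY KEY, platform TEXT, remote_song_id TEXT,
    UNIQUE (platform, remote_song_id)
);
CREATE TABLE storage_backends (id INTEGER PRIMARY KEY, kind TEXT);
CREATE TABLE file_assets (id INTEGER PRIMARY KEY, song_id INTEGER);
CREATE TABLE file_locations (
    id INTEGER PRIMARY KEY, file_asset_id INTEGER,
    backend_id INTEGER, is_active INTEGER DEFAULT 1
);
"""


def songs_needing_download(conn):
    """Return (platform, remote_song_id) for songs with no active local location."""
    return conn.execute(
        """
        SELECT s.platform, s.remote_song_id
        FROM songs AS s
        WHERE NOT EXISTS (
            SELECT 1
            FROM file_assets AS a
            JOIN file_locations AS l ON l.file_asset_id = a.id
            JOIN storage_backends AS b ON b.id = l.backend_id
            WHERE a.song_id = s.id AND l.is_active = 1 AND b.kind = 'local'
        )
        ORDER BY s.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO songs (id, platform, remote_song_id) VALUES (?, ?, ?)",
    [(1, "netease", "1001"), (2, "qq", "2002"), (3, "kuwo", "3003")],
)
conn.execute("INSERT INTO storage_backends (id, kind) VALUES (1, 'local')")
conn.executemany("INSERT INTO file_assets (id, song_id) VALUES (?, ?)", [(1, 1), (2, 2)])
# Song 1 has an active local file, song 2's location is inactive, song 3 has no asset.
conn.executemany(
    "INSERT INTO file_locations (file_asset_id, backend_id, is_active) VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 0)],
)
pending = songs_needing_download(conn)
```
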
### Quality Selection

- Existing platform clients already attempt higher qualities first.
- The workflow treats the returned file as the chosen asset and persists:
  - quality label
  - extension
  - file size
  - hash when available or computable

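
The persisted fields could be gathered from the downloaded file along these lines. `describe_file` and its dictionary keys are hypothetical, and SHA-256 is an assumed choice of hash; this design does not name an algorithm.

```python
import hashlib
import tempfile
from pathlib import Path


def describe_file(path, quality_label, chunk_size=1 << 20):
    """Collect the asset fields for a downloaded file, hashing it in chunks."""
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large lossless files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "quality": quality_label,
        "ext": path.suffix.lstrip(".").lower(),
        "size_bytes": path.stat().st_size,
        "sha256": digest.hexdigest(),
    }


# Demonstration with a throwaway file standing in for a downloaded track.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "demo.FLAC"
    sample.write_bytes(b"abc")
    info = describe_file(sample, "lossless")
```
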
### Low Disk Space

- Before each download, check free space for the active local backend.
- If insufficient, pause and prompt for a new local directory.
- Upsert a new local backend row and continue subsequent downloads there.
- Already downloaded files remain linked to their original backend.

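
A minimal sketch of the check-and-prompt loop, using the standard library's `shutil.disk_usage`. The helper names, the 512 MiB reserve, and the assumption that a size estimate is known before the download starts are all illustrative, not part of this design. Upserting the new backend row is left out.

```python
import shutil
from pathlib import Path

DEFAULT_RESERVE = 512 * 1024 * 1024  # assumed safety margin in bytes


def has_free_space(directory, required_bytes, reserve_bytes=DEFAULT_RESERVE):
    """True when the filesystem holding `directory` can fit the download plus a reserve."""
    usage = shutil.disk_usage(directory)
    return usage.free >= required_bytes + reserve_bytes


def choose_download_dir(current_dir, required_bytes, ask=input, reserve_bytes=DEFAULT_RESERVE):
    """Return a directory with enough space, prompting for a new one when needed."""
    directory = Path(current_dir)
    while not has_free_space(directory, required_bytes, reserve_bytes):
        answer = ask(f"Low disk space in {directory}. Enter a new download directory: ").strip()
        if not answer:
            continue  # empty input: ask again
        directory = Path(answer).expanduser()
        directory.mkdir(parents=True, exist_ok=True)
    return directory
```

Passing the prompt function as `ask` keeps the loop testable without a terminal, for example `choose_download_dir(".", 0, ask=lambda _: "/mnt/spare")`.
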
### Future Upload Compatibility

- `storage_backends` represents local FS, cloud-drive roots, or object-storage containers.
- `file_locations.container_name + locator` can represent:
  - local root + relative path
  - cloud root + remote path
  - bucket + key
- Future upload jobs can attach new non-local locations to an existing `file_asset`.

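
The three location forms can be pictured with a small in-memory model. The dataclasses below are a hypothetical illustration rather than the planned table layout, and the `backend_kind` labels and example paths are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class FileLocation:
    backend_kind: str  # assumed labels: "local", "cloud_drive", "object_storage"
    container_name: str  # local root, cloud root, or bucket
    locator: str  # relative path, remote path, or key


@dataclass
class FileAsset:
    song_key: Tuple[str, str]  # (platform, remote_song_id)
    quality: str
    locations: List[FileLocation] = field(default_factory=list)

    def attach(self, location: FileLocation) -> None:
        """Record another copy of the same file; the asset itself is unchanged."""
        if location not in self.locations:
            self.locations.append(location)


asset = FileAsset(("netease", "1001"), "lossless")
asset.attach(FileLocation("local", "/data/music", "netease/1001.flac"))
asset.attach(FileLocation("cloud_drive", "/Music", "netease/1001.flac"))
asset.attach(FileLocation("object_storage", "music-bucket", "netease/1001.flac"))
asset.attach(FileLocation("object_storage", "music-bucket", "netease/1001.flac"))  # no-op
kinds = [loc.backend_kind for loc in asset.locations]
```
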
## Acceptance Criteria

- Selected source collectors can persist playlist-square and toplist rows into SQLite.
- Playlist sync can populate songs and derived artists for each supported source.
- Download command skips songs already backed by active local file locations.
- Low-space prompt can switch to a new local directory and continue.
- Tests cover schema creation, normalization, derived artist sync, dedupe checks, and collector parsing helpers.