Initial import: Music_Server, MusicFree, catalog-sync
# Catalog Sync Design

## Goal

Build an independent catalog sync and download workflow that:

- extracts playlist-square and toplist sources from NetEase, QQ Music, and Kuwo
- stores `playlist pool -> playlist -> song` and derived `artist pool -> artist -> song`
- skips duplicate downloads by `(platform, remote_song_id)`
- prefers highest available quality and falls back when needed
- supports pausing on low disk space and continuing in a new local directory
- keeps storage metadata compatible with local paths, cloud-drive paths, and bucket/key style object storage

## Scope

### In Scope

- Independent Python CLI entrypoint
- SQLite schema for catalog, file, and task state
- Source collectors for:
  - NetEase playlist square + toplists
  - QQ playlist square + toplists
  - Kuwo playlist square + toplists
- Reuse existing platform `parseplaylist()` and download logic where practical
- Derived artist pool updates during playlist sync
- Lazy artist enrichment metadata and hooks
- Local download dedupe and disk-space prompts
- Storage schema compatible with future uploads

### Out of Scope

- Full cross-platform song canonicalization
- GUI integration
- Production-ready 123 cloud upload implementation
- Streaming upload while downloading

## Constraints

- Prefer reuse of existing source clients under `musicdl.modules.sources`
- Avoid new mandatory dependencies where stdlib is sufficient
- Keep first version recoverable and inspectable from local files and SQLite
- Preserve compatibility with the existing `musicdl` package and console script

## Architecture

The new workflow lives in a dedicated package under `musicdl.catalogsync`. Collectors fetch playlist candidates per source and pool kind, then a sync layer normalizes and persists them. Playlist parsing reuses the existing per-platform clients to resolve tracks into `SongInfo` objects, which are then stored into catalog tables and used to derive artist pool membership. A download planner reads undispatched songs from the database, skips anything already represented by an active local file asset, and otherwise delegates the actual media fetch to existing source download logic.
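The collector-to-sync-layer hand-off described above could look roughly like the following sketch. Everything named here (`PlaylistCandidate`, `normalize`, `collect_all`, the `"id"` and `"name"` keys of the raw dicts) is hypothetical and chosen only for illustration; the real collectors and the shape of their raw output may differ.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class PlaylistCandidate:
    """Normalized playlist candidate (hypothetical shape)."""
    platform: str            # e.g. "netease", "qq", "kuwo"
    pool_kind: str           # e.g. "playlist_square" or "toplist"
    remote_playlist_id: str
    title: str


# A collector is assumed to be a callable that takes a pool kind and
# yields raw dicts carrying at least an "id" and optionally a "name".
Collector = Callable[[str], Iterable[dict]]


def normalize(platform: str, pool_kind: str, raw: dict) -> PlaylistCandidate:
    """Coerce IDs to strings and trim whitespace so keys compare reliably."""
    return PlaylistCandidate(
        platform=platform,
        pool_kind=pool_kind,
        remote_playlist_id=str(raw["id"]).strip(),
        title=str(raw.get("name", "")).strip(),
    )


def collect_all(collectors: dict, pool_kinds: Iterable[str]) -> list:
    """Run every collector for every pool kind, dropping repeated candidates."""
    pool_kinds = list(pool_kinds)
    seen = set()
    candidates = []
    for platform, collector in collectors.items():
        for kind in pool_kinds:
            for raw in collector(kind):
                cand = normalize(platform, kind, raw)
                key = (cand.platform, cand.pool_kind, cand.remote_playlist_id)
                if key in seen:
                    continue
                seen.add(key)
                candidates.append(cand)
    return candidates
```

Keeping normalization in one function means the persistence step only ever sees string IDs and trimmed titles, regardless of which platform produced them.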

Storage metadata is modeled with a logical file layer plus a location layer. `file_assets` describes the downloaded media version for a song, while `file_locations` records where that file lives. The first implementation only writes local locations, but the schema supports cloud-drive or bucket/key locations later without changing the song-level model.
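One possible shape for this two-layer model is sketched below with the standard-library `sqlite3` module. Only the table names and the `container_name` and `locator` columns come from this design; every other column, the backend kind labels, and the helper `add_location` are assumptions made for the example.

```python
import sqlite3

FILE_SCHEMA = """
CREATE TABLE IF NOT EXISTS storage_backends (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,            -- assumed labels: local, cloud_drive, object_storage
    name TEXT NOT NULL,
    UNIQUE (kind, name)
);
CREATE TABLE IF NOT EXISTS file_assets (
    id         INTEGER PRIMARY KEY,
    song_id    INTEGER NOT NULL UNIQUE,   -- one preferred asset per song
    quality    TEXT,
    ext        TEXT,
    size_bytes INTEGER,
    sha256     TEXT
);
CREATE TABLE IF NOT EXISTS file_locations (
    id             INTEGER PRIMARY KEY,
    file_asset_id  INTEGER NOT NULL REFERENCES file_assets(id),
    backend_id     INTEGER NOT NULL REFERENCES storage_backends(id),
    container_name TEXT NOT NULL,
    locator        TEXT NOT NULL,
    is_active      INTEGER NOT NULL DEFAULT 1,
    UNIQUE (backend_id, container_name, locator)
);
"""


def add_location(conn, asset_id, backend_id, container_name, locator):
    """Attach one more physical location to an existing logical asset."""
    conn.execute(
        "INSERT INTO file_locations (file_asset_id, backend_id, container_name, locator) "
        "VALUES (?, ?, ?, ?)",
        (asset_id, backend_id, container_name, locator),
    )


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(FILE_SCHEMA)

local_id = conn.execute(
    "INSERT INTO storage_backends (kind, name) VALUES ('local', 'main-disk')"
).lastrowid
bucket_id = conn.execute(
    "INSERT INTO storage_backends (kind, name) VALUES ('object_storage', 'archive')"
).lastrowid
asset_id = conn.execute(
    "INSERT INTO file_assets (song_id, quality, ext) VALUES (1, 'lossless', 'flac')"
).lastrowid

# The first version writes only the local row; the second row shows a later upload.
add_location(conn, asset_id, local_id, "/music", "netease/123.flac")
add_location(conn, asset_id, bucket_id, "example-bucket", "netease/123.flac")
```

After both calls there is still a single `file_assets` row, which is the property the paragraph above relies on: new storage targets add rows to `file_locations` only.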

## Data Model

### Catalog

- `playlist_pools`
- `playlists`
- `pool_playlists`
- `artist_pools`
- `artists`
- `pool_artists`
- `songs`
- `playlist_songs`
- `artist_songs`
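As an illustration of how a few of these tables might relate, the sketch below defines `songs`, `playlists`, and `playlist_songs` and an idempotent song upsert. The column names and the helper `upsert_song` are assumptions; only the table names and the `(platform, remote_song_id)` key come from this document. The `ON CONFLICT ... DO UPDATE` clause needs SQLite 3.24 or newer.

```python
import sqlite3

CATALOG_SCHEMA = """
CREATE TABLE IF NOT EXISTS songs (
    id             INTEGER PRIMARY KEY,
    platform       TEXT NOT NULL,
    remote_song_id TEXT NOT NULL,
    title          TEXT NOT NULL,
    UNIQUE (platform, remote_song_id)
);
CREATE TABLE IF NOT EXISTS playlists (
    id                 INTEGER PRIMARY KEY,
    platform           TEXT NOT NULL,
    remote_playlist_id TEXT NOT NULL,
    title              TEXT NOT NULL,
    UNIQUE (platform, remote_playlist_id)
);
CREATE TABLE IF NOT EXISTS playlist_songs (
    playlist_id INTEGER NOT NULL REFERENCES playlists(id),
    song_id     INTEGER NOT NULL REFERENCES songs(id),
    position    INTEGER NOT NULL,
    PRIMARY KEY (playlist_id, song_id)
);
"""


def upsert_song(conn, platform, remote_song_id, title):
    """Insert the song or refresh its title, then return its row id."""
    conn.execute(
        "INSERT INTO songs (platform, remote_song_id, title) VALUES (?, ?, ?) "
        "ON CONFLICT (platform, remote_song_id) DO UPDATE SET title = excluded.title",
        (platform, remote_song_id, title),
    )
    row = conn.execute(
        "SELECT id FROM songs WHERE platform = ? AND remote_song_id = ?",
        (platform, remote_song_id),
    ).fetchone()
    return row[0]
```

Because the unique key includes `platform`, the same remote ID on two platforms yields two song rows, which matches the decision to leave cross-platform canonicalization out of scope.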

### File and Storage

- `storage_backends`
- `file_assets`
- `file_locations`
- `download_tasks`

## Key Behaviors

### Playlist Sync

1. Fetch playlist-square and toplist candidates for selected sources.
2. Upsert pool rows and playlist rows.
3. Link pools to playlists.
4. For selected playlists, call platform `parseplaylist()` to resolve songs.
5. Upsert song rows and `playlist_songs`.
6. Extract artists from raw platform metadata when possible, otherwise from normalized singer strings.
7. Upsert artists and attach them to derived artist pools and `artist_songs`.
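Step 6 is the least mechanical of these steps, so here is a minimal sketch of the fallback order it describes. The function names, the separator set, and the assumption that raw metadata is a list of dicts with a `"name"` key are all illustrative; actual platform payloads may be shaped differently.

```python
import re

# Assumed separators seen in combined singer strings (ASCII and full-width).
_SEPARATORS = re.compile(r"[,/&;、，；]")


def split_singers(singers):
    """Split a combined singer string into unique, whitespace-normalized names."""
    seen = set()
    names = []
    for part in _SEPARATORS.split(singers or ""):
        name = " ".join(part.split())
        key = name.casefold()
        if name and key not in seen:
            seen.add(key)
            names.append(name)
    return names


def extract_artists(raw_artists, singers):
    """Prefer structured artist entries; fall back to parsing the singer string."""
    if raw_artists:
        names = [str(a.get("name", "")).strip() for a in raw_artists]
        names = [n for n in names if n]
        if names:
            return names
    return split_singers(singers)
```

The string fallback is lossy: an artist whose name contains a separator, such as `Simon & Garfunkel`, is split in two. That is one reason to prefer raw platform metadata whenever it is present.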

### Download Dedupe

- A song is considered already owned when it has an active local `file_location`.
- Dedupe key at song level is `(platform, remote_song_id)`.
- The first implementation keeps one preferred file asset per song. Future uploads add locations, not duplicate song rows.
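The ownership rule above can be expressed as a single query. The sketch below uses a pared-down schema so it runs on its own; the `is_active` flag, the `kind = 'local'` label, and the helper `is_owned` are assumptions rather than the final column names.

```python
import sqlite3

SCHEMA = """
CREATE TABLE songs (
    id INTEGER PRIMARY KEY,
    platform TEXT NOT NULL,
    remote_song_id TEXT NOT NULL,
    UNIQUE (platform, remote_song_id)
);
CREATE TABLE storage_backends (id INTEGER PRIMARY KEY, kind TEXT NOT NULL);
CREATE TABLE file_assets (
    id INTEGER PRIMARY KEY,
    song_id INTEGER NOT NULL REFERENCES songs(id)
);
CREATE TABLE file_locations (
    id INTEGER PRIMARY KEY,
    file_asset_id INTEGER NOT NULL REFERENCES file_assets(id),
    backend_id INTEGER NOT NULL REFERENCES storage_backends(id),
    is_active INTEGER NOT NULL DEFAULT 1
);
"""

OWNED_SQL = """
SELECT 1
FROM songs s
JOIN file_assets a      ON a.song_id = s.id
JOIN file_locations l   ON l.file_asset_id = a.id
JOIN storage_backends b ON b.id = l.backend_id
WHERE s.platform = ? AND s.remote_song_id = ?
  AND l.is_active = 1
  AND b.kind = 'local'
LIMIT 1
"""


def is_owned(conn, platform, remote_song_id):
    """True when the song has at least one active location on a local backend."""
    return conn.execute(OWNED_SQL, (platform, remote_song_id)).fetchone() is not None
```

A download planner would call `is_owned` before dispatching each song and skip the fetch when it returns `True`. A song that exists only in object storage, or whose local location was deactivated, still counts as not owned under this reading of the rule.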

### Quality Selection

- Existing platform clients already attempt higher qualities first.
- The workflow treats the returned file as the chosen asset and persists:
  - quality label
  - extension
  - file size
  - hash when available or computable
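A small helper can gather the four persisted fields from a finished download. This is a sketch only: the function name `describe_asset`, the dictionary keys, and the choice of SHA-256 are assumptions, since the design does not name a hash algorithm.

```python
import hashlib
from pathlib import Path


def describe_asset(path, quality_label, chunk_size=1 << 20):
    """Return quality, extension, size, and SHA-256 for a downloaded file."""
    path = Path(path)
    digest = hashlib.sha256()
    # Hash in chunks so large lossless files are never loaded into memory at once.
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "quality": quality_label,
        "ext": path.suffix.lstrip(".").lower(),
        "size_bytes": path.stat().st_size,
        "sha256": digest.hexdigest(),
    }
```

The quality label is passed in rather than inferred, because the platform client is the component that knows which quality tier it actually obtained.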

### Low Disk Space

- Before each download, check free space for the active local backend.
- If insufficient, pause and prompt for a new local directory.
- Upsert a new local backend row and continue subsequent downloads there.
- Already downloaded files remain linked to their original backend.
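The check-and-prompt loop could be built on `shutil.disk_usage`, as in the sketch below. The reserve margin, the function names, and the rule that an empty answer keeps the workflow paused are assumptions; `needed_bytes` is also assumed to be an estimate, since the exact size may not be known before the download starts.

```python
import shutil
from pathlib import Path

DEFAULT_RESERVE = 512 * 1024 * 1024  # assumed safety margin, not from the design


def has_room(directory, needed_bytes, reserve_bytes=DEFAULT_RESERVE,
             usage=shutil.disk_usage):
    """True if `directory` can take `needed_bytes` and still keep the reserve free."""
    return usage(directory).free - reserve_bytes >= needed_bytes


def pick_download_dir(current, needed_bytes, ask=input,
                      usage=shutil.disk_usage, reserve_bytes=DEFAULT_RESERVE):
    """Return a directory with enough room, prompting for a new one when needed."""
    directory = Path(current)
    while not has_room(directory, needed_bytes, reserve_bytes, usage):
        answer = ask(f"Low disk space in {directory}. New download directory: ").strip()
        if not answer:
            # No replacement given: stay paused instead of guessing a location.
            raise RuntimeError("download paused: no replacement directory given")
        directory = Path(answer).expanduser()
        directory.mkdir(parents=True, exist_ok=True)
    return directory
```

The caller would upsert a backend row for whatever directory is returned when it differs from the current one. Passing `ask` and `usage` as parameters keeps the prompt and the disk query replaceable in tests.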

### Future Upload Compatibility

- `storage_backends` represents local FS, cloud-drive roots, or object-storage containers.
- `file_locations.container_name + locator` can represent:
  - local root + relative path
  - cloud root + remote path
  - bucket + key
- Future upload jobs can attach new non-local locations to an existing `file_asset`.
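To make the `container_name + locator` pairing concrete, the sketch below models the three cases with one record type. The class `FileLocation`, the `backend_kind` labels, the helper `local_path`, and all sample values are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileLocation:
    backend_kind: str     # assumed labels: local, cloud_drive, object_storage
    container_name: str   # local root, cloud root, or bucket
    locator: str          # relative path, remote path, or key


def local_path(loc):
    """Join root and relative path; only meaningful for local backends."""
    if loc.backend_kind != "local":
        raise ValueError(f"{loc.backend_kind} location has no local path")
    return PurePosixPath(loc.container_name) / loc.locator


# The same two columns cover all three storage styles listed above.
LOCATIONS = [
    FileLocation("local", "/music", "netease/123.flac"),
    FileLocation("cloud_drive", "/backup/music", "netease/123.flac"),
    FileLocation("object_storage", "example-bucket", "netease/123.flac"),
]
```

Only the interpretation of the pair changes per backend kind, so attaching an uploaded copy later means adding one more row of this shape without touching the asset or song records.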

## Acceptance Criteria

- Selected source collectors can persist playlist-square and toplist rows into SQLite.
- Playlist sync can populate songs and derived artists from at least the supported source set.
- Download command skips songs already backed by active local file locations.
- Low-space prompt can switch to a new local directory and continue.
- Tests cover schema creation, normalization, derived artist sync, dedupe checks, and collector parsing helpers.