Initial import: Music_Server, MusicFree, catalog-sync
@@ -0,0 +1,364 @@
|
||||
# Musicdl APIs
|
||||
|
||||
|
||||
## `musicdl.musicdl.MusicClient`
|
||||
|
||||
A unified interface that wraps all supported music platforms. The class accepts the following initialization arguments:
|
||||
|
||||
- **music_sources** (`list[str]`, optional): A list of music client names to be enabled.
|
||||
Each name must be a key registered in `MusicClientBuilder.REGISTERED_MODULES`.
|
||||
If left empty, the following default sources are used:
|
||||
`['MiguMusicClient', 'NeteaseMusicClient', 'QQMusicClient', 'KuwoMusicClient', 'QianqianMusicClient']`.
|
||||
|
||||
- **init_music_clients_cfg** (`dict[str, dict]`, optional): Per-client initialization configuration.
|
||||
The outer dict is keyed by music source name (*e.g.*, `"NeteaseMusicClient"`), and each value is a dict that overrides the default config:
|
||||
```python
{
    "search_size_per_source": 5,
    "auto_set_proxies": False,
    "random_update_ua": False,
    "enable_search_curl_cffi": False,
    "enable_download_curl_cffi": False,
    "enable_parse_curl_cffi": False,
    "max_retries": 3,
    "maintain_session": False,
    "logger_handle": LoggerHandle(),
    "disable_print": True,
    "work_dir": "musicdl_outputs",
    "freeproxy_settings": None,
    "default_search_cookies": {},
    "default_download_cookies": {},
    "default_parse_cookies": {},
    "type": music_source,
    "search_size_per_page": 10,
    "strict_limit_search_size_per_page": True,
    "quark_parser_config": {},
}
```
|
||||
Any keys you provide will overwrite the defaults for that specific source only.
|
||||
|
||||
- **clients_threadings** (`dict[str, int]`, optional): Number of threads to use for each music client when searching/downloading.
|
||||
Keys are music source names; values are integers.
|
||||
If a source is missing from this dict, it defaults to `5` threads.
|
||||
|
||||
- **requests_overrides** (`dict[str, dict]`, optional): Per-client overrides for HTTP requests.
|
||||
Keys are music source names; values are dicts that will be forwarded as `request_overrides` to the underlying clients’ `search` and `download` methods.
|
||||
Typical usage is to pass `requests.get`-like kwargs such as custom `headers`, `proxies`, or `timeout`.
|
||||
If a source is missing from this dict, it defaults to an empty dict `{}`.
|
||||
|
||||
- **search_rules** (`dict[str, dict]`, optional): Per-client search rules.
|
||||
Keys are music source names; values are dicts passed as `rule` to the clients’ `search` method to control source-specific search behavior (*e.g.*, quality filters, sort rules, *etc.*, depending on the implementation of each client).
|
||||
If a source is missing from this dict, it defaults to an empty dict `{}`.
|
||||
|
||||
Once initialized, `MusicClient` exposes high-level `search` and `download` methods that automatically dispatch requests to all configured music sources.
|
||||
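The per-source fallbacks described above can be summarized in a small sketch. `resolve_settings` is a hypothetical helper written for illustration, not a musicdl function, and its `base_cfg` lists only a few of the documented default keys. It only mirrors the documented rules: the default source list, per-source config overrides, `5` threads, and empty dicts for missing entries.

```python
# Illustrative only: mirrors the documented fallback rules, not musicdl's actual code.
DEFAULT_SOURCES = [
    "MiguMusicClient", "NeteaseMusicClient", "QQMusicClient",
    "KuwoMusicClient", "QianqianMusicClient",
]
DEFAULT_THREADS = 5


def resolve_settings(music_sources=None, init_music_clients_cfg=None,
                     clients_threadings=None, requests_overrides=None,
                     search_rules=None):
    """Return the effective settings for every enabled source (hypothetical helper)."""
    sources = list(music_sources) if music_sources else list(DEFAULT_SOURCES)
    init_music_clients_cfg = init_music_clients_cfg or {}
    clients_threadings = clients_threadings or {}
    requests_overrides = requests_overrides or {}
    search_rules = search_rules or {}

    resolved = {}
    for source in sources:
        # A subset of the documented defaults; user keys overwrite them
        # for this source only.
        base_cfg = {
            "search_size_per_source": 5,
            "max_retries": 3,
            "work_dir": "musicdl_outputs",
            "type": source,
        }
        base_cfg.update(init_music_clients_cfg.get(source, {}))
        resolved[source] = {
            "cfg": base_cfg,
            "threads": clients_threadings.get(source, DEFAULT_THREADS),
            "request_overrides": requests_overrides.get(source, {}),
            "rule": search_rules.get(source, {}),
        }
    return resolved


settings = resolve_settings(
    music_sources=["NeteaseMusicClient", "QQMusicClient"],
    init_music_clients_cfg={"NeteaseMusicClient": {"search_size_per_source": 20}},
    clients_threadings={"QQMusicClient": 8},
    requests_overrides={"QQMusicClient": {"timeout": 10}},
)
```

With musicdl installed, the same four dictionaries would be passed as keyword arguments to `MusicClient(...)` under the argument names listed above.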
|
||||
#### `MusicClient.startcmdui()`
|
||||
|
||||
Start an interactive command-line interface for searching and downloading music.
|
||||
|
||||
This method:
|
||||
|
||||
- Prints basic usage information (version, save paths, *etc.*).
|
||||
- Prompts the user to input keywords for music search.
|
||||
- Calls `MusicClient.search()` to retrieve search results from all configured music sources.
|
||||
- Displays a formatted table of candidate songs with IDs.
|
||||
- Opens a cursor-based selection UI where the user can choose one or multiple songs:
|
||||
- Use "↑/↓" to move the cursor
|
||||
- Press "Space" to toggle selection
|
||||
- Press "a" to select all, "i" to invert selection
|
||||
- Press "Enter" to confirm and start downloading
|
||||
- Press "Esc" or "q" to cancel selection
|
||||
- Collects the corresponding song info entries and calls `MusicClient.download()` to download them.
|
||||
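The key bindings of the selection UI can be modeled as a small state update. This is a sketch of the described behavior only, not the actual terminal code: `apply_key` and the key names are hypothetical, and clamping the cursor at the first and last row is an assumption.

```python
def apply_key(selected, cursor, key, total):
    """Return the updated (selected, cursor) pair after one key press.

    `selected` is a set of row indices, `cursor` is the highlighted row,
    and `total` is the number of candidate songs.
    """
    selected = set(selected)
    if key == "up":
        cursor = max(0, cursor - 1)          # assumption: no wrap-around
    elif key == "down":
        cursor = min(total - 1, cursor + 1)  # assumption: no wrap-around
    elif key == "space":
        selected ^= {cursor}                 # toggle the highlighted row
    elif key == "a":
        selected = set(range(total))         # select all
    elif key == "i":
        selected = set(range(total)) - selected  # invert selection
    return selected, cursor


# Replay a short key sequence over four candidate songs.
state = (set(), 0)
for key in ["space", "down", "down", "space", "i"]:
    state = apply_key(state[0], state[1], key, total=4)
```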
|
||||
Special commands (at the main prompt):
|
||||
|
||||
- Enter `r` to **reinitialize** the program (*i.e.*, return to the main menu).
|
||||
- Enter `q` to **exit** the program.
|
||||
|
||||
This method runs in a loop and blocks until the user quits.
|
||||
|
||||
#### `MusicClient.search(keyword: str)`
|
||||
|
||||
Search for songs from all configured music platforms using a given `keyword`.
|
||||
The results from all sources are collected into a dictionary.
|
||||
Each per-source result is a list of song info dictionaries, which typically include `singers`, `song_name`, `file_size`, `duration`, `album`, `source`, `ext`, and other client-specific metadata.
|
||||
|
||||
- **Arguments**:
|
||||
|
||||
- **keyword** (`str`): Search keyword, *e.g.*, song name, artist name, *etc.*
|
||||
|
||||
- **Returns**:
|
||||
|
||||
- `dict[str, list[SongInfo]]`: A mapping from music source name (*e.g.*, `"NeteaseMusicClient"`) to a list of song info dictionaries returned by that source.
|
||||
|
||||
#### `MusicClient.download(song_infos: list[SongInfo])`
|
||||
|
||||
Download one or more songs given a list of song info dictionaries.
|
||||
Thread settings and request overrides are automatically taken from `MusicClient.clients_threadings` and `MusicClient.requests_overrides`.
|
||||
|
||||
- **Arguments**:
|
||||
|
||||
- **song_infos** (`list[SongInfo]`): A list of song info dictionaries, usually taken from the output of `MusicClient.search()`.
|
||||
Each dictionary must contain a `source` key so that the method can route it to the appropriate client.
|
||||
|
||||
- **Returns**:
|
||||
|
||||
- `None`.
|
||||
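To make the hand-off from `search()` to `download()` concrete, the sketch below flattens a per-source result mapping into one list and then groups entries by their `source` key, which is the routing step described above. Both helpers are hypothetical, and the sample entries are plain dictionaries with made-up values.

```python
from collections import defaultdict


def flatten_search_results(search_results):
    """Turn {source_name: [song_info, ...]} into one flat list."""
    return [info for infos in search_results.values() for info in infos]


def group_by_source(song_infos):
    """Group song info dictionaries by their `source` key (illustrative routing step)."""
    grouped = defaultdict(list)
    for info in song_infos:
        grouped[info["source"]].append(info)
    return dict(grouped)


# Made-up sample data in the shape returned by MusicClient.search().
search_results = {
    "NeteaseMusicClient": [
        {"song_name": "Song A", "source": "NeteaseMusicClient"},
        {"song_name": "Song B", "source": "NeteaseMusicClient"},
    ],
    "QQMusicClient": [
        {"song_name": "Song C", "source": "QQMusicClient"},
    ],
}

song_infos = flatten_search_results(search_results)
routed = group_by_source(song_infos[1:])  # e.g. the user picked the last two songs
```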
|
||||
|
||||
## `musicdl.modules.sources.BaseMusicClient`
|
||||
|
||||
`BaseMusicClient` defines the common workflow for searching, downloading, and playlist parsing across different music sources.
|
||||
Concrete clients only need to implement the source-specific parsing and URL construction logic, while the base class handles concurrency, progress display, deduplication, working-directory creation, and result serialization.
|
||||
In short, `BaseMusicClient` is the abstract base class for all concrete music clients, including:

- `musicdl.modules.sources.AppleMusicClient`
- `musicdl.modules.sources.BilibiliMusicClient`
- `musicdl.modules.sources.DeezerMusicClient`
- `musicdl.modules.sources.FiveSingMusicClient`
- `musicdl.modules.sources.JamendoMusicClient`
- `musicdl.modules.sources.JooxMusicClient`
- `musicdl.modules.sources.KugouMusicClient`
- `musicdl.modules.sources.KuwoMusicClient`
- `musicdl.modules.sources.MiguMusicClient`
- `musicdl.modules.sources.NeteaseMusicClient`
- `musicdl.modules.sources.QianqianMusicClient`
- `musicdl.modules.sources.QQMusicClient`
- `musicdl.modules.sources.QobuzMusicClient`
- `musicdl.modules.sources.SodaMusicClient`
- `musicdl.modules.sources.StreetVoiceMusicClient`
- `musicdl.modules.sources.SoundCloudMusicClient`
- `musicdl.modules.sources.SpotifyMusicClient`
- `musicdl.modules.sources.TIDALMusicClient`
- `musicdl.modules.sources.YouTubeMusicClient`
- `musicdl.modules.thirdpartysites.BuguyyMusicClient`
- `musicdl.modules.thirdpartysites.FiveSongMusicClient`
- `musicdl.modules.thirdpartysites.FangpiMusicClient`
- `musicdl.modules.thirdpartysites.FLMP3MusicClient`
- `musicdl.modules.thirdpartysites.GequbaoMusicClient`
- `musicdl.modules.thirdpartysites.GequhaiMusicClient`
- `musicdl.modules.thirdpartysites.HTQYYMusicClient`
- `musicdl.modules.thirdpartysites.JCPOOMusicClient`
- `musicdl.modules.thirdpartysites.KKWSMusicClient`
- `musicdl.modules.thirdpartysites.LivePOOMusicClient`
- `musicdl.modules.thirdpartysites.MituMusicClient`
- `musicdl.modules.thirdpartysites.TwoT58MusicClient`
- `musicdl.modules.thirdpartysites.YinyuedaoMusicClient`
- `musicdl.modules.thirdpartysites.ZhuolinMusicClient`
- `musicdl.modules.common.GDStudioMusicClient`
- `musicdl.modules.common.JBSouMusicClient`
- `musicdl.modules.common.MP3JuiceMusicClient`
- `musicdl.modules.common.MyFreeMP3MusicClient`
- `musicdl.modules.common.TuneHubMusicClient`
- `musicdl.modules.audiobooks.LizhiMusicClient`
- `musicdl.modules.audiobooks.LRTSMusicClient`
- `musicdl.modules.audiobooks.QingtingMusicClient`
- `musicdl.modules.audiobooks.XimalayaMusicClient`
|
||||
End users usually **do not** instantiate `BaseMusicClient` directly, but instead use one of the specific clients above.
|
||||
The methods documented here describe the common behavior of all these clients.
|
||||
Arguments supported when initializing this class include:
|
||||
|
||||
- **search_size_per_source** (`int`, default `5`):
|
||||
Maximum number of search results to fetch per source.
|
||||
|
||||
- **auto_set_proxies** (`bool`, default `False`):
|
||||
If `True`, randomly assign a free proxy fetched by `freeproxy.ProxiedSessionClient` to each request (see [FreeProxy](https://github.com/CharlesPikachu/freeproxy/tree/master) for details). This option does not work for `AppleMusicClient`, `TIDALMusicClient`, or `YouTubeMusicClient`.
|
||||
|
||||
- **random_update_ua** (`bool`, default `False`):
|
||||
If `True`, randomly refresh the `User-Agent` header on each request (does not work for `AppleMusicClient`, `TIDALMusicClient`, `KugouMusicClient`, or `YouTubeMusicClient`).
|
||||
|
||||
- **enable_search_curl_cffi** (`bool`, default `False`):
|
||||
If `True`, `curl_cffi.requests.Session` is used for each search request (does not work for `AppleMusicClient`, `TIDALMusicClient`, or `YouTubeMusicClient`).
|
||||
|
||||
- **enable_download_curl_cffi** (`bool`, default `False`):
|
||||
If `True`, `curl_cffi.requests.Session` is used for each download request (does not work for `AppleMusicClient`, `TIDALMusicClient`, or `YouTubeMusicClient`).
|
||||
|
||||
- **enable_parse_curl_cffi** (`bool`, default `False`):
|
||||
If `True`, `curl_cffi.requests.Session` is used for each `parseplaylist` request (does not work for `AppleMusicClient`, `TIDALMusicClient`, or `YouTubeMusicClient`).
|
||||
|
||||
- **max_retries** (`int`, default `3`):
|
||||
Maximum number of retry attempts for each HTTP request in `BaseMusicClient.get()` / `BaseMusicClient.post()`.
|
||||
|
||||
- **maintain_session** (`bool`, default `False`):
|
||||
If `False`, a new `requests.Session` is created before each request;
|
||||
if `True`, the same session is reused across requests (does not work for `AppleMusicClient`, `TIDALMusicClient`, `KugouMusicClient`, or `YouTubeMusicClient`).
|
||||
|
||||
- **logger_handle** (`LoggerHandle`, optional):
|
||||
Logger instance used for logging.
|
||||
If `None`, a new `LoggerHandle` is created.
|
||||
|
||||
- **disable_print** (`bool`, default `False`):
|
||||
If `True`, suppress printing in `logger_handle` calls where supported.
|
||||
|
||||
- **work_dir** (`str`, default `'musicdl_outputs'`):
|
||||
Root directory for saving search and download results.
|
||||
Each search will create its own subdirectory under this path.
|
||||
|
||||
- **freeproxy_settings** (`dict` or `None`, default `None`):
|
||||
Arguments passed when instantiating `freeproxy.ProxiedSessionClient`.
|
||||
If `None`, defaults to `dict(disable_print=True, proxy_sources=['ProxiflyProxiedSession'], max_tries=20, init_proxied_session_cfg={})` when `auto_set_proxies=True`.
|
||||
|
||||
- **default_search_cookies** (`dict` or `None`, default `{}`):
|
||||
Default cookies used for `BaseMusicClient.search` requests.
|
||||
|
||||
- **default_download_cookies** (`dict` or `None`, default `{}`):
|
||||
Default cookies used for `BaseMusicClient.download` requests.
|
||||
|
||||
- **default_parse_cookies** (`dict` or `None`, default `{}`):
|
||||
Default cookies used for `BaseMusicClient.parseplaylist` requests.
|
||||
|
||||
- **search_size_per_page** (`int`, default `10`):
|
||||
When searching for songs, if `search_size_per_source` is greater than `search_size_per_page`,
|
||||
the downloader will send paginated requests to the corresponding sites to retrieve the search results,
|
||||
with each page containing `search_size_per_page` songs.
|
||||
|
||||
- **strict_limit_search_size_per_page** (`bool`, default `True`):
|
||||
Some sites do not allow `search_size_per_page` to control how many songs are returned per request,
|
||||
which may cause the final number of search results from that site to exceed `search_size_per_source`.
|
||||
Setting this parameter to `True` enforces that the total number of results is less than or equal to `search_size_per_source`.
|
||||
|
||||
- **quark_parser_config** (`dict` or `None`, default `{}`):
|
||||
Some sites, such as `MituMusicClient`, `GequbaoMusicClient`, `YinyuedaoMusicClient`, and `BuguyyMusicClient`,
|
||||
store their lossless audio files on [Quark Netdisk](https://pan.quark.cn/).
|
||||
For these websites, if you want to download lossless-quality music files using musicdl,
|
||||
you need to configure `quark_parser_config` with the `cookies` from your Quark Netdisk web session after logging in, *e.g.*,
|
||||
`quark_parser_config={'cookies': xxxxxx}`.
|
||||
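The interaction between `search_size_per_source`, `search_size_per_page`, and `strict_limit_search_size_per_page` comes down to simple arithmetic. The sketch below is an illustration under assumptions: `plan_pages` and `limit_results` are hypothetical helpers, and rounding the page count up is an assumption about how the paginated requests are counted.

```python
import math


def plan_pages(search_size_per_source, search_size_per_page):
    """Number of paginated requests needed to cover the requested result count."""
    return math.ceil(search_size_per_source / search_size_per_page)


def limit_results(results, search_size_per_source,
                  strict_limit_search_size_per_page=True):
    """Trim the merged results when the strict limit is enabled."""
    if strict_limit_search_size_per_page:
        return results[:search_size_per_source]
    return results


# 25 requested results at 10 per page need 3 requests.
pages_needed = plan_pages(search_size_per_source=25, search_size_per_page=10)

# A site that ignores the page size might return 30 items in total.
returned = list(range(30))
strict = limit_results(returned, 25, strict_limit_search_size_per_page=True)
loose = limit_results(returned, 25, strict_limit_search_size_per_page=False)
```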
|
||||
#### `BaseMusicClient.search(keyword: str, num_threadings=5, request_overrides=None, rule=None, main_process_context=None, main_progress_id=None, main_progress_lock=None)`
|
||||
|
||||
Search for audio resources from the current music source, such as Netease, Kugou, QQ, and others.
|
||||
This method delegates platform-specific logic to `_constructsearchurls()` and `_search()`, then merges and deduplicates the results.
|
||||
|
||||
- **Arguments**:
|
||||
|
||||
- **keyword** (`str`)
|
||||
Search keyword, such as a song name, artist name, album title, or other query text.
|
||||
|
||||
- **num_threadings** (`int`, default: `5`)
|
||||
Number of worker threads used to search across all constructed search URLs concurrently.
|
||||
|
||||
- **request_overrides** (`dict | None`, default: `None`)
|
||||
Extra request options forwarded to the underlying search requests, such as `headers`, `cookies`, `proxies`, `timeout`, or `verify`.
|
||||
If `None`, it is treated as an empty dictionary.
|
||||
|
||||
- **rule** (`dict | None`, default: `None`)
|
||||
Client-specific search options passed into `_constructsearchurls()`.
|
||||
This may include filters such as page rules, quality constraints, sort preferences, or other source-specific search parameters.
|
||||
If `None`, it is treated as an empty dictionary.
|
||||
|
||||
- **main_process_context** (`rich.progress.Progress | None`, default: `None`)
|
||||
Optional external Rich `Progress` instance. If provided, the search task is attached to that progress context instead of creating a new one internally.
|
||||
|
||||
- **main_progress_id** (`int | None`, default: `None`)
|
||||
Optional task ID in `main_process_context` used to update a shared global progress bar across multiple sources.
|
||||
|
||||
- **main_progress_lock** (`threading.Lock | None`, default: `None`)
|
||||
Optional lock used to synchronize progress updates when multiple clients share the same progress context.
|
||||
|
||||
- **Returns**:
|
||||
|
||||
- **`list[SongInfo]`**
|
||||
A deduplicated list of `SongInfo` objects returned by the source-specific `_search()` implementation.
|
||||
|
||||
After searching, this method also assigns a generated `work_dir` to each result. For episodic items, episode-level working directories may also be assigned.
|
||||
|
||||
- **Behavior**
|
||||
|
||||
- Logs the start and end of the search process.
|
||||
- Calls `_constructsearchurls()` to generate one or more search URLs.
|
||||
- Uses a thread pool to run `_search()` concurrently on all generated URLs.
|
||||
- Merges results from all threads.
|
||||
- Removes duplicates using the `SongInfo.identifier` field.
|
||||
- Creates a unique working directory for the current search.
|
||||
- Saves search results to `search_results.pkl` inside the corresponding working directory.
|
||||
- Returns all valid `SongInfo` results.
|
||||
|
||||
- **Notes**
|
||||
|
||||
- Concrete subclasses must implement:
|
||||
- `BaseMusicClient._constructsearchurls()`
|
||||
- `BaseMusicClient._search()`
|
||||
- Deduplication is based on `song_info.identifier`.
|
||||
- The returned items are `SongInfo` objects, not plain dictionaries, although they are serialized as dictionaries when saved to disk.
|
||||
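The search workflow above (fan out over the constructed URLs, merge, then deduplicate by `identifier`) can be sketched with the standard library. This is not the library's implementation: `FakeSongInfo`, `fake_search`, and the URLs are hypothetical stand-ins for `SongInfo`, `_search()`, and the output of `_constructsearchurls()`.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class FakeSongInfo:
    """Hypothetical stand-in for SongInfo with only the fields used here."""
    identifier: str
    song_name: str


# Hypothetical per-URL results; the second page repeats one song.
FAKE_PAGES = {
    "https://example.com/search?page=1": [FakeSongInfo("a", "Song A"), FakeSongInfo("b", "Song B")],
    "https://example.com/search?page=2": [FakeSongInfo("b", "Song B"), FakeSongInfo("c", "Song C")],
}


def fake_search(url):
    """Stand-in for a source-specific _search() call on one URL."""
    return FAKE_PAGES[url]


def search_all(urls, search_fn, num_threadings=5):
    """Run search_fn on every URL concurrently, then merge and deduplicate."""
    with ThreadPoolExecutor(max_workers=num_threadings) as pool:
        per_url_results = list(pool.map(search_fn, urls))  # keeps URL order
    seen, unique = set(), []
    for results in per_url_results:
        for info in results:
            if info.identifier in seen:
                continue  # duplicate of a song already collected
            seen.add(info.identifier)
            unique.append(info)
    return unique


songs = search_all(list(FAKE_PAGES), fake_search, num_threadings=2)
```

Keeping the first occurrence of each `identifier` is a choice made for this sketch; which duplicate survives in a real client is not stated above.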
|
||||
#### `BaseMusicClient.download(song_infos: list[SongInfo], num_threadings=5, request_overrides=None, auto_supplement_song=True)`
|
||||
|
||||
Download one or more audio items represented by `SongInfo` objects.
|
||||
|
||||
This method supports both standard HTTP downloads and HLS downloads, depending on `song_info.protocol`.
|
||||
|
||||
- **Arguments**:
|
||||
|
||||
- **song_infos** (`list[SongInfo]`)
|
||||
A list of `SongInfo` objects to download, typically returned by `BaseMusicClient.search()` or `BaseMusicClient.parseplaylist()`.
|
||||
|
||||
- **num_threadings** (`int`, default: `5`)
|
||||
Number of worker threads used for concurrent downloading.
|
||||
|
||||
- **request_overrides** (`dict | None`, default: `None`)
|
||||
Extra request options forwarded to the underlying download request, such as `headers`, `cookies`, `proxies`, `timeout`, or `verify`.
|
||||
If `None`, it is treated as an empty dictionary.
|
||||
|
||||
- **auto_supplement_song** (`bool`, default: `True`)
|
||||
Whether to post-process successfully downloaded items with `SongInfoUtils.supplsonginfothensavelyricsthenwritetags(...)`.
|
||||
When enabled, the downloader may supplement metadata, save lyrics, and write tags after download.
|
||||
|
||||
- **Returns**:
|
||||
|
||||
- **`list[SongInfo]`**
|
||||
A list of successfully downloaded `SongInfo` objects.
|
||||
|
||||
- **Behavior**
|
||||
|
||||
- Logs the start and end of the download process.
|
||||
- Shortens paths in `song_infos` before downloading.
|
||||
- Creates a Rich progress display with:
|
||||
- an overall audio progress bar
|
||||
- per-song progress bars
|
||||
- transfer speed and estimated remaining time
|
||||
- Downloads items concurrently using a thread pool.
|
||||
- Supports:
|
||||
- *HLS* downloads through `HLSDownloader`
|
||||
- *HTTP* downloads from in-memory `downloaded_contents`
|
||||
- *HTTP* streamed downloads from `download_url`
|
||||
- Saves successful results to `download_results.pkl` in the corresponding working directory.
|
||||
|
||||
- **Protocol-specific behavior**
|
||||
|
||||
- If `song_info.protocol == "HLS"`:
|
||||
- Uses `HLSDownloader`
|
||||
- Downloads the best quality stream
|
||||
- Removes temporary segments after completion
|
||||
|
||||
- If `song_info.protocol == "HTTP"` and `song_info.downloaded_contents` is already available:
|
||||
- Writes the in-memory bytes directly to `song_info.save_path`
|
||||
|
||||
- If `song_info.protocol == "HTTP"` and `downloaded_contents` is not available:
|
||||
- Streams the file from `song_info.download_url`
|
||||
|
||||
- **Notes**
|
||||
|
||||
- Individual download failures do not stop the entire batch.
|
||||
- Failed items are skipped from the returned list.
|
||||
- Per-item headers may override global request headers if `song_info.default_download_headers` is set.
|
||||
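The protocol-specific branching and the skip-on-failure rule can be illustrated as follows. This is a simplified, sequential sketch rather than the threaded implementation: `FakeSongInfo`, `pick_strategy`, and `download_batch` are hypothetical names, and the strategy labels are placeholders for the HLS, in-memory, and streamed code paths.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSongInfo:
    """Hypothetical stand-in for SongInfo with only the fields used here."""
    song_name: str
    protocol: str = "HTTP"
    download_url: Optional[str] = None
    downloaded_contents: Optional[bytes] = None


def pick_strategy(info):
    """Choose a download path following the documented protocol rules."""
    if info.protocol == "HLS":
        return "hls"            # handled by an HLS downloader
    if info.protocol == "HTTP" and info.downloaded_contents is not None:
        return "write_bytes"    # bytes are already in memory
    if info.protocol == "HTTP":
        return "stream"         # fetch from download_url
    raise ValueError(f"unsupported protocol: {info.protocol}")


def download_batch(song_infos, download_one):
    """Download each item; failures are skipped instead of aborting the batch."""
    succeeded = []
    for info in song_infos:
        try:
            download_one(info, pick_strategy(info))
        except Exception:
            continue  # failed items are left out of the returned list
        succeeded.append(info)
    return succeeded


calls = []


def fake_download(info, strategy):
    """Stand-in downloader that fails for one item to exercise the skip path."""
    if info.song_name == "c":
        raise IOError("simulated network error")
    calls.append((info.song_name, strategy))


infos = [
    FakeSongInfo("a", protocol="HLS", download_url="https://example.com/a.m3u8"),
    FakeSongInfo("b", downloaded_contents=b"\x00\x01"),
    FakeSongInfo("c", download_url="https://example.com/c.mp3"),
    FakeSongInfo("d", protocol="FTP"),
]
downloaded = download_batch(infos, fake_download)
```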
|
||||
#### `BaseMusicClient.parseplaylist(playlist_url: str, request_overrides=None)`
|
||||
|
||||
Parse a playlist URL and extract downloadable audio items from it.
|
||||
|
||||
This method is intended for source-specific playlist parsing, such as album pages, playlist pages, episode collections, or shared links.
|
||||
|
||||
- **Arguments**
|
||||
|
||||
- **playlist_url** (`str`)
|
||||
URL of the playlist or collection page to parse.
|
||||
|
||||
- **request_overrides** (`dict | None`, default: `None`)
|
||||
Extra request options forwarded to the underlying parsing requests, such as `headers`, `cookies`, `proxies`, `timeout`, or `verify`.
|
||||
If `None`, it is treated as an empty dictionary.
|
||||
|
||||
- **Returns**
|
||||
|
||||
- Usually **`list[SongInfo]`**
|
||||
A list of parsed `SongInfo` objects representing the items in the playlist.
|
||||
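A typical use of `parseplaylist()` is to feed its result straight into `download()`. The sketch below relies only on the two method signatures documented above. `download_playlist` and `FakeClient` are hypothetical, and the stub exists so the example runs without network access.

```python
def download_playlist(client, playlist_url, num_threadings=5, request_overrides=None):
    """Parse a playlist with `client`, then download every parsed item."""
    song_infos = client.parseplaylist(playlist_url, request_overrides=request_overrides)
    if not song_infos:
        return []  # nothing could be parsed from this URL
    return client.download(
        song_infos,
        num_threadings=num_threadings,
        request_overrides=request_overrides,
    )


class FakeClient:
    """Stub exposing the same two methods as a concrete music client."""

    def parseplaylist(self, playlist_url, request_overrides=None):
        if "empty" in playlist_url:
            return []
        return ["track-1", "track-2"]

    def download(self, song_infos, num_threadings=5, request_overrides=None,
                 auto_supplement_song=True):
        # Pretend every item downloads successfully.
        return list(song_infos)


result = download_playlist(FakeClient(), "https://example.com/playlist/123")
empty = download_playlist(FakeClient(), "https://example.com/playlist/empty")
```

With musicdl installed, any concrete client from the list above could take the place of `FakeClient`.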
@@ -0,0 +1,38 @@
|
||||
# Personal Information
|
||||
|
||||
#### Homepage
|
||||
|
||||
Personal Homepage: [https://charlespikachu.github.io/](https://charlespikachu.github.io/)
|
||||
|
||||
#### Social Media
|
||||
|
||||
**WeChat Official Account (微信公众号)**
|
||||
|
||||
Here’s my WeChat Official Account! You can scan the QR code to follow, or search for "Charles的皮卡丘" / "Charles_pikachu". Thanks!
|
||||
|
||||

|
||||
|
||||
**Twitter (推特账号)**
|
||||
|
||||
My Twitter handle is listed below. You’re very welcome to follow.
|
||||
|
||||
[https://x.com/Charles40973215](https://x.com/Charles40973215)
|
||||
|
||||
**Zhihu (知乎账号)**
|
||||
|
||||
If you use Zhihu, you can follow me via the account below.
|
||||
|
||||
[https://www.zhihu.com/people/charles_pikachu](https://www.zhihu.com/people/charles_pikachu)
|
||||
|
||||
**Bilibili (B站账号)**
|
||||
|
||||
You can find my Bilibili account below. I’d be glad to have you as a follower.
|
||||
|
||||
[https://space.bilibili.com/406756145](https://space.bilibili.com/406756145)
|
||||
|
||||
**GitHub**
|
||||
|
||||
My GitHub profile is shared below, where I post code and projects. Feel free to follow.
|
||||
|
||||
[https://github.com/CharlesPikachu](https://github.com/CharlesPikachu)
|
||||
|
||||
@@ -0,0 +1,163 @@
|
||||
# Release Log
|
||||
|
||||
- 2026-03-16: Released musicdl v2.10.2 — added multiple shared premium membership APIs for Tidal, Deezer, and Qobuz; added support for multiple lyric search platforms to supplement third-party lyric information for several music platforms supported by musicdl; and made some minor code optimizations.
|
||||
|
||||
- 2026-03-14: Released musicdl v2.10.1 — added music search and download support for Qobuz (https://play.qobuz.com/discover), along with playlist parsing and download features; expanded the number of shared NetEase Cloud Music premium accounts; fixed several known minor bugs.
|
||||
|
||||
- 2026-03-13: Released musicdl v2.10.0 — added support for the Deezer Music Client (search, download, playlist); modified arguments of some API interfaces so that users can choose whether to write song metadata and save lyrics; and attempted to fix several known bugs.
|
||||
|
||||
- 2026-03-11: Released musicdl v2.9.10 — meticulously refactored the core code for Qishui Music, SoundCloud, TIDAL, and Apple Music to support playlist parsing and downloading across these platforms; fixed some bugs.
|
||||
|
||||
- 2026-03-10: Released musicdl v2.9.9 — refactored the code for QQ Music, Migu Music, Kugou Music, Kuwo Music, Qianqian Music, NetEase Cloud Music, YouTube Music, JOOX Music, Jamendo Music, Bilibili Music, 5SING Music, and StreetVoice to better support playlist parsing and future feature expansion; also improved the downloadable audio quality for some of these platforms, such as Jamendo Music; fixed some known bugs.
|
||||
|
||||
- 2026-03-07: Released musicdl v2.9.8 — fixed multiple third-party music search and download platforms; resolved a bug in determining whether music download links are available; unified the code style across the e-book platform and third-party music platforms.
|
||||
|
||||
- 2026-02-24: Released musicdl v2.9.7 — fixed some bugs and added support for searching and downloading books and albums from the LanRenTingShu site.
|
||||
|
||||
- 2026-02-19: Released musicdl v2.9.6 — this release introduces official API support for parsing complete playlists across NetEase, Migu, Qianqian, Kuwo, and Kugou Music; it also includes bug fixes for incomplete playlist/lyric fetching on specific platforms, alongside minor under-the-hood code improvements.
|
||||
|
||||
- 2026-02-14: Released musicdl v2.9.5 — this update fixes the incomplete playlist retrieval for NetEase Cloud Music, adds support for Kugou Music playlist parsing, introduces multiple lossless music APIs, and includes several potential bug fixes.
|
||||
|
||||
- 2026-02-11: Released musicdl v2.9.4 — supported batch downloading for Kuwo Music playlists; optimized playlist downloading for NetEase Cloud Music and QQ Music; added support for QQ Music’s "Premium Master 4.0" (臻品母带 4.0) to provide high-fidelity audio files.
|
||||
|
||||
- 2026-02-10: Released musicdl v2.9.3 — refactored the TIDAL client to utilize N_m3u8DL-RE for HI_RES_LOSSLESS support, and optimized the lossless interface for shared membership accounts.
|
||||
|
||||
- 2026-02-07: Released musicdl v2.9.2 — support parsing and downloading QQ Music playlists; update arguments for some API endpoints; update the KuGou Music track link parsing API to enable downloading music files with Viper audio effects/quality using a KuGou VIP account; and fix several bugs.
|
||||
|
||||
- 2026-02-05: Released musicdl v2.9.1 — support ALAC-quality downloads for Apple Music; add song search and download for the StreetVoice platform; add two new shared VIP-account API endpoints for NetEase Cloud Music.
|
||||
|
||||
- 2026-02-03: Released musicdl v2.9.0 — added support for native SoundCloud search and download APIs, session cookie authentication, and batch lossless music downloads from NetEase Cloud Music playlists.
|
||||
|
||||
- 2026-01-31: Released musicdl v2.8.12 — refactored the terminal table rendering algorithm to better handle very large tables; fixed the Kugou lossless API; added a new YouTube parsing endpoint.
|
||||
|
||||
- 2026-01-30: Released musicdl v2.8.11 — added or enhanced search and download support for Ximalaya, Lizhi FM, and Qingting FM; fixed several known bugs.
|
||||
|
||||
- 2026-01-29: Released musicdl v2.8.10 — support batch downloading audiobooks from the same album on the Ximalaya platform; update the API interfaces for Ximalaya, Kuwo, and TuneHub; and fix some minor bugs.
|
||||
|
||||
- 2026-01-28: Released musicdl v2.8.9 — add an automatic song tag autofill feature, introduce shared membership APIs for additional platforms, and deploy a TuneHub hotfix.
|
||||
|
||||
- 2026-01-26: Released musicdl v2.8.8 — added a new lossless music search and download site, implemented a Lanzou Cloud direct-download link parser, and performed partial code optimizations.
|
||||
|
||||
- 2026-01-22: Released musicdl v2.8.7 — refactor the code for three music platforms (*i.e.*, YouTube, Joox, and Jamendo) to retrieve higher-quality audio from each platform.
|
||||
|
||||
- 2026-01-21: Released musicdl v2.8.6 — refactor the currently supported unofficial download sites to return more standardized song information.
|
||||
|
||||
- 2026-01-19: Released musicdl v2.8.5 — refactored four cross-platform search sources to speed up song cover and metadata extraction via the musicdl API, revamped the terminal UI, and fixed several potential bugs.
|
||||
|
||||
- 2026-01-16: Released musicdl v2.8.4 — partial code optimizations, added support for Qishui Music, refactored the Bilibili and 5sing music APIs.
|
||||
|
||||
- 2026-01-14: Released musicdl v2.8.3 — refactor and optimize the code for Migu Music, QQ Music, NetEase Cloud Music, Qianqian Music, Kuwo Music, and Kugou Music, standardize the lyrics output format, extract song cover image URLs into a unified location in the returned song information, and integrate more high-quality third-party APIs for retrieving lossless music.
|
||||
|
||||
- 2026-01-01: Released musicdl v2.8.2 — adjusted the FreeProxy API arguments, improved the robustness of fetching high-quality music from NetEase Cloud Music, QQ Music, and Migu Music, and fixed some bugs.
|
||||
|
||||
- 2025-12-31: Released musicdl v2.8.1 — support more lossless-music sharing sites, add pagination support for some of the previously supported sites, and fix the Rich display conflict under two-level concurrency with multi-source searching.
|
||||
|
||||
- 2025-12-29: Released musicdl v2.8.0 — added support for two additional sites, improved the search speed, stability and metadata completeness when fetching lossless tracks from some sites, and optimized certain default arguments (prioritize search speed).
|
||||
|
||||
- 2025-12-24: Released musicdl v2.7.6 — add support for MissEvan FM, and fix bugs on the Gequhai site.
|
||||
|
||||
- 2025-12-24: Released musicdl v2.7.5 — add support for lossless music search and parsing for the gequhai site, and optimize parts of the code.
|
||||
|
||||
- 2025-12-19: Released musicdl v2.7.4 — supports music search and download using TuneHubMusicClient.
|
||||
|
||||
- 2025-12-17: Released musicdl v2.7.3 — supports native Bilibili music search and downloads, improves the search speed of some third-party APIs, refactors the Ximalaya music platform code, and includes several other minor code optimizations.
|
||||
|
||||
- 2025-12-15: Released musicdl v2.7.2 — added support for Jamendo and made some improvements.
|
||||
|
||||
- 2025-12-11: Released musicdl v2.7.1 — added support for two new sites and fixed several potential bugs.
|
||||
|
||||
- 2025-12-10: Released musicdl v2.7.0 — the code has been further refactored, with a large amount of redundant code removed or merged; all supported sites can now download lossless music (for some sites, you need to set your membership cookies in the command line or in the code), the search speed has been greatly optimized, and several problematic sites have been fixed.
|
||||
|
||||
- 2025-12-02: Released musicdl v2.6.2 — support parsing `AppleMusicClient` encrypted audio streams, along with some minor optimizations.
|
||||
|
||||
- 2025-12-01: Released musicdl v2.6.1 — we have provided more comprehensive documentation and added four new music search and download sources, *i.e.*, `MituMusicClient`, `GequbaoMusicClient`, `YinyuedaoMusicClient`, and `BuguyyMusicClient`, which allow you to download a large collection of lossless tracks.
|
||||
|
||||
- 2025-11-30: Released musicdl v2.6.0 — by tuning and improving the search arguments, we have significantly increased the search efficiency for some music sources, added support for searching and downloading music from Apple Music and MP3 Juice, and made several other minor optimizations.
|
||||
|
||||
- 2025-11-25: Released musicdl v2.5.0 — added support for searching and downloading from YouTube Music and made musicdl more robust.
|
||||
|
||||
- 2025-11-21: Released musicdl v2.4.6 — fixed bugs caused by mismatched arguments in MusicClient.download and optimized music sources.
|
||||
|
||||
- 2025-11-19: Released musicdl v2.4.5 — fixed potential bugs caused by in-place modification in HTTP requests.
|
||||
|
||||
- 2025-11-19: Released musicdl v2.4.4 — some minor improvements and bug fixes.
|
||||
|
||||
- 2025-11-15: Released musicdl v2.4.3 — the Migu and NetEase clients now include an automatic audio quality enhancement feature, which significantly increases the chances of getting lossless quality, Hi-Res audio, JyEffect (HD surround sound), Sky (immersive surround sound), and JyMaster (ultra-clear master quality).
|
||||
|
||||
- 2025-11-15: Released musicdl v2.4.2 — added writing of metadata to music files from TIDAL; fixed user input bugs and Migu search bugs.
|
||||
|
||||
- 2025-11-14: Released musicdl v2.4.1 — beautify print, add support for TIDAL (TIDAL is an artist-first, fan-centered music streaming platform that delivers over 110 million songs in HiFi sound quality to the global music community).
|
||||
|
||||
- 2025-11-12: Released musicdl v2.4.0 — complete code refactor; reintroduced support for music search and downloads on major platforms.
|
||||
|
||||
- 2023-02-22: Released musicdl v2.3.6 — fixed incorrect Netease Cloud Music information display and Migu lossless music download failures.
|
||||
|
||||
- 2022-09-03: Released musicdl v2.3.5 — fixed QQ Music song search.
|
||||
|
||||
- 2022-06-08: Released musicdl v2.3.4 — added support for Ximalaya audio search and download.
|
||||
|
||||
- 2022-05-14: Released musicdl v2.3.3 — fixed minor bugs.
|
||||
|
||||
- 2022-03-24: Released musicdl v2.3.0–v2.3.2 — removed Xiami Music; improved user interaction (progress bar, speech recognition, printed messages, etc.); and optimized code.
|
||||
|
||||
- 2022-03-15: Released musicdl v2.2.8 — fixed download issues for KuGou Music.
|
||||
|
||||
- 2022-03-08: Released musicdl v2.2.7 — added support for running directly via the `musicdl` command in the terminal.
|
||||
|
||||
- 2022-02-09: Released musicdl v2.2.6 — fixed header retrieval issues for QQ Music downloads.
|
||||
|
||||
- 2022-01-15: Released musicdl v2.2.5 — added support for specifying page indices in some music sources.
|
||||
|
||||
- 2022-01-05: Released musicdl v2.2.4 — refactored the code, removed Baidu lossless music (API deprecated and not recoverable), and fixed 5Sing song search.
|
||||
|
||||
- 2021-12-14: Released musicdl v2.2.3 — fixed issues with KuWo Music sources.
|
||||
|
||||
- 2021-08-30: Released musicdl v2.2.2 — added support for online voice-based song requests.
|
||||
|
||||
- 2021-08-29: Released musicdl v2.2.1 — fixed KuWo Music and Qianqian Music.
|
||||
|
||||
- 2020-11-27: Released musicdl v2.2.0 — added support for Lizhi FM, YiTing Music, and 5Sing Music; fixed issues with the progress bar and display; and added multi-threaded search.
|
||||
|
||||
- 2020-11-21: Released musicdl v2.1.11 — fixed Qianqian Music.
|
||||
|
||||
- 2020-11-20: Released musicdl v2.1.10 — further code optimization.
|
||||
|
||||
- 2020-11-04: Released musicdl v2.1.9 — added support for saving error messages to a `.log` file.
|
||||
|
||||
- 2020-10-17: Released musicdl v2.1.8 — optimized code.
|
||||
|
||||
- 2020-10-16: Released musicdl v2.1.7 — fixed Baidu lossless and Qianqian Music interfaces, and removed potential bugs and unnecessary characters.
|
||||
|
||||
- 2020-07-26: Released musicdl v2.1.6 — updated KuGou integration and removed the default proxy to avoid potential bugs.
|
||||
|
||||
- 2020-07-04: Released musicdl v2.1.5 — fixed QQ Music and JOOX Music.
|
||||
|
||||
- 2020-04-15: Released musicdl v2.1.4 — fixed several minor issues.
|
||||
|
||||
- 2020-04-03: Released musicdl v2.1.2 — added support for JOOX Music.
|
||||
|
||||
- 2020-04-02: Released musicdl v2.1.1 — improved robustness and re-added support for Xiami Music.
|
||||
|
||||
- 2020-04-01: Released musicdl v2.1.0 — major upgrade with a more user-friendly design, bug fixes, code optimization, and added project documentation.
|
||||
|
||||
- 2020-01-07: Released musicdl v2.0.7 — fixed bugs in Migu Music.
|
||||
|
||||
- 2019-08-28: Released musicdl v2.0.6 — added support for Migu Music.
|
||||
|
||||
- 2019-08-24: Released musicdl v2.0.5 — fixed various bugs and repaired invalid APIs.
|
||||
|
||||
- 2019-07-13: Released musicdl v2.0.4 — added support for Baidu lossless music.
|
||||
|
||||
- 2019-06-09: Released musicdl v2.0.2 — fixed bugs in KuGou Music.
|
||||
|
||||
- 2019-04-15: Released musicdl v2.0.2 — fixed bugs in Xiami Music.
|
||||
|
||||
- 2019-02-02: Released musicdl v2.0.1 — optimized code, improved user experience, and added support for installation via pip.
|
||||
|
||||
- 2018-08-05: Released musicdl v1.3 — added support for running in the terminal.
|
||||
|
||||
- 2018-07-02: Released musicdl v1.2 — optimized code and added support for Xiami Music.
|
||||
|
||||
- 2018-07-01: Released musicdl v1.1 — added support for KuWo Music.
|
||||
|
||||
- 2018-06-27: Released musicdl v1.0 — added download support for Netease Cloud Music, Qianqian Music, KuGou Music, and QQ Music.
|
||||
@@ -0,0 +1,12 @@
|
||||
# Project Disclaimer
|
||||
|
||||
<div align="center">
|
||||
<img src="https://raw.githubusercontent.com/CharlesPikachu/musicdl/master/docs/logo.png" width="600"/>
|
||||
</div>
|
||||
<br />
|
||||
|
||||
This repository is provided solely for educational and research purposes. Commercial use is prohibited.
|
||||
The software only interacts with publicly accessible web endpoints and does not host, store, mirror, or distribute any copyrighted or proprietary content.
|
||||
No executables are distributed with this repository. Redistribution, resale, or bundling of this software (or any derivative packaged distribution) without explicit permission is strictly prohibited.
|
||||
Access to paid, subscription, or otherwise restricted content must be obtained through authorized channels (*e.g.*, purchase or subscription via the relevant service). Use of this software to circumvent paywalls, DRM, licensing restrictions, or other access controls is strictly prohibited.
|
||||
If you are a copyright or rights holder and believe that this repository infringes your rights, please contact me with sufficient detail (*e.g.*, relevant URLs and proof of ownership), and I will promptly investigate and take appropriate action, which may include removal of the referenced material.
|
||||
@@ -0,0 +1,68 @@
|
||||
# Musicdl Installation
|
||||
|
||||
#### Environment Requirements
|
||||
|
||||
- Operating system: Linux, macOS, or Windows.
|
||||
- Python version: Python 3.9+ with requirements in [musicdl requirements.txt](https://github.com/CharlesPikachu/musicdl/blob/master/requirements.txt).
|
||||
|
||||
#### Installation Instructions
|
||||
|
||||
Choose one of the following three installation methods:
|
||||
|
||||
```sh
|
||||
# from pip
|
||||
pip install musicdl
|
||||
# from github repo method-1
|
||||
pip install git+https://github.com/CharlesPikachu/musicdl.git@master
|
||||
# from github repo method-2
|
||||
git clone https://github.com/CharlesPikachu/musicdl.git
|
||||
cd musicdl
|
||||
python setup.py install
|
||||
```
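After installing, a quick sanity check can confirm that the interpreter meets the stated Python 3.9+ requirement and that the package is visible to it. The following is a minimal sketch using only the standard library; the helper names `meets_python_requirement` and `installed_version` are hypothetical and not part of musicdl.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 9)  # minimum version stated in the environment requirements


def meets_python_requirement(version_info=None):
    """Return True if the given (or running) interpreter is at least MIN_PYTHON."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) >= MIN_PYTHON


def installed_version(package="musicdl"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", meets_python_requirement())
    print("musicdl version:", installed_version())
```

If the second line prints `None`, the package was probably installed into a different Python environment than the one running the check.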
|
||||
|
||||
Some of the music downloaders supported by musicdl require additional CLI tools to function properly, mainly for decrypting encrypted search/download requests and audio files.
|
||||
These CLI tools include:
|
||||
|
||||
- [FFmpeg](https://www.ffmpeg.org/): At the moment, only `TIDALMusicClient` and `AppleMusicClient` depend on FFmpeg for audio file decoding.
|
||||
If you don’t need to use `TIDALMusicClient` and `AppleMusicClient` when working with musicdl, you don’t need to install FFmpeg.
|
||||
After installing it, you should run the following command in a terminal (Command Prompt / PowerShell on Windows, Terminal on macOS/Linux) to check whether FFmpeg is on your system `PATH`:
|
||||
```bash
|
||||
ffmpeg -version
|
||||
```
|
||||
If FFmpeg is installed correctly and on your `PATH`, this command will print the FFmpeg version information (*e.g.*, a few lines starting with `ffmpeg version ...`).
|
||||
If you see an error like `command not found` or `'ffmpeg' is not recognized as an internal or external command`, then FFmpeg is either not installed or not added to your `PATH`.
|
||||
|
||||
- [Node.js](https://nodejs.org/en): Currently, only `YouTubeMusicClient` in musicdl depends on Node.js, so if you don’t need `YouTubeMusicClient`, you don’t have to install Node.js.
|
||||
Similar to FFmpeg, after installing Node.js, you should run the following command to check whether Node.js is on your system `PATH`:
|
||||
```bash
|
||||
node -v
npm -v
|
||||
```
|
||||
If Node.js is installed correctly, `node -v` will print the Node.js version (*e.g.*, `v22.11.0`), and `npm -v` will print the npm version.
|
||||
If you see a similar `command not found` / `not recognized` error, Node.js is not installed correctly or not available on your `PATH`.
|
||||
|
||||
- [N_m3u8DL-RE](https://github.com/nilaoda/N_m3u8DL-RE): N_m3u8DL-RE is a powerful open-source command-line tool for downloading, decrypting, and muxing HLS/DASH (m3u8/mpd) streaming media into local video files.
|
||||
In musicdl, this tool is mainly used for handling `TIDALMusicClient` and `AppleMusicClient` audio streams, so if you don’t need `TIDALMusicClient` or `AppleMusicClient` support, you don’t need to install it.
|
||||
After installing N_m3u8DL-RE, you need to make sure all of its executables are available in your system `PATH`.
|
||||
A quick way to verify this is that you should be able to run
|
||||
```bash
|
||||
python -c "import shutil; print(shutil.which('N_m3u8DL-RE'))"
|
||||
```
|
||||
in a terminal and see the full path printed. If it prints `None`, the executable is not on your `PATH`.
|
||||
|
||||
- [Bento4](https://www.bento4.com/downloads/): Bento4 is an open-source C++ toolkit for reading, writing, inspecting, and packaging MP4 files and related multimedia formats.
|
||||
In musicdl, this toolkit is mainly used for handling `AppleMusicClient` audio streams, so if you don’t need `AppleMusicClient` support, you don’t need to install it.
|
||||
After installing Bento4, you need to make sure all of its executables are available in your system `PATH`.
|
||||
A quick way to verify this is that you should be able to run
|
||||
```bash
|
||||
python -c "import shutil; print(shutil.which('mp4decrypt'))"
|
||||
```
|
||||
in a terminal and see the full path printed. If it prints `None`, the executable is not on your `PATH`.
|
||||
|
||||
- [amdecrypt](https://github.com/CharlesPikachu/musicdl/releases/tag/clitools): amdecrypt is a command-line tool developed by AI that leverages Bento4's mp4decrypt to process Apple Music encrypted files into playable formats.
|
||||
You can obtain it from the [GitHub Releases](https://github.com/CharlesPikachu/musicdl/releases/tag/clitools) of this repository.
|
||||
After installing amdecrypt, you need to make sure all of its executables are available in your system `PATH`.
|
||||
A quick way to verify this is that you should be able to run
|
||||
```bash
|
||||
python -c "import shutil; print(shutil.which('amdecrypt'))"
|
||||
```
|
||||
in a terminal and see the full path printed. If it prints `None`, the executable is not on your `PATH`.
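The individual checks above can be combined into one script. The following is a minimal sketch: the executable names and the client-to-tool mapping are taken from this section, while `CLI_TOOLS` and `missing_tools` are hypothetical names introduced for illustration, not part of musicdl.

```python
import shutil

# Executable name -> clients that this section says depend on it.
CLI_TOOLS = {
    "ffmpeg": ["TIDALMusicClient", "AppleMusicClient"],
    "node": ["YouTubeMusicClient"],
    "N_m3u8DL-RE": ["TIDALMusicClient", "AppleMusicClient"],
    "mp4decrypt": ["AppleMusicClient"],
    "amdecrypt": ["AppleMusicClient"],
}


def missing_tools(which=shutil.which):
    """Map each client to the executables it needs that are not on PATH."""
    missing = {}
    for exe, clients in CLI_TOOLS.items():
        if which(exe) is None:  # shutil.which returns None when nothing is found
            for client in clients:
                missing.setdefault(client, []).append(exe)
    return missing


if __name__ == "__main__":
    report = missing_tools()
    if not report:
        print("All optional CLI tools were found on PATH.")
    for client, tools in report.items():
        print(f"{client}: missing {', '.join(tools)}")
```

An empty report only means the executables were found; it does not verify that they run correctly.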
|
||||
@@ -0,0 +1,20 @@
|
||||
# Minimal makefile for Sphinx documentation
|
||||
#
|
||||
|
||||
# You can set these variables from the command line, and also
|
||||
# from the environment for the first two.
|
||||
SPHINXOPTS ?=
|
||||
SPHINXBUILD ?= sphinx-build
|
||||
SOURCEDIR = source
|
||||
BUILDDIR = build
|
||||
|
||||
# Put it first so that "make" without argument is like "make help".
|
||||
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
|
||||
|
||||
.PHONY: help Makefile
|
||||
|
||||
# Catch-all target: route all unknown targets to Sphinx using the new
|
||||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
|
||||
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
|
||||
@@ -0,0 +1,34 @@
|
||||
# Playground
|
||||
|
||||
## Projects
|
||||
|
||||
| Project (EN) | Project (CN) | WeChat Article | Project Location |
| :----: | :----: | :----: | :----: |
| Music downloader GUI | 音乐下载器GUI界面 | [click](https://mp.weixin.qq.com/s/fN1ORyI6lzQFqxf6Zk1oIg) | [musicdlgui](https://github.com/CharlesPikachu/musicdl/tree/master/examples/examples/musicdlgui) |
| Singer lyrics analysis | 歌手歌词分析 | [click](https://mp.weixin.qq.com/s/I8Dy7CoM2ThnSpjoUaPtig) | [singerlyricsanalysis](https://github.com/CharlesPikachu/musicdl/tree/master/examples/examples/singerlyricsanalysis) |
| Lyric-based song snippet retrieval | 歌词获取歌曲片段 | [click](https://mp.weixin.qq.com/s/Vmc1IhuhMJ6C5vBwBe43Pg) | [searchlyrics](https://github.com/CharlesPikachu/musicdl/tree/master/examples/examples/searchlyrics) |
|
||||
|
||||
## Usage and results demonstration
|
||||
|
||||
#### Music downloader GUI
|
||||
|
||||
This project is located in the [musicdlgui](https://github.com/CharlesPikachu/musicdl/tree/master/examples/examples/musicdlgui) directory.
|
||||
To use it, run `python musicdlgui.py`. The result is shown below:
|
||||
|
||||
<div align="center">
|
||||
<img src="https://raw.githubusercontent.com/CharlesPikachu/musicdl/master/examples/musicdlgui/screenshot.png" width="600" alt="musicdlgui screenshot" />
|
||||
</div>
|
||||
|
||||
#### Singer lyrics analysis
|
||||
|
||||
This project is located in the [singerlyricsanalysis](https://github.com/CharlesPikachu/musicdl/tree/master/examples/examples/singerlyricsanalysis) directory.
|
||||
To use it, run `python singerlyricsanalysis.py`. The result is shown below:
|
||||
|
||||
<div align="center">
|
||||
<img src="https://raw.githubusercontent.com/CharlesPikachu/musicdl/master/examples/singerlyricsanalysis/screenshot.png" width="600" alt="singerlyricsanalysis screenshot" />
|
||||
</div>
|
||||
|
||||
#### Lyric-based song snippet retrieval
|
||||
|
||||
This project is located in the [searchlyrics](https://github.com/CharlesPikachu/musicdl/tree/master/examples/examples/searchlyrics) directory.
|
||||
To use it, run `python searchlyrics.py`.
|
||||
@@ -0,0 +1,935 @@
|
||||
# Quick Start
|
||||
|
||||
#### Typical Examples
|
||||
|
||||
Here, we provide some common musicdl use cases to help you quickly get started with the tool.
|
||||
|
||||
To quickly verify that your environment meets the basic requirements and that musicdl has been installed successfully, run the following code:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
music_client = musicdl.MusicClient(music_sources=['MiguMusicClient', 'NeteaseMusicClient', 'QQMusicClient', 'KuwoMusicClient', 'QianqianMusicClient'])
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
The above code runs musicdl using `MiguMusicClient`, `NeteaseMusicClient`, `QQMusicClient`, `KuwoMusicClient` and `QianqianMusicClient` as both the search sources and download sources.
|
||||
|
||||
Of course, you can also run musicdl by entering the following equivalent command directly in the command line,
|
||||
|
||||
```bash
|
||||
musicdl -m NeteaseMusicClient,MiguMusicClient,QQMusicClient,KuwoMusicClient,QianqianMusicClient
|
||||
```
|
||||
|
||||
Please note that musicdl uses five Mainland China music sources by default for searching.
|
||||
If you need to use overseas music sources, you must manually specify the music platform each time you run the program.
|
||||
For example:
|
||||
|
||||
```bash
|
||||
musicdl -m GDStudioMusicClient,JamendoMusicClient
|
||||
```
|
||||
|
||||
In addition, searching and downloading from many music sources simultaneously may be relatively slow.
|
||||
Each run may take about 10–30 seconds.
|
||||
If you are confident that your song can be found on a specific platform or a few platforms, for example, `NeteaseMusicClient`, `QQMusicClient` or `KuwoMusicClient`,
|
||||
it is recommended to directly specify those platforms:
|
||||
|
||||
```bash
|
||||
musicdl -m NeteaseMusicClient,QQMusicClient,KuwoMusicClient
|
||||
```
|
||||
|
||||
The demonstration is as follows:
|
||||
|
||||
<div align="center">
|
||||
<div>
|
||||
<img src="https://github.com/CharlesPikachu/musicdl/raw/master/docs/screenshot/screenshot.png" width="600"/>
|
||||
</div>
|
||||
<div>
|
||||
<img src="https://github.com/CharlesPikachu/musicdl/raw/master/docs/screenshot/screenshot.gif" width="600"/>
|
||||
</div>
|
||||
</div>
|
||||
<br />
|
||||
|
||||
You can also use `musicdl --help` to see the detailed usage of the musicdl command-line tool, as follows:
|
||||
|
||||
```bash
Usage: musicdl [OPTIONS]

Options:
  --version                       Show the version and exit.
  -k, --keyword TEXT              The keywords for the music search. If left
                                  empty, an interactive terminal will open
                                  automatically.
  -p, --playlist-url, --playlist_url TEXT
                                  Given a playlist URL, e.g., "https://music.1
                                  63.com/#/playlist?id=7583298906", musicdl
                                  automatically parses the playlist and
                                  downloads all tracks in it.
  -m, --music-sources, --music_sources TEXT
                                  The music search and download sources.
                                  [default: MiguMusicClient,NeteaseMusicClient
                                  ,QQMusicClient,KuwoMusicClient,QianqianMusicClient]
  -i, --init-music-clients-cfg, --init_music_clients_cfg TEXT
                                  Config such as `work_dir` for each music
                                  client as a JSON string.
  -r, --requests-overrides, --requests_overrides TEXT
                                  Requests.get / Requests.post kwargs such as
                                  `headers` and `proxies` for each music
                                  client as a JSON string.
  -c, --clients-threadings, --clients_threadings TEXT
                                  Number of threads used for each music client
                                  as a JSON string.
  -s, --search-rules, --search_rules TEXT
                                  Search rules for each music client as a JSON
                                  string.
  --help                          Show this message and exit.
```
|
||||
|
||||
If you want to change the download path for the music files, you can write the following code:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
init_music_clients_cfg = dict()
|
||||
init_music_clients_cfg['MiguMusicClient'] = {'work_dir': 'migu'}
|
||||
init_music_clients_cfg['NeteaseMusicClient'] = {'work_dir': 'netease'}
|
||||
init_music_clients_cfg['QQMusicClient'] = {'work_dir': 'qq'}
|
||||
music_client = musicdl.MusicClient(music_sources=['MiguMusicClient', 'NeteaseMusicClient', 'QQMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
Alternatively, you can equivalently run the following command directly in the command line:
|
||||
|
||||
```bash
|
||||
musicdl -m NeteaseMusicClient,MiguMusicClient,QQMusicClient -i "{'MiguMusicClient': {'work_dir': 'migu'}, 'NeteaseMusicClient': {'work_dir': 'netease'}, 'QQMusicClient': {'work_dir': 'qq'}}"
|
||||
```
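Hand-writing nested braces and quotes for `-i` is error-prone, so the argument can instead be generated from a Python dict. The following is a minimal sketch, assuming the CLI accepts strict JSON, as the `--help` text ("as a JSON string") suggests; `build_cli_command` is a hypothetical helper, not part of musicdl.

```python
import json
import shlex


def build_cli_command(music_sources, init_cfg):
    """Return a POSIX-shell-safe `musicdl` command line for the given config."""
    argv = [
        "musicdl",
        "-m", ",".join(music_sources),
        # json.dumps guarantees balanced braces and double-quoted keys
        "-i", json.dumps(init_cfg, ensure_ascii=False),
    ]
    return shlex.join(argv)  # quotes each argument for a POSIX shell


if __name__ == "__main__":
    cfg = {
        "MiguMusicClient": {"work_dir": "migu"},
        "NeteaseMusicClient": {"work_dir": "netease"},
        "QQMusicClient": {"work_dir": "qq"},
    }
    print(build_cli_command(["NeteaseMusicClient", "MiguMusicClient", "QQMusicClient"], cfg))
```

Note that `shlex.join` targets POSIX shells; Command Prompt and PowerShell on Windows use different quoting rules.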
|
||||
|
||||
If you are a VIP user on a particular music platform, you can pass the cookies from your logged-in web session on that platform to musicdl to improve the quality of song search and downloads.
|
||||
Specifically, for example, if you have a membership on `QQMusicClient`, your code can be written as follows:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
your_vip_cookies_with_str_or_dict_format = ""
|
||||
init_music_clients_cfg = dict()
|
||||
init_music_clients_cfg['QQMusicClient'] = {'default_search_cookies': your_vip_cookies_with_str_or_dict_format, 'default_download_cookies': your_vip_cookies_with_str_or_dict_format}
|
||||
music_client = musicdl.MusicClient(music_sources=['NeteaseMusicClient', 'QQMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
Of course, you can also achieve the same effect by running the following command directly in the command line:
|
||||
|
||||
```bash
|
||||
musicdl -m NeteaseMusicClient,QQMusicClient -i "{'QQMusicClient': {'default_search_cookies': your_vip_cookies_with_str_or_dict_format, 'default_download_cookies': your_vip_cookies_with_str_or_dict_format}}"
|
||||
```
|
||||
|
||||
If you want to search for more songs on a specific music platform (*e.g.*, `QQMusicClient`), you can do the following:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
init_music_clients_cfg = dict()
|
||||
init_music_clients_cfg['QQMusicClient'] = {'search_size_per_source': 20}
|
||||
music_client = musicdl.MusicClient(music_sources=['NeteaseMusicClient', 'QQMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
Or enter the following in the command line:
|
||||
|
||||
```bash
|
||||
musicdl -m NeteaseMusicClient,QQMusicClient -i "{'QQMusicClient': {'search_size_per_source': 20}}"
|
||||
```
|
||||
|
||||
In this way, you can see up to 20 search results from `QQMusicClient`.
|
||||
|
||||
If you want to use the [pyfreeproxy](https://github.com/CharlesPikachu/freeproxy) library to automatically leverage free online proxies for music search and download, you can do it as follows:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
init_music_clients_cfg = dict()
|
||||
init_music_clients_cfg['NeteaseMusicClient'] = {
|
||||
'search_size_per_source': 1000, 'auto_set_proxies': True,
|
||||
'freeproxy_settings': dict(
|
||||
proxy_sources=["ProxyScrapeProxiedSession", "ProxylistProxiedSession"],
|
||||
init_proxied_session_cfg={"max_pages": 2, "filter_rule": {"country_code": ["CN"], "anonymity": ["elite"], "protocol": ["http", "https"]}},
|
||||
disable_print=True,
|
||||
max_tries=20
|
||||
)
|
||||
}
|
||||
music_client = musicdl.MusicClient(music_sources=['NeteaseMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
The command-line usage is similar:
|
||||
|
||||
```bash
|
||||
musicdl -m NeteaseMusicClient -i "{'NeteaseMusicClient': {'search_size_per_source': 1000, 'auto_set_proxies': True, 'freeproxy_settings': {'proxy_sources':['ProxyScrapeProxiedSession','ProxylistProxiedSession'],'init_proxied_session_cfg':{'max_pages':2,'filter_rule':{'country_code':['CN'],'anonymity':['elite'],'protocol':['http','https']}},'disable_print':True,'max_tries':20}}}"
|
||||
```
|
||||
|
||||
#### Separating Search and Download Results
|
||||
|
||||
You can also call the `.search` and `.download` interfaces of musicdl separately to inspect its intermediate results or build on top of them:
|
||||
|
||||
```python
from musicdl import musicdl

# instance
music_client = musicdl.MusicClient(music_sources=['NeteaseMusicClient'])
# search
search_results = music_client.search(keyword='尾戒')
print(search_results)
song_infos = []
for song_infos_per_source in list(search_results.values()):
    song_infos.extend(song_infos_per_source)
# download
music_client.download(song_infos=song_infos)
```
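The flattening step between search and download is plain dictionary handling and can be tried without network access. The following is a minimal sketch, assuming (as the loop over `search_results.values()` implies) that the search result maps each source name to a list of song entries; the sample entries are hypothetical stand-ins, not the real song-info structure.

```python
from itertools import chain


def flatten_search_results(search_results):
    """Merge the per-source result lists into one list, preserving order."""
    return list(chain.from_iterable(search_results.values()))


# Hypothetical stand-in for the mapping returned by `music_client.search(...)`.
sample_results = {
    "NeteaseMusicClient": [{"song_name": "A"}, {"song_name": "B"}],
    "QQMusicClient": [{"song_name": "C"}],
}

if __name__ == "__main__":
    song_infos = flatten_search_results(sample_results)
    print(len(song_infos), "entries ready to pass to download()")
```

Filtering the flattened list before downloading, for example keeping only the first entry per source, is one way to inspect or trim intermediate results.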
|
||||
|
||||
You can also choose not to use the unified `MusicClient` interface and instead directly import the definition class for a specific music platform for secondary development.
|
||||
Take `NeteaseMusicClient` as an example:
|
||||
|
||||
```python
|
||||
from musicdl.modules.sources import NeteaseMusicClient
|
||||
|
||||
netease_music_client = NeteaseMusicClient()
|
||||
# search
|
||||
search_results = netease_music_client.search(keyword='那些年')
|
||||
print(search_results)
|
||||
# download
|
||||
netease_music_client.download(song_infos=search_results)
|
||||
```
|
||||
|
||||
All supported classes can be obtained by printing `MusicClientBuilder.REGISTERED_MODULES`, *e.g.*,
|
||||
|
||||
```python
|
||||
from musicdl.modules import MusicClientBuilder
|
||||
|
||||
print(MusicClientBuilder.REGISTERED_MODULES)
|
||||
```
|
||||
|
||||
#### Download Playlist Items
|
||||
|
||||
From musicdl v2.9.0 onward, support for downloading user playlists from each platform will be added gradually. The platforms currently supported are as follows:
|
||||
|
||||
- [AppleMusicClient | 苹果音乐](https://music.apple.com/)
|
||||
- [DeezerMusicClient | Deezer (法国音乐平台)](https://www.deezer.com/us/)
|
||||
- [FiveSingMusicClient | 5SING音乐](https://5sing.kugou.com/index.html)
|
||||
- [JamendoMusicClient | 简音乐 (欧美流行音乐)](https://www.jamendo.com/)
|
||||
- [JooxMusicClient | QQ音乐海外版](https://www.joox.com/hk)
|
||||
- [KuwoMusicClient | 酷我音乐](http://www.kuwo.cn/)
|
||||
- [KugouMusicClient | 酷狗音乐](https://www.kugou.com/)
|
||||
- [MiguMusicClient | 咪咕音乐](https://music.migu.cn/v5/#/musicLibrary)
|
||||
- [NeteaseMusicClient | 网易云音乐](https://music.163.com/)
|
||||
- [QQMusicClient | QQ音乐](https://y.qq.com/)
|
||||
- [QianqianMusicClient | 千千音乐](https://music.91q.com/)
|
||||
- [QobuzMusicClient | Qobuz (提供CD质量的流媒体平台)](https://play.qobuz.com/discover)
|
||||
- [SoundCloudMusicClient | 声云](https://soundcloud.com/discover)
|
||||
- [StreetVoiceMusicClient | 街声](https://www.streetvoice.cn/)
|
||||
- [SodaMusicClient | 汽水音乐](https://www.douyin.com/qishui/)
|
||||
- [SpotifyMusicClient | Spotify (思播)](https://open.spotify.com/)
|
||||
- [TIDALMusicClient | TIDAL (提供HiFi音质的流媒体平台)](https://tidal.com/)
|
||||
|
||||
Specifically, you only need to run one of the following commands in the terminal; musicdl will automatically detect the playlist in the link and download it in batch:
|
||||
|
||||
```sh
|
||||
# Parse and Download Apple Music Playlist
|
||||
# >>> not use wrapper
|
||||
musicdl -p "https://music.apple.com/cn/playlist/%E5%8D%81%E5%A4%A7%E4%B8%93%E8%BE%91/pl.u-mJy81mECzBL49zM" -m AppleMusicClient -i "{'AppleMusicClient': {'default_parse_cookies': your_vip_cookies_with_str_or_dict_format}}"
|
||||
# >>> use wrapper
|
||||
musicdl -p "https://music.apple.com/cn/playlist/%E5%8D%81%E5%A4%A7%E4%B8%93%E8%BE%91/pl.u-mJy81mECzBL49zM" -m AppleMusicClient -i "{'AppleMusicClient': {'use_wrapper': True, 'wrapper_account_url': 'http://127.0.0.1:30020/', 'wrapper_decrypt_ip': '127.0.0.1:10020'}}"
|
||||
# Parse and Download Deezer Music Playlist
|
||||
musicdl -p "https://www.deezer.com/us/playlist/4697225044" -m DeezerMusicClient -i "{'DeezerMusicClient': {'default_parse_cookies': your_vip_cookies_with_str_or_dict_format}}"
|
||||
# Parse and Download 5SING Music Playlist
|
||||
musicdl -p "https://5sing.kugou.com/yeluoluo/dj/631b3fa72418b11003089b8d.html" -m FiveSingMusicClient
|
||||
# Parse and Download Jamendo Music Playlist
|
||||
musicdl -p "https://www.jamendo.com/playlist/500544876/best-of-february-2020" -m JamendoMusicClient
|
||||
# Parse and Download Joox Music Playlist
|
||||
musicdl -p "https://www.joox.com/hk/playlist/MqgK_LYD3Sb3I9Iziq+8NA==" -m JooxMusicClient
|
||||
# Parse and Download Kuwo Music Playlist
|
||||
musicdl -p "https://www.kuwo.cn/playlist_detail/2358858706" -m KuwoMusicClient
|
||||
# Parse and Download Kugou Music Playlist
|
||||
musicdl -p "https://www.kugou.com/yy/special/single/3280341.html" -m KugouMusicClient
|
||||
# Parse and Download Migu Music Playlist
|
||||
musicdl -p "https://music.migu.cn/v5/#/playlist?playlistId=228114498&playlistType=ordinary" -m MiguMusicClient
|
||||
# Parse and Download NetEase Music Playlist
|
||||
musicdl -p "https://music.163.com/#/playlist?id=3039971654" -m NeteaseMusicClient
|
||||
# Parse and Download QQ Music Playlist
|
||||
musicdl -p "https://y.qq.com/n/ryqq_v2/playlist/8740590963" -m QQMusicClient
|
||||
# Parse and Download QianQian Music Playlist
|
||||
musicdl -p "https://music.91q.com/songlist/295893" -m QianqianMusicClient
|
||||
# Parse and Download Qobuz Music Playlist
|
||||
musicdl -p "https://open.qobuz.com/playlist/22318381" -m QobuzMusicClient
|
||||
# Parse and Download StreetVoice Music Playlist
|
||||
musicdl -p "https://www.streetvoice.cn/morgan22/playlists/436444/" -m StreetVoiceMusicClient
|
||||
# Parse and Download SoundCloud Music Playlist
|
||||
musicdl -p "https://soundcloud.com/pandadub/sets/the-lost-ship" -m SoundCloudMusicClient
|
||||
# Parse and Download Soda Music Playlist
|
||||
musicdl -p "https://qishui.douyin.com/s/iHFSgNKw/" -m SodaMusicClient
|
||||
# Parse and Download Spotify Music Playlist
|
||||
musicdl -p "https://open.spotify.com/playlist/37i9dQZF1E8NWHOpySOxQd" -m SpotifyMusicClient
|
||||
# Parse and Download TIDAL Music Playlist
|
||||
musicdl -p "https://tidal.com/playlist/a94e7dce-da66-413d-81a5-990328afa3c9" -m TIDALMusicClient -i "{'TIDALMusicClient': {'default_parse_cookies': your_vip_cookies_with_str_or_dict_format}}"
|
||||
```
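Since every command above pairs a playlist URL with the matching `-m` value, that pairing can be looked up from the URL's host. The following is a minimal sketch: the host names are copied from the example URLs above, while `HOST_TO_CLIENT` and `guess_music_source` are hypothetical helpers, not part of musicdl, and other hosts or regional domains are not covered.

```python
from urllib.parse import urlparse

# Hosts taken from the example playlist URLs in this section.
HOST_TO_CLIENT = {
    "music.apple.com": "AppleMusicClient",
    "www.deezer.com": "DeezerMusicClient",
    "5sing.kugou.com": "FiveSingMusicClient",
    "www.jamendo.com": "JamendoMusicClient",
    "www.joox.com": "JooxMusicClient",
    "www.kuwo.cn": "KuwoMusicClient",
    "www.kugou.com": "KugouMusicClient",
    "music.migu.cn": "MiguMusicClient",
    "music.163.com": "NeteaseMusicClient",
    "y.qq.com": "QQMusicClient",
    "music.91q.com": "QianqianMusicClient",
    "open.qobuz.com": "QobuzMusicClient",
    "www.streetvoice.cn": "StreetVoiceMusicClient",
    "soundcloud.com": "SoundCloudMusicClient",
    "qishui.douyin.com": "SodaMusicClient",
    "open.spotify.com": "SpotifyMusicClient",
    "tidal.com": "TIDALMusicClient",
}


def guess_music_source(playlist_url):
    """Return the client name for a known playlist host, or None if unknown."""
    host = (urlparse(playlist_url).hostname or "").lower()
    return HOST_TO_CLIENT.get(host)


if __name__ == "__main__":
    print(guess_music_source("https://music.163.com/#/playlist?id=3039971654"))
```

An exact host match is used on purpose, so that `5sing.kugou.com` and `www.kugou.com` resolve to different clients.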
|
||||
|
||||
Alternatively, use the following code to invoke it:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
init_music_clients_cfg = {'NeteaseMusicClient': {'default_parse_cookies': YOUR_VIP_COOKIES}}
|
||||
music_client = musicdl.MusicClient(music_sources=['NeteaseMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
song_infos = music_client.parseplaylist("https://music.163.com/#/playlist?id=7583298906")
|
||||
music_client.download(song_infos=song_infos)
|
||||
```
|
||||
|
||||
Common Issues and Solutions (FAQ):
|
||||
|
||||
<details style="margin-bottom: 24px;">
|
||||
<summary><em>How to Parse New Kugou Web Playlist URLs?</em></summary>
|
||||
<br>
|
||||
|
||||
If you have a new playlist link (*e.g.*, `https://www.kugou.com/songlist/gcid_3zs9qlpmzdz003/`), you need to manually extract the `special ID` via your browser. Follow these steps:
|
||||
|
||||
1. *Open & Login*: Open the playlist link in your browser and ensure you are already logged into KuGou Music.
|
||||
2. *Inspect Network*: Open your browser's Developer Tools (F12) and use network traffic capture to inspect the returned HTML page.
|
||||
3. *Find the ID*: Search for the keyword `"specialid"` in the response. The number immediately following it is your special ID.
|
||||
4. *Construct the URL*: Replace `{YOUR_SPECIAL_ID}` in the format below with the number you found:
|
||||
- `https://www.kugou.com/yy/special/single/{YOUR_SPECIAL_ID}.html`
|
||||
5. *Run*: Use this newly constructed link as the playlist input for musicdl.
|
||||
|
||||
</details>
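Steps 3 and 4 of the KuGou procedure can be scripted once the page HTML has been saved from the browser. The following is a minimal sketch, assuming the keyword `specialid` is followed within a few non-digit characters (quotes, colon, equals sign, spaces) by the numeric ID; the exact markup is not documented here, and `build_kugou_playlist_url` is a hypothetical helper, not part of musicdl.

```python
import re

# Assumption: at most 8 non-digit characters separate the keyword from the ID.
_SPECIAL_ID = re.compile(r"specialid\D{0,8}?(\d+)", re.IGNORECASE)


def build_kugou_playlist_url(page_html):
    """Return the legacy-style playlist URL, or None if no special ID is found."""
    match = _SPECIAL_ID.search(page_html)
    if match is None:
        return None
    return f"https://www.kugou.com/yy/special/single/{match.group(1)}.html"


if __name__ == "__main__":
    # Hypothetical fragment of a saved playlist page
    sample_html = '<script>var data = {"specialid": 3280341, "name": "demo"};</script>'
    print(build_kugou_playlist_url(sample_html))
```

If the function returns `None`, check that the page was saved while logged in, as step 1 requires.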
|
||||
|
||||
<details style="margin-bottom: 24px;">
|
||||
<summary><em>Why Is the Downloaded Apple Music Playlist Incomplete?</em></summary>
|
||||
<br>
|
||||
|
||||
musicdl currently only supports parsing playlists with a maximum of 300 tracks.
|
||||
|
||||
If your playlist exceeds this limit, please manually split the large playlist into several smaller ones, and then use musicdl to parse and download them individually.
|
||||
|
||||
</details>
|
||||
|
||||
#### WhisperLRC
|
||||
|
||||
On some music platforms, it’s not possible to obtain the lyric files corresponding to the audio, *e.g.*, `XimalayaMusicClient` and `MituMusicClient`.
|
||||
To handle this, we provide a faster-whisper interface that can automatically generate lyrics for tracks whose lyrics are unavailable for download.
|
||||
|
||||
For audio files that have already been downloaded, you can use the following invocation to automatically generate lyrics for the local file:
|
||||
|
||||
```python
|
||||
from musicdl.modules import WhisperLRC
|
||||
|
||||
your_local_music_file_path = 'xxx.flac'
|
||||
WhisperLRC(model_size_or_path='base').fromfilepath(your_local_music_file_path)
|
||||
```
|
||||
|
||||
The available `model_size_or_path`, ordered from smallest to largest, are:
|
||||
|
||||
```python
|
||||
tiny, tiny.en, base, base.en, small, small.en, distil-small.en, medium, medium.en, distil-medium.en, large-v1, large-v2, large-v3, large, distil-large-v2, distil-large-v3, large-v3-turbo, turbo
|
||||
```
|
||||
|
||||
In general, the larger the model, the better the generated lyrics (transcription/translation) will be, but this also means it will take longer to run.
|
||||
|
||||
If you want to automatically generate lyric files during the download process,
|
||||
you can set the environment variable `ENABLE_WHISPERLRC=True` (for example, by running `export ENABLE_WHISPERLRC=True`).
|
||||
However, this is generally not recommended, as it may cause a single run of the program to take a very long time,
|
||||
unless you set `search_size_per_source` to `1` and `model_size_or_path` to `tiny`.
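The variable can also be set from Python for a single run instead of exporting it in the shell. The following is a minimal sketch using only the standard library; `whisperlrc_enabled` is a hypothetical helper, and because this section does not say when musicdl reads `ENABLE_WHISPERLRC`, the safest assumption is to keep both the import and the download inside the `with` block.

```python
import os
from contextlib import contextmanager


@contextmanager
def whisperlrc_enabled(value="True"):
    """Temporarily set ENABLE_WHISPERLRC, restoring the previous state on exit."""
    previous = os.environ.get("ENABLE_WHISPERLRC")
    os.environ["ENABLE_WHISPERLRC"] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop("ENABLE_WHISPERLRC", None)
        else:
            os.environ["ENABLE_WHISPERLRC"] = previous


if __name__ == "__main__":
    with whisperlrc_enabled():
        # Import and run musicdl here so it sees the variable.
        print(os.environ["ENABLE_WHISPERLRC"])
```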
|
||||
|
||||
Of course, you can also directly call `.fromurl` to generate a lyrics file for a song given by a direct URL:
|
||||
|
||||
```python
|
||||
from musicdl.modules import WhisperLRC
|
||||
|
||||
music_file_link = ''
|
||||
WhisperLRC(model_size_or_path='base').fromurl(music_file_link)
|
||||
```
|
||||
|
||||
#### Scenarios Where Quark Netdisk Login Cookies Are Required
|
||||
|
||||
Some websites share high-quality or lossless music files via [Quark Netdisk](https://pan.quark.cn/) links, for example, `MituMusicClient`, `GequbaoMusicClient`, `YinyuedaoMusicClient`, and `BuguyyMusicClient`.
|
||||
|
||||
If you want to download high-quality or lossless audio files from these music platforms, you need to provide the cookies from your logged-in Quark Netdisk web session when calling musicdl.
|
||||
For example, you can do the following:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
init_music_clients_cfg = dict()
|
||||
init_music_clients_cfg['YinyuedaoMusicClient'] = {'quark_parser_config': {'cookies': your_cookies_with_str_or_dict_format}}
|
||||
init_music_clients_cfg['GequbaoMusicClient'] = {'quark_parser_config': {'cookies': your_cookies_with_str_or_dict_format}}
|
||||
init_music_clients_cfg['MituMusicClient'] = {'quark_parser_config': {'cookies': your_cookies_with_str_or_dict_format}}
|
||||
init_music_clients_cfg['BuguyyMusicClient'] = {'quark_parser_config': {'cookies': your_cookies_with_str_or_dict_format}}
|
||||
|
||||
music_client = musicdl.MusicClient(music_sources=['MituMusicClient', 'YinyuedaoMusicClient', 'GequbaoMusicClient', 'BuguyyMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
Please note that musicdl does not provide any speed-limit bypass for Quark Netdisk.
|
||||
If the cookies you supply belong to a non-VIP Quark account, the download speed may be limited to only a few hundred KB/s.
|
||||
|
||||
Also note that Quark Netdisk will first save the music file to your own Quark account (usually in the "From: Shares (来自: 分享)" folder) and then start the download.
|
||||
Therefore, if your Quark storage is insufficient, the download may fail.
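
The four near-identical assignments in the example above can be generated in one step. This is a small sketch; `build_quark_cfg` is a hypothetical helper name, not part of musicdl, and the cookie value is a placeholder:

```python
# Client names taken from the list at the top of this section.
QUARK_BACKED_CLIENTS = [
    "MituMusicClient",
    "GequbaoMusicClient",
    "YinyuedaoMusicClient",
    "BuguyyMusicClient",
]


def build_quark_cfg(cookies, clients=QUARK_BACKED_CLIENTS):
    """Return an init_music_clients_cfg dict giving every client the same Quark cookies."""
    return {name: {"quark_parser_config": {"cookies": cookies}} for name in clients}


# Placeholder value; use your own cookies in str or dict format.
init_music_clients_cfg = build_quark_cfg("your_quark_cookies")
```

Passing `music_sources=QUARK_BACKED_CLIENTS` together with this dict to `musicdl.MusicClient` should be equivalent to the longer example above.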
|
||||
|
||||
#### Kugou Music Download
|
||||
|
||||
Musicdl currently supports searching and downloading from KuGou Music, and it is used in the same way as other music clients.
|
||||
The only thing to note is that if you need to configure member cookies to download purchased albums/singles or member-exclusive audio quality, the cookies must be in the following format:
|
||||
|
||||
```python
|
||||
{
|
||||
'KUGOU_API_GUID': 'xxxx',
|
||||
'KUGOU_API_MID': 'xxxx',
|
||||
'KUGOU_API_MAC': 'xxxx',
|
||||
'KUGOU_API_DEV': 'xxxx',
|
||||
'token': 'xxxx',
|
||||
'userid': 'xxxx',
|
||||
'dfid': 'xxxx'
|
||||
}
|
||||
```
|
||||
|
||||
You can either use the [build_cookies_for_kugou.py](https://github.com/CharlesPikachu/musicdl/blob/master/scripts/build_cookies_for_kugou.py) script provided in the repo to obtain them directly,
|
||||
or capture the above arguments yourself via network packet capture on the KuGou app or the web client, and then configure musicdl as follows:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
cookies = {'KUGOU_API_GUID': 'xxxx', 'KUGOU_API_MID': 'xxxx', 'KUGOU_API_MAC': 'xxxx', 'KUGOU_API_DEV': 'xxxx', 'token': 'xxxx', 'userid': 'xxxx', 'dfid': 'xxxx'}
|
||||
init_music_clients_cfg = {'KugouMusicClient': {'default_search_cookies': cookies, 'search_size_per_source': 5}}
|
||||
music_client = musicdl.MusicClient(music_sources=['KugouMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
Keep in mind that cookie names captured from network traffic may not match the cookie names required by musicdl.
|
||||
You need to map them correctly to construct valid cookies; otherwise, member-only music downloads won’t work.
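
One way to make that mapping explicit is sketched below. `build_kugou_cookies` and the captured field name `guid` are hypothetical and only illustrate the renaming step; the seven required keys are the ones listed in the format above:

```python
REQUIRED_KUGOU_KEYS = (
    "KUGOU_API_GUID", "KUGOU_API_MID", "KUGOU_API_MAC", "KUGOU_API_DEV",
    "token", "userid", "dfid",
)


def build_kugou_cookies(captured, name_map):
    """Rename captured fields to the names musicdl expects and check none are missing.

    name_map maps a captured field name to one of REQUIRED_KUGOU_KEYS; field names
    absent from name_map are assumed to be correct already.
    """
    cookies = {name_map.get(key, key): value for key, value in captured.items()}
    missing = [key for key in REQUIRED_KUGOU_KEYS if not cookies.get(key)]
    if missing:
        raise ValueError(f"missing KuGou cookie fields: {missing}")
    # Drop any extra captured fields that musicdl does not ask for.
    return {key: cookies[key] for key in REQUIRED_KUGOU_KEYS}


# Hypothetical capture in which the GUID was recorded under a different name.
captured = {
    "guid": "xxxx", "KUGOU_API_MID": "xxxx", "KUGOU_API_MAC": "xxxx",
    "KUGOU_API_DEV": "xxxx", "token": "xxxx", "userid": "xxxx", "dfid": "xxxx",
}
cookies = build_kugou_cookies(captured, {"guid": "KUGOU_API_GUID"})
```

Failing early with a list of missing keys is usually easier to debug than a member-only download that quietly falls back to a lower quality.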
|
||||
|
||||
#### LizhiFM and XimalayaFM Track/Album Download
|
||||
|
||||
Musicdl currently also supports searching for and downloading individual audio tracks, as well as entire albums, from long-form audio platforms (*e.g.*, Ximalaya and Lizhi FM) that host podcasts and audiobooks.
|
||||
By default, both modes start simultaneously, and the top few search results for each mode are shown based on the input keyword.
|
||||
|
||||
A simple usage example is shown below:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
init_music_clients_cfg = {'XimalayaMusicClient': {'search_size_per_source': 2}}
|
||||
music_client = musicdl.MusicClient(music_sources=['XimalayaMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
The result of running the code above looks like this:
|
||||
|
||||
<div align="center">
|
||||
<div>
|
||||
<img src="https://github.com/CharlesPikachu/musicdl/raw/master/docs/screenshot/ximalayascreenshot.gif" width="600"/>
|
||||
</div>
|
||||
</div>
|
||||
<br />
|
||||
|
||||
You can also choose the search type yourself by setting `allowed_search_types`, for example:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
# only search by track (each assignment below replaces the previous one, so keep only the one you need)
|
||||
init_music_clients_cfg = {'XimalayaMusicClient': {'search_size_per_source': 2, 'allowed_search_types': ['track']}}
|
||||
# only search by album
|
||||
init_music_clients_cfg = {'XimalayaMusicClient': {'search_size_per_source': 2, 'allowed_search_types': ['album']}}
|
||||
# instance music_client
|
||||
music_client = musicdl.MusicClient(music_sources=['XimalayaMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
# start
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
Please note that the code above only supports downloading free albums and audio.
|
||||
If you need to download paid audio, please configure cookies in `init_music_clients_cfg`, just as you would with other music clients.
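
The alternative configurations above can also be produced by a small helper, which avoids overwriting one assignment with the next. This is a sketch; `search_cfg` is a hypothetical name, and the cookie keys are the same `default_search_cookies` / `default_download_cookies` keys used for other clients in this document:

```python
def search_cfg(client_name, search_types, size=2, cookies=None):
    """Build a one-client init_music_clients_cfg for the given search types."""
    cfg = {"search_size_per_source": size, "allowed_search_types": list(search_types)}
    if cookies is not None:
        # Only needed for paid audio; free content works without cookies.
        cfg["default_search_cookies"] = cookies
        cfg["default_download_cookies"] = cookies
    return {client_name: cfg}


track_only = search_cfg("XimalayaMusicClient", ["track"])
album_only = search_cfg("XimalayaMusicClient", ["album"])
```

Either dict can then be passed as `init_music_clients_cfg` when creating `musicdl.MusicClient(music_sources=['XimalayaMusicClient'], ...)`.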
|
||||
|
||||
#### LanRenTingShu Book/Album Download
|
||||
|
||||
Musicdl currently supports searching and downloading books (书籍) and albums (节目) from LanRenTingShu. Example usage:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
# only search by book (each assignment below replaces the previous one, so keep only the one you need)
|
||||
init_music_clients_cfg = {'LRTSMusicClient': {'search_size_per_source': 2, 'allowed_search_types': ['book']}}
|
||||
# only search by album
|
||||
init_music_clients_cfg = {'LRTSMusicClient': {'search_size_per_source': 2, 'allowed_search_types': ['album']}}
|
||||
# search by album and book
|
||||
init_music_clients_cfg = {'LRTSMusicClient': {'search_size_per_source': 2, 'allowed_search_types': ['album', 'book']}}
|
||||
# instance music_client
|
||||
music_client = musicdl.MusicClient(music_sources=['LRTSMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
# start
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
By default, this example only downloads free albums and tracks. To access paid content, you must configure your user cookies in `init_music_clients_cfg`.
|
||||
|
||||
#### QingtingFM Track/Album Download
|
||||
|
||||
The usage for searching and downloading on the QingTing FM website is similar to Ximalaya and Lizhi FM.
|
||||
The only thing to watch out for is how cookies are set, which differs from typical music client objects.
|
||||
|
||||
Specifically, without logging in (*i.e.*, when you don’t need to download paid audio), you can invoke it by running `musicdl -m QingtingMusicClient` in the command line, or by calling it via the following code:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
# only search by track (each assignment below replaces the previous one, so keep only the one you need)
|
||||
init_music_clients_cfg = {'QingtingMusicClient': {'search_size_per_source': 2, 'allowed_search_types': ['track']}}
|
||||
# only search by album
|
||||
init_music_clients_cfg = {'QingtingMusicClient': {'search_size_per_source': 2, 'allowed_search_types': ['album']}}
|
||||
# search by album and track
|
||||
init_music_clients_cfg = {'QingtingMusicClient': {'search_size_per_source': 2, 'allowed_search_types': ['album', 'track']}}
|
||||
# instance music_client
|
||||
music_client = musicdl.MusicClient(music_sources=['QingtingMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
# start
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
When you need to download paid audio, you’ll have to capture the network traffic yourself on the [QingTing FM web client](https://www.qtfm.cn/).
|
||||
Look for an AJAX request with the keyword `auth`; its response data will look like:
|
||||
|
||||
```python
|
||||
{
|
||||
"errorno": 0,
|
||||
"errormsg": "",
|
||||
"data": {
|
||||
"qingting_id": "xxxx",
|
||||
"access_token": "xxx",
|
||||
"refresh_token": "xxx",
|
||||
"expires_in": 7200
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Or, use the script [build_cookies_for_qingtingfm.py](https://github.com/CharlesPikachu/musicdl/tree/master/scripts/build_cookies_for_qingtingfm) in this repository to retrieve it.
|
||||
|
||||
Once you’ve obtained this data, you can configure cookies for `QingtingMusicClient` as follows:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
cookies = {"qingting_id": "xxxx", "access_token": "xxx", "refresh_token": "xxx"}
|
||||
init_music_clients_cfg = {'QingtingMusicClient': {'default_search_cookies': cookies, 'default_download_cookies': cookies, 'search_size_per_source': 3}}
|
||||
music_client = musicdl.MusicClient(music_sources=['QingtingMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
Of course, it’s worth noting that another prerequisite for downloading paid audio is that your account must already have permission to access (listen to) that audio.
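
If you saved the body of the `auth` response, the three cookie fields can be pulled out programmatically. This is a minimal sketch, assuming the response has exactly the shape shown above; `qingting_cookies_from_auth` is a hypothetical helper name:

```python
import json


def qingting_cookies_from_auth(response_text):
    """Extract the three cookie fields from the JSON body of the `auth` response."""
    payload = json.loads(response_text)
    if payload.get("errorno") != 0:
        raise ValueError(f"auth request failed: {payload.get('errormsg')!r}")
    data = payload["data"]
    # expires_in is not part of the cookies shown in the example above, so it is dropped.
    return {key: data[key] for key in ("qingting_id", "access_token", "refresh_token")}


sample = '{"errorno": 0, "errormsg": "", "data": {"qingting_id": "xxxx", "access_token": "xxx", "refresh_token": "xxx", "expires_in": 7200}}'
cookies = qingting_cookies_from_auth(sample)
```

The returned dict has the same shape as the `cookies` variable in the configuration example above.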
|
||||
|
||||
#### Apple Music Download
|
||||
|
||||
Before using `AppleMusicClient`, please ensure that the following command-line tools are installed and available in your environment:
|
||||
|
||||
- [FFmpeg](https://www.ffmpeg.org/)
|
||||
- [N_m3u8DL-RE](https://github.com/nilaoda/N_m3u8DL-RE)
|
||||
- [Bento4](https://www.bento4.com/downloads/)
|
||||
- [amdecrypt](https://github.com/CharlesPikachu/musicdl/releases/tag/clitools)
|
||||
|
||||
As with TIDAL, only users with a paid Apple Music subscription can download full Apple Music tracks; otherwise, you can only download a preview clip of roughly 30-90 seconds.
|
||||
For paid Apple Music users, musicdl supports downloading music files in the following formats:
|
||||
|
||||
- `aac-legacy`
|
||||
- `aac-he-legacy`
|
||||
- `aac`
|
||||
- `aac-he`
|
||||
- `aac-binaural`
|
||||
- `aac-downmix`
|
||||
- `aac-he-binaural`
|
||||
- `aac-he-downmix`
|
||||
- `atmos`
|
||||
- `ac3`
|
||||
- `alac`
|
||||
|
||||
If you only need to download tracks in the `aac-legacy` and `aac-he-legacy` quality tiers, it is enough to have [FFmpeg](https://www.ffmpeg.org/) and [N_m3u8DL-RE](https://github.com/nilaoda/N_m3u8DL-RE) installed and available on your `PATH`.
|
||||
Then, set the `media-user-token` argument you obtained by capturing network traffic from the Apple Music website as follows:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
from musicdl.modules.sources.apple import SongCodec
|
||||
|
||||
cookies = {'media-user-token': 'xxx'}
|
||||
init_music_clients_cfg = {'AppleMusicClient': {'default_search_cookies': cookies, 'search_size_per_source': 10, 'language': 'en-US', 'codec': SongCodec.AAC_LEGACY}}
|
||||
music_client = musicdl.MusicClient(music_sources=['AppleMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
However, if you need to download higher-quality audio (*e.g.*, `alac`), the setup is relatively more complex.
|
||||
First, follow the [wrapper](https://github.com/WorldObservationLog/wrapper) guide and start the wrapper server (❗ **note that Windows users need to download and install WSL first, followed by installing Ubuntu on WSL, and finally start the wrapper server within Ubuntu, otherwise, decryption will most likely fail** ❗).
|
||||
Then, in addition to [FFmpeg](https://www.ffmpeg.org/) and [N_m3u8DL-RE](https://github.com/nilaoda/N_m3u8DL-RE), you also need to install [Bento4](https://www.bento4.com/downloads/) and [amdecrypt](https://github.com/CharlesPikachu/musicdl/releases/tag/clitools).
|
||||
Finally, configure your musicdl as follows:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
from musicdl.modules.sources.apple import SongCodec
|
||||
|
||||
init_music_clients_cfg = {'AppleMusicClient': {
|
||||
'search_size_per_source': 10,
|
||||
'language': 'en-US',
|
||||
'codec': SongCodec.ALAC,
|
||||
'use_wrapper': True,
|
||||
'wrapper_account_url': 'http://127.0.0.1:30020/',
|
||||
'wrapper_decrypt_ip': '127.0.0.1:10020',
|
||||
}}
|
||||
music_client = musicdl.MusicClient(music_sources=['AppleMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
Note that the `wrapper_account_url` and `wrapper_decrypt_ip` settings must match the corresponding arguments configured in your [wrapper server](https://github.com/WorldObservationLog/wrapper).
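
Before starting a download, it can be useful to confirm that the required tools are actually reachable. The sketch below uses only the standard library. The executable names are assumptions (in particular, `mp4decrypt` is one of the tools shipped with Bento4, and the `amdecrypt` binary name is guessed from the release link), so adjust them to match your installation:

```python
import shutil

# Assumed executable names; they may differ on your system (e.g. a .exe suffix on Windows).
BASIC_TOOLS = ["ffmpeg", "N_m3u8DL-RE"]
WRAPPER_TOOLS = BASIC_TOOLS + ["mp4decrypt", "amdecrypt"]


def missing_tools(tools):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# An empty list means every tool needed for the wrapper-based setup was found.
print("missing:", missing_tools(WRAPPER_TOOLS))
```

Checking `BASIC_TOOLS` alone is sufficient if you only use the `aac-legacy` and `aac-he-legacy` codecs.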
|
||||
|
||||
#### Deezer Music Download
|
||||
|
||||
Musicdl now supports searching for and downloading music from Deezer, as well as parsing playlists. There are three possible scenarios.
|
||||
|
||||
The first is using musicdl directly for music search, download, or playlist parsing without configuring login cookies.
|
||||
In this case, you will most likely only be able to download song preview clips, usually around 30 seconds long. (musicdl occasionally bundles shared Deezer Premium accounts, so you may sometimes be able to download lossless music even without configuring your own Deezer Premium cookies.)
|
||||
A simple usage example is as follows:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
music_client = musicdl.MusicClient(music_sources=['DeezerMusicClient'])
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
The second is configuring login cookies, but the logged-in account is not a Deezer Premium subscriber.
|
||||
In this case, you will only be able to download songs at 128 kbps.
|
||||
A simple example of how to use it is shown below:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
# cookies must contain "arl"
|
||||
# >>> example1: cookies = {'arl': xxx, ...}
|
||||
# >>> example2: cookies = 'arl=xxx; key1=value1; key2=value2; ...'
|
||||
cookies = YOUR_COOKIES_IN_DICT_OR_STR_FORMAT
|
||||
init_music_clients_cfg = {'DeezerMusicClient': {'default_search_cookies': cookies, 'search_size_per_source': 5}}
|
||||
music_client = musicdl.MusicClient(music_sources=['DeezerMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
The third is configuring login cookies, with the logged-in account being a Deezer Premium subscriber.
|
||||
In this case, you can download music in Deezer’s highest-quality FLAC lossless format.
|
||||
The invocation code is entirely identical to that used in the second scenario.
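
Since the cookies may be supplied either as a dict or as a `key=value; ...` string, a quick check that `arl` is present can catch configuration mistakes before a download is attempted. This is a sketch with hypothetical helper names; it does not reproduce how musicdl itself parses cookies:

```python
def normalize_cookies(cookies):
    """Accept a dict or a 'k1=v1; k2=v2' string and return a dict."""
    if isinstance(cookies, dict):
        return dict(cookies)
    # Split on ';' and then on the first '=' only, so values may contain '='.
    pairs = (item.split("=", 1) for item in cookies.split(";") if "=" in item)
    return {name.strip(): value.strip() for name, value in pairs}


def require_arl(cookies):
    """Return normalized cookies, raising if the Deezer 'arl' entry is absent or empty."""
    parsed = normalize_cookies(cookies)
    if not parsed.get("arl"):
        raise ValueError("Deezer cookies must contain a non-empty 'arl' entry")
    return parsed


cookies = require_arl("arl=xxx; key1=value1; key2=value2")
```

The checked value can then be used as `default_search_cookies` exactly as in the second scenario.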
|
||||
|
||||
#### Qobuz Music Download
|
||||
|
||||
Qobuz is the world leader in 24-bit Hi-Res downloads, offering more than 100 million tracks for streaming in unequalled sound quality (FLAC 16 Bits / 44.1kHz).
|
||||
|
||||
To reliably download full tracks from Qobuz with musicdl, you need a paid Qobuz membership account.
|
||||
Otherwise, musicdl will automatically call some third-party APIs that use shared Qobuz member accounts to try to resolve the song you need.
|
||||
Since the long-term reliability and stability of these third-party APIs cannot be guaranteed, if they become unavailable and you do not have a valid paid Qobuz membership account yourself, you will only be able to access roughly 30-second preview clips.
|
||||
|
||||
If you have a valid paid Qobuz membership account, first obtain the member cookies yourself by capturing network traffic on [Qobuz’s official website](https://play.qobuz.com/discover).
|
||||
The cookies format should be as follows:
|
||||
|
||||
```python
|
||||
{"x-user-auth-token": "xxx", ...} OR "x-user-auth-token=xxx;..."
|
||||
```
|
||||
|
||||
Of course, you can also directly use the script [build_cookies_for_qobuz.py](https://github.com/CharlesPikachu/musicdl/blob/master/scripts/build_cookies_for_qobuz.py) provided in musicdl to build the member cookies required by musicdl.
|
||||
|
||||
A simple example of the download code is as follows:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
cookies = {'x-user-auth-token': 'xxx'}
|
||||
init_music_clients_cfg = {'QobuzMusicClient': {'default_search_cookies': cookies, 'search_size_per_source': 5}}
|
||||
music_client = musicdl.MusicClient(music_sources=['QobuzMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
Notably, for non-member users, setting cookies can only improve the audio quality, but the downloadable content is still limited to a 30-second preview clip.
|
||||
|
||||
#### SoundCloud Music Download
|
||||
|
||||
Musicdl lets you search for and download your favorite songs from SoundCloud. Specifically, you only need to run the following command:
|
||||
|
||||
```bash
|
||||
musicdl -m SoundCloudMusicClient
|
||||
```
|
||||
|
||||
Or you can invoke it with the following code:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
music_client = musicdl.MusicClient(music_sources=['SoundCloudMusicClient'])
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
The only thing to note is that `SoundCloudMusicClient` handles login cookies for downloading subscriber-only tracks slightly differently from the other music clients.
|
||||
You need to capture packets (*i.e.*, sniff the network requests) from [SoundCloud’s official website](https://soundcloud.com/) yourself to obtain the *Authorization* field in the request headers, then fill it in as follows:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
cookies = {'oauth_token': 'OAuth x-xxxxxx-xxxxxxxxx-xxxxxxx'}
|
||||
init_music_clients_cfg = {'SoundCloudMusicClient': {'default_search_cookies': cookies, 'default_download_cookies': cookies, 'search_size_per_source': 5}}
|
||||
music_client = musicdl.MusicClient(music_sources=['SoundCloudMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
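
A small guard can help avoid pasting the wrong value. The sketch below assumes, based on the example above, that the stored value keeps the `OAuth ` prefix from the *Authorization* header; `soundcloud_cookies` is a hypothetical helper name:

```python
def soundcloud_cookies(authorization_header):
    """Wrap the captured Authorization header value in the dict shape shown above."""
    value = authorization_header.strip()
    # Assumption: the value is stored with its 'OAuth ' prefix, as in the example.
    if not value.startswith("OAuth ") or len(value) <= len("OAuth "):
        raise ValueError("expected a value of the form 'OAuth <token>'")
    return {"oauth_token": value}


cookies = soundcloud_cookies("  OAuth x-xxxxxx-xxxxxxxxx-xxxxxxx ")
```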
|
||||
|
||||
#### Spotify Music Download
|
||||
|
||||
|
||||
|
||||
#### TIDAL High-Quality Music Download
|
||||
|
||||
Before using `TIDALMusicClient`, verify that the following dependencies (the PyAV Python package and two command-line tools) are correctly installed and available in your environment:
|
||||
|
||||
- [PyAV](https://github.com/PyAV-Org/PyAV)
|
||||
- [FFmpeg](https://www.ffmpeg.org/)
|
||||
- [N_m3u8DL-RE](https://github.com/nilaoda/N_m3u8DL-RE)
|
||||
|
||||
If you plan to use musicdl to download high-quality lossless audio from TIDAL, you must have an active TIDAL subscription.
|
||||
Otherwise, musicdl may fall back to third-party sources, and stable access to the highest-quality lossless files cannot be guaranteed.
|
||||
|
||||
After you have a TIDAL membership account, you need to manually capture the cookies of your account from the TIDAL website using network packet capturing.
|
||||
The format is as follows:
|
||||
|
||||
```python
|
||||
{
|
||||
"access_token": "xxx",
|
||||
"refresh_token": "xxx",
|
||||
"expires": "2026-02-10T07:32:18.102233",
|
||||
"user_id": xxx,
|
||||
"country_code": "SG",
|
||||
"client_id": "7m7Ap0JC9j1cOM3n",
|
||||
"client_secret": "vRAdA108tlvkJpTsGZS8rGZ7xTlbJ0qaZ2K9saEzsgY="
|
||||
}
|
||||
```
|
||||
|
||||
Of course, musicdl also provides a script [build_cookies_for_tidal.py](https://github.com/CharlesPikachu/musicdl/blob/master/scripts/build_cookies_for_tidal.py) to automatically obtain your TIDAL membership cookies.
|
||||
You can simply run the script and follow the prompts to retrieve the cookies mentioned above.
|
||||
|
||||
Once you have successfully obtained your membership cookies, you can use musicdl to download lossless music from TIDAL using the following method:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
cookies = "YOUR_VIP_COOKIES"
|
||||
init_music_clients_cfg = {'TIDALMusicClient': {'default_search_cookies': cookies, 'search_size_per_source': 5}}
|
||||
music_client = musicdl.MusicClient(music_sources=['TIDALMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
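
Because the cookies carry an `expires` timestamp, it may be worth checking it before a long session. This is a sketch only: `tidal_cookies_expired` is a hypothetical helper, and it assumes that `expires` is an ISO 8601 string in the same timezone as the local clock when no offset is given. Whether musicdl refreshes tokens on its own is not covered here.

```python
from datetime import datetime


def tidal_cookies_expired(cookies, now=None):
    """Return True if the 'expires' timestamp in the cookies dict has passed."""
    expires = datetime.fromisoformat(cookies["expires"])
    # Compare like with like: a naive timestamp is compared against naive local time.
    now = now or datetime.now(expires.tzinfo)
    return now >= expires


example = {"expires": "2026-02-10T07:32:18.102233"}
still_valid_in_january = not tidal_cookies_expired(example, now=datetime(2026, 1, 1))
```

If the cookies have expired, re-running `build_cookies_for_tidal.py` is the documented way to obtain fresh ones.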
|
||||
|
||||
#### YouTube Music Download
|
||||
|
||||
If you want to use musicdl to search for and download music from `YouTubeMusicClient`, you must have [Node.js](https://nodejs.org/en) installed. For example, on Linux, you can install Node.js using the following script:
|
||||
|
||||
```bash
|
||||
#!/usr/bin/env bash
|
||||
set -e
|
||||
|
||||
# Install nvm (Node Version Manager)
|
||||
curl -fsSL https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.0/install.sh | bash
|
||||
|
||||
# Load nvm for this script
|
||||
export NVM_DIR="$HOME/.nvm"
|
||||
[ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh"
|
||||
|
||||
# Install and use latest LTS Node.js
|
||||
nvm install --lts
|
||||
nvm use --lts
|
||||
|
||||
# Print versions
|
||||
node -v
|
||||
npm -v
|
||||
```
|
||||
|
||||
On macOS, you can install Node.js using the following script:
|
||||
|
||||
```bash
|
||||
#!/usr/bin/env bash
|
||||
set -e
|
||||
|
||||
# Install nvm (Node Version Manager)
|
||||
curl -fsSL https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.0/install.sh | bash
|
||||
|
||||
# Load nvm for this script
|
||||
export NVM_DIR="$HOME/.nvm"
|
||||
[ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh"
|
||||
|
||||
# Install and use latest LTS Node.js
|
||||
nvm install --lts
|
||||
nvm use --lts
|
||||
|
||||
# Print versions
|
||||
node -v
|
||||
npm -v
|
||||
```
|
||||
|
||||
On Windows (PowerShell), you can install Node.js using the following script:
|
||||
|
||||
```powershell
|
||||
# Install Node.js LTS via winget
|
||||
winget install --id OpenJS.NodeJS.LTS -e --source winget
|
||||
|
||||
# Print hint for version check
|
||||
Write-Output ""
|
||||
Write-Output "Please reopen PowerShell and run:"
|
||||
Write-Output " node -v"
|
||||
Write-Output " npm -v"
|
||||
```
|
||||
|
||||
A simple example of searching for and downloading music from `YouTubeMusicClient` is as follows:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
music_client = musicdl.MusicClient(music_sources=['YouTubeMusicClient'])
|
||||
music_client.startcmdui()
|
||||
```
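
To confirm from Python that Node.js is visible to the process that will run musicdl, a check along these lines may help. It is a sketch using only the standard library, and `node_version` is a hypothetical helper name:

```python
import shutil
import subprocess


def node_version():
    """Return the output of `node -v`, or None if Node.js is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "-v"], capture_output=True, text=True, check=False)
    return result.stdout.strip() or None


version = node_version()
print("Node.js:", version if version else "not found on PATH")
```

On Windows, remember that a freshly installed Node.js only becomes visible after the shell is reopened, as the install script above points out.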
|
||||
|
||||
#### GD Studio Music Download
|
||||
|
||||
We’ve added `GDStudioMusicClient` to musicdl as a practical solution for users who are on a tight budget or who find it difficult to configure extra command-line tools/arguments for musicdl.
|
||||
With only the basic installation of musicdl, you can search for and download high-quality music files from the following music platforms:
|
||||
|
||||
| Source (EN) | Source (CN) | Official Websites | `allowed_music_sources` |
|
||||
| ----------------- | ------------------- | ----------------------------------- | ------------------- |
|
||||
| Spotify | Spotify | https://www.spotify.com | `spotify` |
|
||||
| Tencent (QQ Music) | QQ音乐 | https://y.qq.com | `tencent` |
|
||||
| NetEase Cloud Music | 网易云音乐 | https://music.163.com | `netease` |
|
||||
| Kuwo | 酷我音乐 | https://www.kuwo.cn | `kuwo` |
|
||||
| TIDAL | TIDAL | https://tidal.com | `tidal` |
|
||||
| Qobuz | Qobuz | https://www.qobuz.com | `qobuz` |
|
||||
| JOOX | JOOX | https://www.joox.com | `joox` |
|
||||
| Bilibili | 哔哩哔哩 | https://www.bilibili.com | `bilibili` |
|
||||
| Apple Music | 苹果音乐 | https://www.apple.com/apple-music/ | `apple` |
|
||||
| YouTube Music | 油管音乐 | https://music.youtube.com | `ytmusic` |
|
||||
|
||||
Specifically, you just need to write and run a few lines of code like this
|
||||
(song retrieval from YouTube and Tencent is unstable, so musicdl disables these two sources by default.
|
||||
You can manually enable them by setting `allowed_music_sources`.):
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
music_client = musicdl.MusicClient(music_sources=['GDStudioMusicClient'])
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
Or, equivalently, run the following command in the command line:
|
||||
|
||||
```bash
|
||||
musicdl -m GDStudioMusicClient
|
||||
```
|
||||
|
||||
By default, the above code will search for and download music from eight music platforms, excluding YouTube and Tencent Music (as using `GDStudioMusicClient` for search and download on both platforms seems to be unstable).
|
||||
The screenshot of the running result is as follows:
|
||||
|
||||
<div align="center">
|
||||
<div>
|
||||
<img src="https://github.com/CharlesPikachu/musicdl/raw/master/docs/screenshot/gdstudioscreenshot.png" width="600"/>
|
||||
</div>
|
||||
</div>
|
||||
<br />
|
||||
|
||||
However, please note that this way of running is not very stable (*e.g.*, some sources may fail to find any valid songs) and is likely to exceed the limit on the number of requests per minute allowed for a single IP by `GDStudioMusicClient`.
|
||||
If you still wish to perform a full-platform search, we recommend modifying the default arguments as follows:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
init_music_clients_cfg = {'GDStudioMusicClient': {'search_size_per_source': 1}}
|
||||
music_client = musicdl.MusicClient(music_sources=['GDStudioMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
The equivalent command in the command line is:
|
||||
|
||||
```bash
|
||||
musicdl -m GDStudioMusicClient -i "{'GDStudioMusicClient': {'search_size_per_source': 1}}"
|
||||
```
|
||||
|
||||
Or, an even better option is to manually specify a few platforms where you believe your desired music files are likely to be found, for example:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
# allowed_music_sources can be set to any subset (i.e., any combination) of ['spotify', 'tencent', 'netease', 'kuwo', 'tidal', 'qobuz', 'joox', 'bilibili', 'apple', 'ytmusic']
|
||||
init_music_clients_cfg = {'GDStudioMusicClient': {'search_size_per_source': 5, 'allowed_music_sources': ['spotify', 'qobuz', 'tidal', 'apple']}}
|
||||
music_client = musicdl.MusicClient(music_sources=['GDStudioMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
The way to run it from the command line is similar:
|
||||
|
||||
```bash
|
||||
musicdl -m GDStudioMusicClient -i "{'GDStudioMusicClient': {'search_size_per_source': 5, 'allowed_music_sources': ['spotify', 'qobuz', 'tidal', 'apple']}}"
|
||||
```
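
The `-i` value in the commands above looks like a Python dict literal, so it can be generated from the same dict used in the Python examples instead of being typed by hand. This is a sketch under that assumption; how musicdl parses the `-i` argument internally is not described here:

```python
import ast
import shlex

cfg = {
    "GDStudioMusicClient": {
        "search_size_per_source": 5,
        "allowed_music_sources": ["spotify", "qobuz", "tidal", "apple"],
    }
}

# repr() of a dict of plain strings, ints, and lists is itself a Python literal.
command = ["musicdl", "-m", "GDStudioMusicClient", "-i", repr(cfg)]

# Round-trip check: the generated string parses back to the same dict.
assert ast.literal_eval(command[-1]) == cfg

# Shell-quoted form for copy and paste in a POSIX shell.
print(shlex.join(command))
```

Passing `command` as a list to `subprocess.run` avoids shell quoting issues altogether.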
|
||||
|
||||
#### JBSou Music Download
|
||||
|
||||
`JBSouMusicClient`’s functionality is similar to `TuneHubMusicClient`’s.
|
||||
Both are third-party APIs that consolidate music search and download functions from multiple platforms into a single interface.
|
||||
The key difference is that `JBSouMusicClient` focuses on searching and downloading 320 kbps MP3 audio files.
|
||||
The list of music platforms it currently supports is as follows:
|
||||
|
||||
| Source (EN) | Source (CN) | Official Websites | `allowed_music_sources` |
|
||||
| ----------------- | ------------------- | ----------------------------------- | ------------------- |
|
||||
| Tencent (QQ Music) | QQ音乐 | https://y.qq.com | `qq` |
|
||||
| NetEase Cloud Music | 网易云音乐 | https://music.163.com | `netease` |
|
||||
| Kuwo | 酷我音乐 | https://www.kuwo.cn | `kuwo` |
|
||||
| Kugou | 酷狗音乐 | https://www.kugou.com/ | `kugou` |
|
||||
|
||||
It can be invoked as follows:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
init_music_clients_cfg = {'JBSouMusicClient': {'search_size_per_source': 5, 'allowed_music_sources': ['qq', 'netease', 'kuwo', 'kugou']}}
|
||||
music_client = musicdl.MusicClient(music_sources=['JBSouMusicClient'], init_music_clients_cfg=init_music_clients_cfg)
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
The screenshot of the running result is as follows:
|
||||
|
||||
<div align="center">
|
||||
<div>
|
||||
<img src="https://github.com/CharlesPikachu/musicdl/raw/master/docs/screenshot/jbsouscreenshot.png" width="600"/>
|
||||
</div>
|
||||
</div>
|
||||
<br />
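
Note that the source identifiers differ between clients: QQ Music is `qq` in the table above, whereas the `GDStudioMusicClient` table uses `tencent`. A small check can catch such mix-ups early. This is a sketch in which `jbsou_cfg` is a hypothetical helper and the allowed set is copied from the table in this section:

```python
JBSOU_SOURCES = {"qq", "netease", "kuwo", "kugou"}  # from the table above


def jbsou_cfg(sources, size=5):
    """Build the JBSouMusicClient config, rejecting source names not in the table."""
    unknown = sorted(set(sources) - JBSOU_SOURCES)
    if unknown:
        raise ValueError(f"unsupported sources for JBSouMusicClient: {unknown}")
    return {
        "JBSouMusicClient": {
            "search_size_per_source": size,
            "allowed_music_sources": list(sources),
        }
    }


init_music_clients_cfg = jbsou_cfg(["qq", "kugou"])
```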
|
||||
|
||||
#### TuneHub Music Download
|
||||
|
||||
`TuneHubMusicClient` is actually quite similar to `GDStudioMusicClient`, as it allows music search and download from multiple music platforms.
|
||||
However, it primarily supports music platforms in Mainland China and offers fewer music sources compared to `GDStudioMusicClient`.
|
||||
Specifically, the list of platforms it currently supports is as follows:
|
||||
|
||||
| Source (EN) | Source (CN) | Official Websites | `allowed_music_sources` |
|
||||
| ----------------- | ------------------- | ----------------------------------- | ------------------- |
|
||||
| Tencent (QQ Music) | QQ音乐 | https://y.qq.com | `qq` |
|
||||
| NetEase Cloud Music | 网易云音乐 | https://music.163.com | `netease` |
|
||||
| Kuwo | 酷我音乐 | https://www.kuwo.cn | `kuwo` |
|
||||
|
||||
Specifically, you can call it using the following code:
|
||||
|
||||
```python
|
||||
from musicdl import musicdl
|
||||
|
||||
music_client = musicdl.MusicClient(music_sources=['TuneHubMusicClient'])
|
||||
music_client.startcmdui()
|
||||
```
|
||||
|
||||
Alternatively, you can directly run the following command in the terminal:
|
||||
|
||||
```bash
|
||||
musicdl -m TuneHubMusicClient
|
||||
```
|
||||
|
||||
The screenshot of the running result is as follows:
|
||||
|
||||
<div align="center">
|
||||
<div>
|
||||
<img src="https://github.com/CharlesPikachu/musicdl/raw/master/docs/screenshot/tunehubscreenshot.png" width="600"/>
|
||||
</div>
|
||||
</div>
|
||||
<br />
|
||||
@@ -0,0 +1,11 @@
|
||||
# Recommended Projects
|
||||
|
||||
| Project | ⭐ Stars | 📦 Version | ⏱ Last Update | 🛠 Repository |
|
||||
| ------------- | --------- | ----------- | ---------------- | -------- |
|
||||
| 🎵 **Musicdl**<br/>轻量级无损音乐下载器 | [](https://github.com/CharlesPikachu/musicdl) | [](https://pypi.org/project/musicdl) | [](https://github.com/CharlesPikachu/musicdl/commits/master) | [🛠 Repository](https://github.com/CharlesPikachu/musicdl) |
|
||||
| 🎬 **Videodl**<br/>轻量级高清无水印视频下载器 | [](https://github.com/CharlesPikachu/videodl) | [](https://pypi.org/project/videofetch) | [](https://github.com/CharlesPikachu/videodl/commits/master) | [🛠 Repository](https://github.com/CharlesPikachu/videodl) |
|
||||
| 🖼️ **Imagedl**<br/>轻量级海量图片搜索下载器 | [](https://github.com/CharlesPikachu/imagedl) | [](https://pypi.org/project/pyimagedl) | [](https://github.com/CharlesPikachu/imagedl/commits/main) | [🛠 Repository](https://github.com/CharlesPikachu/imagedl) |
|
||||
| 🌐 **FreeProxy**<br/>全球海量高质量免费代理采集器 | [](https://github.com/CharlesPikachu/freeproxy) | [](https://pypi.org/project/pyfreeproxy) | [](https://github.com/CharlesPikachu/freeproxy/commits/master) | [🛠 Repository](https://github.com/CharlesPikachu/freeproxy) |
|
||||
| 🌐 **MusicSquare**<br/>简易音乐搜索下载和播放网页 | [](https://github.com/CharlesPikachu/musicsquare) | [](https://pypi.org/project/musicdl) | [](https://github.com/CharlesPikachu/musicsquare/commits/main) | [🛠 Repository](https://github.com/CharlesPikachu/musicsquare) |
|
||||
| 🌐 **FreeGPTHub**<br/>真正免费的GPT统一接口 | [](https://github.com/CharlesPikachu/FreeGPTHub) | [](https://pypi.org/project/freegpthub) | [](https://github.com/CharlesPikachu/FreeGPTHub/commits/main) | [🛠 Repository](https://github.com/CharlesPikachu/FreeGPTHub) |
|
||||
|
||||
@@ -0,0 +1,72 @@
|
||||
# Configuration file for the Sphinx documentation builder.
|
||||
#
|
||||
# This file only contains a selection of the most common options. For a full
|
||||
# list see the documentation:
|
||||
# https://www.sphinx-doc.org/en/master/usage/configuration.html
|
||||
|
||||
# -- Path setup --------------------------------------------------------------
|
||||
|
||||
# If extensions (or modules to document with autodoc) are in another directory,
|
||||
# add these directories to sys.path here. If the directory is relative to the
|
||||
# documentation root, use os.path.abspath to make it absolute, like shown here.
|
||||
#
|
||||
# import os
|
||||
# import sys
|
||||
# sys.path.insert(0, os.path.abspath('.'))
|
||||
|
||||
|
||||
# -- Project information -----------------------------------------------------
|
||||
|
||||
project = 'musicdl'
|
||||
copyright = '2018-2030, Zhenchao Jin'
|
||||
author = 'Zhenchao Jin'
|
||||
release = '2.10.2'
|
||||
|
||||
# -- General configuration ---------------------------------------------------
|
||||
|
||||
# Add any Sphinx extension module names here, as strings. They can be
|
||||
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
|
||||
# ones.
|
||||
extensions = [
|
||||
'sphinx.ext.autodoc',
|
||||
'sphinx.ext.napoleon',
|
||||
'sphinx.ext.viewcode',
|
||||
'recommonmark',
|
||||
'sphinx_markdown_tables',
|
||||
]
|
||||
|
||||
# Add any paths that contain templates here, relative to this directory.
|
||||
templates_path = ['_templates']
|
||||
|
||||
# The suffix(es) of source filenames.
|
||||
# You can specify multiple suffix as a list of string:
|
||||
#
|
||||
source_suffix = {
|
||||
'.rst': 'restructuredtext',
|
||||
'.md': 'markdown',
|
||||
}
|
||||
|
||||
# The master toctree document.
|
||||
master_doc = 'index'
|
||||
|
||||
# List of patterns, relative to source directory, that match files and
|
||||
# directories to ignore when looking for source files.
|
||||
# This pattern also affects html_static_path and html_extra_path.
|
||||
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
|
||||
|
||||
|
||||
# -- Options for HTML output -------------------------------------------------
|
||||
|
||||
# The theme to use for HTML and HTML Help pages. See the documentation for
|
||||
# a list of builtin themes.
|
||||
#
|
||||
html_theme = 'sphinx_rtd_theme'
|
||||
|
||||
# Add any paths that contain custom static files (such as style sheets) here,
|
||||
# relative to this directory. They are copied after the builtin static files,
|
||||
# so a file named "default.css" will overwrite the builtin "default.css".
|
||||
# html_static_path = ['_static']
|
||||
|
||||
# For multi language
|
||||
# locale_dirs = ['locale/']
|
||||
# gettext_compact = False
|
||||
@@ -0,0 +1,20 @@
|
||||
.. Musicdl documentation master file, created by
|
||||
sphinx-quickstart on Sat Feb 29 22:07:23 2020.
|
||||
You can adapt this file completely to your liking, but it should at least
|
||||
contain the root `toctree` directive.
|
||||
|
||||
Welcome to Musicdl's documentation!
|
||||
========================================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
Disclaimer.md
|
||||
Install.md
|
||||
catalogsync.md
|
||||
Quickstart.md
|
||||
API.md
|
||||
Playground.md
|
||||
Changelog.md
|
||||
Recommend.md
|
||||
Author.md
|
||||
|
@@ -0,0 +1,35 @@
|
||||
@ECHO OFF
|
||||
|
||||
pushd %~dp0
|
||||
|
||||
REM Command file for Sphinx documentation
|
||||
|
||||
if "%SPHINXBUILD%" == "" (
|
||||
set SPHINXBUILD=sphinx-build
|
||||
)
|
||||
set SOURCEDIR=source
|
||||
set BUILDDIR=build
|
||||
|
||||
if "%1" == "" goto help
|
||||
|
||||
%SPHINXBUILD% >NUL 2>NUL
|
||||
if errorlevel 9009 (
|
||||
echo.
|
||||
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
|
||||
echo.installed, then set the SPHINXBUILD environment variable to point
|
||||
echo.to the full path of the 'sphinx-build' executable. Alternatively you
|
||||
echo.may add the Sphinx directory to PATH.
|
||||
echo.
|
||||
echo.If you don't have Sphinx installed, grab it from
|
||||
echo.http://sphinx-doc.org/
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
|
||||
goto end
|
||||
|
||||
:help
|
||||
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
|
||||
|
||||
:end
|
||||
popd
|
||||
|
@@ -0,0 +1,6 @@
|
||||
recommonmark
|
||||
sphinx==5.0.0
|
||||
sphinx_markdown_tables==0.0.12
|
||||
sphinx_rtd_theme
|
||||
markdown==3.0.1
|
||||
setuptools<82
|
||||
|
@@ -0,0 +1,70 @@
|
||||
# Catalog Sync Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Add an independent catalog-sync CLI that harvests playlist pools, persists playlist/song/artist relationships, and automates deduplicated downloads.
|
||||
|
||||
**Architecture:** Implement a new `musicdl.catalogsync` package with a SQLite repository layer, per-source collectors, playlist-sync services, and a CLI wrapper that reuses existing `musicdl` source clients for parsing and downloading. Model downloaded media as logical file assets plus storage locations so local paths work now and cloud/object-storage locations fit later.
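
To make the asset/location split concrete, here is a minimal sketch using `sqlite3`. Only the `file_locations` table name and its `locator` column appear elsewhere in these plans (a later downloader test queries them); the `file_assets` table, the remaining columns, and the `storage_kind` values are hypothetical illustrations, not the actual schema.

```python
import sqlite3

# Hypothetical schema: one logical asset row per downloaded file,
# plus one row per place where a copy of that file lives.
SCHEMA = """
CREATE TABLE file_assets (
    id INTEGER PRIMARY KEY,
    sha256 TEXT NOT NULL UNIQUE,
    size_bytes INTEGER NOT NULL
);
CREATE TABLE file_locations (
    id INTEGER PRIMARY KEY,
    asset_id INTEGER NOT NULL REFERENCES file_assets(id),
    storage_kind TEXT NOT NULL,   -- e.g. 'local' now, 'object_storage' later
    locator TEXT NOT NULL,        -- path relative to the storage root
    UNIQUE (storage_kind, locator)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO file_assets (id, sha256, size_bytes) VALUES (1, 'abc123', 80)"
)
conn.execute(
    "INSERT INTO file_locations (asset_id, storage_kind, locator) "
    "VALUES (1, 'local', 'qq/Singer A/song-c.mp3')"
)
# A later upload adds a second location for the same asset, not a new asset.
conn.execute(
    "INSERT INTO file_locations (asset_id, storage_kind, locator) "
    "VALUES (1, 'object_storage', 'qq/Singer A/song-c.mp3')"
)
locations = conn.execute(
    "SELECT storage_kind, locator FROM file_locations "
    "WHERE asset_id = 1 ORDER BY id"
).fetchall()
```

Keeping the locator relative means the same row stays valid if the library root moves, which is what lets a cloud location be added later without touching the asset record.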
|
||||
|
||||
**Tech Stack:** Python stdlib (`sqlite3`, `json`, `pathlib`, `hashlib`, `shutil`), `click`, existing `musicdl` source clients, `requests`, `unittest`
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Add failing tests for schema and normalization helpers
|
||||
|
||||
**Files:**
|
||||
- Create: `tests/catalogsync/test_db.py`
|
||||
- Create: `tests/catalogsync/test_models.py`
|
||||
- Create: `tests/catalogsync/fixtures/`
|
||||
|
||||
- [ ] Write failing tests for schema creation and song dedupe helpers.
|
||||
- [ ] Run the focused unittest commands and verify they fail for missing modules.
|
||||
- [ ] Implement the minimal schema and helper modules to satisfy the tests.
|
||||
- [ ] Re-run the focused tests and verify they pass.
|
||||
|
||||
### Task 2: Add failing tests for collector parsing helpers
|
||||
|
||||
**Files:**
|
||||
- Create: `tests/catalogsync/test_collectors.py`
|
||||
- Create: `musicdl/catalogsync/collectors/`
|
||||
|
||||
- [ ] Write fixture-driven failing tests for NetEase, QQ, and Kuwo collector parsing helpers.
|
||||
- [ ] Run the focused unittest commands and verify they fail.
|
||||
- [ ] Implement minimal collector modules and parsing helpers.
|
||||
- [ ] Re-run the focused tests and verify they pass.
|
||||
|
||||
### Task 3: Implement repository, sync services, and download planner
|
||||
|
||||
**Files:**
|
||||
- Create: `musicdl/catalogsync/db.py`
|
||||
- Create: `musicdl/catalogsync/repository.py`
|
||||
- Create: `musicdl/catalogsync/services.py`
|
||||
- Create: `musicdl/catalogsync/downloader.py`
|
||||
|
||||
- [ ] Write failing service tests for playlist sync, derived artist sync, and download dedupe.
|
||||
- [ ] Run the focused unittest commands and verify they fail.
|
||||
- [ ] Implement the repository and service layer.
|
||||
- [ ] Re-run the focused tests and verify they pass.
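
The plan does not spell out how download dedupe decides that two files are the same. One common option, given that `hashlib` is in the tech stack, is to compare content hashes. The sketch below is an assumption along those lines; `sha256_of_file` is a hypothetical helper name, not part of the planned modules.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large audio files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmpdir:
    first = Path(tmpdir) / "a.mp3"
    second = Path(tmpdir) / "b.mp3"
    first.write_bytes(b"fake-audio")
    second.write_bytes(b"fake-audio")
    # Identical bytes produce identical digests regardless of file name.
    same_content = sha256_of_file(first) == sha256_of_file(second)
```

A digest like this could serve as the unique key for a logical file asset, so a second download of the same bytes only adds a location.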
|
||||
|
||||
### Task 4: Implement CLI and package integration
|
||||
|
||||
**Files:**
|
||||
- Create: `musicdl/catalogsync/cli.py`
|
||||
- Modify: `setup.py`
|
||||
- Modify: `musicdl/__init__.py`
|
||||
|
||||
- [ ] Write failing CLI smoke tests around argument parsing and DB initialization.
|
||||
- [ ] Run the focused unittest commands and verify they fail.
|
||||
- [ ] Implement the CLI entrypoint and wire a new console script.
|
||||
- [ ] Re-run the focused tests and verify they pass.
|
||||
|
||||
### Task 5: Verify end-to-end behavior and document usage
|
||||
|
||||
**Files:**
|
||||
- Create: `docs/catalogsync.md`
|
||||
- Modify: `README.md`
|
||||
|
||||
- [ ] Run the full focused unittest suite for `tests/catalogsync`.
|
||||
- [ ] Run a manual CLI smoke flow against a temporary SQLite DB.
|
||||
- [ ] Update user-facing docs with command examples and caveats.
|
||||
- [ ] Re-run final verification commands and capture results.
|
||||
@@ -0,0 +1,485 @@
|
||||
# Download Layout And NAS Deployment Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Change `musicdl.catalogsync` downloads to land under `LIBRARY_DIR/<platform>/<first_artist>/...`, preserve relative locators for later upload reuse, and add portable NAS/Linux deployment scripts plus `.env`-driven runtime layout.
|
||||
|
||||
**Architecture:** Add a small runtime/layout helper module for path building, safe filename components, config defaults, and directory creation. Reuse the existing downloader and CLI, but route download destinations through the new path helper and add deploy/runtime scripts under `scripts/catalogsync` so target machines can be bootstrapped and then run from `catalogsync/bin` with `catalogsync.env`.
|
||||
|
||||
**Tech Stack:** Python stdlib (`pathlib`, `dataclasses`, `tempfile`, `re`), `click`, existing `musicdl.catalogsync` modules, PowerShell, POSIX shell, `unittest`
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Add runtime/layout helper tests and implementation
|
||||
|
||||
**Files:**
|
||||
- Create: `musicdl/catalogsync/runtime.py`
|
||||
- Create: `tests/catalogsync/test_runtime.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing runtime/layout tests**
|
||||
|
||||
```python
import tempfile
import unittest
from pathlib import Path


class RuntimeLayoutTests(unittest.TestCase):
    def test_runtime_config_builds_defaults_from_root_dir(self):
        from musicdl.catalogsync.runtime import CatalogSyncRuntimeConfig

        config = CatalogSyncRuntimeConfig.from_mapping(
            {
                "ROOT_DIR": "/volume4/Music_Cloud",
                "PYTHON_BIN": "python3",
            }
        )

        self.assertEqual(Path("/volume4/Music_Cloud/catalogsync"), config.app_home)
        self.assertEqual(Path("/volume4/Music_Cloud/library"), config.library_dir)
        self.assertEqual(Path("/volume4/Music_Cloud/catalogsync/data/catalogsync.db"), config.db_path)
        self.assertEqual("platform_first_artist", config.download_layout)

    def test_runtime_config_ensure_directories_creates_expected_tree(self):
        from musicdl.catalogsync.runtime import CatalogSyncRuntimeConfig

        with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
            root_dir = Path(tmpdir) / "Music_Cloud"
            config = CatalogSyncRuntimeConfig.from_mapping({"ROOT_DIR": str(root_dir)})

            config.ensure_directories()

            self.assertTrue((root_dir / "library").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "app").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "bin").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "config").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "data").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "inputs").is_dir())
            self.assertTrue((root_dir / "catalogsync" / "logs").is_dir())

    def test_build_download_relative_dir_uses_platform_and_first_artist(self):
        from musicdl.catalogsync.runtime import build_download_relative_dir

        relative_dir = build_download_relative_dir(
            platform="qq",
            singers="Singer A / Singer B",
        )

        self.assertEqual(Path("qq") / "Singer A", relative_dir)

    def test_build_download_relative_dir_falls_back_to_unknown_artist(self):
        from musicdl.catalogsync.runtime import build_download_relative_dir

        relative_dir = build_download_relative_dir(
            platform="netease",
            singers="",
        )

        self.assertEqual(Path("netease") / "Unknown Artist", relative_dir)
```
|
||||
|
||||
- [ ] **Step 2: Run the focused runtime/layout tests to verify they fail**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_runtime -v`
|
||||
Expected: FAIL with import error for `musicdl.catalogsync.runtime` or missing helper functions
|
||||
|
||||
- [ ] **Step 3: Implement the minimal runtime/layout helper module**
|
||||
|
||||
```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path


INVALID_PATH_CHARS_RE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_path_component(value: str, fallback: str) -> str:
    cleaned = INVALID_PATH_CHARS_RE.sub("_", (value or "").strip()).rstrip(". ")
    return cleaned or fallback


def pick_first_artist_name(singers: str | None) -> str:
    for candidate in re.split(r"\s*(?:/|,|&|\|)\s*", singers or ""):
        if candidate.strip():
            return sanitize_path_component(candidate, "Unknown Artist")
    return "Unknown Artist"


def build_download_relative_dir(platform: str, singers: str | None) -> Path:
    return Path(sanitize_path_component(platform, "unknown")) / pick_first_artist_name(singers)


@dataclass(slots=True)
class CatalogSyncRuntimeConfig:
    root_dir: Path
    app_home: Path
    library_dir: Path
    db_path: Path
    input_dir: Path
    log_dir: Path
    python_bin: str
    venv_dir: Path
    download_layout: str

    @classmethod
    def from_mapping(cls, mapping: dict[str, str]) -> "CatalogSyncRuntimeConfig":
        root_dir = Path(mapping["ROOT_DIR"]).resolve()
        app_home = Path(mapping.get("APP_HOME", root_dir / "catalogsync")).resolve()
        library_dir = Path(mapping.get("LIBRARY_DIR", root_dir / "library")).resolve()
        return cls(
            root_dir=root_dir,
            app_home=app_home,
            library_dir=library_dir,
            db_path=Path(mapping.get("DB_PATH", app_home / "data" / "catalogsync.db")).resolve(),
            input_dir=Path(mapping.get("INPUT_DIR", app_home / "inputs")).resolve(),
            log_dir=Path(mapping.get("LOG_DIR", app_home / "logs")).resolve(),
            python_bin=mapping.get("PYTHON_BIN", "python3"),
            venv_dir=Path(mapping.get("VENV_DIR", app_home / "app" / ".venv")).resolve(),
            download_layout=mapping.get("DOWNLOAD_LAYOUT", "platform_first_artist"),
        )

    def ensure_directories(self) -> None:
        for path in (
            self.root_dir,
            self.library_dir,
            self.app_home / "app",
            self.app_home / "bin",
            self.app_home / "config",
            self.app_home / "data",
            self.app_home / "inputs",
            self.app_home / "logs",
        ):
            path.mkdir(parents=True, exist_ok=True)
```
|
||||
|
||||
- [ ] **Step 4: Re-run the focused runtime/layout tests**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_runtime -v`
|
||||
Expected: PASS
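
As a quick sanity check on the helper semantics, the sketch below copies `sanitize_path_component` and `pick_first_artist_name` from Step 3 (using `Optional[str]` in place of `str | None` so it also runs on older interpreters) and applies them to a few inputs. The sample artist names are illustrative only.

```python
import re
from typing import Optional

INVALID_PATH_CHARS_RE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_path_component(value: str, fallback: str) -> str:
    # Replace characters that are invalid on common filesystems, then drop
    # trailing dots/spaces, which Windows does not allow in directory names.
    cleaned = INVALID_PATH_CHARS_RE.sub("_", (value or "").strip()).rstrip(". ")
    return cleaned or fallback


def pick_first_artist_name(singers: Optional[str]) -> str:
    # Split on "/", ",", "&", or "|" and keep the first non-empty part.
    for candidate in re.split(r"\s*(?:/|,|&|\|)\s*", singers or ""):
        if candidate.strip():
            return sanitize_path_component(candidate, "Unknown Artist")
    return "Unknown Artist"


examples = {
    "Singer A / Singer B": pick_first_artist_name("Singer A / Singer B"),
    "": pick_first_artist_name(""),
    "Simon & Garfunkel": pick_first_artist_name("Simon & Garfunkel"),
    "R.E.M.": pick_first_artist_name("R.E.M."),
}
```

One consequence worth knowing before relying on this layout: because `/`, `,`, `&`, and `|` are all treated as separators, a band name that legitimately contains one of them is truncated to its first part, and a trailing dot is dropped from the directory name.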
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/runtime.py tests/catalogsync/test_runtime.py
|
||||
git commit -m "feat: add runtime layout helpers"
|
||||
```
|
||||
|
||||
### Task 2: Route downloader output through `platform/first_artist`
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/downloader.py`
|
||||
- Modify: `tests/catalogsync/test_services.py`
|
||||
|
||||
- [ ] **Step 1: Add failing downloader layout tests**
|
||||
|
||||
```python
|
||||
def test_catalog_downloader_records_platform_first_artist_locator(self):
|
||||
from musicdl.catalogsync.db import initialize_database
|
||||
from musicdl.catalogsync.downloader import CatalogDownloader
|
||||
from musicdl.catalogsync.models import CatalogSong
|
||||
from musicdl.catalogsync.repository import CatalogRepository
|
||||
|
||||
class FakeClient:
|
||||
def download(self, song_infos, num_threadings=1, auto_supplement_song=False):
|
||||
save_path = Path(song_infos[0].work_dir) / "song-c.mp3"
|
||||
save_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
save_path.write_bytes(b"fake-audio")
|
||||
return [SimpleNamespace(save_path=str(save_path))]
|
||||
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
db_path = Path(tmpdir) / "catalogsync.db"
|
||||
library_root = Path(tmpdir) / "library"
|
||||
initialize_database(db_path, default_library_root=library_root).close()
|
||||
repo = CatalogRepository(db_path)
|
||||
repo.upsert_song(
|
||||
CatalogSong(
|
||||
platform="qq",
|
||||
remote_song_id="song-c",
|
||||
name="Song C",
|
||||
singers="Singer A / Singer B",
|
||||
ext="mp3",
|
||||
file_size_bytes=80,
|
||||
metadata={"snapshot": {"identifier": "song-c"}},
|
||||
)
|
||||
)
|
||||
downloader = CatalogDownloader(repository=repo)
|
||||
|
||||
with patch("musicdl.catalogsync.downloader.deserialize_song_info", return_value=SimpleNamespace(singers="Singer A / Singer B")):
|
||||
with patch.object(downloader, "get_client", return_value=FakeClient()):
|
||||
downloader.download_pending(library_root=library_root, limit=1)
|
||||
|
||||
location = repo._fetchone(
|
||||
"SELECT locator FROM file_locations ORDER BY id DESC LIMIT 1"
|
||||
)
|
||||
|
||||
self.assertEqual("qq/Singer A/song-c.mp3", location["locator"])
|
||||
|
||||
def test_catalog_downloader_uses_unknown_artist_fallback_directory(self):
|
||||
from musicdl.catalogsync.db import initialize_database
|
||||
from musicdl.catalogsync.downloader import CatalogDownloader
|
||||
from musicdl.catalogsync.models import CatalogSong
|
||||
from musicdl.catalogsync.repository import CatalogRepository
|
||||
|
||||
class FakeClient:
|
||||
def download(self, song_infos, num_threadings=1, auto_supplement_song=False):
|
||||
save_path = Path(song_infos[0].work_dir) / "song-a.flac"
|
||||
save_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
save_path.write_bytes(b"fake-audio")
|
||||
return [SimpleNamespace(save_path=str(save_path))]
|
||||
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
db_path = Path(tmpdir) / "catalogsync.db"
|
||||
library_root = Path(tmpdir) / "library"
|
||||
initialize_database(db_path, default_library_root=library_root).close()
|
||||
repo = CatalogRepository(db_path)
|
||||
repo.upsert_song(
|
||||
CatalogSong(
|
||||
platform="netease",
|
||||
remote_song_id="song-a",
|
||||
name="Song A",
|
||||
singers="",
|
||||
ext="flac",
|
||||
file_size_bytes=100,
|
||||
metadata={"snapshot": {"identifier": "song-a"}},
|
||||
)
|
||||
)
|
||||
downloader = CatalogDownloader(repository=repo)
|
||||
|
||||
with patch("musicdl.catalogsync.downloader.deserialize_song_info", return_value=SimpleNamespace(singers="")):
|
||||
with patch.object(downloader, "get_client", return_value=FakeClient()):
|
||||
downloader.download_pending(library_root=library_root, limit=1)
|
||||
|
||||
location = repo._fetchone(
|
||||
"SELECT locator FROM file_locations ORDER BY id DESC LIMIT 1"
|
||||
)
|
||||
|
||||
self.assertEqual("netease/Unknown Artist/song-a.flac", location["locator"])
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the focused downloader tests to verify they fail**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_services.CatalogServiceTests.test_catalog_downloader_records_platform_first_artist_locator tests.catalogsync.test_services.CatalogServiceTests.test_catalog_downloader_uses_unknown_artist_fallback_directory -v`
|
||||
Expected: FAIL because the downloader still writes `platform/filename`
|
||||
|
||||
- [ ] **Step 3: Implement the downloader layout change**
|
||||
|
||||
```python
|
||||
from .runtime import build_download_relative_dir
|
||||
```
|
||||
|
||||
```python
relative_dir = build_download_relative_dir(
    platform=row["platform"],
    singers=getattr(song_info, "singers", None) or row.get("singers"),
)
target_dir = target_root / relative_dir
target_dir.mkdir(parents=True, exist_ok=True)
song_info.work_dir = str(target_dir)
```
|
||||
|
||||
Keep the locator writeback based on the actual saved file:
|
||||
|
||||
```python
|
||||
saved_path = Path(saved_song.save_path)
|
||||
relative_path = saved_path.relative_to(target_root).as_posix()
|
||||
```
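
The reason for `relative_to(...).as_posix()` is that the stored locator stays independent of both the library root and the host operating system. A minimal sketch, using pure paths so it behaves the same on any machine; the Windows root and the file names are hypothetical:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical: a file saved on a Windows workstation under its library root.
windows_root = PureWindowsPath(r"D:\Music_Cloud\library")
saved_path = windows_root / "qq" / "Singer A" / "song-c.mp3"

# relative_to() strips the root; as_posix() forces forward slashes on any OS.
locator = saved_path.relative_to(windows_root).as_posix()

# The same locator can be re-anchored under a different root, e.g. on the NAS.
nas_path = PurePosixPath("/volume4/Music_Cloud/library") / locator

# relative_to() raises ValueError when the file was saved outside the root.
try:
    PureWindowsPath(r"C:\Temp\song-c.mp3").relative_to(windows_root)
    outside_root_rejected = False
except ValueError:
    outside_root_rejected = True
```

The `ValueError` case matters for the downloader: if a client ignores `work_dir` and saves elsewhere, the writeback fails loudly instead of recording a locator that cannot be resolved later.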
|
||||
|
||||
- [ ] **Step 4: Re-run the focused downloader tests**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_services.CatalogServiceTests.test_catalog_downloader_records_platform_first_artist_locator tests.catalogsync.test_services.CatalogServiceTests.test_catalog_downloader_uses_unknown_artist_fallback_directory -v`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 5: Run the broader catalogsync tests affected by downloader changes**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_services tests.catalogsync.test_cli -v`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 6: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/downloader.py tests/catalogsync/test_services.py
|
||||
git commit -m "feat: store downloads under platform and first artist"
|
||||
```
|
||||
|
||||
### Task 3: Add portable deployment and runtime script templates
|
||||
|
||||
**Files:**
|
||||
- Create: `scripts/catalogsync/bootstrap_to_linux.ps1`
|
||||
- Create: `scripts/catalogsync/templates/catalogsync.env.example`
|
||||
- Create: `scripts/catalogsync/templates/download_all.sh`
|
||||
- Create: `scripts/catalogsync/templates/download_from_file.sh`
|
||||
- Modify: `tests/catalogsync/test_runtime.py`
|
||||
|
||||
- [ ] **Step 1: Add failing tests for deployment template content**
|
||||
|
||||
```python
def test_catalogsync_env_example_contains_required_keys(self):
    template = Path("scripts/catalogsync/templates/catalogsync.env.example").read_text(encoding="utf-8")
    self.assertIn("ROOT_DIR=", template)
    self.assertIn("APP_HOME=", template)
    self.assertIn("LIBRARY_DIR=", template)
    self.assertIn("DB_PATH=", template)
    self.assertIn("INPUT_DIR=", template)
    self.assertIn("LOG_DIR=", template)
    self.assertIn("DOWNLOAD_LAYOUT=platform_first_artist", template)

def test_runtime_script_template_uses_configured_library_dir(self):
    script = Path("scripts/catalogsync/templates/download_from_file.sh").read_text(encoding="utf-8")
    self.assertIn("LIBRARY_DIR", script)
    self.assertIn("INPUT_DIR", script)
    self.assertIn("musicdl.catalogsync.cli run", script)
```
|
||||
|
||||
- [ ] **Step 2: Run the focused runtime/template tests to verify they fail**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_runtime.RuntimeLayoutTests.test_catalogsync_env_example_contains_required_keys tests.catalogsync.test_runtime.RuntimeLayoutTests.test_runtime_script_template_uses_configured_library_dir -v`
|
||||
Expected: FAIL because the template files do not exist yet
|
||||
|
||||
- [ ] **Step 3: Add the deployment and runtime script templates**
|
||||
|
||||
`scripts/catalogsync/templates/catalogsync.env.example`:
|
||||
|
||||
```bash
|
||||
ROOT_DIR=/volume4/Music_Cloud
|
||||
APP_HOME=/volume4/Music_Cloud/catalogsync
|
||||
LIBRARY_DIR=/volume4/Music_Cloud/library
|
||||
|
||||
DB_PATH=/volume4/Music_Cloud/catalogsync/data/catalogsync.db
|
||||
INPUT_DIR=/volume4/Music_Cloud/catalogsync/inputs
|
||||
LOG_DIR=/volume4/Music_Cloud/catalogsync/logs
|
||||
|
||||
PYTHON_BIN=python3
|
||||
VENV_DIR=/volume4/Music_Cloud/catalogsync/app/.venv
|
||||
|
||||
DOWNLOAD_LAYOUT=platform_first_artist
|
||||
```
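
On the Python side, a file in this format can be turned into the mapping that `CatalogSyncRuntimeConfig.from_mapping` expects. The sketch below is one way to do that; `load_env_file` is a hypothetical helper, and it only handles plain `KEY=VALUE` lines (no `export`, no variable expansion), which is narrower than what `source` accepts in a shell.

```python
import tempfile
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are ignored."""
    mapping = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes so "python3" and python3 are equal.
        mapping[key.strip()] = value.strip().strip('"').strip("'")
    return mapping


with tempfile.TemporaryDirectory() as tmpdir:
    env_file = Path(tmpdir) / "catalogsync.env"
    env_file.write_text(
        "# runtime layout\n"
        "ROOT_DIR=/volume4/Music_Cloud\n"
        "\n"
        'PYTHON_BIN="python3"\n',
        encoding="utf-8",
    )
    settings = load_env_file(env_file)
```

Keys omitted from the file simply stay out of the mapping, so the defaults derived from `ROOT_DIR` in `from_mapping` still apply.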
|
||||
|
||||
`scripts/catalogsync/templates/download_all.sh`:
|
||||
|
||||
```bash
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
APP_HOME="$(cd "${SCRIPT_DIR}/.." && pwd)"
CONFIG_FILE="${APP_HOME}/config/catalogsync.env"
source "${CONFIG_FILE}"

mkdir -p "${LIBRARY_DIR}" "${APP_HOME}/data" "${INPUT_DIR}" "${LOG_DIR}"

"${PYTHON_BIN}" -m musicdl.catalogsync.cli run \
  --db "${DB_PATH}" \
  --library-root "${LIBRARY_DIR}" \
  "$@"
```
|
||||
|
||||
`scripts/catalogsync/templates/download_from_file.sh`:
|
||||
|
||||
```bash
#!/usr/bin/env bash
set -euo pipefail

if [[ $# -lt 1 ]]; then
  echo "usage: $0 <playlist-file> [extra args...]"
  exit 1
fi

PLAYLIST_FILE="$1"
shift

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
APP_HOME="$(cd "${SCRIPT_DIR}/.." && pwd)"
CONFIG_FILE="${APP_HOME}/config/catalogsync.env"
source "${CONFIG_FILE}"

mkdir -p "${LIBRARY_DIR}" "${APP_HOME}/data" "${INPUT_DIR}" "${LOG_DIR}"

"${PYTHON_BIN}" -m musicdl.catalogsync.cli run \
  --db "${DB_PATH}" \
  --library-root "${LIBRARY_DIR}" \
  --playlist-file "${PLAYLIST_FILE}" \
  "$@"
```
|
||||
|
||||
`scripts/catalogsync/bootstrap_to_linux.ps1` should start by declaring its parameters and the list of remote directories. The host parameter is named `$RemoteHost` because `$Host` is a read-only automatic variable in PowerShell and cannot be used as a parameter name:

```powershell
param(
    [string]$RemoteHost,
    [int]$Port = 22,
    [string]$User,
    [string]$RootDir = "/volume4/Music_Cloud"
)

$AppHome = "$RootDir/catalogsync"
$RemoteDirs = @(
    $RootDir,
    "$RootDir/library",
    "$AppHome/app",
    "$AppHome/bin",
    "$AppHome/config",
    "$AppHome/data",
    "$AppHome/inputs",
    "$AppHome/logs"
)
```
|
||||
|
||||
Then use `ssh` and `scp` to:
|
||||
|
||||
- create the remote directories
|
||||
- copy the application files into `$AppHome/app`
|
||||
- copy the shell script templates into `$AppHome/bin`
|
||||
- copy `catalogsync.env.example` into `$AppHome/config/catalogsync.env.example` if missing
|
||||
|
||||
- [ ] **Step 4: Re-run the focused runtime/template tests**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_runtime -v`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add scripts/catalogsync tests/catalogsync/test_runtime.py
|
||||
git commit -m "feat: add portable catalogsync deployment scripts"
|
||||
```
|
||||
|
||||
### Task 4: Document the new layout and verify the full flow
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/catalogsync.md`
|
||||
- Modify: `README.md`
|
||||
|
||||
- [ ] **Step 1: Update user-facing docs with the new deployment layout**
|
||||
|
||||
Add:
|
||||
|
||||
- the `/volume4/Music_Cloud/library` versus `/volume4/Music_Cloud/catalogsync` split
|
||||
- the `platform/first_artist` download layout
|
||||
- the `catalogsync.env` example
|
||||
- the `scripts/catalogsync/bootstrap_to_linux.ps1` usage
|
||||
- the target-side `download_all.sh` and `download_from_file.sh` usage
|
||||
|
||||
- [ ] **Step 2: Run the full catalogsync unittest suite**
|
||||
|
||||
Run: `python -m unittest discover -s tests/catalogsync -v`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 3: Run a local smoke check for CLI help**
|
||||
|
||||
Run: `python -m musicdl.catalogsync.cli run --help`
|
||||
Expected: output includes `--playlist-file`
|
||||
|
||||
- [ ] **Step 4: Inspect the generated diff**
|
||||
|
||||
Run: `git diff --stat`
|
||||
Expected: only the planned runtime/layout/downloader/docs files changed
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add docs/catalogsync.md README.md
|
||||
git commit -m "docs: describe NAS download layout workflow"
|
||||
```
|
||||
@@ -0,0 +1,476 @@
|
||||
# Playlist File Run Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Add a `--playlist-file` option to `musicdl.catalogsync` so `run` can process only playlists listed in a text file while preserving the current default workflow when the option is absent.
|
||||
|
||||
**Architecture:** Keep the existing `collect -> sync -> download` path unchanged and add a narrow file-driven branch in `run`. Parse playlist-file lines into normalized playlist import entries, upsert them into the existing catalog tables under a `manual_file` pool, then sync and download only the referenced playlist IDs.
|
||||
|
||||
**Tech Stack:** Python stdlib (`pathlib`, `dataclasses`, `urllib.parse`), `click`, existing `musicdl.catalogsync` modules, `unittest`
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Add file playlist parsing and manual-file pool support
|
||||
|
||||
**Files:**
|
||||
- Create: `musicdl/catalogsync/manual_playlists.py`
|
||||
- Modify: `musicdl/catalogsync/models.py`
|
||||
- Modify: `musicdl/catalogsync/repository.py`
|
||||
- Test: `tests/catalogsync/test_manual_playlists.py`
|
||||
|
||||
- [ ] **Step 1: Write failing tests for playlist-file parsing**
|
||||
|
||||
```python
import tempfile
import unittest
from pathlib import Path


class ManualPlaylistParsingTests(unittest.TestCase):
    def test_parse_playlist_file_supports_url_and_platform_url_lines(self):
        from musicdl.catalogsync.manual_playlists import parse_playlist_file

        with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
            playlist_file = Path(tmpdir) / "playlists.txt"
            playlist_file.write_text(
                "\n".join(
                    [
                        "# comment",
                        "https://music.163.com/#/playlist?id=17745989905",
                        "qq,https://y.qq.com/n/ryqq/playlist/7707261125",
                        "https://music.163.com/#/playlist?id=17745989905",
                        "",
                    ]
                ),
                encoding="utf-8",
            )

            parsed = parse_playlist_file(playlist_file)

        self.assertEqual(5, parsed.total_lines)
        self.assertEqual(0, parsed.skipped_lines)
        self.assertEqual(2, len(parsed.entries))
        self.assertEqual("netease", parsed.entries[0].platform)
        self.assertEqual("17745989905", parsed.entries[0].remote_id)
        self.assertEqual("qq", parsed.entries[1].platform)
        self.assertEqual("7707261125", parsed.entries[1].remote_id)
```
|
||||
|
||||
- [ ] **Step 2: Run the focused parser test to verify it fails**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_manual_playlists.ManualPlaylistParsingTests.test_parse_playlist_file_supports_url_and_platform_url_lines -v`
|
||||
Expected: FAIL with import error for `musicdl.catalogsync.manual_playlists` or missing parser helpers
|
||||
|
||||
- [ ] **Step 3: Implement the parser module and manual-file pool helper**
|
||||
|
||||
```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

from .models import PlaylistCandidate


@dataclass(slots=True)
class ParsedPlaylistFile:
    entries: list[PlaylistCandidate]
    total_lines: int
    skipped_lines: int


def parse_playlist_file(path: str | Path) -> ParsedPlaylistFile:
    raise NotImplementedError
```
|
||||
|
||||
```python
def get_or_create_manual_file_pool(self, playlist_file: str | Path) -> int:
    resolved = Path(playlist_file).resolve()
    return self.upsert_playlist_pool(
        platform="manual",
        pool_kind="manual_file",
        external_id=f"manual_file:{resolved}",
        name=f"Manual File Import: {resolved.name}",
        url=str(resolved),
    )
```
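
The `parse_playlist_file` stub in this step raises `NotImplementedError`, so the Step 1 test cannot pass until it is filled in. The sketch below shows one way the parsing logic might look. It is an assumption, not the planned implementation: `PlaylistEntry` is a simplified stand-in for `PlaylistCandidate`, `parse_playlist_text` works on a string instead of a path to stay self-contained, the host-to-platform table covers only the two hosts in the test, and the counting rules (comments, blanks, and duplicates are not counted as skipped) are inferred from the test's expected values.

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qs, urlparse


@dataclass
class PlaylistEntry:  # simplified stand-in for PlaylistCandidate
    platform: str
    remote_id: str
    url: str


@dataclass
class ParsedPlaylistText:
    entries: list = field(default_factory=list)
    total_lines: int = 0
    skipped_lines: int = 0


# Assumed mapping; only the two hosts used in the test are covered.
HOST_PLATFORMS = {"music.163.com": "netease", "y.qq.com": "qq"}


def extract_remote_id(url: str) -> str:
    parsed = urlparse(url)
    # NetEase keeps the query inside the fragment: "#/playlist?id=...".
    query = parsed.query or parsed.fragment.partition("?")[2]
    ids = parse_qs(query).get("id")
    if ids:
        return ids[0]
    # QQ-style URLs end with the playlist ID as the last path segment.
    return parsed.path.rstrip("/").rsplit("/", 1)[-1]


def parse_playlist_text(text: str) -> ParsedPlaylistText:
    result = ParsedPlaylistText()
    seen = set()
    lines = text.split("\n")
    result.total_lines = len(lines)
    for raw_line in lines:
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blanks and comments are ignored, not counted as skipped
        if line.lower().startswith(("http://", "https://")):
            url = line  # bare URL: infer the platform from the host
            platform = HOST_PLATFORMS.get(urlparse(url).netloc, "")
        else:
            platform, _, url = line.partition(",")  # "platform,url" form
        platform, url = platform.strip(), url.strip()
        remote_id = extract_remote_id(url)
        if not platform or not remote_id:
            result.skipped_lines += 1
            continue
        if (platform, remote_id) in seen:
            continue  # duplicates collapse silently
        seen.add((platform, remote_id))
        result.entries.append(PlaylistEntry(platform, remote_id, url))
    return result


sample = "\n".join(
    [
        "# comment",
        "https://music.163.com/#/playlist?id=17745989905",
        "qq,https://y.qq.com/n/ryqq/playlist/7707261125",
        "https://music.163.com/#/playlist?id=17745989905",
        "",
    ]
)
parsed = parse_playlist_text(sample)
```

Deduplicating on `(platform, remote_id)` instead of the raw line means two different URL spellings of the same playlist still collapse into one entry.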
|
||||
|
||||
- [ ] **Step 4: Re-run the focused parser tests**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_manual_playlists -v`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/manual_playlists.py musicdl/catalogsync/models.py musicdl/catalogsync/repository.py tests/catalogsync/test_manual_playlists.py
|
||||
git commit -m "feat: add playlist file parsing support"
|
||||
```
|
||||
|
||||
### Task 2: Add manual playlist import and targeted sync
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/services.py`
|
||||
- Modify: `musicdl/catalogsync/repository.py`
|
||||
- Test: `tests/catalogsync/test_manual_playlists.py`
|
||||
- Test: `tests/catalogsync/test_services.py`
|
||||
|
||||
- [ ] **Step 1: Write failing tests for manual playlist import and targeted sync**
|
||||
|
||||
```python
|
||||
def test_import_manual_playlists_creates_manual_pool_and_returns_playlist_ids(self):
|
||||
from musicdl.catalogsync.db import initialize_database
|
||||
from musicdl.catalogsync.manual_playlists import parse_playlist_file
|
||||
from musicdl.catalogsync.repository import CatalogRepository
|
||||
from musicdl.catalogsync.services import CatalogSyncService
|
||||
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
db_path = Path(tmpdir) / "catalogsync.db"
|
||||
playlist_file = Path(tmpdir) / "playlists.txt"
|
||||
playlist_file.write_text(
|
||||
"https://music.163.com/#/playlist?id=17745989905\n",
|
||||
encoding="utf-8",
|
||||
)
|
||||
initialize_database(db_path).close()
|
||||
repo = CatalogRepository(db_path)
|
||||
service = CatalogSyncService(repo)
|
||||
|
||||
parsed = parse_playlist_file(playlist_file)
|
||||
playlist_ids = service.import_manual_playlists(playlist_file, parsed.entries)
|
||||
|
||||
self.assertEqual(1, len(playlist_ids))
|
||||
self.assertEqual(1, repo.count_rows("playlists"))
|
||||
self.assertEqual(
|
||||
1,
|
||||
len(repo.list_pool_playlist_links(repo.get_or_create_manual_file_pool(playlist_file))),
|
||||
)
|
||||
```
|
||||
|
||||
```python
|
||||
def test_sync_specific_playlists_only_processes_requested_playlist_ids(self):
|
||||
from unittest.mock import patch
|
||||
|
||||
from musicdl.catalogsync.db import initialize_database
|
||||
from musicdl.catalogsync.models import PlaylistCandidate
|
||||
from musicdl.catalogsync.repository import CatalogRepository
|
||||
from musicdl.catalogsync.services import CatalogSyncService
|
||||
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
db_path = Path(tmpdir) / "catalogsync.db"
|
||||
initialize_database(db_path).close()
|
||||
repo = CatalogRepository(db_path)
|
||||
service = CatalogSyncService(repo)
|
||||
|
||||
playlist_a = repo.upsert_playlist(
|
||||
PlaylistCandidate(
|
||||
platform="netease",
|
||||
pool_kind="manual_file",
|
||||
remote_id="17745989905",
|
||||
name="Playlist A",
|
||||
url="https://music.163.com/#/playlist?id=17745989905",
|
||||
)
|
||||
)
|
||||
playlist_b = repo.upsert_playlist(
|
||||
PlaylistCandidate(
|
||||
platform="netease",
|
||||
pool_kind="manual_file",
|
||||
remote_id="17729789137",
|
||||
name="Playlist B",
|
||||
url="https://music.163.com/#/playlist?id=17729789137",
|
||||
)
|
||||
)
|
||||
|
||||
with patch.object(service, "resolve_playlist_song_infos", return_value=[]) as resolver:
|
||||
service.sync_specific_playlists([playlist_b])
|
||||
|
||||
called_row = resolver.call_args[0][0]
|
||||
self.assertEqual(playlist_b, int(called_row["id"]))
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the focused services tests to verify they fail**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_manual_playlists tests.catalogsync.test_services -v`
|
||||
Expected: FAIL with missing `import_manual_playlists`, `list_playlists_by_ids`, and `sync_specific_playlists`
|
||||
|
||||
- [ ] **Step 3: Implement manual playlist import and targeted sync entry points**
|
||||
|
||||
```python
|
||||
def list_playlists_by_ids(self, playlist_ids: list[int]) -> list[sqlite3.Row]:
|
||||
placeholders = ", ".join("?" for _ in playlist_ids)
|
||||
return self._fetchall(
|
||||
f"SELECT * FROM playlists WHERE id IN ({placeholders}) ORDER BY id ASC",
|
||||
tuple(playlist_ids),
|
||||
)
|
||||
```
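The placeholder technique above can be tried in isolation. This sketch runs the same `IN (?, ?, ...)` query against an in-memory database; the two-column `playlists` table and the early return for an empty list are assumptions added for the demonstration.

```python
import sqlite3


def list_playlists_by_ids(conn: sqlite3.Connection, playlist_ids: list[int]) -> list[sqlite3.Row]:
    if not playlist_ids:
        # Nothing requested: skip the query entirely.
        return []
    # One "?" per ID keeps the values bound instead of interpolated.
    placeholders = ", ".join("?" for _ in playlist_ids)
    return conn.execute(
        f"SELECT * FROM playlists WHERE id IN ({placeholders}) ORDER BY id ASC",
        tuple(playlist_ids),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE playlists (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO playlists (name) VALUES (?)", [("A",), ("B",), ("C",)])

rows = list_playlists_by_ids(conn, [3, 1])
names = [row["name"] for row in rows]  # ordered by id, not by request order
```

SQLite limits the number of bound parameters per statement, so a very long ID list may need to be queried in chunks.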
|
||||
|
||||
```python
|
||||
def import_manual_playlists(self, playlist_file: str | Path, candidates: list[PlaylistCandidate]) -> list[int]:
|
||||
pool_id = self.repository.get_or_create_manual_file_pool(playlist_file)
|
||||
playlist_ids = []
|
||||
for candidate in candidates:
|
||||
playlist_id = self.repository.upsert_playlist(candidate)
|
||||
self.repository.link_pool_playlist(pool_id, playlist_id)
|
||||
playlist_ids.append(playlist_id)
|
||||
return playlist_ids
|
||||
|
||||
def sync_specific_playlists(self, playlist_ids: list[int]) -> int:
|
||||
processed = 0
|
||||
for playlist_row in self.repository.list_playlists_by_ids(playlist_ids):
|
||||
song_infos = self.resolve_playlist_song_infos(playlist_row)
|
||||
for pool_id in self.repository.get_pool_ids_for_playlist(int(playlist_row["id"])):
|
||||
self.store_playlist_songs(int(playlist_row["id"]), pool_id, song_infos)
|
||||
processed += len(song_infos)
|
||||
return processed
|
||||
```
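The service methods above call `link_pool_playlist` and `get_pool_ids_for_playlist`, which this step does not show. A possible shape for them is sketched below; the `pool_playlist_links` table name and its columns are hypothetical, and the functions use a module-level connection only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pool_playlist_links (  -- hypothetical table name
        pool_id INTEGER NOT NULL,
        playlist_id INTEGER NOT NULL,
        PRIMARY KEY (pool_id, playlist_id)
    )
    """
)


def link_pool_playlist(pool_id: int, playlist_id: int) -> None:
    # INSERT OR IGNORE keeps repeated imports of the same file idempotent.
    conn.execute(
        "INSERT OR IGNORE INTO pool_playlist_links (pool_id, playlist_id) VALUES (?, ?)",
        (pool_id, playlist_id),
    )


def get_pool_ids_for_playlist(playlist_id: int) -> list[int]:
    rows = conn.execute(
        "SELECT pool_id FROM pool_playlist_links WHERE playlist_id = ? ORDER BY pool_id",
        (playlist_id,),
    ).fetchall()
    return [int(row[0]) for row in rows]


link_pool_playlist(1, 42)
link_pool_playlist(1, 42)  # second import of the same file: no duplicate row
link_pool_playlist(2, 42)
pool_ids = get_pool_ids_for_playlist(42)
```

With a composite primary key, importing the same playlist file twice leaves exactly one link per pool.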
|
||||
|
||||
- [ ] **Step 4: Re-run the focused services tests**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_manual_playlists tests.catalogsync.test_services -v`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/services.py musicdl/catalogsync/repository.py tests/catalogsync/test_manual_playlists.py tests/catalogsync/test_services.py
|
||||
git commit -m "feat: add manual playlist import and targeted sync"
|
||||
```
|
||||
|
||||
### Task 3: Add targeted download planning for selected playlists
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/downloader.py`
|
||||
- Modify: `musicdl/catalogsync/repository.py`
|
||||
- Test: `tests/catalogsync/test_services.py`
|
||||
|
||||
- [ ] **Step 1: Write a failing test for limiting downloads to selected playlist IDs**
|
||||
|
||||
```python
|
||||
def test_download_planner_can_limit_queue_to_specific_playlists(self):
|
||||
from musicdl.catalogsync.db import initialize_database
|
||||
from musicdl.catalogsync.models import CatalogSong, PlaylistCandidate
|
||||
from musicdl.catalogsync.repository import CatalogRepository
|
||||
from musicdl.catalogsync.downloader import DownloadPlanner
|
||||
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
db_path = Path(tmpdir) / "catalogsync.db"
|
||||
initialize_database(db_path).close()
|
||||
repo = CatalogRepository(db_path)
|
||||
|
||||
playlist_a = repo.upsert_playlist(
|
||||
PlaylistCandidate(
|
||||
platform="qq",
|
||||
pool_kind="manual_file",
|
||||
remote_id="7707261125",
|
||||
name="Playlist A",
|
||||
url="https://y.qq.com/n/ryqq/playlist/7707261125",
|
||||
)
|
||||
)
|
||||
playlist_b = repo.upsert_playlist(
|
||||
PlaylistCandidate(
|
||||
platform="qq",
|
||||
pool_kind="manual_file",
|
||||
remote_id="7578943835",
|
||||
name="Playlist B",
|
||||
url="https://y.qq.com/n/ryqq/playlist/7578943835",
|
||||
)
|
||||
)
|
||||
song_a = repo.upsert_song(CatalogSong(platform="qq", remote_song_id="song-a", name="A"))
|
||||
song_b = repo.upsert_song(CatalogSong(platform="qq", remote_song_id="song-b", name="B"))
|
||||
repo.link_playlist_song(playlist_a, song_a, 1)
|
||||
repo.link_playlist_song(playlist_b, song_b, 1)
|
||||
|
||||
planner = DownloadPlanner(repo)
|
||||
queue = planner.build_download_queue(playlist_ids=[playlist_a])
|
||||
|
||||
self.assertEqual([song_a], [item["song_id"] for item in queue])
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the focused download planner test to verify it fails**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_services.CatalogServiceTests.test_download_planner_can_limit_queue_to_specific_playlists -v`
|
||||
Expected: FAIL because `playlist_ids` is unsupported
|
||||
|
||||
- [ ] **Step 3: Implement repository and downloader filtering by playlist IDs**
|
||||
|
||||
```python
|
||||
def list_pending_download_songs(
|
||||
self,
|
||||
sources: list[str] | None = None,
|
||||
limit: int | None = None,
|
||||
playlist_ids: list[int] | None = None,
|
||||
) -> list[sqlite3.Row]:
|
||||
query = """
|
||||
SELECT DISTINCT s.*
|
||||
FROM songs s
|
||||
JOIN playlist_songs ps ON ps.song_id = s.id
|
||||
WHERE NOT EXISTS (
|
||||
SELECT 1
|
||||
FROM file_locations fl
|
||||
JOIN file_assets fa ON fa.id = fl.file_asset_id
|
||||
JOIN storage_backends sb ON sb.id = fl.backend_id
|
||||
WHERE fa.song_id = s.id
|
||||
AND fl.status = 'active'
|
||||
AND sb.backend_type = 'local_fs'
|
||||
)
|
||||
"""
|
||||
```
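The query above stops before the new `playlist_ids` argument is applied. One way to append the optional filters is sketched here against a simplified schema: `WHERE 1 = 1` stands in for the `NOT EXISTS` clause, and the `playlist_songs.playlist_id` column name is an assumption.

```python
import sqlite3

BASE_QUERY = """
    SELECT DISTINCT s.*
    FROM songs s
    JOIN playlist_songs ps ON ps.song_id = s.id
    WHERE 1 = 1
"""


def list_pending_download_songs(conn, limit=None, playlist_ids=None):
    query, params = BASE_QUERY, []
    if playlist_ids:
        # An empty or missing list means "no playlist restriction" in this sketch.
        placeholders = ", ".join("?" for _ in playlist_ids)
        query += f" AND ps.playlist_id IN ({placeholders})"
        params.extend(playlist_ids)
    query += " ORDER BY s.id ASC"
    if limit is not None:
        query += " LIMIT ?"
        params.append(limit)
    return conn.execute(query, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(
    """
    CREATE TABLE songs (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE playlist_songs (playlist_id INTEGER, song_id INTEGER, position INTEGER);
    INSERT INTO songs (id, name) VALUES (1, 'A'), (2, 'B');
    INSERT INTO playlist_songs VALUES (10, 1, 1), (20, 2, 1);
    """
)

only_playlist_10 = [row["id"] for row in list_pending_download_songs(conn, playlist_ids=[10])]
everything = [row["id"] for row in list_pending_download_songs(conn)]
```

Whether an empty `playlist_ids` list should mean "no filter" or "no songs" is a design decision the plan does not settle; the file-driven run path may prefer the stricter reading.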
|
||||
|
||||
```python
|
||||
def build_download_queue(
|
||||
self,
|
||||
sources: list[str] | None = None,
|
||||
limit: int | None = None,
|
||||
playlist_ids: list[int] | None = None,
|
||||
) -> list[dict]:
|
||||
rows = self.repository.list_pending_download_songs(
|
||||
sources=sources,
|
||||
limit=limit,
|
||||
playlist_ids=playlist_ids,
|
||||
)
|
||||
queue = []
|
||||
for row in rows:
|
||||
if self.repository.song_has_active_local_file(int(row["id"])):
|
||||
continue
|
||||
item = dict(row)
|
||||
item["song_id"] = int(row["id"])
|
||||
queue.append(item)
|
||||
return queue
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Re-run the focused download tests**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_services -v`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/downloader.py musicdl/catalogsync/repository.py tests/catalogsync/test_services.py
|
||||
git commit -m "feat: limit downloads to selected playlists"
|
||||
```
|
||||
|
||||
### Task 4: Wire `--playlist-file` into the CLI
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/cli.py`
|
||||
- Modify: `tests/catalogsync/test_cli.py`
|
||||
- Test: `tests/catalogsync/test_manual_playlists.py`
|
||||
|
||||
- [ ] **Step 1: Write failing CLI tests for the file-driven `run` path**
|
||||
|
||||
```python
|
||||
def test_run_command_uses_playlist_file_branch_without_collect(self):
|
||||
from musicdl.catalogsync.cli import cli
|
||||
|
||||
runner = CliRunner()
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
db_path = Path(tmpdir) / "catalogsync.db"
|
||||
playlist_file = Path(tmpdir) / "playlists.txt"
|
||||
playlist_file.write_text(
|
||||
"https://music.163.com/#/playlist?id=17745989905\n",
|
||||
encoding="utf-8",
|
||||
)
|
||||
with patch("musicdl.catalogsync.cli.CatalogSyncApplication") as app_cls:
|
||||
app = app_cls.return_value
|
||||
|
||||
result = runner.invoke(
|
||||
cli,
|
||||
[
|
||||
"run",
|
||||
"--db",
|
||||
str(db_path),
|
||||
"--library-root",
|
||||
str(Path(tmpdir) / "library"),
|
||||
"--playlist-file",
|
||||
str(playlist_file),
|
||||
],
|
||||
)
|
||||
|
||||
self.assertEqual(0, result.exit_code, msg=result.output)
|
||||
app.collect_playlists.assert_not_called()
|
||||
app.run_playlist_file.assert_called_once()
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the focused CLI test to verify it fails**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_cli.CatalogCliTests.test_run_command_uses_playlist_file_branch_without_collect -v`
|
||||
Expected: FAIL because `--playlist-file` and `run_playlist_file` do not exist
|
||||
|
||||
- [ ] **Step 3: Implement the CLI branch and application method**
|
||||
|
||||
```python
|
||||
def run_playlist_file(self, playlist_file: str, sources: list[str] | None = None, limit: int | None = None):
|
||||
parsed = parse_playlist_file(playlist_file)
|
||||
playlist_ids = self.service.import_manual_playlists(playlist_file, parsed.entries)
|
||||
if limit is not None:
|
||||
playlist_ids = playlist_ids[:limit]
|
||||
synced = self.service.sync_specific_playlists(playlist_ids)
|
||||
downloaded = self.downloader.download_pending(
|
||||
self.library_root,
|
||||
sources=sources,
|
||||
limit=limit,
|
||||
playlist_ids=playlist_ids,
|
||||
)
|
||||
return parsed, synced, downloaded
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Re-run the focused CLI tests**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_cli -v`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/cli.py tests/catalogsync/test_cli.py tests/catalogsync/test_manual_playlists.py
|
||||
git commit -m "feat: add playlist file run option"
|
||||
```
|
||||
|
||||
### Task 5: Update docs and run final verification
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/catalogsync.md`
|
||||
- Modify: `README.md`
|
||||
- Test: `tests/catalogsync/test_manual_playlists.py`
|
||||
- Test: `tests/catalogsync/test_cli.py`
|
||||
- Test: `tests/catalogsync/test_services.py`
|
||||
|
||||
- [ ] **Step 1: Update docs with `--playlist-file` examples and file format rules**
|
||||
|
||||
````markdown
### Reading playlists from a file

```bash
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --playlist-file D:\lists\playlists.txt
```

The file supports:
- One playlist URL per line
- `platform,playlist URL`
- `#` comment lines
````
|
||||
|
||||
- [ ] **Step 2: Run the full catalogsync test suite**
|
||||
|
||||
Run: `python -m unittest discover -s tests/catalogsync -v`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 3: Run a manual smoke flow using a temporary playlist file**
|
||||
|
||||
Run:
|
||||
```bash
|
||||
python -m musicdl.catalogsync.cli init-db --db .tmp\playlist-file\catalogsync.db --library-root .tmp\playlist-file\library
|
||||
python -m musicdl.catalogsync.cli run --db .tmp\playlist-file\catalogsync.db --library-root .tmp\playlist-file\library --playlist-file .tmp\playlist-file\playlists.txt
|
||||
```
|
||||
|
||||
Expected:
|
||||
- command exits 0
|
||||
- the file-driven branch skips `collect`
|
||||
- playlists from the file are imported and processed
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add docs/catalogsync.md README.md
|
||||
git commit -m "docs: add playlist file run usage"
|
||||
```
|
||||
@@ -0,0 +1,958 @@
|
||||
# Catalogsync Operations Console Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Build a NAS-local operations console for `musicdl.catalogsync` with queue-based `collect/sync/download/upload` jobs, soft pause and resume, crash-safe recovery, song-level retry, worker visibility, and `catalogsync.env` configuration management.
|
||||
|
||||
**Architecture:** Add a new `musicdl.catalogsync.ops` package for execution-state persistence, env revision management, job orchestration, and FastAPI-backed UI/API endpoints. Reuse the existing catalog tables and execution services by wrapping them in stage and item executors rather than rebuilding the domain logic from scratch.
|
||||
|
||||
**Tech Stack:** Python 3, unittest, SQLite, FastAPI, Jinja2, Server-Sent Events, existing `musicdl.catalogsync` services/downloader/uploader, NAS shell templates.
|
||||
|
||||
---
|
||||
|
||||
## File Structure
|
||||
|
||||
### New files
|
||||
|
||||
- `musicdl/catalogsync/ops/__init__.py`
|
||||
- exports the operations-console public entry points
|
||||
- `musicdl/catalogsync/ops/models.py`
|
||||
- enums, dataclasses, and small helpers for job, stage, item, and worker states
|
||||
- `musicdl/catalogsync/ops/repository.py`
|
||||
- CRUD for `job_runs`, `job_stages`, `job_items`, `job_workers`, `job_commands`, `job_events`, `job_logs`, and `config_revisions`
|
||||
- `musicdl/catalogsync/ops/config.py`
|
||||
- load, validate, snapshot, version, and apply `catalogsync.env`
|
||||
- `musicdl/catalogsync/ops/executors.py`
|
||||
- adapters that run `collect`, `sync`, `download`, and `upload` one work item at a time
|
||||
- `musicdl/catalogsync/ops/runner.py`
|
||||
- queue scheduler, command polling, pause/resume, and crash recovery
|
||||
- `musicdl/catalogsync/ops/web.py`
|
||||
- FastAPI app factory, API routes, page routes, and SSE stream
|
||||
- `musicdl/catalogsync/templates/ops/base.html`
|
||||
- shared layout
|
||||
- `musicdl/catalogsync/templates/ops/dashboard.html`
|
||||
- dashboard page
|
||||
- `musicdl/catalogsync/templates/ops/jobs.html`
|
||||
- queue and active-job page
|
||||
- `musicdl/catalogsync/templates/ops/job_detail.html`
|
||||
- per-job detail page
|
||||
- `musicdl/catalogsync/templates/ops/playlists.html`
|
||||
- playlist-pool and playlist status page
|
||||
- `musicdl/catalogsync/templates/ops/songs.html`
|
||||
- worker and song-processing page
|
||||
- `musicdl/catalogsync/templates/ops/logs.html`
|
||||
- log and exception page
|
||||
- `musicdl/catalogsync/templates/ops/config.html`
|
||||
- env editor and revision page
|
||||
- `musicdl/catalogsync/static/ops/app.js`
|
||||
- lightweight browser logic for SSE updates and page actions
|
||||
- `scripts/catalogsync/templates/serve_console.sh`
|
||||
- NAS runtime script for the web console
|
||||
- `tests/catalogsync/test_ops_db.py`
|
||||
- schema and repository coverage for operations tables
|
||||
- `tests/catalogsync/test_ops_config.py`
|
||||
- env loading, snapshot, revision, and apply tests
|
||||
- `tests/catalogsync/test_ops_runner.py`
|
||||
- state-machine, pause/resume, and recovery tests
|
||||
- `tests/catalogsync/test_ops_executors.py`
|
||||
- per-stage executor tests
|
||||
- `tests/catalogsync/test_ops_api.py`
|
||||
- FastAPI route and SSE tests
|
||||
|
||||
### Modified files
|
||||
|
||||
- `musicdl/catalogsync/db.py`
|
||||
- create the new operations tables
|
||||
- `musicdl/catalogsync/repository.py`
|
||||
- add query helpers that the operations layer can reuse from catalog data
|
||||
- `musicdl/catalogsync/services.py`
|
||||
- expose a playlist-row based sync unit for the runner
|
||||
- `musicdl/catalogsync/downloader.py`
|
||||
- expose song-level download execution for one queued item
|
||||
- `musicdl/catalogsync/uploader.py`
|
||||
- expose upload-task-level execution for one queued item
|
||||
- `musicdl/catalogsync/cli.py`
|
||||
- add the `serve` command
|
||||
- `musicdl/catalogsync/runtime.py`
|
||||
- add web host, port, and config-path runtime fields
|
||||
- `scripts/catalogsync/templates/catalogsync.env.example`
|
||||
- add web-console settings
|
||||
- `scripts/catalogsync/templates/install_runtime.sh`
|
||||
- install any new web-console dependencies
|
||||
- `docs/catalogsync.md`
|
||||
- document the console workflow and runtime script
|
||||
- `README.md`
|
||||
- add a concise operations-console entry
|
||||
- `requirements.txt`
|
||||
- add the web-console runtime dependencies
|
||||
- `setup.py`
|
||||
- include templates/static assets in packaging
|
||||
- `MANIFEST.in`
|
||||
- include template/static files for source distributions
|
||||
- `tests/catalogsync/test_cli.py`
|
||||
- cover the new `serve` command
|
||||
- `tests/catalogsync/test_runtime.py`
|
||||
- cover the new runtime config fields
|
||||
|
||||
### Dependency notes
|
||||
|
||||
- Add `fastapi`
|
||||
- Add `uvicorn`
|
||||
- Add `jinja2`
|
||||
- Add `python-multipart`
|
||||
|
||||
These should be added only once and then reused by the `serve` command, tests, and NAS runtime script.
|
||||
|
||||
### Task 1: Add Operations Schema And Repository Primitives
|
||||
|
||||
**Files:**
|
||||
- Create: `musicdl/catalogsync/ops/__init__.py`
|
||||
- Create: `musicdl/catalogsync/ops/models.py`
|
||||
- Create: `musicdl/catalogsync/ops/repository.py`
|
||||
- Modify: `musicdl/catalogsync/db.py`
|
||||
- Test: `tests/catalogsync/test_ops_db.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing schema and repository tests**
|
||||
|
||||
```python
|
||||
import sqlite3
|
||||
import tempfile
|
||||
import unittest
|
||||
from contextlib import closing
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
class OperationsSchemaTests(unittest.TestCase):
|
||||
def test_initialize_database_creates_operations_tables(self):
|
||||
from musicdl.catalogsync.db import initialize_database
|
||||
|
||||
expected_tables = {
|
||||
"job_runs",
|
||||
"job_stages",
|
||||
"job_items",
|
||||
"job_workers",
|
||||
"job_commands",
|
||||
"job_events",
|
||||
"job_logs",
|
||||
"config_revisions",
|
||||
}
|
||||
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
db_path = Path(tmpdir) / "catalogsync.db"
|
||||
initialize_database(db_path).close()
|
||||
with closing(sqlite3.connect(db_path)) as conn:
|
||||
tables = {
|
||||
row[0]
|
||||
for row in conn.execute(
|
||||
"SELECT name FROM sqlite_master WHERE type = 'table'"
|
||||
).fetchall()
|
||||
}
|
||||
|
||||
self.assertTrue(expected_tables.issubset(tables))
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the targeted test and verify it fails**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_db -v`
|
||||
|
||||
Expected:
|
||||
|
||||
- `FAIL` because the operations tables and repository module do not exist yet
|
||||
|
||||
- [ ] **Step 3: Implement the schema, enums, and repository**
|
||||
|
||||
```python
|
||||
# musicdl/catalogsync/db.py
|
||||
REQUIRED_TABLES |= {
|
||||
"job_runs",
|
||||
"job_stages",
|
||||
"job_items",
|
||||
"job_workers",
|
||||
"job_commands",
|
||||
"job_events",
|
||||
"job_logs",
|
||||
"config_revisions",
|
||||
}
|
||||
```
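Registering the table names alone does not create them. A possible DDL for three of the tables is sketched below. The column names are taken from the `INSERT` and `UPDATE` statements used later in this plan; the types, defaults, and the `created_at` column are assumptions.

```python
import sqlite3

OPERATIONS_SCHEMA = """
CREATE TABLE IF NOT EXISTS job_runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_type TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',
    config_snapshot_json TEXT NOT NULL DEFAULT '{}',
    sources TEXT NOT NULL DEFAULT '',
    download_sources TEXT NOT NULL DEFAULT '',
    playlist_scope_json TEXT NOT NULL DEFAULT '{}',
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS job_stages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_run_id INTEGER NOT NULL REFERENCES job_runs(id),
    stage_type TEXT NOT NULL,
    seq_no INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued'
);
CREATE TABLE IF NOT EXISTS job_items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_stage_id INTEGER NOT NULL REFERENCES job_stages(id),
    item_type TEXT NOT NULL,
    item_key TEXT NOT NULL,
    playlist_pool_id INTEGER,
    playlist_id INTEGER,
    song_id INTEGER,
    file_location_id INTEGER,
    payload_json TEXT NOT NULL DEFAULT '{}',
    status TEXT NOT NULL DEFAULT 'queued',
    worker_id INTEGER,
    ended_at TEXT,
    last_error TEXT
);
"""


def create_operations_tables(conn: sqlite3.Connection) -> None:
    # IF NOT EXISTS makes the call safe on an already-initialized database.
    conn.executescript(OPERATIONS_SCHEMA)


conn = sqlite3.connect(":memory:")
create_operations_tables(conn)
create_operations_tables(conn)  # second call is a no-op
cursor = conn.execute(
    "INSERT INTO job_runs (job_type, config_snapshot_json) VALUES (?, ?)",
    ("download_only", "{}"),
)
job_id = cursor.lastrowid
```

The remaining tables (`job_workers`, `job_commands`, `job_events`, `job_logs`, `config_revisions`) would follow the same pattern.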
|
||||
|
||||
```python
|
||||
# musicdl/catalogsync/ops/models.py
|
||||
from dataclasses import dataclass
|
||||
from enum import StrEnum  # StrEnum requires Python 3.11+
|
||||
|
||||
|
||||
class JobStatus(StrEnum):
|
||||
QUEUED = "queued"
|
||||
RUNNING = "running"
|
||||
PAUSE_REQUESTED = "pause_requested"
|
||||
PAUSED = "paused"
|
||||
COMPLETED = "completed"
|
||||
COMPLETED_WITH_ERRORS = "completed_with_errors"
|
||||
FAILED = "failed"
|
||||
CANCELED = "canceled"
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class JobCreateRequest:
|
||||
job_type: str
|
||||
config_snapshot: dict
|
||||
sources: list[str] | None = None
|
||||
```
|
||||
|
||||
```python
|
||||
# musicdl/catalogsync/ops/repository.py
import json
|
||||
class OperationsRepository:
|
||||
def create_job_run(self, job_type: str, config_snapshot: dict, sources=None, download_sources=None, playlist_scope=None) -> int:
|
||||
with connect_database(self.db_path) as conn:
|
||||
cursor = conn.execute(
|
||||
"""
|
||||
INSERT INTO job_runs (job_type, config_snapshot_json, sources, download_sources, playlist_scope_json)
|
||||
VALUES (?, ?, ?, ?, ?)
|
||||
""",
|
||||
(
|
||||
job_type,
|
||||
json.dumps(config_snapshot, ensure_ascii=False),
|
||||
",".join(sources or []),
|
||||
",".join(download_sources or []),
|
||||
json.dumps(playlist_scope or {}, ensure_ascii=False),
|
||||
),
|
||||
)
|
||||
return int(cursor.lastrowid)
|
||||
|
||||
def create_job_stage(self, job_id: int, stage_type: str, seq_no: int) -> int:
|
||||
with connect_database(self.db_path) as conn:
|
||||
cursor = conn.execute(
|
||||
"INSERT INTO job_stages (job_run_id, stage_type, seq_no) VALUES (?, ?, ?)",
|
||||
(job_id, stage_type, seq_no),
|
||||
)
|
||||
return int(cursor.lastrowid)
|
||||
|
||||
def create_job_item(self, job_stage_id: int, item_type: str, item_key: str, **extra) -> int:
|
||||
with connect_database(self.db_path) as conn:
|
||||
cursor = conn.execute(
|
||||
"""
|
||||
INSERT INTO job_items (job_stage_id, item_type, item_key, playlist_pool_id, playlist_id, song_id, file_location_id, payload_json)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
|
||||
""",
|
||||
(
|
||||
job_stage_id,
|
||||
item_type,
|
||||
item_key,
|
||||
extra.get("playlist_pool_id"),
|
||||
extra.get("playlist_id"),
|
||||
extra.get("song_id"),
|
||||
extra.get("file_location_id"),
|
||||
json.dumps(extra.get("payload_json") or {}, ensure_ascii=False),
|
||||
),
|
||||
)
|
||||
return int(cursor.lastrowid)
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Run the targeted test and verify it passes**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_db -v`
|
||||
|
||||
Expected:
|
||||
|
||||
- `OK`
|
||||
- the test output shows the new operations tables are created
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/db.py musicdl/catalogsync/ops/__init__.py musicdl/catalogsync/ops/models.py musicdl/catalogsync/ops/repository.py tests/catalogsync/test_ops_db.py
|
||||
git commit -m "feat: add operations schema and repository primitives"
|
||||
```
|
||||
|
||||
### Task 2: Add Env Revision Management And Job Config Snapshots
|
||||
|
||||
**Files:**
|
||||
- Create: `musicdl/catalogsync/ops/config.py`
|
||||
- Modify: `musicdl/catalogsync/ops/repository.py`
|
||||
- Test: `tests/catalogsync/test_ops_config.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing config and revision tests**
|
||||
|
||||
```python
|
||||
import tempfile
|
||||
import unittest
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
class EnvManagerTests(unittest.TestCase):
|
||||
def test_load_snapshot_and_save_revision(self):
|
||||
from musicdl.catalogsync.db import initialize_database
|
||||
from musicdl.catalogsync.ops.config import CatalogsyncEnvManager
|
||||
from musicdl.catalogsync.ops.repository import OperationsRepository
|
||||
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
db_path = Path(tmpdir) / "catalogsync.db"
|
||||
env_path = Path(tmpdir) / "catalogsync.env"
|
||||
env_path.write_text(
|
||||
"LIBRARY_DIR=/volume4/Music_Cloud/library\nDOWNLOAD_SOURCES=qq,kuwo,migu\n",
|
||||
encoding="utf-8",
|
||||
)
|
||||
initialize_database(db_path).close()
|
||||
repo = OperationsRepository(db_path)
|
||||
manager = CatalogsyncEnvManager(env_path=env_path, repository=repo)
|
||||
|
||||
snapshot = manager.build_job_snapshot()
|
||||
revision_id = manager.save_revision(note="initial import")
|
||||
revisions = manager.list_revisions()
|
||||
|
||||
self.assertEqual(["qq", "kuwo", "migu"], snapshot["download_sources"])
|
||||
self.assertEqual(revision_id, revisions[0]["id"])
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the targeted test and verify it fails**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_config -v`
|
||||
|
||||
Expected:
|
||||
|
||||
- `FAIL` because the env manager and config revision methods do not exist yet
|
||||
|
||||
- [ ] **Step 3: Implement the env manager and revision methods**
|
||||
|
||||
```python
|
||||
# musicdl/catalogsync/ops/config.py
import hashlib
|
||||
class CatalogsyncEnvManager:
|
||||
SNAPSHOT_KEYS = {
|
||||
"DB_PATH",
|
||||
"LIBRARY_DIR",
|
||||
"DOWNLOAD_SOURCES",
|
||||
"OBJECT_BACKEND_NAME",
|
||||
"OBJECT_BUCKET",
|
||||
"OBJECT_ENDPOINT",
|
||||
}
|
||||
|
||||
def load_current(self) -> dict[str, str]:
|
||||
values: dict[str, str] = {}
|
||||
for line in self.env_path.read_text(encoding="utf-8").splitlines():
|
||||
line = line.strip()
|
||||
if not line or line.startswith("#") or "=" not in line:
|
||||
continue
|
||||
key, value = line.split("=", 1)
|
||||
values[key.strip()] = value.strip()
|
||||
return values
|
||||
|
||||
def build_job_snapshot(self) -> dict:
|
||||
current = self.load_current()
|
||||
return {
|
||||
"library_root": current.get("LIBRARY_DIR", ""),
|
||||
"download_sources": [item for item in current.get("DOWNLOAD_SOURCES", "").split(",") if item],
|
||||
"env_values": {key: current[key] for key in self.SNAPSHOT_KEYS if key in current},
|
||||
}
|
||||
|
||||
def save_revision(self, note: str | None = None) -> int:
|
||||
content = self.env_path.read_text(encoding="utf-8")
|
||||
digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
|
||||
return self.repository.create_config_revision(
|
||||
file_path=str(self.env_path),
|
||||
content_text=content,
|
||||
content_hash=digest,
|
||||
note=note,
|
||||
)
|
||||
|
||||
def apply_revision(self, revision_id: int) -> None:
|
||||
row = self.repository.get_config_revision(revision_id)
|
||||
self.env_path.write_text(row["content_text"], encoding="utf-8")
|
||||
self.repository.mark_config_revision_applied(revision_id)
|
||||
```
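The parsing rules in `load_current` and `build_job_snapshot` can be exercised without a file. The sketch below restates them as two free functions (the names `parse_env_text` and `build_snapshot` are illustrative) and, as a small departure from the method above, strips whitespace around each source name.

```python
def parse_env_text(text: str) -> dict[str, str]:
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def build_snapshot(values: dict[str, str]) -> dict:
    raw_sources = values.get("DOWNLOAD_SOURCES", "")
    sources = [item.strip() for item in raw_sources.split(",") if item.strip()]
    return {"library_root": values.get("LIBRARY_DIR", ""), "download_sources": sources}


sample = """
# NAS paths
LIBRARY_DIR=/volume4/Music_Cloud/library
DOWNLOAD_SOURCES=qq, kuwo,,migu
OBJECT_ENDPOINT=https://s3.example.invalid/path?x=1
"""
values = parse_env_text(sample)
snapshot = build_snapshot(values)
```

This parser does not handle quoted values or `export KEY=...` lines; if `catalogsync.env` is also sourced by the NAS shell scripts, those forms may need explicit support.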
|
||||
|
||||
```python
|
||||
# musicdl/catalogsync/ops/repository.py
|
||||
def create_config_revision(self, file_path: str, content_text: str, content_hash: str, note: str | None = None) -> int:
|
||||
with connect_database(self.db_path) as conn:
|
||||
cursor = conn.execute(
|
||||
"""
|
||||
INSERT INTO config_revisions (file_path, content_text, content_hash, note)
|
||||
VALUES (?, ?, ?, ?)
|
||||
""",
|
||||
(file_path, content_text, content_hash, note),
|
||||
)
|
||||
return int(cursor.lastrowid)
|
||||
|
||||
def get_config_revision(self, revision_id: int):
|
||||
with connect_database(self.db_path) as conn:
|
||||
return conn.execute("SELECT * FROM config_revisions WHERE id = ?", (revision_id,)).fetchone()
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Run the targeted test and verify it passes**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_config -v`
|
||||
|
||||
Expected:
|
||||
|
||||
- `OK`
|
||||
- revision apply rewrites the env file and marks the revision as applied
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/ops/config.py musicdl/catalogsync/ops/repository.py tests/catalogsync/test_ops_config.py
|
||||
git commit -m "feat: add env revision and job snapshot management"
|
||||
```
|
||||
|
||||
### Task 3: Implement The Runner State Machine And Recovery Logic
|
||||
|
||||
**Files:**
|
||||
- Create: `musicdl/catalogsync/ops/runner.py`
|
||||
- Modify: `musicdl/catalogsync/ops/models.py`
|
||||
- Modify: `musicdl/catalogsync/ops/repository.py`
|
||||
- Test: `tests/catalogsync/test_ops_runner.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing runner-state tests**
|
||||
|
||||
```python
|
||||
import tempfile
|
||||
import unittest
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
class RunnerStateTests(unittest.TestCase):
|
||||
def test_recover_orphaned_jobs_converts_running_items_to_interrupted(self):
|
||||
from musicdl.catalogsync.db import initialize_database
|
||||
from musicdl.catalogsync.ops.repository import OperationsRepository
|
||||
from musicdl.catalogsync.ops.runner import CatalogsyncRunner
|
||||
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
db_path = Path(tmpdir) / "catalogsync.db"
|
||||
initialize_database(db_path).close()
|
||||
repo = OperationsRepository(db_path)
|
||||
runner = CatalogsyncRunner(db_path=db_path, env_path=Path(tmpdir) / "catalogsync.env")
|
||||
|
||||
job_id = repo.create_job_run("download_only", {"library_root": "/tmp/library"})
|
||||
stage_id = repo.create_job_stage(job_id, "download", 1)
|
||||
item_id = repo.create_job_item(stage_id, "song", "song:1", song_id=1)
|
||||
repo.mark_job_running(job_id)
|
||||
repo.mark_stage_running(stage_id)
|
||||
repo.mark_item_running(item_id, worker_id=1)
|
||||
|
||||
runner.recover_incomplete_jobs()
|
||||
|
||||
job = repo.get_job_run(job_id)
|
||||
item = repo.get_job_item(item_id)
|
||||
|
||||
self.assertEqual("paused", job["status"])
|
||||
self.assertEqual("interrupted", item["status"])
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the targeted test and verify it fails**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_runner -v`
|
||||
|
||||
Expected:
|
||||
|
||||
- `FAIL` because the runner, command flow, and recovery helpers do not exist yet
|
||||
|
||||
- [ ] **Step 3: Implement the runner, command polling, and recovery**
|
||||
|
||||
```python
|
||||
# musicdl/catalogsync/ops/runner.py
|
||||
class CatalogsyncRunner:
|
||||
def recover_incomplete_jobs(self) -> None:
|
||||
for job in self.repository.list_recoverable_jobs():
|
||||
self.repository.pause_job_for_recovery(int(job["id"]))
|
||||
for item in self.repository.list_running_items(int(job["id"])):
|
||||
self.repository.mark_item_interrupted(int(item["id"]), last_error="Runner restarted during execution")
|
||||
self.repository.add_job_event(int(job["id"]), "recovery_requeued", "Recovered job after runner restart")
|
||||
|
||||
def apply_pending_commands(self) -> None:
|
||||
for command in self.repository.list_pending_commands():
|
||||
if command["command_type"] == "pause":
|
||||
self.repository.request_job_pause(int(command["job_run_id"]))
|
||||
elif command["command_type"] == "resume":
|
||||
self.repository.resume_job(int(command["job_run_id"]))
|
||||
elif command["command_type"] == "retry_item":
|
||||
self.repository.requeue_item(int(command["target_item_id"]), force=False)
|
||||
elif command["command_type"] == "force_retry_item":
|
||||
self.repository.requeue_item(int(command["target_item_id"]), force=True)
|
||||
self.repository.mark_command_applied(int(command["id"]))
|
||||
|
||||
def reconcile_pause_state(self, job_id: int) -> None:
|
||||
if self.repository.job_has_running_items(job_id):
|
||||
return
|
||||
self.repository.finalize_pause(job_id)
|
||||
|
||||
def loop_once(self) -> None:
|
||||
self.apply_pending_commands()
|
||||
active_job = self.repository.claim_next_runnable_job()
|
||||
if active_job is None:
|
||||
return
|
||||
self.repository.mark_job_running(int(active_job["id"]))
|
||||
```
|
||||
|
||||
```python
|
||||
# musicdl/catalogsync/ops/repository.py
|
||||
def request_job_pause(self, job_id: int) -> None:
|
||||
with connect_database(self.db_path) as conn:
|
||||
conn.execute("UPDATE job_runs SET status = 'pause_requested' WHERE id = ?", (job_id,))
|
||||
conn.execute(
|
||||
"UPDATE job_stages SET status = 'pause_requested' WHERE job_run_id = ? AND status = 'running'",
|
||||
(job_id,),
|
||||
)
|
||||
|
||||
def pause_job_for_recovery(self, job_id: int) -> None:
|
||||
with connect_database(self.db_path) as conn:
|
||||
conn.execute("UPDATE job_runs SET status = 'paused' WHERE id = ?", (job_id,))
|
||||
conn.execute(
|
||||
"UPDATE job_stages SET status = 'paused' WHERE job_run_id = ? AND status IN ('running', 'pause_requested')",
|
||||
(job_id,),
|
||||
)
|
||||
|
||||
def mark_item_interrupted(self, item_id: int, last_error: str | None = None) -> None:
|
||||
with connect_database(self.db_path) as conn:
|
||||
conn.execute(
|
||||
"UPDATE job_items SET status = 'interrupted', worker_id = NULL, ended_at = CURRENT_TIMESTAMP, last_error = ? WHERE id = ?",
|
||||
(last_error, item_id),
|
||||
)
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Run the targeted test and verify it passes**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_runner -v`
|
||||
|
||||
Expected:
|
||||
|
||||
- `OK`
|
||||
- orphaned running items become `interrupted`
|
||||
- soft pause waits until no item remains running before closing the stage
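
The soft-pause and recovery rules above can be sketched without a database. This is a minimal illustration, assuming an in-memory `InMemoryJob` stand-in (hypothetical, not part of the plan) for a `job_runs` row and its item statuses. It is not the repository implementation.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryJob:
    """Hypothetical stand-in for one job_runs row plus its item statuses."""
    status: str = "running"
    item_statuses: dict = field(default_factory=dict)  # item_id -> status


def request_pause(job: InMemoryJob) -> None:
    # A soft pause only flags the job; in-flight items keep running.
    if job.status == "running":
        job.status = "pause_requested"


def reconcile_pause(job: InMemoryJob) -> None:
    # Finalize the pause only once no item is still running.
    if job.status != "pause_requested":
        return
    if any(status == "running" for status in job.item_statuses.values()):
        return
    job.status = "paused"


def recover_after_restart(job: InMemoryJob) -> None:
    # After a restart no worker owns the running items, so mark them
    # interrupted and park the job as paused until it is resumed.
    for item_id, status in job.item_statuses.items():
        if status == "running":
            job.item_statuses[item_id] = "interrupted"
    job.status = "paused"
```

Keeping `pause_requested` distinct from `paused` is what lets the runner distinguish "stop claiming new items" from "nothing is in flight any more".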
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/ops/runner.py musicdl/catalogsync/ops/models.py musicdl/catalogsync/ops/repository.py tests/catalogsync/test_ops_runner.py
|
||||
git commit -m "feat: add operations runner state machine and recovery"
|
||||
```
|
||||
|
||||
### Task 4: Add Stage Executors And Single-Item Execution Hooks
|
||||
|
||||
**Files:**
|
||||
- Create: `musicdl/catalogsync/ops/executors.py`
|
||||
- Modify: `musicdl/catalogsync/services.py`
|
||||
- Modify: `musicdl/catalogsync/downloader.py`
|
||||
- Modify: `musicdl/catalogsync/uploader.py`
|
||||
- Modify: `musicdl/catalogsync/ops/repository.py`
|
||||
- Test: `tests/catalogsync/test_ops_executors.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing executor integration tests**
|
||||
|
||||
```python
|
||||
import tempfile
|
||||
import unittest
|
||||
from pathlib import Path
|
||||
from unittest.mock import patch
|
||||
|
||||
|
||||
class StageExecutorTests(unittest.TestCase):
|
||||
def test_download_executor_marks_item_succeeded(self):
|
||||
from musicdl.catalogsync.db import initialize_database
|
||||
from musicdl.catalogsync.ops.executors import DownloadStageExecutor
|
||||
from musicdl.catalogsync.ops.repository import OperationsRepository
|
||||
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
db_path = Path(tmpdir) / "catalogsync.db"
|
||||
library_root = Path(tmpdir) / "library"
|
||||
initialize_database(db_path, default_library_root=library_root).close()
|
||||
repo = OperationsRepository(db_path)
|
||||
stage_id = repo.create_job_stage(repo.create_job_run("download_only", {"library_root": str(library_root)}), "download", 1)
|
||||
item_id = repo.create_job_item(stage_id, "song", "song:1", song_id=1, payload_json={"row": {"id": 1, "platform": "qq"}})
|
||||
|
||||
executor = DownloadStageExecutor(db_path=db_path, library_root=library_root, download_sources=["qq"])
|
||||
with patch("musicdl.catalogsync.downloader.CatalogDownloader.download_song_row", return_value=True):
|
||||
executor.process_item(item_id=item_id, worker_name="download-1")
|
||||
|
||||
item = repo.get_job_item(item_id)
|
||||
|
||||
self.assertEqual("succeeded", item["status"])
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the targeted test and verify it fails**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_executors -v`
|
||||
|
||||
Expected:
|
||||
|
||||
- `FAIL` because the executor layer and single-item helpers do not exist yet
|
||||
|
||||
- [ ] **Step 3: Implement executor adapters and expose item-level hooks**
|
||||
|
||||
```python
|
||||
# musicdl/catalogsync/downloader.py
|
||||
class CatalogDownloader:
|
||||
def download_song_row(
|
||||
self,
|
||||
row: dict,
|
||||
library_root: str | Path,
|
||||
download_sources: list[str] | None = None,
|
||||
worker_callback=None,
|
||||
) -> bool:
|
||||
default_root = Path(library_root).resolve()
|
||||
if worker_callback:
|
||||
worker_callback(
|
||||
current_song_id=int(row["id"]),
|
||||
current_playlist_id=row.get("playlist_id"),
|
||||
current_display_text=f'{row.get("name", row["id"])} / {row.get("singers", "")}'.strip(" /"),
|
||||
)
|
||||
return self._download_one(row=row, default_root=default_root, download_sources=download_sources)
|
||||
```
|
||||
|
||||
```python
|
||||
# musicdl/catalogsync/uploader.py
|
||||
class CatalogUploader:
|
||||
def process_upload_task_row(self, task_row, backend_name: str) -> str:
|
||||
backend = self.get_backend(backend_name)
|
||||
return self._process_task(task_row, backend, uploader=None)
|
||||
```
|
||||
|
||||
```python
|
||||
# musicdl/catalogsync/services.py
|
||||
class CatalogSyncService:
|
||||
def sync_playlist_row(self, playlist_row) -> int:
|
||||
song_infos = self.resolve_playlist_song_infos(playlist_row)
|
||||
source_pool_ids = self.repository.get_pool_ids_for_playlist(int(playlist_row["id"]))
|
||||
linked_count = 0
|
||||
for source_pool_id in source_pool_ids:
|
||||
linked_count += self.store_playlist_songs(
|
||||
playlist_id=int(playlist_row["id"]),
|
||||
source_pool_id=source_pool_id,
|
||||
song_infos=song_infos,
|
||||
)
|
||||
return linked_count
|
||||
```
|
||||
|
||||
```python
|
||||
# musicdl/catalogsync/ops/executors.py
|
||||
class DownloadStageExecutor:
|
||||
def process_item(self, item_id: int, worker_name: str) -> None:
|
||||
item = self.ops_repo.claim_item(item_id=item_id, worker_name=worker_name)
|
||||
row = self.ops_repo.build_download_row(item_id)
|
||||
ok = self.downloader.download_song_row(
|
||||
row=row,
|
||||
library_root=self.library_root,
|
||||
download_sources=self.download_sources,
|
||||
worker_callback=lambda **state: self.ops_repo.update_worker_state(worker_name=worker_name, **state),
|
||||
)
|
||||
if ok:
|
||||
self.ops_repo.mark_item_succeeded(item_id)
|
||||
else:
|
||||
self.ops_repo.mark_item_failed(item_id, "download returned no file")
|
||||
|
||||
|
||||
class SyncStageExecutor:
|
||||
def process_item(self, item_id: int, worker_name: str) -> None:
|
||||
item = self.ops_repo.claim_item(item_id=item_id, worker_name=worker_name)
|
||||
playlist_row = self.ops_repo.get_playlist_row_for_item(item_id)
|
||||
linked_count = self.service.sync_playlist_row(playlist_row)
|
||||
self.ops_repo.mark_item_succeeded(item_id, result_payload={"linked_count": linked_count})
|
||||
|
||||
|
||||
class UploadStageExecutor:
|
||||
def process_item(self, item_id: int, worker_name: str) -> None:
|
||||
item = self.ops_repo.claim_item(item_id=item_id, worker_name=worker_name)
|
||||
upload_row = self.ops_repo.get_upload_row_for_item(item_id)
|
||||
result = self.uploader.process_upload_task_row(upload_row, backend_name=self.backend_name)
|
||||
if result == "succeeded":
|
||||
self.ops_repo.mark_item_succeeded(item_id)
|
||||
else:
|
||||
self.ops_repo.mark_item_failed(item_id, f"upload result: {result}")
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Run the targeted test and verify it passes**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_executors -v`
|
||||
|
||||
Expected:
|
||||
|
||||
- `OK`
|
||||
- download, sync, and upload work items can be processed one at a time
|
||||
- the item records and worker state update correctly
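
One gap worth considering in the executors is what happens when the underlying call raises instead of returning a falsy result. The sketch below shows one possible claim/work/mark wrapper that records exceptions as failures. `FakeOpsRepo` and `process_item` are hypothetical names used only for illustration; the real `OperationsRepository` methods may take different arguments.

```python
from typing import Callable


class FakeOpsRepo:
    """Hypothetical in-memory replacement for the operations repository."""

    def __init__(self) -> None:
        self.items: dict = {}

    def claim_item(self, item_id: int, worker_name: str) -> dict:
        item = self.items.setdefault(item_id, {"id": item_id})
        item.update(status="running", worker=worker_name, last_error=None)
        return item

    def mark_item_succeeded(self, item_id: int) -> None:
        self.items[item_id].update(status="succeeded", worker=None)

    def mark_item_failed(self, item_id: int, error: str) -> None:
        self.items[item_id].update(status="failed", worker=None, last_error=error)


def process_item(repo, item_id: int, worker_name: str, work: Callable[[dict], bool]) -> None:
    # Claim first so the item is visibly owned while the work runs.
    item = repo.claim_item(item_id, worker_name)
    try:
        ok = work(item)
    except Exception as exc:
        # An exception must not leave the item stuck in 'running'.
        repo.mark_item_failed(item_id, f"{type(exc).__name__}: {exc}")
        return
    if ok:
        repo.mark_item_succeeded(item_id)
    else:
        repo.mark_item_failed(item_id, "work returned no result")
```

With a wrapper like this, a crashed download shows up as `failed` with a readable `last_error` rather than waiting for runner recovery to mark it `interrupted`.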
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/downloader.py musicdl/catalogsync/uploader.py musicdl/catalogsync/services.py musicdl/catalogsync/ops/executors.py musicdl/catalogsync/ops/repository.py tests/catalogsync/test_ops_executors.py
|
||||
git commit -m "feat: add stage executors for operations console"
|
||||
```
|
||||
|
||||
### Task 5: Build The FastAPI UI And Management API
|
||||
|
||||
**Files:**
|
||||
- Create: `musicdl/catalogsync/ops/web.py`
|
||||
- Create: `musicdl/catalogsync/templates/ops/base.html`
|
||||
- Create: `musicdl/catalogsync/templates/ops/dashboard.html`
|
||||
- Create: `musicdl/catalogsync/templates/ops/jobs.html`
|
||||
- Create: `musicdl/catalogsync/templates/ops/job_detail.html`
|
||||
- Create: `musicdl/catalogsync/templates/ops/playlists.html`
|
||||
- Create: `musicdl/catalogsync/templates/ops/songs.html`
|
||||
- Create: `musicdl/catalogsync/templates/ops/logs.html`
|
||||
- Create: `musicdl/catalogsync/templates/ops/config.html`
|
||||
- Create: `musicdl/catalogsync/static/ops/app.js`
|
||||
- Modify: `setup.py`
|
||||
- Modify: `MANIFEST.in`
|
||||
- Test: `tests/catalogsync/test_ops_api.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing API and page tests**
|
||||
|
||||
```python
|
||||
import tempfile
|
||||
import unittest
|
||||
from pathlib import Path
|
||||
|
||||
from fastapi.testclient import TestClient
|
||||
|
||||
|
||||
class OperationsApiTests(unittest.TestCase):
|
||||
def test_dashboard_and_jobs_endpoints_render(self):
|
||||
from musicdl.catalogsync.db import initialize_database
|
||||
from musicdl.catalogsync.ops.web import create_app
|
||||
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
db_path = Path(tmpdir) / "catalogsync.db"
|
||||
env_path = Path(tmpdir) / "catalogsync.env"
|
||||
env_path.write_text("LIBRARY_DIR=/volume4/Music_Cloud/library\n", encoding="utf-8")
|
||||
initialize_database(db_path).close()
|
||||
client = TestClient(create_app(db_path=db_path, env_path=env_path))
|
||||
|
||||
dashboard = client.get("/dashboard")
|
||||
jobs = client.get("/api/jobs")
|
||||
|
||||
self.assertEqual(200, dashboard.status_code)
|
||||
self.assertEqual(200, jobs.status_code)
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the targeted test and verify it fails**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_api -v`
|
||||
|
||||
Expected:
|
||||
|
||||
- `FAIL` because the FastAPI app, templates, and API endpoints do not exist yet
|
||||
|
||||
- [ ] **Step 3: Implement the FastAPI app, pages, APIs, and SSE**
|
||||
|
||||
```python
# musicdl/catalogsync/ops/web.py
import json
import time
from pathlib import Path

from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse, StreamingResponse
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates

from .repository import OperationsRepository

# CatalogsyncEnvManager is assumed to be importable here; this excerpt does not show its module.


def create_app(db_path: str | Path, env_path: str | Path) -> FastAPI:
    app = FastAPI(title="Catalogsync Operations Console")
    repo = OperationsRepository(db_path)
    env_manager = CatalogsyncEnvManager(env_path=env_path, repository=repo)
    package_root = Path(__file__).resolve().parents[1]
    templates = Jinja2Templates(directory=str(package_root / "templates"))
    app.mount("/static", StaticFiles(directory=str(package_root / "static")), name="static")

    @app.get("/dashboard", response_class=HTMLResponse)
    def dashboard(request: Request):
        return templates.TemplateResponse(
            request,
            "ops/dashboard.html",
            {"title": "总览", "summary": repo.get_dashboard_summary()},  # title means "Overview"
        )

    @app.get("/api/jobs")
    def api_jobs():
        return {"items": repo.list_jobs()}

    @app.get("/api/events/stream")
    def api_events():
        def event_stream():
            while True:
                yield f"data: {json.dumps(repo.get_live_snapshot(), ensure_ascii=False)}\n\n"
                # Throttle the stream so it does not spin; the 1-second interval is an assumption.
                time.sleep(1.0)

        return StreamingResponse(event_stream(), media_type="text/event-stream")

    # The factory must return the app, otherwise TestClient(create_app(...)) receives None.
    return app
```
|
||||
|
||||
```html
|
||||
<!-- musicdl/catalogsync/templates/ops/base.html -->
|
||||
<!DOCTYPE html>
|
||||
<html lang="zh-CN">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>{{ title or "Catalogsync Console" }}</title>
|
||||
<script src="/static/ops/app.js" defer></script>
|
||||
</head>
|
||||
<body data-sse-url="/api/events/stream">
|
||||
<nav>
|
||||
<a href="/dashboard">总览</a>
|
||||
<a href="/jobs">任务中心</a>
|
||||
<a href="/playlists">歌单池</a>
|
||||
<a href="/songs">歌曲处理</a>
|
||||
<a href="/logs">日志异常</a>
|
||||
<a href="/config">配置管理</a>
|
||||
</nav>
|
||||
{% block content %}{% endblock %}
|
||||
</body>
|
||||
</html>
|
||||
```
|
||||
|
||||
```text
|
||||
# MANIFEST.in
|
||||
recursive-include musicdl/catalogsync/templates/ops *.html
|
||||
recursive-include musicdl/catalogsync/static/ops *.js
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Run the targeted test and verify it passes**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_api -v`
|
||||
|
||||
Expected:
|
||||
|
||||
- `OK`
|
||||
- page routes render
|
||||
- queue-control and config endpoints return the expected status codes
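
The `/api/events/stream` endpoint relies on Server-Sent Events framing: each event is a `data:` line followed by a blank line. The sketch below isolates that framing in a plain generator so it can be checked without FastAPI. The names `format_sse` and `snapshot_stream`, and the `max_events` and `sleep` parameters, are hypothetical and exist only to make the loop testable.

```python
import json
import time
from typing import Callable, Iterator, Optional


def format_sse(payload: dict) -> str:
    # One SSE frame: a "data:" line terminated by a blank line.
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"


def snapshot_stream(
    get_snapshot: Callable[[], dict],
    interval_seconds: float = 1.0,
    max_events: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    # Emit one frame per interval; max_events bounds the loop for tests,
    # while None keeps streaming until the client disconnects.
    sent = 0
    while max_events is None or sent < max_events:
        yield format_sse(get_snapshot())
        sent += 1
        sleep(interval_seconds)
```

A generator like this could be handed to `StreamingResponse(..., media_type="text/event-stream")`; injecting `sleep` keeps unit tests fast.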
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/ops/web.py musicdl/catalogsync/templates/ops/base.html musicdl/catalogsync/templates/ops/dashboard.html musicdl/catalogsync/templates/ops/jobs.html musicdl/catalogsync/templates/ops/job_detail.html musicdl/catalogsync/templates/ops/playlists.html musicdl/catalogsync/templates/ops/songs.html musicdl/catalogsync/templates/ops/logs.html musicdl/catalogsync/templates/ops/config.html musicdl/catalogsync/static/ops/app.js setup.py MANIFEST.in tests/catalogsync/test_ops_api.py
|
||||
git commit -m "feat: add operations console web app"
|
||||
```
|
||||
|
||||
### Task 6: Wire The CLI, Runtime Scripts, Docs, And Final Verification
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/cli.py`
|
||||
- Modify: `musicdl/catalogsync/runtime.py`
|
||||
- Modify: `requirements.txt`
|
||||
- Modify: `setup.py`
|
||||
- Modify: `scripts/catalogsync/templates/catalogsync.env.example`
|
||||
- Modify: `scripts/catalogsync/templates/install_runtime.sh`
|
||||
- Create: `scripts/catalogsync/templates/serve_console.sh`
|
||||
- Modify: `docs/catalogsync.md`
|
||||
- Modify: `README.md`
|
||||
- Modify: `tests/catalogsync/test_cli.py`
|
||||
- Modify: `tests/catalogsync/test_runtime.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing CLI and runtime tests**
|
||||
|
||||
```python
|
||||
import tempfile
|
||||
import unittest
|
||||
from pathlib import Path
|
||||
from unittest.mock import patch
|
||||
|
||||
from click.testing import CliRunner
|
||||
|
||||
|
||||
class CatalogConsoleCliTests(unittest.TestCase):
|
||||
def test_serve_command_builds_web_app(self):
|
||||
from musicdl.catalogsync.cli import cli
|
||||
|
||||
runner = CliRunner()
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
db_path = Path(tmpdir) / "catalogsync.db"
|
||||
env_path = Path(tmpdir) / "catalogsync.env"
|
||||
env_path.write_text("LIBRARY_DIR=/volume4/Music_Cloud/library\n", encoding="utf-8")
|
||||
|
||||
with patch("musicdl.catalogsync.cli.uvicorn.run") as uvicorn_run:
|
||||
result = runner.invoke(
|
||||
cli,
|
||||
[
|
||||
"serve",
|
||||
"--db",
|
||||
str(db_path),
|
||||
"--env-file",
|
||||
str(env_path),
|
||||
"--host",
|
||||
"0.0.0.0",
|
||||
"--port",
|
||||
"8421",
|
||||
],
|
||||
)
|
||||
|
||||
self.assertEqual(0, result.exit_code, msg=result.output)
|
||||
uvicorn_run.assert_called_once()
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the targeted tests and verify they fail**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_cli tests.catalogsync.test_runtime -v`
|
||||
|
||||
Expected:
|
||||
|
||||
- `FAIL` because the `serve` command and web runtime fields do not exist yet
|
||||
|
||||
- [ ] **Step 3: Implement the serve command, runtime fields, dependencies, and NAS script**
|
||||
|
||||
```python
|
||||
# musicdl/catalogsync/runtime.py
|
||||
@dataclass
|
||||
class CatalogSyncRuntimeConfig:
|
||||
root_dir: Path
|
||||
app_home: Path
|
||||
library_dir: Path
|
||||
db_path: Path
|
||||
input_dir: Path
|
||||
log_dir: Path
|
||||
python_bin: str
|
||||
venv_dir: Path
|
||||
download_layout: str
|
||||
env_file: Path
|
||||
web_host: str
|
||||
web_port: int
|
||||
```
|
||||
|
||||
```python
# musicdl/catalogsync/cli.py
# Imported at module level so the Step 1 test can patch "musicdl.catalogsync.cli.uvicorn.run";
# an import inside the function would leave no `uvicorn` attribute on this module to patch.
import uvicorn


@cli.command("serve")
@click.option("--db", "db_path", required=True, type=click.Path(dir_okay=False))
@click.option("--env-file", required=True, type=click.Path(dir_okay=False, exists=True))
@click.option("--host", default="0.0.0.0", show_default=True)
@click.option("--port", default=8421, type=int, show_default=True)
def serve_command(db_path: str, env_file: str, host: str, port: int):
    # Deferred import keeps FastAPI out of the import path of the other CLI commands.
    from .ops.web import create_app

    app = create_app(db_path=db_path, env_path=env_file)
    uvicorn.run(app, host=host, port=port)
```
|
||||
|
||||
```bash
|
||||
# scripts/catalogsync/templates/serve_console.sh
|
||||
"${VENV_DIR}/bin/python" -m musicdl.catalogsync.cli serve \
|
||||
--db "${DB_PATH}" \
|
||||
--env-file "${ENV_FILE:-${CONFIG_FILE}}" \
|
||||
--host "${WEB_HOST:-0.0.0.0}" \
|
||||
--port "${WEB_PORT:-8421}"
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Update docs and runtime templates**
|
||||
|
||||
```text
|
||||
# scripts/catalogsync/templates/catalogsync.env.example
|
||||
ENV_FILE=/volume4/Music_Cloud/catalogsync/config/catalogsync.env
|
||||
WEB_HOST=0.0.0.0
|
||||
WEB_PORT=8421
|
||||
```
|
||||
|
||||
Document in:
|
||||
|
||||
- `docs/catalogsync.md`
|
||||
- `README.md`
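
When documenting the new `WEB_HOST` and `WEB_PORT` keys, it may help to show how they resolve to a bind address. The following is a minimal sketch, assuming a simple `KEY=VALUE` env format and the defaults used by the `serve` command (`0.0.0.0` and `8421`). The function names `parse_env_text` and `resolve_web_binding` are hypothetical and are not part of `runtime.py`.

```python
def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_web_binding(env: dict) -> tuple:
    # Fall back to the same defaults as the serve command's --host/--port options.
    host = env.get("WEB_HOST") or "0.0.0.0"
    port = int(env.get("WEB_PORT") or 8421)
    if not 1 <= port <= 65535:
        raise ValueError(f"WEB_PORT out of range: {port}")
    return host, port
```

Validating the port at load time surfaces a bad env file before uvicorn tries to bind.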
|
||||
|
||||
- [ ] **Step 5: Run the full verification suite**
|
||||
|
||||
Run: `python -m unittest discover -s tests/catalogsync -v`
|
||||
|
||||
Expected:
|
||||
|
||||
- `OK`
|
||||
- all previous catalogsync tests still pass
|
||||
- the new operations-console tests pass
|
||||
|
||||
Run: `python -m musicdl.catalogsync.cli serve --help`
|
||||
|
||||
Expected:
|
||||
|
||||
- help output lists `--db`, `--env-file`, `--host`, and `--port`
|
||||
|
||||
- [ ] **Step 6: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/cli.py musicdl/catalogsync/runtime.py requirements.txt setup.py scripts/catalogsync/templates/catalogsync.env.example scripts/catalogsync/templates/install_runtime.sh scripts/catalogsync/templates/serve_console.sh docs/catalogsync.md README.md tests/catalogsync/test_cli.py tests/catalogsync/test_runtime.py
|
||||
git commit -m "feat: ship operations console runtime and docs"
|
||||
```
|
||||
@@ -0,0 +1,427 @@
|
||||
# Object Storage Upload Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Add an S3-compatible object storage upload pipeline to `musicdl.catalogsync`, persist remote locations and backend presence, and expose it through dedicated CLI commands while keeping the existing local download workflow intact.
|
||||
|
||||
**Architecture:** Extend the SQLite schema and repository so upload state is queue-driven and derived from the existing `file_assets` plus `file_locations` model. Add a focused `uploader.py` module that plans missing uploads, resolves credentials from environment variables, uploads to an object backend with limited concurrency, and records remote locations plus summary presence rows.
|
||||
|
||||
**Tech Stack:** Python stdlib (`json`, `os`, `pathlib`, `threading`, `concurrent.futures`), `sqlite3`, `click`, `boto3`, existing `musicdl.catalogsync` modules, `unittest`
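
To make "plans missing uploads" concrete: a file asset needs an upload when it has an active local location but no active location on the target backend. The sketch below runs that query against an in-memory SQLite database. The table names come from the plan, but the reduced column set and the `'local'` backend type are assumptions made for illustration.

```python
import sqlite3

# Simplified, assumed schema; the real tables carry more columns.
SCHEMA = """
CREATE TABLE storage_backends (id INTEGER PRIMARY KEY, name TEXT UNIQUE, backend_type TEXT);
CREATE TABLE file_assets (id INTEGER PRIMARY KEY, song_id INTEGER);
CREATE TABLE file_locations (
    id INTEGER PRIMARY KEY,
    file_asset_id INTEGER,
    backend_id INTEGER,
    locator TEXT,
    status TEXT
);
"""

MISSING_UPLOADS = """
SELECT fa.id AS file_asset_id, src.id AS source_location_id, src.locator
FROM file_assets fa
JOIN file_locations src ON src.file_asset_id = fa.id AND src.status = 'active'
JOIN storage_backends sb ON sb.id = src.backend_id AND sb.backend_type = 'local'
WHERE NOT EXISTS (
    SELECT 1 FROM file_locations dst
    WHERE dst.file_asset_id = fa.id AND dst.backend_id = ? AND dst.status = 'active'
)
ORDER BY fa.id
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO storage_backends VALUES (?, ?, ?)",
        [(1, "local-library", "local"), (2, "main-s3", "object_storage")],
    )
    conn.executemany("INSERT INTO file_assets VALUES (?, ?)", [(1, 10), (2, 11)])
    conn.executemany(
        "INSERT INTO file_locations VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 1, "a.flac", "active"),
            (2, 2, 1, "b.flac", "active"),
            (3, 1, 2, "music/a.flac", "active"),  # asset 1 is already uploaded
        ],
    )
    return conn


def list_missing_uploads(conn: sqlite3.Connection, target_backend_id: int) -> list:
    return conn.execute(MISSING_UPLOADS, (target_backend_id,)).fetchall()
```

Deriving candidates from existing rows this way means the upload queue can be rebuilt at any time without a separate "uploaded" flag on the asset.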
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Extend schema and repository for object storage uploads
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/db.py`
|
||||
- Modify: `musicdl/catalogsync/repository.py`
|
||||
- Modify: `tests/catalogsync/test_db.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing schema and repository tests**
|
||||
|
||||
```python
|
||||
class DatabaseSchemaTests(unittest.TestCase):
|
||||
def test_initialize_database_creates_upload_tables(self):
|
||||
from musicdl.catalogsync.db import initialize_database
|
||||
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
db_path = Path(tmpdir) / "catalogsync.db"
|
||||
conn = initialize_database(db_path)
|
||||
conn.close()
|
||||
|
||||
with closing(sqlite3.connect(db_path)) as verify_conn:
|
||||
tables = {
|
||||
row[0]
|
||||
for row in verify_conn.execute(
|
||||
"SELECT name FROM sqlite_master WHERE type = 'table'"
|
||||
).fetchall()
|
||||
}
|
||||
|
||||
self.assertIn("song_backend_presence", tables)
|
||||
self.assertIn("upload_tasks", tables)
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the focused schema tests to verify they fail**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_db -v`
|
||||
Expected: FAIL because `song_backend_presence` and `upload_tasks` do not exist yet.
|
||||
|
||||
- [ ] **Step 3: Implement schema, backend upsert, presence refresh, and upload queue helpers**
|
||||
|
||||
```python
|
||||
REQUIRED_TABLES = {
|
||||
"playlist_pools",
|
||||
"playlists",
|
||||
"pool_playlists",
|
||||
"artist_pools",
|
||||
"artists",
|
||||
"pool_artists",
|
||||
"songs",
|
||||
"playlist_songs",
|
||||
"artist_songs",
|
||||
"storage_backends",
|
||||
"file_assets",
|
||||
"file_locations",
|
||||
"download_tasks",
|
||||
"song_backend_presence",
|
||||
"upload_tasks",
|
||||
}
|
||||
|
||||
def upsert_object_storage_backend(
|
||||
self,
|
||||
name: str,
|
||||
container_name: str,
|
||||
endpoint: str,
|
||||
region: str | None,
|
||||
base_prefix: str | None,
|
||||
credential_env_prefix: str,
|
||||
addressing_style: str | None = None,
|
||||
public_base_url: str | None = None,
|
||||
) -> int:
|
||||
config = {
|
||||
"endpoint": endpoint,
|
||||
"region": region,
|
||||
"base_prefix": base_prefix,
|
||||
"addressing_style": addressing_style,
|
||||
"public_base_url": public_base_url,
|
||||
"credential_env_prefix": credential_env_prefix,
|
||||
}
|
||||
return self._upsert_backend_row(name=name, backend_type="object_storage", container_name=container_name, config=config)
|
||||
|
||||
def get_backend_by_name(self, name: str) -> sqlite3.Row | None:
|
||||
return self._fetchone("SELECT * FROM storage_backends WHERE name = ?", (name,))
|
||||
|
||||
def record_remote_file(
|
||||
self,
|
||||
file_asset_id: int,
|
||||
backend_id: int,
|
||||
container_name: str,
|
||||
locator: str,
|
||||
public_url: str | None,
|
||||
download_url: str | None,
|
||||
) -> int:
|
||||
return self._execute(
|
||||
"""
|
||||
INSERT INTO file_locations (
|
||||
file_asset_id, backend_id, container_name, locator, absolute_path,
|
||||
public_url, download_url, status, is_primary
|
||||
) VALUES (?, ?, ?, ?, NULL, ?, ?, 'active', 0)
|
||||
ON CONFLICT(file_asset_id, backend_id, locator) DO UPDATE SET
|
||||
public_url = excluded.public_url,
|
||||
download_url = excluded.download_url,
|
||||
status = excluded.status,
|
||||
updated_at = CURRENT_TIMESTAMP
|
||||
""",
|
||||
(file_asset_id, backend_id, container_name, locator, public_url, download_url),
|
||||
)
|
||||
|
||||
def refresh_song_backend_presence(self, song_id: int, backend_id: int) -> None:
|
||||
self._execute_presence_refresh(song_id=song_id, backend_id=backend_id)
|
||||
def enqueue_upload_task(
|
||||
self,
|
||||
file_asset_id: int,
|
||||
source_location_id: int,
|
||||
target_backend_id: int,
|
||||
target_container_name: str,
|
||||
target_locator: str,
|
||||
) -> int:
|
||||
return self._execute(
|
||||
"""
|
||||
INSERT INTO upload_tasks (
|
||||
file_asset_id, source_location_id, target_backend_id, target_container_name, target_locator
|
||||
) VALUES (?, ?, ?, ?, ?)
|
||||
ON CONFLICT(file_asset_id, target_backend_id, target_locator) DO NOTHING
|
||||
""",
|
||||
(file_asset_id, source_location_id, target_backend_id, target_container_name, target_locator),
|
||||
)
|
||||
|
||||
def list_pending_upload_tasks(self, target_backend_id: int, limit: int | None = None) -> list[sqlite3.Row]:
|
||||
return self._fetchall(
|
||||
"""
|
||||
SELECT ut.*, fl.absolute_path, fa.song_id
|
||||
FROM upload_tasks ut
|
||||
JOIN file_locations fl ON fl.id = ut.source_location_id
|
||||
JOIN file_assets fa ON fa.id = ut.file_asset_id
|
||||
WHERE ut.target_backend_id = ? AND ut.status IN ('pending', 'failed')
|
||||
ORDER BY ut.id ASC
|
||||
LIMIT COALESCE(?, -1)
|
||||
""",
|
||||
(target_backend_id, limit),
|
||||
)
|
||||
|
||||
def mark_upload_task_status(self, task_id: int, status: str, last_error: str | None = None) -> None:
|
||||
self._execute(
|
||||
"""
|
||||
UPDATE upload_tasks
|
||||
SET status = ?, last_error = ?, updated_at = CURRENT_TIMESTAMP
|
||||
WHERE id = ?
|
||||
""",
|
||||
(status, last_error, task_id),
|
||||
)
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Re-run the focused schema tests**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_db -v`
|
||||
Expected: PASS
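
The `ON CONFLICT ... DO NOTHING` clause in `enqueue_upload_task` makes enqueueing idempotent, but the caller still needs to know whether a row was actually added. One way to expose that, sketched here against a reduced `upload_tasks` table (the column subset is an assumption), is to check the cursor's `rowcount`. The helper names are hypothetical, and the upsert syntax requires SQLite 3.24 or newer.

```python
import sqlite3


def create_queue(conn: sqlite3.Connection) -> None:
    # Reduced, assumed version of the upload_tasks table.
    conn.execute(
        """
        CREATE TABLE upload_tasks (
            id INTEGER PRIMARY KEY,
            file_asset_id INTEGER NOT NULL,
            target_backend_id INTEGER NOT NULL,
            target_locator TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'pending',
            UNIQUE (file_asset_id, target_backend_id, target_locator)
        )
        """
    )


def enqueue(conn: sqlite3.Connection, file_asset_id: int, target_backend_id: int, target_locator: str) -> bool:
    cursor = conn.execute(
        """
        INSERT INTO upload_tasks (file_asset_id, target_backend_id, target_locator)
        VALUES (?, ?, ?)
        ON CONFLICT(file_asset_id, target_backend_id, target_locator) DO NOTHING
        """,
        (file_asset_id, target_backend_id, target_locator),
    )
    # rowcount is 1 for a new row and 0 when the conflict clause skipped the insert.
    return cursor.rowcount == 1
```

Returning a boolean lets a planner count only the tasks it newly queued when it is re-run over the same candidates.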
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/db.py musicdl/catalogsync/repository.py tests/catalogsync/test_db.py
|
||||
git commit -m "feat: add upload queue schema and repository helpers"
|
||||
```
|
||||
|
||||
### Task 2: Add the object storage uploader module with limited concurrency
|
||||
|
||||
**Files:**
|
||||
- Create: `musicdl/catalogsync/uploader.py`
|
||||
- Modify: `tests/catalogsync/test_services.py`
|
||||
|
||||
- [ ] **Step 1: Write failing uploader tests**
|
||||
|
||||
```python
|
||||
class ObjectStorageUploadTests(unittest.TestCase):
|
||||
def test_upload_runner_records_remote_location_and_presence(self):
|
||||
uploader = CatalogUploader(repository=repo, worker_count=2)
|
||||
backend_id = repo.upsert_object_storage_backend(
|
||||
name="main-s3",
|
||||
container_name="music-bucket",
|
||||
endpoint="https://s3.example.com",
|
||||
region="auto",
|
||||
base_prefix="music",
|
||||
credential_env_prefix="CATALOGSYNC_MAIN_S3",
|
||||
)
|
||||
|
||||
task_count = uploader.enqueue_missing_uploads(backend_name="main-s3")
|
||||
self.assertEqual(1, task_count)
|
||||
|
||||
with patch("musicdl.catalogsync.uploader.build_s3_client", return_value=fake_client):
|
||||
summary = uploader.run(backend_name="main-s3")
|
||||
|
||||
self.assertEqual(1, summary["succeeded"])
|
||||
self.assertTrue(repo.song_has_active_backend_file(song_id, backend_id))
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the focused uploader tests to verify they fail**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_services.ObjectStorageUploadTests -v`
|
||||
Expected: FAIL because `musicdl.catalogsync.uploader` and upload repository helpers do not exist yet.
|
||||
|
||||
- [ ] **Step 3: Implement the upload planner and S3-compatible runner**
|
||||
|
||||
```python
class CatalogUploader:
    def enqueue_missing_uploads(self, backend_name: str, song_ids: list[int] | None = None) -> int:
        backend = self.repository.get_backend_by_name(backend_name)
        if backend is None:
            raise ValueError(f"unknown storage backend: {backend_name}")
        candidates = self.repository.list_missing_object_upload_candidates(int(backend["id"]), song_ids=song_ids)
        # Count only the candidates for which enqueue_upload_task reports a truthy result.
        return sum(
            1
            for row in candidates
            if self.repository.enqueue_upload_task(
                file_asset_id=int(row["file_asset_id"]),
                source_location_id=int(row["source_location_id"]),
                target_backend_id=int(backend["id"]),
                target_container_name=backend["container_name"],
                target_locator=row["target_locator"],
            )
        )

    def run(self, backend_name: str, limit: int | None = None) -> dict[str, int]:
        # Resolve the backend row first; both the task query and the runner need it.
        backend = self.repository.get_backend_by_name(backend_name)
        if backend is None:
            raise ValueError(f"unknown storage backend: {backend_name}")
        tasks = self.repository.list_pending_upload_tasks(target_backend_id=int(backend["id"]), limit=limit)
        return self._run_tasks(tasks, backend)


class S3CompatibleUploader:
    def upload_file(self, local_path: Path, container_name: str, locator: str) -> dict[str, str | None]:
        # self.extra_args and self.public_base_url are assumed to be set from the backend config.
        self.client.upload_file(str(local_path), container_name, locator, ExtraArgs=self.extra_args or None)
        base_url = self.public_base_url
        public_url = f"{base_url.rstrip('/')}/{locator}" if base_url else None
        return {"public_url": public_url, "download_url": None}
```
|
||||
|
||||
- [ ] **Step 4: Re-run the focused uploader tests**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_services.ObjectStorageUploadTests -v`
|
||||
Expected: PASS
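
The backend row stores only a `credential_env_prefix`, so secrets stay out of SQLite and are resolved from the environment at upload time. The sketch below shows one way that lookup could work. The variable suffixes (`_ACCESS_KEY_ID`, `_SECRET_ACCESS_KEY`, `_SESSION_TOKEN`) and the names `S3Credentials` and `load_credentials` are assumptions, since the plan does not spell them out.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3Credentials:
    access_key_id: str
    secret_access_key: str
    session_token: Optional[str] = None


def load_credentials(prefix: str, environ: Optional[Mapping[str, str]] = None) -> S3Credentials:
    # Accept an explicit mapping so tests do not have to mutate os.environ.
    env = os.environ if environ is None else environ

    def require(suffix: str) -> str:
        key = f"{prefix}_{suffix}"
        value = env.get(key)
        if not value:
            raise KeyError(f"missing environment variable {key}")
        return value

    return S3Credentials(
        access_key_id=require("ACCESS_KEY_ID"),
        secret_access_key=require("SECRET_ACCESS_KEY"),
        session_token=env.get(f"{prefix}_SESSION_TOKEN") or None,
    )
```

Failing fast on a missing variable gives a clearer error than letting the S3 client reject an unsigned request later.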
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/uploader.py tests/catalogsync/test_services.py
|
||||
git commit -m "feat: add s3 compatible uploader"
|
||||
```
|
||||
|
||||
### Task 3: Expose backend registration and upload execution through the CLI
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/cli.py`
|
||||
- Modify: `tests/catalogsync/test_cli.py`
|
||||
- Modify: `setup.py`
|
||||
- Modify: `requirements.txt`
|
||||
|
||||
- [ ] **Step 1: Write failing CLI tests for backend registration and upload**
|
||||
|
||||
```python
|
||||
def test_register_object_backend_command_wires_application_method(self):
|
||||
from musicdl.catalogsync.cli import cli
|
||||
|
||||
with patch("musicdl.catalogsync.cli.CatalogSyncApplication") as app_cls:
|
||||
result = CliRunner().invoke(
|
||||
cli,
|
||||
[
|
||||
"register-object-backend",
|
||||
"--db", str(db_path),
|
||||
"--name", "main-s3",
|
||||
"--bucket", "music-bucket",
|
||||
"--endpoint", "https://s3.example.com",
|
||||
"--region", "auto",
|
||||
"--base-prefix", "music",
|
||||
"--credential-env-prefix", "CATALOGSYNC_MAIN_S3",
|
||||
],
|
||||
)
|
||||
|
||||
self.assertEqual(0, result.exit_code, msg=result.output)
|
||||
app_cls.return_value.register_object_backend.assert_called_once()
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the focused CLI tests to verify they fail**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_cli -v`
|
||||
Expected: FAIL because the new commands and application methods do not exist yet.
|
||||
|
||||
- [ ] **Step 3: Add CLI methods and commands**
|
||||
|
||||
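```python
class CatalogSyncApplication:
    def register_object_backend(self, **kwargs):
        return self.repository.upsert_object_storage_backend(**kwargs)

    def upload_files(self, backend_name: str, workers: int = 4, limit: int | None = None, enqueue_only: bool = False):
        uploader = CatalogUploader(self.repository, worker_count=workers)
        queued = uploader.enqueue_missing_uploads(backend_name=backend_name)
        return {"queued": queued} if enqueue_only else uploader.run(backend_name=backend_name, limit=limit)


# Options mirror the flags exercised by the CLI test in Step 1.
@cli.command("register-object-backend")
@click.option("--db", "db_path", required=True, type=click.Path(dir_okay=False))
@click.option("--name", required=True)
@click.option("--bucket", "container_name", required=True)
@click.option("--endpoint", required=True)
@click.option("--region", default=None)
@click.option("--base-prefix", default=None)
@click.option("--credential-env-prefix", required=True)
def register_object_backend_command(db_path: str, **backend_kwargs):
    app = CatalogSyncApplication(db_path=db_path, library_root=None)
    click.echo(app.register_object_backend(**backend_kwargs))


# The command name and the --db, --library-root, --backend, and --limit flags are assumptions;
# the plan only fixes the function signature and the --workers and --enqueue-only/--run flags.
@cli.command("upload")
@click.option("--db", "db_path", required=True, type=click.Path(dir_okay=False))
@click.option("--library-root", default=None)
@click.option("--backend", "backend_name", required=True)
@click.option("--workers", type=int, default=4, show_default=True)
@click.option("--limit", type=int, default=None)
@click.option("--enqueue-only/--run", default=False, show_default=True)
def upload_command(db_path: str, library_root: str | None, backend_name: str, workers: int, limit: int | None, enqueue_only: bool):
    app = CatalogSyncApplication(db_path=db_path, library_root=library_root)
    result = app.upload_files(
        backend_name=backend_name,
        workers=workers,
        limit=limit,
        enqueue_only=enqueue_only,
    )
    click.echo(result)
```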
|
||||
|
||||
- [ ] **Step 4: Re-run the focused CLI tests**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_cli -v`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/cli.py tests/catalogsync/test_cli.py setup.py requirements.txt
|
||||
git commit -m "feat: add upload cli commands"
|
||||
```
|
||||
|
||||
### Task 4: Add bounded concurrency to download and upload execution
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/downloader.py`
|
||||
- Modify: `musicdl/catalogsync/uploader.py`
|
||||
- Modify: `tests/catalogsync/test_services.py`
|
||||
|
||||
- [ ] **Step 1: Write failing concurrency and disk-switch tests**
|
||||
|
||||
```python
def test_catalog_downloader_reuses_new_root_after_space_prompt(self):
    downloader = CatalogDownloader(repository=repo, worker_count=2)
    with patch("builtins.input", side_effect=[str(second_root)]):
        count = downloader.download_pending(library_root=first_root, limit=2)
    self.assertEqual(2, count)

def test_catalog_uploader_uses_bounded_workers(self):
    uploader = CatalogUploader(repository=repo, worker_count=3)
    summary = uploader.run(backend_name="main-s3")
    self.assertEqual(3, summary["workers"])
```
|
||||
|
||||
- [ ] **Step 2: Run the focused concurrency tests to verify they fail**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_services -v`
|
||||
Expected: FAIL because downloader and uploader are still single-threaded or do not report bounded worker behavior yet.
|
||||
|
||||
- [ ] **Step 3: Implement limited worker pools and one-time root switching**
|
||||
|
||||
```python
class CatalogDownloader:
    def __init__(self, repository: CatalogRepository, work_dir: str = "musicdl_outputs/catalogsync", worker_count: int = 3):
        self.worker_count = max(1, worker_count)
        self._current_library_root: Path | None = None

    def ensure_space(self, root_path: str | Path, required_bytes: int | None) -> Path:
        if self._current_library_root is None:
            self._current_library_root = Path(root_path).resolve()
        root = self._current_library_root
        while required_bytes and shutil.disk_usage(root).free < required_bytes:
            # Prompt text: "Insufficient disk space; enter a new download directory to continue: "
            root = Path(input("磁盘空间不足,请输入新的下载目录继续: ").strip()).resolve()
            root.mkdir(parents=True, exist_ok=True)
            self._current_library_root = root
        return root

    def download_pending(self, library_root: str | Path, sources: list[str] | None = None, limit: int | None = None, playlist_ids: list[int] | None = None) -> int:
        with ThreadPoolExecutor(max_workers=self.worker_count) as executor:
            futures = [executor.submit(self._download_one, row, default_root) for row in queue]
            return sum(1 for future in as_completed(futures) if future.result())
```
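Once `download_pending` fans out to several threads, `ensure_space` may be entered concurrently, so the operator could in principle be prompted more than once. The following is a minimal sketch of one way to serialize the check. `RootSwitcher`, `run_bounded`, and the injected `free_bytes` and `prompt` callables are hypothetical names for illustration and are not part of the plan.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable


class RootSwitcher:
    """Hypothetical helper: serializes the free-space check across worker threads."""

    def __init__(self, root: Path, free_bytes: Callable[[Path], int], prompt: Callable[[], str]):
        self._root = root
        self._free_bytes = free_bytes  # e.g. lambda p: shutil.disk_usage(p).free
        self._prompt = prompt          # e.g. lambda: input("new download directory: ")
        self._lock = threading.Lock()

    def ensure_space(self, required_bytes: int) -> Path:
        # Only one thread at a time may inspect free space or prompt the operator.
        with self._lock:
            while required_bytes and self._free_bytes(self._root) < required_bytes:
                self._root = Path(self._prompt().strip())
            return self._root


def run_bounded(items: Iterable, worker: Callable, worker_count: int = 3) -> int:
    """Run worker(item) on at most worker_count threads and count truthy results."""
    with ThreadPoolExecutor(max_workers=max(1, worker_count)) as executor:
        futures = [executor.submit(worker, item) for item in items]
        return sum(1 for future in as_completed(futures) if future.result())
```

Because the lock wraps both the check and the prompt, threads that arrive while the operator is typing simply wait and then reuse the newly chosen root.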
|
||||
|
||||
- [ ] **Step 4: Re-run the focused concurrency tests**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_services -v`
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/downloader.py musicdl/catalogsync/uploader.py tests/catalogsync/test_services.py
|
||||
git commit -m "feat: add bounded download and upload concurrency"
|
||||
```
|
||||
|
||||
### Task 5: Document the operator workflow and run final verification
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/catalogsync.md`
|
||||
- Modify: `README.md`
|
||||
|
||||
- [ ] **Step 1: Update operator docs for object storage upload**
|
||||
|
||||
```markdown
|
||||
## Object Storage Upload
|
||||
|
||||
1. Register one backend with `musicdl-catalogsync register-object-backend`
|
||||
2. Export `${PREFIX}_ACCESS_KEY_ID` and `${PREFIX}_SECRET_ACCESS_KEY`
|
||||
3. Run `musicdl-catalogsync upload --backend main-s3`
|
||||
4. Inspect `file_locations`, `song_backend_presence`, and `upload_tasks`
|
||||
```
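The credential step above relies on two environment variables derived from a backend prefix. As a rough sketch of how an uploader might read them, assuming a hypothetical `load_backend_credentials` helper (the real uploader may resolve credentials differently):

```python
import os
from collections.abc import Mapping
from typing import Optional


def load_backend_credentials(prefix: str, environ: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Hypothetical helper: read <PREFIX>_ACCESS_KEY_ID and <PREFIX>_SECRET_ACCESS_KEY."""
    env = os.environ if environ is None else environ
    names = (f"{prefix}_ACCESS_KEY_ID", f"{prefix}_SECRET_ACCESS_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        # Report which variables are missing, but never echo secret values.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

Failing early with the variable names makes a misconfigured prefix visible before any upload task is claimed.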
|
||||
|
||||
- [ ] **Step 2: Verify help output and test suite**
|
||||
|
||||
Run: `python -m unittest discover -s tests/catalogsync -v`
|
||||
Expected: PASS
|
||||
|
||||
Run: `python -m musicdl.catalogsync.cli run --help`
|
||||
Expected: the command exits successfully and the help text lists the existing run options
|
||||
|
||||
Run: `python -m musicdl.catalogsync.cli upload --help`
|
||||
Expected: the command exits successfully and the help text lists the object storage upload options
|
||||
|
||||
- [ ] **Step 3: Commit**
|
||||
|
||||
```bash
|
||||
git add docs/catalogsync.md README.md
|
||||
git commit -m "docs: add object storage upload workflow"
|
||||
```
|
||||
@@ -0,0 +1,276 @@
|
||||
# Task Tree Dashboard Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Replace the dashboard Task Center detail tables with a stable task -> playlist -> song tree that updates node state in place.
|
||||
|
||||
**Architecture:** Keep the existing FastAPI endpoints and lazy playlist-song endpoint, but change the repository task query to keep finished tasks visible and change the dashboard frontend from table redraws to keyed tree-node patching. The top dashboard cards remain unchanged in this iteration.
|
||||
|
||||
**Tech Stack:** Python, FastAPI, Jinja2 templates, vanilla JavaScript, unittest
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Keep finished tasks in the Task Center query
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/ops/repository.py`
|
||||
- Modify: `tests/catalogsync/test_ops_repository.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing repository test**
|
||||
|
||||
```python
def test_list_task_center_rows_includes_completed_jobs(self):
    from musicdl.catalogsync.db import initialize_database
    from musicdl.catalogsync.ops.models import JobStatus
    from musicdl.catalogsync.ops.repository import OpsRepository

    with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
        db_path = Path(tmpdir) / "catalogsync.db"
        initialize_database(db_path).close()
        repo = OpsRepository(db_path)

        completed_job_id = repo.create_job(
            job_type="download_only",
            config_snapshot={},
            status=JobStatus.COMPLETED,
            playlist_scope={"playlist_ids": [42]},
        )

        rows = repo.list_task_center_rows(limit=20)

        rows_by_id = {int(row["id"]): row for row in rows}
        self.assertIn(completed_job_id, rows_by_id)
        self.assertEqual("completed", rows_by_id[completed_job_id]["status"])
```
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_repository.OpsRepositoryTaskCenterTests.test_list_task_center_rows_includes_completed_jobs -v`
|
||||
|
||||
Expected: FAIL because completed jobs are filtered out of `list_task_center_rows()`.
|
||||
|
||||
- [ ] **Step 3: Write the minimal implementation**
|
||||
|
||||
```python
rows = self._fetchall(
    """
    SELECT *
    FROM job_runs
    WHERE status IN (?, ?, ?, ?, ?, ?, ?, ?)
    ...
    """,
    (
        JobStatus.RUNNING.value,
        JobStatus.PAUSE_REQUESTED.value,
        JobStatus.QUEUED.value,
        JobStatus.PAUSED.value,
        JobStatus.COMPLETED.value,
        JobStatus.COMPLETED_WITH_ERRORS.value,
        JobStatus.FAILED.value,
        JobStatus.CANCELED.value,
        ...
    ),
)
```
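The query above hand-counts eight `?` placeholders, which must stay in step with the parameter tuple. A minimal sketch of generating the placeholders from the status list instead, using an in-memory-friendly `sqlite3` query. The `JobStatus` enum here is a stand-in (only the `"completed"` value is confirmed by the test above; the others are assumed), and `fetch_rows_by_status` is a hypothetical name.

```python
import sqlite3
from enum import Enum


class JobStatus(Enum):
    # Stand-in for the real enum; values other than "completed" are assumptions.
    RUNNING = "running"
    QUEUED = "queued"
    COMPLETED = "completed"
    FAILED = "failed"


def fetch_rows_by_status(conn: sqlite3.Connection, statuses: list[JobStatus], limit: int) -> list:
    """Hypothetical helper: select job_runs rows whose status is in `statuses`."""
    if not statuses:
        return []
    # One "?" per status, so the placeholder count always matches the parameters.
    placeholders = ", ".join("?" for _ in statuses)
    sql = f"SELECT * FROM job_runs WHERE status IN ({placeholders}) ORDER BY id DESC LIMIT ?"
    params = [status.value for status in statuses] + [limit]
    return conn.execute(sql, params).fetchall()
```

Only the placeholder string is interpolated into the SQL; the status values themselves still travel as bound parameters.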
|
||||
|
||||
- [ ] **Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_repository.OpsRepositoryTaskCenterTests.test_list_task_center_rows_includes_completed_jobs -v`
|
||||
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/ops/repository.py tests/catalogsync/test_ops_repository.py
|
||||
git commit -m "test: keep completed tasks in task center"
|
||||
```
|
||||
|
||||
### Task 2: Lock dashboard HTML to the tree shell
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/templates/ops/dashboard.html`
|
||||
- Modify: `tests/catalogsync/test_ops_api.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing dashboard HTML test**
|
||||
|
||||
```python
def test_dashboard_page_renders_task_tree_shell_without_detail_tables(self):
    from musicdl.catalogsync.ops.models import JobStatus
    from musicdl.catalogsync.ops.repository import OpsRepository

    client, db_path, _ = self._build_client()
    repo = OpsRepository(db_path)
    job_id = repo.create_job(
        job_type="download_only",
        config_snapshot={},
        status=JobStatus.RUNNING,
    )

    response = client.get("/dashboard")
    html = response.text

    self.assertIn('data-task-tree-root', html)
    self.assertIn(f'data-task-node="{job_id}"', html)
    self.assertNotIn("<h3>Summary</h3>", html)
    self.assertNotIn("<h3>Stages</h3>", html)
    self.assertNotIn("<h3>Workers</h3>", html)
    self.assertNotIn("<h3>Running Items</h3>", html)
```
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_api.OperationsApiTests.test_dashboard_page_renders_task_tree_shell_without_detail_tables -v`
|
||||
|
||||
Expected: FAIL because the template still renders the table/detail shell.
|
||||
|
||||
- [ ] **Step 3: Write the minimal template implementation**
|
||||
|
||||
```html
<div class="card">
  <h2>Task Center</h2>
  <div class="task-tree" data-task-tree-root>
    {% for row in task_rows %}
    <section class="task-node" data-task-node="{{ row.id }}">
      <div class="task-node__header">
        <button type="button" data-task-toggle="{{ row.id }}">+</button>
        <div class="task-node__meta">
          <strong>{{ row.display_name }}</strong>
          <div class="muted">{{ row.job_type }}</div>
        </div>
      </div>
      <div class="task-node__children" data-task-children="{{ row.id }}" hidden></div>
    </section>
    {% endfor %}
  </div>
</div>
```
|
||||
|
||||
- [ ] **Step 4: Run test to verify it passes**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_api.OperationsApiTests.test_dashboard_page_renders_task_tree_shell_without_detail_tables -v`
|
||||
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/templates/ops/dashboard.html tests/catalogsync/test_ops_api.py
|
||||
git commit -m "feat: replace task center table with tree shell"
|
||||
```
|
||||
|
||||
### Task 3: Replace Task Center redraw with keyed tree patching
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/static/ops/app.js`
|
||||
- Modify: `musicdl/catalogsync/templates/ops/base.html`
|
||||
|
||||
- [ ] **Step 1: Add the tree patch helpers**
|
||||
|
||||
```javascript
function upsertTaskTree(rows) {
  var root = document.querySelector("[data-task-tree-root]");
  if (!root) {
    return;
  }
  var seen = {};
  rows.forEach(function (row) {
    var id = String(row.id);
    seen[id] = true;
    var node = ensureTaskNode(root, row);
    patchTaskNode(node, row);
  });
  pruneMissingTaskNodes(root, seen);
}
```
|
||||
|
||||
- [ ] **Step 2: Switch `updateDashboard()` away from `setTaskRows()`**
|
||||
|
||||
```javascript
if (Object.prototype.hasOwnProperty.call(payload, "task_rows")) {
  dashboardState.taskRows = payload.task_rows || [];
  pruneTaskState(dashboardState.taskRows);
  upsertTaskTree(dashboardState.taskRows);
  restoreExpandedTaskRows();
}
```
|
||||
|
||||
- [ ] **Step 3: Rebuild expanded task rendering as playlist nodes only**
|
||||
|
||||
```javascript
function applyTaskDetail(jobId, payload) {
  dashboardState.detailCache[String(jobId)] = payload;
  patchPlaylistTree(String(jobId), payload.playlist_progress || []);
  restoreExpandedPlaylistRows(String(jobId));
}
```
|
||||
|
||||
- [ ] **Step 4: Rebuild playlist song rendering as song child nodes**
|
||||
|
||||
```javascript
function applyPlaylistSongs(jobId, playlistId, songs) {
  var key = playlistKey(jobId, playlistId);
  var body = document.querySelector('[data-playlist-song-list="' + key + '"]');
  patchSongTree(body, songs || []);
}
```
|
||||
|
||||
- [ ] **Step 5: Version the static asset to force the browser to pick up the new script**
|
||||
|
||||
```html
|
||||
<script src="/static/ops/app.js?v=20260417_task_tree_v3" defer></script>
|
||||
```
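The `?v=` value above is edited by hand on each release. One alternative, sketched here under the assumption that the template can call a helper at render time, is to derive the value from the file's content so it changes only when the script does. `versioned_static_url` is a hypothetical name, not an existing function in the project.

```python
import hashlib
from pathlib import Path


def versioned_static_url(static_root: Path, relative_path: str, url_prefix: str = "/static") -> str:
    """Hypothetical helper: append a short content hash as the ?v= cache-busting value."""
    digest = hashlib.sha256((static_root / relative_path).read_bytes()).hexdigest()[:12]
    return f"{url_prefix}/{relative_path}?v={digest}"
```

With this approach, `versioned_static_url(static_root, "ops/app.js")` would yield a URL of the form `/static/ops/app.js?v=<hash>`; the hand-written version string remains the simpler choice if hashing on every render is undesirable.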
|
||||
|
||||
- [ ] **Step 6: Run targeted API tests**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_api tests.catalogsync.test_ops_repository -v`
|
||||
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 7: Commit**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/static/ops/app.js musicdl/catalogsync/templates/ops/base.html
|
||||
git commit -m "feat: patch dashboard task tree in place"
|
||||
```
|
||||
|
||||
### Task 4: Final verification and docs sync
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/catalogsync.md`
|
||||
|
||||
- [ ] **Step 1: Document the new dashboard behavior**
|
||||
|
||||
```markdown
|
||||
## Task Center
|
||||
|
||||
The dashboard Task Center now renders a tree:
|
||||
|
||||
- task
  - playlist
    - song
|
||||
|
||||
Task state updates patch the existing node in place. Expanding a task no longer renders Summary, Stages, Workers, or Running Items tables.
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run regression verification**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_api tests.catalogsync.test_ops_repository -v`
|
||||
|
||||
Expected: PASS
|
||||
|
||||
- [ ] **Step 3: Manual browser verification**
|
||||
|
||||
Run the dashboard, expand one task and one playlist, wait through multiple refresh cycles, and verify:
|
||||
|
||||
- no large detail tables appear
|
||||
- paused/completed tasks stay visible
|
||||
- expanded nodes remain expanded
|
||||
- the task tree does not visibly flash as a full block
|
||||
|
||||
- [ ] **Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add docs/catalogsync.md
|
||||
git commit -m "docs: describe task tree dashboard"
|
||||
```
|
||||
@@ -0,0 +1,61 @@
|
||||
# Playlist Export On Download Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
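**Goal:** Move playlist export directory generation from the `sync` stage to the point after the selected playlists finish downloading, and add an "Export Selected Playlists" button that routes each playlist's export according to its state.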
|
||||
|
||||
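**Architecture:** The backend treats the playlist state in `CatalogRepository` as the source of truth and splits the selected playlists into three groups: direct export, `download_only`, and `sync_download`. The `sync` stage no longer writes `playlists/`; instead, the export directories of the affected playlists are refreshed together once a playlist-scoped download stage finishes.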
|
||||
|
||||
**Tech Stack:** Python, FastAPI, SQLite, Jinja2, vanilla JavaScript, pytest
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Lock behavior with tests
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/catalogsync/test_services.py`
|
||||
- Modify: `tests/catalogsync/test_ops_runner.py`
|
||||
- Modify: `tests/catalogsync/test_ops_api.py`
|
||||
|
||||
- [ ] Write a failing test proving that `sync_playlist_row()` no longer writes `playlists/` directly
|
||||
- [ ] Write a failing test proving that the playlist export directory is refreshed after a scoped download job completes
|
||||
- [ ] Write a failing test proving that `POST /api/playlists/export` routes playlists by state
|
||||
|
||||
### Task 2: Implement backend export routing
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/repository.py`
|
||||
- Modify: `musicdl/catalogsync/services.py`
|
||||
- Modify: `musicdl/catalogsync/ops/web.py`
|
||||
|
||||
- [ ] Add a repository method that looks up export state by playlist id
|
||||
- [ ] Remove the automatic export call from `sync_playlist_row()`
|
||||
- [ ] Add `POST /api/playlists/export`
|
||||
- [ ] Reuse the existing job-creation logic to return `download_only` / `sync_download` job info
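
The three-way split from the Architecture note can be sketched as a pure function. The `PlaylistState` fields below (`synced`, `pending_downloads`) and the `export_now` key are assumptions made for illustration; the real repository method may expose different state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaylistState:
    # Hypothetical shape; the real export-state lookup may return other fields.
    playlist_id: int
    synced: bool            # the track list has been fetched at least once
    pending_downloads: int  # songs in the playlist that are not downloaded yet


def route_export(states: list[PlaylistState]) -> dict[str, list[int]]:
    """Split selected playlists into direct export, download_only, and sync_download."""
    routed: dict[str, list[int]] = {"export_now": [], "download_only": [], "sync_download": []}
    for state in states:
        if not state.synced:
            routed["sync_download"].append(state.playlist_id)
        elif state.pending_downloads > 0:
            routed["download_only"].append(state.playlist_id)
        else:
            routed["export_now"].append(state.playlist_id)
    return routed
```

Keeping the routing free of I/O makes it straightforward to cover with the failing tests from Task 1 before wiring it into the endpoint.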
|
||||
|
||||
### Task 3: Export after scoped download completes
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/ops/runner.py`
|
||||
|
||||
- [ ] After the download stage finishes, refresh the playlist directories for `playlist_scope.playlist_ids`
|
||||
- [ ] Apply this only to scoped download jobs
|
||||
- [ ] On error, only record an event; do not alter the download stage's main status
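
A minimal sketch of the post-download refresh, assuming the runner can pass in a per-playlist refresh callable and an event recorder. The function name, the callables, and the `playlist_export_failed` event kind are hypothetical.

```python
from typing import Callable, Optional


def refresh_exports_after_download(
    playlist_scope: Optional[dict],
    refresh_one: Callable[[int], None],
    record_event: Callable[[str, str], None],
) -> int:
    """Refresh export directories for a scoped job and return how many succeeded."""
    playlist_ids = (playlist_scope or {}).get("playlist_ids") or []
    refreshed = 0
    for playlist_id in playlist_ids:  # empty for unscoped jobs, so nothing runs
        try:
            refresh_one(playlist_id)
            refreshed += 1
        except Exception as exc:  # deliberately broad: an export failure must not fail the stage
            record_event("playlist_export_failed", f"playlist {playlist_id}: {exc}")
    return refreshed
```

The function never raises on a refresh failure, so the caller's stage status is decided by the download results alone.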
|
||||
|
||||
### Task 4: Update playlist UI
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/templates/ops/playlists.html`
|
||||
- Modify: `musicdl/catalogsync/static/ops/app.js`
|
||||
|
||||
- [ ] Add an `Export Selected Playlists` button
|
||||
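- [ ] Have the frontend handle the response, reporting the number of directly exported playlists and any newly created jobs separately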
|
||||
- [ ] Keep the single-playlist `Export Folder` dialog button working
|
||||
|
||||
### Task 5: Verify
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/catalogsync.md`
|
||||
|
||||
- [ ] Run the relevant pytest cases
|
||||
- [ ] Update the project docs to state that sync does not export, download exports, and export selected routes by state
|
||||
@@ -0,0 +1,168 @@
|
||||
# Task Center Bandwidth Summary Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Show real-time aggregate download speed next to the Task Center heading, while rendering upload speed as a placeholder.
|
||||
|
||||
**Architecture:** Reuse existing per-worker `speed_bytes_per_sec` values already stored in `job_workers`, aggregate them in the dashboard payload as `transfer_stats`, and render/update a single Task Center header node from server-rendered HTML plus dashboard refreshes. Upload remains explicitly unimplemented and is shown as a placeholder string.
|
||||
|
||||
**Tech Stack:** FastAPI, Jinja2, vanilla JavaScript, `unittest`
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Lock the API and page contract with tests
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/catalogsync/test_ops_api.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing API test**
|
||||
|
||||
Add coverage that seeds a running download worker with `speed_bytes_per_sec=2 * 1024 * 1024`, calls `/api/dashboard`, and asserts:
|
||||
|
||||
```python
|
||||
self.assertEqual("2.0 MB/s", payload["transfer_stats"]["download_speed_text"])
|
||||
self.assertEqual("-", payload["transfer_stats"]["upload_speed_text"])
|
||||
```
|
||||
|
||||
Also render `/dashboard` and assert the Task Center area includes:
|
||||
|
||||
```python
|
||||
self.assertIn("Task Center", html)
|
||||
self.assertIn("Down 2.0 MB/s", html)
|
||||
self.assertIn("Up -", html)
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_api.OperationsApiTests.test_dashboard_transfer_stats_exposes_download_speed_and_upload_placeholder -v`
|
||||
|
||||
Expected: FAIL because `transfer_stats` and the new header text do not exist yet.
|
||||
|
||||
### Task 2: Lock the browser refresh behavior with a frontend test
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/catalogsync/test_ops_frontend.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing frontend test**
|
||||
|
||||
Expose `updateDashboard`, create a fake `[data-task-center-transfer]` node, call:
|
||||
|
||||
```javascript
api.updateDashboard({
  transfer_stats: {
    download_speed_text: "2.0 MB/s",
    upload_speed_text: "-"
  }
});
```
|
||||
|
||||
and assert the node text becomes:
|
||||
|
||||
```javascript
|
||||
"Down 2.0 MB/s | Up -"
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_frontend.OperationsFrontendTests.test_update_dashboard_refreshes_task_center_transfer_summary -v`
|
||||
|
||||
Expected: FAIL because no Task Center transfer node is updated yet.
|
||||
|
||||
### Task 3: Implement backend aggregation and initial render
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/ops/web.py`
|
||||
- Modify: `musicdl/catalogsync/templates/ops/dashboard.html`
|
||||
|
||||
- [ ] **Step 1: Add backend transfer aggregation**
|
||||
|
||||
In `musicdl/catalogsync/ops/web.py`, extend worker serialization to include numeric speed fields and add a helper that returns:
|
||||
|
||||
```python
{
    "download_speed_bytes_per_sec": ...,
    "download_speed_text": ...,
    "upload_speed_bytes_per_sec": 0,
    "upload_speed_text": "-",
}
```
|
||||
|
||||
using the sum of active download worker `speed_bytes_per_sec`.
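
As a rough sketch of that helper, assuming the caller has already selected the active download workers and passes only their speeds. `format_speed` and `build_transfer_stats` are hypothetical names, and binary units (1 MB = 1024 * 1024 bytes) are assumed because the test above expects `2 * 1024 * 1024` to render as `2.0 MB/s`.

```python
def format_speed(bytes_per_sec: float) -> str:
    """Format a byte rate with binary units, one decimal place above 1 KB/s."""
    value = max(0.0, float(bytes_per_sec))
    if value < 1024:
        return f"{int(value)} B/s"
    for unit in ("KB/s", "MB/s"):
        value /= 1024
        if value < 1024:
            return f"{value:.1f} {unit}"
    return f"{value / 1024:.1f} GB/s"


def build_transfer_stats(active_download_speeds: list[int]) -> dict:
    """Aggregate per-worker speeds; upload stays an explicit placeholder."""
    total = sum(int(speed or 0) for speed in active_download_speeds)
    return {
        "download_speed_bytes_per_sec": total,
        "download_speed_text": format_speed(total),
        "upload_speed_bytes_per_sec": 0,
        "upload_speed_text": "-",
    }
```

With no active workers the text falls back to `0 B/s`, which matches the default the frontend uses when the download speed is missing.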
|
||||
|
||||
- [ ] **Step 2: Add the server-rendered Task Center summary node**
|
||||
|
||||
In `musicdl/catalogsync/templates/ops/dashboard.html`, change the Task Center heading area to include:
|
||||
|
||||
```html
|
||||
<span class="muted" data-task-center-transfer>Down {{ transfer_stats.download_speed_text }} | Up {{ transfer_stats.upload_speed_text }}</span>
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Run the API test to verify it passes**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_api.OperationsApiTests.test_dashboard_transfer_stats_exposes_download_speed_and_upload_placeholder -v`
|
||||
|
||||
Expected: PASS
|
||||
|
||||
### Task 4: Implement live refresh wiring
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/static/ops/app.js`
|
||||
|
||||
- [ ] **Step 1: Update the dashboard refresh path**
|
||||
|
||||
In `updateDashboard(payload)`, look up `[data-task-center-transfer]` and set:
|
||||
|
||||
```javascript
|
||||
"Down " + downloadSpeedText + " | Up " + uploadSpeedText
|
||||
```
|
||||
|
||||
with defaults of `"0 B/s"` for missing download speed and `"-"` for upload.
|
||||
|
||||
- [ ] **Step 2: Run the frontend test to verify it passes**
|
||||
|
||||
Run: `python -m unittest tests.catalogsync.test_ops_frontend.OperationsFrontendTests.test_update_dashboard_refreshes_task_center_transfer_summary -v`
|
||||
|
||||
Expected: PASS
|
||||
|
||||
### Task 5: Regression verification
|
||||
|
||||
**Files:**
|
||||
- Modify: none
|
||||
|
||||
- [ ] **Step 1: Run focused regression**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python -m unittest tests.catalogsync.test_ops_api tests.catalogsync.test_ops_frontend -v
|
||||
node -e "new Function(require('fs').readFileSync('musicdl/catalogsync/static/ops/app.js','utf8')); console.log('app.js syntax ok')"
|
||||
```
|
||||
|
||||
Expected: all selected tests PASS and `app.js syntax ok` prints.
|
||||
|
||||
- [ ] **Step 2: Sync to NAS and restart**
|
||||
|
||||
Sync these files to `/volume4/Music_Cloud/catalogsync/app/...`:
|
||||
|
||||
```text
|
||||
musicdl/catalogsync/ops/web.py
|
||||
musicdl/catalogsync/templates/ops/dashboard.html
|
||||
musicdl/catalogsync/static/ops/app.js
|
||||
```
|
||||
|
||||
Then restart:
|
||||
|
||||
```bash
|
||||
nohup bash /volume4/Music_Cloud/catalogsync/bin/serve_console.sh >/dev/null 2>&1 &
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Verify deployed page**
|
||||
|
||||
Check:
|
||||
|
||||
```text
|
||||
http://127.0.0.1:18080/dashboard
|
||||
http://127.0.0.1:18080/api/dashboard?include_task_rows=false
|
||||
```
|
||||
|
||||
Expected: the Task Center heading shows `Down ... | Up -`.
|
||||
@@ -0,0 +1,40 @@
|
||||
# Download Runner And Dashboard Fixes Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Fix catalog-sync so download jobs start promptly on NAS and dashboard download speed only reflects truly active download workers.
|
||||
|
||||
**Architecture:** Tighten dashboard worker selection to current running items only, and remove the pre-worker playlist export refresh that can block a download stage before any download worker starts. Keep playlist export behavior during item completion and stage finalization.
|
||||
|
||||
**Tech Stack:** Python, sqlite3, unittest, FastAPI ops dashboard
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Lock The Regression With Tests
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/catalogsync/test_ops_api.py`
|
||||
- Modify: `tests/catalogsync/test_ops_runner.py`
|
||||
|
||||
- [ ] Add a dashboard API regression test that seeds one real running download worker plus stale historical workers and expects transfer speed to only include the live worker.
|
||||
- [ ] Add a runner regression test that keeps a stage open with pending downloads and expects already-completed playlist exports not to run before pending download workers start.
|
||||
- [ ] Run targeted tests first and confirm they fail for the expected reason.
|
||||
|
||||
### Task 2: Apply The Minimal Fix
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/ops/web.py`
|
||||
- Modify: `musicdl/catalogsync/ops/runner.py`
|
||||
|
||||
- [ ] Restrict dashboard worker rows to workers whose current job item is still running under an active job.
|
||||
- [ ] Remove the pre-worker playlist artifact refresh from download stage startup so worker claiming is not blocked by export work.
|
||||
- [ ] Keep existing per-playlist export on item completion and full export refresh on stage completion.
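
The worker restriction in the first item can be pictured as a join that drops stale rows. This is a sketch against an assumed schema: `job_workers` and `job_runs` are named in these plans, but the `job_items` table, the `current_item_id` column, and the set of active job statuses are assumptions.

```python
import sqlite3

# Assumed set of job statuses that count as "active".
ACTIVE_JOB_STATUSES = ("running", "pause_requested")

LIVE_SPEED_SQL = """
SELECT COALESCE(SUM(w.speed_bytes_per_sec), 0)
FROM job_workers AS w
JOIN job_items AS i ON i.id = w.current_item_id
JOIN job_runs AS j ON j.id = i.job_id
WHERE i.status = 'running'
  AND j.status IN (?, ?)
"""


def live_download_speed(conn: sqlite3.Connection) -> int:
    """Sum speeds only for workers whose current item is running under an active job."""
    row = conn.execute(LIVE_SPEED_SQL, ACTIVE_JOB_STATUSES).fetchone()
    return int(row[0])
```

Workers with no current item, a finished item, or an item that belongs to a finished job fall out of the join, so historical rows cannot inflate the reported speed.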
|
||||
|
||||
### Task 3: Verify The Fix
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/catalogsync/test_ops_api.py`
|
||||
- Modify: `tests/catalogsync/test_ops_runner.py`
|
||||
|
||||
- [ ] Run the focused regression tests and confirm they pass.
|
||||
- [ ] Run a slightly wider ops test slice to catch nearby regressions.
|
||||
@@ -0,0 +1,765 @@
|
||||
# Playlist Export Local ZIP Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
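**Goal:** Unify `Export` as "export to a local ZIP on the frontend": the server first ensures the NAS `playlists/` directory exists, then packages a ZIP and returns it to the browser for download.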
|
||||
|
||||
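**Architecture:** Reuse the existing `playlist_artifacts.py` directory-generation pipeline rather than assembling the ZIP directly from the database. Add a lightweight `export_bundles.py` that handles ZIP file names, the temporary bundle directory, and the packaging logic; `ops/web.py` handles only state routing, the download endpoints, and HTTP responses; the frontend `app.js` only triggers the download or shows a "queued" status.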
|
||||
|
||||
**Tech Stack:** Python, FastAPI, Starlette `FileResponse`, `zipfile`, SQLite, vanilla JavaScript, Node-based frontend tests, pytest
|
||||
|
||||
---
|
||||
|
||||
## File Structure
|
||||
|
||||
- Create: `musicdl/catalogsync/export_bundles.py`
  - Responsible for:
    - Generating single-playlist / multi-playlist ZIP file names
    - Packaging existing playlist directories into a ZIP
    - Writing the ZIP into a temporary bundle directory on the server
    - Resolving a bundle path from its token
- Modify: `musicdl/catalogsync/ops/web.py`
  - Responsible for:
    - `GET /api/playlists/{playlist_id}/export.zip`
    - `POST /api/playlists/export-zip`
    - `GET /api/exports/bundles/{token}.zip`
    - Keeping, but de-emphasizing, `GET /api/playlists/{playlist_id}/export-folder`
    - Optionally keeping `POST /api/playlists/export` for compatibility with the old frontend, while routing it internally to the new logic
- Modify: `musicdl/catalogsync/templates/ops/playlists.html`
  - Responsible for changing the button labels from `Export Folder` / `Export Selected Playlists` to `Export` / `Export Selected`
- Modify: `musicdl/catalogsync/static/ops/app.js`
  - Responsible for:
    - Downloading the ZIP directly for a single-playlist export
    - For batch export, deciding from the backend response whether to download the ZIP automatically or show a "queued" notice
    - No longer treating `Export` as "generate the directory on the NAS only"
- Modify: `tests/catalogsync/test_ops_api.py`
  - Responsible for backend API behavior tests
- Create: `tests/catalogsync/test_export_bundles.py`
  - Responsible for pure-logic tests of the ZIP packaging helper
- Modify: `tests/catalogsync/test_ops_frontend.py`
  - Responsible for frontend button label and download flow tests
- Modify: `docs/catalogsync.md`
  - Responsible for documenting the updated `Export` semantics
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Add ZIP Bundle Helper
|
||||
|
||||
**Files:**
|
||||
- Create: `musicdl/catalogsync/export_bundles.py`
|
||||
- Test: `tests/catalogsync/test_export_bundles.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing tests**
|
||||
|
||||
```python
import tempfile
import unittest
from pathlib import Path
import zipfile


class ExportBundleTests(unittest.TestCase):
    def test_build_single_playlist_zip_creates_expected_top_level_folder(self):
        from musicdl.catalogsync.export_bundles import create_single_playlist_bundle

        with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
            root = Path(tmpdir)
            playlist_dir = root / "Playlist A_100"
            (playlist_dir / "covers").mkdir(parents=True, exist_ok=True)
            (playlist_dir / "playlist.yaml").write_text("playlist_id: 100\n", encoding="utf-8")
            (playlist_dir / ".playlist_meta.json").write_text("{}", encoding="utf-8")
            (playlist_dir / "covers" / "playlist-cover.jpg").write_bytes(b"cover")

            bundle_path = create_single_playlist_bundle(
                bundle_root=root / "bundles",
                playlist_dir=playlist_dir,
                playlist={"id": 100, "platform": "qq", "name": "Playlist A"},
            )

            self.assertTrue(bundle_path.exists())
            with zipfile.ZipFile(bundle_path) as zf:
                names = set(zf.namelist())

            self.assertIn("Playlist A_100/playlist.yaml", names)
            self.assertIn("Playlist A_100/.playlist_meta.json", names)
            self.assertIn("Playlist A_100/covers/playlist-cover.jpg", names)

    def test_build_multi_playlist_zip_wraps_directories_under_playlists_root(self):
        from musicdl.catalogsync.export_bundles import create_multi_playlist_bundle

        with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
            root = Path(tmpdir)
            playlist_a = root / "Playlist A_100"
            playlist_b = root / "Playlist B_200"
            playlist_a.mkdir(parents=True, exist_ok=True)
            playlist_b.mkdir(parents=True, exist_ok=True)
            (playlist_a / "playlist.yaml").write_text("playlist_id: 100\n", encoding="utf-8")
            (playlist_b / "playlist.yaml").write_text("playlist_id: 200\n", encoding="utf-8")

            bundle_path = create_multi_playlist_bundle(
                bundle_root=root / "bundles",
                playlist_dirs=[playlist_a, playlist_b],
            )

            with zipfile.ZipFile(bundle_path) as zf:
                names = set(zf.namelist())

            self.assertIn("playlists/Playlist A_100/playlist.yaml", names)
            self.assertIn("playlists/Playlist B_200/playlist.yaml", names)
```
|
||||
|
||||
- [ ] **Step 2: Run test to verify it fails**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python -m pytest tests/catalogsync/test_export_bundles.py -q
|
||||
```
|
||||
|
||||
Expected:
|
||||
|
||||
- FAIL with `ModuleNotFoundError` or missing function errors for `musicdl.catalogsync.export_bundles`
|
||||
|
||||
- [ ] **Step 3: Write minimal implementation**

```python
from __future__ import annotations

import time
import zipfile
from pathlib import Path

from .runtime import sanitize_path_component


def _token() -> str:
    return str(int(time.time() * 1000))


def build_single_bundle_name(playlist: dict[str, object]) -> str:
    platform = sanitize_path_component(str(playlist.get("platform") or ""), "playlist")
    playlist_id = sanitize_path_component(str(playlist.get("id") or ""), "0")
    name = sanitize_path_component(str(playlist.get("name") or ""), "playlist")
    return f"playlist-{platform}-{playlist_id}-{name}.zip"


def build_multi_bundle_name() -> str:
    return "playlists-export-" + time.strftime("%Y%m%d-%H%M%S") + ".zip"


def _zip_tree(zf: zipfile.ZipFile, source_dir: Path, archive_root: str) -> None:
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(source_dir).as_posix()
        zf.write(path, f"{archive_root}/{relative}")


def create_single_playlist_bundle(*, bundle_root: Path, playlist_dir: Path, playlist: dict[str, object]) -> Path:
    bundle_root.mkdir(parents=True, exist_ok=True)
    destination = bundle_root / (_token() + "-" + build_single_bundle_name(playlist))
    with zipfile.ZipFile(destination, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        _zip_tree(zf, playlist_dir, playlist_dir.name)
    return destination


def create_multi_playlist_bundle(*, bundle_root: Path, playlist_dirs: list[Path]) -> Path:
    bundle_root.mkdir(parents=True, exist_ok=True)
    destination = bundle_root / (_token() + "-" + build_multi_bundle_name())
    with zipfile.ZipFile(destination, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for playlist_dir in playlist_dirs:
            _zip_tree(zf, playlist_dir, f"playlists/{playlist_dir.name}")
    return destination
```
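
To make the archive layout concrete, here is a minimal standalone sketch of the tree-zipping helper above. It copies the same loop into a free function and runs it against a throwaway directory; the folder name `Demo Playlist_1`, the file names, and the `demo_archive_names` wrapper are assumptions made up for illustration, not names from the codebase.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_tree(zf: zipfile.ZipFile, source_dir: Path, archive_root: str) -> None:
    """Add every regular file under source_dir to zf beneath archive_root."""
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file():
            continue  # directories get no entry of their own
        relative = path.relative_to(source_dir).as_posix()
        zf.write(path, f"{archive_root}/{relative}")


def demo_archive_names() -> list[str]:
    """Build a tiny playlist folder (hypothetical content) and list the ZIP entries."""
    with tempfile.TemporaryDirectory() as tmpdir:
        root = Path(tmpdir)
        playlist_dir = root / "Demo Playlist_1"
        (playlist_dir / "songs").mkdir(parents=True)
        (playlist_dir / "playlist.yaml").write_text("name: Demo\n", encoding="utf-8")
        (playlist_dir / "songs" / "a.mp3").write_bytes(b"\x00")

        bundle = root / "bundle.zip"
        with zipfile.ZipFile(bundle, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zip_tree(zf, playlist_dir, playlist_dir.name)
        with zipfile.ZipFile(bundle) as zf:
            return sorted(zf.namelist())


print(demo_archive_names())
```

Every entry is rooted at the playlist folder name, which is the layout the single-playlist API test in Task 2 looks for.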

- [ ] **Step 4: Run test to verify it passes**

Run:

```bash
python -m pytest tests/catalogsync/test_export_bundles.py -q
```

Expected:

- PASS

- [ ] **Step 5: Commit**

```bash
git add tests/catalogsync/test_export_bundles.py musicdl/catalogsync/export_bundles.py
git commit -m "feat: add playlist export zip bundle helpers"
```

### Task 2: Add Single-Playlist ZIP Export API

**Files:**
- Modify: `musicdl/catalogsync/ops/web.py`
- Test: `tests/catalogsync/test_ops_api.py`

- [ ] **Step 1: Write the failing tests**

```python
def test_api_playlist_export_zip_downloads_single_playlist_bundle(self):
    from musicdl.catalogsync.db import initialize_database
    from musicdl.catalogsync.ops.web import create_app
    import zipfile
    import io

    with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
        root = Path(tmpdir)
        db_path = root / "catalogsync.db"
        env_path = root / "catalogsync.env"
        env_path.write_text(f"ROOT_DIR={root.as_posix()}\n", encoding="utf-8")
        initialize_database(db_path, default_library_root=root / "library").close()

        playlist_id = self._seed_playlist(
            db_path,
            platform="qq",
            pool_kind="manual_file",
            remote_id="playlist-export-zip",
            name="Playlist Export Zip",
        )
        song_id = self._seed_song(
            db_path,
            platform="qq",
            remote_id="song-export-zip",
            name="Song Export Zip",
        )
        self._link_playlist_song(db_path, playlist_id=playlist_id, song_id=song_id, position=1)
        self._mark_local_downloaded(
            db_path,
            song_id=song_id,
            relative_path="qq/Singer A/song-export-zip.mp3",
        )

        app = create_app(db_path=db_path, env_path=env_path)
        client = TestClient(app)
        self.addCleanup(client.close)

        response = client.get(f"/api/playlists/{playlist_id}/export.zip")

        self.assertEqual(200, response.status_code)
        self.assertEqual("application/zip", response.headers["content-type"])
        archive = zipfile.ZipFile(io.BytesIO(response.content))
        self.assertIn(f"Playlist Export Zip_{playlist_id}/playlist.yaml", archive.namelist())


def test_api_playlist_export_zip_returns_409_for_unsynced_playlist(self):
    client, db_path, _ = self._build_client()
    playlist_id = self._seed_playlist(
        db_path,
        platform="qq",
        pool_kind="manual_file",
        remote_id="playlist-export-unsynced",
        name="Playlist Export Unsynced",
    )

    response = client.get(f"/api/playlists/{playlist_id}/export.zip")

    self.assertEqual(409, response.status_code)
    payload = response.json()
    # FastAPI's default handler nests an HTTPException's detail under "detail".
    self.assertEqual("unsynced", payload["detail"]["state_code"])
```

- [ ] **Step 2: Run test to verify it fails**

Run:

```bash
python -m pytest tests/catalogsync/test_ops_api.py -k "playlist_export_zip_downloads_single_playlist_bundle or playlist_export_zip_returns_409_for_unsynced_playlist" -q
```

Expected:

- FAIL with `404` because `/api/playlists/{playlist_id}/export.zip` does not exist yet

- [ ] **Step 3: Write minimal implementation**

```python
from fastapi.responses import FileResponse

from musicdl.catalogsync.export_bundles import create_single_playlist_bundle


def _playlist_state_or_404(catalog_repo: CatalogRepository, playlist_id: int) -> dict[str, Any]:
    rows = catalog_repo.list_playlist_export_state_rows([playlist_id])
    if not rows:
        raise HTTPException(status_code=404, detail="playlist not found")
    return dict(rows[0])


@app.get("/api/playlists/{playlist_id}/export.zip")
def api_playlist_export_zip(playlist_id: int):
    state_row = _playlist_state_or_404(catalog_repo, playlist_id)
    if str(state_row.get("state_code") or "") != "downloaded":
        raise HTTPException(
            status_code=409,
            detail={
                "state_code": str(state_row.get("state_code") or ""),
                "playlist_id": playlist_id,
                "message": "playlist is not ready for immediate export",
            },
        )

    env_values = env_manager.load_current()
    playlists_root = _resolve_playlists_root(env_values, catalog_repo)
    if playlists_root is None:
        raise HTTPException(status_code=500, detail="playlists root is not configured")

    service = CatalogSyncService(repository=catalog_repo, playlists_root=playlists_root)
    playlist_dir = service.ensure_playlist_artifacts_for_playlist(playlist_id)
    if playlist_dir is None or not playlist_dir.exists():
        raise HTTPException(status_code=404, detail="playlist export folder not found")

    bundle_root = Path(env_values.get("ROOT_DIR") or playlists_root.parent) / "export-bundles"
    bundle_path = create_single_playlist_bundle(
        bundle_root=bundle_root,
        playlist_dir=playlist_dir,
        playlist=state_row,
    )
    return FileResponse(
        bundle_path,
        media_type="application/zip",
        filename=bundle_path.name.split("-", 1)[1],
    )
```
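
The `filename=` argument relies on the stored bundle name having the shape `<token>-<bundle name>`. The sketch below isolates that step as a small pure function; `strip_token_prefix` is a hypothetical helper name and the sample file names are invented. It also guards against names without a hyphen, which the unconditional `split(...)[1]` above would reject with an `IndexError`.

```python
def strip_token_prefix(bundle_name: str) -> str:
    """Drop the leading '<token>-' from a stored bundle name, if there is one."""
    if "-" not in bundle_name:
        return bundle_name  # nothing to strip; return the name unchanged
    return bundle_name.split("-", 1)[1]


# Only the first hyphen is consumed, so hyphens inside the bundle name survive.
print(strip_token_prefix("1713500000000-playlist-qq-42-Demo.zip"))
print(strip_token_prefix("bundle.zip"))
```

Because `create_single_playlist_bundle` always prepends the token, the unguarded form is safe in this endpoint; the guard only matters when the name comes from outside, such as a URL path parameter.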

- [ ] **Step 4: Run test to verify it passes**

Run:

```bash
python -m pytest tests/catalogsync/test_ops_api.py -k "playlist_export_zip_downloads_single_playlist_bundle or playlist_export_zip_returns_409_for_unsynced_playlist" -q
```

Expected:

- PASS

- [ ] **Step 5: Commit**

```bash
git add tests/catalogsync/test_ops_api.py musicdl/catalogsync/ops/web.py
git commit -m "feat: add single playlist zip export api"
```

### Task 3: Add Bulk Export ZIP Prepare + Download API

**Files:**
- Modify: `musicdl/catalogsync/ops/web.py`
- Test: `tests/catalogsync/test_ops_api.py`

- [ ] **Step 1: Write the failing tests**

```python
def test_api_playlists_export_zip_returns_download_url_when_all_selected_playlists_are_ready(self):
    from musicdl.catalogsync.db import initialize_database
    from musicdl.catalogsync.ops.web import create_app

    with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
        root = Path(tmpdir)
        db_path = root / "catalogsync.db"
        env_path = root / "catalogsync.env"
        env_path.write_text(f"ROOT_DIR={root.as_posix()}\n", encoding="utf-8")
        initialize_database(db_path, default_library_root=root / "library").close()

        playlist_a = self._seed_playlist(db_path, platform="qq", pool_kind="manual_file", remote_id="bulk-a", name="Bulk A")
        playlist_b = self._seed_playlist(db_path, platform="qq", pool_kind="manual_file", remote_id="bulk-b", name="Bulk B")
        song_a = self._seed_song(db_path, platform="qq", remote_id="bulk-song-a", name="Bulk Song A")
        song_b = self._seed_song(db_path, platform="qq", remote_id="bulk-song-b", name="Bulk Song B")
        self._link_playlist_song(db_path, playlist_id=playlist_a, song_id=song_a, position=1)
        self._link_playlist_song(db_path, playlist_id=playlist_b, song_id=song_b, position=1)
        self._mark_local_downloaded(db_path, song_id=song_a, relative_path="qq/Singer A/bulk-song-a.mp3")
        self._mark_local_downloaded(db_path, song_id=song_b, relative_path="qq/Singer A/bulk-song-b.mp3")

        app = create_app(db_path=db_path, env_path=env_path)
        client = TestClient(app)
        self.addCleanup(client.close)

        response = client.post("/api/playlists/export-zip", json={"playlist_ids": [playlist_a, playlist_b]})
        payload = response.json()
        download_response = client.get(payload["download_url"])

        self.assertEqual(200, response.status_code)
        self.assertEqual("ready", payload["status"])
        self.assertEqual(200, download_response.status_code)
        self.assertEqual("application/zip", download_response.headers["content-type"])


def test_api_playlists_export_zip_queues_jobs_when_any_selected_playlist_is_not_ready(self):
    client, db_path, _ = self._build_client()
    downloaded_playlist = self._seed_playlist(
        db_path,
        platform="qq",
        pool_kind="manual_file",
        remote_id="bulk-ready",
        name="Bulk Ready",
    )
    song_id = self._seed_song(db_path, platform="qq", remote_id="bulk-ready-song", name="Bulk Ready Song")
    self._link_playlist_song(db_path, playlist_id=downloaded_playlist, song_id=song_id, position=1)
    self._mark_local_downloaded(db_path, song_id=song_id, relative_path="qq/Singer A/bulk-ready-song.mp3")

    unsynced_playlist = self._seed_playlist(
        db_path,
        platform="netease",
        pool_kind="playlist_square",
        remote_id="bulk-unsynced",
        name="Bulk Unsynced",
    )

    response = client.post("/api/playlists/export-zip", json={"playlist_ids": [downloaded_playlist, unsynced_playlist]})

    self.assertEqual(200, response.status_code)
    payload = response.json()
    self.assertEqual("queued", payload["status"])
    self.assertEqual([downloaded_playlist], payload["ready_playlist_ids"])
    self.assertEqual([unsynced_playlist], payload["blocked_playlist_ids"])
    self.assertIsNotNone(payload["sync_download_job"])
```

- [ ] **Step 2: Run test to verify it fails**

Run:

```bash
python -m pytest tests/catalogsync/test_ops_api.py -k "playlists_export_zip_returns_download_url_when_all_selected_playlists_are_ready or playlists_export_zip_queues_jobs_when_any_selected_playlist_is_not_ready" -q
```

Expected:

- FAIL with `404` because `/api/playlists/export-zip` and `/api/exports/bundles/{token}.zip` do not exist yet

- [ ] **Step 3: Write minimal implementation**

```python
from musicdl.catalogsync.export_bundles import create_multi_playlist_bundle


def _bundle_root(env_values: dict[str, str], playlists_root: Path) -> Path:
    root_dir = str(env_values.get("ROOT_DIR") or "").strip()
    if root_dir:
        path = Path(root_dir).resolve() / "export-bundles"
        path.mkdir(parents=True, exist_ok=True)
        return path
    path = playlists_root.parent / "export-bundles"
    path.mkdir(parents=True, exist_ok=True)
    return path


@app.post("/api/playlists/export-zip")
def api_export_selected_playlists_zip(payload: PlaylistBulkRequest):
    playlist_ids = _normalize_playlist_ids(payload.playlist_ids)
    if not playlist_ids:
        raise HTTPException(status_code=422, detail="playlist_ids is required")

    state_rows = catalog_repo.list_playlist_export_state_rows(playlist_ids)
    rows_by_id = {int(row["id"]): dict(row) for row in state_rows}

    ready_ids: list[int] = []
    blocked_ids: list[int] = []
    sync_download_ids: list[int] = []
    download_ids: list[int] = []
    for playlist_id in playlist_ids:
        row = rows_by_id.get(playlist_id)
        if row is None:
            continue
        state_code = str(row.get("state_code") or "")
        if state_code == "downloaded":
            ready_ids.append(playlist_id)
        else:
            blocked_ids.append(playlist_id)
            if state_code == "unsynced":
                sync_download_ids.append(playlist_id)
            elif state_code != "downloading":
                download_ids.append(playlist_id)

    if blocked_ids:
        return {
            "status": "queued",
            "message": f"{len(blocked_ids)} playlists queued for sync/download before export.",
            "ready_playlist_ids": ready_ids,
            "blocked_playlist_ids": blocked_ids,
            "download_job": _create_scoped_playlist_job(
                repo,
                env_manager,
                job_type="download_only",
                playlist_ids=download_ids,
                requested_by=payload.requested_by,
            ),
            "sync_download_job": _create_scoped_playlist_job(
                repo,
                env_manager,
                job_type="sync_download",
                playlist_ids=sync_download_ids,
                requested_by=payload.requested_by,
            ),
        }

    env_values = env_manager.load_current()
    playlists_root = _resolve_playlists_root(env_values, catalog_repo)
    # Same guard as the other export endpoints: fail clearly instead of passing None on.
    if playlists_root is None:
        raise HTTPException(status_code=500, detail="playlists root is not configured")
    service = CatalogSyncService(repository=catalog_repo, playlists_root=playlists_root)
    playlist_dirs = [service.ensure_playlist_artifacts_for_playlist(playlist_id) for playlist_id in playlist_ids]
    valid_dirs = [path for path in playlist_dirs if path is not None and path.exists()]
    bundle_path = create_multi_playlist_bundle(bundle_root=_bundle_root(env_values, playlists_root), playlist_dirs=valid_dirs)
    return {
        "status": "ready",
        "playlist_ids": playlist_ids,
        "download_url": f"/api/exports/bundles/{bundle_path.name}",
    }


@app.get("/api/exports/bundles/{bundle_name}")
def api_export_bundle_download(bundle_name: str):
    env_values = env_manager.load_current()
    playlists_root = _resolve_playlists_root(env_values, catalog_repo)
    if playlists_root is None:
        raise HTTPException(status_code=500, detail="playlists root is not configured")
    bundle_path = _bundle_root(env_values, playlists_root) / bundle_name
    if not bundle_path.exists():
        raise HTTPException(status_code=404, detail="bundle not found")
    filename = bundle_name.split("-", 1)[1] if "-" in bundle_name else bundle_name
    return FileResponse(bundle_path, media_type="application/zip", filename=filename)
```
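
The state-routing loop in the bulk endpoint is easier to reason about in isolation. The following is a sketch only: `partition_by_state` is a hypothetical helper, and `"synced"` is an assumed state name standing in for "metadata synced, files not yet downloaded". The real set of `state_code` values is whatever `list_playlist_export_state_rows` returns.

```python
def partition_by_state(playlist_ids: list[int], state_by_id: dict[int, str]) -> dict[str, list[int]]:
    """Split requested playlist ids into the four buckets the endpoint uses."""
    ready: list[int] = []
    blocked: list[int] = []
    sync_download: list[int] = []
    download: list[int] = []
    for playlist_id in playlist_ids:
        state = state_by_id.get(playlist_id)
        if state is None:
            continue  # unknown ids are skipped silently, as in the endpoint
        if state == "downloaded":
            ready.append(playlist_id)
            continue
        blocked.append(playlist_id)
        if state == "unsynced":
            sync_download.append(playlist_id)  # needs sync and download
        elif state != "downloading":
            download.append(playlist_id)  # needs download only
        # "downloading" is blocked but already has work in flight
    return {
        "ready": ready,
        "blocked": blocked,
        "sync_download": sync_download,
        "download": download,
    }


states = {1: "downloaded", 2: "unsynced", 3: "downloading", 4: "synced"}
print(partition_by_state([1, 2, 3, 4, 5], states))
```

One consequence worth noting: if every requested id is unknown, all four buckets are empty and the endpoint falls through to the "ready" branch with nothing to bundle, so a caller may want to reject that case explicitly.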

- [ ] **Step 4: Run test to verify it passes**

Run:

```bash
python -m pytest tests/catalogsync/test_ops_api.py -k "playlists_export_zip_returns_download_url_when_all_selected_playlists_are_ready or playlists_export_zip_queues_jobs_when_any_selected_playlist_is_not_ready" -q
```

Expected:

- PASS

- [ ] **Step 5: Commit**

```bash
git add tests/catalogsync/test_ops_api.py musicdl/catalogsync/ops/web.py
git commit -m "feat: add bulk playlist export zip workflow"
```

### Task 4: Rename Export Buttons and Switch Frontend to Browser Download

**Files:**
- Modify: `musicdl/catalogsync/templates/ops/playlists.html`
- Modify: `musicdl/catalogsync/static/ops/app.js`
- Test: `tests/catalogsync/test_ops_frontend.py`

- [ ] **Step 1: Write the failing frontend tests**

```python
def test_playlist_modal_export_button_text_is_export(self):
    repo_root = Path(__file__).resolve().parents[2]
    html = (repo_root / "musicdl/catalogsync/templates/ops/playlists.html").read_text(encoding="utf-8")
    self.assertIn(">Export</button>", html)
    self.assertNotIn(">Export Folder</button>", html)


def test_export_selected_action_uses_download_url_when_backend_returns_ready(self):
    repo_root = Path(__file__).resolve().parents[2]
    script = textwrap.dedent(
        r'''
        const fs = require("fs");
        const path = require("path");
        const sourcePath = path.join(process.cwd(), "musicdl/catalogsync/static/ops/app.js");
        const source = fs.readFileSync(sourcePath, "utf8");
        let navigatedTo = "";

        const exportButton = {
          handlers: {},
          disabled: false,
          getAttribute(name) {
            if (name === "data-playlist-action") return "export-selected";
            return null;
          },
          addEventListener(type, handler) {
            this.handlers[type] = handler;
          },
        };
        const checkbox = { checked: true, value: "101", addEventListener() {} };
        const selectionCount = { textContent: "" };
        const root = {
          querySelectorAll(selector) {
            if (selector === "[data-playlist-checkbox]") return [checkbox];
            if (selector === "[data-playlist-action]") return [exportButton];
            return [];
          },
          querySelector(selector) {
            if (selector === "[data-playlist-selection-count]") return selectionCount;
            return null;
          },
          addEventListener() {},
        };
        const body = { getAttribute() { return ""; }, appendChild() {}, removeChild() {} };
        const document = {
          body,
          querySelector(selector) {
            if (selector === "[data-playlists-page]") return root;
            return null;
          },
          querySelectorAll() { return []; },
          createElement() { return { click() {}, remove() {} }; },
        };
        const windowObj = {
          Number,
          setTimeout(fn) { fn(); return 1; },
          clearTimeout() {},
          alert() {},
          URL: { createObjectURL() { return "blob:test"; }, revokeObjectURL() {} },
          Blob: function Blob() {},
          location: { set href(value) { navigatedTo = value; }, get href() { return navigatedTo; } },
          fetch(url, options) {
            if (url !== "/api/playlists/export-zip") throw new Error("unexpected url: " + url);
            return Promise.resolve({
              ok: true,
              json() {
                return Promise.resolve({ status: "ready", download_url: "/api/exports/bundles/token.zip" });
              }
            });
          },
        };

        global.window = windowObj;
        global.document = document;
        global.setTimeout = windowObj.setTimeout;
        global.clearTimeout = windowObj.clearTimeout;
        eval(source);

        exportButton.handlers.click({});
        Promise.resolve().then(() => {
          if (navigatedTo !== "/api/exports/bundles/token.zip") {
            throw new Error("unexpected download target: " + navigatedTo);
          }
          process.exit(0);
        }).catch((error) => {
          console.error(error && error.stack ? error.stack : String(error));
          process.exit(1);
        });
        '''
    )
```

- [ ] **Step 2: Run test to verify it fails**

Run:

```bash
python -m pytest tests/catalogsync/test_ops_frontend.py -k "playlist_modal_export_button_text_is_export or export_selected_action_uses_download_url_when_backend_returns_ready" -q
```

Expected:

- FAIL because current template still says `Export Folder`
- FAIL because current JS still posts to `/api/playlists/export` and only shows status text

- [ ] **Step 3: Write minimal implementation**

```javascript
var endpointMap = {
  sync: "/api/playlists/sync",
  download: "/api/playlists/download",
  "sync-download": "/api/playlists/sync-download",
  "export-selected": "/api/playlists/export-zip",
  "mark-wanted": "/api/playlists/mark-wanted",
  "unmark-wanted": "/api/playlists/unmark-wanted",
};

function startBrowserDownload(url) {
  window.location.href = String(url || "");
}

function handleExportSelectedResponse(data) {
  if (data && data.status === "ready" && data.download_url) {
    startBrowserDownload(data.download_url);
    return;
  }
  var messages = [];
  if (data && data.message) {
    messages.push(String(data.message));
  }
  if (data && data.download_job && data.download_job.id) {
    messages.push("download job #" + String(data.download_job.id));
  }
  if (data && data.sync_download_job && data.sync_download_job.id) {
    messages.push("sync+download job #" + String(data.sync_download_job.id));
  }
  showMessage(messages.join("; ") || "export queued", false);
  window.setTimeout(function () {
    window.location.reload();
  }, 1200);
}

function handleSinglePlaylistExportDownload(playlistId) {
  window.location.href = "/api/playlists/" + encodeURIComponent(playlistId) + "/export.zip";
}
```

Template snippet:

```html
<button type="button" class="secondary" data-playlist-export disabled>Export</button>
<button type="button" class="secondary" data-playlist-action="export-selected">Export Selected</button>
```

- [ ] **Step 4: Run test to verify it passes**

Run:

```bash
python -m pytest tests/catalogsync/test_ops_frontend.py -k "playlist_modal_export_button_text_is_export or export_selected_action_uses_download_url_when_backend_returns_ready" -q
```

Expected:

- PASS

- [ ] **Step 5: Commit**

```bash
git add tests/catalogsync/test_ops_frontend.py musicdl/catalogsync/templates/ops/playlists.html musicdl/catalogsync/static/ops/app.js
git commit -m "feat: switch export ui to browser zip download"
```

### Task 5: Document the New Export Semantics and Run Full Regression

**Files:**
- Modify: `docs/catalogsync.md`

- [ ] **Step 1: Update docs with the new semantics**

```markdown
## 2026-04-19 Playlist Export To Local ZIP

- `Export` now means browser download to the user's local machine.
- NAS `playlists/` remains the internal artifact cache and bundle source.
- `GET /api/playlists/{playlist_id}/export.zip` downloads one playlist ZIP.
- `POST /api/playlists/export-zip` prepares bulk export:
  - returns `status=ready` with `download_url` when every selected playlist is export-ready
  - returns `status=queued` with background job info when any selected playlist still needs sync/download
- `GET /api/exports/bundles/{token}.zip` downloads a prepared bulk ZIP bundle.
```

- [ ] **Step 2: Run focused regression suites**

Run:

```bash
python -m pytest tests/catalogsync/test_export_bundles.py -q
python -m pytest tests/catalogsync/test_ops_api.py -q
python -m pytest tests/catalogsync/test_ops_frontend.py -q
```

Expected:

- all tests PASS

- [ ] **Step 3: Run the broader regression that already covers current export + runner behavior**

Run:

```bash
python -m pytest tests/catalogsync/test_services.py -q
python -m pytest tests/catalogsync/test_ops_runner.py -q
```

Expected:

- all tests PASS
- no regression to the existing “download stage refreshes playlist artifacts” behavior

- [ ] **Step 4: Commit**

```bash
git add docs/catalogsync.md
git commit -m "docs: describe local zip export workflow"
```
@@ -0,0 +1,756 @@
# Download Dual-Pool Pipeline Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Split the download stage into resolver workers and downloader workers so configured download concurrency is spent on real transfers instead of long source-resolution work.

**Architecture:** Keep deferred snapshots and current database schema unchanged, refactor `CatalogDownloader` into explicit resolve-only and download-only phases, and add a runner-level in-memory `ready_queue` that connects a small resolver pool to a larger downloader pool during `download` stages.
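
The dual-pool idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the runner's actual code: `run_dual_pool`, the `resolve` and `download` callables, and the worker counts are all hypothetical, and the sentinel-based shutdown is one of several ways to drain the queue.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel telling a downloader worker to stop


def run_dual_pool(items, resolve, download, resolver_workers=2, downloader_workers=4):
    """Resolve items in a small pool and hand results to a larger download pool."""
    ready_queue: queue.Queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def resolve_one(item):
        payload = resolve(item)
        if payload is not None:  # unresolved items never reach a downloader
            ready_queue.put(payload)

    def download_loop():
        while True:
            payload = ready_queue.get()
            if payload is _DONE:
                return
            outcome = download(payload)
            with results_lock:
                results.append(outcome)

    with ThreadPoolExecutor(max_workers=downloader_workers) as downloaders:
        for _ in range(downloader_workers):
            downloaders.submit(download_loop)
        try:
            with ThreadPoolExecutor(max_workers=resolver_workers) as resolvers:
                # list() forces completion and re-raises any resolver error
                list(resolvers.map(resolve_one, items))
        finally:
            # one sentinel per downloader, even if resolution failed midway
            for _ in range(downloader_workers):
                ready_queue.put(_DONE)
    return results


outcomes = run_dual_pool(
    range(6),
    resolve=lambda i: None if i == 3 else i * 10,
    download=lambda payload: payload + 1,
)
print(sorted(outcomes))
```

Downloaders start before any item is resolved, so a transfer can begin as soon as the first payload lands in the queue rather than after the whole resolve pass finishes.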

**Tech Stack:** Python, sqlite3, unittest, ThreadPoolExecutor, queue.Queue, FastAPI ops dashboard

---

## File Map

### Existing Files To Modify

- `musicdl/catalogsync/downloader.py`
  - Split current mixed `resolve + download` flow into reusable resolve-only and download-only methods.
- `musicdl/catalogsync/ops/executors.py`
  - Add download-stage helpers for resolve-only and download-only item handling without breaking existing item status semantics.
- `musicdl/catalogsync/ops/runner.py`
  - Add the runner-level dual-pool execution path for `download` stages.
- `tests/catalogsync/test_services.py`
  - Lock the downloader API refactor with unit tests.
- `tests/catalogsync/test_ops_executors.py`
  - Lock download-stage executor behavior for resolved tasks and failure handling.
- `tests/catalogsync/test_ops_runner.py`
  - Lock the dual-pool queueing model, worker split, and stage lifecycle.
- `tests/catalogsync/test_ops_api.py`
  - Lock dashboard worker visibility for resolver and downloader worker families if any payload expectations need adjustment.

### New Files To Create

- None required for the first implementation. Keep the change focused and schema-free.

---

### Task 1: Lock `CatalogDownloader` Into Resolve And Download Phases

**Files:**
- Modify: `tests/catalogsync/test_services.py`
- Modify: `musicdl/catalogsync/downloader.py`

- [ ] **Step 1: Write the failing tests for resolve-only and download-only behavior**

Add tests near the existing downloader tests in `tests/catalogsync/test_services.py` for:

```python
def test_catalog_downloader_resolve_song_row_returns_resolved_payload():
    payload = downloader.resolve_song_row(
        row={
            "id": song_id,
            "playlist_id": 123,
            "platform": "netease",
            "name": "Song Resolve",
            "singers": "Singer Resolve",
            "ext": "mp3",
            "file_size_bytes": 24,
            "metadata_json": '{"snapshot":{"identifier":"song-resolve"}}',
        },
        library_root=library_root,
        download_sources=["qq", "kuwo"],
        worker_callback=lambda **state: worker_updates.append(dict(state)),
    )

    # resolve_song_row returns a ResolvedDownloadPayload dataclass, so use attribute access.
    assert payload is not None
    assert payload.row["id"] == song_id
    assert payload.display_text == "Song Resolve / Singer Resolve"
    assert payload.resolved_song_info.source == "QQMusicClient"
```

```python
def test_catalog_downloader_download_resolved_song_reports_progress_and_records_file():
    ok = downloader.download_resolved_song(
        resolved_payload=resolved_payload,
        worker_callback=lambda **state: worker_updates.append(dict(state)),
    )

    assert ok is True
    assert any(int(state.get("downloaded_bytes") or 0) > 0 for state in worker_updates)
    assert repo.song_has_active_local_file(song_id) is True
```

```python
def test_catalog_downloader_download_song_row_remains_a_compatibility_wrapper():
    ok = downloader.download_song_row(
        row=row,
        library_root=library_root,
        download_sources=["qq"],
        worker_callback=lambda **state: worker_updates.append(dict(state)),
    )

    assert ok is True
    assert any("resolving source" in str(state.get("last_progress_text") or "") for state in worker_updates)
    assert any("starting download via" in str(state.get("last_progress_text") or "") for state in worker_updates)
```

- [ ] **Step 2: Run the focused tests and verify they fail for missing APIs**

Run:

```bash
python -m pytest tests/catalogsync/test_services.py -k "resolve_song_row or download_resolved_song or compatibility_wrapper" -q
```

Expected:

- `FAILED`
- failure mentions missing `resolve_song_row` or `download_resolved_song`, or existing wrapper behavior not matching the new tests

- [ ] **Step 3: Implement minimal resolve-only and download-only APIs in `CatalogDownloader`**

Refactor `musicdl/catalogsync/downloader.py` toward this shape:

```python
@dataclass
class ResolvedDownloadPayload:
    row: dict[str, object]
    display_text: str
    default_root: Path
    target_root: Path
    backend_id: int
    expected_bytes: int | None
    resolved_song_info: object


def resolve_song_row(
    self,
    row,
    library_root: str | Path,
    download_sources: list[str] | None = None,
    worker_callback=None,
) -> ResolvedDownloadPayload | None:
    row_dict = dict(row)
    default_root = Path(library_root).resolve()
    self._current_library_root = default_root
    self.repository.ensure_local_backend(default_root, name="default-local", is_default=True)
    display_name = str(row_dict.get("name") or row_dict.get("id") or "")
    singers = str(row_dict.get("singers") or "").strip()
    display_text = f"{display_name} / {singers}".strip(" /")
    self._emit_worker_progress(row_dict, worker_callback, display_text=display_text)

    metadata = json.loads(row_dict["metadata_json"]) if row_dict.get("metadata_json") else {}
    song_info = deserialize_song_info(metadata.get("snapshot"))
    if song_info is None:
        return None

    resolve_progress_callback = None
    if worker_callback is not None:
        resolve_progress_callback = lambda message: self._emit_worker_progress(
            row_dict,
            worker_callback,
            display_text=display_text,
            last_progress_text=message,
        )

    song_info = self.resolve_song_info_for_download(
        row=row_dict,
        song_info=song_info,
        download_sources=download_sources,
        progress_callback=resolve_progress_callback,
    )
    download_platform = self._detect_download_platform(song_info, row_dict["platform"])
    target_root = self.ensure_space(
        default_root,
        getattr(song_info, "file_size_bytes", None) or row_dict.get("file_size_bytes"),
    )
    is_default_root = target_root.resolve() == default_root
    backend_id = self.repository.ensure_local_backend(
        target_root,
        name="default-local" if is_default_root else None,
        is_default=is_default_root,
    )
    expected_bytes = int(getattr(song_info, "file_size_bytes", None) or row_dict.get("file_size_bytes") or 0) or None
    return ResolvedDownloadPayload(
        row=row_dict,
        display_text=display_text,
        default_root=default_root,
        target_root=target_root,
        backend_id=backend_id,
        expected_bytes=expected_bytes,
        resolved_song_info=song_info,
    )
```

```python
def download_resolved_song(
    self,
    resolved_payload: ResolvedDownloadPayload,
    worker_callback=None,
) -> bool:
    row = resolved_payload.row
    song_info = resolved_payload.resolved_song_info
    download_platform = self._detect_download_platform(song_info, row["platform"])
    client = self.get_client(download_platform)
    singers = self._normalize_singers(getattr(song_info, "singers", None)) or self._normalize_singers(row.get("singers"))
    relative_dir = build_download_relative_dir(platform=download_platform, singers=singers)
    target_dir = resolved_payload.target_root / relative_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    song_info.work_dir = str(target_dir)
    if hasattr(song_info, "_save_path"):
        song_info._save_path = None
    self._emit_worker_progress(
        row,
        worker_callback,
        display_text=resolved_payload.display_text,
        last_progress_text=f"starting download via {download_platform}",
    )
    # keep existing monitor, client.download, and record_local_file logic here
```

```python
def download_song_row(...):
    resolved_payload = self.resolve_song_row(
        row=row,
        library_root=library_root,
        download_sources=download_sources,
        worker_callback=worker_callback,
    )
    if resolved_payload is None:
        return False
    return self.download_resolved_song(
        resolved_payload=resolved_payload,
        worker_callback=worker_callback,
    )
```
|
||||
|
||||
- [ ] **Step 4: Run the focused tests and verify they pass**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python -m pytest tests/catalogsync/test_services.py -k "resolve_song_row or download_resolved_song or compatibility_wrapper" -q
|
||||
```
|
||||
|
||||
Expected:
|
||||
|
||||
- `PASS`
|
||||
- no regressions in existing progress tests around `resolving source ...` and `starting download via ...`
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add tests/catalogsync/test_services.py musicdl/catalogsync/downloader.py
|
||||
git commit -m "refactor: split downloader resolve and transfer phases"
|
||||
```
|
||||
|
||||
### Task 2: Lock Download Executor Helpers For Resolved Tasks
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/catalogsync/test_ops_executors.py`
|
||||
- Modify: `musicdl/catalogsync/ops/executors.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing executor tests for resolve-only and download-only item handling**
|
||||
|
||||
Add focused tests in `tests/catalogsync/test_ops_executors.py`:
|
||||
|
||||
```python
|
||||
def test_download_executor_resolve_item_marks_failed_when_resolution_returns_none():
|
||||
with patch.object(CatalogDownloader, "resolve_song_row", return_value=None):
|
||||
executor.process_resolve_item(
|
||||
item_id=item_id,
|
||||
worker_name="resolve-1",
|
||||
ready_queue=ready_queue,
|
||||
)
|
||||
|
||||
item = ops_repo.get_item(item_id)
|
||||
assert item.status == ItemStatus.FAILED
|
||||
assert ready_queue.empty()
|
||||
```
|
||||
|
||||
```python
|
||||
def test_download_executor_resolve_item_enqueues_resolved_payload():
|
||||
resolved_payload = SimpleNamespace(row=row, display_text="Song A / Singer A")
|
||||
with patch.object(CatalogDownloader, "resolve_song_row", return_value=resolved_payload):
|
||||
executor.process_resolve_item(
|
||||
item_id=item_id,
|
||||
worker_name="resolve-1",
|
||||
ready_queue=ready_queue,
|
||||
)
|
||||
|
||||
queued = ready_queue.get_nowait()
|
||||
assert queued.item_id == item_id
|
||||
assert queued.resolved_payload is resolved_payload
|
||||
```
|
||||
|
||||
```python
|
||||
def test_download_executor_download_resolved_item_marks_item_succeeded():
|
||||
with patch.object(CatalogDownloader, "download_resolved_song", return_value=True):
|
||||
executor.process_download_task(
|
||||
task=resolved_task,
|
||||
worker_name="download-1",
|
||||
)
|
||||
|
||||
item = ops_repo.get_item(item_id)
|
||||
assert item.status == ItemStatus.SUCCEEDED
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the focused executor tests and verify they fail**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python -m pytest tests/catalogsync/test_ops_executors.py -k "process_resolve_item or process_download_task" -q
|
||||
```
|
||||
|
||||
Expected:
|
||||
|
||||
- `FAILED`
|
||||
- failure mentions missing `process_resolve_item` or `process_download_task`
|
||||
|
||||
- [ ] **Step 3: Add minimal executor helpers without removing current compatibility path**
|
||||
|
||||
Extend `musicdl/catalogsync/ops/executors.py` around `DownloadStageExecutor` with helpers shaped like:
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class ResolvedStageDownloadTask:
|
||||
item_id: int
|
||||
playlist_id: int | None
|
||||
row: dict[str, object]
|
||||
resolved_payload: object
|
||||
|
||||
|
||||
def process_resolve_item(self, item_id: int, worker_name: str, *, ready_queue) -> None:
|
||||
row = self.ops_repo.build_download_row(item_id=item_id)
|
||||
song_id = int(row.get("id") or row.get("song_id") or 0)
|
||||
if song_id > 0 and self.catalog_repo.song_has_active_local_file(song_id):
|
||||
self.ops_repo.update_worker_state(
|
||||
worker_name=worker_name,
|
||||
current_job_item_id=item_id,
|
||||
status="running",
|
||||
current_song_id=song_id,
|
||||
current_playlist_id=row.get("playlist_id"),
|
||||
current_display_text=str(row.get("name") or row.get("id") or song_id),
|
||||
last_progress_text="already downloaded",
|
||||
)
|
||||
_ensure_transition_applied(
|
||||
self.ops_repo.mark_item_succeeded(
|
||||
item_id=item_id,
|
||||
result_payload={"already_downloaded": True},
|
||||
),
|
||||
item_id=item_id,
|
||||
action="mark_item_succeeded",
|
||||
)
|
||||
return
|
||||
resolved_payload = self.downloader.resolve_song_row(
|
||||
row=row,
|
||||
library_root=self.library_root,
|
||||
download_sources=self.download_sources,
|
||||
worker_callback=lambda **state: self.ops_repo.update_worker_state(
|
||||
worker_name=worker_name,
|
||||
current_job_item_id=item_id,
|
||||
status="running",
|
||||
**state,
|
||||
),
|
||||
)
|
||||
if resolved_payload is None:
|
||||
_ensure_transition_applied(
|
||||
self.ops_repo.mark_item_failed(item_id=item_id, error_message="resolve returned no downloadable song"),
|
||||
item_id=item_id,
|
||||
action="mark_item_failed",
|
||||
)
|
||||
return
|
||||
ready_queue.put(
|
||||
ResolvedStageDownloadTask(
|
||||
item_id=item_id,
|
||||
playlist_id=row.get("playlist_id"),
|
||||
row=row,
|
||||
resolved_payload=resolved_payload,
|
||||
)
|
||||
)
|
||||
```
|
||||
|
||||
```python
def process_download_task(self, task: ResolvedStageDownloadTask, worker_name: str) -> None:
    succeeded = self.downloader.download_resolved_song(
        resolved_payload=task.resolved_payload,
        worker_callback=lambda **state: self.ops_repo.update_worker_state(
            worker_name=worker_name,
            current_job_item_id=task.item_id,
            status="running",
            **state,
        ),
    )
    if succeeded:
        _ensure_transition_applied(
            self.ops_repo.mark_item_succeeded(item_id=task.item_id),
            item_id=task.item_id,
            action="mark_item_succeeded",
        )
        return
    _ensure_transition_applied(
        self.ops_repo.mark_item_failed(item_id=task.item_id, error_message="download returned no file"),
        item_id=task.item_id,
        action="mark_item_failed",
    )
```
|
||||
|
||||
Keep `process_item(...)` as the compatibility path by delegating to `download_song_row(...)`.
|
||||
|
||||
- [ ] **Step 4: Run the focused executor tests and verify they pass**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python -m pytest tests/catalogsync/test_ops_executors.py -k "process_resolve_item or process_download_task" -q
|
||||
```
|
||||
|
||||
Expected:
|
||||
|
||||
- `PASS`
|
||||
- existing `DownloadStageExecutor.process_item(...)` tests remain green
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add tests/catalogsync/test_ops_executors.py musicdl/catalogsync/ops/executors.py
|
||||
git commit -m "refactor: add staged download executor helpers"
|
||||
```
|
||||
|
||||
### Task 3: Lock The Runner-Level Dual-Pool Pipeline
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/catalogsync/test_ops_runner.py`
|
||||
- Modify: `musicdl/catalogsync/ops/runner.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing runner tests for worker splitting and queue-driven execution**
|
||||
|
||||
Add tests in `tests/catalogsync/test_ops_runner.py` for:
|
||||
|
||||
```python
def test_download_stage_splits_workers_into_resolve_and_download_pools():
    resolver_workers, download_workers = runner._download_stage_worker_split(total_workers=10)
    assert resolver_workers == 3
    assert download_workers == 7
```
|
||||
|
||||
```python
|
||||
def test_download_stage_pipeline_processes_items_through_ready_queue():
|
||||
processed = []
|
||||
|
||||
class FakeDownloadExecutor:
|
||||
def process_resolve_item(self, item_id, worker_name, *, ready_queue):
|
||||
ready_queue.put(SimpleNamespace(item_id=item_id, playlist_id=None, resolved_payload=f"payload-{item_id}"))
|
||||
|
||||
def process_download_task(self, task, worker_name):
|
||||
processed.append((task.item_id, worker_name, task.resolved_payload))
|
||||
repo.mark_item_succeeded(task.item_id)
|
||||
|
||||
with patch.object(runner, "_build_executor", return_value=FakeDownloadExecutor()):
|
||||
runner._run_stage(job, stage)
|
||||
|
||||
assert processed
|
||||
assert all(worker_name.startswith("download-") for _, worker_name, _ in processed)
|
||||
```
|
||||
|
||||
```python
|
||||
def test_download_stage_pipeline_uses_single_thread_compatibility_when_worker_count_is_one():
|
||||
calls = []
|
||||
|
||||
class FakeDownloadExecutor:
|
||||
def process_item(self, item_id, worker_name, *, already_claimed=False):
|
||||
calls.append((item_id, worker_name, already_claimed))
|
||||
repo.mark_item_succeeded(item_id)
|
||||
|
||||
with patch.object(runner, "_build_executor", return_value=FakeDownloadExecutor()):
|
||||
runner._run_stage(job, stage)
|
||||
|
||||
assert calls
|
||||
assert calls[0][1] == "download-1"
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the focused runner tests and verify they fail**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python -m pytest tests/catalogsync/test_ops_runner.py -k "worker_split or ready_queue or single_thread_compatibility" -q
|
||||
```
|
||||
|
||||
Expected:
|
||||
|
||||
- `FAILED`
|
||||
- failure mentions missing `_download_stage_worker_split` or the existing `_run_stage(...)` shape not matching the new behavior
|
||||
|
||||
- [ ] **Step 3: Implement the runner-level dual-pool path for download stages**
|
||||
|
||||
Refactor `musicdl/catalogsync/ops/runner.py` so `download` stages use a specialized path:
|
||||
|
||||
```python
def _download_stage_worker_split(self, total_workers: int) -> tuple[int, int]:
    normalized_total = max(int(total_workers or 0), 1)
    if normalized_total == 1:
        # A single worker cannot be split; the caller falls back to the single-pool path.
        return 1, 0
    if normalized_total == 2:
        return 1, 1
    # Roughly one third of the workers resolve, capped at three; the rest download.
    resolver_workers = max(1, min(3, normalized_total // 3))
    download_workers = max(1, normalized_total - resolver_workers)
    return resolver_workers, download_workers
```
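To sanity-check the split rule before wiring it into the runner, a standalone copy can be run against a few worker counts. This is only a sketch: `split_workers` is a hypothetical free function that mirrors the method above, not a name from the codebase.

```python
def split_workers(total_workers: int) -> tuple[int, int]:
    """Standalone copy of the split rule: returns (resolver_workers, download_workers)."""
    total = max(int(total_workers or 0), 1)
    if total == 1:
        return 1, 0  # single worker: the runner uses the single-pool path instead
    if total == 2:
        return 1, 1
    resolvers = max(1, min(3, total // 3))  # about a third of the pool, capped at 3
    return resolvers, max(1, total - resolvers)


if __name__ == "__main__":
    for total in (1, 2, 3, 6, 10, 16):
        print(total, split_workers(total))
```

The cap of three resolvers means that any worker count above nine only adds download workers.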
|
||||
|
||||
```python
def _run_download_stage_pipeline(self, job, stage, executor, worker_count: int) -> None:
    resolver_workers, download_workers = self._download_stage_worker_split(worker_count)
    if download_workers == 0:
        self._run_stage_with_single_pool(job, stage, executor, worker_count)
        return

    # Bounded queue so resolvers cannot run far ahead of the download workers.
    ready_queue: Queue = Queue(maxsize=max(1, download_workers * 2))
    stop_event = threading.Event()
    sentinel = object()

    def resolver_loop(worker_index: int) -> None:
        worker_name = f"resolve-{worker_index + 1}"
        while not stop_event.is_set():
            active_job = self.repository.get_job(job.id)
            if active_job is None or active_job.status in {JobStatus.PAUSE_REQUESTED, JobStatus.CANCELED}:
                stop_event.set()
                return
            item = self.repository.claim_next_stage_item(stage.id, worker_name)
            if item is None:
                return
            executor.process_resolve_item(item.id, worker_name, ready_queue=ready_queue)

    def download_loop(worker_index: int) -> None:
        worker_name = f"download-{worker_index + 1}"
        while True:
            task = ready_queue.get()
            if task is sentinel:
                return
            executor.process_download_task(task, worker_name)

    with ThreadPoolExecutor(max_workers=resolver_workers + download_workers) as pool:
        resolver_futures = [pool.submit(resolver_loop, index) for index in range(resolver_workers)]
        download_futures = [pool.submit(download_loop, index) for index in range(download_workers)]
        try:
            for future in resolver_futures:
                future.result()
        finally:
            # Always release the download workers, even if a resolver raised;
            # otherwise they would block forever on ready_queue.get().
            stop_event.set()
            for future in resolver_futures:
                future.exception()  # wait until every resolver has exited
            for _ in range(download_workers):
                ready_queue.put(sentinel)
        for future in download_futures:
            future.result()
```
|
||||
|
||||
Then update `_run_stage(...)` to branch:
|
||||
|
||||
```python
if refreshed_stage.stage_type == StageType.DOWNLOAD.value:
    self._run_download_stage_pipeline(job, refreshed_stage, executor, worker_count)
else:
    self._run_stage_with_single_pool(job, refreshed_stage, executor, worker_count)
```
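The queue handoff in this task can be tried in isolation. The following is a minimal sketch under stated assumptions: `run_pipeline`, the `pending` queue, and the `("payload", item_id)` tuples are hypothetical stand-ins for the repository claim, `process_resolve_item`, and `process_download_task`; only the bounded queue and the one-sentinel-per-download-worker shutdown follow the plan.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from queue import Empty, Queue


def run_pipeline(item_ids, resolver_workers=2, download_workers=3):
    """Resolve items in one pool and hand them to a second pool via a bounded queue."""
    pending = Queue()  # stand-in for claim_next_stage_item
    for item_id in item_ids:
        pending.put(item_id)
    ready_queue = Queue(maxsize=max(1, download_workers * 2))
    sentinel = object()
    results = []
    results_lock = threading.Lock()

    def resolver_loop(index):
        while True:
            try:
                item_id = pending.get_nowait()
            except Empty:
                return  # nothing left to claim
            ready_queue.put(("payload", item_id))  # blocks while the queue is full

    def download_loop(index):
        worker_name = f"download-{index + 1}"
        while True:
            task = ready_queue.get()
            if task is sentinel:
                return
            with results_lock:
                results.append((task[1], worker_name))

    with ThreadPoolExecutor(max_workers=resolver_workers + download_workers) as pool:
        resolvers = [pool.submit(resolver_loop, i) for i in range(resolver_workers)]
        downloaders = [pool.submit(download_loop, i) for i in range(download_workers)]
        try:
            for future in resolvers:
                future.result()
        finally:
            # One sentinel per download worker, sent only after resolvers are done.
            for _ in range(download_workers):
                ready_queue.put(sentinel)
        for future in downloaders:
            future.result()
    return results


if __name__ == "__main__":
    print(sorted(item_id for item_id, _ in run_pipeline(range(10))))
```

Sending the sentinels only after every resolver future has completed guarantees that no resolved task is queued behind a sentinel and left unprocessed.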
|
||||
|
||||
- [ ] **Step 4: Run the focused runner tests and verify they pass**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python -m pytest tests/catalogsync/test_ops_runner.py -k "worker_split or ready_queue or single_thread_compatibility" -q
|
||||
```
|
||||
|
||||
Expected:
|
||||
|
||||
- `PASS`
|
||||
- current worker-count tests still pass
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add tests/catalogsync/test_ops_runner.py musicdl/catalogsync/ops/runner.py
|
||||
git commit -m "feat: run download stage through dual-pool pipeline"
|
||||
```
|
||||
|
||||
### Task 4: Lock Dashboard And Worker-State Expectations
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/catalogsync/test_ops_api.py`
|
||||
- Modify: `musicdl/catalogsync/ops/web.py`
|
||||
- Modify: `musicdl/catalogsync/ops/repository.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing dashboard regression for resolver and downloader worker families**
|
||||
|
||||
Add a test in `tests/catalogsync/test_ops_api.py` shaped like:
|
||||
|
||||
```python
|
||||
def test_dashboard_exposes_resolver_and_downloader_workers_during_download_stage():
|
||||
repo.update_worker_state(
|
||||
worker_name="resolve-1",
|
||||
current_job_item_id=item_id_a,
|
||||
status="running",
|
||||
current_song_id=song_a_id,
|
||||
current_display_text="Song A / Singer A",
|
||||
last_progress_text="resolving source qq (1/6)",
|
||||
)
|
||||
repo.update_worker_state(
|
||||
worker_name="download-1",
|
||||
current_job_item_id=item_id_b,
|
||||
status="running",
|
||||
current_song_id=song_b_id,
|
||||
current_display_text="Song B / Singer B",
|
||||
last_progress_text="12.00MB/48.00MB",
|
||||
downloaded_bytes=12 * 1024 * 1024,
|
||||
total_bytes=48 * 1024 * 1024,
|
||||
speed_bytes_per_sec=3 * 1024 * 1024,
|
||||
progress_percent=25,
|
||||
)
|
||||
|
||||
payload = client.get("/api/dashboard?include_task_rows=false").json()
|
||||
|
||||
worker_names = [worker["worker_name"] for worker in payload["workers"]]
|
||||
assert "resolve-1" in worker_names
|
||||
assert "download-1" in worker_names
|
||||
assert payload["transfer_stats"]["download_speed_bytes_per_sec"] == 3 * 1024 * 1024
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the focused dashboard test and verify it fails only if payload logic needs adjustment**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python -m pytest tests/catalogsync/test_ops_api.py -k "resolver_and_downloader_workers" -q
|
||||
```
|
||||
|
||||
Expected:
|
||||
|
||||
- either `FAIL` because payload assumptions need adjustment
|
||||
- or `PASS`, which means no production code change is needed here
|
||||
|
||||
- [ ] **Step 3: Apply the minimal payload or repository changes only if the test requires them**
|
||||
|
||||
If worker lookup or naming assumptions need tightening, keep the code minimal, for example:
|
||||
|
||||
```python
|
||||
# no-op if existing worker queries already behave correctly
|
||||
# only adjust helper logic if it filters by a fixed "download-" prefix anywhere
|
||||
```
|
||||
|
||||
The target is not to redesign dashboard data, only to ensure resolver workers remain visible and transfer stats still reflect downloader workers only.
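One way to picture "transfer stats reflect downloader workers only" is an aggregation that counts a worker only when it reports a transfer speed, rather than matching on a name prefix. This is a hedged sketch, not the actual dashboard code: `summarize_transfer_stats` and the `active_transfers` key are hypothetical, while `speed_bytes_per_sec` and `download_speed_bytes_per_sec` are taken from the test above.

```python
def summarize_transfer_stats(workers):
    """Sum transfer speed over workers that report one (hypothetical helper)."""
    total_speed = 0
    active = 0
    for worker in workers:
        speed = worker.get("speed_bytes_per_sec")
        if not speed:
            continue  # resolver workers carry progress text only, no byte counters
        total_speed += int(speed)
        active += 1
    return {"download_speed_bytes_per_sec": total_speed, "active_transfers": active}


if __name__ == "__main__":
    workers = [
        {"worker_name": "resolve-1", "last_progress_text": "resolving source qq (1/6)"},
        {"worker_name": "download-1", "speed_bytes_per_sec": 3 * 1024 * 1024},
    ]
    print(summarize_transfer_stats(workers))
```

With this shape, resolver workers stay in the worker list but contribute nothing to the aggregate speed.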
|
||||
|
||||
- [ ] **Step 4: Re-run the focused dashboard test and verify it passes**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python -m pytest tests/catalogsync/test_ops_api.py -k "resolver_and_downloader_workers" -q
|
||||
```
|
||||
|
||||
Expected:
|
||||
|
||||
- `PASS`
|
||||
|
||||
- [ ] **Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add tests/catalogsync/test_ops_api.py musicdl/catalogsync/ops/web.py musicdl/catalogsync/ops/repository.py
|
||||
git commit -m "test: cover resolver and downloader worker visibility"
|
||||
```
|
||||
|
||||
### Task 5: Final Verification And NAS Reality Check
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/catalogsync/test_services.py`
|
||||
- Modify: `tests/catalogsync/test_ops_executors.py`
|
||||
- Modify: `tests/catalogsync/test_ops_runner.py`
|
||||
- Modify: `tests/catalogsync/test_ops_api.py`
|
||||
- Modify: `musicdl/catalogsync/downloader.py`
|
||||
- Modify: `musicdl/catalogsync/ops/executors.py`
|
||||
- Modify: `musicdl/catalogsync/ops/runner.py`
|
||||
|
||||
- [ ] **Step 1: Run the full targeted regression slice**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python -m pytest tests/catalogsync/test_services.py tests/catalogsync/test_ops_executors.py tests/catalogsync/test_ops_runner.py tests/catalogsync/test_ops_api.py -k "download or transfer_stats or resolver" -q
|
||||
```
|
||||
|
||||
Expected:
|
||||
|
||||
- `PASS`
|
||||
- existing download progress tests still green
|
||||
- new dual-pool runner tests green
|
||||
|
||||
- [ ] **Step 2: Run a slightly wider ops regression slice**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
python -m pytest tests/catalogsync/test_ops_repository.py tests/catalogsync/test_ops_frontend.py -k "worker or transfer or dashboard" -q
|
||||
```
|
||||
|
||||
Expected:
|
||||
|
||||
- `PASS`
|
||||
- no unexpected worker-state regressions
|
||||
|
||||
- [ ] **Step 3: Deploy to NAS**
|
||||
|
||||
Run:
|
||||
|
||||
```powershell
|
||||
powershell -ExecutionPolicy Bypass -File .\deploy-catalogsync.ps1
|
||||
```
|
||||
|
||||
Expected:
|
||||
|
||||
- deploy completes successfully
|
||||
- health check passes for `http://127.0.0.1:18080/dashboard`
|
||||
|
||||
- [ ] **Step 4: Verify production behavior on NAS**
|
||||
|
||||
Run:
|
||||
|
||||
```powershell
|
||||
$env:NAS_192168543_PASSWORD='<your-nas-password>'
|
||||
powershell -ExecutionPolicy Bypass -File 'C:\Users\Administrator\.codex\skills\nas-ssh-192168543\scripts\run.ps1' "curl -fsS http://127.0.0.1:18080/api/dashboard?include_task_rows=false | python3 -m json.tool | head -n 160"
|
||||
```
|
||||
|
||||
Expected:
|
||||
|
||||
- worker list includes both `resolve-*` and `download-*`
|
||||
- at least some `download-*` workers show non-zero transfer stats simultaneously under active load
|
||||
- `resolve-*` workers show `resolving source ...` text instead of pretending to be downloading
|
||||
|
||||
- [ ] **Step 5: Commit final integration changes**
|
||||
|
||||
```bash
|
||||
git add tests/catalogsync/test_services.py tests/catalogsync/test_ops_executors.py tests/catalogsync/test_ops_runner.py tests/catalogsync/test_ops_api.py musicdl/catalogsync/downloader.py musicdl/catalogsync/ops/executors.py musicdl/catalogsync/ops/runner.py musicdl/catalogsync/ops/web.py musicdl/catalogsync/ops/repository.py
|
||||
git commit -m "feat: split download stage into resolver and transfer pools"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Self-Review
|
||||
|
||||
### Spec Coverage
|
||||
|
||||
- dual-pool architecture: covered by Task 3
|
||||
- downloader API split: covered by Task 1
|
||||
- executor support for resolved tasks: covered by Task 2
|
||||
- worker-family visibility: covered by Task 4
|
||||
- NAS verification of real concurrency: covered by Task 5
|
||||
|
||||
No spec gaps remain for this iteration.
|
||||
|
||||
### Placeholder Scan
|
||||
|
||||
- no `TODO`, `TBD`, or “implement later” placeholders remain
|
||||
- each code-changing task includes concrete method names, commands, and code shapes
|
||||
|
||||
### Type Consistency
|
||||
|
||||
- `ResolvedDownloadPayload` is used consistently between Task 1 and Task 2
|
||||
- `ResolvedStageDownloadTask` is used consistently between Task 2 and Task 3
|
||||
- runner dual-pool entry point is consistently named `_run_download_stage_pipeline(...)`
|
||||
|
||||
@@ -0,0 +1,631 @@
|
||||
# Resolver Source Ranking Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Add a persistent resolver side database that learns fallback success rates by original source and reorders fallback sources after warmup without touching the main catalog business tables.
|
||||
|
||||
**Architecture:** Create a dedicated `resolver_stats.db` side database and repository, then wire `MultiSourceSongResolver` to ask that repository for ranked fallback order. Keep preferred-source resolution first, record only fallback attempts and successes, and continue trying later sources if the learned top two fail.
|
||||
|
||||
**Tech Stack:** Python, sqlite3, unittest, Click CLI, FastAPI ops web
|
||||
|
||||
---
|
||||
|
||||
## File Map
|
||||
|
||||
- Create: `musicdl/catalogsync/resolver_stats.py`
|
||||
Dedicated resolver side-database bootstrap, default path helper, and ranking repository.
|
||||
- Modify: `musicdl/catalogsync/resolver.py`
|
||||
Preferred-source-first resolver flow plus ranked fallback traversal and resilient stats recording.
|
||||
- Modify: `musicdl/catalogsync/downloader.py`
|
||||
Construct `MultiSourceSongResolver` with a `ResolverStatsRepository` derived from the main database path.
|
||||
- Modify: `musicdl/catalogsync/cli.py`
|
||||
Initialize the resolver side database during CLI app startup and `init-db`.
|
||||
- Modify: `musicdl/catalogsync/ops/web.py`
|
||||
Initialize the resolver side database during web app startup.
|
||||
- Create: `tests/catalogsync/test_resolver_stats.py`
|
||||
Unit tests for side-database schema, warmup, ranking, and grouping.
|
||||
- Modify: `tests/catalogsync/test_resolver.py`
|
||||
Integration-style resolver tests for warmup behavior, ranked top-two traversal, continuation, and graceful fallback.
|
||||
- Modify: `tests/catalogsync/test_cli.py`
|
||||
CLI startup tests for side-database creation.
|
||||
- Modify: `tests/catalogsync/test_ops_api.py`
|
||||
Web startup test for side-database creation.
|
||||
|
||||
### Task 1: Add The Resolver Stats Side Database
|
||||
|
||||
**Files:**
|
||||
- Create: `musicdl/catalogsync/resolver_stats.py`
|
||||
- Create: `tests/catalogsync/test_resolver_stats.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing side-database tests**
|
||||
|
||||
```python
|
||||
import tempfile
|
||||
import unittest
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
class ResolverStatsRepositoryTests(unittest.TestCase):
|
||||
def test_initialize_resolver_stats_database_creates_stats_table(self):
|
||||
from musicdl.catalogsync.resolver_stats import initialize_resolver_stats_database
|
||||
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
db_path = Path(tmpdir) / "resolver_stats.db"
|
||||
conn = initialize_resolver_stats_database(db_path)
|
||||
try:
|
||||
table_names = {
|
||||
row["name"]
|
||||
for row in conn.execute(
|
||||
"SELECT name FROM sqlite_master WHERE type = 'table'"
|
||||
).fetchall()
|
||||
}
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
self.assertIn("resolver_source_stats", table_names)
|
||||
|
||||
def test_rank_fallback_sources_keeps_config_order_before_warmup(self):
|
||||
from musicdl.catalogsync.resolver_stats import ResolverStatsRepository
|
||||
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
repo = ResolverStatsRepository(Path(tmpdir) / "resolver_stats.db")
|
||||
repo.record_fallback_result("qq", "kuwo", succeeded=True)
|
||||
ranked = repo.rank_fallback_sources(
|
||||
"qq",
|
||||
["kuwo", "migu", "qianqian"],
|
||||
warmup_attempts=1000,
|
||||
)
|
||||
|
||||
self.assertEqual(["kuwo", "migu", "qianqian"], ranked)
|
||||
|
||||
def test_rank_fallback_sources_reorders_after_warmup_per_origin_source(self):
|
||||
from musicdl.catalogsync.resolver_stats import ResolverStatsRepository
|
||||
|
||||
with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
|
||||
repo = ResolverStatsRepository(Path(tmpdir) / "resolver_stats.db")
|
||||
for _ in range(800):
|
||||
repo.record_fallback_result("qq", "migu", succeeded=True)
|
||||
for _ in range(200):
|
||||
repo.record_fallback_result("qq", "kuwo", succeeded=False)
|
||||
ranked = repo.rank_fallback_sources(
|
||||
"qq",
|
||||
["kuwo", "migu", "qianqian"],
|
||||
warmup_attempts=1000,
|
||||
)
|
||||
|
||||
# kuwo (0 of 200) scores 1/202, below the 0.5 prior given to the untried qianqian.
self.assertEqual(["migu", "qianqian", "kuwo"], ranked)
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the focused side-database tests and verify they fail**
|
||||
|
||||
Run: `python -m pytest tests/catalogsync/test_resolver_stats.py -q`
|
||||
|
||||
Expected: `ModuleNotFoundError` or missing symbol failures for `musicdl.catalogsync.resolver_stats`.
|
||||
|
||||
- [ ] **Step 3: Write the minimal side-database implementation**
|
||||
|
||||
```python
from __future__ import annotations

import sqlite3
from contextlib import closing, suppress
from pathlib import Path

SQLITE_BUSY_TIMEOUT_MS = 30000
RESOLVER_FALLBACK_WARMUP_ATTEMPTS = 1000

SCHEMA_STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS resolver_source_stats (
        origin_source TEXT NOT NULL,
        candidate_source TEXT NOT NULL,
        attempt_count INTEGER NOT NULL DEFAULT 0,
        resolve_success_count INTEGER NOT NULL DEFAULT 0,
        last_attempt_at TEXT,
        last_success_at TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP,
        updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY(origin_source, candidate_source)
    )
    """,
    """
    CREATE INDEX IF NOT EXISTS idx_resolver_source_stats_origin
    ON resolver_source_stats (origin_source)
    """,
]


def default_resolver_stats_db_path(db_path: str | Path) -> Path:
    # The side database lives next to the main catalog database.
    return Path(db_path).resolve().with_name("resolver_stats.db")


def connect_resolver_stats_database(db_path: str | Path) -> sqlite3.Connection:
    path = Path(db_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(path, timeout=SQLITE_BUSY_TIMEOUT_MS / 1000)
    conn.row_factory = sqlite3.Row
    conn.execute(f"PRAGMA busy_timeout = {SQLITE_BUSY_TIMEOUT_MS}")
    with suppress(sqlite3.OperationalError):
        conn.execute("PRAGMA journal_mode = WAL")
    with suppress(sqlite3.OperationalError):
        conn.execute("PRAGMA synchronous = NORMAL")
    return conn


def initialize_resolver_stats_database(db_path: str | Path) -> sqlite3.Connection:
    conn = connect_resolver_stats_database(db_path)
    for statement in SCHEMA_STATEMENTS:
        conn.execute(statement)
    conn.commit()
    return conn


class ResolverStatsRepository:
    def __init__(self, db_path: str | Path):
        self.db_path = Path(db_path)
        initialize_resolver_stats_database(self.db_path).close()

    def record_fallback_result(self, origin_source: str, candidate_source: str, *, succeeded: bool) -> None:
        # `with conn` only commits or rolls back; `closing` is what releases the connection.
        with closing(connect_resolver_stats_database(self.db_path)) as conn, conn:
            conn.execute(
                """
                INSERT INTO resolver_source_stats (
                    origin_source, candidate_source, attempt_count, resolve_success_count,
                    last_attempt_at, last_success_at
                )
                VALUES (?, ?, 1, ?, CURRENT_TIMESTAMP, CASE WHEN ? THEN CURRENT_TIMESTAMP ELSE NULL END)
                ON CONFLICT(origin_source, candidate_source) DO UPDATE SET
                    attempt_count = attempt_count + 1,
                    resolve_success_count = resolve_success_count + excluded.resolve_success_count,
                    last_attempt_at = CURRENT_TIMESTAMP,
                    last_success_at = CASE
                        WHEN excluded.resolve_success_count > 0 THEN CURRENT_TIMESTAMP
                        ELSE resolver_source_stats.last_success_at
                    END,
                    updated_at = CURRENT_TIMESTAMP
                """,
                (origin_source, candidate_source, 1 if succeeded else 0, 1 if succeeded else 0),
            )

    def rank_fallback_sources(
        self,
        origin_source: str,
        fallback_sources: list[str],
        *,
        warmup_attempts: int = RESOLVER_FALLBACK_WARMUP_ATTEMPTS,
    ) -> list[str]:
        ordered = list(fallback_sources)
        if not ordered:
            return []
        with closing(connect_resolver_stats_database(self.db_path)) as conn:
            rows = conn.execute(
                """
                SELECT candidate_source, attempt_count, resolve_success_count
                FROM resolver_source_stats
                WHERE origin_source = ?
                """,
                (origin_source,),
            ).fetchall()
        total_attempts = sum(int(row["attempt_count"]) for row in rows)
        if total_attempts < warmup_attempts:
            # Not enough data yet: keep the configured order.
            return ordered
        # Laplace-smoothed success rate; sources with no rows default to 0.5.
        stats = {
            str(row["candidate_source"]): (
                (int(row["resolve_success_count"]) + 1) / (int(row["attempt_count"]) + 2)
            )
            for row in rows
        }
        return sorted(ordered, key=lambda source: (-stats.get(source, 0.5), ordered.index(source)))
```
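The ranking rule is easier to reason about with concrete numbers. The sketch below restates the smoothed score `(successes + 1) / (attempts + 2)` as plain functions over an in-memory dict; `smoothed_success_rate` and `rank` are hypothetical helper names, and the dict stands in for the rows of `resolver_source_stats` for one origin source.

```python
def smoothed_success_rate(successes: int, attempts: int) -> float:
    """Laplace-smoothed rate: an untried source (0 of 0) scores exactly 0.5."""
    return (successes + 1) / (attempts + 2)


def rank(fallback_sources, stats, warmup_attempts=1000):
    """stats maps candidate source -> (successes, attempts) for one origin source."""
    ordered = list(fallback_sources)
    total_attempts = sum(attempts for _, attempts in stats.values())
    if total_attempts < warmup_attempts:
        return ordered  # still warming up: keep the configured order
    scores = {source: smoothed_success_rate(*pair) for source, pair in stats.items()}
    # Higher score first; ties keep the configured order.
    return sorted(ordered, key=lambda s: (-scores.get(s, 0.5), ordered.index(s)))


if __name__ == "__main__":
    stats = {"migu": (800, 800), "kuwo": (0, 200)}
    print(rank(["kuwo", "migu", "qianqian"], stats))
    # → ['migu', 'qianqian', 'kuwo']
```

Here `migu` scores 801/802, `kuwo` scores 1/202, and the untried `qianqian` keeps the 0.5 prior, so a source with a long failure record drops below sources that have never been tried.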
|
||||
|
||||
- [ ] **Step 4: Run the side-database tests and verify they pass**
|
||||
|
||||
Run: `python -m pytest tests/catalogsync/test_resolver_stats.py -q`
|
||||
|
||||
Expected: `3 passed`
|
||||
|
||||
- [ ] **Step 5: Commit the side-database foundation**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/resolver_stats.py tests/catalogsync/test_resolver_stats.py
|
||||
git commit -m "feat: add resolver stats side database"
|
||||
```
|
||||
|
||||
### Task 2: Teach The Resolver To Use Ranked Fallback Sources
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/resolver.py`
|
||||
- Modify: `musicdl/catalogsync/downloader.py`
|
||||
- Modify: `tests/catalogsync/test_resolver.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing resolver behavior tests**
|
||||
|
||||
```python
|
||||
def test_resolver_uses_ranked_top_two_fallback_sources_after_warmup(self):
    from musicdl.catalogsync.resolver import MultiSourceSongResolver
    from musicdl.modules.utils.data import SongInfo

    class FakeStatsRepo:
        def rank_fallback_sources(self, origin_source, fallback_sources, warmup_attempts=1000):
            self.rank_call = (origin_source, list(fallback_sources), warmup_attempts)
            return ["migu", "kuwo", "qianqian"]

        def record_fallback_result(self, origin_source, candidate_source, *, succeeded):
            self.records.append((origin_source, candidate_source, succeeded))

        def __init__(self):
            self.records = []

    class FakeClient:
        def __init__(self, source, result=None, calls=None):
            self.source = source
            self.result = list(result or [])
            self.calls = calls

        def search(self, keyword, num_threadings=1, request_overrides=None, rule=None, main_process_context=None):
            self.calls.append(self.source)
            return list(self.result)

    snapshot = SongInfo(
        source="QQMusicClient",
        identifier="song-1",
        song_name="Song 1",
        singers="Singer 1",
        raw_data={"search": {"id": "song-1"}},
        download_url=None,
        download_url_status={},
    )
    migu_hit = SongInfo(
        source="MiguMusicClient",
        identifier="migu-song-1",
        song_name="Song 1",
        singers="Singer 1",
        ext="mp3",
        download_url="https://example.com/song-1.mp3",
        download_url_status={"ok": True},
    )
    search_calls = []
    stats_repo = FakeStatsRepo()
    resolver = MultiSourceSongResolver(
        client_factory=lambda platform: {
            "qq": FakeClient("qq", [], search_calls),
            "kuwo": FakeClient("kuwo", [], search_calls),
            "migu": FakeClient("migu", [migu_hit], search_calls),
            "qianqian": FakeClient("qianqian", [], search_calls),
        }[platform],
        resolver_stats_repo=stats_repo,
    )

    resolved = resolver.resolve_song_info(
        row={"platform": "qq", "name": "Song 1", "singers": "Singer 1", "remote_song_id": "song-1"},
        snapshot_song_info=snapshot,
        download_sources=["qq", "kuwo", "migu", "qianqian"],
    )

    self.assertEqual(["qq", "migu"], search_calls)
    self.assertEqual(
        [("qq", "migu", True)],
        stats_repo.records,
    )
    self.assertEqual("MiguMusicClient", resolved.source)
|
||||
def test_resolver_continues_after_ranked_top_two_fail(self):
    from musicdl.catalogsync.resolver import MultiSourceSongResolver
    from musicdl.modules.utils.data import SongInfo

    class FakeStatsRepo:
        def rank_fallback_sources(self, origin_source, fallback_sources, warmup_attempts=1000):
            return ["migu", "kuwo", "qianqian"]

        def record_fallback_result(self, origin_source, candidate_source, *, succeeded):
            self.records.append((candidate_source, succeeded))

        def __init__(self):
            self.records = []

    class FakeClient:
        def __init__(self, source, result, calls):
            self.source = source
            self.result = list(result)
            self.calls = calls

        def search(self, keyword, num_threadings=1, request_overrides=None, rule=None, main_process_context=None):
            self.calls.append(self.source)
            return list(self.result)

    snapshot = SongInfo(
        source="QQMusicClient",
        identifier="song-2",
        song_name="Song 2",
        singers="Singer 2",
        raw_data={"search": {"id": "song-2"}},
        download_url=None,
        download_url_status={},
    )
    qianqian_hit = SongInfo(
        source="QianqianMusicClient",
        identifier="qianqian-song-2",
        song_name="Song 2",
        singers="Singer 2",
        ext="mp3",
        download_url="https://example.com/song-2.mp3",
        download_url_status={"ok": True},
    )
    calls = []
    stats_repo = FakeStatsRepo()
    resolver = MultiSourceSongResolver(
        client_factory=lambda platform: {
            "qq": FakeClient("qq", [], calls),
            "migu": FakeClient("migu", [], calls),
            "kuwo": FakeClient("kuwo", [], calls),
            "qianqian": FakeClient("qianqian", [qianqian_hit], calls),
        }[platform],
        resolver_stats_repo=stats_repo,
    )

    resolved = resolver.resolve_song_info(
        row={"platform": "qq", "name": "Song 2", "singers": "Singer 2", "remote_song_id": "song-2"},
        snapshot_song_info=snapshot,
        download_sources=["qq", "kuwo", "migu", "qianqian"],
    )

    self.assertEqual(["qq", "migu", "kuwo", "qianqian"], calls)
    self.assertEqual("QianqianMusicClient", resolved.source)
```
|
||||
|
||||
- [ ] **Step 2: Run the focused resolver tests and verify they fail**
|
||||
|
||||
Run: `python -m pytest tests/catalogsync/test_resolver.py -q`
|
||||
|
||||
Expected: failures for unexpected source order and missing `resolver_stats_repo` support.
|
||||
|
||||
- [ ] **Step 3: Write the minimal resolver and downloader integration**
|
||||
|
||||
```python
|
||||
class MultiSourceSongResolver:
    def __init__(
        self,
        client_factory,
        request_overrides_factory=None,
        resolver_stats_repo=None,
        warmup_attempts: int = RESOLVER_FALLBACK_WARMUP_ATTEMPTS,
    ):
        self.client_factory = client_factory
        self.request_overrides_factory = request_overrides_factory or (lambda timeout: {"timeout": timeout})
        self.resolver_stats_repo = resolver_stats_repo
        self.warmup_attempts = int(warmup_attempts)

    def _rank_fallback_sources(self, origin_source: str, fallback_sources: list[str]) -> list[str]:
        # Ranking is best-effort: without a stats repo, or if it raises,
        # keep the configured fallback order.
        if self.resolver_stats_repo is None:
            return list(fallback_sources)
        try:
            return self.resolver_stats_repo.rank_fallback_sources(
                origin_source,
                list(fallback_sources),
                warmup_attempts=self.warmup_attempts,
            )
        except Exception:
            return list(fallback_sources)

    def _record_fallback_result(self, origin_source: str, candidate_source: str, *, succeeded: bool) -> None:
        # Recording is best-effort and must never break song resolution.
        if self.resolver_stats_repo is None:
            return
        try:
            self.resolver_stats_repo.record_fallback_result(
                origin_source,
                candidate_source,
                succeeded=succeeded,
            )
        except Exception:
            return

    def resolve_song_info(self, row, snapshot_song_info, download_sources=None, progress_callback=None):
        target_song_info = self._build_target_song_info(row=row, snapshot_song_info=snapshot_song_info)
        preferred_source = normalize_source_name(getattr(target_song_info, "source", None) or row.get("platform"))
        ordered_sources = dedupe_preserve_order(list(download_sources or DEFAULT_DOWNLOAD_SOURCES))
        fallback_sources = [source for source in ordered_sources if source != preferred_source]
        ranked_fallback_sources = self._rank_fallback_sources(preferred_source, fallback_sources)

        # Try the song's own platform first: refresh the snapshot, then search.
        candidate_rows = []
        if preferred_source in ordered_sources:
            self._emit_progress(progress_callback, f"resolving source {preferred_source} (1/{len(ordered_sources)})")
            client = self.client_factory(preferred_source)
            refreshed_song = self._refresh_song_info(client, target_song_info)
            if self._has_valid_download_url(refreshed_song):
                merged_refreshed = merge_resolved_song_info(target_song_info, refreshed_song)
                refreshed_match_priority = song_info_match_priority(merged_refreshed, target_song_info)
                candidate_rows.append((merged_refreshed, refreshed_match_priority, 0))
                if is_high_confidence_match(refreshed_match_priority):
                    return merged_refreshed
            search_candidates = self._search_source_candidates(preferred_source, build_resolve_keyword(target_song_info, row))
            best_candidate = self._pick_best_candidate(search_candidates, target_song_info, 0)
            if best_candidate is not None:
                merged_candidate = merge_resolved_song_info(target_song_info, best_candidate)
                match_priority = song_info_match_priority(merged_candidate, target_song_info)
                candidate_rows.append((merged_candidate, match_priority, 0))
                if is_high_confidence_match(match_priority):
                    return merged_candidate

        # Walk the ranked fallbacks; the first source with a usable candidate wins.
        for offset, source in enumerate(ranked_fallback_sources, start=2):
            self._emit_progress(progress_callback, f"resolving source {source} ({offset}/{len(ordered_sources)})")
            search_candidates = self._search_source_candidates(source, build_resolve_keyword(target_song_info, row))
            best_candidate = self._pick_best_candidate(search_candidates, target_song_info, offset)
            succeeded = best_candidate is not None
            self._record_fallback_result(preferred_source, source, succeeded=succeeded)
            if not succeeded:
                continue
            return merge_resolved_song_info(target_song_info, best_candidate)

        # No fallback hit: return the best lower-confidence candidate from the
        # preferred source, if there is one.
        if candidate_rows:
            candidate_rows.sort(
                key=lambda item: (
                    match_priority_group(item[1]),
                    search_result_quality_group(item[0]),
                    -candidate_file_size_bytes(item[0]),
                    item[2],
                    item[1],
                )
            )
            return candidate_rows[0][0]
        return target_song_info


class CatalogDownloader:
    def __init__(self, repository, work_dir="musicdl_outputs/catalogsync", worker_count=DEFAULT_DOWNLOAD_WORKERS):
        self.repository = repository
        resolver_stats_repo = ResolverStatsRepository(default_resolver_stats_db_path(self.repository.db_path))
        self._resolver = MultiSourceSongResolver(
            client_factory=lambda platform: self.get_client(platform),
            request_overrides_factory=lambda timeout: self._request_overrides(timeout),
            resolver_stats_repo=resolver_stats_repo,
        )
```
|
||||
|
||||
- [ ] **Step 4: Run the resolver-focused tests and verify they pass**
|
||||
|
||||
Run: `python -m pytest tests/catalogsync/test_resolver.py tests/catalogsync/test_resolver_stats.py -q`
|
||||
|
||||
Expected: all resolver and resolver-stats tests pass.
|
||||
|
||||
- [ ] **Step 5: Commit the ranked resolver behavior**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/resolver.py musicdl/catalogsync/downloader.py tests/catalogsync/test_resolver.py tests/catalogsync/test_resolver_stats.py
|
||||
git commit -m "feat: rank resolver fallback sources by origin"
|
||||
```
|
||||
|
||||
### Task 3: Initialize The Side Database At Startup Boundaries
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/cli.py`
|
||||
- Modify: `musicdl/catalogsync/ops/web.py`
|
||||
- Modify: `tests/catalogsync/test_cli.py`
|
||||
- Modify: `tests/catalogsync/test_ops_api.py`
|
||||
|
||||
- [ ] **Step 1: Write the failing startup initialization tests**
|
||||
|
||||
```python
|
||||
def test_init_db_command_creates_resolver_stats_side_db(self):
    from musicdl.catalogsync.cli import cli

    runner = CliRunner()
    with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
        db_path = Path(tmpdir) / "catalogsync.db"
        side_db_path = Path(tmpdir) / "resolver_stats.db"
        library_root = Path(tmpdir) / "library"

        result = runner.invoke(
            cli,
            ["init-db", "--db", str(db_path), "--library-root", str(library_root)],
        )

        self.assertEqual(0, result.exit_code, msg=result.output)
        self.assertTrue(side_db_path.exists())

def test_create_app_initializes_resolver_stats_side_db(self):
    from musicdl.catalogsync.db import initialize_database
    from musicdl.catalogsync.ops.web import create_app

    with tempfile.TemporaryDirectory(ignore_cleanup_errors=True) as tmpdir:
        root = Path(tmpdir)
        db_path = root / "catalogsync.db"
        env_path = root / "catalogsync.env"
        side_db_path = root / "resolver_stats.db"
        env_path.write_text("ROOT_DIR=/music\nDOWNLOAD_SOURCES=qq\n", encoding="utf-8")
        initialize_database(db_path).close()

        app = create_app(db_path=db_path, env_path=env_path)

        self.assertIsNotNone(app)
        self.assertTrue(side_db_path.exists())
```
|
||||
|
||||
- [ ] **Step 2: Run the startup tests and verify they fail**
|
||||
|
||||
Run: `python -m pytest tests/catalogsync/test_cli.py tests/catalogsync/test_ops_api.py -k "resolver_stats_side_db" -q`
|
||||
|
||||
Expected: assertions fail because `resolver_stats.db` is not created yet.
|
||||
|
||||
- [ ] **Step 3: Wire startup initialization to the side database**
|
||||
|
||||
```python
|
||||
from .resolver_stats import default_resolver_stats_db_path, initialize_resolver_stats_database


class CatalogSyncApplication:
    def __init__(self, db_path: str, library_root: str | None = None):
        self.db_path = db_path
        self.library_root = library_root
        initialize_database(db_path, default_library_root=library_root).close()
        initialize_resolver_stats_database(default_resolver_stats_db_path(db_path)).close()
        self.repository = CatalogRepository(db_path)
        self.service = CatalogSyncService(self.repository)
        self.downloader = CatalogDownloader(self.repository)

    def init_db(self):
        initialize_database(self.db_path, default_library_root=self.library_root).close()
        initialize_resolver_stats_database(default_resolver_stats_db_path(self.db_path)).close()


def create_app(db_path: str | Path, env_path: str | Path, *, start_runner: bool = False, runner_sleep_seconds: float = 1.0) -> FastAPI:
    db_file = Path(db_path)
    initialize_database(db_file).close()
    initialize_resolver_stats_database(default_resolver_stats_db_path(db_file)).close()
    ...
```
|
||||
|
||||
- [ ] **Step 4: Run the startup tests and verify they pass**
|
||||
|
||||
Run: `python -m pytest tests/catalogsync/test_cli.py tests/catalogsync/test_ops_api.py -k "resolver_stats_side_db" -q`
|
||||
|
||||
Expected: `2 passed`
|
||||
|
||||
- [ ] **Step 5: Commit the startup wiring**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/cli.py musicdl/catalogsync/ops/web.py tests/catalogsync/test_cli.py tests/catalogsync/test_ops_api.py
|
||||
git commit -m "feat: initialize resolver stats database on startup"
|
||||
```
|
||||
|
||||
### Task 4: Run Full Verification And NAS Validation
|
||||
|
||||
**Files:**
|
||||
- Modify: `musicdl/catalogsync/resolver.py`
|
||||
- Modify: `musicdl/catalogsync/downloader.py`
|
||||
- Modify: `musicdl/catalogsync/cli.py`
|
||||
- Modify: `musicdl/catalogsync/ops/web.py`
|
||||
- Modify: `musicdl/catalogsync/resolver_stats.py`
|
||||
- Modify: `tests/catalogsync/test_resolver.py`
|
||||
- Modify: `tests/catalogsync/test_resolver_stats.py`
|
||||
- Modify: `tests/catalogsync/test_cli.py`
|
||||
- Modify: `tests/catalogsync/test_ops_api.py`
|
||||
|
||||
- [ ] **Step 1: Run the full local regression slice**
|
||||
|
||||
Run: `python -m pytest tests/catalogsync/test_resolver_stats.py tests/catalogsync/test_resolver.py tests/catalogsync/test_cli.py tests/catalogsync/test_services.py tests/catalogsync/test_ops_executors.py tests/catalogsync/test_ops_runner.py tests/catalogsync/test_ops_api.py tests/catalogsync/test_runtime.py -q`
|
||||
|
||||
Expected: all tests pass, with only the existing known warning if it still appears.
|
||||
|
||||
- [ ] **Step 2: Deploy to NAS**
|
||||
|
||||
Run: `powershell -ExecutionPolicy Bypass -File .\deploy-catalogsync.ps1`
|
||||
|
||||
Expected: deploy completes successfully, health check passes, and single-instance check passes.
|
||||
|
||||
- [ ] **Step 3: Sample NAS dashboard and confirm dual-download bursts still appear**
|
||||
|
||||
Run:
|
||||
|
||||
```powershell
|
||||
powershell -ExecutionPolicy Bypass -File 'C:\Users\Administrator\.codex\skills\nas-ssh-192168543\scripts\run.ps1' "python3 - <<'PY'
import json, urllib.request
with urllib.request.urlopen('http://127.0.0.1:18080/api/dashboard?include_task_rows=false', timeout=10) as resp:
    data = json.load(resp)
print(json.dumps({
    'downloaded_songs': data['download_stats']['downloaded_songs'],
    'speed_bps': data['transfer_stats']['download_speed_bytes_per_sec'],
    'workers': [w['worker_name'] for w in data['workers'] if w.get('status') == 'running'],
}, ensure_ascii=False))
PY"
```
|
||||
|
||||
Expected: running workers still include `download-1` and `download-2` during active bursts, and `downloaded_songs` continues increasing.
|
||||
|
||||
- [ ] **Step 4: Commit the verified end-to-end implementation**
|
||||
|
||||
```bash
|
||||
git add musicdl/catalogsync/resolver_stats.py musicdl/catalogsync/resolver.py musicdl/catalogsync/downloader.py musicdl/catalogsync/cli.py musicdl/catalogsync/ops/web.py tests/catalogsync/test_resolver_stats.py tests/catalogsync/test_resolver.py tests/catalogsync/test_cli.py tests/catalogsync/test_ops_api.py
|
||||
git commit -m "feat: persist resolver fallback source rankings"
|
||||
```
|
||||
@@ -0,0 +1,120 @@
|
||||
# Catalog Sync Design
|
||||
|
||||
## Goal
|
||||
|
||||
Build an independent catalog sync and download workflow that:
|
||||
|
||||
- extracts playlist-square and toplist sources from NetEase, QQ Music, and Kuwo
|
||||
- stores `playlist pool -> playlist -> song` and derived `artist pool -> artist -> song`
|
||||
- skips duplicate downloads by `(platform, remote_song_id)`
|
||||
- prefers highest available quality and falls back when needed
|
||||
- supports pausing on low disk space and continuing in a new local directory
|
||||
- keeps storage metadata compatible with local paths, cloud-drive paths, and bucket/key style object storage
|
||||
|
||||
## Scope
|
||||
|
||||
### In Scope
|
||||
|
||||
- Independent Python CLI entrypoint
|
||||
- SQLite schema for catalog, file, and task state
|
||||
- Source collectors for:
  - NetEase playlist square + toplists
  - QQ playlist square + toplists
  - Kuwo playlist square + toplists
- Reuse existing platform `parseplaylist()` and download logic where practical
|
||||
- Derived artist pool updates during playlist sync
|
||||
- Lazy artist enrichment metadata and hooks
|
||||
- Local download dedupe and disk-space prompts
|
||||
- Storage schema compatible with future uploads
|
||||
|
||||
### Out of Scope
|
||||
|
||||
- Full cross-platform song canonicalization
|
||||
- GUI integration
|
||||
- Production-ready 123 cloud upload implementation
|
||||
- Streaming upload while downloading
|
||||
|
||||
## Constraints
|
||||
|
||||
- Prefer reuse of existing source clients under `musicdl.modules.sources`
|
||||
- Avoid new mandatory dependencies where stdlib is sufficient
|
||||
- Keep first version recoverable and inspectable from local files and SQLite
|
||||
- Preserve compatibility with the existing `musicdl` package and console script
|
||||
|
||||
## Architecture
|
||||
|
||||
The new workflow lives in a dedicated package under `musicdl.catalogsync`. Collectors fetch playlist candidates per source and pool kind, then a sync layer normalizes and persists them. Playlist parsing reuses the existing per-platform clients to resolve tracks into `SongInfo` objects, which are then stored into catalog tables and used to derive artist pool membership. A download planner reads undispatched songs from the database, skips anything already represented by an active local file asset, and otherwise delegates the actual media fetch to existing source download logic.
|
||||
|
||||
Storage metadata is modeled with a logical file layer plus a location layer. `file_assets` describes the downloaded media version for a song, while `file_locations` records where that file lives. The first implementation only writes local locations, but the schema supports cloud-drive or bucket/key locations later without changing the song-level model.
|
||||
|
||||
## Data Model
|
||||
|
||||
### Catalog
|
||||
|
||||
- `playlist_pools`
|
||||
- `playlists`
|
||||
- `pool_playlists`
|
||||
- `artist_pools`
|
||||
- `artists`
|
||||
- `pool_artists`
|
||||
- `songs`
|
||||
- `playlist_songs`
|
||||
- `artist_songs`
|
||||
|
||||
### File and Storage
|
||||
|
||||
- `storage_backends`
|
||||
- `file_assets`
|
||||
- `file_locations`
|
||||
- `download_tasks`
|
||||
|
||||
## Key Behaviors
|
||||
|
||||
### Playlist Sync
|
||||
|
||||
1. Fetch playlist-square and toplist candidates for selected sources.
|
||||
2. Upsert pool rows and playlist rows.
|
||||
3. Link pools to playlists.
|
||||
4. For selected playlists, call platform `parseplaylist()` to resolve songs.
|
||||
5. Upsert song rows and `playlist_songs`.
|
||||
6. Extract artists from raw platform metadata when possible, otherwise from normalized singer strings.
|
||||
7. Upsert artists and attach them to derived artist pools and `artist_songs`.
|
||||
|
||||
### Download Dedupe
|
||||
|
||||
- A song is considered already owned when it has an active local `file_location`.
|
||||
- Dedupe key at song level is `(platform, remote_song_id)`.
|
||||
- The first implementation keeps one preferred file asset per song. Future uploads add locations, not duplicate song rows.
|
||||
|
||||
### Quality Selection
|
||||
|
||||
- Existing platform clients already attempt higher qualities first.
|
||||
- The workflow treats the returned file as the chosen asset and persists:
  - quality label
  - extension
  - file size
  - hash when available or computable
|
||||
### Low Disk Space
|
||||
|
||||
- Before each download, check free space for the active local backend.
|
||||
- If insufficient, pause and prompt for a new local directory.
|
||||
- Upsert a new local backend row and continue subsequent downloads there.
|
||||
- Already downloaded files remain linked to their original backend.
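
The pre-download space check described above could look roughly like the following. This is a minimal sketch using only the standard library, not the package's actual implementation: `has_enough_space`, `choose_backend_root`, the `required_bytes` and `reserve_bytes` parameters, and the `prompt` callback are all hypothetical names introduced for illustration.

```python
import shutil
from pathlib import Path
from typing import Callable


def has_enough_space(root: Path, required_bytes: int, reserve_bytes: int = 0) -> bool:
    """Return True when `root` has room for the download plus a safety reserve."""
    free_bytes = shutil.disk_usage(root).free
    return free_bytes >= required_bytes + reserve_bytes


def choose_backend_root(
    active_root: Path,
    required_bytes: int,
    prompt: Callable[[str], str] = input,
    reserve_bytes: int = 0,
) -> Path:
    """Keep the active root if it has space; otherwise ask for a new directory."""
    root = active_root
    while not has_enough_space(root, required_bytes, reserve_bytes):
        answer = prompt(f"Low disk space under {root}. Enter a new local directory: ")
        root = Path(answer).expanduser()
        # Create the new directory so it can be checked and used right away.
        root.mkdir(parents=True, exist_ok=True)
    return root
```

A caller would upsert a backend row for the returned root when it differs from `active_root`, leaving existing file locations attached to their original backend.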
|
||||
|
||||
### Future Upload Compatibility
|
||||
|
||||
- `storage_backends` represents local FS, cloud-drive roots, or object-storage containers.
|
||||
- `file_locations.container_name + locator` can represent:
  - local root + relative path
  - cloud root + remote path
  - bucket + key
- Future upload jobs can attach new non-local locations to an existing `file_asset`.
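
To make the asset/location split concrete, here is a small sketch against an in-memory SQLite database. Only the table names and the `container_name` and `locator` columns come from this design; every other column (`kind`, `song_id`, `file_asset_id`, `backend_id`, `is_active`) and the `song_is_owned` helper are assumptions made for illustration, not the real schema.

```python
import sqlite3

# Hypothetical, heavily reduced schema: one asset can have many locations.
SCHEMA = """
CREATE TABLE storage_backends (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL             -- assumed values: 'local', 'cloud_drive', 'object_storage'
);
CREATE TABLE file_assets (
    id INTEGER PRIMARY KEY,
    song_id INTEGER NOT NULL
);
CREATE TABLE file_locations (
    id INTEGER PRIMARY KEY,
    file_asset_id INTEGER NOT NULL REFERENCES file_assets(id),
    backend_id INTEGER NOT NULL REFERENCES storage_backends(id),
    container_name TEXT NOT NULL,  -- local root, cloud root, or bucket
    locator TEXT NOT NULL,         -- relative path, remote path, or key
    is_active INTEGER NOT NULL DEFAULT 1
);
"""


def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def song_is_owned(conn: sqlite3.Connection, song_id: int) -> bool:
    """A song counts as owned when it has an active location on a local backend."""
    row = conn.execute(
        """
        SELECT 1
        FROM file_locations AS loc
        JOIN file_assets AS asset ON asset.id = loc.file_asset_id
        JOIN storage_backends AS backend ON backend.id = loc.backend_id
        WHERE asset.song_id = ? AND loc.is_active = 1 AND backend.kind = 'local'
        LIMIT 1
        """,
        (song_id,),
    ).fetchone()
    return row is not None
```

Under this layout an upload job only inserts another `file_locations` row for the same asset, so song rows and asset rows stay untouched.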
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- Selected source collectors can persist playlist-square and toplist rows into SQLite.
|
||||
- Playlist sync can populate songs and derived artists from at least the supported source set.
|
||||
- Download command skips songs already backed by active local file locations.
|
||||
- Low-space prompt can switch to a new local directory and continue.
|
||||
- Tests cover schema creation, normalization, derived artist sync, dedupe checks, and collector parsing helpers.
|
||||
@@ -0,0 +1,289 @@
|
||||
# Download Layout And NAS Deployment Design
|
||||
|
||||
## Goal
|
||||
|
||||
Refine the current `musicdl.catalogsync` download flow so it can be deployed cleanly onto a NAS or any other Linux machine with:
|
||||
|
||||
- a portable script layout
|
||||
- a machine-local `.env` configuration file
|
||||
- a dedicated music library root separate from scripts and runtime state
|
||||
- a download directory structure of `platform/first_artist/filename`
|
||||
- path semantics that can be reused later by the upload workflow
|
||||
|
||||
This design intentionally focuses on download and deployment only. Upload automation is deferred to the next sub-project.
|
||||
|
||||
## Scope
|
||||
|
||||
### In Scope
|
||||
|
||||
- Introduce a portable deployment layout for NAS and other Linux targets
|
||||
- Separate application/runtime files from downloaded music files
|
||||
- Standardize local download paths as:
  - `<LIBRARY_DIR>/<platform>/<first_artist>/<filename>`
- Preserve relative path semantics in `file_locations.locator`
|
||||
- Add machine-local configuration through `config/catalogsync.env`
|
||||
- Add bootstrap and runtime script conventions suitable for copying to other machines
|
||||
- Keep database and runtime files under the application home instead of the music library root
|
||||
- Ensure required directories are auto-created when bootstrapping or running
|
||||
|
||||
### Out of Scope
|
||||
|
||||
- 123 cloud upload implementation
|
||||
- Object storage upload implementation
|
||||
- Concurrent download
|
||||
- Concurrent upload
|
||||
- Cross-platform song canonicalization
|
||||
- GUI integration
|
||||
- Deletion or migration of existing remote file locations
|
||||
|
||||
## Constraints
|
||||
|
||||
- Reuse the existing `musicdl.catalogsync` package and CLI as much as possible
|
||||
- Keep the deployment scripts portable so they can be copied to another Linux machine
|
||||
- Do not hardcode NAS-only paths inside the application logic
|
||||
- Store machine-specific paths in configuration, not in source code
|
||||
- Keep `file_locations.locator` stable so the future upload phase can reuse the same relative paths
|
||||
|
||||
## Deployment Model
|
||||
|
||||
### Local Repo Versus Target Machine
|
||||
|
||||
There are two kinds of scripts:
|
||||
|
||||
1. Bootstrap/deployment scripts that live in the repository and are run from the operator machine
|
||||
2. Runtime scripts that are copied onto the target machine and used there repeatedly
|
||||
|
||||
This avoids the circular problem of requiring a target-side script before the target-side directories exist.
|
||||
|
||||
### Target Directory Layout
|
||||
|
||||
Recommended target layout:
|
||||
|
||||
```text
|
||||
/volume4/Music_Cloud/
├─ library/
└─ catalogsync/
   ├─ app/
   ├─ bin/
   ├─ config/
   ├─ data/
   ├─ inputs/
   └─ logs/
```
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- `library`
  - downloaded music files only
- `catalogsync/app`
  - synced code, virtual environment, and application files
- `catalogsync/bin`
  - target-side runtime scripts
- `catalogsync/config`
  - machine-local configuration such as `catalogsync.env`
- `catalogsync/data`
  - SQLite database
- `catalogsync/inputs`
  - playlist files and other operator-provided inputs
- `catalogsync/logs`
  - runtime logs
|
||||
## Configuration Model
|
||||
|
||||
### Machine-Local Environment File
|
||||
|
||||
Each deployed machine should use a local config file:
|
||||
|
||||
```bash
|
||||
ROOT_DIR=/volume4/Music_Cloud
APP_HOME=/volume4/Music_Cloud/catalogsync
LIBRARY_DIR=/volume4/Music_Cloud/library

DB_PATH=/volume4/Music_Cloud/catalogsync/data/catalogsync.db
INPUT_DIR=/volume4/Music_Cloud/catalogsync/inputs
LOG_DIR=/volume4/Music_Cloud/catalogsync/logs

PYTHON_BIN=python3
VENV_DIR=/volume4/Music_Cloud/catalogsync/app/.venv

DOWNLOAD_LAYOUT=platform_first_artist
```
|
||||
|
||||
### Configuration Rules
|
||||
|
||||
- `ROOT_DIR`
  - optional convenience root for deployment layout
- `APP_HOME`
  - runtime home for scripts, DB, logs, and inputs
- `LIBRARY_DIR`
  - physical location of downloaded music files
  - may be different from `ROOT_DIR`
- `DB_PATH`
  - defaults to `<APP_HOME>/data/catalogsync.db`
- `INPUT_DIR`
  - defaults to `<APP_HOME>/inputs`
- `LOG_DIR`
  - defaults to `<APP_HOME>/logs`
- `PYTHON_BIN`
  - interpreter used by runtime scripts
- `VENV_DIR`
  - target-side virtualenv path
- `DOWNLOAD_LAYOUT`
  - first supported value: `platform_first_artist`
|
||||
This keeps deployment portable:
|
||||
|
||||
- copying to a new machine mainly requires updating `catalogsync.env`
|
||||
- moving the music library only requires updating `LIBRARY_DIR`
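
The configuration rules above can be sketched as a small loader. This is an illustrative sketch only: `parse_env_text`, `resolve_config`, and the choice of `APP_HOME` and `LIBRARY_DIR` as the mandatory keys are assumptions, and the parser handles plain `KEY=VALUE` lines without quoting or variable expansion.

```python
from pathlib import PurePosixPath

# Assumption: these two keys are treated as mandatory; the design does not say so explicitly.
REQUIRED_KEYS = ("APP_HOME", "LIBRARY_DIR")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and `#` comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_config(values: dict[str, str]) -> dict[str, str]:
    """Fill in the documented defaults for DB_PATH, INPUT_DIR, and LOG_DIR."""
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError(f"missing required setting(s) in catalogsync.env: {', '.join(missing)}")
    # Targets are Linux machines, so POSIX path joining is used on purpose.
    app_home = PurePosixPath(values["APP_HOME"])
    config = dict(values)
    config.setdefault("DB_PATH", str(app_home / "data" / "catalogsync.db"))
    config.setdefault("INPUT_DIR", str(app_home / "inputs"))
    config.setdefault("LOG_DIR", str(app_home / "logs"))
    return config
```

Raising on a missing key mirrors the fail-fast behavior this design asks for when required values are absent.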
|
||||
|
||||
## Download Path Design
|
||||
|
||||
### Layout Rule
|
||||
|
||||
The first supported layout is:
|
||||
|
||||
```text
|
||||
<LIBRARY_DIR>/<platform>/<first_artist>/<filename>
|
||||
```
|
||||
|
||||
Examples:
|
||||
|
||||
```text
|
||||
/volume4/Music_Cloud/library/netease/周杰伦/七里香.flac
|
||||
/volume4/Music_Cloud/library/qq/林俊杰/江南.mp3
|
||||
```
|
||||
|
||||
### Artist Directory Rule
|
||||
|
||||
- Use the first artist only
|
||||
- Do not create multi-artist directory names in the first version
|
||||
- If no artist is available, use a stable fallback such as `Unknown Artist`
|
||||
|
||||
This keeps paths shorter, more stable, and easier to reuse for upload.
|
||||
|
||||
### Locator Rule
|
||||
|
||||
`file_locations.locator` must store a path relative to `LIBRARY_DIR`.
|
||||
|
||||
Examples:
|
||||
|
||||
```text
|
||||
netease/周杰伦/七里香.flac
|
||||
qq/林俊杰/江南.mp3
|
||||
```
|
||||
|
||||
This is important because the future upload phase will reuse the same relative path for:
|
||||
|
||||
- cloud-drive locators
|
||||
- object-storage keys beneath a backend root prefix
|
||||
|
||||
## Directory Creation Behavior
|
||||
|
||||
When bootstrapping or first running on a machine, the system should auto-create any missing directories with `mkdir -p` semantics.
|
||||
|
||||
Required directories:
|
||||
|
||||
- `<ROOT_DIR>`
|
||||
- `<LIBRARY_DIR>`
|
||||
- `<APP_HOME>`
|
||||
- `<APP_HOME>/app`
|
||||
- `<APP_HOME>/bin`
|
||||
- `<APP_HOME>/config`
|
||||
- `<APP_HOME>/data`
|
||||
- `<APP_HOME>/inputs`
|
||||
- `<APP_HOME>/logs`
|
||||
|
||||
Rules:
|
||||
|
||||
- existing directories are reused without error
|
||||
- missing directories are created automatically
|
||||
- permission failures should produce a clear fatal error
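
In Python, the `mkdir -p` semantics above map onto `Path.mkdir(parents=True, exist_ok=True)`. The sketch below assumes hypothetical helper names (`required_directories`, `ensure_directories`) and treats any `OSError`, which includes permission failures, as fatal.

```python
from pathlib import Path


def required_directories(root_dir: Path, library_dir: Path, app_home: Path) -> list[Path]:
    """List every directory the deployment layout expects."""
    subdirs = ("app", "bin", "config", "data", "inputs", "logs")
    return [root_dir, library_dir, app_home, *(app_home / name for name in subdirs)]


def ensure_directories(paths: list[Path]) -> None:
    """Create missing directories; existing ones are reused without error."""
    for path in paths:
        try:
            path.mkdir(parents=True, exist_ok=True)
        except OSError as exc:
            # PermissionError is a subclass of OSError, so it is reported here too.
            raise SystemExit(f"fatal: cannot create directory {path}: {exc}") from exc
```

Calling `ensure_directories` twice with the same list is safe, which suits scripts that run the check on every start.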
|
||||
|
||||
## Script Model
|
||||
|
||||
### Repository-Side Bootstrap Scripts
|
||||
|
||||
The repository should contain deployment/bootstrap scripts that:
|
||||
|
||||
- connect to the target machine
|
||||
- create the target directory layout
|
||||
- copy application files
|
||||
- create or refresh the runtime scripts
|
||||
- create a config template if missing
|
||||
|
||||
These scripts must not hardcode a single target path internally beyond defaults that can be overridden.
|
||||
|
||||
### Target-Side Runtime Scripts
|
||||
|
||||
After bootstrap, the target machine should contain reusable runtime scripts under:
|
||||
|
||||
```text
|
||||
<APP_HOME>/bin
|
||||
```
|
||||
|
||||
Initial examples:
|
||||
|
||||
- `download_all.sh`
|
||||
- `download_from_file.sh`
|
||||
|
||||
Each runtime script should:
|
||||
|
||||
- load `config/catalogsync.env`
|
||||
- ensure the required directories exist
|
||||
- use `DB_PATH`, `INPUT_DIR`, `LOG_DIR`, and `LIBRARY_DIR`
|
||||
- write logs to the configured log directory
|
||||
|
||||
## CLI And Application Semantics
|
||||
|
||||
The current code uses `--library-root` as the download root. This design prefers moving toward a configuration-driven deployment model where:
|
||||
|
||||
- runtime scripts supply the configured paths
|
||||
- the application writes downloads into `LIBRARY_DIR`
|
||||
- the DB lives under `APP_HOME/data`
|
||||
|
||||
The implementation may either:
|
||||
|
||||
- keep `--library-root` internally for compatibility while runtime scripts pass `LIBRARY_DIR`
|
||||
- or introduce a cleaner root/app configuration layer as long as behavior stays aligned with this design
|
||||
|
||||
The important requirement is behavioral, not the exact CLI spelling:
|
||||
|
||||
- scripts and runtime state must stay separated from music files
|
||||
- downloaded file locations must follow `platform/first_artist/filename`
|
||||
|
||||
## Error Handling
|
||||
|
||||
- Missing config file:
  - fail fast with a clear message pointing to `catalogsync.env`
- Missing required env values:
  - fail fast with a clear message naming the missing variable
- Missing artist data:
  - use fallback artist directory and continue
- Invalid filename/path characters:
  - sanitize to a filesystem-safe name
- Existing file in the destination path:
  - preserve current dedupe behavior through DB state and active local file records
- Directory creation failure:
  - fail fast with an actionable error
|
||||
## Testing
|
||||
|
||||
Add or update coverage for:
|
||||
|
||||
- path-building helper for `platform/first_artist/filename`
|
||||
- first-artist extraction behavior
|
||||
- artist fallback behavior
|
||||
- locator values remaining relative to `LIBRARY_DIR`
|
||||
- directory auto-creation for deployment/runtime helpers
|
||||
- runtime config loading from `catalogsync.env`
|
||||
- download flow recording the new relative locator format in `file_locations`
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- Downloads are stored under `<LIBRARY_DIR>/<platform>/<first_artist>/<filename>`
|
||||
- `file_locations.locator` stores the path relative to `LIBRARY_DIR`
|
||||
- Application/runtime files are separate from music files
|
||||
- A deployment can be copied to another Linux machine by adjusting `catalogsync.env`
|
||||
- Bootstrap/runtime behavior auto-creates the expected directory structure
|
||||
- Existing download logic still records local files into the catalog database
|
||||
- The resulting local relative paths are suitable for reuse by the later upload implementation
|
||||
@@ -0,0 +1,168 @@
|
||||
# Playlist File Run Design
|
||||
|
||||
## Goal
|
||||
|
||||
Add a file-driven playlist execution path to `musicdl.catalogsync` so a user can provide a text file of playlist URLs and run the existing catalog sync and download pipeline against only those playlists.
|
||||
|
||||
The default behavior must remain unchanged when the new option is not used.
|
||||
|
||||
## Scope
|
||||
|
||||
### In Scope
|
||||
|
||||
- Add `--playlist-file` to the existing `run` command
|
||||
- Support two input line formats:
  - raw playlist URL
  - `platform,playlist_url`
- Ignore blank lines and comment lines beginning with `#`
|
||||
- Auto-detect `netease`, `qq`, or `kuwo` from URL when platform is omitted
|
||||
- Deduplicate repeated playlist URLs within the same input file
|
||||
- Import file playlists into the existing catalog tables
|
||||
- Run sync and download only for playlists referenced by the file
|
||||
- Keep song and file dedupe behavior exactly as it works today
|
||||
|
||||
### Out of Scope
|
||||
|
||||
- Incremental skip mode
|
||||
- New collect-mode behavior
|
||||
- New database tables for file imports
|
||||
- GUI integration
|
||||
- Upload automation
|
||||
|
||||
## Constraints
|
||||
|
||||
- Reuse the existing `playlists`, `playlist_pools`, and `pool_playlists` tables
|
||||
- Preserve current `run` behavior when `--playlist-file` is absent
|
||||
- Do not create duplicate playlist rows for the same `(platform, remote_playlist_id)`
|
||||
- Do not widen download scope to the full database when a playlist file is used
|
||||
- Keep implementation small and aligned with the current `catalogsync` package layout
|
||||
|
||||
## User-Facing Behavior
|
||||
|
||||
### Default Run Path
|
||||
|
||||
When `--playlist-file` is not provided:
|
||||
|
||||
1. `run` collects playlist pools from configured sources
|
||||
2. `run` syncs playlists from the database
|
||||
3. `run` downloads pending songs from the database
|
||||
|
||||
This matches the current behavior exactly.
|
||||
|
||||
### File-Driven Run Path
|
||||
|
||||
When `--playlist-file <path>` is provided:
|
||||
|
||||
1. Skip `collect`
|
||||
2. Read and parse the file
|
||||
3. Normalize and deduplicate the playlist entries from the file
|
||||
4. Upsert those playlists into the existing catalog database
|
||||
5. Attach them to a dedicated pool row representing the source file import
|
||||
6. Sync only those playlist IDs
|
||||
7. Download only songs belonging to those playlist IDs
|
||||
|
||||
## Input File Rules
|
||||
|
||||
Each non-empty, non-comment line must be one of:
|
||||
|
||||
```text
|
||||
https://music.163.com/#/playlist?id=17745989905
|
||||
qq,https://y.qq.com/n/ryqq/playlist/7707261125
|
||||
```
|
||||
|
||||
Parsing rules:
|
||||
|
||||
- Leading and trailing whitespace is trimmed
|
||||
- Blank lines are ignored
|
||||
- `# ...` lines are ignored
|
||||
- If a comma is present, split once into `platform` and `url`
|
||||
- If no platform is provided, infer it from the URL
|
||||
- Unsupported or unrecognized lines are reported and skipped
|
||||
- Repeated URLs in the same file are processed only once
|
||||
|
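The parsing rules above could be sketched as a small pure function. This is a minimal sketch under stated assumptions: `PlaylistEntry`, `parse_playlist_lines`, and the host-to-platform map are hypothetical names, and the real URL detection may use different rules.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical host-to-platform map; the real detection rules may differ.
HOST_PLATFORMS = {
    "music.163.com": "netease",
    "y.qq.com": "qq",
    "kuwo.cn": "kuwo",
    "www.kuwo.cn": "kuwo",
}
SUPPORTED = {"netease", "qq", "kuwo"}


@dataclass(frozen=True)
class PlaylistEntry:
    platform: str
    url: str


def parse_playlist_lines(lines):
    """Return (entries, skipped) following the parsing rules listed above."""
    entries, skipped, seen = [], [], set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank and comment lines are ignored silently
        platform, url = "", line
        if "," in line:
            # Split once so the remainder is kept intact as the URL.
            platform, url = (part.strip() for part in line.split(",", 1))
        if not platform:
            platform = HOST_PLATFORMS.get(urlparse(url).netloc.lower(), "")
        platform = platform.lower()
        if platform not in SUPPORTED or not url.startswith(("http://", "https://")):
            skipped.append(raw)  # reported to the caller, then skipped
            continue
        if url in seen:
            continue  # repeated URLs are processed only once
        seen.add(url)
        entries.append(PlaylistEntry(platform, url))
    return entries, skipped
```

Returning the skipped lines alongside the entries lets the caller print warnings and feed the summary counts without the parser doing any I/O itself.
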
## Architecture

The feature should be implemented as a narrow branch off the existing `run` workflow.

Recommended units:

- A file parser helper that converts input lines into normalized playlist import entries
- A service method that imports manual playlists into the existing playlist catalog
- A service method that syncs only a provided list of playlist IDs
- A downloader method that queues only songs reachable from a provided list of playlist IDs

This keeps the current full-database path intact while adding a targeted path for file-based execution.

## Data Model

No new tables are required.

The imported playlists should reuse:

- `playlists`
- `playlist_pools`
- `pool_playlists`

Recommended pool representation:

- `pool_kind = manual_file`
- `external_id = manual_file:<resolved file path>`
- `name = Manual File Import: <filename>`

This preserves provenance without changing the main playlist model.

## Dedupe Behavior

### Playlist Rows

Duplicate playlist rows must not be created because `playlists` is already unique on `(platform, remote_playlist_id)`.

### Songs

Repeated sync of the same playlist may re-run parsing, but songs must continue to upsert by `(platform, remote_song_id)` and playlist-song links must remain unique by `(playlist_id, song_id)`.

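The uniqueness-based dedupe described for playlist rows and songs can be illustrated with a SQLite upsert. The table below is a reduced stand-in, not the real schema: only the `(platform, remote_playlist_id)` key comes from the design, while the `source_url` column and the `upsert_playlist` helper are assumptions. Songs would follow the same pattern keyed on `(platform, remote_song_id)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE playlists (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        platform TEXT NOT NULL,
        remote_playlist_id TEXT NOT NULL,
        source_url TEXT,  -- assumed column, for illustration only
        UNIQUE(platform, remote_playlist_id)
    )
    """
)


def upsert_playlist(conn, platform, remote_id, url):
    """Insert the playlist or refresh its URL, then return the stable row id."""
    conn.execute(
        """
        INSERT INTO playlists (platform, remote_playlist_id, source_url)
        VALUES (?, ?, ?)
        ON CONFLICT(platform, remote_playlist_id)
        DO UPDATE SET source_url = excluded.source_url
        """,
        (platform, remote_id, url),
    )
    row = conn.execute(
        "SELECT id FROM playlists WHERE platform = ? AND remote_playlist_id = ?",
        (platform, remote_id),
    ).fetchone()
    return row[0]
```

Running the same import twice returns the same row id, which is what lets a repeated file-driven run attach to existing playlists instead of creating new ones.
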
### Files

Downloads must continue to rely on the existing `file_locations` and local-file checks so already downloaded songs are not fetched again.

## Error Handling

- Missing playlist file path: fail fast with a clear CLI error
- File exists but contains no valid playlist lines: fail fast with a clear CLI error
- Invalid individual lines: warn and skip, continue processing the rest
- Playlist parse failure for one playlist: log the failure, continue with the remaining playlists
- Download failure for one song: preserve the existing downloader behavior

## Output

The file-driven run path should report a compact summary including:

- total lines read
- valid playlist entries
- skipped invalid lines
- deduplicated playlist count
- synchronized song count
- downloaded song count

## Tests

Add coverage for:

- file parsing of URL-only and `platform,url` lines
- blank lines and comment handling
- same-file URL dedupe
- unsupported line handling
- `run --playlist-file ...` taking the file-driven branch instead of `collect`
- manual playlist import into a `manual_file` pool
- sync limited to provided playlist IDs
- download limited to songs linked to provided playlist IDs
- repeated execution not creating duplicate playlist rows or duplicate local file downloads

## Acceptance Criteria

- `run --playlist-file <path>` processes only playlists from the file
- omitting `--playlist-file` preserves current behavior
- duplicate URLs inside one file are processed once
- repeated runs do not create duplicate playlist rows
- repeated runs do not redownload already owned local files
- tests cover the file-driven branch and targeted sync/download behavior

@@ -0,0 +1,724 @@
# Catalogsync Operations Console Design

## Goal

Extend `musicdl.catalogsync` with a NAS-local web operations console that can:

- manage queue-based pipeline jobs for `collect`, `sync`, `download`, and `upload`
- show playlist pool and playlist execution status as `未完成 / 进行中 / 已完成 / 异常` (incomplete / in progress / completed / error)
- show worker-level live processing state, especially which song each worker is handling
- support global soft pause and resume across all active workers
- survive process crashes or NAS restarts without restarting the whole catalog from scratch
- allow retrying a single failed or interrupted song/item instead of rerunning the whole database
- manage `catalogsync.env` as the primary operator configuration source

This design targets an internal NAS console, not a public-facing multi-user product.

## Scope

### In Scope

- Add a NAS-local web console for `catalogsync`
- Add a database-backed job queue with exactly one active job at a time
- Support these job templates:
  - `全链路` (full pipeline)
  - `仅采集` (collect only)
  - `仅同步` (sync only)
  - `同步+下载` (sync + download)
  - `仅下载` (download only)
  - `仅上传` (upload only)
  - `下载+上传` (download + upload)
- Track job, stage, item, and worker state in SQLite
- Show dashboard, queue, playlist pool, worker, log, and config views
- Implement soft pause and resume
- Implement crash-safe recovery at job-item granularity
- Implement single-item retry and force-retry
- Version and edit `catalogsync.env` from the web console
- Reuse existing `musicdl.catalogsync` collectors, services, downloader, uploader, and storage model as much as possible

### Out of Scope

- Multi-user login or permissions
- Public internet exposure or hardened auth
- Multiple active jobs running at the same time
- Cross-machine worker distribution
- Arbitrary user-defined stage graphs
- Provider-specific cloud drive management beyond current object storage support
- Automatic deletion of local or remote files
- Editing business data such as songs or playlists directly from the UI

## Constraints

- The console runs on the NAS itself
- `catalogsync.env` remains the configuration source of truth
- A queued job must freeze the required runtime settings into a config snapshot so later env edits do not mutate in-flight work
- Recovery must resume from unfinished work items instead of rerunning all songs or all playlists
- Existing `musicdl.catalogsync` CLI and scripts must remain usable
- The first version should optimize for operational stability, inspectability, and recoverability over architecture purity

## Operator Model

### Deployment Model

The web console runs on the same NAS host that already owns:

- the SQLite database
- the local music library
- the logs directory
- the runtime scripts
- the object storage configuration

This avoids a remote-control architecture for v1 and keeps job control, log access, file state, and recovery local.

### Configuration Model

`catalogsync.env` remains the operator-managed source of truth.

The console may:

- display current env values
- validate and save new env revisions
- apply a previous env revision as the current file

Queued jobs must store a `config_snapshot_json` copy of the relevant settings so:

- existing queued or running jobs stay deterministic
- later env edits only affect newly created jobs

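Freezing settings into `config_snapshot_json` at enqueue time might look like the sketch below. It assumes `catalogsync.env` uses plain `KEY=VALUE` lines; the helper names and the `DOWNLOAD_WORKERS` key in the usage note are hypothetical.

```python
import json


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"')
    return values


def make_config_snapshot(env_text: str, keys: list) -> str:
    """Freeze only the settings a job needs into a JSON string."""
    env = parse_env_text(env_text)
    return json.dumps({k: env[k] for k in keys if k in env}, sort_keys=True)
```

For example, `make_config_snapshot(env_text, ["LIBRARY_DIR", "DOWNLOAD_WORKERS"])` would be stored on the job row when it is queued, so editing the env file afterwards changes nothing for that job.
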
## Recommended Architecture

Use four layers:

1. `Web Console`
   - browser UI for dashboards, queue control, logs, and config management
2. `Management API`
   - serves data and accepts job or config commands
3. `Job Orchestrator / Runner`
   - single-process scheduler that owns queue progression, pause, resume, and recovery
4. `Existing Catalogsync Executors`
   - reuse `collect`, `sync`, `download`, and `upload` behavior from current package modules

### Why Not A Thin Shell Wrapper

Wrapping only `download_all.sh` and `upload_all.sh` would not reliably provide:

- worker-level current song visibility
- item-level retry
- fine-grained recovery after process crashes
- stable soft pause and resume

The console therefore needs first-class job and work-item tables instead of depending only on raw shell output.

## Job Model

### Active Job Policy

- only one job may be `running` at a time
- additional jobs stay `queued`
- a paused job may later resume and reclaim the active slot

This keeps:

- pause and resume semantics simple
- resource ownership clear
- crash recovery easier to reason about

### Job Templates

Supported templates and stage chains:

- `全链路`
  - `collect -> sync -> download -> upload`
- `仅采集`
  - `collect`
- `仅同步`
  - `sync`
- `同步+下载`
  - `sync -> download`
- `仅下载`
  - `download`
- `仅上传`
  - `upload`
- `下载+上传`
  - `download -> upload`

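The template-to-stage mapping above is small enough to keep as a static table. The sketch below is one possible shape; `JOB_TEMPLATES` and `stages_for` are hypothetical names, while the chains are copied from the list above.

```python
# Stage chains per job template, as listed in the design.
JOB_TEMPLATES = {
    "全链路": ("collect", "sync", "download", "upload"),
    "仅采集": ("collect",),
    "仅同步": ("sync",),
    "同步+下载": ("sync", "download"),
    "仅下载": ("download",),
    "仅上传": ("upload",),
    "下载+上传": ("download", "upload"),
}


def stages_for(template: str) -> list:
    """Return ordered (seq_no, stage_type) pairs for a job template."""
    try:
        chain = JOB_TEMPLATES[template]
    except KeyError:
        raise ValueError(f"unknown job template: {template}") from None
    return list(enumerate(chain, start=1))
```

The `(seq_no, stage_type)` pairs line up with the `seq_no` and `stage_type` fields of the stage table, so a runner could insert one stage row per pair when a job is created.
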
### Job Status

Recommended job statuses:

- `queued`
- `running`
- `pause_requested`
- `paused`
- `completed`
- `completed_with_errors`
- `failed`
- `canceled`

### Stage Status

Recommended stage statuses:

- `pending`
- `running`
- `pause_requested`
- `paused`
- `completed`
- `failed`
- `skipped`

### Work Item Status

Recommended item statuses:

- `pending`
- `running`
- `succeeded`
- `failed`
- `interrupted`
- `skipped`
- `canceled`

The work item is the recovery and retry granularity. This is what prevents a single failure from forcing a whole-catalog restart.

## Data Model

### Existing Table Reuse

Keep current business tables as the catalog truth:

- `playlist_pools`
- `playlists`
- `pool_playlists`
- `songs`
- `playlist_songs`
- `artists`
- `song_artists`
- `file_locations`
- `object_storage_backends`

These continue to answer:

- what playlists exist
- what songs belong to each playlist
- which files exist locally or remotely

The new console layer adds execution truth around them.

### New Table: `job_runs`

Purpose:

- represent one queued or active operator job

Recommended fields:

```text
id INTEGER PRIMARY KEY AUTOINCREMENT
job_type TEXT NOT NULL
status TEXT NOT NULL
priority INTEGER NOT NULL DEFAULT 100
requested_by TEXT
config_snapshot_json TEXT NOT NULL
sources TEXT
download_sources TEXT
playlist_scope_json TEXT
created_at TEXT DEFAULT CURRENT_TIMESTAMP
started_at TEXT
ended_at TEXT
last_error TEXT
resume_token TEXT
```

### New Table: `job_stages`

Purpose:

- track the stage-level execution status inside one job

Recommended fields:

```text
id INTEGER PRIMARY KEY AUTOINCREMENT
job_run_id INTEGER NOT NULL
stage_type TEXT NOT NULL
status TEXT NOT NULL DEFAULT 'pending'
seq_no INTEGER NOT NULL
total_items INTEGER NOT NULL DEFAULT 0
pending_items INTEGER NOT NULL DEFAULT 0
running_items INTEGER NOT NULL DEFAULT 0
success_items INTEGER NOT NULL DEFAULT 0
failed_items INTEGER NOT NULL DEFAULT 0
skipped_items INTEGER NOT NULL DEFAULT 0
started_at TEXT
ended_at TEXT
last_error TEXT
```

### New Table: `job_items`

Purpose:

- track the real execution unit for recovery and retry

Granularity by stage:

- `collect`
  - one pool/source fetch unit
- `sync`
  - one playlist expansion unit
- `download`
  - one song download unit
- `upload`
  - one file upload unit

Recommended fields:

```text
id INTEGER PRIMARY KEY AUTOINCREMENT
job_stage_id INTEGER NOT NULL
item_type TEXT NOT NULL
item_key TEXT NOT NULL
playlist_pool_id INTEGER
playlist_id INTEGER
song_id INTEGER
file_location_id INTEGER
status TEXT NOT NULL DEFAULT 'pending'
attempt_count INTEGER NOT NULL DEFAULT 0
max_attempts INTEGER NOT NULL DEFAULT 3
worker_id INTEGER
started_at TEXT
ended_at TEXT
last_error TEXT
last_error_code TEXT
payload_json TEXT
UNIQUE(job_stage_id, item_key)
```

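One way a worker could claim the next `job_items` row without two workers taking the same item is to read and update inside a single `BEGIN IMMEDIATE` transaction. This is a sketch, not the planned implementation: the table is reduced to a subset of the recommended fields, and `claim_next_item` is a hypothetical helper.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the transaction boundaries below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    """
    CREATE TABLE job_items (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        job_stage_id INTEGER NOT NULL,
        item_key TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending',
        attempt_count INTEGER NOT NULL DEFAULT 0,
        worker_id INTEGER,
        started_at TEXT,
        UNIQUE(job_stage_id, item_key)
    )
    """
)


def claim_next_item(conn, stage_id, worker_id):
    """Move one pending item to running and return its id, or None if none is left."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id FROM job_items "
            "WHERE job_stage_id = ? AND status = 'pending' ORDER BY id LIMIT 1",
            (stage_id,),
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute(
            "UPDATE job_items SET status = 'running', worker_id = ?, "
            "attempt_count = attempt_count + 1, started_at = CURRENT_TIMESTAMP "
            "WHERE id = ?",
            (worker_id, row[0]),
        )
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The `UNIQUE(job_stage_id, item_key)` constraint also makes enqueueing idempotent: re-inserting the same key with `INSERT OR IGNORE` leaves the existing row and its status untouched.
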
### New Table: `job_workers`

Purpose:

- surface live worker state to the UI
- show which song each worker is processing

Recommended fields:

```text
id INTEGER PRIMARY KEY AUTOINCREMENT
job_run_id INTEGER NOT NULL
job_stage_id INTEGER
worker_name TEXT NOT NULL
status TEXT NOT NULL DEFAULT 'idle'
current_job_item_id INTEGER
current_song_id INTEGER
current_playlist_id INTEGER
current_display_text TEXT
heartbeat_at TEXT
last_progress_text TEXT
processed_count INTEGER NOT NULL DEFAULT 0
error_count INTEGER NOT NULL DEFAULT 0
```

### New Table: `job_commands`

Purpose:

- safely bridge UI actions and runner behavior

Recommended command types:

- `pause`
- `resume`
- `cancel`
- `retry_item`
- `force_retry_item`

Recommended fields:

```text
id INTEGER PRIMARY KEY AUTOINCREMENT
job_run_id INTEGER NOT NULL
command_type TEXT NOT NULL
target_item_id INTEGER
status TEXT NOT NULL DEFAULT 'pending'
created_at TEXT DEFAULT CURRENT_TIMESTAMP
applied_at TEXT
payload_json TEXT
```

### New Table: `job_events`

Purpose:

- structured audit trail for major runner events

Recommended event types include:

- `job_started`
- `stage_started`
- `item_started`
- `item_failed`
- `pause_requested`
- `resumed`
- `worker_heartbeat`
- `recovery_requeued`

### New Table: `job_logs`

Purpose:

- queryable log lines for the UI

Recommended fields:

```text
id INTEGER PRIMARY KEY AUTOINCREMENT
job_run_id INTEGER NOT NULL
job_stage_id INTEGER
worker_id INTEGER
level TEXT NOT NULL
message TEXT NOT NULL
created_at TEXT DEFAULT CURRENT_TIMESTAMP
```

### New Table: `config_revisions`

Purpose:

- keep revision history of `catalogsync.env`

Recommended fields:

```text
id INTEGER PRIMARY KEY AUTOINCREMENT
source_type TEXT NOT NULL DEFAULT 'env_file'
file_path TEXT NOT NULL
content_text TEXT NOT NULL
content_hash TEXT NOT NULL
created_at TEXT DEFAULT CURRENT_TIMESTAMP
applied_at TEXT
note TEXT
```

## UI Design

### Page 1: Dashboard

Show:

- current active job
- queue length
- downloaded song count
- uploaded file count
- failed item count
- per-stage summaries
- recent exceptions
- worker heartbeat overview

### Page 2: Job Center

Show:

- queued jobs
- running or paused job
- job template
- scope
- stage progression
- pause, resume, cancel controls

Allow:

- creating a new job from the supported templates
- changing priority of queued jobs if desired

### Page 3: Playlist Pools

Show:

- all playlist pools and playlists
- source platform
- pool kind
- song count
- downloaded count
- uploaded count
- main status
- current stage
- last processed time
- latest error summary

#### Derived Playlist Status Rules

Recommend deriving the main status as:

- `异常`
  - any recent failed item exists for the playlist
- `进行中`
  - any running or pause-requested item exists
- `未完成`
  - unfinished items remain but the playlist is not actively processing
- `已完成`
  - no unfinished item remains in the relevant pipeline scope

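The derivation rules above are evaluated in priority order, which the sketch below makes explicit. It is an illustration under assumptions: the function name is hypothetical, "recent" failures are assumed to be filtered by the caller, and an item in flight during a pause request is assumed to still carry the `running` item status.

```python
# Item statuses that count as unfinished work (assumed from the item status list).
UNFINISHED = {"pending", "interrupted"}


def derive_playlist_status(item_statuses) -> str:
    """Apply the rules in priority order: error, in progress, incomplete, completed."""
    statuses = set(item_statuses)
    if "failed" in statuses:
        return "异常"  # error
    if "running" in statuses:
        return "进行中"  # in progress
    if statuses & UNFINISHED:
        return "未完成"  # incomplete
    return "已完成"  # completed
```

Checking `failed` first means a playlist with one failed song and several running ones is shown as an error, which matches the ordering in which the rules are listed.
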
### Page 4: Song Processing

Show:

- each worker and its current song
- failed songs
- interrupted songs
- retryable items

Allow:

- retry single item
- force-retry single item
- filter by stage, platform, playlist, or error state

### Page 5: Logs And Exceptions

Show:

- structured events
- text logs
- job-level and item-level errors
- stack traces or HTTP error summaries where available

### Page 6: Config Management

Show:

- current `catalogsync.env`
- parsed effective values
- validation errors
- revision history

Allow:

- save a new env revision
- re-apply a previous revision

Rule:

- config edits affect only future jobs unless an explicit resume override is supplied

## API Surface

Recommended management endpoints:

- `GET /api/dashboard`
- `GET /api/jobs`
- `POST /api/jobs`
- `GET /api/jobs/{id}`
- `POST /api/jobs/{id}/pause`
- `POST /api/jobs/{id}/resume`
- `POST /api/jobs/{id}/cancel`
- `GET /api/jobs/{id}/items`
- `POST /api/job-items/{id}/retry`
- `POST /api/job-items/{id}/force-retry`
- `GET /api/workers`
- `GET /api/playlists`
- `GET /api/playlists/{id}`
- `GET /api/logs`
- `GET /api/config/env`
- `PUT /api/config/env`
- `GET /api/config/revisions`
- `POST /api/config/revisions/{id}/apply`
- `GET /api/events/stream`

`/api/events/stream` should use server-sent events so the dashboard and worker pages can refresh without polling every table separately.

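For the server-sent events stream, each message is a small text frame of `field: value` lines terminated by a blank line. The helper below is a stdlib-only sketch of that framing; `format_sse` is a hypothetical name, and how frames are yielded to the client (for example from a streaming response with media type `text/event-stream`) is left to the web layer.

```python
import json


def format_sse(event: str, data: dict, event_id=None) -> str:
    """Serialize one server-sent event frame, ending with the required blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps without indent produces a single line, so one data field is enough.
    lines.append(f"data: {json.dumps(data, ensure_ascii=False)}")
    return "\n".join(lines) + "\n\n"
```

Using the row id of the event as `event_id` would let a reconnecting browser report the last event it saw, so the server can replay only newer events.
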
## Pause, Resume, And Recovery Rules

### Soft Pause

The only supported pause mode in v1 is soft pause.

Behavior:

- UI inserts a `pause` command
- the runner marks the job and current stage as `pause_requested`
- workers stop claiming new items
- any in-progress item is allowed to finish naturally
- once all workers are idle, the stage becomes `paused` and then the job becomes `paused`

This avoids half-written file state and keeps item completion boundaries clean.

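The soft-pause behavior can be sketched with a shared flag that workers check only between items, so an item that has started always runs to completion. This is a simplified in-memory model: the `threading.Event`, the queue, and the function names are assumptions, and the real runner would read the pause request from the command table.

```python
import queue
import threading

pause_requested = threading.Event()


def worker(items: queue.Queue, done: list) -> None:
    """Process items until the queue is empty or a pause has been requested."""
    while not pause_requested.is_set():  # checked only between items
        try:
            item = items.get_nowait()
        except queue.Empty:
            return
        done.append(item)  # placeholder for the real download or upload


def run_stage(items: queue.Queue, worker_count: int = 2):
    """Run workers to idle, then report the stage as paused or completed."""
    done = []
    threads = [
        threading.Thread(target=worker, args=(items, done))
        for _ in range(worker_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # all workers are idle past this point
    paused = pause_requested.is_set() and not items.empty()
    return ("paused" if paused else "completed"), done
```

Because the stage status is decided only after every worker has joined, the `paused` state is never reported while an item is still being written.
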
### Resume

Resume behavior:

- UI inserts a `resume` command
- the runner validates the job can continue
- the runner resets paused stage and job state back to `running`
- unstarted items stay `pending`
- succeeded items remain untouched

The resume action may optionally carry a limited override payload, such as a new library root after disk exhaustion.

### Crash Recovery

On runner startup:

1. find all jobs with status `running` or `pause_requested`
2. mark those jobs `paused`
3. find all `job_items` left in `running`
4. convert those items to `interrupted`
5. record a recovery event

After that:

- `succeeded` items remain done
- `pending` items remain pending
- `interrupted` items become eligible for retry or auto-requeue depending on stage policy
- `failed` items remain failed until explicit retry

This preserves progress without restarting the whole job or whole database.

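The startup recovery steps map onto two `UPDATE` statements and one event insert, ideally in a single transaction. The sketch below uses reduced stand-in tables; the `job_events` columns and the choice of `recovery_requeued` as the recorded event type are assumptions, since the design lists event types but not the table fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal stand-in tables; the real schema has more columns.
conn.executescript(
    """
    CREATE TABLE job_runs (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE job_items (id INTEGER PRIMARY KEY, status TEXT NOT NULL, worker_id INTEGER);
    CREATE TABLE job_events (id INTEGER PRIMARY KEY, event_type TEXT NOT NULL, message TEXT);
    """
)


def recover_after_crash(conn):
    """Apply the startup recovery steps; return (jobs_paused, items_interrupted)."""
    with conn:  # one transaction: commit on success, roll back on error
        jobs = conn.execute(
            "UPDATE job_runs SET status = 'paused' "
            "WHERE status IN ('running', 'pause_requested')"
        ).rowcount
        items = conn.execute(
            "UPDATE job_items SET status = 'interrupted', worker_id = NULL "
            "WHERE status = 'running'"
        ).rowcount
        conn.execute(
            "INSERT INTO job_events (event_type, message) VALUES ('recovery_requeued', ?)",
            (f"paused {jobs} job(s), interrupted {items} item(s)",),
        )
    return jobs, items
```

Rows in `succeeded`, `pending`, or `failed` are never matched by either `UPDATE`, which is what keeps finished work finished after a restart.
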
## Retry Rules

### Single Item Retry

When the operator clicks retry for a failed or interrupted item:

- insert `job_commands.retry_item`
- clear execution fields on the target item
- set status back to `pending`
- increment `attempt_count` on the next worker claim

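Applying a `retry_item` command to its target could look like the sketch below. It covers only the item reset; inserting the command row itself is omitted, the table is a reduced stand-in, and which columns count as "execution fields" is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for job_items with the fields the reset touches.
conn.execute(
    "CREATE TABLE job_items (id INTEGER PRIMARY KEY, status TEXT NOT NULL, "
    "attempt_count INTEGER NOT NULL DEFAULT 0, worker_id INTEGER, started_at TEXT, "
    "ended_at TEXT, last_error TEXT, last_error_code TEXT)"
)


def retry_item(conn, item_id) -> bool:
    """Reset a failed or interrupted item to pending; return True if it was reset."""
    with conn:
        cur = conn.execute(
            "UPDATE job_items SET status = 'pending', worker_id = NULL, "
            "started_at = NULL, ended_at = NULL, last_error = NULL, last_error_code = NULL "
            "WHERE id = ? AND status IN ('failed', 'interrupted')",
            (item_id,),
        )
    # attempt_count is left alone here; it is incremented on the next worker claim.
    return cur.rowcount == 1
```

The status filter in the `WHERE` clause makes the operation a no-op for items that are running or already succeeded, so a stale retry click cannot disturb them.
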
### Force Retry

Force retry is more aggressive:

- download stage may ignore an existing local mapping if the operator requests a fresh re-download
- upload stage may ignore an existing active remote mapping if the operator explicitly wants a re-upload

Force retry must stay item-scoped, never job-scoped.

## Disk Exhaustion Handling

If the downloader detects insufficient space:

- fail or interrupt the current download item
- pause the active job with a machine-readable reason such as `disk_full`
- surface a UI banner asking for a new library root override

After the operator supplies a new directory and clicks resume:

- the job continues only for unfinished items
- completed downloads are not restarted
- the currently failed song can be retried from scratch

This matches the requirement that one song may restart while the whole database must not restart.

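Detecting insufficient space before a download starts could be done with `shutil.disk_usage`, as in the sketch below. The exception class, the function name, and the idea of a reserve margin are assumptions; only the `disk_full` reason string comes from the design.

```python
import shutil


class DiskFullError(Exception):
    """Raised so the runner can pause the job with the reason `disk_full`."""

    reason = "disk_full"


def ensure_free_space(library_dir: str, required_bytes: int, reserve_bytes: int = 0) -> int:
    """Return the free byte count, or raise DiskFullError when it is too low."""
    free = shutil.disk_usage(library_dir).free
    if free < required_bytes + reserve_bytes:
        raise DiskFullError(
            f"need {required_bytes} bytes, only {free} free under {library_dir}"
        )
    return free
```

A runner could catch `DiskFullError`, mark the current item failed or interrupted, and pause the job with `exc.reason`, leaving every other item in its current state.
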
## Execution Strategy

### Stage Executors

Implement separate executor paths for:

- `collect`
- `sync`
- `download`
- `upload`

Recommended concurrency:

- `collect`
  - low concurrency, v1 may stay serial
- `sync`
  - low concurrency, v1 may stay serial
- `download`
  - configurable worker pool
- `upload`
  - configurable worker pool

### Reuse Strategy

Prefer reusing current catalogsync modules:

- `musicdl.catalogsync.services`
- `musicdl.catalogsync.downloader`
- `musicdl.catalogsync.uploader`
- `musicdl.catalogsync.repository`

The runner should orchestrate these modules rather than rewriting the domain logic from scratch.

## Technology Choice

### Backend

Recommended stack:

- `FastAPI`
- `Jinja2`
- `SQLite`
- `SSE` for live updates

### Frontend

Recommended rendering model:

- server-rendered pages with `Jinja2`
- `HTMX` for partial updates and action forms
- a small amount of vanilla JavaScript for log streaming and live worker refresh

Why this fits:

- NAS-local internal tool
- mainly operational tables and actions
- lower dependency and deployment complexity than a separate SPA
- easier to keep aligned with the existing Python-only project

## Verification Plan

The implementation should be verified at four levels:

1. unit tests
   - state transitions
   - retry rules
   - recovery transforms
2. API integration tests
   - job creation
   - pause and resume
   - item retry
   - config revision flow
3. fault injection tests
   - kill the runner mid-download and confirm item-level recovery
4. NAS smoke tests
   - create jobs
   - pause and resume
   - crash and restart
   - retry a single failed song
   - change library directory after disk-full pause

## V1 Delivery Boundary

### Must Ship In V1

- queue-based single-active-job runner
- supported job templates
- dashboard, job center, playlist pools, song processing, logs, and config pages
- soft pause and resume
- crash-safe item-level recovery
- single-item retry and force-retry
- env revision history and apply flow

### Explicitly Deferred

- authentication
- multi-user permissions
- multiple active jobs
- distributed workers
- arbitrary stage composition
- automatic endless retries
- destructive file cleanup actions

## Open Follow-Up Items

Two source-coverage follow-ups remain outside this console design and should stay tracked separately:

- redeploy the local Kuwo toplist fallback fix to the NAS and backfill the missing collection or sync results
- repair QQ playlist square collection after the old endpoint started returning `parameter failed`

These belong to operational backlog work, not to the web console architecture itself.

@@ -0,0 +1,567 @@
# Object Storage Upload Automation Design

## Goal

Extend `musicdl.catalogsync` with a first-class object storage upload workflow that:

- uploads downloaded local files to an S3-compatible object storage backend
- preserves local files after upload
- mirrors the local relative path into the remote object key
- records remote locations in the catalog database
- tracks backend presence per song for fast lookup
- supports queue-based upload execution and limited concurrency
- updates `docs/catalogsync.md` alongside the implementation so operator docs stay current

This sub-project also introduces limited concurrent download so very large catalogs do not have to run fully serially.

## Scope

### In Scope

- Add a queue-based upload workflow for object storage backends
- Reuse `storage_backends`, `file_assets`, and `file_locations` as the primary storage model
- Add a song/backend presence summary table
- Add an upload task queue table
- Add CLI commands to register an object storage backend and upload files to it
- Support S3-compatible object storage as the first upload backend type
- Store non-secret backend configuration in the database
- Read secrets from environment variables at runtime
- Mirror local relative paths into remote object keys
- Keep local files after successful upload
- Mark remote object locations as non-primary while local files remain primary
- Support queue-based concurrent upload workers
- Add limited concurrent download workers
- When download space is exhausted, pause the whole download flow once, prompt for a new directory once, then continue later tasks under the new root
- Update `docs/catalogsync.md` to document the upload workflow, object storage backend configuration, and the new commands

### Out of Scope

- 123 cloud implementation
- Baidu Netdisk implementation
- Remote `HEAD` verification before every upload
- Automatic deletion of local files after upload
- Multi-backend upload in a single command
- GUI integration
- CDN upload orchestration beyond deriving an optional public URL
- Background daemon / scheduler service

## Constraints

- Keep the current `musicdl.catalogsync` data model as the source of truth
- Do not duplicate file location truth into `songs`
- Do not store secret access credentials in SQLite
- First upload backend must be generic S3-compatible object storage
- Default behavior must trust database state rather than querying remote object existence every time
- Upload behavior must preserve existing local download behavior
- Download and upload concurrency must remain limited and operator-controllable

## Recommended Architecture

Use the existing storage model as the base:

- `storage_backends`
  - backend definition
- `file_assets`
  - file-version identity
- `file_locations`
  - concrete physical or remote locations

Add two new layers:

- `song_backend_presence`
  - fast summary of whether a song has active files on a given backend
- `upload_tasks`
  - queue of upload work items per file asset and target backend/key

Implement one new uploader component:

- `S3CompatibleUploader`
  - resolves credentials from environment
  - uploads a local file to a configured backend
  - writes the resulting remote file location
  - refreshes backend presence

Keep the user-facing CLI small:

- `register-object-backend`
- `upload`

Internally, `upload` should still be queue-driven:

1. enumerate missing remote uploads
2. enqueue deduplicated tasks
3. consume tasks with limited workers

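The three queue-driven steps above can be modeled in memory as follows. This is a sketch only: the tuple shapes, function names, and the `upload_one` callback are hypothetical, and the real queue would live in the `upload_tasks` table rather than a Python list.

```python
from concurrent.futures import ThreadPoolExecutor


def enqueue_missing(local_assets, uploaded_keys, backend):
    """Steps 1 and 2: find assets without a remote copy and deduplicate tasks."""
    tasks, seen = [], set()
    for asset_id, locator in local_assets:
        task = (asset_id, backend, locator)
        if (backend, locator) in uploaded_keys or task in seen:
            continue  # already on the backend according to DB state, or already queued
        seen.add(task)
        tasks.append(task)
    return tasks


def run_uploads(tasks, upload_one, max_workers=2):
    """Step 3: consume tasks with a bounded worker pool, preserving task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(upload_one, tasks))
```

`uploaded_keys` stands in for the database view of existing remote locations, in line with the constraint that the default behavior trusts database state instead of querying the remote store.
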
## Data Model

### Existing Table Reuse

#### `storage_backends`

Object storage backends should reuse the current table with the following conventions:

- `backend_type = 'object_storage'`
- `name`
  - stable operator-facing backend name, for example `main-s3`
- `container_name`
  - object storage bucket name
- `base_path`
  - unused for object storage, may remain `NULL`
- `config_json`
  - non-secret configuration only

Recommended `config_json` keys:

- `endpoint`
- `region`
- `base_prefix`
- `addressing_style`
- `public_base_url`
- `credential_env_prefix`

Secrets must not be stored here.

||||
#### `file_assets`
|
||||
|
||||
No semantic changes are required.
|
||||
|
||||
The upload unit stays aligned with the current model:
|
||||
|
||||
- one `file_asset` represents one concrete file version for a song
|
||||
- if a song has multiple active local file versions, all of them are eligible for upload
|
||||
|
||||
#### `file_locations`
|
||||
|
||||
No structural redesign is required.
|
||||
|
||||
For object storage locations:
|
||||
|
||||
- `backend_id`
|
||||
- target object storage backend
|
||||
- `container_name`
|
||||
- bucket
|
||||
- `locator`
|
||||
- object key
|
||||
- `absolute_path`
|
||||
- `NULL`
|
||||
- `remote_file_id`
|
||||
- optional, reserved for future provider-specific remote IDs
|
||||
- `public_url`
|
||||
- derived if backend config provides `public_base_url`
|
||||
- `download_url`
|
||||
- optional, first version may keep this `NULL`
|
||||
- `status`
|
||||
- `active`, `deleted`, or `failed`
|
||||
- `is_primary`
|
||||
- `0` for remote object storage in the first version
|
||||
|
||||
The local location remains:
|
||||
|
||||
- `status = 'active'`
|
||||
- `is_primary = 1`
|
||||
|
||||
### New Table: `song_backend_presence`
|
||||
|
||||
Purpose:
|
||||
|
||||
- answer “does this song have active files on backend X?” quickly
|
||||
- avoid pushing hard-coded backend presence fields into `songs`
|
||||
|
||||
Recommended schema:
|
||||
|
||||
```text
|
||||
song_id INTEGER NOT NULL
|
||||
backend_id INTEGER NOT NULL
|
||||
has_active_file INTEGER NOT NULL DEFAULT 0
|
||||
active_file_count INTEGER NOT NULL DEFAULT 0
|
||||
primary_file_location_id INTEGER
|
||||
updated_at TEXT DEFAULT CURRENT_TIMESTAMP
|
||||
PRIMARY KEY(song_id, backend_id)
|
||||
```
|
||||
|
||||
Rules:
|
||||
|
||||
- this is a derived summary table, not the source of truth
|
||||
- truth still comes from `file_locations`
|
||||
- refresh this row whenever a location on that song/backend becomes active or inactive
|
||||
|
||||
### New Table: `upload_tasks`
|
||||
|
||||
Purpose:
|
||||
|
||||
- queue upload work
|
||||
- support retries, concurrency, and resumable batch execution
|
||||
|
||||
Recommended schema:
|
||||
|
||||
```text
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT
|
||||
file_asset_id INTEGER NOT NULL
|
||||
source_location_id INTEGER NOT NULL
|
||||
target_backend_id INTEGER NOT NULL
|
||||
target_container_name TEXT
|
||||
target_locator TEXT NOT NULL
|
||||
status TEXT NOT NULL DEFAULT 'pending'
|
||||
attempts INTEGER NOT NULL DEFAULT 0
|
||||
last_error TEXT
|
||||
queued_at TEXT DEFAULT CURRENT_TIMESTAMP
|
||||
started_at TEXT
|
||||
finished_at TEXT
|
||||
updated_at TEXT DEFAULT CURRENT_TIMESTAMP
|
||||
UNIQUE(file_asset_id, target_backend_id, target_locator)
|
||||
```
|
||||
|
||||
Task granularity:
|
||||
|
||||
- one task = one local file asset version uploaded to one target backend/key
|
||||
|
||||
This keeps the queue aligned with the “upload all active file versions” requirement.
|
||||
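As an illustration of the dedupe rule, here is a minimal sketch using Python's standard `sqlite3` module. The DDL copies the recommended schema; the helper name `enqueue_upload_task` is hypothetical, and the re-queue rule (anything other than `pending`, `uploading`, or `succeeded` goes back to `pending`) is an assumption that does not yet enforce a bounded retry limit.

```python
import sqlite3

UPLOAD_TASKS_DDL = """
CREATE TABLE IF NOT EXISTS upload_tasks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file_asset_id INTEGER NOT NULL,
    source_location_id INTEGER NOT NULL,
    target_backend_id INTEGER NOT NULL,
    target_container_name TEXT,
    target_locator TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    attempts INTEGER NOT NULL DEFAULT 0,
    last_error TEXT,
    queued_at TEXT DEFAULT CURRENT_TIMESTAMP,
    started_at TEXT,
    finished_at TEXT,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(file_asset_id, target_backend_id, target_locator)
)
"""


def enqueue_upload_task(conn, file_asset_id, source_location_id,
                        target_backend_id, target_locator,
                        target_container_name=None):
    """Insert or reuse the unique task row and return its id (hypothetical helper)."""
    key = (file_asset_id, target_backend_id, target_locator)
    with conn:  # commits on success, rolls back on error
        # The UNIQUE constraint turns a repeated enqueue into a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO upload_tasks "
            "(file_asset_id, source_location_id, target_backend_id, "
            " target_container_name, target_locator) VALUES (?, ?, ?, ?, ?)",
            (file_asset_id, source_location_id, target_backend_id,
             target_container_name, target_locator),
        )
        # Assumption: rows in any other state (e.g. 'failed') are re-queued;
        # rows that are in flight or already done are left untouched.
        conn.execute(
            "UPDATE upload_tasks "
            "SET status = 'pending', updated_at = CURRENT_TIMESTAMP "
            "WHERE file_asset_id = ? AND target_backend_id = ? "
            "AND target_locator = ? "
            "AND status NOT IN ('pending', 'uploading', 'succeeded')",
            key,
        )
        row = conn.execute(
            "SELECT id FROM upload_tasks "
            "WHERE file_asset_id = ? AND target_backend_id = ? "
            "AND target_locator = ?",
            key,
        ).fetchone()
    return row[0]
```

Calling the helper twice with the same asset, backend, and key returns the same task id, so a repeated `upload` run does not grow the queue.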
|
||||
## Object Storage Key Rules
|
||||
|
||||
### Key Shape
|
||||
|
||||
The object key should mirror the local relative path beneath the configured backend prefix.
|
||||
|
||||
If:
|
||||
|
||||
- local relative path is `qq/Singer A/song-c.mp3`
|
||||
- backend `base_prefix` is `music`
|
||||
|
||||
Then:
|
||||
|
||||
- remote key becomes `music/qq/Singer A/song-c.mp3`
|
||||
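The key rule can be sketched as a small pure function. The name `derive_object_key` is hypothetical, and normalizing Windows-style separators to `/` is an assumption added because the CLI examples use Windows paths.

```python
from pathlib import PureWindowsPath


def derive_object_key(base_prefix: str, local_relative_path: str) -> str:
    """Join the backend prefix and the local relative path into an object key."""
    # PureWindowsPath accepts both "\\" and "/" separators; as_posix() emits "/".
    relative = PureWindowsPath(local_relative_path).as_posix().lstrip("/")
    prefix = base_prefix.strip("/")
    # An empty prefix means the key is just the relative path.
    return f"{prefix}/{relative}" if prefix else relative
```

Keeping this derivation in one function means candidate selection and the uploader compute the same `target_locator` for the same file.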
|
||||
### Why Mirror The Relative Path
|
||||
|
||||
- easiest to reconnect local and remote locations
|
||||
- preserves the existing local organization
|
||||
- keeps future CDN and migration mapping simple
|
||||
- reuses the semantics already established in `file_locations.locator`
|
||||
|
||||
## Credential Model
|
||||
|
||||
### Database Versus Secrets
|
||||
|
||||
Store only non-secret backend config in SQLite.
|
||||
|
||||
Resolve secrets from environment variables using the backend’s configured prefix.
|
||||
|
||||
Example:
|
||||
|
||||
- backend name: `main-s3`
|
||||
- `credential_env_prefix = CATALOGSYNC_MAIN_S3`
|
||||
|
||||
Runtime lookup:
|
||||
|
||||
- `CATALOGSYNC_MAIN_S3_ACCESS_KEY_ID`
|
||||
- `CATALOGSYNC_MAIN_S3_SECRET_ACCESS_KEY`
|
||||
- `CATALOGSYNC_MAIN_S3_SESSION_TOKEN` (optional)
|
||||
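A minimal sketch of the runtime lookup, assuming the three variable suffixes listed above. The function name `resolve_credentials` and the returned dictionary keys are hypothetical; the `env` parameter exists only so the lookup can be tested without touching the real environment.

```python
import os
from typing import Mapping, Optional


def resolve_credentials(prefix: str,
                        env: Optional[Mapping[str, str]] = None) -> dict:
    """Read access credentials for one backend from environment variables."""
    env = os.environ if env is None else env
    required = {
        "access_key_id": f"{prefix}_ACCESS_KEY_ID",
        "secret_access_key": f"{prefix}_SECRET_ACCESS_KEY",
    }
    # Fail fast and name every missing variable, without echoing any values.
    missing = [name for name in required.values() if not env.get(name)]
    if missing:
        raise RuntimeError(
            "missing credential environment variables: " + ", ".join(missing)
        )
    creds = {key: env[name] for key, name in required.items()}
    # The session token is optional and may be absent.
    creds["session_token"] = env.get(f"{prefix}_SESSION_TOKEN")
    return creds
```

Running this check once before the batch starts matches the "fail fast before batch execution" rule for missing credentials.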
|
||||
### Why This Model
|
||||
|
||||
- portable for long-running batch jobs
|
||||
- safer than storing keys in SQLite
|
||||
- works well across multiple machines and deployment targets
|
||||
|
||||
## CLI Design
|
||||
|
||||
### `register-object-backend`
|
||||
|
||||
Purpose:
|
||||
|
||||
- create or update one object storage backend definition
|
||||
|
||||
Example:
|
||||
|
||||
```bash
|
||||
musicdl-catalogsync register-object-backend \
|
||||
--db 'D:\catalogsync\catalogsync.db' \
|
||||
--backend main-s3 \
|
||||
--endpoint https://s3.example.com \
|
||||
--bucket music \
|
||||
--base-prefix music \
|
||||
--region auto \
|
||||
--addressing-style auto \
|
||||
--public-base-url https://cdn.example.com/music \
|
||||
--credential-env-prefix CATALOGSYNC_MAIN_S3
|
||||
```
|
||||
|
||||
Behavior:
|
||||
|
||||
- upsert backend by `name`
|
||||
- set `backend_type='object_storage'`
|
||||
- validate required non-secret config before writing
|
||||
|
||||
### `upload`
|
||||
|
||||
Purpose:
|
||||
|
||||
- default: upload all local active file versions that are missing on the target backend
|
||||
- optionally filter by source platform, playlist range, and count
|
||||
|
||||
Example:
|
||||
|
||||
```bash
|
||||
musicdl-catalogsync upload --db 'D:\catalogsync\catalogsync.db' --backend main-s3
|
||||
musicdl-catalogsync upload --db 'D:\catalogsync\catalogsync.db' --backend main-s3 --sources netease,qq --limit 200
|
||||
musicdl-catalogsync upload --db 'D:\catalogsync\catalogsync.db' --backend main-s3 --playlist-ids 12,15 --workers 4
|
||||
```
|
||||
|
||||
Default semantics:
|
||||
|
||||
- trust database state
|
||||
- do not issue remote `HEAD` requests by default
|
||||
- enqueue missing uploads
|
||||
- consume queue with limited workers
|
||||
|
||||
### Download CLI Extension
|
||||
|
||||
Extend the existing `download` and `run` workflows with:
|
||||
|
||||
- `--workers`
|
||||
|
||||
First-version defaults:
|
||||
|
||||
- `download --workers 3`
|
||||
- `upload --workers 4`
|
||||
|
||||
These defaults should remain conservative and configurable.
|
||||
|
||||
## Upload Execution Flow
|
||||
|
||||
### Phase 1: Candidate Selection
|
||||
|
||||
For the target backend:
|
||||
|
||||
- find all active local `file_locations`
|
||||
- resolve their `file_asset`
|
||||
- derive target object key from:
|
||||
- backend `base_prefix`
|
||||
- local relative path
|
||||
- skip assets that already have an active remote location on the same backend/key
|
||||
|
||||
Selection must support:
|
||||
|
||||
- all local songs
|
||||
- `--sources`
|
||||
- `--playlist-ids`
|
||||
- `--limit`
|
||||
|
||||
### Phase 2: Task Enqueue
|
||||
|
||||
For each missing remote file:
|
||||
|
||||
- insert or reuse a unique `upload_tasks` row
|
||||
- set status to `pending` unless it is already `uploading` or `succeeded`
|
||||
|
||||
### Phase 3: Worker Claim
|
||||
|
||||
Each worker should:
|
||||
|
||||
- claim one `pending` task in a transaction
|
||||
- move it to `uploading`
|
||||
- set `started_at`
|
||||
|
||||
This must prevent duplicate worker claims.
|
||||
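One way to make the claim safe in SQLite is to take the write lock before reading the candidate row. The sketch below assumes each worker uses its own `sqlite3` connection opened with `isolation_level=None`, so the explicit `BEGIN IMMEDIATE` is the only transaction control in play. The function name `claim_next_task` and the "oldest id first" ordering are assumptions.

```python
import sqlite3
from typing import Optional


def claim_next_task(conn: sqlite3.Connection) -> Optional[int]:
    """Claim one pending upload task and return its id, or None if none remain."""
    # BEGIN IMMEDIATE takes the database write lock up front, so two workers
    # cannot both read the same pending row and then both update it.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id FROM upload_tasks "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute(
            "UPDATE upload_tasks "
            "SET status = 'uploading', "
            "    started_at = CURRENT_TIMESTAMP, "
            "    updated_at = CURRENT_TIMESTAMP "
            "WHERE id = ? AND status = 'pending'",
            (row[0],),
        )
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

A worker loop would call this until it returns `None`, uploading each claimed task before asking for the next one.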
|
||||
### Phase 4: Upload
|
||||
|
||||
For each claimed task:
|
||||
|
||||
- resolve source local file from `source_location_id`
|
||||
- validate that the file still exists
|
||||
- resolve backend config
|
||||
- resolve credentials from environment
|
||||
- upload to S3-compatible storage
|
||||
|
||||
### Phase 5: Writeback
|
||||
|
||||
On success:
|
||||
|
||||
- write or upsert the remote `file_location`
|
||||
- set remote `status='active'`
|
||||
- keep remote `is_primary=0`
|
||||
- refresh `song_backend_presence`
|
||||
- mark task `succeeded`
|
||||
- set `finished_at`
|
||||
|
||||
## Upload Task State Machine
|
||||
|
||||
Use these first-version task states:
|
||||
|
||||
- `pending`
|
||||
- `uploading`
|
||||
- `succeeded`
|
||||
- `failed`
|
||||
- `skipped`
|
||||
|
||||
State transitions:
|
||||
|
||||
- enqueue → `pending`
|
||||
- worker claim → `uploading`
|
||||
- success with DB writeback → `succeeded`
|
||||
- upload error or writeback error → `failed`
|
||||
- no-op due to already-active remote location → `skipped`
|
||||
|
||||
Retry model:
|
||||
|
||||
- store `attempts`
|
||||
- store `last_error`
|
||||
- later `upload` runs may requeue or retry `failed` tasks under a bounded retry rule
|
||||
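The transitions and retry rule above could be encoded as a small lookup table. This is a sketch under stated assumptions: the names `ALLOWED_TRANSITIONS`, `can_transition`, and `should_retry` are hypothetical, `MAX_ATTEMPTS = 3` is an illustrative bound the design does not specify, and allowing `skipped` from both `pending` and `uploading` is a guess at where the no-op check runs.

```python
# Which target states each task state may move to.
ALLOWED_TRANSITIONS = {
    "pending": {"uploading", "skipped"},
    "uploading": {"succeeded", "failed", "skipped"},
    "failed": {"pending"},   # re-queued by a later `upload` run
    "succeeded": set(),      # terminal
    "skipped": set(),        # terminal
}

# Assumption: the design only says the retry rule is bounded.
MAX_ATTEMPTS = 3


def can_transition(current: str, new: str) -> bool:
    """Return True if a task may move from `current` to `new`."""
    return new in ALLOWED_TRANSITIONS.get(current, set())


def should_retry(status: str, attempts: int,
                 max_attempts: int = MAX_ATTEMPTS) -> bool:
    """Return True if a failed task is still within its retry budget."""
    return status == "failed" and attempts < max_attempts
```

Checking `can_transition` before every status write keeps an unexpected state, such as `succeeded` going back to `pending`, from slipping into the queue silently.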
|
||||
## Backend Presence Refresh Rules
|
||||
|
||||
Whenever a remote location changes on `(song_id, backend_id)`:
|
||||
|
||||
- count active locations for that song/backend
|
||||
- update `has_active_file`
|
||||
- update `active_file_count`
|
||||
- set `primary_file_location_id` to a preferred active location on that backend
|
||||
|
||||
First version preference rule:
|
||||
|
||||
- if any active location exists on that backend, pick one deterministic row, for example the smallest active `file_locations.id`
|
||||
|
||||
This table exists for fast lookup and operator queries; whether an upload is actually needed is still decided from `file_locations`.
|
||||
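The refresh rules can be sketched with `sqlite3` as one aggregate query followed by one upsert. The join through `file_assets.song_id` and the `id` primary-key columns are assumptions about the existing tables, and `refresh_song_backend_presence` is a hypothetical helper name.

```python
import sqlite3


def refresh_song_backend_presence(conn: sqlite3.Connection,
                                  song_id: int, backend_id: int) -> None:
    """Recompute the summary row for one (song_id, backend_id) pair."""
    with conn:  # commits on success, rolls back on error
        # Count active locations and pick the smallest active id as the
        # deterministic "preferred" location. MIN() is NULL when none exist.
        count, primary_id = conn.execute(
            """
            SELECT COUNT(*), MIN(fl.id)
            FROM file_locations AS fl
            JOIN file_assets AS fa ON fa.id = fl.file_asset_id
            WHERE fa.song_id = ?
              AND fl.backend_id = ?
              AND fl.status = 'active'
            """,
            (song_id, backend_id),
        ).fetchone()
        conn.execute(
            """
            INSERT OR REPLACE INTO song_backend_presence
                (song_id, backend_id, has_active_file,
                 active_file_count, primary_file_location_id, updated_at)
            VALUES (?, ?, ?, ?, ?, CURRENT_TIMESTAMP)
            """,
            (song_id, backend_id, 1 if count else 0, count, primary_id),
        )
```

Because the row is always recomputed from `file_locations`, calling the helper more often than strictly necessary is harmless.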
|
||||
## Limited Concurrency Design
|
||||
|
||||
### Download Concurrency
|
||||
|
||||
Add limited worker-based download concurrency.
|
||||
|
||||
Key rule:
|
||||
|
||||
- disk-space exhaustion must trigger one global pause, not one prompt per worker
|
||||
|
||||
Behavior:
|
||||
|
||||
1. workers process queued download items
|
||||
2. if a worker detects insufficient space under the current active root:
|
||||
- raise a shared pause request
|
||||
- stop dispatching new tasks
|
||||
3. prompt the operator once for a new download directory
|
||||
4. switch the shared active root
|
||||
5. resume remaining not-yet-started tasks under the new root
|
||||
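A rough sketch of the "one global pause" behavior, assuming workers are threads that ask a shared object for the active root before starting each item. The class name `SharedDownloadRoot`, the injected `has_space` and `prompt_new_root` callables, and the `prompt_count` counter are all hypothetical; a real `has_space` might, for example, compare `shutil.disk_usage(root).free` against a threshold.

```python
import threading
from typing import Callable


class SharedDownloadRoot:
    """Holds the single active download root shared by all workers."""

    def __init__(self, root: str,
                 has_space: Callable[[str], bool],
                 prompt_new_root: Callable[[], str]) -> None:
        self._root = root
        self._has_space = has_space
        self._prompt_new_root = prompt_new_root
        self._lock = threading.Lock()
        self.prompt_count = 0  # exposed so tests can observe prompting

    def acquire_root(self) -> str:
        """Return a root with free space, switching roots at most once per shortage."""
        # Holding the lock while prompting is the global pause: every other
        # worker blocks here until the operator has answered.
        with self._lock:
            if not self._has_space(self._root):
                new_root = self._prompt_new_root()
                if not new_root:
                    # No replacement supplied: fail the remaining batch clearly.
                    raise RuntimeError("no replacement download directory supplied")
                self.prompt_count += 1
                self._root = new_root
            return self._root
```

Workers that were waiting on the lock re-check the new root when they get in, so they do not trigger a second prompt as long as the replacement directory has space.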
|
||||
Non-goals:
|
||||
|
||||
- per-worker independent root switching
|
||||
- automatic multi-root balancing in the first version
|
||||
|
||||
### Upload Concurrency
|
||||
|
||||
Upload workers should process queue rows concurrently but conservatively.
|
||||
|
||||
Requirements:
|
||||
|
||||
- claim tasks transactionally
|
||||
- prevent duplicate uploads of the same `(file_asset_id, target_backend_id, target_locator)`
|
||||
- keep worker count operator-controlled
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Upload Errors
|
||||
|
||||
- missing source file
|
||||
- mark task `failed`
|
||||
- set descriptive `last_error`
|
||||
- missing backend config
|
||||
- fail fast before batch execution
|
||||
- missing environment credentials
|
||||
- fail fast before batch execution
|
||||
- upload transport error
|
||||
- mark task `failed`
|
||||
- upload succeeded but DB writeback failed
|
||||
- mark task `failed`
|
||||
- store explicit `last_error` explaining that remote upload may already exist
|
||||
|
||||
### Download Errors
|
||||
|
||||
- worker download failure
|
||||
- record failure for that item and continue with other tasks
|
||||
- insufficient disk space
|
||||
- trigger one global directory-switch prompt
|
||||
- no replacement directory supplied
|
||||
- fail the remaining batch clearly
|
||||
|
||||
## Testing
|
||||
|
||||
Add or update coverage for the following areas.
|
||||
|
||||
### Schema Tests
|
||||
|
||||
- `song_backend_presence` exists
|
||||
- `upload_tasks` exists
|
||||
- unique constraint on upload task dedupe works
|
||||
|
||||
### Repository Tests
|
||||
|
||||
- register or upsert object storage backends
|
||||
- write remote `file_locations`
|
||||
- refresh `song_backend_presence`
|
||||
- enqueue deduplicated upload tasks
|
||||
- select pending upload candidates by backend, source, playlist, and limit
|
||||
|
||||
### Uploader / Service Tests
|
||||
|
||||
Using a fake or stub S3-compatible client:
|
||||
|
||||
- successful upload creates active remote location
|
||||
- public URL derivation when configured
|
||||
- missing source file becomes `failed`
|
||||
- missing credentials fail fast
|
||||
- multiple local file versions for one song are all enqueued
|
||||
|
||||
### CLI Tests
|
||||
|
||||
- `register-object-backend`
|
||||
- `upload --backend ...`
|
||||
- `upload --sources ...`
|
||||
- `upload --playlist-ids ...`
|
||||
- `upload --limit ...`
|
||||
- `upload --workers ...`
|
||||
- `download --workers ...`
|
||||
|
||||
### Concurrency Tests
|
||||
|
||||
- concurrent upload workers do not claim the same task twice
|
||||
- concurrent download workers trigger only one directory switch prompt
|
||||
- after directory switch, later downloads use the new root
|
||||
|
||||
### Documentation Tests
|
||||
|
||||
- `docs/catalogsync.md` is updated to describe:
|
||||
- object storage backend registration
|
||||
- upload command usage
|
||||
- queue semantics
|
||||
- environment variable credential model
|
||||
- download/upload worker options
|
||||
|
||||
## Documentation Requirements
|
||||
|
||||
Implementation must update `docs/catalogsync.md` to include:
|
||||
|
||||
- why object storage uses backend config plus env-based secrets
|
||||
- how to register an object storage backend
|
||||
- how remote keys mirror local relative paths
|
||||
- how `upload` works by default
|
||||
- what `song_backend_presence` and `upload_tasks` are for
|
||||
- how `--workers` affects download and upload
|
||||
- how the global download directory switch behaves under low disk space
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- An operator can register an S3-compatible object storage backend without storing secrets in SQLite
|
||||
- `upload` can enqueue and execute uploads for missing remote files on that backend
|
||||
- Remote object keys mirror local relative paths beneath the configured backend prefix
|
||||
- Successful uploads create active remote `file_locations`
|
||||
- Local files remain active and primary after upload
|
||||
- `song_backend_presence` shows whether a song has active files on a given backend
|
||||
- `upload_tasks` supports resumable queue execution with bounded retries
|
||||
- The first version uploads all active local file versions for a song, not just one version
|
||||
- `upload` supports both full backend fill-in mode and filtered mode
|
||||
- Download and upload both support limited operator-configurable concurrency
|
||||
- Low disk space during download triggers one global prompt and one shared root switch for later tasks
|
||||
- `docs/catalogsync.md` is updated together with the implementation
|
||||
@@ -0,0 +1,571 @@
|
||||
# Playlist Selective Download Design
|
||||
|
||||
## Goal
|
||||
|
||||
Extend the `catalogsync` operations console so operators can download songs by selected playlists instead of relying on uncontrolled full-library download runs.
|
||||
|
||||
The new flow must allow the operator to:
|
||||
|
||||
- browse playlists through a paginated playlist-pool page
|
||||
- filter playlists by download state
|
||||
- select playlists on the current page
|
||||
- run either `download already-synced songs` or `sync then download` for the selected playlists
|
||||
- persist a separate `wanted for download` marker for playlists that should remain in a long-term queue
|
||||
|
||||
This design keeps the existing job system and downloader intact wherever possible.
|
||||
|
||||
## Scope
|
||||
|
||||
### In Scope
|
||||
|
||||
- Upgrade `/playlists` from a read-only list into a playlist-pool management page
|
||||
- Add server-side pagination to the playlist page
|
||||
- Add playlist filtering by:
|
||||
- platform
|
||||
- pool kind
|
||||
- keyword
|
||||
- download state
|
||||
- wanted marker
|
||||
- Add current-page checkbox selection and current-page select-all
|
||||
- Add bulk actions:
|
||||
- `下载已同步所选歌单` (download already-synced songs of the selected playlists)
|
||||
- `同步后下载所选歌单` (sync, then download the selected playlists)
|
||||
- `加入待下载清单` (add to the wanted-for-download list)
|
||||
- `移出待下载清单` (remove from the wanted-for-download list)
|
||||
- Add a persistent playlist-level preference table for the wanted marker
|
||||
- Reuse existing `download_only` and `sync_download` jobs by passing `playlist_scope.playlist_ids`
|
||||
- Compute playlist download state from live catalog and runner data
|
||||
|
||||
### Out of Scope
|
||||
|
||||
- Cross-page remembered temporary selection
|
||||
- Saved named selection sets
|
||||
- Manual editing of computed playlist download state
|
||||
- New downloader semantics outside the existing job system
|
||||
- Per-playlist download history pages
|
||||
- Automatic cancellation or reprioritization of in-flight jobs in this design
|
||||
|
||||
## User Decisions Captured
|
||||
|
||||
This design encodes the following confirmed product decisions:
|
||||
|
||||
- Playlist download state uses multiple states instead of a simple downloaded flag
|
||||
- The operator needs both:
|
||||
- temporary current-page selection for immediate actions
|
||||
- persistent playlist-level wanted markers
|
||||
- If a song appears in multiple playlists, it counts as downloaded for all of them once the same `song_id` has an active local file
|
||||
- Playlists with no synced `playlist_songs` must show a dedicated `未同步` (not synced) state
|
||||
- The playlist page must support pagination, page-level select-all, and download-state filtering
|
||||
- `下载中` (downloading) is shown only when there is active running download work for songs belonging to that playlist
|
||||
- The state filter set is:
|
||||
- `全部` (all)
|
||||
- `未同步` (not synced)
|
||||
- `未下载` (not downloaded)
|
||||
- `下载中` (downloading)
|
||||
- `部分已下载` (partially downloaded)
|
||||
- `已下载` (downloaded)
|
||||
|
||||
## Constraints
|
||||
|
||||
- Existing `catalogsync` job queue remains the only execution path
|
||||
- Only one active job still runs at a time
|
||||
- Existing `download_only` and `sync_download` job types should remain valid and reusable
|
||||
- SQLite remains the backing store
|
||||
- The first version should optimize for operational clarity and low migration risk over advanced UX
|
||||
- The implementation should avoid full-library recomputation for every playlist page load because the NAS dataset is already large
|
||||
|
||||
## Existing System Reuse
|
||||
|
||||
The current codebase already provides two critical capabilities that should be reused instead of reinvented:
|
||||
|
||||
1. `playlist_scope.playlist_ids` already exists on jobs
|
||||
2. download planning already supports filtering by `playlist_ids`
|
||||
|
||||
Relevant current behavior:
|
||||
|
||||
- `download_only` is already a first-class job type
|
||||
- `sync_download` is already a first-class job type
|
||||
- `OpsRunner` already resolves `playlist_scope.playlist_ids`
|
||||
- `DownloadPlanner.build_download_queue()` already accepts `playlist_ids`
|
||||
- `CatalogRepository.list_pending_download_songs()` already supports `playlist_ids`
|
||||
|
||||
Because of this, the main work is playlist management UI, playlist-state aggregation, and lightweight playlist preference persistence.
|
||||
|
||||
## Recommended Approach
|
||||
|
||||
Use a mixed model:
|
||||
|
||||
- compute playlist state live for the current result page
|
||||
- persist only playlist-level operator intent (`wanted for download`)
|
||||
- use current bulk selections only as transient request payload
|
||||
|
||||
### Why This Approach
|
||||
|
||||
This approach avoids two bad extremes:
|
||||
|
||||
- **Pure runtime-only UI** would lose long-term operator intent such as a curated wanted list
|
||||
- **Full cached playlist-state tables** would add a large invalidation burden after every sync, download, retry, or file-state change
|
||||
|
||||
The mixed approach gives:
|
||||
|
||||
- correct and current state for the visible page
|
||||
- minimal schema change
|
||||
- low-risk reuse of the current pipeline
|
||||
|
||||
## Data Model
|
||||
|
||||
### New Table: `playlist_download_preferences`
|
||||
|
||||
Purpose:
|
||||
|
||||
- persist operator intent for playlists that should stay in a long-term wanted queue
|
||||
|
||||
Recommended fields:
|
||||
|
||||
```text
|
||||
playlist_id INTEGER PRIMARY KEY
|
||||
is_wanted INTEGER NOT NULL DEFAULT 1
|
||||
marked_by TEXT
|
||||
created_at TEXT DEFAULT CURRENT_TIMESTAMP
|
||||
updated_at TEXT DEFAULT CURRENT_TIMESTAMP
|
||||
```
|
||||
|
||||
Notes:
|
||||
|
||||
- use one row per playlist, not an event log
|
||||
- `playlist_id` should be unique and serve as the primary key
|
||||
- either deleting the row or setting `is_wanted = 0` is an acceptable way to clear the marker; keep the row (with `is_wanted = 0`) only if that simplifies auditing
|
||||
|
||||
### No Cached Playlist State Table
|
||||
|
||||
Do not add a second table that stores computed playlist state such as `已下载 / 未下载 / 部分已下载`.
|
||||
|
||||
Reason:
|
||||
|
||||
- those values depend on current `playlist_songs`
|
||||
- current local file availability can change
|
||||
- current running download items can change
|
||||
|
||||
The state should therefore be computed from live data for the current page.
|
||||
|
||||
## Playlist State Model
|
||||
|
||||
For each playlist row, calculate:
|
||||
|
||||
- `song_count`
|
||||
- `downloaded_song_count`
|
||||
- `running_download_song_count`
|
||||
- `is_wanted`
|
||||
|
||||
### State Rules
|
||||
|
||||
- `未同步`
|
||||
- `song_count = 0`
|
||||
- `下载中`
|
||||
- `song_count > 0`
|
||||
- `running_download_song_count > 0`
|
||||
- `未下载`
|
||||
- `song_count > 0`
|
||||
- `downloaded_song_count = 0`
|
||||
- `running_download_song_count = 0`
|
||||
- `部分已下载`
|
||||
- `song_count > 0`
|
||||
- `0 < downloaded_song_count < song_count`
|
||||
- `running_download_song_count = 0`
|
||||
- `已下载`
|
||||
- `song_count > 0`
|
||||
- `downloaded_song_count = song_count`
|
||||
- `running_download_song_count = 0`
|
||||
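The state rules translate directly into a small classifier over the three counts. The function name `compute_playlist_state` is hypothetical; the returned strings are the state labels defined above.

```python
def compute_playlist_state(song_count: int,
                           downloaded_song_count: int,
                           running_download_song_count: int) -> str:
    """Map the per-playlist counts to one of the five display states."""
    if song_count == 0:
        return "未同步"        # not synced: no playlist_songs rows yet
    if running_download_song_count > 0:
        return "下载中"        # downloading: running work takes precedence
    if downloaded_song_count == 0:
        return "未下载"        # not downloaded
    if downloaded_song_count < song_count:
        return "部分已下载"    # partially downloaded
    return "已下载"            # downloaded
```

Evaluating the checks in this order makes the states mutually exclusive, since each later branch can assume the earlier conditions were false.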
|
||||
### Downloaded Song Counting Rule
|
||||
|
||||
For one playlist song entry, the song is treated as downloaded if:
|
||||
|
||||
- the same `song_id` has an active local file location
|
||||
|
||||
It does not matter which playlist originally triggered that file download.
|
||||
|
||||
### Running Download Counting Rule
|
||||
|
||||
For one playlist song entry, the song is treated as currently downloading if:
|
||||
|
||||
- there is a `running` job item in stage `download`
|
||||
- that item points to the same `song_id`
|
||||
|
||||
Queued-but-not-running work does not count as `下载中`.
|
||||
|
||||
## Playlist Page Design
|
||||
|
||||
### URL
|
||||
|
||||
Keep the main page at:
|
||||
|
||||
```text
|
||||
/playlists
|
||||
```
|
||||
|
||||
### Query Parameters
|
||||
|
||||
Support these server-side filters:
|
||||
|
||||
- `page`
|
||||
- `page_size`
|
||||
- `platform`
|
||||
- `pool_kind`
|
||||
- `status`
|
||||
- `keyword`
|
||||
- `wanted_only`
|
||||
|
||||
### Default Pagination
|
||||
|
||||
Recommended defaults:
|
||||
|
||||
- default `page_size = 50`
|
||||
- allow `20 / 50 / 100`
|
||||
|
||||
Reason:
|
||||
|
||||
- current NAS data already contains more than ten thousand playlists
|
||||
- a fixed `LIMIT 500` list will become increasingly unusable
|
||||
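A minimal sketch of how the server might normalize the `page` and `page_size` parameters. Falling back to the default size for a disallowed value, and to page 1 for a missing or non-positive page, is an assumption; rejecting such requests with a validation error would be equally consistent with the design. The function name is hypothetical.

```python
from typing import Optional, Tuple

ALLOWED_PAGE_SIZES = (20, 50, 100)
DEFAULT_PAGE_SIZE = 50


def normalize_pagination(page: Optional[int],
                         page_size: Optional[int]) -> Tuple[int, int, int]:
    """Return (page, page_size, offset) with safe defaults applied."""
    # Only the allowed sizes are accepted; anything else uses the default.
    size = page_size if page_size in ALLOWED_PAGE_SIZES else DEFAULT_PAGE_SIZE
    # Pages are 1-based; missing or non-positive values fall back to page 1.
    current = page if isinstance(page, int) and page >= 1 else 1
    return current, size, (current - 1) * size
```

The returned offset can be passed straight to a `LIMIT ? OFFSET ?` query.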
|
||||
### Table Columns
|
||||
|
||||
Recommended visible columns:
|
||||
|
||||
- checkbox
|
||||
- playlist id
|
||||
- platform
|
||||
- remote playlist id
|
||||
- playlist name
|
||||
- pool names
|
||||
- song count
|
||||
- downloaded song count
|
||||
- computed state
|
||||
- wanted marker
|
||||
- updated at
|
||||
|
||||
### Toolbar Actions
|
||||
|
||||
Recommended top-toolbar controls:
|
||||
|
||||
- platform filter
|
||||
- pool-kind filter
|
||||
- state filter
|
||||
- keyword search
|
||||
- wanted-only filter
|
||||
- page-size selector
|
||||
|
||||
### Bulk Action Groups
|
||||
|
||||
#### Temporary Selection Actions
|
||||
|
||||
Apply only to the currently selected playlist ids:
|
||||
|
||||
- `下载已同步所选歌单`
|
||||
- `同步后下载所选歌单`
|
||||
|
||||
#### Persistent Marker Actions
|
||||
|
||||
Apply only to the currently selected playlist ids:
|
||||
|
||||
- `加入待下载清单`
|
||||
- `移出待下载清单`
|
||||
|
||||
### Selection Behavior
|
||||
|
||||
The first version should support only current-page temporary selection.
|
||||
|
||||
Rules:
|
||||
|
||||
- `全选本页` (select all on this page) selects all rows visible on the current page
|
||||
- changing page clears temporary selection
|
||||
- changing any filter that alters the result page clears the temporary selection
|
||||
- persistent wanted markers remain stored independently of temporary selection
|
||||
|
||||
This keeps implementation simple and predictable.
|
||||
|
||||
## API Design
|
||||
|
||||
### `GET /api/playlists`
|
||||
|
||||
Purpose:
|
||||
|
||||
- return one filtered page of playlist rows with computed state
|
||||
|
||||
Request parameters:
|
||||
|
||||
- `page`
|
||||
- `page_size`
|
||||
- `platform`
|
||||
- `pool_kind`
|
||||
- `status`
|
||||
- `keyword`
|
||||
- `wanted_only`
|
||||
|
||||
Response shape:
|
||||
|
||||
```json
|
||||
{
|
||||
"items": [
|
||||
{
|
||||
"id": 123,
|
||||
"platform": "qq",
|
||||
"remote_playlist_id": "456",
|
||||
"name": "Example Playlist",
|
||||
"pool_names": "QQ 音乐歌单广场",
|
||||
"song_count": 120,
|
||||
"downloaded_song_count": 80,
|
||||
"state": "部分已下载",
|
||||
"is_wanted": true,
|
||||
"updated_at": "2026-04-17 00:00:00"
|
||||
}
|
||||
],
|
||||
"page": 1,
|
||||
"page_size": 50,
|
||||
"total": 12345
|
||||
}
|
||||
```
|
||||
|
||||
### `POST /api/playlists/mark-wanted`
|
||||
|
||||
Purpose:
|
||||
|
||||
- persist wanted markers for the specified playlists
|
||||
|
||||
Request body:
|
||||
|
||||
```json
|
||||
{
|
||||
"playlist_ids": [1, 2, 3],
|
||||
"marked_by": "ops-console"
|
||||
}
|
||||
```
|
||||
|
||||
### `POST /api/playlists/unmark-wanted`
|
||||
|
||||
Purpose:
|
||||
|
||||
- remove or disable wanted markers for the specified playlists
|
||||
|
||||
Request body:
|
||||
|
||||
```json
|
||||
{
|
||||
"playlist_ids": [1, 2, 3]
|
||||
}
|
||||
```
|
||||
|
||||
### `POST /api/playlists/download`
|
||||
|
||||
Purpose:
|
||||
|
||||
- create a `download_only` job scoped to selected playlist ids
|
||||
|
||||
Request body:
|
||||
|
||||
```json
|
||||
{
|
||||
"playlist_ids": [1, 2, 3],
|
||||
"requested_by": "ops-console"
|
||||
}
|
||||
```
|
||||
|
||||
Behavior:
|
||||
|
||||
- create one `download_only` job
|
||||
- store `playlist_scope.playlist_ids = [...]`
|
||||
- do not include playlists that are not selected
|
||||
|
||||
### `POST /api/playlists/sync-download`
|
||||
|
||||
Purpose:
|
||||
|
||||
- create a `sync_download` job scoped to selected playlist ids
|
||||
|
||||
Request body:
|
||||
|
||||
```json
|
||||
{
|
||||
"playlist_ids": [1, 2, 3],
|
||||
"requested_by": "ops-console"
|
||||
}
|
||||
```
|
||||
|
||||
Behavior:
|
||||
|
||||
- create one `sync_download` job
|
||||
- store `playlist_scope.playlist_ids = [...]`
|
||||
|
||||
## Interaction Rules
|
||||
|
||||
### `下载已同步所选歌单`
|
||||
|
||||
This action should:
|
||||
|
||||
- create a `download_only` job for the selected `playlist_ids`
|
||||
- operate only on songs already present in `playlist_songs`
|
||||
|
||||
Playlists in state `未同步` contribute no songs and therefore effectively produce no download work.
|
||||
|
||||
The UI should make this explicit instead of pretending those playlists are downloading.
|
||||
|
||||
### `同步后下载所选歌单`
|
||||
|
||||
This action should:
|
||||
|
||||
- create a `sync_download` job for the selected `playlist_ids`
|
||||
- sync playlist songs first
|
||||
- then download only missing songs from those playlists
|
||||
|
||||
### Wanted Marker UX
|
||||
|
||||
The wanted marker is not itself a download state.
|
||||
|
||||
It is a separate operator-intent flag.
|
||||
|
||||
A playlist may therefore be:
|
||||
|
||||
- `已下载` and still marked wanted
|
||||
- `未同步` and marked wanted
|
||||
- `部分已下载` and not marked wanted
|
||||
|
||||
This separation avoids overloading one column with two different meanings.
|
||||
|
||||
## Query and Aggregation Strategy
|
||||
|
||||
### Page-First Aggregation
|
||||
|
||||
Do not compute states for the whole library on each request.
|
||||
|
||||
Instead:
|
||||
|
||||
1. query only the playlist ids for the requested page
|
||||
2. run aggregation queries only for those playlist ids
|
||||
3. merge the counts into the returned rows
|
||||
|
||||
This keeps response cost proportional to current page size instead of full library size.
|
||||
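The three steps above can be sketched for the simplest aggregate, the per-playlist song total. The `playlists` table name, its `id` column, and the ordering by `id` are assumptions about the existing schema; filters are omitted for brevity, and `song_counts_for_page` is a hypothetical helper name.

```python
import sqlite3
from typing import Dict


def song_counts_for_page(conn: sqlite3.Connection,
                         page: int, page_size: int) -> Dict[int, int]:
    """Return {playlist_id: song_count} for the playlists on one page only."""
    offset = (page - 1) * page_size
    # Step 1: fetch only the playlist ids on the requested page.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM playlists ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )]
    if not ids:
        return {}
    # Step 2: aggregate only over those ids, using bound parameters.
    placeholders = ",".join("?" * len(ids))
    counts = dict.fromkeys(ids, 0)  # playlists with no songs stay at 0
    query = (
        "SELECT playlist_id, COUNT(*) FROM playlist_songs "
        f"WHERE playlist_id IN ({placeholders}) GROUP BY playlist_id"
    )
    # Step 3: merge the counts into the page rows.
    for playlist_id, song_count in conn.execute(query, ids):
        counts[playlist_id] = song_count
    return counts
```

The downloaded and running-download totals would follow the same pattern with their own joins, each restricted to the same page of ids.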
|
||||
### Aggregations Needed Per Page
|
||||
|
||||
For the current page playlist ids:
|
||||
|
||||
- playlist song totals from `playlist_songs`
|
||||
- downloaded song totals from:
|
||||
- `playlist_songs`
|
||||
- `file_assets`
|
||||
- `file_locations`
|
||||
- `storage_backends`
|
||||
- running download song totals from:
|
||||
- `playlist_songs`
|
||||
- `job_items`
|
||||
- `job_stages`
|
||||
- wanted markers from `playlist_download_preferences`
|
||||
|
||||
### Index Expectations
|
||||
|
||||
Add or verify indexes for:
|
||||
|
||||
- `pool_playlists(playlist_id)`
|
||||
- `pool_playlists(pool_id)`
|
||||
- `playlist_songs(playlist_id)`
|
||||
- `playlist_songs(song_id)`
|
||||
- `file_assets(song_id)`
|
||||
- `file_locations(file_asset_id, status)`
|
||||
- `job_items(song_id, status)`
|
||||
- `job_stages(id, stage_type)`
|
||||
- `playlist_download_preferences(playlist_id)`
|
||||
- `playlist_download_preferences(is_wanted)`
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Empty Selection
|
||||
|
||||
Bulk actions should reject empty `playlist_ids` with a validation error.
|
||||
|
||||
### Unknown Playlist IDs
|
||||
|
||||
If unknown ids are passed:
|
||||
|
||||
- ignore ids that do not exist
|
||||
- fail only if the final valid set is empty
|
||||
|
||||
### Duplicate Playlist IDs
|
||||
|
||||
Normalize to unique ids before processing.
|
||||
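The empty-selection, unknown-id, and duplicate-id rules can be combined into one normalization step that runs before any bulk action. This is a sketch: the function name `normalize_playlist_ids` is hypothetical, and `existing_ids` stands in for whatever lookup the repository provides.

```python
from typing import Iterable, List, Set


def normalize_playlist_ids(requested: Iterable[int],
                           existing_ids: Set[int]) -> List[int]:
    """Drop unknown and duplicate ids, keeping first-seen order."""
    seen: Set[int] = set()
    valid: List[int] = []
    for playlist_id in requested:
        # Unknown ids are ignored rather than treated as errors.
        if playlist_id in existing_ids and playlist_id not in seen:
            seen.add(playlist_id)
            valid.append(playlist_id)
    # Fail only when nothing usable remains (covers the empty selection too).
    if not valid:
        raise ValueError("playlist_ids must contain at least one existing playlist id")
    return valid
```

An API handler could map the `ValueError` to its validation-error response, so all four bulk endpoints share the same behavior.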
|
||||
### Large Selection on One Page
|
||||
|
||||
The selected ids are page-scoped and therefore bounded by `page_size`.
|
||||
|
||||
This makes bulk requests predictable and low risk.
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Repository and Query Tests
|
||||
|
||||
Add tests for:
|
||||
|
||||
- listing one playlist page with filters
|
||||
- correct `total` count under filtering
|
||||
- wanted marker persistence
|
||||
- state aggregation across:
|
||||
- `未同步`
|
||||
- `未下载`
|
||||
- `下载中`
|
||||
- `部分已下载`
|
||||
- `已下载`
|
||||
|
||||
### API Tests
|
||||
|
||||
Add tests for:
|
||||
|
||||
- `GET /api/playlists`
|
||||
- `POST /api/playlists/mark-wanted`
|
||||
- `POST /api/playlists/unmark-wanted`
|
||||
- `POST /api/playlists/download`
|
||||
- `POST /api/playlists/sync-download`
|
||||
|
||||
Verify that:
|
||||
|
||||
- created jobs use the expected job type
|
||||
- `playlist_scope.playlist_ids` is stored correctly
|
||||
- invalid or empty selection is rejected
|
||||
|
||||
### UI Tests
|
||||
|
||||
At minimum, validate rendered page content and form wiring for:
|
||||
|
||||
- pagination controls
|
||||
- state filter controls
|
||||
- current-page select-all
|
||||
- bulk action forms
|
||||
- wanted-only filter
|
||||
|
||||
### Regression Coverage
|
||||
|
||||
Keep existing `download_only` and `sync_download` behavior valid for callers outside the playlist page.
|
||||
|
||||
## Rollout Notes
|
||||
|
||||
The recommended rollout order is:
|
||||
|
||||
1. add the playlist preference table and repository helpers
|
||||
2. add page-level playlist listing API with computed state
|
||||
3. upgrade `/playlists` UI to pagination, filters, selection, and actions
|
||||
4. add bulk job-creation endpoints for selected playlists
|
||||
5. verify on NAS with a controlled subset of playlists before using it for wide library download
|
||||
|
||||
## Result
|
||||
|
||||
After this design lands, the operator workflow becomes:
|
||||
|
||||
1. open `/playlists`
|
||||
2. filter to `未同步` (unsynced), `未下载` (not downloaded), or `部分已下载` (partially downloaded)
|
||||
3. select playlists on the current page
|
||||
4. choose either:
|
||||
- `下载已同步所选歌单` (download the selected, already-synced playlists)
|
||||
- `同步后下载所选歌单` (sync, then download the selected playlists)
|
||||
- `加入待下载清单` (add to the wanted list)
|
||||
5. observe progress through the existing jobs and worker views
|
||||
|
||||
This changes playlist download from an uncontrolled whole-library operation into a scoped, inspectable, operator-driven workflow.
|
||||
@@ -0,0 +1,529 @@
|
||||
# Catalogsync Task Center And Download Lanes Design
|
||||
|
||||
## Goal
|
||||
|
||||
Rework the current operations console so the NAS web UI behaves like a real task center:
|
||||
|
||||
- `Dashboard` becomes the primary task control page
|
||||
- task list and task detail are merged into one page with row expansion instead of forced page jumps
|
||||
- all jobs that contain a `download` stage are serialized into one download lane
|
||||
- collect and sync jobs can still run without being blocked by the download lane
|
||||
- operators can run `sync_only` against selected playlists to fix `song_count = 0` or incomplete playlists
|
||||
- download jobs surface real-time throughput, including per-task aggregate speed and per-worker song speed
|
||||
|
||||
This design extends the existing operations-console design rather than replacing the underlying SQLite-backed execution model.
|
||||
|
||||
## Confirmed Decisions
|
||||
|
||||
The following points were confirmed during design review:
|
||||
|
||||
- `Dashboard` should become the main task center instead of relying on `/jobs/{id}` as the normal interaction path
|
||||
- the preferred layout is a single task table with inline expansion for details
|
||||
- all job types that include a `download` stage are treated as download-class jobs:
|
||||
- `catalog_sync`
|
||||
- `sync_download`
|
||||
- `download_only`
|
||||
- `download_upload`
|
||||
- download-class jobs may be created freely, but only one may run at a time
|
||||
- non-download jobs such as `collect_only` and `sync_only` are not restricted by the single-download-job rule
|
||||
- playlists with `song_count = 0` do not get a new dedicated status
|
||||
- the playlists page must add `sync selected playlists`, implemented as `sync_only + playlist_scope`
|
||||
- the task center should use compact icon-style controls:
|
||||
- one toggle control for pause and resume
|
||||
- one `X`-style control for cancel
|
||||
- the task center should show download speed if a real value can be captured from the download pipeline
|
||||
|
||||
## Scope
|
||||
|
||||
### In Scope
|
||||
|
||||
- redesign the `Dashboard` page into a task center
|
||||
- reduce the importance of `/jobs` and `/jobs/{id}` while keeping them available as fallback routes
|
||||
- add lane-aware scheduling rules so only one download-class job runs at a time
|
||||
- keep collect and sync jobs runnable without the global single-job bottleneck
|
||||
- add playlist-bulk sync from the playlists page
|
||||
- add structured task summary data for dashboard rows
|
||||
- add real-time download throughput display for download workers and download tasks
|
||||
- preserve the existing pause, resume, cancel, retry, and recovery model where possible
|
||||
|
||||
### Out Of Scope
|
||||
|
||||
- changing the core collect, sync, download, and upload business logic for provider behavior
|
||||
- redefining playlist status taxonomy beyond the current states
|
||||
- removing `/jobs/{id}` completely
|
||||
- redesigning the UI as a separate SPA
|
||||
- cross-machine or distributed workers
|
||||
- changing upload scheduling policy in this iteration beyond fitting into the task-center UI
|
||||
|
||||
## User Experience Design
|
||||
|
||||
## Dashboard As Task Center
|
||||
|
||||
`/dashboard` becomes the single primary operator page.
|
||||
|
||||
The page should be reorganized into three layers:
|
||||
|
||||
1. Summary cards
|
||||
2. Quick actions
|
||||
3. Main task table
|
||||
|
||||
### Summary Cards
|
||||
|
||||
The top row should remain compact and operator-focused:
|
||||
|
||||
- total jobs
|
||||
- running jobs
|
||||
- queued download jobs
|
||||
- paused jobs
|
||||
- failed or completed-with-errors jobs
|
||||
- running download songs
|
||||
|
||||
These cards are status indicators, not the main interaction surface.
|
||||
|
||||
### Quick Actions
|
||||
|
||||
Keep quick-launch actions, but make them secondary to the task table:
|
||||
|
||||
- full pipeline
|
||||
- collect only
|
||||
- sync only
|
||||
- download only
|
||||
- upload only
|
||||
|
||||
The existing manual job-creation form may remain, but it should be visually reduced so the page reads as a task center first.
|
||||
|
||||
### Main Task Table
|
||||
|
||||
Replace the current split between `Active Job`, `Recent Jobs`, and the hard jump into `/jobs/{id}` with one central task table.
|
||||
|
||||
Recommended columns:
|
||||
|
||||
- `ID`
|
||||
- `Task`
|
||||
- `Status`
|
||||
- `Scope`
|
||||
- `Primary Progress`
|
||||
- `Active Workers`
|
||||
- `Lane`
|
||||
- `Actions`
|
||||
|
||||
Each row should support inline expansion.
|
||||
|
||||
### Inline Expanded Task Details
|
||||
|
||||
Expanding a row reveals the most useful parts of the current task detail page:
|
||||
|
||||
- stage summary
|
||||
- playlist progress
|
||||
- running items
|
||||
- recent commands
|
||||
- recent errors
|
||||
|
||||
The goal is that normal pause, resume, cancel, and progress inspection no longer require navigation away from `Dashboard`.
|
||||
|
||||
## Route Roles
|
||||
|
||||
### `/dashboard`
|
||||
|
||||
Primary operator view and daily control surface.
|
||||
|
||||
### `/jobs`
|
||||
|
||||
Fallback archive-like list for all jobs. It can be simplified because the main operator path is now `Dashboard`.
|
||||
|
||||
### `/jobs/{id}`
|
||||
|
||||
Keep as a deep-link route for troubleshooting, bookmarks, and future direct links from logs or notifications. It should no longer be the primary interaction path.
|
||||
|
||||
## Playlist Page Behavior
|
||||
|
||||
The playlists page should keep its current filtering and progress features and add one new bulk action:
|
||||
|
||||
- `sync selected playlists`
|
||||
|
||||
This action creates a `sync_only` job with `playlist_scope.playlist_ids`.
|
||||
|
||||
It must:
|
||||
|
||||
- sync the selected playlists
|
||||
- update playlist-song links
|
||||
- update `song_count`
|
||||
|
||||
It must not:
|
||||
|
||||
- download song files
|
||||
- implicitly turn into `sync_download`
|
||||
|
||||
### `song_count = 0` Handling
|
||||
|
||||
Playlists with zero songs remain in the existing status model.
|
||||
|
||||
No extra dedicated state is introduced for this iteration.
|
||||
|
||||
To help operators understand what to do, the row may show a light hint such as:
|
||||
|
||||
- `0 songs, sync recommended`
|
||||
|
||||
But the filter model stays unchanged.
|
||||
|
||||
## Task Summary Model
|
||||
|
||||
The dashboard task table needs a job-summary projection that is richer than the current recent-jobs payload.
|
||||
|
||||
Each task row should expose:
|
||||
|
||||
- `id`
|
||||
- `job_type`
|
||||
- `display_name`
|
||||
- `status`
|
||||
- `scope_summary`
|
||||
- `lane_type`
|
||||
- `queue_position`
|
||||
- `primary_progress_text`
|
||||
- `primary_progress_percent`
|
||||
- `active_worker_count`
|
||||
- `can_pause`
|
||||
- `can_resume`
|
||||
- `can_cancel`
|
||||
- `expanded_detail_payload`
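As a rough sketch, the row fields listed above could be carried by a single typed record. The field names come from the list; the class name `TaskRow` and every type annotation are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TaskRow:
    """One dashboard task row. Types are assumed, not taken from the codebase."""
    id: int
    job_type: str
    display_name: str
    status: str
    scope_summary: str
    lane_type: str
    queue_position: Optional[int]      # None when the job is not queued
    primary_progress_text: str
    primary_progress_percent: float
    active_worker_count: int
    can_pause: bool
    can_resume: bool
    can_cancel: bool
    expanded_detail_payload: dict[str, Any] = field(default_factory=dict)


row = TaskRow(
    id=42, job_type="download_only", display_name="Download Selected Playlists",
    status="running", scope_summary="12 playlists", lane_type="download",
    queue_position=None, primary_progress_text="62 / 300 songs",
    primary_progress_percent=20.7, active_worker_count=3,
    can_pause=True, can_resume=False, can_cancel=True,
)
```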
|
||||
|
||||
### Display Name
|
||||
|
||||
Map internal job types to friendlier operator labels. Examples:
|
||||
|
||||
- `catalog_sync` -> `Full Pipeline`
|
||||
- `collect_only` -> `Collect`
|
||||
- `sync_only` with playlist scope -> `Sync Selected Playlists`
|
||||
- `download_only` with playlist scope -> `Download Selected Playlists`
|
||||
- `sync_download` with playlist scope -> `Sync Then Download`
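The mapping above might be implemented as a small lookup. The labels are the ones listed; the function name `display_name` and the fallback to the raw job type for unlisted combinations are assumptions.

```python
def display_name(job_type: str, has_playlist_scope: bool = False) -> str:
    """Map an internal job type to an operator-facing label (hypothetical helper)."""
    scoped = {
        "sync_only": "Sync Selected Playlists",
        "download_only": "Download Selected Playlists",
        "sync_download": "Sync Then Download",
    }
    unscoped = {
        "catalog_sync": "Full Pipeline",
        "collect_only": "Collect",
    }
    if has_playlist_scope and job_type in scoped:
        return scoped[job_type]
    # Assumption: combinations not listed in the design fall back to the raw type.
    return unscoped.get(job_type, job_type)
```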
|
||||
|
||||
### Scope Summary
|
||||
|
||||
Examples:
|
||||
|
||||
- `All sources`
|
||||
- `12 playlists`
|
||||
- `3 sources`
|
||||
|
||||
### Primary Progress
|
||||
|
||||
The progress shown in the main task row depends on task type:
|
||||
|
||||
- collect jobs:
|
||||
- collected sources or collected pools summary
|
||||
- sync jobs:
|
||||
- synced playlists / target playlists
|
||||
- download jobs:
|
||||
- downloaded songs / target download songs
|
||||
- upload jobs:
|
||||
- uploaded files / target uploads
|
||||
|
||||
## Scheduling Design
|
||||
|
||||
## Current Limitation
|
||||
|
||||
The current runner effectively behaves like a global single-active-job scheduler.
|
||||
|
||||
That is insufficient for the new requirement because it would still block pure collect or sync jobs behind a long-running download-class job.
|
||||
|
||||
## Lane Model
|
||||
|
||||
Introduce two scheduler lanes:
|
||||
|
||||
- `download`
|
||||
- `general`
|
||||
|
||||
### Download Lane
|
||||
|
||||
Contains any job whose stage sequence includes `download`.
|
||||
|
||||
This includes:
|
||||
|
||||
- `catalog_sync`
|
||||
- `sync_download`
|
||||
- `download_only`
|
||||
- `download_upload`
|
||||
|
||||
Policy:
|
||||
|
||||
- only one download-lane job may be running at a time
|
||||
- additional download-lane jobs remain queued in lane order
|
||||
|
||||
### General Lane
|
||||
|
||||
Contains jobs without a `download` stage.
|
||||
|
||||
This includes:
|
||||
|
||||
- `collect_only`
|
||||
- `sync_only`
|
||||
|
||||
Policy:
|
||||
|
||||
- these jobs are not blocked by the single-download-job rule
|
||||
- multiple general-lane jobs may run concurrently
|
||||
|
||||
### Recommended Default Concurrency
|
||||
|
||||
Assume:
|
||||
|
||||
- `DOWNLOAD_LANE_CONCURRENCY = 1`
|
||||
- `GENERAL_LANE_CONCURRENCY = 3`
|
||||
|
||||
`GENERAL_LANE_CONCURRENCY` should be configurable later through env or runner settings, but default `3` is acceptable for the first implementation.
|
||||
|
||||
## Lane Assignment Rule
|
||||
|
||||
Lane assignment should be derived from the job stage sequence, not from separate operator flags.
|
||||
|
||||
This avoids drift between UI intent and scheduler behavior.
|
||||
|
||||
If a job contains `download` in `JOB_STAGE_SEQUENCES`, it belongs to the `download` lane.
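A minimal sketch of this rule follows. Only the presence or absence of `download` per job type is taken from this design; the exact stage tuples and the helper name `lane_for` are assumptions.

```python
# Assumed stage sequences: only the presence or absence of "download" is confirmed.
JOB_STAGE_SEQUENCES = {
    "catalog_sync": ("collect", "sync", "download", "upload"),
    "sync_download": ("sync", "download"),
    "download_only": ("download",),
    "download_upload": ("download", "upload"),
    "collect_only": ("collect",),
    "sync_only": ("sync",),
}


def lane_for(job_type: str) -> str:
    """Derive the scheduler lane from the job's stage sequence."""
    return "download" if "download" in JOB_STAGE_SEQUENCES[job_type] else "general"
```

Because the lane is computed rather than stored, a new job type lands in the right lane as soon as its stage sequence is registered.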
|
||||
|
||||
## Queue Position
|
||||
|
||||
For download-lane jobs, the dashboard should expose:
|
||||
|
||||
- `running`
|
||||
- `queued #1`
|
||||
- `queued #2`
|
||||
|
||||
For general-lane jobs, queue display can be simpler:
|
||||
|
||||
- `general`
|
||||
- `running`
|
||||
- `queued`
|
||||
|
||||
The exact wording can stay simple as long as the operator can tell which jobs are blocked by the single-download rule.
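One way to derive the download-lane labels is to walk the lane in order and number only the queued jobs. The function name and the `(job_id, status)` input shape are hypothetical.

```python
def download_lane_labels(jobs):
    """jobs: list of (job_id, status) in lane order; returns {job_id: label}."""
    labels = {}
    position = 0
    for job_id, status in jobs:
        if status == "running":
            labels[job_id] = "running"
        elif status == "queued":
            position += 1
            labels[job_id] = f"queued #{position}"
    return labels
```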
|
||||
|
||||
## Runner Refactor Strategy
|
||||
|
||||
Do not rewrite stage executors.
|
||||
|
||||
Instead, refactor the scheduler layer so:
|
||||
|
||||
- lane eligibility is computed when choosing runnable jobs
|
||||
- the runner can hold one active download-lane job
|
||||
- the runner can hold multiple active general-lane jobs
|
||||
- each running job continues to use the existing stage and worker machinery
|
||||
|
||||
This keeps risk concentrated in the orchestration layer rather than in provider-specific logic.
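The selection step could look roughly like the sketch below, using the default concurrency values from this design. `pick_runnable` and its input shape are assumptions; the real runner would read jobs from the SQLite-backed store.

```python
LANE_CONCURRENCY = {"download": 1, "general": 3}  # defaults from this design


def pick_runnable(queued, running):
    """Return the job ids that may start now.

    queued:  list of (job_id, lane) in queue order
    running: list of (job_id, lane) currently active
    """
    active = {"download": 0, "general": 0}
    for _, lane in running:
        active[lane] += 1
    started = []
    for job_id, lane in queued:
        if active[lane] < LANE_CONCURRENCY[lane]:
            active[lane] += 1          # reserve the slot for this job
            started.append(job_id)
    return started
```

A queued download-lane job never blocks general-lane jobs behind it, because each lane is counted separately.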
|
||||
|
||||
## Download Speed Design
|
||||
|
||||
## Requirement
|
||||
|
||||
The task center should display real download throughput rather than parsing console text heuristically.
|
||||
|
||||
### Why Text Parsing Is Rejected
|
||||
|
||||
Many providers already emit `MB/s` in rich terminal progress, but that output is not a stable API:
|
||||
|
||||
- formats differ by provider
|
||||
- text may change without notice
|
||||
- not all clients expose identical progress lines
|
||||
|
||||
Therefore the design must use structured progress reporting inside the download pipeline.
|
||||
|
||||
## Structured Throughput Model
|
||||
|
||||
During download-stage execution, each download worker should publish structured progress fields such as:
|
||||
|
||||
- `downloaded_bytes`
|
||||
- `total_bytes`
|
||||
- `speed_bytes_per_sec`
|
||||
- `progress_percent`
|
||||
|
||||
These values should update the worker state and be aggregatable per task.
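A possible shape for the per-worker record and its per-task aggregation is sketched below. The field names follow the list above; treating `progress_percent` as a derived property and the helper name `task_speed` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerProgress:
    downloaded_bytes: int
    total_bytes: Optional[int]            # None when the size is unknown
    speed_bytes_per_sec: Optional[float]  # None when no structured value exists

    @property
    def progress_percent(self) -> Optional[float]:
        if not self.total_bytes:
            return None
        return 100.0 * self.downloaded_bytes / self.total_bytes


def task_speed(workers) -> Optional[float]:
    """Sum live throughput across workers; None if no worker reports a speed."""
    speeds = [w.speed_bytes_per_sec for w in workers if w.speed_bytes_per_sec is not None]
    return sum(speeds) if speeds else None
```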
|
||||
|
||||
### Task-Level Speed
|
||||
|
||||
The dashboard row for a download-class task should show total live throughput across active download workers.
|
||||
|
||||
Example:
|
||||
|
||||
- `62 / 300 songs | 18.4 MB/s`
|
||||
|
||||
### Worker-Level Speed
|
||||
|
||||
The expanded worker section for a running download task should show, per worker:
|
||||
|
||||
- current song
|
||||
- current speed
|
||||
- downloaded bytes / total bytes when known
|
||||
|
||||
Example:
|
||||
|
||||
- `download-2 | Moonlight | 6.2 MB/s | 21.4 / 41.0 MB`
|
||||
|
||||
### Fallback Behavior
|
||||
|
||||
If structured speed is not available for a particular worker or provider:
|
||||
|
||||
- show `-`
|
||||
- do not synthesize or guess a value from text logs
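The display rules, including the `-` fallback, can be sketched as two formatting helpers. The helper names are hypothetical, and treating 1 MB as 1,000,000 bytes is an assumption.

```python
def format_speed(speed_bytes_per_sec):
    """Render a speed, or '-' when no structured value is available."""
    if speed_bytes_per_sec is None:
        return "-"
    return f"{speed_bytes_per_sec / 1_000_000:.1f} MB/s"  # assumes 1 MB = 10**6 bytes


def format_worker_line(worker_name, song_name, speed_bytes_per_sec,
                       downloaded_bytes, total_bytes):
    parts = [worker_name, song_name, format_speed(speed_bytes_per_sec)]
    if total_bytes:  # show the byte ratio only when the total is known
        parts.append(
            f"{downloaded_bytes / 1_000_000:.1f} / {total_bytes / 1_000_000:.1f} MB"
        )
    return " | ".join(parts)
```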
|
||||
|
||||
## API Changes
|
||||
|
||||
## Dashboard Payload
|
||||
|
||||
`GET /api/dashboard` must evolve from a light summary into a task-center payload.
|
||||
|
||||
It should return:
|
||||
|
||||
- summary cards
|
||||
- quick-launch defaults
|
||||
- task-center rows
|
||||
- row detail summaries for tasks expanded by the UI
|
||||
|
||||
Each task row should include the task summary fields described above.
|
||||
|
||||
## New Playlist Sync Endpoint
|
||||
|
||||
Add:
|
||||
|
||||
- `POST /api/playlists/sync`
|
||||
|
||||
Behavior:
|
||||
|
||||
- validate `playlist_ids`
|
||||
- create `sync_only`
|
||||
- write `playlist_scope.playlist_ids`
|
||||
- return created job summary
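The endpoint behavior above might reduce to a framework-independent function like the one below. The function name, the returned dict shape, the `queued` initial status, and the choice to ignore unknown ids are all assumptions made for illustration.

```python
def create_playlist_sync_job(playlist_ids, known_playlist_ids):
    """Validate the selection and build a `sync_only` job record (hypothetical shape)."""
    known = set(known_playlist_ids)
    unique_ids = list(dict.fromkeys(playlist_ids))   # dedupe, keep order
    valid_ids = [pid for pid in unique_ids if pid in known]
    if not valid_ids:
        raise ValueError("playlist_ids must contain at least one known playlist")
    return {
        "job_type": "sync_only",                     # never sync_download
        "status": "queued",                          # assumed initial status
        "playlist_scope": {"playlist_ids": valid_ids},
    }
```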
|
||||
|
||||
Existing playlist bulk endpoints remain:
|
||||
|
||||
- `mark-wanted`
|
||||
- `unmark-wanted`
|
||||
- `download`
|
||||
- `sync-download`
|
||||
|
||||
## Job Detail Endpoint
|
||||
|
||||
`GET /api/jobs/{id}` remains the source of full detail.
|
||||
|
||||
The dashboard inline expansion may either:
|
||||
|
||||
- reuse the existing detail payload directly
|
||||
- or consume a trimmed detail projection
|
||||
|
||||
The implementation may start by reusing the current payload for safety.
|
||||
|
||||
## Data Model Extensions
|
||||
|
||||
Prefer extending existing operations tables rather than introducing a second job schema.
|
||||
|
||||
### Job Summary Computation
|
||||
|
||||
The repository layer should compute dashboard-friendly projections rather than forcing templates to derive them ad hoc.
|
||||
|
||||
### Worker Progress Extension
|
||||
|
||||
`job_workers` state must be able to carry structured download progress.
|
||||
|
||||
This may be done by:
|
||||
|
||||
- adding typed columns
|
||||
- or adding a compact JSON progress payload
|
||||
|
||||
Recommended preference:
|
||||
|
||||
- keep existing visible scalar columns
|
||||
- add a small JSON payload if multiple dynamic throughput fields are needed
|
||||
|
||||
The implementation plan can choose the exact storage form.
|
||||
|
||||
## UI Controls
|
||||
|
||||
The dashboard task table should use compact icon-first controls:
|
||||
|
||||
- pause icon when the job is pausable
|
||||
- resume icon when the job is resumable
|
||||
- cancel `X` icon when the job is cancelable
|
||||
- expand toggle for inline details
|
||||
|
||||
Low-frequency actions such as `retry item` remain inside expanded detail sections or the fallback detail page.
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
## Scheduler Tests
|
||||
|
||||
Add tests that prove:
|
||||
|
||||
- only one download-lane job runs at a time
|
||||
- a second download-lane job remains queued
|
||||
- a general-lane `sync_only` job can run while a download-lane job is active
|
||||
- `catalog_sync` is correctly classified into the download lane
|
||||
|
||||
## API Tests
|
||||
|
||||
Add tests for:
|
||||
|
||||
- `POST /api/playlists/sync`
|
||||
- dashboard payload includes lane and task-summary fields
|
||||
- dashboard renders compact action controls
|
||||
- inline task detail data is available
|
||||
|
||||
## UI Rendering Tests
|
||||
|
||||
At minimum verify:
|
||||
|
||||
- dashboard contains the main task table
|
||||
- dashboard no longer depends on a jump to detail as the primary control path
|
||||
- playlists page contains `sync selected playlists`
|
||||
- download tasks render speed fields and real values when available
|
||||
|
||||
## Regression Tests
|
||||
|
||||
Protect:
|
||||
|
||||
- existing pause, resume, cancel command flow
|
||||
- `wanted_only=` empty-query compatibility
|
||||
- playlist progress rendering
|
||||
- task playlist progress rendering
|
||||
|
||||
## Rollout Plan
|
||||
|
||||
Recommended rollout order:
|
||||
|
||||
1. add dashboard-oriented task summary repository helpers
|
||||
2. add lane-aware scheduling rules
|
||||
3. add playlist bulk sync endpoint and button
|
||||
4. redesign dashboard into the primary task center
|
||||
5. add structured download throughput reporting
|
||||
6. redeploy to NAS and verify live behavior
|
||||
|
||||
This order keeps correctness and scheduling changes ahead of cosmetic UI work.
|
||||
|
||||
## Risks
|
||||
|
||||
Primary technical risk:
|
||||
|
||||
- refactoring the runner from a single-global-job loop into a lane-aware multi-job scheduler
|
||||
|
||||
Risk reduction:
|
||||
|
||||
- keep stage executors intact
|
||||
- concentrate changes in job selection and orchestration
|
||||
- verify lane rules with focused tests before refining UI
|
||||
|
||||
Secondary risk:
|
||||
|
||||
- structured speed reporting may require touching downloader integration points across multiple providers
|
||||
|
||||
Risk reduction:
|
||||
|
||||
- start with download-stage worker instrumentation in `catalogsync`
|
||||
- expose speed only when a real structured value is available
|
||||
- degrade gracefully to `-` rather than inventing numbers
|
||||
|
||||
## Success Criteria
|
||||
|
||||
The design is successful when:
|
||||
|
||||
- operators can manage tasks primarily from `Dashboard`
|
||||
- normal control flow does not require bouncing into `/jobs/{id}`
|
||||
- multiple download-class jobs can be queued while only one runs
|
||||
- collect and sync jobs are no longer unnecessarily blocked behind downloads
|
||||
- selected playlists can be synced directly from the playlists page
|
||||
- running download tasks show meaningful live throughput in the task center
|
||||
@@ -0,0 +1,219 @@
|
||||
# Catalogsync Task Tree Dashboard Design
|
||||
|
||||
## Goal
|
||||
|
||||
Replace the current Task Center detail tables with a stable tree view:
|
||||
|
||||
- task
|
||||
- playlist
|
||||
- song
|
||||
|
||||
The new Task Center must keep task nodes visible across status transitions such as `running -> paused -> completed`, and live refresh must update existing nodes instead of rebuilding the entire task table.
|
||||
|
||||
## Problem Statement
|
||||
|
||||
The current dashboard still feels unstable for two reasons:
|
||||
|
||||
1. The expanded task detail is rendered as a large HTML block containing `Summary`, `Stages`, `Workers`, `Running Items`, and `Playlist Progress`.
|
||||
2. The frontend refresh path still calls `setTaskRows(...)`, which rebuilds the whole Task Center body and then rebinds all event handlers.
|
||||
|
||||
Even after paused tasks were kept in the query, this full redraw still recreates DOM nodes and causes visible flicker. It also makes the UI feel like a live report view instead of an operator task tree.
|
||||
|
||||
## Scope
|
||||
|
||||
### In Scope
|
||||
|
||||
- Rebuild only the `Task Center` section of `/dashboard`
|
||||
- Keep the dashboard top cards for now:
|
||||
- live snapshot
|
||||
- summary
|
||||
- quick actions
|
||||
- create job
|
||||
- playlist coverage
|
||||
- Replace the task detail block with a three-level tree:
|
||||
- task node
|
||||
- playlist child node
|
||||
- song child node
|
||||
- Keep pause/resume/cancel actions on the task node
|
||||
- Preserve expanded task and playlist state across refreshes
|
||||
- Update task status, progress, and counts in place without full Task Center redraw
|
||||
- Keep non-music resource labeling in song rows
|
||||
|
||||
### Out of Scope
|
||||
|
||||
- Redesigning the top dashboard cards
|
||||
- Removing `/jobs`, `/songs`, or other pages
|
||||
- Changing job execution semantics
|
||||
- Changing download logic or retry semantics
|
||||
|
||||
## Chosen UI Structure
|
||||
|
||||
### Task Center Layout
|
||||
|
||||
`Task Center` becomes a single tree container instead of a table plus nested detail tables.
|
||||
|
||||
Each task node shows:
|
||||
|
||||
- expand/collapse toggle
|
||||
- task display name
|
||||
- task type
|
||||
- task status
|
||||
- scope summary
|
||||
- primary progress text and progress bar
|
||||
- active worker count
|
||||
- lane label
|
||||
- pause/resume button
|
||||
- cancel button
|
||||
|
||||
When expanded, the task shows only its playlist children.
|
||||
|
||||
Each playlist node shows:
|
||||
|
||||
- expand/collapse toggle
|
||||
- playlist name
|
||||
- source label such as `qq #64`
|
||||
- downloaded song count / total song count
|
||||
- progress bar
|
||||
- compact state summary:
|
||||
- running
|
||||
- pending
|
||||
- failed
|
||||
- skipped
|
||||
|
||||
When expanded, the playlist shows only its song children.
|
||||
|
||||
Each song node shows:
|
||||
|
||||
- sequence number
|
||||
- song name
|
||||
- singer summary
|
||||
- platform/source id summary
|
||||
- song status tag
|
||||
- optional `非音乐资源` (non-music resource) tag
|
||||
- status note
|
||||
|
||||
This makes the page behave more like a file explorer tree and removes the distracting intermediate tables.
|
||||
|
||||
## Data Model and API Expectations
|
||||
|
||||
### Task List
|
||||
|
||||
`list_task_center_rows()` must return recent tasks for all operator-visible lifecycle states, not just active ones.
|
||||
|
||||
The intended visible states are:
|
||||
|
||||
- `queued`
|
||||
- `running`
|
||||
- `pause_requested`
|
||||
- `paused`
|
||||
- `completed`
|
||||
- `completed_with_errors`
|
||||
- `failed`
|
||||
- `canceled`
|
||||
|
||||
This ensures a task stays in the tree when its state changes; only its displayed status changes.
|
||||
|
||||
### Task Detail
|
||||
|
||||
`/api/jobs/{job_id}` may continue returning its current payload shape, but the dashboard will only consume:
|
||||
|
||||
- `job`
|
||||
- `playlist_progress`
|
||||
|
||||
The frontend will ignore `summary`, `stages`, `workers`, and `running_items` for Task Center rendering.
|
||||
|
||||
### Playlist Songs
|
||||
|
||||
`/api/jobs/{job_id}/playlists/{playlist_id}/songs` remains the lazy-loaded source for song child nodes.
|
||||
|
||||
The existing fields are sufficient:
|
||||
|
||||
- `position`
|
||||
- `song_name`
|
||||
- `singers`
|
||||
- `platform`
|
||||
- `remote_song_id`
|
||||
- `status`
|
||||
- `status_note`
|
||||
- `is_non_music_resource`
|
||||
|
||||
## Rendering Strategy
|
||||
|
||||
### Initial Render
|
||||
|
||||
The server-rendered HTML for `/dashboard` should render a task tree shell directly, not a table with hidden detail rows.
|
||||
|
||||
### Live Updates
|
||||
|
||||
The Task Center refresh path must switch from full `innerHTML` replacement to keyed DOM patching.
|
||||
|
||||
Rules:
|
||||
|
||||
1. Task nodes are keyed by `job_id`
|
||||
2. Playlist nodes are keyed by `job_id + playlist_id`
|
||||
3. Song nodes are keyed by `job_id + playlist_id + song_id/position`
|
||||
4. Existing nodes are updated in place
|
||||
5. Expanded/collapsed state is preserved in `dashboardState`
|
||||
6. Status changes never collapse or remove a visible node by themselves
|
||||
|
||||
### Removal Policy
|
||||
|
||||
Nodes may be removed only when they are absent from the latest server payload because they have truly fallen out of the visible result window, not because they changed from active to paused/completed.
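The keyed-patching and removal rules can be modeled independently of the DOM. The sketch below uses Python dicts as stand-ins for nodes; the real frontend would do the same against DOM elements in JavaScript, and the name `patch_keyed` is hypothetical.

```python
def patch_keyed(current, incoming, key):
    """Patch `current` (dict: key -> node dict) in place from `incoming` payload rows."""
    seen = set()
    for item in incoming:
        k = item[key]
        seen.add(k)
        node = current.get(k)
        if node is None:
            current[k] = {**item, "expanded": False}  # insert a new node, collapsed
        else:
            node.update(item)  # update fields in place; "expanded" is untouched
    for k in list(current):
        if k not in seen:
            del current[k]     # removed only because the payload no longer has it
    return current
```

Since existing node objects are mutated rather than recreated, a status change such as `running -> paused` never resets the expanded state.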
|
||||
|
||||
## Refresh Model
|
||||
|
||||
### Dashboard Summary Refresh
|
||||
|
||||
The lightweight snapshot refresh may continue updating:
|
||||
|
||||
- live snapshot text
|
||||
- summary numbers
|
||||
- download stats
|
||||
- playlist coverage
|
||||
|
||||
These sections can still use simple row replacement because they are small and not interactive.
|
||||
|
||||
### Task Tree Refresh
|
||||
|
||||
Task rows must be refreshed through a dedicated keyed patch function:
|
||||
|
||||
- update existing task header fields
|
||||
- insert new task nodes
|
||||
- remove missing task nodes only when no longer returned
|
||||
- if a task is expanded, refresh its playlist subtree in place
|
||||
- if a playlist is expanded, refresh its song subtree in place when fresh data arrives
|
||||
|
||||
The Task Center must no longer call a function that rebuilds the whole container on every poll.
|
||||
|
||||
## Error Handling
|
||||
|
||||
- Failed songs remain visible in the playlist subtree with their note
|
||||
- Non-music resources remain visible and are labeled `非音乐资源` (non-music resource)
|
||||
- If playlist song loading fails, show the error message only inside that playlist node
|
||||
- If task detail loading fails, show the error only inside that task node
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Automated
|
||||
|
||||
- repository test: task list includes completed jobs
|
||||
- API test: dashboard HTML renders the tree shell instead of detail tables
|
||||
- API test: dashboard data continues to expose task rows for live refresh
|
||||
|
||||
### Manual
|
||||
|
||||
- open `/dashboard`
|
||||
- expand one paused task
|
||||
- expand one playlist
|
||||
- wait through multiple refresh cycles
|
||||
- verify:
|
||||
- expanded task stays expanded
|
||||
- expanded playlist stays expanded
|
||||
- no `Summary / Stages / Workers / Running Items` blocks appear under tasks
|
||||
- task status changes update text only, without the whole Task Center flashing
|
||||
|
||||
## Assumptions
|
||||
|
||||
- The operator still wants the top dashboard cards retained for now
|
||||
- Recent finished tasks should remain visible in the Task Center instead of disappearing immediately
|
||||
- The current lazy-load model for song lists is acceptable as long as the node itself stays stable
|
||||
@@ -0,0 +1,268 @@
|
||||
# Playlist Export To Local ZIP Design
|
||||
|
||||
**Date:** 2026-04-18
|
||||
|
||||
## Goal
|
||||
|
||||
Unify what `Export` means to the user:
|
||||
|
||||
- To the user, `Export` and `Export Selected` both mean "export to the local computer running the frontend"
|
||||
- The `playlists/` directory on the NAS is kept, but only as a server-side cache and packaging source
|
||||
- The export flow becomes:
|
||||
  - first ensure the playlist directory exists on the NAS
|
||||
  - then have the server package it as a ZIP
|
||||
  - finally have the browser download it to the user's machine
|
||||
|
||||
## Current Problem
|
||||
|
||||
The current system mixes up two meanings:
|
||||
|
||||
- `Export Folder` / `Export Selected Playlists` on the page actually generate or refresh `playlists/<playlist_dir>/` on the NAS
|
||||
- Users naturally read "export" as "download to the computer this browser is running on"
|
||||
- As a result:
|
||||
  - exporting to a NAS directory and exporting to the local frontend machine are not distinguished
|
||||
  - the button names mislead users
|
||||
  - bulk export to the local machine has not actually been implemented
|
||||
|
||||
## Target UX
|
||||
|
||||
### 1. Export means browser download
|
||||
|
||||
- Button label in the playlist detail modal:
|
||||
  - `Export Folder` -> `Export`
|
||||
- Bulk button label in the playlist list:
|
||||
  - `Export Selected Playlists` -> `Export Selected`
|
||||
|
||||
To the user, both buttons mean the same thing:
|
||||
|
||||
- export to the local machine running the frontend
|
||||
- the browser triggers a file download
|
||||
|
||||
### 2. NAS playlists directory becomes internal cache
|
||||
|
||||
- The NAS keeps maintaining:
|
||||
  - `playlists/<playlist_name>_<playlist_id>/playlist.yaml`
|
||||
  - `.playlist_meta.json`
|
||||
  - `covers/...`
|
||||
- But this is no longer the primary user-visible action
|
||||
- When the user actually clicks `Export`:
|
||||
  - if the NAS directory already exists and is usable, reuse it directly
|
||||
  - if it does not exist, generate it first
|
||||
  - then package it as a ZIP and return it to the browser
|
||||
|
||||
### 3. Single playlist export
|
||||
|
||||
- Click `Export` on a single playlist
|
||||
- One ZIP is returned
|
||||
- The ZIP contains the full contents of that playlist directory
|
||||
|
||||
Suggested ZIP file name:
|
||||
|
||||
```text
|
||||
playlist-<platform>-<playlist_id>-<sanitized_name>.zip
|
||||
```
|
||||
|
||||
ZIP structure:
|
||||
|
||||
```text
|
||||
<playlist_name>_<playlist_id>/
|
||||
  playlist.yaml
|
||||
  .playlist_meta.json
|
||||
  covers/
|
||||
    playlist-cover.jpg
|
||||
    song-1-xxxx.jpg
|
||||
```
|
||||
|
||||
### 4. Multi-playlist export
|
||||
|
||||
- Select multiple playlists, then click `Export Selected`
|
||||
- One ZIP is returned
|
||||
- The ZIP contains multiple playlist directories
|
||||
|
||||
Suggested ZIP file name:
|
||||
|
||||
```text
|
||||
playlists-export-YYYYMMDD-HHMMSS.zip
|
||||
```
|
||||
|
||||
ZIP structure:
|
||||
|
||||
```text
|
||||
playlists/
|
||||
  PlaylistA_123/
|
||||
    playlist.yaml
|
||||
    .playlist_meta.json
|
||||
    covers/
|
||||
      ...
|
||||
  PlaylistB_456/
|
||||
    playlist.yaml
|
||||
    .playlist_meta.json
|
||||
    covers/
|
||||
      ...
|
||||
```
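The two suggested file-name patterns could be produced as sketched below. The patterns come from this design; the helper names and the sanitization rule (replace path-unsafe characters and whitespace with `_`) are assumptions.

```python
import re
from datetime import datetime


def sanitize_name(name: str) -> str:
    """Assumed rule: collapse path-unsafe characters and whitespace into '_'."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", name).strip("_")
    return cleaned or "playlist"


def single_zip_name(platform: str, playlist_id, name: str) -> str:
    return f"playlist-{platform}-{playlist_id}-{sanitize_name(name)}.zip"


def bulk_zip_name(now: datetime) -> str:
    return now.strftime("playlists-export-%Y%m%d-%H%M%S.zip")
```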
|
||||
|
||||
## Export Readiness Rules
|
||||
|
||||
Before exporting, branch on playlist state:
|
||||
|
||||
- `downloaded`
|
||||
  - ensure the NAS directory exists
|
||||
  - then proceed to packaging
|
||||
- `unsynced`
|
||||
  - downloading and waiting inside the same HTTP request is not allowed
|
||||
  - create a `sync_download` background job
|
||||
  - the current request returns "queued; cannot export right now"
|
||||
- `not_downloaded`
|
||||
  - create a `download_only` background job
|
||||
  - the current request returns "queued; cannot export right now"
|
||||
- `partial`
|
||||
  - create a `download_only` background job
|
||||
  - the current request returns "queued; cannot export right now"
|
||||
- `downloading`
|
||||
  - do not create a duplicate job
|
||||
  - return "in progress; export again later"
|
||||
|
||||
Bulk export adds one stricter rule:
|
||||
|
||||
- Return a final ZIP download only when every selected playlist is already exportable
|
||||
- If any playlist in the selection must be synced or downloaded first, no partial ZIP is returned for this request
|
||||
- This guarantees that the result of `Export Selected` always corresponds to the complete selection
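The readiness rules and the all-or-nothing bulk rule can be sketched as a pure planning function. The state names and the `ready_playlist_ids` / `blocked_playlist_ids` keys appear in this design; the function name and the `sync_download_ids` / `download_only_ids` keys are hypothetical.

```python
def plan_bulk_export(playlist_states):
    """playlist_states: dict of playlist_id -> state. Returns an export plan."""
    ready, sync_download, download_only, in_progress = [], [], [], []
    for playlist_id, state in playlist_states.items():
        if state == "downloaded":
            ready.append(playlist_id)
        elif state == "unsynced":
            sync_download.append(playlist_id)      # needs a sync_download job
        elif state in ("not_downloaded", "partial"):
            download_only.append(playlist_id)      # needs a download_only job
        elif state == "downloading":
            in_progress.append(playlist_id)        # no duplicate job is created
        else:
            raise ValueError(f"unknown playlist state: {state}")
    blocked = sync_download + download_only + in_progress
    return {
        # A ZIP is produced only when nothing in the selection is blocked.
        "status": "ready" if not blocked else "queued",
        "ready_playlist_ids": ready,
        "blocked_playlist_ids": blocked,
        "sync_download_ids": sync_download,
        "download_only_ids": download_only,
    }
```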
|
||||
|
||||
## API Design
|
||||
|
||||
### Keep
|
||||
|
||||
- `GET /api/playlists/{playlist_id}/export-folder`
|
||||
  - keep
|
||||
  - serves as the server-side capability for refreshing and locating the directory
|
||||
  - not the primary export endpoint for end users
|
||||
|
||||
### Add
|
||||
|
||||
#### `GET /api/playlists/{playlist_id}/export.zip`
|
||||
|
||||
Behavior:
|
||||
|
||||
- read the playlist state
|
||||
- if it is already exportable:
|
||||
  - ensure the NAS directory exists
|
||||
  - package it into a temporary ZIP
|
||||
  - return it as a binary download response
|
||||
- if it is not yet exportable:
|
||||
  - return `409`
|
||||
  - the response body describes the current state and suggests the next action
|
||||
|
||||
#### `POST /api/playlists/export-zip`
|
||||
|
||||
Request body:
|
||||
|
||||
```json
|
||||
{
|
||||
"playlist_ids": [1, 2, 3],
|
||||
"requested_by": "ops-console"
|
||||
}
|
||||
```
|
||||
|
||||
There are two kinds of response:
|
||||
|
||||
1. Every selected playlist can be exported immediately
|
||||
|
||||
```json
|
||||
{
|
||||
"status": "ready",
|
||||
"download_url": "/api/exports/bundles/<token>.zip",
|
||||
"playlist_ids": [1, 2, 3]
|
||||
}
|
||||
```
|
||||
|
||||
2. Some playlists in the selection cannot be exported immediately
|
||||
|
||||
```json
|
||||
{
|
||||
"status": "queued",
|
||||
"message": "2 playlists queued for sync/download before export.",
|
||||
"download_job": {...},
|
||||
"sync_download_job": {...},
|
||||
"blocked_playlist_ids": [2, 3],
|
||||
"ready_playlist_ids": [1]
|
||||
}
|
||||
```
|
||||
|
||||
Notes:
|
||||
|
||||
- `ready_playlist_ids` is informational only; it does not mean a "partial ZIP" is downloaded first
|
||||
- Unless the batch returns `status=ready`, the frontend does not trigger a local download
|
||||
- The user should wait for the background jobs to finish and then click `Export` again
|
||||
|
||||
### Optional helper endpoint
|
||||
|
||||
#### `GET /api/exports/bundles/{token}.zip`
|
||||
|
||||
- Downloads a prepared temporary ZIP
|
||||
- The token points to the temporary packaging result
|
||||
- Lets the frontend request preparation first and then trigger the browser download
|
||||
|
||||
## Backend Packaging Strategy
|
||||
|
||||
### Why not build ZIP directly from DB payload
|
||||
|
||||
Assembling the ZIP on the fly from the database is not recommended:
|
||||
|
||||
- YAML, covers, and the directory layout are already materialized in the `playlists/` directory
|
||||
- The system already has a playlist directory generation pipeline
|
||||
- Reusing the directory and then packaging it is more stable and easier to keep consistent with the NAS directory
|
||||
|
||||
### Packaging flow
|
||||
|
||||
1. Receive the export request
|
||||
2. Determine whether the playlist can be exported immediately
|
||||
3. Call `ensure_playlist_artifacts_for_playlist(...)` for each exportable playlist
|
||||
4. Collect the playlist directory paths
|
||||
5. Build the ZIP in a temporary directory
|
||||
6. Return it to the frontend as a download response
|
||||
7. Delete the temporary ZIP after the request ends, or clean it up after a short cache period
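Steps 4 and 5 of the packaging flow might look like the sketch below, which uses only the standard library. The function name `build_export_zip` and the `top_level` parameter are hypothetical; calling `ensure_playlist_artifacts_for_playlist(...)` beforehand and cleaning up the ZIP afterwards are left to the caller.

```python
import zipfile
from pathlib import Path


def build_export_zip(playlist_dirs, zip_path, top_level="playlists"):
    """Pack each playlist directory into one ZIP under an optional top-level folder.

    Pass top_level="" for a single-playlist export so the playlist
    directory sits at the archive root.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for playlist_dir in map(Path, playlist_dirs):
            for file_path in sorted(playlist_dir.rglob("*")):
                if file_path.is_file():
                    # Keep the playlist directory name as the first path segment.
                    relative = file_path.relative_to(playlist_dir.parent)
                    archive.write(file_path, (Path(top_level) / relative).as_posix())
    return Path(zip_path)
```

Reading from the already-materialized directories keeps the archive contents identical to what is on the NAS.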
|
||||
|
||||
## Frontend Behavior
|
||||
|
||||
### Single playlist modal
|
||||
|
||||
- Change the button label to `Export`
|
||||
- On click:
|
||||
  - call the single-playlist ZIP export endpoint
|
||||
  - if it can be exported immediately, the browser downloads it directly
|
||||
  - if it cannot be exported immediately, show a status prompt
|
||||
|
||||
### Playlist list bulk export
|
||||
|
||||
- Change the button label to `Export Selected`
|
||||
- On click:
|
||||
  - call the bulk ZIP export endpoint
|
||||
  - if everything can be exported immediately, automatically start downloading one ZIP
|
||||
  - if sync or download is needed first, report that background jobs were created
|
||||
|
||||
## Error Handling
|
||||
|
||||
- `404`
|
||||
  - playlist does not exist
|
||||
- `409`
|
||||
  - playlist is not yet synced or downloaded and cannot be exported immediately
|
||||
- `500`
|
||||
  - packaging failed
|
||||
- A cover refresh failure should not fail the whole export; as long as the directory can be generated, continue packaging
|
||||
|
||||
## Naming Rules
|
||||
|
||||
- User-facing button labels use only `Export` / `Export Selected`
|
||||
- If the page still needs a way to view the NAS path, name it separately:
|
||||
  - `Show NAS Folder`
|
||||
- or `Show Server Folder`
|
||||
|
||||
"Export to local" and "generate a directory on the NAS" must no longer share the single name `Export`.
|
||||
|
||||
## Testing
|
||||
|
||||
- Single-playlist `Export` returns a ZIP download
|
||||
- Bulk `Export Selected` returns a ZIP containing multiple playlist directories
|
||||
- An existing NAS directory is not regenerated
|
||||
- Unsynced/undownloaded playlists return queued/blocked instead of holding the HTTP request open for a long time
|
||||
- Frontend button labels match the actual behavior
|
||||
@@ -0,0 +1,93 @@
|
||||
# Playlist Export On Download Design
|
||||
|
||||
**Date:** 2026-04-18
|
||||
|
||||
## Goal
|
||||
|
||||
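Move `playlists/` directory output from the `sync` path to the "selected playlists download" path, and add an "export selected playlists" capability, so that already-downloaded playlists can be exported on their own while unsynced/undownloaded playlists automatically go through `sync + download` before export.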
|
||||
|
||||
## Current Problem
|
||||
|
||||
- `CatalogSyncService.sync_playlist_row()` currently writes `playlists/<playlist-name_id>/` directly.
|
||||
- As a result, merely syncing a playlist also generates an export directory, which does not match user expectations.
|
||||
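- To backfill `playlist.yaml` / covers for an already-downloaded playlist, the only option is to rerun the old path, which is easily mistaken for re-downloading the songs.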
|
||||
|
||||
## Target Behavior
|
||||
|
||||
### 1. Sync no longer writes playlist artifacts
|
||||
|
||||
- `sync` is only responsible for:
|
||||
- fetching the playlist's songs
|
||||
- backfilling playlist popularity
|
||||
- updating the artist pool / song associations
|
||||
- After `sync` completes, it no longer writes `playlists/` automatically.
|
||||
|
||||
### 2. Scoped download writes playlist artifacts
|
||||
|
||||
- When the job is a `download_only` or `sync_download` job scoped to selected playlists, once the download stage finishes:
|
||||
- refresh `playlists/<playlist-name_id>/` for the playlists in the job's scope
|
||||
- write the latest `playlist.yaml`
|
||||
- fetch playlist covers and song covers
|
||||
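- This way, already-downloaded songs carry their `local_file_path` into the export directory without requiring the whole library to be re-downloaded.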
|
||||
|
||||
### 3. Add Export Selected Playlists action
|
||||
|
||||
- Add an `Export Selected Playlists` button to the playlist page.
|
||||
- The backend routes by playlist status:
|
||||
- `unsynced` -> added to `sync_download`
|
||||
- `not_downloaded` / `partial` -> added to `download_only`
|
||||
- `downloaded` -> exported directly to `playlists/` from the current database state
|
||||
- A single request may return all of the following at once:
|
||||
- playlists exported directly
|
||||
- a newly created `download_only` job
|
||||
- a newly created `sync_download` job
|
||||
|
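The status-based routing above can be sketched as a pure function that partitions playlist IDs into the three outcomes. `ExportPlan` and `plan_export` are hypothetical names used only for illustration; the status strings are the ones listed in this section, and the real backend is expected to obtain them from the existing status logic.

```python
from dataclasses import dataclass, field


@dataclass
class ExportPlan:
    """Hypothetical container for the three routing outcomes."""
    export_now: list[int] = field(default_factory=list)     # "downloaded"
    download_only: list[int] = field(default_factory=list)  # "not_downloaded" / "partial"
    sync_download: list[int] = field(default_factory=list)  # "unsynced"


def plan_export(statuses: dict[int, str]) -> ExportPlan:
    """Partition playlist IDs by status; unknown statuses are rejected."""
    plan = ExportPlan()
    for playlist_id, status in statuses.items():
        if status == "downloaded":
            plan.export_now.append(playlist_id)
        elif status in ("not_downloaded", "partial"):
            plan.download_only.append(playlist_id)
        elif status == "unsynced":
            plan.sync_download.append(playlist_id)
        else:
            raise ValueError(f"unknown playlist status: {status!r}")
    return plan
```

A caller would then export `export_now` directly and create at most one `download_only` job and one `sync_download` job for the other two lists.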
||||
### 4. Single-playlist export remains available
|
||||
|
||||
- `GET /api/playlists/{id}/export-folder` is kept.
|
||||
- It only refreshes/returns the playlist directory from the current database state.
|
||||
- It does not trigger downloads automatically.
|
||||
|
||||
## Data Rules
|
||||
|
||||
- Whether a song counts as "downloaded" is still determined by the local `file_locations` in the database.
|
||||
- `local_file_path` in `playlist.yaml` comes only from local active locations already in the database.
|
||||
- Refreshing the export directory writes no new song download records; it only re-projects existing database state onto the filesystem.
|
||||
|
||||
## API Changes
|
||||
|
||||
- Kept: `GET /api/playlists/{playlist_id}/export-folder`
|
||||
- Added: `POST /api/playlists/export`
|
||||
|
||||
Request body:
|
||||
|
||||
```json
|
||||
{
|
||||
"playlist_ids": [1, 2, 3],
|
||||
"requested_by": "ops-console"
|
||||
}
|
||||
```
|
||||
|
||||
Response body:
|
||||
|
||||
```json
|
||||
{
|
||||
"exported_playlist_ids": [1],
|
||||
"exported_count": 1,
|
||||
"download_job": {"id": 11},
|
||||
"sync_download_job": {"id": 12}
|
||||
}
|
||||
```
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
- Reuse the existing playlist status logic so that neither the frontend nor the backend has to guess the status.
|
||||
- Trigger the post-download export write after `OpsRunner` finishes the download stage, not after each individual song completes.
|
||||
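- Run automatic export only for download jobs that carry `playlist_scope.playlist_ids`, so that a "full download" does not incidentally refresh export directories for the entire library.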
|
||||
|
||||
## Testing
|
||||
|
||||
- `sync_playlist_row()` no longer generates `playlists/`
|
||||
- Scoped download jobs invoke playlist export on completion
|
||||
- `POST /api/playlists/export` routes playlists in different states correctly
|
||||
- Already-downloaded playlists are exported directly without creating unnecessary download jobs
|
||||
@@ -0,0 +1,463 @@
|
||||
# Catalogsync Download Dual-Pool Pipeline Design
|
||||
|
||||
## Goal
|
||||
|
||||
Improve real download concurrency without changing the sync stage or introducing a sync-time download URL cache.
|
||||
|
||||
The current bottleneck is not the byte-transfer implementation itself. The real bottleneck is that each download worker performs two very different jobs in sequence:
|
||||
|
||||
1. resolve a usable download source and URL
|
||||
2. transfer audio bytes and record the finished file
|
||||
|
||||
In production, source resolution often takes tens of seconds while the final audio transfer may take around one second. As a result, `DOWNLOAD_WORKERS=10` behaves like ten mixed workers waiting on resolve work instead of ten true download workers.
|
||||
|
||||
This design splits the download stage into a two-pool in-memory pipeline:
|
||||
|
||||
- `resolver pool`
|
||||
- `download pool`
|
||||
|
||||
The sync stage remains unchanged. Songs are still stored as deferred snapshots and download URLs are still resolved at download time.
|
||||
|
||||
## Confirmed Decisions
|
||||
|
||||
The following points were confirmed during design discussion:
|
||||
|
||||
- do not change the sync stage
|
||||
- do not introduce a sync-time download URL cache as part of this iteration
|
||||
- focus only on download-stage behavior
|
||||
- the target outcome is that download workers spend their time on actual downloads instead of long source-resolution work
|
||||
- UI clarity matters:
|
||||
- operators should be able to tell which workers are resolving and which workers are downloading
|
||||
- existing database schema should be preserved if possible
|
||||
|
||||
## Scope
|
||||
|
||||
### In Scope
|
||||
|
||||
- split the download stage into resolver workers and downloader workers
|
||||
- keep the existing job, stage, and item lifecycle model
|
||||
- preserve existing deferred snapshot storage
|
||||
- preserve current local file recording and quality detection behavior
|
||||
- surface resolver activity and download activity clearly in worker state
|
||||
- keep pause, cancel, and recovery semantics compatible with the current runner
|
||||
|
||||
### Out Of Scope
|
||||
|
||||
- changing playlist sync behavior
|
||||
- persisting resolved download URLs across runs
|
||||
- redesigning source ranking logic
|
||||
- changing upload behavior
|
||||
- changing the meaning of song uniqueness in the database
|
||||
- introducing distributed workers or external queues
|
||||
|
||||
## Problem Statement
|
||||
|
||||
Current download-stage flow:
|
||||
|
||||
1. a runner worker claims a download item
|
||||
2. the worker calls `CatalogDownloader.download_song_row(...)`
|
||||
3. inside that flow, the same worker:
|
||||
- deserializes the deferred snapshot
|
||||
- resolves a usable source across multiple providers
|
||||
- downloads the final audio file
|
||||
- records the local file
|
||||
|
||||
This model creates two user-visible problems:
|
||||
|
||||
- most workers appear idle from a transfer perspective because they are blocked in source resolution
|
||||
- byte-transfer concurrency is much lower than the configured worker count
|
||||
|
||||
Recent production measurements showed the pattern clearly:
|
||||
|
||||
- source resolution commonly takes about `77-83s`
|
||||
- actual file download commonly takes about `1s`
|
||||
|
||||
So the current worker pool is structurally spending most of its time in the wrong phase.
|
||||
|
||||
## Approaches Considered
|
||||
|
||||
### Approach A: Keep Single Pool And Only Improve UI
|
||||
|
||||
Show resolver activity more clearly, but keep each worker as `resolve + download`.
|
||||
|
||||
Pros:
|
||||
|
||||
- smallest code change
|
||||
- no pipeline coordination logic
|
||||
|
||||
Cons:
|
||||
|
||||
- does not materially improve true download concurrency
|
||||
- preserves the main performance bottleneck
|
||||
|
||||
Decision:
|
||||
|
||||
- rejected for this task because it improves observability but not throughput
|
||||
|
||||
### Approach B: Implement Dual Pools Inside `CatalogDownloader`
|
||||
|
||||
Move queueing and split-pool logic into `CatalogDownloader`.
|
||||
|
||||
Pros:
|
||||
|
||||
- conceptually local to download code
|
||||
- useful for non-ops batch paths
|
||||
|
||||
Cons:
|
||||
|
||||
- mismatches the current ops runner lifecycle
|
||||
- complicates job item ownership, pause, cancel, and worker naming
|
||||
- less natural for the NAS task center, which already manages workers at runner level
|
||||
|
||||
Decision:
|
||||
|
||||
- not preferred for this iteration
|
||||
|
||||
### Approach C: Implement Dual Pools At Download Stage Runner Level
|
||||
|
||||
Create a download-stage pipeline in the ops runner:
|
||||
|
||||
- resolver workers claim items and produce ready-to-download tasks
|
||||
- downloader workers consume ready tasks and perform final transfer
|
||||
|
||||
Pros:
|
||||
|
||||
- fits current job/stage/item orchestration naturally
|
||||
- keeps worker ownership explicit
|
||||
- lets dashboard show separate resolver and downloader workers
|
||||
- delivers real transfer concurrency gains without changing sync behavior
|
||||
|
||||
Cons:
|
||||
|
||||
- more control-flow complexity in the runner
|
||||
- requires careful queue shutdown and pause/cancel handling
|
||||
|
||||
Decision:
|
||||
|
||||
- recommended
|
||||
|
||||
## Recommended Design
|
||||
|
||||
## High-Level Architecture
|
||||
|
||||
During a `download` stage, the runner will create a bounded in-memory queue:
|
||||
|
||||
- `ready_queue`
|
||||
|
||||
The stage will use two thread pools:
|
||||
|
||||
- `resolver pool`
|
||||
- `download pool`
|
||||
|
||||
### Resolver Pool Responsibilities
|
||||
|
||||
- claim pending download items
|
||||
- check whether the song is already downloaded
|
||||
- build the download row
|
||||
- resolve a usable `SongInfo` with a valid download URL
|
||||
- publish a `ResolvedDownloadTask` into `ready_queue`
|
||||
- mark the item failed immediately if resolution cannot produce a usable download target
|
||||
|
||||
### Download Pool Responsibilities
|
||||
|
||||
- consume `ResolvedDownloadTask` instances from `ready_queue`
|
||||
- execute actual file download only
|
||||
- emit transfer progress
|
||||
- record local file metadata
|
||||
- mark the item succeeded or failed
|
||||
|
||||
This separates long-latency provider resolution from short, bandwidth-heavy transfer work.
|
||||
|
||||
## New Internal Data Model
|
||||
|
||||
Introduce an internal in-memory task object for the stage, for example:
|
||||
|
||||
```python
|
||||
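from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class ResolvedDownloadTask:
    item_id: int  # job item this task belongs to
    row: dict[str, Any]  # download row built by the resolver
    resolved_song_info: Any  # resolved SongInfo carrying a valid download URL
    display_text: str  # text shown for the worker handling this task
    target_library_root: Path  # library root the file is downloaded into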
|
||||
```
|
||||
|
||||
This object is not persisted to the database in this iteration.
|
||||
|
||||
## Worker Model
|
||||
|
||||
The dashboard should show two worker families for a running download stage:
|
||||
|
||||
- `resolve-1`, `resolve-2`, ...
|
||||
- `download-1`, `download-2`, ...
|
||||
|
||||
This is intentional. The operator should be able to distinguish:
|
||||
|
||||
- workers currently finding a usable source
|
||||
- workers currently transferring bytes
|
||||
|
||||
`transfer_stats` should continue to count only workers with real transfer speed values.
|
||||
|
||||
## Download Stage Flow
|
||||
|
||||
### Step 1: Stage Startup
|
||||
|
||||
When the runner enters a `download` stage:
|
||||
|
||||
1. compute total worker budget from existing configuration
|
||||
2. split it into resolver and downloader counts
|
||||
3. create a bounded `ready_queue`
|
||||
4. start resolver pool and downloader pool
|
||||
|
||||
### Step 2: Item Resolution
|
||||
|
||||
Each resolver worker loops until:
|
||||
|
||||
- no more claimable items remain
|
||||
- pause or cancel is requested
|
||||
- pipeline shutdown is triggered
|
||||
|
||||
For each claimed item:
|
||||
|
||||
1. load row data
|
||||
2. skip immediately if already downloaded
|
||||
3. emit resolver progress such as `resolving source qq (1/6)`
|
||||
4. call a new downloader API that resolves but does not download
|
||||
5. enqueue a `ResolvedDownloadTask` on success
|
||||
6. mark failed on resolution failure
|
||||
|
||||
### Step 3: Pure Download Execution
|
||||
|
||||
Each downloader worker loops until:
|
||||
|
||||
- a shutdown sentinel is received
|
||||
- pause or cancel is requested and the queue has drained according to the chosen shutdown policy
|
||||
|
||||
For each resolved task:
|
||||
|
||||
1. emit `starting download via <platform>`
|
||||
2. monitor file growth and emit transfer stats
|
||||
3. record the local file on success
|
||||
4. mark the item succeeded or failed
|
||||
|
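The three steps above can be sketched as a small two-pool pipeline built on `queue.Queue` and `threading`. This is an illustrative sketch under simplifying assumptions, not the runner's implementation: `run_pipeline`, the `resolve` and `download` callables, and the result dictionary are hypothetical, item claiming is modeled as popping from an in-memory queue, and pause/cancel handling is omitted.

```python
import queue
import threading
from typing import Callable, Iterable, Optional

_SENTINEL = object()  # tells one downloader worker to exit


def run_pipeline(
    items: Iterable[int],
    resolve: Callable[[int], Optional[str]],
    download: Callable[[str], bool],
    resolver_workers: int,
    download_workers: int,
) -> dict[str, list[int]]:
    pending: queue.Queue = queue.Queue()
    for item in items:
        pending.put(item)
    # Bounded queue so resolvers cannot run far ahead of downloaders.
    ready_queue: queue.Queue = queue.Queue(maxsize=download_workers * 2)
    results: dict[str, list[int]] = {"succeeded": [], "failed": []}
    lock = threading.Lock()

    def resolver_loop() -> None:
        while True:
            try:
                item = pending.get_nowait()  # "claim" the next item
            except queue.Empty:
                return  # no more claimable items
            task = resolve(item)
            if task is None:
                with lock:
                    results["failed"].append(item)  # fail immediately
            else:
                ready_queue.put((item, task))  # blocks while the queue is full

    def downloader_loop() -> None:
        while True:
            entry = ready_queue.get()
            if entry is _SENTINEL:
                return
            item, task = entry
            ok = download(task)
            with lock:
                results["succeeded" if ok else "failed"].append(item)

    resolvers = [threading.Thread(target=resolver_loop) for _ in range(resolver_workers)]
    downloaders = [threading.Thread(target=downloader_loop) for _ in range(download_workers)]
    for thread in resolvers + downloaders:
        thread.start()
    for thread in resolvers:
        thread.join()
    # All resolvers are done: one sentinel per downloader lets each exit
    # once the remaining ready tasks have been consumed.
    for _ in downloaders:
        ready_queue.put(_SENTINEL)
    for thread in downloaders:
        thread.join()
    return results
```

Sending sentinels only after every resolver has joined guarantees that no resolved task is enqueued behind a sentinel and lost.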
||||
## CatalogDownloader API Refactor
|
||||
|
||||
Keep the current public behavior but split the implementation into two explicit phases.
|
||||
|
||||
### New Methods
|
||||
|
||||
- `resolve_song_row(...) -> ResolvedDownloadPayload | None`
|
||||
- `download_resolved_song(...) -> bool`
|
||||
|
||||
Where:
|
||||
|
||||
- `resolve_song_row(...)` handles snapshot deserialization, source resolution, target directory selection, and worker text for the resolver phase
|
||||
- `download_resolved_song(...)` performs only final download, monitor setup, file recording, and quality detection
|
||||
|
||||
### Compatibility Method
|
||||
|
||||
Keep:
|
||||
|
||||
- `download_song_row(...)`
|
||||
|
||||
But turn it into a compatibility wrapper:
|
||||
|
||||
1. resolve
|
||||
2. download
|
||||
|
||||
This preserves existing unit-test entry points and any non-ops call sites.
|
||||
|
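A minimal sketch of the wrapper shape follows. `TwoPhaseDownloader` is a hypothetical stand-in for `CatalogDownloader`, both phase methods are stubs, and the `bool` return type of `download_song_row` is an assumption; only the resolve-then-download control flow is the point.

```python
from typing import Any, Optional


class TwoPhaseDownloader:
    """Hypothetical stand-in for CatalogDownloader; phases are stubs."""

    def resolve_song_row(self, row: dict[str, Any]) -> Optional[dict[str, Any]]:
        # Stub: a real resolver would deserialize the snapshot and query providers.
        url = row.get("url")
        return {"row": row, "url": url} if url else None

    def download_resolved_song(self, payload: dict[str, Any]) -> bool:
        # Stub: a real implementation would transfer bytes and record the file.
        return payload["url"].startswith("https://")

    def download_song_row(self, row: dict[str, Any]) -> bool:
        """Compatibility wrapper: resolve first, then download."""
        payload = self.resolve_song_row(row)
        if payload is None:
            return False  # resolution failed, so there is nothing to download
        return self.download_resolved_song(payload)
```

Existing callers keep calling `download_song_row(...)`, while the runner can call the two phases from different pools.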
||||
## Worker State Design
|
||||
|
||||
Resolver workers should update:
|
||||
|
||||
- `current_song_id`
|
||||
- `current_display_text`
|
||||
- `last_progress_text`
|
||||
|
||||
Example messages:
|
||||
|
||||
- `resolving source qq (1/6)`
|
||||
- `resolving source kuwo (2/6)`
|
||||
- `resolved via qq`
|
||||
|
||||
Downloader workers should update:
|
||||
|
||||
- `current_song_id`
|
||||
- `current_display_text`
|
||||
- `last_progress_text`
|
||||
- `downloaded_bytes`
|
||||
- `total_bytes`
|
||||
- `speed_bytes_per_sec`
|
||||
- `progress_percent`
|
||||
|
||||
Example messages:
|
||||
|
||||
- `starting download via qq`
|
||||
- `12.00MB/48.00MB`
|
||||
|
||||
## Concurrency Split
|
||||
|
||||
Do not require a schema change or mandatory new env vars for the first version.
|
||||
|
||||
Recommended default behavior:
|
||||
|
||||
- if the total download worker budget is `1`, a `1 resolver + 0 downloader` split is invalid, so fall back to the single-thread compatibility path
|
||||
- if total is `2`, use `1 resolver + 1 downloader`
|
||||
- if total is `>= 3`, use approximately `30% resolver` and `70% downloader`, with the resolver count capped at `3` by the rule below
|
||||
|
||||
Initial recommended rule:
|
||||
|
||||
```text
|
||||
resolver_workers = max(1, min(3, total_workers // 3))
|
||||
download_workers = max(1, total_workers - resolver_workers)
|
||||
```
|
||||
|
||||
For `DOWNLOAD_WORKERS=10`, this gives:
|
||||
|
||||
- `3 resolver`
|
||||
- `7 downloader`
|
||||
|
||||
This is a reasonable first cut and avoids over-investing worker budget in resolution.
|
||||
|
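The split rule can be written as a small function. `split_download_workers` is a hypothetical name, and returning `None` to signal the single-thread compatibility path is an assumption made for this sketch.

```python
from typing import Optional


def split_download_workers(total_workers: int) -> Optional[tuple[int, int]]:
    """Return (resolver_workers, download_workers), or None for the compat path."""
    if total_workers < 2:
        # A budget of 1 cannot staff both pools without exceeding the budget.
        return None
    resolver_workers = max(1, min(3, total_workers // 3))
    download_workers = max(1, total_workers - resolver_workers)
    return resolver_workers, download_workers
```

Because of the `min(3, ...)` cap, large budgets go almost entirely to downloaders: a budget of 30 still yields only 3 resolvers.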
||||
## Queue Design
|
||||
|
||||
Use a bounded in-memory queue to avoid resolver workers running too far ahead.
|
||||
|
||||
Recommended initial capacity:
|
||||
|
||||
- `download_workers * 2`
|
||||
|
||||
Why bounded:
|
||||
|
||||
- prevents unbounded memory growth
|
||||
- keeps resolution work closer to actual download demand
|
||||
- simplifies pause and cancel behavior
|
||||
|
||||
## Pause, Cancel, And Shutdown Behavior
|
||||
|
||||
### Pause
|
||||
|
||||
When pause is requested:
|
||||
|
||||
- resolver workers stop claiming new items
|
||||
- downloader workers may finish in-flight downloads
|
||||
- stage reconciliation remains based on existing item states
|
||||
|
||||
This matches current expectations better than attempting hard interruption of active downloads.
|
||||
|
||||
### Cancel
|
||||
|
||||
When cancel is requested:
|
||||
|
||||
- resolver workers stop claiming new items immediately
|
||||
- downloader workers stop after their current task boundary where possible
|
||||
- no new resolved tasks should be enqueued after cancellation is observed
|
||||
|
||||
### Queue Shutdown
|
||||
|
||||
After resolver workers finish, the runner should send explicit queue sentinels so downloader workers can exit cleanly once the queue drains.
|
||||
|
||||
## Failure Handling
|
||||
|
||||
### Resolution Failure
|
||||
|
||||
If resolution cannot produce a valid downloadable `SongInfo`:
|
||||
|
||||
- mark the item failed immediately
|
||||
- do not enqueue it for download
|
||||
|
||||
### Download Failure
|
||||
|
||||
If pure download fails after resolution:
|
||||
|
||||
- mark the item failed
|
||||
- preserve the existing error formatting model
|
||||
|
||||
### Resolver Success But Queue/Shutdown Race
|
||||
|
||||
If the pipeline is shutting down and a resolver has a resolved task ready:
|
||||
|
||||
- prefer not enqueuing new work after pause/cancel has been observed
|
||||
- let the item remain in a recoverable state according to current reconciliation rules
|
||||
|
||||
The first implementation should prefer correctness over aggressive continuation.
|
||||
|
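One way to honor the "do not enqueue after pause/cancel" preference is to make the resolver's enqueue step poll a stop flag instead of blocking indefinitely on a full queue. The helper below is a sketch; `try_enqueue`, the `stop_event` flag, and the polling interval are assumptions, not part of the current runner.

```python
import queue
import threading
from typing import Any


def try_enqueue(
    ready_queue: queue.Queue,
    task: Any,
    stop_event: threading.Event,
    poll_seconds: float = 0.2,
) -> bool:
    """Enqueue unless pause/cancel is observed first; return whether it was queued."""
    while not stop_event.is_set():
        try:
            ready_queue.put(task, timeout=poll_seconds)
            return True
        except queue.Full:
            continue  # queue still full; re-check the stop flag
    return False  # stop observed: leave the item for reconciliation
```

When this returns `False`, the resolver drops the resolved task and leaves the item state to the existing reconciliation rules.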
||||
## Why This Improves Throughput
|
||||
|
||||
Under the current model, ten workers spend most of their time waiting on provider resolution.
|
||||
|
||||
Under the dual-pool model:
|
||||
|
||||
- a small resolver pool continues finding usable sources
|
||||
- a larger downloader pool stays focused on byte transfer
|
||||
|
||||
This does not make provider resolution free, but it stops long resolution latency from occupying the same worker budget needed for real downloads.
|
||||
|
||||
The expected operator-visible result is:
|
||||
|
||||
- multiple downloader workers can show real transfer progress concurrently
|
||||
- resolver workers remain visible as separate activity instead of appearing as fake download workers
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
## Unit Tests
|
||||
|
||||
Extend downloader tests to cover:
|
||||
|
||||
- `resolve_song_row(...)` returning a resolved payload without downloading
|
||||
- `download_resolved_song(...)` preserving existing progress and file-recording behavior
|
||||
- compatibility wrapper `download_song_row(...)` still working
|
||||
|
||||
## Runner Tests
|
||||
|
||||
Add runner tests for:
|
||||
|
||||
- worker split calculation
|
||||
- resolver workers feeding downloader workers through a queue
|
||||
- successful completion of mixed resolved tasks
|
||||
- pause and cancel behavior while the queue is non-empty
|
||||
- clean worker shutdown after resolver completion
|
||||
|
||||
## Dashboard-Oriented Tests
|
||||
|
||||
Add ops tests to verify:
|
||||
|
||||
- resolver workers appear with resolver progress text
|
||||
- downloader workers expose transfer metrics
|
||||
- aggregate transfer stats ignore resolver-only workers
|
||||
|
||||
## Rollout Plan
|
||||
|
||||
1. refactor `CatalogDownloader` into resolve-only and download-only phases
|
||||
2. add dual-pool execution path for the download stage in the runner
|
||||
3. keep the old single-call wrapper for compatibility
|
||||
4. update worker naming and dashboard expectations
|
||||
5. run targeted NAS verification:
|
||||
- confirm simultaneous non-zero transfer speed on more than one downloader worker
|
||||
- confirm resolver workers remain visible separately
|
||||
|
||||
## Open Questions Resolved In This Design
|
||||
|
||||
- Should sync be changed to resolve URLs early?
|
||||
- No.
|
||||
|
||||
- Should this iteration add persistent URL caching?
|
||||
- No.
|
||||
|
||||
- Should resolver and downloader state share the same worker names?
|
||||
- No. Separate names are clearer and better match reality.
|
||||
|
||||
- Should the first version require schema changes?
|
||||
- No.
|
||||
|
||||
## Summary
|
||||
|
||||
The recommended change is to keep deferred snapshots exactly as they are and redesign only the download-stage execution model.
|
||||
|
||||
Instead of ten mixed workers doing `resolve + download`, the system should run a two-pool pipeline:
|
||||
|
||||
- a small resolver pool that turns deferred snapshots into ready download tasks
|
||||
- a larger downloader pool that performs real file transfer
|
||||
|
||||
This is the smallest architecture change that directly targets the current bottleneck while preserving the existing sync model and database schema.
|
||||
@@ -0,0 +1,324 @@
|
||||
# Catalogsync Resolver Source Ranking Design
|
||||
|
||||
## Goal
|
||||
|
||||
Improve resolver throughput without sacrificing cross-run learning by introducing a persistent, isolated source-ranking store.
|
||||
|
||||
The resolver should keep treating the song's original platform as the preferred source, but fallback order should become adaptive instead of fixed. The adaptive order must:
|
||||
|
||||
- learn over time across jobs and restarts
|
||||
- be grouped by original source instead of using one global ranking
|
||||
- stay isolated from the main catalog business tables
|
||||
- preserve the current "keep trying later sources if earlier ones fail" behavior
|
||||
|
||||
## Confirmed Decisions
|
||||
|
||||
The following points were confirmed during design discussion:
|
||||
|
||||
- the ranking model is grouped by original source
|
||||
- statistics must persist across tasks and service restarts
|
||||
- the statistics store must be isolated from the main business schema
|
||||
- the original source is still tried first
|
||||
- after the warmup threshold is reached, fallback should try the top two ranked sources first
|
||||
- if the top two fallback sources fail, resolver must continue trying the remaining sources
|
||||
|
||||
## Scope
|
||||
|
||||
### In Scope
|
||||
|
||||
- add a dedicated resolver statistics SQLite side database
|
||||
- record persistent fallback attempt and success statistics by `(origin_source, candidate_source)`
|
||||
- use fallback statistics to reorder sources after a warmup threshold
|
||||
- keep the original source as the first attempt
|
||||
- preserve existing resolver matching and candidate selection logic within a single source
|
||||
- cover side-database initialization, repository methods, ranking logic, and resolver behavior with tests
|
||||
|
||||
### Out Of Scope
|
||||
|
||||
- changing sync-stage behavior
|
||||
- caching download URLs across runs
|
||||
- changing song uniqueness rules
|
||||
- replacing the current matching heuristics inside a source
|
||||
- adding a UI for resolver statistics in this iteration
|
||||
- distributed or external metrics storage
|
||||
|
||||
## Problem Statement
|
||||
|
||||
The current resolver still spends too much time in fallback traversal.
|
||||
|
||||
Today, resolver behavior is:
|
||||
|
||||
1. derive the original platform as `preferred_source`
|
||||
2. try the preferred source first
|
||||
3. if preferred-source fast return does not happen, continue through the configured fallback list
|
||||
4. within fallback traversal, source order is static and does not learn from production outcomes
|
||||
|
||||
This causes two operational problems:
|
||||
|
||||
- fallback time is longer than necessary because low-yield sources keep being retried early
|
||||
- the system does not accumulate knowledge from prior jobs, so every restart returns to the same static ordering
|
||||
|
||||
The result is that resolver throughput remains bursty even after the dual-pool pipeline work, because the ready queue is still fed by a fallback strategy that does not adapt.
|
||||
|
||||
## Approaches Considered
|
||||
|
||||
### Approach A: Global Ranking For All Sources
|
||||
|
||||
Keep one success-rate table for all candidate sources regardless of original platform.
|
||||
|
||||
Pros:
|
||||
|
||||
- simplest data model
|
||||
- easiest ranking query
|
||||
|
||||
Cons:
|
||||
|
||||
- mixes very different source relationships
|
||||
- large-volume platforms can dominate the ranking
|
||||
- does not reflect that `qq -> kuwo` and `netease -> kuwo` may behave differently
|
||||
|
||||
Decision:
|
||||
|
||||
- rejected because the learning model should follow the original source
|
||||
|
||||
### Approach B: In-Memory Per-Run Learning Only
|
||||
|
||||
Track statistics only for the current job and discard them at task end.
|
||||
|
||||
Pros:
|
||||
|
||||
- no schema work
|
||||
- easy to experiment with
|
||||
|
||||
Cons:
|
||||
|
||||
- restarts lose all learning
|
||||
- long warmup every time
|
||||
- directly conflicts with the requirement for cross-run reuse
|
||||
|
||||
Decision:
|
||||
|
||||
- rejected
|
||||
|
||||
### Approach C: Persistent Side Database Grouped By Original Source
|
||||
|
||||
Store statistics in a dedicated SQLite side database keyed by original source and fallback source.
|
||||
|
||||
Pros:
|
||||
|
||||
- matches the confirmed grouping model
|
||||
- survives restarts and future jobs
|
||||
- keeps analytics-style tables isolated from the main business schema
|
||||
- easy to evolve independently from catalog tables
|
||||
|
||||
Cons:
|
||||
|
||||
- requires one more database file and repository
|
||||
- adds coordination between resolver and statistics store
|
||||
|
||||
Decision:
|
||||
|
||||
- recommended
|
||||
|
||||
## Recommended Design
|
||||
|
||||
## High-Level Architecture
|
||||
|
||||
Add a dedicated resolver statistics store, for example:
|
||||
|
||||
- `resolver_stats.db`
|
||||
|
||||
This database is initialized separately from `catalogsync.db` and contains only resolver-learning tables.
|
||||
|
||||
The main download flow remains:
|
||||
|
||||
1. build `target_song_info`
|
||||
2. determine `preferred_source`
|
||||
3. try preferred source first
|
||||
4. reorder fallback sources using persistent statistics when warmup criteria are met
|
||||
5. try ranked top two fallback sources first
|
||||
6. if still unresolved, continue the remaining fallback sources in ranked order
|
||||
|
||||
The existing resolver still owns matching, candidate picking, and final candidate selection inside each source.
|
||||
|
||||
## Statistics Model
|
||||
|
||||
The learning key is:
|
||||
|
||||
- `origin_source`
|
||||
- `candidate_source`
|
||||
|
||||
Where:
|
||||
|
||||
- `origin_source` is the normalized original platform for the song being resolved
|
||||
- `candidate_source` is a fallback source actually attempted after the original source path failed
|
||||
|
||||
Statistics are recorded only for fallback attempts. Preferred-source attempts are not stored in this side database for the first iteration because the ranking problem is specifically about fallback order.
|
||||
|
||||
### Stored Counters
|
||||
|
||||
Each row should persist:
|
||||
|
||||
- `origin_source`
|
||||
- `candidate_source`
|
||||
- `attempt_count`
|
||||
- `resolve_success_count`
|
||||
- `last_attempt_at`
|
||||
- `last_success_at`
|
||||
- `created_at`
|
||||
- `updated_at`
|
||||
|
||||
## Warmup and Ranking Rules
|
||||
|
||||
### Warmup Threshold
|
||||
|
||||
The warmup threshold is not global song count. It is the total fallback sample count for a specific `origin_source`.
|
||||
|
||||
Example:
|
||||
|
||||
- `qq` fallback learning activates only after the sum of all `qq -> *` fallback attempts reaches `1000`
|
||||
- `netease` fallback learning activates independently after the sum of all `netease -> *` fallback attempts reaches `1000`
|
||||
|
||||
### Ranking Formula
|
||||
|
||||
Use a smoothed success rate:
|
||||
|
||||
`(resolve_success_count + 1) / (attempt_count + 2)`
|
||||
|
||||
This avoids unstable rankings when sample counts are still low.
|
||||
|
||||
### Ranked Traversal
|
||||
|
||||
For a song with original source `origin_source`:
|
||||
|
||||
1. always try `preferred_source` first
|
||||
2. if preferred source does not resolve a high-confidence downloadable result, enter fallback
|
||||
3. if `origin_source` warmup threshold is not met:
|
||||
- keep the configured fallback order
|
||||
4. if `origin_source` warmup threshold is met:
|
||||
- sort fallback candidates by smoothed success rate, highest first
|
||||
- preserve configured order as the tie-breaker
|
||||
- try the top two ranked fallback sources first
|
||||
- if both fail, continue with the remaining ranked fallback sources
|
||||
|
||||
This preserves completeness while improving average-case resolution speed.
|
||||
|
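The warmup and ranking rules can be sketched as follows. Since the remaining sources are also tried in ranked order, "top two first, then the rest" reduces to a single sorted list. `rank_fallback_sources` and the shape of `stats` are hypothetical, and the threshold of `1000` is taken from the example above rather than from a confirmed setting.

```python
from typing import Mapping, Sequence


def smoothed_success_rate(success_count: int, attempt_count: int) -> float:
    """Smoothed rate: (resolve_success_count + 1) / (attempt_count + 2)."""
    return (success_count + 1) / (attempt_count + 2)


def rank_fallback_sources(
    configured: Sequence[str],
    stats: Mapping[str, tuple[int, int]],  # candidate -> (attempt_count, success_count)
    warmup_threshold: int = 1000,
) -> list[str]:
    """Order fallback sources for one origin_source."""
    total_attempts = sum(attempts for attempts, _ in stats.values())
    if total_attempts < warmup_threshold:
        return list(configured)  # warmup not met: keep the configured order

    def rate(source: str) -> float:
        attempts, successes = stats.get(source, (0, 0))
        return smoothed_success_rate(successes, attempts)

    # sorted() is stable, so sources with equal rates keep their configured order.
    return sorted(configured, key=lambda source: -rate(source))
```

A source with no recorded attempts gets a smoothed rate of `0.5`, so a new source is neither pinned to the front nor to the back.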
||||
## Resolver Flow Changes
|
||||
|
||||
Resolver source ordering should become a two-phase plan:
|
||||
|
||||
### Phase 1: Preferred Source
|
||||
|
||||
- derive `preferred_source` from the snapshot or row platform
|
||||
- try preferred-source refresh
|
||||
- try preferred-source search
|
||||
- if a preferred-source high-confidence result is found, return immediately
|
||||
|
||||
### Phase 2: Ranked Fallback
|
||||
|
||||
- build fallback candidates from configured `download_sources` excluding `preferred_source`
|
||||
- ask the resolver stats repository for the ranked order for this `origin_source`
|
||||
- attempt fallback sources in that order
|
||||
- after each fallback attempt:
|
||||
- record one attempt
|
||||
- if that source resolves a usable candidate, record one success and stop
|
||||
- if a fallback source fails to produce a usable candidate, continue to the next source
|
||||
|
||||
The resolver should still stop at the first acceptable fallback success in this iteration rather than exhaustively scanning later sources for a possibly better file.
|
||||
|
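The Phase 2 loop can be sketched as below. `resolve_via_fallback`, `try_source`, and `record_attempt` are hypothetical callables standing in for the per-source resolver and the statistics repository. The sketch also assumes that a failure to record statistics is logged and swallowed so it cannot fail the item.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger("resolver_fallback")


def resolve_via_fallback(
    ranked_sources: Sequence[str],
    try_source: Callable[[str], bool],
    record_attempt: Callable[[str, bool], None],
) -> Optional[str]:
    """Try sources in order; return the first that succeeds, else None."""
    for source in ranked_sources:
        succeeded = try_source(source)
        try:
            record_attempt(source, succeeded)  # one attempt, plus a success if any
        except Exception:
            logger.exception("failed to record resolver stats for %s", source)
        if succeeded:
            return source  # stop at the first acceptable fallback success
    return None
```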
||||
## Side Database Schema
|
||||
|
||||
The side database should stay minimal for the first version.
|
||||
|
||||
### Table: `resolver_source_stats`
|
||||
|
||||
- `origin_source TEXT NOT NULL`
|
||||
- `candidate_source TEXT NOT NULL`
|
||||
- `attempt_count INTEGER NOT NULL DEFAULT 0`
|
||||
- `resolve_success_count INTEGER NOT NULL DEFAULT 0`
|
||||
- `last_attempt_at TEXT`
|
||||
- `last_success_at TEXT`
|
||||
- `created_at TEXT DEFAULT CURRENT_TIMESTAMP`
|
||||
- `updated_at TEXT DEFAULT CURRENT_TIMESTAMP`
|
||||
- primary key: `(origin_source, candidate_source)`
|
||||
|
||||
Recommended indexes:
|
||||
|
||||
- primary key already covers lookup by `(origin_source, candidate_source)`
|
||||
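- index on `(origin_source)` for ranking queries (optional in SQLite, because the composite primary key's leading column already serves lookups by `origin_source`)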
|
||||
|
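The table and the counter update can be sketched with the standard-library `sqlite3` module. The DDL mirrors the column list above. The `record_attempt` helper and the exact upsert statement are assumptions for illustration, and `ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS resolver_source_stats (
    origin_source TEXT NOT NULL,
    candidate_source TEXT NOT NULL,
    attempt_count INTEGER NOT NULL DEFAULT 0,
    resolve_success_count INTEGER NOT NULL DEFAULT 0,
    last_attempt_at TEXT,
    last_success_at TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (origin_source, candidate_source)
);
"""

UPSERT = """
INSERT INTO resolver_source_stats
    (origin_source, candidate_source, attempt_count, resolve_success_count,
     last_attempt_at, last_success_at)
VALUES (:origin, :candidate, 1, :success, CURRENT_TIMESTAMP,
        CASE WHEN :success = 1 THEN CURRENT_TIMESTAMP END)
ON CONFLICT (origin_source, candidate_source) DO UPDATE SET
    attempt_count = attempt_count + 1,
    resolve_success_count = resolve_success_count + excluded.resolve_success_count,
    last_attempt_at = CURRENT_TIMESTAMP,
    last_success_at = COALESCE(excluded.last_success_at, last_success_at),
    updated_at = CURRENT_TIMESTAMP;
"""


def record_attempt(conn: sqlite3.Connection, origin: str, candidate: str, success: bool) -> None:
    """Count one fallback attempt, and one success when it resolved (hypothetical helper)."""
    conn.execute(UPSERT, {"origin": origin, "candidate": candidate, "success": int(success)})
    conn.commit()
```

Doing the increment in a single upsert avoids a separate read-then-write step per attempt.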
||||
## Repository Boundary
|
||||
|
||||
Introduce a dedicated repository for the side database, for example:
|
||||
|
||||
- `ResolverStatsRepository`
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- initialize side-database schema
|
||||
- upsert attempt and success counters
|
||||
- report total fallback samples for an origin source
|
||||
- return ranked fallback candidates for an origin source given the configured fallback list
|
||||
|
||||
The main `CatalogRepository` should not absorb this responsibility. Keeping the side database behind a dedicated repository keeps the separation explicit and prevents statistics logic from leaking into core business persistence.
|
||||
|
||||
## Configuration and File Layout
|
||||
|
||||
Add a dedicated resolver statistics database path derived from the application root, for example:
|
||||
|
||||
- `<APP_HOME>/data/resolver_stats.db`
|
||||
|
||||
This path should be configurable but should default automatically so current operators do not need new setup work.
|
||||
|
||||
The service should initialize both:
|
||||
|
||||
- main catalog database
|
||||
- resolver statistics side database
|
||||
|
||||
Service startup should not fail if the side database is empty; it should be created on demand.
|
||||
|
||||
## Error Handling
|
||||
|
||||
Resolver statistics must not become a single point of failure.
|
||||
|
||||
If the side database update fails:
|
||||
|
||||
- do not fail the actual download item
|
||||
- log the statistics error
|
||||
- continue resolver fallback using the best available in-memory ordering for that invocation
|
||||
|
||||
If the side database ranking query fails:
|
||||
|
||||
- fall back to configured source order
|
||||
|
||||
This keeps the ranking system opportunistic rather than mission-critical.
|
||||
|
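The ranking-query fallback can be sketched as a thin guard around the repository call. `safe_ranked_sources` and `rank_query` are hypothetical names. The extra check that the ranked list is a permutation of the configured list is an added assumption, intended to keep every configured source reachable.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("resolver_stats")


def safe_ranked_sources(
    origin_source: str,
    configured: Sequence[str],
    rank_query: Callable[[str, Sequence[str]], Sequence[str]],
) -> list[str]:
    """Return the ranked order, or the configured order if ranking is unavailable."""
    try:
        ranked = list(rank_query(origin_source, configured))
    except Exception:
        logger.exception("ranking query failed for %s; using configured order", origin_source)
        return list(configured)
    if sorted(ranked) != sorted(configured):
        # Assumed safety check: never drop or invent a fallback source.
        return list(configured)
    return ranked
```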
||||
## Testing Strategy
|
||||
|
||||
Tests should cover:
|
||||
|
||||
- side-database schema creation
|
||||
- isolated side-database repository queries and updates
|
||||
- warmup not reached:
|
||||
- configured fallback order is preserved
|
||||
- warmup reached:
|
||||
- fallback order is re-ranked by per-origin-source statistics
|
||||
- top-two-first behavior:
|
||||
- top two ranked fallback sources are attempted before the rest
|
||||
- continuation behavior:
|
||||
- if top two fail, later sources are still attempted
|
||||
- grouping behavior:
|
||||
- `qq` ranking does not affect `netease` ranking
|
||||
- graceful degradation:
|
||||
- side-database failure falls back to configured order instead of failing the item
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- resolver statistics are stored in a dedicated SQLite side database rather than the main business database
|
||||
- fallback statistics persist across jobs and service restarts
|
||||
- ranking is grouped by original source
|
||||
- before the warmup threshold, fallback order matches configured source order
|
||||
- after the warmup threshold, top two fallback candidates for an origin source are tried first according to smoothed success rate
|
||||
- if the top two fallback candidates fail, resolver still attempts the remaining fallback sources
|
||||
- statistics-store failures do not fail the download item outright
|
||||
- automated tests cover ranking, grouping, warmup, and fallback-to-configured-order behavior
|
||||