# catalog-sync

`catalog-sync` is a standalone catalog collection and download pipeline built in this repository.
It covers the flow "playlist discovery -> sync to database -> multi-source download -> optional object storage upload"
and can be run either from the CLI or from the Ops Web Console.

This README covers `catalog-sync` only; it does not describe general `musicdl` project usage.

## What It Does

- Collect playlist pools from `netease`, `qq`, `kuwo` (playlist square + toplist).
- Sync playlists/songs/artists into SQLite (deduplicated by platform + remote id).
- Download songs with resolver fallback across multiple download sources.
- Save sibling `.lrc` lyric files after successful downloads by default, with optional overwrite and bulk backfill support.
- Track local/remote file locations and upload missing files to object storage.
- Run full job orchestration with an Ops console (FastAPI + background runner).

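
The "deduplicated by platform + remote id" behavior can be pictured as a composite unique key in SQLite. The following is a minimal sketch only: the `songs` table, its columns, and the `upsert_song` helper are hypothetical and are not taken from `db.py` or `repository.py`.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not the real catalog schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE songs (
        id        INTEGER PRIMARY KEY,
        platform  TEXT NOT NULL,
        remote_id TEXT NOT NULL,
        title     TEXT NOT NULL,
        UNIQUE (platform, remote_id)
    )
    """
)


def upsert_song(platform: str, remote_id: str, title: str) -> None:
    """Insert a song, or refresh its title when the (platform, remote_id) pair already exists."""
    conn.execute(
        """
        INSERT INTO songs (platform, remote_id, title)
        VALUES (?, ?, ?)
        ON CONFLICT (platform, remote_id) DO UPDATE SET title = excluded.title
        """,
        (platform, remote_id, title),
    )


upsert_song("netease", "1001", "Song A")
upsert_song("netease", "1001", "Song A (remastered)")  # same key: row is updated in place
upsert_song("qq", "1001", "Song A")  # same remote id on another platform: separate row

row_count = conn.execute("SELECT COUNT(*) FROM songs").fetchone()[0]
netease_title = conn.execute(
    "SELECT title FROM songs WHERE platform = ? AND remote_id = ?", ("netease", "1001")
).fetchone()[0]
```

Keying on the pair rather than on the remote id alone keeps identical ids from different platforms apart, which is what the sync step needs when it pulls from several sources. The `ON CONFLICT ... DO UPDATE` form assumes SQLite 3.24 or newer.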
## Supported Sources

- Playlist collection sources:
  - `netease`
  - `qq`
  - `kuwo`
- Download resolver sources:
  - `qq`
  - `kuwo`
  - `migu`
  - `qianqian`
  - `kugou`
  - `netease`

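
Resolver fallback across the download sources listed above might look roughly like the sketch below. It is an illustration under assumptions: `resolve_with_fallback`, the `Resolver` callable type, and the stub resolvers are hypothetical and do not reflect the actual code in `downloader.py`.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical resolver shape: takes a song title, returns a download URL or None.
Resolver = Callable[[str], Optional[str]]


def resolve_with_fallback(
    title: str,
    sources: Iterable[str],
    resolvers: Dict[str, Resolver],
) -> Optional[Tuple[str, str]]:
    """Try each source in order and return (source, url) for the first one that resolves."""
    for source in sources:
        resolver = resolvers.get(source)
        if resolver is None:
            continue  # no resolver registered for this source name
        try:
            url = resolver(title)
        except Exception:
            continue  # treat a failing source as a miss and move on
        if url:
            return source, url
    return None


def _qq_stub(title: str) -> Optional[str]:
    raise RuntimeError("simulated network error")


def _kuwo_stub(title: str) -> Optional[str]:
    return None  # simulated "no match"


def _migu_stub(title: str) -> Optional[str]:
    return "https://example.invalid/song-a.mp3"


stub_resolvers = {"qq": _qq_stub, "kuwo": _kuwo_stub, "migu": _migu_stub}
result = resolve_with_fallback("Song A", ["qq", "kuwo", "migu"], stub_resolvers)
missing = resolve_with_fallback("Song A", ["qq", "kuwo"], stub_resolvers)
```

In this sketch the order of `sources` acts as the priority order, which matches how a comma-separated `--download-sources` value would naturally be read; whether the real resolver honors that order is an assumption.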
## Repository Layout

```text
musicdl/catalogsync/
  cli.py                # musicdl-catalogsync command entry
  db.py                 # SQLite schema + migration
  repository.py         # catalog data access
  services.py           # collect/sync orchestration
  downloader.py         # resolver + download + dedupe registration
  uploader.py           # object storage upload pipeline
  manual_playlists.py   # playlist file parsing
  runtime.py            # runtime path/layout helpers
  ops/
    web.py              # FastAPI pages + APIs
    runner.py           # background job runner
    executors.py        # stage executors
    repository.py       # ops job persistence

scripts/catalogsync/
  bootstrap_to_linux.ps1
  deploy_to_nas.ps1
  deploy_to_nas.py
  templates/
```

## Install

```bash
pip install -r requirements.txt
pip install -e .
```

CLI entry points:

- `musicdl-catalogsync`
- `python -m musicdl.catalogsync.cli`

## Quick Start (Local)

The examples use Windows paths. In a POSIX shell, substitute POSIX paths, because unquoted backslashes are treated as escape characters there.

1) Initialize the database:

```bash
musicdl-catalogsync init-db --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary
```

2) Run the full pipeline (collect -> sync -> download):

```bash
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --sources netease,qq,kuwo --download-sources qq,kuwo,migu,qianqian,kugou,netease --limit 20 --workers 10
```

3) Run from a playlist file (skips collect):

```bash
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --playlist-file D:\catalogsync\playlists.txt --download-sources qq,kuwo,migu,qianqian,kugou,netease --workers 10
```

4) Backfill or refresh lyrics for songs already downloaded locally:

```bash
musicdl-catalogsync lyrics --db D:\catalogsync\catalogsync.db --sources netease,qq --limit 200
musicdl-catalogsync lyrics --db D:\catalogsync\catalogsync.db --playlist-ids 12,15 --overwrite-lyrics
```

## Command Reference

- `init-db`: initialize the schema and runtime defaults.
- `collect`: collect playlist pools from the configured platforms.
- `sync`: sync songs/artists for the selected playlists.
- `download`: download pending songs from the selected sources, with lyrics enabled by default.
- `run`: one-command pipeline (or playlist-file pipeline), also saving lyrics by default.
- `lyrics`: backfill or refresh `.lrc` files for songs that already have local audio files.
- `register-object-backend`: register an S3-compatible backend.
- `upload`: enqueue and process missing object uploads.
- `serve`: start the Ops Web Console.

## Lyrics Behavior

- `download` and `run` save sibling `.lrc` files by default after the audio download succeeds.
- Use `--no-lyrics` to skip lyric fetching for a run.
- Use `--overwrite-lyrics` to replace existing `.lrc` files instead of leaving them untouched.
- `lyrics` only processes songs that already have an active local file recorded in the catalog database.
- Lyric lookup is best-effort: a failed lookup does not fail the audio download.

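
As a rough illustration of the "sibling `.lrc`" and overwrite rules above, the sketch below writes lyrics next to an audio file and keeps an existing file unless overwriting is requested. The `save_lyrics` helper is hypothetical; it mirrors the described behavior of `--overwrite-lyrics` but is not the project's implementation.

```python
import tempfile
from pathlib import Path


def save_lyrics(audio_path: Path, lyrics_text: str, overwrite: bool = False) -> bool:
    """Write lyrics to a sibling .lrc file; return True if the file was written."""
    lrc_path = audio_path.with_suffix(".lrc")  # same folder and stem as the audio file
    if lrc_path.exists() and not overwrite:
        return False  # keep the existing lyric file untouched
    lrc_path.write_text(lyrics_text, encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    audio = Path(tmp) / "Artist - Title.mp3"
    audio.write_bytes(b"")  # placeholder for a downloaded audio file

    first = save_lyrics(audio, "[00:01.00]first\n")
    second = save_lyrics(audio, "[00:01.00]second\n")  # skipped: file already exists
    third = save_lyrics(audio, "[00:01.00]third\n", overwrite=True)
    final_text = audio.with_suffix(".lrc").read_text(encoding="utf-8")
```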
## Playlist File Format

Example `playlists.txt`:

```text
# comment lines start with #
https://music.163.com/#/playlist?id=17745989905
qq,https://y.qq.com/n/ryqq/playlist/7707261125
https://y.qq.com/n/ryqq/toplist/26
https://www.kuwo.cn/rankList?bangId=16
```

Rules:

- Empty lines and comments are ignored.
- `platform,url` and `url-only` are both supported.
- `url-only` lines infer platform automatically (`netease`, `qq`, `kuwo`).
- Duplicate playlist keys are deduplicated before import.

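
The rules above can be sketched as a small parser. This is an illustration, not the logic in `manual_playlists.py`: the host-to-platform table and the function names are assumptions, and the sketch deduplicates on `(platform, url)` for simplicity, whereas the real playlist key may be derived differently.

```python
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlparse

# Assumed mapping, based only on the hosts that appear in the example file.
HOST_TO_PLATFORM = {
    "music.163.com": "netease",
    "y.qq.com": "qq",
    "www.kuwo.cn": "kuwo",
}
KNOWN_PLATFORMS = set(HOST_TO_PLATFORM.values())


def infer_platform(url: str) -> Optional[str]:
    """Guess the platform from the URL host; None if the host is not recognized."""
    return HOST_TO_PLATFORM.get(urlparse(url).netloc.lower())


def parse_playlist_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return unique (platform, url) pairs in file order."""
    entries: List[Tuple[str, str]] = []
    seen = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip empty lines and comments
        prefix, sep, rest = line.partition(",")
        if sep and prefix.strip().lower() in KNOWN_PLATFORMS:
            platform, url = prefix.strip().lower(), rest.strip()  # "platform,url" form
        else:
            platform, url = infer_platform(line), line  # "url-only" form
        if platform is None:
            continue  # unrecognized host: skipped in this sketch
        key = (platform, url)
        if key not in seen:
            seen.add(key)
            entries.append(key)
    return entries


sample = [
    "# comment lines start with #",
    "",
    "https://music.163.com/#/playlist?id=17745989905",
    "qq,https://y.qq.com/n/ryqq/playlist/7707261125",
    "https://y.qq.com/n/ryqq/playlist/7707261125",
    "https://www.kuwo.cn/rankList?bangId=16",
]
parsed = parse_playlist_lines(sample)
```

Here the third playlist line duplicates the second once its platform is inferred, so it is dropped.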
## Object Storage Upload

Register backend:

```bash
musicdl-catalogsync register-object-backend --db D:\catalogsync\catalogsync.db --backend main-s3 --bucket music-bucket --endpoint https://s3.example.com --region auto --base-prefix music --credential-env-prefix CATALOGSYNC_MAIN_S3
```

Upload:

```bash
musicdl-catalogsync upload --db D:\catalogsync\catalogsync.db --backend main-s3 --workers 4
```

Credentials are read from environment variables (not stored in SQLite):

```dotenv
CATALOGSYNC_MAIN_S3_ACCESS_KEY_ID=your-access-key
CATALOGSYNC_MAIN_S3_SECRET_ACCESS_KEY=your-secret-key
CATALOGSYNC_MAIN_S3_SESSION_TOKEN=
```

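
One way to picture how a `--credential-env-prefix` value maps to the variables above is the sketch below. The `load_credentials` helper and its return shape are hypothetical; only the three variable suffixes come from the example, and treating an empty session token as "not set" is an assumption.

```python
import os
from typing import Dict, Mapping, Optional


def load_credentials(
    prefix: str, env: Optional[Mapping[str, str]] = None
) -> Dict[str, Optional[str]]:
    """Read S3-style credentials from environment variables that share a prefix."""
    env = os.environ if env is None else env
    access_key = env.get(f"{prefix}_ACCESS_KEY_ID", "")
    secret_key = env.get(f"{prefix}_SECRET_ACCESS_KEY", "")
    if not access_key or not secret_key:
        raise KeyError(f"missing {prefix}_ACCESS_KEY_ID or {prefix}_SECRET_ACCESS_KEY")
    # Assumption: an empty session token means no token is configured.
    session_token = env.get(f"{prefix}_SESSION_TOKEN") or None
    return {
        "access_key_id": access_key,
        "secret_access_key": secret_key,
        "session_token": session_token,
    }


# A plain dict stands in for the process environment in this example.
fake_env = {
    "CATALOGSYNC_MAIN_S3_ACCESS_KEY_ID": "your-access-key",
    "CATALOGSYNC_MAIN_S3_SECRET_ACCESS_KEY": "your-secret-key",
    "CATALOGSYNC_MAIN_S3_SESSION_TOKEN": "",
}
creds = load_credentials("CATALOGSYNC_MAIN_S3", fake_env)
```

Keeping only the prefix in the database and the secrets in the environment means the SQLite file can be copied or backed up without exposing keys.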
## Ops Web Console

Start the console:

```bash
musicdl-catalogsync serve --db D:\catalogsync\catalogsync.db --env-file D:\catalogsync\catalogsync.env --host 127.0.0.1 --port 18080
```

The console manages job types such as:

- `catalog_sync`
- `collect_only`
- `sync_only`
- `sync_download`
- `download_only`
- `upload_only`
- `download_upload`

## NAS/Linux Deployment Scripts

Under `scripts/catalogsync/templates/`:

- `install_runtime.sh`: create a venv, install runtime dependencies, and do an editable install.
- `download_all.sh`: run the pipeline with env-configured defaults.
- `download_from_file.sh`: run playlist-file mode.
- `upload_all.sh`: upload pending assets to object storage.
- `serve_console.sh`: start the Ops console with lock/pid handling.
- `deploy_and_restart.sh`: deployment helper.

Example sequence on the target host:

```bash
bash /volume4/Music_Cloud/catalogsync/bin/install_runtime.sh
bash /volume4/Music_Cloud/catalogsync/bin/download_all.sh --sources netease,qq,kuwo --limit 20
bash /volume4/Music_Cloud/catalogsync/bin/upload_all.sh
bash /volume4/Music_Cloud/catalogsync/bin/serve_console.sh
```

Default environment template:

- `scripts/catalogsync/templates/catalogsync.env.example`

## Data Notes

- The downloader's default temporary working directory is `musicdl_outputs/catalogsync`.
- This directory contains runtime artifacts and should not be committed.
- Library output paths are driven by `--library-root` (CLI) or `LIBRARY_DIR` (env).

## More Details

- Main design/ops notes: `docs/catalogsync.md`
- Catalog-sync tests: `tests/catalogsync/`

## Compliance Note

Use this project only in environments and jurisdictions where your data source access and download behavior are authorized.
You are responsible for complying with applicable terms, licenses, and copyright requirements.