# catalog-sync

`catalog-sync` is a standalone catalog collection and download pipeline built in this repository.

It focuses on "playlist discovery -> sync to database -> multi-source download -> optional object storage upload", with both CLI mode and Ops Web Console mode.

This README is intentionally rewritten for `catalog-sync` only, instead of the generic `musicdl` project usage.

## What It Does

- Collect playlist pools from `netease`, `qq`, `kuwo` (playlist square + toplist).
- Sync playlists/songs/artists into SQLite (deduplicated by platform + remote id).
- Download songs with resolver fallback across multiple download sources.
- Save sibling `.lrc` lyric files after successful downloads by default, with optional overwrite and bulk backfill support.
- Track local/remote file locations and upload missing files to object storage.
- Run full job orchestration with an Ops console (FastAPI + background runner).

## Supported Sources

- Playlist collection sources:
  - `netease`
  - `qq`
  - `kuwo`
- Download resolver sources:
  - `qq`
  - `kuwo`
  - `migu`
  - `qianqian`
  - `kugou`
  - `netease`

## Repository Layout

```text
musicdl/catalogsync/
  cli.py                 # musicdl-catalogsync command entry
  db.py                  # SQLite schema + migration
  repository.py          # catalog data access
  services.py            # collect/sync orchestration
  downloader.py          # resolver + download + dedupe registration
  uploader.py            # object storage upload pipeline
  manual_playlists.py    # playlist file parsing
  runtime.py             # runtime path/layout helpers
  ops/
    web.py               # FastAPI pages + APIs
    runner.py            # background job runner
    executors.py         # stage executors
    repository.py        # ops job persistence
scripts/catalogsync/
  bootstrap_to_linux.ps1
  deploy_to_nas.ps1
  deploy_to_nas.py
  templates/
```

## Install

```bash
pip install -r requirements.txt
pip install -e .
```

CLI entry points:

- `musicdl-catalogsync`
- `python -m musicdl.catalogsync.cli`

## Quick Start (Local)

1) Initialize database:

```bash
musicdl-catalogsync init-db --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary
```

2) Run full pipeline (collect -> sync -> download):

```bash
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --sources netease,qq,kuwo --download-sources qq,kuwo,migu,qianqian,kugou,netease --limit 20 --workers 10
```

3) Run by playlist file (skip collect):

```bash
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --playlist-file D:\catalogsync\playlists.txt --download-sources qq,kuwo,migu,qianqian,kugou,netease --workers 10
```

4) Backfill or refresh lyrics for songs already downloaded locally:

```bash
musicdl-catalogsync lyrics --db D:\catalogsync\catalogsync.db --sources netease,qq --limit 200
musicdl-catalogsync lyrics --db D:\catalogsync\catalogsync.db --playlist-ids 12,15 --overwrite-lyrics
```

## Command Reference

- `init-db`: initialize schema and runtime defaults.
- `collect`: collect playlist pools from configured platforms.
- `sync`: sync songs/artists for selected playlists.
- `download`: download pending songs from selected sources, with lyrics enabled by default.
- `run`: one-command pipeline (or playlist-file pipeline), also saving lyrics by default.
- `lyrics`: backfill or refresh `.lrc` files for songs that already have local audio files.
- `register-object-backend`: register S3-compatible backend.
- `upload`: enqueue + process missing object uploads.
- `serve`: start Ops Web Console.

## Lyrics Behavior

- `download` and `run` save sibling `.lrc` files by default after audio download succeeds.
- Use `--no-lyrics` to skip lyric fetching for a run.
- Use `--overwrite-lyrics` to replace existing `.lrc` files instead of keeping them untouched.
- `lyrics` only processes songs that already have an active local file recorded in the catalog database.
- Lyric lookup failures do not fail the audio download; they are treated as best-effort.

## Playlist File Format

Example `playlists.txt`:

```text
# comment lines start with #
https://music.163.com/#/playlist?id=17745989905
qq,https://y.qq.com/n/ryqq/playlist/7707261125
https://y.qq.com/n/ryqq/toplist/26
https://www.kuwo.cn/rankList?bangId=16
```

Rules:

- Empty lines and comments are ignored.
- `platform,url` and `url-only` are both supported.
- `url-only` lines infer platform automatically (`netease`, `qq`, `kuwo`).
- Duplicate playlist keys are deduplicated before import.

## Object Storage Upload

Register backend:

```bash
musicdl-catalogsync register-object-backend --db D:\catalogsync\catalogsync.db --backend main-s3 --bucket music-bucket --endpoint https://s3.example.com --region auto --base-prefix music --credential-env-prefix CATALOGSYNC_MAIN_S3
```

Upload:

```bash
musicdl-catalogsync upload --db D:\catalogsync\catalogsync.db --backend main-s3 --workers 4
```

Credentials are read from environment variables (not stored in SQLite):

```dotenv
CATALOGSYNC_MAIN_S3_ACCESS_KEY_ID=your-access-key
CATALOGSYNC_MAIN_S3_SECRET_ACCESS_KEY=your-secret-key
CATALOGSYNC_MAIN_S3_SESSION_TOKEN=
```

## Ops Web Console

Start console:

```bash
musicdl-catalogsync serve --db D:\catalogsync\catalogsync.db --env-file D:\catalogsync\catalogsync.env --host 127.0.0.1 --port 18080
```

The console manages job types like:

- `catalog_sync`
- `collect_only`
- `sync_only`
- `sync_download`
- `download_only`
- `upload_only`
- `download_upload`

## NAS/Linux Deployment Scripts

Under `scripts/catalogsync/templates/`:

- `install_runtime.sh`: create venv + install runtime dependencies + editable install.
- `download_all.sh`: run pipeline with env-configured defaults.
- `download_from_file.sh`: run playlist-file mode.
- `upload_all.sh`: upload pending assets to object storage.
- `serve_console.sh`: start Ops console with lock/pid handling.
- `deploy_and_restart.sh`: deployment helper.

Example target host sequence:

```bash
bash /volume4/Music_Cloud/catalogsync/bin/install_runtime.sh
bash /volume4/Music_Cloud/catalogsync/bin/download_all.sh --sources netease,qq,kuwo --limit 20
bash /volume4/Music_Cloud/catalogsync/bin/upload_all.sh
bash /volume4/Music_Cloud/catalogsync/bin/serve_console.sh
```

Default environment template:

- `scripts/catalogsync/templates/catalogsync.env.example`

## Data Notes

- Default downloader temporary workdir is `musicdl_outputs/catalogsync`.
- This directory contains runtime artifacts and should not be committed.
- Library output paths are driven by `--library-root` (CLI) or `LIBRARY_DIR` (env).

## More Details

- Main design/ops notes: `docs/catalogsync.md`
- Catalog-sync tests: `tests/catalogsync/`

## Compliance Note

Use this project only in environments and jurisdictions where your data source access and download behavior are authorized. You are responsible for complying with applicable terms, licenses, and copyright requirements.
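
## Appendix: Playlist File Parsing Sketch

The playlist file rules listed under "Playlist File Format" (skip empty lines and comments, accept both `platform,url` and url-only lines, infer the platform for url-only lines, deduplicate before import) can be illustrated with a short Python sketch. This is not the code in `manual_playlists.py`. The names `HOST_PLATFORMS`, `infer_platform`, and `parse_playlist_lines` are hypothetical, the host-to-platform table is an assumption based only on the example URLs in this README, and using `(platform, url)` as the deduplication key is a simplification of whatever playlist key the real parser uses.

```python
from urllib.parse import urlparse

# Assumed mapping, derived only from the example URLs in this README.
HOST_PLATFORMS = {
    "music.163.com": "netease",
    "y.qq.com": "qq",
    "www.kuwo.cn": "kuwo",
}
KNOWN_PLATFORMS = set(HOST_PLATFORMS.values())


def infer_platform(url):
    """Return the platform for a URL based on its host, or None if unknown."""
    host = (urlparse(url).hostname or "").lower()
    return HOST_PLATFORMS.get(host)


def parse_playlist_lines(lines):
    """Parse playlist-file lines into a deduplicated list of (platform, url)."""
    seen = set()
    entries = []
    for raw in lines:
        line = raw.strip()
        # Rule: empty lines and comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        prefix, sep, rest = line.partition(",")
        if sep and prefix.strip().lower() in KNOWN_PLATFORMS:
            # Rule: explicit "platform,url" form.
            platform, url = prefix.strip().lower(), rest.strip()
        else:
            # Rule: url-only form, platform inferred from the host.
            url = line
            platform = infer_platform(url)
            if platform is None:
                raise ValueError(f"cannot infer platform for: {url}")
        # Rule: duplicates are dropped (assumed key: platform + url).
        key = (platform, url)
        if key in seen:
            continue
        seen.add(key)
        entries.append(key)
    return entries


if __name__ == "__main__":
    sample = [
        "# comment lines start with #",
        "https://music.163.com/#/playlist?id=17745989905",
        "qq,https://y.qq.com/n/ryqq/playlist/7707261125",
        "https://www.kuwo.cn/rankList?bangId=16",
    ]
    for platform, url in parse_playlist_lines(sample):
        print(platform, url)
```

Checking the prefix against the known platform names before splitting on the comma keeps a url-only line intact even if the URL itself happens to contain a comma.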