catalog-sync
catalog-sync is a standalone catalog collection and download pipeline built in this repository.
It focuses on "playlist discovery -> sync to database -> multi-source download -> optional object storage upload",
with both CLI mode and Ops Web Console mode.
This README covers catalog-sync only; it does not document generic musicdl project usage.
What It Does
- Collect playlist pools from netease, qq, kuwo (playlist square + toplist).
- Sync playlists/songs/artists into SQLite (deduplicated by platform + remote id).
- Download songs with resolver fallback across multiple download sources.
- Save sibling .lrc lyric files after successful downloads by default, with optional overwrite and bulk backfill support.
- Track local/remote file locations and upload missing files to object storage.
- Run full job orchestration with an Ops console (FastAPI + background runner).
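The "deduplicated by platform + remote id" rule can be pictured as a uniqueness constraint plus an upsert. The following is a minimal sketch only: the `songs` table, its columns, and the `upsert_song` helper are hypothetical names chosen for illustration, and the actual schema in musicdl/catalogsync/db.py may look different.

```python
import sqlite3

# Hypothetical table for illustration; the real schema lives in db.py and may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS songs (
    id INTEGER PRIMARY KEY,
    platform TEXT NOT NULL,
    remote_id TEXT NOT NULL,
    title TEXT NOT NULL,
    UNIQUE (platform, remote_id)
)
"""


def upsert_song(conn, platform, remote_id, title):
    """Insert a song, or refresh its title when (platform, remote_id) already exists."""
    # ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer.
    conn.execute(
        """
        INSERT INTO songs (platform, remote_id, title)
        VALUES (?, ?, ?)
        ON CONFLICT (platform, remote_id) DO UPDATE SET title = excluded.title
        """,
        (platform, remote_id, title),
    )


def demo_dedup():
    """Show that the same key is stored once, while other platforms get their own row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    upsert_song(conn, "netease", "1001", "Song A")
    upsert_song(conn, "netease", "1001", "Song A (remaster)")  # same key: updated in place
    upsert_song(conn, "qq", "1001", "Song A")  # same remote id, other platform: new row
    return conn.execute(
        "SELECT platform, remote_id, title FROM songs ORDER BY id"
    ).fetchall()
```

Keying on the pair rather than on the remote id alone matters because two platforms can reuse the same numeric id for unrelated songs.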
Supported Sources
- Playlist collection sources: netease, qq, kuwo
- Download resolver sources: qq, kuwo, migu, qianqian, kugou, netease
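Resolver fallback across download sources can be sketched as an ordered loop that stops at the first source returning a usable URL. This is an illustrative sketch, not the code in downloader.py: `resolve_with_fallback`, the `Resolver` type, and the idea that a resolver returns a URL string or None are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Assumed shape: a resolver maps a song query to a download URL, or None when it has no match.
Resolver = Callable[[str], Optional[str]]


def resolve_with_fallback(
    query: str,
    sources: Iterable[str],
    resolvers: Dict[str, Resolver],
) -> Optional[Tuple[str, str]]:
    """Try each source in the given order; return (source, url) for the first hit."""
    for source in sources:
        resolver = resolvers.get(source)
        if resolver is None:
            continue  # no resolver registered under this name: skip it
        try:
            url = resolver(query)
        except Exception:
            continue  # a failing source counts as a miss, so the next one is tried
        if url:
            return source, url
    return None  # every source missed or failed
```

In this sketch the order of `sources` is the priority order, which is how a list such as qq,kuwo,migu passed to --download-sources would plausibly be interpreted.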
Repository Layout
musicdl/catalogsync/
  cli.py               # musicdl-catalogsync command entry
  db.py                # SQLite schema + migration
  repository.py        # catalog data access
  services.py          # collect/sync orchestration
  downloader.py        # resolver + download + dedupe registration
  uploader.py          # object storage upload pipeline
  manual_playlists.py  # playlist file parsing
  runtime.py           # runtime path/layout helpers
  ops/
    web.py             # FastAPI pages + APIs
    runner.py          # background job runner
    executors.py       # stage executors
    repository.py      # ops job persistence
scripts/catalogsync/
  bootstrap_to_linux.ps1
  deploy_to_nas.ps1
  deploy_to_nas.py
  templates/
Install
pip install -r requirements.txt
pip install -e .
CLI entry points:
- musicdl-catalogsync
- python -m musicdl.catalogsync.cli
Quick Start (Local)
- Initialize database:
musicdl-catalogsync init-db --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary
- Run full pipeline (collect -> sync -> download):
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --sources netease,qq,kuwo --download-sources qq,kuwo,migu,qianqian,kugou,netease --limit 20 --workers 10
- Run by playlist file (skip collect):
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --playlist-file D:\catalogsync\playlists.txt --download-sources qq,kuwo,migu,qianqian,kugou,netease --workers 10
- Backfill or refresh lyrics for songs already downloaded locally:
musicdl-catalogsync lyrics --db D:\catalogsync\catalogsync.db --sources netease,qq --limit 200
musicdl-catalogsync lyrics --db D:\catalogsync\catalogsync.db --playlist-ids 12,15 --overwrite-lyrics
Command Reference
- init-db: initialize schema and runtime defaults.
- collect: collect playlist pools from configured platforms.
- sync: sync songs/artists for selected playlists.
- download: download pending songs from selected sources, with lyrics enabled by default.
- run: one-command pipeline (or playlist-file pipeline), also saving lyrics by default.
- lyrics: backfill or refresh .lrc files for songs that already have local audio files.
- register-object-backend: register S3-compatible backend.
- upload: enqueue + process missing object uploads.
- serve: start Ops Web Console.
Lyrics Behavior
- The download and run commands save sibling .lrc files by default after the audio download succeeds.
- Use --no-lyrics to skip lyric fetching for a run.
- Use --overwrite-lyrics to replace existing .lrc files instead of leaving them untouched.
- The lyrics command only processes songs that already have an active local file recorded in the catalog database.
- Lyric lookup failures do not fail the audio download; they are treated as best-effort.
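The rules above (sibling file, keep unless overwrite is requested, best-effort on failure) can be sketched as follows. `save_sibling_lrc` and its `fetch_lyrics` callback are hypothetical names for illustration, and the returned status strings are an assumption of this sketch rather than anything the project defines.

```python
from pathlib import Path
from typing import Callable, Optional


def save_sibling_lrc(
    audio_path: Path,
    fetch_lyrics: Callable[[], Optional[str]],
    overwrite: bool = False,
) -> str:
    """Write lyrics next to the audio file; return 'written', 'kept', or 'failed'."""
    # "Sibling" means same directory and stem, with the extension swapped to .lrc.
    lrc_path = audio_path.with_suffix(".lrc")
    if lrc_path.exists() and not overwrite:
        return "kept"  # existing lyrics stay untouched unless overwrite is requested
    try:
        text = fetch_lyrics()
    except Exception:
        return "failed"  # best-effort: a lookup error never propagates to the caller
    if not text:
        return "failed"  # no lyrics found is also a soft failure
    lrc_path.write_text(text, encoding="utf-8")
    return "written"
```

Returning a status instead of raising keeps the audio download result independent of the lyric lookup, which mirrors the best-effort rule.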
Playlist File Format
Example playlists.txt:
# comment lines start with #
https://music.163.com/#/playlist?id=17745989905
qq,https://y.qq.com/n/ryqq/playlist/7707261125
https://y.qq.com/n/ryqq/toplist/26
https://www.kuwo.cn/rankList?bangId=16
Rules:
- Empty lines and comments are ignored.
- Both platform,url and url-only line forms are supported.
- url-only lines infer the platform automatically (netease, qq, kuwo).
- Duplicate playlist keys are deduplicated before import.
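A parser following these rules might look like the sketch below. It is not the code in manual_playlists.py: the function names, the hostname hints used for inference, and the use of (platform, url) as the deduplication key are assumptions for illustration (the real playlist key is likely derived from the playlist id).

```python
from urllib.parse import urlparse

PLATFORMS = ("netease", "qq", "kuwo")
# Assumed hostname hints, taken from the example URLs in this README.
HOST_HINTS = {"163.com": "netease", "qq.com": "qq", "kuwo.cn": "kuwo"}


def infer_platform(url):
    """Guess the platform from the URL hostname, or return None if unknown."""
    host = urlparse(url).hostname or ""
    for suffix, platform in HOST_HINTS.items():
        if host == suffix or host.endswith("." + suffix):
            return platform
    return None


def parse_playlist_lines(lines):
    """Return deduplicated (platform, url) pairs in first-seen order."""
    seen = set()
    entries = []
    for raw in lines:
        line = raw.strip()
        # Only a leading '#' marks a comment; netease URLs contain '#' mid-line.
        if not line or line.startswith("#"):
            continue
        prefix, sep, rest = line.partition(",")
        if sep and prefix.strip().lower() in PLATFORMS:
            platform, url = prefix.strip().lower(), rest.strip()
        else:
            platform, url = infer_platform(line), line
        if platform is None:
            raise ValueError(f"cannot infer platform for: {line}")
        key = (platform, url)
        if key in seen:
            continue  # duplicate entry: keep only the first occurrence
        seen.add(key)
        entries.append(key)
    return entries
```

The explicit prefix is only honored when it names a known platform, so a URL that happens to contain a comma is still treated as a url-only line.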
Object Storage Upload
Register backend:
musicdl-catalogsync register-object-backend --db D:\catalogsync\catalogsync.db --backend main-s3 --bucket music-bucket --endpoint https://s3.example.com --region auto --base-prefix music --credential-env-prefix CATALOGSYNC_MAIN_S3
Upload:
musicdl-catalogsync upload --db D:\catalogsync\catalogsync.db --backend main-s3 --workers 4
Credentials are read from environment variables (not stored in SQLite):
CATALOGSYNC_MAIN_S3_ACCESS_KEY_ID=your-access-key
CATALOGSYNC_MAIN_S3_SECRET_ACCESS_KEY=your-secret-key
CATALOGSYNC_MAIN_S3_SESSION_TOKEN=
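To illustrate how a --credential-env-prefix value could map to these variables, here is a minimal sketch. `load_s3_credentials` is a hypothetical helper, and treating an empty session token as absent is an assumption; only the three variable suffixes come from the listing above.

```python
import os


def load_s3_credentials(prefix, environ=None):
    """Read access key, secret key, and optional session token for a given prefix."""
    env = os.environ if environ is None else environ
    access_key = env.get(f"{prefix}_ACCESS_KEY_ID", "")
    secret_key = env.get(f"{prefix}_SECRET_ACCESS_KEY", "")
    if not access_key or not secret_key:
        # Fail early with the prefix in the message so the missing variables are easy to find.
        raise KeyError(f"missing {prefix}_ACCESS_KEY_ID or {prefix}_SECRET_ACCESS_KEY")
    # Assumption: an empty or unset session token means "no token".
    session_token = env.get(f"{prefix}_SESSION_TOKEN") or None
    return {
        "access_key_id": access_key,
        "secret_access_key": secret_key,
        "session_token": session_token,
    }
```

Passing `environ` explicitly makes the helper easy to exercise without touching the real process environment.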
Ops Web Console
Start console:
musicdl-catalogsync serve --db D:\catalogsync\catalogsync.db --env-file D:\catalogsync\catalogsync.env --host 127.0.0.1 --port 18080
The console manages job types like:
- catalog_sync
- collect_only
- sync_only
- sync_download
- download_only
- upload_only
- download_upload
NAS/Linux Deployment Scripts
Under scripts/catalogsync/templates/:
- install_runtime.sh: create venv + install runtime dependencies + editable install.
- download_all.sh: run pipeline with env-configured defaults.
- download_from_file.sh: run playlist-file mode.
- upload_all.sh: upload pending assets to object storage.
- serve_console.sh: start Ops console with lock/pid handling.
- deploy_and_restart.sh: deployment helper.
Example target host sequence:
bash /volume4/Music_Cloud/catalogsync/bin/install_runtime.sh
bash /volume4/Music_Cloud/catalogsync/bin/download_all.sh --sources netease,qq,kuwo --limit 20
bash /volume4/Music_Cloud/catalogsync/bin/upload_all.sh
bash /volume4/Music_Cloud/catalogsync/bin/serve_console.sh
Default environment template:
scripts/catalogsync/templates/catalogsync.env.example
Data Notes
- Default downloader temporary workdir is musicdl_outputs/catalogsync.
- This directory contains runtime artifacts and should not be committed.
- Library output paths are driven by --library-root (CLI) or LIBRARY_DIR (env).
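One plausible way to combine the two library-path settings is sketched below. `resolve_library_root` is a hypothetical helper, and the precedence shown (the CLI flag wins over the environment variable) is an assumption of this sketch; the README does not state which takes priority.

```python
import argparse
import os
from pathlib import Path


def resolve_library_root(argv, environ=None):
    """Return the library root from --library-root, falling back to LIBRARY_DIR."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--library-root")
    # parse_known_args ignores the other pipeline flags this sketch does not model.
    args, _unknown = parser.parse_known_args(argv)
    # Assumed precedence: an explicit CLI value overrides the environment.
    value = args.library_root or env.get("LIBRARY_DIR")
    if not value:
        raise SystemExit("set --library-root or LIBRARY_DIR")
    return Path(value)
```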
More Details
- Main design/ops notes: docs/catalogsync.md
- Catalog-sync tests: tests/catalogsync/
Compliance Note
Use this project only in environments and jurisdictions where your data source access and download behavior are authorized. You are responsible for complying with applicable terms, licenses, and copyright requirements.