catalog-sync

catalog-sync is a standalone catalog collection and download pipeline that lives in this repository. It covers the flow "playlist discovery -> sync to database -> multi-source download -> optional object storage upload" and can be run from the CLI or through the Ops Web Console.

This README covers catalog-sync only; it does not describe general musicdl project usage.

What It Does

  • Collect playlist pools from netease, qq, kuwo (playlist square + toplist).
  • Sync playlists/songs/artists into SQLite (deduplicated by platform + remote id).
  • Download songs with resolver fallback across multiple download sources.
  • Save sibling .lrc lyric files after successful downloads by default, with optional overwrite and bulk backfill support.
  • Track local/remote file locations and upload missing files to object storage.
  • Run full job orchestration with an Ops console (FastAPI + background runner).
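
The deduplication rule above (one row per platform + remote id) can be sketched with a unique constraint and an insert that ignores conflicts. This is a minimal illustration, not the schema in db.py: the songs table, its columns, and the add_song helper are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema for illustration; the real tables are defined in db.py.
SCHEMA = """
CREATE TABLE IF NOT EXISTS songs (
    id        INTEGER PRIMARY KEY,
    platform  TEXT NOT NULL,
    remote_id TEXT NOT NULL,
    title     TEXT NOT NULL,
    UNIQUE (platform, remote_id)
)
"""

def add_song(conn, platform, remote_id, title):
    """Insert a song; return True if it was new, False if the key already existed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO songs (platform, remote_id, title) VALUES (?, ?, ?)",
        (platform, remote_id, title),
    )
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first = add_song(conn, "netease", "1001", "Song A")   # new row
repeat = add_song(conn, "netease", "1001", "Song A")  # same key: ignored
other = add_song(conn, "qq", "1001", "Song A")        # same remote id, other platform: new row
song_count = conn.execute("SELECT COUNT(*) FROM songs").fetchone()[0]
```

Keying on the pair rather than on the remote id alone matters because ids are only unique within one platform.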

Supported Sources

  • Playlist collection sources:
    • netease
    • qq
    • kuwo
  • Download resolver sources:
    • qq
    • kuwo
    • migu
    • qianqian
    • kugou
    • netease
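
Resolver fallback across the download sources listed above can be pictured as trying each source in order until one returns a usable result. The sketch below shows that general shape only; it is not the code in downloader.py, and resolve_with_fallback, the resolver signature, and the stub functions are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Assumed resolver shape: takes a query, returns a download URL or None.
Resolver = Callable[[str], Optional[str]]

def resolve_with_fallback(
    query: str,
    order: Iterable[str],
    resolvers: Dict[str, Resolver],
) -> Optional[Tuple[str, str]]:
    """Try each source in order; return (source, url) from the first that succeeds."""
    for source in order:
        resolver = resolvers.get(source)
        if resolver is None:
            continue  # unknown source name: skip it
        try:
            url = resolver(query)
        except Exception:
            continue  # one failing source should not abort the whole lookup
        if url:
            return source, url
    return None

# Stub resolvers standing in for real network lookups.
def qq_stub(query):
    raise TimeoutError("simulated outage")

def kuwo_stub(query):
    return None  # simulated "not found"

def migu_stub(query):
    return "https://example.invalid/" + query + ".mp3"

stubs = {"qq": qq_stub, "kuwo": kuwo_stub, "migu": migu_stub}
result = resolve_with_fallback("demo-song", ["qq", "kuwo", "migu"], stubs)
no_match = resolve_with_fallback("demo-song", ["qq", "kuwo"], stubs)
```

Here the order of the source list acts as the priority, which is how the --download-sources value is assumed to be interpreted.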

Repository Layout

musicdl/catalogsync/
  cli.py                 # musicdl-catalogsync command entry
  db.py                  # SQLite schema + migration
  repository.py          # catalog data access
  services.py            # collect/sync orchestration
  downloader.py          # resolver + download + dedupe registration
  uploader.py            # object storage upload pipeline
  manual_playlists.py    # playlist file parsing
  runtime.py             # runtime path/layout helpers
  ops/
    web.py               # FastAPI pages + APIs
    runner.py            # background job runner
    executors.py         # stage executors
    repository.py        # ops job persistence

scripts/catalogsync/
  bootstrap_to_linux.ps1
  deploy_to_nas.ps1
  deploy_to_nas.py
  templates/

Install

pip install -r requirements.txt
pip install -e .

CLI entry points:

  • musicdl-catalogsync
  • python -m musicdl.catalogsync.cli

Quick Start (Local)

  1. Initialize database:
musicdl-catalogsync init-db --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary
  2. Run full pipeline (collect -> sync -> download):
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --sources netease,qq,kuwo --download-sources qq,kuwo,migu,qianqian,kugou,netease --limit 20 --workers 10
  3. Run by playlist file (skip collect):
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --playlist-file D:\catalogsync\playlists.txt --download-sources qq,kuwo,migu,qianqian,kugou,netease --workers 10
  4. Backfill or refresh lyrics for songs already downloaded locally:
musicdl-catalogsync lyrics --db D:\catalogsync\catalogsync.db --sources netease,qq --limit 200
musicdl-catalogsync lyrics --db D:\catalogsync\catalogsync.db --playlist-ids 12,15 --overwrite-lyrics

Command Reference

  • init-db: initialize schema and runtime defaults.
  • collect: collect playlist pools from configured platforms.
  • sync: sync songs/artists for selected playlists.
  • download: download pending songs from selected sources, with lyrics enabled by default.
  • run: one-command pipeline (or playlist-file pipeline), also saving lyrics by default.
  • lyrics: backfill or refresh .lrc files for songs that already have local audio files.
  • register-object-backend: register S3-compatible backend.
  • upload: enqueue + process missing object uploads.
  • serve: start Ops Web Console.

Lyrics Behavior

  • download and run save sibling .lrc files by default after audio download succeeds.
  • Use --no-lyrics to skip lyric fetching for a run.
  • Use --overwrite-lyrics to replace existing .lrc files instead of keeping them untouched.
  • lyrics only processes songs that already have an active local file recorded in the catalog database.
  • Lyric lookup failures do not fail the audio download; they are treated as best-effort.
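
The lyric rules above (sibling .lrc file, keep existing files unless overwrite is requested, never fail on a missing lyric) can be sketched as follows. The save_lyrics helper and its signature are assumptions for illustration, not the project's actual function.

```python
import tempfile
from pathlib import Path
from typing import Optional

def save_lyrics(audio_path: Path, lyric_text: Optional[str], overwrite: bool = False) -> Optional[Path]:
    """Write lyric_text next to audio_path as a .lrc file; return the path if written."""
    if not lyric_text:
        return None  # lookup failed or returned nothing: best-effort, no error
    lrc_path = audio_path.with_suffix(".lrc")  # song.mp3 -> song.lrc
    if lrc_path.exists() and not overwrite:
        return None  # keep the existing lyric file untouched
    lrc_path.write_text(lyric_text, encoding="utf-8")
    return lrc_path

with tempfile.TemporaryDirectory() as tmp:
    audio = Path(tmp) / "song.mp3"
    audio.write_bytes(b"")  # placeholder for a downloaded audio file
    first = save_lyrics(audio, "[00:01.00]hello")
    second = save_lyrics(audio, "[00:01.00]changed")                  # skipped: file exists
    third = save_lyrics(audio, "[00:01.00]changed", overwrite=True)   # replaced
    missing = save_lyrics(audio, None)                                # lookup failure: no-op
    final_text = (Path(tmp) / "song.lrc").read_text(encoding="utf-8")
```

Returning None instead of raising mirrors the best-effort behavior: the caller can log the miss and still count the audio download as successful.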

Playlist File Format

Example playlists.txt:

# comment lines start with #
https://music.163.com/#/playlist?id=17745989905
qq,https://y.qq.com/n/ryqq/playlist/7707261125
https://y.qq.com/n/ryqq/toplist/26
https://www.kuwo.cn/rankList?bangId=16

Rules:

  • Empty lines and comments are ignored.
  • Both platform,url lines and url-only lines are supported.
  • url-only lines infer platform automatically (netease, qq, kuwo).
  • Duplicate playlist keys are deduplicated before import.
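
The rules above can be sketched as a small parser. This is an illustration rather than the code in manual_playlists.py: the host-to-platform table, the function names, and the choice of (platform, url) as the deduplication key are assumptions, and the real playlist key may be derived differently.

```python
from urllib.parse import urlparse

# Assumed host hints for platform inference.
HOST_HINTS = {"music.163.com": "netease", "y.qq.com": "qq", "kuwo.cn": "kuwo"}
KNOWN_PLATFORMS = set(HOST_HINTS.values())

def infer_platform(url):
    """Guess the platform from the URL host, or raise if the host is unknown."""
    host = urlparse(url).netloc.lower()
    for hint, platform in HOST_HINTS.items():
        if host == hint or host.endswith("." + hint):
            return platform
    raise ValueError("cannot infer platform from %r" % url)

def parse_playlist_lines(lines):
    """Return deduplicated (platform, url) pairs in first-seen order."""
    seen, result = set(), []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip empty lines and comments
        prefix, sep, rest = line.partition(",")
        if sep and prefix.strip().lower() in KNOWN_PLATFORMS:
            platform, url = prefix.strip().lower(), rest.strip()  # platform,url form
        else:
            platform, url = infer_platform(line), line            # url-only form
        key = (platform, url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result

sample = [
    "# comment lines start with #",
    "https://music.163.com/#/playlist?id=17745989905",
    "qq,https://y.qq.com/n/ryqq/playlist/7707261125",
    "",
    "https://y.qq.com/n/ryqq/playlist/7707261125",
    "https://www.kuwo.cn/rankList?bangId=16",
]
entries = parse_playlist_lines(sample)
```

The second qq line is dropped because it resolves to the same key as the explicit platform,url line before it.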

Object Storage Upload

Register backend:

musicdl-catalogsync register-object-backend --db D:\catalogsync\catalogsync.db --backend main-s3 --bucket music-bucket --endpoint https://s3.example.com --region auto --base-prefix music --credential-env-prefix CATALOGSYNC_MAIN_S3

Upload:

musicdl-catalogsync upload --db D:\catalogsync\catalogsync.db --backend main-s3 --workers 4

Credentials are read from environment variables (not stored in SQLite):

CATALOGSYNC_MAIN_S3_ACCESS_KEY_ID=your-access-key
CATALOGSYNC_MAIN_S3_SECRET_ACCESS_KEY=your-secret-key
CATALOGSYNC_MAIN_S3_SESSION_TOKEN=
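
Reading credentials by prefix, as described above, might look roughly like the sketch below. The load_credentials helper and the returned dictionary keys are hypothetical; only the three variable suffixes come from the example above, and treating an empty session token as absent is an assumption.

```python
import os
from typing import Dict, Mapping, Optional

def load_credentials(prefix: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Read <prefix>_ACCESS_KEY_ID, _SECRET_ACCESS_KEY and optional _SESSION_TOKEN."""
    env = os.environ if env is None else env
    access_key = env.get(prefix + "_ACCESS_KEY_ID")
    secret_key = env.get(prefix + "_SECRET_ACCESS_KEY")
    if not access_key or not secret_key:
        raise KeyError("missing %s_ACCESS_KEY_ID or %s_SECRET_ACCESS_KEY" % (prefix, prefix))
    return {
        "access_key_id": access_key,
        "secret_access_key": secret_key,
        # An empty value is treated the same as an unset variable.
        "session_token": env.get(prefix + "_SESSION_TOKEN") or None,
    }

# A plain dict stands in for os.environ so the example has no side effects.
fake_env = {
    "CATALOGSYNC_MAIN_S3_ACCESS_KEY_ID": "your-access-key",
    "CATALOGSYNC_MAIN_S3_SECRET_ACCESS_KEY": "your-secret-key",
    "CATALOGSYNC_MAIN_S3_SESSION_TOKEN": "",
}
creds = load_credentials("CATALOGSYNC_MAIN_S3", fake_env)
```

Because only the prefix is registered with the backend, secrets never need to be written to the SQLite file.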

Ops Web Console

Start console:

musicdl-catalogsync serve --db D:\catalogsync\catalogsync.db --env-file D:\catalogsync\catalogsync.env --host 127.0.0.1 --port 18080

The console manages job types such as:

  • catalog_sync
  • collect_only
  • sync_only
  • sync_download
  • download_only
  • upload_only
  • download_upload

NAS/Linux Deployment Scripts

Under scripts/catalogsync/templates/:

  • install_runtime.sh: create venv + install runtime dependencies + editable install.
  • download_all.sh: run pipeline with env-configured defaults.
  • download_from_file.sh: run playlist-file mode.
  • upload_all.sh: upload pending assets to object storage.
  • serve_console.sh: start Ops console with lock/pid handling.
  • deploy_and_restart.sh: deployment helper.

Example command sequence on the target host:

bash /volume4/Music_Cloud/catalogsync/bin/install_runtime.sh
bash /volume4/Music_Cloud/catalogsync/bin/download_all.sh --sources netease,qq,kuwo --limit 20
bash /volume4/Music_Cloud/catalogsync/bin/upload_all.sh
bash /volume4/Music_Cloud/catalogsync/bin/serve_console.sh

Default environment template:

  • scripts/catalogsync/templates/catalogsync.env.example

Data Notes

  • Default downloader temporary workdir is musicdl_outputs/catalogsync.
  • This directory contains runtime artifacts and should not be committed.
  • Library output paths are driven by --library-root (CLI) or LIBRARY_DIR (env).

More Details

  • Main design/ops notes: docs/catalogsync.md
  • Catalog-sync tests: tests/catalogsync/

Compliance Note

Use this project only in environments and jurisdictions where your data source access and download behavior are authorized. You are responsible for complying with applicable terms, licenses, and copyright requirements.