Initial import: Music_Server, MusicFree, catalog-sync

2026-05-23 16:51:14 +08:00
commit 069af30dba
847 changed files with 179878 additions and 0 deletions
@@ -0,0 +1,213 @@
# catalog-sync
`catalog-sync` is a standalone catalog collection and download pipeline that lives in this repository.
It covers the flow "playlist discovery -> sync to database -> multi-source download -> optional object storage upload"
and can run either from the CLI or through the Ops Web Console.
This README covers `catalog-sync` only; it does not describe general `musicdl` usage.
## What It Does
- Collect playlist pools from `netease`, `qq`, `kuwo` (playlist square + toplist).
- Sync playlists/songs/artists into SQLite (deduplicated by platform + remote id).
- Download songs with resolver fallback across multiple download sources.
- Save sibling `.lrc` lyric files after successful downloads by default, with optional overwrite and bulk backfill support.
- Track local/remote file locations and upload missing files to object storage.
- Run full job orchestration with an Ops console (FastAPI + background runner).
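The deduplication rule above (platform + remote id) could be enforced with a composite unique constraint and an upsert. This is a minimal sketch only: the `songs` table, its columns, and `upsert_song` are hypothetical and are not taken from `db.py`. The `ON CONFLICT` clause needs SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical schema for illustration; the real tables in db.py may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS songs (
    id INTEGER PRIMARY KEY,
    platform TEXT NOT NULL,
    remote_id TEXT NOT NULL,
    title TEXT NOT NULL,
    UNIQUE (platform, remote_id)
)
"""


def upsert_song(conn, platform, remote_id, title):
    """Insert a song, or refresh its title if (platform, remote_id) exists."""
    conn.execute(
        """
        INSERT INTO songs (platform, remote_id, title)
        VALUES (?, ?, ?)
        ON CONFLICT (platform, remote_id) DO UPDATE SET title = excluded.title
        """,
        (platform, remote_id, title),
    )
    row = conn.execute(
        "SELECT id FROM songs WHERE platform = ? AND remote_id = ?",
        (platform, remote_id),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first = upsert_song(conn, "netease", "1001", "Song A")
again = upsert_song(conn, "netease", "1001", "Song A (remaster)")
other = upsert_song(conn, "qq", "1001", "Song A")
total = conn.execute("SELECT COUNT(*) FROM songs").fetchone()[0]
```

The same remote id on two platforms yields two rows, while syncing the same platform entry twice reuses the existing row.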
## Supported Sources
- Playlist collection sources:
  - `netease`
  - `qq`
  - `kuwo`
- Download resolver sources:
  - `qq`
  - `kuwo`
  - `migu`
  - `qianqian`
  - `kugou`
  - `netease`
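Resolver fallback across download sources can be pictured as trying each source in the order given and stopping at the first hit. The sketch below is an illustration under assumptions, not the logic in `downloader.py`: `resolve_with_fallback`, the `Resolver` type, and the stub resolvers are all hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical resolver shape: takes a song title and returns a download URL,
# or None when the source has no match.
Resolver = Callable[[str], Optional[str]]


def resolve_with_fallback(
    title: str,
    sources: Sequence[str],
    resolvers: Dict[str, Resolver],
) -> Optional[Tuple[str, str]]:
    """Try each source in order; return (source, url) for the first hit."""
    for source in sources:
        resolver = resolvers.get(source)
        if resolver is None:
            continue  # unknown source name, skip it
        try:
            url = resolver(title)
        except Exception:
            continue  # one failing source should not stop the chain
        if url:
            return source, url
    return None


def qq_stub(title: str) -> Optional[str]:
    raise TimeoutError("simulated network failure")


def kuwo_stub(title: str) -> Optional[str]:
    return "https://example.invalid/kuwo/song.mp3"


stub_resolvers = {"qq": qq_stub, "kuwo": kuwo_stub}
hit = resolve_with_fallback("Song A", ["qq", "kuwo", "migu"], stub_resolvers)
miss = resolve_with_fallback("Song A", ["migu"], stub_resolvers)
```

Here `qq` fails, so the lookup falls through to `kuwo`; a list containing only unregistered sources returns `None`.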
## Repository Layout
```text
musicdl/catalogsync/
  cli.py               # musicdl-catalogsync command entry
  db.py                # SQLite schema + migration
  repository.py        # catalog data access
  services.py          # collect/sync orchestration
  downloader.py        # resolver + download + dedupe registration
  uploader.py          # object storage upload pipeline
  manual_playlists.py  # playlist file parsing
  runtime.py           # runtime path/layout helpers
  ops/
    web.py             # FastAPI pages + APIs
    runner.py          # background job runner
    executors.py       # stage executors
    repository.py      # ops job persistence
scripts/catalogsync/
  bootstrap_to_linux.ps1
  deploy_to_nas.ps1
  deploy_to_nas.py
  templates/
```
## Install
```bash
pip install -r requirements.txt
pip install -e .
```
CLI entry points:
- `musicdl-catalogsync`
- `python -m musicdl.catalogsync.cli`
## Quick Start (Local)
1) Initialize the database:
```bash
musicdl-catalogsync init-db --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary
```
2) Run the full pipeline (collect -> sync -> download):
```bash
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --sources netease,qq,kuwo --download-sources qq,kuwo,migu,qianqian,kugou,netease --limit 20 --workers 10
```
3) Run from a playlist file (skips collect):
```bash
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --playlist-file D:\catalogsync\playlists.txt --download-sources qq,kuwo,migu,qianqian,kugou,netease --workers 10
```
4) Backfill or refresh lyrics for songs already downloaded locally:
```bash
musicdl-catalogsync lyrics --db D:\catalogsync\catalogsync.db --sources netease,qq --limit 200
musicdl-catalogsync lyrics --db D:\catalogsync\catalogsync.db --playlist-ids 12,15 --overwrite-lyrics
```
## Command Reference
- `init-db`: initialize schema and runtime defaults.
- `collect`: collect playlist pools from configured platforms.
- `sync`: sync songs/artists for selected playlists.
- `download`: download pending songs from selected sources, with lyrics enabled by default.
- `run`: one-command pipeline (or playlist-file pipeline), also saving lyrics by default.
- `lyrics`: backfill or refresh `.lrc` files for songs that already have local audio files.
- `register-object-backend`: register an S3-compatible object storage backend.
- `upload`: enqueue and process uploads for files missing from object storage.
- `serve`: start the Ops Web Console.
## Lyrics Behavior
- `download` and `run` save sibling `.lrc` files by default after audio download succeeds.
- Use `--no-lyrics` to skip lyric fetching for a run.
- Use `--overwrite-lyrics` to replace existing `.lrc` files instead of keeping them untouched.
- `lyrics` only processes songs that already have an active local file recorded in the catalog database.
- Lyric lookup is best-effort: a failed lookup does not fail the audio download.
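The rules above can be sketched as a small helper that writes a sibling `.lrc` file next to the audio file. This is an illustration under assumptions: `save_lyrics`, its return values, and the `fetch_lyrics` callback are hypothetical names, not the project's API.

```python
import tempfile
from pathlib import Path
from typing import Callable


def save_lyrics(
    audio_path: Path,
    fetch_lyrics: Callable[[], str],
    overwrite: bool = False,
) -> str:
    """Write lyrics beside the audio file; return 'written', 'kept' or 'failed'."""
    lrc_path = audio_path.with_suffix(".lrc")  # song.flac -> song.lrc
    if lrc_path.exists() and not overwrite:
        return "kept"  # existing lyrics stay untouched unless overwrite is set
    try:
        text = fetch_lyrics()
    except Exception:
        return "failed"  # best-effort: never propagate lyric errors
    if not text:
        return "failed"
    lrc_path.write_text(text, encoding="utf-8")
    return "written"


def broken_fetch() -> str:
    raise ConnectionError("simulated lyric lookup failure")


with tempfile.TemporaryDirectory() as tmp:
    audio = Path(tmp) / "song.flac"
    audio.write_bytes(b"")
    first = save_lyrics(audio, lambda: "[00:00.00] la la")
    second = save_lyrics(audio, lambda: "[00:00.00] new text")
    third = save_lyrics(audio, lambda: "[00:00.00] new text", overwrite=True)
    final_text = audio.with_suffix(".lrc").read_text(encoding="utf-8")

    other_audio = Path(tmp) / "other.mp3"
    other_audio.write_bytes(b"")
    fourth = save_lyrics(other_audio, broken_fetch)
    other_has_lrc = other_audio.with_suffix(".lrc").exists()
```

The second call leaves the file alone, the third replaces it because `overwrite=True` mirrors `--overwrite-lyrics`, and the failing lookup is reported without raising.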
## Playlist File Format
Example `playlists.txt`:
```text
# comment lines start with #
https://music.163.com/#/playlist?id=17745989905
qq,https://y.qq.com/n/ryqq/playlist/7707261125
https://y.qq.com/n/ryqq/toplist/26
https://www.kuwo.cn/rankList?bangId=16
```
Rules:
- Empty lines and comments are ignored.
- `platform,url` and `url-only` are both supported.
- `url-only` lines infer platform automatically (`netease`, `qq`, `kuwo`).
- Duplicate playlist keys are deduplicated before import.
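A parser following these rules might look like the sketch below. It is not the code in `manual_playlists.py`: the host-to-platform table, the function names, and the use of `(platform, url)` as the deduplication key are assumptions made for illustration (the real playlist key may be derived differently).

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlparse

# Assumed host mapping, based only on the example URLs in this README.
HOST_TO_PLATFORM = {
    "music.163.com": "netease",
    "y.qq.com": "qq",
    "www.kuwo.cn": "kuwo",
}
KNOWN_PLATFORMS = set(HOST_TO_PLATFORM.values())


def infer_platform(url: str) -> str:
    """Map a URL to a platform name by its host."""
    host = urlparse(url).netloc.lower()
    if host not in HOST_TO_PLATFORM:
        raise ValueError(f"cannot infer platform for {url!r}")
    return HOST_TO_PLATFORM[host]


def parse_playlist_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return unique (platform, url) pairs in first-seen order."""
    seen = set()
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        prefix, sep, rest = line.partition(",")
        if sep and prefix.strip().lower() in KNOWN_PLATFORMS:
            platform, url = prefix.strip().lower(), rest.strip()  # platform,url
        else:
            platform, url = infer_platform(line), line  # url-only
        key = (platform, url)
        if key in seen:
            continue  # drop duplicates before import
        seen.add(key)
        entries.append(key)
    return entries


sample = [
    "# comment lines start with #",
    "https://music.163.com/#/playlist?id=17745989905",
    "",
    "qq,https://y.qq.com/n/ryqq/playlist/7707261125",
    "https://y.qq.com/n/ryqq/playlist/7707261125",
    "https://www.kuwo.cn/rankList?bangId=16",
]
parsed = parse_playlist_lines(sample)
```

In the sample, the explicit `qq,` line and the URL-only line for the same playlist collapse into one entry.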
## Object Storage Upload
Register backend:
```bash
musicdl-catalogsync register-object-backend --db D:\catalogsync\catalogsync.db --backend main-s3 --bucket music-bucket --endpoint https://s3.example.com --region auto --base-prefix music --credential-env-prefix CATALOGSYNC_MAIN_S3
```
Upload:
```bash
musicdl-catalogsync upload --db D:\catalogsync\catalogsync.db --backend main-s3 --workers 4
```
Credentials are read from environment variables (not stored in SQLite):
```dotenv
CATALOGSYNC_MAIN_S3_ACCESS_KEY_ID=your-access-key
CATALOGSYNC_MAIN_S3_SECRET_ACCESS_KEY=your-secret-key
CATALOGSYNC_MAIN_S3_SESSION_TOKEN=
```
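Given the variable names above, resolving credentials from a `--credential-env-prefix` value could work roughly as follows. `load_credentials` and the returned dictionary keys are hypothetical; treating an empty session token as absent is also an assumption.

```python
import os
from typing import Mapping, Optional


def load_credentials(prefix: str, environ: Optional[Mapping[str, str]] = None) -> dict:
    """Read <PREFIX>_ACCESS_KEY_ID / _SECRET_ACCESS_KEY / _SESSION_TOKEN."""
    env = os.environ if environ is None else environ
    access_key = env.get(f"{prefix}_ACCESS_KEY_ID", "").strip()
    secret_key = env.get(f"{prefix}_SECRET_ACCESS_KEY", "").strip()
    # Assumption: an empty session token means "no token".
    session_token = env.get(f"{prefix}_SESSION_TOKEN", "").strip() or None
    if not access_key or not secret_key:
        raise RuntimeError(f"missing credentials for prefix {prefix!r}")
    return {
        "access_key_id": access_key,
        "secret_access_key": secret_key,
        "session_token": session_token,
    }


fake_env = {
    "CATALOGSYNC_MAIN_S3_ACCESS_KEY_ID": "your-access-key",
    "CATALOGSYNC_MAIN_S3_SECRET_ACCESS_KEY": "your-secret-key",
    "CATALOGSYNC_MAIN_S3_SESSION_TOKEN": "",
}
creds = load_credentials("CATALOGSYNC_MAIN_S3", fake_env)
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to test without touching the real environment.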
## Ops Web Console
Start console:
```bash
musicdl-catalogsync serve --db D:\catalogsync\catalogsync.db --env-file D:\catalogsync\catalogsync.env --host 127.0.0.1 --port 18080
```
The console manages job types such as:
- `catalog_sync`
- `collect_only`
- `sync_only`
- `sync_download`
- `download_only`
- `upload_only`
- `download_upload`
## NAS/Linux Deployment Scripts
Under `scripts/catalogsync/templates/`:
- `install_runtime.sh`: create venv + install runtime dependencies + editable install.
- `download_all.sh`: run pipeline with env-configured defaults.
- `download_from_file.sh`: run playlist-file mode.
- `upload_all.sh`: upload pending assets to object storage.
- `serve_console.sh`: start Ops console with lock/pid handling.
- `deploy_and_restart.sh`: deployment helper.
Example target host sequence:
```bash
bash /volume4/Music_Cloud/catalogsync/bin/install_runtime.sh
bash /volume4/Music_Cloud/catalogsync/bin/download_all.sh --sources netease,qq,kuwo --limit 20
bash /volume4/Music_Cloud/catalogsync/bin/upload_all.sh
bash /volume4/Music_Cloud/catalogsync/bin/serve_console.sh
```
Default environment template:
- `scripts/catalogsync/templates/catalogsync.env.example`
## Data Notes
- The downloader's default temporary working directory is `musicdl_outputs/catalogsync`.
- This directory contains runtime artifacts and should not be committed.
- Library output paths are driven by `--library-root` (CLI) or `LIBRARY_DIR` (env).
## More Details
- Main design/ops notes: `docs/catalogsync.md`
- Catalog-sync tests: `tests/catalogsync/`
## Compliance Note
Use this project only in environments and jurisdictions where you are authorized to access the data sources and download from them.
You are responsible for complying with applicable terms, licenses, and copyright requirements.