Catalog Sync CLI
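catalogsync is a collection, sync, and download pipeline that runs independently of the GUI. Its goal is to extract the "Playlist Square" and "Toplist" sources from the "Discover" page and turn them into a command-line tool that can run unattended batch jobs.
Supported platforms fall into two tiers:
- Playlist collection sources: netease, qq, kuwo
- Download resolution sources: qq, kuwo, migu, qianqian, kugou, netease
Design priorities:
- Persist "playlist pool -> playlist -> song" to SQLite
- When syncing playlist songs, derive and update "artist pool -> artist -> song"
- Deduplicate downloads by song primary key and valid file location
- Keep a unified file-location abstraction for local disk, cloud drives, and object storage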
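Documentation guide
This file covers four kinds of information:
- Project purpose and run pipeline (collect -> sync -> download -> upload)
- Code architecture (CLI, collection and sync, download and upload, Ops Console)
- Database design (business entities, file mapping, job orchestration)
- Server deployment and operations (NAS/Linux directory conventions, scripts, logs, restarts)
If you are taking over the project for the first time, read in this order:
- First the code architecture and the database design overview
- Then the commands and the NAS / Linux deployment conventions
- Finally the Ops Console update notes at the end of the file
Code architecture
The system has four layers: command entry points, domain services, a repository layer, and a background job console. The core goal is to split collect/sync/download/upload into a pipeline that is composable, recoverable, and observable.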
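Directory layout and responsibility boundaries
musicdl/catalogsync/
    cli.py            # command entry point and argument parsing; assembles the Application
    runtime.py        # runtime path/port/directory conventions (env -> config)
    db.py             # SQLite schema, indexes, add-column migrations, connection settings
    models.py         # domain models and metadata extraction
    repository.py     # catalog-side reads and writes (playlists/songs/files/stats)
    services.py       # collection + sync orchestration (playlist -> songs -> artists)
    downloader.py     # download planning + multi-source candidate selection + writing to disk + dedupe on insert
    resolver.py       # cross-platform candidate search, scoring, fallback strategy
    uploader.py       # object storage backfill upload, upload queue consumption, presence refresh
    collectors/       # playlist source collectors (NetEase/QQ/Kuwo)
    ops/
        web.py          # FastAPI pages and APIs (dashboard/playlists/jobs)
        repository.py   # ops-side job repository (job/stage/item/worker)
        runner.py       # background scheduler (lanes, preemption, recovery, convergence)
        executors.py    # stage executors (collect/sync/download/upload)
        maintenance.py  # local duplicate file inspection and dedupe
        config.py       # environment config read/write-back/revision snapshots
        models.py       # Job/Stage/Item status enums and data structures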
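Boundary constraints:
- services.py only handles business orchestration; it does not do UI work or job scheduling directly
- repository.py handles SQL reads and writes; it is unaware of download/upload strategy
- ops/runner.py decides how jobs run; it does not define collection or download rules
- ops/executors.py decides how a single item executes and updates status via CAS (compare-and-swap)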
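Two main pipelines
- CLI direct-run pipeline (offline batch processing)
  - cli.py -> CatalogSyncApplication
  - collect/sync/download/run/upload call services/downloader/uploader directly
  - Suited to scripted batch jobs or one-off command execution
- Ops job pipeline (visual, with pause and resume)
  - ops/web.py accepts job creation (/api/jobs, /api/playlists/*)
  - ops/runner.py splits a job into stages by job_type and schedules them by polling
  - ops/executors.py executes item by item and writes back to the job_* tables
  - The frontend reads live status through the dashboard API + SSE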
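Key call sequence (using a "sync then download" job as the example)
- The web side creates a sync_download job and writes it to job_runs
- The runner creates job_stages: sync -> download
- The sync stage generates one job_items row per playlist and runs services.sync_playlist_row
- The download stage generates job_items rows for songs and runs downloader.download_song_row
- When a download succeeds, it writes file_assets + file_locations and refreshes the playlist status aggregate
- The runner rolls up stage/item counts and moves job_runs to completed/completed_with_errors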
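Job concurrency and recovery model
- Two-lane scheduling:
  - download lane: exclusive, with limited concurrency, to avoid disk and network contention
  - general lane: used for collect/sync/upload, allows higher concurrency
- Concurrency within a stage:
  - Controlled by the worker count (download defaults to 10, configurable)
  - Worker heartbeat, speed, and current item are written to job_workers
- Resume after interruption:
  - On startup the runner scans for recoverable jobs
  - Items that were running are marked interrupted
  - Recoverable items are re-queued, and the job status becomes paused or continues as running
- Command control:
  - pause/resume/cancel/retry are written to job_commands
  - The runner consumes commands centrally to avoid concurrent write conflicts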
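Extension points (look here when adding a platform or a storage backend)
- New playlist source: implement collectors/* and register it in services.py
- New download source: extend the candidate search and scoring strategy in resolver.py
- New storage backend: extend the backend adapter and locator semantics in uploader.py
- New job type: add the stage sequence and display name in ops/jobdefs.py
- New ops capability: add the API in ops/web.py and persist the state model in ops/repository.py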
Job status transition diagram (JobStatus)
The diagram below corresponds to JobStatus in ops/models.py:
stateDiagram-v2
[*] --> queued
queued --> running: runner claim
queued --> canceled: cancel
running --> pause_requested: pause command
pause_requested --> paused: all running items drained
paused --> running: resume command
running --> completed: all items success/skipped
running --> completed_with_errors: some items failed
running --> failed: unrecoverable error
running --> canceled: cancel
pause_requested --> canceled: cancel
completed --> [*]
completed_with_errors --> [*]
failed --> [*]
canceled --> [*]
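The diagram can also be read as a transition table. The following is a minimal sketch under that reading, not the contents of ops/models.py: the enum values mirror the state names in the diagram, while `ALLOWED` and `can_transition` are hypothetical names introduced here for illustration.

```python
from enum import Enum


class JobStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    PAUSE_REQUESTED = "pause_requested"
    PAUSED = "paused"
    COMPLETED = "completed"
    COMPLETED_WITH_ERRORS = "completed_with_errors"
    FAILED = "failed"
    CANCELED = "canceled"


# Edges copied from the state diagram; terminal states have no outgoing edges.
ALLOWED = {
    JobStatus.QUEUED: {JobStatus.RUNNING, JobStatus.CANCELED},
    JobStatus.RUNNING: {
        JobStatus.PAUSE_REQUESTED,
        JobStatus.COMPLETED,
        JobStatus.COMPLETED_WITH_ERRORS,
        JobStatus.FAILED,
        JobStatus.CANCELED,
    },
    JobStatus.PAUSE_REQUESTED: {JobStatus.PAUSED, JobStatus.CANCELED},
    JobStatus.PAUSED: {JobStatus.RUNNING},
}


def can_transition(current: JobStatus, target: JobStatus) -> bool:
    """Return True if the diagram has an edge current -> target."""
    return target in ALLOWED.get(current, set())
```

A guard like this would typically sit next to the status update so that an out-of-order command (for example resume on a completed job) is rejected instead of silently rewriting the row.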
Commands
Initialize the database:
musicdl-catalogsync init-db --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary
Collect the "Playlist Square" and "Toplist" sources:
musicdl-catalogsync collect --db D:\catalogsync\catalogsync.db --sources netease,qq,kuwo
Sync the playlists already in the database:
musicdl-catalogsync sync --db D:\catalogsync\catalogsync.db --sources netease,qq,kuwo --limit 20
Download songs that are pending download:
musicdl-catalogsync download --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --sources netease,qq,kuwo --download-sources qq,kuwo,migu,qianqian,kugou,netease --limit 20 --workers 10
Run the whole default pipeline in one go:
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --sources netease,qq,kuwo --download-sources qq,kuwo,migu,qianqian,kugou,netease --limit 20 --workers 10
Run directly from a playlist file:
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --playlist-file D:\catalogsync\playlists.txt --download-sources qq,kuwo,migu,qianqian,kugou,netease --workers 10
Register an object storage backend:
musicdl-catalogsync register-object-backend ^
--db D:\catalogsync\catalogsync.db ^
--backend main-s3 ^
--bucket music-bucket ^
--endpoint https://s3.example.com ^
--region auto ^
--base-prefix music ^
--credential-env-prefix CATALOGSYNC_MAIN_S3
Upload already-downloaded local files to object storage:
musicdl-catalogsync upload --db D:\catalogsync\catalogsync.db --backend main-s3 --workers 4
musicdl-catalogsync upload --db D:\catalogsync\catalogsync.db --backend main-s3 --sources netease,qq --limit 200
musicdl-catalogsync upload --db D:\catalogsync\catalogsync.db --backend main-s3 --playlist-ids 12,15 --workers 4
Start the ops web console (FastAPI + uvicorn):
musicdl-catalogsync serve --db D:\catalogsync\catalogsync.db --env-file D:\catalogsync\catalogsync.env --host 127.0.0.1 --port 18080
The CLI can also be invoked as a module:
python -m musicdl.catalogsync.cli --help
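--playlist-file behavior
When --playlist-file is passed, run takes a narrow branch:
- Skips collect
- Reads the playlist URLs from the file
- Parses and deduplicates them
- Writes them to the database as a manual_file pool
- Syncs only those playlists
- Downloads only the songs linked to those playlists
Without --playlist-file, run keeps the original default behavior: collect -> sync -> download.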
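--sources and --download-sources
--sources
- Controls which canonical platforms' songs are collected / synced / filtered
- Currently used mainly for the three playlist sources netease, qq, and kuwo
--download-sources
- Controls which platforms are searched again before download to resolve direct links
- The default is the same six platforms as the GUI: qq,kuwo,migu,qianqian,kugou,netease
What the download stage actually does:
- Takes the song name, singers, and original snapshot from the canonical song in the database
- Looks for downloadable candidates again within the --download-sources whitelist
- Ranks candidates by "match confidence -> audio quality / file size -> your configured source order"
- Downloads only after the best candidate has been chosen
This means:
- A song from a NetEase playlist is not necessarily downloaded from NetEase
- When the original platform's official direct link has expired or is unavailable, the other download sources are searched automatically for a candidate with the same title and singer
- As long as the match is trustworthy, the higher-quality candidate is preferred
Starting with this version, the sync stage also no longer requires the original platform to provide a downloadable direct link on the spot:
- As long as the playlist API still returns song metadata, sync writes the full song snapshot to the database
- These songs are stored as "deferred resolution" snapshots; a usable direct link is resolved at download time according to --download-sources
- This prevents songs from being dropped at ingest time because of copyright restrictions or temporarily expired direct links on NetEase / QQ / Kuwo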
File format
One entry per line. Three kinds of lines are supported (comment, platform,URL, and bare URL):
# comment line
https://music.163.com/#/playlist?id=17745989905
qq,https://y.qq.com/n/ryqq/playlist/7707261125
https://y.qq.com/n/ryqq/toplist/26
https://www.kuwo.cn/rankList?bangId=16
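Rules:
- Blank lines are ignored
- Lines starting with # are ignored
- The platform,URL form is supported
- A bare URL is also supported; the platform is then detected automatically
- Duplicate playlists within the same file are deduplicated automatically
- URL platforms currently detected automatically: netease, qq, kuwo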
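The rules above can be sketched as a small parser. This is a minimal sketch, not the project's actual parser: `parse_playlist_lines` and `HOST_TO_PLATFORM` are hypothetical names, platform detection by hostname is an assumption, and duplicates are detected here by the (platform, URL) pair rather than by remote playlist ID.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlparse

# Assumed hostname -> platform mapping, based on the example URLs in this file.
HOST_TO_PLATFORM = {
    "music.163.com": "netease",
    "y.qq.com": "qq",
    "www.kuwo.cn": "kuwo",
}
PLATFORMS = set(HOST_TO_PLATFORM.values())


def parse_playlist_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return deduplicated (platform, url) pairs in file order."""
    seen = set()
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        head, sep, rest = line.partition(",")
        if sep and head.strip().lower() in PLATFORMS:
            platform, url = head.strip().lower(), rest.strip()
        else:
            url = line
            platform = HOST_TO_PLATFORM.get(urlparse(url).hostname or "")
            if platform is None:
                raise ValueError(f"cannot detect platform for: {url}")
        if (platform, url) not in seen:
            seen.add((platform, url))
            entries.append((platform, url))
    return entries
```

Note that only a leading # marks a comment, so the # inside a NetEase URL such as https://music.163.com/#/playlist?id=... is left untouched.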
Supported URL types
- NetEase regular playlist: https://music.163.com/#/playlist?id=...
- QQ regular playlist: https://y.qq.com/n/ryqq/playlist/...
- QQ toplist: https://y.qq.com/n/ryqq/toplist/...
- Kuwo regular playlist: https://www.kuwo.cn/playlist_detail/...
- Kuwo toplist: https://www.kuwo.cn/rankList?bangId=...
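Database design overview
The database is SQLite, with the following connection policy:
- PRAGMA journal_mode=WAL
- PRAGMA busy_timeout=30000
- PRAGMA synchronous=NORMAL
- All tables are defined centrally in db.py, and add-column migrations run at initialization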
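As a rough illustration of that connection policy, the sketch below opens a connection with the standard-library sqlite3 module and applies the three PRAGMAs. The function name `open_connection` is hypothetical and the real db.py may differ.

```python
import sqlite3


def open_connection(db_path: str) -> sqlite3.Connection:
    """Open a SQLite connection using the PRAGMA values listed above."""
    conn = sqlite3.connect(db_path, timeout=30)
    conn.row_factory = sqlite3.Row
    # WAL lets readers proceed while one writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    # Wait up to 30 s for a lock instead of failing immediately.
    conn.execute("PRAGMA busy_timeout=30000")
    # NORMAL trades a little durability for fewer fsync calls under WAL.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn
```

journal_mode=WAL is persisted in the database file, while busy_timeout and synchronous apply per connection, so every code path that opens the database needs to go through the same helper.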
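Design goals:
- Strong deduplication: only one entity is kept per platform and remote ID
- Loose coupling: a song's logical asset is separate from its physical storage location
- Recoverable: the job state machine is persisted and supports resuming after a restart
- Observable: jobs, workers, logs, and events are all persisted to tables
Table domains (four domains)
- Catalog entity domain (Catalog Core)
  - playlist_pools: playlist source pools (square/toplist/manual_file)
  - playlists: playlist records (platform, remote ID, policy, play count)
  - songs: song records (platform, remote ID, name, singers, format, snapshot)
  - artists: artist records (normalized name + platform dimension)
- Relationship mapping domain (Association)
  - pool_playlists: many-to-many between pools and playlists
  - playlist_songs: many-to-many between playlists and songs (with position)
  - pool_artists: many-to-many between pools and artists
  - artist_songs: many-to-many between artists and songs
- File asset domain (Storage)
  - storage_backends: storage backend definitions (local_fs/object_storage/cloud_drive)
  - file_assets: logical file versions of a song (quality/format/size/checksum)
  - file_locations: physical locations (backend + locator + status + primary/secondary copy)
  - song_backend_presence: aggregated presence of a song on a backend (speeds up queries)
  - download_tasks / upload_tasks: download and upload queues
- Job orchestration domain (Ops)
  - job_runs: job overview (type, status, scope, config snapshot)
  - job_stages: per-stage counters (collect/sync/download/upload)
  - job_items: smallest unit of execution (playlist item/song item/file item)
  - job_workers: live worker status, throughput, speed
  - job_commands: pause/resume/cancel/retry command queue
  - job_events / job_logs: audit events and execution logs
  - config_revisions: environment config revision snapshots and rollback records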
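Deduplication and consistency constraints (core)
Unique keys (hard constraints):
- playlists(platform, remote_playlist_id)
- songs(platform, remote_song_id)
- file_locations(file_asset_id, backend_id, locator)
- upload_tasks(file_asset_id, target_backend_id, target_locator)
- job_items(job_stage_id, item_key)
Consistency rules (business layer):
- One song_id can map to multiple file_asset rows (different quality/format)
- One file_asset can have multiple file_location rows (local + cloud)
- song_backend_presence is derived from file_locations and is not a source of truth
- A playlist's "downloaded / not downloaded / partial" status is aggregated from playlist_songs + active local file_locations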
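To show how a unique key enforces "one row per platform and remote ID", here is a minimal sketch against an in-memory database. The column list is reduced to a few fields for illustration, `upsert_song` is a hypothetical helper, and the upsert syntax assumes SQLite 3.24 or newer; the real schema lives in db.py.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE songs (
        id INTEGER PRIMARY KEY,
        platform TEXT NOT NULL,
        remote_song_id TEXT NOT NULL,
        name TEXT,
        UNIQUE (platform, remote_song_id)
    )
    """
)


def upsert_song(conn, platform, remote_song_id, name):
    """Insert a song or update the existing row; return its primary key."""
    conn.execute(
        "INSERT INTO songs (platform, remote_song_id, name) VALUES (?, ?, ?) "
        "ON CONFLICT (platform, remote_song_id) DO UPDATE SET name = excluded.name",
        (platform, remote_song_id, name),
    )
    row = conn.execute(
        "SELECT id FROM songs WHERE platform = ? AND remote_song_id = ?",
        (platform, remote_song_id),
    ).fetchone()
    return row[0]
```

Syncing the same playlist twice then refreshes the existing row instead of creating a second one, which is what keeps playlist_songs and file_assets pointing at a stable song_id.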
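High-frequency read/write paths (troubleshooting focus)
- Collection stage
  - Writes: playlist_pools, playlists, pool_playlists
  - Typical problem: the pool contains playlists but playlists.collected_song_count was not backfilled
- Sync stage
  - Writes: songs, playlist_songs, artists, pool_artists, artist_songs
  - Typical problem: the playlist is synced but has 0 songs (distinguish "source returned empty" from "parse failure")
- Download stage
  - Writes: file_assets, file_locations, download_tasks
  - Reads: songs snapshot + download source candidates
  - Typical problem: duplicate files written to disk, (1)/(2) filename proliferation
- Upload stage
  - Writes: upload_tasks, file_locations, song_backend_presence
  - Typical problem: upload succeeded but presence was not refreshed, so the UI still shows "not uploaded"
- Job center
  - Writes: job_runs/stages/items/workers/commands/events/logs
  - Reads: dashboard summary, doing/done tree, worker speed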
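Migration and backward compatibility
- initialize_database() does the following on every startup:
  - Runs CREATE TABLE IF NOT EXISTS
  - Runs the necessary ALTER TABLE ADD COLUMN statements (for example play_count and the worker throughput fields)
- This lets an old database be upgraded in place without manually running SQL migration scripts
- Back up catalogsync.db before upgrading, especially before changing the dedupe strategy or running bulk maintenance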
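An add-column migration of this kind is commonly written as "check PRAGMA table_info, then ALTER TABLE only if the column is missing". The sketch below shows that pattern; `ensure_column` is a hypothetical helper name, and the table and column identifiers are trusted constants rather than user input.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, ddl: str) -> bool:
    """Add `column` to `table` if it is missing. Return True if it was added."""
    # Row index 1 of PRAGMA table_info is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
    return True
```

Because the check is idempotent, it can run on every startup: a fresh database and an old database both end up with the same set of columns.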
Core ER diagram (simplified)
erDiagram
PLAYLIST_POOLS ||--o{ POOL_PLAYLISTS : links
PLAYLISTS ||--o{ POOL_PLAYLISTS : belongs_to
PLAYLISTS ||--o{ PLAYLIST_SONGS : contains
SONGS ||--o{ PLAYLIST_SONGS : appears_in
ARTIST_POOLS ||--o{ POOL_ARTISTS : links
ARTISTS ||--o{ POOL_ARTISTS : belongs_to
ARTISTS ||--o{ ARTIST_SONGS : sings
SONGS ||--o{ ARTIST_SONGS : performed_by
SONGS ||--o{ FILE_ASSETS : has_versions
FILE_ASSETS ||--o{ FILE_LOCATIONS : stored_at
STORAGE_BACKENDS ||--o{ FILE_LOCATIONS : hosts
SONGS ||--o{ SONG_BACKEND_PRESENCE : has_presence
STORAGE_BACKENDS ||--o{ SONG_BACKEND_PRESENCE : summarized_on
JOB_RUNS ||--o{ JOB_STAGES : has
JOB_STAGES ||--o{ JOB_ITEMS : has
JOB_RUNS ||--o{ JOB_WORKERS : owns
JOB_RUNS ||--o{ JOB_COMMANDS : receives
JOB_RUNS ||--o{ JOB_EVENTS : emits
JOB_RUNS ||--o{ JOB_LOGS : writes
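Tables
Playlist pool -> playlist -> song
- playlist_pools
  - Platform source pools, such as playlist_square, toplist, manual_file
- playlists
  - Individual playlists or toplists
- pool_playlists
  - Mapping between playlist pools and playlists
- songs
  - Main song table; the unique key is (platform, remote_song_id)
- playlist_songs
  - Mapping between playlists and songs
The main song table stores these core fields:
- remote_song_id
- name
- singers
- ext
- file_size_bytes
- quality_label
- metadata_json
  - Contains the SongInfo snapshot, which can later be restored and handed back to the original downloader to continue the download
Derived artist pool + lazy-load completion
- artist_pools
  - Artist pools derived from playlist pools
- artists
  - Main artist table
- pool_artists
  - Mapping between artist pools and artists
- artist_songs
  - Mapping between artists and songs
Syncing playlist songs also updates the artist pool, which satisfies the requirement that updating a playlist pool also updates the artist pool.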
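Download dedupe and file mapping
Logical asset layer
- file_assets
  - Represents "one file version of one song"
  - The usual dimensions are song_id + quality_label + ext + file_size_bytes
  - ext / quality_label / file_size_bytes follow the source file that was actually downloaded and are not tied to the canonical platform
Physical location layer
- storage_backends
  - Describes a storage backend
  - local_fs is implemented, and S3-compatible object storage has a first-version upload path
  - Cloud drives are reserved as a later extension
- file_locations
  - Records where a given file asset currently lives
Think of it this way:
- file_assets answers "what file is this"
- file_locations answers "where is this file now"
If a song is first downloaded locally and later uploaded to a cloud drive or object storage, the same file_asset is reused; only the corresponding file_location needs to be added or updated.
Upload queue and backend presence
- song_backend_presence
  - Derived summary table that records whether a song already has an active file on a given backend
  - Commonly used to quickly check "has this song already been uploaded to main-s3"
- upload_tasks
  - Upload task queue table
  - One task = one local file_asset uploaded to one target backend/key
  - Statuses include pending, uploading, succeeded, failed, skipped
Two points to keep distinct:
- file_locations remains the source of truth
- song_backend_presence only exists for fast queries and does not replace file_locations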
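Behavior when disk space runs out
The downloader checks the free space of the target directory first.
If space is insufficient, it prompts for a new download directory (the prompt text means "Insufficient disk space, enter a new download directory to continue:"):
磁盘空间不足,请输入新的下载目录继续:
The new directory can be on a different drive. The program then:
- Downloads songs to the new directory
- Automatically creates or reuses a storage_backend for the new directory
- Writes the new file location back to file_locations
With --workers > 1, the prompt still appears only once globally. After a successful switch, download tasks that have not yet started all continue in the new directory.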
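One way to get a single prompt shared by several workers is to guard the current download root with a lock. The sketch below illustrates that idea with shutil.disk_usage; `DownloadRoot`, `acquire`, and the `ask_new_dir` callback are hypothetical names, and the actual downloader may coordinate workers differently.

```python
import shutil
import threading
from typing import Callable


class DownloadRoot:
    """Shared download directory that is switched at most once per shortage."""

    def __init__(self, path: str, ask_new_dir: Callable[[], str]):
        self._path = path
        self._ask_new_dir = ask_new_dir  # e.g. a function wrapping input()
        self._lock = threading.Lock()

    def acquire(self, required_bytes: int) -> str:
        """Return a directory for the next download, prompting if space is short."""
        with self._lock:
            # Only one worker at a time can reach the prompt; workers that
            # arrive afterwards see the already-switched directory.
            if shutil.disk_usage(self._path).free < required_bytes:
                self._path = self._ask_new_dir()
            return self._path
```

This sketch does not re-check the free space of the newly entered directory; a fuller version would loop until the check passes or the operator aborts.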
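Object storage upload
A first version of object storage upload is implemented; the backend semantics are treated as S3-compatible.
Key conventions
- After a local download finishes, a local file_location is written first
- After a successful upload, a remote file_location is added for the same file_asset
- The local file is kept, with the local file_location.is_primary = 1
- The remote object storage record has is_primary = 0
- Database state is trusted by default; no extra HEAD check is made against the remote object
- If a song has multiple active local file versions, all of them are queued for upload
key / locator rules
The object storage key mirrors the local relative path.
For example:
- Local locator: qq/Singer A/song-a.flac
- Backend base_prefix: music
- Remote locator: music/qq/Singer A/song-a.flac
Benefits:
- The directory structure matches the local one
- Later migration or rebuilding of mappings is simpler
- The same locator semantics are easier to reuse when uploading to a CDN / cloud drive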
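The mirroring rule can be expressed in a few lines. This is a sketch of the rule as described above, with a hypothetical function name; normalizing backslashes is an added assumption for locators produced on Windows.

```python
import posixpath


def remote_locator(base_prefix: str, local_locator: str) -> str:
    """Build the object key by prefixing the local relative path."""
    relative = local_locator.replace("\\", "/").lstrip("/")
    prefix = base_prefix.strip("/")
    return posixpath.join(prefix, relative) if prefix else relative
```

For the example above, remote_locator("music", "qq/Singer A/song-a.flac") gives music/qq/Singer A/song-a.flac.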
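Backend configuration and credential model
Non-sensitive configuration is stored in storage_backends.config_json, for example:
- endpoint
- region
- base_prefix
- addressing_style
- public_base_url
- credential_env_prefix
Sensitive credentials are never stored in the database; they come only from environment variables.
For example, with credential_env_prefix = CATALOGSYNC_MAIN_S3: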
CATALOGSYNC_MAIN_S3_ACCESS_KEY_ID=your-access-key
CATALOGSYNC_MAIN_S3_SECRET_ACCESS_KEY=your-secret-key
CATALOGSYNC_MAIN_S3_SESSION_TOKEN=optional-session-token
If public_base_url is configured, the derivable public_url is also written back to the remote file_location after a successful upload.
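The credential lookup by prefix can be sketched as follows. The three variable suffixes come from the example above; the function name, the returned dictionary keys, and the choice to raise when a required value is missing are assumptions for illustration.

```python
import os
from typing import Mapping, Optional


def load_credentials(prefix: str, environ: Optional[Mapping[str, str]] = None) -> dict:
    """Read S3-style credentials from <prefix>_* environment variables."""
    env = os.environ if environ is None else environ
    access_key = env.get(f"{prefix}_ACCESS_KEY_ID")
    secret_key = env.get(f"{prefix}_SECRET_ACCESS_KEY")
    if not access_key or not secret_key:
        raise RuntimeError(f"missing credentials for prefix {prefix}")
    return {
        "access_key_id": access_key,
        "secret_access_key": secret_key,
        # The session token is optional; an empty value is treated as unset.
        "session_token": env.get(f"{prefix}_SESSION_TOKEN") or None,
    }
```

Keeping only the prefix in config_json means the database can be copied or backed up without carrying secrets with it.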
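Default behavior of the upload command
By default, upload does three things:
- Finds local active files that are still missing on the target backend
- Deduplicates them and writes or reuses upload_tasks rows
- Runs the uploads with a bounded pool of workers and writes the results back to the database
The scope can be narrowed with:
- --sources
- --playlist-ids
- --limit
- --workers
Suggested defaults:
- Download: --workers 10
- Upload: --workers 4
What the database updates after an upload
- file_locations
  - Adds or updates the remote object location
- song_backend_presence
  - Refreshes the active summary for that song on the target backend
- upload_tasks
  - Records the queued, running, succeeded, or failed state of the task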
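Reserved cloud drive compatibility
Recommended conventions:
- Local files:
  - backend_type=local_fs
  - locator stores the relative path
- Object storage:
  - backend_type=object_storage
  - container_name stores the bucket
  - locator stores the key
- Cloud drive backends:
  - backend_type=cloud_drive
  - remote_file_id stores the platform file ID
  - locator stores the remote directory path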
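Current implementation notes
- The collection layer already covers the "Playlist Square" and "Toplist" sources of the GUI "Discover" page
- Toplist-specific parsing is supported for:
  - netease_toplist
  - qq_toplist
  - kuwo_toplist
- The download pipeline decouples "playlist source" from "download source"
- At download time, songs are searched again on the platforms given by --download-sources
- Candidate selection strategy:
  - High-confidence matches first
  - Among high-confidence candidates, prefer higher quality / larger files
  - When quality is similar, priority follows the order of --download-sources
- The default download sources are the same six platforms as the GUI: qq,kuwo,migu,qianqian,kugou,netease
- Object storage upload currently provides two commands: register-object-backend and upload
Running advice
- For a first batch run, start with a single platform, for example --sources netease
- Run sync and download with --limit first as a smoke test
- To run only a few specific playlists, prefer run --playlist-file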
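NAS / Linux deployment conventions
Directory responsibilities
- /volume4/Music_Cloud/library
  - Holds only final music files (download output)
- /volume4/Music_Cloud/catalogsync
  - Holds only the catalogsync application and runtime data (code, script copies, configuration, database, inputs, logs)
Recommended fixed structure:
/volume4/Music_Cloud/
    library/
    catalogsync/
        app/
        bin/
        config/
        data/
        inputs/
        logs/
Download layout
The default download layout is:
<LIBRARY_DIR>/<platform>/<first_artist>/<filename>
DOWNLOAD_LAYOUT=platform_first_artist corresponds to the directory structure above.
Here <platform> means the download source platform that actually served the file, not the playlist source platform.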
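The platform_first_artist layout can be sketched as a small path builder. This is an illustration only: the function name is hypothetical, and both the comma-separated format of the singers field and the "Unknown Artist" fallback are assumptions, since this document does not specify how singers are stored.

```python
from pathlib import PurePosixPath


def target_path(library_dir: str, platform: str, singers: str, filename: str) -> PurePosixPath:
    """Build <LIBRARY_DIR>/<platform>/<first_artist>/<filename>."""
    # Assumed: singers is a comma-separated string; take the first entry.
    first_artist = singers.split(",")[0].strip() if singers else ""
    if not first_artist:
        first_artist = "Unknown Artist"  # hypothetical fallback folder name
    return PurePosixPath(library_dir) / platform / first_artist / filename
```

Here platform should be the source that actually served the file, so the same song can land under kuwo/ even if it came from a NetEase playlist.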
Example of key catalogsync.env entries
ROOT_DIR=/volume4/Music_Cloud
APP_HOME=/volume4/Music_Cloud/catalogsync
LIBRARY_DIR=/volume4/Music_Cloud/library
DB_PATH=/volume4/Music_Cloud/catalogsync/data/catalogsync.db
INPUT_DIR=/volume4/Music_Cloud/catalogsync/inputs
LOG_DIR=/volume4/Music_Cloud/catalogsync/logs
ENV_FILE=/volume4/Music_Cloud/catalogsync/config/catalogsync.env
WEB_HOST=127.0.0.1
WEB_PORT=18080
PYTHON_BIN=python3
VENV_DIR=/volume4/Music_Cloud/catalogsync/app/.venv
DOWNLOAD_LAYOUT=platform_first_artist
DOWNLOAD_SOURCES=qq,kuwo,migu,qianqian,kugou,netease
CATALOG_EXPORT_COMMAND=bash /volume4/Music_Cloud/Music_Server/scripts/catalog-export.sh
CATALOG_EXPORT_WORKDIR=/volume4/Music_Cloud/Music_Server
OBJECT_BACKEND_NAME=main-s3
OBJECT_BUCKET=music-bucket
OBJECT_ENDPOINT=https://s3.example.com
OBJECT_REGION=auto
OBJECT_BASE_PREFIX=music
OBJECT_ADDRESSING_STYLE=
OBJECT_PUBLIC_BASE_URL=
OBJECT_CREDENTIAL_ENV_PREFIX=CATALOGSYNC_MAIN_S3
UPLOAD_WORKERS=4
UPLOAD_SOURCES=
UPLOAD_PLAYLIST_IDS=
UPLOAD_LIMIT=
CATALOGSYNC_MAIN_S3_ACCESS_KEY_ID=
CATALOGSYNC_MAIN_S3_SECRET_ACCESS_KEY=
CATALOGSYNC_MAIN_S3_SESSION_TOKEN=
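One-command deployment from Windows to NAS (recommended)
If you develop locally on Windows and deploy to a fixed NAS, use a single command:
.\deploy-catalogsync.ps1
The command chains the following steps:
- Uploads the local musicdl/catalogsync to the NAS staging directory
- Overwrites serve_console.sh and deploy_and_restart.sh on the NAS with the latest versions
- Runs the atomic deployment script on the NAS (backup -> sync -> stop old -> start new -> health check)
- If the health check or single-instance check fails, rolls back to the previous version automatically and returns a non-zero exit code
Optional parameter:
.\deploy-catalogsync.ps1 -SkipHealthCheck
Script locations:
- Repository shortcut: deploy-catalogsync.ps1
- NAS deployment trigger: scripts/catalogsync/deploy_to_nas.ps1
- NAS deployment executor: scripts/catalogsync/templates/deploy_and_restart.sh
NAS-side deployment script behavior (deploy_and_restart.sh)
Default target paths:
- Code target: /volume4/Music_Cloud/catalogsync/app/musicdl/catalogsync
- Staging: /volume4/Music_Cloud/catalogsync/deploy/staging/catalogsync
- Backups: /volume4/Music_Cloud/catalogsync/deploy/backups/catalogsync_YYYYMMDD_HHMMSS
Stability mechanisms:
- Deployment lock: /volume4/Music_Cloud/catalogsync/run/deploy.lock
- Service PID: /volume4/Music_Cloud/catalogsync/run/serve.pid
- Health check: defaults to http://127.0.0.1:${WEB_PORT}/dashboard
- Rollback on failure: restores the most recent backup automatically, then restarts and verifies
- Backup retention: keeps the 5 most recent versions by default (adjustable with --keep-backups)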
scripts/catalogsync/bootstrap_to_linux.ps1 usage
Run on the Windows side (it initializes the target machine's directories over ssh/scp and distributes the code and script templates):
powershell -ExecutionPolicy Bypass -File .\scripts\catalogsync\bootstrap_to_linux.ps1 `
-RemoteHost 192.168.1.10 `
-Port 22 `
-User xiaoming `
-RootDir /volume4/Music_Cloud
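After it finishes, copy catalogsync.env.example to catalogsync.env on the target machine and adjust it to that machine's actual paths.
Run install_runtime.sh on the target machine first
After the first deployment to the target machine, run this once: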
bash /volume4/Music_Cloud/catalogsync/bin/install_runtime.sh
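The script does the following automatically:
- Creates VENV_DIR using PYTHON_BIN
- Upgrades pip / setuptools / wheel
- Generates /volume4/Music_Cloud/catalogsync/app/requirements.nas.txt from /volume4/Music_Cloud/catalogsync/app/requirements.txt
- Filters out nodejs-wheel automatically
- Installs the dependencies needed by the current catalogsync download/upload pipeline
- Runs an editable install of /volume4/Music_Cloud/catalogsync/app so that python -m musicdl.catalogsync.cli ... can be run directly
Logs are written to: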
/volume4/Music_Cloud/catalogsync/logs/install_runtime_YYYYMMDD_HHMMSS.log
download_all.sh / download_from_file.sh usage on the target machine
Before running on the target machine, prepare:
cp /volume4/Music_Cloud/catalogsync/config/catalogsync.env.example \
/volume4/Music_Cloud/catalogsync/config/catalogsync.env
Full pipeline (equivalent to musicdl.catalogsync.cli run):
bash /volume4/Music_Cloud/catalogsync/bin/download_all.sh --sources netease,qq,kuwo --limit 20
Run from a playlist file (skips collect):
bash /volume4/Music_Cloud/catalogsync/bin/download_from_file.sh \
/volume4/Music_Cloud/catalogsync/inputs/playlists.txt
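This script corresponds to the run --playlist-file branch (which skips collect), so the example does not pass --sources.
Both download scripts read DOWNLOAD_SOURCES from catalogsync.env automatically and pass it to the CLI as --download-sources ....
Both download scripts prefer VENV_DIR/bin/python and fall back to PYTHON_BIN only if the virtual environment is not ready yet.
Post-download catalog export (recommended for NAS integration)
To have Music_Server's read-only database catalog_read.db refresh automatically after downloads, configure the following in catalogsync.env:
- CATALOG_EXPORT_COMMAND=bash /volume4/Music_Cloud/Music_Server/scripts/catalog-export.sh
- CATALOG_EXPORT_WORKDIR=/volume4/Music_Cloud/Music_Server
Behavior:
- Triggered once each time a download stage reaches a terminal state (only once per stage)
- When CATALOG_EXPORT_COMMAND is not configured, the export is marked skipped
- job_events records the following events:
  - catalog_export_started
  - catalog_export_skipped
  - catalog_export_succeeded
  - catalog_export_failed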
upload_all.sh usage on the target machine
The object storage upload script is located at:
/volume4/Music_Cloud/catalogsync/bin/upload_all.sh
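It first runs register-object-backend using the configuration in catalogsync.env and then runs upload, so after changing the bucket, endpoint, or CDN base URL there is no need to register the backend again by hand.
Simplest invocation: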
bash /volume4/Music_Cloud/catalogsync/bin/upload_all.sh
To upload only specific sources or specific playlists, append CLI arguments directly after the script:
bash /volume4/Music_Cloud/catalogsync/bin/upload_all.sh --sources netease,qq --limit 200
bash /volume4/Music_Cloud/catalogsync/bin/upload_all.sh --playlist-ids 12,15 --workers 6
This script also prefers VENV_DIR/bin/python and falls back to PYTHON_BIN only if the virtual environment does not exist.
The script depends on the following env entries:
- OBJECT_BACKEND_NAME
- OBJECT_BUCKET
- OBJECT_ENDPOINT
- OBJECT_REGION
- OBJECT_BASE_PREFIX
- OBJECT_ADDRESSING_STYLE
- OBJECT_PUBLIC_BASE_URL
- OBJECT_CREDENTIAL_ENV_PREFIX
- ${OBJECT_CREDENTIAL_ENV_PREFIX}_ACCESS_KEY_ID
- ${OBJECT_CREDENTIAL_ENV_PREFIX}_SECRET_ACCESS_KEY
- ${OBJECT_CREDENTIAL_ENV_PREFIX}_SESSION_TOKEN
- UPLOAD_WORKERS
- UPLOAD_SOURCES
- UPLOAD_PLAYLIST_IDS
- UPLOAD_LIMIT
Logs are written to:
/volume4/Music_Cloud/catalogsync/logs/upload_all_YYYYMMDD_HHMMSS.log
serve_console.sh usage on the target machine
The ops console script is located at:
/volume4/Music_Cloud/catalogsync/bin/serve_console.sh
Example:
bash /volume4/Music_Cloud/catalogsync/bin/serve_console.sh
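The script reads DB_PATH, ENV_FILE, WEB_HOST, and WEB_PORT from catalogsync.env automatically and passes them through to musicdl.catalogsync.cli serve.
Single-instance protection:
- Lock directory: /volume4/Music_Cloud/catalogsync/run/serve.lock
- PID file: /volume4/Music_Cloud/catalogsync/run/serve.pid
- If a live instance already exists, the script fails and exits immediately to avoid a duplicate start
Logs are written to: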
/volume4/Music_Cloud/catalogsync/logs/serve_console_YYYYMMDD_HHMMSS.log
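NAS dependency installation notes
The system Python on this NAS is Python 3.8, and it lacks the native build toolchain required by nodejs-wheel-binaries.
The current catalogsync download, object storage upload, and netease/qq/kuwo pipeline does not depend on nodejs-wheel, so use install_runtime.sh described above. It generates and installs the filtered requirements.nas.txt automatically, so there is no need to run grep by hand.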
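/playlists playlist pool management page (selective download)
/playlists now serves as the playlist pool management page, aimed at the ops flow "filter playlists -> select targets -> run a bulk action".
Supported filter parameters:
- platform
- pool_kind
- status
- keyword
- wanted_only
- page_size
The list supports selecting rows on the current page and provides select-all / clear for the whole page.
Four bulk actions are currently supported:
- Download selected already-synced playlists
- Sync then download selected playlists
- Add to the wanted list
- Remove from the wanted list
Playlist status semantics:
- Unsynced: the playlist has not finished syncing yet
- Not downloaded: synced, but songs are still pending download
- Downloading: a download task is in progress
- Partially downloaded: some songs are on disk, others are still outstanding
- Downloaded: every song in the playlist meets the "downloaded" criterion
"Downloaded" criterion: for a given song_id, the song counts as downloaded as long as an active local_fs file exists locally.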
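The status semantics above can be summarized as a small classification function. This is a sketch under stated assumptions: the function and its parameters are hypothetical, the returned strings reuse the state names that appear in the export API notes later in this file, and the precedence between "downloading" and the other states, as well as the handling of an empty playlist, are guesses rather than documented behavior.

```python
def playlist_status(synced: bool, total_songs: int,
                    downloaded_songs: int, running_downloads: int) -> str:
    """Classify one playlist from aggregate counts."""
    if not synced:
        return "unsynced"
    if running_downloads > 0:
        return "downloading"
    if total_songs > 0 and downloaded_songs >= total_songs:
        return "downloaded"
    if downloaded_songs > 0:
        return "partial"
    # Also covers a synced playlist with zero songs (assumption).
    return "not_downloaded"
```

Here downloaded_songs is meant to count songs that have at least one active local_fs file, matching the "downloaded" criterion.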
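Page actions still reuse the existing job system:
- Download selected already-synced playlists -> download_only
- Sync then download selected playlists -> sync_download
- The two job types differ in playlist_scope.playlist_ids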
Operations Console Update
As of 2026-04-16, the operations console behavior has changed in three important ways:
- musicdl-catalogsync serve now starts the web console together with an embedded ops runner.
- /dashboard now exposes a create-job form plus live job/download summary, active workers, and running items.
- /jobs/{id} now exposes a command form for pause, resume, cancel, retry_item, and force_retry_item, together with worker and running-item detail.
Current job type to stage mapping:
- catalog_sync: collect -> sync -> download
- collect_only: collect
- sync_only: sync
- sync_download: sync -> download
- download_only: download
- upload_only: upload
- download_upload: download -> upload
Collector behavior update:
- playlist square collection now paginates for netease and kuwo
- qq playlist-square failures are isolated so other sources continue
This means the console is no longer read-only: creating a job from the dashboard should enqueue work that the embedded runner can execute without starting a second process.
As of 2026-04-17, the deployed NAS console was verified again and the following operational fixes are also part of the live behavior:
- /dashboard now exposes Quick Launch, Active Job, Running Songs, and Playlist Coverage, and the Active Job / Recent Jobs blocks now provide direct pause / resume / cancel buttons, so the operator can both observe progress and control the current queue from one page.
- /jobs/{id} now exposes direct action buttons for pause, resume, cancel, retry_item, and force_retry_item instead of only relying on a generic command dropdown.
- Collect-stage workers now emit page-level progress text such as page N: +X, total Y, which makes it clear whether collection is advancing or stuck.
Collector and runtime hardening in this round:
- QQCollector playlist-square requests now send the required Referer and Origin headers, which restored non-zero QQ playlist-square collection on NAS.
- netease and kuwo playlist-square pagination now stops when the upstream explicitly reports has_more = false or when a page is entirely duplicate playlists, preventing long-running repeated-page loops.
- NAS runtime compatibility was extended for Python 3.8 by removing runtime-evaluated built-in generic aliases from the serve import path.
- SQLite connections now enable busy_timeout and journal_mode=WAL, which prevents the operations console from intermittently failing with database is locked while the embedded runner is writing progress.
Observed NAS verification snapshot after redeploying these fixes:
- GET http://192.168.5.43:18080/dashboard returned 200 OK with the new controls visible.
- Ten consecutive requests to /api/dashboard returned 200 OK while collect_only job 3 was running.
- Total playlists on NAS grew from the earlier 811 baseline to 1441 during live verification.
- QQ playlists on NAS grew from 25 to 629+ during the same verification window, confirming that QQ playlist-square collection was no longer stuck at zero.
2026-04-17 NAS Restart Note
During the 2026-04-17 restart verification on NAS, the web console and the embedded runner did not recover equally:
- the web process restarted and continued serving /dashboard, /jobs/{id}, and /api/dashboard
- a stale duplicate serve process had to be removed manually before the NAS converged back to a single web instance
- after duplicate cleanup, the embedded runner still failed to advance queued work even though manual OpsRepository / OpsRunner recovery calls succeeded against the same database
Operational workaround used on NAS:
- web console kept running as /volume4/Music_Cloud/catalogsync/app/.venv/bin/python -m musicdl.catalogsync.cli serve ...
- a separate emergency runner process was started to execute OpsRunner.run_forever() against the same SQLite database
- verification after the workaround showed job 5 resume correctly and downloaded_songs increase from 82 to 85
Temporary NAS-only emergency runner details:
- PID: 17516
- log: /volume4/Music_Cloud/catalogsync/logs/ops_runner_20260417_101958.log
Resolution on 2026-04-17 10:29:
- musicdl/catalogsync/ops/web.py now supervises the embedded runner thread and automatically restarts it after transient exceptions instead of letting the web process continue without background execution
- local regression coverage now includes an embedded-runner recovery test that forces one loop failure and verifies that queued work is still completed after automatic restart
- NAS was redeployed with this fix and the temporary emergency runner was removed
- after restart, NAS converged back to a single live serve process on port 18080
- the restarted web process recovered the interrupted download job back to paused, accepted a resume command, and then continued downloading without any standalone runner
- live verification on NAS showed downloaded_songs increase from 100 to 102 under the single embedded-runner setup
2026-04-17 Progress Visibility Update
- the playlists page now renders a Progress column with downloaded / total, a percentage bar, and the current running-song count
- the job detail page now renders a Playlist Progress table for playlist-scoped jobs
- job playlist progress is derived from playlist-song links, active local files, and download-stage job items of the current job
- songs that were already present locally before the job started still count as completed progress for that playlist
- empty boolean-like filters such as /playlists?wanted_only= and /api/playlists?wanted_only= are accepted and treated as false
2026-04-17 Non-Music Skip + Task Center Tree
- download stage now classifies QQ toplist fallback entries (remote_song_id starts with qqtop_ or metadata marks qq_toplist_fallback) as skipped instead of failed
- skipped toplist entries are annotated with 非音乐资源(有声榜条目), which means "non-music resource (audio chart entry)"
- new API: GET /api/jobs/{job_id}/playlists/{playlist_id}/songs returns per-song progress rows for one playlist inside one job
- dashboard Task Center removed the old Open jump link and keeps operations inline
- task detail now supports hierarchical expansion:
  - task -> playlist progress rows
  - playlist row -> lazy-loaded song progress rows
- song rows explicitly show the 非音乐资源 ("non-music resource") tag when matched
2026-04-17 Stable Task Tree Refresh
- dashboard Task Center no longer renders the embedded Summary / Stages / Workers / Running Items detail tables
- the dashboard now presents one stable tree:
  - task
  - playlist
  - song
- task lifecycle transitions such as paused, completed, completed_with_errors, and canceled keep the same task node visible in Task Center instead of making the row disappear immediately
- live refresh updates task nodes in place so expanded tasks and expanded playlists can remain open across refresh cycles
2026-04-18 Dashboard Maintenance: Local Duplicate Scan / Dedupe
- Dashboard now includes a Maintenance card for local duplicate inspection.
- Scan Duplicate Local Copies calls GET /api/maintenance/local-duplicates.
- Run Local Dedupe calls POST /api/maintenance/local-duplicates/dedupe.
- The scan groups active local duplicate rows by (file_asset_id, backend_id).
- Keep rule priority:
  - existing file wins
  - non-(1) / non-(2) canonical locator wins
  - shorter locator wins
  - smaller file_locations.id wins
- Dedupe execution updates references before inactivation:
  - repoint upload_tasks.source_location_id
  - repoint job_items.file_location_id
  - mark duplicate file_locations.status = 'inactive'
  - delete duplicate local files when they still exist on disk
  - refresh song_backend_presence
- Safety guard:
  - dedupe is rejected with 409 while any job_runs.status = 'running' or job_items.status = 'running'
  - this avoids colliding with active download / upload execution
- The dashboard renders results inline and does not jump away from the page.
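The keep-rule priority above maps naturally onto a tuple sort key. The following is a sketch of that idea, not the code in ops/maintenance.py: `LocationRow`, `keep_rank`, and `choose_keeper` are hypothetical names, and the regular expression is one assumed way to detect a trailing (1) / (2) style copy suffix in a file name.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches "(N)" directly before the file extension or at the end of the locator.
COPY_SUFFIX = re.compile(r"\(\d+\)(?=\.[^./]+$|$)")


@dataclass
class LocationRow:
    id: int
    locator: str
    exists_on_disk: bool


def keep_rank(row: LocationRow) -> tuple:
    """Lower tuples win: existing file, canonical name, shorter locator, smaller id."""
    return (
        not row.exists_on_disk,
        bool(COPY_SUFFIX.search(row.locator)),
        len(row.locator),
        row.id,
    )


def choose_keeper(rows: List[LocationRow]) -> LocationRow:
    """Pick the row to keep within one (file_asset_id, backend_id) group."""
    return min(rows, key=keep_rank)
```

Every other row in the group would then be repointed and marked inactive in the order listed above.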
2026-04-18 Playlist Export Pipeline Update
- playlists/ directory generation is no longer triggered by sync.
- CatalogSyncService.sync_playlist_row() now only handles playlist-song linking and play-count backfill.
- Playlist export artifacts are refreshed from the download side for scoped playlist jobs:
  - download_only
  - sync_download
- The runner refreshes export folders when an individual scoped playlist finishes downloading, instead of waiting for the whole download job to finish.
- On runner restart / recovery, scoped download stages also backfill export folders for playlists whose items were already completed before the restart.
- Stage-final export refresh is still kept as the last safety net, including the 0-pending-items case where all files already existed locally.
- Existing single-playlist export remains available:
  - GET /api/playlists/{playlist_id}/export-folder
  - it refreshes the folder from current database state only
  - it does not auto-download missing songs
- New bulk export API:
  - POST /api/playlists/export
  - routes selected playlists by current state
  - downloaded -> export immediately
  - unsynced -> create sync_download job
  - not_downloaded / partial / downloading -> create download_only job
- Playlists page adds Export Selected Playlists:
  - already-downloaded playlists can be exported without re-downloading songs
  - not-yet-synced or not-yet-downloaded playlists are queued into the appropriate job automatically
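The state-based routing of the bulk export API can be sketched as a plain grouping function. The state names and job types come from the list above; the function name, the input shape (playlist ID mapped to state), and the "export_now" bucket name are assumptions for illustration.

```python
from typing import Dict, List


def route_playlists(states: Dict[int, str]) -> Dict[str, List[int]]:
    """Group selected playlist IDs by the action their current state implies."""
    routes: Dict[str, List[int]] = {
        "export_now": [],
        "sync_download": [],
        "download_only": [],
    }
    for playlist_id, state in states.items():
        if state == "downloaded":
            routes["export_now"].append(playlist_id)
        elif state == "unsynced":
            routes["sync_download"].append(playlist_id)
        elif state in {"not_downloaded", "partial", "downloading"}:
            routes["download_only"].append(playlist_id)
        else:
            raise ValueError(f"unknown playlist state: {state}")
    return routes
```

A mixed selection therefore produces up to two jobs plus an immediate export, which matches the behavior described for the playlists page.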
2026-04-19 Local ZIP Export + Adaptive Download
- Playlists page no longer shows a standalone Sync Then Download button.
- Download Selected Playlists is now adaptive:
  - unsynced playlists are routed to sync_download
  - already-synced but incomplete playlists are routed to download_only
  - mixed selections may create both a download_job and a sync_download_job
  - already-downloaded playlists can be skipped without forcing a re-download
- Export semantics now mean browser download to the operator's local machine:
  - modal Export downloads GET /api/playlists/{playlist_id}/export.zip
  - list Export Selected calls POST /api/playlists/export-zip
  - when every selected playlist is ready, the API returns status=ready plus download_url
  - when any selected playlist is not ready, the API returns status=queued plus job details instead of a partial ZIP
- Prepared bundle downloads are served by:
  - GET /api/exports/bundles/{bundle_name}.zip
- GET /api/playlists/{playlist_id}/export-folder remains available as an internal server-side folder refresh / inspection endpoint, but it is no longer the user-facing export action.