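# Catalog Sync CLI

`catalogsync` is a collect, sync, and download pipeline that runs independently of the GUI. It takes the "Playlist Square" and "Charts" sources from the GUI's "Discover" page and turns them into a command-line tool that can run as automated batches.

Supported platforms fall into two tiers:

- Playlist collection sources:
  - `netease`
  - `qq`
  - `kuwo`
- Download resolution sources:
  - `qq`
  - `kuwo`
  - `migu`
  - `qianqian`
  - `kugou`
  - `netease`

Design priorities:

- Persist "playlist pool -> playlist -> song" in SQLite
- When syncing playlist songs, derive and update "artist pool -> artist -> song" at the same time
- Deduplicate downloads by song primary key and by effective file location
- Keep one file-location abstraction that covers local disk, cloud drives, and object storage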
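
## Document Guide

This file covers four kinds of information:

- Project purpose and run pipeline (`collect -> sync -> download -> upload`)
- Code architecture (CLI, collect/sync, download/upload, Ops Console)
- Database design (business entities, file mapping, job orchestration)
- Server deployment and operations (NAS/Linux directory conventions, scripts, logs, restarts)

If you are taking over the project for the first time, read in this order:

1. "Code Architecture" and "Database Design Overview"
2. "Commands" and "NAS / Linux Deployment Conventions"
3. The Operations Console update notes at the end of the file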
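
## Code Architecture

The system has four layers: command entry, domain services, repository layer, and a background job console. The core goal is to split "collect / sync / download / upload" into a pipeline that is composable, recoverable, and observable.

### Directory Layout and Responsibility Boundaries

```text
musicdl/catalogsync/
  cli.py            # command entry and argument parsing; assembles the Application
  runtime.py        # runtime paths / ports / directory conventions (env -> config)
  db.py             # SQLite schema, indexes, add-column migrations, connection settings
  models.py         # domain models and metadata extraction
  repository.py     # catalog-side data access (playlists / songs / files / stats)
  services.py       # collect + sync orchestration (playlist -> songs -> artists)
  downloader.py     # download planning + multi-source candidate selection + writing to disk + dedupe persistence
  resolver.py       # cross-platform candidate search, scoring, fallback strategy
  uploader.py       # object storage backfill, upload queue consumption, presence refresh
  collectors/       # playlist source collectors (NetEase / QQ / Kuwo)
  ops/
    web.py          # FastAPI pages and API (dashboard / playlists / jobs)
    repository.py   # ops-side job repository (job / stage / item / worker)
    runner.py       # background scheduler (lanes, preemption, recovery, convergence)
    executors.py    # stage executors (collect / sync / download / upload)
    maintenance.py  # local duplicate-file inspection and dedupe
    config.py       # env config read / write-back / revision snapshots
    models.py       # Job / Stage / Item status enums and data structures
```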
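
Boundary rules:

- `services.py` only orchestrates business logic; it does no UI work or job scheduling
- `repository.py` handles SQL reads and writes; it knows nothing about download or upload strategy
- `ops/runner.py` decides how jobs run; it does not define collection or download rules
- `ops/executors.py` decides how a single item executes, and updates state through CAS (compare-and-swap)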
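
A minimal sketch of the CAS update pattern, assuming a simplified `job_items(id, status)` table. The helper name `cas_update_status` is hypothetical, and the real executor may guard on more columns (worker ID, attempt counter) in the same `WHERE` clause.

```python
import sqlite3


def cas_update_status(conn, item_id, expected, new):
    """Move an item from `expected` to `new` only if nobody changed it first.

    Returns True when this caller won the transition, False otherwise.
    """
    cur = conn.execute(
        "UPDATE job_items SET status = ? WHERE id = ? AND status = ?",
        (new, item_id, expected),
    )
    conn.commit()
    return cur.rowcount == 1


# Usage: two workers race to claim the same queued item.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_items (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO job_items (id, status) VALUES (1, 'queued')")
conn.commit()

first_claim = cas_update_status(conn, 1, "queued", "running")   # wins
second_claim = cas_update_status(conn, 1, "queued", "running")  # loses: already running
```

The guard in the `WHERE` clause is what makes the write safe without an explicit lock: the losing writer sees `rowcount == 0` and moves on.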

### Two Main Pipelines

1. CLI direct pipeline (offline batch processing)
   - `cli.py` -> `CatalogSyncApplication`
   - `collect/sync/download/run/upload` call `services/downloader/uploader` directly
   - Suited to scripted batch jobs or one-off command runs
2. Ops job pipeline (visual, pausable, and resumable)
   - `ops/web.py` accepts job creation (`/api/jobs`, `/api/playlists/*`)
   - `ops/runner.py` splits jobs into stages by `job_type` and schedules them by polling
   - `ops/executors.py` executes item by item and writes results back to the `job_*` tables
   - The frontend reads live status through the dashboard API + SSE

### Key Call Sequence (a "sync then download" job as the example)

1. The web side creates a `sync_download` job and writes it to `job_runs`
2. The runner creates `job_stages`: `sync -> download`
3. The sync stage creates `job_items` for each playlist and runs `services.sync_playlist_row`
4. The download stage creates `job_items` for songs and runs `downloader.download_song_row`
5. On a successful download, `file_assets` + `file_locations` are written and the playlist status aggregate is refreshed
6. The runner totals stage and item counts and moves `job_runs` to `completed` or `completed_with_errors`
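
### Job Concurrency and Recovery Model

- Two-lane scheduling:
  - `download` lane: exclusive, with limited concurrency to avoid disk and network contention
  - `general` lane: used for collect/sync/upload, allows higher concurrency
- Concurrency within a stage:
  - Controlled by the worker count (10 for downloads by default, configurable)
  - Worker heartbeat, speed, and current item are written to `job_workers`
- Resume after interruption:
  - The runner scans for recoverable jobs on startup
  - Items that were running are set to `interrupted`
  - Recoverable items are re-queued, and the job moves to `paused` or stays `running`
- Command control:
  - pause/resume/cancel/retry commands are written to `job_commands`
  - The runner consumes all commands in one place to avoid concurrent write conflicts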
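
A minimal in-process sketch of the two-lane slot accounting. The values in `LANE_LIMITS` are hypothetical (the text only says the download lane is exclusive and the general lane allows more concurrency), and `LaneScheduler` is an illustration: the real `runner.py` polls the database and may implement preemption differently. Per-stage worker counts (e.g. 10 download workers) sit inside a claimed slot and are not modeled here.

```python
import threading

# Hypothetical limits: one download stage at a time, several general stages in parallel.
LANE_LIMITS = {"download": 1, "general": 4}
STAGE_LANE = {"collect": "general", "sync": "general", "upload": "general", "download": "download"}


class LaneScheduler:
    """Non-blocking slot accounting per lane; a stage that cannot claim a slot stays queued."""

    def __init__(self, limits):
        self._slots = {lane: threading.BoundedSemaphore(n) for lane, n in limits.items()}

    def try_claim(self, stage_type):
        lane = STAGE_LANE[stage_type]
        return lane if self._slots[lane].acquire(blocking=False) else None

    def release(self, lane):
        self._slots[lane].release()


sched = LaneScheduler(LANE_LIMITS)
a = sched.try_claim("download")  # "download"
b = sched.try_claim("download")  # None: the download lane is exclusive
c = sched.try_claim("sync")      # "general"
sched.release(a)
d = sched.try_claim("download")  # "download" again after release
```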

### Extension Points (read this when adding platforms or storage)

- New playlist source: implement `collectors/*` and register it in `services.py`
- New download source: extend candidate search and scoring in `resolver.py`
- New storage backend: extend the backend adapters and locator semantics in `uploader.py`
- New job type: add the stage sequence and display name in `ops/jobdefs.py`
- New ops capability: add an API in `ops/web.py` and persist its state model in `ops/repository.py`

### Job State Transitions (JobStatus)

The diagram below corresponds to `JobStatus` in `ops/models.py`:

```mermaid
stateDiagram-v2
    [*] --> queued
    queued --> running: runner claim
    queued --> canceled: cancel
    running --> pause_requested: pause command
    pause_requested --> paused: all running items drained
    paused --> running: resume command
    running --> completed: all items success/skipped
    running --> completed_with_errors: some items failed
    running --> failed: unrecoverable error
    running --> canceled: cancel
    pause_requested --> canceled: cancel
    completed --> [*]
    completed_with_errors --> [*]
    failed --> [*]
    canceled --> [*]
```
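
The diagram can be encoded as a transition table that rejects illegal moves. This redeclares `JobStatus` for illustration only and mirrors just the edges drawn above; the real enum in `ops/models.py` may differ, and recovery paths (a restarted job moved to `paused`) may use edges the diagram does not show.

```python
from enum import Enum


class JobStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    PAUSE_REQUESTED = "pause_requested"
    PAUSED = "paused"
    COMPLETED = "completed"
    COMPLETED_WITH_ERRORS = "completed_with_errors"
    FAILED = "failed"
    CANCELED = "canceled"


S = JobStatus
# Edges copied from the diagram; terminal states have no outgoing edges.
ALLOWED = {
    S.QUEUED: {S.RUNNING, S.CANCELED},
    S.RUNNING: {S.PAUSE_REQUESTED, S.COMPLETED, S.COMPLETED_WITH_ERRORS, S.FAILED, S.CANCELED},
    S.PAUSE_REQUESTED: {S.PAUSED, S.CANCELED},
    S.PAUSED: {S.RUNNING},
    S.COMPLETED: set(),
    S.COMPLETED_WITH_ERRORS: set(),
    S.FAILED: set(),
    S.CANCELED: set(),
}


def transition(current, target):
    """Return `target` if the edge exists in the diagram, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```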

## Commands

Initialize the database:

```bash
musicdl-catalogsync init-db --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary
```

Collect the "Playlist Square" and "Charts" sources:

```bash
musicdl-catalogsync collect --db D:\catalogsync\catalogsync.db --sources netease,qq,kuwo
```

Sync playlists already in the database:

```bash
musicdl-catalogsync sync --db D:\catalogsync\catalogsync.db --sources netease,qq,kuwo --limit 20
```

Download songs that are waiting to be downloaded:

```bash
musicdl-catalogsync download --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --sources netease,qq,kuwo --download-sources qq,kuwo,migu,qianqian,kugou,netease --limit 20 --workers 10
```

Run the whole default pipeline in one go:

```bash
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --sources netease,qq,kuwo --download-sources qq,kuwo,migu,qianqian,kugou,netease --limit 20 --workers 10
```

Run directly from a playlist file:

```bash
musicdl-catalogsync run --db D:\catalogsync\catalogsync.db --library-root E:\MusicLibrary --playlist-file D:\catalogsync\playlists.txt --download-sources qq,kuwo,migu,qianqian,kugou,netease --workers 10
```

Register an object storage backend (Windows `cmd` syntax, where `^` continues the line):

```bat
musicdl-catalogsync register-object-backend ^
  --db D:\catalogsync\catalogsync.db ^
  --backend main-s3 ^
  --bucket music-bucket ^
  --endpoint https://s3.example.com ^
  --region auto ^
  --base-prefix music ^
  --credential-env-prefix CATALOGSYNC_MAIN_S3
```

Backfill already-downloaded local files to object storage:

```bash
musicdl-catalogsync upload --db D:\catalogsync\catalogsync.db --backend main-s3 --workers 4
musicdl-catalogsync upload --db D:\catalogsync\catalogsync.db --backend main-s3 --sources netease,qq --limit 200
musicdl-catalogsync upload --db D:\catalogsync\catalogsync.db --backend main-s3 --playlist-ids 12,15 --workers 4
```

Start the ops web console (FastAPI + uvicorn):

```bash
musicdl-catalogsync serve --db D:\catalogsync\catalogsync.db --env-file D:\catalogsync\catalogsync.env --host 127.0.0.1 --port 18080
```

You can also invoke the CLI as a module:

```bash
python -m musicdl.catalogsync.cli --help
```
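
## `--playlist-file` Behavior

When `--playlist-file` is passed, `run` takes a narrow branch:

1. Skip `collect`
2. Read the playlist URLs from the file
3. Parse and deduplicate them
4. Write them to the database as a `manual_file` pool
5. Sync only these playlists
6. Download only the songs linked to these playlists

Without `--playlist-file`, `run` keeps the original default `collect -> sync -> download` behavior.

## `--sources` vs. `--download-sources`

- `--sources`
  - Controls which canonical platforms' songs are collected, synced, and filtered
  - Currently used mainly for the three playlist sources `netease`, `qq`, and `kuwo`
- `--download-sources`
  - Controls which platforms are searched again, and have direct links resolved, before downloading
  - Defaults to the same six platforms as the GUI: `qq,kuwo,migu,qianqian,kugou,netease`

The download stage actually works like this:

1. Take the song name, artists, and original snapshot from the canonical song in the database
2. Search again for downloadable candidates within the `--download-sources` allowlist
3. Rank candidates by "match score -> quality / file size -> your configured source order"
4. Download only after the best candidate has been chosen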
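
One way to express that ranking as a sort key is sketched below. The `Candidate` fields, the `min_match` threshold, the 0..1 score scale, and the quality tiers are all assumptions; `resolver.py` may score differently. "Similar quality" is read here as "same quality tier", so the configured source order breaks ties before file size does.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    source: str           # download platform that returned this candidate, e.g. "kuwo"
    match_score: float    # hypothetical 0..1 title/artist similarity
    quality_rank: int     # hypothetical coarse tier, e.g. 3 = lossless, 2 = 320k, 1 = 128k
    file_size_bytes: int


def rank_candidates(candidates, source_order, min_match=0.8):
    """Sort best-first: trusted match, quality tier, configured source order, then size."""
    position = {name: i for i, name in enumerate(source_order)}

    def key(c):
        trusted = c.match_score >= min_match
        return (
            not trusted,                                # trusted matches first
            -c.quality_rank,                            # higher quality tier first
            position.get(c.source, len(source_order)),  # then --download-sources order
            -c.file_size_bytes,                         # final tie-break: larger file
        )

    return sorted(candidates, key=key)


order = ["qq", "kuwo", "migu"]
ranked = rank_candidates(
    [
        Candidate("migu", 0.95, 3, 30_000_000),
        Candidate("qq", 0.97, 2, 10_000_000),
        Candidate("kuwo", 0.96, 3, 28_000_000),
        Candidate("qq", 0.40, 3, 40_000_000),  # probably a different song
    ],
    order,
)
best = ranked[0]  # kuwo: trusted, top quality tier, earlier in `order` than migu
```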
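
This means:

- A song from a NetEase playlist is not necessarily downloaded from NetEase
- When the original platform's official direct link has expired or is unavailable, other download sources are searched automatically for a candidate with the same title and artist
- As long as a match is trustworthy, the higher-quality candidate is preferred

Starting with this version, the `sync` stage also no longer requires the original platform to return a downloadable direct link on the spot:

- As long as the playlist API still returns song metadata, `sync` writes the full song snapshot to the database
- These songs are stored as "deferred resolution" snapshots; a working direct link is resolved from `--download-sources` at download time
- This keeps NetEase / QQ / Kuwo songs from being dropped at ingestion because of licensing restrictions or temporarily expired direct links

### File Format

One entry per line. Three kinds of lines are supported (comments, bare URLs, and `platform,URL`):

```text
# comment line
https://music.163.com/#/playlist?id=17745989905
qq,https://y.qq.com/n/ryqq/playlist/7707261125
https://y.qq.com/n/ryqq/toplist/26
https://www.kuwo.cn/rankList?bangId=16
```

Rules:

- Blank lines are ignored
- Lines starting with `#` are ignored
- `platform,URL` is supported
- A bare URL is also supported; the platform is then detected automatically
- Duplicate playlists within the same file are removed automatically
- Platforms currently supported for URL auto-detection are `netease`, `qq`, and `kuwo`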
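
A minimal parser that follows these rules is sketched below. `HOST_PLATFORMS` and `parse_playlist_file` are hypothetical names; this sketch dedupes on the exact `(platform, URL)` pair, whereas the real implementation probably normalizes each URL to a remote playlist ID first.

```python
from urllib.parse import urlparse

# Hypothetical host -> platform table used for auto-detection.
HOST_PLATFORMS = {
    "music.163.com": "netease",
    "y.qq.com": "qq",
    "www.kuwo.cn": "kuwo",
}


def parse_playlist_file(text):
    """Return de-duplicated (platform, url) pairs in file order."""
    seen = set()
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if "," in line and not line.lower().startswith("http"):
            platform, url = (part.strip() for part in line.split(",", 1))  # "platform,URL"
        else:
            url = line
            platform = HOST_PLATFORMS.get(urlparse(url).hostname or "")  # bare URL: detect
            if platform is None:
                raise ValueError(f"cannot detect platform for: {url}")
        key = (platform, url)
        if key not in seen:
            seen.add(key)
            entries.append(key)
    return entries


sample = """\
# comment line
https://music.163.com/#/playlist?id=17745989905
qq,https://y.qq.com/n/ryqq/playlist/7707261125

https://y.qq.com/n/ryqq/toplist/26
https://www.kuwo.cn/rankList?bangId=16
https://www.kuwo.cn/rankList?bangId=16
"""
entries = parse_playlist_file(sample)  # 4 entries: the repeated Kuwo chart is dropped
```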

### Supported URL Types

- NetEase regular playlist: `https://music.163.com/#/playlist?id=...`
- QQ regular playlist: `https://y.qq.com/n/ryqq/playlist/...`
- QQ chart: `https://y.qq.com/n/ryqq/toplist/...`
- Kuwo regular playlist: `https://www.kuwo.cn/playlist_detail/...`
- Kuwo chart: `https://www.kuwo.cn/rankList?bangId=...`
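
The five URL shapes can be recognized with anchored regular expressions that also pull out the remote ID. The patterns and `classify_playlist_url` are hypothetical, and they assume numeric IDs, which holds for the examples in this document but may not hold for every link.

```python
import re

# Hypothetical patterns for the five URL shapes listed above: (platform, kind, regex).
URL_PATTERNS = [
    ("netease", "playlist", re.compile(r"^https?://music\.163\.com/#/playlist\?id=(\d+)")),
    ("qq", "playlist", re.compile(r"^https?://y\.qq\.com/n/ryqq/playlist/(\d+)")),
    ("qq", "toplist", re.compile(r"^https?://y\.qq\.com/n/ryqq/toplist/(\d+)")),
    ("kuwo", "playlist", re.compile(r"^https?://www\.kuwo\.cn/playlist_detail/(\d+)")),
    ("kuwo", "toplist", re.compile(r"^https?://www\.kuwo\.cn/rankList\?bangId=(\d+)")),
]


def classify_playlist_url(url):
    """Return (platform, kind, remote_id) for a supported URL, or None."""
    for platform, kind, pattern in URL_PATTERNS:
        match = pattern.match(url.strip())
        if match:
            return platform, kind, match.group(1)
    return None
```

Because `re.match` anchors only at the start, trailing query parameters after the ID are tolerated.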

## Database Design Overview

The database is SQLite, opened with this connection policy:

- `PRAGMA journal_mode=WAL`
- `PRAGMA busy_timeout=30000`
- `PRAGMA synchronous=NORMAL`
- All tables are defined centrally in `db.py`, and add-column migrations run at initialization
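
A sketch of a connection helper applying these PRAGMAs with the standard `sqlite3` module; the function name `connect` is hypothetical. `journal_mode=WAL` is persisted in the database file, while `busy_timeout` and `synchronous` are per connection, so every new connection (for example one per worker thread) must set them again.

```python
import os
import sqlite3
import tempfile


def connect(db_path):
    """Open a connection using the PRAGMAs listed above."""
    conn = sqlite3.connect(db_path, timeout=30.0)  # Python-level busy wait, in seconds
    conn.execute("PRAGMA journal_mode=WAL")        # persistent: stored in the database file
    conn.execute("PRAGMA busy_timeout=30000")      # per connection, in milliseconds
    conn.execute("PRAGMA synchronous=NORMAL")      # per connection
    return conn


# WAL needs a real file; an in-memory database would report "memory" instead.
demo = connect(os.path.join(tempfile.mkdtemp(), "catalogsync.db"))
```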

Design goals:

1. Strong dedupe: keep exactly one entity per remote ID per platform
2. Loose coupling: keep a song's logical assets separate from their physical storage locations
3. Recoverability: the job state machine is persisted and can resume after a restart
4. Observability: jobs, workers, logs, and events are all recorded in tables

### Table Domains (four domains)

1. Catalog Core
   - `playlist_pools`: playlist source pools (square / charts / manual_file)
   - `playlists`: playlist entities (platform, remote ID, strategy, play count)
   - `songs`: song entities (platform, remote ID, name, artists, format, snapshot)
   - `artists`: artist entities (normalized name + platform dimension)
2. Association
   - `pool_playlists`: pool-to-playlist many-to-many
   - `playlist_songs`: playlist-to-song many-to-many (with position)
   - `pool_artists`: pool-to-artist many-to-many
   - `artist_songs`: artist-to-song many-to-many
3. Storage
   - `storage_backends`: storage backend definitions (local_fs / object_storage / cloud_drive)
   - `file_assets`: logical file versions of a song (quality / format / size / checksum)
   - `file_locations`: physical locations (backend + locator + status + primary-copy flag)
   - `song_backend_presence`: aggregated presence of a song on each backend (speeds up queries)
   - `download_tasks` / `upload_tasks`: download and upload queues
4. Ops (job orchestration)
   - `job_runs`: job overview (type, status, scope, config snapshot)
   - `job_stages`: stage (collect / sync / download / upload) counters
   - `job_items`: smallest unit of execution (playlist item / song item / file item)
   - `job_workers`: live worker status, throughput, speed
   - `job_commands`: pause / resume / cancel / retry command queue
   - `job_events` / `job_logs`: audit events and execution logs
   - `config_revisions`: env config revision snapshots and rollback records

### Dedupe and Consistency Constraints (core)

Unique keys (hard constraints):

- `playlists(platform, remote_playlist_id)`
- `songs(platform, remote_song_id)`
- `file_locations(file_asset_id, backend_id, locator)`
- `upload_tasks(file_asset_id, target_backend_id, target_locator)`
- `job_items(job_stage_id, item_key)`
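
A sketch of how a unique key like `songs(platform, remote_song_id)` turns repeated syncs into upserts instead of duplicate rows. The table is a simplified subset and `upsert_song` is a hypothetical helper; `ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer, which is worth checking on older NAS builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE songs (
        id INTEGER PRIMARY KEY,
        platform TEXT NOT NULL,
        remote_song_id TEXT NOT NULL,
        name TEXT NOT NULL,
        UNIQUE (platform, remote_song_id)
    )
    """
)


def upsert_song(conn, platform, remote_song_id, name):
    """Insert or refresh a song; the unique key guarantees one row per (platform, remote ID)."""
    conn.execute(
        """
        INSERT INTO songs (platform, remote_song_id, name) VALUES (?, ?, ?)
        ON CONFLICT (platform, remote_song_id) DO UPDATE SET name = excluded.name
        """,
        (platform, remote_song_id, name),
    )
    conn.commit()
    return conn.execute(
        "SELECT id FROM songs WHERE platform = ? AND remote_song_id = ?",
        (platform, remote_song_id),
    ).fetchone()[0]


first_id = upsert_song(conn, "netease", "186016", "Song A")
second_id = upsert_song(conn, "netease", "186016", "Song A (Live)")  # same row, updated name
```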
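
Consistency rules (business layer):

- One `song_id` can map to multiple `file_asset` rows (different quality / format)
- One `file_asset` can have multiple `file_location` rows (local + cloud)
- `song_backend_presence` is derived from `file_locations` and is not a source of truth
- A playlist's "downloaded / not downloaded / partial" status is computed by aggregating `playlist_songs` + active local `file_locations`

### High-Frequency Read/Write Paths (troubleshooting focus)

1. Collect stage
   - Writes: `playlist_pools`, `playlists`, `pool_playlists`
   - Typical issue: a playlist is in the pool but `playlists.collected_song_count` was never backfilled
2. Sync stage
   - Writes: `songs`, `playlist_songs`, `artists`, `pool_artists`, `artist_songs`
   - Typical issue: a playlist is synced but has 0 songs (distinguish "the source returned nothing" from "parsing failed")
3. Download stage
   - Writes: `file_assets`, `file_locations`, `download_tasks`
   - Reads: `songs` snapshots + download source candidates
   - Typical issue: the same file written to disk more than once, with `(1)/(2)` suffixes piling up
4. Upload stage
   - Writes: `upload_tasks`, `file_locations`, `song_backend_presence`
   - Typical issue: the upload succeeded but presence was not refreshed, so the UI still shows "not uploaded"
5. Task center
   - Writes: `job_runs/stages/items/workers/commands/events/logs`
   - Reads: dashboard summaries, doing/done tree, worker speed

### Migrations and Backward Compatibility

- On every startup, `initialize_database()`:
  - runs `CREATE TABLE IF NOT EXISTS`
  - runs the necessary `ALTER TABLE ADD COLUMN` statements (e.g. `play_count`, worker throughput fields)
- This lets an old database upgrade in place without running a manual SQL migration script
- Back up `catalogsync.db` before upgrading, especially before changing the dedupe strategy or running bulk maintenance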
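
SQLite has no `ADD COLUMN IF NOT EXISTS`, so an idempotent migration has to check `PRAGMA table_info` first. A minimal sketch, with `ensure_column` as a hypothetical helper name:

```python
import sqlite3


def ensure_column(conn, table, column, decl):
    """Run ALTER TABLE ... ADD COLUMN only when the column is missing.

    `table`, `column`, and `decl` are interpolated into SQL, so pass trusted constants only.
    """
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}  # row[1] = name
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS playlists (id INTEGER PRIMARY KEY, name TEXT)")
added = ensure_column(conn, "playlists", "play_count", "INTEGER NOT NULL DEFAULT 0")
added_again = ensure_column(conn, "playlists", "play_count", "INTEGER NOT NULL DEFAULT 0")
```

Running it twice is safe: the second call sees the column and does nothing, which is what allows `initialize_database()` to run on every startup.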

### Core ER Diagram

```mermaid
erDiagram
    PLAYLIST_POOLS ||--o{ POOL_PLAYLISTS : links
    PLAYLISTS ||--o{ POOL_PLAYLISTS : belongs_to
    PLAYLISTS ||--o{ PLAYLIST_SONGS : contains
    SONGS ||--o{ PLAYLIST_SONGS : appears_in

    ARTIST_POOLS ||--o{ POOL_ARTISTS : links
    ARTISTS ||--o{ POOL_ARTISTS : belongs_to
    ARTISTS ||--o{ ARTIST_SONGS : sings
    SONGS ||--o{ ARTIST_SONGS : performed_by

    SONGS ||--o{ FILE_ASSETS : has_versions
    FILE_ASSETS ||--o{ FILE_LOCATIONS : stored_at
    STORAGE_BACKENDS ||--o{ FILE_LOCATIONS : hosts
    SONGS ||--o{ SONG_BACKEND_PRESENCE : has_presence
    STORAGE_BACKENDS ||--o{ SONG_BACKEND_PRESENCE : summarized_on

    JOB_RUNS ||--o{ JOB_STAGES : has
    JOB_STAGES ||--o{ JOB_ITEMS : has
    JOB_RUNS ||--o{ JOB_WORKERS : owns
    JOB_RUNS ||--o{ JOB_COMMANDS : receives
    JOB_RUNS ||--o{ JOB_EVENTS : emits
    JOB_RUNS ||--o{ JOB_LOGS : writes
```

## Data Tables

### Playlist pool -> playlist -> song

- `playlist_pools`
  - Platform source pools, e.g. `playlist_square`, `toplist`, `manual_file`
- `playlists`
  - Concrete playlists or charts
- `pool_playlists`
  - Mapping between playlist pools and playlists
- `songs`
  - Song master table; the unique key is `(platform, remote_song_id)`
- `playlist_songs`
  - Mapping between playlists and songs

The song master table stores these core fields:

- `remote_song_id`
- `name`
- `singers`
- `ext`
- `file_size_bytes`
- `quality_label`
- `metadata_json`
  - Contains the `SongInfo` snapshot, which can later be handed straight back to the original downloader to resume downloading

### Derived artist pools + lazy completion

- `artist_pools`
  - Artist pools derived from playlist pools
- `artists`
  - Artist master table
- `pool_artists`
  - Mapping between artist pools and artists
- `artist_songs`
  - Mapping between artists and songs

When playlist songs are synced, the artist pools are updated in the same pass, which satisfies the requirement that "updating a playlist pool also updates its artist pool".

## Download Dedupe and File Mapping

### Logical asset layer

- `file_assets`
  - Represents "one file version of a given song"
  - Typical dimensions are `song_id + quality_label + ext + file_size_bytes`
  - `ext / quality_label / file_size_bytes` reflect the source file that was actually downloaded, not the canonical platform

### Physical location layer

- `storage_backends`
  - Describes storage backends
  - `local_fs` is currently implemented
  - Can later be extended to cloud drives and object storage
- `file_locations`
  - Records where a file asset actually lives right now

Think of it this way:

- `file_assets` answers "what file is this"
- `file_locations` answers "where does this file live now"

If a song is first downloaded locally and later uploaded to a cloud drive or object storage, the same `file_asset` is reused; only the corresponding `file_location` needs to be added or updated.

### Upload queue and backend presence

- `song_backend_presence`
  - Derived summary table indicating whether a song already has an active file on a given backend
  - Commonly used for a quick check such as "has this song been backfilled to main-s3"
- `upload_tasks`
  - Upload task queue
  - One task = one local `file_asset` uploaded to one target backend/key
  - Statuses include `pending`, `uploading`, `succeeded`, `failed`, `skipped`

Keep this distinction in mind:

- `file_locations` remains the source of truth
- `song_backend_presence` exists only for fast queries and does not replace `file_locations`
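
Because presence is derived, it can always be rebuilt from `file_locations`. A sketch of that refresh, assuming minimal hypothetical columns (`status`, `active_count`) that may not match the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal columns; the real tables carry more fields.
conn.executescript(
    """
    CREATE TABLE file_assets (id INTEGER PRIMARY KEY, song_id INTEGER NOT NULL);
    CREATE TABLE file_locations (
        id INTEGER PRIMARY KEY, file_asset_id INTEGER NOT NULL,
        backend_id INTEGER NOT NULL, locator TEXT NOT NULL, status TEXT NOT NULL
    );
    CREATE TABLE song_backend_presence (
        song_id INTEGER NOT NULL, backend_id INTEGER NOT NULL,
        active_count INTEGER NOT NULL, PRIMARY KEY (song_id, backend_id)
    );
    """
)


def refresh_presence(conn, song_id):
    """Rebuild one song's presence rows from file_locations (the source of truth)."""
    conn.execute("DELETE FROM song_backend_presence WHERE song_id = ?", (song_id,))
    conn.execute(
        """
        INSERT INTO song_backend_presence (song_id, backend_id, active_count)
        SELECT fa.song_id, fl.backend_id, COUNT(*)
        FROM file_locations fl JOIN file_assets fa ON fa.id = fl.file_asset_id
        WHERE fa.song_id = ? AND fl.status = 'active'
        GROUP BY fa.song_id, fl.backend_id
        """,
        (song_id,),
    )
    conn.commit()


conn.execute("INSERT INTO file_assets VALUES (1, 42)")
conn.execute("INSERT INTO file_locations VALUES (1, 1, 1, 'qq/Singer A/song-a.flac', 'active')")
conn.execute("INSERT INTO file_locations VALUES (2, 1, 2, 'music/qq/Singer A/song-a.flac', 'active')")
refresh_presence(conn, 42)  # one active copy on backend 1 (local) and one on backend 2 (remote)
```

Delete-then-insert keeps the refresh idempotent, which is the property you want when troubleshooting "uploaded but presence not refreshed".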
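
## Behavior When the Disk Is Full

The downloader checks the free space in the target directory first.

If there is not enough space, it prompts for a new download directory ("Insufficient disk space, enter a new download directory to continue:"):

```text
磁盘空间不足,请输入新的下载目录继续:
```

The new directory can be on another drive. The program then:

- downloads songs into the new directory
- automatically creates or reuses a `storage_backend` for the new directory
- writes the new file locations back to `file_locations`

With `--workers > 1`, the prompt still appears only once globally. After a successful switch, every download task that has not started yet continues in the new directory.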
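
A minimal sketch of the "prompt once, then every worker switches" behavior, assuming all workers share one object guarded by a `threading.Lock`. `SharedDownloadRoot` and its `free_bytes` hook are hypothetical; the real downloader also creates or reuses a `storage_backend` for the new directory, which is omitted here.

```python
import shutil
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedDownloadRoot:
    """One download root shared by all workers; the low-space prompt runs under a lock."""

    def __init__(self, root, ask_new_root, free_bytes=None):
        self.root = root
        self._ask = ask_new_root  # e.g. a function wrapping input(...)
        self._free = free_bytes or (lambda path: shutil.disk_usage(path).free)
        self._lock = threading.Lock()
        self.prompts = 0

    def root_for(self, needed_bytes):
        """Return a root with room for `needed_bytes`, prompting only if the current one is full."""
        with self._lock:
            if self._free(self.root) < needed_bytes:
                self.prompts += 1
                self.root = self._ask()  # workers waiting on the lock will see the new root
            return self.root


# Usage with a fake free-space table so the demo does not touch real disks.
fake_free = {"/mnt/full": 0, "/mnt/spare": 10**12}
shared = SharedDownloadRoot("/mnt/full", ask_new_root=lambda: "/mnt/spare",
                            free_bytes=fake_free.__getitem__)
with ThreadPoolExecutor(max_workers=10) as pool:
    roots = list(pool.map(lambda _: shared.root_for(5_000_000), range(10)))
```

Serializing the free-space check is the simplest way to guarantee a single prompt; the cost is a short critical section per file, which is negligible next to a network download.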
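
## Object Storage Upload

A first version of object storage upload is implemented, with S3-compatible backend semantics.

### Key conventions

1. After a local download completes, a local `file_location` is written first
2. After a successful upload, a new remote `file_location` is added for the same `file_asset`
3. The local file is kept, and the local `file_location.is_primary = 1`
4. The remote object storage record has `is_primary = 0`
5. Database state is trusted by default; no extra `HEAD` check is made against remote objects
6. If a song has several active local file versions, all of them are queued for upload

### Key / locator rule

Object storage keys mirror the local relative path.

For example:

- Local locator: `qq/Singer A/song-a.flac`
- Backend `base_prefix`: `music`
- Remote locator: `music/qq/Singer A/song-a.flac`

Benefits:

- The remote directory structure matches the local one
- Later migrations or rebuilt mappings are simpler
- The same locator semantics are easy to reuse when uploading to a CDN or cloud drive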
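
The rule amounts to a prefix join. A sketch, with `remote_locator` as a hypothetical name; converting backslashes assumes some local locators may have been recorded with Windows separators.

```python
def remote_locator(local_locator, base_prefix=""):
    """Mirror a local relative locator under the backend's base_prefix."""
    key = local_locator.replace("\\", "/").strip("/")  # object keys always use "/"
    prefix = base_prefix.strip("/")
    return f"{prefix}/{key}" if prefix else key
```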

### Backend configuration and secrets model

Non-sensitive settings are stored in `storage_backends.config_json`, for example:

- `endpoint`
- `region`
- `base_prefix`
- `addressing_style`
- `public_base_url`
- `credential_env_prefix`

Secrets are never stored in the database; they come only from environment variables.

For example, with `credential_env_prefix = CATALOGSYNC_MAIN_S3`:

```dotenv
CATALOGSYNC_MAIN_S3_ACCESS_KEY_ID=your-access-key
CATALOGSYNC_MAIN_S3_SECRET_ACCESS_KEY=your-secret-key
CATALOGSYNC_MAIN_S3_SESSION_TOKEN=optional-session-token
```

If `public_base_url` is configured, a successful upload also writes the derived `public_url` back to the remote `file_location`.
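
A sketch of both ideas: resolving credentials from `<prefix>_*` variables and deriving a public URL. The function names are hypothetical. The returned keys match boto3's S3 client keyword arguments, in case the uploader uses boto3 (an assumption). The URL rule (base URL + percent-encoded key) is a guess at what "derivable" means here.

```python
import os
from urllib.parse import quote


def load_credentials(prefix, environ=None):
    """Read <prefix>_ACCESS_KEY_ID, <prefix>_SECRET_ACCESS_KEY and optional <prefix>_SESSION_TOKEN."""
    env = os.environ if environ is None else environ
    required = (f"{prefix}_ACCESS_KEY_ID", f"{prefix}_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing credential env vars: {', '.join(missing)}")
    return {
        "aws_access_key_id": env[f"{prefix}_ACCESS_KEY_ID"],
        "aws_secret_access_key": env[f"{prefix}_SECRET_ACCESS_KEY"],
        "aws_session_token": env.get(f"{prefix}_SESSION_TOKEN") or None,  # empty means unset
    }


def public_url(public_base_url, locator):
    """Hypothetical derivation: base URL + percent-encoded object key (slashes preserved)."""
    if not public_base_url:
        return None
    return public_base_url.rstrip("/") + "/" + quote(locator, safe="/")
```

Treating an empty `SESSION_TOKEN` as unset matters because the env template below leaves it blank.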

### Default behavior of the `upload` command

By default, `upload` does three things:

1. Finds active local files that are still missing on the target backend
2. Deduplicates them, then writes new `upload_tasks` or reuses existing ones
3. Runs the uploads with a bounded pool of concurrent workers and writes results back to the database

The scope can be narrowed with `--sources`, `--playlist-ids`, and `--limit`; concurrency is set with `--workers`.

Recommended defaults:

- Downloads: `--workers 10`
- Uploads: `--workers 4`

### What the database updates after an upload

- `file_locations`
  - Adds or updates the remote object location
- `song_backend_presence`
  - Refreshes the song's active summary on the target backend
- `upload_tasks`
  - Records whether this task is queued, running, succeeded, or failed

## Reserved Cloud Drive Compatibility

Recommended conventions:

- Local files:
  - `backend_type=local_fs`
  - `locator` stores the relative path
- Object storage:
  - `backend_type=object_storage`
  - `container_name` stores the bucket
  - `locator` stores the key
- Cloud drive backends:
  - `backend_type=cloud_drive`
  - `remote_file_id` stores the platform file ID
  - `locator` stores the remote directory path
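
## Current Implementation Notes

- The collection layer covers the "Playlist Square" and "Charts" sources from the GUI "Discover" page
- Special chart parsing is supported for:
  - `netease_toplist`
  - `qq_toplist`
  - `kuwo_toplist`
- The download pipeline decouples "playlist source" from "download source"
- Downloads search again on the platforms given by `--download-sources`
- Candidate selection policy:
  - High-confidence matches come first
  - Among high-confidence candidates, higher quality / larger files win
  - When quality is similar, the order of `--download-sources` decides priority
- The default download sources are the same six platforms as the GUI: `qq,kuwo,migu,qianqian,kugou,netease`
- Object storage upload is currently exposed through two commands: `register-object-backend` and `upload`

## Run Recommendations

- For the first batch run, start with a single platform, e.g. `--sources netease`
- Run `sync` and `download` with `--limit` first as a smoke test
- To run only a few specific playlists, prefer `run --playlist-file`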

## NAS / Linux Deployment Conventions

### Directory responsibilities

- `/volume4/Music_Cloud/library`
  - Holds only the final music files (download output)
- `/volume4/Music_Cloud/catalogsync`
  - Holds only the catalogsync application and its runtime data (code, script copies, config, database, inputs, logs)

Recommended fixed layout:

```text
/volume4/Music_Cloud/
  library/
  catalogsync/
    app/
    bin/
    config/
    data/
    inputs/
    logs/
```

### Download layout

The default download layout is:

```text
<LIBRARY_DIR>/<platform>/<first_artist>/<filename>
```

`DOWNLOAD_LAYOUT=platform_first_artist` selects this layout.

Here `<platform>` is the download source platform that actually served the file, not the playlist's source platform.
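
A sketch of building the relative locator for this layout. `layout_relative_path` and `safe_component` are hypothetical. The sketch assumes `singers` arrives as a list (the stored `singers` field may be a delimited string), and its sanitizing rules and `Unknown` fallback are assumptions as well.

```python
import re
from pathlib import PurePosixPath

# Characters that are unsafe in file names on common filesystems (Windows is the strictest).
_UNSAFE = re.compile(r'[\\/:*?"<>|]+')


def safe_component(text, fallback="Unknown"):
    """Make one path component safe; fall back when nothing usable is left."""
    cleaned = _UNSAFE.sub("_", text).strip().strip(".")
    return cleaned or fallback


def layout_relative_path(download_platform, singers, filename):
    """platform_first_artist: <platform>/<first_artist>/<filename>, relative to LIBRARY_DIR."""
    first_artist = singers[0] if singers else ""
    return str(PurePosixPath(download_platform, safe_component(first_artist),
                             safe_component(filename)))
```

Note that the first argument is the platform that served the file (say `kuwo`), even when the song came from a NetEase playlist.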

### Key entries in `catalogsync.env`

```dotenv
ROOT_DIR=/volume4/Music_Cloud
APP_HOME=/volume4/Music_Cloud/catalogsync
LIBRARY_DIR=/volume4/Music_Cloud/library
DB_PATH=/volume4/Music_Cloud/catalogsync/data/catalogsync.db
INPUT_DIR=/volume4/Music_Cloud/catalogsync/inputs
LOG_DIR=/volume4/Music_Cloud/catalogsync/logs
ENV_FILE=/volume4/Music_Cloud/catalogsync/config/catalogsync.env
WEB_HOST=127.0.0.1
WEB_PORT=18080
PYTHON_BIN=python3
VENV_DIR=/volume4/Music_Cloud/catalogsync/app/.venv
DOWNLOAD_LAYOUT=platform_first_artist
DOWNLOAD_SOURCES=qq,kuwo,migu,qianqian,kugou,netease
CATALOG_EXPORT_COMMAND=bash /volume4/Music_Cloud/Music_Server/scripts/catalog-export.sh
CATALOG_EXPORT_WORKDIR=/volume4/Music_Cloud/Music_Server
OBJECT_BACKEND_NAME=main-s3
OBJECT_BUCKET=music-bucket
OBJECT_ENDPOINT=https://s3.example.com
OBJECT_REGION=auto
OBJECT_BASE_PREFIX=music
OBJECT_ADDRESSING_STYLE=
OBJECT_PUBLIC_BASE_URL=
OBJECT_CREDENTIAL_ENV_PREFIX=CATALOGSYNC_MAIN_S3
UPLOAD_WORKERS=4
UPLOAD_SOURCES=
UPLOAD_PLAYLIST_IDS=
UPLOAD_LIMIT=
CATALOGSYNC_MAIN_S3_ACCESS_KEY_ID=
CATALOGSYNC_MAIN_S3_SECRET_ACCESS_KEY=
CATALOGSYNC_MAIN_S3_SESSION_TOKEN=
```

### One-command deploy from Windows to the NAS (recommended)

If you develop on Windows and deploy to a fixed NAS, use a single command:

```powershell
.\deploy-catalogsync.ps1
```

This command chains together:

1. Upload the local `musicdl/catalogsync` to the NAS staging directory
2. Overwrite the NAS copies of `serve_console.sh` and `deploy_and_restart.sh` with the latest versions
3. Run the atomic deploy script on the NAS (backup -> sync -> stop old -> start new -> health probe)
4. If the health probe or the single-instance check fails, roll back to the previous version automatically and exit non-zero

Optional flag:

```powershell
.\deploy-catalogsync.ps1 -SkipHealthCheck
```

Script locations:

- Repository shortcut: `deploy-catalogsync.ps1`
- NAS deploy trigger: `scripts/catalogsync/deploy_to_nas.ps1`
- NAS deploy executor: `scripts/catalogsync/templates/deploy_and_restart.sh`

### NAS-side deploy script behavior (`deploy_and_restart.sh`)

Default target paths:

- Code target: `/volume4/Music_Cloud/catalogsync/app/musicdl/catalogsync`
- Staging: `/volume4/Music_Cloud/catalogsync/deploy/staging/catalogsync`
- Backups: `/volume4/Music_Cloud/catalogsync/deploy/backups/catalogsync_YYYYMMDD_HHMMSS`

Stability mechanisms:

- Deploy lock: `/volume4/Music_Cloud/catalogsync/run/deploy.lock`
- Service PID: `/volume4/Music_Cloud/catalogsync/run/serve.pid`
- Health check: `http://127.0.0.1:${WEB_PORT}/dashboard` by default
- Failure rollback: automatically restores the latest backup and restarts to verify
- Backup retention: keeps the 5 most recent versions by default (adjust with `--keep-backups`)

### Using `scripts/catalogsync/bootstrap_to_linux.ps1`

Run it on the Windows side; it uses `ssh/scp` to initialize the target machine's directories and distribute the code and script templates:

```powershell
powershell -ExecutionPolicy Bypass -File .\scripts\catalogsync\bootstrap_to_linux.ps1 `
  -RemoteHost 192.168.1.10 `
  -Port 22 `
  -User xiaoming `
  -RootDir /volume4/Music_Cloud
```

Afterwards, on the target machine, copy `catalogsync.env.example` to `catalogsync.env` and adjust it to that machine's actual paths.

### Run `install_runtime.sh` on the target machine first

After the first deployment to the target machine, run this once:

```bash
bash /volume4/Music_Cloud/catalogsync/bin/install_runtime.sh
```

The script automatically:

- creates `VENV_DIR` using `PYTHON_BIN`
- upgrades `pip/setuptools/wheel`
- generates `/volume4/Music_Cloud/catalogsync/app/requirements.nas.txt` from `/volume4/Music_Cloud/catalogsync/app/requirements.txt`
- filters out `nodejs-wheel`
- installs the dependencies the current `catalogsync` download/upload pipeline needs
- runs an editable install of `/volume4/Music_Cloud/catalogsync/app` so that `python -m musicdl.catalogsync.cli ...` works directly
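
The `nodejs-wheel` filtering step can be illustrated in Python. This is a hypothetical re-expression of the idea, not the script's actual shell logic: normalize each requirement's name (PEP 503 style: case-insensitive, with runs of `-`, `_`, `.` collapsed to `-`) and drop any name starting with an excluded prefix. That single check covers both `nodejs-wheel` and `nodejs-wheel-binaries`.

```python
import re

# Distribution name at the start of a requirement line (letters, digits, ".", "_", "-").
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def filter_requirements(lines, excluded_prefixes=("nodejs-wheel",)):
    """Drop requirement lines whose normalized name starts with an excluded prefix."""
    kept = []
    for line in lines:
        match = _NAME.match(line)
        if match:
            # Normalize like PEP 503: runs of "-", "_", "." become "-", case-insensitive.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            if name.startswith(excluded_prefixes):
                continue
        kept.append(line)  # comments, options like "-r file", and other packages pass through
    return kept
```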

Logs are written to:

```text
/volume4/Music_Cloud/catalogsync/logs/install_runtime_YYYYMMDD_HHMMSS.log
```

### Using `download_all.sh` / `download_from_file.sh` on the target machine

Before running them on the target machine, prepare the config:

```bash
cp /volume4/Music_Cloud/catalogsync/config/catalogsync.env.example \
  /volume4/Music_Cloud/catalogsync/config/catalogsync.env
```

Full pipeline (equivalent to `musicdl.catalogsync.cli run`):

```bash
bash /volume4/Music_Cloud/catalogsync/bin/download_all.sh --sources netease,qq,kuwo --limit 20
```

Run from a playlist file (skips collect):

```bash
bash /volume4/Music_Cloud/catalogsync/bin/download_from_file.sh \
  /volume4/Music_Cloud/catalogsync/inputs/playlists.txt
```

This script maps to the `run --playlist-file` branch (which skips `collect`), so the example does not pass `--sources`.

Both download scripts read `DOWNLOAD_SOURCES` from `catalogsync.env` and pass it to the CLI as `--download-sources ...`.

Both download scripts prefer `VENV_DIR/bin/python` and fall back to `PYTHON_BIN` only if the virtual environment is not ready yet.

### Catalog export after download (recommended for NAS integration)

To refresh `Music_Server`'s read-only database `catalog_read.db` automatically after downloads, set these in `catalogsync.env`:

- `CATALOG_EXPORT_COMMAND=bash /volume4/Music_Cloud/Music_Server/scripts/catalog-export.sh`
- `CATALOG_EXPORT_WORKDIR=/volume4/Music_Cloud/Music_Server`

Behavior:

- The export runs each time a `download` stage reaches a terminal state, at most once per stage
- If `CATALOG_EXPORT_COMMAND` is not set, the export is marked `skipped`
- `job_events` records these events:
  - `catalog_export_started`
  - `catalog_export_skipped`
  - `catalog_export_succeeded`
  - `catalog_export_failed`
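
A sketch of the "at most once per stage" trigger and its events. `run_catalog_export` and its parameters are hypothetical. The sketch splits the command with `shlex` and runs it without a shell, which is an assumption about how the runner invokes it. `emit` stands in for writing a `job_events` row.

```python
import shlex
import subprocess
import sys


def run_catalog_export(stage_id, command, workdir, exported_stages, emit):
    """Trigger the export at most once per download stage and report progress via `emit`."""
    if stage_id in exported_stages:
        return None  # already handled for this stage
    exported_stages.add(stage_id)
    if not command:
        emit("catalog_export_skipped")
        return "skipped"
    emit("catalog_export_started")
    try:
        result = subprocess.run(shlex.split(command), cwd=workdir or None,
                                capture_output=True, text=True)
        ok = result.returncode == 0
    except OSError:
        ok = False  # e.g. the executable does not exist
    emit("catalog_export_succeeded" if ok else "catalog_export_failed")
    return "succeeded" if ok else "failed"


# Usage: a harmless command stands in for catalog-export.sh.
events, exported = [], set()
cmd = f"{shlex.quote(sys.executable)} -c pass"
first = run_catalog_export(7, cmd, None, exported, events.append)
again = run_catalog_export(7, cmd, None, exported, events.append)  # same stage: no-op
```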

### Using `upload_all.sh` on the target machine

The object storage upload script lives at:

```text
/volume4/Music_Cloud/catalogsync/bin/upload_all.sh
```

It first runs `register-object-backend` once using the settings in `catalogsync.env`, and then runs `upload`. After changing the bucket, endpoint, or CDN base URL, you therefore do not need to register the backend again by hand.

Simplest usage:

```bash
bash /volume4/Music_Cloud/catalogsync/bin/upload_all.sh
```

To backfill only specific sources or playlists, append CLI arguments after the script:

```bash
bash /volume4/Music_Cloud/catalogsync/bin/upload_all.sh --sources netease,qq --limit 200
bash /volume4/Music_Cloud/catalogsync/bin/upload_all.sh --playlist-ids 12,15 --workers 6
```

This script also prefers `VENV_DIR/bin/python` and falls back to `PYTHON_BIN` only if the virtual environment does not exist.

The script depends on these env variables:

- `OBJECT_BACKEND_NAME`
- `OBJECT_BUCKET`
- `OBJECT_ENDPOINT`
- `OBJECT_REGION`
- `OBJECT_BASE_PREFIX`
- `OBJECT_ADDRESSING_STYLE`
- `OBJECT_PUBLIC_BASE_URL`
- `OBJECT_CREDENTIAL_ENV_PREFIX`
- `${OBJECT_CREDENTIAL_ENV_PREFIX}_ACCESS_KEY_ID`
- `${OBJECT_CREDENTIAL_ENV_PREFIX}_SECRET_ACCESS_KEY`
- `${OBJECT_CREDENTIAL_ENV_PREFIX}_SESSION_TOKEN`
- `UPLOAD_WORKERS`
- `UPLOAD_SOURCES`
- `UPLOAD_PLAYLIST_IDS`
- `UPLOAD_LIMIT`

Logs are written to:

```text
/volume4/Music_Cloud/catalogsync/logs/upload_all_YYYYMMDD_HHMMSS.log
```

### Using `serve_console.sh` on the target machine

The ops console script lives at:

```text
/volume4/Music_Cloud/catalogsync/bin/serve_console.sh
```

Example:

```bash
bash /volume4/Music_Cloud/catalogsync/bin/serve_console.sh
```

The script reads `DB_PATH`, `ENV_FILE`, `WEB_HOST`, and `WEB_PORT` from `catalogsync.env` and passes them through to `musicdl.catalogsync.cli serve`.

Single-instance protection:

- Lock directory: `/volume4/Music_Cloud/catalogsync/run/serve.lock`
- PID file: `/volume4/Music_Cloud/catalogsync/run/serve.pid`
- If a live instance already exists, the script exits with an error instead of starting a second one
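
The lock-directory + PID-file pattern can be sketched in Python. The script itself is shell, and `acquire_single_instance` is a hypothetical helper. `mkdir` is atomic, so only one process can create the lock directory. A leftover lock is taken over only when the recorded PID is no longer alive. The check uses `os.kill(pid, 0)`, which is only a harmless probe on POSIX. The stale-lock takeover is not race-free if two processes detect the same stale lock at once.

```python
import os
import tempfile


def pid_alive(pid):
    """POSIX-only liveness probe: signal 0 checks existence without delivering a signal."""
    if os.name != "posix":
        raise NotImplementedError("os.kill(pid, 0) is only a probe on POSIX")
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_single_instance(lock_dir, pid_file):
    """Return True if this process now owns the lock, False if a live instance holds it."""
    try:
        os.mkdir(lock_dir)  # atomic: only one process can create the directory
    except FileExistsError:
        try:
            with open(pid_file) as fh:
                old_pid = int(fh.read().strip())
        except (OSError, ValueError):
            old_pid = None
        if old_pid is not None and pid_alive(old_pid):
            return False  # a live instance exists: refuse to start a second one
        # otherwise the lock is stale (crashed instance): take it over
    with open(pid_file, "w") as fh:
        fh.write(str(os.getpid()))
    return True


run_dir = tempfile.mkdtemp()
lock_dir = os.path.join(run_dir, "serve.lock")
pid_file = os.path.join(run_dir, "serve.pid")
first = acquire_single_instance(lock_dir, pid_file)
second = acquire_single_instance(lock_dir, pid_file)  # recorded PID is alive: refused
```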

Logs are written to:

```text
/volume4/Music_Cloud/catalogsync/logs/serve_console_YYYYMMDD_HHMMSS.log
```

### NAS dependency installation note

This NAS's system Python is `Python 3.8`, and it lacks the native build toolchain that `nodejs-wheel-binaries` needs.

The current `catalogsync` pipeline (downloads, object storage upload, `netease/qq/kuwo`) does not depend on `nodejs-wheel`, so use `install_runtime.sh` as described above. It generates and installs a filtered `requirements.nas.txt` automatically, so there is no need to run `grep` by hand.
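
## `/playlists` Playlist Pool Management Page (selective download)

`/playlists` now serves as the playlist pool management page, built around the ops workflow "filter playlists -> select targets -> run a batch action".

Supported filter parameters:

- `platform`
- `pool_kind`
- `status`
- `keyword`
- `wanted_only`
- `page_size`

Rows on the current page can be selected individually, and the whole page can be selected or cleared at once.

Four batch actions are currently supported:

- Download selected synced playlists
- Sync, then download selected playlists
- Add to the wanted list
- Remove from the wanted list

Playlist status semantics:

- Not synced: the playlist has not finished syncing
- Not downloaded: synced, but songs are still waiting to be downloaded
- Downloading: a download job is in progress
- Partially downloaded: some songs are on disk, others are still unfinished
- Downloaded: every song in the playlist meets the "downloaded" definition

"Downloaded" definition: a `song_id` counts as downloaded as soon as an `active` `local_fs` file exists for it.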
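
A sketch of mapping aggregated counts to the five statuses. The function name, the English status labels, and the precedence are all assumptions: a running download is taken to outrank the other labels, and a synced playlist with zero songs is treated as "not downloaded". Here `downloaded_songs` is the count of songs with an active `local_fs` file.

```python
def playlist_status(synced, total_songs, downloaded_songs, running_songs):
    """Map aggregated counts to one of the five page statuses (labels are hypothetical)."""
    if not synced:
        return "not_synced"
    if running_songs > 0:
        return "downloading"
    if total_songs > 0 and downloaded_songs >= total_songs:
        return "downloaded"
    if downloaded_songs > 0:
        return "partially_downloaded"
    return "not_downloaded"
```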

Page actions still reuse the existing job system:

- Download selected synced playlists -> `download_only`
- Sync, then download selected playlists -> `sync_download`
- Both job types are scoped to the selection through `playlist_scope.playlist_ids`

## Operations Console Update

As of `2026-04-16`, the operations console behavior has changed in three important ways:

1. `musicdl-catalogsync serve` now starts the web console together with an embedded ops runner.
2. `/dashboard` now exposes a create-job form plus live job/download summary, active workers, and running items.
3. `/jobs/{id}` now exposes a command form for `pause`, `resume`, `cancel`, `retry_item`, and `force_retry_item`, together with worker and running-item detail.

Current job type to stage mapping:

- `catalog_sync`: `collect -> sync -> download`
- `collect_only`: `collect`
- `sync_only`: `sync`
- `sync_download`: `sync -> download`
- `download_only`: `download`
- `upload_only`: `upload`
- `download_upload`: `download -> upload`

Collector behavior update:

- playlist square collection now paginates for `netease` and `kuwo`
- `qq` playlist-square failures are isolated so other sources continue

This means the console is no longer read-only: creating a job from the dashboard should enqueue work that the embedded runner can execute without starting a second process.

As of `2026-04-17`, the deployed NAS console was verified again, and the following operational fixes are also part of the live behavior:

1. `/dashboard` now exposes `Quick Launch`, `Active Job`, `Running Songs`, and `Playlist Coverage`, and the `Active Job` / `Recent Jobs` blocks now provide direct `pause` / `resume` / `cancel` buttons, so the operator can both observe progress and control the current queue from one page.
2. `/jobs/{id}` now exposes direct action buttons for `pause`, `resume`, `cancel`, `retry_item`, and `force_retry_item` instead of relying only on a generic command dropdown.
3. Collect-stage workers now emit page-level progress text such as `page N: +X, total Y`, which makes it clear whether collection is advancing or stuck.

Collector and runtime hardening in this round:

- `QQCollector` playlist-square requests now send the required `Referer` and `Origin` headers, which restored non-zero QQ playlist-square collection on NAS.
- `netease` and `kuwo` playlist-square pagination now stops when the upstream explicitly reports `has_more = false` or when a page is entirely duplicate playlists, preventing long-running repeated-page loops.
- NAS runtime compatibility was extended for Python `3.8` by removing runtime-evaluated built-in generic aliases from the serve import path.
- SQLite connections now enable `busy_timeout` and `journal_mode=WAL`, which prevents the operations console from intermittently failing with `database is locked` while the embedded runner is writing progress.
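
A sketch of the pagination stop rule, with two fake upstreams to exercise it. `collect_all_pages`, the `fetch_page` contract, and the `max_pages` safety cap are hypothetical; the real collectors' request code is not shown in this document.

```python
def collect_all_pages(fetch_page, max_pages=500):
    """Paginate until has_more is False or a page adds no new playlist IDs.

    fetch_page(page) -> (items, has_more); each item is a dict with an "id" key.
    """
    seen = set()
    collected = []
    for page in range(1, max_pages + 1):
        items, has_more = fetch_page(page)
        new_items = []
        for item in items:
            if item["id"] not in seen:
                seen.add(item["id"])
                new_items.append(item)
        if not new_items:
            break  # empty or entirely duplicate page: the upstream is repeating itself
        collected.extend(new_items)
        if not has_more:
            break  # upstream explicitly reported the end
    return collected


def repeating_upstream(page):
    """Fake API that claims has_more forever but keeps returning page 3 after it."""
    page = min(page, 3)
    return [{"id": f"p{page}-{i}"} for i in range(2)], True


def finite_upstream(page):
    """Fake API with two pages that reports has_more=False on the last one."""
    return [{"id": f"q{page}-{i}"} for i in range(2)], page < 2


looped = collect_all_pages(repeating_upstream)  # stops on page 4 (all duplicates)
finite = collect_all_pages(finite_upstream)     # stops after page 2 (has_more=False)
```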

Observed NAS verification snapshot after redeploying these fixes:

- `GET http://192.168.5.43:18080/dashboard` returned `200 OK` with the new controls visible.
- Ten consecutive requests to `/api/dashboard` returned `200 OK` while `collect_only` job `3` was running.
- Total playlists on NAS grew from the earlier `811` baseline to `1441` during live verification.
- QQ playlists on NAS grew from `25` to `629+` during the same verification window, confirming that QQ playlist-square collection was no longer stuck at zero.

## 2026-04-17 NAS Restart Note

During the `2026-04-17` restart verification on NAS, the web console and the embedded runner did not recover equally:

- the web process restarted and continued serving `/dashboard`, `/jobs/{id}`, and `/api/dashboard`
- a stale duplicate `serve` process had to be removed manually before the NAS converged back to a single web instance
- after duplicate cleanup, the embedded runner still failed to advance queued work even though manual `OpsRepository` / `OpsRunner` recovery calls succeeded against the same database

Operational workaround used on NAS:

- web console kept running as `/volume4/Music_Cloud/catalogsync/app/.venv/bin/python -m musicdl.catalogsync.cli serve ...`
- a separate emergency runner process was started to execute `OpsRunner.run_forever()` against the same SQLite database
- verification after the workaround showed `job 5` resume correctly and `downloaded_songs` increase from `82` to `85`

Temporary NAS-only emergency runner details:

- PID: `17516`
- log: `/volume4/Music_Cloud/catalogsync/logs/ops_runner_20260417_101958.log`

Resolution on `2026-04-17 10:29`:

- `musicdl/catalogsync/ops/web.py` now supervises the embedded runner thread and automatically restarts it after transient exceptions instead of letting the web process continue without background execution
- local regression coverage now includes an embedded-runner recovery test that forces one loop failure and verifies that queued work is still completed after automatic restart
- NAS was redeployed with this fix and the temporary emergency runner was removed
- after restart, NAS converged back to a single live `serve` process on port `18080`
- the restarted web process recovered the interrupted download job back to `paused`, accepted a `resume` command, and then continued downloading without any standalone runner
- live verification on NAS showed `downloaded_songs` increase from `100` to `102` under the single embedded-runner setup
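
A generic sketch of the supervision pattern described above. `supervise`, `start_embedded_runner`, and `loop_once` are hypothetical names, not the code in `ops/web.py`. `loop_once` could be a single polling iteration, or the whole `OpsRunner.run_forever()` call. The usage mirrors the regression test: one forced failure, then the queued work completes.

```python
import threading
import time


def supervise(loop_once, stop_event, restart_delay=1.0, on_error=None):
    """Call loop_once() until stop_event is set, restarting after exceptions instead of exiting."""
    while not stop_event.is_set():
        try:
            loop_once()
        except Exception as exc:  # transient failure: report it, back off, then loop again
            if on_error is not None:
                on_error(exc)
            stop_event.wait(restart_delay)


def start_embedded_runner(loop_once, **kwargs):
    stop = threading.Event()
    thread = threading.Thread(target=supervise, args=(loop_once, stop), kwargs=kwargs,
                              name="embedded-ops-runner", daemon=True)
    thread.start()
    return thread, stop


# Usage: one forced failure, then the queued work completes after the automatic restart.
calls = {"n": 0}
done = threading.Event()
errors = []


def flaky_loop():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("simulated transient failure")
    done.set()        # queued work finished after the restart
    time.sleep(0.01)  # stand-in for the runner's poll interval


thread, stop = start_embedded_runner(flaky_loop, restart_delay=0.01, on_error=errors.append)
finished = done.wait(timeout=2.0)
stop.set()
thread.join(timeout=2.0)
```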

## 2026-04-17 Progress Visibility Update

- the playlists page now renders a `Progress` column with `downloaded / total`, a percentage bar, and the current running-song count
- the job detail page now renders a `Playlist Progress` table for playlist-scoped jobs
- job playlist progress is derived from playlist-song links, active local files, and download-stage job items of the current job
- songs that were already present locally before the job started still count as completed progress for that playlist
- empty boolean-like filters such as `/playlists?wanted_only=` and `/api/playlists?wanted_only=` are accepted and treated as `false`
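
A sketch of the lenient boolean parsing for filters like `wanted_only`. The accepted spellings and the name `parse_bool_filter` are assumptions; the point is that a missing or empty value maps to `False` instead of failing validation.

```python
TRUE_VALUES = {"1", "true", "yes", "on"}
FALSE_VALUES = {"0", "false", "no", "off"}


def parse_bool_filter(raw):
    """Lenient boolean query parameter: missing or empty means False."""
    if raw is None or raw.strip() == "":
        return False
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean value: {raw!r}")
```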

## 2026-04-17 Non-Music Skip + Task Center Tree

- download stage now classifies QQ toplist fallback entries (`remote_song_id` starts with `qqtop_` or metadata marks `qq_toplist_fallback`) as `skipped` instead of `failed`
- skipped toplist entries are annotated with `非音乐资源(有声榜条目)` ("non-music resource (audio chart entry)")
- new API: `GET /api/jobs/{job_id}/playlists/{playlist_id}/songs` returns per-song progress rows for one playlist inside one job
- dashboard Task Center removed the old `Open` jump link and keeps operations inline
- task detail now supports hierarchical expansion:
  - task -> playlist progress rows
  - playlist row -> lazy-loaded song progress rows
  - song rows explicitly show the `非音乐资源` ("non-music resource") tag when matched
|
||
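A minimal sketch of the skip rule, assuming each item exposes its `remote_song_id` and a metadata dict in which a truthy `qq_toplist_fallback` key marks the fallback entry (the exact metadata shape is an assumption). `classify_non_music` is a hypothetical helper: it returns the `skipped` status plus the annotation, or `None` when the item should go through the normal download path.

```python
from typing import Any, Mapping, Optional, Tuple

# Annotation text taken from the changelog above.
NON_MUSIC_NOTE = "非音乐资源(有声榜条目)"


def is_qq_toplist_fallback(remote_song_id: str, metadata: Optional[Mapping[str, Any]]) -> bool:
    """True for QQ toplist fallback entries, which are not downloadable music."""
    if remote_song_id.startswith("qqtop_"):
        return True
    # Assumed metadata shape: {"qq_toplist_fallback": True}
    return bool((metadata or {}).get("qq_toplist_fallback"))


def classify_non_music(
    remote_song_id: str, metadata: Optional[Mapping[str, Any]]
) -> Optional[Tuple[str, str]]:
    """Return ("skipped", note) for non-music entries, else None (download normally)."""
    if is_qq_toplist_fallback(remote_song_id, metadata):
        return ("skipped", NON_MUSIC_NOTE)
    return None
```

Checking this before any download attempt means such entries never reach the resolver, so they cannot inflate the failure count.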
## 2026-04-17 Stable Task Tree Refresh

- the dashboard `Task Center` no longer renders the embedded `Summary / Stages / Workers / Running Items` detail tables
- the dashboard now presents one stable tree:
  - task
  - playlist
  - song
- task lifecycle transitions such as `paused`, `completed`, `completed_with_errors`, and `canceled` keep the same task node visible in Task Center instead of making the row disappear immediately
- live refresh updates task nodes in place, so expanded tasks and expanded playlists can remain open across refresh cycles

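The in-place refresh can be illustrated with a merge that updates server-provided fields while leaving UI-only state alone. The dashboard presumably does this in the browser; this Python sketch only models the rule. The node layout (`job_id`, `expanded`, `expanded_playlists`) and the function name `merge_task_nodes` are hypothetical, and nodes absent from a refresh are kept rather than dropped, mirroring the "do not make the row disappear immediately" behavior.

```python
from typing import Any, Dict, Iterable

UI_KEYS = ("expanded", "expanded_playlists")  # client-side state, never sent by the server


def merge_task_nodes(
    tree: Dict[int, Dict[str, Any]], refreshed: Iterable[Dict[str, Any]]
) -> Dict[int, Dict[str, Any]]:
    """Apply one refresh cycle to the task tree in place."""
    for row in refreshed:
        job_id = row["job_id"]
        node = tree.get(job_id)
        if node is None:
            # First time this task is seen: start collapsed.
            tree[job_id] = {**row, "expanded": False, "expanded_playlists": set()}
            continue
        # Copy server fields (status, progress, ...) but never overwrite UI state.
        node.update({k: v for k, v in row.items() if k not in UI_KEYS})
    # Tasks missing from this refresh stay in the tree (e.g. just paused or canceled).
    return tree
```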
## 2026-04-18 Dashboard Maintenance: Local Duplicate Scan / Dedupe

- `Dashboard` now includes a `Maintenance` card for local duplicate inspection.
- `Scan Duplicate Local Copies` calls `GET /api/maintenance/local-duplicates`.
- `Run Local Dedupe` calls `POST /api/maintenance/local-duplicates/dedupe`.
- The scan groups active local duplicate rows by `(file_asset_id, backend_id)`.
- Keep-rule priority, applied in order as tie-breakers:
  1. a file that still exists on disk wins
  2. a canonical locator (without a `(1)` / `(2)` copy suffix) wins
  3. a shorter locator wins
  4. a smaller `file_locations.id` wins
- Dedupe execution updates references before inactivating duplicates:
  - repoint `upload_tasks.source_location_id`
  - repoint `job_items.file_location_id`
  - mark duplicate `file_locations.status = 'inactive'`
  - delete duplicate local files that still exist on disk
  - refresh `song_backend_presence`
- Safety guard:
  - dedupe is rejected with `409` while any row has `job_runs.status = 'running'` or `job_items.status = 'running'`
  - this avoids colliding with active download / upload execution
- The dashboard renders results inline without navigating away from the page.

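The keep rule maps naturally onto a tuple sort key, one element per tie-breaker. This is a sketch under stated assumptions: rows are plain dicts with hypothetical keys `id`, `file_asset_id`, `backend_id`, `locator`, and `exists` (whether the file is on disk), the copy-suffix regex is a guess at what "non-`(1)` / non-`(2)`" means, and all function names are illustrative. The guard query only relies on the `status` columns named above.

```python
import re
import sqlite3
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]

# Assumed copy-suffix pattern, e.g. "Song (1).flac"; the real rule may differ.
_COPY_SUFFIX = re.compile(r"\s*\((?:1|2)\)$")


def group_duplicates(rows: Iterable[Row]) -> Dict[Tuple[int, int], List[Row]]:
    """Group active local rows by (file_asset_id, backend_id); keep groups of 2+."""
    groups: Dict[Tuple[int, int], List[Row]] = defaultdict(list)
    for row in rows:
        groups[(row["file_asset_id"], row["backend_id"])].append(row)
    return {key: members for key, members in groups.items() if len(members) > 1}


def keep_sort_key(row: Row) -> Tuple[bool, bool, int, int]:
    """Lower sorts first; each element is one tie-breaker of the keep rule."""
    is_copy = bool(_COPY_SUFFIX.search(PurePosixPath(row["locator"]).stem))
    return (
        not row["exists"],    # 1. existing file wins
        is_copy,              # 2. canonical (non-copy) locator wins
        len(row["locator"]),  # 3. shorter locator wins
        row["id"],            # 4. smaller file_locations.id wins
    )


def choose_keeper(group: List[Row]) -> Tuple[Row, List[Row]]:
    """Return (row to keep, rows to repoint and inactivate)."""
    ordered = sorted(group, key=keep_sort_key)
    return ordered[0], ordered[1:]


def has_running_work(conn: sqlite3.Connection) -> bool:
    """Guard behind the 409: any running job run or job item blocks dedupe."""
    (busy,) = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM job_runs WHERE status = 'running')"
        " OR EXISTS(SELECT 1 FROM job_items WHERE status = 'running')"
    ).fetchone()
    return bool(busy)
```

Because Python compares tuples element by element, a later rule only decides between rows that tie on every earlier rule, which is exactly the priority order listed above.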
## 2026-04-18 Playlist Export Pipeline Update

- `playlists/` directory generation is no longer triggered by `sync`.
- `CatalogSyncService.sync_playlist_row()` now only handles playlist-song linking and play-count backfill.
- Playlist export artifacts are refreshed from the download side for scoped playlist jobs:
  - `download_only`
  - `sync_download`
- The runner refreshes export folders when an individual scoped playlist finishes downloading, instead of waiting for the whole download job to finish.
- On runner restart / recovery, scoped download stages also backfill export folders for playlists whose items were already completed before the restart.
- Stage-final export refresh is still kept as the last safety net, including the `0`-pending-items case where all files already existed locally.
- Existing single-playlist export remains available:
  - `GET /api/playlists/{playlist_id}/export-folder`
  - it refreshes the folder from the current database state only
  - it does not auto-download missing songs
- New bulk export API:
  - `POST /api/playlists/export`
  - routes the selected playlists by their current state:
    - `downloaded` -> export immediately
    - `unsynced` -> create a `sync_download` job
    - `not_downloaded` / `partial` / `downloading` -> create a `download_only` job
- The Playlists page adds `Export Selected Playlists`:
  - already-downloaded playlists can be exported without re-downloading songs
  - not-yet-synced or not-yet-downloaded playlists are queued into the appropriate job automatically

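Both routing decisions can be sketched as pure functions. Assumptions: playlist states are the strings listed above; `finished_playlists` treats `completed`, `skipped`, and `failed` as terminal item statuses (a guess); and a scoped playlist with no pending items counts as finished, which matches the `0`-pending-items case. Function names and return shapes are hypothetical.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Iterable, List, Tuple

TERMINAL_STATUSES = {"completed", "skipped", "failed"}  # assumed terminal item states
DOWNLOAD_ONLY_STATES = {"not_downloaded", "partial", "downloading"}


def route_export(selection: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Route (playlist_id, state) pairs the way POST /api/playlists/export is described."""
    routes: Dict[str, List[int]] = {"export_now": [], "sync_download": [], "download_only": []}
    for playlist_id, state in selection:
        if state == "downloaded":
            routes["export_now"].append(playlist_id)
        elif state == "unsynced":
            routes["sync_download"].append(playlist_id)
        elif state in DOWNLOAD_ONLY_STATES:
            routes["download_only"].append(playlist_id)
        else:
            raise ValueError(f"unknown playlist state: {state!r}")
    return routes


def finished_playlists(
    scoped_ids: Iterable[int],
    item_rows: Iterable[Tuple[int, str]],
    already_exported: AbstractSet[int],
) -> List[int]:
    """Scoped playlists whose download items are all terminal and not yet exported."""
    statuses: Dict[int, List[str]] = defaultdict(list)
    for playlist_id, status in item_rows:
        statuses[playlist_id].append(status)
    return [
        pid
        for pid in scoped_ids
        if pid not in already_exported
        # all([]) is True, so a playlist with zero pending items is finished too.
        and all(s in TERMINAL_STATUSES for s in statuses.get(pid, []))
    ]
```

Calling `finished_playlists` after each item completes, and once more on recovery with `already_exported` loaded from disk, would cover the per-playlist trigger, the restart backfill, and the stage-final safety net with one rule.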
## 2026-04-19 Local ZIP Export + Adaptive Download

- The Playlists page no longer shows a standalone `Sync Then Download` button.
- `Download Selected Playlists` is now adaptive:
  - `unsynced` playlists are routed to `sync_download`
  - already-synced but incomplete playlists are routed to `download_only`
  - mixed selections may create both a `download_job` and a `sync_download_job`
  - already-downloaded playlists can be skipped without forcing a re-download
- Export now means a browser download to the operator's local machine:
  - the modal `Export` action downloads `GET /api/playlists/{playlist_id}/export.zip`
  - the list `Export Selected` action calls `POST /api/playlists/export-zip`
  - when every selected playlist is ready, the API returns `status=ready` plus a `download_url`
  - when any selected playlist is not ready, the API returns `status=queued` plus job details instead of a partial ZIP
- Prepared bundle downloads are served by:
  - `GET /api/exports/bundles/{bundle_name}.zip`
- `GET /api/playlists/{playlist_id}/export-folder` remains available as an internal, server-side folder refresh / inspection endpoint, but it is no longer the user-facing export action.

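A sketch of the adaptive routing and the all-or-nothing ZIP response. Assumptions: playlist states reuse the strings from the bulk export API above, already-downloaded playlists are simply skipped, and the `jobs` field name, the plan keys, and the way `bundle_name` is produced are hypothetical; only `status`, `download_url`, and the bundle URL pattern come from the notes above.

```python
from typing import Any, Dict, Iterable, List, Tuple


def plan_download(selection: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Split a 'Download Selected Playlists' request by playlist state."""
    plan: Dict[str, List[int]] = {"sync_download_job": [], "download_job": [], "skipped": []}
    for playlist_id, state in selection:
        if state == "unsynced":
            plan["sync_download_job"].append(playlist_id)
        elif state == "downloaded":
            plan["skipped"].append(playlist_id)  # nothing left to fetch
        else:
            plan["download_job"].append(playlist_id)  # synced but incomplete
    return plan


def export_zip_response(selection: List[Tuple[int, str]], bundle_name: str) -> Dict[str, Any]:
    """All-or-nothing: a ZIP URL only when every selected playlist is downloaded."""
    not_ready = [(pid, state) for pid, state in selection if state != "downloaded"]
    if not not_ready:
        return {"status": "ready", "download_url": f"/api/exports/bundles/{bundle_name}.zip"}
    # Never build a partial ZIP; queue the missing work instead.
    return {"status": "queued", "jobs": plan_download(not_ready)}
```

Returning `queued` instead of a partial ZIP means the operator never receives a bundle that silently omits songs; re-running the export after the jobs finish yields `ready`.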