xiaoming/musicdl-catalog-sync-suite

xiaoming 069af30dba Initial import: Music_Server, MusicFree, catalog-sync

2026-05-23 16:51:14 +08:00


Catalogsync Operations Console Design

Goal

Extend musicdl.catalogsync with a NAS-local web operations console that can:

manage queue-based pipeline jobs for collect, sync, download, and upload
show playlist pool and playlist execution status as 未完成 (incomplete) / 进行中 (in progress) / 已完成 (completed) / 异常 (error)
show worker-level live processing state, especially which song each worker is handling
support global soft pause and resume across all active workers
survive process crashes or NAS restarts without restarting the whole catalog from scratch
allow retrying a single failed or interrupted song/item instead of rerunning the whole database
manage catalogsync.env as the primary operator configuration source

This design targets an internal NAS console, not a public-facing multi-user product.

Scope

In Scope

Add a NAS-local web console for catalogsync
Add a database-backed job queue with exactly one active job at a time
Support these job templates:
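- 全链路 (full pipeline)
- 仅采集 (collect only)
- 仅同步 (sync only)
- 同步+下载 (sync + download)
- 仅下载 (download only)
- 仅上传 (upload only)
- 下载+上传 (download + upload)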
Track job, stage, item, and worker state in SQLite
Show dashboard, queue, playlist pool, worker, log, and config views
Implement soft pause and resume
Implement crash-safe recovery at job-item granularity
Implement single-item retry and force-retry
Version and edit catalogsync.env from the web console
Reuse existing musicdl.catalogsync collectors, services, downloader, uploader, and storage model as much as possible

Out of Scope

Multi-user login or permissions
Public internet exposure or hardened auth
Multiple active jobs running at the same time
Cross-machine worker distribution
Arbitrary user-defined stage graphs
Provider-specific cloud drive management beyond current object storage support
Automatic deletion of local or remote files
Editing business data such as songs or playlists directly from the UI

Constraints

The console runs on the NAS itself
catalogsync.env remains the configuration source of truth
A queued job must freeze the required runtime settings into a config snapshot so later env edits do not mutate in-flight work
Recovery must resume from unfinished work items instead of rerunning all songs or all playlists
Existing musicdl.catalogsync CLI and scripts must remain usable
The first version should optimize for operational stability, inspectability, and recoverability over architectural purity

Operator Model

Deployment Model

The web console runs on the same NAS host that already owns:

the SQLite database
the local music library
the logs directory
the runtime scripts
the object storage configuration

This avoids a remote-control architecture for v1 and keeps job control, log access, file state, and recovery local.

Configuration Model

catalogsync.env remains the operator-managed source of truth.

The console may:

display current env values
validate and save new env revisions
apply a previous env revision as the current file

Queued jobs must store a config_snapshot_json copy of the relevant settings so that:

existing queued or running jobs stay deterministic
later env edits only affect newly created jobs
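The snapshot step can be sketched as follows. This is a minimal sketch that assumes catalogsync.env uses simple `KEY=VALUE` lines. The key names in `RELEVANT_KEYS` are hypothetical placeholders, and the real file may use different names.

```python
import json

# Hypothetical keys; the real catalogsync.env may use different names.
RELEVANT_KEYS = (
    "CATALOGSYNC_LIBRARY_ROOT",
    "CATALOGSYNC_DOWNLOAD_WORKERS",
    "CATALOGSYNC_UPLOAD_WORKERS",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def build_config_snapshot(env_text: str) -> str:
    """Freeze only the relevant settings into the JSON stored in job_runs.config_snapshot_json."""
    env = parse_env_text(env_text)
    snapshot = {key: env[key] for key in RELEVANT_KEYS if key in env}
    return json.dumps(snapshot, sort_keys=True)
```

The stored string is never re-derived from the env file. A later edit therefore only changes what `build_config_snapshot` returns for jobs created after the edit.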

Recommended Architecture

Use four layers:

Web Console
- browser UI for dashboards, queue control, logs, and config management
Management API
- serves data and accepts job or config commands
Job Orchestrator / Runner
- single-process scheduler that owns queue progression, pause, resume, and recovery
Existing Catalogsync Executors
- reuse collect, sync, download, and upload behavior from current package modules

Why Not A Thin Shell Wrapper

Wrapping only download_all.sh and upload_all.sh would not reliably provide:

worker-level current song visibility
item-level retry
fine-grained recovery after process crashes
stable soft pause and resume

The console therefore needs first-class job and work-item tables instead of depending only on raw shell output.

Job Model

Active Job Policy

only one job may be running at a time
additional jobs stay queued
a paused job may later resume and reclaim the active slot

This keeps:

pause and resume semantics simple
resource ownership clear
crash recovery easier to reason about
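The single-active-slot rule can be sketched as an admission check against `job_runs`. Two points here are interpretations rather than statements of the design:

- Only `running` and `pause_requested` jobs hold the slot. A fully paused job releases it and must reclaim it through a resume command.
- A lower `priority` number runs first.

```python
import sqlite3
from typing import Optional

# Assumption: these statuses occupy the single active slot; "paused" releases it.
SLOT_HOLDING = ("running", "pause_requested")


def pick_next_job(conn: sqlite3.Connection) -> Optional[int]:
    """Return the id of the next queued job to start, or None if the slot is busy or the queue is empty."""
    busy = conn.execute(
        "SELECT 1 FROM job_runs WHERE status IN (?, ?) LIMIT 1", SLOT_HOLDING
    ).fetchone()
    if busy:
        return None
    # Assumption: a lower priority number runs first; ties fall back to FIFO by id.
    row = conn.execute(
        "SELECT id FROM job_runs WHERE status = 'queued' "
        "ORDER BY priority ASC, id ASC LIMIT 1"
    ).fetchone()
    return row[0] if row else None
```

Paused jobs are never picked automatically, so they only continue after an explicit resume.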

Job Templates

Supported templates and stage chains:

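全链路 (full pipeline)
- collect -> sync -> download -> upload
仅采集 (collect only)
- collect
仅同步 (sync only)
- sync
同步+下载 (sync + download)
- sync -> download
仅下载 (download only)
- download
仅上传 (upload only)
- upload
下载+上传 (download + upload)
- download -> upload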
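The template list maps directly onto ordered `job_stages` rows. This minimal sketch makes two assumptions: `job_runs.job_type` stores the template label exactly as listed, and `seq_no` starts at 1.

```python
# Stage chains copied from the template list; job_type is assumed to store these labels.
JOB_TEMPLATES: dict[str, list[str]] = {
    "全链路": ["collect", "sync", "download", "upload"],
    "仅采集": ["collect"],
    "仅同步": ["sync"],
    "同步+下载": ["sync", "download"],
    "仅下载": ["download"],
    "仅上传": ["upload"],
    "下载+上传": ["download", "upload"],
}


def stage_rows(job_run_id: int, template: str) -> list[tuple[int, str, str, int]]:
    """Build (job_run_id, stage_type, status, seq_no) rows for job_stages."""
    try:
        chain = JOB_TEMPLATES[template]
    except KeyError:
        raise ValueError(f"unsupported job template: {template}") from None
    return [(job_run_id, stage, "pending", seq) for seq, stage in enumerate(chain, start=1)]
```

Keeping templates as fixed data, rather than user-defined graphs, matches the out-of-scope rule on arbitrary stage graphs.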

Job Status

Recommended job statuses:

queued
running
pause_requested
paused
completed
completed_with_errors
failed
canceled
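The design lists job statuses but not the allowed moves between them. Below is one plausible transition table, inferred from the soft pause, resume, crash recovery, and disk-full rules later in this document. Two edges are assumptions:

- a direct `running -> paused` edge, used by crash recovery and disk-full pauses;
- cancel from any non-terminal state.

```python
# One plausible job status transition table; the design does not spell this out.
JOB_TRANSITIONS: dict[str, frozenset[str]] = {
    "queued": frozenset({"running", "canceled"}),
    # "paused" directly from "running": crash recovery or a disk_full pause.
    "running": frozenset(
        {"pause_requested", "paused", "completed", "completed_with_errors", "failed", "canceled"}
    ),
    "pause_requested": frozenset({"paused", "canceled"}),
    "paused": frozenset({"running", "canceled"}),
    # Terminal states accept no further transitions.
    "completed": frozenset(),
    "completed_with_errors": frozenset(),
    "failed": frozenset(),
    "canceled": frozenset(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if moving a job from `current` to `target` is allowed by the table."""
    return target in JOB_TRANSITIONS.get(current, frozenset())
```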

Stage Status

Recommended stage statuses:

pending
running
pause_requested
paused
completed
failed
skipped

Work Item Status

Recommended work item statuses:

pending
running
succeeded
failed
interrupted
skipped

Data Model

Existing Table Reuse

Keep current business tables as the catalog truth:

playlist_pools
playlists
pool_playlists
songs
playlist_songs
artists
song_artists
file_locations
object_storage_backends

These continue to answer:

what playlists exist
what songs belong to each playlist
which files exist locally or remotely

The new console layer adds execution truth around them.

New Table: `job_runs`

Purpose:

represent one queued or active operator job

Recommended fields:

id INTEGER PRIMARY KEY AUTOINCREMENT
job_type TEXT NOT NULL
status TEXT NOT NULL
priority INTEGER NOT NULL DEFAULT 100
requested_by TEXT
config_snapshot_json TEXT NOT NULL
sources TEXT
download_sources TEXT
playlist_scope_json TEXT
created_at TEXT DEFAULT CURRENT_TIMESTAMP
started_at TEXT
ended_at TEXT
last_error TEXT
resume_token TEXT

New Table: `job_stages`

Purpose:

track the stage-level execution status inside one job

Recommended fields:

id INTEGER PRIMARY KEY AUTOINCREMENT
job_run_id INTEGER NOT NULL
stage_type TEXT NOT NULL
status TEXT NOT NULL DEFAULT 'pending'
seq_no INTEGER NOT NULL
total_items INTEGER NOT NULL DEFAULT 0
pending_items INTEGER NOT NULL DEFAULT 0
running_items INTEGER NOT NULL DEFAULT 0
success_items INTEGER NOT NULL DEFAULT 0
failed_items INTEGER NOT NULL DEFAULT 0
skipped_items INTEGER NOT NULL DEFAULT 0
started_at TEXT
ended_at TEXT
last_error TEXT

New Table: `job_items`

Purpose:

track the real execution unit for recovery and retry

Granularity by stage:

collect
- one pool/source fetch unit
sync
- one playlist expansion unit
download
- one song download unit
upload
- one file upload unit
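Deterministic keys are what let the `UNIQUE(job_stage_id, item_key)` constraint below deduplicate items when a stage is re-seeded. This is a sketch with hypothetical key formats; any stable, collision-free scheme per stage would work equally well.

```python
from typing import Optional


def make_item_key(
    stage_type: str,
    *,
    pool_id: Optional[int] = None,
    source: Optional[str] = None,
    playlist_id: Optional[int] = None,
    song_id: Optional[int] = None,
    file_location_id: Optional[int] = None,
) -> str:
    """Build a deterministic item_key from the identifiers that define one work unit."""
    # Hypothetical formats: "collect:<pool>:<source>", "sync:<playlist>",
    # "download:<song>", "upload:<file_location>".
    parts = {
        "collect": (pool_id, source),
        "sync": (playlist_id,),
        "download": (song_id,),
        "upload": (file_location_id,),
    }
    if stage_type not in parts:
        raise ValueError(f"unknown stage: {stage_type}")
    values = parts[stage_type]
    if any(v is None for v in values):
        raise ValueError(f"missing identifiers for {stage_type} item")
    return ":".join([stage_type, *(str(v) for v in values)])
```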

Recommended fields:

id INTEGER PRIMARY KEY AUTOINCREMENT
job_stage_id INTEGER NOT NULL
item_type TEXT NOT NULL
item_key TEXT NOT NULL
playlist_pool_id INTEGER
playlist_id INTEGER
song_id INTEGER
file_location_id INTEGER
status TEXT NOT NULL DEFAULT 'pending'
attempt_count INTEGER NOT NULL DEFAULT 0
max_attempts INTEGER NOT NULL DEFAULT 3
worker_id INTEGER
started_at TEXT
ended_at TEXT
last_error TEXT
last_error_code TEXT
payload_json TEXT
UNIQUE(job_stage_id, item_key)
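The `UNIQUE(job_stage_id, item_key)` constraint makes stage seeding safe to repeat after a crash: re-running the seed step cannot duplicate or reset existing items. Below is a SQLite sketch of the table and an idempotent seed. The `song_download` value for `item_type` and the `download:<song_id>` key format are hypothetical.

```python
import sqlite3

JOB_ITEMS_DDL = """
CREATE TABLE IF NOT EXISTS job_items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_stage_id INTEGER NOT NULL,
    item_type TEXT NOT NULL,
    item_key TEXT NOT NULL,
    playlist_pool_id INTEGER,
    playlist_id INTEGER,
    song_id INTEGER,
    file_location_id INTEGER,
    status TEXT NOT NULL DEFAULT 'pending',
    attempt_count INTEGER NOT NULL DEFAULT 0,
    max_attempts INTEGER NOT NULL DEFAULT 3,
    worker_id INTEGER,
    started_at TEXT,
    ended_at TEXT,
    last_error TEXT,
    last_error_code TEXT,
    payload_json TEXT,
    UNIQUE(job_stage_id, item_key)
)
"""


def seed_download_items(conn: sqlite3.Connection, job_stage_id: int, song_ids: list[int]) -> int:
    """Insert one pending download item per song; rows whose key already exists are left untouched."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO job_items (job_stage_id, item_type, item_key, song_id) "
        "VALUES (?, 'song_download', ?, ?)",
        [(job_stage_id, f"download:{sid}", sid) for sid in song_ids],
    )
    conn.commit()
    return conn.total_changes - before  # number of newly created items
```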

New Table: `job_workers`

Purpose:

surface live worker state to the UI
show which song each worker is processing

Recommended fields:

id INTEGER PRIMARY KEY AUTOINCREMENT
job_run_id INTEGER NOT NULL
job_stage_id INTEGER
worker_name TEXT NOT NULL
status TEXT NOT NULL DEFAULT 'idle'
current_job_item_id INTEGER
current_song_id INTEGER
current_playlist_id INTEGER
current_display_text TEXT
heartbeat_at TEXT
last_progress_text TEXT
processed_count INTEGER NOT NULL DEFAULT 0
error_count INTEGER NOT NULL DEFAULT 0
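Worker visibility depends on fresh heartbeats. Below is a sketch of a heartbeat update and a staleness check. It assumes heartbeat_at holds ISO-8601 UTC strings. The `busy` status and the 60-second threshold are hypothetical choices.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(seconds=60)  # hypothetical threshold


def heartbeat(
    conn: sqlite3.Connection,
    worker_id: int,
    job_item_id: int,
    song_id: Optional[int],
    display_text: str,
    now: Optional[datetime] = None,
) -> None:
    """Record what the worker is processing right now, for the UI."""
    now = now or datetime.now(timezone.utc)
    conn.execute(
        "UPDATE job_workers SET status = 'busy', current_job_item_id = ?, "
        "current_song_id = ?, current_display_text = ?, heartbeat_at = ? WHERE id = ?",
        (job_item_id, song_id, display_text, now.isoformat(), worker_id),
    )
    conn.commit()


def stale_workers(conn: sqlite3.Connection, now: Optional[datetime] = None) -> list[str]:
    """Return names of non-idle workers whose last heartbeat is older than STALE_AFTER."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for name, beat in conn.execute(
        "SELECT worker_name, heartbeat_at FROM job_workers WHERE status != 'idle'"
    ):
        if beat is None or now - datetime.fromisoformat(beat) > STALE_AFTER:
            stale.append(name)
    return stale
```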

New Table: `job_commands`

Purpose:

safely bridge UI actions and runner behavior

Recommended command types:

pause
resume
cancel
retry_item
force_retry_item

Recommended fields:

id INTEGER PRIMARY KEY AUTOINCREMENT
job_run_id INTEGER NOT NULL
command_type TEXT NOT NULL
target_item_id INTEGER
status TEXT NOT NULL DEFAULT 'pending'
created_at TEXT DEFAULT CURRENT_TIMESTAMP
applied_at TEXT
payload_json TEXT
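Because the UI only inserts command rows and the runner applies them, command handling can stay a simple poll-and-apply loop. This sketch assumes the runner polls between item claims. The handler signature and the `rejected` status for unknown command types are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical handler signature: receives target_item_id and the raw payload_json.
Handler = Callable[[Optional[int], Optional[str]], None]


def apply_pending_commands(
    conn: sqlite3.Connection, job_run_id: int, handlers: dict[str, Handler]
) -> list[tuple[str, str]]:
    """Apply pending commands for one job in insertion order and record each outcome."""
    rows = conn.execute(
        "SELECT id, command_type, target_item_id, payload_json FROM job_commands "
        "WHERE job_run_id = ? AND status = 'pending' ORDER BY id",
        (job_run_id,),
    ).fetchall()
    outcomes: list[tuple[str, str]] = []
    for command_id, command_type, target_item_id, payload_json in rows:
        handler = handlers.get(command_type)
        if handler is not None:
            handler(target_item_id, payload_json)
            status = "applied"
        else:
            status = "rejected"  # hypothetical status for unknown command types
        conn.execute(
            "UPDATE job_commands SET status = ?, applied_at = ? WHERE id = ?",
            (status, datetime.now(timezone.utc).isoformat(), command_id),
        )
        outcomes.append((command_type, status))
    conn.commit()
    return outcomes
```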

New Table: `job_events`

Purpose:

structured audit trail for major runner events

Recommended event types include:

job_started
stage_started
item_started
item_failed
pause_requested
resumed
worker_heartbeat
recovery_requeued

New Table: `job_logs`

Purpose:

queryable log lines for the UI

Recommended fields:

id INTEGER PRIMARY KEY AUTOINCREMENT
job_run_id INTEGER NOT NULL
job_stage_id INTEGER
worker_id INTEGER
level TEXT NOT NULL
message TEXT NOT NULL
created_at TEXT DEFAULT CURRENT_TIMESTAMP
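One low-friction way to populate `job_logs` is a standard `logging.Handler` attached to the runner's loggers. The existing catalogsync modules can then keep logging as usual. A sketch under that assumption; the class name `JobLogHandler` is hypothetical.

```python
import logging
import sqlite3
from typing import Optional


class JobLogHandler(logging.Handler):
    """Mirror log records into job_logs so the UI can query them."""

    def __init__(
        self,
        conn: sqlite3.Connection,
        job_run_id: int,
        job_stage_id: Optional[int] = None,
        worker_id: Optional[int] = None,
    ) -> None:
        super().__init__()
        self.conn = conn
        self.job_run_id = job_run_id
        self.job_stage_id = job_stage_id
        self.worker_id = worker_id

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.conn.execute(
                "INSERT INTO job_logs (job_run_id, job_stage_id, worker_id, level, message) "
                "VALUES (?, ?, ?, ?, ?)",
                (self.job_run_id, self.job_stage_id, self.worker_id,
                 record.levelname, self.format(record)),
            )
            self.conn.commit()
        except Exception:
            self.handleError(record)  # never let logging failures crash a worker
```

By default a `sqlite3` connection may only be used from the thread that created it. Each worker thread would therefore need its own connection and handler instance.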

New Table: `config_revisions`

Purpose:

keep revision history of catalogsync.env

Recommended fields:

id INTEGER PRIMARY KEY AUTOINCREMENT
source_type TEXT NOT NULL DEFAULT 'env_file'
file_path TEXT NOT NULL
content_text TEXT NOT NULL
content_hash TEXT NOT NULL
created_at TEXT DEFAULT CURRENT_TIMESTAMP
applied_at TEXT
note TEXT
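The revision flow can be sketched with a SHA-256 `content_hash` and an atomic file swap. Two choices here are assumptions, not requirements of the design:

- A save that matches the latest revision for the same file is skipped.
- Applying a revision writes a temporary file in the same directory and then uses `os.replace`, so the env file is never left half-written.

```python
import hashlib
import os
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional


def save_revision(conn: sqlite3.Connection, file_path: str, content: str, note: Optional[str] = None) -> int:
    """Store a new revision unless it is identical to the latest one for this file."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    latest = conn.execute(
        "SELECT id, content_hash FROM config_revisions WHERE file_path = ? ORDER BY id DESC LIMIT 1",
        (file_path,),
    ).fetchone()
    if latest is not None and latest[1] == digest:
        return latest[0]  # unchanged since the last save
    cur = conn.execute(
        "INSERT INTO config_revisions (file_path, content_text, content_hash, note) VALUES (?, ?, ?, ?)",
        (file_path, content, digest, note),
    )
    conn.commit()
    return cur.lastrowid


def apply_revision(conn: sqlite3.Connection, revision_id: int) -> None:
    """Write a stored revision back to disk atomically and mark it applied."""
    file_path, content = conn.execute(
        "SELECT file_path, content_text FROM config_revisions WHERE id = ?", (revision_id,)
    ).fetchone()
    target = Path(file_path)
    fd, tmp = tempfile.mkstemp(dir=str(target.parent), prefix=".catalogsync.env.")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(content)
    os.replace(tmp, target)  # atomic rename on the same filesystem
    conn.execute("UPDATE config_revisions SET applied_at = CURRENT_TIMESTAMP WHERE id = ?", (revision_id,))
    conn.commit()
```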

UI Design

Page 1: Dashboard

Show:

current active job
queue length
downloaded song count
uploaded file count
failed item count
per-stage summaries
recent exceptions
worker heartbeat overview

Page 2: Job Center

Show:

queued jobs
running or paused job
job template
scope
stage progression
pause, resume, cancel controls

Allow:

creating a new job from the supported templates
changing the priority of queued jobs (optional)

Page 3: Playlist Pools

Show:

all playlist pools and playlists
source platform
pool kind
song count
downloaded count
uploaded count
main status
current stage
last processed time
latest error summary

Derived Playlist Status Rules

Recommend deriving the main status as:

异常 (error)
- any recent failed item exists for the playlist
进行中 (in progress)
- any running or pause-requested item exists
未完成 (incomplete)
- unfinished items remain but the playlist is not actively processing
已完成 (completed)
- no unfinished item remains in the relevant pipeline scope
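The rules are evaluated in priority order, so a playlist with both a failed and a running item shows 异常. The sketch below assumes the caller has already filtered item statuses to the relevant pipeline scope and the "recent" window. It also treats `pending` and `interrupted` as the unfinished, idle statuses.

```python
from collections import Counter
from typing import Iterable


def derive_playlist_status(item_statuses: Iterable[str]) -> str:
    """Derive the playlist's main status from its in-scope job_items statuses."""
    counts = Counter(item_statuses)
    if counts["failed"]:
        return "异常"
    if counts["running"]:
        return "进行中"
    if counts["pending"] or counts["interrupted"]:
        return "未完成"
    return "已完成"  # nothing unfinished remains in scope
```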

Page 4: Song Processing

Show:

each worker and its current song
failed songs
interrupted songs
retryable items

Allow:

retry single item
force-retry single item
filter by stage, platform, playlist, or error state

Page 5: Logs And Exceptions

Show:

structured events
text logs
job-level and item-level errors
stack traces or HTTP error summaries where available

Page 6: Config Management

Show:

current catalogsync.env
parsed effective values
validation errors
revision history

Allow:

save a new env revision
re-apply a previous revision

Rule:

config edits affect only future jobs unless an explicit resume override is supplied

API Surface

Recommended management endpoints:

GET /api/dashboard
GET /api/jobs
POST /api/jobs
GET /api/jobs/{id}
POST /api/jobs/{id}/pause
POST /api/jobs/{id}/resume
POST /api/jobs/{id}/cancel
GET /api/jobs/{id}/items
POST /api/job-items/{id}/retry
POST /api/job-items/{id}/force-retry
GET /api/workers
GET /api/playlists
GET /api/playlists/{id}
GET /api/logs
GET /api/config/env
PUT /api/config/env
GET /api/config/revisions
POST /api/config/revisions/{id}/apply
GET /api/events/stream

/api/events/stream should use server-sent events so the dashboard and worker pages can refresh without polling every table separately.
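A server-sent events stream is plain text: optional `id:` and `event:` lines, one or more `data:` lines, and a blank line ending each message. A minimal framing helper is sketched below; the event names are hypothetical. With FastAPI, a generator yielding these strings could be returned through `StreamingResponse(..., media_type="text/event-stream")`.

```python
import json
from typing import Any, Optional


def format_sse(event: str, data: Any, event_id: Optional[int] = None) -> str:
    """Frame one server-sent event with a JSON payload."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")  # lets the browser resume via Last-Event-ID
    lines.append(f"event: {event}")
    payload = json.dumps(data, ensure_ascii=False)
    for chunk in payload.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"  # blank line terminates the event
```

Using a `job_events.id` as the SSE `id` would let a reconnecting page ask only for events it missed.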

Pause, Resume, And Recovery Rules

Soft Pause

The only supported pause mode in v1 is soft pause.

Behavior:

UI inserts a pause command
the runner marks the job and current stage as pause_requested
workers stop claiming new items
any in-progress item is allowed to finish naturally
once all workers are idle, the stage becomes paused and then the job becomes paused

This avoids half-written file state and keeps item completion boundaries clean.
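The soft pause contract can be sketched with a worker pool that checks a pause flag before every claim. The sketch uses an in-memory queue in place of the real database claim and a `threading.Event` in place of the `pause_requested` state; both are simplifications.

```python
import queue
import threading
from typing import Callable, Iterable


def run_stage_with_soft_pause(
    items: Iterable[int],
    worker_count: int,
    pause_requested: threading.Event,
    process: Callable[[int], None],
) -> tuple[list[int], list[int]]:
    """Run items on a worker pool; once pause is requested, workers stop claiming new items."""
    todo: "queue.Queue[int]" = queue.Queue()
    for item in items:
        todo.put(item)
    done: list[int] = []
    lock = threading.Lock()

    def worker() -> None:
        # Check the flag before every claim; an item already being processed always finishes.
        while not pause_requested.is_set():
            try:
                item = todo.get_nowait()
            except queue.Empty:
                return
            process(item)
            with lock:
                done.append(item)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # every worker is idle here, so the stage can be marked paused

    remaining: list[int] = []
    while True:
        try:
            remaining.append(todo.get_nowait())
        except queue.Empty:
            break
    return done, remaining
```

The items in `remaining` correspond to `pending` rows that a later resume would pick up unchanged.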

Resume

Resume behavior:

UI inserts a resume command
the runner validates the job can continue
the runner resets the paused stage and job back to running
unstarted items stay pending
succeeded items remain untouched

The resume action may optionally carry a limited override payload, such as a new library root after disk exhaustion.

Crash Recovery

On runner startup:

find all jobs with status running or pause_requested
mark those jobs paused
find all job_items left in running
convert those items to interrupted
record a recovery event

After that:

succeeded items remain done
pending items remain pending
interrupted items become eligible for retry or auto-requeue depending on stage policy
failed items remain failed until explicit retry

This preserves progress without restarting the whole job or whole database.
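The startup fix-up fits in one SQLite transaction, so a second crash during recovery cannot leave it half-applied. A sketch follows. `job_events` has no field list in this design, so the `event_type` and `payload_json` columns used here are hypothetical.

```python
import json
import sqlite3


def recover_on_startup(conn: sqlite3.Connection) -> dict[str, int]:
    """Fold leftover in-flight state into resumable state after a crash or NAS restart."""
    with conn:  # single transaction: all fix-ups commit together or not at all
        jobs = conn.execute(
            "UPDATE job_runs SET status = 'paused' "
            "WHERE status IN ('running', 'pause_requested')"
        ).rowcount
        # Only running items change; succeeded, pending, and failed items are left as-is.
        items = conn.execute(
            "UPDATE job_items SET status = 'interrupted', worker_id = NULL "
            "WHERE status = 'running'"
        ).rowcount
        summary = {"jobs_paused": jobs, "items_interrupted": items}
        # Hypothetical job_events columns: event_type and payload_json.
        conn.execute(
            "INSERT INTO job_events (event_type, payload_json) VALUES (?, ?)",
            ("recovery_requeued", json.dumps(summary)),
        )
    return summary
```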

Retry Rules

Single Item Retry

When the operator clicks retry for a failed or interrupted item:

insert job_commands.retry_item
clear execution fields on the target item
set status back to pending
increment attempt_count on the next worker claim
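The retry and claim steps can be sketched as two guarded updates. The `WHERE status` clauses make each step a no-op on items in the wrong state. The sketch assumes the runner calls `retry_item` while applying a `retry_item` command; both function names are hypothetical.

```python
import sqlite3


def retry_item(conn: sqlite3.Connection, item_id: int) -> bool:
    """Reset a failed or interrupted item to pending; returns False for any other status."""
    with conn:
        cur = conn.execute(
            "UPDATE job_items SET status = 'pending', worker_id = NULL, started_at = NULL, "
            "ended_at = NULL, last_error = NULL, last_error_code = NULL "
            "WHERE id = ? AND status IN ('failed', 'interrupted')",
            (item_id,),
        )
    return cur.rowcount == 1


def claim_item(conn: sqlite3.Connection, item_id: int, worker_id: int) -> bool:
    """Claim a pending item; attempt_count is bumped here, at claim time, not at retry time."""
    with conn:
        cur = conn.execute(
            "UPDATE job_items SET status = 'running', worker_id = ?, "
            "started_at = CURRENT_TIMESTAMP, attempt_count = attempt_count + 1 "
            "WHERE id = ? AND status = 'pending'",
            (worker_id, item_id),
        )
    return cur.rowcount == 1
```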

Force Retry

Force retry is more aggressive:

download stage may ignore an existing local mapping if the operator requests a fresh re-download
upload stage may ignore an existing active remote mapping if the operator explicitly wants a re-upload

Force retry must stay item-scoped, never job-scoped.

Disk Exhaustion Handling

If the downloader detects insufficient space:

fail or interrupt the current download item
pause the active job with a machine-readable reason such as disk_full
surface a UI banner asking for a new library root override

After the operator supplies a new directory and clicks resume:

the job continues only for unfinished items
completed downloads are not restarted
the currently failed song can be retried from scratch

This satisfies the requirement that a single song may be restarted without restarting the whole database.
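The disk-space guard and the resume override can be sketched with `shutil.disk_usage`. Several names are hypothetical:

- the 512 MiB reserve;
- the `DiskFullError` class;
- the `library_root` override key;
- the `CATALOGSYNC_LIBRARY_ROOT` snapshot key.

```python
import shutil
from typing import Optional

DEFAULT_RESERVE_BYTES = 512 * 1024 * 1024  # hypothetical safety margin


class DiskFullError(RuntimeError):
    """Raised before a download starts; `reason` is the machine-readable pause reason."""

    reason = "disk_full"


def has_free_space(path: str, needed_bytes: int, reserve_bytes: int = DEFAULT_RESERVE_BYTES) -> bool:
    """True if the volume holding `path` can take `needed_bytes` and still keep the reserve."""
    return shutil.disk_usage(path).free >= needed_bytes + reserve_bytes


def ensure_free_space(path: str, needed_bytes: int, reserve_bytes: int = DEFAULT_RESERVE_BYTES) -> None:
    if not has_free_space(path, needed_bytes, reserve_bytes):
        raise DiskFullError(f"not enough free space under {path} for {needed_bytes} bytes")


def effective_library_root(snapshot: dict[str, str], resume_override: Optional[dict[str, str]]) -> str:
    """Prefer a library_root supplied with the resume command over the frozen snapshot value."""
    if resume_override and resume_override.get("library_root"):
        return resume_override["library_root"]
    return snapshot["CATALOGSYNC_LIBRARY_ROOT"]
```

The runner would catch `DiskFullError`, mark the current item interrupted, and pause the job with `reason` as the machine-readable cause.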

Execution Strategy

Stage Executors

Implement separate executor paths for:

collect
sync
download
upload

Recommended concurrency:

collect
- low concurrency, v1 may stay serial
sync
- low concurrency, v1 may stay serial
download
- configurable worker pool
upload
- configurable worker pool

Reuse Strategy

Prefer reusing current catalogsync modules:

musicdl.catalogsync.services
musicdl.catalogsync.downloader
musicdl.catalogsync.uploader
musicdl.catalogsync.repository

The runner should orchestrate these modules rather than rewriting the domain logic from scratch.

Technology Choice

Backend

Recommended stack:

FastAPI
Jinja2
SQLite
SSE for live updates

Frontend

Recommended rendering model:

server-rendered pages with Jinja2
HTMX for partial updates and action forms
a small amount of vanilla JavaScript for log streaming and live worker refresh

Why this fits:

NAS-local internal tool
mainly operational tables and actions
lower dependency and deployment complexity than a separate SPA
easier to keep aligned with the existing Python-only project

Verification Plan

The implementation should be verified at four levels:

unit tests
- state transitions
- retry rules
- recovery transforms
API integration tests
- job creation
- pause and resume
- item retry
- config revision flow
fault injection tests
- kill the runner mid-download and confirm item-level recovery
NAS smoke tests
- create jobs
- pause and resume
- crash and restart
- retry a single failed song
- change library directory after disk-full pause

V1 Delivery Boundary

Must Ship In V1

queue-based single-active-job runner
supported job templates
dashboard, job center, playlist pools, song processing, logs, and config pages
soft pause and resume
crash-safe item-level recovery
single-item retry and force-retry
env revision history and apply flow

Explicitly Deferred

authentication
multi-user permissions
multiple active jobs
distributed workers
arbitrary stage composition
automatic endless retries
destructive file cleanup actions

Open Follow-Up Items

Two source-coverage follow-ups remain outside this console design and should stay tracked separately:

redeploy the local Kuwo toplist fallback fix to the NAS and backfill the missing collection or sync results
repair QQ playlist square collection after the old endpoint started returning a "parameter failed" error

These belong to operational backlog work, not to the web console architecture itself.
