Step 3: Runbook

This runbook defines how to operate the pipeline under production conditions. Step 2 described the execution flow; Step 3 describes operator procedure and recovery discipline.

3.1 Operating objective

The objective of a run is to move selected sets from local source folders into reviewed and safely published assets, while preserving deterministic naming, auditable queue state, and clean retry behavior on failure.

3.2 Preflight checklist (before each run)

  1. Open a shell at repository root and activate the environment used for production runs.
  2. If running the EXE with an external Python, set AMIR_PYTHON to a Python 3.13 runtime.
  3. Confirm Ollama is reachable: ollama list.
  4. Confirm the required caption model is available (for example minicpm-v:latest, if configured).
  5. Validate publish configuration values (FTP/MySQL host, credentials, base paths) for the target environment.
  6. Confirm writable local paths: data/, logs/, and data/ollama_tmp/.
  7. Ensure no stale external tool is holding locks on data/review.db.

Typical preflight and launch commands (PowerShell):

  Set-Location "\path\to\amir2000_image_automation"
  .\.venv313\Scripts\Activate.ps1
  ollama list
  python .\main_set.py
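
Checklist items 2 and 6 lend themselves to a scripted check. The following is a minimal sketch, not part of the project: the function names are hypothetical, and it assumes the check is run from the repository root with the directory layout named in item 6.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Paths named in checklist item 6, relative to the repository root.
REQUIRED_DIRS = ("data", "logs", "data/ollama_tmp")


def unwritable_dirs(root, rel_dirs=REQUIRED_DIRS):
    """Return the directories under root that are missing or not writable."""
    failed = []
    for rel in rel_dirs:
        target = Path(root) / rel
        try:
            # Creating and removing a temp file is a direct writability probe.
            with tempfile.NamedTemporaryFile(dir=target):
                pass
        except OSError:
            failed.append(rel)
    return failed


def external_python_version(env=os.environ):
    """Return 'major.minor' of the interpreter named by AMIR_PYTHON, or None if unset."""
    exe = env.get("AMIR_PYTHON")
    if not exe:
        return None
    out = subprocess.run(
        [exe, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()
```

An operator could call unwritable_dirs(".") and stop if the returned list is non-empty. A value other than "3.13" from external_python_version() would suggest AMIR_PYTHON points at the wrong runtime.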

3.3 Standard run procedure

  1. Start main_set.py and import one or more sets.
  2. Run the batch and monitor stage progress in console/UI.
  3. Allow stages 1 to 7 to complete; review queue rows are prepared automatically.
  4. In the review editor, validate/edit filename, caption, alt text, keywords, and quality context.
  5. Set row decisions explicitly to approved or rejected.
  6. Publish approved rows only and wait for uploader completion.
  7. Perform post-run validation before starting a new batch.

3.4 Publish gate controls

3.5 Post-run validation checklist

  1. Review logs/latest_run.log for stage failures or warnings requiring action.
  2. Review logs/db_uploader.log for upload/upsert failures by row.
  3. Verify expected rows exist in MySQL photos_info_revamp by File_Name.
  4. Verify website image and thumbnail URLs resolve as expected.
  5. Confirm local mirror and queue statuses align with final decisions.
  6. Confirm temporary staging does not retain unintended stale artifacts.
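
The two log reviews in items 1 and 2 can be narrowed with a small scan. This is a sketch under an assumption: the marker strings below are guesses, since the log format of latest_run.log and db_uploader.log is not specified here, and flagged_lines is a hypothetical helper.

```python
# Assumed markers; adjust to whatever the pipeline's loggers actually emit.
MARKERS = ("ERROR", "WARNING", "Traceback")


def flagged_lines(log_path, markers=MARKERS):
    """Return (line_number, text) for each log line containing a marker."""
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            if any(marker in line for marker in markers):
                hits.append((number, line.rstrip("\n")))
    return hits
```

Running flagged_lines("logs/db_uploader.log") and reading the first entry gives a starting point for the row-level review. The scan only narrows the search and does not replace reading the surrounding log context.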

3.6 Incident playbooks

A. Quality scoring stage fails

  1. Retry once in-app.
  2. If it fails again, inspect logs/latest_run.log for dependency/model/runtime errors.
  3. If the error mentions "python312.dll conflicts with this version of Python", relaunch the EXE with AMIR_PYTHON pointing to a Python 3.13 runtime and verify the runtime line in the log.
  4. Fix dependency/model/runtime mismatch, then rerun the batch from start.

B. Caption prefill fails or stalls

  1. Check Ollama service and model list using ollama list.
  2. If error states model missing, pull it (for example ollama pull minicpm-v:latest).
  3. Re-run after confirming model availability and stable service response.
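
The model check in steps 1 and 2 can be scripted. A minimal sketch, assuming ollama list prints a header row starting with NAME followed by one row per model with the model name in the first column; the helper names are hypothetical.

```python
import subprocess


def listed_models(listing):
    """Parse `ollama list` output into a set of model names (first column)."""
    rows = [line.split() for line in listing.splitlines() if line.strip()]
    # Skip the header row, whose first column is "NAME".
    return {cols[0] for cols in rows if cols[0] != "NAME"}


def model_available(model, listing=None):
    """True if model appears in the listing; runs `ollama list` when none is given."""
    if listing is None:
        listing = subprocess.run(
            ["ollama", "list"], capture_output=True, text=True, check=True
        ).stdout
    return model in listed_models(listing)
```

If model_available("minicpm-v:latest") returns False, step 2 applies. If the subprocess call itself raises, the Ollama service or CLI is the more likely problem and step 1 should be revisited first.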

C. Publish fails (FTP/MySQL)

  1. Inspect logs/db_uploader.log and identify the first failing row.
  2. Validate credentials, host reachability, and target path/table configuration.
  3. Re-run publish after connectivity/authentication is confirmed fixed.

D. Crash or forced stop

  1. Restart the app and use "Recover crash session" first (before rebuilding sets manually).
  2. Inspect logs/latest_run.log and crash_startup.log when present.
  3. Inspect data/crash_runtime.log when add-set callbacks or UI runtime handlers failed.
  4. Inspect latest queue rows for partial state before taking cleanup actions.
  5. Restore staged files only through the documented rollback-aware process.
  6. Release reserved filenames only when reuse safety is certain.

E. SQLite lock/inconsistency

  1. Close any process holding the DB file.
  2. Back up current data/ state.
  3. Re-initialize DB with python .\init_db.py only if reset is required.
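
Steps 1 and 2 can be supported with a lock probe and a timestamped backup before any reset is considered. The sketch below uses only the Python standard library; the function names and the backup folder naming are hypothetical, and nothing here is taken from init_db.py.

```python
import shutil
import sqlite3
from datetime import datetime
from pathlib import Path


def is_write_locked(db_path, timeout=1.0):
    """True if another connection holds a lock that blocks writes to db_path."""
    # isolation_level=None puts the connection in autocommit mode,
    # so the explicit BEGIN below is not preceded by an implicit one.
    conn = sqlite3.connect(db_path, timeout=timeout, isolation_level=None)
    try:
        # BEGIN IMMEDIATE requests the write lock without modifying any data.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("ROLLBACK")
        return False
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc):
            return True
        raise  # Other errors (for example an unreadable file) are not lock problems.
    finally:
        conn.close()


def backup_data_dir(data_dir, backup_root):
    """Copy data_dir to a timestamped folder under backup_root and return its path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = Path(backup_root) / f"data_backup_{stamp}"
    shutil.copytree(data_dir, dest)
    return dest
```

A reasonable order is to call is_write_locked("data/review.db") until it returns False, then backup_data_dir("data", ...) with a destination outside data/, and only then decide whether step 3 is needed.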

3.7 Safe rerun procedure

  1. Fix the root cause first (model, credentials, path permission, dependency).
  2. Confirm rollback completed or manually validate that staging state is clean.
  3. Re-run the same set through normal pipeline entry, not partial manual edits.
  4. Verify that newly generated filenames remain collision-free.
  5. Re-check publish output and queue status after completion.
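
The collision check in step 4 can be sketched as a comparison between the filenames a rerun produced and the names already in use. Where the existing names come from (for example the File_Name values in MySQL or the local mirror) is left to the operator; filename_collisions is a hypothetical helper, and the case-insensitive comparison is an assumption.

```python
from collections import Counter


def filename_collisions(new_names, existing_names):
    """Return sorted lowercase names from new_names that clash with existing ones or each other."""
    # Compare case-insensitively, since Windows file systems typically ignore case.
    existing = {name.lower() for name in existing_names}
    counts = Counter(name.lower() for name in new_names)
    return sorted(
        name for name, count in counts.items() if count > 1 or name in existing
    )
```

An empty result means no clash was found among the names supplied. It says nothing about names that were reserved but not included in existing_names.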

3.8 Controlled taxonomy/config updates

3.9 Continuation path

Step 3 defines operations and incident handling. Step 4 documents the database model that supports these controls.

© 2026 Amir Darzi