Step 3: Runbook

This runbook defines how to operate the pipeline in production conditions. Step 2 described execution flow; Step 3 describes operator procedure and recovery discipline.

3.1 Operating objective

The objective of a run is to move selected sets from local source folders into reviewed and safely published assets, while preserving deterministic naming, auditable queue state, and clean retry behavior on failure.

3.2 Preflight checklist (before each run)

  1. Open a shell at repository root and activate the environment used for production runs.
  2. If running EXE with external Python, set AMIR_PYTHON to a Python 3.13 runtime.
  3. Confirm Ollama is reachable: ollama list.
  4. Confirm required caption models are available (primary + optional fallback based on config).
  5. Confirm startup runtime line in log shows expected processor mode: processor=GPU/CPU.
  6. Validate publish configuration values (FTP/MySQL host, credentials, base paths) for the target environment.
  7. Confirm writable local paths: data/, logs/, and data/ollama_tmp/.
  8. Ensure no stale external tool is holding locks on data/review.db.

Example preflight commands (PowerShell):

Set-Location "\path\to\amir2000_image_automation"
.\.venv313\Scripts\Activate.ps1
ollama list
python .\main_set.py
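
Checklist items 7 and 8 can be partially automated before each run. A minimal Python sketch, assuming only the path conventions named above (data/, logs/, data/ollama_tmp/, data/review.db); the function name is hypothetical:

```python
import os
import sqlite3


def preflight_checks(root: str) -> list[str]:
    """Return a list of preflight problems; an empty list means ready to run."""
    problems = []
    # Item 7: required local paths must exist and be writable.
    for rel in ("data", "logs", os.path.join("data", "ollama_tmp")):
        path = os.path.join(root, rel)
        if not os.path.isdir(path):
            problems.append(f"missing directory: {rel}")
        elif not os.access(path, os.W_OK):
            problems.append(f"not writable: {rel}")
    # Item 8: the review DB must not be locked by a stale external tool.
    db = os.path.join(root, "data", "review.db")
    if os.path.exists(db):
        try:
            con = sqlite3.connect(db, timeout=1)
            con.execute("BEGIN IMMEDIATE")  # fails fast if another writer holds a lock
            con.rollback()
            con.close()
        except sqlite3.OperationalError as exc:
            problems.append(f"review.db locked: {exc}")
    return problems
```

Abort the run whenever the returned list is non-empty; Ollama, model, and publish-config checks (items 3 to 6) still need their own probes.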

3.3 Standard run procedure

  1. Start main_set.py and import one or more sets.
  2. If metadata logic/build changed since the last run, execute a small smoke batch first (for example 2 to 10 images across different folders) before large production queues.
  3. Run the batch and monitor stage progress in console/UI.
  4. Allow stages 1 to 7 to complete; review queue rows are prepared automatically.
  5. In the review editor, validate/edit filename, caption, alt text, keywords, and quality context; use Generate for row-level metadata retry when needed.
  6. Set row decisions explicitly to approved or rejected.
  7. Publish approved rows only and wait for uploader completion.
  8. On publish completion, one final dialog is shown; click OK to close the review window.
  9. Perform post-run validation before starting a new batch.
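
The approve-then-publish gate in steps 5 to 7 can be sketched as a filter. This is an illustration only; the field names (decision, filename, caption, alt_text, keywords) are assumptions, not the app's actual queue schema:

```python
REQUIRED_FIELDS = ("filename", "caption", "alt_text", "keywords")


def publishable_rows(rows):
    """Split review-queue rows into (publish, blocked).

    A row publishes only when its decision is explicitly 'approved' AND all
    reviewed metadata fields are non-empty; everything else is blocked with a
    reason, matching the 'approved rows only' rule in step 7.
    """
    publish, blocked = [], []
    for row in rows:
        if row.get("decision") != "approved":
            blocked.append((row, "decision is not 'approved'"))
            continue
        missing = [f for f in REQUIRED_FIELDS if not row.get(f)]
        if missing:
            blocked.append((row, f"missing fields: {', '.join(missing)}"))
        else:
            publish.append(row)
    return publish, blocked
```

Treating anything other than an explicit "approved" as blocked keeps pending rows from slipping into a publish run by default.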

3.4 Publish gate controls

3.5 Post-run validation checklist

  1. Review logs/latest_run.log for stage failures or warnings requiring action.
  2. Verify startup line reports expected Ollama processor mode and VRAM.
  3. Review data/prefill_qc_last.json for duplicate/suspicious prefill rows before final publish decisions.
  4. Review logs/db_uploader.log for upload/upsert failures by row.
  5. Verify expected rows exist in MySQL photos_info_revamp by File_Name.
  6. Verify website image and thumbnail URLs resolve as expected.
  7. Confirm local mirror and queue statuses align with final decisions.
  8. Confirm temporary staging does not retain unintended stale artifacts.
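
For item 3, a duplicate scan over data/prefill_qc_last.json might look like the sketch below. The JSON layout (a list of records with a caption field) and the helper name are assumptions; adjust to the real file structure:

```python
import json
from collections import Counter


def duplicate_prefills(qc_path: str) -> list[str]:
    """Report caption texts that appear on more than one prefill row.

    Captions are normalized (trimmed, lowercased) so trivially different
    copies of the same generic phrasing are still flagged for review.
    """
    with open(qc_path, encoding="utf-8") as fh:
        rows = json.load(fh)
    counts = Counter(r["caption"].strip().lower() for r in rows)
    return [caption for caption, n in counts.items() if n > 1]
```

Any caption the scan returns deserves a manual look in the review editor before the final publish decision.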

3.6 Incident playbooks

A. Quality scoring stage fails

  1. Retry once in-app.
  2. If it fails again, inspect logs/latest_run.log for dependency/model/runtime errors.
  3. If the error mentions a python312.dll conflict with this version of Python, relaunch the EXE with AMIR_PYTHON pointing to a Python 3.13 runtime and verify the runtime line in the log.
  4. Fix dependency/model/runtime mismatch, then rerun the batch from start.
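
The runtime verification in step 3 can also be asserted from inside the embedded interpreter. A minimal sketch, assuming Python 3.13 is the expected runtime; the function name is hypothetical:

```python
import sys


def runtime_matches(expected=(3, 13)) -> bool:
    """Check which interpreter the EXE actually loaded.

    Returns True only when the running major/minor version equals the
    expected (3, 13); a python312.dll conflict shows up here as (3, 12).
    """
    return sys.version_info[:2] == expected
```

Running this once at startup and logging the result gives the "runtime line" a machine-checkable counterpart.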

B. Caption prefill fails or stalls

  1. Check Ollama service and model list using ollama list.
  2. If the error states the model is missing, pull the configured primary caption model. Pull the fallback model only when your config explicitly enables fallback.
  3. Confirm retry/continuation behavior in logs: [RETRY]/[RETRY-OK] (when fallback is enabled), timeout warnings, or quarantine lines for repeated native crash rows.
  4. If Stage 6 appears to loop on the same row, stop the run, inspect the first failing row reason in latest_run.log, and rerun after fixing model/config issues.
  5. If the issue is weak/generic phrasing after a code fix, confirm the EXE was rebuilt after updating caption_review_local.py and rerun a small sample before restarting the full queue.
  6. Re-run after confirming model availability and stable service response.
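
The markers named in step 3 can be pulled out of latest_run.log with a small scanner. The marker set below is taken from step 3; the helper name is hypothetical:

```python
import re

# [RETRY]/[RETRY-OK] markers, timeout warnings, and quarantine lines.
MARKERS = re.compile(r"\[(RETRY|RETRY-OK)\]|timeout|quarantine", re.IGNORECASE)


def prefill_incident_lines(log_text: str) -> list[str]:
    """Return only the prefill-relevant lines from a run log's text."""
    return [line for line in log_text.splitlines() if MARKERS.search(line)]
```

Scanning the filtered lines top to bottom surfaces the first failing row quickly, which is what step 4 asks for before a rerun.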

C. Publish fails (FTP/MySQL)

  1. Inspect logs/db_uploader.log and identify the first failing row.
  2. Validate credentials, host reachability, and target path/table configuration.
  3. Re-run publish after connectivity/authentication is confirmed fixed.
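
Host reachability in step 2 can be probed at the TCP level before retrying credentials. Ports 21 (FTP) and 3306 (MySQL) are the conventional defaults, not values from this pipeline's config; substitute your configured ones:

```python
import socket


def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """TCP-level reachability probe for a publish target.

    Confirms only that the port answers a connection; credentials, target
    paths, and table configuration still need separate validation.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A False result distinguishes network/firewall problems from authentication failures before you touch credentials.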

D. Crash or forced stop

  1. Restart app and use Recover crash session first (before rebuilding sets manually).
  2. Inspect logs/latest_run.log and crash_startup.log when present.
  3. Inspect data/crash_runtime.log when add-set callbacks or UI runtime handlers failed.
  4. Inspect latest queue rows for partial state before taking cleanup actions.
  5. Recovery validates saved paths against both the incoming and staged locations (including the recorded origin) before restoring rows.
  6. Release reserved filenames only when reuse safety is certain.

If recovery reports that no valid files were found on disk, verify the files still exist in either the incoming or staged location and confirm the session file was not deleted.
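
The path validation in step 5 can be sketched as an ordered lookup. This illustrates the lookup order described above, not the app's actual recovery code; names are hypothetical:

```python
import os


def resolve_recovered_path(filename, incoming_dir, staged_dir, recorded_origin=None):
    """Locate a saved row's file before restoring it.

    Checks the recorded origin first, then the incoming and staged locations;
    returns the first existing path, or None when the file is gone from disk
    (the 'no valid files were found' case).
    """
    candidates = []
    if recorded_origin:
        candidates.append(recorded_origin)
    candidates.append(os.path.join(incoming_dir, filename))
    candidates.append(os.path.join(staged_dir, filename))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

Rows that resolve to None should stay unrestored until the underlying files are located.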

E. SQLite lock/inconsistency

  1. Close any process holding the DB file.
  2. Back up current data/ state.
  3. Re-initialize DB with python .\init_db.py only if reset is required.
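
The backup in step 2 can be a timestamped copy so earlier backups are never overwritten. A minimal sketch; the backup destination naming is an assumption:

```python
import os
import shutil
import time


def backup_data_dir(data_dir: str, backup_root: str) -> str:
    """Copy data/ aside before any destructive DB reset.

    Uses a timestamped destination so repeated backups never clobber each
    other; run init_db.py only after this returns successfully.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(backup_root, f"data-backup-{stamp}")
    shutil.copytree(data_dir, dest)
    return dest
```

shutil.copytree refuses to overwrite an existing destination, which is exactly the safety behavior wanted here.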

3.7 Safe rerun procedure

  1. Fix the root cause first (model, credentials, path permission, dependency).
  2. Confirm rollback completed or manually validate that staging state is clean.
  3. Re-run the same set through normal pipeline entry, not partial manual edits.
  4. Verify that newly generated filenames remain collision-free.
  5. Re-check publish output and queue status after completion.
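
The collision check in step 4 reduces to counting normalized names. Case-insensitive comparison is an assumption about the publish targets (FTP paths, the MySQL File_Name column); drop the lower() call if your targets are case-sensitive:

```python
from collections import Counter


def filename_collisions(filenames):
    """Report any generated filename that is used more than once.

    Normalizes to lowercase so 'A.jpg' and 'a.jpg' count as a collision on
    case-insensitive publish targets.
    """
    counts = Counter(name.lower() for name in filenames)
    return sorted(name for name, n in counts.items() if n > 1)
```

An empty result is the precondition for continuing to step 5; any reported name means the rerun produced a non-deterministic or duplicate filename.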

3.8 Controlled taxonomy/config updates

3.9 Continuation path

Step 3 defines operations and incident handling. Step 4 documents the database model that supports these controls.

© 2026 Amir Darzi