Supporting Reference 3: Troubleshooting

Use this page when a run fails or output quality is unexpectedly weak. It is organized for fast diagnosis and safe recovery.


R3.1 First-response triage order

Read dist/logs/latest_run.log for EXE runs (or logs/latest_run.log for Python runs) and locate the first blocking error.
If publishing is involved, read logs/db_uploader.log and find the first failing row.
Check row state in review_queue by Review_Status/QC_Status.
Fix the root cause before retrying (model, dependency, path, or credentials).
Rerun through the normal flow; do not patch state ad hoc.
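
The first triage step can be sketched in Python as below. This is a minimal illustration, not part of the project: the function names are hypothetical, and the error markers (ERROR, CRITICAL, Traceback) are an assumption about the log format that should be adjusted to what the logs actually contain.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed markers of a blocking error; adjust to the real log format.
ERROR_PATTERN = re.compile(r"\b(ERROR|CRITICAL|Traceback)\b")


def log_path_for(exe_run: bool, root: Path = Path(".")) -> Path:
    """Return the latest_run.log location for EXE runs or Python runs."""
    if exe_run:
        return root / "dist" / "logs" / "latest_run.log"
    return root / "logs" / "latest_run.log"


def first_blocking_error(log_path: Path) -> Optional[Tuple[int, str]]:
    """Return (line_number, line) of the first line matching ERROR_PATTERN."""
    with log_path.open("r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if ERROR_PATTERN.search(line):
                return number, line.rstrip("\n")
    return None  # no matching line found
```

Calling first_blocking_error(log_path_for(exe_run=True)) returns the earliest match, which is usually more informative than the last error in the file because later errors are often consequences of the first.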

R3.2 Caption/prefill failures (Ollama)

Symptoms: caption stage stalls, timeout errors, or missing model exceptions.

Checks:

Run ollama list and confirm service responsiveness.
Confirm configured caption model exists (for example minicpm-v:latest).
Inspect stage logs in logs/latest_run.log.

Fix path:

Pull missing model with ollama pull <model>.
Restart and retry once after service health is restored.
If the failure repeats, reduce parallel load and rerun a controlled batch.
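
The model check above can be scripted roughly as follows. This is a sketch under assumptions: it treats the first line of the ollama list output as a header row and the first column of each remaining line as the model name, and the helper names are hypothetical.

```python
import subprocess
from typing import Set


def parse_ollama_list(output: str) -> Set[str]:
    """Extract model names (first column) from `ollama list` text output."""
    names = set()
    for line in output.splitlines()[1:]:  # assumption: first line is a header
        parts = line.split()
        if parts:
            names.add(parts[0])
    return names


def caption_model_available(model: str, timeout_s: float = 10.0) -> bool:
    """Run `ollama list` and report whether `model` is listed.

    Returns False if the CLI is missing, exits with an error, or does not
    answer within timeout_s, so it doubles as a rough responsiveness check.
    """
    try:
        result = subprocess.run(
            ["ollama", "list"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return model in parse_ollama_list(result.stdout)
```

For example, caption_model_available("minicpm-v:latest") returning False suggests either pulling the model or checking the service before retrying.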

R3.3 Quality scoring failures

Symptoms: scoring stage crash, missing metric fields, or repeated stage abort.

Checks:

Validate environment dependencies (numpy/opencv/pyiqa/torch/torchvision).
Confirm sac+logos+ava1-l14-linearMSE.pth exists in repository root.
For EXE runs, verify that a line of the form Score runtime python: ... (ver=(3, 13)) appears in the stage log.
Review error trace and failing row context in logs/latest_run.log.

Fix path:

Rebuild or repair the venv if imports fail.
If the error mentions a python312.dll conflict, relaunch the EXE with AMIR_PYTHON set to the 3.13 interpreter.
Restore the missing model weight file.
Retry the run from the start once the environment is healthy.
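
A preflight along the lines of the checks above might look like this. It is a sketch, not the project's own tooling: the function names are hypothetical, and cv2 is used as the import name for opencv. Note that importlib.util.find_spec only confirms a module can be located; it does not import it, so it will not surface DLL conflicts.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Dict, List

# Import names for the listed dependencies; opencv is imported as cv2.
REQUIRED_MODULES = ["numpy", "cv2", "pyiqa", "torch", "torchvision"]
WEIGHT_FILE = "sac+logos+ava1-l14-linearMSE.pth"


def missing_modules(names: List[str]) -> List[str]:
    """Return the subset of top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def scoring_preflight(repo_root: Path) -> Dict[str, object]:
    """Collect the facts needed to judge whether scoring can start."""
    return {
        "python_version": tuple(sys.version_info[:2]),
        "missing_modules": missing_modules(REQUIRED_MODULES),
        "weight_file_present": (repo_root / WEIGHT_FILE).is_file(),
    }
```

Run it with the same interpreter the scoring stage uses; a report from a different Python installation says nothing about the failing environment.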

R3.4 Resize failures (QC_Status=ResizeFailed)

Symptoms: rows flagged as ResizeFailed and caption prefill skipped.

Checks:

Validate source image path readability and file integrity.
Validate write permissions under data/ollama_tmp.
Check available disk space.

Fix path: correct filesystem constraints, then rerun the affected set through normal pipeline entry.
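
The three filesystem checks can be approximated with the standard library, as in the sketch below. The helper names are hypothetical. The readability check only confirms that the file opens and is non-empty; decoding the image with an imaging library would be a stronger integrity test.

```python
import shutil
import tempfile
from pathlib import Path


def source_readable(image_path: Path) -> bool:
    """True if the file opens and at least one byte can be read from it."""
    try:
        with image_path.open("rb") as handle:
            return len(handle.read(16)) > 0
    except OSError:
        return False


def dir_writable(directory: Path) -> bool:
    """True if a temporary file can be created (and removed) in directory."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        return False


def free_mib(directory: Path) -> float:
    """Free space on the volume holding directory, in MiB."""
    return shutil.disk_usage(directory).free / (1024 * 1024)
```

For example, dir_writable(Path("data/ollama_tmp")) and free_mib(Path("data")) cover the second and third checks for the temp directory named above.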

R3.5 Publish failures (FTP/MySQL)

Symptoms: uploader errors, partial remote updates, or missing published records.

Checks:

Inspect logs/db_uploader.log and identify first failed item.
Validate FTP/MySQL host credentials and network reachability.
Confirm publish target values in amir2000_config.py.

Fix path:

Resolve auth/connectivity issue.
Retry publish for approved rows.
Verify upsert completion by File_Name in MySQL table photos_info_revamp.
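
Network reachability can be checked separately from credentials with a plain TCP connect, sketched below. The function name is hypothetical. The check proves only that something accepts connections on the port, not that login will succeed. The standard default ports are 21 for FTP and 3306 for MySQL; the actual hosts and ports should come from the publish settings.

```python
import socket


def tcp_reachable(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """True if a TCP connection to (host, port) succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

If tcp_reachable returns True for both services and publishing still fails, the problem is more likely credentials or permissions than connectivity.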

R3.6 SQLite lock or inconsistency

Close external processes that may hold data/review.db lock.
Back up current data/ before destructive actions.
If a reset is required, run python .\init_db.py and rerun the batch.
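
The lock check and the backup step can be sketched as follows. Both helpers are hypothetical and use only the standard library. The lock probe briefly requests a write lock with BEGIN IMMEDIATE and rolls back immediately, so it changes no data; the backup is a timestamped copy of the whole data directory.

```python
import shutil
import sqlite3
import time
from pathlib import Path


def db_is_locked(db_path: Path, timeout_s: float = 1.0) -> bool:
    """True if another connection holds a write lock on the database."""
    if not db_path.is_file():
        raise FileNotFoundError(db_path)  # avoid creating an empty database
    conn = sqlite3.connect(str(db_path), timeout=timeout_s, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # request the write lock
        conn.execute("ROLLBACK")         # release it without changes
        return False
    except sqlite3.OperationalError as exc:
        message = str(exc).lower()
        if "locked" in message or "busy" in message:
            return True
        raise
    finally:
        conn.close()


def backup_data_dir(data_dir: Path, backup_root: Path) -> Path:
    """Copy data_dir to backup_root/<name>_backup_<timestamp> and return it."""
    stamp = time.strftime("%Y%m%d_%H%M%S")
    target = backup_root / f"{data_dir.name}_backup_{stamp}"
    shutil.copytree(data_dir, target)
    return target
```

Take the backup only after db_is_locked returns False, so the copy is not made while another process is mid-write.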

R3.7 Filename collision issues

Inspect data/used_filenames.json and destination folder contents.
Use editor rename flow so reservation updates remain consistent.
Avoid manual renames outside workflow scripts for in-flight items.

R3.8 App crash or forced close (continue safely)

Symptoms: app window closes unexpectedly, or Windows Event Viewer shows native crash events (for example 0xc0000005 / BEX64).

Safe continue path:

Restart the app.
Click Recover crash session next to Clear all.
Verify recovered queue size, then continue with Start Batch.

Recovery file: data/multiset_session.json. Optional backups: data/multiset_session.backup_*.json.

If the crash happened while clicking Add set, inspect data/crash_runtime.log first. For high-volume queue building, you can disable automatic subject generation with AUTO_AI_SUBJECT_ON_SELECT=0 and run AI suggest manually per set.
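
Before clicking Recover crash session, it can help to confirm that the recovery file is intact. The sketch below is a hypothetical pre-check, not the app's recovery logic: it assumes only that the session files are JSON (suggested by their extension) and makes no assumption about their schema. It prefers the main file and otherwise falls back to the most recently modified readable backup.

```python
import json
from pathlib import Path
from typing import Optional


def is_valid_json(path: Path) -> bool:
    """True if the file exists and parses as JSON."""
    try:
        json.loads(path.read_text(encoding="utf-8"))
        return True
    except (OSError, ValueError):  # missing file, bad encoding, or bad JSON
        return False


def newest_usable_session(data_dir: Path) -> Optional[Path]:
    """Return the main session file if readable, else the newest good backup."""
    backups = sorted(
        data_dir.glob("multiset_session.backup_*.json"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    for candidate in [data_dir / "multiset_session.json", *backups]:
        if is_valid_json(candidate):
            return candidate
    return None
```

A None result means no readable session file was found, which is one of the unclear partial states better handled through the runbook flow.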

R3.9 When to escalate to runbook flow

Repeated failures after one clean retry.
Unclear partial state after crash/interruption.
Any issue that may require rollback of staged files or reserved names.

Escalation path: Step 3: Runbook.

R3.10 Keyword context pollution (geo mismatch)

Symptoms: unrelated location keywords appear (for example usa, colorado) in non-USA images.

Checks:

Inspect row Location/Subject vs generated Keywords in review editor.
Confirm taxonomy sources in data/location_list.json and data/folder_map.json.
Review recent prefill behavior in caption_review_local.py and runtime logs.

Fix path:

Remove unrelated keywords in review editor for current run.
Keep keyword generation location-aware and avoid hardcoded geography pools.
For already-published rows, run targeted MySQL cleanup by run scope before republish.

For the affected 321-row publish batch of 2026-02-18, use data/mysql_cleanup_exact_run321_20260218.sql from the automation project root:

Execute the full script in phpMyAdmin SQL tab on photos_info_revamp.
Confirm target_rows = 321 in the script result.
Confirm cleanup verification returns still_geo_bad = 0 and still_unsupported_terrain = 0.

R3.11 Semantic drift (terrain/time-of-day mismatch)

Symptoms: captions/alt/keywords conflict with obvious set context, for example NL/polder images getting mountain terms or sunrise filenames getting sunset text.

Checks:

Compare File_Name cue words (sunrise/sunset) against generated Caption, alt_text, and Keywords.
Compare row context (Subject/Location/Folder) against terrain-heavy output terms.
Review prefill lines in logs/latest_run.log for the affected row IDs.

Fix path:

For current queue rows, edit conflicting metadata directly in review editor and continue.
For recurrent drift, confirm current build includes context guardrails in caption_review_local.py and rebuild the EXE.
Re-run only affected queued rows where needed.
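
The first check, comparing File_Name cue words against generated text, can be sketched as a small function. This is an illustration only, not the guardrail logic in caption_review_local.py: the function name is hypothetical, and it handles just the sunrise/sunset pair named above.

```python
import re
from typing import Dict, List, Set

# Each cue maps to the term that would contradict it.
OPPOSITES = {"sunrise": "sunset", "sunset": "sunrise"}


def _words(text: str) -> Set[str]:
    """Lowercase alphabetic tokens; digits, underscores, dots act as separators."""
    return set(re.findall(r"[a-z]+", text.lower()))


def time_of_day_conflicts(file_name: str, fields: Dict[str, str]) -> List[str]:
    """Return the names of fields that mention the opposite of the file-name cue."""
    cues = _words(file_name) & set(OPPOSITES)
    if len(cues) != 1:
        return []  # no cue in the file name, or both cues (ambiguous)
    wrong = OPPOSITES[next(iter(cues))]
    return [name for name, text in fields.items() if wrong in _words(text)]
```

For example, a file named polder_sunrise_001.jpg with a Caption of "Sunset over the polder" yields ["Caption"], pointing at the field to correct in the review editor.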

R3.12 Continuation path

Next reference: Logging
Back to reference 2
Back to Step 0
