Supporting Reference 4: Logging and Failure Handling

This page defines where failures are recorded and how to interpret logs during operations and incident response.

R4.1 Logging design rule

Failures must be explicit, traceable, and recoverable. The system prefers a visible stage error, with the failure reflected in database state, over silent partial success.
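
A minimal Python sketch of this rule follows. It assumes a logger-based orchestrator in the spirit of main_set.py; `run_stage`, `StageError`, and the logger name are hypothetical and are not taken from the project. Each stage is logged on entry and exit. Any exception is logged with its traceback and then re-raised, so a failing stage stops the run instead of leaving a silent partial result.

```python
import logging

# Hypothetical logger name; the real orchestrator may configure its own.
logger = logging.getLogger("orchestrator")


class StageError(RuntimeError):
    """Raised when a pipeline stage fails; carries the stage name for triage."""

    def __init__(self, stage: str, cause: Exception) -> None:
        super().__init__(f"stage {stage!r} failed: {cause}")
        self.stage = stage
        self.cause = cause


def run_stage(name, func, *args, **kwargs):
    """Run one stage, log start and finish, and surface any failure explicitly."""
    logger.info("stage %s: start", name)
    try:
        result = func(*args, **kwargs)
    except Exception as exc:
        # Record the full traceback, then stop instead of continuing silently.
        logger.exception("stage %s: FAILED", name)
        raise StageError(name, exc) from exc
    logger.info("stage %s: done", name)
    return result
```

Re-raising rather than returning a sentinel value keeps the failure visible. The caller can still record the failure in the database before stopping, but it cannot accidentally carry on as if the stage had succeeded.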

R4.2 Primary log surfaces

logs/latest_run.log - orchestrator and stage execution stream from main_set.py.
logs/db_uploader.log - uploader events and publish-side errors from db_uploader.py.
data/crash_runtime.log - callback and runtime crash traces (for example, add-set event failures).
data/run_log.txt - run summary and high-level stage notes.
crash_startup.log - startup crash trace, located near the script or EXE root (when available).
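
To check quickly which of these surfaces exist after a run, a small survey script like the following may help. This is a sketch that assumes every path is relative to the project (script/EXE) root. `LOG_SURFACES` and `survey_logs` are illustrative names, not part of the project.

```python
from pathlib import Path

# Log surfaces from R4.2, relative to the project (script/EXE) root.
LOG_SURFACES = {
    "logs/latest_run.log": "orchestrator and stage execution (main_set.py)",
    "logs/db_uploader.log": "uploader and publish-side errors (db_uploader.py)",
    "data/crash_runtime.log": "callback/runtime crash traces",
    "data/run_log.txt": "run summary and high-level stage notes",
    "crash_startup.log": "startup crash trace (may be absent)",
}


def survey_logs(root):
    """Return (relative_path, exists, size_in_bytes) for each known log surface."""
    root = Path(root)
    report = []
    for rel in LOG_SURFACES:
        path = root / rel
        exists = path.is_file()
        size = path.stat().st_size if exists else 0
        report.append((rel, exists, size))
    return report
```

For example, `for rel, exists, size in survey_logs("."): print(rel, exists, size)` prints one line per surface.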

R4.3 Typical failure domains

Ollama unreachable, or the required model missing, during the prefill stage.
Quality-scoring dependency or runtime failures (including interpreter/DLL mismatches).
Resize or write errors that result in QC_Status='ResizeFailed'.
FTP/MySQL connectivity or authentication failures during publish.
Filename collisions or inconsistent filename reservations.
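
When scanning logs, it can help to tag each error line with one of these domains. The sketch below does this with keyword patterns. The patterns are hypothetical guesses at typical message text, not strings confirmed to appear in this system's logs, so adjust them to what your stages actually write.

```python
import re

# Hypothetical keyword patterns per failure domain; first match wins,
# so keep more specific domains earlier in the list.
FAILURE_DOMAINS = [
    ("ollama", re.compile(r"ollama|model .*not found", re.I)),
    ("quality_scoring", re.compile(r"\bdll\b|importerror|interpreter", re.I)),
    ("resize", re.compile(r"resize", re.I)),
    ("publish", re.compile(r"ftp|mysql|access denied|authentication", re.I)),
    ("filename", re.compile(r"collision|reservation|already reserved", re.I)),
]


def classify(line):
    """Return the first failure domain whose pattern matches the line."""
    for domain, pattern in FAILURE_DOMAINS:
        if pattern.search(line):
            return domain
    return "unclassified"
```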

R4.4 State-to-log correlation

Correlate log findings with the queue state recorded in review_queue.

Use Review_Status to determine which lifecycle phase a failure occurred in.
Use QC_Status to spot quality- and resize-related issues quickly.
For publish problems, match row decisions against uploader failures on File_Name.
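
The two queries below sketch this correlation. The column names (File_Name, Review_Status, QC_Status) come from this page. The rest of the schema is assumed, and `sqlite3` stands in for MySQL only so that the example runs on its own. With a MySQL driver such as PyMySQL or mysql-connector-python, the placeholder is `%s` rather than `?`. The function names are hypothetical.

```python
import sqlite3


def impact_summary(conn):
    """Count queue rows per (Review_Status, QC_Status) to scope a failure."""
    return conn.execute(
        "SELECT Review_Status, QC_Status, COUNT(*) FROM review_queue "
        "GROUP BY Review_Status, QC_Status ORDER BY COUNT(*) DESC"
    ).fetchall()


def rows_for_failed_uploads(conn, failed_names):
    """Return queue rows whose File_Name appears in the uploader failure list."""
    names = list(failed_names)
    if not names:
        return []
    placeholders = ",".join("?" for _ in names)
    return conn.execute(
        f"SELECT File_Name, Review_Status, QC_Status FROM review_queue "
        f"WHERE File_Name IN ({placeholders}) ORDER BY File_Name",
        names,
    ).fetchall()
```

The file names passed to `rows_for_failed_uploads` would come from the failing rows in logs/db_uploader.log. The query uses bound parameters rather than string formatting, so log text cannot alter the SQL.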

R4.5 Triage workflow

Identify the first fatal or error entry in logs/latest_run.log.
If the publish path is involved, inspect the first failing row in logs/db_uploader.log.
Query queue rows to estimate the scope of impact.
Apply a targeted fix and re-run through the standard workflow.
If it is unclear whether partial state was written, follow the rollback-aware runbook path.
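
The first two triage steps both amount to "find the first error-like line". A minimal sketch follows. It assumes that log lines carry upper-case level names (ERROR, FATAL, CRITICAL) or Python traceback headers. That format is an assumption, and `first_error` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed markers; adjust to the level names your log format actually writes.
ERROR_MARKER = re.compile(r"\b(FATAL|CRITICAL|ERROR)\b|^Traceback \(most recent call last\)")


def first_error(path):
    """Return (line_number, line) for the first error-like line, or None."""
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if ERROR_MARKER.search(line):
                return lineno, line.rstrip("\n")
    return None
```

Run it on logs/latest_run.log first, and then on logs/db_uploader.log when publish is involved. Undecodable bytes are replaced rather than raising an error, so a partially corrupted log can still be scanned.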

R4.6 Retention and review habits

During active tuning periods, keep preflight reports and run logs for at least the most recent batches.
When reporting an incident, capture the first error line along with the affected row and file identifiers.
Avoid overwriting troubleshooting context until the root cause is confirmed.
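
One way to avoid losing context before a re-run is to snapshot the logs into a timestamped folder. The sketch below assumes that an `incidents/` folder under the project root is an acceptable destination; the folder name, `snapshot_logs`, and `NOTE.txt` are all hypothetical. The optional note is a place to record the first error line and the row or file identifiers.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot_logs(root, rel_paths, note="", dest_root="incidents"):
    """Copy existing log files into a timestamped folder and return its path."""
    root = Path(root)
    # Microseconds in the stamp keep back-to-back snapshots from colliding.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    dest = root / dest_root / stamp
    dest.mkdir(parents=True, exist_ok=True)
    for rel in rel_paths:
        src = root / rel
        if src.is_file():  # skip surfaces that were not written this run
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)  # copy2 preserves modification times
    if note:
        (dest / "NOTE.txt").write_text(note, encoding="utf-8")
    return dest
```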

R4.7 Continuation path

Continue to the next reference, Data contracts, or go back to reference 3 or Step 0.

© 2026 Amir Darzi