Step 5: Developer Guide
This chapter closes the core path with implementation guardrails for safe evolution.
The goal is to ship improvements without regressing reliability, review safety, or publish integrity.
5.1 Developer objective
Extend the automation incrementally while preserving deterministic filenames, explicit queue state transitions,
review-first publish behavior, and recoverable failure handling.
5.2 Non-negotiable guardrails
- Keep review_queue as the operational source of truth before publish.
- Preserve review-first policy: no direct publish from unreviewed AI output.
- Do not hardcode geography keyword pools; location terms must come from row context and taxonomy.
- Preserve deterministic filename reservation in data/used_filenames.json.
- Do not bypass uploader semantics: publish uses approved rows only.
- Treat File_Name as the business key for publish upsert and mirror sync.
- Prefer explicit DB status and logs over implicit folder-based assumptions.
- Never commit local absolute paths, credentials, or private deployment values; use sanitized export before Git publishing.
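Filename reservation is easiest to keep deterministic and recoverable when every writer goes through one helper that reserves a name and, during rollback, releases it. The sketch below makes several assumptions: data/used_filenames.json holds a JSON list of reserved names, and collisions are resolved with a _2, _3 suffix. The real file layout and naming scheme may differ, and reserve_filename and release_filename are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path


def _load(path: Path) -> set[str]:
    # A missing file means nothing has been reserved yet.
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def _save(path: Path, names: set[str]) -> None:
    # Write to a temp file in the same folder, then atomically replace it,
    # so a crash never leaves a half-written reservation file behind.
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(sorted(names), fh, indent=2)
    os.replace(tmp, path)


def reserve_filename(path: Path, stem: str, ext: str = ".jpg") -> str:
    """Reserve the first free name among stem, stem_2, stem_3, ... (hypothetical scheme)."""
    names = _load(path)
    candidate, n = f"{stem}{ext}", 1
    while candidate in names:  # collision check before reserving
        n += 1
        candidate = f"{stem}_{n}{ext}"
    names.add(candidate)
    _save(path, names)
    return candidate


def release_filename(path: Path, name: str) -> None:
    """Undo a reservation during rollback; a no-op if the name is not reserved."""
    names = _load(path)
    if name in names:
        names.discard(name)
        _save(path, names)
```

Given the same reservation file state and the same stem, the helper always returns the same name. That property is what makes reruns after a rollback predictable.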
5.3 Module responsibility map
main_set.py - stage orchestration, set lifecycle, session rollback boundaries, frozen-runtime helper staging under data/_runtime_scripts, classifier-first subject suggestion, startup GPU/CPU runtime probe, optional fallback caption model wiring, and Stage 6 timeout/quarantine protections.
batch_image_quality_score.py - metric computation, quality-class assignment inputs, and non-null score fallback behavior.
caption_review_local.py - caption/keyword generation, optional nature classifier hints, dedupe strategy, evidence guardrails, weak-output handling, and sentence normalization.
review_editor.py - human decision layer, row-level edits, Generate retry path with pending-row duplicate checks, approval and rejection semantics, and shared dictionary spellcheck overlays for editable metadata fields.
db_uploader.py - FTP upload, MySQL upsert by File_Name, local mirror synchronization.
init_db.py - SQLite bootstrap from data/init/*.sql.
amir2000_config.py - runtime paths, publish endpoints, credentials, model settings.
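The module map above describes init_db.py as bootstrapping SQLite from data/init/*.sql. The sketch below illustrates that idea under two assumptions: the scripts are idempotent (for example, they use CREATE TABLE IF NOT EXISTS), and applying them in sorted filename order is correct. The function name bootstrap_db is hypothetical and is not the script's actual API.

```python
import sqlite3
from pathlib import Path


def bootstrap_db(db_path: Path, init_dir: Path) -> list[str]:
    """Apply every *.sql file in init_dir to db_path, in sorted filename order."""
    applied = []
    conn = sqlite3.connect(db_path)
    try:
        for sql_file in sorted(init_dir.glob("*.sql")):
            # executescript runs multiple statements and commits any pending
            # transaction first, so each file is applied as its own script.
            conn.executescript(sql_file.read_text(encoding="utf-8"))
            applied.append(sql_file.name)
        conn.commit()
    finally:
        conn.close()
    return applied
```

If the init scripts are idempotent, running the bootstrap twice against the same DB file should leave it unchanged. That makes a repeat run a cheap sanity check after schema edits.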
5.4 Safe change patterns
A. Add or modify a metadata field
- Update the schema in both data/init/*.sql and the local docs copies (docs/review_queue.sql, docs/photos_info_revamp.sql) as needed.
- Update producers (stage scripts) and consumers (review editor, uploader, diagnostics).
- Verify DB bootstrap via python .\init_db.py and run a smoke batch.
- Update Step 4 database documentation if semantics changed.
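Before running the smoke batch, a quick check that the bootstrapped table really has the new field catches a forgotten init SQL update early. The sketch below uses SQLite's PRAGMA table_info. The helper name missing_columns is hypothetical, and the column names you pass in are whatever your change requires.

```python
import sqlite3


def missing_columns(conn: sqlite3.Connection, table: str, expected: set[str]) -> set[str]:
    """Return the expected columns that the table does not have.

    The table name is interpolated, so it must come from code, not user input.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column;
    # a table that does not exist yields no rows, so every column is reported missing.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    present = {row[1] for row in rows}
    return expected - present
```

An empty result means the schema and the code agree on the field.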
B. Adjust quality scoring behavior
- Change metric/threshold logic in batch_image_quality_score.py.
- Validate the impact on QC_Status distribution and review effort.
- Verify score completeness: nima_score, blur_score, brightness_score, contrast_score, brisque_score, clip_aesthetic_score, and QR must all be populated.
- Run controlled regression sets before applying to large batches.
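The score-completeness rule can be enforced with a query instead of by eye. This sketch assumes the seven score columns live on review_queue in SQLite; which table actually holds them is an assumption, so adjust it to the real schema. rows_with_missing_scores is a hypothetical helper name.

```python
import sqlite3

# Score columns named in the completeness rule above.
SCORE_COLUMNS = (
    "nima_score", "blur_score", "brightness_score", "contrast_score",
    "brisque_score", "clip_aesthetic_score", "QR",
)


def rows_with_missing_scores(conn: sqlite3.Connection, table: str = "review_queue") -> list[str]:
    """Return File_Name values of rows where any score column is NULL."""
    null_check = " OR ".join(f"{col} IS NULL" for col in SCORE_COLUMNS)
    sql = f"SELECT File_Name FROM {table} WHERE {null_check} ORDER BY File_Name"
    return [row[0] for row in conn.execute(sql)]
```

For a regression set, an empty list is the pass condition. Any returned name points at a row where the non-null fallback did not apply.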
C. Change caption/prefill behavior
- Modify caption_review_local.py while preserving anti-duplication behavior.
- Preserve optional classifier hint behavior and filename/subject gating for nature labels when the classifier is enabled.
- Keep the primary/optional-fallback policy intact: fallback should trigger only for failed rows and only when a fallback model is configured.
- Preserve Stage 6 timeout/quarantine protections in main_set.py so that repeated native crashes on a single row do not cause endless retries.
- Keep keyword context filters strict (aligned with subject, location, and folder) to prevent unrelated geo terms.
- Preserve deterministic context guardrails (for example, NL lowland anti-mountain filtering and filename sunrise/sunset time-of-day consistency).
- Reject generic filler phrasing and invented terrain/location claims that the image does not visually support (evidence-only outputs).
- Preserve sentence cleanup rules that remove malformed compounds (for example, broken City- patterns) and reject malformed caption starts.
- Maintain folder-aware fallback routing and verify fallback quality against the current categories in data/folder_map.json (non-target regions can be deferred intentionally, but the scope must be explicit).
- Confirm fallback behavior when Ollama responds slowly or the model is missing.
- Verify improved specificity without overfitting repeated templates.
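Two of the deterministic caption guardrails above, filename time-of-day consistency and filler rejection, reduce to plain string checks. Those checks can be unit-tested without a model. The sketch below assumes filenames carry sunrise/sunset tokens. The phrase list, conflict table, and the function name caption_problems are illustrative and hypothetical, not the pipeline's actual rules.

```python
import re

# Illustrative filler phrases only; not the list the pipeline actually uses.
FILLER_PHRASES = ("a beautiful view", "stunning scenery", "captures the essence")

# Time-of-day words that contradict each filename token.
CONFLICTS = {
    "sunrise": ("sunset", "dusk", "evening"),
    "sunset": ("sunrise", "dawn", "morning"),
}


def caption_problems(file_name: str, caption: str) -> list[str]:
    """Return guardrail violations for one caption (an empty list means it passes)."""
    problems = []
    text = caption.lower()
    for phrase in FILLER_PHRASES:
        if phrase in text:
            problems.append(f"filler phrase: {phrase!r}")
    # Split the filename on non-letters so "Canal_Sunset_01.jpg" yields "sunset".
    tokens = set(re.split(r"[^a-z]+", file_name.lower()))
    for token, contradicting in CONFLICTS.items():
        if token in tokens:
            for word in contradicting:
                if re.search(rf"\b{word}\b", text):
                    problems.append(
                        f"time-of-day conflict: filename says {token}, caption says {word}"
                    )
    return problems
```

Keeping these checks as pure functions lets a regression set assert on them directly. This approach also avoids judging guardrail behavior through model output that varies run to run.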
D. Change publish behavior
- Keep approved-only selection logic intact.
- Preserve MySQL upsert by File_Name and mirror ID synchronization.
- Re-verify that successful uploads clear queue rows as expected.
E. Change frozen EXE subprocess behavior
- Keep helper-script staging via data/_runtime_scripts for stage scripts launched from the EXE.
- Keep runtime hook behavior for classifier dependencies consistent (helpers/runtime_hook_samevenv_classifier.py).
- Preserve subprocess environment scrub logic to avoid interpreter/DLL mismatches.
- Validate that stage logs still print the scoring runtime interpreter and version.
- Validate that the startup probe line still reports processor mode (GPU/CPU), context, and VRAM.
F. Change review-editor spellcheck behavior
- Keep shared dictionary sources consistent across Subject, Caption, alt_text, and Keywords.
- Preserve the exception-write flow (the operator can keep valid terms without disabling spellcheck globally).
- Verify that right-click replace/keep actions do not break text widgets or save/publish behavior.
- Preserve the Generate retry flow and pending-row duplicate protection before metadata is persisted.
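The exception-write flow in the spellcheck rules (5.4 F) can be modeled as one base dictionary plus an operator exceptions file, shared by every spellchecked field. This is a minimal sketch under assumed details: the class name, the exceptions file path, and the one-word-per-line format are all hypothetical, not the review editor's actual implementation.

```python
import re
from pathlib import Path


class SharedDictionary:
    """One word source for Subject, Caption, alt_text, and Keywords (hypothetical helper)."""

    def __init__(self, base_words: set[str], exceptions_path: Path):
        self.base = {w.lower() for w in base_words}
        self.exceptions_path = exceptions_path
        self.exceptions: set[str] = set()
        if exceptions_path.exists():
            lines = exceptions_path.read_text(encoding="utf-8").splitlines()
            self.exceptions = {line.strip().lower() for line in lines if line.strip()}

    def unknown_words(self, text: str) -> list[str]:
        """Words found in neither the base dictionary nor the exceptions."""
        # Runs of Unicode letters; digits and punctuation separate words.
        words = re.findall(r"[^\W\d_]+", text)
        known = self.base | self.exceptions
        return [w for w in words if w.lower() not in known]

    def keep(self, word: str) -> None:
        """Operator 'keep' action: persist the word instead of disabling spellcheck."""
        word = word.strip().lower()
        if word and word not in self.exceptions:
            self.exceptions.add(word)
            with self.exceptions_path.open("a", encoding="utf-8") as fh:
                fh.write(word + "\n")
```

Because every field reads from the same instance and the same exceptions file, a term kept while editing one field stops being flagged in the others. The term also stays accepted after the editor restarts.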
5.5 Data and schema compatibility rules
- Keep runtime schema assumptions aligned with init SQL and documentation copies.
- Prefer additive, backward-compatible changes; avoid destructive migrations.
- Validate behavior with existing DB files before considering reset requirements.
- When reset is unavoidable, document migration/reset impact clearly in runbook notes.
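In SQLite, an additive, backward-compatible change usually means adding a nullable column only when it is absent. Running the migration against an existing DB file is then harmless and repeatable. The sketch below assumes SQLite on the local side; add_column_if_missing is a hypothetical helper, and the column used in the usage is illustrative.

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection, table: str, column: str, decl: str) -> bool:
    """Add a column only if it does not exist yet; return True if it was added.

    table, column, and decl must come from code, never from user input,
    because SQLite cannot bind identifiers as query parameters.
    """
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    # ADD COLUMN keeps existing rows; they get NULL (or the declared DEFAULT).
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    conn.commit()
    return True
```

A second run returns False and changes nothing. For that reason, a migration like this should not force a DB reset.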
5.6 Failure, rollback, and idempotency expectations
- Blocking stage failures must not leave irreversible partial state.
- Rollback/recovery must continue to restore files safely across both incoming and staged paths and to release reserved filenames when applicable.
- Publish retries should stay idempotent via upsert semantics keyed by File_Name.
- Diagnostic visibility in logs/latest_run.log and logs/db_uploader.log must remain clear.
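Idempotency keyed by File_Name means a retried publish overwrites rows instead of duplicating them. The production path is a MySQL upsert, whose MySQL form is INSERT ... ON DUPLICATE KEY UPDATE. This sketch instead uses SQLite's INSERT ... ON CONFLICT as a stand-in so that it runs anywhere. The Status column with the value 'approved' and the published_photos target table are hypothetical.

```python
import sqlite3


def publish_approved(conn: sqlite3.Connection) -> int:
    """Copy approved review_queue rows into the publish table; safe to re-run.

    Assumed (hypothetical) schema: review_queue(File_Name, Caption, Status)
    and published_photos(File_Name PRIMARY KEY, Caption).
    """
    # Approved-only selection: pending or rejected rows never reach publish.
    rows = conn.execute(
        "SELECT File_Name, Caption FROM review_queue WHERE Status = 'approved'"
    ).fetchall()
    # ON CONFLICT keyed by File_Name turns a retry into an update, not a duplicate.
    conn.executemany(
        "INSERT INTO published_photos (File_Name, Caption) VALUES (?, ?) "
        "ON CONFLICT(File_Name) DO UPDATE SET Caption = excluded.Caption",
        rows,
    )
    conn.commit()
    return len(rows)
```

Running the function twice leaves exactly one target row per approved File_Name. That is the property a publish retry relies on.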
5.7 Verification checklist before merge
- Run a small end-to-end execution from source, covering import through review and publish.
- Verify approve/reject transitions and filename reservation behavior.
- Validate an EXE run on Python 3.13 and confirm there is no python312.dll conflict during scoring.
- Inject at least one known failure path (for example, a missing model) and verify recovery.
- Confirm MySQL upsert and local mirror synchronization remain consistent.
- Confirm that the publish completion UX still shows a single message and closes the review window on OK.
- Run a representative metadata quality sample (for example NL lowland + cityscape + people) and verify no generic filler phrases or unsupported terrain claims are emitted.
- Verify that review editor spellcheck overlays and context-menu corrections still work on Caption, alt_text, and Keywords.
- Validate no critical regressions in logs across pipeline and uploader.
5.8 Release and packaging path
Set-Location "\path\to\amir2000_image_automation"
pwsh -NoProfile -ExecutionPolicy Bypass -File .\helpers\preflight_multiset.ps1
pwsh -NoProfile -ExecutionPolicy Bypass -File .\helpers\build_multiset.ps1 -Clean -BuildProfile Lite
- Use sanitized export tooling before sharing public repository snapshots.
- Keep packaging artifacts separate from runtime DB/log state when distributing builds.
- Document model/version assumptions for reproducible operator behavior.
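A sanitized export is easier to trust when it ends with a scan that flags anything machine-specific that slipped through. This is a minimal sketch of such a final check. The regex patterns and file suffixes are illustrative assumptions, not the project's sanitizer rules, and scan_export is a hypothetical name.

```python
import re
from pathlib import Path

# Illustrative patterns only: drive-letter paths, user-home paths, and
# credential-looking assignments such as DB_PASSWORD = "...".
PATTERNS = {
    "absolute_windows_path": re.compile(r"\b[A-Za-z]:\\[^\s\"']+"),
    "home_directory_path": re.compile(r"/(?:home|Users)/[^/\s\"']+"),
    "credential_assignment": re.compile(
        r"(?:password|passwd|secret|api_key|token)\s*[:=]\s*[\"'][^\"']+[\"']",
        re.IGNORECASE,
    ),
}


def scan_export(root: Path, suffixes=(".py", ".json", ".md", ".ps1", ".sql")) -> list[tuple[str, int, str]]:
    """Return (relative_path, line_number, pattern_name) for every suspicious line."""
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name, pattern in PATTERNS.items():
                if pattern.search(line):
                    findings.append((path.relative_to(root).as_posix(), lineno, name))
    return findings
```

A non-empty result should block the push. Placeholder paths such as \path\to\... in the docs pass because they carry no drive letter.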
5.9 Anti-patterns to avoid
- Direct manual DB edits in production without matching code/schema updates.
- Bypassing review editor and writing publish rows directly from AI output.
- Resetting or editing used_filenames.json without collision checks.
- Publishing unsanitized packs to Git that still contain local machine paths or secrets.
- Shipping behavior changes without exercising failure and retry scenarios.
5.10 Core path completion
Core documentation is complete at this step (Step 0 to Step 5). Continue with supporting references based on task context.