Architecture Selection Aid

Pattern-level fit analysis for LLM deployment architectures. No recommendations, only constraint mapping.

> PATTERN_FIT_METHODOLOGY

This tool maps architectural patterns to deployment constraints. It does NOT recommend solutions. It shows where patterns fit well and where they introduce risk.

Key Principle: Every pattern has trade-offs. "Fits well when" describes constraint alignment. "Risky when" describes constraint violation or elevated risk exposure.
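
The mapping idea above can be sketched as a small lookup, assuming each pattern is reduced to two sets of constraint tags. The tag names (`low_latency`, `traceability`, and so on) and the `map_constraints` helper are hypothetical and only paraphrase a few bullets from the pattern sections; the function reports alignment and risk per pattern and deliberately does not rank or recommend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    name: str
    fits_when: frozenset   # constraints the pattern aligns with
    risky_when: frozenset  # constraints the pattern violates or strains


# Hypothetical tags, paraphrased from the "Fits Well When" / "Risky When" lists.
PATTERNS = [
    Pattern(
        "ISOLATED_INFERENCE",
        frozenset({"air_gapped", "low_latency", "general_purpose_task"}),
        frozenset({"needs_internal_knowledge", "traceability",
                   "frequent_knowledge_updates"}),
    ),
    Pattern(
        "RAG_ONLY_ARCHITECTURE",
        frozenset({"needs_internal_knowledge", "traceability",
                   "frequent_knowledge_updates"}),
        frozenset({"low_latency", "no_it_staff"}),
    ),
]


def map_constraints(constraints):
    """Report, per pattern, which stated constraints align and which add risk."""
    stated = set(constraints)
    return {
        p.name: {
            "fits": sorted(stated & p.fits_when),
            "risks": sorted(stated & p.risky_when),
        }
        for p in PATTERNS
    }


if __name__ == "__main__":
    for name, result in map_constraints({"traceability", "low_latency"}).items():
        print(name, result)
```

The output is a side-by-side view of fit and risk; the decision stays with the reader.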

> PATTERN_01: ISOLATED_INFERENCE

Definition: Single model instance, no external data, no retrieval layer. User prompts model directly. All knowledge is in model weights.

✓ Fits Well When:

  • Air-gapped deployment required (no external connections)
  • Task is general-purpose (code generation, summarization, translation)
  • No proprietary/internal data needed in responses
  • Low-latency critical (no retrieval overhead)
  • Only minimal operational complexity is acceptable

✗ Risky When:

  • Answers require current/internal knowledge (hallucination risk)
  • Domain-specific accuracy critical (model may lack depth)
  • Regulatory traceability required (no citation mechanism)
  • Knowledge updates frequent (requires retraining or fine-tuning the model)
  • Multi-user with diverse needs (no personalization layer)

Minimum Controls Checklist:

  • Model version pinning and hash verification
  • Input sanitization (prompt injection defense)
  • Output content filtering (PII, sensitive data)
  • Request/response logging for audit
  • Rate limiting per user/session
  • Model size validation against available VRAM
  • Graceful degradation plan (model unavailable scenario)

Scenario fit: Manufacturing (edge tasks), Enterprise IT (general Q&A), Pharma (pre-validated static tasks only)
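
Two items from the checklist above, hash verification of a pinned model and size validation against VRAM, can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, and the 20% overhead allowance for KV cache and activations is an assumed placeholder that should be measured for the actual model and context length.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large weight files are not loaded into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned_model(path, pinned_sha256):
    """True only if the file on disk matches the pinned SHA-256 hex digest."""
    return sha256_of_file(path) == pinned_sha256.lower()


def fits_in_vram(param_count, bytes_per_param, vram_bytes, overhead_fraction=0.2):
    """Rough check: weight bytes plus an assumed overhead must fit in VRAM.

    overhead_fraction is an assumption (KV cache, activations, runtime),
    not a measured value.
    """
    weight_bytes = param_count * bytes_per_param
    return weight_bytes * (1 + overhead_fraction) <= vram_bytes


if __name__ == "__main__":
    # A 7B-parameter model at 2 bytes per parameter against a 24 GiB card.
    print(fits_in_vram(7_000_000_000, 2, 24 * 1024**3))
```

Running the hash check at load time, not only at install time, also catches silent corruption of weights on disk.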

> PATTERN_02: RAG_ONLY_ARCHITECTURE

Definition: LLM + vector database + document corpus. Model retrieves relevant context before generating answers. Grounded responses with citations.

✓ Fits Well When:

  • Answers must cite internal documents (SOPs, manuals, policies)
  • Knowledge base changes frequently (docs updated regularly)
  • Regulatory traceability required (source attribution)
  • Domain-specific accuracy critical (technical documentation)
  • Hallucination mitigation priority (grounding in evidence)

✗ Risky When:

  • Document corpus exceeds 100GB (index performance degradation)
  • Ultra-low latency required (<100ms, retrieval adds overhead)
  • No IT staff for vector DB maintenance (operational complexity)
  • Document versioning unclear (citation integrity risk)
  • Air-gapped + frequent doc updates (sync complexity)

Minimum Controls Checklist:

  • Document versioning and change tracking
  • Vector DB backup and recovery procedures
  • Re-indexing pipeline (automated or manual)
  • Access control on document corpus
  • Citation verification mechanism (link to source doc)
  • Embedding model pinning (consistency across re-indexes)
  • Retrieval quality monitoring (relevance scoring)
  • Stale document detection (timestamp checks)

Scenario fit: Pharma (SOP/batch record queries), Enterprise IT (knowledge management), Manufacturing (maintenance manuals)
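
The citation, relevance-scoring, and stale-document controls above can be illustrated with a small in-memory retrieval sketch. Everything here is an assumption for illustration: the `Chunk` fields, the `retrieve_with_citations` helper, the 0.5 relevance floor, and the 365-day staleness window are hypothetical, and a real deployment would use a vector database and a pinned embedding model instead of hand-written vectors.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Chunk:
    doc_id: str        # identifier of the source document
    version: str       # document version the chunk was indexed from
    updated: datetime  # last-modified timestamp of that version
    text: str
    embedding: list


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_with_citations(query_embedding, chunks, now, top_k=3,
                            min_score=0.5, max_age=timedelta(days=365)):
    """Return the top-k chunks above min_score, each carrying its citation
    (doc_id, version) and a stale flag based on document age."""
    scored = [(cosine(query_embedding, c.embedding), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {
            "doc_id": c.doc_id,
            "version": c.version,
            "score": round(s, 3),
            "stale": now - c.updated > max_age,
            "text": c.text,
        }
        for s, c in scored[:top_k]
    ]
```

Returning the document version with every hit is what lets a citation be checked against the exact text that was indexed, which matters when document versioning is otherwise unclear.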

> PATTERN_03: AIR_GAPPED_DEPLOYMENT

Definition: Complete LLM stack (model, vector DB, app) deployed in network-isolated environment. No external connectivity. Updates via physical media or scheduled sync windows.

✓ Fits Well When:

  • Regulatory mandate for isolation (FDA, ITAR, defense)
  • No data exfiltration risk is acceptable (zero tolerance)
  • Sensitive IP or trade secrets in prompts/docs
  • Threat model includes insider exfiltration via API
  • Infrequent model/doc updates (monthly or less)

✗ Risky When:

  • Frequent model updates required (version lag risk)
  • Remote troubleshooting needed (no vendor access)
  • Limited on-site IT expertise (dependency on external support)
  • Continuous monitoring/telemetry expected (observability gap)
  • Cost sensitivity (hardware redundancy required, no cloud fallback)

Minimum Controls Checklist:

  • Physical network isolation verification
  • Hardware redundancy (no single point of failure)
  • Offline update procedures documented and tested
  • Local monitoring/logging stack (no external telemetry)
  • Disaster recovery plan (hardware failure, data corruption)
  • Validated change control process (model/doc updates)
  • On-site staff training (no remote vendor support)
  • Compliance audit trail (network isolation verification)

Scenario fit: Pharma (GMP environments), Manufacturing (production networks), Enterprise IT (classified data)
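
For the change-control and audit-trail items above, one possible local mechanism is a hash-chained log, where each entry commits to the one before it so that later edits become detectable. This technique is an illustrative choice, not something the checklist prescribes, and the entry layout and function names are hypothetical. It needs only the standard library, so it runs without external connectivity.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash, event):
    """Hash the previous entry's hash together with this event."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append an event (a JSON-serializable dict) linked to the prior entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "event": event,
             "hash": _entry_hash(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log):
    """Recompute every link; False if any entry was altered or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if _entry_hash(prev_hash, entry["event"]) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects tampering but does not prevent it; keeping a copy of the latest hash outside the system (for example on a signed paper record) is one way to anchor it.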

> PATTERN_04: EDGE_PLANT_DEPLOYMENT

Definition: Distributed model instances at edge locations (factories, retail, remote sites). Small form factor hardware, limited IT support, connectivity constraints.

✓ Fits Well When:

  • Latency to central data center unacceptable (>100ms)
  • Network connectivity unreliable (offline operation required)
  • Data residency mandates (data cannot leave site)
  • Physical space/power constrained (mini PC, fanless)
  • Use case is location-specific (factory floor QA, POS assistance)

✗ Risky When:

  • Large models required (70B+, exceeds edge hardware)
  • Frequent model updates (update distribution complexity)
  • No remote management capability (site visits for every issue)
  • Harsh environment (dust, temperature extremes, vibration)
  • Multi-site consistency critical (drift risk across locations)

Minimum Controls Checklist:

  • Remote monitoring (health checks, disk space, GPU temp)
  • Automated rollback mechanism (failed update recovery)
  • Centralized logging/alerting (aggregate across sites)
  • Over-the-air update capability (secure, verified)
  • Hardware failure detection (alerting before total failure)
  • Model version consistency monitoring (drift detection)
  • Local fallback mode (degraded operation if central unreachable)
  • Physical security measures (tamper detection, locked cabinets)

Scenario fit: Manufacturing (production lines, warehouses), Enterprise IT (retail stores, branch offices), Pharma (remote labs)
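
The model version consistency check from the list above can be sketched as a comparison of each site's reported model hash against the expected one. The report format (a dict of site name to hash, with `None` for a site that did not answer) and the `detect_version_drift` helper are assumptions for illustration; how sites report their hash is left open.

```python
def detect_version_drift(expected_hash, site_reports):
    """Compare reported model hashes against the expected hash.

    site_reports maps site name -> reported hash, or None if the site
    was unreachable during this check.
    """
    drifted = {}
    unreachable = []
    for site, reported in sorted(site_reports.items()):
        if reported is None:
            unreachable.append(site)
        elif reported != expected_hash:
            drifted[site] = reported
    return {
        "in_sync": not drifted and not unreachable,
        "drifted": drifted,
        "unreachable": unreachable,
    }


if __name__ == "__main__":
    reports = {"plant-a": "abc123", "plant-b": "old999", "lab-c": None}
    print(detect_version_drift("abc123", reports))
```

Unreachable sites are reported separately from drifted ones because, with unreliable connectivity, a missing report does not by itself mean the site runs the wrong version.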

> CONSTRAINT_PATTERN_MATRIX

CONSTRAINT | Isolated Inference | RAG-only | Air-gapped | Edge/Plant
Air-gapped required △ (sync complexity) △ (per site)
Citation/traceability △ (depends on pattern) △ (depends on pattern)
Low IT expertise
Frequent knowledge updates △ (update distribution)
Ultra-low latency (<100ms) △ (depends on pattern)
Limited hardware (edge) ✓ (small models)
Multi-site deployment △ (version drift) ✗ (sync complexity) ✗ (update overhead)

✓ = Fits well | △ = Possible with caveats | ✗ = High risk or poor fit
