Architecture Selection Aid
Pattern-level fit analysis for LLM deployment architectures. No recommendations—only constraint mapping.
This tool maps architectural patterns to deployment constraints. It does NOT recommend solutions. It shows where patterns fit well and where they introduce risk.
Key Principle: Every pattern has trade-offs. "Fits well when" describes constraint alignment. "Risky when" describes constraint violation or elevated risk exposure.
Pattern: Isolated Inference
Definition: Single model instance, no external data, no retrieval layer. The user prompts the model directly; all knowledge is in the model weights.
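For contrast with the retrieval-based patterns below, a minimal sketch of the direct prompt-to-model flow. The runtime (Hugging Face transformers) and the local model path are assumptions; any local inference stack follows the same shape.

```python
# Minimal sketch of the isolated-inference flow: the prompt goes straight to a
# locally loaded model, with no retrieval step and no external calls.
# The model path is a placeholder for a locally provisioned model.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="/models/local-llm",  # hypothetical local path; no network download
)

def answer(prompt: str) -> str:
    # Everything the model "knows" comes from its weights.
    result = generator(prompt, max_new_tokens=256, do_sample=False)
    return result[0]["generated_text"]

print(answer("Summarize the shift handover checklist."))
```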
✓ Fits Well When:
- Air-gapped deployment required (no external connections)
- Task is general-purpose (code generation, summarization, translation)
- No proprietary/internal data needed in responses
- Low latency critical (no retrieval overhead)
- Only minimal operational complexity is acceptable
✗ Risky When:
- Answers require current/internal knowledge (hallucination risk)
- Domain-specific accuracy critical (model may lack depth)
- Regulatory traceability required (no citation mechanism)
- Knowledge updates frequent (requires full model retraining)
- Multi-user with diverse needs (no personalization layer)
Minimum Controls Checklist:
- Model version pinning and hash verification (see the sketch after this list)
- Input sanitization (prompt injection defense)
- Output content filtering (PII, sensitive data)
- Request/response logging for audit
- Rate limiting per user/session
- Model size validation against available VRAM
- Graceful degradation plan (model unavailable scenario)
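A minimal sketch of the version pinning and hash verification controls referenced above, assuming the model ships as a single weights file and the expected hash is pinned in a validated release manifest; the path and hash below are placeholders.

```python
# Sketch of model version pinning and hash verification at service startup.
# MODEL_PATH and PINNED_SHA256 are placeholders; in practice the pinned hash
# would come from a validated release manifest under change control.
import hashlib
from pathlib import Path

MODEL_PATH = Path("/models/local-llm/model.safetensors")       # hypothetical
PINNED_SHA256 = "<pinned-sha256-from-release-manifest>"        # placeholder

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model() -> None:
    actual = sha256_of(MODEL_PATH)
    if actual != PINNED_SHA256:
        # Refuse to serve traffic with an unverified model (ties into the
        # graceful degradation plan).
        raise RuntimeError(f"Model hash mismatch: expected {PINNED_SHA256}, got {actual}")

verify_model()  # run before loading the model
```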
Scenario fit: Manufacturing (edge tasks), Enterprise IT (general Q&A), Pharma (pre-validated static tasks only)
Pattern: RAG-only (Retrieval-Augmented Generation)
Definition: LLM + vector database + document corpus. The system retrieves relevant context before the model generates an answer, producing grounded responses with citations.
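A minimal sketch of the retrieve-then-ground flow this pattern describes. The embedding function and corpus below are stand-ins; a real deployment would use a pinned embedding model and a vector database, but the shape is the same: embed the query, rank passages, and pass the top hits to the model with source IDs so answers can cite them.

```python
# Toy retrieve-then-ground flow: embed the query, rank corpus passages by
# cosine similarity, and build a grounded prompt that carries source IDs.
# embed() is a placeholder, NOT a semantically meaningful embedding.
import numpy as np

CORPUS = [
    {"doc_id": "SOP-014 v3", "text": "Calibrate the pH meter before each batch."},
    {"doc_id": "SOP-022 v1", "text": "Record deviations in the batch record within 24 hours."},
]

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: deterministic pseudo-vector per text, per run.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

def retrieve(query: str, k: int = 2) -> list[dict]:
    q = embed(query)
    scored = []
    for passage in CORPUS:
        v = embed(passage["text"])
        score = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((score, passage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:k]]

def build_grounded_prompt(query: str) -> str:
    context = "\n".join(f"[{p['doc_id']}] {p['text']}" for p in retrieve(query))
    return (
        "Answer using only the sources below and cite their IDs.\n"
        f"{context}\n\nQuestion: {query}"
    )

print(build_grounded_prompt("When must deviations be recorded?"))
```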
✓ Fits Well When:
- Answers must cite internal documents (SOPs, manuals, policies)
- Knowledge base changes frequently (docs updated regularly)
- Regulatory traceability required (source attribution)
- Domain-specific accuracy critical (technical documentation)
- Hallucination mitigation priority (grounding in evidence)
✗ Risky When:
- Document corpus exceeds 100GB (index performance degradation)
- Ultra-low latency required (<100ms, retrieval adds overhead)
- No IT staff for vector DB maintenance (operational complexity)
- Document versioning unclear (citation integrity risk)
- Air-gapped + frequent doc updates (sync complexity)
Minimum Controls Checklist:
- Document versioning and change tracking
- Vector DB backup and recovery procedures
- Re-indexing pipeline (automated or manual)
- Access control on document corpus
- Citation verification mechanism (link to source doc)
- Embedding model pinning (consistency across re-indexes)
- Retrieval quality monitoring (relevance scoring)
- Stale document detection (timestamp checks; see the sketch after this list)
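A minimal sketch of the stale-document check referenced above, assuming the index keeps a last-embedded timestamp per source file; the paths and metadata format are placeholders for whatever the vector DB actually stores.

```python
# Sketch of stale-document detection: flag indexed documents whose source file
# has changed (or disappeared) since it was last embedded.
from pathlib import Path
from datetime import datetime, timezone

# Hypothetical index metadata: document path -> timestamp of last re-embedding.
INDEXED_AT = {
    "/corpus/sop-014.pdf": datetime(2024, 3, 1, tzinfo=timezone.utc),
    "/corpus/sop-022.pdf": datetime(2024, 3, 1, tzinfo=timezone.utc),
}

def stale_documents() -> list[str]:
    stale = []
    for doc, indexed_at in INDEXED_AT.items():
        path = Path(doc)
        if not path.exists():
            stale.append(f"{doc} (removed from corpus but still indexed)")
            continue
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        if modified > indexed_at:
            stale.append(f"{doc} (modified after last indexing; re-embed needed)")
    return stale

for finding in stale_documents():
    print("STALE:", finding)
```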
Scenario fit: Pharma (SOP/batch record queries), Enterprise IT (knowledge management), Manufacturing (maintenance manuals)
Pattern: Air-gapped
Definition: Complete LLM stack (model, vector DB, app) deployed in a network-isolated environment. No external connectivity; updates arrive via physical media or scheduled sync windows.
✓ Fits Well When:
- Regulatory mandate for isolation (FDA, ITAR, defense)
- Zero tolerance for data exfiltration risk
- Sensitive IP or trade secrets in prompts/docs
- Threat model includes insider exfiltration via API
- Infrequent model/doc updates (monthly or less)
✗ Risky When:
- Frequent model updates required (version lag risk)
- Remote troubleshooting needed (no vendor access)
- Limited on-site IT expertise (dependency on external support)
- Continuous monitoring/telemetry expected (observability gap)
- Cost sensitivity (hardware redundancy required, no cloud fallback)
Minimum Controls Checklist:
- Physical network isolation verification (see the check sketched after this list)
- Hardware redundancy (no single point of failure)
- Offline update procedures documented and tested
- Local monitoring/logging stack (no external telemetry)
- Disaster recovery plan (hardware failure, data corruption)
- Validated change control process (model/doc updates)
- On-site staff training (no remote vendor support)
- Compliance audit trail (network isolation verification)
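A minimal sketch of the isolation check referenced above: probe a few outbound endpoints that should be unreachable and append the result to a local audit log. The probe targets and log location are assumptions to be agreed with the site's network and compliance teams.

```python
# Sketch of an automated network-isolation check for the local audit trail:
# outbound connections SHOULD fail in an air-gapped environment.
import socket
from datetime import datetime, timezone

PROBE_TARGETS = [("8.8.8.8", 53), ("1.1.1.1", 443)]  # illustrative; should be unreachable

def verify_isolation(timeout_s: float = 3.0) -> bool:
    isolated = True
    for host, port in PROBE_TARGETS:
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                isolated = False
                print(f"VIOLATION: outbound connection to {host}:{port} succeeded")
        except OSError:
            print(f"OK: {host}:{port} unreachable")
    # Append the result to a local log (no external telemetry).
    stamp = datetime.now(timezone.utc).isoformat()
    with open("isolation_audit.log", "a") as log:
        log.write(f"{stamp} isolated={isolated}\n")
    return isolated

verify_isolation()
```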
Scenario fit: Pharma (GMP environments), Manufacturing (production networks), Enterprise IT (classified data)
Pattern: Edge/Plant
Definition: Distributed model instances at edge locations (factories, retail, remote sites). Small form-factor hardware, limited IT support, connectivity constraints.
✓ Fits Well When:
- Latency to central data center unacceptable (>100ms)
- Network connectivity unreliable (offline operation required)
- Data residency mandates (data cannot leave site)
- Physical space/power constrained (mini PC, fanless)
- Use case is location-specific (factory floor QA, POS assistance)
✗ Risky When:
- Large models required (70B+, exceeds edge hardware)
- Frequent model updates (update distribution complexity)
- No remote management capability (site visits for every issue)
- Harsh environment (dust, temperature extremes, vibration)
- Multi-site consistency critical (drift risk across locations)
Minimum Controls Checklist:
- Remote monitoring (health checks, disk space, GPU temp)
- Automated rollback mechanism (failed update recovery)
- Centralized logging/alerting (aggregate across sites)
- Over-the-air update capability (secure, verified)
- Hardware failure detection (alerting before total failure)
- Model version consistency monitoring (drift detection; see the sketch after this list)
- Local fallback mode (degraded operation if central unreachable)
- Physical security measures (tamper detection, locked cabinets)
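A minimal sketch of the drift check referenced above, assuming each site's health-check payload reports the hash of its deployed model; the report format and hash values are placeholders.

```python
# Sketch of model-version drift detection across edge sites: compare each
# site's reported model hash against the fleet's expected release.
EXPECTED_MODEL_SHA256 = "<pinned-hash-for-current-release>"  # placeholder

# Hypothetical health-check payloads aggregated by the central logging stack.
SITE_REPORTS = {
    "plant-a": {"model_sha256": "<pinned-hash-for-current-release>", "app": "1.4.2"},
    "plant-b": {"model_sha256": "<older-release-hash>", "app": "1.4.2"},
}

def detect_drift() -> dict[str, str]:
    drifted = {}
    for site, report in SITE_REPORTS.items():
        if report["model_sha256"] != EXPECTED_MODEL_SHA256:
            drifted[site] = report["model_sha256"]
    return drifted

for site, found in detect_drift().items():
    print(f"DRIFT: {site} is running model {found}, expected {EXPECTED_MODEL_SHA256}")
```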
Scenario fit: Manufacturing (production lines, warehouses), Enterprise IT (retail stores, branch offices), Pharma (remote labs)
Constraint Matrix
| Constraint | Isolated Inference | RAG-only | Air-gapped | Edge/Plant |
|---|---|---|---|---|
| Air-gapped required | ✓ | △ (sync complexity) | ✓ | △ (per site) |
| Citation/traceability | ✗ | ✓ | △ (depends on pattern) | △ (depends on pattern) |
| Low IT expertise | ✓ | ✗ | ✗ | ✗ |
| Frequent knowledge updates | ✗ | ✓ | ✗ | △ (update distribution) |
| Ultra-low latency (<100ms) | ✓ | ✗ | △ (depends on pattern) | ✓ |
| Limited hardware (edge) | ✓ (small models) | ✗ | ✗ | ✓ |
| Multi-site deployment | △ (version drift) | ✗ (sync complexity) | ✗ (update overhead) | ✓ |
✓ = Fits well | △ = Possible with caveats | ✗ = High risk or poor fit
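The matrix can also be treated as data. Below is a minimal sketch of constraint mapping in that spirit; the ratings are transcribed from the table above, and the function shape is an assumption rather than part of the tool. Consistent with the key principle, it returns per-pattern ratings, not a recommendation.

```python
# The constraint matrix encoded as data, so a set of constraints can be mapped
# to per-pattern ratings. "fit" / "caveat" / "risk" correspond to ✓ / △ / ✗.
PATTERNS = ["Isolated Inference", "RAG-only", "Air-gapped", "Edge/Plant"]

MATRIX = {
    "Air-gapped required":        ["fit", "caveat", "fit", "caveat"],
    "Citation/traceability":      ["risk", "fit", "caveat", "caveat"],
    "Low IT expertise":           ["fit", "risk", "risk", "risk"],
    "Frequent knowledge updates": ["risk", "fit", "risk", "caveat"],
    "Ultra-low latency (<100ms)": ["fit", "risk", "caveat", "fit"],
    "Limited hardware (edge)":    ["fit", "risk", "risk", "fit"],
    "Multi-site deployment":      ["caveat", "risk", "risk", "fit"],
}

def map_constraints(constraints: list[str]) -> dict[str, list[str]]:
    """Return each pattern's ratings for the given constraints (no ranking)."""
    return {
        pattern: [MATRIX[c][i] for c in constraints]
        for i, pattern in enumerate(PATTERNS)
    }

print(map_constraints(["Air-gapped required", "Frequent knowledge updates"]))
```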