Reference Architectures
Standardized deployment patterns for maximizing sovereignty and minimizing latency.
AIR_GAPPED_FORTRESS
Use Case
Defense, Health, IP-heavy R&D. No internet connection allowed for inference server.
Stack
- LLM: Llama-3-70B (GGUF/ExLlamaV2)
- Interface: OpenWebUI / Text-Gen-WebUI
- Vector DB: ChromaDB (Local Persist)
HYBRID_RAG_GATEWAY
Use Case
Corporate knowledge base. Sensitive docs stay local; general queries may be sent to the cloud.
Stack
- Router: LiteLLM / AI Gateway
- Local: Mistral-Small (Summarization/PII Scrub)
- Cloud: GPT-4o (Complex Reasoning only)
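The local/cloud split above can be sketched as a small routing function. This is a minimal illustration, not LiteLLM's API: `route_query`, the `Route` type, the `touches_sensitive_docs` flag, and the regex patterns are all hypothetical, and a real deployment would need a proper data classifier rather than two regexes.

```python
import re
from dataclasses import dataclass

# Hypothetical PII patterns, for illustration only.
_PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # US SSN-like number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email address
]


@dataclass
class Route:
    target: str  # "local" or "cloud"
    reason: str


def route_query(query: str, touches_sensitive_docs: bool) -> Route:
    """Decide where a query runs. Anything sensitive stays on the local model."""
    if touches_sensitive_docs:
        return Route("local", "retrieval hits sensitive documents")
    if any(p.search(query) for p in _PII_PATTERNS):
        return Route("local", "query text looks like it contains PII")
    # Only general, non-sensitive queries are allowed to leave the premises.
    return Route("cloud", "general query")
```

The router defaults to the cloud only after both local checks pass, so a classifier miss is the main risk; that is why the fit matrix lists data classification as a key constraint for this pattern.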
EDGE_WORKER
Use Case
Factory floor, retail POS, field devices. Must keep working when the network is unreliable or absent.
Stack
- Hardware: Jetson Orin / Mac Mini / Consumer GPU
- Model: Phi-3 / Gemma-2-9b (4-bit)
- Serve: Llama.cpp Server
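A rough way to check whether a quantized model fits on edge hardware is to estimate the memory taken by its weights. The helper below is a back-of-the-envelope sketch: `weight_memory_gb` is a hypothetical name, the parameter count is the nominal figure from the model name, and the result excludes the KV cache and runtime overhead. Real 4-bit quantization formats also store scaling metadata, so actual files are somewhat larger.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB.
    return params_billion * bits_per_weight / 8


# Nominal 9B-parameter model at 4 bits per weight.
print(weight_memory_gb(9, 4))
```

For a nominal 9B model this gives 4.5 GB of weights at 4-bit versus 18 GB at 16-bit, which is the reason the stack pairs small models with 4-bit quantization.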
> ARCHITECTURE_FIT_MATRIX
Which pattern fits which scenario? Constraint-based mapping.
| PATTERN | MANUFACTURING (A&D) | PHARMA/VALIDATED | ENTERPRISE IT | KEY CONSTRAINTS |
|---|---|---|---|---|
| AIR_GAPPED_FORTRESS | ✓ IDEAL: ITAR/EAR mandates | ✓ VIABLE: If GxP validated | △ OVER-ENGINEERED: Unless trade secrets | No internet egress; Manual model updates; High ops burden; Max data locality |
| HYBRID_RAG_GATEWAY | ✗ RISKY: ITAR violation risk | △ CONDITIONAL: Non-GxP queries only | ✓ IDEAL: Balance cost & control | Data classification req; Router logic complexity; API dependency; Cost control at scale |
| EDGE_WORKER | ✓ VIABLE: Factory floor use | △ CONDITIONAL: Must validate device | ✓ VIABLE: Retail/POS/Field | Small models only; Limited reasoning; Network-optional; Low latency critical |
| VALIDATED_ISOLATED | ✓ VIABLE: If AS9100 certified | ✓ IDEAL: 21 CFR Part 11 | △ OVER-ENGINEERED: Unless regulated | Full IQ/OQ/PQ docs; Change control SOP; Audit trail logging; Version freeze |
| API_FIRST_FALLBACK | ✗ PROHIBITED: Data locality fail | △ CONDITIONAL: BAA required | ✓ IDEAL: Fast time-to-value | Start API, scale local; Cost governance needed; Gradual migration path; Lower ops burden |
Start here: Answer these constraint questions to narrow architecture choices.
Q1: Data Locality Requirement
If ITAR/EAR/CUI/Top Secret: → AIR_GAPPED_FORTRESS only
If PHI with no BAA available: → VALIDATED_ISOLATED or AIR_GAPPED_FORTRESS
If confidential but non-regulated: → HYBRID_RAG_GATEWAY or API_FIRST_FALLBACK
If public/internal data only: → Any pattern viable
Q2: Validation Requirement
If FDA 21 CFR Part 11 required: → VALIDATED_ISOLATED (with IQ/OQ/PQ)
If AS9100/CMMC required: → AIR_GAPPED_FORTRESS or VALIDATED_ISOLATED
If no validation required: → Any pattern viable
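Q1 and Q2 can be read as two filters applied in sequence. The sketch below encodes them as lookup tables using the full pattern names from the fit matrix and intersects the results. Combining the answers by intersection is an assumption of this sketch, as are the answer keys (`"export_controlled"`, `"part11"`, and so on) and the `narrow` function name.

```python
ALL_PATTERNS = frozenset({
    "AIR_GAPPED_FORTRESS", "HYBRID_RAG_GATEWAY", "EDGE_WORKER",
    "VALIDATED_ISOLATED", "API_FIRST_FALLBACK",
})

# Q1: data locality requirement (keys are hypothetical labels for each answer).
Q1_LOCALITY = {
    "export_controlled": frozenset({"AIR_GAPPED_FORTRESS"}),  # ITAR/EAR/CUI/Top Secret
    "phi_no_baa": frozenset({"VALIDATED_ISOLATED", "AIR_GAPPED_FORTRESS"}),
    "confidential": frozenset({"HYBRID_RAG_GATEWAY", "API_FIRST_FALLBACK"}),
    "public_internal": ALL_PATTERNS,
}

# Q2: validation requirement.
Q2_VALIDATION = {
    "part11": frozenset({"VALIDATED_ISOLATED"}),  # FDA 21 CFR Part 11
    "as9100_cmmc": frozenset({"AIR_GAPPED_FORTRESS", "VALIDATED_ISOLATED"}),
    "none": ALL_PATTERNS,
}


def narrow(locality: str, validation: str) -> frozenset:
    """Return the patterns that satisfy both answers (assumed: set intersection)."""
    return Q1_LOCALITY[locality] & Q2_VALIDATION[validation]


print(sorted(narrow("phi_no_baa", "part11")))
```

An empty result, such as `narrow("confidential", "part11")`, signals that the two answers point to different patterns and the requirements need a closer look before choosing.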
Q3: Operational Capacity
If ML engineering team available: → On-premise patterns viable
If no ML capacity: → API_FIRST_FALLBACK or managed services
If ops team stretched thin: → HYBRID_RAG_GATEWAY or API_FIRST_FALLBACK preferred over full on-prem
Q4: Cost Constraints
If CapEx budget available: → On-premise patterns for long-term savings
If OpEx budget only: → API_FIRST_FALLBACK or HYBRID_RAG_GATEWAY
If high query volume (>1M/mo): → On-premise economics favorable
If low/uncertain volume: → API avoids sunk costs
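The volume rule in Q4 comes down to a break-even calculation: on-premise pays off once the monthly API bill it replaces exceeds its own running cost by enough to recover the hardware spend. The function below is a simplified sketch; `breakeven_months` and its parameters are hypothetical names, and it ignores depreciation, staffing changes, and price movements.

```python
import math


def breakeven_months(capex: float, onprem_opex_per_month: float,
                     api_cost_per_query: float, queries_per_month: float) -> float:
    """Months until on-prem spend is recovered; math.inf if it never is."""
    monthly_saving = api_cost_per_query * queries_per_month - onprem_opex_per_month
    if monthly_saving <= 0:
        # The API is cheaper every month, so the CapEx is never recovered.
        return math.inf
    return capex / monthly_saving
```

With purely illustrative inputs (60,000 CapEx, 1,000 per month on-prem OpEx, 0.005 per API query), 1M queries per month breaks even in about 15 months, while 100K queries per month never does. The figures are placeholders, not benchmarks; substitute your own quotes.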