Reference Architectures

Standardized deployment patterns for maximizing sovereignty and minimizing latency.

> PATTERN_01: AIR_GAPPED_FORTRESS [MAX SEC]

Use Case

Defense, healthcare, and IP-heavy R&D. The inference server is allowed no internet connection.

Stack

  • LLM: Llama-3-70B (GGUF/ExLlamaV2)
  • Interface: OpenWebUI / Text-Gen-WebUI
  • Vector DB: ChromaDB (Local Persist)

    [User] <---> [Internal Network]
                        |
            [Firewall (DENY ALL OUT)]
                        |
               [Inference Server]
                        |-- [Model Weights (Local)]
                        |-- [Vector Store (Local)]

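
One way to spot-check the DENY ALL OUT rule from the inference server itself is to attempt a few outbound connections and confirm that every one fails. The sketch below is a minimal illustration under stated assumptions, not a substitute for a firewall audit: the probe targets and the `egress_blocked` helper are hypothetical and not part of the pattern.

```python
import socket
from typing import Callable, Iterable, Tuple

# Hypothetical probe targets (public DNS resolvers); replace with whatever
# external endpoints matter in your environment.
PROBE_TARGETS = [("1.1.1.1", 443), ("8.8.8.8", 53)]


def egress_blocked(
    targets: Iterable[Tuple[str, int]] = PROBE_TARGETS,
    connect: Callable = socket.create_connection,
    timeout: float = 2.0,
) -> bool:
    """Return True only if every outbound connection attempt fails."""
    for host, port in targets:
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # Refused, unreachable, or timed out: expected when air-gapped.
            continue
        conn.close()
        return False  # a connection succeeded, so egress is open
    return True
```

Run on the inference server, `egress_blocked()` would be expected to return `True`. The `connect` parameter exists only so the check can be exercised without touching the network.
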
> PATTERN_02: HYBRID_RAG_GATEWAY [BALANCED]

Use Case

Corporate knowledge base. Sensitive documents stay local; general queries may be routed to the cloud.

Stack

  • Router: LiteLLM / AI Gateway
  • Local: Mistral-Small (Summarization/PII Scrub)
  • Cloud: GPT-4o (Complex Reasoning only)

        [User] --> [AI Gateway / Router]
                            |
               /---------------------\
            (PII?)                (Safe?)
               |                     |
          [Local LLM]           [Cloud API]
       (Redact/Summarize)       (Reasoning)

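
The routing decision in the diagram can be sketched as a small classifier in front of the two backends. This is a hedged illustration only: the regexes, the `Route` type, and the `route` and `redact` helpers are hypothetical, the two patterns are far from a complete PII detector, and no real gateway API (LiteLLM or otherwise) is being shown.

```python
import re
from dataclasses import dataclass

# Illustrative PII patterns only; a real deployment would need a much
# broader classifier (names, addresses, document labels, and so on).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class Route:
    target: str  # "local" or "cloud"
    prompt: str  # text handed to the chosen backend


def redact(text: str) -> str:
    """Replace every PII match with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def route(prompt: str) -> Route:
    """Keep prompts containing PII local; let the rest go to the cloud."""
    if any(p.search(prompt) for p in PII_PATTERNS.values()):
        return Route(target="local", prompt=prompt)
    return Route(target="cloud", prompt=prompt)
```

Here `redact` stands in for the local model's scrub step; in the pattern as described, that work is done by the local LLM rather than by regexes.
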
> PATTERN_03: EDGE_WORKER [LOW LATENCY]

Use Case

Factory floor, retail POS, field devices. Must keep working with no reliable network connection.

Stack

  • Hardware: Jetson Orin / Mac Mini / Consumer GPU
  • Model: Phi-3 / Gemma-2-9b (4-bit)
  • Serve: Llama.cpp Server

    [Sensor/Input] --> [Local Queue]
                            |
                      [Small Model]
                      (JSON Output)
                            |
                      [Action/Alert]
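
The queue-to-alert loop above could look roughly like the following. It is a sketch under assumptions: `small_model` is a placeholder threshold rule standing in for a call to the locally served model, and the reading fields (`sensor`, `temp_c`) and the `drain` helper are hypothetical.

```python
import json
import queue
from typing import Callable, List


def small_model(reading: dict) -> str:
    """Placeholder for the local model call; returns a JSON string.

    A real worker would query the locally served model here instead of
    applying this fixed threshold.
    """
    status = "alert" if reading["temp_c"] > 80 else "ok"
    return json.dumps({"sensor": reading["sensor"], "status": status})


def drain(q: queue.Queue, model: Callable[[dict], str] = small_model) -> List[dict]:
    """Process everything currently queued and return the alerts raised."""
    alerts = []
    while True:
        try:
            reading = q.get_nowait()
        except queue.Empty:
            break
        try:
            result = json.loads(model(reading))
        except (json.JSONDecodeError, KeyError):
            continue  # skip malformed input or output rather than stall
        if isinstance(result, dict) and result.get("status") == "alert":
            alerts.append(result)
    return alerts
```

Parsing the model output as JSON and discarding anything that fails to parse keeps one bad generation from blocking the queue, which matters when nobody is on hand to restart the device.
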

> ARCHITECTURE_FIT_MATRIX

Which pattern fits which scenario? Constraint-based mapping.

| PATTERN | MANUFACTURING (A&D) | PHARMA/VALIDATED | ENTERPRISE IT | KEY CONSTRAINTS |
|---|---|---|---|---|
| AIR_GAPPED_FORTRESS | ✓ IDEAL: ITAR/EAR mandates | ✓ VIABLE: if GxP validated | △ OVER-ENGINEERED: unless trade secrets | No internet egress; manual model updates; high ops burden; max data locality |
| HYBRID_RAG_GATEWAY | ✗ RISKY: ITAR violation risk | △ CONDITIONAL: non-GxP queries only | ✓ IDEAL: balance cost & control | Data classification required; router logic complexity; API dependency; cost control at scale |
| EDGE_WORKER | ✓ VIABLE: factory floor use | △ CONDITIONAL: must validate device | ✓ VIABLE: retail/POS/field | Small models only; limited reasoning; network-optional; low latency critical |
| VALIDATED_ISOLATED | ✓ VIABLE: if AS9100 certified | ✓ IDEAL: 21 CFR Part 11 | △ OVER-ENGINEERED: unless regulated | Full IQ/OQ/PQ docs; change control SOP; audit trail logging; version freeze |
| API_FIRST_FALLBACK | ✗ PROHIBITED: data locality fail | △ CONDITIONAL: BAA required | ✓ IDEAL: fast time-to-value | Start API, scale local; cost governance needed; gradual migration path; lower ops burden |

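
For use in tooling, the matrix can be held as a plain lookup table. The sketch below transcribes the ratings from the matrix; the sector keys, the `ACCEPTABLE` set, and the `candidates` helper are illustrative assumptions, and treating only IDEAL and VIABLE as acceptable is one possible reading rather than a rule stated by the matrix.

```python
# Ratings transcribed from the fit matrix; sector keys are shorthand.
FIT = {
    "AIR_GAPPED_FORTRESS": {
        "manufacturing": "ideal", "pharma": "viable", "enterprise_it": "over-engineered",
    },
    "HYBRID_RAG_GATEWAY": {
        "manufacturing": "risky", "pharma": "conditional", "enterprise_it": "ideal",
    },
    "EDGE_WORKER": {
        "manufacturing": "viable", "pharma": "conditional", "enterprise_it": "viable",
    },
    "VALIDATED_ISOLATED": {
        "manufacturing": "viable", "pharma": "ideal", "enterprise_it": "over-engineered",
    },
    "API_FIRST_FALLBACK": {
        "manufacturing": "prohibited", "pharma": "conditional", "enterprise_it": "ideal",
    },
}

# Assumption: only these ratings count as acceptable without extra conditions.
ACCEPTABLE = {"ideal", "viable"}


def candidates(sector: str) -> list:
    """List patterns rated ideal or viable for a sector, in matrix order."""
    return [name for name, row in FIT.items() if row[sector] in ACCEPTABLE]
```

For example, `candidates("pharma")` returns the two patterns that the matrix rates IDEAL or VIABLE for pharma; the CONDITIONAL entries are left out and would need a case-by-case review.
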
> IMPLEMENTATION DECISION FLOW

Start here: Answer these constraint questions to narrow architecture choices.

Q1: Data Locality Requirement

If ITAR/EAR/CUI/Top Secret: → AIR_GAPPED_FORTRESS only
If PHI with no BAA available: → VALIDATED_ISOLATED or AIR_GAPPED_FORTRESS
If confidential but non-regulated: → HYBRID_RAG_GATEWAY or API_FIRST_FALLBACK
If public/internal data only: → Any pattern viable

Q2: Validation Requirement

If FDA 21 CFR Part 11 required: → VALIDATED_ISOLATED (with IQ/OQ/PQ)
If AS9100/CMMC required: → AIR_GAPPED_FORTRESS or VALIDATED_ISOLATED
If no validation required: → Any pattern viable

Q3: Operational Capacity

If ML engineering team available: → On-premise patterns viable
If no ML capacity: → API_FIRST_FALLBACK or managed services
If ops team stretched thin: → HYBRID_RAG_GATEWAY or API_FIRST_FALLBACK preferred over full on-prem

Q4: Cost Constraints

If CapEx budget available: → On-premise patterns for long-term savings
If OpEx budget only: → API_FIRST_FALLBACK or HYBRID_RAG_GATEWAY
If high query volume (>1M/mo): → On-premise economics favorable
If low/uncertain volume: → API avoids sunk costs
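
The hard constraints in Q1 and Q2 can be sketched as set intersection: each answer allows a set of patterns, and the candidates are whatever survives both. This is a minimal illustration, assuming hypothetical answer keys such as `export_controlled` and `part_11`; Q3 and Q4 read as preferences rather than exclusions, so they are left out here.

```python
ALL_PATTERNS = {
    "AIR_GAPPED_FORTRESS",
    "HYBRID_RAG_GATEWAY",
    "EDGE_WORKER",
    "VALIDATED_ISOLATED",
    "API_FIRST_FALLBACK",
}

# Q1: data locality requirement (answer keys are hypothetical shorthand).
Q1_DATA_LOCALITY = {
    "export_controlled": {"AIR_GAPPED_FORTRESS"},  # ITAR/EAR/CUI/Top Secret
    "phi_no_baa": {"VALIDATED_ISOLATED", "AIR_GAPPED_FORTRESS"},
    "confidential": {"HYBRID_RAG_GATEWAY", "API_FIRST_FALLBACK"},
    "public_internal": ALL_PATTERNS,
}

# Q2: validation requirement.
Q2_VALIDATION = {
    "part_11": {"VALIDATED_ISOLATED"},  # FDA 21 CFR Part 11
    "as9100_cmmc": {"AIR_GAPPED_FORTRESS", "VALIDATED_ISOLATED"},
    "none": ALL_PATTERNS,
}


def narrow(data_class: str, validation: str) -> list:
    """Return the patterns allowed by both answers, sorted by name."""
    return sorted(Q1_DATA_LOCALITY[data_class] & Q2_VALIDATION[validation])
```

An empty result, as with `narrow("confidential", "part_11")`, suggests the two answers pull in different directions and the stricter requirement should probably be allowed to override the other.
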

> SCENARIO DEEP-DIVES

For detailed constraint analysis, failure modes, and verification checklists specific to your industry, see the scenario deep-dives.
