Scenario: General Enterprise IT
Productivity, Knowledge Management, and Internal Operations
Environment Characteristics:
Mid-to-large enterprises using LLMs for productivity enhancement, knowledge management, customer support,
and operational efficiency. Industries include technology, finance (non-trading), retail, professional services,
and manufacturing (non-regulated). Data sensitivity varies: confidential business data, customer information (non-PHI),
internal communications, code repositories.
Typical Use Cases:
• Employee productivity tools (meeting summarization, document generation)
• Internal knowledge base Q&A and search
• Customer support chatbots and ticket routing
• Code generation and review assistance
• Contract and legal document review (non-critical)
• Competitive intelligence aggregation
• HR policy and benefits Q&A
Dominant Constraints (Weighted)
1. COST PREDICTABILITY (High — 80% Weight)
Why Dominant: Most enterprises have fixed IT budgets with limited appetite for variable cloud costs. Usage-based API pricing can escalate unpredictably with user adoption. CFO visibility into AI spending is critical.
Implication: On-premise CapEx + predictable OpEx is easier to budget. API costs require careful usage governance or risk budget overruns. Hybrid allows tiered usage (expensive queries → on-prem).
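To make the budgeting contrast concrete, here is a minimal sketch of an API spend forecast under an adoption ramp. Every figure (per-token price, queries per user, ramp percentages) is an illustrative assumption, not real vendor pricing:

```python
# Minimal sketch: forecast monthly API spend as adoption grows.
# All figures are illustrative assumptions, not vendor pricing.

def monthly_api_cost(active_users: int,
                     queries_per_user_per_day: float = 20,
                     tokens_per_query: int = 2_000,
                     price_per_1k_tokens: float = 0.01,
                     workdays: int = 21) -> float:
    """Estimated monthly API spend in USD for a given adoption level."""
    monthly_tokens = (active_users * queries_per_user_per_day
                      * workdays * tokens_per_query)
    return monthly_tokens / 1_000 * price_per_1k_tokens

# Adoption ramp from 10% to 60% of 5,000 employees over six months.
for month, adoption in enumerate([0.10, 0.20, 0.30, 0.45, 0.55, 0.60], start=1):
    users = int(5_000 * adoption)
    print(f"Month {month}: {users:>5} users -> ${monthly_api_cost(users):,.0f}")
```

Under these assumptions spend grows six-fold in six months with no change in per-query price, which is exactly the pattern that surprises a fixed budget.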
2. OPERATIONAL COMPLEXITY (High — 75% Weight)
Why Dominant: Enterprise IT teams are already stretched. The overhead of adding GPU infrastructure, model ops, and monitoring must justify itself. Time-to-value and maintenance burden matter more than absolute performance.
Implication: API-first is faster to deploy (days vs. months). On-premise requires hiring or training ML engineers. Hybrid adds complexity but offers flexibility.
3. DATA LOCALITY / PRIVACY (Moderate — 60% Weight)
Why Considered: Confidential business data (M&A plans, financials, customer lists) should not leak. However, most enterprises tolerate third-party SaaS (Salesforce, G Suite) under contract. Data classification determines sensitivity.
Implication: API vendors with strong DPAs (Data Processing Agreements) may be acceptable for non-critical data. On-premise required only for highest-sensitivity data (M&A, trade secrets). Hybrid can route by data classification.
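A minimal sketch of routing by data classification, assuming a four-level scheme like the one in the checklist later in this section; the level names, ceiling, and backend labels are illustrative:

```python
# Minimal sketch: route each request by data classification.
# Labels, ceiling, and backend names are illustrative assumptions.

from enum import Enum

class Classification(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4

# Highest classification allowed to leave the network via the API vendor.
API_CEILING = Classification.INTERNAL

def route(level: Classification) -> str:
    """Return the backend permitted for this data classification."""
    if level.value <= API_CEILING.value:
        return "api-vendor"   # covered by a DPA with a no-training clause
    return "on-prem-llm"      # confidential/restricted stays in-house

assert route(Classification.PUBLIC) == "api-vendor"
assert route(Classification.RESTRICTED) == "on-prem-llm"
```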
4. LATENCY CONTROL (Moderate — 50% Weight)
Why Considered: Most enterprise productivity use cases tolerate 1-3 second response times. However, customer-facing chatbots and code completion benefit from low latency. Batch jobs (e.g., nightly document summarization) are latency-insensitive.
Implication: API latency (200-500ms base + model inference) is acceptable for most use cases. On-premise sub-50ms P99 is rarely business-critical unless high-frequency trading or real-time control systems are involved.
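One way to make this judgment repeatable is a per-use-case latency budget; the budgets and timings in this sketch are illustrative assumptions:

```python
# Minimal sketch: check an end-to-end latency estimate against
# per-use-case budgets. All numbers are illustrative assumptions.

BUDGETS_MS = {
    "support-chatbot": 1_000,   # customer-facing: keep responses snappy
    "code-completion": 300,     # inline completion needs low latency
    "batch-summary": 60_000,    # nightly jobs are latency-insensitive
}

def within_budget(use_case: str, network_ms: float, inference_ms: float) -> bool:
    """True if network overhead plus model inference fits the budget."""
    return network_ms + inference_ms <= BUDGETS_MS[use_case]

print(within_budget("support-chatbot", 300, 900))   # False: 1,200 ms > 1,000 ms
print(within_budget("batch-summary", 300, 45_000))  # True
```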
5. GOVERNANCE / AUDITABILITY (Moderate — 55% Weight)
Why Considered: Legal, compliance, and security teams want visibility into what data is processed and how. However, strict audit requirements (such as FDA 21 CFR Part 11) typically do not apply. SOC 2 / ISO 27001 controls are usually sufficient.
Implication: API vendors with SOC 2 Type II compliance can satisfy most governance needs. On-premise offers more granular logging but requires internal tooling. Logs must be retained per corporate policy (e.g., 7 years).
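A minimal sketch of a structured audit record carrying a retention tag; the 7-year figure mirrors the policy example above, while the schema and field names are illustrative assumptions:

```python
# Minimal sketch: one auditable event serialized with a retention tag.
# The schema is an illustrative assumption; 7 years mirrors the policy example.

import datetime
import json

RETENTION_YEARS = 7  # per corporate retention policy

def audit_record(user: str, action: str, classification: str) -> str:
    """Serialize one auditable event with an approximate retain-until date."""
    now = datetime.datetime.now(datetime.timezone.utc)
    retain_until = now + datetime.timedelta(days=365 * RETENTION_YEARS)  # leap days ignored
    return json.dumps({
        "ts": now.isoformat(),
        "user": user,
        "action": action,
        "data_classification": classification,
        "retain_until": retain_until.isoformat(),
    })

print(audit_record("jdoe", "prompt_submitted", "internal"))
```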
Common Failure Scenarios
1. Runaway API Costs
Scenario: LLM tool rolled out to 5,000 employees without usage caps. Usage explodes. Monthly bill jumps from $10K to $200K in three months.
Consequence: Budget crisis, emergency spending approvals, tool access restricted mid-project, user frustration.
Mitigation: Implement per-user quotas, cost monitoring dashboards, alerts at thresholds (e.g., 150% of forecast). Consider on-premise for heavy users.
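A minimal sketch of the quota-plus-alert mitigation; the $10K forecast and 150% threshold come from the numbers above, while the per-user quota is an illustrative assumption:

```python
# Minimal sketch: per-user daily quota plus a budget alert at 150% of forecast.
# The quota value is an illustrative assumption.

from collections import defaultdict

DAILY_QUERY_QUOTA = 50          # per user per day (assumed)
MONTHLY_FORECAST_USD = 10_000   # forecast from the scenario above
ALERT_MULTIPLIER = 1.5          # alert at 150% of forecast

queries_today = defaultdict(int)
month_spend_usd = 0.0

def admit(user: str, est_cost_usd: float) -> bool:
    """Reject over-quota requests; alert (without blocking) on budget overrun."""
    global month_spend_usd
    if queries_today[user] >= DAILY_QUERY_QUOTA:
        return False
    queries_today[user] += 1
    month_spend_usd += est_cost_usd
    if month_spend_usd > MONTHLY_FORECAST_USD * ALERT_MULTIPLIER:
        print(f"ALERT: ${month_spend_usd:,.0f} spent, over 150% of forecast")
    return True

assert admit("jdoe", 0.02)
```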
2. Inadvertent Data Leakage
Scenario: Employee pastes confidential M&A document into API-based chatbot. Vendor logs prompts for model training or monitoring. Data potentially exposed.
Consequence: Legal exposure, competitor advantage, regulatory inquiry (if public company), loss of customer trust.
Mitigation: User training on data classification. DPA negotiation with no-training clause. On-premise for confidential data. Content filtering at ingress.
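A minimal sketch of filtering at ingress; the patterns are illustrative assumptions, and a real deployment would use dedicated DLP tooling rather than ad-hoc regexes:

```python
# Minimal sketch: block prompts that match confidential-data markers
# before they reach an external API. Patterns are illustrative assumptions.

import re

BLOCK_PATTERNS = [
    re.compile(r"\bconfidential\b.*\bdo not distribute\b", re.I),
    re.compile(r"\bproject\s+\w+\s+merger\b", re.I),   # hypothetical M&A marker
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),              # SSN-shaped strings
]

def allow_prompt(prompt: str) -> bool:
    """True only if no blocked pattern appears in the prompt."""
    return not any(p.search(prompt) for p in BLOCK_PATTERNS)

assert allow_prompt("Summarize our public blog post")
assert not allow_prompt("CONFIDENTIAL - do not distribute: deal terms")
```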
3. On-Premise Expertise Gap
Scenario: Enterprise deploys on-premise LLM without ML engineering capacity. Model degrades, hardware underutilized, no one can troubleshoot.
Consequence: Sunk CapEx, user abandonment, revert to API (paying API fees on top of the sunk hardware investment), loss of internal credibility for AI initiatives.
Mitigation: Honest skills assessment before deployment. Hire or train staff, or partner with managed services. API-first is safer if skills lacking.
4. Shadow AI Proliferation
Scenario: IT does not provide approved LLM tools. Employees use personal ChatGPT accounts with company data. No governance, no audit trail.
Consequence: Unmanaged data exposure, compliance violations, inconsistent quality, security team blind spots.
Mitigation: Provide approved tools quickly (API or on-premise). Policy enforcement (DLP rules). Security awareness training.
Pre-Deployment Verification Checklist
□ Cost Modeling & Budgeting
- API cost forecast based on user count and usage patterns
- On-premise TCO modeled (hardware, software, staff, facilities)
- Break-even analysis (API vs. on-prem over 3-year horizon; sketched below)
- Budget approval secured with contingency (e.g., +30%)
- Cost monitoring and alerting configured
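A minimal sketch of the break-even item above; all cost figures (per-query price, CapEx, OpEx) are illustrative assumptions:

```python
# Minimal sketch: 3-year break-even between API and on-prem.
# Every cost figure below is an illustrative assumption.

def api_3yr(monthly_queries: int, cost_per_query: float = 0.02) -> float:
    return monthly_queries * cost_per_query * 36

def onprem_3yr(capex: float = 300_000, monthly_opex: float = 8_000) -> float:
    # CapEx amortized over the horizon; OpEx covers power, staff share, support.
    return capex + monthly_opex * 36

for q in (50_000, 250_000, 1_000_000):
    cheaper = "on-prem" if onprem_3yr() < api_3yr(q) else "API"
    print(f"{q:>9,} queries/mo: API ${api_3yr(q):,.0f} "
          f"vs on-prem ${onprem_3yr():,.0f} -> {cheaper}")
```

Under these assumed figures, on-prem only wins at high six-figure monthly volumes, consistent with the >1M queries/month rule of thumb later in this section.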
□ Data Governance & Privacy
- Data classification scheme applied (public, internal, confidential, restricted)
- DPA negotiated with API vendor (if applicable)
- No-training clause confirmed for proprietary data
- Content filtering or data masking implemented if needed
- User training on acceptable use and data handling
□ Operational Readiness
- Skills assessment completed (ML engineering, DevOps, GPU expertise)
- On-call rotation and escalation path defined
- Monitoring and alerting configured (uptime, latency, costs)
- Disaster recovery plan documented and tested
- Vendor support contract in place (if on-premise hardware)
□ Security & Compliance
- Security review completed (InfoSec sign-off)
- Authentication and authorization integrated (SSO, RBAC; sketched below)
- Logging captures user actions and data access per policy
- Vendor SOC 2 / ISO 27001 certification verified (if API)
- Data residency requirements met (GDPR, CCPA if applicable)
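A minimal sketch of a role-based gate in front of the LLM tools; the roles and mapping are illustrative assumptions, with the role itself expected to come from the SSO provider:

```python
# Minimal sketch: role-based authorization for LLM tool access.
# Roles and the mapping are illustrative assumptions; in practice
# the role claim would come from the SSO / identity provider.

ROLE_TOOLS = {
    "engineer": {"code-assist", "kb-search"},
    "support":  {"kb-search", "ticket-routing"},
    "hr":       {"kb-search", "hr-qa"},
}

def authorized(role: str, tool: str) -> bool:
    """True if the given role may use the given tool."""
    return tool in ROLE_TOOLS.get(role, set())

assert authorized("support", "ticket-routing")
assert not authorized("hr", "code-assist")
```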
Based on the dominant constraints in this scenario, the following architectural patterns are most relevant:
- API-First with Fallback — Start with an API vendor for speed; add on-premise capacity for cost control at scale (a minimal sketch follows this list).
- Tiered Hybrid Routing — Route by data classification: confidential → on-prem, non-confidential → API.
- On-Premise with Managed Service — The enterprise deploys hardware while a managed service provider operates the LLM platform, reducing the ops burden.
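A minimal sketch of the API-First with Fallback pattern named above; the function bodies are stand-ins and the failure is simulated, so everything here is an illustrative assumption:

```python
# Minimal sketch: try the vendor API first, fall back to an on-prem
# endpoint on failure. Both backends are simulated stand-ins.

def call_vendor_api(prompt: str) -> str:
    raise TimeoutError("vendor unreachable")  # simulated outage

def call_onprem(prompt: str) -> str:
    return f"[on-prem completion for: {prompt[:30]}...]"

def complete(prompt: str) -> str:
    """Prefer the vendor API; degrade to in-house capacity on error."""
    try:
        return call_vendor_api(prompt)
    except (TimeoutError, ConnectionError):
        return call_onprem(prompt)

print(complete("Summarize this meeting transcript"))
```

The same routing point is where a cost ceiling check would live: once monthly API spend crosses a threshold, heavy traffic can be steered to the on-prem path instead.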
This is guidance, not a recommendation. Based on the constraints typical of this scenario:
API-First is most pragmatic when:
• Time-to-value is critical (deploy in days not months)
• ML/GPU expertise is limited or absent
• Usage volume is uncertain (scale elastically)
• OpEx budget available but CapEx constrained
• Data sensitivity is low-to-moderate (non-confidential)
Hybrid balances flexibility and control when:
• Mixed data sensitivity (some confidential, some not)
• Cost control needed at scale but fast start desired
• Some ML expertise available but not deep bench
• Existing on-premise GPU capacity can be leveraged
• Business units have different risk tolerances
On-Premise Only justifies itself when:
• High volume (>1M queries/month) makes API uneconomical
• Strong ML engineering team in place
• High data sensitivity (trade secrets, M&A, financials)
• CapEx budget available and amortization acceptable
• Existing data center with GPU infrastructure
On-Premise is high-risk when:
• No ML engineering capacity (hiring plan undefined)
• No GPU infrastructure (new data center build required)
• Usage volume low (<100K queries/month)
• Business pressure for immediate results (3-6 month deployment too slow)
• IT team already over-stretched with existing systems
→ A phased approach (start with API, migrate heavy users to on-prem over 12-18 months) often reduces risk while preserving long-term cost efficiency.