When GCHQ Director Anne Keast-Butler called for a national cyber defence capability that “hardwires cutting-edge agentic AI into machine-speed defence,” the message was clear: for critical infrastructure, AI cannot be a cloud add-on controlled by third parties. The response now comes from e2e-assure with the launch of Cumulo, a UK-developed and UK-hosted SOC-as-a-service platform that shifts AI’s centre of gravity inside the customer’s own perimeter.

On-prem LLMs, real sovereignty

The technical core of Cumulo is the use of dedicated large language models, trained on each organisation’s specific environment and run on customer-controlled infrastructure. Inference never leaves the client’s domain; no external cloud is involved, so the customer retains full sovereignty over security data and defensive operations. In sectors like energy, water, transport and telecoms, where a connectivity outage or loss of access to cloud services could cripple response capabilities, this architecture removes a critical weak link.

The platform does not merely drop models on-prem. It introduces a layered AI architecture: a local model layer for environment-specific detection, a security intelligence layer for large-scale threat correlation, and a frontier model layer for non-sensitive enrichment. Sensitive operational data stays fully segregated, while advanced AI capabilities are applied where there is no exposure risk.
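The three-layer split can be pictured as a routing decision made before any model is called. The sketch below is illustrative only: e2e-assure has not published Cumulo's routing logic, so the `Tier` names, the `Task` fields and the `route` function are all hypothetical, and the rule that sensitive data never leaves the local layer is an assumption drawn from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LOCAL = "local"          # customer-hosted model, environment-specific detection
    INTELLIGENCE = "intel"   # large-scale threat correlation
    FRONTIER = "frontier"    # frontier model, non-sensitive enrichment only


@dataclass(frozen=True)
class Task:
    name: str
    # Hypothetical flag, assumed to be set by an upstream data classifier.
    contains_operational_data: bool
    needs_correlation: bool = False


def route(task: Task) -> Tier:
    """Choose a tier; sensitive data is pinned to the local layer (assumption)."""
    if task.contains_operational_data:
        return Tier.LOCAL
    if task.needs_correlation:
        return Tier.INTELLIGENCE
    return Tier.FRONTIER


if __name__ == "__main__":
    for t in (
        Task("triage PLC log", contains_operational_data=True),
        Task("correlate public campaign data", False, needs_correlation=True),
        Task("summarise public CVE advisory", False),
    ):
        print(f"{t.name} -> {route(t).value}")
```

Checking sensitivity first means a task that is both sensitive and correlation-heavy still stays local, which is the conservative reading of "fully segregated".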

Digital twin and zero-day SOC

Cumulo couples its LLMs with a digital twin maintained through passive discovery across IT and OT systems. According to the company, this enables safe attack simulations, identification of vulnerabilities before they are exploited, and continuous stress testing of a replica of the live environment without operational impact. The platform also introduces the “zero-day SOC” concept: the latest threat intelligence is applied as detection rules as soon as it arrives, closing the gap between the discovery of a new indicator of compromise and its operational use.
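The idea of turning fresh intelligence straight into detection rules can be sketched in a few lines. This is a minimal illustration, not Cumulo's implementation: the `Indicator` type, the two indicator kinds and the event field names (`dst_ip`, `dns_query`) are assumptions made for the example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    kind: str   # "ip" or "domain" (hypothetical categories)
    value: str


def compile_rules(indicators):
    """Turn a batch of indicators into lookup sets usable as detection rules."""
    ips, domains = set(), set()
    for ioc in indicators:
        if ioc.kind == "ip":
            ips.add(ipaddress.ip_address(ioc.value))
        elif ioc.kind == "domain":
            # Normalise case and any trailing dot so lookups are consistent.
            domains.add(ioc.value.lower().rstrip("."))
    return ips, domains


def matches(event, rules):
    """Return True if an event touches any known-bad IP or domain."""
    ips, domains = rules
    dst_ip = event.get("dst_ip")
    if dst_ip and ipaddress.ip_address(dst_ip) in ips:
        return True
    host = (event.get("dns_query") or "").lower().rstrip(".")
    return bool(host) and host in domains


if __name__ == "__main__":
    # Addresses come from documentation ranges; the domain is a placeholder.
    feed = [Indicator("ip", "203.0.113.7"), Indicator("domain", "Bad.Example.")]
    rules = compile_rules(feed)
    print(matches({"dst_ip": "203.0.113.7"}, rules))
    print(matches({"dns_query": "bad.example"}, rules))
    print(matches({"dst_ip": "198.51.100.1"}, rules))
```

In this sketch the "gap" is simply the time between receiving a feed update and calling `compile_rules` again; a production system would also need expiry and provenance tracking for each indicator.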

To handle the growing volume of security data while limiting hallucinations and false positives, Cumulo employs multiple AI models that cross-check every alert from different perspectives. An anti-hallucination layer validates findings against threat intelligence and deterministic detection engines before results reach a human analyst – who remains firmly in the loop. Automation carries the load; people stay free for high-value judgement.
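One way to picture this cross-checking is as a vote among models followed by a deterministic lookup. The sketch below is an assumption-laden illustration rather than a description of Cumulo's internals: the `validate_alert` function, the `min_agree` threshold, the `indicator` field and the three outcome labels are all hypothetical.

```python
from typing import Callable, Iterable, Set

# A "model" here is any callable that returns True when it judges the
# alert malicious. Real models would return richer output.
Model = Callable[[dict], bool]


def validate_alert(
    alert: dict,
    models: Iterable[Model],
    known_bad: Set[str],
    min_agree: int = 2,
) -> str:
    """Cross-check an alert, then confirm it against a deterministic source."""
    votes = sum(1 for model in models if model(alert))
    if votes < min_agree:
        return "discard"    # models disagree: treat as likely noise
    if alert.get("indicator") in known_bad:
        return "escalate"   # corroborated by threat intelligence
    return "review"         # models agree but nothing deterministic confirms it


if __name__ == "__main__":
    # Toy stand-ins for independent models looking at different features.
    models = [
        lambda a: a.get("bytes_out", 0) > 1_000_000,
        lambda a: a.get("hour", 12) < 6,
        lambda a: a.get("new_destination", False),
    ]
    known_bad = {"203.0.113.7"}
    alert = {"bytes_out": 5_000_000, "hour": 3, "indicator": "203.0.113.7"}
    print(validate_alert(alert, models, known_bad))
```

Both non-discard outcomes still end with a human analyst; the deterministic check only changes how the finding is labelled when it arrives.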

Why it matters for on-prem AI evaluation

e2e-assure’s choice signals a broader trend: rising demand for sensitive AI workloads to run in-house, not just for compliance but for operational resilience. Architectures that rely on cloud APIs for inference introduce decision latency and access risk that regulated sectors are increasingly unwilling to accept. Cumulo, though delivered as a service, shows that local inference can be integrated into a commercial offering without sacrificing managed usability. The implications for teams assessing on-prem LLM deployment are practical: dedicated infrastructure is needed, with attention to VRAM, quantization and pipeline management. The company does not publicly disclose hardware or throughput metrics, but the architectural principle stands as a reference for anyone designing AI-augmented SOCs today.
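For teams sizing hardware, the link between quantization and VRAM is simple arithmetic: the memory for model weights is roughly the parameter count multiplied by the bits per weight, divided by eight. The sketch below shows that calculation. The 7-billion-parameter figure is purely illustrative (Cumulo's model sizes are not disclosed), and the `overhead` factor for KV cache and activations is a rough assumption that varies with context length and batch size.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30


def estimate_vram_gib(
    params_billion: float,
    bits_per_weight: int,
    overhead: float = 1.2,  # assumed headroom for KV cache and activations
) -> float:
    """Rough VRAM budget: weights plus a multiplicative overhead (assumption)."""
    return weight_memory_gib(params_billion, bits_per_weight) * overhead


if __name__ == "__main__":
    # Hypothetical 7B-parameter model at common precisions.
    for bits in (16, 8, 4):
        weights = weight_memory_gib(7, bits)
        budget = estimate_vram_gib(7, bits)
        print(f"{bits:>2}-bit: weights {weights:.2f} GiB, budget {budget:.2f} GiB")
```

Moving from 16-bit to 4-bit weights cuts weight memory by a factor of four, which is often the difference between needing several GPUs and fitting on one; the cost is some loss of model quality that has to be measured per workload.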

Outlook and open questions

Cumulo’s multi-tier pricing – Standard for behaviour-based threat hunting and reporting, Enterprise for digital twin and unified IT/OT monitoring – makes it adaptable to different maturity stages. Yet running LLMs on-prem raises questions around TCO, in-house skills and ongoing maintenance. e2e-assure’s managed service can partly absorb these, but they remain central for any organisation aiming to replicate the model itself. The direction is clear: defensive AI is moving closer to the data, and sovereign architectures are no longer a niche but an operational necessity.