Microsoft and the Frontiers of Networked Systems at NSDI '26

The USENIX Symposium on Networked Systems Design and Implementation (NSDI '26) serves as a leading forum for researchers and practitioners to share advancements in the design and operation of distributed systems. These systems form the foundation of cloud computing, artificial intelligence, and large-scale distributed applications and services. Microsoft participated in the 2026 edition as a sponsor, reaffirming its ongoing commitment to systems and networking research, as well as engagement with the broader technical community.

Microsoft researchers and their collaborators had 11 papers accepted at the conference. The research spans datacenter and wide-area networks, AI systems, and cloud infrastructure. Together, the contributions cover a wide range of techniques for building and managing complex, high-performance networked systems.

Optimizations for Large Language Models and Video Analytics

Among the most relevant presentations for the AI sector, DroidSpeak introduces a mechanism for sharing and partially reusing KV caches across LLM variants with the same architecture. This innovation promises up to four times higher throughput and faster responses, with minimal impact on output quality. For AI infrastructure operators, this translates into significant potential for optimizing resource utilization and reducing latency in model inference.
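To make the resource-utilization point concrete, the following is a minimal capacity-planning sketch, not anything taken from the DroidSpeak paper. It assumes that the reported throughput gain applies uniformly to every serving replica, and it treats the "up to four times" figure as a best case. The function name `replicas_needed` and the request rates are hypothetical values chosen for illustration.

```python
import math


def replicas_needed(target_rps: float, per_replica_rps: float, speedup: float = 1.0) -> int:
    """Smallest number of replicas whose combined throughput covers target_rps.

    speedup is an assumed multiplicative throughput gain per replica
    (1.0 = no gain, 4.0 = the best-case figure quoted in the article).
    """
    if target_rps < 0 or per_replica_rps <= 0 or speedup <= 0:
        raise ValueError("rates and speedup must be positive")
    return math.ceil(target_rps / (per_replica_rps * speedup))


# Hypothetical workload: 1000 requests/s, 25 requests/s per replica today.
baseline = replicas_needed(1000, 25)            # no throughput gain
best_case = replicas_needed(1000, 25, 4.0)      # assumed best-case 4x gain
print(baseline, best_case)
```

Under these assumed numbers, the replica count falls from 40 to 10. Real savings depend on how much of the workload can actually share KV caches, so the best-case figure should be read as an upper bound.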

Another notable project is Eywa, which uses LLMs to automate model-based testing by building protocol models from natural language sources. The approach uncovered 33 bugs, 16 of which were previously unknown, in widely used network protocol implementations. On the video analytics front, AVA proposes a system that combines event knowledge graphs with agentic retrieval over Vision Language Models to support open-ended video analytics. The authors also introduced AVA-100, a benchmark for ultra-long scenarios, on which AVA achieved 75.8% accuracy.

Infrastructure Innovations for Efficiency and Scalability

Microsoft's research also covers key aspects of hardware and network infrastructure. Octopus, for example, presents a switch-free design for disaggregated CXL memory pods that aims to reduce costs and scale up to multi-rack configurations. On a three-server hardware prototype, Octopus RPCs were 3.2 times faster than in-rack RDMA and 2.4 times faster than CXL switches. This solution is particularly interesting for those looking to optimize total cost of ownership (TCO) and memory density in on-premise environments.

Other contributions include HEDGE, which addresses wavelength-specific faults in optical networks, and Pyrocumulus, which enables fast, low-overhead live migration for storage-optimized VMs by leveraging FPGA SmartNICs. ForestColl focuses on throughput-optimal collective communications on heterogeneous network fabrics, supporting both switching fabrics and direct accelerator connections. Finally, SONiC DASH SmartSwitch, a Community Award winner, redesigns cloud network offloading around a unified architecture and an open development model, and is already deployed in Azure for high-throughput, energy-efficient operation.

Implications for On-Premise Deployments and Data Sovereignty

The innovations presented at NSDI '26 offer significant insights for organizations considering or managing on-premise or hybrid AI deployments. Solutions like DroidSpeak, which optimizes LLM throughput, or Octopus, which improves disaggregated memory efficiency, can have a direct impact on the TCO and performance of self-hosted infrastructures. The ability to achieve higher throughput with fewer resources or reduce network hardware costs is fundamental for those seeking to maintain control over their data and infrastructure.

Two further projects address security and operational efficiency in contexts where data sovereignty and compliance are priorities. HarvestContainers reclaims unused CPU resources in containerized systems while keeping latency within acceptable limits, and KRAKENGUARD provides fine-grained eBPF isolation for multi-tenant environments. For those evaluating self-hosted alternatives to the cloud, these advancements show how research continues to provide tools for building robust, performant, and controllable AI infrastructures that balance performance, cost, and security needs.