NIH unveils the world’s largest genomics-and-health database amid funding cuts

The U.S. government has just handed researchers the most detailed map of human health ever assembled. The database, built by the National Institutes of Health’s All of Us programme, pairs over 500,000 whole genomes with electronic health records. It is a trove that could reshape personalised medicine, but it arrives at a moment of deep financial uncertainty for the initiative.

Inside the database

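The scale is staggering. Each genome is sequenced with modern high-throughput technology and linked to longitudinal phenotypic data: diagnoses, treatments, and self-reported lifestyle information. The result is not a mere collection of sequences but an ecosystem of real-world health data drawn from an ethnically diverse population. That diversity has historically been missing from traditional biobanks, whose participants have been predominantly of European ancestry. Size matters most for rare variants. A variant carried by only a handful of people in a small cohort becomes statistically testable when hundreds of thousands of genomes are available, which opens the door to association studies and predictive analyses that were previously impractical.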

Budget cuts hang over the programme

The release is bittersweet. All of Us, designed to enrol one million participants, faces mounting financial pressure. The current administration’s budget proposals call for deep cuts, threatening not only further enrolment but even the upkeep of the existing database. For researchers and institutions, this means uncertainty about the long-term availability of an irreplaceable resource.

Data sovereignty and on-premises infrastructure: an unavoidable tension

The arrival of a genomic archive of this magnitude reignites the debate over health data sovereignty. Genetic information is inherently personal, cannot be fully anonymised, and is subject to stringent regulations such as the GDPR in Europe and HIPAA in the United States. Any large-scale analysis, especially one driven by machine learning or large language models (LLMs), must comply with strict requirements on data residency and control.

For organisations looking to leverage such archives, on-premises deployment often becomes the preferred option, because uploading genomes to public clouds introduces compliance risks and governance costs that are hard to sustain. Anyone running AI models on biomedical data faces a classic trade-off: the elastic computing power of the cloud versus the security and cost predictability of self-hosted infrastructure. Moreover, training models on datasets of this size demands GPU memory (VRAM) and compute that typically require dedicated clusters to avoid bottlenecks.

Implications for medical research and AI

Beyond the funding controversy, the database is a catalyst for predictive medicine. Researchers in oncology, cardiology, and pharmacogenomics will have an unprecedented reference against which to validate hypotheses. Machine learning, provided the infrastructural complexity is managed, promises to accelerate the discovery of biomarkers and therapeutic targets. Yet without a robust, compliant computing environment, this asset risks remaining underused.

AI-RADAR will track the evolution of the All of Us programme, because the tension between scientific ambition, economic sustainability, and data control is exactly the terrain where deployment choices for the most sensitive AI workloads are made.