A benchmark for cardiometabolic health

Machine learning on structured data is pervasive in medicine, but existing benchmarks rarely reflect the realities of population health data: complex survey sampling, demographic oversampling, and the need for subgroup fairness. A research team has set out to fill this gap with the NHANES Accelerometry Cardiometabolic Benchmark, a dataset derived from the 2003-2006 cycles of NHANES. The benchmark includes 1,381 adults with hip-worn accelerometry data, fasting blood tests, dietary information, and anthropometric measurements.

Three approaches compared

The researchers tested three tabular learning methods (ridge regression, XGBoost, and the foundation model TabPFN v2) on the prediction of three key markers: glycated haemoglobin (HbA1c), fasting triglycerides, and C-reactive protein (CRP), using physical activity features and lifestyle covariates as inputs. TabPFN v2 achieved the best overall performance, with an R² of 0.156 for HbA1c and 0.383 for CRP, meaning it explained roughly 16% and 38% of the variance in those markers. Triglycerides, however, remained largely unpredictable, with a near-zero R². The work also introduces metrics for predictive uncertainty, which are crucial for clinical decisions.
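To make the R² figures above concrete, here is a minimal sketch of how a coefficient of determination can be computed, optionally weighted per participant and broken down by subgroup. It is not the benchmark's evaluation code: the function names, the use of survey weights in the metric, and the toy values are assumptions made for illustration only.

```python
def weighted_r2(y_true, y_pred, weights=None):
    """Coefficient of determination, optionally weighted per observation."""
    if weights is None:
        weights = [1.0] * len(y_true)
    total_weight = sum(weights)
    # Weighted mean of the observed values
    mean = sum(w * y for w, y in zip(weights, y_true)) / total_weight
    # Weighted residual and total sums of squares
    ss_res = sum(w * (y - p) ** 2 for w, y, p in zip(weights, y_true, y_pred))
    ss_tot = sum(w * (y - mean) ** 2 for w, y in zip(weights, y_true))
    return 1.0 - ss_res / ss_tot


def r2_by_group(y_true, y_pred, groups, weights=None):
    """Compute weighted_r2 separately for each subgroup label."""
    if weights is None:
        weights = [1.0] * len(y_true)
    scores = {}
    for label in set(groups):
        idx = [i for i, g in enumerate(groups) if g == label]
        scores[label] = weighted_r2(
            [y_true[i] for i in idx],
            [y_pred[i] for i in idx],
            [weights[i] for i in idx],
        )
    return scores


# Toy values, invented for illustration (not NHANES data)
observed = [5.0, 5.5, 6.0, 7.0, 5.2, 6.4]
predicted = [5.2, 5.4, 6.1, 6.5, 5.3, 6.0]
survey_weights = [1.0, 2.0, 1.0, 0.5, 1.5, 1.0]
subgroup = ["a", "a", "a", "b", "b", "b"]

print(weighted_r2(observed, predicted, survey_weights))
print(r2_by_group(observed, predicted, subgroup, survey_weights))
```

An R² of 0 corresponds to a model that does no better than always predicting the (weighted) mean, which is the situation reported for triglycerides.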

Why TabPFN v2 makes a difference

TabPFN v2 (Tabular Prior-data Fitted Network) is a transformer model pre-trained on synthetic data and designed for in-context learning on tabular data. Unlike XGBoost, it doesn't require gradient-based training or hyperparameter tuning on each new dataset: the labelled training rows are supplied as context, and predictions for new rows come from a forward pass through the pre-trained network. This makes it fast to apply when datasets are small, as in many clinical studies, and an interesting candidate for environments with limited computing power, such as on-premise deployments in healthcare facilities that can't or won't rely on the cloud.
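The workflow described above can be sketched with a scikit-learn style fit/predict interface. The `tabpfn` Python package exposes a `TabPFNRegressor` class with `fit` and `predict` methods; everything else here is an assumption for illustration. In particular, `NearestNeighbourMean` is a hypothetical stand-in that lets the sketch run without extra dependencies, and it is not how TabPFN computes its predictions.

```python
class NearestNeighbourMean:
    """Hypothetical stand-in model: 'fit' only stores the context rows."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No parameters are optimised here; the data is simply kept as context
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            # Squared Euclidean distance from the query row to every context row
            scored = sorted(
                (sum((a - b) ** 2 for a, b in zip(row, ref)), target)
                for ref, target in zip(self.X_, self.y_)
            )
            nearest = [target for _, target in scored[: self.k]]
            predictions.append(sum(nearest) / len(nearest))
        return predictions


def make_tabpfn_regressor():
    """Build a TabPFN regressor; assumes `pip install tabpfn` has been run."""
    from tabpfn import TabPFNRegressor  # imported lazily so the sketch runs without it
    return TabPFNRegressor()


def predict_in_context(model, X_context, y_context, X_query):
    """Supply labelled rows as context, then predict the query rows."""
    model.fit(X_context, y_context)
    return list(model.predict(X_query))


# Toy feature rows and targets, invented for illustration
X_context = [[0.0], [1.0], [2.0], [3.0]]
y_context = [0.0, 1.0, 2.0, 3.0]
X_query = [[0.1], [2.9]]

print(predict_in_context(NearestNeighbourMean(k=1), X_context, y_context, X_query))
# With the package installed, the same call should accept make_tabpfn_regressor()
```

Because both models share the same interface, swapping the stand-in for the real regressor changes a single argument, which is also how a comparison against ridge regression or XGBoost would typically be organised.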

Data sovereignty and local inference

For those in the medical sector, patient privacy is a non-negotiable constraint. Models like TabPFN v2 can run inference locally, without sharing data with external servers, which fits self-hosted architectures and makes it easier to comply with regulations like GDPR. Although the benchmark doesn't directly address hardware specs, models built for tabular data generally require far fewer resources than billion-parameter LLMs: a consumer GPU with a few gigabytes of VRAM may suffice for inference, making deployment in a hospital server closet feasible. The path to AI in medicine goes through lean and verifiable solutions, where quantization and local control become allies of security.
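The resource gap mentioned above can be illustrated with simple arithmetic on weight storage. The parameter counts below are hypothetical round numbers, not figures from the benchmark or from any specific model, and the estimate covers weights only (activations and context data need additional memory).

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Approximate memory needed to hold model weights, in gigabytes (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Hypothetical sizes chosen for illustration
small_tabular_model = 10_000_000      # ten million parameters
large_language_model = 7_000_000_000  # seven billion parameters

print(weight_memory_gb(small_tabular_model, 4))     # float32 weights
print(weight_memory_gb(large_language_model, 2))    # float16 weights
print(weight_memory_gb(large_language_model, 0.5))  # 4-bit quantized weights
```

Under these assumptions the small model's weights fit in a fraction of a gigabyte, while the large model needs quantization just to approach the memory of a consumer GPU.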

The bigger picture

This benchmark isn't just an academic exercise: it signals that foundation models are moving beyond natural language to tackle tabular data, the dominant data format in healthcare. For those evaluating on-premise stacks, the trade-off is clear: less interpretability than traditional models, but greater speed and respect for data sovereignty. At AI-RADAR we closely follow the evolution of tools that allow artificial intelligence to be brought where data resides, without compromise.