Chisao: The GPU-Native Optimizer That Finds All Peaks with Up to 34x Speedup

The Weak Spot of Traditional Optimizers

Finding all minima (or maxima) of a multimodal function is a challenge that cuts across optimization, Bayesian inference, and scientific computing. Classic methods such as basin-hopping, CMA-ES, and restarted gradient descent typically run on the CPU, largely sequentially, and do not exploit the massive parallelism of modern GPUs. As dimensionality rises and the function has dozens of peaks, the chance of missing a mode grows sharply and run times become prohibitive.

How Chisao Uses the GPU with a Controlled Oscillation

The research team built Chisao (Convergence-Halt-Invert-Stick-And-Oscillate), a population optimizer that runs the entire sample batch simultaneously on the GPU. The core idea is a convergence-anticonvergence oscillation cycle: samples that reach a true peak are frozen and preserved, while the rest keep exploring the space via momentum-based anti-convergence and stochastically smoothed gradients. The design is deliberately asymmetric: samples that land on a real peak stick there, while the rest stay on the move.

To maintain population diversity, Chisao employs two adaptive reseeding strategies, “Repulse Monkey” and “Golden Rooster”, which prevent the population from collapsing onto a single mode. Crucially, the algorithm does not require an analytic gradient: it estimates gradients with finite differences, so it only needs to evaluate the objective and is agnostic to its analytical form.
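To make the derivative-free point concrete, here is a minimal sketch of a batched central finite-difference gradient. The function name `fd_gradient` and the step size `h` are assumptions for illustration; the article does not say which finite-difference scheme or step size Chisao uses.

```python
import numpy as np

def fd_gradient(f, x, h=1e-5):
    """Central-difference gradient of f for a batch x of shape (n, d).

    f must map an (n, d) array to an (n,) array of objective values.
    """
    n, d = x.shape
    grad = np.empty((n, d))
    for i in range(d):
        step = np.zeros(d)
        step[i] = h
        # Two objective evaluations per dimension, each over the whole batch.
        grad[:, i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

def objective(x):
    # Toy objective (assumed) whose analytic gradient, -sin(x), is known.
    return np.cos(x).sum(axis=1)

rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, size=(8, 4))
max_err = np.abs(fd_gradient(objective, x) - (-np.sin(x))).max()
print("largest deviation from the analytic gradient:", max_err)
```

The cost is 2d batched objective evaluations per gradient, which is the kind of workload that parallel hardware handles well, since each evaluation covers every sample at once.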

Clear Results on the SFU Suite and High Noise

The real test comes from the Simon Fraser University optimization benchmark suite: 42 functions with dimensions from 2 to 64. Chisao achieves 100% mode recovery, while all CPU baselines fail on the hardest multimodal functions from 8 dimensions upward. Where competitors do keep up, the performance gap is stark: up to 34× speedup over basin-hopping on the Michalewicz function at 64 dimensions, and up to 39× on unimodal functions like Rotated Hyper-Ellipsoid, where the gain comes purely from GPU parallelism.
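For readers unfamiliar with the benchmark, the Michalewicz function is a standard test problem with steep, narrow valleys separated by nearly flat regions, and d! local minima in d dimensions. The sketch below is a batched NumPy version of its textbook definition with the usual steepness parameter m = 10; it is independent of Chisao's code, and the sample points are chosen only for illustration. In 2D, the documented global minimum is about -1.8013 near (2.20, 1.57).

```python
import numpy as np

def michalewicz(x, m=10):
    """Michalewicz function for a batch x of shape (n, d), usually on [0, pi]^d."""
    x = np.atleast_2d(x)
    i = np.arange(1, x.shape[1] + 1)  # 1-based coordinate index
    return -np.sum(np.sin(x) * np.sin(i * x**2 / np.pi) ** (2 * m), axis=1)

# One point close to the documented 2D minimum, one on the flat plateau.
batch = np.array([[2.20, 1.57], [1.0, 1.0]])
values = michalewicz(batch)
print(values)
```

The second point sits on a plateau where both the value and the gradient are nearly zero, which is why purely local methods tend to stall on this function unless they start close to a valley.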

Robustness to noise is another standout feature: even with a likelihood noise standard deviation of up to 1.0, mode detection remains 100% reliable. That is no minor point for real-world data, where noise is the rule rather than the exception.

Why It Matters for On-Premise Computing

Black-box optimization is everywhere: in hyperparameter tuning of neural networks, in training physics-based models, and in simulation calibration. In regulated environments, or wherever data sovereignty is non-negotiable (GDPR, healthcare, defense), computation must stay on proprietary infrastructure. Chisao, distributed as an open-source Python package on PyPI, can run on any enterprise GPU node without external cloud dependencies. Its derivative-free nature makes it usable even when an analytic gradient is unavailable or too costly to compute.

Beyond raw acceleration, the ability to explore the entire mode landscape in parallel on a GPU transforms optimization from serial to massively concurrent, slashing time-to-insight for problems that would otherwise require hours or days of distributed CPU work. From a TCO standpoint, squeezing maximum performance from an existing on-premise GPU can lower operational costs compared to renting cloud resources.

Because the package installs from PyPI, it can be integrated directly into on-prem MLOps pipelines, and developers can download it and test it on their own workloads.