Ora Computing raises €3.5M to compress AI models by up to 80% and slash inference costs

The economic and technical pressures of large-scale inference have drawn a new contender. Ora Computing, a startup developing compression algorithms for foundation models, has closed a €3.5 million seed round led by Constructor Capital and Greencode Ventures, with participation from XISTA Science Ventures. The capital will fund team expansion, the development of compression capabilities for the largest frontier models, and the launch of a commercial product aimed at cloud inference providers and organizations deploying AI at the edge.

Compression without hardware compromises

The core of the technology is software that, according to the company, can reduce a model's size by up to 80% and make it run up to four times faster, with accuracy drops between 0 and 5%. Ora Computing positions it as more than another quantization or pruning tool: the approach works across different hardware platforms, integrates directly with standard inference frameworks, and requires no custom software layers, infrastructure changes, or capital-intensive retraining. The algorithms continuously map the trade-off between model size and accuracy, enabling companies to optimize each deployment based on specific hardware, performance, and cost constraints.
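To make the size/accuracy trade-off concrete, here is a minimal Python sketch of how a deployment team might choose among compressed variants of one model. It is not Ora Computing's algorithm, which the article does not describe. The `Variant` type, the `pick_variant` helper, and every number in `CURVE` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Variant:
    """One point on a hypothetical size/accuracy trade-off curve."""
    name: str
    size_gb: float        # size of the (compressed) model in GB
    accuracy_drop: float  # accuracy loss vs. the original, in percentage points


def pick_variant(
    variants: Sequence[Variant],
    max_size_gb: float,
    max_accuracy_drop: float,
) -> Optional[Variant]:
    """Return the most accurate variant that fits both budgets, or None."""
    feasible = [
        v for v in variants
        if v.size_gb <= max_size_gb and v.accuracy_drop <= max_accuracy_drop
    ]
    if not feasible:
        return None
    # Prefer the smallest accuracy drop; break ties by smaller size.
    return min(feasible, key=lambda v: (v.accuracy_drop, v.size_gb))


# Illustrative numbers only (assumed, not measured): a 100 GB model
# and three compressed variants of it.
CURVE = [
    Variant("original", 100.0, 0.0),
    Variant("reduced-40pct", 60.0, 0.5),
    Variant("reduced-60pct", 40.0, 2.0),
    Variant("reduced-80pct", 20.0, 5.0),
]

if __name__ == "__main__":
    # Example constraint: a device with 48 GB of memory, tolerating a 3-point drop.
    choice = pick_variant(CURVE, max_size_gb=48.0, max_accuracy_drop=3.0)
    print(choice.name if choice else "no variant fits")
```

With these assumed numbers, a 48 GB memory budget and a 3-point accuracy tolerance select the 60%-reduced variant. Tightening the memory budget to 24 GB leaves no feasible option unless the accuracy tolerance is relaxed to 5 points.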

Why it matters for local deployment

In practice, the race toward ever-larger models collides with the difficulty of running them on vehicles, industrial machinery, or other edge hardware. Ora’s promise is to unlock widespread AI adoption on self-hosted and on-premise devices, where data sovereignty and latency matter most. The argument is that compact models optimized for specific tasks can become a practical alternative to massive, general-purpose, cloud-hosted LLMs. For those evaluating a shift from centralized APIs to local infrastructure, the trade-offs between energy cost, maintenance, and degree of control need to be analyzed in detail, and AI-RADAR provides analytical frameworks to assess the total cost of ownership (TCO) of an on-premise stack.

Environmental impact and concrete benchmarks

Ora Computing doesn’t just talk about computational costs: compression also reduces energy consumption and the CO₂ emissions tied to inference. The company estimates that even a 1% market penetration could translate into annual savings of over 50,000 tonnes of carbon dioxide. On the technical front, the company reports that a 70-billion-parameter model was compressed in a few hours at a compute cost of less than $1,000, compared with industry figures that often reach hundreds of thousands of dollars for similar tasks.
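For a sense of scale, the short Python calculation below estimates the weight footprint of a 70-billion-parameter model before and after compression. It rests on two assumptions the article does not state: that the original weights are stored in 16 bits (2 bytes per parameter), and that the headline "up to 80%" reduction applies in full to a model of this size. The function names are illustrative.

```python
def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def compressed_size_gb(original_gb: float, reduction: float) -> float:
    """Size remaining after removing a fraction `reduction` of the original size."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return original_gb * (1.0 - reduction)


if __name__ == "__main__":
    # Assumption: 16-bit weights, i.e. 2 bytes per parameter.
    original = weight_footprint_gb(params=70e9, bytes_per_param=2)
    # Assumption: the best-case 80% reduction applies to this model.
    compressed = compressed_size_gb(original, reduction=0.80)
    print(f"original:   {original:.1f} GB")
    print(f"compressed: {compressed:.1f} GB")
```

Under those assumptions the weights shrink from roughly 140 GB to roughly 28 GB. The estimate covers weights only and ignores activations, caches, and runtime overhead.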

The market gears up for a paradigm shift

CEO and co-founder Stefan Sack stressed that the next wave of AI adoption will be driven by highly efficient, specialized models rather than ever-larger ones. Ora Computing’s positioning straddles two converging trends: cloud inference providers looking to slash operational costs, and organizations moving workloads to local or edge infrastructure. With the new funding, the startup is gearing up to bring its software to commercial scale, aiming to turn what is now a promise into a product: making AI efficient, affordable, and truly distributed.