A post on the LocalLLaMA subreddit raises a key question for anyone considering running large language models (LLMs) locally: which hardware to choose.

On-Premise LLM Hardware Considerations

The user asks other community members about their experiences with specific hardware configurations, particularly model loading speeds and whether to run a single large model or several smaller ones. This kind of evaluation is critical for determining the Total Cost of Ownership (TCO) of an on-premise solution, since hardware represents a significant share of the initial capital expenditure (CapEx).
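To make the loading-speed question concrete, here is a minimal back-of-the-envelope sketch in Python. All figures (model file sizes, storage bandwidth) are illustrative assumptions, not measurements; real load times depend on storage hardware, the inference runtime, and whether weights are memory-mapped.

```python
# Rough lower bound on model load time: weights must be read from storage
# into (V)RAM, so load time is roughly file size / storage bandwidth.
# All numbers below are illustrative assumptions, not benchmarks.

NVME_BANDWIDTH_GBPS = 5.0  # assumed sequential read speed of an NVMe SSD, in GB/s

def load_time_seconds(model_size_gb: float,
                      bandwidth_gbps: float = NVME_BANDWIDTH_GBPS) -> float:
    """Lower-bound load time: file size divided by sequential read bandwidth."""
    return model_size_gb / bandwidth_gbps

# One large model vs. several smaller ones (sizes assume ~4-bit quantization).
single_70b = load_time_seconds(40.0)    # ~40 GB file for a 70B model
three_7b = 3 * load_time_seconds(4.0)   # three ~4 GB files for 7B models

print(f"70B model:     ~{single_70b:.0f} s to load")
print(f"3 x 7B models: ~{three_7b:.1f} s to load (loaded sequentially)")
```

The practical implication is that load time matters most at startup or when models are swapped: if several small models must be paged in and out of VRAM per request, the repeated swap cost can outweigh the smaller memory footprint of each individual model.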

For teams evaluating on-premise deployments, these trade-offs deserve structured analysis. AI-RADAR offers analytical frameworks on /llm-onpremise for exactly this purpose.
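As an illustration of the kind of analysis involved (a generic sketch, not AI-RADAR's actual framework), a simple TCO model might amortize hardware CapEx over its useful life and add recurring operating costs such as power. Every figure below is a placeholder assumption:

```python
# Toy TCO estimate: on-premise hardware CapEx plus electricity over N years.
# All prices and power figures are placeholder assumptions.

def onprem_tco(capex_usd: float, years: float, watts: float,
               usd_per_kwh: float = 0.20, utilization: float = 1.0) -> float:
    """Total cost over `years`: hardware purchase plus energy at the given utilization."""
    hours = years * 365 * 24 * utilization
    energy_cost_usd = (watts / 1000) * hours * usd_per_kwh
    return capex_usd + energy_cost_usd

# Example: a ~$10,000 workstation drawing 700 W, run continuously for 3 years.
cost = onprem_tco(capex_usd=10_000, years=3, watts=700)
print(f"3-year on-prem TCO: ~${cost:,.0f}")
```

A fuller comparison would also weigh OpEx items like maintenance and cooling against the per-token pricing of hosted APIs, which is where the CapEx-heavy profile of on-premise hardware either pays off or does not.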