## Introduction

A new benchmark has been launched to test the spatial reasoning capabilities of large language models. GamiBench focuses on spatial reasoning and 2D-3D planning. Its goal is to evaluate how well large language models can understand and manipulate objects across multiple views.

## How GamiBench works

GamiBench includes 186 2D crease patterns paired with their corresponding 3D folded shapes. Its tasks include predicting 3D fold configurations, distinguishing valid viewpoints, and detecting impossible patterns. The benchmark takes a distinctive approach that combines perception and instruction-following to evaluate the spatial reasoning of large language models.

## Impact and applications

GamiBench measures where models succeed and where they fail at spatial reasoning and 2D-3D planning. That makes it useful for tracking and guiding improvements to large language models in this area. These skills matter in applications that depend on spatial understanding, such as computer-aided design, engineering, and robotics.

## Dataset and code

The dataset and code are available on GitHub (https://github.com/stvngo/GamiBench).
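GamiBench covers three kinds of objectives: fold prediction, viewpoint validity, and impossible-pattern detection. A common way to report results on a benchmark like this is a separate accuracy figure for each task type. The minimal sketch below shows that kind of tally. The record format and the task labels (`fold_prediction`, `viewpoint_validity`, `impossible_detection`) are hypothetical and are not taken from the GamiBench repository. Consult the repository for the actual data format and evaluation scripts.

```python
from collections import defaultdict


def per_task_accuracy(records):
    """Return accuracy per task from (task, predicted, expected) tuples.

    The tuple layout and task names are assumptions for illustration,
    not GamiBench's actual schema.
    """
    correct = defaultdict(int)  # number of correct answers per task
    total = defaultdict(int)    # number of questions seen per task
    for task, predicted, expected in records:
        total[task] += 1
        if predicted == expected:
            correct[task] += 1
    # Only tasks that appear in `records` are reported.
    return {task: correct[task] / total[task] for task in total}


# Made-up model answers using the hypothetical task labels (illustrative only).
sample = [
    ("fold_prediction", "B", "B"),
    ("fold_prediction", "A", "C"),
    ("viewpoint_validity", True, True),
    ("impossible_detection", "impossible", "possible"),
]
print(per_task_accuracy(sample))
```

Reporting each task separately, rather than as a single pooled score, makes it easier to see whether a model struggles with one skill in particular, such as recognizing impossible patterns.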