Efficient Tabular Classification with LLMs

A recent study published on arXiv investigates using existing large language models (LLMs) to classify tabular data found on the web, with the goal of avoiding the development of specialized models or costly retraining.

The proposed approach, called TaRL (Table Representation with Language Model), leverages semantic embeddings of individual table rows. Initially, applying these embeddings directly proved less effective than dedicated tabular models. However, the researchers found that removing the common component from the embeddings and calibrating the softmax temperature unlocks their potential.
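To make the two ingredients concrete, here is a minimal sketch of the general idea: subtract the shared mean direction from row embeddings, then score rows against class prototypes with a temperature-scaled softmax. The function name, the use of cosine similarity, and the prototype representation are assumptions for illustration, not the paper's actual code.

```python
import numpy as np

def centered_softmax_probs(row_embs, class_protos, temperature):
    """Remove the common (mean) component from row embeddings, then
    score rows against class prototypes with a temperature-scaled
    softmax. Illustrative sketch, not the published TaRL implementation."""
    # Subtract the shared mean direction (the "common component").
    centered = row_embs - row_embs.mean(axis=0, keepdims=True)
    protos = class_protos - class_protos.mean(axis=0, keepdims=True)
    # Cosine similarities between centered rows and prototypes
    # (similarity metric assumed here, not stated in the summary).
    centered = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    logits = centered @ protos.T
    # Temperature calibration: lower T sharpens, higher T flattens.
    scaled = logits / temperature
    scaled -= scaled.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)
```

The centering step matters because LLM embeddings tend to share a dominant direction that drowns out class-discriminative signal; removing it spreads the similarities apart before the softmax is applied.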

A meta-learner trained on handcrafted features predicts an appropriate temperature. This method achieves performance comparable to the state of the art in low-data regimes (k ≤ 32) for semantically rich tables. The results demonstrate the viability of reusing existing LLM infrastructure for web table understanding.
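The meta-learning step could be sketched as a simple regressor from table-level features to a temperature. The ridge-regression form and the specific features are assumptions for illustration; the summary does not specify which handcrafted features or which meta-learner the authors use.

```python
import numpy as np

def fit_temperature_metalearner(features, temps, l2=1e-2):
    """Fit a ridge regression mapping handcrafted table features
    (e.g. class count, average pairwise row similarity -- hypothetical
    choices) to a calibrated softmax temperature."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # add bias
    # Closed-form ridge solution: (X'X + l2*I)^-1 X'y.
    w = np.linalg.solve(X.T @ X + l2 * np.eye(X.shape[1]), X.T @ temps)
    return w

def predict_temperature(w, feats):
    """Predict a temperature for a new table's feature vector."""
    x = np.append(feats, 1.0)
    # Clamp to a positive range so the softmax stays well defined.
    return float(np.clip(x @ w, 0.05, 10.0))
```

A learned temperature lets each table get its own calibration without any per-table labeled tuning data, which is what makes the approach usable in the k ≤ 32 regime.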