DRL-Based Transformer for Open Shop Scheduling Optimization

The Open Shop Scheduling Problem and its Challenges

The Open Shop Scheduling Problem (OSSP) is a significant computational challenge in many industrial and service settings. The number of candidate schedules grows exponentially with the number of jobs and machines, which quickly makes exact methods intractable. Traditional approaches, such as classical dispatching rules and metaheuristics, often need substantial fine-tuning to maintain solution quality on large-scale problems. This recurring tuning effort can be a considerable burden for companies seeking to implement efficient resource management.

Approaches that overcome these limitations are therefore valuable. The goal is to develop methods that provide robust, scalable solutions while reducing the need for manual intervention or recalibration for each new problem configuration.

An Innovative Approach with Transformers

In this context, a recent study explored a Deep Reinforcement Learning (DRL) method that uses a Transformer architecture to address the OSSP. The scheduling policy is an encoder-decoder with multi-head attention mechanisms, typical of Large Language Models (LLMs) but adapted here for optimization problems. The model was trained using only the processing-time matrix as input, on small Taillard benchmark instances (4x4, 5x5, 7x7, and 10x10).
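The study's decoding procedure is not described here. As a rough illustration only, attention-based constructive policies commonly keep schedules feasible by masking operations that are already scheduled before turning scores into probabilities. The sketch below assumes a hypothetical list of per-operation scores (standing in for decoder attention logits) and a boolean mask; neither name comes from the study.

```python
import math
from typing import List, Sequence


def masked_softmax(scores: Sequence[float], scheduled: Sequence[bool]) -> List[float]:
    """Softmax over unscheduled operations; scheduled ones get probability 0."""
    live = [s for s, done in zip(scores, scheduled) if not done]
    if not live:
        raise ValueError("all operations are already scheduled")
    peak = max(live)  # subtract the max for numerical stability
    exps = [0.0 if done else math.exp(s - peak)
            for s, done in zip(scores, scheduled)]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode_step(scores: Sequence[float], scheduled: Sequence[bool]) -> int:
    """Return the index of the most probable operation that is still unscheduled."""
    probs = masked_softmax(scores, scheduled)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical scores for three operations; operation 1 is already scheduled,
# so it is excluded even though it has the highest raw score.
scores = [1.0, 5.0, 2.0]
scheduled = [False, True, False]
next_op = greedy_decode_step(scores, scheduled)
```

Repeating this step until every (job, machine) pair has been chosen yields a complete operation order. During training, a DRL policy would typically sample from the probabilities instead of taking the maximum.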

Initial results showed that the model produces feasible schedules, with makespans (the total time to complete all jobs) typically 15-30% above the best-known values. This suggests that the model learns effective scheduling patterns even from a relatively limited training set.

Scalability and Comparison with Classical Heuristics

A key part of the research is the evaluation of the model's scalability. The Transformer-based scheduling policy was applied, without any retraining, to much larger randomly generated instances, ranging from 40x40 to 100x100. Its performance was then compared against classical dispatching heuristics, including SPT (Shortest Processing Time), LPT (Longest Processing Time), MWKR (Most Work Remaining), and EST (Earliest Start Time).
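To make the baseline rules concrete, here is a minimal sketch of a greedy open-shop list scheduler that supports the four rules named above. It is an illustrative implementation, not the code used in the study: the function name `dispatch`, the tie-breaking order, and the choice to append each operation at the end of its machine's current schedule (no gap filling) are all assumptions.

```python
from typing import List, Tuple

RULES = ("SPT", "LPT", "MWKR", "EST")


def dispatch(p: List[List[int]], rule: str = "EST") -> Tuple[int, List[Tuple[int, int, int]]]:
    """Greedy open-shop scheduler.

    p[j][m] is the processing time of job j on machine m.
    Returns (makespan, [(job, machine, start_time), ...]).
    """
    if rule not in RULES:
        raise ValueError(f"unknown rule {rule!r}; expected one of {RULES}")

    n_jobs, n_machines = len(p), len(p[0])
    job_free = [0] * n_jobs         # time at which each job is next available
    mach_free = [0] * n_machines    # time at which each machine is next available
    remaining = [sum(row) for row in p]  # unscheduled work per job (for MWKR)
    pending = {(j, m) for j in range(n_jobs) for m in range(n_machines)}
    schedule = []

    def priority(op):
        j, m = op
        start = max(job_free[j], mach_free[m])
        primary = {
            "SPT": p[j][m],         # shortest operation first
            "LPT": -p[j][m],        # longest operation first
            "MWKR": -remaining[j],  # job with the most unscheduled work first
            "EST": start,           # operation that can start earliest first
        }[rule]
        # Ties are broken by earliest start, then by job and machine index.
        return (primary, start, j, m)

    while pending:
        j, m = min(pending, key=priority)
        start = max(job_free[j], mach_free[m])
        end = start + p[j][m]
        job_free[j] = mach_free[m] = end  # job and machine are busy until `end`
        remaining[j] -= p[j][m]
        pending.remove((j, m))
        schedule.append((j, m, start))

    return max(mach_free), schedule


toy = [[3, 2], [1, 4]]
makespan, plan = dispatch(toy, "EST")
```

On this 2x2 toy instance the EST rule reaches a makespan of 6. Because every rule is a fixed priority function, none of them needs training, which is what makes them natural baselines for a learned policy.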

Across these large instances, the Transformer achieved average gaps of 12.89-15.12% relative to a standard lower bound. It remained competitive with EST, typically within a modest margin, and substantially outperformed SPT and LPT. These results suggest that a Transformer policy trained on small OSSP instances can generalize to substantially larger problems, offering a feature-light, learning-based alternative to classical dispatching rules.
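The text does not say which lower bound was used. A common choice for open shop, assumed here, is the larger of the heaviest job load and the heaviest machine load, since no schedule can finish before either one is fully processed. The sketch below computes that bound and the percentage gap; the function names are illustrative.

```python
from typing import List


def lower_bound(p: List[List[int]]) -> int:
    """Trivial open-shop bound: max over job loads (rows) and machine loads (columns)."""
    max_job_load = max(sum(row) for row in p)
    max_machine_load = max(sum(col) for col in zip(*p))
    return max(max_job_load, max_machine_load)


def gap_percent(makespan: float, reference: float) -> float:
    """Relative excess of a makespan over a reference value, in percent."""
    return 100.0 * (makespan - reference) / reference


# Toy example: job loads are 5 and 5, machine loads are 4 and 6, so the bound is 6.
toy = [[3, 2], [1, 4]]
bound = lower_bound(toy)

# A hypothetical makespan of 69 against a bound of 60 gives a 15% gap.
example_gap = gap_percent(69, 60)
```

A gap measured against a lower bound overstates the distance to the true optimum whenever the bound itself is not attainable, so such figures are best read as upper limits on the optimality gap.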

Implications for AI Deployments and Future Prospects

The ability of a model to generalize from limited training data to larger-scale problems is a critical factor for the efficiency and Total Cost of Ownership (TCO) of AI deployments. For organizations evaluating self-hosted or on-premise solutions for AI/LLM workloads, a model that requires less training data or less fine-tuning to adapt to new conditions can translate into lower computational resource consumption and reduced development times. This is particularly relevant in environments where data sovereignty and infrastructure control are priorities.

Although the study does not specify hardware requirements or deployment context, the intrinsic efficiency of a model capable of generalizing well can have a direct impact on the choice of hardware for inference and the sizing of the infrastructure. AI-RADAR provides analytical frameworks on /llm-onpremise to evaluate these trade-offs, helping companies make informed decisions about on-premise deployments. The evolution of these Transformer-based approaches could open new avenues for optimizing complex processes across various sectors, reducing reliance on manual calibrations and improving overall operational efficiency.