Modular Architecture for Artificial Lexicons: Control and Reproducibility Beyond LLMs
Generating coherent, structured artificial lexicons remains an open challenge. A new modular framework addresses the limitations of current generators, which often rely on opaque, non-reproducible LLM pipelines. The system samples phoneme inventories, generates word forms using interchangeable phonological grammars, and assigns meanings via a specific ontology. Results show that probabilistic grammars outperform deterministic baselines in phonotactic coherence and typological realism, and that the modular design offers greater control and transparency.
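
To make the three stages concrete, here is a minimal Python sketch of such a pipeline. It is an illustration under stated assumptions, not the framework's actual code: the phoneme pools, the concept list standing in for the ontology, the syllable-shape weights, and every function name (sample_inventory, fixed_template_grammar, probabilistic_grammar, build_lexicon) are hypothetical. The fixed-template grammar stands in for a deterministic baseline, and a seeded random generator stands in for the reproducibility the framework aims at.

```python
import random
from typing import Callable, Dict, List

# Hypothetical toy data: the source does not specify the phoneme pools
# or the contents of the ontology.
CONSONANTS = ["p", "t", "k", "m", "n", "s", "l", "r"]
VOWELS = ["a", "e", "i", "o", "u"]
CONCEPTS = ["water", "fire", "stone", "tree"]

Inventory = Dict[str, List[str]]
Grammar = Callable[[random.Random, Inventory], str]


def sample_inventory(rng: random.Random, n_cons: int = 5, n_vow: int = 3) -> Inventory:
    """Stage 1: sample a phoneme inventory from the toy pools."""
    return {"C": rng.sample(CONSONANTS, n_cons), "V": rng.sample(VOWELS, n_vow)}


def fill(template: str, rng: random.Random, inv: Inventory) -> str:
    """Replace each C/V slot in a template with a phoneme from the inventory."""
    return "".join(rng.choice(inv[slot]) for slot in template)


def fixed_template_grammar(rng: random.Random, inv: Inventory) -> str:
    """Baseline grammar (assumed): every word has the fixed shape CVCV."""
    return fill("CVCV", rng, inv)


def probabilistic_grammar(rng: random.Random, inv: Inventory) -> str:
    """Probabilistic grammar (assumed): syllable shapes drawn from made-up weights."""
    shapes, weights = ["CV", "CVC", "V"], [0.6, 0.3, 0.1]
    n_syllables = rng.choice([1, 2, 3])
    return "".join(
        fill(rng.choices(shapes, weights)[0], rng, inv) for _ in range(n_syllables)
    )


def build_lexicon(seed: int, grammar: Grammar) -> Dict[str, str]:
    """Run all three stages with one seeded generator so the output is repeatable."""
    rng = random.Random(seed)
    inv = sample_inventory(rng)
    lexicon: Dict[str, str] = {}
    used = set()
    for concept in CONCEPTS:  # Stage 3: one unique form per concept
        form = grammar(rng, inv)  # Stage 2: the grammar is interchangeable
        while form in used:
            form = grammar(rng, inv)
        used.add(form)
        lexicon[concept] = form
    return lexicon
```

Calling build_lexicon(42, probabilistic_grammar) twice returns the same mapping, and passing fixed_template_grammar instead changes only the word-form stage. That separation is the kind of control a modular design is meant to provide.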