OpenGPT-X research project publishes large AI language model – European alternative for business and science
Multilingual and open source, adaptable for real applications in companies and organisations
The AI language model ‘Teuken-7B’, developed as part of the OpenGPT-X research project, is now available for download on Hugging Face. The model comprises seven billion parameters and was trained from scratch on data in all 24 official languages of the EU. Researchers and companies can use the open source model, which is also licensed for commercial use, in their own artificial intelligence (AI) applications. With this release, the partners in the OpenGPT-X consortium project, which is funded by the German Federal Ministry for Economic Affairs and Climate Action (BMWK) and led by the Fraunhofer Institutes for Intelligent Analysis and Information Systems IAIS and for Integrated Circuits IIS, have made an important AI language model with a European focus freely available as open source.
“In the OpenGPT-X project, we have spent the past two years researching the basic technology for large AI foundation models and training such models with strong partners from research and industry. We are delighted that we are now able to make our ‘Teuken-7B’ model freely available worldwide and thus offer an alternative for science and business based on public research,” says Prof Dr Stefan Wrobel, Institute Director at Fraunhofer IAIS. “Our model has demonstrated its capabilities across a wide range of languages, and we hope that as many people as possible will adapt or further develop the model for their own work and applications. In this way, we want to make a contribution both within the scientific community and together with companies from different industries to address the growing demand for transparent and customisable generative artificial intelligence solutions.”
Advantages for companies and access to the model
Teuken-7B is currently one of the few AI language models that have been developed multilingually from the ground up. Around 50 per cent of its pre-training data is non-English, and it has been trained in all 24 official European languages. Its performance has proven stable and reliable across several languages. This offers added value particularly for international companies with multilingual communication requirements and multilingual product and service offerings. Because the model is provided as open source, users can also operate their own customised models in real applications, for example in the automotive sector, robotics, medicine or finance. This not only gives them much better control over the technology, but also allows sensitive data to remain within the company. Thanks to a newly developed ‘tokeniser’, the model can also be trained and operated more energy- and cost-efficiently. In free demo appointments, Fraunhofer scientists explain which applications can be realised with Teuken-7B and how. Teuken-7B is available for download in two versions: one for research purposes only, and one under the ‘Apache 2.0’ licence that companies can use for commercial purposes and integrate into their own AI applications.
Development with strong participation from NRW
In addition to the two Fraunhofer Institutes and the Jülich Research Centre, the AI Federal Association, TU Dresden, the German Research Centre for Artificial Intelligence (DFKI), IONOS, Aleph Alpha, ControlExpert and Westdeutscher Rundfunk (WDR) worked on OpenGPT-X as partners. Teuken-7B was trained on the JUWELS supercomputer at the Jülich Research Centre. Scientists from the Lamarr Institute for Machine Learning and Artificial Intelligence also contributed essential foundations; with their research, they are setting new standards in multilingual instruction tuning. The research project, launched at the beginning of 2022, is now nearing completion and will run until 31 March 2025.
Further information
Further information on Teuken-7B, including model cards, technical background information and benchmarks as well as a comprehensive FAQ, can be found on the Fraunhofer IAIS project website.