TL;DR
Italy’s Minerva project trained an open LLM family from scratch with a heavy share of Italian content, and publicly released its models and data. Despite that openness and its institutional backing, the 3B-parameter model scored just 4.9% on the INVALSI Italian school-exam benchmark, raising questions about whether native-language training at this scale is enough for complex language understanding.
Minerva was developed by Sapienza University of Rome, funded through Italy’s national AI strategy, and trained on 2.5 trillion tokens, approximately 50% of them Italian. The project aimed to create a truly open, from-scratch LLM that emphasizes native-language expertise, in contrast to other European initiatives such as Portugal’s AMÁLIA, which used continuation pre-training of a multilingual model.
Despite its extensive training data and institutional backing, Minerva’s 3B model performed near chance on the INVALSI Italian exam, a significant benchmark for academic language skills. Researchers noted that while dataset composition and size are important, the number of parameters and overall scale are more critical for handling complex language tasks, suggesting current investments may be insufficient for achieving deep country-specific knowledge at these scales.
Minerva. The opposite path.
Italy spent years building a European sovereign LLM from scratch. Then Minerva-3B scored 4.9% on the INVALSI Italian school exam.
Where AMÁLIA layered Portuguese specialization onto a multilingual foundation, Minerva trained from scratch on 2.5 trillion tokens with approximately 50% Italian content. Where AMÁLIA’s weights are not yet public, Minerva published weights, training data, and code as truly open from day one. By every institutional measure, the Italian approach worked. But the empirical results contain a finding the press coverage has been quiet about, and its implications extend well beyond Italy.
Same problem. Opposite path.
European sovereign-LLM development has two primary architectural approaches. Italy chose from scratch with substantial native-language foundation. Portugal chose continuation pre-training of a multilingual model. The structural comparison surfaces what each commitment actually requires operationally.
The comparison is not “Italy did it better than Portugal.” Both projects respond to the same structural problem with different architectural strategies under different institutional and economic constraints. Italy’s national-AI investment is structurally larger by an order of magnitude — and Minerva is the visible artifact of that scale.
As an affiliate, we earn on qualifying purchases.
4.9% on INVALSI. The bitter lesson surfaces.
In June 2024, researchers evaluated Minerva-3B on the INVALSI Italian school-exam benchmark. The result was unambiguous: 4.9%. This is not a critique of Minerva; it is a critique of the public discourse around what Minerva’s empirical results actually demonstrate.
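How a score like 4.9% should be read depends on the benchmark's item format, which the article does not specify. The sketch below is only an illustration of the arithmetic: it computes the accuracy expected from blind guessing for a given mix of items. The item mixes are hypothetical and are not the real INVALSI composition, and the assumption that open-ended items score zero under guessing is mine, not the researchers'.

```python
from fractions import Fraction


def chance_accuracy(option_counts):
    """Expected accuracy of uniform random guessing.

    option_counts holds one entry per item: the number of answer options
    for a multiple-choice item, or None for an open-ended item, where
    blind guessing is assumed (for illustration) to score zero.
    """
    if not option_counts:
        raise ValueError("need at least one item")
    total = Fraction(0)
    for k in option_counts:
        if k is None:
            continue  # open-ended item: assumed chance contribution of 0
        if k < 1:
            raise ValueError("option count must be positive")
        total += Fraction(1, k)
    return float(total / len(option_counts))


def accuracy(predictions, answers):
    """Fraction of items where the prediction exactly matches the answer key."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("predictions and answers must be non-empty and aligned")
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


# Hypothetical item mixes, NOT the real INVALSI composition.
all_four_option = chance_accuracy([4] * 20)
half_open_ended = chance_accuracy([4] * 10 + [None] * 10)
reported = 0.049  # the Minerva-3B score quoted in the article

print(f"chance, 20 four-option items: {all_four_option:.3f}")
print(f"chance, half open-ended:      {half_open_ended:.3f}")
print(f"reported score:               {reported:.3f}")
```

The point of the comparison is that the guessing baseline moves with the item mix, so a raw percentage is easier to interpret when it is reported next to the baseline for the same set of items.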
350M to 7B. Four parameter scales, one architecture.
The Minerva model family covers four parameter tiers, each with specific training corpora. Each scale level reveals what the from-scratch path actually requires at different operating points.
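[Table not recovered: Minerva model family by parameter scale and training corpus. Surviving cell fragments: Italian + English; 100B English; ~50% English; + 200B code.]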
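The headline corpus figures quoted in this article (2.5 trillion tokens, approximately 50% Italian) can be turned into absolute token counts with simple arithmetic. The sketch below does only that. The helper name `split_tokens` is hypothetical, the 50% share is the article's approximate figure treated as exact, and the article does not break down the remaining half, so it is lumped together as "other".

```python
from fractions import Fraction


def split_tokens(total_tokens, shares):
    """Split a token budget by named share; the unallocated rest goes to 'other'.

    shares maps a label to a fraction of the budget (Fraction, int, or float).
    """
    allocated = Fraction(0)
    counts = {}
    for name, share in shares.items():
        share = Fraction(share)
        if share < 0:
            raise ValueError("shares must be non-negative")
        allocated += share
        counts[name] = int(total_tokens * share)  # truncate to whole tokens
    if allocated > 1:
        raise ValueError("shares sum to more than the whole budget")
    counts["other"] = total_tokens - sum(counts.values())
    return counts


TOTAL_TOKENS = 2_500_000_000_000  # 2.5 trillion, as quoted in the article
budget = split_tokens(TOTAL_TOKENS, {"italian": Fraction(1, 2)})  # ~50%, taken as exact

for label, count in budget.items():
    print(f"{label:>8}: {count:,} tokens")
```

Under those assumptions the Italian portion comes to about 1.25 trillion tokens, which is the quantity the "native-language foundation" argument rests on.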
Three answers. Same question.
Minerva, AMÁLIA, and OpenEuroLLM represent the three operational answers to the European sovereign-LLM question. Each makes different architectural and institutional bets. The strategic discourse benefits from treating all three as data points in the same empirical experiment.

Three standards the movement should adopt.
The structural critique generalizes beyond Minerva. The European sovereign-LLM movement benefits from internalizing these lessons across every subsequent national project. Italy modeled the openness standard; the movement should adopt it as the norm.
Minerva is one valid answer to the European sovereign-LLM question, AMÁLIA is another, and OpenEuroLLM is potentially a third. They are best read as data points in one shared empirical experiment rather than as competing national-prestige projects. More analysis like this is needed, not less.
Implications of Scale in European Sovereign LLMs
The results from Minerva challenge the assumption that native-language training at scale automatically produces deep language understanding and knowledge. They suggest that even substantial investments in data and parameters may not suffice without further scaling or methodological innovations. This has broad implications for European AI sovereignty efforts, emphasizing the need for realistic assessments of resource commitments versus desired outcomes.
European Sovereign-LLM Strategies and Challenges
European countries have pursued sovereign LLM projects to reduce reliance on commercial models and develop tailored AI solutions. Italy’s Minerva exemplifies a long-term, large-scale effort, trained on 2.5 trillion tokens with significant institutional support. Portugal’s AMÁLIA took a different approach: continuation pre-training of a multilingual model with less native-language data. The debate centers on whether scale or specialized training yields better results for country-specific language understanding.
Unresolved Questions About Model Capabilities
It remains unclear whether further scaling, different training methodologies, or additional fine-tuning could improve Minerva’s performance on complex language tasks. The ongoing nature of the project means that results may evolve as researchers iterate on their approach.
Next Steps for Minerva and European Sovereign AI
Researchers plan to continue refining Minerva, experimenting with increased scale and alternative training strategies. The project aims to evaluate whether different approaches can bridge the gap between technical performance and real-world language understanding, with broader implications for European AI sovereignty policies.
Key Questions
Why did Minerva perform so poorly on Italian exams despite large-scale training?
Researchers point to scale: dataset composition and size matter, but parameter count and overall scale matter more for complex, academic content. On that reading, the current scale of data and parameters may still be insufficient for deep language understanding, and more targeted or larger investments might be necessary.
How does Minerva’s approach differ from other European projects like AMÁLIA?
Minerva was trained from scratch on a corpus that is approximately 50% Italian, with open weights and data, whereas AMÁLIA used continuation pre-training on a multilingual foundation with a smaller proportion of native-language (Portuguese) data.
What are the broader implications of these findings for European AI sovereignty?
The results suggest that achieving country-specific language expertise through large-scale models may require more resources than currently allocated, prompting a reassessment of strategies and investments.
Is the Minerva project still ongoing?
Yes, researchers are continuing to iterate on the models and methodologies, with future evaluations expected to shed more light on scaling effects and performance improvements.
Source: ThorstenMeyerAI.com