TL;DR
Portugal’s national LLM, AMÁLIA, is now operational and outperforms earlier models on European Portuguese benchmarks. However, experts question its openness, the sufficiency of its native data, and its optimization goals, raising broader concerns about sovereign-LLM strategies across Europe.
Portugal’s €5.5 million AMÁLIA language model is now operational, outperforming previous models on European Portuguese benchmarks, but it faces significant scrutiny over its openness, native data adequacy, and strategic objectives, raising questions relevant to national AI policy.
AMÁLIA, developed by a consortium of 60 researchers across Portugal’s top research institutions, was officially launched in October 2025 and is available to 450,000 academic users. It is built as a continuation of the EuroLLM multilingual foundation, with the base version completed in September 2025. The model performs strongly on Portuguese benchmarks, outperforming models such as Qwen 3-8B on most tasks, though it still trails on some benchmarks, such as ALBA.
Despite technical success, experts like Duarte O.Carmo have publicly raised three critical questions: How open is ‘fully open’ in practice? How much native Portuguese data is enough for effective modeling? And what should be the primary goal—maximizing native language performance or broader multilingual capabilities? These questions are central to evaluating the strategic direction of Portugal’s AI efforts and the broader European sovereign-LLM landscape.
AMÁLIA: The three hard questions.
Portugal spent €5.5M to build a European Portuguese LLM. The base version is operational, and it beats Qwen 3-8B on most pt-PT benchmark tasks. So why are the most important questions still unanswered?
Last month, Duarte O.Carmo published the sharpest public analysis of AMÁLIA — Portugal’s state-funded European Portuguese large language model. He prefaces his critique with the necessary diplomatic apparatus before doing what almost nobody else in the European-sovereign-LLM discourse has been willing to do publicly: asking hard questions about whether the work, as released, actually does what it set out to do. This piece is a structural extension of his analysis. The AMÁLIA case study exposes three hard questions every national LLM effort needs to answer publicly — and the broader European sovereign-LLM movement has been operating without explicit answers to any of them.
Three questions every national LLM effort needs to answer publicly.
Duarte O.Carmo’s framing maps cleanly onto the structural argument, and each question applies concretely to AMÁLIA.
The three questions form a structural feedback loop. Q3 (the optimization target) determines Q2 (the data volume needed), which in turn conditions Q1 (whether the openness is sufficient for community contribution). The European sovereign-LLM movement collectively benefits when these questions become standard methodology disclosure rather than exceptional critique.

107 billion tokens. 5.8 billion clearly pt-PT.
This is the most tractable of the three questions, and its answer is surprising. For a model whose entire stated purpose is to prioritize European Portuguese, the clearly native share of extended pre-training is about 5.4% (5.8 billion of 107 billion tokens). The implications cascade into every other question.
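The share follows directly from the two figures in the heading. A minimal Python sketch reproduces the arithmetic using only those rounded figures, so the result is approximate and can differ slightly from a percentage computed on exact token counts. The constant and function names are illustrative and are not taken from the AMÁLIA release.

```python
# Figures as quoted in the article; both are rounded, so results are approximate.
TOTAL_TOKENS = 107e9         # extended pre-training tokens
NATIVE_PT_PT_TOKENS = 5.8e9  # tokens identified as clearly European Portuguese


def native_share(native_tokens: float, total_tokens: float) -> float:
    """Return the fraction of the training mix that is native-language data."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return native_tokens / total_tokens


share = native_share(NATIVE_PT_PT_TOKENS, TOTAL_TOKENS)

# How many tokens from other sources accompany each clearly pt-PT token.
other_per_native = (TOTAL_TOKENS - NATIVE_PT_PT_TOKENS) / NATIVE_PT_PT_TOKENS

print(f"pt-PT share of the mix: {share:.1%}")
print(f"Other tokens per pt-PT token: {other_per_native:.1f}")
```

On these inputs the share comes out at a little over 5%, or roughly 17 tokens from other sources for every clearly pt-PT token.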
The Olmo standard. AMÁLIA’s current state.
The Allen Institute for AI’s Olmo project defines what “fully open” requires in operational terms. Olmo doesn’t lead frontier benchmarks, and that’s not the point. The point is to be the structural reference for openness. AMÁLIA’s “fully open source” claim should be measured against that operational standard.

Four strategic positions. AMÁLIA between two and three.
The major European sovereign-LLM initiatives account for roughly €100M or more in publicly disclosed funding. The structural question every project faces: what competitive position are you actually staking out? There are four options, none mutually exclusive, and each requires different commitments.

Three standards. For AMÁLIA and the movement.
The structural critique generalizes beyond AMÁLIA. Italy, France, Germany, Switzerland, the OpenEuroLLM consortium, and every subsequent national project benefit from public discourse holding national LLM efforts to operational standards on openness, data accounting, and strategic positioning.
The European sovereign-AI agenda is a serious strategic project that deserves serious public discourse. O.Carmo’s analysis is what serious public discourse looks like. Appropriately diplomatic. Structurally rigorous. Willing to ask the hard questions in public when the public investment justifies it. More of this is needed — across every European sovereign-LLM project, not just AMÁLIA.
Implications for European Sovereign AI Strategies
The questions raised by AMÁLIA reflect broader concerns across Europe about transparency, data sufficiency, and strategic focus in developing national language models. How open models truly are impacts their accessibility and security, while native data volume influences their cultural relevance and accuracy. These issues are critical as European countries aim to balance innovation with sovereignty, and the answers will shape future investments and policies in AI.
European Sovereign LLM Initiatives and Strategic Challenges
Across Europe, countries like Italy, Germany, France, and Norway are investing heavily in developing national language models, often with public funding and strategic goals tied to sovereignty and security. These efforts share common challenges: defining openness, sourcing sufficient native-language data, and setting clear objectives for their models. Portugal’s AMÁLIA exemplifies this trend, with a publicly funded, collaborative approach that emphasizes integration with national research institutions. The debate over technical approaches—whether to train from scratch or adapt existing multilingual models—underscores differing strategic visions within the European AI community.
“The three questions about openness, native data, and objectives are fundamental to understanding the true value and strategic implications of AMÁLIA.”
— Duarte O.Carmo
Unanswered Questions About Openness and Data Sufficiency
It remains unclear how open AMÁLIA truly is in terms of access and transparency, as the specifics of its licensing and data sharing are not fully disclosed. Additionally, questions persist about whether the native Portuguese data used—comprising roughly 5.8 billion tokens—adequately covers the language’s diversity and complexity. The final version, expected in June 2026, may address some of these gaps, but current details are limited.
Next Steps for Portugal’s AI Strategy and Model Development
The AMÁLIA team plans to release the final version in June 2026, which may clarify some of the current uncertainties regarding openness and data sufficiency. Meanwhile, broader European discussions are likely to intensify around establishing common standards for transparency, native-language data collection, and strategic objectives for sovereign models. Portugal’s experience will serve as a case study for these debates, influencing policy and investment decisions across the continent.
Key Questions
What makes AMÁLIA different from other European language models?
AMÁLIA is built as a continuation of a multilingual foundation (EuroLLM) rather than training from scratch, which differs from models like Italy’s Minerva. It emphasizes Portuguese performance and is publicly funded, with a focus on native-language benchmarks.
Why are the questions of openness and native data important?
Openness affects model accessibility, security, and trust, while native data volume influences the model’s cultural relevance and accuracy in Portuguese. These factors are crucial for national sovereignty and strategic control over AI technology.
What are the potential risks of not answering these core questions?
Without clarity on openness and data adequacy, models may face issues related to transparency, bias, and strategic autonomy, potentially undermining public trust and international competitiveness.
When will the final version of AMÁLIA be available?
The final version is expected in June 2026, which may provide further insights into its capabilities and strategic positioning.
How does AMÁLIA compare to other European models in performance?
AMÁLIA outperforms most open models on European Portuguese benchmarks and surpasses Qwen 3-8B on most tasks, though it still trails on certain benchmarks like ALBA.
Source: ThorstenMeyerAI.com