TL;DR
AI systems can now automate most AI engineering tasks, and key benchmarks are nearing saturation. Whether AI research itself can be automated remains uncertain, which raises questions about the pace of future AI progress and about institutional strategy.
Recent evidence indicates that AI systems are now capable of automating most engineering tasks related to AI development, with key benchmarks nearing saturation. However, the automation of AI research—distinct from engineering—remains an open question, with implications for the future pace of AI progress, according to Thorsten Meyer’s analysis of recent developments.
Multiple independent benchmarks, including CORE-Bench and MLE-Bench, show AI systems achieving near-complete automation in core engineering skills such as reproducing research experiments and competing in Kaggle competitions. For example, CORE-Bench has reached 95.5% success, with one author declaring it ‘solved.’ Similarly, AI performance in Kaggle competitions now rivals mid-tier human practitioners, with scores reaching 64.4%, prompting the pause of the leaderboard to develop fairer evaluation methods.
These advances suggest that the engineering side of AI development (reproducing experiments, optimizing kernels, and running complex tasks) may soon be fully automated, reducing reliance on human engineers. By contrast, whether AI can automate the research process itself, which involves hypothesis generation, creativity, and theoretical innovation, remains unresolved. Clark’s framework leaves open whether research is simply a form of large-scale engineering or a distinct, less automatable activity.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.
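The "noise versus curve" point can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes each benchmark is an independent coin flip with a 50% chance of improving by luck, which is a simplifying assumption and not something Clark or the benchmarks state.

```python
from math import comb


def prob_all_improve(n_benchmarks: int, p_up: float = 0.5) -> float:
    """Chance that every benchmark improves if each is an independent coin flip."""
    return p_up ** n_benchmarks


def prob_at_least(k: int, n: int, p_up: float = 0.5) -> float:
    """Chance that at least k of n independent benchmarks improve by luck."""
    return sum(
        comb(n, i) * p_up**i * (1 - p_up) ** (n - i) for i in range(k, n + 1)
    )


# p_up = 0.5 is an assumed "pure noise" baseline, not a measured value.
one_up = prob_all_improve(1)
all_six_up = prob_all_improve(6)

print(f"one benchmark up by chance: {one_up:.4f}")
print(f"all six up by chance:       {all_six_up:.4f}")
```

Under that assumption, one benchmark rising is unremarkable, while six rising together is unlikely to be luck. Real benchmarks are correlated, so the true figure would differ; the direction of the argument is what the sketch shows.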

Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yields more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
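The optimistic "more shots on goal" reading can be sketched as simple probability. This is a toy model, not Clark's: it assumes every attempt has the same small, independent chance of producing a discovery, and the hit rate used here is a hypothetical placeholder.

```python
def expected_discoveries(attempts: int, hit_rate: float) -> float:
    """Expected number of discoveries if each attempt succeeds independently."""
    return attempts * hit_rate


def prob_at_least_one(attempts: int, hit_rate: float) -> float:
    """Chance of one or more discoveries across all attempts."""
    return 1.0 - (1.0 - hit_rate) ** attempts


# Hypothetical per-attempt hit rate, chosen only for illustration.
HYPOTHETICAL_HIT_RATE = 0.001

for attempts in (100, 1_000, 10_000):
    expected = expected_discoveries(attempts, HYPOTHETICAL_HIT_RATE)
    chance = prob_at_least_one(attempts, HYPOTHETICAL_HIT_RATE)
    print(f"{attempts:>6} attempts: expected {expected:.1f}, P(at least one) {chance:.3f}")
```

The pessimistic reading corresponds to a hit rate that is effectively zero for the relevant kind of insight, in which case no volume of attempts changes the outcome. The two readings disagree about the hit rate, not about the arithmetic.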
Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.
Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
[Figure: the two readings compared. Panel labels: "Productivity multiplier years" and "Recursive loop operational".]
Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
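The asymmetric cost-of-being-wrong argument can be laid out as a small decision table. Everything numeric below is a hypothetical, unitless placeholder chosen to mirror the shape of the argument (building costs a little in either world, waiting costs a lot only if the moat closes); none of it comes from Clark or from the analysis.

```python
ACTIONS = ("build_now", "wait")
WORLDS = ("moat_holds", "moat_closes")

# Hypothetical, unitless costs for each (action, world) pair.
COSTS = {
    ("build_now", "moat_holds"): 1,    # capacity built, still useful
    ("build_now", "moat_closes"): 1,   # capacity built, and necessary
    ("wait", "moat_holds"): 0,         # nothing spent, nothing needed
    ("wait", "moat_closes"): 10,       # caught without capacity
}


def worst_case_cost(action: str) -> float:
    """Largest cost this action can incur across both worlds."""
    return max(COSTS[(action, world)] for world in WORLDS)


def expected_cost(action: str, p_closes: float) -> float:
    """Probability-weighted cost given the chance that the moat closes."""
    return ((1 - p_closes) * COSTS[(action, "moat_holds")]
            + p_closes * COSTS[(action, "moat_closes")])


def break_even_p_closes() -> float:
    """Probability of the moat closing at which both actions cost the same."""
    extra_if_holds = COSTS[("build_now", "moat_holds")] - COSTS[("wait", "moat_holds")]
    saved_if_closes = COSTS[("wait", "moat_closes")] - COSTS[("build_now", "moat_closes")]
    return extra_if_holds / (extra_if_holds + saved_if_closes)


safest = min(ACTIONS, key=worst_case_cost)
print(f"lowest worst-case cost: {safest}")
print(f"building wins in expectation once P(moat closes) exceeds {break_even_p_closes():.2f}")
```

With these placeholder numbers, building now has the lower worst case, and it also wins in expectation as soon as the chance of the moat closing passes a modest threshold. Different cost assumptions move the threshold, so the table is a way to state one's assumptions, not a result.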
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
Implications for AI Development and Institutional Strategy
The near-complete automation of engineering tasks could accelerate AI development dramatically, lowering costs and increasing iteration speed. However, if research remains largely human-driven, progress may slow or become bottlenecked by creative and theoretical limitations. This divergence influences how organizations should allocate resources, potentially shifting focus from engineering automation to fostering human research innovation.
Recent Benchmarks and the Trajectory Toward Automation
Over the past 18 months, multiple benchmarks have shown rapid progress in AI capabilities relevant to AI research and engineering. CORE-Bench, which measures research reproduction, improved from 21.5% to 95.5%. AI scores on Kaggle competitions have similarly advanced, reaching 64.4%, a level that rivals mid-tier human practitioners. Together these trends point toward saturation: many engineering tasks are becoming fully automatable. The open question is whether research tasks will follow this trajectory or diverge because of their creative nature.
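What "approaching saturation" means for the CORE-Bench figures can be worked out directly. The sketch below uses only the two scores quoted above; the variable names and the "share of failures eliminated" measure are illustrative choices, not metrics defined by the benchmark.

```python
# CORE-Bench scores quoted in the text, in percent.
START_SCORE = 21.5
END_SCORE = 95.5

# Percentage points gained between the two reported scores.
points_gained = END_SCORE - START_SCORE

# Room left before the benchmark tops out at 100%.
headroom = 100.0 - END_SCORE

# Share of the original failure rate that has been eliminated.
failures_eliminated = 1.0 - (100.0 - END_SCORE) / (100.0 - START_SCORE)

print(f"points gained:       {points_gained:.1f}")
print(f"headroom remaining:  {headroom:.1f}")
print(f"failures eliminated: {failures_eliminated:.1%}")
```

The headroom figure is the useful one: once only a few points remain, further gains on the benchmark say little about capability, which is why a near-saturated benchmark stops being informative.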
“AI can today automate vast swatches, perhaps the entirety, of AI engineering. It is not yet clear how much of AI research it can automate.”
Jack Clark, Import AI #455, as quoted by Thorsten Meyer
Unresolved Questions About AI Research Automation
It remains unclear how much of AI research—beyond engineering—is automatable. The structural question Clark leaves open is whether research is fundamentally different from engineering or if it can be scaled and automated similarly. The pace at which research tasks might become fully automated is still uncertain, as is the impact on scientific innovation and institutional roles.
Next Milestones in AI Capability Development
In the coming 12-24 months, focus will likely be on refining benchmarks, developing better evaluation methods, and testing the limits of AI in research tasks. Organizations may shift resources toward understanding and fostering the creative aspects of research that AI has yet to master. Monitoring progress in research automation will be critical to anticipate future AI development trajectories and strategic implications.
Key Questions
What does the automation of engineering mean for AI development?
It suggests that many engineering tasks, such as reproducing experiments and optimizing code, can soon be fully automated, potentially accelerating AI progress and reducing costs.
Will AI be able to automate scientific research in the near future?
This remains uncertain. While engineering tasks are nearing full automation, the creative, hypothesis-driven aspects of research are still unresolved and may require human input for the foreseeable future.
How might these developments affect AI research institutions?
Institutions may need to reconsider their strategies, focusing more on managing and guiding AI in creative research tasks rather than solely automating engineering processes.
What are the risks if research automation lags behind engineering automation?
If research remains human-driven, progress could slow, creating bottlenecks in AI development despite engineering capabilities reaching saturation.
Source: ThorstenMeyerAI.com