AI System Outperforms Leading Human Methods by Automatically Writing Expert-Level Scientific Code
From AnyHelix Team · 21 May 2026
A new artificial intelligence system can automatically write and refine complex scientific software, producing programs that beat the best human-developed methods in tasks from genomics to epidemiology. Created by a team at Google DeepMind, Google Research, and partner institutions, the tool—called Empirical Research Assistance (ERA)—has the potential to shave weeks or months off the tedious trial-and-error process that bogs down data-driven discovery.
The researchers, led by Shibl Mourad and Michael P. Brenner, report the results in an accelerated article preview published in Nature. ERA pairs a large language model (LLM) with an algorithm known as tree search. Given a problem where success can be scored numerically, the system writes code, checks whether the score improves, and uses those scores to decide which of the most promising solutions to mutate or combine next. It can also ingest research ideas directly from scientific papers or AI-generated suggestions to guide its search.
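The loop described above (propose a change, score it, decide where to expand next) can be illustrated with a toy sketch. Everything here is an assumption made for illustration rather than ERA's actual implementation: a list of numbers stands in for a program, the hypothetical `propose_rewrite` stands in for the LLM call, `score_fn` is a made-up objective, and the UCB-style selection rule is one common way to balance exploring and exploiting, not necessarily the rule the authors use. Combining two candidates is left out for brevity.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate solution in the search tree."""
    candidate: list          # stand-in for a program's source code
    score: float             # higher is better
    visits: int = 1
    children: list = field(default_factory=list)


def score_fn(candidate):
    """Toy numeric score: negative squared distance to a hidden target."""
    target = [3.0, -1.0]
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))


def propose_rewrite(candidate, rng):
    """Hypothetical stand-in for the LLM call that rewrites a program."""
    return [c + rng.gauss(0.0, 0.5) for c in candidate]


def select(nodes, total_visits, c=1.0):
    """Pick the node with the best score plus a UCB-style exploration bonus."""
    def priority(node):
        bonus = c * math.sqrt(math.log(total_visits + 1) / node.visits)
        return node.score + bonus
    return max(nodes, key=priority)


def tree_search(iterations=200, seed=0):
    """Grow the tree one scored candidate at a time and return the best node."""
    rng = random.Random(seed)
    start = [0.0, 0.0]
    root = Node(start, score_fn(start))
    nodes = [root]
    for step in range(1, iterations + 1):
        parent = select(nodes, step)      # choose which solution to expand
        parent.visits += 1
        child_candidate = propose_rewrite(parent.candidate, rng)
        child = Node(child_candidate, score_fn(child_candidate))
        parent.children.append(child)     # record the new branch
        nodes.append(child)
    return max(nodes, key=lambda n: n.score)


best = tree_search()
print(best.candidate, best.score)
```

Because every scored candidate stays in the tree, the search can return to an older branch when a newer one stops improving. A plain keep-the-best loop cannot do that.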
The group tested ERA on a diverse set of benchmarks. In one, the system tackled the challenge of integrating single-cell RNA sequencing data from different laboratories—a notoriously difficult computational task. ERA produced 40 novel methods that surpassed the current top human-developed method on the public OpenProblems leaderboard. Its best-performing creation combined two existing algorithms, ComBat and BBKNN, in a way that yielded a 14% overall improvement over the previous best published approach.
For forecasting COVID-19 hospitalizations, ERA generated 14 models that outperformed the gold-standard ensemble maintained by the U.S. Centers for Disease Control and Prevention (CDC). In a retrospective study covering the 2024–2025 season, the best ERA model achieved an average weighted interval score of 26, compared with 29 for the CDC’s ensemble (lower scores indicate more accurate forecasts). Hybrid strategies that fused simple historical baselines with more sophisticated statistical and machine-learning models drove much of the gain.
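For readers unfamiliar with the metric, the sketch below computes a weighted interval score (WIS) for a single forecast using the standard definition for quantile forecasts, where lower is better. The forecast values and interval levels are invented for illustration, and the article does not say which quantile levels the study used. The last lines work out the relative gap between the two reported averages.

```python
def interval_score(lower, upper, y, alpha):
    """Score one central (1 - alpha) prediction interval against outcome y."""
    width = upper - lower                          # penalty for wide intervals
    under = (2.0 / alpha) * max(lower - y, 0.0)    # outcome fell below the interval
    over = (2.0 / alpha) * max(y - upper, 0.0)     # outcome fell above the interval
    return width + under + over


def weighted_interval_score(median, intervals, y):
    """WIS for one forecast.

    intervals maps alpha -> (lower, upper) for each central interval.
    Weights follow the usual convention: 1/2 for the median term and
    alpha/2 for each interval, normalised by (K + 1/2).
    """
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


# Illustrative forecast (made-up numbers): median 100 admissions,
# a 50% interval of 90-110 and a 90% interval of 70-130; 120 were observed.
wis = weighted_interval_score(
    median=100.0,
    intervals={0.5: (90.0, 110.0), 0.1: (70.0, 130.0)},
    y=120.0,
)
print(round(wis, 2))

# Relative gap between the averages reported in the article (26 vs 29).
relative_gain = (29 - 26) / 29
print(f"{relative_gain:.1%}")
```

On the reported averages, the best ERA model's score is roughly 10% lower than the CDC ensemble's.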
ERA also matched or beat expert-crafted software for time series forecasting on the GIFT-Eval benchmark, geospatial image segmentation, prediction of neural activity across the whole zebrafish brain, and numerical evaluation of difficult integrals.
The work highlights a growing capability of AI to go beyond one-shot code generation. By systematically exploring the space of possible solutions and combining ideas from the literature, ERA achieves results that previously required significant human expertise and effort. The authors note that the system optimizes for a defined quality metric and does not perform genuine scientific reasoning—it cannot formulate new causal theories. It is also restricted to “scorable” tasks where success can be measured automatically. Moreover, the ease with which such systems can generate sophisticated software raises concerns about misuse in sensitive domains.
With the right guardrails, however, tools like ERA could dramatically accelerate the computational side of scientific research. The team is making the system’s approach public as the manuscript undergoes final editing for formal publication.
Reference: Aygün, E. et al. An AI system to help scientists write expert-level empirical software. Nature (2026). https://doi.org/10.1038/s41586-026-10658-6