#10
AI Agents

Why LLMs Are Better at Elixir Than Almost Anything Else

A Tencent benchmark put Elixir at the top of 20 languages for LLM code completion. José Valim just explained why — and the answer is about design philosophy, not market share.

There’s a benchmark result circulating in Elixir circles that deserves more than a passing mention. A study by Tencent Research — AutoCodeBench, a large-scale evaluation of 30+ models against 3,920 problems across 20 programming languages — put Elixir at the top of the pile. When combining all models, 97.5% of Elixir problems were solved by at least one model. Individually, Claude Opus 4 scored 80.3% on Elixir, ahead of C# at 74.9% and Kotlin at 72.5%. Elixir ranked first in most individual model evaluations too, in both reasoning and non-reasoning modes.

That’s a strange result on the surface. Elixir has a fraction of the training data of Python, Java, or TypeScript. The common assumption is that LLMs write better code in popular languages because they’ve seen more of it. Elixir breaks that model — and José Valim just published his explanation of why. It’s worth sitting with, because the argument cuts deeper than “LLMs happen to like our syntax.”

The core of Valim’s case is immutability and explicit data flow. In a mutable OOP codebase, a function call might change the state of any object in the system as a side effect. To understand what a function does, you often need to pull in its dependencies and trace their mutations. In Elixir, any data a function works with must be passed in as an argument, and anything it “changes” comes back as a new value in the return. Data flows in one direction. There’s no spooky action at a distance. What goes in is clear; what comes out is clear. This property — which humans often experience as Elixir being “readable” — turns out to matter enormously for LLMs, because it means less context is needed to reason correctly about any given piece of code. The model doesn’t have to simulate shared mutable state to understand what’s happening.
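A minimal sketch of what that looks like in practice — `Cart` is a hypothetical module invented for illustration, not from the benchmark or Valim’s post:

```elixir
defmodule Cart do
  # All state the function touches arrives as an argument; the "change"
  # is expressed as a brand-new map in the return value.
  def add_item(cart, item, price) do
    %{cart | items: [item | cart.items], total: cart.total + price}
  end
end

empty = %{items: [], total: 0}
cart = Cart.add_item(empty, "book", 15)

# `empty` is untouched — %{items: [], total: 0} — because maps are
# immutable. Everything the function did is visible in `cart`.
```

To reason about `add_item`, a reader (or a model) needs only its arguments and return value; there is no object graph elsewhere in the system it could have silently modified.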

The second factor is documentation. Elixir treats @doc as a first-class language feature, not a comment convention. Documentation is separated from code comments by design, and iex> examples in @doc blocks are executable and verified as part of the test suite. This means the training data for Elixir is unusually clean: documented functions have verified examples, and those examples are almost certainly correct. Compare that to a large Python codebase where documentation might be absent, stale, or at odds with what the code actually does. For an LLM trying to learn how a function behaves, Elixir’s ecosystem is a substantially higher signal-to-noise environment.
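Here is the mechanism in miniature — `Slugify` is a made-up module for illustration; the point is that the indented `iex>` example inside @doc is itself runnable:

```elixir
defmodule Slugify do
  @doc """
  Converts a string into a URL-friendly slug.

      iex> Slugify.slug("Hello, World!")
      "hello-world"
  """
  def slug(string) do
    string
    |> String.downcase()
    |> String.replace(~r/[^a-z0-9\s-]/, "")
    |> String.replace(~r/\s+/, "-")
  end
end
```

Adding `doctest Slugify` to an ExUnit test case executes that `iex>` example as an assertion, so the documented example fails the build the moment it drifts from the code’s actual behavior.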

There’s a third factor Valim names that’s easy to dismiss as boosterism but is actually substantive: stability. Elixir has been on v1.x since 2014. Phoenix has been on v1.x since 2015. Ecto has been on v3 since 2018. In the same window, other ecosystems have gone through multiple major versions with breaking changes. For an LLM trained on years of documentation and Stack Overflow threads, a stable ecosystem means the old articles still apply. There’s no confusion in the training corpus about what’s current versus deprecated. When a model reads a five-year-old Phoenix tutorial, the code mostly works. That’s not accidental — it’s the product of a deliberate stability philosophy. And it compounds over time in the training data.

What this doesn’t mean is that Elixir is suddenly easy to learn, or that LLMs will never get confused by macros, or that you can skip reading the docs. The Tencent benchmark measures problem-solving ability on single-pass completions, not multi-step agent workflows. There are still real gaps — durable execution across node restarts being the most significant for AI agent use cases, something neither the benchmark nor the Valim post addresses. But the result points at something real: languages designed with explicit data flow, verified documentation, and long-term stability are genuinely better substrates for LLM-assisted development than languages with large training sets and chaotic ecosystems.

The implication for Elixir developers in 2026 is practical. When you reach for req over Hackney, or write a pipe chain instead of nested function calls, or add proper @doc blocks with iex> examples to your modules, you’re not just writing cleaner code for human reviewers. You’re making your codebase legible to the tools that will increasingly help you build it. That alignment between good Elixir style and good AI-assisted development ergonomics isn’t a coincidence. It’s the language showing its architecture.
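To make the style point concrete, here is the same transformation written both ways — the data and functions are arbitrary, chosen only to show the contrast:

```elixir
# Nested calls: evaluation order reads inside-out.
nested = String.split(String.downcase(String.trim("  Hello World  ")), " ")

# Pipe chain: the same steps, top to bottom, one per line.
piped =
  "  Hello World  "
  |> String.trim()
  |> String.downcase()
  |> String.split(" ")

# Both produce ["hello", "world"]; only the second makes the
# data flow legible at a glance.
```

The pipe version is exactly the kind of explicit, linear data flow the benchmark result rewards: each step’s input and output are visible on its own line.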