Etched Just Bet $800M That Nvidia's GPUs Are Solving the Wrong Problem

A four-year-old chip startup came out of hiding this week with a working chip, a billion dollars in contracts, and a claim that could reshape how AI companies pay for inference.
If you saw the Etched thread on X and felt lost, this is for you.
There was no context or background, just a wall of jargon like "Cluster-Scale Memory" and "mem2mem latency." Confusing or not, it worked, and the AI world spent two days arguing about it.
By the end of this issue, you'll know what Etched actually built, how they staged this launch, why the $1 billion number is more complicated than it sounds, and why one of the sharpest replies in the whole thread compared them to Theranos.
Here's the original launch thread if you want to read it cold, the way most of the internet did.
TL;DR
Etched builds computer chips designed only to run transformer models, the architecture behind ChatGPT, Claude, and basically every major AI model today.
This week, Etched came out of stealth, meaning they'd been building quietly for years and finally revealed what they'd made.
They announced four things: a working chip called Sohu that is already running real AI models; $800 million raised across rounds nobody knew about until now; over $1 billion in signed customer contracts; and two new chip technologies nobody else has shipped yet.
What Etched Is Actually Selling

Every AI chip today faces the same tradeoff. A chip can be fast, or it can hold a lot of data, and rarely both.
Chips lean on two types of memory to do their job. SRAM (Static Random-Access Memory) is blazing fast but small and expensive. HBM (High Bandwidth Memory) holds far more data but takes longer to access. Every existing chip ends up leaning one way or the other.
Etched's answer combines two technologies.
Low Voltage Inference (LVI) handles the compute side. Most AI chips run so hot under full load that they throttle themselves, often dropping to under half their theoretical peak performance just to avoid overheating. LVI runs the chip's math at far lower voltage, so it never has to slow down to stay cool.
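Etched hasn't published how LVI works in the launch thread, so treat the following as background intuition rather than a description of their design. The textbook rule for CMOS chips is that dynamic power scales with capacitance times voltage squared times clock frequency, which is why lowering voltage pays off so sharply. Here's a minimal sketch of that relation; the function name and the voltage ratios are made up for illustration and are not Etched figures.

```python
def relative_dynamic_power(voltage_scale: float, frequency_scale: float = 1.0) -> float:
    """Dynamic power relative to a baseline chip, using the textbook
    CMOS relation P ~ C * V^2 * f with capacitance held constant.

    Both arguments are ratios against the baseline (1.0 means unchanged).
    """
    if voltage_scale <= 0 or frequency_scale <= 0:
        raise ValueError("scales must be positive")
    return voltage_scale ** 2 * frequency_scale


if __name__ == "__main__":
    # Hypothetical voltage ratios chosen for illustration only.
    for v in (1.0, 0.8, 0.6):
        power = relative_dynamic_power(v)
        print(f"voltage x{v:.1f} -> dynamic power x{power:.2f}")
```

The quadratic term is the point: in this simplified model, a 20% voltage cut removes roughly a third of the dynamic power, which is heat the chip no longer has to throttle around.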
Cluster-Scale Memory (CSM) handles the memory side, and it's the part that dominated the launch thread. Instead of forcing a choice between SRAM and HBM, CSM links every chip in a cluster into one shared, ultra-low-latency memory pool, so a chip can reach a neighboring chip's memory almost as fast as its own.
This matters most for the biggest AI models. Modern frontier models increasingly use a Mixture of Experts (MoE) design, where each token gets routed to a few of many specialized sub-networks instead of firing up the whole model at once. That routing means constantly shuttling data between chips, and every hop through a slower memory layer adds latency.
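To make the routing problem concrete, here's a toy sketch of top-k MoE routing in plain Python. It assumes a common setup where a router scores every expert for each token and the top two win. The expert layout (16 experts, 2 per chip) and the random scores are hypothetical, and nothing here reflects how Etched or any specific model actually places experts.

```python
import math
import random


def softmax(scores):
    """Convert raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k=2):
    """Return [(expert_id, weight)] for the top_k highest-scoring experts.

    Weights are renormalized so the chosen experts sum to 1.
    """
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def count_remote_dispatches(token_scores, home_chip, experts_per_chip, top_k=2):
    """Count expert calls that land on a chip other than the token's home chip."""
    remote = 0
    for scores in token_scores:
        for expert_id, _weight in route_token(scores, top_k):
            if expert_id // experts_per_chip != home_chip:
                remote += 1
    return remote


if __name__ == "__main__":
    rng = random.Random(0)
    num_experts, experts_per_chip = 16, 2  # hypothetical: 8 chips, 2 experts each
    tokens = [[rng.gauss(0, 1) for _ in range(num_experts)] for _ in range(1000)]
    remote = count_remote_dispatches(tokens, home_chip=0, experts_per_chip=experts_per_chip)
    print(f"{remote} of {len(tokens) * 2} expert calls leave the home chip")
```

With experts spread evenly across chips and no locality in the routing, most expert calls end up on some other chip. That is the traffic a shared low-latency memory pool is meant to make cheap.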
Etched claims its chips run models at over 80% of peak hardware efficiency, while general-purpose GPUs typically run inference at 20 to 50% efficiency, since they're built to do everything reasonably well rather than one thing exceptionally well.
One outside estimate puts the gap in concrete terms. An eight-chip Sohu server running the Llama 70B model reaches roughly 500,000 tokens per second, compared to around 25,000 tokens per second for an equivalent Nvidia H100 setup.
That gap isn't a small upgrade. It points to a different category of machine, assuming the number holds up outside a controlled demo.
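If you want to check the arithmetic yourself, here's a short sketch using only the figures quoted above. The one assumption is that the "equivalent" H100 setup also has eight chips, which the estimate doesn't spell out.

```python
SOHU_TOKENS_PER_SEC = 500_000  # eight-chip Sohu server on Llama 70B (outside estimate)
H100_TOKENS_PER_SEC = 25_000   # "equivalent" Nvidia H100 setup (same estimate)
CHIPS_PER_SERVER = 8           # stated for Sohu; assumed for the H100 setup


def speedup(new_rate, old_rate):
    """How many times faster new_rate is than old_rate."""
    return new_rate / old_rate


def per_chip(tokens_per_sec, chips=CHIPS_PER_SERVER):
    """Average throughput per chip in a server."""
    return tokens_per_sec / chips


def utilization_advantage(claimed, baseline):
    """Throughput ratio explained by utilization alone, at equal peak."""
    return claimed / baseline


if __name__ == "__main__":
    print("server speedup:", speedup(SOHU_TOKENS_PER_SEC, H100_TOKENS_PER_SEC))
    print("Sohu per chip:", per_chip(SOHU_TOKENS_PER_SEC))
    print("H100 per chip:", per_chip(H100_TOKENS_PER_SEC))
    # 80% claimed utilization vs. the 20-50% range cited for GPUs
    print("utilization edge vs 50%:", utilization_advantage(0.80, 0.50))
    print("utilization edge vs 20%:", utilization_advantage(0.80, 0.20))
```

The server-level gap works out to 20x. Better utilization alone, at equal peak performance, would account for somewhere between 1.6x and 4x. The quoted figures don't break down where the rest would come from.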
How Etched Staged This Launch
This wasn't a casual product post. It was choreographed carefully.
Etched was founded in 2022 by three Harvard dropouts, Gavin Uberti, Chris Zhu, and Robert Wachen. For most of the years since, the company said almost nothing publicly, while quietly building a team that pulled engineers from Nvidia, Google's TPU program, Broadcom, SK hynix, and TSMC. They now stand at over 400 engineers.
On June 30, Etched didn't just post a thread. They ran a synchronized press push alongside a GlobeNewswire release and a three-post X thread, all landing the same day. The financial numbers, the technical claims, and the physical infrastructure proof all dropped together, including a 2MW datacenter and test lab built into their own San Jose office, plus a new 24/7 engineering site in Taiwan.
The funding reveal was its own surprise. The $800 million wasn't one round, but several previously undisclosed rounds, capped by a $500 million close in December 2025 at a $5 billion valuation. Investors ranged from quant trading firms like Jane Street, Two Sigma, Jump Trading, and Hudson River Trading, to AI research figures including Andrej Karpathy, Geoffrey Hinton, and Fei-Fei Li, alongside Peter Thiel and Stanley Druckenmiller. Revealing that much funding all at once works as a credibility move as much as a funding update.
The thread itself stayed short on purpose, just three posts explaining CSM, pointing at the physical buildout, and closing with a direct hiring call to action. There was no roadmap teasing beyond "more this summer," and that restraint reads like a company with something already built rather than something still being pitched.
The $1 Billion Question Nobody's Answering
This is the number that traveled fastest, and it deserves more scrutiny than it got in most reactions.
Etched says it has signed over $1 billion in customer contracts, but that figure isn't collected revenue, it isn't hardware running in deployment, and Etched hasn't named a single customer.
What Etched has confirmed is that these are committed purchase orders, and the company has been testing racks with customers using production traffic patterns run through a simulator, not necessarily live customer workloads yet.
A signed contract is a company saying it intends to buy once the product works. That's different from a company saying it already bought the product and is running its workload today. In chip manufacturing, the gap between those two things has swallowed more than one promising startup.
To be fair to Etched, they aren't hiding this. CEO Gavin Uberti has been candid about the binary nature of the bet, saying that if the transformer architecture remains dominant, Etched could become an enormous business, and if the industry shifts to a different architecture, the chips become, in his own framing, expensive paperweights.
That kind of candor is rare from a company mid-launch, and it's also the kind of statement that cuts both ways when your entire product line assumes today's dominant AI architecture stays dominant.
The Replies Split Into Two Camps
Here's where the launch got genuinely interesting. The replies weren't a chorus; they were a fault line.

The believers carried serious weight. Andrej Karpathy called out the engineering itself, specifically the very low voltage domains and cluster-scale memory, as work aimed at maximizing tokens generated per watt of power at real interactive speeds. Sequoia's Sonya Huang celebrated the raise outright, and investor Nikita Bier framed the whole thing as an attack on one of the biggest bottlenecks of the global economy.
Then there was one skeptic, and the reply landed hard. George Hotz's team, tiny corp, replied with a single line and no elaboration, comparing Etched directly to Theranos.
There was no thread and no follow-up, just a direct comparison to a company whose hardware claims famously didn't survive outside scrutiny. It stood out because it sat right next to a wall of praise from some of the most credible names in AI infrastructure and venture investing.
That contrast is the real story here, more than any single spec. Etched is making a specific, testable claim of 80%+ efficiency on trillion-parameter MoE models, at rack scale, sustained under real load, and backing it with shipping hardware and signed contracts rather than a pitch deck. Whether that claim survives independent benchmarking is now a summer 2026 story, not a launch day one.
Why This Matters Past the Thread
Etched's core bet has always been narrow and aggressive. Instead of building a general-purpose chip like a GPU, they hard-code the transformer computation graph directly into silicon, with no flexible programming model and no ability to run other kinds of AI workloads. Just transformers, done as fast as physically possible.
The tradeoff is real. Sohu reportedly can't run every major open-weight model architecture on the market today, since some newer models deviate from the exact patterns Etched hard-coded for, and that's the cost of specialization.
CSM is the memory-side half of that same bet. As models get sparser, using more MoE-style specialized sub-networks, and context windows get longer, the bottleneck shifts from raw compute to how fast data can move between chips. If Etched's numbers hold up in the field, it signals that the next fight over AI inference costs won't be about who has the most FLOPs, but about who solved the memory wall first.
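One common way to reason about the memory wall is a roofline-style estimate: a step takes as long as the slower of its compute time and its data-movement time. Here's a minimal sketch of that idea. The chip and workload numbers are hypothetical, picked only to show the shape of the problem, and they don't describe Sohu, the H100, or any real model.

```python
def step_times(flops, bytes_moved, peak_flops_per_sec, bandwidth_bytes_per_sec):
    """Return (compute_time, memory_time) in seconds for one step."""
    compute_time = flops / peak_flops_per_sec
    memory_time = bytes_moved / bandwidth_bytes_per_sec
    return compute_time, memory_time


def step_time_seconds(flops, bytes_moved, peak_flops_per_sec, bandwidth_bytes_per_sec):
    """Roofline-style estimate: the slower of compute and data movement wins."""
    return max(step_times(flops, bytes_moved, peak_flops_per_sec, bandwidth_bytes_per_sec))


def bottleneck(flops, bytes_moved, peak_flops_per_sec, bandwidth_bytes_per_sec):
    """Name the resource that limits the step."""
    compute_time, memory_time = step_times(
        flops, bytes_moved, peak_flops_per_sec, bandwidth_bytes_per_sec
    )
    return "compute" if compute_time >= memory_time else "memory"


if __name__ == "__main__":
    # Hypothetical workload and chip, for illustration only.
    flops, bytes_moved = 2e11, 1e11
    peak, bandwidth = 1e15, 2e12

    print("limited by:", bottleneck(flops, bytes_moved, peak, bandwidth))
    print("baseline step:", step_time_seconds(flops, bytes_moved, peak, bandwidth))
    print("2x FLOPs:", step_time_seconds(flops, bytes_moved, 2 * peak, bandwidth))
    print("2x bandwidth:", step_time_seconds(flops, bytes_moved, peak, 2 * bandwidth))
```

In this toy setup, doubling the FLOPs changes nothing, while doubling the bandwidth halves the step time. That asymmetry is what a bet on memory rather than raw compute is wagering on.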
Etched isn't shy about the scale of that ambition either, and is targeting gigawatt-scale inference infrastructure by 2027. That's infrastructure company language, the kind used by businesses trying to become load-bearing for an entire industry.