Microsoft Launches Maia 200 AI Chip, Claims 3x Trainium's FP4 Performance

Microsoft’s second‑generation Maia 200 AI inference chip is now in production across Azure data centers, delivering over 10 petaFLOPS of FP4 compute and more than 5 petaFLOPS of FP8 while consuming 750 watts. Built on TSMC’s 3‑nm process with more than 140 billion transistors, 216 GB of HBM3e memory, and 272 MB of on‑chip SRAM, the chip targets a lower cost per token for large‑language‑model workloads.
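
Taken at face value, the headline numbers work out to roughly 13 TFLOPS per watt in FP4. The back‑of‑envelope sketch below (Python) uses only the figures quoted in this article, including the 7 TB/s HBM3e bandwidth listed in the specifications; the peak values are the article's round numbers, not measured results.

    # Back-of-envelope efficiency figures from the quoted Maia 200 specs.
    # Inputs are the article's round numbers, not measured values.

    fp4_flops = 10e15      # >10 petaFLOPS claimed in FP4
    fp8_flops = 5e15       # >5 petaFLOPS claimed in FP8
    power_w = 750          # stated power envelope per chip
    hbm_bw_bytes = 7e12    # 7 TB/s HBM3e bandwidth (from the spec list)

    # Compute efficiency: FLOPS delivered per watt of chip power.
    print(f"FP4 efficiency: {fp4_flops / power_w / 1e12:.1f} TFLOPS/W")
    print(f"FP8 efficiency: {fp8_flops / power_w / 1e12:.1f} TFLOPS/W")

    # Arithmetic intensity needed to stay compute-bound rather than
    # memory-bound: peak FLOPS divided by memory bandwidth.
    print(f"FP4 roofline ridge point: {fp4_flops / hbm_bw_bytes:.0f} FLOP/byte")

The high FLOP‑per‑byte ridge point hints at why the large on‑chip SRAM matters for inference, though that reading is an inference from the quoted numbers rather than a Microsoft claim.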

Key Specifications of Maia 200 AI Chip

  • Process technology: TSMC 3‑nm
  • Transistor count: >140 billion
  • Memory subsystem: 216 GB HBM3e delivering 7 TB/s bandwidth
  • On‑chip SRAM: 272 MB
  • Precision support: Native FP8 and FP4 tensor cores
  • Power envelope: 750 W per chip

Performance Claims vs Competitors

FP4 Performance Compared to Trainium

Microsoft states Maia 200 provides three times the FP4 performance of Amazon’s third‑generation Trainium accelerator.

FP8 Performance Compared to Google TPU

In FP8 workloads, Maia 200 is claimed to surpass the seventh‑generation Google TPU, delivering higher throughput at comparable power.

Production Deployments and Use Cases

  • OpenAI GPT‑5.2 models running on Azure
  • Microsoft 365 Copilot inference workloads
  • Internal Superintelligence projects for synthetic data generation and reinforcement‑learning pipelines
  • Initial region rollout: US Central (Des Moines, Iowa)
  • Planned expansion: US West 3 (Phoenix, Arizona) and additional regions

Developer Tools and SDK

Microsoft is releasing a Maia SDK that includes PyTorch integration, a Triton compiler, an optimized kernel library, and a low‑level programming language. The SDK enables AI startups, researchers, and independent developers to fine‑tune models for the new hardware, extending its reach beyond internal workloads.
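
The Maia‑specific APIs have not been published, so nothing below comes from the SDK itself. It is a minimal, generic Triton kernel of the kind a Triton compiler backend would ingest, shown only to illustrate the programming model developers would be targeting.

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = out.numel()
        grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

Kernels written at this level of abstraction can, in principle, be retargeted by a vendor compiler without per‑accelerator rewrites, which is the usual argument for shipping a Triton path alongside a low‑level language.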

Impact on Azure Customers

For Azure users, Maia 200 promises a lower cost per token for services that rely on large language models, potentially reducing pricing for Copilot, Azure OpenAI Service, and other AI‑powered offerings. The chip’s efficiency gains aim to make AI inference more affordable at scale.
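
As a rough illustration of how chip efficiency feeds into serving cost, the sketch below estimates the energy cost per million generated tokens. The throughput and electricity price are purely illustrative assumptions, not Microsoft figures, and chip energy is only one component of total cost per token.

    # Illustrative energy-cost-per-token estimate. The throughput and
    # electricity price are assumed placeholders, not published data.

    chip_power_w = 750              # stated Maia 200 power envelope
    tokens_per_second = 20_000      # assumed aggregate decode throughput per chip
    electricity_usd_per_kwh = 0.08  # assumed data-center energy price

    joules_per_token = chip_power_w / tokens_per_second
    kwh_per_million_tokens = joules_per_token * 1e6 / 3.6e6
    cost_per_million_tokens = kwh_per_million_tokens * electricity_usd_per_kwh

    # Excludes amortized hardware, hosting, and networking costs.
    print(f"Energy per token: {joules_per_token * 1000:.1f} mJ")
    print(f"Energy cost per 1M tokens: ${cost_per_million_tokens:.4f}")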

Future Outlook for AI Inference Silicon

Maia 200 positions Azure as a cost‑effective platform for the next generation of AI applications. Its launch intensifies the silicon rivalry among hyperscalers, each seeking to reduce reliance on external GPU suppliers and capture a larger share of the rapidly expanding AI inference market.