Microsoft Launches Maia 200 AI Chip With a 3× FP4 Performance Edge

Microsoft’s new Maia 200 AI chip is a custom‑built inference accelerator designed to boost large‑language‑model token generation. Built on TSMC’s 3‑nm process, it features native FP4 and FP8 tensor cores, 216 GB of HBM3e memory delivering 7 TB/s bandwidth, and a 750 W power envelope, promising up to three times the FP4 performance of competing silicon.
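
Since native FP4 is the headline capability, a brief look at what 4‑bit floating point actually represents is useful. The sketch below simulates round‑to‑nearest FP4 quantization in NumPy, assuming the common E2M1 encoding (1 sign, 2 exponent, 1 mantissa bit); the article does not specify Maia 200's actual FP4 format or block‑scaling scheme, so treat this purely as an illustration.

```python
import numpy as np

# The eight non-negative magnitudes representable in E2M1 FP4.
# Assumed format; the article does not state Maia 200's encoding.
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray) -> np.ndarray:
    """Round each value to the nearest representable FP4 value,
    preserving sign. Real hardware adds a per-block scale factor."""
    sign = np.sign(x)
    mag = np.abs(x)
    idx = np.argmin(np.abs(mag[..., None] - FP4_E2M1), axis=-1)
    return sign * FP4_E2M1[idx]

weights = np.array([0.12, -0.87, 2.4, -5.1])
print(quantize_fp4(weights))  # -> [ 0. -1.  2. -6.]
```

Coarse as this looks, weights stored this way take half the memory of FP8, which is what lets the throughput and bandwidth figures below pay off for inference.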

Key Specifications of Maia 200

  • Process technology: TSMC 3‑nanometer
  • Precision support: Native FP4 and FP8 tensor cores
  • Memory subsystem: 216 GB HBM3e with 7 TB/s bandwidth
  • On‑chip SRAM: 272 MB
  • Transistor count: >140 billion
  • Compute capability: >10 petaFLOPS @ FP4, >5 petaFLOPS @ FP8
  • Power envelope: 750 W per system‑on‑chip (see the back‑of‑the‑envelope ratios below)
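
Two ratios fall directly out of these figures. The sketch below uses the article's headline numbers, not measured benchmarks:

```python
# Ratios derived from the listed specs; headline figures, not benchmarks.
fp4_flops = 10e15   # >10 petaFLOPS at FP4
bandwidth = 7e12    # 7 TB/s HBM3e
power = 750         # 750 W power envelope

print(f"FP4 efficiency:       {fp4_flops / power / 1e12:.1f} TFLOPS/W")
print(f"Arithmetic intensity: {fp4_flops / bandwidth:.0f} FLOPs/byte to saturate compute")
```

That works out to roughly 13 TFLOPS per watt, and about 1,400 FP4 operations per byte moved before the chip becomes compute‑bound rather than memory‑bound.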

Performance Compared to Competitors

FP4 Performance Edge

Maia 200 delivers roughly three times the FP4 throughput of Amazon’s third‑generation Trainium accelerator, enabling faster token generation for large language models.

FP8 Performance Edge

In FP8 precision, the chip surpasses the performance of Google’s seventh‑generation TPU, offering higher efficiency for inference workloads that can tolerate 8‑bit arithmetic.

Economic Benefits for AI Workloads

  • Performance‑per‑dollar: Up to 30 % improvement over the latest generation of inference hardware deployed in Microsoft’s data centers.
  • Cost reduction: Lower power draw and reduced off‑chip data movement translate into lower operating expenses for services such as Microsoft 365 Copilot and Azure OpenAI.
  • Scalability: Designed for large‑scale deployment across Azure regions, supporting massive inference workloads.

Developer Support and SDK

  • Maia SDK includes native PyTorch integration.
  • Integrated Triton compiler for optimized kernel generation (see the kernel sketch after this list).
  • Low‑level programming language for fine‑grained control of tensor operations.
  • Tools simplify model porting across Microsoft’s heterogeneous AI infrastructure.
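
Since the article names Triton as the kernel‑authoring path, a minimal example of the kind of kernel that compiler consumes may help. The sketch below uses the standard open‑source Triton API and runs on a stock GPU backend; nothing here is Maia‑specific, and how a Maia device would be selected is not documented in the article.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized slice of the vectors.
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n                        # guard the ragged final block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

n = 4096
x = torch.randn(n, device="cuda")   # "cuda" here; the device string a
y = torch.randn(n, device="cuda")   # Maia backend would use is unknown
out = torch.empty_like(x)
add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK=1024)
assert torch.allclose(out, x + y)
```

Authoring kernels at this level, rather than in a vendor ISA, is what makes the cross‑hardware porting story plausible: the same Triton source can be retargeted by each backend's compiler.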

Deployment in Azure Regions

  • Initial production deployment in the US Central Azure region (near Des Moines, Iowa).
  • Second deployment planned for US West 3 (near Phoenix, Arizona).
  • Additional regional rollouts scheduled to expand coverage.

Implications for Microsoft AI Strategy

Maia 200 marks a strategic shift toward hardware independence, reinforcing Microsoft’s AI services with purpose‑built silicon. By coupling high‑throughput tensor cores with a high‑bandwidth memory system, the chip addresses data‑movement bottlenecks that limit conventional GPUs. The resulting performance gains are expected to lower token generation costs, accelerate synthetic‑data pipelines, and strengthen Microsoft’s competitive position in the fast‑growing inference market.
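
A simple worked example shows why the memory system dominates token generation. At batch size 1, LLM decoding is memory‑bound: each generated token must stream the full weight set through memory once, so bandwidth sets the speed ceiling. The model size below is an assumption for illustration; the article names no specific model.

```python
# Illustrative decode ceiling for a hypothetical 70B-parameter model
# served in FP4 (assumed values; not from the article).
params = 70e9
bytes_per_param = 0.5        # FP4 = 4 bits = 0.5 bytes
bandwidth = 7e12             # 7 TB/s HBM3e

model_bytes = params * bytes_per_param   # 35 GB of weights
s_per_token = model_bytes / bandwidth    # ~5 ms per token
print(f"{s_per_token * 1e3:.1f} ms/token -> "
      f"{1 / s_per_token:.0f} tokens/s per chip (batch 1, no KV cache)")
```

Under those assumptions the 7 TB/s memory system, not raw FLOPS, sets the roughly 200‑tokens‑per‑second ceiling, which is exactly the data‑movement bottleneck described above.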