NII Announces Full‑Duplex Japanese Speech Model Moshi‑v1


NII has released LLM‑jp‑Moshi‑v1, the first commercially usable full‑duplex Japanese speech model, letting you speak and hear responses in real time. Built on a 7‑billion‑parameter architecture and trained on more than 70 000 hours of dialogue, the model runs on standard hardware and is available under an Apache 2.0 licence, so you can integrate it without royalty fees.

What the Model Does

LLM‑jp‑Moshi‑v1 begins processing spoken input the moment it arrives and starts generating a natural‑sounding spoken reply with almost no delay. In practice, the system behaves like a human interlocutor who can listen and talk simultaneously, a capability most Japanese language models still lack.
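The idea above can be sketched as a frame‑by‑frame loop: the model ingests one short audio frame and emits one reply frame in the same step, so listening and speaking are interleaved rather than strictly alternating. This is a minimal illustration, not the released inference code; `fake_model_step` and the frame labels are stand‑ins.

```python
def fake_model_step(in_frame):
    # Placeholder for the model's streaming step; a real deployment
    # would call the actual inference code here.
    return f"reply-to-{in_frame}"

def run_duplex(mic_frames):
    """Consume incoming frames and emit reply frames in lockstep."""
    spoken = []
    for frame in mic_frames:          # frames keep arriving mid-utterance
        out = fake_model_step(frame)  # one step: ingest input + generate output
        spoken.append(out)            # playback would begin immediately
    return spoken

print(run_duplex(["f0", "f1", "f2"]))
```

Because each step both consumes and produces audio, the system never has to decide "whose turn it is" before speaking.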

Key Features

  • Full‑duplex interaction – responds while the user is still speaking.
  • 7 billion parameters for robust language understanding.
  • 70 000+ hours of dialogue training data, covering casual to formal styles.
  • Apache 2.0 licence – no royalty fees.

Improvements Over Previous Models

The new model upgrades its predecessor, J‑Moshi, in two key ways: naturalness and semantic appropriateness. Researchers found that chat‑style corpora produce smoother, overlapping turns, while consultation‑style data encourage slower, more deliberate exchanges. Fine‑tuning on high‑quality audio boosts perceived naturalness, but mixing mismatched data can erode linguistic coherence.

Why Full‑Duplex Matters

Most voice assistants operate in half‑duplex mode, waiting for the user to finish speaking before responding. Full‑duplex lets a system interject, clarify, or confirm while the speaker is still talking—mirroring a real phone call. This cuts latency, saving seconds in call‑center automation, tele‑health triage, or any scenario where speed matters.
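The latency gap is easy to quantify with a back‑of‑the‑envelope model: a half‑duplex assistant must wait for the whole utterance before replying, while a full‑duplex one can begin after the first frame. The 80 ms frame size here is an assumed figure for illustration, not a published spec of the model.

```python
FRAME_MS = 80  # assumed frame duration, for illustration only

def half_duplex_first_reply(num_frames):
    # Half-duplex: must hear the entire utterance before responding.
    return num_frames * FRAME_MS

def full_duplex_first_reply(num_frames):
    # Full-duplex: can begin responding after the first frame arrives.
    return FRAME_MS

utterance = 50  # 50 frames of 80 ms = a 4-second turn
print(half_duplex_first_reply(utterance))  # 4000 ms before the reply starts
print(full_duplex_first_reply(utterance))  # 80 ms before the reply starts
```

For a four‑second customer turn, that is the difference between a four‑second pause and a near‑instant acknowledgement.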

Commercial‑Ready Open Licence

Releasing Moshi‑v1 under Apache 2.0 signals NII’s intent to see rapid adoption. The open licence means startups, telecom operators, and enterprises can integrate the model without royalty fees, giving Japanese firms a home‑grown alternative that respects local data‑privacy norms.

Practitioner Perspective

Engineers say the model offers a practical path to upgrade existing IVR systems without a massive rewrite. Because Moshi‑v1 runs on standard hardware and can be accessed via a simple API, teams can prototype a real‑time bilingual support line in weeks rather than months. Preserving turn‑taking cues—pauses, overlaps, quick confirmations—helps keep customers from feeling they’re talking to a stiff bot.
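A prototype integration might wrap the model behind a thin session object that an existing IVR can call per audio frame. Everything below is hypothetical: the class name, methods, and endpoint are illustrative placeholders, not the model's actual API, and no network connection is made.

```python
class DuplexSession:
    """Hypothetical wrapper for streaming audio to a full-duplex model."""

    def __init__(self, endpoint="wss://example.invalid/duplex"):
        self.endpoint = endpoint  # placeholder URL; nothing is contacted
        self.log = []

    def push_audio(self, frame):
        # In production this would stream a mic frame to the model and
        # receive synthesized reply audio in the same round trip.
        reply = f"ack:{frame}"  # stubbed response for the sketch
        self.log.append(reply)
        return reply

session = DuplexSession()
print(session.push_audio("konnichiwa"))
```

The point of the wrapper is that the IVR's existing call loop only changes in one place: instead of waiting for end‑of‑utterance detection, it forwards every frame and plays back whatever comes back.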

Future Directions

The NII team plans to extend the model to multilingual settings and domain‑specific fine‑tuning, such as medical triage or financial advice. Ongoing research also explores a curriculum‑learning approach, gradually feeding the model more complex turn‑taking patterns to push conversational fluidity even further.

Bottom Line

If you’ve ever wanted a voice assistant that can talk over you like a human agent, Moshi‑v1 is the first openly licensed tool that makes that possible for Japanese. Its launch not only marks a technical milestone but also opens the door for a wave of commercial applications that can finally treat spoken Japanese with real‑time agility.