Microsoft AI, the tech giant’s research lab, announced three foundational AI models on Thursday that can transcribe speech into text and generate voice and images. The release signals Microsoft’s continued push to build its own stack of multimodal AI models and compete with rival AI labs, even as it remains tied to OpenAI.
MAI-Transcribe-1 transcribes speech in 25 languages and is 2.5 times faster than Microsoft’s Azure Fast offering, according to a company press release. MAI-Voice-1 is an audio-generation model that can produce 60 seconds of audio in one second and lets users create a custom voice. MAI-Image-2, meanwhile, is an image-generation model.
MAI-Image-2 first launched on March 19 on MAI Playground, Microsoft’s new platform for testing its AI models. Now all three models are available on Microsoft Foundry, and the transcription and voice models are also available in MAI Playground.
The models were developed by Microsoft’s MAI Superintelligence team, which was formed and announced in November 2025 and is led by Mustafa Suleyman, CEO of Microsoft AI.
“At Microsoft AI, we’re building Humanist AI. We have a distinct view when creating our AI models—putting humans at the center, optimizing for how people actually communicate, training for practical use,” Suleyman wrote in a blog post. “You’ll see more models from us soon in Foundry and directly in Microsoft products and experiences.”
In an increasingly crowded AI model market, Microsoft AI is making its case on price, offering the new models at a lower cost than comparable offerings from Google and OpenAI. MAI-Transcribe-1 starts at $0.36 per hour, MAI-Voice-1 starts at $22 per 1 million characters, and MAI-Image-2 starts at $5 per 1 million text input tokens and $33 per 1 million image output tokens.
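To make those starting rates concrete, the short Python sketch below turns them into a rough monthly estimate. It assumes billing scales linearly with usage at the quoted starting prices and that the transcription rate is per hour of input audio; the `Workload` figures and the `estimate_cost` helper are hypothetical illustrations, not part of any Microsoft API, and actual charges may differ with tiers, regions, or other terms.

```python
from dataclasses import dataclass

# Starting list prices quoted in the article (USD).
# Assumption: costs scale linearly with usage; real invoices may differ.
TRANSCRIBE_PER_HOUR = 0.36          # MAI-Transcribe-1, per hour (assumed: of input audio)
VOICE_PER_MILLION_CHARS = 22.00     # MAI-Voice-1, per 1M characters
IMAGE_TEXT_IN_PER_MILLION = 5.00    # MAI-Image-2, per 1M text input tokens
IMAGE_OUT_PER_MILLION = 33.00       # MAI-Image-2, per 1M image output tokens


@dataclass
class Workload:
    """Hypothetical monthly usage figures, for illustration only."""
    audio_hours: float
    voice_chars: int
    image_text_tokens: int
    image_output_tokens: int


def estimate_cost(w: Workload) -> dict[str, float]:
    """Return per-model and total cost estimates in USD, rounded to cents."""
    costs = {
        "MAI-Transcribe-1": w.audio_hours * TRANSCRIBE_PER_HOUR,
        "MAI-Voice-1": w.voice_chars / 1_000_000 * VOICE_PER_MILLION_CHARS,
        "MAI-Image-2": (
            w.image_text_tokens / 1_000_000 * IMAGE_TEXT_IN_PER_MILLION
            + w.image_output_tokens / 1_000_000 * IMAGE_OUT_PER_MILLION
        ),
    }
    # Sum the unrounded per-model costs, then round everything for display.
    costs["total"] = sum(costs.values())
    return {name: round(value, 2) for name, value in costs.items()}


if __name__ == "__main__":
    # Hypothetical month: 100 hours of meeting audio, 500,000 characters of
    # narration, 200,000 prompt tokens and 1,000,000 image-output tokens.
    usage = Workload(audio_hours=100, voice_chars=500_000,
                     image_text_tokens=200_000, image_output_tokens=1_000_000)
    for name, cost in estimate_cost(usage).items():
        print(f"{name}: ${cost:.2f}")
```

For that hypothetical workload the sketch prints $36.00 for transcription, $11.00 for voice, $34.00 for images, and $81.00 in total. Keeping the rates as named constants makes it easy to swap in updated prices if Microsoft revises them.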
Despite the new releases, Suleyman reaffirmed Microsoft’s commitment to its partnership with OpenAI in an interview with VentureBeat. Recent negotiations between the two companies cleared the way for Microsoft to pursue this superintelligence research, suggesting the partnership remains central even as Microsoft builds its own models.
Microsoft’s investment of more than $13 billion in OpenAI underscores its long-term ambitions in AI. The company continues to integrate AI across its products, reflecting a dual strategy of developing in-house capabilities while leaning on partnerships.
Strategic Impact on Business Automation
Microsoft’s new foundational models are a notable development for business automation. Organizations looking to improve efficiency with AI now have access to tools that can transcribe speech, generate custom voice content, and create images.
For developers, the models offer an opportunity to build applications around these capabilities, which could change how businesses engage with customers and streamline operations. Their competitive pricing may also let smaller firms use high-quality AI tools that were previously more practical for larger enterprises.
For business owners, these tools could raise productivity and cut operational costs by automating time-consuming tasks such as transcribing meetings and producing marketing materials. Businesses that adopt them effectively may gain an edge in their industries, using AI to scale and improve customer experiences.
In the broader AI ecosystem, these developments might encourage increased competition and innovation. As Microsoft positions itself against established players like OpenAI and Google, industry-wide pressure to enhance performance, reduce costs, and improve accessibility could ultimately benefit consumers and businesses alike.