Microsoft has launched Phi-4-multimodal and Phi-4-mini, the latest additions to its Phi family of small language models (SLMs). These models are now available on Azure AI Foundry, Hugging Face, and the NVIDIA API Catalog.
Phi-4-multimodal is a 5.6 billion-parameter model that integrates speech, vision, and text processing. “By leveraging advanced cross-modal learning techniques, this model enables more natural and context-aware interactions, allowing devices to understand and reason across multiple input modalities simultaneously,” said Weizhu Chen, vice president of generative AI at Microsoft.
Last year, Microsoft launched Phi-4, a 14 billion-parameter model that excels at complex reasoning.
Phi-4-multimodal supports applications including document analysis and speech recognition. According to Microsoft, it surpasses Google’s Gemini 2.0 Flash and Gemini 1.5 Pro on multimodal audio and visual benchmarks and is comparable to OpenAI’s GPT-4o.
The company said the model has demonstrated strong performance in speech-related tasks, surpassing models such as WhisperV3 and SeamlessM4T-v2-Large in automatic speech recognition and speech translation. It also ranks first on the Hugging Face OpenASR leaderboard with a word error rate of 6.14%. The model shows competitive results in document and chart understanding, optical character recognition (OCR), and visual science reasoning.
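For readers unfamiliar with the metric, word error rate is generally computed as the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The Python sketch below illustrates that general definition only. The function name is our own, and the sketch omits the text normalisation that real evaluations typically apply, so it should not be read as the leaderboard's actual scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (name and behaviour are assumptions for this sketch).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

A score of 6.14% therefore means roughly six word-level errors per hundred reference words, and lower is better.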
Phi-4-mini, by contrast, is a 3.8 billion-parameter text-based model built for reasoning, coding, and long-context tasks. It handles sequences of up to 128,000 tokens with reduced computational requirements, and it supports function calling, which allows integration with external tools and APIs.
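In broad terms, function calling means the model emits a structured request naming a tool and its arguments, and the host application runs that tool and passes the result back. The Python sketch below shows only the application side of that loop. The JSON shape, the `TOOLS` registry, and the `celsius_to_fahrenheit` tool are hypothetical choices made for illustration and are not Phi-4-mini's documented output format.

```python
import json


def celsius_to_fahrenheit(celsius: float) -> float:
    """Hypothetical tool that the application exposes to the model."""
    return celsius * 9 / 5 + 32


# Registry of callable tools, keyed by the name the model is told to use.
TOOLS = {"celsius_to_fahrenheit": celsius_to_fahrenheit}


def dispatch_tool_call(model_output: str):
    """Parse an assumed JSON tool call and run the matching local function.

    Assumed format: {"name": "<tool name>", "arguments": {...}}
    """
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


if __name__ == "__main__":
    # Stand-in for text a model might produce when asked to convert 100 C.
    simulated_output = '{"name": "celsius_to_fahrenheit", "arguments": {"celsius": 100}}'
    print(dispatch_tool_call(simulated_output))
```

Restricting execution to an explicit registry, as here, keeps the model from invoking anything the application has not deliberately exposed.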
Both models are suitable for deployment in constrained computing environments. They can be optimised with ONNX Runtime for cross-platform availability and lower latency.
Microsoft is incorporating these models into its ecosystem, including Windows applications and Copilot+ PCs. “Copilot+ PCs will build upon Phi-4-multimodal’s capabilities, delivering the power of Microsoft’s advanced SLMs without the energy drain,” said Vivek Pradeep, vice president and distinguished engineer of Windows Applied Sciences.
Developers can access Phi-4-multimodal and Phi-4-mini on multiple platforms and explore their applications in various industries, including finance, healthcare, and automotive technology.