Microsoft’s Windows lead, Pavan Davuluri, has indicated that the upcoming iteration of Windows will be “more ambient, pervasive, and multimodal,” as artificial intelligence (AI) is set to redefine the desktop interface and user interaction with computers.
In a recent video interview, Davuluri, who serves as Microsoft CVP and Windows boss, elaborated on the future of the platform, highlighting significant changes anticipated for the operating system. When questioned about the impact of AI on human-computer interaction, Davuluri stated, “I think we will see computing become more ambient, more pervasive, continue to span form factors, and certainly become more multimodal in the arc of time … I think experience diversity is the next space where we will continue to see voice becoming more important. Fundamentally, the concept that your computer can actually look at your screen and is context aware is going to become an important modality for us going forward.”
This is not the first instance of Microsoft hinting that voice will become a primary input method for Windows. A “Windows 2030 Vision” video, released just a week earlier by Microsoft’s CVP of Enterprise & Security, explored a similar future for the operating system, reinforcing the company’s direction. The forthcoming Windows experience is expected to treat voice as a first-class input method, allowing users to interact with the OS using natural language. This capability would enable the system to understand user intent based on on-screen context, supplementing traditional mouse and keyboard inputs.
Davuluri suggested that the visual appearance of Windows is likely to evolve significantly due to agentic AI. He noted, “I think what human interfaces look like today and what they look like in five years from now is one big area of thrust for us that Windows continues to evolve. The operating system is increasingly agentic and multimodal … that is an area of tremendous investment and change for us.”
Furthermore, Davuluri emphasized the crucial role of cloud computing in enabling these future experiences. He explained, “Compute will become pervasive, as in Windows experiences are going to use a combination of capabilities that are local and that are in the cloud. I think it’s our responsibility to make sure they’re seamless to our customers.” This points to a hybrid approach that leverages both local processing power and cloud-based resources.
Microsoft appears to be positioning Windows as an integrated AI assistant, moving beyond the current model where AI assistants function primarily as separate applications or overlays on existing operating systems. Unlike current AI assistants such as Copilot on Windows, Gemini on Android, or Siri on Mac, which operate as distinct applications or floating windows, the future Windows is envisioned to have AI intrinsically woven throughout its architecture. This fundamental integration of AI suggests a profound shift in how the operating system is designed and utilized.
This transformative shift is anticipated to materialize within the next five years, potentially with the release of Windows 12. Multiple high-level Microsoft executives have alluded to this being a significant paradigm shift for both the platform and computing at large, driven by advancements in AI.
While the idea of voice becoming a primary and reliable input method for PCs may initially seem hard for some users to grasp, the integration of agentic AI and the OS’s ability to comprehend user intent and natural language are expected to make the experience feel more intuitive than anticipated. The shift is not exclusive to Microsoft; Apple is also rumored to be developing a voice-centric feature for iOS 26 that would let iPhone users navigate applications solely by voice commands, indicating a broader industry trend toward enhanced voice interaction.
On Windows, voice is likely to serve as an additional input method, alongside the mouse and keyboard, creating three primary interaction modalities: typing, touch/mouse, and voice. While voice input may not be mandatory for task completion, its inclusion is expected to streamline workflows and enhance productivity.
However, the widespread adoption of such AI-driven experiences is expected to raise significant privacy concerns. These advanced functionalities will require access to substantial amounts of personal user data to optimize their utility. Coupled with Microsoft’s stated need for a balance between local and cloud computing to facilitate these experiences, there is an anticipation of potential public pushback regarding data privacy and security.