
Recently, OpenAI published an official blog post announcing that ChatGPT's "Voice Mode" has been fully integrated into the main chat interface, another step forward in its multimodal interaction capabilities. After the update, users can start voice conversations directly from the main window without switching to a separate module, while text, images, and maps are rendered in real time alongside the conversation, making the interaction noticeably more seamless and functional.
The core advance of the new Voice Mode is the coordinated delivery of visual and auditory output. According to the demonstration, when a user asks a question by voice, ChatGPT not only answers in natural speech but simultaneously displays relevant charts, images, or maps in the chat interface, and automatically generates a transcript the user can review afterward. This design suits scenarios that benefit from several kinds of information at once, such as travel planning or data analysis. To accommodate different habits, OpenAI has kept a "separate Voice Mode" option in the settings, so users who prefer a pure audio experience can easily revert to the old interface.
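For developers who want to approximate this voice-in, voice-plus-text-out loop with OpenAI's public API, the following is a minimal sketch chaining speech transcription, a chat completion, and text-to-speech. It assumes the standard openai Python SDK and an OPENAI_API_KEY in the environment; the model names, file paths, and the voice_turn helper are illustrative choices, not a description of how the ChatGPT app itself is built.

```python
# Hypothetical sketch: one voice turn that yields both a reviewable
# transcript and a spoken reply, loosely mirroring the coordinated
# audio/visual behavior described above. Not OpenAI's internal design.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def voice_turn(audio_path: str) -> None:
    # 1. Transcribe the user's spoken question (assumed input recording).
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=f
        )

    # 2. Generate a text answer; in the product, charts, images, or maps
    #    would be rendered in the chat interface alongside this reply.
    chat = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": transcript.text}],
    )
    answer = chat.choices[0].message.content

    # 3. Surface the transcript and answer as reviewable text...
    print("You said:", transcript.text)
    print("Assistant:", answer)

    # 4. ...and speak the same answer aloud.
    speech = client.audio.speech.create(
        model="tts-1", voice="alloy", input=answer
    )
    speech.write_to_file("reply.mp3")


if __name__ == "__main__":
    voice_turn("question.wav")  # hypothetical input file
```

A real client would stream audio in both directions rather than round-tripping files, but the three-stage structure is the same: a single spoken turn produces both a transcript the user can review and an audible reply.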
This update is the latest step in OpenAI's broader product roadmap. It follows a string of recent launches, including an AI shopping assistant, the Atlas browser (with iCloud Keychain support), group chat functionality, and the GPT-5.1 model. By continuously expanding its product boundaries, OpenAI is assembling a more comprehensive AI ecosystem, and the deep integration of voice and visuals opens up new application scenarios.