Google’s Gemini AI assistant has expanded its capabilities with the introduction of audio file uploads, allowing users to transcribe, summarize, and extract key information from voice recordings.
The feature accepts audio files up to 10 minutes long, letting users turn voice memos, meetings, lectures, and interviews into searchable text. It is available in both the web and mobile apps through the standard file-upload interface. According to Josh Woodward, Google’s VP of Gemini, audio uploads were the feature users requested most. It is separate from Gemini Live, which focuses on real-time voice interaction. In testing, Gemini accurately transcribed a range of audio, including sketches from comedy albums and phone conversations, with only minor errors, mostly in recognizing names. It also did a good job of picking out items suitable for a to-do list.
Audio processing follows other recent Gemini updates, including app integrations, a card-based visual interface, and expanded personalization options. Users can now turn saved audio logs and memos into searchable content without first running them through separate transcription software. ChatGPT, Anthropic’s Claude, and Perplexity also offer audio processing, but Gemini’s implementation is aimed at everyday tasks: users can ask it to simplify language, isolate comments from a specific speaker, generate questions, or build study guides from a recording.
However, the 10-minute limit per file and the daily usage caps for free-tier users may restrict how often the feature can be used. Google has not announced separate pricing for high-volume audio processing. For now, audio uploads count against the regular Gemini quota, so users with large amounts of audio should plan their usage accordingly. Overall, the feature offers a simple way to pull useful information out of audio files for both personal and professional tasks.