Google has unveiled EmbeddingGemma, a new on-device embedding model with 308 million parameters, designed to run efficiently on mobile devices and in offline environments.
The model achieves sub-15 ms inference latency for 256 input tokens on EdgeTPU, making it suitable for real-time applications. Trained on a diverse dataset spanning more than 100 languages, EmbeddingGemma ranks first on the Massive Text Embedding Benchmark (MTEB) among models with fewer than 500 million parameters.
According to Google, EmbeddingGemma's performance rivals or surpasses that of embedding models nearly twice its size, particularly on cross-lingual retrieval and semantic search tasks. The model is available on Hugging Face, alongside a more detailed analysis and technical documentation.
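
As a concrete illustration of the semantic-search use case, the sketch below ranks a few documents against a query using the sentence-transformers library. It assumes the checkpoint is published under the `google/embeddinggemma-300m` identifier on Hugging Face and that a plain cosine-similarity ranking is appropriate; neither detail is confirmed by this article.

```python
# Minimal semantic-search sketch with EmbeddingGemma via sentence-transformers.
# Assumptions not confirmed by the article: the Hugging Face model id
# "google/embeddinggemma-300m" and plain cosine-similarity ranking.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")

# A small document collection to search over.
docs = [
    "EmbeddingGemma is a 308M-parameter on-device embedding model.",
    "The model was trained on text in more than 100 languages.",
    "Sub-15 ms latency for 256 tokens was measured on EdgeTPU.",
]
query = "How fast is the model on edge hardware?"

# Encode the query and the documents into dense vectors.
doc_emb = model.encode(docs)
query_emb = model.encode(query)

# Rank documents by cosine similarity to the query.
scores = util.cos_sim(query_emb, doc_emb)[0]
for score, doc in sorted(zip(scores.tolist(), docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```

Because the encoder runs locally rather than behind an API, the same pattern works offline, which is the scenario the sub-15 ms EdgeTPU figure targets.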