Apple has made FastVLM, its vision language model (VLM), available for testing directly in a web browser, allowing users with Apple Silicon Macs to experience its near-instant, high-resolution image processing.
The model, initially released a few months ago, leverages MLX, Apple’s open-source machine learning framework optimized for Apple Silicon. FastVLM is reported to caption video up to 85 times faster than comparable models while being more than three times smaller.
Apple has expanded the project’s availability beyond GitHub, making it accessible on Hugging Face as well. Users can now load the lightweight FastVLM-0.5B model directly in the browser to evaluate its performance; in early tests, loading took a couple of minutes on a 16GB M2 Pro MacBook Pro.
Once loaded, the model accurately describes the user’s appearance, surroundings, expressions, and objects in view. Users can type their own prompt or choose from presets such as “Describe what you see in one sentence,” “What is the color of my shirt?”, “Identify any text or written content visible,” “What emotions or actions are being portrayed?” and “Name the object I am holding in my hand.”
Furthermore, users can employ a virtual camera app to feed live video to the tool, which then describes successive scenes in detail almost instantly, highlighting the model’s speed and accuracy. A key feature of this implementation is that it runs locally in the browser, preserving privacy because no data leaves the device.
The model can also operate offline, making it suitable for applications in wearables and assistive technology, where low latency and efficiency are crucial. The browser-based demo utilizes the 0.5-billion-parameter version of FastVLM. The FastVLM family includes larger variants with 1.5 billion and 7 billion parameters.
While these larger models could offer higher-quality output, running them directly in the browser is impractical due to resource constraints. Apple encourages users to test the model and share their feedback.