Enfabrica, a startup backed by Nvidia, has unveiled Emfasys, a system designed to expand server memory capacity for demanding AI inference workloads with up to 18TB of additional DDR5 memory pooled over Ethernet.
The rack-compatible Emfasys system is built around Enfabrica’s ACF-S SuperNIC, which delivers 3.2 Tb/s (400 GB/s) of throughput and attaches DDR5 memory over CXL. 4-way and 8-way GPU servers reach the resulting memory pool through standard 400G or 800G Ethernet ports, using Remote Direct Memory Access (RDMA) over Ethernet so the system slots into existing AI server infrastructure.
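For scale, a quick back-of-the-envelope check of what 3.2 Tb/s means for moving inference state. This is an illustrative calculation only; it assumes line-rate transfers and ignores RDMA and Ethernet protocol overhead:

```c
#include <stdio.h>

int main(void)
{
    /* ACF-S headline throughput: 3.2 Tb/s aggregate. */
    const double tbits_per_s  = 3.2;
    const double gbytes_per_s = tbits_per_s * 1000.0 / 8.0;  /* = 400 GB/s */

    /* Illustrative: wall-clock time to move a 1 GB slab of inference
     * state at line rate, ignoring protocol overhead and congestion. */
    const double slab_gb = 1.0;
    printf("%.0f GB/s aggregate; a %.0f GB transfer takes ~%.2f ms at line rate\n",
           gbytes_per_s, slab_gb, slab_gb / gbytes_per_s * 1000.0);
    return 0;
}
```

At that rate, even gigabyte-scale transfers complete in a few milliseconds, which is what makes an Ethernet-attached memory tier plausible for inference state that does not need to live in HBM full-time.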
Data moves between GPU servers and the Emfasys pool via RDMA, providing zero-copy access with latencies measured in microseconds and no CPU intervention; within the appliance, the DDR5 itself is attached to the ACF-S via the CXL.mem protocol. Accessing the pool requires Enfabrica’s memory-tiering software, which manages transfer latency and data placement. That software runs within existing hardware and OS environments and builds on established RDMA interfaces, so deployment does not require major architectural modifications.
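Enfabrica has not published its software interface, but the behavior described above, zero-copy one-sided reads over standard RDMA, maps directly onto the stock Linux libibverbs API. The sketch below is illustrative, not Enfabrica’s code: it posts an RDMA READ against a remote buffer whose address and rkey would have been exchanged during connection setup (elided here), the kind of primitive a tiering layer could use to pull a block from a remote DDR5 pool.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <stddef.h>

/* Pull `len` bytes from a remote memory region into a local registered
 * buffer using a one-sided RDMA READ. QP/CQ creation and the exchange of
 * remote_addr/rkey (e.g. via rdma_cm) are assumed to have happened already. */
int rdma_read_block(struct ibv_qp *qp, struct ibv_cq *cq,
                    void *local_buf, struct ibv_mr *local_mr,
                    uint64_t remote_addr, uint32_t rkey, size_t len)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = (uint32_t)len,
        .lkey   = local_mr->lkey,
    };
    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_READ,   /* one-sided: remote CPU is not involved */
        .send_flags = IBV_SEND_SIGNALED,  /* request a completion entry */
    };
    wr.wr.rdma.remote_addr = remote_addr; /* address within the remote memory pool */
    wr.wr.rdma.rkey        = rkey;        /* remote key from connection setup */

    struct ibv_send_wr *bad_wr = NULL;
    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* Busy-poll the completion queue; on a RoCE fabric this typically
     * completes in microseconds. */
    struct ibv_wc wc;
    int n;
    while ((n = ibv_poll_cq(cq, 1, &wc)) == 0)
        ;
    return (n < 0 || wc.status != IBV_WC_SUCCESS) ? -1 : 0;
}
```

Because the read is one-sided, no process on the remote side handles the request in software, which is what keeps the "zero-copy, no CPU intervention" claim and the microsecond latency figure consistent.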
Enfabrica’s Emfasys targets the growing memory demands of modern AI applications, particularly long prompts, large context windows, and multi-agent workloads, all of which strain GPU-attached HBM. By offloading to an external memory pool, data center operators can expand the effective memory capacity of individual AI servers for these scenarios.
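To make the tiering idea concrete, here is a minimal sketch of how a hypothetical two-tier KV-cache might promote and demote blocks between HBM and an Ethernet-attached pool. Every name in it (`kv_block`, `kv_get`, the staging-buffer scheme) is invented for illustration; Enfabrica has not disclosed how its tiering software is structured.

```c
#include <infiniband/verbs.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* From the earlier sketch: posts a one-sided RDMA READ and waits for it. */
int rdma_read_block(struct ibv_qp *qp, struct ibv_cq *cq,
                    void *local_buf, struct ibv_mr *local_mr,
                    uint64_t remote_addr, uint32_t rkey, size_t len);

/* Hypothetical two-tier entry for a KV-cache block: hot blocks stay in
 * GPU-attached HBM; cold blocks are evicted to the Ethernet-attached DDR5
 * pool and pulled back on demand. */
struct kv_block {
    void     *hbm_ptr;      /* valid while resident in local HBM */
    uint64_t  remote_addr;  /* location in the remote DDR5 pool */
    uint32_t  rkey;         /* RDMA remote key for that region */
    bool      resident;
    uint64_t  last_use;     /* input to an LRU-style eviction policy */
};

/* Return a usable local pointer, fetching over RDMA if the block was
 * tiered out to the pool. `staging` is a pre-registered local buffer. */
void *kv_get(struct kv_block *b, void *staging, size_t len,
             struct ibv_qp *qp, struct ibv_cq *cq, struct ibv_mr *mr,
             uint64_t now)
{
    b->last_use = now;
    if (b->resident)
        return b->hbm_ptr;               /* hot path: HBM hit */

    /* Cold path: one-sided read from the memory pool; microseconds,
     * no CPU involvement on the remote side. */
    if (rdma_read_block(qp, cq, staging, mr, b->remote_addr, b->rkey, len))
        return NULL;
    b->hbm_ptr  = staging;               /* promote back into HBM */
    b->resident = true;
    return staging;
}
```

The design point such a layer has to get right is exactly the one the article attributes to Enfabrica’s software: hiding the remote-fetch latency well enough that the GPU rarely stalls on a cold block.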
By adopting the Emfasys memory pool, AI server operators can improve utilization of compute resources, reduce stranding of expensive GPU memory, and lower overall infrastructure costs. Enfabrica claims this configuration can cut the cost per AI-generated token by up to 50% in multi-turn, long-context scenarios. Furthermore, token-generation work can be distributed more evenly across servers, mitigating potential bottlenecks.
“AI inference has a memory bandwidth-scaling problem and a memory margin-stacking problem,” said Rochan Sankar, CEO of Enfabrica. “As inference gets more agentic versus conversational, more retentive versus forgetful, the current ways of scaling memory access won’t hold. We built Emfasys to create an elastic, rack-scale AI memory fabric and solve these challenges in a way that has not been done before. Customers are excited to partner with us to build a far more scalable memory movement architecture for their GenAI workloads and drive even better token economics.”
The Emfasys AI memory fabric system and the 3.2 Tb/s ACF SuperNIC chip are currently undergoing evaluation and testing by select customers, with the timeline for general availability remaining unclear.
Enfabrica is an advisory member of the Ultra Ethernet Consortium (UEC) and contributes to the Ultra Accelerator Link (UALink) Consortium.