Tekmono
DeepSeek Unveils AI Model with Lower Inference Costs

by Tekmono Editorial Team
30/09/2025
in News

Researchers at DeepSeek have released a new experimental model, V3.2-exp, designed to significantly lower inference costs in long-context operations. The release was announced in a post on Hugging Face, with an accompanying academic paper hosted on GitHub.

The model’s key feature is DeepSeek Sparse Attention, a system that utilizes a “lightning indexer” to prioritize specific excerpts from the context window. Following this, a “fine-granular token selection system” selects specific tokens from within those excerpts, which are then loaded into the module’s limited attention window. This enables the Sparse Attention model to operate over extensive context portions with relatively small server loads.
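The two-stage selection described above can be sketched in a few lines. This is not DeepSeek's code; the function names, the dot-product scoring proxy, and all parameters below are illustrative assumptions. The point is the shape of the pipeline: a cheap coarse pass narrows the context to a few candidate blocks, then a finer pass keeps only the top tokens, so full attention runs over a small window.

```python
import numpy as np

def select_sparse_window(query, keys, block_size=4, top_blocks=2, window=4):
    """Return indices of tokens kept for full attention.

    query: (d,) vector; keys: (n, d) matrix of per-token keys.
    Hypothetical sketch of two-stage sparse selection, not DeepSeek's
    actual indexer or scoring function.
    """
    n, d = keys.shape
    scores = keys @ query  # cheap per-token relevance proxy

    # Stage 1 ("lightning indexer" role): rank coarse blocks of the
    # context by mean score and keep the top few.
    n_blocks = n // block_size
    block_scores = scores[: n_blocks * block_size].reshape(
        n_blocks, block_size
    ).mean(axis=1)
    best_blocks = np.argsort(block_scores)[-top_blocks:]

    # Stage 2 ("fine-granular token selection" role): within the chosen
    # blocks, keep only the highest-scoring individual tokens.
    candidate_idx = np.concatenate(
        [np.arange(b * block_size, (b + 1) * block_size) for b in best_blocks]
    )
    keep = candidate_idx[np.argsort(scores[candidate_idx])[-window:]]
    return np.sort(keep)

rng = np.random.default_rng(0)
keys = rng.normal(size=(16, 8))   # 16-token toy "context window"
query = rng.normal(size=8)
kept = select_sparse_window(query, keys)
print(kept)  # only `window` token indices survive for full attention
```

Because attention cost grows with the number of tokens attended to, shrinking the attended set from the full context to a fixed small window is what drives the server-load savings the article describes.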

DeepSeek’s preliminary testing indicates that the price of a simple API call can be reduced by as much as half in long-context operations. Although further testing is necessary to validate these claims, the model is open-weight and freely available on Hugging Face, allowing third-party evaluations to assess the results presented in the paper.

DeepSeek’s new model is part of a series of recent breakthroughs addressing the issue of inference costs, which represent the server expenses associated with operating a pre-trained AI model, distinct from training costs. The researchers aimed to enhance the fundamental transformer architecture’s efficiency and found significant room for improvement.

Based in China, DeepSeek has been an unconventional player in the AI sector, particularly for those who view AI research as a nationalist competition between the U.S. and China. The company garnered attention earlier this year with its R1 model, trained primarily using reinforcement learning at a lower cost than its American counterparts.

Although R1 did not spark the revolution in AI training that some predicted, and the new release is unlikely to generate the same level of excitement, DeepSeek's "sparse attention" approach could still teach U.S. providers useful strategies for keeping inference costs low.

Tekmono is a Linkmedya brand. © 2015.
