We’re excited to announce that DeepInfra is now a supported Inference Provider on the Hugging Face Hub! DeepInfra joins the growing ecosystem of providers offering serverless inference directly from the model pages on the Hub.
DeepInfra’s serverless AI inference platform offers some of the most cost-effective per-token pricing in the industry, and its catalog of more than 100 models lets developers add a wide range of AI capabilities to their applications with minimal setup.
DeepInfra covers many model types, from large language models (LLMs) to embeddings, text-to-image, and text-to-video. For this initial launch, DeepInfra supports conversational and text-generation tasks on Hugging Face, giving users access to in-demand LLMs such as DeepSeek V4, Kimi-K2.6, and GLM-5.1, among others. Support for additional tasks such as text-to-image and text-to-video is coming soon!
To learn more about using DeepInfra as an Inference Provider, check out its dedicated documentation page. You can also find the full list of models DeepInfra supports here.
To follow DeepInfra on the Hugging Face Hub, you can find their organization at this link.
How it works
In the website UI
- In your user account settings, you can:
  - Set your own API keys for the providers you’ve signed up with. If a custom key isn’t set, your requests are routed through Hugging Face.
  - Order providers by preference. This ordering is reflected in the widgets and code snippets on the model pages.
- When calling Inference Providers, you have two operational modes (illustrated in the sketch after this list):
  - Custom key: requests are sent directly to the inference provider, authenticated with your own API key for that provider.
  - Routed by HF: no provider token is needed; charges are applied to your Hugging Face account instead of a provider account.
- Model pages list the compatible third-party inference providers, ordered by your preferences.
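As a quick illustration of these two modes, here is a minimal Python sketch using the huggingface_hub InferenceClient. The DEEPINFRA_API_KEY environment variable name is a placeholder used only for illustration; store your provider key however you prefer.

```python
import os
from huggingface_hub import InferenceClient

# Routed by HF: authenticate with your Hugging Face token;
# usage is billed to your Hugging Face account.
routed_client = InferenceClient(provider="deepinfra", api_key=os.environ["HF_TOKEN"])

# Custom key: authenticate with your own DeepInfra API key;
# requests go directly to DeepInfra and are billed there.
# (DEEPINFRA_API_KEY is a hypothetical variable name, used for illustration.)
direct_client = InferenceClient(provider="deepinfra", api_key=os.environ["DEEPINFRA_API_KEY"])

# Both clients expose the same methods; see the SDK examples below
# for a full chat completion call.
```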
From the client SDKs
DeepInfra is integrated into the Hugging Face client SDKs: huggingface_hub (version >= 1.11.2) for Python and @huggingface/inference for JavaScript. The examples below show how to run DeepSeek V4 Pro on DeepInfra, authenticating with your Hugging Face token.
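Here is a minimal sketch of the native Python client, assuming your installed huggingface_hub version supports Inference Providers and reusing the model id from the snippets below:

```python
import os
from huggingface_hub import InferenceClient

# Routed through Hugging Face with your HF token; DeepInfra is selected as the provider.
client = InferenceClient(provider="deepinfra", api_key=os.environ["HF_TOKEN"])

completion = client.chat_completion(
    model="deepseek-ai/DeepSeek-V4-Pro",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that returns the nth Fibonacci number using memoization.",
        }
    ],
)

print(completion.choices[0].message)
```

The From Python and From JS sections below show the equivalent request made through the OpenAI-compatible router.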
From your favorite Agent Harness
Hugging Face Inference Providers are supported in multiple agent harnesses, including Pi, OpenCode, Hermes Agents, OpenClaw, and more, so you can use models hosted by DeepInfra from your preferred tools without writing any extra code. Explore the complete list of integrations here.
From Python
import os
from openai import OpenAI

# Point the OpenAI client at the Hugging Face router and authenticate with your HF token.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

# The ":deepinfra" suffix on the model id routes the request to DeepInfra.
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that returns the nth Fibonacci number using memoization.",
        }
    ],
)

print(completion.choices[0].message)
From JS
import { OpenAI } from "openai";

// Point the OpenAI client at the Hugging Face router and authenticate with your HF token.
const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

// The ":deepinfra" suffix on the model id routes the request to DeepInfra.
const chatCompletion = await client.chat.completions.create({
  model: "deepseek-ai/DeepSeek-V4-Pro:deepinfra",
  messages: [
    {
      role: "user",
      content: "Write a Python function that returns the nth Fibonacci number using memoization.",
    },
  ],
});

console.log(chatCompletion.choices[0].message);
Billing
When using a direct request with your inference provider’s API key, you will be billed according to that provider’s rates. For example, using a DeepInfra API key will charge your DeepInfra account.
Conversely, when requests are routed through the Hugging Face Hub, you pay the provider’s standard API rates with no additional markup from Hugging Face. Revenue-sharing agreements with provider partners may be explored in the future.
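For comparison with the routed snippets above, a direct request might look like the sketch below. It assumes DeepInfra exposes an OpenAI-compatible endpoint at https://api.deepinfra.com/v1/openai and that your provider key is stored in a DEEPINFRA_API_KEY environment variable; check DeepInfra’s documentation for the exact base URL and model id.

```python
import os
from openai import OpenAI

# Direct request: call DeepInfra's own OpenAI-compatible endpoint with your DeepInfra key,
# so usage is billed to your DeepInfra account rather than your Hugging Face account.
# The base URL and environment variable name are assumptions for illustration.
client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key=os.environ["DEEPINFRA_API_KEY"],
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro",  # no ":deepinfra" suffix when calling the provider directly
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that returns the nth Fibonacci number using memoization.",
        }
    ],
)

print(completion.choices[0].message.content)
```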
Important Note: PRO users receive $2 worth of inference credits monthly, usable across providers. 🔥
Consider subscribing to the Hugging Face PRO plan to access inference credits, ZeroGPU, Spaces Dev Mode, enhanced limits, and more.
Signed-in free users also get a small free inference allowance, but upgrading to PRO unlocks additional benefits!
Feedback and next steps
Your feedback is invaluable to us! Share any thoughts or comments you have at this link: Hugging Face Hub Discussions.