We are excited to announce that Featherless AI is now a supported Inference Provider on the Hugging Face Hub! Featherless AI joins our growing ecosystem, extending serverless inference directly on the Hub’s model pages. It is also integrated into our client SDKs for JavaScript and Python, so you can use a broad range of models with your preferred providers.
Featherless AI supports a wide variety of text and conversational models, including the latest open-source models from DeepSeek, Meta, Google, Qwen, and many more. Its serverless architecture keeps this diverse catalogue available while remaining cost-efficient.
One of Featherless AI’s standout features is its unique model loading and GPU orchestration abilities. Most providers either offer a limited selection of models at low costs or require users to manage extensive server operations, often leading to high operational costs. Featherless AI strikes a balance, delivering a wide range of models with serverless pricing, optimizing both access and affordability. For a complete list of models available, head over to the models page.
We look forward to witnessing the innovative solutions you’ll create with this new provider!
Curious about how to integrate Featherless as an Inference Provider? Check out its dedicated documentation page for step-by-step instructions.
How it works
In the website UI
- In your user account settings, you can:
  - Set your own API keys for the providers you’ve signed up with. If you do not set a custom key, your requests will be routed through Hugging Face. For further details, refer to the documentation.
  - Order providers based on your preference. This order applies to the widgets and code snippets provided on the model pages.
- When calling Inference Providers, there are two modes:
  - Custom key: This allows requests to be sent directly to the inference provider using your own API key.
  - Routed by Hugging Face: In this mode, no token from the provider is necessary, and charges are applied directly to your Hugging Face account instead of the provider’s account.
- Model pages showcase third-party inference providers compatible with the current model, all sorted according to user preference.
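As a rough illustration of the ordering behaviour described above, the following sketch sorts a model’s compatible providers by a user’s preference list. It is not the Hub’s actual implementation: the function name `order_providers`, the placeholder provider names, and the rule that unranked providers come last are all assumptions made for this example.

```python
def order_providers(compatible, preferences):
    """Sort a model's compatible providers by the user's preference order.

    Assumption for this sketch: providers the user has not ranked keep
    their original relative order and are placed after the ranked ones.
    """
    rank = {name: position for position, name in enumerate(preferences)}
    # sorted() is stable, so unranked providers stay in their input order.
    return sorted(compatible, key=lambda name: rank.get(name, len(preferences)))


# Hypothetical data: "provider-a" and "provider-b" are placeholder names.
compatible = ["provider-a", "featherless-ai", "provider-b"]
preferences = ["featherless-ai", "provider-b"]
print(order_providers(compatible, preferences))
```

With this preference list, Featherless AI would appear first in the widget and in the generated code snippets for any model it serves.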
From the client SDKs
From Python, using huggingface_hub
Here’s an example of how to use the DeepSeek-R1 model with Featherless AI as your inference provider. You can use a Hugging Face token for automatic routing through Hugging Face, or your own Featherless AI API key if you have one.
First, install or upgrade the huggingface_hub library to version 0.33.0 or higher by running:
pip install --upgrade huggingface-hub
Now, you can use the following code to get started:
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="featherless-ai",
    api_key=os.environ["HF_TOKEN"],
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?",
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=messages,
)

print(completion.choices[0].message)
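If you want the same script to work in either mode, one option is to prefer a provider key when it is present and fall back to the Hugging Face token otherwise. The sketch below is only one way to do this: the environment variable name `FEATHERLESS_API_KEY` and the helper names `resolve_api_key` and `build_client` are hypothetical and chosen for illustration, while `InferenceClient(provider=..., api_key=...)` is the same call used in the example above.

```python
import os


def resolve_api_key(env):
    """Return (mode, key) for the two calling modes.

    FEATHERLESS_API_KEY is a hypothetical variable name used in this sketch.
    """
    provider_key = env.get("FEATHERLESS_API_KEY")
    if provider_key:
        # Custom key: billing happens on your Featherless AI account.
        return "custom-key", provider_key
    hf_token = env.get("HF_TOKEN")
    if hf_token:
        # Routed by Hugging Face: billing happens on your Hugging Face account.
        return "routed-by-hf", hf_token
    raise RuntimeError("Set FEATHERLESS_API_KEY or HF_TOKEN")


def build_client(env=os.environ):
    # Imported lazily so resolve_api_key can be used without the library installed.
    from huggingface_hub import InferenceClient

    mode, key = resolve_api_key(env)
    return mode, InferenceClient(provider="featherless-ai", api_key=key)
```

Calling `build_client()` returns the selected mode alongside the client, which makes it easy to log which account a request will be billed to.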
From JS, using @huggingface/inference
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const chatCompletion = await client.chatCompletion({
  model: "deepseek-ai/DeepSeek-R1-0528",
  messages: [
    {
      role: "user",
      content: "What is the capital of France?",
    },
  ],
  provider: "featherless-ai",
});

console.log(chatCompletion.choices[0].message);
Billing
When you make requests using your own API key from an inference provider, billing occurs directly through that provider. For instance, using a Featherless AI API key means charges will be reflected on your Featherless AI account.
In cases where requests are routed through the Hugging Face Hub, you’ll only incur the standard provider API rates without any additional markup from us. We might consider establishing revenue-sharing agreements with our provider partners in the future.
Important Note: PRO users receive $2 worth of inference credits each month, usable across providers. Subscribing to the Hugging Face PRO plan gives you these credits along with ZeroGPU, Spaces Dev Mode, and significantly higher limits!
We also offer free inference with a small quota to signed-in free users, but upgrading to PRO raises those limits.
Feedback and next steps
Your feedback is invaluable to us! We invite you to share your thoughts and comments here: Hugging Face Discussions.