Unlocking Featherless AI: Explore Inference Providers On Hugging Face 🔥

We are excited to announce that Featherless AI is now a supported Inference Provider on the Hugging Face Hub! Featherless AI joins our growing ecosystem, bringing more serverless inference options directly to the Hub’s model pages. Inference Providers are also integrated into our client SDKs for both JavaScript and Python, so you can call a broad range of models through your preferred providers with very little code.

Featherless AI specializes in a variety of text and conversational models, including cutting-edge open-source models from major contributors like DeepSeek, Meta, Google, Qwen, and many more. Its serverless architecture ensures that a diverse catalogue of models is at your fingertips while maintaining cost-efficiency.

One of Featherless AI’s standout features is its unique model loading and GPU orchestration abilities. Most providers either offer a limited selection of models at low costs or require users to manage extensive server operations, often leading to high operational costs. Featherless AI strikes a balance, delivering a wide range of models with serverless pricing, optimizing both access and affordability. For a complete list of models available, head over to the models page.

We look forward to witnessing the innovative solutions you’ll create with this new provider!

Curious about how to integrate Featherless as an Inference Provider? Check out its dedicated documentation page for step-by-step instructions.

How it works

In the website UI

In your user account settings, you can:

Set your own API keys for the providers you’ve signed up with. If you do not set a custom key, your requests will be routed through Hugging Face. For further details, refer to the documentation.
Order providers based on your preference. This order applies to the widgets and code snippets provided on the model pages.

When calling Inference Providers, there are two modes:

Custom key: This allows requests to be sent directly to the inference provider using your own API key.
Routed by Hugging Face: In this mode, no token from the provider is necessary, and charges are applied directly to your Hugging Face account instead of the provider’s account.
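The two modes differ only in which key the client receives. The sketch below is one way to pick between them from environment variables; it is an illustration under stated assumptions, not the SDK's own logic. The variable name FEATHERLESS_API_KEY and the helper names pick_api_key and make_client are hypothetical; HF_TOKEN, InferenceClient, and the "featherless-ai" provider name come from the examples in this post.

```python
import os


def pick_api_key(env):
    """Return (mode, key), preferring a provider key over a Hugging Face token.

    FEATHERLESS_API_KEY is a hypothetical variable name used for illustration.
    """
    provider_key = env.get("FEATHERLESS_API_KEY")
    if provider_key:
        # Custom key: the request is billed on the Featherless AI account.
        return "custom key", provider_key
    hf_token = env.get("HF_TOKEN")
    if hf_token:
        # Routed by Hugging Face: the request is billed on the Hugging Face account.
        return "routed by Hugging Face", hf_token
    raise RuntimeError("Set FEATHERLESS_API_KEY or HF_TOKEN first")


def make_client(env=os.environ):
    """Build an InferenceClient for Featherless AI with whichever key is available."""
    # Imported lazily so pick_api_key can be used without huggingface_hub installed.
    from huggingface_hub import InferenceClient

    mode, key = pick_api_key(env)
    print(f"Mode: {mode}")
    return InferenceClient(provider="featherless-ai", api_key=key)
```

Preferring the provider key here is a design choice for the sketch; reverse the order if you would rather have charges land on your Hugging Face account by default.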

Model pages showcase third-party inference providers compatible with the current model, all sorted according to user preference.

From the client SDKs

from Python, using huggingface_hub

Here’s an example of how to use DeepSeek-R1 with Featherless AI as the inference provider. You can authenticate with a Hugging Face token, which routes the request through Hugging Face automatically, or with your own Featherless AI API key if you have one.

First, install or upgrade the huggingface_hub library to version 0.33.0 or higher:

pip install --upgrade huggingface-hub

Now, you can use the following code to get started:

import os
from huggingface_hub import InferenceClient

# A Hugging Face token routes the request through Hugging Face;
# pass a Featherless AI API key instead to call the provider directly.
client = InferenceClient(
    provider="featherless-ai",
    api_key=os.environ["HF_TOKEN"],
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=messages,
)

# Print the assistant's reply message
print(completion.choices[0].message)
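If you make this call from several places, a small wrapper keeps the request shape in one spot. The following is a minimal sketch; the helper name ask is hypothetical, and it assumes only the client.chat.completions.create call and the choices[0].message response shape shown in the example above.

```python
def ask(client, prompt, model="deepseek-ai/DeepSeek-R1-0528"):
    """Send a single user message and return only the text of the reply."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # The reply text lives in the message's content field.
    return completion.choices[0].message.content


# Usage, with the client created in the example above:
# print(ask(client, "What is the capital of France?"))
```

Because the client is passed in as an argument, the same helper works whether the client was built with a Hugging Face token or with a Featherless AI API key.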

from JS using @huggingface/inference

import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const chatCompletion = await client.chatCompletion({
    model: "deepseek-ai/DeepSeek-R1-0528",
    messages: [
        {
            role: "user",
            content: "What is the capital of France?"
        }
    ],
    provider: "featherless-ai",
});

console.log(chatCompletion.choices[0].message);

Billing

When you make requests using your own API key from an inference provider, billing occurs directly through that provider. For instance, using a Featherless AI API key means charges will be reflected on your Featherless AI account.

In cases where requests are routed through the Hugging Face Hub, you’ll only incur the standard provider API rates without any additional markup from us. We might consider establishing revenue-sharing agreements with our provider partners in the future.

Important Note: PRO users receive $2 worth of inference credits each month, usable across providers. Subscribing to the Hugging Face PRO plan gives you these credits along with ZeroGPU, Spaces Dev Mode, significantly higher limits, and more.

Moreover, we offer a small quota for free inference to signed-in free users, but upgrading to PRO will provide a more seamless experience.

Feedback and next steps

Your feedback is invaluable to us! We invite you to share your thoughts and comments here: Hugging Face Discussions.
