Hugging Face Acquires XetHub: Pioneering the Future of AI Collaboration
We are thrilled to announce that Hugging Face has acquired XetHub, a Seattle-based company founded by Yucheng Low, Ajit Banerjee, and Rajat Arya. The founders previously worked at Apple, where they built and scaled Apple’s internal machine learning (ML) infrastructure. They now bring that experience to the question of how AI teams collaborate on data and models.
The Mission of XetHub
XetHub’s mission is to bring software engineering best practices to AI development. The company built technology that lets Git scale to repositories containing terabytes of data, so teams can explore, understand, and collaborate on datasets and models that keep changing. All 12 members of the XetHub team are joining Hugging Face.
Our Shared Vision at Hugging Face
As Julien Chaumond, Hugging Face’s CTO, put it, bringing XetHub into Hugging Face will help unlock the next five years of growth for the Hub’s models and datasets. The XetHub team will lead the move to a more efficient storage and versioning backend for the Hub’s repositories.
In 2020, when we launched the first version of the Hugging Face Hub, we chose Git LFS (Large File Storage) as a reliable, if temporary, solution. As the AI landscape evolved, it became clear that we needed a dedicated system for the very large files typical of AI workloads. XetHub’s technology was designed for exactly this problem, which makes it a natural fit.
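Git LFS has a limit that explains why a full re-upload happens. In Git, LFS stores only a small pointer file. That pointer records the SHA-256 of the *entire* file and its size, while the content itself lives in a separate object store keyed by that hash. The sketch below builds pointers in the Git LFS v1 pointer format. The function name `lfs_pointer` and the 1 MB zero-filled stand-in file are illustrative choices, not part of any Hub tooling.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build a Git LFS v1 pointer for `content` (illustrative helper).

    Only this small text goes into Git; the object itself is stored and
    transferred separately, addressed by the SHA-256 of the whole file.
    """
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


weights = bytes(1_000_000)        # stand-in for a large model or dataset file
edited = b"\x01" + weights[1:]    # same size, a single byte changed

print(lfs_pointer(weights))
print(lfs_pointer(edited))
```

Changing a single byte produces a completely different `oid`. LFS therefore treats the edited file as a brand-new object and transfers all of it again.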
Future Use Cases: Unlocking New Possibilities
The acquisition opens up new possibilities for Hugging Face users. Consider a 10GB Parquet file to which you want to add a single row. Today, that means re-uploading the entire 10GB file. With XetHub’s file chunking and deduplication technology, users will only need to upload the chunks that contain the new row, saving both time and bandwidth.
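Here is a minimal Python sketch of chunk-level deduplication. It assumes fixed-size 64 KiB chunks, each identified by its SHA-256 hash. Several details are hypothetical:

- The chunk size and the helper names (`split_chunks`, `chunk_id`, `chunks_to_upload`) are illustrative.
- The 10GB file is scaled down to about 640 KiB of random bytes.
- "Adding a row" is simplified to appending bytes at the end of the file. A real Parquet writer also rewrites the footer.

This is not XetHub’s implementation.

```python
import hashlib
import random

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, chosen only for illustration


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a file into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_id(chunk: bytes) -> str:
    """Content address of a chunk: identical bytes always get the same id."""
    return hashlib.sha256(chunk).hexdigest()


def chunks_to_upload(new_file: bytes, stored_ids: set[str]) -> list[bytes]:
    """Return only the chunks whose ids the server does not already have."""
    return [c for c in split_chunks(new_file) if chunk_id(c) not in stored_ids]


# Scaled-down stand-in for a large Parquet file: 10 full chunks plus a partial one.
original = random.Random(0).randbytes(10 * CHUNK_SIZE + 100)
stored_ids = {chunk_id(c) for c in split_chunks(original)}

# Simplified "add a row": append a few bytes to the end of the file.
updated = original + b"2024-08-08,new_row,42\n"
pending = chunks_to_upload(updated, stored_ids)

print(len(split_chunks(updated)), "chunks total,", len(pending), "to upload")
# → 11 chunks total, 1 to upload
print(sum(len(c) for c in pending), "of", len(updated), "bytes sent")
# → 122 of 655482 bytes sent
```

Only the final, partial chunk changed, so it is the only one sent. Fixed-size chunks work here because the edit is at the end of the file. An edit that inserts bytes earlier in the file would shift every later chunk boundary.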
Similarly, to update a single metadata value in the header of a large GGUF model file, a user will only need to upload a small chunk of data rather than the entire file. This efficiency matters more as the AI industry moves toward ever larger models, including models with trillions of parameters.
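A header edit is harder for fixed-size chunks when the new value is longer than the old one. Every byte after the edit shifts, so every later chunk hashes differently. Content-defined chunking avoids this by placing boundaries based on the bytes themselves. The Python sketch below compares the two approaches on a synthetic file: a short text header followed by random bytes, not the real GGUF layout. The gear-style rolling hash, the chunk-size limits, the seeds, and all function names are illustrative assumptions, not XetHub’s actual parameters.

```python
import hashlib
import random
from collections.abc import Callable

# Gear table for a gear-style rolling hash: 256 fixed pseudo-random 64-bit values.
# Seed, size limits, and bit count below are illustrative assumptions.
_rng = random.Random(7)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def fixed_chunks(data: bytes, size: int = 8192) -> list[bytes]:
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes, min_size: int = 2048, max_size: int = 65536,
               bits: int = 13) -> list[bytes]:
    """Cut where the hash's top `bits` bits are zero (about 1 in 2**bits positions).

    The hash depends only on the most recent 64 bytes, so (with min_size > 64)
    boundaries follow the content rather than absolute file offsets.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        if (length >= min_size and h >> (64 - bits) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def bytes_to_upload(old: bytes, new: bytes,
                    chunker: Callable[[bytes], list[bytes]]) -> int:
    """Total size of the chunks in `new` whose hash does not occur in `old`."""
    known = {hashlib.sha256(c).digest() for c in chunker(old)}
    return sum(len(c) for c in chunker(new)
               if hashlib.sha256(c).digest() not in known)


# Synthetic "model file": a text header followed by 256 KiB of random "tensor" bytes.
# This is NOT the real GGUF layout; it only mimics "small header, large body".
tensors = random.Random(1).randbytes(256 * 1024)
old_file = b'general.name="my-model";' + tensors
new_file = b'general.name="my-model-v2";' + tensors  # header value grows by 3 bytes

fixed_sent = bytes_to_upload(old_file, new_file, fixed_chunks)
cdc_sent = bytes_to_upload(old_file, new_file, cdc_chunks)
print(f"fixed-size chunks: {fixed_sent:,} of {len(new_file):,} bytes to upload")
print(f"content-defined chunks: {cdc_sent:,} of {len(new_file):,} bytes to upload")
```

With fixed-size chunks, essentially the whole file has to be re-sent. With content-defined chunks, the boundaries after the edit land on the same content as before, so typically only the first chunk is new.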
Statistics That Speak Volumes
The Hugging Face Hub has grown rapidly. At the time of this announcement, its scale was:
- Repositories: 1.3 million models, 450,000 datasets, and 680,000 Spaces.
- Total size: 12 petabytes stored in LFS (280 million files) and 7.3 terabytes stored in Git (non-LFS).
- Daily requests: 1 billion across the Hub.
- CloudFront bandwidth: 6 petabytes per day.
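The quick calculation below converts these daily totals into more familiar average rates. It assumes decimal units (1 PB = 10^15 bytes) and traffic spread evenly across the day, so real peaks are higher. The derived figures are back-of-the-envelope averages, not published Hub metrics.

```python
PB = 10**15                      # assumption: decimal petabytes
SECONDS_PER_DAY = 24 * 60 * 60

lfs_bytes = 12 * PB              # total size stored in LFS
lfs_files = 280_000_000          # number of LFS files
daily_requests = 1_000_000_000   # requests per day across the Hub
daily_egress = 6 * PB            # CloudFront bandwidth per day

avg_file_mb = lfs_bytes / lfs_files / 10**6
requests_per_s = daily_requests / SECONDS_PER_DAY
egress_gbit_per_s = daily_egress * 8 / SECONDS_PER_DAY / 10**9

print(f"average LFS file: {avg_file_mb:.1f} MB")                 # → 42.9 MB
print(f"average request rate: {requests_per_s:,.0f} req/s")      # → 11,574 req/s
print(f"average egress: {egress_gbit_per_s:,.0f} Gbit/s")        # → 556 Gbit/s
```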
These figures highlight the immense demand for AI resources and the pressing need for efficient collaboration tools.
Insights from Yucheng Low
Yucheng Low, one of XetHub’s founders, has worked in AI/ML for more than 15 years. He notes how much data has changed the field: tasks that once seemed impossible, such as image generation, have become achievable thanks to vast datasets and advanced models.
Yucheng began his career at startups focused on scaling machine learning algorithms and data management. At Apple, he managed more than 100 petabytes of AI data, and that experience led him to found XetHub in 2021. The goal was to let machine learning teams collaborate as effectively as software development teams, with better support for experimentation, reproducibility, and data visualization.
Joining Hugging Face opens a new chapter for Yucheng and the XetHub team. The shared goal is to make AI collaboration seamless by integrating XetHub’s technology into the Hugging Face Hub and offering these features to the global ML community.
Join Our Growing Team
With this acquisition, we are also expanding our Infrastructure team. If you are passionate about building and scaling collaboration platforms for the open-source AI movement, we would love to hear from you!
The acquisition of XetHub marks a significant milestone in our journey to enhance AI collaboration. As we continue to innovate and grow, we invite you to join us in shaping the future of machine learning and artificial intelligence.
Source: Original Article

