The Power of DOIs: Enhancing Machine Learning Model and Dataset Citations at Hugging Face
At Hugging Face, our mission is clear: to democratize good machine learning practices. This commitment extends to making machine learning models and datasets more reproducible, well-documented, and accessible for everyone. In a significant step toward achieving this goal, we are thrilled to announce that users can now generate a Digital Object Identifier (DOI) for their models and datasets directly from the Hugging Face Hub. This new feature not only enhances the visibility of your work but also facilitates easier citation within the research community.
What is a DOI and Why Does It Matter?
Digital Object Identifiers (DOIs) are unique alphanumeric strings assigned to various digital objects, such as research articles, datasets, and models. Think of DOIs as the digital equivalent of a book’s ISBN. They provide a permanent link to the object’s metadata, which includes essential information like the object’s URL, version, creation date, and a concise description.
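To make the "permanent link" idea concrete, here is a minimal sketch of how a DOI string maps to a resolvable URL through the public https://doi.org/ resolver. The helper names (`normalize_doi`, `is_plausible_doi`, `doi_to_url`), the example DOI, and the regular expression are assumptions made for illustration. The pattern is only a rough syntactic check, not an official validation rule, and none of this is part of the Hugging Face Hub API.

```python
import re

# Rough syntactic pattern (an assumption for this sketch): "10.", a 4-9 digit
# registrant code, "/", then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw: str) -> str:
    """Strip surrounding whitespace and common prefixes such as 'doi:' or 'https://doi.org/'."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def is_plausible_doi(raw: str) -> bool:
    """Return True if the string looks like a DOI after normalization."""
    return DOI_PATTERN.match(normalize_doi(raw)) is not None


def doi_to_url(raw: str) -> str:
    """Build the resolver URL for a DOI, rejecting strings that do not look like one."""
    doi = normalize_doi(raw)
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a plausible DOI: {raw!r}")
    return f"https://doi.org/{doi}"


if __name__ == "__main__":
    # Hypothetical placeholder DOI, not a real record.
    print(doi_to_url("doi:10.1234/example.5678"))
```

Because the resolver URL is derived from the identifier alone, a citation that includes the DOI keeps working even if the page hosting the model or dataset moves.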
Using DOIs offers several advantages:
- Ease of Access: DOIs simplify the process of finding information about a specific model or dataset. Researchers can easily locate and reference your work through a stable link that won’t change over time.
- Persistent Identification: Datasets and models with DOIs are designed to persist indefinitely. The only way to remove them is by filing a request with our support team, ensuring that your work remains accessible.
- Standardized Citations: In academic and research contexts, DOIs have become a commonly accepted reference format, helping to standardize how digital resources are cited. This is particularly valuable in machine learning, where the reproducibility and verification of models and datasets are crucial.
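As an illustration of standardized citation, the sketch below formats a BibTeX `@misc` entry from a handful of metadata fields plus a DOI. The `CitationRecord` class, its field names, and the sample values are hypothetical and chosen only for this example. The citation the Hub itself generates may use different fields or formatting.

```python
from dataclasses import dataclass


@dataclass
class CitationRecord:
    # All field names are hypothetical, chosen for this sketch.
    key: str
    author: str
    title: str
    year: int
    publisher: str
    doi: str


def to_bibtex(record: CitationRecord) -> str:
    """Render the record as a BibTeX @misc entry with a DOI and its resolver URL."""
    fields = [
        ("author", record.author),
        ("title", record.title),
        ("year", str(record.year)),
        ("publisher", record.publisher),
        ("doi", record.doi),
        ("url", f"https://doi.org/{record.doi}"),
    ]
    # One "name = {value}" line per field, separated by commas.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{record.key},\n{body}\n}}"


if __name__ == "__main__":
    # Placeholder values, not a real model or DOI.
    example = CitationRecord(
        key="doe_2022",
        author="Jane Doe",
        title="Example Model",
        year=2022,
        publisher="Hugging Face",
        doi="10.1234/example.5678",
    )
    print(to_bibtex(example))
```

Keeping the DOI in its own field, rather than only inside a URL, lets reference managers pick it up and link the entry automatically.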
How Does Hugging Face Assign DOIs?
We have partnered with DataCite to streamline the process of DOI assignment for our users. Registered Hugging Face Hub users can now request a DOI for their models or datasets with just a few clicks. Once they fill out the required metadata, voila! A shiny new DOI is generated, ready to be shared with the world.
Versioning is straightforward as well. When a new version of a model or dataset is released, a new DOI can be assigned to it, and the DOI of the previous version is marked as outdated. This makes it possible to reference a specific version precisely, which is valuable for researchers who want to cite the exact iteration of a model or dataset they used in their studies.
Community-Driven Improvements
At Hugging Face, we value feedback from our community. Many of our features, including the DOI generation tool, stem from suggestions made by users like you. If you have ideas for further enhancements or other features you’d like to see, we encourage you to reach out: tweet us at @HuggingFace, or open an issue in the huggingface/hub-docs repository on GitHub. Your insights help us shape the future of Hugging Face and improve the experience for everyone involved in the machine learning ecosystem.
Acknowledgments
We extend our gratitude to the DataCite team for partnering with us on this initiative. Special thanks to Alix Leroy, Bram Vanroy, Daniel van Strien, and Yoshitomo Matsubara for initiating and nurturing the discussion around this feature on our GitHub repository. Your contributions are invaluable in making Hugging Face a better platform for all machine learning practitioners.
With the introduction of DOI generation, Hugging Face is taking significant strides toward making machine learning more accessible and trustworthy. By allowing researchers to easily cite and share their work, we are not only enhancing the reproducibility of models and datasets but also fostering a culture of collaboration and innovation in the machine learning community.