Advancing Hebrew NLP: A New Era with Our Open LLM Leaderboard
The need for language technology advancements in Hebrew has never been more critical. As a morphologically rich language, Hebrew presents unique challenges that existing language models often struggle to address. In response to this gap, we’re thrilled to unveil our new open LLM leaderboard, specifically designed to evaluate and enhance language models for Hebrew.
Understanding the Complexity of Hebrew
Hebrew is characterized by its intricate system of roots and patterns. Words are constructed from roots, with prefixes, suffixes, and infixes modifying their meaning, tense, or form. This complexity means that a single root can generate multiple valid word forms, making traditional tokenization strategies—suitable for morphologically simpler languages—inadequate. Consequently, many existing language models falter when tasked with understanding the nuances of Hebrew, underscoring the urgent need for dedicated benchmarks.
The challenge lies not just in the language’s structure but also in the cultural and contextual layers that influence meaning. Therefore, our focus is on developing benchmarks that specifically cater to the linguistic properties of Hebrew, ensuring that models can accurately interpret and generate Hebrew text.
Introducing Our Leaderboard Metrics and Tasks
To address these challenges, our leaderboard features four key datasets, each meticulously designed to evaluate language models on their understanding and generation of Hebrew. Each benchmark uses a few-shot prompt format: the prompt presents a handful of worked examples before the test instance, so a model can infer the expected task and output format without task-specific fine-tuning. Below is a summary of the benchmarks included:
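To make the few-shot format concrete, here is a minimal sketch of how such a prompt can be assembled. The field names, instruction text, and placeholder examples are illustrative assumptions, not the leaderboard's actual prompt template:

```python
# Minimal sketch of few-shot prompt construction for a QA-style task.
# Field names and example records are hypothetical placeholders,
# not drawn from the actual leaderboard datasets.

def build_few_shot_prompt(examples, query, instruction):
    """Concatenate worked examples before the test instance."""
    parts = [instruction]
    for ex in examples:
        parts.append(
            f"Context: {ex['context']}\nQuestion: {ex['question']}\nAnswer: {ex['answer']}"
        )
    # The final block omits the answer, which the model must generate.
    parts.append(f"Context: {query['context']}\nQuestion: {query['question']}\nAnswer:")
    return "\n\n".join(parts)

shots = [{"context": "...", "question": "...", "answer": "..."}]
query = {"context": "...", "question": "..."}
prompt = build_few_shot_prompt(shots, query, "Answer the question based on the context.")
```

The worked examples anchor both the task and the answer format, which is especially helpful for models that were not instruction-tuned on Hebrew data.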
1. Hebrew Question Answering
This task assesses a model’s ability to comprehend and process Hebrew information. It focuses on accurately retrieving answers based on context, evaluating the model’s grasp of Hebrew syntax and semantics through direct question-and-answer formats. The data for this benchmark is sourced from the HeQ dataset’s test subset.
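Extractive QA benchmarks of this kind are commonly scored with token-level F1 between the predicted and gold answer spans. Whether the leaderboard uses exactly this metric is an assumption on our part; the sketch below shows the standard SQuAD-style computation:

```python
# Sketch of token-level F1, a standard metric for extractive QA.
# (Assumption: shown for illustration; the leaderboard's exact
# metric may differ, e.g. in its text normalization.)
from collections import Counter

def token_f1(prediction, reference):
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # Clipped overlap: each reference token can match at most once.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Token-level F1 gives partial credit when the model's answer overlaps the gold span without matching it exactly, which is more forgiving than strict exact match.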
2. Sentiment Accuracy
In this benchmark, we test the model’s capability to detect and interpret sentiments within Hebrew text. The task involves classifying statements as positive, negative, or neutral based on linguistic cues, providing insights into the model’s understanding of emotional nuances in language.
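Since this benchmark reports accuracy, the underlying computation is simply the fraction of predicted labels that match the gold labels. The label names below are illustrative, not necessarily the dataset's exact label set:

```python
# Sketch of the accuracy computation behind a sentiment benchmark:
# compare predicted labels against gold labels. Label strings are
# illustrative placeholders.

def accuracy(predictions, gold):
    assert len(predictions) == len(gold), "prediction/gold length mismatch"
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

preds = ["positive", "negative", "neutral", "positive"]
gold = ["positive", "negative", "positive", "positive"]
# Three of four labels agree, so accuracy(preds, gold) is 0.75.
```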
3. Winograd Schema Challenge
This task evaluates the model’s proficiency in pronoun resolution and contextual ambiguity in Hebrew. It challenges the model to apply logical reasoning and general world knowledge to disambiguate pronouns in complex sentences, reflecting its understanding of context and language intricacies.
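One common way to evaluate Winograd-style tasks is candidate ranking: substitute each candidate referent for the ambiguous pronoun and pick the completion the model scores as more likely. The sketch below assumes that setup (the leaderboard may format the task differently), and uses a lookup table of scores as a stand-in for real model log-likelihoods:

```python
# Sketch of candidate-ranking evaluation for a Winograd-style task:
# substitute each candidate referent for the pronoun and choose the
# completion with the highest model score. The score table below is a
# stand-in; a real harness would use model log-likelihoods.

def resolve_pronoun(sentence_template, candidates, score_fn):
    """Return the candidate whose substituted sentence scores highest."""
    return max(candidates, key=lambda c: score_fn(sentence_template.format(referent=c)))

template = "The trophy didn't fit in the suitcase because the {referent} was too small."
toy_scores = {
    template.format(referent="suitcase"): 0.9,  # plausible reading
    template.format(referent="trophy"): 0.4,    # implausible reading
}
answer = resolve_pronoun(template, ["trophy", "suitcase"], toy_scores.get)  # "suitcase"
```

The ranking formulation turns an open-ended disambiguation problem into a comparison the evaluation harness can score deterministically.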
4. Translation
Assessing the model’s translation skills between English and Hebrew, this task highlights its ability to maintain linguistic accuracy and fluency while preserving meaning across languages. This benchmark evaluates bilingual translation capability in both directions, which is essential for putting Hebrew models to use in multilingual applications.
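Machine translation benchmarks are typically scored with n-gram overlap metrics such as BLEU or chrF. The deliberately over-simplified sketch below computes only clipped unigram precision, just to convey the idea of overlap scoring; it is not the leaderboard's metric:

```python
# Illustrative (over-simplified) translation scoring: clipped unigram
# precision against a single reference. Real evaluations typically use
# metrics such as BLEU or chrF; this sketch only conveys the idea of
# n-gram overlap and is NOT the leaderboard's metric.
from collections import Counter

def unigram_precision(hypothesis, reference):
    hyp_tokens = hypothesis.split()
    if not hyp_tokens:
        return 0.0
    ref_counts = Counter(reference.split())
    matched = 0
    for token in hyp_tokens:
        if ref_counts[token] > 0:
            matched += 1
            ref_counts[token] -= 1  # clip: each reference token matches once
    return matched / len(hyp_tokens)
```

Clipping prevents a degenerate hypothesis from scoring well by repeating a single common word; full BLEU extends this idea to higher-order n-grams plus a brevity penalty.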
Technical Setup of the Leaderboard
Our leaderboard draws inspiration from the Open LLM Leaderboard and utilizes the Demo Leaderboard template. Submissions are deployed automatically using Hugging Face’s Inference Endpoints, with evaluations managed through API requests using the lighteval library. The setup process was streamlined, allowing us to focus on refining the evaluation metrics and ensuring robust performance across various models.
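In the spirit of that setup, here is a rough sketch of how an evaluation harness might query a deployed model over HTTP. The endpoint URL is a placeholder, and the payload shape (`{"inputs": ..., "parameters": ...}`) follows the common Hugging Face text-generation convention; treat both as assumptions rather than the leaderboard's exact configuration:

```python
# Sketch of querying a deployed model endpoint over HTTP, as an
# evaluation harness might. The URL is a placeholder and the payload
# shape follows the common Hugging Face text-generation convention;
# both are assumptions, not the leaderboard's exact configuration.
import json
import urllib.request

def build_request(endpoint_url, prompt, token, max_new_tokens=64):
    """Build an authenticated POST request carrying the evaluation prompt."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# req = build_request("https://<your-endpoint>", "prompt text", "hf_...")
# with urllib.request.urlopen(req) as resp:  # network call, not run here
#     print(json.load(resp))
```

Keeping request construction separate from the network call makes the harness easy to test offline and to retry on transient endpoint failures.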
Join the Conversation: Engage with Us
We invite researchers, developers, and language enthusiasts to participate in this groundbreaking initiative. Whether you’re keen to submit your model for evaluation or wish to engage in discussions aimed at improving Hebrew language technologies, your involvement is invaluable. You can find submission guidelines on the leaderboard’s page and join the conversation in the discussion section of our HF space.
This leaderboard is not merely a benchmarking tool; it stands as a call to the Israeli tech community to recognize and tackle the gaps in language technology research for Hebrew. With precise evaluations, we aim to foster the development of models that are linguistically diverse and culturally accurate, paving the way for innovations that celebrate the richness of the Hebrew language.
Acknowledgments and Sponsorship
We take pride in announcing that our leaderboard is sponsored by DDR&D IMOD / The Israeli National Program for NLP in Hebrew and Arabic, in collaboration with DICTA: The Israel Center for Text Analysis and Webiks. This partnership signifies a collective commitment to advancing language technologies in Hebrew. A special note of thanks goes to Prof. Reut Tsarfaty from Bar-Ilan University for her invaluable scientific consultation and guidance.
In this exciting journey, we look forward to reshaping the landscape of language modeling in Hebrew, fostering a community that values and enhances linguistic diversity.

