Advancing Hebrew NLP: A New Era with Our Open LLM Leaderboard
The need for language technology advancements in Hebrew has never been more critical. As a morphologically rich language, Hebrew presents unique challenges that existing language models often struggle to address. In response to this gap, we’re thrilled to unveil our new open LLM leaderboard, specifically designed to evaluate and enhance language models for Hebrew.
Understanding the Complexity of Hebrew
Hebrew is characterized by its intricate system of roots and patterns. Words are constructed from roots, with prefixes, suffixes, and infixes modifying their meaning, tense, or form. This complexity means that a single root can generate multiple valid word forms, making traditional tokenization strategies—suitable for morphologically simpler languages—inadequate. Consequently, many existing language models falter when tasked with understanding the nuances of Hebrew, underscoring the urgent need for dedicated benchmarks.
The challenge lies not just in the language’s structure but also in the cultural and contextual layers that influence meaning. Therefore, our focus is on developing benchmarks that specifically cater to the linguistic properties of Hebrew, ensuring that models can accurately interpret and generate Hebrew text.
Introducing Our Leaderboard Metrics and Tasks
To address these challenges, our leaderboard features four key datasets, each meticulously designed to evaluate language models on their understanding and generation of Hebrew. Each benchmark uses a few-shot prompt format: the prompt presents a handful of worked examples before the test instance, so a model can infer the expected task and output format without task-specific fine-tuning. Below is a summary of the benchmarks included:
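To make the few-shot format concrete, here is a minimal sketch of how such a prompt can be assembled. The field names, instruction text, and placeholder examples are illustrative assumptions, not the leaderboard's actual prompt template:

```python
# Minimal sketch of few-shot prompt construction for a QA-style task.
# Field names and example records are hypothetical placeholders,
# not drawn from the actual leaderboard datasets.

def build_few_shot_prompt(examples, query, instruction):
    """Concatenate worked examples before the test instance."""
    parts = [instruction]
    for ex in examples:
        parts.append(
            f"Context: {ex['context']}\nQuestion: {ex['question']}\nAnswer: {ex['answer']}"
        )
    # The final block omits the answer, which the model must generate.
    parts.append(f"Context: {query['context']}\nQuestion: {query['question']}\nAnswer:")
    return "\n\n".join(parts)

shots = [{"context": "...", "question": "...", "answer": "..."}]
query = {"context": "...", "question": "..."}
prompt = build_few_shot_prompt(shots, query, "Answer the question based on the context.")
```

The worked examples anchor both the task and the answer format, which is especially helpful for models that were not instruction-tuned on Hebrew data.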
1. Hebrew Question Answering
This task assesses a model’s ability to comprehend and process Hebrew information. It focuses on accurately retrieving answers based on context, evaluating the model’s grasp of Hebrew syntax and semantics through direct question-and-answer formats. The data for this benchmark is sourced from the HeQ dataset’s test subset.
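Extractive QA benchmarks of this kind are commonly scored with token-level F1 between the predicted and gold answer spans. Whether the leaderboard uses exactly this metric is an assumption on our part; the sketch below shows the standard SQuAD-style computation:

```python
# Sketch of token-level F1, a standard metric for extractive QA.
# (Assumption: shown for illustration; the leaderboard's exact
# metric may differ, e.g. in its text normalization.)
from collections import Counter

def token_f1(prediction, reference):
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # Clipped overlap: each reference token can match at most once.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Token-level F1 gives partial credit when the model's answer overlaps the gold span without matching it exactly, which is more forgiving than strict exact match.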
2. Sentiment Accuracy
In this benchmark, we test the model’s capability to detect and interpret sentiments within Hebrew text. The task involves classifying statements as positive, negative, or neutral based on linguistic cues, providing insights into the model’s understanding of emotional nuances in language.
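Since this benchmark reports accuracy, the underlying computation is simply the fraction of predicted labels that match the gold labels. The label names below are illustrative, not necessarily the dataset's exact label set:

```python
# Sketch of the accuracy computation behind a sentiment benchmark:
# compare predicted labels against gold labels. Label strings are
# illustrative placeholders.

def accuracy(predictions, gold):
    assert len(predictions) == len(gold), "prediction/gold length mismatch"
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

preds = ["positive", "negative", "neutral", "positive"]
gold = ["positive", "negative", "positive", "positive"]
# Three of four labels agree, so accuracy(preds, gold) is 0.75.
```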
3. Winograd Schema Challenge
This task evaluates the model’s proficiency in pronoun resolution and contextual ambiguity in Hebrew. It challenges the model to apply logical reasoning and general world knowledge to disambiguate pronouns in complex sentences, reflecting its understanding of context and language intricacies.
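One common way to evaluate Winograd-style tasks is candidate ranking: substitute each candidate referent for the ambiguous pronoun and pick the completion the model scores as more likely. The sketch below assumes that setup (the leaderboard may format the task differently), and uses a lookup table of scores as a stand-in for real model log-likelihoods:

```python
# Sketch of candidate-ranking evaluation for a Winograd-style task:
# substitute each candidate referent for the pronoun and choose the
# completion with the highest model score. The score table below is a
# stand-in; a real harness would use model log-likelihoods.

def resolve_pronoun(sentence_template, candidates, score_fn):
    """Return the candidate whose substituted sentence scores highest."""
    return max(candidates, key=lambda c: score_fn(sentence_template.format(referent=c)))

template = "The trophy didn't fit in the suitcase because the {referent} was too small."
toy_scores = {
    template.format(referent="suitcase"): 0.9,  # plausible reading
    template.format(referent="trophy"): 0.4,    # implausible reading
}
answer = resolve_pronoun(template, ["trophy", "suitcase"], toy_scores.get)  # "suitcase"
```

The ranking formulation turns an open-ended disambiguation problem into a comparison the evaluation harness can score deterministically.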
4. Translation
Assessing the model’s translation skills between English and Hebrew, this task highlights its ability to maintain linguistic accuracy and fluency while preserving meaning across languages. This benchmark evaluates bilingual translation capability in both directions, which is essential for putting Hebrew models to use in multilingual applications.
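Machine translation benchmarks are typically scored with n-gram overlap metrics such as BLEU or chrF. The deliberately over-simplified sketch below computes only clipped unigram precision, just to convey the idea of overlap scoring; it is not the leaderboard's metric:

```python
# Illustrative (over-simplified) translation scoring: clipped unigram
# precision against a single reference. Real evaluations typically use
# metrics such as BLEU or chrF; this sketch only conveys the idea of
# n-gram overlap and is NOT the leaderboard's metric.
from collections import Counter

def unigram_precision(hypothesis, reference):
    hyp_tokens = hypothesis.split()
    if not hyp_tokens:
        return 0.0
    ref_counts = Counter(reference.split())
    matched = 0
    for token in hyp_tokens:
        if ref_counts[token] > 0:
            matched += 1
            ref_counts[token] -= 1  # clip: each reference token matches once
    return matched / len(hyp_tokens)
```

Clipping prevents a degenerate hypothesis from scoring well by repeating a single common word; full BLEU extends this idea to higher-order n-grams plus a brevity penalty.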
Technical Setup of the Leaderboard
Our leaderboard draws inspiration from the Open LLM Leaderboard and utilizes the Demo Leaderboard template. Submissions are deployed automatically using Hugging Face’s Inference Endpoints, with evaluations managed through API requests using the lighteval library. The setup process was streamlined, allowing us to focus on refining the evaluation metrics and ensuring robust performance across various models.
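In the spirit of that setup, here is a rough sketch of how an evaluation harness might query a deployed model over HTTP. The endpoint URL is a placeholder, and the payload shape (`{"inputs": ..., "parameters": ...}`) follows the common Hugging Face text-generation convention; treat both as assumptions rather than the leaderboard's exact configuration:

```python
# Sketch of querying a deployed model endpoint over HTTP, as an
# evaluation harness might. The URL is a placeholder and the payload
# shape follows the common Hugging Face text-generation convention;
# both are assumptions, not the leaderboard's exact configuration.
import json
import urllib.request

def build_request(endpoint_url, prompt, token, max_new_tokens=64):
    """Build an authenticated POST request carrying the evaluation prompt."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# req = build_request("https://<your-endpoint>", "prompt text", "hf_...")
# with urllib.request.urlopen(req) as resp:  # network call, not run here
#     print(json.load(resp))
```

Keeping request construction separate from the network call makes the harness easy to test offline and to retry on transient endpoint failures.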
Join the Conversation: Engage with Us
We invite researchers, developers, and language enthusiasts to participate in this groundbreaking initiative. Whether you’re keen to submit your model for evaluation or wish to engage in discussions aimed at improving Hebrew language technologies, your involvement is invaluable. You can find submission guidelines on the leaderboard’s page and join the conversation in the discussion section of our HF space.
This leaderboard is not merely a benchmarking tool; it stands as a call to the Israeli tech community to recognize and tackle the gaps in language technology research for Hebrew. With precise evaluations, we aim to foster the development of models that are linguistically diverse and culturally accurate, paving the way for innovations that celebrate the richness of the Hebrew language.
Acknowledgments and Sponsorship
We take pride in announcing that our leaderboard is sponsored by DDR&D IMOD / The Israeli National Program for NLP in Hebrew and Arabic, in collaboration with DICTA: The Israel Center for Text Analysis and Webiks. This partnership signifies a collective commitment to advancing language technologies in Hebrew. A special note of thanks goes to Prof. Reut Tsarfaty from Bar-Ilan University for her invaluable scientific consultation and guidance.
In this exciting journey, we look forward to reshaping the landscape of language modeling in Hebrew, fostering a community that values and enhances linguistic diversity.

