Evaluating LLM Performance on AI-Generated CUDA Code Using ComputeEval 2025.2: A Comprehensive Benchmarking Study

aimodelkit
Last updated: November 7, 2025 9:15 pm

As AI coding assistants improve, a natural question is whether they can write correct, efficient CUDA code. To help answer it, we introduced ComputeEval, an open-source benchmark for evaluating AI models and agents on a range of CUDA programming tasks.

A few months ago we released the first version of ComputeEval. We are now announcing its first major expansion: more than 100 new CUDA challenges, bringing the dataset to a total of 232 CUDA and CUDA Core Compute Libraries (CCCL) problems.

With this expansion we deliberately raised the difficulty. The new problems require LLMs to use modern CUDA features such as Tensor Cores, advanced shared-memory patterns, and warp-level primitives. They also test whether a model can correctly orchestrate CUDA Graphs, Streams, and Events in realistic applications such as dynamic simulations.
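Warp-level primitives are one of the features the new problems exercise. As a hedged illustration (not taken from ComputeEval), the sketch below simulates on the CPU, in plain Python, the classic warp sum reduction that a CUDA kernel performs with `__shfl_down_sync(0xffffffff, val, offset)` for offsets 16, 8, 4, 2, 1. The helper names `shfl_down` and `warp_reduce_sum` are hypothetical, and the model assumes a single full 32-lane warp with every lane active.

```python
# CPU-side simulation of a CUDA warp sum reduction built on __shfl_down_sync.
# Hypothetical helper names; models one full 32-lane warp with every lane active.
WARP_SIZE = 32


def shfl_down(lane_values: list[int], delta: int) -> list[int]:
    """Model __shfl_down_sync(0xffffffff, var, delta) for every lane at once.

    Lane i receives the value held by lane i + delta. Source lanes past the end
    of the warp do not wrap around, so the upper `delta` lanes keep their own value.
    """
    return [
        lane_values[i + delta] if i + delta < WARP_SIZE else lane_values[i]
        for i in range(WARP_SIZE)
    ]


def warp_reduce_sum(lane_values: list[int]) -> int:
    """Tree reduction with offsets 16, 8, 4, 2, 1; lane 0 ends up holding the total."""
    if len(lane_values) != WARP_SIZE:
        raise ValueError(f"expected {WARP_SIZE} lane values")
    vals = list(lane_values)
    offset = WARP_SIZE // 2
    while offset > 0:
        # Corresponds to `val += __shfl_down_sync(0xffffffff, val, offset);` in a kernel.
        shifted = shfl_down(vals, offset)
        vals = [v + s for v, s in zip(vals, shifted)]
        offset //= 2
    # Only lane 0 is guaranteed to hold the full sum; other lanes generally do not.
    return vals[0]


print(warp_reduce_sum(list(range(WARP_SIZE))))  # 496
```

In a real kernel the loop body runs in lockstep across all 32 lanes; the simulation makes explicit why only lane 0's value should be used once the loop finishes.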

LLM Performance on CUDA Programming

To establish baseline performance and gauge the current state of AI-assisted CUDA programming, we evaluated several leading LLMs on ComputeEval. Table 1 summarizes the results.

| Model | ComputeEval 2025.2 pass@1 (232 problems) | ComputeEval 2025.1 pass@1 (128 problems) |
|---|---|---|
| GPT-5 (medium) | 0.5819 | 0.61 |
| Claude Sonnet 4.0 | 0.5517 | 0.64 |
| gpt-oss-20b (high) | 0.5474 | N/A |
| gpt-oss-120b (high) | 0.5302 | N/A |
| Claude Opus 4.0 | 0.5216 | N/A |
| DeepSeek-R1 | 0.4397 | 0.55 |
| gpt-oss-120b (medium) | 0.4224 | N/A |
| gpt-oss-20b (medium) | 0.4224 | N/A |
| gpt-oss-120b (low) | 0.4052 | N/A |
| DeepSeek-V3.1 | 0.3750 | 0.44 |
| Llama 4 Maverick 17B 128E | 0.3448 | 0.47 |
| Llama 3.1 405B | 0.3405 | 0.40 |
| gpt-oss-20b (low) | 0.3319 | 0.41 |
Table 1. Pass@1 accuracy of state-of-the-art LLMs on ComputeEval 2025.2 and 2025.1. The latest version expands the benchmark to 232 CUDA programming problems, more than 100 of them new, making it a tougher test of AI-assisted coding. N/A marks models without a reported 2025.1 score.
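Table 1 reports pass@1, but the post does not spell out how it is computed. A widely used definition (popularized by the HumanEval benchmark) is the unbiased estimator pass@k = 1 - C(n-c, k) / C(n, k) per problem, averaged over the benchmark, where n samples are drawn for a problem and c of them pass its tests; for k = 1 this reduces to c / n. The sketch below assumes that definition; ComputeEval's own harness may differ, and the function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average per-problem pass@k over a benchmark; `results` holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical example: one sample per problem on a 232-problem set, 135 solved.
results = [(1, 1)] * 135 + [(1, 0)] * 97
print(round(benchmark_pass_at_k(results, k=1), 4))  # 0.5819
```

As the example shows, the 2025.2 scores in Table 1 are consistent with whole-problem counts out of 232 (for instance, 135 / 232 rounds to 0.5819), though the post does not state how many samples were drawn per problem.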

Every model evaluated on both versions scored lower on ComputeEval 2025.2 than on 2025.1. This does not mean the models have become less capable; it reflects the benchmark's increased difficulty. Each release is designed to demand a deeper understanding of the subtleties of accelerated computing.
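The size of the drop can be read directly off Table 1. The short sketch below copies the pass@1 values for the seven models that have scores on both versions and computes the absolute decrease for each. Because the two versions contain different problem sets, these differences are only a rough indication of the added difficulty, not a like-for-like comparison.

```python
# pass@1 scores copied from Table 1 as (ComputeEval 2025.2, ComputeEval 2025.1).
# Only models with a reported score on both versions are included.
scores = {
    "GPT-5 (medium)": (0.5819, 0.61),
    "Claude Sonnet 4.0": (0.5517, 0.64),
    "DeepSeek-R1": (0.4397, 0.55),
    "DeepSeek-V3.1": (0.3750, 0.44),
    "Llama 4 Maverick 17B 128E": (0.3448, 0.47),
    "Llama 3.1 405B": (0.3405, 0.40),
    "gpt-oss-20b (low)": (0.3319, 0.41),
}

# Absolute decrease in pass@1 from 2025.1 to 2025.2 (positive means a lower score).
drops = {model: old - new for model, (new, old) in scores.items()}

# Print models from largest to smallest drop.
for model, drop in sorted(drops.items(), key=lambda item: item[1], reverse=True):
    print(f"{model:28s} {drop:.4f}")
```

Every difference is positive. Llama 4 Maverick shows the largest decrease (about 0.125) and GPT-5 (medium) the smallest (about 0.028).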

What’s Next and How to Get Involved

We plan to keep expanding the dataset and improving the ComputeEval evaluation framework. Work is already underway to broaden coverage to additional CUDA-X libraries such as cuBLAS, CUTLASS, cuDNN, and RAPIDS. We invite members of the HPC and AI communities to contribute and collaborate.

Explore the code on GitHub and the dataset on Hugging Face.

Inspired by: Source
