Comparisons

SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair

aimodelkit
Last updated: May 14, 2026 2:00 pm

In a rapidly evolving software development landscape, maintaining code quality matters more than ever. One of the persistent challenges developers face is architectural code smells: flaws in code design that erode maintainability and are often costly to repair by hand. Localized bugs can usually be fixed with narrowly scoped changes, but architectural smells demand cross-module reasoning about design intent. That added complexity makes the repair process harder and automated tools less effective.

Contents
  • Understanding Architectural Code Smells
  • The Role of LLM Agents
    • Task Orchestration Framework
    • Evaluation Methodology
  • Empirical Findings
  • Implications for Automated Software Engineering

In this article, we explore SmellBench, a framework designed by Ion George Dinu and his collaborators to evaluate how effectively large language model (LLM) agents can repair architectural code smells.

Understanding Architectural Code Smells

Architectural code smells indicate deficiencies in code structure and design that hinder long-term maintainability. Unlike simple bugs, these issues require an understanding of inter-module relationships and overall design principles. This need for broader architectural insight presents a significant challenge for both developers and automated tools. Some common types of architectural code smells include:

  • God Objects: Classes that control too much behavior, leading to high coupling and low cohesion.
  • Spaghetti Code: Code that is tangled and difficult to follow, making it hard to manage and maintain.
  • Feature Envy: Situations where one class is overly interested in another’s data or functionality, indicating a potential design flaw.

Addressing these smells is critical for creating maintainable and scalable software systems, but the challenge lies in their complexity.
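To make one of these smells concrete, here is a minimal, hypothetical Python illustration of Feature Envy: a method on one class that operates almost entirely on another class's data, and the refactoring that moves the logic to where the data lives. The class and field names (`Order`, `Invoice`, `items`) are invented for illustration and are not taken from the paper.

```python
class Order:
    def __init__(self, items):
        self.items = items  # list of (price, quantity) pairs


class Invoice:
    def total(self, order):
        # Feature Envy: this method only touches Order's data,
        # so the computation arguably belongs on Order itself.
        return sum(price * qty for price, qty in order.items)


# A repair that resolves the smell moves the logic next to its data:
class OrderFixed:
    def __init__(self, items):
        self.items = items

    def total(self):
        # Cohesive: the class that owns the data computes the total.
        return sum(price * qty for price, qty in self.items)
```

Repairing this single smell is easy because it is localized; the architectural smells SmellBench targets typically span many modules, which is what makes them hard for agents.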

The Role of LLM Agents

Large language model agents have demonstrated remarkable capabilities in code-level tasks, particularly in bug fixing and localized refactoring. However, their potential for repairing architectural code smells remains an underexplored area. SmellBench sets out to fill this gap by providing a structured evaluation of various agent configurations from four prominent model families: GPT, Claude, Gemini, and Mistral.

Task Orchestration Framework

At the heart of SmellBench is its task orchestration framework. This framework incorporates smell-type-specific optimized prompts, which help guide the LLM agents in their attempts to repair detected smells. Additionally, the framework supports iterative multi-step execution, allowing agents to refine their approaches based on outcomes.
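The two ideas in that description, smell-type-specific prompts and iterative multi-step execution, can be sketched as a short loop. This is a hedged illustration of the general pattern, not SmellBench's actual implementation: `call_agent` stands in for any LLM API, `detect` for re-running a smell detector such as PyExamine, and the prompt texts are invented.

```python
# Hypothetical prompt templates keyed by smell type (illustrative only).
PROMPTS = {
    "god_object": "Split the responsibilities of {target} into cohesive classes.",
    "feature_envy": "Move the logic in {target} closer to the data it uses.",
}


def repair_smell(smell_type, target, call_agent, detect, max_steps=3):
    """Iteratively ask the agent to repair `target` until the smell clears.

    Returns (patch, steps_used) on success, or (None, max_steps) on failure.
    """
    for step in range(max_steps):
        patch = call_agent(PROMPTS[smell_type].format(target=target))
        if not detect(patch):        # re-run smell detection on the result
            return patch, step + 1   # repaired after step + 1 attempts
        target = patch               # refine from the latest attempt
    return None, max_steps           # gave up: smell still present
```

The key design choice the loop captures is that each retry feeds the previous attempt back in, so the agent refines its approach based on outcomes rather than starting from scratch.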

Evaluation Methodology

The evaluation methodologies employed by SmellBench are comprehensive. They include a scoring system that measures:

  • Repair Effectiveness: How well the agents manage to fix the identified architectural smells.
  • False Positive Identification: The ability of agents to discern between actual smells and those erroneously flagged.
  • Net Codebase Impact: The broader effects of the repairs on the overall codebase quality.

By using these criteria, SmellBench can paint a more nuanced picture of LLM agent performance in relation to architectural code smell repair.
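The three criteria can be expressed as simple ratio and difference metrics. The helpers below are a sketch under assumptions; the function and field names are illustrative, not SmellBench's actual schema.

```python
def repair_effectiveness(resolved, true_smells):
    """Fraction of genuine (expert-confirmed) smells the agent fixed."""
    return resolved / true_smells if true_smells else 0.0


def false_positive_rate(flagged_fp, detected):
    """Share of detected smells that expert review judged spurious."""
    return flagged_fp / detected if detected else 0.0


def net_codebase_impact(smells_removed, smells_introduced):
    """Positive means the codebase got cleaner overall; negative means
    aggressive repairs introduced more smells than they removed."""
    return smells_removed - smells_introduced
```

For instance, 41 spurious detections out of 65 gives a false-positive rate of about 63.1%, consistent with the figure reported below, and an agent that removes 10 smells while introducing 140 has a strongly negative net impact.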

Empirical Findings

The empirical evaluation conducted on 11 agent configurations revealed some enlightening insights into the current capabilities of LLM agents. The study focused on 65 hard-severity architectural smells detected by PyExamine in the widely used Python project, scikit-learn, and compared the results with expert judgments for validation.

Notably, the expert validation process indicated that a staggering 63.1% of detected smells were false positives. Despite this high false-positive rate, the best-performing LLM agent achieved a commendable 47.7% resolution rate for actual architectural code smells. This shows that while LLMs are making strides, there remains a critical need for development in their architectural understanding.

Moreover, an intriguing relationship was uncovered between repair aggressiveness and net codebase quality. While some agents exhibited high repair rates, they inadvertently introduced up to 140 new smells—a clear indicator that aggressive repairs do not always lead to improved quality.

Implications for Automated Software Engineering

The findings from SmellBench underscore a significant gap between the current capabilities of LLMs in performing localized code transformations and the architectural awareness essential for effective cross-module refactoring. As developers increasingly rely on automated tools to maintain code, understanding these limitations becomes crucial for informed decision-making.

Beyond individual agent performance, SmellBench is positioned to serve as a reusable infrastructure that tracks progress in this critical yet underexplored domain of automated software engineering. By focusing on architectural code smells, it opens avenues for further research and development aimed at enhancing LLM capabilities.

This framework not only aims to improve LLM behavior but also enriches the discussions around automated software engineering practices, helping to shape the future of code maintenance and quality assurance in tech development.

To explore the comprehensive findings and methodologies, researchers and developers can access the paper, “SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair,” available in PDF format from the authors.
