Measuring Intelligence Summit: A Deep Dive into AI Evaluation
The upcoming Measuring Intelligence Summit on October 21 in San Francisco is set to draw AI enthusiasts and experts alike. Co-located with the PyTorch Conference 2025, the event focuses on one of the most pressing questions in artificial intelligence: how can we effectively measure intelligence in both foundation models and agentic systems?
The Need for Evolving Evaluation Methods
As AI systems evolve rapidly, so must our methods for evaluating them. The summit is not just another conference; it is a focused half-day gathering aimed at refining the metrics we use to assess AI capabilities. With increasingly sophisticated models entering the market, traditional evaluation methods often fall short. Key topics on the agenda include evaluating reasoning models, the open questions surrounding superintelligence, and the continuing evolution of AI benchmarks.
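To ground what "evaluation" means in practice: most benchmark evaluation ultimately reduces to scoring model outputs against reference answers. The snippet below is a deliberately minimal sketch of an exact-match accuracy harness in Python; the toy benchmark and the `stub_model` stand-in are hypothetical, and real harnesses such as HELM add prompting, answer normalization, sampling, and many richer metrics.

```python
# Minimal benchmark-scoring sketch: exact-match accuracy.
# Hypothetical placeholders throughout; real evaluation harnesses
# add prompting, normalization, sampling, and many more metrics.

from typing import Callable

def exact_match_accuracy(items: list[tuple[str, str]],
                         model_answer: Callable[[str], str]) -> float:
    """Fraction of questions where the model's answer matches the reference."""
    correct = sum(model_answer(q).strip() == ref.strip() for q, ref in items)
    return correct / len(items)

# Toy benchmark and a stand-in "model" for demonstration only.
benchmark = [("2 + 2 = ?", "4"), ("Capital of France?", "Paris")]
stub_model = lambda q: "4" if "2 + 2" in q else "Paris"

print(f"accuracy = {exact_match_accuracy(benchmark, stub_model):.2f}")  # 1.00
```

Much of the summit's agenda concerns exactly where this simple recipe breaks down: reasoning traces, open-ended generation, and agentic behavior resist single-number exact-match scoring.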
Attendees will gain first-hand insight into cutting-edge evaluation techniques, explore open challenges, and take part in discussions that will shape the future of AI assessment. It is an opportunity for professionals invested in AI's trajectory to rethink and improve the ways we measure progress in this critical field.
Top 3 Reasons to Attend
1. Engage with Leading Voices in AI Evaluation
The summit will feature not only researchers but also industry practitioners from organizations such as OpenAI, Stanford, and Meta. Participants can expect deep dives into the latest methodologies for evaluating reasoning, intelligence, and agentic behavior in complex AI systems. Engaging with these leading voices offers insights that can guide future research and development in AI evaluation.
2. Be Part of Shaping Future Benchmarks
As the discussions unfold, attendees will engage in critical debate about the effectiveness of existing benchmarks. The question looms: do these benchmarks truly capture intelligence, or are we measuring something more superficial? The insights gained here will offer a fresh perspective on how benchmarks can evolve to reflect genuine AI capabilities. Be part of a conversation that could redefine standards and practices for AI evaluation across industries.
3. Connect with Leaders Driving Innovation
With a focus on building networks that go beyond the event, the Measuring Intelligence Summit offers an unparalleled platform for connecting with experts who are at the forefront of AI research and application. These connections can lead to collaborative projects, partnerships, and new ideas, enriching the community and fostering innovation.
Program Highlights
Keynotes
- Framing the Frontier of Machine Intelligence – Joe Spisak, Meta
- A Conversation on SOTA in Reasoning, Planning, and Inference Time Scaling – Noam Brown, OpenAI with Joe Spisak, Meta
These keynote sessions are designed to set the tone for the summit, framing discussions around what’s currently at the cutting edge of machine intelligence evaluation.
Sessions
The summit will also host a series of targeted sessions:
- Weaver: Shrinking the Generation-Verification Gap with Weak Verifiers – Jon Saad-Falcon, Stanford
- Holistic Evaluation of Language Models (HELM) – Yifan Mai, Stanford University
- Scaling Agentic Intelligence from Pre-Training to RL – Aakanksha Chowdhery, Reflection AI & Stanford University
- LMArena: The Reliability Standard for AI – Anastasios Angelopoulos, LMArena
These sessions are structured to cover a spectrum of topics, each addressing a different aspect of AI evaluation (for a flavor of the pairwise-rating approach behind leaderboards like LMArena, see the sketch below).
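To make the leaderboard idea concrete, here is a minimal, illustrative Elo-style rating update over pairwise votes, written in Python. This is not LMArena's actual implementation (arena-style leaderboards fit statistical models such as Bradley-Terry over the full vote set); the battle log, model names, and K-factor below are hypothetical.

```python
# Minimal Elo-style rating sketch for pairwise model comparisons.
# Illustrative only: production leaderboards fit statistical models
# (e.g. Bradley-Terry) over all votes; this is the simple online variant.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool,
           k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one head-to-head vote."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# Hypothetical battle log: (model_a, model_b, did_a_win).
battles = [("model-x", "model-y", True), ("model-y", "model-x", True),
           ("model-x", "model-y", True)]

ratings = {"model-x": 1000.0, "model-y": 1000.0}
for a, b, a_won in battles:
    ratings[a], ratings[b] = update(ratings[a], ratings[b], a_won)

for name, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.1f}")
```

Because each vote only nudges the ratings, rankings stabilize only after many diverse battles, which is one reason reliability is a natural theme for this session.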
Panels
The summit will also feature engaging panel discussions, including:
Are We Measuring Intelligence or Just Benchmarks?
- Participants: Sara Hooker, Vivienne Zhang (NVIDIA), Baber Abbasi (EleutherAI), Nathan Habib (Hugging Face), Carlos Jimenez (Princeton University / SWE-bench)
This panel poses critical questions that explore the distinctions between genuine intelligence and mere benchmark performance.
Beyond the Leaderboard: Practical Intelligence in the Wild
- Participants: Shishir Patil (Meta), Haifeng Xu (Prophet Arena / University of Chicago), Tatiana Shavrina (Meta), Lisa Dunlap (UC Berkeley / LMSYS), Rebecca Qian (Patronus AI)
This discussion aims to bridge the gap between theoretical intelligence and practical applications, focusing on real-world scenarios involving AI.
Registration Information
To secure your spot, attendees should register by adding the Measuring Intelligence Summit to their PyTorch Conference registration. This is a unique opportunity to gain insights that could redefine how we evaluate artificial intelligence moving forward.
The Measuring Intelligence Summit serves as a vital touchpoint for professionals involved in AI evaluation and development. Attendees can look forward to a half day filled with thought-provoking discussions, innovative ideas, and the opportunity to help shape the future of the AI landscape.