Measuring Intelligence Summit: A Deep Dive into AI Evaluation
The upcoming Measuring Intelligence Summit on October 21 in San Francisco is set to draw AI enthusiasts and experts alike. Co-located with the PyTorch Conference 2025, the event focuses on one of the most pressing questions in artificial intelligence: how can we effectively measure intelligence in both foundation models and agentic systems?
The Need for Evolving Evaluation Methods
As AI systems evolve rapidly, so must our methods for evaluating them. The summit is not just another conference; it is a focused half-day gathering aimed at refining the metrics we use to assess AI capabilities. With increasingly sophisticated models entering the market, traditional evaluation methods often fall short. Key topics on the agenda include evaluating reasoning models, the open questions surrounding superintelligence, and the continuing evolution of AI benchmarks.
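To ground what "evaluation" means in practice: most benchmark evaluation ultimately reduces to scoring model outputs against reference answers. The snippet below is a deliberately minimal sketch of an exact-match accuracy harness in Python; the toy benchmark and the `stub_model` stand-in are hypothetical, and real harnesses such as HELM add prompting, answer normalization, sampling, and many richer metrics.

```python
# Minimal benchmark-scoring sketch: exact-match accuracy.
# Hypothetical placeholders throughout; real evaluation harnesses
# add prompting, normalization, sampling, and many more metrics.

from typing import Callable

def exact_match_accuracy(items: list[tuple[str, str]],
                         model_answer: Callable[[str], str]) -> float:
    """Fraction of questions where the model's answer matches the reference."""
    correct = sum(model_answer(q).strip() == ref.strip() for q, ref in items)
    return correct / len(items)

# Toy benchmark and a stand-in "model" for demonstration only.
benchmark = [("2 + 2 = ?", "4"), ("Capital of France?", "Paris")]
stub_model = lambda q: "4" if "2 + 2" in q else "Paris"

print(f"accuracy = {exact_match_accuracy(benchmark, stub_model):.2f}")  # 1.00
```

Much of the summit's agenda concerns exactly where this simple recipe breaks down: reasoning traces, open-ended generation, and agentic behavior resist single-number exact-match scoring.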
Attendees will gain first-hand insight into cutting-edge evaluation techniques, explore open challenges, and take part in discussions that will shape the future of AI assessment. It is an opportunity for professionals invested in AI's trajectory to rethink and improve the ways we measure progress in this critical field.
Top 3 Reasons to Attend
1. Engage with Leading Voices in AI Evaluation
The summit will feature not only researchers but also industry practitioners from organizations such as OpenAI, Stanford, and Meta. Participants can expect deep dives into the latest methodologies for evaluating reasoning, intelligence, and agentic behavior in complex AI systems. Engaging with these leading voices offers insights that can guide future research and development in AI evaluation.
2. Be Part of Shaping Future Benchmarks
As the discussions unfold, attendees will engage in critical debate about the effectiveness of existing benchmarks. The question looms: do these benchmarks truly capture intelligence, or are we measuring something more superficial? The insights gained here will offer a fresh perspective on how benchmarks can evolve to reflect genuine AI capabilities. Be part of a conversation that could redefine standards and practices for AI evaluation across industries.
3. Connect with Leaders Driving Innovation
With a focus on building networks that go beyond the event, the Measuring Intelligence Summit offers an unparalleled platform for connecting with experts who are at the forefront of AI research and application. These connections can lead to collaborative projects, partnerships, and new ideas, enriching the community and fostering innovation.
Program Highlights
Keynotes
- Framing the Frontier of Machine Intelligence – Joe Spisak, Meta
- A Conversation on SOTA in Reasoning, Planning, and Inference Time Scaling – Noam Brown, OpenAI with Joe Spisak, Meta
These keynote sessions are designed to set the tone for the summit, framing discussions around what’s currently at the cutting edge of machine intelligence evaluation.
Sessions
The summit will also host a series of targeted sessions:
- Weaver: Shrinking the Generation-Verification Gap with Weak Verifiers – Jon Saad-Falcon, Stanford
- Holistic Evaluation of Language Models (HELM) – Yifan Mai, Stanford University
- Scaling Agentic Intelligence from Pre-Training to RL – Aakanksha Chowdhery, Reflection AI & Stanford University
- LMArena: The Reliability Standard for AI – Anastasios Angelopoulos, LMArena
These sessions are structured to cover a spectrum of topics, each addressing a different aspect of AI evaluation (for a flavor of the pairwise-rating approach behind leaderboards like LMArena, see the sketch below).
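To make the leaderboard idea concrete, here is a minimal, illustrative Elo-style rating update over pairwise votes, written in Python. This is not LMArena's actual implementation (arena-style leaderboards fit statistical models such as Bradley-Terry over the full vote set); the battle log, model names, and K-factor below are hypothetical.

```python
# Minimal Elo-style rating sketch for pairwise model comparisons.
# Illustrative only: production leaderboards fit statistical models
# (e.g. Bradley-Terry) over all votes; this is the simple online variant.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool,
           k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one head-to-head vote."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# Hypothetical battle log: (model_a, model_b, did_a_win).
battles = [("model-x", "model-y", True), ("model-y", "model-x", True),
           ("model-x", "model-y", True)]

ratings = {"model-x": 1000.0, "model-y": 1000.0}
for a, b, a_won in battles:
    ratings[a], ratings[b] = update(ratings[a], ratings[b], a_won)

for name, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.1f}")
```

Because each vote only nudges the ratings, rankings stabilize only after many diverse battles, which is one reason reliability is a natural theme for this session.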
Panels
The summit will also feature engaging panel discussions, including:
Are We Measuring Intelligence or Just Benchmarks?
- Participants: Sara Hooker, Vivienne Zhang (NVIDIA), Baber Abbasi (EleutherAI), Nathan Habib (Hugging Face), Carlos Jimenez (Princeton University / SWE-bench)
This panel poses critical questions that explore the distinctions between genuine intelligence and mere benchmark performance.
Beyond the Leaderboard: Practical Intelligence in the Wild
- Participants: Shishir Patil (Meta), Haifeng Xu (Prophet Arena / University of Chicago), Tatiana Shavrina (Meta), Lisa Dunlap (UC Berkeley / LMSYS), Rebecca Qian (Patronus AI)
This discussion aims to bridge the gap between theoretical intelligence and practical applications, focusing on real-world scenarios involving AI.
Registration Information
To secure your spot, attendees should register by adding the Measuring Intelligence Summit to their PyTorch Conference registration. This is a unique opportunity to gain insights that could redefine how we evaluate artificial intelligence moving forward.
The Measuring Intelligence Summit serves as a vital touchpoint for professionals involved in AI evaluation and development. Attendees can look forward to a half day filled with thought-provoking discussions, innovative ideas, and the opportunity to help shape the future of the AI landscape.