Why SAEs Trained on Identical Data Sets Can Discover Different Features

aimodelkit
Last updated: April 13, 2025 8:45 am
Understanding the Impact of Random Initializations on TopK Sparse Autoencoders

In machine learning, and particularly in neural network training, initialization matters. This article presents findings from our investigation of TopK Sparse Autoencoders (SAEs): even when two SAEs are trained on identical datasets with the same batch order, differences in random initialization alone can lead to divergent feature representations.

Contents
  • Divergence in Latent Representations
  • Interpretability of Unshared Latents
  • Feature Splitting and Absorption
  • Stability Across Different Architectures
  • Methodology: Measuring Latent Alignment
  • Latent Overlap Across Multiple Models
  • Frequency of Latent Activation
  • The Influence of SAE Size on Feature Overlap
  • Investigating Interpretability of Unique Latents
  • Conclusion

Divergence in Latent Representations

When two TopK SAEs are trained on the same data, in the same batch order, but from different random initializations, only about 53% of the learned features are shared between the two models. This relatively low overlap means that a substantial number of latents in one SAE have no close counterpart in the other, and vice versa. The implication is significant: the features an SAE learns are not fixed or universal, but can vary considerably with initialization.
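For concreteness, here is a minimal sketch of a TopK SAE forward pass in PyTorch. The class name, dimensions, and value of `k` are illustrative assumptions, and real implementations typically add details such as a pre-encoder bias and decoder-column normalization; the only point here is that two copies built from different seeds differ solely in their initial weights.

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    """Sketch of a TopK sparse autoencoder: keep the k largest pre-activations."""

    def __init__(self, d_model: int, n_latents: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model, bias=False)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        pre = self.encoder(x)
        # Zero all but the k largest pre-activations per input.
        topk = torch.topk(pre, self.k, dim=-1)
        return torch.zeros_like(pre).scatter_(-1, topk.indices, topk.values)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encode(x))  # reconstruction of x

# Two models that differ only in their random initialization:
# sae_a, sae_b = TopKSAE(768, 16384, k=32), TopKSAE(768, 16384, k=32)
```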

Interpretability of Unshared Latents

Interestingly, many of the unshared latents are nevertheless interpretable, which raises the question of how different training paths can lead to distinct yet interpretable representations. We also observed that narrower SAEs share more features across random seeds, with overlap diminishing as SAE width grows. This trend aligns with the existing literature on feature splitting and absorption, indicating that the features learned by SAEs can be somewhat arbitrary.

Feature Splitting and Absorption

The behavior of SAEs supports the idea that learned features are not atomic. Instead, different configurations can lead to various interpretations of the same latent features. As the size of the SAEs increases, we also see a phenomenon known as feature absorption, where some latents gain an “implicit” meaning alongside their “explicit” feature interpretation. This duality in representation can allow models to learn disjoint representations even when they are trained on the same data.

Stability Across Different Architectures

Our findings suggest that the architecture of the SAE plays a crucial role in how stable feature learning is across random seeds. Previous studies have indicated that certain architectures, such as ReLU SAEs trained with an L1 penalty, are notably stable across initializations. TopK SAEs, in contrast, appear to benefit from methods that explicitly align different seeds, highlighting the need for careful consideration of architectural choices.

Methodology: Measuring Latent Alignment

To quantify the alignment between independently trained SAEs, we employed the Hungarian algorithm. This method computes a one-to-one matching between latents that maximizes the average cosine similarity of the matched vectors; we compute such matchings separately for encoder and decoder weights. The resulting alignment score provides a clear measure of how similarly the two models interpret the latent space.
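A minimal sketch of this matching step, assuming each SAE exposes its decoder directions as a matrix of shape `(n_latents, d_model)` (for the PyTorch sketch above, `sae.decoder.weight.T.detach().numpy()`); the encoder matching is computed analogously, and the 0.7 "shared" threshold in the usage comment is an illustrative assumption rather than a value from the study:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_latents(dirs_a: np.ndarray, dirs_b: np.ndarray):
    """One-to-one latent matching that maximizes mean cosine similarity."""
    # Unit-normalize so dot products are cosine similarities.
    a = dirs_a / np.linalg.norm(dirs_a, axis=1, keepdims=True)
    b = dirs_b / np.linalg.norm(dirs_b, axis=1, keepdims=True)
    sim = a @ b.T  # (n_latents_a, n_latents_b) similarity matrix
    # The Hungarian algorithm minimizes total cost, so negate to maximize.
    rows, cols = linear_sum_assignment(-sim)
    return cols, sim[rows, cols]  # permutation over B, per-latent similarities

# perm, sims = match_latents(dec_1, dec_2)
# print("mean alignment:", sims.mean(), "fraction shared:", (sims > 0.7).mean())
```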

Upon analyzing the distribution of cosine similarities, we observed that there are two distinct modes: one reflecting high similarity and another indicating low similarity. This duality suggests that while some latents are closely aligned, others diverge significantly. In cases where the encoder and decoder matchings disagree, the cosine similarity tends to be lower, reinforcing the complexity of the latent space.

Latent Overlap Across Multiple Models

Further exploration revealed that introducing a third SAE, trained with yet another random seed, reduced the fraction of shared latents from 47% to 35%. In other words, most latents shared by the first two models also persist in the third, showcasing an interesting dynamic of latent retention across seeds.
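One way to express the three-model comparison, assuming matched-similarity vectors such as those returned by the `match_latents` sketch above; the 0.7 threshold is again an illustrative assumption:

```python
import numpy as np

def shared_mask(matched_sims: np.ndarray, threshold: float = 0.7) -> np.ndarray:
    """Boolean mask over SAE 1's latents whose match clears the threshold."""
    return matched_sims > threshold

# shared_12 = shared_mask(sims_12)   # SAE 1 matched against SAE 2
# shared_13 = shared_mask(sims_13)   # SAE 1 matched against SAE 3
# print("pairwise overlap:", shared_12.mean())
# print("three-way overlap:", (shared_12 & shared_13).mean())
```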

Frequency of Latent Activation

An important aspect of our investigation was examining the frequency of latent activation across models. We found that the latents most frequently activated in SAE 1 were also those shared with SAE 2 and SAE 3. Conversely, the latents that activated infrequently in SAE 1 were those unique to that model. Intriguingly, some latents exclusive to SAE 1 exhibited a higher average firing rate than those present across all models, hinting at a complex relationship between activation frequency and latent representation.
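A sketch of how firing rates can be compared across these groups, assuming a matrix of post-TopK latent activations collected over a batch of inputs and the shared masks from the sketch above (all variable names are illustrative):

```python
import numpy as np

def firing_rates(latent_acts: np.ndarray) -> np.ndarray:
    """Fraction of inputs on which each latent fires.

    latent_acts: (n_inputs, n_latents) post-TopK activations, zero where inactive.
    """
    return (latent_acts != 0).mean(axis=0)

# rates = firing_rates(acts_sae1)
# shared_all = shared_12 & shared_13          # latents shared across all models
# unique = ~shared_12 & ~shared_13            # latents unique to SAE 1
# print("shared latents fire at", rates[shared_all].mean())
# print("unique latents fire at", rates[unique].mean())
```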

The Influence of SAE Size on Feature Overlap

Our research also underscores a clear relationship between the size of the SAE and the fraction of unshared latents: even under a more lenient definition of shared features, larger SAEs retain more unique features. Analyzing these larger models is computationally demanding, which further complicates exploration of this relationship.

Investigating Interpretability of Unique Latents

To examine the interpretability of unshared latents more closely, we used an auto-interp approach to evaluate over 7,000 latents from two SAEs with 32,768 latents each. The average interpretability score was a promising 0.72, and a large share of explanations were reasonably clear. However, latents with low interpretability scores often also had low similarity across seeds, suggesting that latents unique to a specific initialization may not lend themselves to clear interpretation.

Conclusion

The exploration of TopK SAEs trained from different random initializations reveals latent representations that diverge significantly based on initial conditions. Our results challenge the notion of a universal set of features and suggest that feature discovery should be viewed as a compositional problem. As we continue to investigate these phenomena, we expect further insight into the interplay between architecture, initialization, and feature representation in neural networks.

By understanding these dynamics, we can better harness the capabilities of SAEs and other machine learning models, paving the way for more robust and interpretable AI systems.
