Guide Labs, a San Francisco-based startup, has announced the open-sourcing of Steerling-8B, an 8-billion-parameter large language model. The model is designed to trace every token generated back to its specific origins within the training data.
The development of Steerling-8B stems from research Julius Adebayo began during his PhD at MIT. In 2018, he co-authored a widely cited paper demonstrating that existing methods for understanding deep learning models were unreliable. This foundational work led to a new methodology for building LLMs that engineers interpretability directly into the model’s structure. Rather than applying post-hoc analysis, Guide Labs inserts a concept layer that buckets data into traceable categories. While this approach requires more up-front data annotation assisted by other AI models, it establishes a transparent framework from the ground up.
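Guide Labs has not published the internals of this concept layer, but the idea of routing a model's hidden state through named, human-readable concept scores can be sketched as follows. Everything here, including the concept names, dimensions, and sigmoid scoring, is an illustrative assumption, not the company's actual architecture.

```python
import numpy as np

# Hypothetical sketch: a "concept layer" projects a hidden activation
# onto a small set of named concepts before the output head reads it,
# so every downstream prediction can be traced to inspectable scores.
CONCEPTS = ["finance", "medicine", "violence", "quantum_computing"]

HIDDEN_DIM = 16
rng = np.random.default_rng(0)

# Stand-in for a learned projection from hidden space to concept space.
W_concept = rng.normal(size=(HIDDEN_DIM, len(CONCEPTS)))

def concept_scores(hidden_state: np.ndarray) -> dict:
    """Map a hidden state to a named, inspectable score per concept."""
    logits = hidden_state @ W_concept
    probs = 1.0 / (1.0 + np.exp(-logits))  # squash to [0, 1]
    return dict(zip(CONCEPTS, probs))

h = rng.normal(size=HIDDEN_DIM)
scores = concept_scores(h)
```

Because the output head would read only from these named scores, a developer can inspect which concepts were active for any given token rather than reverse-engineering opaque activations after the fact.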
Despite the structured architecture, Steerling-8B retains emergent behaviors. The team tracks what they term “discovered concepts,” which the model identifies independently during training. Adebayo cited quantum computing as an example of a concept the model found on its own, illustrating that the system does not rely solely on pre-labeled data categories.
Adebayo addressed the complexities of controlling model behavior, specifically regarding sensitive attributes like gender. “If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things that I’ve encoded, and then you have to be able to reliably turn that on, turn them off,” Adebayo told TechCrunch. He noted that while current models allow for some control, it remains fragile, characterizing the reliable management of these encodings as “one of the holy grail questions” in the field.
The company identifies several practical applications for Steerling-8B's interpretability. In consumer-facing products, the architecture lets developers block copyrighted material or control outputs on sensitive subjects such as violence or drug abuse. In regulated industries, particularly finance, the model supports compliance in tasks like loan evaluation, where it can be instructed to weigh financial records while explicitly ignoring race. Guide Labs has also developed technology for scientific research, addressing the need to understand why deep learning models produce specific results, such as in protein folding simulations.
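One way such a concept-level intervention could work, assuming a concept-bottleneck-style design in which the output head reads from named concept activations, is to forcibly zero a protected concept so it cannot influence the decision. This is a hedged sketch under that assumption; the concept names, weights, and `decide` helper are invented for illustration and are not Guide Labs' API.

```python
import numpy as np

# Hypothetical sketch: if a loan-scoring head reads only from named
# concept activations, a protected concept ("race") can be zeroed at
# inference time so the head cannot see it at all.
CONCEPTS = ["income", "credit_history", "debt_ratio", "race"]
BLOCKED = {"race"}

rng = np.random.default_rng(1)
w_out = rng.normal(size=len(CONCEPTS))  # stand-in output head weights

def decide(concept_acts: np.ndarray, blocked: set) -> float:
    """Zero any blocked concepts, then score from the remaining ones."""
    acts = concept_acts.copy()
    for i, name in enumerate(CONCEPTS):
        if name in blocked:
            acts[i] = 0.0  # this concept can no longer affect the score
    return float(acts @ w_out)

acts = rng.normal(size=len(CONCEPTS))
score_masked = decide(acts, BLOCKED)   # race excluded from the decision
score_full = decide(acts, set())       # nothing blocked, for comparison
```

The appeal of this design is that the exclusion is structural: rather than hoping a post-hoc audit finds every place a protected attribute leaked in, the attribute's only pathway to the output is a named activation that can be switched off.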
Performance benchmarks indicate that Steerling-8B achieves 90% of the capability of existing, non-interpretable models while utilizing less training data. Adebayo argues that this efficiency demonstrates a shift from theoretical science to practical engineering. “This model demonstrates that training interpretable models is no longer a sort of science; it’s now an engineering problem,” Adebayo said. “We figured out the science and we can scale them, and there is no reason why this kind of model wouldn’t match the performance of the frontier level models.”
Guide Labs originated from Y Combinator and secured a $9 million seed round from Initialized Capital in November 2024. The company’s roadmap includes building a larger model and providing API and agentic access to users. Adebayo emphasized the importance of democratizing interpretability as AI systems grow more powerful. “The way we’re currently training models is super primitive, and so democratizing inherent interpretability is actually going to be a long-term good thing for our role within the human race,” Adebayo said. “As we’re going after these models that are going to be super intelligent, you don’t want something to be making decisions on your behalf that’s sort of mysterious to you.”