Extracting 100K concepts from an 8B LLM

(guidelabs.ai)

2 points | by adebayoj 11 hours ago

1 comment

adebayoj 11 hours ago
Hey HN, we recently released Steerling-8B, an 8B model designed to be interpretable from the ground up. The model has ~100K concept slots that it fills on its own during training, and we can read off what each one means by projecting it into vocabulary space.
The model figured out things like British vs. American spelling, second-person pronouns across 6+ languages, and even broken Unicode.
Take a look, and let us know what you think.
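
For anyone unfamiliar with the idea, "projecting into vocabulary space" can be sketched roughly like this: score a concept vector against each token's output (unembedding) row and look at the top-scoring tokens. This is a toy illustration under assumed names and made-up numbers (`top_tokens`, `VOCAB`, `UNEMBED`, `BRITISH_SPELLING` are all hypothetical), not the actual Steerling-8B code.

```python
# Toy illustration only: every name and number here is hypothetical,
# not taken from Steerling-8B.

def top_tokens(concept, unembed, vocab, k=3):
    """Return the k tokens whose unembedding rows score highest against `concept`."""
    # Dot product of the concept vector with each token's unembedding row.
    scores = [sum(c * w for c, w in zip(concept, row)) for row in unembed]
    # Rank token indices by score, highest first.
    ranked = sorted(range(len(vocab)), key=lambda i: scores[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]

VOCAB = ["colour", "color", "favourite", "favorite", "the"]

# One row per token, in a made-up 3-dimensional hidden space.
UNEMBED = [
    [0.9, 0.1, 0.0],   # colour
    [-0.8, 0.1, 0.0],  # color
    [0.7, 0.0, 0.2],   # favourite
    [-0.7, 0.0, 0.2],  # favorite
    [0.0, 0.5, 0.5],   # the
]

# A made-up concept direction that happens to align with British spellings.
BRITISH_SPELLING = [1.0, 0.0, 0.0]

print(top_tokens(BRITISH_SPELLING, UNEMBED, VOCAB, k=2))
```

In this toy setup the top tokens for the concept are the British spellings, which is the sense in which a concept slot can be "read off" from the vocabulary.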