MIT researchers developed a new method to improve concept bottleneck models (CBMs) for AI explainability in computer vision. Instead of relying on human-defined or LLM-generated concepts, the technique uses a sparse autoencoder to extract concepts the model already learned during training, then translates them into plain language using a multimodal LLM. The approach converts any pretrained computer vision model into one that explains its predictions using up to five human-understandable concepts. Tested on bird species classification and skin lesion identification, the method outperformed existing CBMs in both accuracy and explanation quality, though a gap remains compared to non-interpretable black-box models.
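To make the bottleneck idea concrete, here is a minimal sketch, not the researchers' implementation. It assumes a ReLU encoder stands in for the sparse autoencoder, that at most `k=5` concept activations are kept per input, and that a linear layer maps those concepts to classes. All names (`encode`, `top_k`, `predict`, `concept_i`), the random weights, and the dimensions are hypothetical; in the described method the concept names would come from a multimodal LLM rather than placeholders.

```python
import random

def encode(features, W_enc, b_enc):
    """ReLU encoder: one non-negative activation per candidate concept."""
    return [max(0.0, sum(f * w for f, w in zip(features, row)) + b)
            for row, b in zip(W_enc, b_enc)]

def top_k(acts, k=5):
    """Keep at most k strictly positive activations, indexed by concept id."""
    order = sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)[:k]
    return {i: acts[i] for i in order if acts[i] > 0.0}

def predict(features, W_enc, b_enc, W_cls, names, k=5):
    """Classify using only the kept concepts and report each one's contribution."""
    concepts = top_k(encode(features, W_enc, b_enc), k)
    n_classes = len(W_cls[0])
    logits = [sum(a * W_cls[i][c] for i, a in concepts.items())
              for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: logits[c])
    # Contribution of a concept = its activation times its weight for the chosen class.
    explanation = sorted(((names[i], a * W_cls[i][label]) for i, a in concepts.items()),
                         key=lambda pair: pair[1], reverse=True)
    return label, logits, explanation

# Hypothetical sizes: 16 backbone features, 32 candidate concepts, 3 classes.
random.seed(0)
d, m, n_cls = 16, 32, 3
W_enc = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
b_enc = [0.0] * m
W_cls = [[random.gauss(0, 1) for _ in range(n_cls)] for _ in range(m)]
names = [f"concept_{i}" for i in range(m)]
x = [random.gauss(0, 1) for _ in range(d)]

label, logits, explanation = predict(x, W_enc, b_enc, W_cls, names)
for name, contribution in explanation:
    print(f"{name}: {contribution:+.3f}")
```

Because the class score is a sum over at most five concept contributions, the printed list accounts for the whole prediction, which is the property that makes a bottleneck of this kind readable.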

6 min read · From news.mit.edu