09/12/2023
Explainable AI (XAI) gives ML practitioners tools to understand what their network is doing. Typically, this is done at the level of the whole architecture: XAI tools show which input features a model relies on, such as sets of superpixels. Doing the same inside the model, i.e. pulling out features from sublayers, is a much more daunting task. Researchers at Brown University have put together an XAI toolkit to do just this. It is a Python-based library that currently supports only transformer architectures, but it is an important step toward high-quality XAI toolkits.
Find their work at: https://arxiv.org/abs/2309.00244
Despite recent advances in the field of explainability, much remains unknown about the algorithms that neural networks learn to represent. Recent work has attempted to understand trained models by decomposing them into functional circuits (Csordás et al., 2020; Lepori et al., 2023). To advance this...