Abstract

This paper addresses the challenge of providing understandable explanations for machine learning classification decisions. To this end, we introduce a dataset of expert-written textual explanations paired with numerical explanations, framing the problem as a data-to-text generation task. We fine-tune BART and T5 language models on this dataset to generate natural language explanations by linearizing the information represented by explainable output graphs. We find that the models produce fluent and largely accurate textual explanations. Experiments with various configurations show that an augmented dataset reduces the error rate. Additionally, we probe the numerical explanations more directly by fine-tuning BART and T5 on a question-answering task, achieving 91% accuracy with T5.