


ferret is a Python package for benchmarking interpretability techniques.

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from ferret import Benchmark

name = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

bench = Benchmark(model, tokenizer)
explanations = bench.explain("You look stunning!", target=1)
evaluations = bench.evaluate_explanations(explanations, target=1)



ferret offers a painless integration with Hugging Face models and naming conventions. If you are already using the transformers library, you immediately get access to our Explanation and Evaluation API.

Supported Post-hoc Explainers

Supported Evaluation Metrics

Faithfulness measures:

Plausibility measures:

See our paper for details.
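To make the idea of a faithfulness measure concrete, here is a minimal, self-contained sketch of one common formulation from the literature: the drop in the target-class probability after deleting the top-scored tokens (often called comprehensiveness). It is an illustration only and not ferret's implementation; `comprehensiveness`, `toy_predict_proba`, and the word list are hypothetical names introduced for this example.

```python
from typing import Callable, Sequence


def comprehensiveness(
    tokens: Sequence[str],
    scores: Sequence[float],
    predict_proba: Callable[[Sequence[str]], float],
    k: int,
) -> float:
    """Drop in target-class probability after removing the k top-scored tokens."""
    # Indices of the k tokens with the highest attribution scores.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    top_k = set(ranked[:k])
    # Input with those tokens deleted.
    reduced = [tok for i, tok in enumerate(tokens) if i not in top_k]
    return predict_proba(tokens) - predict_proba(reduced)


# Hypothetical stand-in for a classifier (assumption, not a real model):
# the "positive" probability grows with the number of positive words.
POSITIVE_WORDS = {"stunning", "great"}


def toy_predict_proba(tokens: Sequence[str]) -> float:
    hits = sum(tok.lower().strip("!") in POSITIVE_WORDS for tok in tokens)
    return min(1.0, 0.5 + 0.4 * hits)


tokens = ["You", "look", "stunning!"]
faithful_scores = [0.0, 0.1, 0.9]    # highlights the word the toy model relies on
unfaithful_scores = [0.9, 0.1, 0.0]  # highlights a word the toy model ignores

comp_faithful = comprehensiveness(tokens, faithful_scores, toy_predict_proba, k=1)
comp_unfaithful = comprehensiveness(tokens, unfaithful_scores, toy_predict_proba, k=1)
print(comp_faithful, comp_unfaithful)
```

Under these assumptions, the explanation that highlights the token the toy model actually depends on yields the larger probability drop, which is the behaviour a faithfulness measure is meant to reward.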


The Benchmark class exposes easy-to-use table visualization methods (e.g., for use within Jupyter Notebooks):

bench = Benchmark(model, tokenizer)

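# Compute feature attribution scores with all supported explainers, then pretty-print them
explanations = bench.explain("You look stunning!", target=1)
bench.show_table(explanations)

# Compute all the supported evaluation metrics, then pretty-print them
evaluations = bench.evaluate_explanations(explanations, target=1)
bench.show_evaluation_table(evaluations)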

Dataset Evaluations

The Benchmark class has a handy method to compute and average our evaluation metrics across multiple samples from a dataset.

import numpy as np
bench = Benchmark(model, tokenizer)

# Compute and average evaluation scores on one of the supported datasets
samples = np.arange(20)  # indices of the first 20 samples
hatexdata = bench.load_dataset("hatexplain")
sample_evaluations = bench.evaluate_samples(hatexdata, samples)

# Pretty-print the results
bench.show_samples_evaluation_table(sample_evaluations)
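
As a rough illustration of what "compute and average across multiple samples" means, the sketch below averages per-sample metric scores with NumPy. It does not reflect ferret's internal data structures: the `per_sample_scores` layout, the metric names, and the `average_scores` helper are hypothetical and exist only for this example.

```python
import numpy as np

# Hypothetical per-sample scores: one dict per sample, mapping a metric name to its value.
per_sample_scores = [
    {"metric_a": 0.40, "metric_b": 0.10},
    {"metric_a": 0.20, "metric_b": 0.30},
    {"metric_a": 0.60, "metric_b": 0.20},
]


def average_scores(rows):
    """Return the mean of every metric across all samples."""
    metric_names = rows[0].keys()
    return {name: float(np.mean([row[name] for row in rows])) for name in metric_names}


averaged = average_scores(per_sample_scores)
print(averaged)
```

Averaging over a set of samples gives a more stable picture of an explainer than a single sentence can, at the cost of hiding per-sample variation.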


This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Logo and graphical assets made by Luca Attanasio.