Benchmark#

Constructor#

Benchmark(model, tokenizer[, task_name, ...])

Generic interface to compute multiple explanations.
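
A minimal construction sketch, assuming a Hugging Face sequence-classification model (the checkpoint name below is illustrative):

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    from ferret import Benchmark

    # Any sequence-classification checkpoint works; this one is illustrative.
    name = "distilbert-base-uncased-finetuned-sst-2-english"
    model = AutoModelForSequenceClassification.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)

    bench = Benchmark(model, tokenizer)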

Explaining#

Benchmark.explain(text[, target, ...])

Compute explanations using all the explainers stored in the class.
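
A usage sketch, continuing with the `bench` instance from the constructor example (the input text and target class are illustrative):

    # Returns one explanation per registered explainer for the given target class.
    explanations = bench.explain("You look stunning!", target=1)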

Benchmarking Explanations#

Benchmark.evaluate_explanation(explanation)

Evaluate an explanation using all the evaluators stored in the class.
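
For example, assuming the `explanations` list from Benchmark.explain above (passing `target` here mirrors the explain call and is an assumption):

    # Score a single explanation with every registered evaluator.
    evaluation = bench.evaluate_explanation(explanations[0], target=1)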

Benchmark.evaluate_explanations(explanations)

Evaluate explanations using all the evaluators stored in the class.
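
The batch counterpart, under the same assumptions:

    # Evaluate every explanation in one call.
    evaluations = bench.evaluate_explanations(explanations, target=1)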

Visualization#

Benchmark.show_table(explanations[, ...])

Format explanation scores into a colored table.

Benchmark.show_evaluation_table(...[, style])

Format evaluation scores into a colored table.

Benchmark.show_samples_evaluation_table(...)

Format average evaluation scores into a colored table.
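
A visualization sketch, assuming the `explanations` and `evaluations` objects built in the examples above (the tables render in a Jupyter-style display):

    bench.show_table(explanations)            # per-token attribution scores
    bench.show_evaluation_table(evaluations)  # evaluation scores for each explainer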

Datasets Interface#

Benchmark.load_dataset(dataset_name, **kwargs)

Load a supported dataset by name.

Benchmark.evaluate_samples(dataset, sample)

Explain a dataset sample, evaluate explanations, and compute average scores.
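
A sketch of the dataset workflow ("hatexplain" is one of the bundled datasets; the sample index is illustrative):

    dataset = bench.load_dataset("hatexplain")

    # Explain and evaluate the chosen sample, then average the scores.
    sample_evaluations = bench.evaluate_samples(dataset, sample=32)
    bench.show_samples_evaluation_table(sample_evaluations)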

Inference#

Benchmark.score(text[, return_dict])

Compute prediction scores for a single query.
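
For example (the input text is illustrative, and keying the result by class label via `return_dict` is an assumption about its behavior):

    # Class scores for one input; with return_dict=True the result is keyed by label.
    scores = bench.score("You look stunning!", return_dict=True)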