Depth Score

Module Interface

class torchmetrics.text.depth_score.DepthScore(model_name_or_path=None, num_layers=None, all_layers=False, model=None, user_tokenizer=None, user_forward_fn=None, verbose=False, device=None, max_length=512, batch_size=64, num_threads=0, n_alpha=5, eps=0.3, p=5, depth_measure='irw', truncation=False, multi_ref_reduction='min', **kwargs)[source]

DepthScore metric for evaluating text generation by measuring text similarity.

DepthScore leverages pre-trained contextual token embeddings (e.g., from BERT-like models) and compares candidate and reference sentences by treating their token embeddings as point clouds and computing a depth-based pseudo-metric between the two distributions. This distance is designed to capture distributional mismatches between contextual representations and can be used for evaluating text generation tasks, where a lower distance indicates a better match.
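The notion of statistical depth behind the default "irw" measure can be illustrated with a small, self-contained sketch. The function below is a hypothetical Monte-Carlo approximation of integrated rank-weighted (IRW) depth, not the library's internal implementation: it scores how central each point (token embedding) is within a point cloud by averaging rank-based centrality over random 1D projections.

```python
import torch

def irw_depth(points: torch.Tensor, n_dirs: int = 1000) -> torch.Tensor:
    """Monte-Carlo integrated rank-weighted depth of each row of ``points``.

    A point that lands near the median of many random 1D projections is
    "deep" (central); a point projected to the extremes is shallow.
    """
    n, d = points.shape
    u = torch.randn(n_dirs, d)
    u = u / u.norm(dim=1, keepdim=True)          # random unit directions
    proj = points @ u.T                          # (n, n_dirs) 1D projections
    ranks = proj.argsort(dim=0).argsort(dim=0)   # rank of each point per direction
    cdf = ranks.float() / (n - 1)                # empirical CDF value in [0, 1]
    return torch.minimum(cdf, 1 - cdf).mean(dim=1)  # depth in [0, 0.5]
```

A central point attains depth close to 0.5 and an extreme outlier close to 0; a depth-based pseudo-metric then compares the two embedding clouds through such depth values rather than through raw pairwise distances.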

This implementation follows the original DEPTH_score implementation.

As input to forward and update the metric accepts the following input:

  • preds: Predicted sentence(s). Can be one of:

    • A single predicted sentence as a string (str)

    • A sequence of predicted sentences (Sequence[str])

  • target: Target/reference sentence(s). Can be one of:

    • A single reference sentence as a string (str)

    • A sequence of reference sentences (Sequence[str])

    • A sequence of sequences of reference sentences for multi-reference evaluation (Sequence[Sequence[str]])

As output of forward and compute the metric returns the following output:

  • score (Tensor): A 1D tensor of distances of shape (num_predictions,). For multi-reference input, the output is reduced per original prediction according to multi_ref_reduction.
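For intuition, the multi-reference reduction can be sketched as a small illustrative helper (not the library's internal code) that collapses a hypothetical matrix of per-reference distances to one score per prediction:

```python
import torch

def reduce_multi_ref(distances: torch.Tensor, reduction: str = "min") -> torch.Tensor:
    """Reduce a (num_preds, num_refs) distance matrix to one value per prediction."""
    if reduction == "min":    # best-matching reference (default for a distance)
        return distances.min(dim=1).values
    if reduction == "max":    # worst-matching reference
        return distances.max(dim=1).values
    if reduction == "mean":   # average over all references
        return distances.mean(dim=1)
    raise ValueError(f"unknown reduction: {reduction!r}")
```

With "min", each prediction is scored against its closest reference, which is the natural choice when any one of the references is an acceptable target.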

Parameters:
  • preds (Union[str, Sequence[str]]) – A single predicted sentence or a sequence of predicted sentences.

  • target (Union[str, Sequence[str], Sequence[Sequence[str]]]) – A single target sentence, a sequence of target sentences, or a sequence of sequences of target sentences for multiple references per prediction.

  • model_name_or_path (Optional[str]) – A name or a model path used to load transformers pretrained model.

  • num_layers (Optional[int]) – A layer of representation to use.

  • all_layers (bool) – An indication of whether the representation from all model’s layers should be used. If all_layers=True, the argument num_layers is ignored.

  • model (Optional[Module]) – A user’s own model. Must be an instance of torch.nn.Module.

  • user_tokenizer (Optional[Any]) – A user’s own tokenizer used with the user’s own model. This must be an instance with the __call__ method. This method must take an iterable of sentences (List[str]) and must return a python dictionary containing “input_ids” and “attention_mask” represented by Tensor. It is up to the user’s model whether “input_ids” is a Tensor of input ids or embedding vectors. This tokenizer must prepend an equivalent of the [CLS] token and append an equivalent of the [SEP] token, as a transformers tokenizer does.

  • user_forward_fn (Optional[Callable[[Module, dict[str, Tensor]], Tensor]]) – A user’s own forward function used in combination with the user’s own model. This function must take the model and a python dictionary containing "input_ids" and "attention_mask" represented by Tensor as an input, and must return the model’s output represented by a single Tensor.

  • verbose (bool) – An indication of whether a progress bar should be displayed during the embeddings’ calculation.

  • device (Union[device, str, None]) – A device to be used for calculation.

  • max_length (int) – A maximum length of input sequences. Sequences longer than max_length are trimmed when truncation=True.

  • batch_size (int) – A batch size used for model processing.

  • num_threads (int) – A number of threads to use for a dataloader.

  • n_alpha (int) – The Monte-Carlo parameter for the approximation of the integral over alpha (number of level-set thresholds between eps and 1.0).

  • eps (float) – The lowest level-set bound in [0, 1]. The highest level set is fixed to 1.0 in this implementation.

  • p (int) – The power of the ground cost.

  • depth_measure (str) – Depth / discrepancy measure to use (e.g. "irw" or "ai_irw").

  • truncation (bool) – An indication of whether the input sequences should be truncated to the max_length.

  • multi_ref_reduction (str) – Reduction to apply across multiple references per prediction. Default "min" (best match) since this is a distance metric. Options: "min", "max", "mean".

  • kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.
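The user_tokenizer and user_forward_fn contract described above can be sketched with a toy setup. Everything below (ToyTokenizer, ToyModel, the special-token ids 1 and 2) is hypothetical and only illustrates the required shapes and return types, not a real tokenizer or encoder:

```python
import torch
from torch import nn

class ToyTokenizer:
    """Illustrative tokenizer: returns "input_ids" and "attention_mask" tensors
    and adds [CLS]/[SEP]-equivalent ids (1 and 2), as the contract requires."""

    def __init__(self):
        self.vocab = {}

    def __call__(self, sentences, **kwargs):
        ids = [
            [1] + [self.vocab.setdefault(w, len(self.vocab) + 3) for w in s.split()] + [2]
            for s in sentences
        ]
        max_len = max(len(t) for t in ids)
        input_ids = torch.zeros(len(ids), max_len, dtype=torch.long)
        attention_mask = torch.zeros(len(ids), max_len, dtype=torch.long)
        for i, t in enumerate(ids):  # pad every sentence to the batch max length
            input_ids[i, : len(t)] = torch.tensor(t)
            attention_mask[i, : len(t)] = 1
        return {"input_ids": input_ids, "attention_mask": attention_mask}

class ToyModel(nn.Module):
    """Illustrative embedding model standing in for a transformer encoder."""

    def __init__(self, vocab_size: int = 1000, dim: int = 16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, input_ids, attention_mask):
        return self.emb(input_ids)

def user_forward_fn(model: nn.Module, batch: dict) -> torch.Tensor:
    # Takes the model and the tokenizer's batch dict, returns a single Tensor.
    return model(batch["input_ids"], batch["attention_mask"])
```

Such objects would then be passed as DepthScore(model=ToyModel(), user_tokenizer=ToyTokenizer(), user_forward_fn=user_forward_fn).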

Example

>>> from pprint import pprint
>>> from torchmetrics.text.depth_score import DepthScore
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> depthscore = DepthScore()
>>> pprint(depthscore(preds, target))
tensor([...])

Example

>>> from pprint import pprint
>>> from torchmetrics.text.depth_score import DepthScore
>>> preds = ["hello there", "general kenobi"]
>>> target = [["hello there", "master kenobi"], ["hello there", "master kenobi"]]
>>> depthscore = DepthScore()
>>> pprint(depthscore(preds, target))
tensor([...])
plot(val=None, ax=None)[source]

Plot a single or multiple values from the metric.

Parameters:
  • val (Union[Tensor, Sequence[Tensor], None]) – Either a single result from calling metric.forward or metric.compute or a list of these results. If no value is provided, will automatically call metric.compute and plot that result.

  • ax (Optional[Axes]) – A matplotlib axis object. If provided will add plot to that axis.

Return type:

tuple[Figure, Union[Axes, ndarray]]

Returns:

Figure and Axes object

Raises:

ModuleNotFoundError – If matplotlib is not installed

>>> # Example plotting a single value
>>> from torchmetrics.text.depth_score import DepthScore
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> metric = DepthScore()
>>> metric.update(preds, target)
>>> fig_, ax_ = metric.plot()
>>> # Example plotting multiple values
>>> from torch import tensor
>>> from torchmetrics.text.depth_score import DepthScore
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> metric = DepthScore()
>>> values = []
>>> for _ in range(10):
...     val = metric(preds, target)
...     val = val.mean()  # convert into a single scalar
...     values.append(val)
>>> fig_, ax_ = metric.plot(values)

Functional Interface

torchmetrics.functional.text.depth_score.depth_score(preds, target, model_name_or_path=None, num_layers=None, all_layers=False, model=None, user_tokenizer=None, user_forward_fn=None, verbose=False, device=None, max_length=512, batch_size=64, num_threads=0, truncation=False, n_alpha=5, n_dirs=10000, eps=0.3, p=5, depth_measure='irw', multi_ref_reduction='min')[source]

DepthScore metric for evaluating text generation via text similarity matching.

DepthScore measures the distance between two sentences by comparing the distributions of their contextual token embeddings using a depth-based pseudo-metric. Lower values indicate that the predicted sentence is closer to the reference sentence.

This implementation follows the original DEPTH_score implementation.

Parameters:
  • preds (Union[str, Sequence[str], dict[str, Tensor]]) – Predicted sentence(s) as str, Sequence[str], or tokenized dict containing “input_ids” and “attention_mask”.

  • target (Union[str, Sequence[str], Sequence[Sequence[str]], dict[str, Tensor]]) – Reference sentence(s) as str, Sequence[str], multi-reference Sequence[Sequence[str]], or tokenized dict containing “input_ids” and “attention_mask”.

  • model_name_or_path (Optional[str]) – Hugging Face model name/path used when model is not provided.

  • num_layers (Optional[int]) – Hidden layer index to use for contextual embeddings. If None, the last layer is used.

  • all_layers (bool) – An indication of whether the representation from all model’s layers should be used. If all_layers=True, the argument num_layers is ignored.

  • model (Optional[Module]) – Optional user-provided model. If provided, user_tokenizer must also be provided.

  • user_tokenizer (Optional[Any]) – Tokenizer to use with a user-provided model. Ignored when model is None.

  • user_forward_fn (Optional[Callable[[Module, dict[str, Tensor]], Tensor]]) – Optional user-defined forward function producing embeddings from (model, batch_dict).

  • verbose (bool) – Whether to show a progress bar during embedding extraction.

  • device (Union[device, str, None]) – Device to run embedding extraction on.

  • max_length (int) – Maximum input sequence length. Longer sequences are trimmed if truncation=True.

  • batch_size (int) – Batch size used for model processing.

  • num_threads (int) – Number of dataloader workers.

  • truncation (bool) – Whether to truncate input sequences to max_length.

  • n_alpha (int) – Number of alpha levels used by the depth-based distance computation.

  • n_dirs (int) – Number of random projection directions used by depth/sliced computations.

  • eps (float) – Lower quantile bound (eps_min) used in the depth distance integration (upper bound fixed at 1.0).

  • p (int) – Power used in the distance aggregation.

  • depth_measure (str) – Depth/distance backend to use. One of: “irw”, “ai_irw”, “wasserstein”, “sliced”, “mmd”.

  • multi_ref_reduction (str) – Reduction to apply across multiple references per prediction. Default “min” (best match) since this is a distance metric.
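Among the depth_measure backends listed above, the "sliced" option refers to a sliced-Wasserstein-style distance over the two embedding point clouds. A hedged sketch of that idea, assuming equal-sized point clouds and reusing the documented n_dirs parameter name (this is an illustration, not the library's actual implementation):

```python
import torch

def sliced_distance(x: torch.Tensor, y: torch.Tensor,
                    n_dirs: int = 500, p: int = 2) -> torch.Tensor:
    """Monte-Carlo sliced Wasserstein-p distance between equal-sized point clouds."""
    d = x.shape[1]
    u = torch.randn(n_dirs, d)
    u = u / u.norm(dim=1, keepdim=True)   # random unit projection directions
    px = (x @ u.T).sort(dim=0).values     # sorted 1D projections of x
    py = (y @ u.T).sort(dim=0).values     # sorted 1D projections of y
    # Average the 1D Wasserstein-p costs over all projection directions.
    return (px - py).abs().pow(p).mean().pow(1.0 / p)
```

Identical clouds yield a distance of zero, and a rigid shift of one cloud yields a distance that grows with the shift, which matches the intended use: lower values mean the candidate's embedding distribution is closer to the reference's.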

Return type:

Tensor

Returns:

A 1D tensor of distances of shape (num_predictions,). For multi-reference input, the output is reduced per original prediction according to multi_ref_reduction.

Raises:
  • ValueError – If len(preds) != len(target).

  • ModuleNotFoundError – If verbose=True but tqdm is not installed.

  • ModuleNotFoundError – If default transformers model is required but transformers is not installed.

  • ValueError – If invalid input is provided for preds/target.

  • ValueError – If num_layers is larger than the number of model layers (when detectable).

Example

>>> from torchmetrics.functional.text.depth_score import depth_score
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> depth_score(preds, target, model_name_or_path="distilbert-base-uncased", num_layers=4, device="cpu")
tensor([...])

Example

>>> from torchmetrics.functional.text.depth_score import depth_score
>>> preds = ["hello there", "general kenobi"]
>>> target = [["hello there", "master kenobi"], ["hello there", "master kenobi"]]
>>> depth_score(preds, target, model_name_or_path="distilbert-base-uncased", num_layers=4, device="cpu")
tensor([...])