Depth Score

Module Interface

class torchmetrics.text.depth_score.DepthScore(model_name_or_path=None, num_layers=None, all_layers=False, model=None, user_tokenizer=None, user_forward_fn=None, verbose=False, device=None, max_length=512, batch_size=64, num_threads=0, n_alpha=5, eps=0.3, p=5, depth_measure='irw', truncation=False, multi_ref_reduction='min', **kwargs)[source]

DepthScore metric for evaluating text generation by measuring text similarity.

DepthScore leverages pre-trained contextual token embeddings (e.g., from BERT-like models) and compares candidate and reference sentences by treating their token embeddings as point clouds and computing a depth-based pseudo-metric between the two distributions. This distance is designed to capture distributional mismatches between contextual representations and can be used for evaluating text generation tasks, where a lower distance indicates a better match.
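The notion of statistical depth behind the default "irw" measure can be illustrated with a small, self-contained sketch. The function below is a hypothetical Monte-Carlo approximation of integrated rank-weighted (IRW) depth, not the library's internal implementation: it scores how central each point (token embedding) is within a point cloud by averaging rank-based centrality over random 1D projections.

```python
import torch

def irw_depth(points: torch.Tensor, n_dirs: int = 1000) -> torch.Tensor:
    """Monte-Carlo integrated rank-weighted depth of each row of ``points``.

    A point that lands near the median of many random 1D projections is
    "deep" (central); a point projected to the extremes is shallow.
    """
    n, d = points.shape
    u = torch.randn(n_dirs, d)
    u = u / u.norm(dim=1, keepdim=True)          # random unit directions
    proj = points @ u.T                          # (n, n_dirs) 1D projections
    ranks = proj.argsort(dim=0).argsort(dim=0)   # rank of each point per direction
    cdf = ranks.float() / (n - 1)                # empirical CDF value in [0, 1]
    return torch.minimum(cdf, 1 - cdf).mean(dim=1)  # depth in [0, 0.5]
```

A central point attains depth close to 0.5 and an extreme outlier close to 0; a depth-based pseudo-metric then compares the two embedding clouds through such depth values rather than through raw pairwise distances.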

This implementation follows the original DEPTH_score implementation.

As input to forward and update the metric accepts the following input:

  • preds: Predicted sentence(s). Can be one of:

    • A single predicted sentence as a string (str)

    • A sequence of predicted sentences (Sequence[str])

  • target: Target/reference sentence(s). Can be one of:

    • A single reference sentence as a string (str)

    • A sequence of reference sentences (Sequence[str])

    • A sequence of sequences of reference sentences for multi-reference evaluation (Sequence[Sequence[str]])

As output of forward and compute the metric returns the following output:

  • score (Tensor): A 1D tensor of distances of shape (num_predictions,). For multi-reference input, the output is reduced per original prediction according to multi_ref_reduction.
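For intuition, the multi-reference reduction can be sketched as a small illustrative helper (not the library's internal code) that collapses a hypothetical matrix of per-reference distances to one score per prediction:

```python
import torch

def reduce_multi_ref(distances: torch.Tensor, reduction: str = "min") -> torch.Tensor:
    """Reduce a (num_preds, num_refs) distance matrix to one value per prediction."""
    if reduction == "min":    # best-matching reference (default for a distance)
        return distances.min(dim=1).values
    if reduction == "max":    # worst-matching reference
        return distances.max(dim=1).values
    if reduction == "mean":   # average over all references
        return distances.mean(dim=1)
    raise ValueError(f"unknown reduction: {reduction!r}")
```

With "min", each prediction is scored against its closest reference, which is the natural choice when any one of the references is an acceptable target.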

Parameters:
  • preds (Union[str, Sequence[str]]) – A single predicted sentence or a sequence of predicted sentences.

  • target (Union[str, Sequence[str], Sequence[Sequence[str]]]) – A single target sentence, a sequence of target sentences, or a sequence of sequences of target sentences for multiple references per prediction.

  • model_name_or_path (Optional[str]) – A name or a model path used to load transformers pretrained model.

  • num_layers (Optional[int]) – A layer of representation to use.

  • all_layers (bool) – An indication of whether the representation from all model’s layers should be used. If all_layers=True, the argument num_layers is ignored.

  • model (Optional[Module]) – A user’s own model. Must be an instance of torch.nn.Module.

  • user_tokenizer (Optional[Any]) – A user’s own tokenizer used with the user’s own model. This must be an instance with the __call__ method. This method must take an iterable of sentences (List[str]) and must return a python dictionary containing “input_ids” and “attention_mask” represented by Tensor. It is up to the user’s model whether “input_ids” is a Tensor of input ids or embedding vectors. This tokenizer must prepend an equivalent of the [CLS] token and append an equivalent of the [SEP] token, as a transformers tokenizer does.

  • user_forward_fn (Optional[Callable[[Module, dict[str, Tensor]], Tensor]]) – A user’s own forward function used in combination with the user’s own model. This function must take the model and a python dictionary containing "input_ids" and "attention_mask" represented by Tensor as an input, and must return the model’s output represented by a single Tensor.

  • verbose (bool) – An indication of whether a progress bar should be displayed during the embeddings’ calculation.

  • device (Union[device, str, None]) – A device to be used for calculation.

  • max_length (int) – A maximum length of input sequences. Sequences longer than max_length are trimmed when truncation=True.

  • batch_size (int) – A batch size used for model processing.

  • num_threads (int) – A number of threads to use for a dataloader.

  • n_alpha (int) – The Monte-Carlo parameter for the approximation of the integral over alpha (number of level-set thresholds between eps and 1.0).

  • eps (float) – The lowest level-set bound in [0, 1]. The highest level set is fixed to 1.0 in this implementation.

  • p (int) – The power of the ground cost.

  • depth_measure (str) – Depth / discrepancy measure to use (e.g. "irw" or "ai_irw").

  • truncation (bool) – An indication of whether the input sequences should be truncated to the max_length.

  • multi_ref_reduction (str) – Reduction to apply across multiple references per prediction. Default "min" (best match) since this is a distance metric. Options: "min", "max", "mean".

  • kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.
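The user_tokenizer and user_forward_fn contract described above can be sketched with a toy setup. Everything below (ToyTokenizer, ToyModel, the special-token ids 1 and 2) is hypothetical and only illustrates the required shapes and return types, not a real tokenizer or encoder:

```python
import torch
from torch import nn

class ToyTokenizer:
    """Illustrative tokenizer: returns "input_ids" and "attention_mask" tensors
    and adds [CLS]/[SEP]-equivalent ids (1 and 2), as the contract requires."""

    def __init__(self):
        self.vocab = {}

    def __call__(self, sentences, **kwargs):
        ids = [
            [1] + [self.vocab.setdefault(w, len(self.vocab) + 3) for w in s.split()] + [2]
            for s in sentences
        ]
        max_len = max(len(t) for t in ids)
        input_ids = torch.zeros(len(ids), max_len, dtype=torch.long)
        attention_mask = torch.zeros(len(ids), max_len, dtype=torch.long)
        for i, t in enumerate(ids):  # pad every sentence to the batch max length
            input_ids[i, : len(t)] = torch.tensor(t)
            attention_mask[i, : len(t)] = 1
        return {"input_ids": input_ids, "attention_mask": attention_mask}

class ToyModel(nn.Module):
    """Illustrative embedding model standing in for a transformer encoder."""

    def __init__(self, vocab_size: int = 1000, dim: int = 16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, input_ids, attention_mask):
        return self.emb(input_ids)

def user_forward_fn(model: nn.Module, batch: dict) -> torch.Tensor:
    # Takes the model and the tokenizer's batch dict, returns a single Tensor.
    return model(batch["input_ids"], batch["attention_mask"])
```

Such objects would then be passed as DepthScore(model=ToyModel(), user_tokenizer=ToyTokenizer(), user_forward_fn=user_forward_fn).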

Example

>>> from pprint import pprint
>>> from torchmetrics.text.depth_score import DepthScore
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> depthscore = DepthScore()
>>> pprint(depthscore(preds, target))
tensor([...])

Example

>>> from pprint import pprint
>>> from torchmetrics.text.depth_score import DepthScore
>>> preds = ["hello there", "general kenobi"]
>>> target = [["hello there", "master kenobi"], ["hello there", "master kenobi"]]
>>> depthscore = DepthScore()
>>> pprint(depthscore(preds, target))
tensor([...])
plot(val=None, ax=None)[source]

Plot a single or multiple values from the metric.

Parameters:
  • val (Union[Tensor, Sequence[Tensor], None]) – Either a single result from calling metric.forward or metric.compute or a list of these results. If no value is provided, will automatically call metric.compute and plot that result.

  • ax (Optional[Axes]) – A matplotlib axis object. If provided will add plot to that axis.

Return type:

tuple[Figure, Union[Axes, ndarray]]

Returns:

Figure and Axes object

Raises:

ModuleNotFoundError – If matplotlib is not installed

>>> # Example plotting a single value
>>> from torchmetrics.text.depth_score import DepthScore
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> metric = DepthScore()
>>> metric.update(preds, target)
>>> fig_, ax_ = metric.plot()
>>> # Example plotting multiple values
>>> from torch import tensor
>>> from torchmetrics.text.depth_score import DepthScore
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> metric = DepthScore()
>>> values = []
>>> for _ in range(10):
...     val = metric(preds, target)
...     val = val.mean()  # convert into a single scalar
...     values.append(val)
>>> fig_, ax_ = metric.plot(values)

Functional Interface

torchmetrics.functional.text.depth_score.depth_score(preds, target, model_name_or_path=None, num_layers=None, all_layers=False, model=None, user_tokenizer=None, user_forward_fn=None, verbose=False, device=None, max_length=512, batch_size=64, num_threads=0, truncation=False, n_alpha=5, n_dirs=10000, eps=0.3, p=5, depth_measure='irw', multi_ref_reduction='min')[source]

DepthScore metric for evaluating text generation via text similarity matching.

DepthScore measures the distance between two sentences by comparing the distributions of their contextual token embeddings using a depth-based pseudo-metric. Lower values indicate that the predicted sentence is closer to the reference sentence.

This implementation follows the original DEPTH_score implementation.

Parameters:
  • preds (Union[str, Sequence[str], dict[str, Tensor]]) – Predicted sentence(s) as str, Sequence[str], or tokenized dict containing “input_ids” and “attention_mask”.

  • target (Union[str, Sequence[str], Sequence[Sequence[str]], dict[str, Tensor]]) – Reference sentence(s) as str, Sequence[str], multi-reference Sequence[Sequence[str]], or tokenized dict containing “input_ids” and “attention_mask”.

  • model_name_or_path (Optional[str]) – Hugging Face model name/path used when model is not provided.

  • num_layers (Optional[int]) – Hidden layer index to use for contextual embeddings. If None, the last layer is used.

  • all_layers (bool) – An indication of whether the representation from all model’s layers should be used. If all_layers=True, the argument num_layers is ignored.

  • model (Optional[Module]) – Optional user-provided model. If provided, user_tokenizer must also be provided.

  • user_tokenizer (Optional[Any]) – Tokenizer to use with a user-provided model. Ignored when model is None.

  • user_forward_fn (Optional[Callable[[Module, dict[str, Tensor]], Tensor]]) – Optional user-defined forward function producing embeddings from (model, batch_dict).

  • verbose (bool) – Whether to show a progress bar during embedding extraction.

  • device (Union[device, str, None]) – Device to run embedding extraction on.

  • max_length (int) – Maximum input sequence length. Longer sequences are trimmed if truncation=True.

  • batch_size (int) – Batch size used for model processing.

  • num_threads (int) – Number of dataloader workers.

  • truncation (bool) – Whether to truncate input sequences to max_length.

  • n_alpha (int) – Number of alpha levels used by the depth-based distance computation.

  • n_dirs (int) – Number of random projection directions used by depth/sliced computations.

  • eps (float) – Lower quantile bound (eps_min) used in the depth distance integration (upper bound fixed at 1.0).

  • p (int) – Power used in the distance aggregation.

  • depth_measure (str) – Depth/distance backend to use. One of: “irw”, “ai_irw”, “wasserstein”, “sliced”, “mmd”.

  • multi_ref_reduction (str) – Reduction to apply across multiple references per prediction. Default “min” (best match) since this is a distance metric.
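Among the depth_measure backends listed above, the "sliced" option refers to a sliced-Wasserstein-style distance over the two embedding point clouds. A hedged sketch of that idea, assuming equal-sized point clouds and reusing the documented n_dirs parameter name (this is an illustration, not the library's actual implementation):

```python
import torch

def sliced_distance(x: torch.Tensor, y: torch.Tensor,
                    n_dirs: int = 500, p: int = 2) -> torch.Tensor:
    """Monte-Carlo sliced Wasserstein-p distance between equal-sized point clouds."""
    d = x.shape[1]
    u = torch.randn(n_dirs, d)
    u = u / u.norm(dim=1, keepdim=True)   # random unit projection directions
    px = (x @ u.T).sort(dim=0).values     # sorted 1D projections of x
    py = (y @ u.T).sort(dim=0).values     # sorted 1D projections of y
    # Average the 1D Wasserstein-p costs over all projection directions.
    return (px - py).abs().pow(p).mean().pow(1.0 / p)
```

Identical clouds yield a distance of zero, and a rigid shift of one cloud yields a distance that grows with the shift, which matches the intended use: lower values mean the candidate's embedding distribution is closer to the reference's.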

Return type:

Tensor

Returns:

A 1D tensor of distances of shape (num_predictions,). For multi-reference input, the output is reduced per original prediction according to multi_ref_reduction.

Raises:
  • ValueError – If len(preds) != len(target).

  • ModuleNotFoundError – If verbose=True but tqdm is not installed.

  • ModuleNotFoundError – If default transformers model is required but transformers is not installed.

  • ValueError – If invalid input is provided for preds/target.

  • ValueError – If num_layers is larger than the number of model layers (when detectable).

Example

>>> from torchmetrics.functional.text.depth_score import depth_score
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> depth_score(preds, target, model_name_or_path="distilbert-base-uncased", num_layers=4, device="cpu")
tensor([...])

Example

>>> from torchmetrics.functional.text.depth_score import depth_score
>>> preds = ["hello there", "general kenobi"]
>>> target = [["hello there", "master kenobi"], ["hello there", "master kenobi"]]
>>> depth_score(preds, target, model_name_or_path="distilbert-base-uncased", num_layers=4, device="cpu")
tensor([...])