Depth Score¶
Module Interface¶
- class torchmetrics.text.depth_score.DepthScore(model_name_or_path=None, num_layers=None, all_layers=False, model=None, user_tokenizer=None, user_forward_fn=None, verbose=False, device=None, max_length=512, batch_size=64, num_threads=0, n_alpha=5, eps=0.3, p=5, depth_measure='irw', truncation=False, multi_ref_reduction='min', **kwargs)[source]¶
DepthScore metric for evaluating text generation by measuring text similarity.
DepthScore leverages pre-trained contextual token embeddings (e.g., from BERT-like models) and compares candidate and reference sentences by treating their token embeddings as point clouds and computing a depth-based pseudo-metric between the two distributions. This distance is designed to capture distributional mismatches between contextual representations; lower distances indicate a better match, making it suitable for evaluating text generation tasks.
This implementation follows the original implementation from DEPTH_score.
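To build intuition for the point-cloud comparison described above, here is a minimal, self-contained sketch of a random-projection ("sliced") distance between two embedding clouds. This is an illustrative stand-in only, not the depth-based pseudo-metric the library actually computes; `sliced_cloud_distance` and all names here are hypothetical.

```python
import torch

def sliced_cloud_distance(x: torch.Tensor, y: torch.Tensor,
                          n_dirs: int = 128, seed: int = 0) -> torch.Tensor:
    """Compare point clouds x (n, d) and y (m, d) via random 1D projections."""
    gen = torch.Generator().manual_seed(seed)
    dirs = torch.randn(n_dirs, x.shape[1], generator=gen)
    dirs = dirs / dirs.norm(dim=1, keepdim=True)   # unit projection directions
    q = torch.linspace(0.0, 1.0, 50)
    qx = torch.quantile(x @ dirs.T, q, dim=0)      # (50, n_dirs) quantile curves
    qy = torch.quantile(y @ dirs.T, q, dim=0)
    return (qx - qy).abs().mean()                  # average quantile mismatch

x = torch.randn(7, 16)   # e.g. 7 candidate token embeddings of size 16
y = torch.randn(9, 16)   # e.g. 9 reference token embeddings
print(float(sliced_cloud_distance(x, x)))          # identical clouds -> 0.0
print(float(sliced_cloud_distance(x, y)))          # > 0 for differing clouds
```

As with DepthScore itself, the result is a distance: zero for identical clouds and larger for greater distributional mismatch.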
As input to forward and update the metric accepts the following input:

- preds: Predicted sentence(s). Can be one of:
  - A single predicted sentence as a string (str)
  - A sequence of predicted sentences (Sequence[str])
- target: Target/reference sentence(s). Can be one of:
  - A single reference sentence as a string (str)
  - A sequence of reference sentences (Sequence[str])
  - A sequence of sequences of reference sentences for multi-reference evaluation (Sequence[Sequence[str]])

As output of forward and compute the metric returns the following output:

- score (Tensor): A 1D tensor of distances of shape (num_predictions,). For multi-reference input, the output is reduced per original prediction according to multi_ref_reduction.
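The multi-reference reduction can be sketched as follows (assumed semantics: one distance per (prediction, reference) pair, reduced across references; `reduce_multi_ref` is a hypothetical helper, not part of the API):

```python
import torch

def reduce_multi_ref(dists: torch.Tensor, reduction: str = "min") -> torch.Tensor:
    """dists: (num_predictions, num_references) distance matrix."""
    if reduction == "min":    # best-matching reference; default for a distance
        return dists.min(dim=1).values
    if reduction == "max":
        return dists.max(dim=1).values
    if reduction == "mean":
        return dists.mean(dim=1)
    raise ValueError(f"unknown multi_ref_reduction: {reduction!r}")

d = torch.tensor([[0.2, 0.5],    # prediction 0 vs its two references
                  [0.9, 0.4]])   # prediction 1 vs its two references
print(reduce_multi_ref(d, "min"))   # tensor([0.2000, 0.4000])
```

With the default "min", each prediction is scored against its best-matching reference, which is the natural choice for a distance metric.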
- Parameters:
  preds¶ (Union[str, Sequence[str]]) – A single predicted sentence or a sequence of predicted sentences.
  target¶ (Union[str, Sequence[str], Sequence[Sequence[str]]]) – A single target sentence, a sequence of target sentences, or a sequence of sequences of target sentences for multiple references per prediction.
  model_name_or_path¶ (Optional[str]) – A name or a model path used to load a transformers pretrained model.
  num_layers¶ (Optional[int]) – A layer of representation to use.
  all_layers¶ (bool) – An indication of whether representations from all of the model's layers should be used. If all_layers=True, the argument num_layers is ignored.
  model¶ (Optional[Module]) – A user's own model. Must be an instance of torch.nn.Module.
  user_tokenizer¶ (Optional[Any]) – A user's own tokenizer used with the user's own model. This must be an instance with the __call__ method, which must take an iterable of sentences (List[str]) and return a python dictionary containing "input_ids" and "attention_mask" represented by Tensor. It is up to the user's model whether "input_ids" is a Tensor of input ids or of embedding vectors. This tokenizer must prepend an equivalent of the [CLS] token and append an equivalent of the [SEP] token, as a transformers tokenizer does.
  user_forward_fn¶ (Optional[Callable[[Module, dict[str, Tensor]], Tensor]]) – A user's own forward function used in combination with the user's own model. This function must take the model and a python dictionary containing "input_ids" and "attention_mask" represented by Tensor as input, and return the model's output represented by a single Tensor.
  verbose¶ (bool) – An indication of whether a progress bar should be displayed during the embeddings' calculation.
  device¶ (Union[device, str, None]) – A device to be used for calculation.
  max_length¶ (int) – A maximum length of input sequences. Sequences longer than max_length are trimmed.
  batch_size¶ (int) – A batch size used for model processing.
  num_threads¶ (int) – A number of threads to use for a dataloader.
  n_alpha¶ (int) – The Monte-Carlo parameter for the approximation of the integral over alpha (number of level-set thresholds between eps and 1.0).
  eps¶ (float) – The lowest level-set bound in [0, 1]. The highest level set is fixed to 1.0 in this implementation.
  depth_measure¶ (str) – Depth/discrepancy measure to use (e.g. "irw" or "ai_irw").
  truncation¶ (bool) – An indication of whether the input sequences should be truncated to max_length.
  multi_ref_reduction¶ (str) – Reduction to apply across multiple references per prediction. Default "min" (best match) since this is a distance metric. Options: "min", "max", "mean".
  kwargs¶ (Any) – Additional keyword arguments, see Advanced metric settings for more info.
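The user_tokenizer / user_forward_fn contract described above can be sketched with a toy tokenizer and model. ToyTokenizer, ToyModel, and this user_forward_fn are illustrative stand-ins, not library code; a real tokenizer must also add [CLS]/[SEP]-equivalent tokens.

```python
import torch
from torch import Tensor, nn

class ToyTokenizer:
    def __call__(self, sentences: list[str], max_length: int = 512) -> dict[str, Tensor]:
        # Toy scheme: hash whitespace tokens into ids 1..999 (0 is padding).
        ids = [[hash(w) % 999 + 1 for w in s.split()][:max_length] for s in sentences]
        n = max(len(i) for i in ids)
        input_ids = torch.tensor([i + [0] * (n - len(i)) for i in ids])
        return {"input_ids": input_ids, "attention_mask": (input_ids != 0).long()}

class ToyModel(nn.Module):
    def __init__(self, dim: int = 8):
        super().__init__()
        self.emb = nn.Embedding(1000, dim)

    def forward(self, input_ids: Tensor) -> Tensor:
        return self.emb(input_ids)                 # (batch, seq_len, dim)

def user_forward_fn(model: nn.Module, batch: dict[str, Tensor]) -> Tensor:
    # Contract: takes (model, batch dict) and returns a single Tensor of embeddings.
    return model(batch["input_ids"])

batch = ToyTokenizer()(["hello there", "general kenobi"])
out = user_forward_fn(ToyModel(), batch)
print(out.shape)   # torch.Size([2, 2, 8])
```

The key points are the dictionary keys ("input_ids", "attention_mask") and that user_forward_fn returns one Tensor of token embeddings for the whole batch.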
Example
>>> from pprint import pprint
>>> from torchmetrics.text.depth_score import DepthScore
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> depthscore = DepthScore()
>>> pprint(depthscore(preds, target))
tensor([...])
Example
>>> from pprint import pprint
>>> from torchmetrics.text.depth_score import DepthScore
>>> preds = ["hello there", "general kenobi"]
>>> target = [["hello there", "master kenobi"], ["hello there", "master kenobi"]]
>>> depthscore = DepthScore()
>>> pprint(depthscore(preds, target))
tensor([...])
- plot(val=None, ax=None)[source]¶
Plot a single or multiple values from the metric.
- Parameters:
  val¶ (Union[Tensor, Sequence[Tensor], None]) – Either a single result from calling metric.forward or metric.compute, or a list of these results. If no value is provided, will automatically call metric.compute and plot that result.
  ax¶ (Optional[Axes]) – A matplotlib axis object. If provided, will add the plot to that axis.
- Return type:
- Returns:
Figure and Axes object
- Raises:
ModuleNotFoundError – If matplotlib is not installed
>>> # Example plotting a single value
>>> from torchmetrics.text.depth_score import DepthScore
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> metric = DepthScore()
>>> metric.update(preds, target)
>>> fig_, ax_ = metric.plot()
>>> # Example plotting multiple values
>>> from torch import tensor
>>> from torchmetrics.text.depth_score import DepthScore
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> metric = DepthScore()
>>> values = []
>>> for _ in range(10):
...     val = metric(preds, target)
...     val = val.mean()  # convert into a single scalar
...     values.append(val)
>>> fig_, ax_ = metric.plot(values)
Functional Interface¶
- torchmetrics.functional.text.depth_score.depth_score(preds, target, model_name_or_path=None, num_layers=None, all_layers=False, model=None, user_tokenizer=None, user_forward_fn=None, verbose=False, device=None, max_length=512, batch_size=64, num_threads=0, truncation=False, n_alpha=5, n_dirs=10000, eps=0.3, p=5, depth_measure='irw', multi_ref_reduction='min')[source]¶
DepthScore metric for evaluating text generation by measuring text similarity.
DepthScore measures the distance between two sentences by comparing the distributions of their contextual token embeddings using a depth-based pseudo-metric. Lower values indicate that the predicted sentence is closer to the reference sentence.
This implementation follows the original implementation from DEPTH_score.
- Parameters:
  preds¶ (Union[str, Sequence[str], dict[str, Tensor]]) – Predicted sentence(s) as str, Sequence[str], or a tokenized dict containing "input_ids" and "attention_mask".
  target¶ (Union[str, Sequence[str], Sequence[Sequence[str]], dict[str, Tensor]]) – Reference sentence(s) as str, Sequence[str], multi-reference Sequence[Sequence[str]], or a tokenized dict containing "input_ids" and "attention_mask".
  model_name_or_path¶ (Optional[str]) – Hugging Face model name or path used when model is not provided.
  num_layers¶ (Optional[int]) – Hidden layer index to use for contextual embeddings. If None, the last layer is used.
  all_layers¶ (bool) – An indication of whether representations from all of the model's layers should be used. If all_layers=True, the argument num_layers is ignored.
  model¶ (Optional[Module]) – Optional user-provided model. If provided, user_tokenizer must also be provided.
  user_tokenizer¶ (Optional[Any]) – Tokenizer to use with a user-provided model. Ignored when model is None.
  user_forward_fn¶ (Optional[Callable[[Module, dict[str, Tensor]], Tensor]]) – Optional user-defined forward function producing embeddings from (model, batch_dict).
  verbose¶ (bool) – Whether to show a progress bar during embedding extraction.
  device¶ (Union[device, str, None]) – Device to run embedding extraction on.
  max_length¶ (int) – Maximum input sequence length. Longer sequences are trimmed if truncation=True.
  batch_size¶ (int) – A batch size used for model processing.
  num_threads¶ (int) – A number of threads to use for a dataloader.
  truncation¶ (bool) – Whether to truncate input sequences to max_length.
  n_alpha¶ (int) – Number of alpha levels used by the depth-based distance computation.
  n_dirs¶ (int) – Number of random projection directions used by depth/sliced computations.
  eps¶ (float) – Lower quantile bound (eps_min) used in the depth distance integration (upper bound fixed at 1.0).
  depth_measure¶ (str) – Depth/distance backend to use. One of: "irw", "ai_irw", "wasserstein", "sliced", "mmd".
  multi_ref_reduction¶ (str) – Reduction to apply across multiple references per prediction. Default "min" (best match) since this is a distance metric. Options: "min", "max", "mean".
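To illustrate what n_dirs and the "irw" depth measure refer to, here is a hedged Monte-Carlo sketch of an integrated rank-weighted (IRW) depth: for each random unit direction, the depth of a point is min(F, 1 - F) of its projection under the empirical CDF of the cloud's projections, averaged over directions. The library's actual computation may differ in details (direction sampling, the affine-invariant "ai_irw" variant); `irw_depth` is a hypothetical helper.

```python
import torch

def irw_depth(points: torch.Tensor, cloud: torch.Tensor,
              n_dirs: int = 1000, seed: int = 0) -> torch.Tensor:
    """Depth of each row of `points` (k, d) with respect to sample `cloud` (n, d)."""
    gen = torch.Generator().manual_seed(seed)
    u = torch.randn(n_dirs, points.shape[1], generator=gen)
    u = u / u.norm(dim=1, keepdim=True)              # random unit directions
    proj_c = cloud @ u.T                             # (n, n_dirs)
    proj_p = points @ u.T                            # (k, n_dirs)
    # Empirical CDF of the cloud's projections, evaluated at each point.
    cdf = (proj_c.unsqueeze(0) <= proj_p.unsqueeze(1)).float().mean(dim=1)
    return torch.minimum(cdf, 1.0 - cdf).mean(dim=1)  # average over directions

cloud = torch.randn(200, 4)
center = torch.zeros(1, 4)                 # deep point: depth near 0.5
far = torch.full((1, 4), 10.0)             # outlying point: depth near 0.0
print(float(irw_depth(center, cloud)))
print(float(irw_depth(far, cloud)))
```

Deep (central) points score close to the maximum of 0.5, outliers close to 0; increasing n_dirs reduces the Monte-Carlo error of the direction average.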
- Return type:
- Returns:
A 1D tensor of distances of shape (num_predictions,). For multi-reference input, the output is reduced per original prediction according to multi_ref_reduction.
- Raises:
ValueError – If len(preds) != len(target).
ModuleNotFoundError – If verbose=True but tqdm is not installed.
ModuleNotFoundError – If default transformers model is required but transformers is not installed.
ValueError – If invalid input is provided for preds/target.
ValueError – If num_layers is larger than the number of model layers (when detectable).
Example
>>> from torchmetrics.functional.text.depth_score import depth_score
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> depth_score(preds, target, model_name_or_path="distilbert-base-uncased", num_layers=4, device="cpu")
tensor([...])
Example
>>> from torchmetrics.functional.text.depth_score import depth_score
>>> preds = ["hello there", "general kenobi"]
>>> target = [["hello there", "master kenobi"], ["hello there", "master kenobi"]]
>>> depth_score(preds, target, model_name_or_path="distilbert-base-uncased", num_layers=4, device="cpu")
tensor([...])