orkgnlp.annotation.tdm.encoder.TdmDataset
- class TdmDataset(text, labels, tokenizer, max_input_sizes)
Bases: Dataset
TdmDataset is a customized torch.utils.data.Dataset that simplifies the tokenization of sequences. It can then be passed to a torch.utils.data.DataLoader for batch creation.
- Parameters
  - text (str) – Input text (hypothesis) to be concatenated with all known labels (premises).
  - labels (DataFrame) – TDM gold labels given as a one-column DataFrame.
  - tokenizer (PreTrainedTokenizer) – Tokenizer used to tokenize the texts.
  - max_input_sizes (int) – Maximum length of a sequence, including special characters.
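The parameters above suggest a map-style dataset that yields one tokenized (label, text) pair per known label. The following is a minimal pure-Python sketch of that idea, not the orkgnlp implementation. The names PairDatasetSketch and toy_tokenize are hypothetical, the labels are a plain list rather than a DataFrame, the whitespace tokenizer merely stands in for a PreTrainedTokenizer, and the premise-before-hypothesis order is an assumption.

```python
from typing import Callable, Dict, List


def toy_tokenize(premise: str, hypothesis: str, max_length: int) -> Dict[str, List]:
    """Hypothetical stand-in for a real tokenizer: whitespace split plus special tokens."""
    tokens = ["[CLS]", *premise.split(), "[SEP]", *hypothesis.split(), "[SEP]"]
    # Naive truncation: the special tokens count toward max_length.
    tokens = tokens[:max_length]
    padding = max_length - len(tokens)
    return {
        "tokens": tokens + ["[PAD]"] * padding,
        "attention_mask": [1] * len(tokens) + [0] * padding,
    }


class PairDatasetSketch:
    """Map-style dataset sketch: item i pairs the i-th label with the input text.

    A real implementation would subclass torch.utils.data.Dataset; only the
    __len__/__getitem__ protocol is shown here so the example has no dependencies.
    """

    def __init__(self, text: str, labels: List[str],
                 tokenizer: Callable[[str, str, int], Dict[str, List]],
                 max_input_sizes: int) -> None:
        self.text = text
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_input_sizes = max_input_sizes

    def __len__(self) -> int:
        # One sample per known label.
        return len(self.labels)

    def __getitem__(self, index: int) -> Dict[str, List]:
        # Assumed order: label as premise first, input text as hypothesis second.
        return self.tokenizer(self.labels[index], self.text, self.max_input_sizes)


# Made-up labels and text, for illustration only.
labels = [
    "Question Answering; SQuAD; F1",
    "Image Classification; ImageNet; Top-1 Accuracy",
]
dataset = PairDatasetSketch("We evaluate our model on SQuAD.", labels, toy_tokenize, 16)
first_item = dataset[0]
```

Because every item is padded or truncated to the same length, a batching utility such as torch.utils.data.DataLoader could stack the items without a custom collate step, provided the tokenizer returns tensors or lists of numbers rather than the string tokens used in this sketch.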
Methods