fastNLP.core
fastNLP.core.batch
class fastNLP.core.batch.Batch(dataset, batch_size, sampler, as_numpy=False)
Batch is an iterable object which iterates over mini-batches.
Example:
    from fastNLP.core.sampler import SequentialSampler
    for batch_x, batch_y in Batch(data_set, batch_size=16, sampler=SequentialSampler()):
        pass  # process one mini-batch here
Parameters: - dataset (DataSet) – a DataSet object
- batch_size (int) – the size of the batch
- sampler (Sampler) – a Sampler object
- as_numpy (bool) – If True, return NumPy arrays; otherwise, return torch tensors.
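Conceptually, iterating a Batch combines three steps: the sampler returns a list of instance indices, those indices are cut into chunks of batch_size, and each chunk is gathered field by field into inputs (batch_x) and targets (batch_y). The stdlib-only sketch below illustrates that flow under our own assumptions; it is not fastNLP's implementation. Fields are plain Python lists keyed by name, iterate_minibatches is a hypothetical name, and padding and conversion to NumPy arrays or torch tensors are left out.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Fields = Dict[str, List]  # field name -> one value per instance (field-first layout)


def iterate_minibatches(
    fields: Fields,
    input_names: Sequence[str],
    target_names: Sequence[str],
    indices: Sequence[int],
    batch_size: int,
) -> Iterator[Tuple[Fields, Fields]]:
    """Hypothetical helper: yield (batch_x, batch_y) for consecutive chunks of indices."""
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        batch_x = {name: [fields[name][i] for i in chunk] for name in input_names}
        batch_y = {name: [fields[name][i] for i in chunk] for name in target_names}
        yield batch_x, batch_y


data = {"words": [[4, 2], [7], [1, 5, 9]], "label": [0, 1, 0]}
# The index order [2, 0, 1] stands in for what a sampler would return.
for batch_x, batch_y in iterate_minibatches(data, ["words"], ["label"], [2, 0, 1], batch_size=2):
    print(batch_x, batch_y)
```

The last chunk may be smaller than batch_size; whether the real Batch keeps or drops such a chunk is not stated here.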
fastNLP.core.dataset
class fastNLP.core.dataset.DataSet(data=None)
DataSet is a collection of examples. It provides an instance-level interface: you can append and access individual instances. Internally, however, it stores data field-first, instance-second, i.e. each field is kept as one column holding that field's value for every instance.
add_field(name, fields, padding_val=0, is_input=False, is_target=False)
Add a new field to the DataSet.
Parameters: - name (str) – the name of the field.
- fields – a list of int, float, or other objects.
- padding_val (int) – the integer used for padding.
- is_input (bool) – whether this field is used as model input.
- is_target (bool) – whether this field is a label or target.
append(ins)
Add an instance to the DataSet. If the DataSet is not empty, the instance must have the same field names as the existing instances.
Parameters: ins – an Instance object
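The field-first layout described for DataSet, together with the rule that appended instances must share the existing field names, can be sketched with one Python list per field. Everything here is an assumption for illustration: instances are plain dicts rather than Instance objects, and ColumnStore is a hypothetical class, not part of fastNLP.

```python
from typing import Any, Dict, List


class ColumnStore:
    """Hypothetical field-first container: one list (column) per field name."""

    def __init__(self) -> None:
        self.fields: Dict[str, List[Any]] = {}

    def append(self, instance: Dict[str, Any]) -> None:
        # A non-empty store only accepts instances with exactly the same field names.
        if self.fields and set(instance) != set(self.fields):
            raise ValueError(
                f"field names {sorted(instance)} do not match {sorted(self.fields)}"
            )
        for name, value in instance.items():
            self.fields.setdefault(name, []).append(value)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        # Rebuild one instance by reading position idx from every column.
        return {name: column[idx] for name, column in self.fields.items()}

    def __len__(self) -> int:
        return len(next(iter(self.fields.values()), []))


store = ColumnStore()
store.append({"words": ["hello"], "label": 1})
store.append({"words": ["good", "bye"], "label": 0})
print(store.fields)  # {'words': [['hello'], ['good', 'bye']], 'label': [1, 0]}
```

Storing columns makes whole-field operations (padding a field, flagging it as input or target) cheap, while instance access has to reassemble a row.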
apply(func, new_field_name=None, **kwargs)
Apply a function to every instance of the DataSet.
Parameters: - func – a function that takes an instance as input.
- new_field_name (str) – If not None, results of the function will be stored as a new field.
- **kwargs – accepted keyword arguments: (1) is_input (bool): if True, the new field is set as input; (2) is_target (bool): if True, the new field is set as target. Both are ignored if new_field_name is None.
Return results: if new_field_name is not given, the function's return values over all instances.
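Over a field-first layout, apply amounts to rebuilding each instance, calling func on it, and either returning the results or storing them as a new column. The sketch below assumes fields are plain lists in a dict and instances are dicts; apply_to_columns is a hypothetical name, and the is_input/is_target flags are not modeled.

```python
from typing import Any, Callable, Dict, List, Optional


def apply_to_columns(
    fields: Dict[str, List[Any]],
    func: Callable[[Dict[str, Any]], Any],
    new_field_name: Optional[str] = None,
) -> Optional[List[Any]]:
    """Hypothetical helper mirroring the documented apply() behavior."""
    n_instances = len(next(iter(fields.values()), []))
    # Rebuild instance i from position i of every column, then call func on it.
    results = [func({name: col[i] for name, col in fields.items()}) for i in range(n_instances)]
    if new_field_name is None:
        return results
    fields[new_field_name] = results  # store the results as a new field
    return None


columns = {"words": [["a", "b"], ["c"]]}
print(apply_to_columns(columns, lambda ins: len(ins["words"])))  # [2, 1]
apply_to_columns(columns, lambda ins: len(ins["words"]), new_field_name="seq_len")
print(columns["seq_len"])  # [2, 1]
```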
delete_field(name)
Delete a field based on the field name.
Parameters: name – the name of the field to be deleted.
drop(func)
Drop instances for which a condition holds.
Parameters: func – a function that takes an Instance object as input and returns a bool. An instance is dropped if the function returns True.
get_all_fields()
Return all the fields with their names.
Return field_arrays: the internal data structure of DataSet.
get_input_name()
Get the names of all fields whose is_input flag is True.
Return field_names: a list of str
get_target_name()
Get the names of all fields whose is_target flag is True.
Return field_names: a list of str
static load(path)
Load a DataSet object from a pickle file.
Parameters: path (str) – the path to the pickle file
Return data_set: the loaded DataSet
classmethod read_csv(csv_path, headers=None, sep=', ', dropna=True)
Load data from a CSV file and return a DataSet object.
Parameters: - csv_path (str) – path to the CSV file
- headers (List[str] or Tuple[str]) – headers of the CSV file
- sep (str) – the delimiter of the CSV file. Default: “,”
- dropna (bool) – If True, drop rows that have fewer entries than headers.
Return dataset: the DataSet read from the file
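The header and dropna rules above can be illustrated with Python's csv module. This is a sketch under stated assumptions, not fastNLP's code: when headers is None the first row is taken as the header, rows with fewer entries than headers are skipped (dropna=True) or rejected, and the result is a dict of string columns rather than a DataSet. read_csv_columns is a hypothetical name. Note that csv.reader only accepts a one-character delimiter, so the sketch defaults to ",".

```python
import csv
import io
from typing import Dict, List, Optional, Sequence, TextIO


def read_csv_columns(
    f: TextIO,
    headers: Optional[Sequence[str]] = None,
    sep: str = ",",
    dropna: bool = True,
) -> Dict[str, List[str]]:
    """Hypothetical helper: parse CSV text into {header: column values}."""
    reader = csv.reader(f, delimiter=sep)
    if headers is None:
        headers = next(reader)  # assumption: the first row holds the header names
    columns: Dict[str, List[str]] = {h: [] for h in headers}
    for row_no, row in enumerate(reader, start=1):
        if len(row) < len(headers):
            if dropna:
                continue  # drop rows with fewer entries than headers
            raise ValueError(f"data row {row_no} has {len(row)} entries, expected {len(headers)}")
        for header, value in zip(headers, row):
            columns[header].append(value)
    return columns


sample = io.StringIO("words,label\nhello world,1\nincomplete\nbye,0\n")
print(read_csv_columns(sample))  # {'words': ['hello world', 'bye'], 'label': ['1', '0']}
```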
rename_field(old_name, new_name)
Rename a field.
Parameters: - old_name (str) – the current name of the field
- new_name (str) – the new name of the field
save(path)
Save the DataSet object as a pickle file.
Parameters: path (str) – the path to the pickle
set_input(*field_name, flag=True)
Set the input flag of these fields.
Parameters: - field_name – a sequence of str, indicating field names.
- flag (bool) – Set these fields as input if True. Unset them if False.
fastNLP.core.fieldarray
class fastNLP.core.fieldarray.FieldArray(name, content, padding_val=0, is_target=None, is_input=None)
FieldArray is the collection of values of the same field across Instances. It is the basic element of the DataSet class.
Parameters: - name (str) – the name of the FieldArray
- content (list) – a list of int, float, str or np.ndarray; a list of lists of one of these types; or an np.ndarray.
- padding_val (int) – the integer for padding. Default: 0.
- is_target (bool) – If True, this FieldArray is used to compute loss.
- is_input (bool) – If True, this FieldArray is used as model input.
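padding_val is what shorter sequences are filled with when a field of variable-length lists is turned into a rectangular batch. A minimal sketch, assuming right-padding to the longest sequence; pad_sequences is a hypothetical helper, and FieldArray's own padding logic may differ.

```python
from typing import Any, List, Sequence


def pad_sequences(content: Sequence[Sequence[Any]], padding_val: int = 0) -> List[List[Any]]:
    """Hypothetical helper: right-pad every sequence to the length of the longest one."""
    max_len = max((len(seq) for seq in content), default=0)
    return [list(seq) + [padding_val] * (max_len - len(seq)) for seq in content]


print(pad_sequences([[5, 8, 2], [4], []]))  # [[5, 8, 2], [4, 0, 0], [0, 0, 0]]
```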
fastNLP.core.instance
fastNLP.core.losses
class fastNLP.core.losses.LossFunc(func, key_map=None, **kwargs)
A wrapper of a user-provided loss function.
fastNLP.core.losses.make_mask(lens, tar_len)
Generate a mask over a sequence.
Parameters: - lens – list or LongTensor, [batch_size]
- tar_len – int
Return mask: ByteTensor
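One common way to build such a mask is to compare each position with the sequence length. The pure-Python sketch below assumes the mask has shape [batch_size, tar_len] and marks positions before each sequence's length with 1; make_mask_lists is hypothetical and returns nested lists instead of a ByteTensor.

```python
from typing import List, Sequence


def make_mask_lists(lens: Sequence[int], tar_len: int) -> List[List[int]]:
    """Hypothetical helper: row i has 1 for the first lens[i] positions, 0 after."""
    return [[1 if pos < length else 0 for pos in range(tar_len)] for length in lens]


print(make_mask_lists([3, 1], tar_len=4))  # [[1, 1, 1, 0], [1, 0, 0, 0]]
```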
fastNLP.core.losses.mask(predict, truth, **kwargs)
Select specific elements from a Tensor. This method calls squash().
Parameters: - predict – Tensor, [batch_size, max_len, tag_size]
- truth – Tensor, [batch_size, max_len]
- **kwargs – extra arguments. kwargs[“mask”]: ByteTensor, [batch_size, max_len], the mask Tensor; positions where it is 1 are selected.
Return predict, truth: predict and truth after processing
fastNLP.core.losses.squash(predict, truth, **kwargs)
Reshape tensors so that they fit PyTorch loss functions.
Parameters: - predict – Tensor, model output
- truth – Tensor, ground truth from the dataset
- **kwargs – extra arguments
Return predict, truth: predict and truth after processing
fastNLP.core.losses.unpad(predict, truth, **kwargs)
Process padded sequence output to get the true loss.
Parameters: - predict – Tensor, [batch_size, max_len, tag_size]
- truth – Tensor, [batch_size, max_len]
- kwargs – kwargs[“lens”] is a list or LongTensor of size [batch_size]; the i-th element is the true length of the i-th sequence.
Return predict, truth: predict and truth after processing
fastNLP.core.losses.unpad_mask(predict, truth, **kwargs)
Process padded sequence output to get the true loss.
Parameters: - predict – Tensor, [batch_size, max_len, tag_size]
- truth – Tensor, [batch_size, max_len]
- kwargs – kwargs[“lens”] is a list or LongTensor of size [batch_size]; the i-th element is the true length of the i-th sequence.
Return predict, truth: predict and truth after processing
fastNLP.core.metrics
class fastNLP.core.metrics.AccuracyMetric(pred=None, target=None, seq_lens=None)
Accuracy metric.
evaluate(pred, target, seq_lens=None)
Parameters: - pred – List of (torch.Tensor or numpy.ndarray). Each element’s shape can be: torch.Size([B,]), torch.Size([B, n_classes]), torch.Size([B, max_len]), or torch.Size([B, max_len, n_classes]).
- target – List of (torch.Tensor or numpy.ndarray). Each element’s shape, for the four pred cases respectively: torch.Size([B,]), torch.Size([B,]), torch.Size([B, max_len]), torch.Size([B, max_len]).
- seq_lens – List of (torch.Tensor or numpy.ndarray). Each element’s shape, for the four pred cases respectively: None, None, torch.Size([B]), torch.Size([B]). Ignored if masks are provided.
class fastNLP.core.metrics.BMESF1PreRecMetric(b_idx=0, m_idx=1, e_idx=2, s_idx=3, pred=None, target=None, seq_lens=None)
Compute F1, precision and recall for the BMES tagging scheme. Because predictions may contain illegal tag sequences such as “BS”, they are converted using the table below. cur_B means the current tag is B, and next_B means the following tag is B. cur_B=S means the current tag, predicted as B, is relabeled as S; next_M=B means the following tag, predicted as M, is relabeled as B.

|       | next_B  | next_M   | next_E   | next_S  | end     |
|:-----:|:-------:|:--------:|:--------:|:-------:|:-------:|
| start | legal   | next_M=B | next_E=S | legal   | -       |
| cur_B | cur_B=S | legal    | legal    | cur_B=S | cur_B=S |
| cur_M | cur_M=E | legal    | legal    | cur_M=E | cur_M=E |
| cur_E | legal   | next_M=B | next_E=S | legal   | legal   |
| cur_S | legal   | next_M=B | next_E=S | legal   | legal   |

- Example: the prediction BSEMS is treated as SSSSS.
- This metric does not check whether target is legal; make sure the target tags are valid.
- pred should have shape (batch_size, max_len) or (batch_size, max_len, 4); target has shape (batch_size, max_len); seq_lens has shape (batch_size,).
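The conversion table can be applied in a single left-to-right pass over (cur, next) pairs, including a virtual start before the first tag and a virtual end after the last. The sketch below is a hypothetical re-implementation of the table, not the metric's own code: fix_bmes works on tag letters, whereas the metric itself uses the indices b_idx, m_idx, e_idx and s_idx. Running it on BSEMS reproduces the documented result.

```python
from typing import List, Sequence


def fix_bmes(tags: Sequence[str]) -> List[str]:
    """Hypothetical helper: relabel illegal BMES transitions following the conversion table."""
    tags = list(tags)
    n = len(tags)
    # Walk every (cur, next) pair, including the virtual "start" and "end" positions.
    for i in range(-1, n):
        cur = tags[i] if i >= 0 else "start"
        nxt = tags[i + 1] if i + 1 < n else "end"
        if cur in ("start", "E", "S"):
            if nxt == "M":
                tags[i + 1] = "B"  # next_M=B
            elif nxt == "E":
                tags[i + 1] = "S"  # next_E=S
        elif cur == "B" and nxt in ("B", "S", "end"):
            tags[i] = "S"  # cur_B=S
        elif cur == "M" and nxt in ("B", "S", "end"):
            tags[i] = "E"  # cur_M=E
    return tags


print("".join(fix_bmes("BSEMS")))  # SSSSS
```

Because each step only rewrites the current or the next tag, and the updated tag is what the following step sees, earlier positions never need to be revisited.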
class fastNLP.core.metrics.MetricBase
Base class for all metrics.
MetricBase handles the validity check of its input dictionaries, pred_dict and target_dict. pred_dict is the output of forward() or of the prediction function of a model; target_dict is the ground truth from the DataSet, i.e. the fields whose is_target is set to True.
MetricBase performs the following type checks:
- whether self.evaluate has varargs, which is not supported;
- whether parameters needed by self.evaluate are missing from both pred_dict and target_dict;
- whether parameters needed by self.evaluate appear in both pred_dict and target_dict;
- whether parameters in pred_dict or target_dict are not used by self.evaluate (this may cause a warning).
In addition, before passing parameters to self.evaluate, it filters out entries of pred_dict and target_dict that self.evaluate does not use; no filtering is done if self.evaluate accepts **kwargs. In cases where the type check is not necessary, _fast_param_map is used instead.
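The checks listed for MetricBase can be reproduced with the standard inspect module. The sketch below is an illustration under our own assumptions: check_and_filter is hypothetical, it raises plain TypeError/ValueError and uses warnings.warn, whereas the documented CheckError is what fastNLP itself uses in MetricBase.

```python
import inspect
import warnings
from typing import Any, Callable, Dict

_VAR_KINDS = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)


def check_and_filter(
    evaluate: Callable[..., Any],
    pred_dict: Dict[str, Any],
    target_dict: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical helper: validate evaluate()'s arguments, then drop unused keys."""
    params = list(inspect.signature(evaluate).parameters.values())
    if any(p.kind == inspect.Parameter.VAR_POSITIONAL for p in params):
        raise TypeError("evaluate() must not take *args")
    accepts_kwargs = any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params)
    names = {p.name for p in params if p.kind not in _VAR_KINDS}
    required = [p.name for p in params
                if p.kind not in _VAR_KINDS and p.default is inspect.Parameter.empty]

    provided = {**pred_dict, **target_dict}
    missing = [name for name in required if name not in provided]
    duplicated = sorted(set(pred_dict) & set(target_dict) & names)
    if missing or duplicated:
        raise ValueError(f"missing={missing}, duplicated={duplicated}")

    if accepts_kwargs:
        return provided  # **kwargs present: pass everything through unfiltered
    unused = sorted(set(provided) - names)
    if unused:
        warnings.warn(f"unused keys ignored by evaluate(): {unused}")
    return {key: value for key, value in provided.items() if key in names}


def evaluate(pred, target, seq_lens=None):
    return sum(p == t for p, t in zip(pred, target)) / len(target)


args = check_and_filter(evaluate, {"pred": [1, 0, 1], "hidden": None}, {"target": [1, 1, 1]})
print(evaluate(**args))  # warns about 'hidden', then prints 0.6666666666666666
```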
class fastNLP.core.metrics.SpanFPreRecMetric(tag_vocab, pred=None, target=None, seq_lens=None, encoding_type='bio', ignore_labels=None, only_gross=True, f_type='micro', beta=1)
Compute F, precision (pre) and recall (rec) at the span level for sequence labeling. The result has the form
{'f': xxx, 'pre': xxx, 'rec': xxx}
The key is named 'f' rather than 'f1' to allow for f_beta values. If only_gross=False, per-label statistics are returned as well:
{'f': xxx, 'pre': xxx, 'rec': xxx, 'f-label': xxx, 'pre-label': xxx, 'rec-label': xxx, ...}
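Span-level scores compare whole (label, start, end) spans. In the sketch below, a predicted span counts as correct only if an identical span is in the gold set; all spans are pooled (micro-averaging), and F uses the standard formula F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which reduces to F1 for beta=1. span_prf is a hypothetical helper, not SpanFPreRecMetric's implementation.

```python
from typing import Dict, Iterable, Tuple

Span = Tuple[str, int, int]  # (label, start, end)


def span_prf(pred_spans: Iterable[Span], gold_spans: Iterable[Span], beta: float = 1.0) -> Dict[str, float]:
    """Hypothetical helper: micro-averaged span precision, recall and F_beta."""
    pred, gold = set(pred_spans), set(gold_spans)
    tp = len(pred & gold)  # exact matches: same label and same boundaries
    pre = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    b2 = beta * beta
    f = (1 + b2) * pre * rec / (b2 * pre + rec) if pre + rec > 0 else 0.0
    return {"f": f, "pre": pre, "rec": rec}


print(span_prf([("PER", 0, 1), ("LOC", 3, 3)], [("PER", 0, 1), ("ORG", 5, 6)]))
# {'f': 0.5, 'pre': 0.5, 'rec': 0.5}
```

When pooling spans from several sentences, add a sentence id to each tuple so that identical spans in different sentences are not merged by the set.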
fastNLP.core.metrics.accuracy_topk(y_true, y_prob, k=1)
Compute the fraction of samples whose true label (y_true) is among the k most probable labels in y_prob.
Parameters: - y_true – ndarray, true label, [n_samples]
- y_prob – ndarray, label probabilities, [n_samples, n_classes]
- k – int, k in top-k
Returns acc: accuracy of top-k
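Top-k accuracy counts a sample as correct when its true label is among the k labels with the highest probability. A list-based sketch follows; topk_accuracy is a hypothetical name, and ties are broken toward the lower class index because Python's sort is stable even with reverse=True.

```python
from typing import Sequence


def topk_accuracy(y_true: Sequence[int], y_prob: Sequence[Sequence[float]], k: int = 1) -> float:
    """Hypothetical helper: fraction of samples whose true label is in their top-k."""
    hits = 0
    for label, probs in zip(y_true, y_prob):
        # Class ids sorted by probability, highest first; keep the first k.
        top_k = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(y_true)


y_true = [0, 2, 1]
y_prob = [[0.7, 0.2, 0.1], [0.5, 0.2, 0.3], [0.1, 0.6, 0.3]]
print(topk_accuracy(y_true, y_prob, k=1), topk_accuracy(y_true, y_prob, k=2))  # 0.6666666666666666 1.0
```

With NumPy arrays, the top-k class indices per row are commonly obtained with np.argsort(y_prob, axis=1)[:, -k:].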
fastNLP.core.metrics.bio_tag_to_spans(tags, ignore_labels=None)
Parameters: - tags – List[str], the BIO tag sequence
- ignore_labels – List[str]; labels in this list are ignored
Returns: List[Tuple[str, List[int, int]]], i.e. [(label, [start, end])]
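The conversion can be sketched as a single left-to-right pass: B-X opens a span, I-X extends the current span when it continues one with the same label, O is skipped (so a later I-X starts a new span), and an I-X that cannot continue a span is treated as a new span. bio_to_spans is a hypothetical stand-in built on two assumptions: end indices are inclusive, and labels keep their original case. Check fastNLP's own output format before relying on either.

```python
from typing import List, Optional, Sequence, Tuple


def bio_to_spans(
    tags: Sequence[str], ignore_labels: Optional[Sequence[str]] = None
) -> List[Tuple[str, List[int]]]:
    """Hypothetical helper: BIO tags -> [(label, [start, end])], end inclusive."""
    ignore = set(ignore_labels or [])
    spans: List[Tuple[str, List[int]]] = []
    prev_prefix = None
    for idx, tag in enumerate(tags):
        prefix, label = tag[:1].upper(), tag[2:]
        if prefix == "O":
            pass  # outside any span
        elif prefix == "I" and prev_prefix in ("B", "I") and spans[-1][0] == label:
            spans[-1][1][1] = idx  # continue the current span
        else:
            spans.append((label, [idx, idx]))  # B-X, or an I-X that cannot continue a span
        prev_prefix = prefix
    return [span for span in spans if span[0] not in ignore]


print(bio_to_spans(["B-PER", "I-PER", "O", "B-LOC", "I-ORG"]))
# [('PER', [0, 1]), ('LOC', [3, 3]), ('ORG', [4, 4])]
```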
fastNLP.core.metrics.bmes_tag_to_spans(tags, ignore_labels=None)
Parameters: - tags – List[str], the BMES tag sequence
- ignore_labels – List[str]; labels in this list are ignored
Returns: List[Tuple[str, List[int, int]]], i.e. [(label, [start, end])]
fastNLP.core.metrics.pred_topk(y_prob, k=1)
Return top-k predicted labels and corresponding probabilities.
Parameters: - y_prob – ndarray, size [n_samples, n_classes], probabilities on labels
- k – int, k of top-k
Returns (y_pred_topk, y_prob_topk): y_pred_topk – ndarray, size [n_samples, k], predicted top-k labels; y_prob_topk – ndarray, size [n_samples, k], probabilities for top-k labels
fastNLP.core.optimizer
class fastNLP.core.optimizer.Adam(lr=0.001, weight_decay=0, betas=(0.9, 0.999), eps=1e-08, amsgrad=False, model_params=None)
Parameters: - lr (float) – learning rate
- weight_decay (float) – the weight decay coefficient
- model_params – a generator, e.g. model.parameters() for PyTorch models.
fastNLP.core.predictor
class fastNLP.core.predictor.Predictor
An interface for predicting outputs based on trained models.
Unlike Tester, it does not evaluate the model. It is a high-level model wrapper to be called by fastNLP, and it shares no operations with Trainer and Tester. Currently, Predictor does not support GPU.
fastNLP.core.sampler
class fastNLP.core.sampler.BaseSampler
The base class of all samplers.
Subclasses must implement the __call__ method. __call__ takes a DataSet object and returns a list of int: the sampling indices.
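Under this contract a sampler only needs a __call__ that maps a dataset to a list of indices. The sketch below shuffles indices with a seeded generator. In fastNLP it would subclass BaseSampler; the base class is omitted here so the example runs on its own, and ShuffleSampler is a hypothetical name.

```python
import random
from typing import List, Optional, Sized


class ShuffleSampler:
    """Hypothetical sampler: called with a dataset, returns a shuffled list of indices."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self.rng = random.Random(seed)  # private generator; a fixed seed makes the order reproducible

    def __call__(self, dataset: Sized) -> List[int]:
        indices = list(range(len(dataset)))
        self.rng.shuffle(indices)
        return indices


sampler = ShuffleSampler(seed=0)
print(sampler(["a", "b", "c", "d"]))  # some permutation of [0, 1, 2, 3]
```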
class fastNLP.core.sampler.BucketSampler(num_buckets=10, batch_size=32, seq_lens_field_name='seq_lens')
Parameters: - num_buckets (int) – the number of buckets to use.
- batch_size (int) – the batch size.
- seq_lens_field_name (str) – the name of the field that holds the sequence lengths.
fastNLP.core.sampler.convert_to_torch_tensor(data_list, use_cuda)
Convert lists into (CUDA) tensors.
Parameters: - data_list – a 2-level (nested) list
- use_cuda – bool, whether to use GPU or not
Return data_list: PyTorch Tensor of shape [batch_size, max_seq_len]
fastNLP.core.sampler.k_means_1d(x, k, max_iter=100)
Perform k-means clustering on 1-D data.
Parameters: - x – list of int, representing points in 1-D.
- k – the number of clusters required.
- max_iter – the maximum number of iterations.
Return centroids, assignment: centroids – numpy array, the centroids of the k clusters; assignment – numpy array, 1-D, the bucket id assigned to each example.
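Standard 1-D k-means (Lloyd's algorithm) alternates an assignment step, where each point goes to its nearest centroid, and an update step, where each centroid moves to the mean of its points, until assignments stop changing or max_iter is reached. The stdlib sketch below follows that scheme. The deterministic initialization (centroids spread over the sorted data), the tie-breaking and the name kmeans_1d are assumptions, and it returns lists where k_means_1d returns numpy arrays.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(x: Sequence[float], k: int, max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: plain Lloyd iterations on 1-D points."""
    points = sorted(x)
    # Deterministic init (an assumption): pick k values spread over the sorted data.
    step = max(k - 1, 1)
    centroids = [float(points[i * (len(points) - 1) // step]) for i in range(k)]
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid, ties going to the lower cluster id.
        new_assignment = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in x]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its points (empty clusters stay put).
        for c in range(k):
            members = [v for v, a in zip(x, assignment) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignment


print(kmeans_1d([1, 2, 3, 10, 11, 12], k=2))  # ([2.0, 11.0], [0, 0, 0, 1, 1, 1])
```

For bucketing, x would be the sequence lengths and the assignment gives each example's bucket id.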
fastNLP.core.sampler.k_means_bucketing(lengths, buckets)
Assign all instances to buckets using k-means, so that instances in the same bucket have similar lengths.
Parameters: - lengths – list of int, the lengths of all samples.
- buckets – list of int. The length of the list is the number of buckets. Each integer is the maximum length threshold of its bucket (this is usually None).
Return data: 2-level list, for example

    [
        [index_11, index_12, ...],  # bucket 1
        [index_21, index_22, ...],  # bucket 2
        ...
    ]
fastNLP.core.tester
class fastNLP.core.tester.Tester(data, model, metrics, batch_size=16, use_cuda=False, verbose=1)
Performs model inference and evaluates performance on a validation/dev set or a test set.
Parameters: - data (DataSet) – a validation/development set
- model (torch.nn.Module) – a PyTorch model
- metrics (MetricBase) – a metric object or a list of metrics (List[MetricBase])
- batch_size (int) – batch size for validation
- use_cuda (bool) – whether to use CUDA in validation.
- verbose (int) – the number of steps after which information is printed.
fastNLP.core.trainer
fastNLP.core.utils
exception fastNLP.core.utils.CheckError(check_res: fastNLP.core.utils.CheckRes, func_signature: str)
CheckError. Used in losses.LossBase and metrics.MetricBase.
class fastNLP.core.utils.CheckRes(missing, unused, duplicated, required, all_needed, varargs)
- all_needed – Alias for field number 4
- duplicated – Alias for field number 2
- missing – Alias for field number 0
- required – Alias for field number 3
- unused – Alias for field number 1
- varargs – Alias for field number 5
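“Alias for field number N” is the docstring Python generates for the fields of a collections.namedtuple, so CheckRes behaves like a named tuple whose attributes are also reachable by index. A definition consistent with the documented signature is shown below; it is a reconstruction for illustration, not the fastNLP source.

```python
from collections import namedtuple

# Field order follows the documented signature, so indices 0-5 line up with the aliases.
CheckRes = namedtuple(
    "CheckRes", ["missing", "unused", "duplicated", "required", "all_needed", "varargs"]
)

res = CheckRes(missing=["target"], unused=[], duplicated=[],
               required=["pred", "target"], all_needed=["pred", "target"], varargs=[])
print(res.missing is res[0])  # True: the attribute is just an alias for index 0
print(CheckRes.all_needed.__doc__)  # Alias for field number 4
```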
fastNLP.core.utils.get_func_signature(func)
Given a function or method, return its signature. For example, for a method:

    class Demo:
        def __init__(self):
            pass

        def forward(self, a, b='a', **args):
            pass

    demo = Demo()
    get_func_signature(demo.forward)  # 'Demo.forward(self, a, b='a', **args)'

Parameters: func – a function or a method
Returns: str or None
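A string like the one in the example can be produced with the standard library's inspect.signature. The sketch below is a minimal illustration; describe_signature is a hypothetical name, and the exact output format of fastNLP's get_func_signature may differ in corner cases. For a bound method it reads the signature of the underlying function (func.__func__) so that self is included, and it returns None for anything that is neither a Python function nor a bound method.

```python
import inspect
from typing import Callable, Optional


def describe_signature(func: Callable) -> Optional[str]:
    """Hypothetical helper: 'name(params)' for functions, 'Class.name(self, params)' for bound methods."""
    if inspect.ismethod(func):
        class_name = type(func.__self__).__name__
        # Signature of the underlying function, so that `self` is included.
        return f"{class_name}.{func.__name__}{inspect.signature(func.__func__)}"
    if inspect.isfunction(func):
        return f"{func.__name__}{inspect.signature(func)}"
    return None


class Demo:
    def forward(self, a, b='a', **args):
        pass


print(describe_signature(Demo().forward))  # Demo.forward(self, a, b='a', **args)
```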
fastNLP.core.utils.load_pickle(pickle_path, file_name)
Load an object from a given pickle file.
Parameters: - pickle_path – str, the directory where the pickle file is.
- file_name – str, the name of the pickle file.
Return obj: an object stored in the pickle
fastNLP.core.utils.pickle_exist(pickle_path, pickle_name)
Check if a given pickle file exists in the directory.
Parameters: - pickle_path – the directory of target pickle file
- pickle_name – the filename of target pickle file
Returns: True if file exists else False
fastNLP.core.utils.save_pickle(obj, pickle_path, file_name)
Save an object into a pickle file.
Parameters: - obj – an object
- pickle_path – str, the directory where the pickle file is to be saved
- file_name – str, the name of the pickle file. In general, it should end with “pkl”.
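The three pickle helpers can be approximated with the standard pickle and os.path modules. save_obj, load_obj and obj_exists are hypothetical names chosen to avoid confusion with the fastNLP functions. Like those, they take a directory and a file name separately, but details such as directory creation or error handling in fastNLP may differ.

```python
import os
import pickle
import tempfile
from typing import Any


def save_obj(obj: Any, pickle_path: str, file_name: str) -> None:
    """Hypothetical helper: pickle obj into pickle_path/file_name."""
    with open(os.path.join(pickle_path, file_name), "wb") as f:
        pickle.dump(obj, f)


def load_obj(pickle_path: str, file_name: str) -> Any:
    """Hypothetical helper: load the object stored in pickle_path/file_name."""
    # Only unpickle trusted files: pickle.load can execute arbitrary code.
    with open(os.path.join(pickle_path, file_name), "rb") as f:
        return pickle.load(f)


def obj_exists(pickle_path: str, pickle_name: str) -> bool:
    """Hypothetical helper: True if pickle_path/pickle_name is an existing file."""
    return os.path.isfile(os.path.join(pickle_path, pickle_name))


with tempfile.TemporaryDirectory() as tmp_dir:
    save_obj({"vocab_size": 3}, tmp_dir, "meta.pkl")
    print(obj_exists(tmp_dir, "meta.pkl"), load_obj(tmp_dir, "meta.pkl"))  # True {'vocab_size': 3}
```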
fastNLP.core.utils.seq_lens_to_masks(seq_lens, float=False)
Convert seq_lens to masks.
Parameters: - seq_lens – list, np.ndarray, or torch.LongTensor; the shape should be (B,)
- float – if True, the returned masks are of float type; otherwise they are byte.
Return: list, np.ndarray or torch.Tensor, with shape (B, max_length)
fastNLP.core.vocabulary
class fastNLP.core.vocabulary.Vocabulary(max_size=None, min_freq=None, unknown='<unk>', padding='<pad>')
A one-to-one mapping between words and indices.
Example:
    from fastNLP.core.vocabulary import Vocabulary

    vocab = Vocabulary()
    word_list = "this is a word list".split()
    vocab.update(word_list)
    vocab["word"]     # index of "word"
    vocab.to_word(5)  # word with index 5
Parameters: - max_size (int) – the maximum number of words in the Vocabulary. Default: None
- min_freq (int) – the minimum frequency a word must have to be included in the Vocabulary. Default: None
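A word-to-index mapping with max_size and min_freq can be sketched with collections.Counter. SimpleVocab is a hypothetical class, not fastNLP's Vocabulary, and its conventions are assumptions: padding takes index 0, unknown takes index 1, remaining words are ordered by descending frequency (ties by first occurrence), and max_size counts the two special tokens. Unknown words map to the unknown index.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


class SimpleVocab:
    """Hypothetical word <-> index mapping; not fastNLP's Vocabulary."""

    def __init__(self, max_size: Optional[int] = None, min_freq: Optional[int] = None,
                 unknown: str = "<unk>", padding: str = "<pad>") -> None:
        self.max_size, self.min_freq = max_size, min_freq
        self.unknown, self.padding = unknown, padding
        self.counter: Counter = Counter()
        self.word2idx: Dict[str, int] = {}
        self.idx2word: List[str] = []

    def update(self, words: Iterable[str]) -> None:
        self.counter.update(words)
        self._build()

    def _build(self) -> None:
        # Assumption: padding gets index 0 and unknown index 1; frequent words follow.
        self.idx2word = [self.padding, self.unknown]
        candidates = [w for w, c in self.counter.most_common()
                      if w not in (self.padding, self.unknown)
                      and (self.min_freq is None or c >= self.min_freq)]
        if self.max_size is not None:
            candidates = candidates[: max(self.max_size - len(self.idx2word), 0)]
        self.idx2word += candidates
        self.word2idx = {w: i for i, w in enumerate(self.idx2word)}

    def __getitem__(self, word: str) -> int:
        return self.word2idx.get(word, self.word2idx[self.unknown])

    def to_word(self, idx: int) -> str:
        return self.idx2word[idx]


vocab = SimpleVocab()
vocab.update("this is a word list".split())
print(vocab["word"], vocab.to_word(5), vocab["unseen"])  # 5 word 1
```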