fastNLP.api

fastNLP.api.api

class fastNLP.api.api.POS(model_path=None, device='cpu')[source]

FastNLP API for Part-Of-Speech tagging.

Parameters:
  • model_path (str) – the path to the model.
  • device (str) – device name such as “cpu” or “cuda:0”. Use the same notation as PyTorch.
predict(content)[source]
Parameters: content – list of list of str. Each string is a token (word).
Returns: answer – list of list of str. Each string is a tag.
test(file_path)[source]

Test performance over the given data set.

Parameters: file_path (str) – the path to the data set file.
Returns: a dictionary of metric values
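
To make the input and output shapes concrete, here is a toy stand-in written in plain Python. It is not the fastNLP model: ToyTagger, its lexicon lookup, and the evaluate helper (which takes in-memory data rather than a file path) are all hypothetical names used only for illustration. It only mirrors the documented contract that predict maps a list of token lists to a list of tag lists, and that testing yields a dictionary of metric values.

```python
class ToyTagger:
    """Toy stand-in that mimics the documented predict() shapes (not fastNLP code)."""

    def __init__(self, lexicon, default_tag="NN"):
        self.lexicon = lexicon          # hypothetical word -> tag lookup table
        self.default_tag = default_tag  # tag used for words missing from the lexicon

    def predict(self, content):
        # content: list of list of str (tokens) -> list of list of str (tags)
        return [[self.lexicon.get(tok, self.default_tag) for tok in sent]
                for sent in content]

    def evaluate(self, sentences, gold_tags):
        # Returns a dictionary of metric values, here only token-level accuracy.
        predicted = self.predict(sentences)
        total = sum(len(tags) for tags in gold_tags)
        correct = sum(p == g
                      for ps, gs in zip(predicted, gold_tags)
                      for p, g in zip(ps, gs))
        return {"accuracy": correct / total if total else 0.0}


tagger = ToyTagger({"the": "DT", "runs": "VBZ"})
tags = tagger.predict([["the", "dog", "runs"]])
metrics = tagger.evaluate([["the", "dog", "runs"]], [["DT", "NN", "VBZ"]])
```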

fastNLP.api.converter

fastNLP.api.model_zoo

fastNLP.api.pipeline

class fastNLP.api.pipeline.Pipeline(processors=None)[source]

Pipeline takes a DataSet object as input, runs multiple processors sequentially, and outputs a DataSet object.
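
The sequential behaviour described above can be sketched without fastNLP. In this minimal sketch a list of dicts stands in for a DataSet and plain functions stand in for processors; SimplePipeline, lowercase_words, and add_length are hypothetical names, not part of the library.

```python
class SimplePipeline:
    """Toy pipeline: applies each processor in order and returns the result."""

    def __init__(self, processors=None):
        self.processors = list(processors) if processors is not None else []

    def __call__(self, dataset):
        # The output of one processor becomes the input of the next.
        for processor in self.processors:
            dataset = processor(dataset)
        return dataset


def lowercase_words(dataset):
    # Returns new instances whose "words" field is lowercased.
    return [{**inst, "words": [w.lower() for w in inst["words"]]} for inst in dataset]


def add_length(dataset):
    # Returns new instances with an extra "seq_lens" field.
    return [{**inst, "seq_lens": len(inst["words"])} for inst in dataset]


pipeline = SimplePipeline([lowercase_words, add_length])
result = pipeline([{"words": ["Hello", "World"]}])
```

Because each step receives the previous step's output, the order of the processors matters: add_length here sees the already lowercased words.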

fastNLP.api.processor

class fastNLP.api.processor.FullSpaceToHalfSpaceProcessor(field_name, change_alpha=True, change_digit=True, change_punctuation=True, change_space=True)[source]

Converts full-width characters to half-width, processing one character at a time.
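
A minimal sketch of character-level full-width to half-width conversion, assuming the standard Unicode layout: full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above ASCII '!' to '~', and the ideographic space U+3000 maps to an ordinary space. The function name full_to_half is hypothetical, and the way the four flags are interpreted here is an assumption based only on their names in the signature.

```python
def full_to_half(text, change_alpha=True, change_digit=True,
                 change_punctuation=True, change_space=True):
    """Map full-width characters to their half-width ASCII counterparts."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:  # ideographic (full-width) space
            out.append(" " if change_space else ch)
        elif 0xFF01 <= code <= 0xFF5E:  # full-width forms of ASCII '!'..'~'
            half = chr(code - 0xFEE0)
            if half.isalpha():
                convert = change_alpha
            elif half.isdigit():
                convert = change_digit
            else:
                convert = change_punctuation
            out.append(half if convert else ch)
        else:
            out.append(ch)  # anything else is left untouched
    return "".join(out)


# Full-width "Hi!", an ideographic space, and a full-width "2", written as escapes.
converted = full_to_half("\uff28\uff49\uff01\u3000\uff12")
```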

class fastNLP.api.processor.Index2WordProcessor(vocab, field_name, new_added_field_name)[source]

Converts a field of indices in the DataSet to str according to vocab.

class fastNLP.api.processor.IndexerProcessor(vocab, field_name, new_added_field_name, delete_old_field=False, is_input=True)[source]
Given a vocabulary, converts the specified field to index form. The specified field should be a one-dimensional list, for example
[‘我’, ‘是’, xxx]
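
The two directions of the mapping (IndexerProcessor from words to indices, Index2WordProcessor from indices back to words) can be sketched with a plain dict standing in for the vocabulary. The helper names to_indices and to_words, the special tokens, and the fallback to an unknown-word index are assumptions for illustration, not the library's implementation.

```python
# A plain dict stands in for the vocabulary (word -> index).
# Reserving index 1 for unknown words is an assumption of this sketch.
vocab = {"<pad>": 0, "<unk>": 1, "我": 2, "是": 3}
index_to_word = {idx: word for word, idx in vocab.items()}


def to_indices(instance, field_name, new_added_field_name, delete_old_field=False):
    # The field is expected to be a one-dimensional list of words.
    out = dict(instance)
    out[new_added_field_name] = [vocab.get(w, vocab["<unk>"])
                                 for w in instance[field_name]]
    if delete_old_field and new_added_field_name != field_name:
        del out[field_name]
    return out


def to_words(instance, field_name, new_added_field_name):
    # Reverse lookup: each index is mapped back to its word.
    out = dict(instance)
    out[new_added_field_name] = [index_to_word[i] for i in instance[field_name]]
    return out


indexed = to_indices({"words": ["我", "是", "谁"]}, "words", "word_ids")
restored = to_words(indexed, "word_ids", "restored")
```

Note that the round trip is lossy for out-of-vocabulary words: "谁" is indexed as the unknown token and comes back as "<unk>".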
class fastNLP.api.processor.Num2TagProcessor(tag, field_name, new_added_field_name=None)[source]

Converts the numbers in a sentence to a given tag.
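
A minimal sketch of the idea, assuming the field holds a list of tokens and that a token counts as a number when it is an optionally signed integer or decimal. The regular expression and the num_to_tag helper are assumptions; the pattern the library actually uses may differ.

```python
import re

# Simplified number pattern: optional sign, digits, optional decimal part.
NUMBER = re.compile(r"[-+]?\d+(\.\d+)?")


def num_to_tag(tokens, tag="<num>"):
    # Replace every token that is entirely a number with the given tag.
    return [tag if NUMBER.fullmatch(tok) else tok for tok in tokens]


tagged = num_to_tag(["I", "paid", "3.50", "for", "2", "apples"])
```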

class fastNLP.api.processor.PreAppendProcessor(data, field_name, new_added_field_name=None)[source]
Prepends data (which should be a str) to the start of a field. The field must be a list. That is, the new field is
[data] + instance[field_name]
class fastNLP.api.processor.SeqLenProcessor(field_name, new_added_field_name='seq_lens', is_input=True)[source]

Adds a sequence length field based on a given field. The length is taken along the first dimension of that field.

class fastNLP.api.processor.SliceProcessor(start, end, step, field_name, new_added_field_name=None)[source]

Takes only part of a field. Equivalent to instance[field_name][start:end:step]
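
The list operations behind PreAppendProcessor, SeqLenProcessor, and SliceProcessor can be shown on a single instance. This sketch uses a plain dict as a stand-in for an instance; the field names and the "<bos>" marker are hypothetical.

```python
instance = {"words": ["a", "b", "c", "d", "e"]}

# PreAppendProcessor-style: the new field is [data] + instance[field_name].
instance["with_bos"] = ["<bos>"] + instance["words"]

# SeqLenProcessor-style: the length along the first dimension of the field.
instance["seq_lens"] = len(instance["with_bos"])

# SliceProcessor-style: instance[field_name][start:end:step].
start, end, step = 1, None, 2
instance["sliced"] = instance["with_bos"][start:end:step]
```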

class fastNLP.api.processor.VocabIndexerProcessor(field_name, new_added_filed_name=None, min_freq=1, max_size=None, verbose=0, is_input=True)[source]
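Builds a Vocabulary from the DataSet and converts the field to numeric indices. The newly generated index field is stored in new_added_filed_name; if
new_added_filed_name is not provided, the original field_name is overwritten.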
construct_vocab(*datasets)[source]

Builds the vocabulary from the given DataSets.

Parameters: datasets – DataSet objects used to build the vocabulary
Returns:
process(*datasets, only_index_dataset=None)[source]
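If no Vocabulary has been built yet, builds one from the DataSets passed in datasets; if a vocabulary already exists, uses the existing one. Once the vocabulary
is available, indexes both datasets and only_index_dataset.
Parameters:
  • datasets – DataSet objects
  • only_index_dataset – DataSet, or list of DataSet. The contents of this parameter are only indexed; they are not used to build the vocabulary.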
Returns:

set_verbose(verbose)[source]

Sets the verbose state of the processor.

Parameters: verbose – int. 0: print nothing; 1: print vocab information.
Returns:
class fastNLP.api.processor.VocabProcessor(field_name, min_freq=1, max_size=None)[source]

Takes one or more DataSets and builds a vocabulary from them.
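
A minimal sketch of building a vocabulary from several datasets with min_freq and max_size, then indexing data that did not contribute to it (the role only_index_dataset plays in VocabIndexerProcessor.process). Lists of dicts stand in for DataSets; build_vocab, index_datasets, the reserved "<pad>" and "<unk>" entries, and the choice not to count them toward max_size are all assumptions of this sketch.

```python
from collections import Counter


def build_vocab(datasets, field_name, min_freq=1, max_size=None):
    # Count every word across all datasets used for vocabulary construction.
    counts = Counter(w for ds in datasets for inst in ds for w in inst[field_name])
    kept = [w for w, c in counts.most_common() if c >= min_freq]
    if max_size is not None:
        kept = kept[:max_size]  # keep only the most frequent words
    vocab = {"<pad>": 0, "<unk>": 1}
    for w in kept:
        vocab[w] = len(vocab)
    return vocab


def index_datasets(vocab, datasets, field_name, new_field_name=None):
    # Write indices to new_field_name, or overwrite field_name if none is given.
    target = new_field_name or field_name
    for ds in datasets:
        for inst in ds:
            inst[target] = [vocab.get(w, vocab["<unk>"]) for w in inst[field_name]]


train = [{"words": ["the", "cat", "sat"]}, {"words": ["the", "dog"]}]
dev = [{"words": ["the", "bird"]}]

vocab = build_vocab([train], "words", min_freq=1)  # dev is not used here
index_datasets(vocab, [train, dev], "words", "word_ids")
```

Keeping dev out of build_vocab means words that appear only there, such as "bird", are indexed as unknown, which matches how unseen words would be treated at prediction time.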