fastNLP.io

fastNLP.io.base_loader

class fastNLP.io.base_loader.BaseLoader[source]

Base class for all loaders.

class fastNLP.io.base_loader.DataLoaderRegister[source]

Registry for all data set loaders.

fastNLP.io.config_io

class fastNLP.io.config_io.ConfigLoader(data_path=None)[source]

Loader for configuration.

Parameters:data_path (str) – path to the config file
static load_config(file_path, sections)[source]

Load section(s) of configuration into the sections provided. No returns.

Parameters:
  • file_path (str) – the path of config file
  • sections (dict) – the dict of {section_name(string): ConfigSection object}

Example:

test_args = ConfigSection()
ConfigLoader("config.cfg", "").load_config("./data_for_tests/config", {"POS_test": test_args})
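The load/fill behavior can be sketched with Python's standard configparser, assuming the config file uses INI-style sections; `load_config` and the plain dict standing in for ConfigSection below are simplified illustrations, not the fastNLP implementation:

```python
import configparser
import tempfile

def load_config(file_path, sections):
    """Fill each dict-like section object with the key-value pairs
    of the section of the same name in an INI-style config file."""
    parser = configparser.ConfigParser()
    parser.read(file_path)
    for name, section in sections.items():
        for key, value in parser.items(name):
            section[key] = value

# Usage: write a small config file and load one of its sections.
with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as f:
    f.write("[POS_test]\nepochs = 20\nbatch_size = 32\n")
    cfg_path = f.name

test_args = {}  # stands in for a ConfigSection
load_config(cfg_path, {"POS_test": test_args})
```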
class fastNLP.io.config_io.ConfigSaver(file_path)[source]

ConfigSaver is used to save config file and solve related conflicts.

Parameters:file_path (str) – path to the config file
save_config_file(section_name, section)[source]

This is the function to be called to change the config file with a single section and its name.

Parameters:
  • section_name (str) – the name of the section that needs to be changed and saved.
  • section (ConfigSection) – the section with the keys and values that need to be changed and saved.
class fastNLP.io.config_io.ConfigSection[source]

ConfigSection is the data structure storing all key-value pairs in one section in a config file.

fastNLP.io.dataset_loader

class fastNLP.io.dataset_loader.ClassDataSetLoader[source]

Loader for classification data sets

convert(data)[source]

Optional operation to build a DataSet.

Parameters:data – inner data structure (user-defined) to represent the data.
Returns:a DataSet object
load(data_path)[source]

Load data from a given file.

Parameters:data_path (str) – file path
Returns:a DataSet object
static parse(lines)[source]
Parameters:lines – lines from dataset
Returns:list(list(list())): the three levels of lists are words, sentences, and the data set
class fastNLP.io.dataset_loader.Conll2003Loader[source]

Self-defined loader for the CoNLL-2003 dataset

More information about the dataset can be found at https://sites.google.com/site/ermasoftware/getting-started/ne-tagging-conll2003-data

convert(parsed_data)[source]

Optional operation to build a DataSet.

Parameters:parsed_data – inner data structure (user-defined) to represent the data.
Returns:a DataSet object
load(dataset_path)[source]

Load data from a given file.

Parameters:dataset_path (str) – file path
Returns:a DataSet object
class fastNLP.io.dataset_loader.ConllLoader[source]

Loader for CoNLL-format files

convert(data)[source]

Optional operation to build a DataSet.

Parameters:data – inner data structure (user-defined) to represent the data.
Returns:a DataSet object
load(data_path)[source]

Load data from a given file.

Parameters:data_path (str) – file path
Returns:a DataSet object
static parse(lines)[source]

Parameters:lines (list) – a list containing all lines in a CoNLL file.
Returns:a 3D list
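fastNLP's exact parsing is not shown here, but the standard CoNLL convention (one token per line with whitespace-separated fields, blank lines between sentences) suggests a sketch like:

```python
def parse_conll(lines):
    """Group CoNLL lines into a 3D list:
    dataset -> sentences -> per-token field lists."""
    dataset, sentence = [], []
    for line in lines:
        if line.strip():
            sentence.append(line.split())
        elif sentence:           # a blank line ends the current sentence
            dataset.append(sentence)
            sentence = []
    if sentence:                 # flush a trailing sentence with no blank line
        dataset.append(sentence)
    return dataset

lines = ["EU NNP B-NP B-ORG", "rejects VBZ B-VP O", "", "Peter NNP B-NP B-PER"]
parsed = parse_conll(lines)
# parsed[0] holds the token rows of the first sentence
```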

class fastNLP.io.dataset_loader.DataSetLoader[source]

Interface for all DataSetLoaders.

convert(data)[source]

Optional operation to build a DataSet.

Parameters:data – inner data structure (user-defined) to represent the data.
Returns:a DataSet object
load(path)[source]

Load data from a given file.

Parameters:path (str) – file path
Returns:a DataSet object
class fastNLP.io.dataset_loader.LMDataSetLoader[source]

Language Model Dataset Loader

This loader produces data for supervised language model training, i.e. it provides both inputs (X) and targets (Y).

convert(data)[source]

Optional operation to build a DataSet.

Parameters:data – inner data structure (user-defined) to represent the data.
Returns:a DataSet object
load(data_path)[source]

Load data from a given file.

Parameters:data_path (str) – file path
Returns:a DataSet object
class fastNLP.io.dataset_loader.NativeDataSetLoader[source]

A simple example of DataSetLoader

load(path)[source]

Load data from a given file.

Parameters:path (str) – file path
Returns:a DataSet object
class fastNLP.io.dataset_loader.POSDataSetLoader[source]

Dataset Loader for a POS Tag dataset.

In these datasets, each line is split by a space: the first column is the word and the second column is its label. Sentences are separated by an empty line.

E.g:

Tom label1
and label2
Jerry   label1
.   label3
(separated by an empty line)
Hello   label4
world   label5
!   label3

In this example, there are two sentences “Tom and Jerry .” and “Hello world !”. Each word has its own label.
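Under the format above, the parsing can be sketched as follows (a simplified stand-in, not the exact fastNLP implementation):

```python
def parse_pos(lines):
    """Split two-column lines (word, label) into per-sentence
    [[words], [labels]] pairs; empty lines separate sentences."""
    data, words, labels = [], [], []
    for line in lines:
        if line.strip():
            word, label = line.split()
            words.append(word)
            labels.append(label)
        elif words:              # empty line closes the current sentence
            data.append([words, labels])
            words, labels = [], []
    if words:                    # flush a trailing sentence
        data.append([words, labels])
    return data

lines = ["Tom label1", "and label2", "Jerry label1", ". label3",
         "", "Hello label4", "world label5", "! label3"]
data = parse_pos(lines)
```

The result matches the three-level list documented for `load`: one `[words, labels]` pair per sentence.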

convert(data)[source]

Convert lists of strings into Instances with Fields.

load(data_path)[source]
Return data:

three-level list Example:

[
    [ [word_11, word_12, ...], [label_1, label_1, ...] ],
    [ [word_21, word_22, ...], [label_2, label_1, ...] ],
    ...
]
class fastNLP.io.dataset_loader.PeopleDailyCorpusLoader[source]

People's Daily Corpus: Chinese word segmentation, POS tagging, and NER

convert(data)[source]

Optional operation to build a DataSet.

Parameters:data – inner data structure (user-defined) to represent the data.
Returns:a DataSet object
load(data_path)[source]

Load data from a given file.

Parameters:data_path (str) – file path
Returns:a DataSet object
class fastNLP.io.dataset_loader.RawDataSetLoader[source]

A simple example of a raw data reader

convert(data)[source]

Optional operation to build a DataSet.

Parameters:data – inner data structure (user-defined) to represent the data.
Returns:a DataSet object
load(data_path, split=None)[source]

Load data from a given file.

Parameters:data_path (str) – file path
Returns:a DataSet object
class fastNLP.io.dataset_loader.SNLIDataSetLoader[source]

A data set loader for SNLI data set.

convert(data)[source]

Convert a 3D list to a DataSet object.

Parameters:data

A three-level list. Example:

[
    [ [premise_word_11, premise_word_12, ...], [hypothesis_word_11, hypothesis_word_12, ...], [label_1] ],
    [ [premise_word_21, premise_word_22, ...], [hypothesis_word_21, hypothesis_word_22, ...], [label_2] ],
    ...
]
Returns:A DataSet object.
load(path_list)[source]
Parameters:path_list (list) – a list of file names, in the order: premise file, hypothesis file, and label file.
Returns:A DataSet object.
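Assuming each of the three files holds one tokenized sentence (or one label) per line, combining them into the 3D list shown above can be sketched as (`combine_snli` is an illustrative helper, not part of fastNLP):

```python
def combine_snli(premises, hypotheses, labels):
    """Zip three parallel line sequences into
    [ [premise_words, hypothesis_words, [label]], ... ]."""
    return [[p.split(), h.split(), [l.strip()]]
            for p, h, l in zip(premises, hypotheses, labels)]

rows = combine_snli(["A man is eating ."],
                    ["Someone is having a meal ."],
                    ["entailment"])
```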
class fastNLP.io.dataset_loader.TokenizeDataSetLoader[source]

Data set loader for tokenization data sets

convert(data)[source]

Optional operation to build a DataSet.

Parameters:data – inner data structure (user-defined) to represent the data.
Returns:a DataSet object
load(data_path, max_seq_len=32)[source]

Load the pku dataset for Chinese word segmentation (CWS). The pku training dataset format: 1. each line is a sentence; 2. each word in a sentence is separated by a space. This function converts the pku dataset into three-level lists with <BMES> labels: B – beginning of a word, M – middle of a word, E – ending of a word, S – single character.

Parameters:
  • data_path (str) – path to the data set.
  • max_seq_len (int) – the maximum length of a sequence. If a sequence is longer than this, it is split into several sequences.
Returns:

three-level lists
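The <BMES> labeling rule can be sketched as follows (`to_bmes` is an illustrative helper, not the fastNLP implementation; it omits the max_seq_len splitting):

```python
def to_bmes(sentence):
    """Convert a space-segmented sentence into parallel lists of
    characters and BMES labels: B/M/E mark the beginning/middle/end
    of a multi-character word, S marks a single-character word."""
    chars, tags = [], []
    for word in sentence.split():
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
        chars.extend(word)  # iterating a string yields its characters
    return chars, tags

chars, tags = to_bmes("迈向 新 世纪")
# tags: B E for 迈向, S for 新, B E for 世纪
```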

fastNLP.io.dataset_loader.convert_seq2seq_dataset(data)[source]

Convert list of data into DataSet.

Parameters:data

list of list of strings, [num_examples, *]. Example:

[
    [ [word_11, word_12, ...], [label_1, label_1, ...] ],
    [ [word_21, word_22, ...], [label_2, label_1, ...] ],
    ...
]
Returns:a DataSet.
fastNLP.io.dataset_loader.convert_seq2tag_dataset(data)[source]

Convert list of data into DataSet.

Parameters:data

list of list of strings, [num_examples, *]. Example:

[
    [ [word_11, word_12, ...], label_1 ],
    [ [word_21, word_22, ...], label_2 ],
    ...
]
Returns:a DataSet.
fastNLP.io.dataset_loader.convert_seq_dataset(data)[source]

Create a DataSet instance that contains no labels.

Parameters:data

list of list of strings, [num_examples, *]. Example:

[
    [word_11, word_12, ...],
    ...
]
Returns:a DataSet.

fastNLP.io.embed_loader

class fastNLP.io.embed_loader.EmbedLoader[source]

Loader for pre-trained word embeddings.

static fast_load_embedding(emb_dim, emb_file, vocab)[source]

Quickly load a pre-trained embedding and combine it with the given vocabulary. This method reads the embedding file line by line.

Parameters:
  • emb_dim (int) – the dimension of the embedding. Should be the same as pre-trained embedding.
  • emb_file (str) – the pre-trained embedding file path.
  • vocab (Vocabulary) – a mapping from word to index, can be provided by user or built from pre-trained embedding
Return embedding_matrix:

a numpy.ndarray
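The line-by-line loading can be sketched as follows, assuming a GloVe-style text file with one "word v1 v2 ..." entry per line; plain lists stand in for the numpy.ndarray the real method returns, and a plain dict stands in for Vocabulary:

```python
import io
import tempfile

def fast_load_embedding(emb_dim, emb_file, vocab):
    """Scan the embedding file line by line and fill one row per
    vocabulary word; words absent from the file keep zero vectors."""
    matrix = [[0.0] * emb_dim for _ in range(len(vocab))]
    with io.open(emb_file, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) == emb_dim + 1 and parts[0] in vocab:
                matrix[vocab[parts[0]]] = [float(v) for v in parts[1:]]
    return matrix

# Usage with a tiny two-word embedding file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("the 0.1 0.2\ncat 0.3 0.4\n")
    emb_path = f.name

vocab = {"the": 0, "cat": 1, "dog": 2}
matrix = fast_load_embedding(2, emb_path, vocab)
# matrix[2] stays all zeros because "dog" is not in the file
```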

static load_embedding(emb_dim, emb_file, emb_type, vocab)[source]

Load the pre-trained embedding and combine with the given dictionary.

Parameters:
  • emb_dim (int) – the dimension of the embedding. Should be the same as pre-trained embedding.
  • emb_file (str) – the pre-trained embedding file path.
  • emb_type (str) – the pre-trained embedding format; only "glove" is supported for now
  • vocab (Vocabulary) – a mapping from word to index, can be provided by user or built from pre-trained embedding
Return (embedding_tensor, vocab):

embedding_tensor – a Tensor of shape (len(vocab), emb_dim); vocab – the input vocab, or the vocab built from the pre-trained embedding

fastNLP.io.logger

fastNLP.io.logger.create_logger(logger_name, log_path, log_format=None, log_level=20)[source]

Create a logger.

Parameters:
  • logger_name (str) – the name of the logger
  • log_path (str) – the path of the log file
  • log_format – the format of log messages
  • log_level – the logging level; the default 20 corresponds to logging.INFO
Returns:

logger

To use a logger:

logger.debug("this is a debug message")
logger.info("this is an info message")
logger.warning("this is a warning message")
logger.error("this is an error message")
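A plausible sketch of create_logger using the standard logging module (fastNLP's internals may differ; note that log_level=20 equals logging.INFO):

```python
import logging

def create_logger(logger_name, log_path, log_format=None, log_level=20):
    """Create and configure a named logger. When log_path is falsy,
    fall back to logging on stderr (an assumption of this sketch)."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(log_level)
    handler = logging.FileHandler(log_path) if log_path else logging.StreamHandler()
    fmt = log_format or "%(asctime)s %(levelname)s: %(message)s"
    handler.setFormatter(logging.Formatter(fmt))
    logger.addHandler(handler)
    return logger

logger = create_logger("fastNLP_demo", None)  # None -> log to stderr here
logger.info("this is an info message")
```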

fastNLP.io.model_io

class fastNLP.io.model_io.ModelLoader[source]

Loader for models.

static load_pytorch(empty_model, model_path)[source]

Load model parameters from ".pkl" files into the empty PyTorch model.

Parameters:
  • empty_model – an instantiated PyTorch model whose parameters will be filled from the file.
  • model_path (str) – the path to the saved model.
static load_pytorch_model(model_path)[source]

Load the entire model.

Parameters:model_path (str) – the path to the saved model.
class fastNLP.io.model_io.ModelSaver(save_path)[source]

Save a model

Parameters:save_path (str) – the file path to save the model to.

Example:

saver = ModelSaver("./save/model_ckpt_100.pkl")
saver.save_pytorch(model)
save_pytorch(model, param_only=True)[source]

Save a PyTorch model into a ".pkl" file.

Parameters:
  • model – a PyTorch model
  • param_only (bool) – if True, save only the model parameters; otherwise save the entire model.
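The param_only distinction mirrors the usual PyTorch pattern of torch.save(model.state_dict(), path) versus torch.save(model, path). Here is a pickle-based sketch with an illustrative TinyModel (not the fastNLP implementation):

```python
import os
import pickle
import tempfile

class TinyModel:
    """Stand-in for a PyTorch model; state_dict() mimics torch's API."""
    def __init__(self):
        self.weight = [0.5, -0.5]
    def state_dict(self):
        return {"weight": self.weight}

def save_pytorch(model, path, param_only=True):
    """param_only=True pickles only the parameter dict;
    param_only=False pickles the whole model object."""
    with open(path, "wb") as f:
        pickle.dump(model.state_dict() if param_only else model, f)

model = TinyModel()
path = os.path.join(tempfile.mkdtemp(), "model_ckpt_100.pkl")
save_pytorch(model, path, param_only=True)

# Loading the parameter-only checkpoint yields just the state dict.
with open(path, "rb") as f:
    params = pickle.load(f)
```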