fastNLP.io¶
fastNLP.io.base_loader¶
fastNLP.io.config_io¶
class fastNLP.io.config_io.ConfigLoader(data_path=None)[source]¶
Loader for configuration.
Parameters: data_path (str) – path to the config file
static load_config(file_path, sections)[source]¶
Load section(s) of configuration into the sections provided. No returns.
Parameters:
- file_path (str) – the path of the config file
- sections (dict) – a dict of {section_name (str): ConfigSection object}
Example:
test_args = ConfigSection()
ConfigLoader("config.cfg").load_config("./data_for_tests/config", {"POS_test": test_args})
class fastNLP.io.config_io.ConfigSaver(file_path)[source]¶
ConfigSaver is used to save a config file and resolve related conflicts.
Parameters: file_path (str) – path to the config file

save_config_file(section_name, section)[source]¶
Change the config file by saving a single section under the given name.
Parameters:
- section_name (str) – the name of the section to be changed and saved.
- section (ConfigSection) – the section, with its keys and values, to be changed and saved.
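The section-update pattern described above can be sketched with the standard library's configparser; `save_section` and its plain-dict `section` argument are illustrative stand-ins for fastNLP's ConfigSaver and ConfigSection, not the actual implementation:

```python
import configparser

# Hypothetical sketch: read the config, replace one section's keys,
# and write the file back in place.
def save_section(file_path, section_name, section):
    parser = configparser.ConfigParser()
    parser.read(file_path)
    if not parser.has_section(section_name):
        parser.add_section(section_name)
    for key, value in section.items():
        parser.set(section_name, key, str(value))
    with open(file_path, "w") as f:
        parser.write(f)
```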
fastNLP.io.dataset_loader¶
class fastNLP.io.dataset_loader.Conll2003Loader[source]¶
Loader for the CoNLL-2003 dataset.
More information about the dataset can be found at https://sites.google.com/site/ermasoftware/getting-started/ne-tagging-conll2003-data
class fastNLP.io.dataset_loader.ConllLoader[source]¶
Loader for CoNLL-format files.

convert(data)[source]¶
Optional operation to build a DataSet.
Parameters: data – inner data structure (user-defined) to represent the data.
Returns: a DataSet object
class fastNLP.io.dataset_loader.ConllxDataLoader[source]¶
Returns word-level label information: words, POS tags, (syntactic) head dependencies, and (syntactic) arc labels. Completely different from ``ZhConllPOSReader``.
class fastNLP.io.dataset_loader.DataSetLoader[source]¶
Interface for all DataSetLoaders.
class fastNLP.io.dataset_loader.DummyCWSReader[source]¶
Load the pku dataset for Chinese word segmentation.

convert(data)[source]¶
Optional operation to build a DataSet.
Parameters: data – inner data structure (user-defined) to represent the data.
Returns: a DataSet object

load(data_path, max_seq_len=32)[source]¶
Load the pku dataset for Chinese word segmentation (CWS). The pku training dataset format is:
1. Each line is a sentence.
2. Each word in a sentence is separated by a space.
This function converts the pku dataset into three-level lists with <BMES> labels:
B: beginning of a word; M: middle of a word; E: end of a word; S: single character.
Parameters:
- data_path (str) – path to the data set.
- max_seq_len (int) – the maximum length of a sequence. If a sequence is longer, it is split into several sequences.
Returns: three-level lists
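The <BMES> labeling scheme above can be sketched as follows; `bmes_tags` is an illustrative helper, not part of fastNLP's API:

```python
# For each word in a segmented sentence, emit one tag per character:
# S for a single-character word, otherwise B ... M ... E.
def bmes_tags(words):
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags
```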
class fastNLP.io.dataset_loader.DummyClassificationReader[source]¶
Loader for a dummy classification data set.

convert(data)[source]¶
Optional operation to build a DataSet.
Parameters: data – inner data structure (user-defined) to represent the data.
Returns: a DataSet object
class fastNLP.io.dataset_loader.DummyLMReader[source]¶
A dummy language model dataset reader.
class fastNLP.io.dataset_loader.DummyPOSReader[source]¶
A simple reader for a dummy POS tagging dataset.
In these datasets, each line is split by a space: the first column is the word and the second column is the label. Sentences are separated by an empty line.
E.g.:
Tom label1
and label2
Jerry label1
. label3
(separated by an empty line)
Hello label4
world label5
! label3
In this example, there are two sentences, "Tom and Jerry ." and "Hello world !". Each word has its own label.
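Parsing this two-column format can be sketched as follows; `read_pos_sentences` is a hypothetical helper, not fastNLP's implementation:

```python
# Split the lines into sentences at empty lines; each non-empty line
# holds a word and its label separated by a space.
def read_pos_sentences(lines):
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
        else:
            word, label = line.split(" ", 1)
            current.append((word, label))
    if current:
        sentences.append(current)
    return sentences
```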
class fastNLP.io.dataset_loader.NaiveCWSReader(in_word_splitter=None)[source]¶
This reader assumes the word-segmentation dataset is already whitespace-delimited, e.g.:
这是 fastNLP , 一个 非常 good 的 包 .
Or, with a POS tag attached to each part, e.g.:
也/D 在/P 團員/Na 之中/Ng ,/COMMACATEGORY
load(filepath, in_word_splitter=None, cut_long_sent=False)[source]¶
Supported input formats (split on whitespace by default):
这是 fastNLP , 一个 非常 good 的 包 .
or
也/D 在/P 團員/Na 之中/Ng ,/COMMACATEGORY
If the splitter is not None, the second format is assumed, and each token such as "也/D" is split by the splitter, keeping the first part, e.g. "也/D".split('/')[0].
Parameters:
- filepath –
- in_word_splitter –
- cut_long_sent –
Returns:
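The splitter handling described above amounts to keeping the text before the separator; a minimal sketch, using a hypothetical `strip_pos` helper:

```python
# With a splitter such as "/", each token like "也/D" is cut at the
# separator and only the first part (the word) is kept.
def strip_pos(tokens, in_word_splitter="/"):
    return [token.split(in_word_splitter)[0] for token in tokens]
```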
class fastNLP.io.dataset_loader.PeopleDailyCorpusLoader[source]¶
Loader for the People's Daily (人民日报) corpus.
class fastNLP.io.dataset_loader.RawDataSetLoader[source]¶
A simple example of a raw data reader.
class fastNLP.io.dataset_loader.SNLIDataSetReader[source]¶
A data set loader for the SNLI data set.

convert(data)[source]¶
Convert a 3D list to a DataSet object.
Parameters: data – a 3D list. Example:
[
  [ [premise_word_11, premise_word_12, ...], [hypothesis_word_11, hypothesis_word_12, ...], [label_1] ],
  [ [premise_word_21, premise_word_22, ...], [hypothesis_word_21, hypothesis_word_22, ...], [label_2] ],
  ...
]
Returns: a DataSet object.
class fastNLP.io.dataset_loader.ZhConllPOSReader[source]¶
Reads Chinese CoNLL format. Returns character-level labels, extending the original word-level labels with BMES notation.

load(path)[source]¶
The returned DataSet contains the following fields:
- words: list of str
- tag: list of str, with BMES tags added; e.g. the original sequence ['VP', 'NN', 'NN', ...] becomes ['S-VP', 'B-NN', 'M-NN', ...]
The input is assumed to be in CoNLL format, with sentences separated by empty lines and 7 columns per line, e.g.:
1 编者按 编者按 NN O 11 nmod:topic
2 : : PU O 11 punct
3 7月 7月 NT DATE 4 compound:nn
4 12日 12日 NT DATE 11 nmod:tmod
5 , , PU O 11 punct

1 这 这 DT O 3 det
2 款 款 M O 1 mark:clf
3 飞行 飞行 NN O 8 nsubj
4 从 从 P O 5 case
5 外型 外型 NN O 8 nmod:prep
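The word-to-character tag expansion described above can be sketched like this; `expand_to_char_tags` is an illustrative helper, assuming each word's tag is prefixed onto every one of its characters:

```python
# Expand word-level tags to character-level tags with BMES prefixes:
# a single-character word gets S-, longer words get B-/M-.../E-.
def expand_to_char_tags(words, tags):
    char_tags = []
    for word, tag in zip(words, tags):
        if len(word) == 1:
            char_tags.append("S-" + tag)
        else:
            char_tags.append("B-" + tag)
            char_tags.extend("M-" + tag for _ in word[1:-1])
            char_tags.append("E-" + tag)
    return char_tags
```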
fastNLP.io.dataset_loader.add_seg_tag(data)[source]¶
Parameters: data – list of ([word], [pos], [heads], [head_tags])
Returns: list of ([word], [pos])
fastNLP.io.dataset_loader.convert_seq2seq_dataset(data)[source]¶
Convert a list of data into a DataSet.
Parameters: data – list of lists of strings, [num_examples, *]. Example:
[
  [ [word_11, word_12, ...], [label_1, label_1, ...] ],
  [ [word_21, word_22, ...], [label_2, label_1, ...] ],
  ...
]
Returns: a DataSet.
fastNLP.io.dataset_loader.convert_seq2tag_dataset(data)[source]¶
Convert a list of data into a DataSet.
Parameters: data – list of lists of strings, [num_examples, *]. Example:
[
  [ [word_11, word_12, ...], label_1 ],
  [ [word_21, word_22, ...], label_2 ],
  ...
]
Returns: a DataSet.
fastNLP.io.embed_loader¶
class fastNLP.io.embed_loader.EmbedLoader[source]¶
Loader for pre-trained word embeddings.
static fast_load_embedding(emb_dim, emb_file, vocab)[source]¶
Fast-load a pre-trained embedding and combine it with the given vocabulary. This loading method operates line by line.
Parameters:
- emb_dim (int) – the dimension of the embedding; should match the pre-trained embedding.
- emb_file (str) – the pre-trained embedding file path.
- vocab (Vocabulary) – a mapping from word to index; can be provided by the user or built from the pre-trained embedding.
Return embedding_matrix: numpy.ndarray
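The line-by-line loading strategy can be sketched without fastNLP as follows; the plain-dict `vocab` and list-of-lists matrix are simplifications of fastNLP's Vocabulary and the returned numpy.ndarray:

```python
# Scan the embedding file one line at a time ("word v1 v2 ... vN"),
# keeping only vectors for words that appear in the vocabulary;
# words without a pre-trained vector keep a zero row.
def load_embedding_rows(emb_dim, lines, vocab):
    matrix = [[0.0] * emb_dim for _ in range(len(vocab))]
    for line in lines:
        parts = line.split()
        word, values = parts[0], parts[1:]
        if word in vocab and len(values) == emb_dim:
            matrix[vocab[word]] = [float(v) for v in values]
    return matrix
```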
static load_embedding(emb_dim, emb_file, emb_type, vocab)[source]¶
Load a pre-trained embedding and combine it with the given vocabulary.
Parameters:
- emb_dim (int) – the dimension of the embedding; should match the pre-trained embedding.
- emb_file (str) – the pre-trained embedding file path.
- emb_type (str) – the pre-trained embedding format; only GloVe is supported at present.
- vocab (Vocabulary) – a mapping from word to index; can be provided by the user or built from the pre-trained embedding.
Return (embedding_tensor, vocab): embedding_tensor – Tensor of shape (len(word_dict), emb_dim); vocab – the input vocab, or a vocab built from the pre-trained embedding.
fastNLP.io.logger¶
fastNLP.io.logger.create_logger(logger_name, log_path, log_format=None, log_level=20)[source]¶
Create a logger.
Parameters:
- logger_name (str) –
- log_path (str) –
- log_format –
- log_level –
Returns: logger
To use a logger:
logger.debug("this is a debug message")
logger.info("this is an info message")
logger.warning("this is a warning message")
logger.error("this is an error message")
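A minimal sketch of what create_logger might do with the standard logging module; the handler and format choices here are assumptions, not fastNLP's actual code:

```python
import logging

def create_logger_sketch(logger_name, log_path, log_format=None, log_level=20):
    # log_level=20 corresponds to logging.INFO.
    logger = logging.getLogger(logger_name)
    logger.setLevel(log_level)
    handler = logging.FileHandler(log_path)
    fmt = log_format or "%(asctime)s %(levelname)s %(message)s"
    handler.setFormatter(logging.Formatter(fmt))
    logger.addHandler(handler)
    return logger
```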
fastNLP.io.model_io¶
class fastNLP.io.model_io.ModelLoader[source]¶
Loader for models.