FastNLP 1-Minute Quick Start Tutorial

step 1

Read the dataset

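from fastNLP import DataSet
# Relative path used in the original tutorial; adjust it to where the fastNLP repository is checked out
data_path = "../test/data_for_tests/tutorial_sample_dataset.csv"
# On Windows, write an absolute path as a raw string so backslashes are not read as escape sequences, e.g.
# data_path = r"C:\Users\zyfeng\Desktop\FudanNLP\fastNLP\test\data_for_tests\tutorial_sample_dataset.csv"
# headers names the two tab-separated columns of each line
ds = DataSet.read_csv(data_path, headers=('raw_sentence', 'label'), sep='\t')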
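read_csv is given two column names and a tab separator, so each line of the file becomes one instance with the fields raw_sentence and label. The sketch below illustrates that mapping in plain Python; it is not fastNLP's implementation, read_tsv_records is a hypothetical helper, and the two rows are made up. In this sketch every value comes out as a string; the int conversion in step 2 suggests the same holds for read_csv.

```python
def read_tsv_records(text, headers, sep="\t"):
    """Hypothetical helper: turn delimited text into a list of dicts keyed by headers."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():  # skip blank lines
            continue
        values = line.split(sep)
        if len(values) != len(headers):
            raise ValueError(f"line {line_no}: expected {len(headers)} columns, got {len(values)}")
        records.append(dict(zip(headers, values)))
    return records

# Two made-up rows in the same two-column, tab-separated layout
sample = "a fine film .\t3\nnot good\t1\n"
rows = read_tsv_records(sample, headers=("raw_sentence", "label"))
print(rows)
# [{'raw_sentence': 'a fine film .', 'label': '3'}, {'raw_sentence': 'not good', 'label': '1'}]
```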

step 2

Data preprocessing:
1. Type conversion
2. Splitting off a validation set
3. Building the vocabulary

# Lowercase every sentence
ds.apply(lambda x: x['raw_sentence'].lower(), new_field_name='raw_sentence')
# Convert the label to int and mark it as the prediction target
ds.apply(lambda x: int(x['label']), new_field_name='label_seq', is_target=True)

# Tokenize: split each sentence on whitespace into a list of words
def split_sent(ins):
    return ins['raw_sentence'].split()
ds.apply(split_sent, new_field_name='words', is_input=True)
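Each apply call above runs a function on every instance and stores the result in a field, either overwriting an existing one (raw_sentence) or creating a new one (label_seq, words). A minimal plain-Python sketch of that behaviour, assuming a dataset is simply a list of dicts; apply_field is a hypothetical stand-in for DataSet.apply, and the is_input / is_target flags are not modelled.

```python
def apply_field(instances, func, new_field_name):
    """Hypothetical stand-in for DataSet.apply: store func(instance) under new_field_name."""
    for ins in instances:
        ins[new_field_name] = func(ins)  # overwrites the field if it already exists
    return instances

# Made-up instances shaped like the rows read in step 1
data = [{"raw_sentence": "A Fine Film .", "label": "3"},
        {"raw_sentence": "Not Good", "label": "1"}]
apply_field(data, lambda x: x["raw_sentence"].lower(), "raw_sentence")
apply_field(data, lambda x: int(x["label"]), "label_seq")
apply_field(data, lambda x: x["raw_sentence"].split(), "words")
print(data[0])
# {'raw_sentence': 'a fine film .', 'label': '3', 'label_seq': 3, 'words': ['a', 'fine', 'film', '.']}
```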
# Split off 30% of the instances as the validation (dev) set
train_data, dev_data = ds.split(0.3)
print("Train size: ", len(train_data))
print("Dev size: ", len(dev_data))
Train size:  54
Dev size:  23
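The printed sizes imply the file holds 77 instances, and 23 is 30% of 77 rounded down. Below is a sketch of a shuffled split that reproduces those sizes, assuming the validation size is computed as int(ratio * n); split_dataset is hypothetical and not taken from fastNLP's source.

```python
import random

def split_dataset(instances, dev_ratio, seed=None):
    """Hypothetical split: shuffle indices, send int(dev_ratio * n) to dev, the rest to train."""
    indices = list(range(len(instances)))
    random.Random(seed).shuffle(indices)
    n_dev = int(dev_ratio * len(instances))  # 0.3 * 77 = 23.1 -> 23
    dev = [instances[i] for i in indices[:n_dev]]
    train = [instances[i] for i in indices[n_dev:]]
    return train, dev

train, dev = split_dataset(list(range(77)), 0.3, seed=0)
print(len(train), len(dev))  # 54 23
```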
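from fastNLP import Vocabulary
# Build the vocabulary from the training set only; words seen fewer than 2 times are treated as unknown
vocab = Vocabulary(min_freq=2)
train_data.apply(lambda x: [vocab.add(word) for word in x['words']])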

# Index the sentences: map each word to its id with Vocabulary.to_index(word)
train_data.apply(lambda x: [vocab.to_index(word) for word in x['words']], new_field_name='word_seq', is_input=True)
dev_data.apply(lambda x: [vocab.to_index(word) for word in x['words']], new_field_name='word_seq', is_input=True)
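Vocabulary(min_freq=2) keeps only words that occur at least twice in the training data, and to_index maps every other word to an unknown-word id. The class below is a hypothetical, minimal re-implementation of that idea; the special tokens '<pad>' and '<unk>', their ids 0 and 1, and the frequency-then-alphabetical ordering are assumptions of this sketch, not fastNLP's exact behaviour.

```python
from collections import Counter

class SimpleVocab:
    """Hypothetical minimal vocabulary with a min_freq cutoff."""

    def __init__(self, min_freq=1, pad="<pad>", unk="<unk>"):
        self.min_freq = min_freq
        self.pad, self.unk = pad, unk
        self.counter = Counter()
        self.word2idx = None  # built lazily on first lookup

    def add(self, word):
        self.counter[word] += 1
        self.word2idx = None  # new words invalidate the current index

    def build(self):
        # Special tokens first (assumed ids 0 and 1)
        self.word2idx = {self.pad: 0, self.unk: 1}
        # Most frequent first, ties broken alphabetically, so ids are deterministic
        for word, count in sorted(self.counter.items(), key=lambda kv: (-kv[1], kv[0])):
            if count >= self.min_freq and word not in self.word2idx:
                self.word2idx[word] = len(self.word2idx)

    def to_index(self, word):
        if self.word2idx is None:
            self.build()
        return self.word2idx.get(word, self.word2idx[self.unk])

    def __len__(self):
        if self.word2idx is None:
            self.build()
        return len(self.word2idx)

# Made-up tokenized training sentences
train_words = [["a", "fine", "film"], ["a", "dull", "film"], ["not", "good"]]
vocab = SimpleVocab(min_freq=2)
for words in train_words:
    for word in words:
        vocab.add(word)

print(len(vocab))  # 4: '<pad>', '<unk>', 'a', 'film'
print([vocab.to_index(w) for w in ["a", "good", "film", "unseen"]])  # [2, 1, 3, 1]
```

Because the vocabulary is built from the training split only, dev words that never appear in training also fall back to the unknown id. In this sketch len(vocab), the kind of value step 3 passes as embed_num, counts the special tokens too.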

step 3

Define the model

from fastNLP.models import CNNText
# CNN text classifier: one embedding per vocabulary entry, 50-dim embeddings, 5 output classes
model = CNNText(embed_num=len(vocab), embed_dim=50, num_classes=5, padding=2, dropout=0.1)
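CNNText classifies a sentence by sliding convolution filters over its word embeddings and keeping each filter's strongest response (max-over-time pooling). The pure-Python sketch below shows those two operations for a single filter; the embeddings and filter weights are made up, and it does not reproduce CNNText's actual layers. With zero padding p on each side, a sentence of n words and a filter of width k give n + 2p - k + 1 positions.

```python
def conv1d_one_filter(seq, kernel, padding):
    """Slide one filter (k vectors of size d) over seq (n vectors of size d).

    seq is zero-padded with `padding` vectors on each side, so the output has
    n + 2 * padding - k + 1 positions (none if that is not positive).
    """
    dim = len(seq[0])
    zero = [0.0] * dim
    padded = [zero] * padding + list(seq) + [zero] * padding
    k = len(kernel)
    out = []
    for start in range(len(padded) - k + 1):
        window = padded[start:start + k]
        # Dot product of the filter with the window, summed over positions and dimensions
        out.append(sum(w * x
                       for w_vec, x_vec in zip(kernel, window)
                       for w, x in zip(w_vec, x_vec)))
    return out

def max_over_time(feature_map):
    """Keep the filter's strongest response anywhere in the sentence."""
    return max(feature_map)

embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]  # made-up 2-d embeddings, 4 words
kernel = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]                 # one made-up filter of width 3
fmap = conv1d_one_filter(embeddings, kernel, padding=2)
print(fmap)                 # [1.0, 0.0, 3.0, 3.0, 1.0, 2.0]  (4 + 2*2 - 3 + 1 = 6 positions)
print(max_over_time(fmap))  # 3.0
```

One plausible reason for padding=2: if the widest filter has width 5 (an assumption about CNNText's default kernel sizes), then n + 4 - 5 + 1 = n, so even a one-word sentence still yields a position to pool over.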

step 4

Train the model

from fastNLP import Trainer, CrossEntropyLoss, AccuracyMetric
# Train on train_data with cross-entropy loss; report accuracy on dev_data during training
trainer = Trainer(model=model,
                  train_data=train_data,
                  dev_data=dev_data,
                  loss=CrossEntropyLoss(),
                  metrics=AccuracyMetric()
                  )
trainer.train()
print('Train finished!')
training epochs started 2018-12-07 14:03:41
Epoch 1/3. Step:2/6. AccuracyMetric: acc=0.26087
Epoch 2/3. Step:4/6. AccuracyMetric: acc=0.347826
Epoch 3/3. Step:6/6. AccuracyMetric: acc=0.608696
Train finished!
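The Trainer minimizes cross-entropy on the training batches and reports AccuracyMetric on dev_data. The sketch below writes out both quantities for plain lists of logits; cross_entropy and accuracy are hypothetical helpers, not fastNLP's CrossEntropyLoss or AccuracyMetric. It also checks the log: the reported accuracies match 6, 8 and 14 correct predictions out of the 23 dev instances, and 6 steps over 3 epochs means 2 batches per epoch, which fits any batch size from 27 to 53 for 54 training instances (32 is assumed below).

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one example: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def accuracy(batch_logits, targets):
    """Fraction of examples whose highest-scoring class equals the target label."""
    correct = sum(
        max(range(len(logits)), key=logits.__getitem__) == t
        for logits, t in zip(batch_logits, targets)
    )
    return correct / len(targets)

print(cross_entropy([0.0, 0.0], 0))                                # log(2), about 0.693
print(accuracy([[2.0, 1.0], [0.0, 3.0], [1.0, 0.5]], [0, 1, 1]))  # 2 of 3 correct

# The logged dev accuracies, reproduced as correct / 23
for n_correct in (6, 8, 14):
    print(round(n_correct / 23, 6))  # 0.26087, then 0.347826, then 0.608696

# Steps in the log: 3 epochs x ceil(54 / batch_size) batches, assuming batch_size = 32
batch_size = 32
print(3 * math.ceil(54 / batch_size))  # 6
```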

This concludes the tutorial. For more operations, see the advanced tutorial.