X: (2, 8) (n, ns)

To keep the vocabulary from growing too large, low-frequency tokens are replaced with <unk>.
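A minimal sketch of how that replacement might work, assuming a frequency threshold. The names `build_vocab`, `encode`, and `min_freq=2` are illustrative, not taken from these notes. The index layout (`<unk>`=0, `<pad>`=1, `<bos>`=2, `<eos>`=3) is an assumption chosen to agree with the printed batch, where 1 is `<pad>` and 3 is `<eos>`.

```python
from collections import Counter

def build_vocab(tokenized_sentences, min_freq=2,
                reserved_tokens=('<pad>', '<bos>', '<eos>')):
    """Return (idx_to_token, token_to_idx); tokens rarer than min_freq get no entry."""
    counter = Counter(tok for sent in tokenized_sentences for tok in sent)
    idx_to_token = ['<unk>', *reserved_tokens]  # assumed layout: <unk>=0, <pad>=1, ...
    # Most frequent tokens first; ties broken alphabetically so the result is deterministic
    for tok, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in idx_to_token:
            idx_to_token.append(tok)
    token_to_idx = {tok: i for i, tok in enumerate(idx_to_token)}
    return idx_to_token, token_to_idx

def encode(sentence, token_to_idx):
    """Map tokens to indices; anything missing from the vocabulary becomes <unk>."""
    unk = token_to_idx['<unk>']
    return [token_to_idx.get(tok, unk) for tok in sentence]

corpus = [['i', 'try', '.'], ['i', 'run', '.'], ['this', 'is', '.']]
idx_to_token, token_to_idx = build_vocab(corpus, min_freq=2)
print(idx_to_token)                             # ['<unk>', '<pad>', '<bos>', '<eos>', '.', 'i']
print(encode(['i', 'try', '.'], token_to_idx))  # [5, 0, 4]
```

Here `try` occurs only once, so it has no vocabulary entry and is encoded as 0, the index of `<unk>`.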

The resulting iterator interface is:

```python
import torch

# load_data_nmt is assumed to be defined earlier
train_iter, src_vocab, tgt_vocab = load_data_nmt(batch_size=2, num_steps=8)
for X, X_valid_len, Y, Y_valid_len in train_iter:
    print('X:', X.type(torch.int32))
    print('valid lengths for X:', X_valid_len)
    print('Y:', Y.type(torch.int32))
    print('valid lengths for Y:', Y_valid_len)
    break

# X has shape (2, 8), i.e. (n, ns) = (batch_size, num_steps)
# Y has the same shape, (n, ns)
```

```
# Every sequence in a minibatch has length num_steps (set above); shorter ones are padded with 1, the index of <pad>
X: tensor([[62, 25,  4,  3,  1,  1,  1,  1],
        [99, 10,  4,  3,  1,  1,  1,  1]], dtype=torch.int32)

# This might correspond to data such as [["I", "try", ".", <eos>, <pad>, <pad>, <pad>, <pad>], ["This", "is", ".", <eos>, <pad>, <pad>, <pad>, <pad>]]
valid lengths for X: tensor([4, 4])
Y: tensor([[186,   5,   3,   1,   1,   1,   1,   1],
        [  0,   8,   4,   3,   1,   1,   1,   1]], dtype=torch.int32)
valid lengths for Y: tensor([3, 4])

# <bos> is added later, in train_seq2seq()
```
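The padding and valid-length bookkeeping seen in the printed batch can be sketched as follows. The notes do not show how `load_data_nmt` does this internally, so the helpers `truncate_pad` and `build_array` are assumptions for illustration. The indices `PAD = 1` and `EOS = 3` are read off the printed tensors.

```python
import torch

PAD, EOS = 1, 3  # indices as they appear in the printed batch

def truncate_pad(line, num_steps, padding_token):
    """Cut a sequence to num_steps tokens, or right-pad it up to num_steps."""
    if len(line) > num_steps:
        return line[:num_steps]
    return line + [padding_token] * (num_steps - len(line))

def build_array(lines, num_steps):
    """Append <eos>, pad/truncate to num_steps, and count the non-pad tokens per row."""
    lines = [line + [EOS] for line in lines]
    array = torch.tensor([truncate_pad(line, num_steps, PAD) for line in lines])
    valid_len = (array != PAD).type(torch.int32).sum(1)
    return array, valid_len

X, X_valid_len = build_array([[62, 25, 4], [99, 10, 4]], num_steps=8)
print(X)            # rows end in 3 (<eos>) followed by 1s (<pad>)
print(X_valid_len)  # tensor([4, 4])
```

The valid length counts `<eos>` but not `<pad>`, which is why three real tokens give a valid length of 4.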

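Note that this does not require the network's input layer to have dimension num_steps. num_steps only sets how many time steps each sequence spans. An RNN, for example, also has num_steps, yet its input layer dimension is the vocabulary size (the one-hot encoding of a single token).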
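A small shape check may make the distinction concrete. This is a sketch under stated assumptions: `vocab_size = 200` and `hidden_size = 16` are arbitrary values chosen for illustration, and a plain `torch.nn.RNN` stands in for whatever model is actually trained.

```python
import torch
import torch.nn.functional as F

vocab_size = 200   # hypothetical; any value larger than the biggest index works
hidden_size = 16   # hypothetical

X = torch.tensor([[62, 25, 4, 3, 1, 1, 1, 1],
                  [99, 10, 4, 3, 1, 1, 1, 1]])  # (batch_size, num_steps) = (2, 8)

# One-hot encode each token; transpose so time comes first, as nn.RNN expects by default
inputs = F.one_hot(X.T, vocab_size).type(torch.float32)
print(inputs.shape)   # torch.Size([8, 2, 200]): (num_steps, batch_size, vocab_size)

# The input layer size is vocab_size; num_steps does not appear in the constructor
rnn = torch.nn.RNN(input_size=vocab_size, hidden_size=hidden_size)
outputs, state = rnn(inputs)
print(outputs.shape)  # torch.Size([8, 2, 16]): one hidden state per time step
print(state.shape)    # torch.Size([1, 2, 16]): final hidden state
```

num_steps shows up only as the length of the time axis, so the same `rnn` could process sequences of a different length without any change to its weights.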