# Paper Notes: Sequence to Sequence Learning with Neural Networks

> While we do not have a complete explanation to this phenomenon, we believe that it is caused by the introduction of many short term dependencies to the dataset.

> By reversing the words in the source sentence, the average distance between corresponding words in the source and target language is unchanged. However, the first few words in the source language are now very close to the first few words in the target language, so the problem's minimal time lag is greatly reduced.
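A toy numeric check of this claim, assuming a one-to-one monotone word alignment (the function and variable names here are illustrative, not from the paper):

```python
# Illustration of the reversal trick: with a word-for-word alignment,
# reversing the source keeps the average source-target distance
# unchanged but shrinks the minimal time lag to near zero.
def distances(src_positions, tgt_positions, src_len):
    # Distance = number of steps between reading a source word and
    # emitting its target counterpart, when the target sequence is
    # generated only after the whole source has been read.
    return [(src_len - 1 - s) + t for s, t in zip(src_positions, tgt_positions)]

n = 5  # toy sentence length, identity alignment: word i -> word i
normal = distances(range(n), range(n), n)
reversed_src = distances(range(n - 1, -1, -1), range(n), n)

print(sum(normal) / n, min(normal))              # 4.0 4
print(sum(reversed_src) / n, min(reversed_src))  # 4.0 0
```

The average distance is 4.0 in both cases, but after reversal the first source word sits right next to the first target word, so the minimal lag drops from 4 to 0.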

Training maximizes the average log-probability of a correct translation $T$ given the source sentence $S$, over the training set $\mathcal{S}$:

$\frac{1}{|\mathcal{S}|}\sum_{(T,S)\in\mathcal{S}} \log P(T|S)$

At test time, the translation is the output sequence with the highest probability under the trained model:

$\hat{T} = \arg\max_{T}P(T|S)$
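Computing this arg max exactly over all output sequences is intractable, so the paper approximates it with a left-to-right beam search. A minimal sketch of that idea, where `step_log_probs` is a hypothetical stand-in for the trained decoder (given the prefix so far, it returns a mapping from next token to log-probability):

```python
import math

def beam_search(step_log_probs, beam_size=2, max_len=10, eos="</s>"):
    """Approximate argmax_T P(T|S) by keeping the best partial outputs."""
    beams = [([], 0.0)]  # (token prefix, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:
                candidates.append((prefix, score))  # finished hypothesis
                continue
            for tok, lp in step_log_probs(prefix).items():
                candidates.append((prefix + [tok], score + lp))
        # Keep only the beam_size highest-scoring hypotheses
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
        if all(p and p[-1] == eos for p, _ in beams):
            break
    return beams[0][0]
```

With a beam size of 1 this degenerates to greedy decoding; the paper reports that even small beams work well.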

After the sequence to sequence model was proposed, it attracted wide attention thanks to its flexibility, and I personally like the ideas behind it a lot. However, support for sequence to sequence models in today's popular open-source libraries is still unsatisfactory: they all require the maximum lengths of the input and output sequences to be fixed when the model is defined, and shorter sequences must be padded with a special symbol, with some extra handling inside or outside the model. For example, a weakened version of a sequence to sequence model can be implemented with the Python deep learning framework Keras like this:

```python
# coding: utf-8
"""Sequence to Sequence with Keras 1.0"""

from keras.models import Sequential
from keras.layers.core import Dense, RepeatVector
from keras.layers.recurrent import LSTM
from keras.layers.wrappers import TimeDistributed


def build_model(input_size, max_output_seq_len, hidden_size):
    """Build a (weakened) sequence to sequence model."""
    model = Sequential()
    # Encoder: read the input sequence and keep only the final state
    model.add(LSTM(input_dim=input_size, output_dim=hidden_size,
                   return_sequences=False))
    model.add(Dense(hidden_size, activation="relu"))
    # Repeat the encoded vector once per output time step
    model.add(RepeatVector(max_output_seq_len))
    # Decoder: produce one hidden state per output step
    model.add(LSTM(hidden_size, return_sequences=True))
    # Project every step's hidden state to the output dimension
    model.add(TimeDistributed(Dense(output_dim=input_size,
                                    activation="linear")))
    model.compile(loss="mse", optimizer="adam")
    return model
```
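As a concrete illustration of the padding the text mentions, here is a hypothetical helper (the name and the one-hot encoding are my own choices, not from the note) that brings a token sequence up to the fixed length such a model expects, filling the tail with all-zero "padding" rows:

```python
import numpy as np

def pad_one_hot(seq, max_len, vocab_size):
    """One-hot encode a token-id sequence, padded to a fixed length."""
    out = np.zeros((max_len, vocab_size))
    for t, token_id in enumerate(seq[:max_len]):
        out[t, token_id] = 1.0
    return out  # rows beyond len(seq) stay all-zero padding

x = pad_one_hot([2, 0, 1], max_len=5, vocab_size=4)
# x.shape == (5, 4); rows 3 and 4 are all-zero padding
```

Every training example, regardless of its true length, ends up as a `(max_len, vocab_size)` array, which is exactly the rigidity the paragraph above complains about.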