CS224N Day-04
Lectures 7-9: Machine Translation, seq2seq, Attention, Transformer, Pretraining and Post-training (RLHF, SFT, DPO)
LSTM
RNNs suffer from the vanishing and exploding gradient problems.
Vanishing gradients: backpropagation through time multiplies the step-to-step Jacobians ∂h_{t+1}/∂h_t, so when their norms are below 1 the gradient shrinks exponentially toward zero and the network can no longer update its parameters from the gradient signal,
and thus cannot properly update the hidden states, losing ...
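A minimal sketch of this vanishing effect, assuming PyTorch (`torch`, `nn.RNNCell`); the script and its names are illustrative, not from the lecture. It backpropagates from the last hidden state of a vanilla RNN and prints the gradient norm that reaches the first hidden state, which typically decays sharply as the sequence length grows:

```python
import torch
import torch.nn as nn

# Hypothetical demo (not from the notes): measure how much gradient from the
# final loss reaches the FIRST hidden state of a vanilla RNN as the sequence
# gets longer. Exact numbers depend on initialization; the trend is the point.
torch.manual_seed(0)

input_size, hidden_size = 8, 32
cell = nn.RNNCell(input_size, hidden_size)  # tanh nonlinearity by default

for seq_len in (5, 20, 50):
    cell.zero_grad()                          # clear grads between runs
    x = torch.randn(seq_len, 1, input_size)   # (time, batch=1, features)
    h = torch.zeros(1, hidden_size)
    first_h = None
    for t in range(seq_len):
        h = cell(x[t], h)
        if t == 0:
            first_h = h
            first_h.retain_grad()  # keep the gradient of this non-leaf tensor
    h.sum().backward()             # scalar "loss" from the last hidden state
    print(f"seq_len={seq_len:3d}  ||dL/dh_1|| = {first_h.grad.norm().item():.2e}")
```

With tanh activations and the default small-weight initialization, the printed norms usually drop by orders of magnitude between seq_len=5 and seq_len=50; the LSTM's additive cell-state updates are designed precisely to avoid this decay.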