Machine Learning Glossary: Sequence Models
This page contains Sequence Models glossary terms. For all glossary terms, click here.
B
bigram
An N-gram in which N=2.
E
exploding gradient problem
The tendency for gradients in deep neural networks (especially recurrent neural networks) to become surprisingly steep (high). Steep gradients often cause very large updates to the weights of each node in a deep neural network.
Models suffering from the exploding gradient problem become difficult or impossible to train. Gradient clipping can mitigate this problem.
Compare to the vanishing gradient problem.
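A minimal numeric sketch of the compounding effect (the per-layer factor and depth are invented, illustrative values): backpropagation multiplies local derivatives layer by layer, so factors even slightly above 1 grow exponentially with depth.

```python
# Backpropagation multiplies local derivatives layer by layer.
# Per-layer factors above 1 compound exponentially with depth.
factor = 1.5          # illustrative per-layer derivative
gradient = 1.0
for layer in range(30):
    gradient *= factor
print(gradient)       # roughly 1.9e5 after 30 layers
```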
F
forget gate
The portion of a Long Short-Term Memory cell that regulates the flow of information through the cell. Forget gates maintain context by deciding which information to discard from the cell state.
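In the standard LSTM formulation (a common convention rather than anything specific to this glossary), the forget gate at timestep $t$ is a sigmoid over the previous hidden state and the current input, and it scales the previous cell state elementwise; here $i_t$ and $\tilde{c}_t$ are the input gate and candidate values computed by the rest of the cell:

$$f_t = \sigma\big(W_f\,[h_{t-1}, x_t] + b_f\big), \qquad c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

Components of $f_t$ near 0 discard the corresponding entries of the cell state; components near 1 retain them.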
G
gradient clipping
A commonly used mechanism for mitigating the exploding gradient problem by artificially limiting (clipping) the maximum value of gradients when using gradient descent to train a model.
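A minimal sketch of clip-by-norm in NumPy (the function name and threshold are illustrative; most deep learning frameworks ship built-in equivalents):

```python
import numpy as np

def clip_by_norm(gradient, max_norm=5.0):
    """Rescale the gradient if its L2 norm exceeds max_norm."""
    norm = np.linalg.norm(gradient)
    if norm > max_norm:
        gradient = gradient * (max_norm / norm)
    return gradient

g = np.array([30.0, 40.0])   # norm = 50, above the threshold
print(clip_by_norm(g))       # [3. 4.], same direction, norm = 5
```

Clipping by norm preserves the gradient's direction while capping its magnitude, which is why it is usually preferred over clipping each component independently.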
L
Long Short-Term Memory (LSTM)
A type of cell in a recurrent neural network used to process sequences of data in applications such as handwriting recognition, machine translation, and image captioning. LSTMs address the vanishing gradient problem that occurs when training RNNs on long data sequences by maintaining history in an internal memory state, based on new input and context from previous cells in the RNN.
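A single LSTM step, sketched in NumPy under the standard formulation (the function, weight shapes, and initialization below are illustrative, not a framework API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM timestep: W maps [h_prev, x] to four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, input, output gates
    g = np.tanh(g)                                # candidate memory values
    c = f * c_prev + i * g                        # internal memory state carries history
    h = o * np.tanh(c)                            # hidden state passed to the next cell
    return h, c

# Toy usage: hidden size 3, input size 2, zero-initialized state.
rng = np.random.default_rng(0)
H, D = 3, 2
W, b = rng.normal(scale=0.1, size=(4 * H, H + D)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):   # five timesteps of input
    h, c = lstm_step(x, h, c, W, b)
```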
LSTM
Abbreviation for Long Short-Term Memory.
N
N-gram
An ordered sequence of N words. For example, "truly madly" is a 2-gram. Because order is relevant, "madly truly" is a different 2-gram than "truly madly".
| N | Name(s) for this kind of N-gram | Examples |
|---|---------------------------------|----------|
| 2 | bigram or 2-gram | to go, go to, eat lunch, eat dinner |
| 3 | trigram or 3-gram | ate too much, happily ever after, the bell tolls |
| 4 | 4-gram | walk in the park, dust in the wind, the boy ate lentils |
Many natural language understanding models rely on N-grams to predict the next word that the user will type or say. For example, suppose a user typed "happily ever". An NLU model based on trigrams would likely predict that the user will next type the word "after".
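A minimal sketch of that trigram-based prediction using simple counts (the toy corpus and names below are invented for illustration):

```python
from collections import Counter, defaultdict

corpus = "they lived happily ever after and she lived happily ever after".split()

# Count which word follows each pair of words (each trigram's last word).
following = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    following[(w1, w2)][w3] += 1

# Predict the most common continuation of "happily ever".
print(following[("happily", "ever")].most_common(1))  # [('after', 2)]
```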
Contrast N-grams with bag of words, which are unordered sets of words.
See Large language models in Machine Learning Crash Course for more information.
R
recurrent neural network
A neural network that is intentionally run multiple times, where parts of each run feed into the next run. Specifically, hidden layers from the previous run provide part of the input to the same hidden layer in the next run. Recurrent neural networks are particularly useful for evaluating sequences, so that the hidden layers can learn from previous runs of the neural network on earlier parts of the sequence.
For example, the following figure shows a recurrent neural network that runs four times. Notice that the values learned in the hidden layers from the first run become part of the input to the same hidden layers in the second run. Similarly, the values learned in the hidden layer on the second run become part of the input to the same hidden layer in the third run. In this way, the recurrent neural network gradually trains and predicts the meaning of the entire sequence rather than just the meaning of individual words.
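A minimal NumPy sketch of that unrolling (the sizes, initialization, and length-4 input sequence are illustrative): the same weights are reused on every run, and each run's hidden state becomes part of the next run's input.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 4, 8                                # input and hidden sizes
W_xh = rng.normal(scale=0.1, size=(H, D))  # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(H, H))  # hidden-to-hidden (recurrent) weights

h = np.zeros(H)                            # initial hidden state
for x in rng.normal(size=(4, D)):          # four runs over a length-4 sequence
    # The previous run's hidden state is part of this run's input.
    h = np.tanh(W_xh @ x + W_hh @ h)
print(h.shape)                             # (8,): a summary of the whole sequence
```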
RNN
Abbreviation for recurrent neural network.
S
sequence model
A model whose inputs have a sequential dependence. For example, predicting the next video watched from a sequence of previously watched videos.
T
timestep
One "unrolled" cell within a recurrent neural network. For example, the following figure shows three timesteps (labeled with the subscripts t-1, t, and t+1):
trigram
An N-gram in which N=3.
V
vanishing gradient problem
The tendency for the gradients of early hidden layers of some deep neural networks to become surprisingly flat (low). Increasingly lower gradients result in increasingly smaller changes to the weights of nodes in a deep neural network, leading to little or no learning. Models suffering from the vanishing gradient problem become difficult or impossible to train. Long Short-Term Memory cells address this issue.
Compare to the exploding gradient problem.
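The same compounding arithmetic as in the exploding case, but with per-layer factors below 1, drives early-layer gradients toward zero. A minimal sketch (the 0.25 factor reflects the maximum derivative of the sigmoid; the depth is an invented, illustrative value):

```python
# Backpropagation through a chain of sigmoid layers multiplies the
# gradient by at most 0.25 per layer, so it shrinks exponentially.
factor = 0.25        # maximum derivative of the sigmoid
gradient = 1.0
for layer in range(30):
    gradient *= factor
print(gradient)      # roughly 8.7e-19, effectively zero after 30 layers
```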