The portion of a Long Short-Term Memory cell that regulates the flow of information through the cell. Forget gates maintain context by deciding which information to discard from the cell state.
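As a rough illustration (not the implementation of any particular library), a forget gate can be sketched as a sigmoid layer whose output scales the previous cell state element-wise; all names and shapes below are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative shapes: 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
W_f = rng.normal(size=(4, 3 + 4))  # forget-gate weights (illustrative)
b_f = np.zeros(4)                  # forget-gate bias

def apply_forget_gate(x_t, h_prev, c_prev):
    """Returns the cell state after the forget gate is applied.

    Each element of f_t lies in (0, 1): values near 0 erase that
    component of the cell state, values near 1 keep it.
    """
    f_t = sigmoid(W_f @ np.concatenate([x_t, h_prev]) + b_f)
    return f_t * c_prev  # element-wise: decide what to discard
```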
G
gradient clipping
#seq
A commonly used mechanism to mitigate the exploding gradient problem by artificially limiting (clipping) the maximum value of gradients when using gradient descent to train a model.
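For example, a minimal NumPy sketch of clipping by value; the threshold of 1.0 is arbitrary, and clipping by the global norm of all gradients is another common variant:

```python
import numpy as np

def clip_gradient(grad, clip_value=1.0):
    """Limits every gradient component to [-clip_value, clip_value]."""
    return np.clip(grad, -clip_value, clip_value)

# The exploding components are capped; small ones pass through unchanged.
print(clip_gradient(np.array([0.5, -7.0, 120.0])))  # [ 0.5 -1.   1. ]
```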
L
Long Short-Term Memory (LSTM)
#seq
A type of cell in a recurrent neural network used to process sequences of data in applications such as handwriting recognition, machine translation, and image captioning. LSTMs address the vanishing gradient problem that occurs when training RNNs due to long data sequences by maintaining history in an internal memory state based on new input and context from previous cells in the RNN.
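A minimal single-step sketch, assuming the standard formulation with sigmoid forget/input/output gates and a tanh candidate; the weight layout below is illustrative, not any particular library's API:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM timestep. W maps [x_t; h_prev] to the four gate pre-activations."""
    z = W @ np.concatenate([x_t, h_prev]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, input, output gates
    c_t = f * c_prev + i * np.tanh(g)             # updated internal memory state
    h_t = o * np.tanh(c_t)                        # new hidden state
    return h_t, c_t

# Run over a sequence, carrying the memory state between timesteps.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):  # a sequence of 5 inputs
    h, c = lstm_step(x_t, h, c, W, b)
```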
LSTM
#seq

Abbreviation for Long Short-Term Memory.

N

N-gram
#seq
#language

An ordered sequence of N words. For example, *truly madly* is a 2-gram. Because order is relevant, *madly truly* is a different 2-gram than *truly madly*.
| N | Name(s) for this kind of N-gram | Examples |
|---|---------------------------------|-----------------------------------------------------------|
| 2 | bigram or 2-gram | *to go, go to, eat lunch, eat dinner* |
| 3 | trigram or 3-gram | *ate too much, happily ever after, the bell tolls* |
| 4 | 4-gram | *walk in the park, dust in the wind, the boy ate lentils* |
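As a minimal sketch of the definition above, here is one way to extract the N-grams of a word sequence; the helper name `n_grams` is ours, not from any library:

```python
def n_grams(words, n):
    """Returns all ordered runs of n consecutive words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(n_grams("the boy ate lentils".split(), 2))
# [('the', 'boy'), ('boy', 'ate'), ('ate', 'lentils')]
```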
Many natural language understanding models rely on N-grams to predict the next word that the user will type or say. For example, suppose a user typed *happily ever*. An NLU model based on trigrams would likely predict that the user will next type the word *after*.

Contrast N-grams with bag of words, which are unordered sets of words.

See Large language models in Machine Learning Crash Course for more information.
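To make the *happily ever* → *after* example above concrete, here is a toy sketch of counting-based trigram prediction; the tiny corpus is invented purely for illustration:

```python
from collections import Counter, defaultdict

# Hypothetical training corpus, invented for illustration.
corpus = "they lived happily ever after happily ever after they sang".split()

# Count which word follows each ordered pair of words.
following = defaultdict(Counter)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    following[(a, b)][c] += 1

def predict_next(a, b):
    """Most frequent word observed after the word pair (a, b)."""
    counts = following[(a, b)]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("happily", "ever"))  # after
```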
R

recurrent neural network
#seq

A neural network that is intentionally run multiple times, where parts of each run feed into the next run. Specifically, hidden layers from the previous run provide part of the input to the same hidden layer in the next run. Recurrent neural networks are particularly useful for evaluating sequences, so that the hidden layers can learn from previous runs of the neural network on earlier parts of the sequence.
For example, the following figure shows a recurrent neural network that runs four times. Notice that the values learned in the hidden layers from the first run become part of the input to the same hidden layers in the second run. Similarly, the values learned in the hidden layer on the second run become part of the input to the same hidden layer in the third run. In this way, the recurrent neural network gradually trains and predicts the meaning of the entire sequence rather than just the meaning of individual words.
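A minimal NumPy sketch of that recurrence, with a single tanh hidden layer; note how each step's hidden values feed back in as part of the next step's input:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W_xh = rng.normal(size=(n_hid, n_in))   # input  -> hidden
W_hh = rng.normal(size=(n_hid, n_hid))  # hidden -> hidden (the recurrence)
b_h = np.zeros(n_hid)

h = np.zeros(n_hid)  # hidden state carried from run to run
for x_t in rng.normal(size=(4, n_in)):  # the network "runs" four times
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
```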
RNN
#seq

Abbreviation for recurrent neural networks.

S

sequence model
#seq

A model whose inputs have a sequential dependence. For example, predicting the next video watched from a sequence of previously watched videos.

T

timestep
#seq

One "unrolled" cell within a recurrent neural network. For example, the following figure shows three timesteps (labeled with the subscripts t-1, t, and t+1):

trigram
#seq
#language

An N-gram in which N=3.

V

vanishing gradient problem
#seq

The tendency for the gradients of early hidden layers of some deep neural networks to become surprisingly flat (low). Increasingly lower gradients result in increasingly smaller changes to the weights on nodes in a deep neural network, leading to little or no learning. Models suffering from the vanishing gradient problem become difficult or impossible to train. Long Short-Term Memory cells address this issue.

Compare to exploding gradient problem.
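A toy numeric sketch of the vanishing effect: backpropagating through many layers multiplies many small local derivatives together, so the gradient reaching early layers shrinks toward zero. The factor 0.25 (the maximum derivative of the sigmoid) is chosen purely for illustration:

```python
# Gradient reaching layer 1 of a 30-layer network if each layer
# contributes a local derivative of at most 0.25 (sigmoid's maximum).
grad = 1.0
for _ in range(30):
    grad *= 0.25
print(grad)  # ~8.7e-19 -- far too small to drive meaningful weight updates
```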
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["필요한 정보가 없음","missingTheInformationINeed","thumb-down"],["너무 복잡함/단계 수가 너무 많음","tooComplicatedTooManySteps","thumb-down"],["오래됨","outOfDate","thumb-down"],["번역 문제","translationIssue","thumb-down"],["샘플/코드 문제","samplesCodeIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-07-27(UTC)"],[[["\u003cp\u003eThis page provides definitions for glossary terms related to Sequence Models.\u003c/p\u003e\n"],["\u003cp\u003eSequence models are used to analyze sequential data like text or video sequences.\u003c/p\u003e\n"],["\u003cp\u003eRecurrent Neural Networks (RNNs) are a key type of sequence model, with LSTMs being a popular variant.\u003c/p\u003e\n"],["\u003cp\u003eCommon challenges in training sequence models include the exploding and vanishing gradient problems.\u003c/p\u003e\n"],["\u003cp\u003eN-grams are used to represent sequences of words and are crucial for natural language understanding tasks.\u003c/p\u003e\n"]]],[],null,["This page contains Sequence Models glossary terms. For all glossary terms,\n[click here](/machine-learning/glossary).\n\n\nB\n\n\u003cbr /\u003e\n\n\nbigram \n#seq \n#language\n\n\u003cbr /\u003e\n\nAn [**N-gram**](#N-gram) in which N=2.\n\n\nE\n\n\u003cbr /\u003e\n\n\nexploding gradient problem \n#seq\n\n\u003cbr /\u003e\n\nThe tendency for [**gradients**](/machine-learning/glossary#gradient) in\n[**deep neural networks**](/machine-learning/glossary#deep_neural_network) (especially\n[**recurrent neural networks**](#recurrent_neural_network)) to become\nsurprisingly steep (high). Steep gradients often cause very large updates\nto the [**weights**](/machine-learning/glossary#weight) of each [**node**](/machine-learning/glossary#node) in a\ndeep neural network.\n\nModels suffering from the exploding gradient problem become difficult\nor impossible to train. [**Gradient clipping**](#gradient_clipping)\ncan mitigate this problem.\n\nCompare to [**vanishing gradient problem**](#vanishing_gradient_problem).\n\n\nF\n\n\u003cbr /\u003e\n\n\nforget gate \n#seq\n\n\u003cbr /\u003e\n\nThe portion of a [**Long Short-Term Memory**](#Long_Short-Term_Memory)\ncell that regulates the flow of information through the cell.\nForget gates maintain context by deciding which information to discard\nfrom the cell state.\n\n\nG\n\n\u003cbr /\u003e\n\n\ngradient clipping \n#seq\n\n\u003cbr /\u003e\n\nA commonly used mechanism to mitigate the\n[**exploding gradient problem**](#exploding_gradient_problem) by artificially\nlimiting (clipping) the maximum value of gradients when using\n[**gradient descent**](/machine-learning/glossary#gradient_descent) to [**train**](/machine-learning/glossary#training) a model.\n\n\nL\n\n\u003cbr /\u003e\n\n\nLong Short-Term Memory (LSTM) \n#seq\n\n\u003cbr /\u003e\n\nA type of cell in a\n[**recurrent neural network**](#recurrent_neural_network) used to process\nsequences of data in applications such as handwriting recognition,\n[**machine translation**](/machine-learning/glossary#machine-translation), and image captioning. 
LSTMs\naddress the [**vanishing gradient problem**](#vanishing_gradient_problem) that\noccurs when training RNNs due to long data sequences by maintaining history in\nan internal memory state based on new input and context from previous cells in\nthe RNN.\n\n\nLSTM \n#seq\n\n\u003cbr /\u003e\n\nAbbreviation for [**Long Short-Term Memory**](#Long_Short-Term_Memory).\n\n\nN\n\n\u003cbr /\u003e\n\n\nN-gram \n#seq \n#language\n\n\u003cbr /\u003e\n\nAn ordered sequence of N words. For example, *truly madly* is a 2-gram. Because\norder is relevant, *madly truly* is a different 2-gram than *truly madly*.\n\n| N | Name(s) for this kind of N-gram | Examples |\n|---|---------------------------------|-----------------------------------------------------------|\n| 2 | bigram or 2-gram | *to go, go to, eat lunch, eat dinner* |\n| 3 | trigram or 3-gram | *ate too much, happily ever after, the bell tolls* |\n| 4 | 4-gram | *walk in the park, dust in the wind, the boy ate lentils* |\n\nMany [**natural language understanding**](/machine-learning/glossary#natural_language_understanding)\nmodels rely on N-grams to predict the next word that the user will type\nor say. For example, suppose a user typed *happily ever* .\nAn NLU model based on trigrams would likely predict that the\nuser will next type the word *after*.\n\nContrast N-grams with [**bag of words**](/machine-learning/glossary#bag_of_words), which are\nunordered sets of words.\n\nSee [Large language models](/machine-learning/crash-course/llm)\nin Machine Learning Crash Course for more information.\n\n\nR\n\n\u003cbr /\u003e\n\n\nrecurrent neural network \n#seq\n\n\u003cbr /\u003e\n\nA [**neural network**](/machine-learning/glossary#neural_network) that is intentionally run multiple\ntimes, where parts of each run feed into the next run. Specifically,\nhidden layers from the previous run provide part of the\ninput to the same hidden layer in the next run. Recurrent neural networks\nare particularly useful for evaluating sequences, so that the hidden layers\ncan learn from previous runs of the neural network on earlier parts of\nthe sequence.\n\nFor example, the following figure shows a recurrent neural network that\nruns four times. Notice that the values learned in the hidden layers from\nthe first run become part of the input to the same hidden layers in\nthe second run. Similarly, the values learned in the hidden layer on the\nsecond run become part of the input to the same hidden layer in the\nthird run. In this way, the recurrent neural network gradually trains and\npredicts the meaning of the entire sequence rather than just the meaning\nof individual words.\n\n\nRNN \n#seq\n\n\u003cbr /\u003e\n\nAbbreviation for [**recurrent neural networks**](#recurrent_neural_network).\n\n\nS\n\n\u003cbr /\u003e\n\n\nsequence model \n#seq\n\n\u003cbr /\u003e\n\nA model whose inputs have a sequential dependence. 
For example, predicting\nthe next video watched from a sequence of previously watched videos.\n\n\nT\n\n\u003cbr /\u003e\n\n\ntimestep \n#seq\n\n\u003cbr /\u003e\n\nOne \"unrolled\" cell within a\n[**recurrent neural network**](#recurrent_neural_network).\nFor example, the following figure shows three timesteps (labeled with\nthe subscripts t-1, t, and t+1):\n\n\ntrigram \n#seq \n#language\n\n\u003cbr /\u003e\n\nAn [**N-gram**](#N-gram) in which N=3.\n\n\nV\n\n\u003cbr /\u003e\n\n\nvanishing gradient problem \n#seq\n\n\u003cbr /\u003e\n\nThe tendency for the gradients of early [**hidden layers**](/machine-learning/glossary#hidden_layer)\nof some [**deep neural networks**](/machine-learning/glossary#deep_neural_network) to become\nsurprisingly flat (low). Increasingly lower gradients result in increasingly\nsmaller changes to the weights on nodes in a deep neural network, leading to\nlittle or no learning. Models suffering from the vanishing gradient problem\nbecome difficult or impossible to train.\n[**Long Short-Term Memory**](#Long_Short-Term_Memory) cells address this issue.\n\nCompare to [**exploding gradient problem**](#exploding_gradient_problem)."]]