[[["เข้าใจง่าย","easyToUnderstand","thumb-up"],["แก้ปัญหาของฉันได้","solvedMyProblem","thumb-up"],["อื่นๆ","otherUp","thumb-up"]],[["ไม่มีข้อมูลที่ฉันต้องการ","missingTheInformationINeed","thumb-down"],["ซับซ้อนเกินไป/มีหลายขั้นตอนมากเกินไป","tooComplicatedTooManySteps","thumb-down"],["ล้าสมัย","outOfDate","thumb-down"],["ปัญหาเกี่ยวกับการแปล","translationIssue","thumb-down"],["ตัวอย่าง/ปัญหาเกี่ยวกับโค้ด","samplesCodeIssue","thumb-down"],["อื่นๆ","otherDown","thumb-down"]],["อัปเดตล่าสุด 2025-08-17 UTC"],[[["\u003cp\u003eHyperparameters, such as learning rate, batch size, and epochs, are external configurations that influence the training process of a machine learning model.\u003c/p\u003e\n"],["\u003cp\u003eThe learning rate determines the step size during model training, impacting the speed and stability of convergence.\u003c/p\u003e\n"],["\u003cp\u003eBatch size dictates the number of training examples processed before updating model parameters, influencing training speed and noise.\u003c/p\u003e\n"],["\u003cp\u003eEpochs represent the number of times the entire training dataset is used during training, affecting model performance and training time.\u003c/p\u003e\n"],["\u003cp\u003eChoosing appropriate hyperparameters is crucial for optimizing model training and achieving desired results.\u003c/p\u003e\n"]]],[],null,["[**Hyperparameters**](/machine-learning/glossary#hyperparameter) are variables\nthat control different aspects of training. Three common hyperparameters are:\n\n- [**Learning rate**](/machine-learning/glossary#learning-rate)\n- [**Batch size**](/machine-learning/glossary#batch-size)\n- [**Epochs**](/machine-learning/glossary#epoch)\n\nIn contrast, [**parameters**](/machine-learning/glossary#parameter) are the\nvariables, like the weights and bias, that are part of the model itself. In\nother words, hyperparameters are values that you control; parameters are values\nthat the model calculates during training.\n\nLearning rate\n\n[**Learning rate**](/machine-learning/glossary#learning-rate) is a\nfloating point number you set that influences how quickly the\nmodel converges. If the learning rate is too low, the model can take a long time\nto converge. However, if the learning rate is too high, the model never\nconverges, but instead bounces around the weights and bias that minimize the\nloss. The goal is to pick a learning rate that's not too high nor too low so\nthat the model converges quickly.\n\nThe learning rate determines the magnitude of the changes to make to the weights\nand bias during each step of the gradient descent process. The model multiplies\nthe gradient by the learning rate to determine the model's parameters (weight\nand bias values) for the next iteration. In the third step of [gradient\ndescent](/machine-learning/crash-course/linear-regression/gradient-descent), the \"small amount\" to move in the direction\nof negative slope refers to the learning rate.\n\nThe difference between the old model parameters and the new model parameters is\nproportional to the slope of the loss function. For example, if the slope is\nlarge, the model takes a large step. If small, it takes a small step. For\nexample, if the gradient's magnitude is 2.5 and the learning rate is 0.01, then\nthe model will change the parameter by 0.025.\n\nThe ideal learning rate helps the model to converge within a reasonable number\nof iterations. In Figure 21, the loss curve shows the model significantly\nimproving during the first 20 iterations before beginning to converge:\n\n**Figure 21**. 
In contrast, a learning rate that's too small can take too many iterations to
converge. In Figure 22, the loss curve shows the model making only minor
improvements after each iteration:

**Figure 22**. Loss graph showing a model trained with a small learning rate.

A learning rate that's too large never converges because each iteration either
causes the loss to bounce around or to increase continually. In Figure 23, the
loss curve shows the loss decreasing and then increasing after each iteration,
and in Figure 24 the loss increases at later iterations:

**Figure 23**. Loss graph showing a model trained with a learning rate that's
too big, where the loss curve fluctuates wildly, going up and down as the
iterations increase.

**Figure 24**. Loss graph showing a model trained with a learning rate that's
too big, where the loss curve drastically increases in later iterations.

**Exercise: Check your understanding**

What is the ideal learning rate?

- The ideal learning rate is problem-dependent.
- 0.01
- 1.0

The first answer is correct: each model and dataset has its own ideal learning
rate.

**Batch size**

[**Batch size**](/machine-learning/glossary#batch-size) is a hyperparameter
that refers to the number of [**examples**](/machine-learning/glossary#example)
the model processes before updating its weights and bias. You might think that
the model should calculate the loss for *every* example in the dataset before
updating the weights and bias. However, when a dataset contains hundreds of
thousands or even millions of examples, using the full batch isn't practical.

Two common techniques to get the right gradient on *average* without needing to
look at every example in the dataset before updating the weights and bias are
[**stochastic gradient descent**](/machine-learning/glossary#SGD) and
[**mini-batch stochastic gradient
descent**](/machine-learning/glossary#mini-batch-stochastic-gradient-descent):

- **Stochastic gradient descent (SGD)**: Stochastic gradient descent uses only
  a single example (a batch size of one) per iteration. Given enough
  iterations, SGD works but is very noisy. "Noise" refers to variations during
  training that cause the loss to increase rather than decrease during an
  iteration. The term "stochastic" indicates that the one example comprising
  each batch is chosen at random.

  Notice in the following image how the loss fluctuates slightly as the model
  updates its weights and bias using SGD, which can lead to noise in the loss
  graph:

  **Figure 25**. Model trained with stochastic gradient descent (SGD) showing
  noise in the loss curve.

  Note that using stochastic gradient descent can produce noise throughout the
  entire loss curve, not just near convergence.

- **Mini-batch stochastic gradient descent (mini-batch SGD)**: Mini-batch
  stochastic gradient descent is a compromise between full-batch gradient
  descent and SGD. For a dataset with $ N $ data points, the batch size can be
  any number greater than 1 and less than $ N $. The model chooses the examples
  included in each batch at random, averages their gradients, and then updates
  the weights and bias once per iteration.

  Determining the number of examples for each batch depends on the dataset and
  the available compute resources. In general, small batch sizes behave like
  SGD, and larger batch sizes behave like full-batch gradient descent. (A short
  code sketch contrasting these strategies follows Figure 26.)

  **Figure 26**. Model trained with mini-batch SGD.
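The following is a minimal NumPy sketch of how the batch size determines which
examples contribute to one parameter update. The synthetic dataset, the linear
model, and the squared-error gradients here are illustrative assumptions; only
the batching behavior (one random example, a random mini-batch, or the full
dataset) follows the descriptions above.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Illustrative synthetic dataset: 1,000 examples of y ≈ 3x + 2 plus noise.
x = rng.uniform(-1, 1, size=1000)
y = 3 * x + 2 + rng.normal(scale=0.1, size=1000)

def averaged_gradients(weight, bias, batch_x, batch_y):
    """Averages the squared-error gradients over one batch of examples."""
    error = (weight * batch_x + bias) - batch_y
    weight_grad = np.mean(2 * error * batch_x)
    bias_grad = np.mean(2 * error)
    return weight_grad, bias_grad

def sample_batch(batch_size):
    """Chooses batch_size examples at random, as in (mini-batch) SGD."""
    indices = rng.choice(len(x), size=batch_size, replace=False)
    return x[indices], y[indices]

learning_rate = 0.1

# batch_size=1      -> stochastic gradient descent (noisiest updates)
# batch_size=32     -> mini-batch SGD (a compromise)
# batch_size=len(x) -> full-batch gradient descent (one update per epoch)
for batch_size in (1, 32, len(x)):
    batch_x, batch_y = sample_batch(batch_size)
    weight_grad, bias_grad = averaged_gradients(0.0, 0.0, batch_x, batch_y)
    # One update with the same rule as before, just with an averaged gradient.
    weight = 0.0 - learning_rate * weight_grad
    bias = 0.0 - learning_rate * bias_grad
    print(f"batch_size={batch_size:4d}  weight={weight:+.3f}  bias={bias:+.3f}")
```

Averaging more gradients per update is what smooths the loss curve in Figure 26
relative to Figure 25.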
When training a model, you might think that noise is an undesirable
characteristic that should be eliminated. However, a certain amount of noise
can be a good thing. In later modules, you'll learn how noise can help a model
[**generalize**](/machine-learning/glossary#generalization) better and find the
optimal weights and bias in a
[**neural network**](/machine-learning/glossary#neural-network).

**Epochs**

During training, an [**epoch**](/machine-learning/glossary#epoch) means that
the model has processed every example in the training set *once*. For example,
given a training set with 1,000 examples and a mini-batch size of 100 examples,
it will take the model 10 [**iterations**](/machine-learning/glossary#iteration)
to complete one epoch.

Training typically requires many epochs. That is, the system needs to process
every example in the training set multiple times.

The number of epochs is a hyperparameter you set before the model begins
training. In many cases, you'll need to experiment with how many epochs it
takes for the model to converge. In general, more epochs produce a better
model, but training also takes more time.

**Figure 27**. Full batch versus mini batch.

The following table describes how batch size and epochs relate to the number of
times a model updates its parameters.

| Batch type | When weights and bias updates occur |
|------------|-------------------------------------|
| Full batch | After the model looks at all the examples in the dataset. For instance, if a dataset contains 1,000 examples and the model trains for 20 epochs, the model updates the weights and bias 20 times, once per epoch. |
| Stochastic gradient descent | After the model looks at a single example from the dataset. For instance, if a dataset contains 1,000 examples and trains for 20 epochs, the model updates the weights and bias 20,000 times. |
| Mini-batch stochastic gradient descent | After the model looks at the examples in each batch. For instance, if a dataset contains 1,000 examples, the batch size is 100, and the model trains for 20 epochs, the model updates the weights and bias 200 times. |
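The arithmetic in the table can be checked with a tiny sketch. The 1,000-example
dataset and 20 epochs are the same illustrative numbers used above; the helper
function is hypothetical.

```python
# Number of weight-and-bias updates for each batch type, assuming a
# 1,000-example dataset trained for 20 epochs (the numbers in the table).

num_examples = 1000
num_epochs = 20

def total_updates(batch_size):
    """One epoch contains num_examples / batch_size iterations, and the model
    updates its weights and bias once per iteration."""
    iterations_per_epoch = num_examples // batch_size
    return iterations_per_epoch * num_epochs

print(total_updates(batch_size=num_examples))  # Full batch: 20
print(total_updates(batch_size=1))             # SGD: 20,000
print(total_updates(batch_size=100))           # Mini-batch SGD: 200
```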
**Exercise: Check your understanding**

1. What's the best batch size when using mini-batch SGD?

   - It depends
   - 10 examples per batch
   - 100 examples per batch

   "It depends" is correct: the ideal batch size depends on the dataset and the
   available compute resources.

2. Which of the following statements is true?

   - Larger batches are unsuitable for data with many outliers.

     This statement is false. By averaging more gradients together, larger
     batch sizes can help reduce the negative effects of having outliers in
     the data.

   - Doubling the learning rate can slow down training.

     This statement is true. Doubling the learning rate can result in a
     learning rate that is too large, and therefore cause the weights to
     "bounce around," increasing the amount of time needed to converge. As
     always, the best hyperparameters depend on your dataset and available
     compute resources.

**Key terms:**

- [Batch size](/machine-learning/glossary#batch-size)
- [Epoch](/machine-learning/glossary#epoch)
- [Generalize](/machine-learning/glossary#generalization)
- [Hyperparameter](/machine-learning/glossary#hyperparameter)
- [Iteration](/machine-learning/glossary#iteration)
- [Learning rate](/machine-learning/glossary#learning-rate)
- [Mini-batch](/machine-learning/glossary#mini-batch)
- [Mini-batch stochastic gradient descent](/machine-learning/glossary#mini-batch-stochastic-gradient-descent)
- [Neural network](/machine-learning/glossary#neural-network)
- [Parameter](/machine-learning/glossary#parameter)
- [Stochastic gradient descent](/machine-learning/glossary#stochastic-gradient-descent-sgd)

[Help Center](https://support.google.com/machinelearningeducation)