Last updated: 2025-08-21 (UTC).

Key points:

- Loss is a numerical value indicating the difference between a model's predictions and the actual values.
- The goal of model training is to minimize loss, bringing it as close to zero as possible.
- Two common methods for calculating loss are mean absolute error (MAE) and mean squared error (MSE), which differ in their sensitivity to outliers.
- Choosing between MAE and MSE depends on the dataset and how you want the model to handle outliers, with MSE penalizing them more heavily.

[**Loss**](/machine-learning/glossary#loss) is a numerical metric that describes how wrong a model's [**predictions**](/machine-learning/glossary#prediction) are. Loss measures the distance between the model's predictions and the actual labels. The goal of training a model is to minimize the loss, reducing it to its lowest possible value.

In the following image, you can visualize loss as arrows drawn from the data points to the model. The arrows show how far the model's predictions are from the actual values.

**Figure 9**. Loss is measured from the actual value to the predicted value.

Distance of loss

In statistics and machine learning, loss measures the difference between the predicted and actual values. Loss focuses on the *distance* between the values, not the direction.
For example, if a model predicts 2, but the actual value is 5, we don't care that the difference is negative ($2 - 5 = -3$). Instead, we care that the *distance* between the values is $3$. Thus, all methods for calculating loss remove the sign.

The two most common methods to remove the sign are the following:

- Take the absolute value of the difference between the actual value and the prediction.
- Square the difference between the actual value and the prediction.

Types of loss

In linear regression, there are four main types of loss, which are outlined in the following table.

| Loss type | Definition | Equation |
|---|---|---|
| **[L~1~ loss](/machine-learning/glossary#l1-loss)** | The sum of the absolute values of the differences between the predicted values and the actual values. | $\sum \| actual\ value - predicted\ value \|$ |
| **[Mean absolute error (MAE)](/machine-learning/glossary#mean-absolute-error-mae)** | The average of L~1~ losses across a set of *N* examples. | $\frac{1}{N} \sum \| actual\ value - predicted\ value \|$ |
| **[L~2~ loss](/machine-learning/glossary#l2-loss)** | The sum of the squared differences between the predicted values and the actual values. | $\sum (actual\ value - predicted\ value)^2$ |
| **[Mean squared error (MSE)](/machine-learning/glossary#mean-squared-error-mse)** | The average of L~2~ losses across a set of *N* examples. | $\frac{1}{N} \sum (actual\ value - predicted\ value)^2$ |

The functional difference between L~1~ loss and L~2~ loss (or between MAE and MSE) is squaring. When the difference between the prediction and label is large, squaring makes the loss even larger.
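The four loss formulas in the table above can be sketched in a few lines of Python (the function names here are illustrative, not part of any library). Note how squaring amplifies the one large error (3) far more than the small one (0.5):

```python
def l1_loss(actual, predicted):
    """Sum of absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted))

def mae(actual, predicted):
    """Mean absolute error: average L1 loss over N examples."""
    return l1_loss(actual, predicted) / len(actual)

def l2_loss(actual, predicted):
    """Sum of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def mse(actual, predicted):
    """Mean squared error: average L2 loss over N examples."""
    return l2_loss(actual, predicted) / len(actual)

actual = [5, 3, 8]
predicted = [2, 3.5, 8]
# L1: |5-2| + |3-3.5| + |8-8| = 3.5
# L2: (5-2)^2 + (3-3.5)^2 + (8-8)^2 = 9.25 -- the large error dominates
print(l1_loss(actual, predicted))  # 3.5
print(l2_loss(actual, predicted))  # 9.25
```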
When the difference is small (less than 1), squaring makes the loss even smaller.

When processing multiple examples at once, we recommend averaging the losses across all the examples, whether using MAE or MSE.

Calculating loss example

Using the previous [best fit line](/machine-learning/crash-course/linear-regression#linear_regression_equation), we'll calculate L~2~ loss for a single example. From the best fit line, we had the following values for weight and bias:

- $\small{Weight: -4.6}$
- $\small{Bias: 34}$

If the model predicts that a 2,370-pound car gets 23.1 miles per gallon, but it actually gets 26 miles per gallon, we would calculate the L~2~ loss as follows:

**Note:** The formula uses 2.37 because the graphs are scaled to thousands of pounds.

| Value | Equation | Result |
|---|---|---|
| Prediction | $\small{bias + (weight * feature\ value)}$ $\small{34 + (-4.6 * 2.37)}$ | $\small{23.1}$ |
| Actual value | $\small{label}$ | $\small{26}$ |
| L~2~ loss | $\small{(actual\ value - predicted\ value)^2}$ $\small{(26 - 23.1)^2}$ | $\small{8.41}$ |

In this example, the L~2~ loss for that single data point is 8.41.

Choosing a loss

Deciding whether to use MAE or MSE can depend on the dataset and the way you want to handle certain predictions. Most feature values in a dataset typically fall within a distinct range. For example, cars normally weigh between 2,000 and 5,000 pounds and get between 8 and 50 miles per gallon. An 8,000-pound car, or a car that gets 100 miles per gallon, is outside the typical range and would be considered an [**outlier**](/machine-learning/glossary#outliers).

An outlier can also refer to how far off a model's predictions are from the real values.
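As a sketch, the single-example L~2~ loss worked out above, and a prediction-based outlier check, can be reproduced with the best-fit line's weight and bias (the helper names are illustrative):

```python
# Parameters from the best fit line in the worked example above.
# Car weight is expressed in thousands of pounds, hence the / 1000.
WEIGHT = -4.6  # slope
BIAS = 34      # intercept

def predict_mpg(pounds):
    """Predict miles per gallon from a car's weight in pounds."""
    return BIAS + WEIGHT * (pounds / 1000)

def l2_loss(actual, predicted):
    """Squared error for a single example."""
    return (actual - predicted) ** 2

pred = round(predict_mpg(2370), 1)   # 34 + (-4.6 * 2.37), rounds to 23.1
loss = l2_loss(26, pred)             # (26 - 23.1)^2, about 8.41

# A 3,000-pound car is predicted to get about 20.2 mpg, so a 3,000-pound
# car that actually gets 40 mpg is an outlier relative to the model.
print(pred, round(loss, 2), round(predict_mpg(3000), 1))
```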
For instance, 3,000 pounds is within the typical car-weight range, and 40 miles per gallon is within the typical fuel-efficiency range. However, a 3,000-pound car that gets 40 miles per gallon would be an outlier in terms of the model's prediction, because the model predicts that a 3,000-pound car gets around 20 miles per gallon.

When choosing the best loss function, consider how you want the model to treat outliers. For instance, MSE moves the model more toward the outliers, while MAE doesn't. L~2~ loss incurs a much higher penalty for an outlier than L~1~ loss. For example, the following images show a model trained using MAE and a model trained using MSE. The red line represents a fully trained model that will be used to make predictions. The outliers are closer to the model trained with MSE than to the model trained with MAE.

**Figure 10**. A model trained with MSE moves the model closer to the outliers.

**Figure 11**. A model trained with MAE is farther from the outliers.

Note the relationship between the model and the data:

- **MSE**. The model is closer to the outliers but further away from most of the other data points.
- **MAE**. The model is further away from the outliers but closer to most of the other data points.

Check Your Understanding

Consider the following two plots:

*(Two scatter plots, left and right, each showing ten points and a fitted line.)*

Which of the two datasets shown in the preceding plots has the **higher** mean squared error (MSE)?

The dataset on the left.

The six examples on the line incur no loss. The four examples not on the line are not very far off it, so even squaring their offsets still yields a low value: $MSE = \frac{0^2 + 1^2 + 0^2 + 1^2 + 0^2 + 1^2 + 0^2 + 1^2 + 0^2 + 0^2}{10} = 0.4$

The dataset on the right.

The eight examples on the line incur no loss.
However, although only two points lie off the line, both of those points are *twice* as far off the line as the outlier points in the left figure. Squared loss amplifies those differences, so an offset of two incurs a loss four times as great as an offset of one: $MSE = \frac{0^2 + 0^2 + 0^2 + 2^2 + 0^2 + 0^2 + 0^2 + 2^2 + 0^2 + 0^2}{10} = 0.8$

**Key terms:**

- [Mean absolute error (MAE)](/machine-learning/glossary#mean-absolute-error-mae)
- [Mean squared error (MSE)](/machine-learning/glossary#mean-squared-error-mse)
- [L~1~ loss](/machine-learning/glossary#l1-loss)
- [L~2~ loss](/machine-learning/glossary#l2-loss)
- [Loss](/machine-learning/glossary#loss)
- [Outlier](/machine-learning/glossary#outliers)
- [Prediction](/machine-learning/glossary#prediction)

[Help Center](https://support.google.com/machinelearningeducation)