Thresholds and the confusion matrix
Let's say you have a logistic regression model for spam-email detection that predicts a value between 0 and 1, representing the probability that a given email is spam. A prediction of 0.50 signifies a 50% likelihood that the email is spam, a prediction of 0.75 signifies a 75% likelihood, and so on.
You'd like to deploy this model in an email application to filter spam into a separate mail folder. To do so, however, you need to convert the model's raw numerical output (e.g., 0.75) into one of two categories: "spam" or "not spam."
To make this conversion, you choose a threshold probability, called a classification threshold. Examples with a probability above the threshold are then assigned to the positive class, the class you are testing for (here, `spam`). Examples with a lower probability are assigned to the negative class, the alternative class (here, `not spam`).
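As a minimal sketch, this conversion is a single comparison. The function name `classify_email` and its default threshold are illustrative assumptions, not part of any particular library:

```python
def classify_email(spam_probability: float, threshold: float = 0.5) -> str:
    """Assign a raw probability score to one of two categories.

    Scores strictly above the threshold go to the positive class
    ("spam"); all other scores go to the negative class ("not spam").
    """
    return "spam" if spam_probability > threshold else "not spam"

print(classify_email(0.75))                  # spam
print(classify_email(0.75, threshold=0.95))  # not spam
```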
Click here for more details on the classification threshold
You may be wondering: what happens if the predicted score is equal to the classification threshold (for instance, a score of 0.5 where the classification threshold is also 0.5)? Handling of this case depends on the particular implementation chosen for the classification model. The Keras library predicts the negative class if the score and threshold are equal, but other tools and frameworks may handle the case differently.
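The difference comes down to whether the comparison is strict. A sketch of the two conventions in plain Python (not any framework's actual internals):

```python
score, threshold = 0.5, 0.5

# Strict comparison: a tie goes to the negative class,
# matching the Keras behavior described above.
print(score > threshold)   # False -> "not spam"

# Inclusive comparison: a tie goes to the positive class.
# Other tools/frameworks might behave this way instead.
print(score >= threshold)  # True -> "spam"
```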
Suppose the model scores one email as 0.99, predicting a 99% chance that it is spam, and scores another email as 0.51, predicting a 51% chance that it is spam. If you set the classification threshold to 0.5, the model classifies both emails as spam. If you set the threshold to 0.95, only the email scoring 0.99 is classified as spam.

While 0.5 might seem like an intuitive threshold, it's not a good choice if the cost of one type of misclassification is greater than the other, or if the classes are imbalanced. If only 0.01% of emails are spam, or if misfiling legitimate emails is worse than letting spam into the inbox, then labeling anything the model considers at least 50% likely to be spam as spam produces undesirable results.
Confusion matrix
The probability score is not reality, or ground truth. There are four possible outcomes for each output from a binary classifier. For the spam classifier example, if you lay out the ground truth as columns and the model's predictions as rows, the result is the following table, called a confusion matrix:
|  | Actual positive | Actual negative |
|---|---|---|
| Predicted positive | **True positive (TP)**: a spam email correctly classified as spam. These are the spam messages automatically sent to the spam folder. | **False positive (FP)**: a not-spam email misclassified as spam. These are the legitimate emails that wind up in the spam folder. |
| Predicted negative | **False negative (FN)**: a spam email misclassified as not-spam. These are spam emails that aren't caught by the spam filter and make their way into the inbox. | **True negative (TN)**: a not-spam email correctly classified as not-spam. These are the legitimate emails sent directly to the inbox. |
Notice that the total in each row gives all predicted positives (TP + FP) and all predicted negatives (FN + TN), regardless of validity. The total in each column, meanwhile, gives all actual positives (TP + FN) and all actual negatives (FP + TN), regardless of the model's classification.
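Here is a minimal sketch of tallying the four cells from thresholded predictions, using hypothetical labels and scores (scikit-learn's `confusion_matrix` would produce the same counts, but the manual version makes each cell explicit):

```python
# Hypothetical ground-truth labels (1 = spam, 0 = not spam) and model scores.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.7, 0.1, 0.8, 0.2]

threshold = 0.5
preds = [1 if s > threshold else 0 for s in scores]

tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")  # TP=3 FP=1 FN=1 TN=3
print(tp + fp, fn + tn)  # row totals: predicted positives, predicted negatives
print(tp + fn, fp + tn)  # column totals: actual positives, actual negatives
```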
When the total of actual positives is not close to the total of actual negatives, the dataset is imbalanced. An example of an imbalanced dataset might be a set of thousands of photos of clouds, where the rare cloud type you are interested in, say, volutus clouds, appears only a few times.
Effect of threshold on true and false positives and negatives

Different thresholds usually result in different numbers of true positives, false positives, true negatives, and false negatives. The following video explains why this is the case.

Try changing the threshold yourself.

This widget includes three toy datasets:
- **Separated**, where positive and negative examples are generally well differentiated, with most positive examples scoring higher than negative examples.
- **Unseparated**, where many positive examples score lower than negative examples, and many negative examples score higher than positive examples.
- **Imbalanced**, containing only a few examples of the positive class.
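To see the effect numerically, here is a small sweep over the same hypothetical labels and scores used above; the data and function name are illustrative assumptions, not taken from the widget:

```python
def counts(labels, scores, threshold):
    """Return (TP, FP, FN, TN) at the given classification threshold."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return tp, fp, fn, tn

labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.7, 0.1, 0.8, 0.2]

for t in (0.1, 0.5, 0.9):
    print(t, counts(labels, scores, t))
# 0.1 (4, 3, 0, 1)
# 0.5 (3, 1, 1, 3)
# 0.9 (0, 0, 4, 4)
```

Raising the threshold from 0.1 to 0.9 drives both positive counts (TP, FP) down and both negative counts (FN, TN) up, which is exactly the pattern the questions below explore.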
Check your understanding
1. Imagine a phishing or malware classification model where phishing and malware websites are in the class labeled **1** (true) and harmless websites are in the class labeled **0** (false). This model mistakenly classifies a legitimate website as malware. What is this called?
A false positive
A negative example (a legitimate site) has been wrongly classified as a positive example (a malware site).
A true positive
A true positive would be a malware site correctly classified as malware.
A false negative
A false negative would be a malware site incorrectly classified as a legitimate site.
A true negative
A true negative would be a legitimate site correctly classified as a legitimate site.
2. In general, what happens to the number of false positives when the classification threshold increases? What about true positives? Experiment with the slider above.
Both true and false positives decrease.
As the threshold increases, the model will likely predict fewer positives overall, both true and false. A spam classifier with a threshold of 0.9999 will only label an email as spam if it considers the classification at least 99.99% likely, which means it is highly unlikely to mislabel a legitimate email but also likely to miss actual spam.
Both true and false positives increase.
Using the slider above, try setting the threshold to 0.1, then dragging it to 0.9. What happens to the number of false positives and true positives?
True positives increase. False positives decrease.
Using the slider above, try setting the threshold to 0.1, then dragging it to 0.9. What happens to the number of false positives and true positives?
3. In general, what happens to the number of false negatives when the classification threshold increases? What about true negatives? Experiment with the slider above.
Both true and false negatives increase.
As the threshold increases, the model will likely predict more negatives overall, both true and false. At a very high threshold, almost all emails, both spam and not-spam, will be classified as not-spam.
Both true and false negatives decrease.
Using the slider above, try setting the threshold to 0.1, then dragging it to 0.9. What happens to the number of false negatives and true negatives?
True negatives increase. False negatives decrease.
Using the slider above, try setting the threshold to 0.1, then dragging it to 0.9. What happens to the number of false negatives and true negatives?
Key terms:

- [Binary classification](/machine-learning/glossary#binary-classification)
- [Class-imbalanced dataset](/machine-learning/glossary#class_imbalanced_data_set)
- [Classification threshold](/machine-learning/glossary#classification-threshold)
- [Confusion matrix](/machine-learning/glossary#confusion_matrix)
- [Ground truth](/machine-learning/glossary#ground_truth)
- [Negative class](/machine-learning/glossary#negative_class)
- [Positive class](/machine-learning/glossary#positive_class)