Thresholds and the confusion matrix
Let's say you have a logistic regression model for spam-email detection that predicts a value between 0 and 1, representing the probability that a given email is spam. A prediction of 0.50 signifies a 50% likelihood that the email is spam, a prediction of 0.75 signifies a 75% likelihood that the email is spam, and so on.
You'd like to deploy this model in an email application to filter spam into a separate mail folder. But to do so, you need to convert the model's raw numerical output (e.g., `0.75`) into one of two categories: "spam" or "not spam."
To make this conversion, you choose a threshold probability, called a [**classification threshold**](/machine-learning/glossary#classification-threshold). Examples with a probability above the threshold value are then assigned to the [**positive class**](/machine-learning/glossary#positive_class), the class you are testing for (here, `spam`). Examples with a lower probability are assigned to the [**negative class**](/machine-learning/glossary#negative_class), the alternative class (here, `not spam`).
**Click here for more details on the classification threshold**
You may be wondering: what happens if the predicted score is equal to the classification threshold (for instance, a score of 0.5 where the classification threshold is also 0.5)? Handling for this case depends on the particular implementation chosen for the classification model. The [Keras](https://keras.io/) library predicts the negative class if the score and threshold are equal, but other tools/frameworks may handle this case differently.
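To make this behavior concrete, here's a minimal sketch of the conversion step in Python. The `classify` helper is illustrative, not part of any library; the strict `>` comparison mirrors the Keras convention described above:

```python
def classify(score: float, threshold: float) -> str:
    """Convert a raw model score into one of two category labels.

    The strict `>` comparison sends a score exactly equal to the
    threshold to the negative class, matching the Keras convention
    described above; other frameworks may use `>=` instead.
    """
    return "spam" if score > threshold else "not spam"
```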
Suppose the model scores one email as 0.99, predicting that email has a 99% chance of being spam, and another email as 0.51, predicting it has a 51% chance of being spam. If you set the classification threshold to 0.5, the model will classify both emails as spam. If you set the threshold to 0.95, only the email scoring 0.99 will be classified as spam.
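Running the hypothetical `classify` helper from the sketch above on these two scores shows how the choice of threshold changes the outcome:

```python
for threshold in (0.5, 0.95):
    for score in (0.99, 0.51):
        print(f"threshold={threshold}: {score} -> {classify(score, threshold)}")

# threshold=0.5: 0.99 -> spam
# threshold=0.5: 0.51 -> spam
# threshold=0.95: 0.99 -> spam
# threshold=0.95: 0.51 -> not spam
```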
While 0.5 might seem like an intuitive threshold, it's not a good idea if the cost of one type of wrong classification is greater than the other, or if the classes are imbalanced. If only 0.01% of emails are spam, or if misfiling legitimate emails is worse than letting spam into the inbox, labeling anything the model considers at least 50% likely to be spam as spam produces undesirable results.
Confusion matrix
The probability score is not reality, or [**ground truth**](/machine-learning/glossary#ground_truth). There are four possible outcomes for each output from a binary classifier. For the spam classifier example, if you lay out the ground truth as columns and the model's predictions as rows, the result is the following table, called a [**confusion matrix**](/machine-learning/glossary#confusion_matrix):
| | Actual positive | Actual negative |
|---|---|---|
| **Predicted positive** | **True positive (TP)**: A spam email correctly classified as spam. These are the spam messages automatically sent to the spam folder. | **False positive (FP)**: A not-spam email misclassified as spam. These are the legitimate emails that wind up in the spam folder. |
| **Predicted negative** | **False negative (FN)**: A spam email misclassified as not-spam. These are spam emails that aren't caught by the spam filter and make their way into the inbox. | **True negative (TN)**: A not-spam email correctly classified as not-spam. These are the legitimate emails that are sent directly to the inbox. |
Notice that the total in each row gives all predicted positives (TP + FP) and all predicted negatives (FN + TN), regardless of validity. The total in each column, meanwhile, gives all actual positives (TP + FN) and all actual negatives (FP + TN), regardless of model classification.
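As a sketch, the four cells can be tallied directly from ground-truth labels and thresholded scores. The toy arrays below are illustrative only; note that the layout matches the table above, with predictions as rows and ground truth as columns:

```python
# Toy data: 1 = spam, 0 = not spam. Values are illustrative only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.2, 0.55, 0.7, 0.1]

threshold = 0.5
y_pred = [1 if s > threshold else 0 for s in scores]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

print(f"TP={tp} FP={fp}")  # predicted-positive row: TP=3 FP=2
print(f"FN={fn} TN={tn}")  # predicted-negative row: FN=1 TN=2
```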
When the total of actual positives is not close to the total of actual negatives, the dataset is [**imbalanced**](/machine-learning/glossary#class_imbalanced_data_set). An instance of an imbalanced dataset might be a set of thousands of photos of clouds, where the rare cloud type you are interested in, say, volutus clouds, only appears a few times.
Effect of threshold on true and false positives and negatives
Different thresholds usually result in different numbers of true and false positives and true and false negatives. The following video explains why this is the case.

Try changing the threshold yourself; the sketch after the list below simulates the same sweep in code.
This widget includes three toy datasets:

- **Separated**, where positive examples and negative examples are generally well differentiated, with most positive examples having higher scores than negative examples.
- **Unseparated**, where many positive examples have lower scores than negative examples, and many negative examples have higher scores than positive examples.
- **Imbalanced**, containing only a few examples of the positive class.
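The same qualitative effect can be reproduced without the widget. A minimal sweep over a few thresholds, reusing the illustrative toy arrays from the confusion-matrix sketch above, shows predicted positives shrinking and predicted negatives growing as the threshold rises:

```python
def counts(y_true, scores, threshold):
    """Return (TP, FP, FN, TN) at the given classification threshold."""
    y_pred = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                  # 1 = spam, 0 = not spam
scores = [0.9, 0.8, 0.6, 0.3, 0.2, 0.55, 0.7, 0.1]

for threshold in (0.1, 0.5, 0.9):
    tp, fp, fn, tn = counts(y_true, scores, threshold)
    print(f"threshold={threshold}: TP={tp} FP={fp} FN={fn} TN={tn}")

# threshold=0.1: TP=4 FP=3 FN=0 TN=1
# threshold=0.5: TP=3 FP=2 FN=1 TN=2
# threshold=0.9: TP=0 FP=0 FN=4 TN=4
```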
Check your understanding
1. Imagine a phishing or malware classification model where phishing and malware websites are in the class labeled **1** (true) and harmless websites are in the class labeled **0** (false). This model mistakenly classifies a legitimate website as malware. What is this called?

A false positive
A negative example (legitimate site) has been wrongly classified as a positive example (malware site).

A true positive
A true positive would be a malware site correctly classified as malware.

A false negative
A false negative would be a malware site incorrectly classified as a legitimate site.

A true negative
A true negative would be a legitimate site correctly classified as a legitimate site.
2. In general, what happens to the number of false positives when the classification threshold increases? What about true positives? Experiment with the slider above.

Both true and false positives decrease.
As the threshold increases, the model will likely predict fewer positives overall, both true and false. A spam classifier with a threshold of .9999 will only label an email as spam if it considers the classification to be at least 99.99% likely, which means it is highly unlikely to mislabel a legitimate email, but also likely to miss actual spam.

Both true and false positives increase.
Using the slider above, try setting the threshold to 0.1, then dragging it to 0.9. What happens to the number of false positives and true positives?

True positives increase. False positives decrease.
Using the slider above, try setting the threshold to 0.1, then dragging it to 0.9. What happens to the number of false positives and true positives?
3. In general, what happens to the number of false negatives when the classification threshold increases? What about true negatives? Experiment with the slider above.

Both true and false negatives increase.
As the threshold increases, the model will likely predict more negatives overall, both true and false. At a very high threshold, almost all emails, both spam and not-spam, will be classified as not-spam.

Both true and false negatives decrease.
Using the slider above, try setting the threshold to 0.1, then dragging it to 0.9. What happens to the number of false negatives and true negatives?

True negatives increase. False negatives decrease.
Using the slider above, try setting the threshold to 0.1, then dragging it to 0.9. What happens to the number of false negatives and true negatives?
**Key terms:**

- [Binary classification](/machine-learning/glossary#binary-classification)
- [Class-imbalanced dataset](/machine-learning/glossary#class_imbalanced_data_set)
- [Classification threshold](/machine-learning/glossary#classification-threshold)
- [Confusion matrix](/machine-learning/glossary#confusion_matrix)
- [Ground truth](/machine-learning/glossary#ground_truth)
- [Negative class](/machine-learning/glossary#negative_class)
- [Positive class](/machine-learning/glossary#positive_class)