The previous unit introduced the following model, which miscategorized a lot of trees in the test set:
Figure 16. The misbehaving complex model from the previous unit.
The preceding model contains a lot of complex shapes. Would a simpler model handle new data better? Suppose you replace the complex model with a ridiculously simple model: a straight line.
Figure 17. A much simpler model.
The simple model generalizes better than the complex model on new data. That is, the simple model makes better predictions on the test set than the complex model.
Simplicity has been beating complexity for a long time. In fact, the preference for simplicity dates back to ancient Greece. Centuries later, a fourteenth-century friar named William of Occam formalized the preference for simplicity in a philosophy known as [Occam's razor](https://wikipedia.org/wiki/Occam%27s_razor). This philosophy remains an essential underlying principle of many sciences, including machine learning.
Exercises: Check your understanding
You are developing a physics equation. Which of the following formulas conforms more closely to Occam's razor?
A formula with three variables.
Three variables is more Occam-friendly than twelve variables.
A formula with twelve variables.
Twelve variables seems overly complicated, doesn't it? The two most famous physics formulas of all time (F=ma and E=mc²) each involve only three variables.
You're working on a brand-new machine learning project and are about to select your first features. How many features should you pick?
Pick 1–3 features that seem to have strong predictive power.
It's best for your data collection pipeline to start with only one or two features. This will help you confirm that the ML model works as intended. Also, when you build a baseline from a couple of features, you'll feel like you're making progress.
Pick 4–6 features that seem to have strong predictive power.
You might eventually use this many features, but it's still better to start with fewer. Fewer features usually means fewer unnecessary complications.
Pick as many features as you can, so you can start observing which features have the strongest predictive power.
Start smaller. Every new feature adds a new dimension to your training dataset. When the dimensionality increases, the volume of the space grows so fast that the available training data becomes sparse. The sparser your data, the harder it is for the model to learn the relationship between the features that actually matter and the label. This phenomenon is called "the curse of dimensionality."
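One way to see this sparsity is to measure how far each example sits from its nearest neighbor as dimensions are added. The following sketch is illustrative only; the dataset size, the dimensions tested, and the random seed are arbitrary assumptions, not values from this course.

```python
import numpy as np

def mean_nearest_neighbor_distance(num_examples: int, num_dims: int) -> float:
    """Average distance from each random point to its nearest neighbor in the unit cube."""
    rng = np.random.default_rng(seed=0)
    points = rng.random((num_examples, num_dims))
    sq_norms = (points ** 2).sum(axis=1)
    # Squared pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(sq_dists, np.inf)  # ignore each point's distance to itself
    nearest = np.sqrt(np.maximum(sq_dists, 0.0).min(axis=1))
    return nearest.mean()

# The same 1,000 examples drift farther apart as the number of dimensions grows,
# so the space becomes emptier even though the dataset hasn't changed.
for dims in (1, 2, 10, 100):
    print(f"{dims:>3} dimensions: {mean_nearest_neighbor_distance(1000, dims):.3f}")
```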
Regularization
Machine learning models must simultaneously meet two conflicting goals:
- Fit data well.
- Fit data as simply as possible.
One approach to keeping a model simple is to penalize complex models; that is, to force the model to become simpler during training. Penalizing complex models is one form of regularization.
Loss and complexity
So far, this course has suggested that the only goal when training was to minimize loss; that is:
$$\text{minimize(loss)}$$
As you've seen, models that focus solely on minimizing loss tend to overfit. A better training optimization algorithm minimizes some combination of loss and complexity:
$$\text{minimize(loss + complexity)}$$
Unfortunately, loss and complexity are typically inversely related. As complexity increases, loss decreases. As complexity decreases, loss increases. You should find a reasonable middle ground where the model makes good predictions on both the training data and real-world data. That is, your model should find a reasonable compromise between loss and complexity.
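As a concrete sketch of minimize(loss + complexity), the toy gradient-descent loop below fits a linear model by minimizing squared-error loss plus a weight penalty. The synthetic data, the learning rate, the value of `reg_rate`, and the use of squared weights as the complexity term are all illustrative assumptions (the next section asks what "complexity" could mean), not settings prescribed by this course.

```python
import numpy as np

# Synthetic data: 200 examples, 3 features, labels from a known linear rule plus noise.
rng = np.random.default_rng(seed=1)
features = rng.normal(size=(200, 3))
labels = features @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

weights = np.zeros(3)
learning_rate = 0.1
reg_rate = 0.01  # how strongly complexity is penalized relative to loss

for step in range(500):
    error = features @ weights - labels
    loss = (error ** 2).mean()                # how well the model fits the training data
    complexity = (weights ** 2).sum()         # one possible stand-in for model complexity
    objective = loss + reg_rate * complexity  # minimize(loss + complexity)

    # Gradient of the combined objective with respect to the weights.
    gradient = (2 / len(labels)) * features.T @ error + 2 * reg_rate * weights
    weights -= learning_rate * gradient

print("learned weights:", weights)
print("final objective:", objective)
```

Raising `reg_rate` pushes the weights toward zero (lower complexity, higher loss on the training data); lowering it lets the model chase the training data more closely (higher complexity, lower loss), mirroring the trade-off described above.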
What is complexity?
You've already seen a few different ways of quantifying loss. How would you quantify complexity? Start your exploration through the following exercise:
Exercise: Check your intuition
So far, we've been pretty vague about what complexity actually is. Which of the following ideas do you think would be reasonable complexity metrics?
Complexity is a function of the model's weights.
Yes, this is one way to measure some models' complexity. This metric is called [L1 regularization](/machine-learning/glossary#L1_regularization).
Complexity is a function of the square of the model's weights.
Yes, you can measure some models' complexity this way. This metric is called [L2 regularization](/machine-learning/glossary#L2_regularization).
Complexity is a function of the biases of all the features in the model.
Bias doesn't measure complexity.

Key terms:
- [L1 regularization](/machine-learning/glossary#L1_regularization)
- [L2 regularization](/machine-learning/glossary#L2_regularization)
- [Regularization](/machine-learning/glossary#regularization)
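For reference, here is a minimal sketch of how the two complexity metrics named in the exercise above could be computed from a model's weights; the weight values below are made up purely for illustration.

```python
import numpy as np

weights = np.array([0.8, -0.003, 2.1, 0.0, -0.4])  # hypothetical model weights

l1_penalty = np.abs(weights).sum()   # L1 regularization: sum of the absolute values
l2_penalty = (weights ** 2).sum()    # L2 regularization: sum of the squared values

print(f"L1 complexity: {l1_penalty:.3f}")  # 3.303
print(f"L2 complexity: {l2_penalty:.3f}")  # 5.210
```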