[null,null,["最后更新时间 (UTC):2022-09-27。"],[[["\u003cp\u003eThis tutorial teaches how Google developed its image classification model used in Google Photos.\u003c/p\u003e\n"],["\u003cp\u003eUsers will learn about convolutional neural networks and build their own image classifier to differentiate cat and dog photos.\u003c/p\u003e\n"],["\u003cp\u003eThe tutorial requires prior knowledge of machine learning fundamentals and basic Python coding skills.\u003c/p\u003e\n"],["\u003cp\u003eTraditional computer vision models relied on raw pixel data and engineered features but were limited in handling variations in images.\u003c/p\u003e\n"],["\u003cp\u003eThis tutorial uses the Keras API, though prior experience is not necessary due to heavily commented code examples and comprehensive documentation.\u003c/p\u003e\n"]]],[],null,["# ML Practicum: Image Classification\n\n\u003cbr /\u003e\n\nLearn how Google developed the state-of-the-art image classification\nmodel powering search in Google Photos. Get a crash course on convolutional neural\nnetworks, and then build your own image classifier to distinguish cat photos\nfrom dog photos.\n| **Estimated Completion Time:** 90--120 minutes\n\nPrerequisites\n-------------\n\n-\n\n [Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/)\n\n or equivalent experience with ML fundamentals\n\n- Proficiency in programming basics, and some experience coding in Python\n\n| **Note:** The coding exercises in this\n| practicum use the [Keras](https://keras.io/) API.\n| Keras is a high-level deep-learning API for configuring neural networks. It is\n| available both as a standalone library and as a\n| [module within\n| TensorFlow.](https://www.tensorflow.org/api_docs/python/tf/keras)\n|\n| Prior experience with Keras is not required for the Colab exercises, as code\n| listings are heavily commented and explained step by step. Comprehensive API\ndocumentation is also available on the [Keras site](https://keras.io/). \n\nIntroduction\n------------\n\nIn May 2013, Google [released search for personal\nphotos](https://search.googleblog.com/2013/05/finding-your-photos-more-easily-with.html),\ngiving users the ability to retrieve photos in their libraries based on the\nobjects present in the images.\n\n*Figure 1. Google Photos search for\nSiamese cats delivers the goods!*\n\nThe feature, later incorporated into [Google\nPhotos](https://googleblog.blogspot.com/2015/05/picture-this-fresh-approach-to-photos.html)\nin 2015, was widely perceived as a game-changer, a proof of concept that\ncomputer vision software could classify images to human standards, adding value\nin several ways:\n\n- Users no longer needed to tag photos with labels like \"beach\" to categorize image content, eliminating a manual task that could become quite tedious when managing sets of hundreds or thousands of images.\n- Users could explore their collection of photos in new ways, using search terms to locate photos with objects they might never have tagged. 
For example, they could search for \"palm tree\" to surface all their vacation photos that had palm trees in the background.\n- Software could potentially \"see\" taxonomical distinctions that end users themselves might not be able to perceive (e.g., distinguishing Siamese and Abyssinian cats), effectively augmenting users' domain knowledge.\n\nHow Image Classification Works\n------------------------------\n\nImage classification is a supervised learning problem: define a set of target\nclasses (objects to identify in images), and train a model to recognize them\nusing labeled example photos. Early computer vision models relied on raw pixel\ndata as the input to the model. However, as shown in Figure 2, raw pixel data\nalone doesn't provide a sufficiently stable representation to encompass the\nmyriad variations of an object as captured in an image. The position of the\nobject, background behind the object, ambient lighting, camera angle, and camera\nfocus all can produce fluctuation in raw pixel data; these differences are\nsignificant enough that they cannot be corrected for by taking weighted averages\nof pixel RGB values.\n\n*Figure 2. **Left** : Cats can be captured\nin a photo in a variety of poses, with different backdrops and lighting\nconditions. **Right**: averaging pixel data to account for this variety does\nnot produce any meaningful information.*\n\n\nTo model objects more flexibly, classic computer vision models added new\nfeatures derived from pixel data, such as [color\nhistograms](https://wikipedia.org/wiki/Color_histogram), textures, and\nshapes. The downside of this approach was that [feature\nengineering](/machine-learning/crash-course/representation/feature-engineering)\nbecame a real burden, as there were so many inputs to tweak. For a cat\nclassifier, which colors were most relevant? How flexible should the shape\ndefinitions be? Because features needed to be tuned so precisely, building\nrobust models was quite challenging, and accuracy suffered.\n| **Key Terms**\n|\n| |-------------------------------------------------------------------------|\n| | - [feature engineering](/machine-learning/glossary#feature_engineering) |\n|"]]
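To make the classic feature-engineering approach described above concrete, here
is a minimal sketch of one such hand-built feature: a per-channel color
histogram computed with NumPy. The function name `color_histogram`, the bin
count, and the random stand-in image are illustrative assumptions, not code
from the practicum's exercises.

```python
import numpy as np

def color_histogram(image, bins_per_channel=8):
    """Flatten an RGB image into a normalized per-channel color histogram.

    `image` is assumed to be a uint8 array of shape (height, width, 3).
    """
    features = []
    for channel in range(3):  # R, G, B
        counts, _ = np.histogram(
            image[..., channel], bins=bins_per_channel, range=(0, 256))
        features.append(counts)
    histogram = np.concatenate(features).astype(np.float32)
    return histogram / histogram.sum()  # normalize so the features sum to 1

# Random stand-in for a photo; a real pipeline would load images from disk
# and feed the histograms to a classical classifier such as logistic
# regression or an SVM.
fake_photo = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
print(color_histogram(fake_photo).shape)  # (24,): 8 bins x 3 channels
```

Every design choice here (how many bins, which color space, whether to add
texture or shape descriptors) is a knob the engineer must tune by hand, which
is exactly the burden the paragraph above describes.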
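By contrast, the convolutional networks this practicum builds learn their
features directly from pixels. As a preview of the Keras API mentioned in the
note above, here is a minimal, illustrative `tf.keras` model for a binary
cat-vs-dog classifier. The input size, layer widths, and training settings are
placeholder assumptions, not the practicum's actual architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical 150x150 RGB inputs; the exercises may use different sizes.
model = tf.keras.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(150, 150, 3)),  # scale pixels to [0, 1]
    layers.Conv2D(16, 3, activation='relu'),  # learn local visual features
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),    # probability of "dog" vs. "cat"
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
```

The convolutional and pooling layers replace hand-tuned features with filters
learned from the labeled training photos, which is the idea the rest of the
practicum develops.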