Production ML systems: Questions to ask
This lesson focuses on the questions you should ask about your data and model in production systems.
Is each feature helpful?
You should continuously monitor your model to remove features that contribute little or nothing to the model's predictive ability. If the input data for such a feature abruptly changes, your model's behavior might also abruptly change in undesirable ways.
Also consider the following related question:

- Does the usefulness of the feature justify the cost of including it?

It is always tempting to add more features to the model. For example, suppose you find a new feature whose addition makes your model's predictions slightly better. Slightly better predictions certainly seem better than slightly worse ones; however, the extra feature adds to your maintenance burden.
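One common way to check whether a feature still contributes to predictive ability is permutation importance: shuffle a single feature's column and measure how much the model's metric degrades. The helper and toy model below are an illustrative sketch, not a specific library API:

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Estimate each feature's contribution by measuring how much the
    metric degrades when that feature's column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = []
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, col])  # break the feature-target link
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(float(np.mean(drops)))
    return importances

# Toy setup: the label depends only on feature 0; feature 1 is pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)
predict = lambda X: (X[:, 0] > 0).astype(int)
accuracy = lambda y_true, y_pred: float(np.mean(y_true == y_pred))

imp = permutation_importance(predict, X, y, accuracy)
# imp[0] is large (shuffling feature 0 destroys accuracy);
# imp[1] is near zero, flagging feature 1 as a removal candidate.
```

A feature whose importance stays near zero over many monitoring runs is a candidate for removal.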
Is your data source reliable?
Here are some questions to ask about the reliability of your input data:
- Is the signal always going to be available, or is it coming from an unreliable source? For example:
  - Is the signal coming from a server that crashes under heavy load?
  - Is the signal coming from humans who go on vacation every August?
- Does the system that computes your model's input data ever change? If so:
  - How often?
  - How will you know when that system changes?
Consider creating your own copy of the data you receive from the upstream process. Then, advance to the next version of the upstream data only when you are certain that it is safe to do so.
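One way to implement that advice is a content-addressed snapshot: copy each upstream file into a local directory keyed by a hash of its contents, and point the training pipeline at a specific snapshot rather than at the live upstream file. The helper and directory layout below are a hypothetical sketch, not a specific pipeline API:

```python
import hashlib
import pathlib
import shutil

def snapshot_upstream(src, snapshot_dir):
    """Copy an upstream data file into a content-addressed snapshot so
    the training pipeline can pin an exact version instead of reading
    the upstream file directly."""
    src = pathlib.Path(src)
    digest = hashlib.sha256(src.read_bytes()).hexdigest()[:12]
    dest = pathlib.Path(snapshot_dir) / f"{src.stem}-{digest}{src.suffix}"
    if not dest.exists():  # identical content maps to the same snapshot
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
    return dest
```

Because the snapshot name encodes the content hash, an unannounced upstream change produces a new snapshot file rather than silently altering the data your model trains on.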
Is your model part of a feedback loop?
Sometimes a model can affect its own training data. For example, the results from some models, in turn, become (directly or indirectly) input features to that same model.
Sometimes one model can affect another model. For example, consider the following two models for predicting stock prices:

- Model A, which is a bad predictive model.
- Model B.
Because Model A is buggy, it mistakenly decides to buy shares of Stock X. Those purchases drive up the price of Stock X. Model B uses the price of Stock X as an input feature, so it can come to some false conclusions about the value of Stock X. Model B could, therefore, buy or sell shares of Stock X based on Model A's buggy behavior. Model B's behavior, in turn, can affect Model A, possibly triggering a [tulip mania](https://wikipedia.org/wiki/Tulip_mania) or a slide in Company X's stock price.
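The dynamic can be illustrated with a toy simulation (the growth rates are made-up numbers, not market data): Model A's buggy buying inflates the very price signal that Model B consumes, so the price climbs with no change in underlying value.

```python
# Toy feedback-loop simulation of the two-model scenario above.
price = 100.0
history = [price]
for day in range(10):
    # Model A's bug: it always decides to buy Stock X.
    price *= 1.05              # its purchases push the price up
    # Model B reads the inflated price, concludes the stock is rising,
    # and buys too, amplifying the move.
    if price > history[-1]:
        price *= 1.03
    history.append(price)
# After 10 days the price has more than doubled, driven entirely by
# the two models reacting to each other rather than to real value.
```

In production, the fix is usually to break the loop, for example by excluding your own system's outputs (or their downstream effects) from the model's input features.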
Exercise: Check your understanding
Which **three** of the following models are susceptible to a feedback loop?
A traffic-forecasting model that predicts congestion at highway exits near the beach, using beach crowd size as one of its features.
Some beachgoers are likely to base their plans on the traffic forecast. If the beach crowd is large and traffic is forecast to be heavy, many people may make alternative plans. This may depress beach attendance, resulting in a lighter traffic forecast, which may then increase attendance, and the cycle repeats.
A book-recommendation model that suggests novels its users may like based on their popularity (that is, the number of times the books have been purchased).
Book recommendations are likely to drive purchases, and these additional sales will be fed back into the model as input, making it more likely to recommend the same books in the future.
A university-ranking model that rates schools in part by their selectivity (the percentage of applicants who are admitted).
The model's rankings may drive additional interest to top-rated schools, increasing the number of applications they receive. If these schools continue to admit the same number of students, selectivity will increase (the percentage of applicants admitted will go down). This will boost these schools' rankings, which will further increase prospective student interest, and so on.
An election-results model that forecasts the winner of a mayoral race by surveying 2% of voters after the polls have closed.
If the model does not publish its forecast until after the polls have closed, its predictions cannot affect voter behavior.
A housing-value model that predicts house prices, using size (area in square meters), number of bedrooms, and geographic location as features.
It is not possible to quickly change a house's location, size, or number of bedrooms in response to price forecasts, making a feedback loop unlikely. However, there is potentially a correlation between size and number of bedrooms (larger homes are likely to have more rooms) that may need to be teased apart.
A face-attributes model that detects whether a person is smiling in a photo, and that is regularly trained on a database of stock photography that is automatically updated monthly.
There is no feedback loop here, as the model's predictions have no impact on the photo database. However, versioning of the input data is a concern, as the monthly updates could have unforeseen effects on the model.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated (UTC): 2025-07-27.