Step 5: Tune Hyperparameters
We had to choose a number of hyperparameters for defining and training the model. We relied on intuition, examples, and best practice recommendations. Our first choice of hyperparameter values, however, may not yield the best results; it only gives us a good starting point for training. Every problem is different, and tuning these hyperparameters will help refine our model to better represent the particularities of the problem at hand. Let's take a look at some of the hyperparameters we used and what it means to tune them:
Number of layers in the model: The number of layers in a neural network is an indicator of its complexity. We must be careful in choosing this value. Too many layers will allow the model to learn too much information about the training data, causing overfitting; too few layers can limit the model's learning ability, causing underfitting. For text classification datasets, we experimented with one-, two-, and three-layer MLPs. Models with two layers performed well, and in some cases better than three-layer models. Similarly, we tried sepCNNs with four and six layers, and the four-layer models performed well.
Number of units per layer: The units in a layer must hold the information for the transformation that the layer performs. For the first layer, this is driven by the number of features. In subsequent layers, the number of units depends on the choice of expanding or contracting the representation from the previous layer. Try to minimize the information loss between layers. We tried unit values in the range [8, 16, 32, 64], and 32/64 units worked well.
Dropout rate: Dropout layers are used in the model for regularization. They define the fraction of input to drop as a precaution against overfitting. Recommended range: 0.2–0.5.
Learning rate: This is the rate at which the neural network weights change between iterations. A large learning rate may cause large swings in the weights, and we may never find their optimal values. A low learning rate is good, but the model will take more iterations to converge. It is a good idea to start low, say at 1e-4. If training is very slow, increase this value; if your model is not learning, try decreasing the learning rate. (The sketch after this list shows how these settings might map onto a model definition in code.)
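As a rough illustration, here is a minimal sketch, assuming a tf.keras workflow (the guide does not prescribe this exact code), of how the hyperparameters above, namely the number of layers, units per layer, dropout rate, and learning rate, might be wired into a model definition. The function name build_mlp and its default values are illustrative only.

import tensorflow as tf

def build_mlp(num_features, num_classes, num_layers=2, units=64,
              dropout_rate=0.3, learning_rate=1e-4):
    # num_layers counts the Dense layers, mirroring the one-/two-/three-layer
    # MLPs discussed above; each hidden layer is followed by Dropout for
    # regularization.
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(num_features,)))
    for _ in range(num_layers - 1):
        model.add(tf.keras.layers.Dense(units, activation='relu'))
        model.add(tf.keras.layers.Dropout(dropout_rate))
    model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
    # The learning rate controls how far the weights move on each update;
    # starting around 1e-4 follows the recommendation above.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    return model

# Example: a two-layer MLP with 64 units per layer, the combination that
# worked well in the experiments above (feature and class counts are made up).
model = build_mlp(num_features=20000, num_classes=4)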
In addition, there are a couple of extra hyperparameters we tuned that are specific to our sepCNN model:
Kernel size: The size of the convolution window. Recommended values: 3 or 5.
Embedding dimensions: The number of dimensions we want to use to represent word embeddings, i.e., the size of each word vector. Recommended values: 50–300. In our experiments, we used GloVe embeddings with 200 dimensions and a pretrained embedding layer. (The sketch after this list shows where these two settings enter a sepCNN-style model.)
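The following sketch, again assuming tf.keras and using simplified layer counts and pooling choices rather than the exact sepCNN architecture from our experiments, shows where kernel size and embedding dimensions enter a sepCNN-style model, and how a pretrained embedding matrix (for example, 200-dimensional GloVe vectors) could be plugged in. The embedding_matrix parameter is a hypothetical argument added for illustration.

import tensorflow as tf

def build_sepcnn(num_classes, vocab_size, embedding_dim=200, kernel_size=3,
                 filters=64, dropout_rate=0.3, embedding_matrix=None):
    model = tf.keras.Sequential()
    # Embedding dimension = the size of each word vector. If a pretrained
    # matrix is supplied, use it to initialize the layer and freeze it;
    # otherwise learn the embeddings from scratch.
    if embedding_matrix is not None:
        init = tf.keras.initializers.Constant(embedding_matrix)
        model.add(tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                            embeddings_initializer=init,
                                            trainable=False))
    else:
        model.add(tf.keras.layers.Embedding(vocab_size, embedding_dim))
    # kernel_size is the width of the convolution window (3 or 5 recommended).
    model.add(tf.keras.layers.SeparableConv1D(filters, kernel_size,
                                              activation='relu', padding='same'))
    model.add(tf.keras.layers.SeparableConv1D(filters, kernel_size,
                                              activation='relu', padding='same'))
    model.add(tf.keras.layers.GlobalAveragePooling1D())
    model.add(tf.keras.layers.Dropout(dropout_rate))
    model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
    return model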
Play around with these hyperparameters and see what works best; a small grid search, sketched below, is one simple way to do this. Once you have chosen the best-performing hyperparameters for your use case, your model is ready to be deployed.
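The snippet below is only a sketch of such a search over the candidate values listed above, keeping the configuration with the best validation accuracy: build_mlp comes from the earlier example, and x_train, y_train, x_val, y_val are assumed to be your already-vectorized training and validation data.

import itertools

best_acc, best_config = 0.0, None
for num_layers, units, dropout in itertools.product([2, 3], [32, 64], [0.2, 0.3, 0.5]):
    # Build and train one model per hyperparameter combination.
    model = build_mlp(num_features=x_train.shape[1], num_classes=4,
                      num_layers=num_layers, units=units, dropout_rate=dropout)
    history = model.fit(x_train, y_train, epochs=10, batch_size=128,
                        validation_data=(x_val, y_val), verbose=0)
    # Track the best validation accuracy seen for this configuration.
    val_acc = max(history.history['val_accuracy'])
    if val_acc > best_acc:
        best_acc, best_config = val_acc, (num_layers, units, dropout)

print('Best (layers, units, dropout):', best_config, 'validation accuracy:', best_acc)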