[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["缺少我需要的資訊","missingTheInformationINeed","thumb-down"],["過於複雜/步驟過多","tooComplicatedTooManySteps","thumb-down"],["過時","outOfDate","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["示例/程式碼問題","samplesCodeIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2024-11-08 (世界標準時間)。"],[],[],null,["You saw in the [previous exercise](/machine-learning/crash-course/neural-networks/nodes-hidden-layers#exercise_2) that just adding\nhidden layers to our network wasn't sufficient to represent nonlinearities.\nLinear operations performed on linear operations are still linear.\n\nHow can you configure a neural network to learn\nnonlinear relationships between values? We need some way to insert nonlinear\nmathematical operations into a model.\n\nIf this seems somewhat familiar, that's because we've actually applied\nnonlinear mathematical operations to the output of a linear model earlier in\nthe course. In the [Logistic Regression](/machine-learning/crash-course/logistic-regression)\nmodule, we adapted a linear regression model to output a continuous value from 0\nto 1 (representing a probability) by passing the model's output through a\n[**sigmoid function**](/machine-learning/glossary#sigmoid-function).\n\nWe can apply the same principle to our neural network. Let's revisit our model\nfrom [Exercise 2](/machine-learning/crash-course/neural-networks/nodes-hidden-layers#exercise_2) earlier, but this time, before\noutputting the value of each node, we'll first apply the sigmoid function:\n\nTry stepping through the calculations of each node by clicking the **\\\u003e\\|** button\n(to the right of the play button). Review the mathematical operations performed\nto calculate each node value in the *Calculations* panel below the graph.\nNote that each node's output is now a sigmoid transform of the linear\ncombination of the nodes in the previous layer, and the output values are\nall squished between 0 and 1.\n\nHere, the sigmoid serves as an\n[**activation function**](/machine-learning/glossary#activation_function)\nfor the neural network, a nonlinear transform of a neuron's output value\nbefore the value is passed as input to the calculations of the next\nlayer of the neural network.\n\nNow that we've added an activation function, adding layers has more impact.\nStacking nonlinearities on nonlinearities lets us model very complicated\nrelationships between the inputs and the predicted outputs. In brief, each layer\nis effectively learning a more complex, higher-level function over the raw\ninputs. If you'd like to develop more intuition on how this works,\nsee [Chris Olah's excellent blog post](https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/).\n\nCommon activation functions\n\nThree mathematical functions that are commonly used as activation functions are\nsigmoid, tanh, and ReLU.\n\nThe sigmoid function (discussed above) performs the following transform on input\n$x$, producing an output value between 0 and 1:\n\n\\\\\\[F(x)=\\\\frac{1} {1+e\\^{-x}}\\\\\\]\n| The term *sigmoid* is often used more generally to refer to any S-shaped function. A more technically precise term for the specific function $F(x)=\\\\frac{1} {1+e\\^{-x}}$ is *logistic function*.\n\nHere's a plot of this function:\n**Figure 4. Plot of the sigmoid function.**\n\nThe tanh (short for \"hyperbolic tangent\") function transforms input $x$ to\nproduce an output value between --1 and 1:\n\n\\\\\\[F(x)=tanh(x)\\\\\\]\n\nHere's a plot of this function:\n**Figure 5. Plot of the tanh function.** Note that the range of the sigmoid function is 0 to 1, and the range of the tanh function is --1 to 1\n\nThe **rectified linear unit** activation function (or **ReLU**, for\nshort) transforms output using the following algorithm:\n\n- If the input value $x$ is less than 0, return 0.\n- If the input value $x$ is greater than or equal to 0, return the input value.\n\nReLU can be represented mathematically using the max() function: \n$$F(x)=max(0,x)$$\n\nHere's a plot of this function:\n**Figure 6. Plot of the ReLU function.**\n\nReLU often works a little better as an activation function than a smooth\nfunction like sigmoid or tanh, because it is less susceptible to the\n[**vanishing gradient problem**](/machine-learning/glossary#vanishing-gradient-problem)\nduring [neural network training](/machine-learning/crash-course/neural-networks/backpropagation). ReLU is also significantly easier\nto compute than these functions.\n\nOther activation functions\n\nIn practice, any mathematical function can serve as an activation function.\nSuppose that \\\\(\\\\sigma\\\\) represents our activation function.\nThe value of a node in the network is given by the following\nformula: \n$$\\\\sigma(\\\\boldsymbol w \\\\cdot \\\\boldsymbol x+b)$$\n\n[Keras](https://keras.io/) provides out-of-the-box support for many\n[activation functions](https://keras.io/api/layers/activations/).\nThat said, we still recommend starting with ReLU.\n\nSummary\n\nThe following video provides a recap of everything you've learned thus far\nabout how neural networks are constructed: \n\nNow our model has all the standard components of what people usually\nmean when they refer to a neural network:\n\n- A set of nodes, analogous to neurons, organized in layers.\n- A set of learned weights and biases representing the connections between each neural network layer and the layer beneath it. The layer beneath may be another neural network layer, or some other kind of layer.\n- An activation function that transforms the output of each node in a layer. Different layers may have different activation functions.\n\nA caveat: neural networks aren't necessarily always better than\nfeature crosses, but neural networks do offer a flexible alternative that works\nwell in many cases.\n| **Key terms:**\n|\n| - [Activation function](/machine-learning/glossary#activation_function)\n| - [Sigmoid function](/machine-learning/glossary#sigmoid-function)\n- [Vanishing gradient problem](/machine-learning/glossary#vanishing-gradient-problem) \n[Help Center](https://support.google.com/machinelearningeducation)"]]