Figure NN.11
(a) The result of combining two opposite-facing soft threshold functions to produce a ridge.
(b) The result of combining two ridges to produce a bump.
This turns out to be an easy problem if we think of a network the right way: as a function hw(x) parameterized by the weights w. Given an input vector x = (x1, x2), the activations of the input units are set to (a1, a2) = (x1, x2). The output at unit 5 is given by

a5 = g(w3,5 a3 + w4,5 a4) = g(w3,5 g(w1,3 x1 + w2,3 x2) + w4,5 g(w1,4 x1 + w2,4 x2)).
Thus, we have the output expressed as a function of the inputs and the weights. A similar expression holds for unit 6. As long as we can calculate the derivatives of such expressions with respect to the weights, we can use the gradient-descent loss-minimization method to train the network.
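The sketch below makes this computation concrete. It assumes a logistic sigmoid for the activation function g and the 2-2-2 layout implied above (units 1 and 2 are inputs, units 3 and 4 are hidden, units 5 and 6 are outputs); the weight-dictionary representation and the omission of bias weights are choices made for illustration, not details taken from the text.

```python
import numpy as np

def g(z):
    """Soft threshold activation, here taken to be the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def h_w(x, w):
    """Forward pass for a 2-2-2 network: inputs 1-2, hidden units 3-4, outputs 5-6.

    w is a dict keyed by (i, j), the weight on the link from unit i to unit j.
    Returns the activations (a5, a6)."""
    a1, a2 = x                                   # input activations set to the inputs
    a3 = g(w[(1, 3)] * a1 + w[(2, 3)] * a2)      # hidden unit 3
    a4 = g(w[(1, 4)] * a1 + w[(2, 4)] * a2)      # hidden unit 4
    a5 = g(w[(3, 5)] * a3 + w[(4, 5)] * a4)      # output unit 5
    a6 = g(w[(3, 6)] * a3 + w[(4, 6)] * a4)      # output unit 6
    return a5, a6

# Example: random weights and one input vector.
rng = np.random.default_rng(0)
w = {(i, j): rng.normal() for i in (1, 2) for j in (3, 4)}
w.update({(i, j): rng.normal() for i in (3, 4) for j in (5, 6)})
print(h_w((0.5, -1.0), w))
```

Because h_w is an ordinary differentiable composition of weighted sums and activations, its derivatives with respect to each weight can be computed and used directly in gradient descent.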
Before delving into learning rules, let us look at the ways in which networks generate complicated functions. For example, by adding two opposite-facing soft threshold functions and thresholding the result, we can obtain a "ridge" function as shown in Figure NN.11(a). Combining two such ridges at right angles to each other (i.e., combining the outputs from four hidden units), we obtain a "bump" as shown in Figure NN.11(b).
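A short sketch of this construction follows, again taking the soft threshold to be a logistic sigmoid; the steepness, offsets, and the 1.5 threshold used to combine the two ridges are illustrative values, not ones taken from the figure.

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def ridge(x, steepness=10.0, lo=-1.0, hi=1.0):
    """Two opposite-facing soft thresholds added together: the result is
    close to 1 only in the band between lo and hi, as in Figure NN.11(a)."""
    return g(steepness * (x - lo)) + g(-steepness * (x - hi)) - 1.0

def bump(x, y):
    """Two ridges at right angles, summed and thresholded again, so only
    the central overlap region stays near 1, as in Figure NN.11(b)."""
    return g(10.0 * (ridge(x) + ridge(y) - 1.5))

# Evaluate the bump on a coarse grid: values near 1 appear only at the center.
xs, ys = np.meshgrid(np.linspace(-3, 3, 7), np.linspace(-3, 3, 7))
print(np.round(bump(xs, ys), 2))
```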
With more hidden units, we can produce more bumps of different sizes in more places. In fact, with a single, sufficiently large hidden layer, it is possible to represent any continuous function of the inputs with arbitrary accuracy; with two layers, even discontinuous functions can be represented. Unfortunately, for any particular network structure, it is harder to characterize exactly which functions can be represented and which ones cannot.