
Neural Networks: Single-Layer Feed-Forward


Figure nn.07
(a) A perceptron network with two inputs and two output units.
(b) A neural network with two inputs, one hidden layer of two units, and one output unit.

A network with all the inputs connected directly to the outputs is called a single-layer neural network, or a perceptron network. Figure nn.07 shows a simple two-input, two-output perceptron network. With such a network, we might hope to learn the two-bit adder function, for example. Here are all the training data we will need:

x1  x2  |  y3 (carry)  y4 (sum)
0   0   |  0           0
0   1   |  0           1
1   0   |  0           1
1   1   |  1           0
The first thing to notice is that a perceptron network with m outputs is really m separate networks, because each weight affects only one of the outputs. Thus, there will be m separate training processes. Furthermore, depending on the type of activation function used, each training process will use either the perceptron learning rule (for a hard threshold) or the gradient descent rule for logistic regression (for a logistic activation).
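To make the idea of separate training processes concrete, here is a minimal Python sketch (not from the text; names such as train_unit are illustrative) that trains each output unit of the two-bit-adder network independently with the perceptron learning rule, using a hard threshold activation.

```python
# Minimal sketch: train each output unit of the two-bit adder independently
# with the perceptron learning rule (hard threshold activation).
# Illustrative only; not the text's own code.

import random

# Training data for the two-bit adder: (x1, x2) -> (y3 = carry, y4 = sum)
DATA = [
    ((0, 0), (0, 0)),
    ((0, 1), (0, 1)),
    ((1, 0), (0, 1)),
    ((1, 1), (1, 0)),
]

def step(z):
    """Hard threshold activation."""
    return 1 if z >= 0 else 0

def train_unit(examples, epochs=100, alpha=0.1):
    """Perceptron learning rule for one output unit; w = [w0, w1, w2], bias input is 1."""
    w = [random.uniform(-0.5, 0.5) for _ in range(3)]
    for _ in range(epochs):
        for (x1, x2), target in examples:
            y = step(w[0] + w[1] * x1 + w[2] * x2)
            err = target - y
            w[0] += alpha * err        # bias weight
            w[1] += alpha * err * x1
            w[2] += alpha * err * x2
    return w

# One separate training process per output unit
w_carry = train_unit([(x, y[0]) for x, y in DATA])
w_sum = train_unit([(x, y[1]) for x, y in DATA])

for (x1, x2), (y3, y4) in DATA:
    c = step(w_carry[0] + w_carry[1] * x1 + w_carry[2] * x2)
    s = step(w_sum[0] + w_sum[1] * x1 + w_sum[2] * x2)
    print((x1, x2), "carry:", c, "(target", str(y3) + ")", "sum:", s, "(target", str(y4) + ")")
```

Running a sketch like this shows the carry unit settling on correct weights within a few epochs; the behaviour of the sum unit is discussed below.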

Figure nn.08 Linear separability in threshold perceptrons. Black dots indicate a point in the input space where the value of the function is 1, and white dots indicate a point where the value is 0. The perceptron returns 1 on the non-shaded side of the line. In (c), no such line exists that correctly classifies the inputs.


If you try either method on the two-bit-adder data, something interesting happens. Unit y3 learns the carry function easily, but unit y4 completely fails to learn the sum function. No, unit y4 is not defective! The problem is with the sum function itself. A threshold perceptron can represent a function only if there is some line (in higher dimensions, a hyperplane) that separates the inputs where the function is 1 from those where it is 0; such functions are called linearly separable. This works fine for the carry function, which is a logical AND (see Figure nn.08(a)). The sum function, however, is an XOR (exclusive OR) of the two inputs. As Figure nn.08(c) illustrates, this function is not linearly separable, so the perceptron cannot learn it.
The linearly separable functions constitute just a small fraction of all Boolean functions, and the inability of perceptrons to learn even such simple functions as XOR was a significant setback to the nascent neural network community in the 1960s. Perceptrons are far from useless, however: a perceptron can represent some quite "complex" Boolean functions very compactly. For example, the majority function, which outputs a 1 only if more than half of its n inputs are 1, can be represented by a perceptron with each w_i = 1 and with w_0 = -n/2. A decision tree would need exponentially many nodes to represent this function.
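As a quick check of this compact representation, the following sketch (illustrative, not from the text) implements the majority perceptron with every input weight set to 1 and bias weight w_0 = -n/2, and verifies it against the true majority function for n = 5.

```python
# Illustrative sketch: a threshold perceptron computing the majority function
# with each input weight w_i = 1 and bias weight w0 = -n/2.

from itertools import product

def majority_perceptron(inputs):
    n = len(inputs)
    activation = -n / 2 + sum(inputs)   # w0 = -n/2, each w_i = 1
    return 1 if activation > 0 else 0   # fires iff more than half the inputs are 1

# Verify against the true majority function on all 2^5 input vectors
n = 5
for x in product([0, 1], repeat=n):
    assert majority_perceptron(x) == (1 if sum(x) > n / 2 else 0)
print("w_i = 1, w0 = -n/2 computes majority on all", 2 ** n, "inputs")
```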


Figure nn.09 Comparing the performance of perceptrons and decision trees.
a. Perceptrons are better at learning the majority function of 11 inputs.
b. Decision trees are better at learning the WillWait predicate in the restaurant example.


Figure nn.09 shows the learning curve for a perceptron on two different problems. On the left, we show the curve for learning the majority function with 11 Boolean inputs (i.e., the function outputs a 1 if 6 or more inputs are 1). As we would expect, the perceptron learns the function quite quickly, because the majority function is linearly separable. The decision-tree learner, on the other hand, makes no progress, because the majority function is very hard (although not impossible) to represent as a decision tree. On the right, we have the restaurant example. The WillWait problem is easily represented as a decision tree, but it is not linearly separable. The best hyperplane through the data correctly classifies only 65% of the examples.
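A rough sketch of how a learning-curve comparison like Figure nn.09(a) could be reproduced follows; it assumes scikit-learn and NumPy are available, and the training-set sizes and random seed are arbitrary illustrative choices, not those used for the figure.

```python
# Rough sketch of a learning-curve comparison on the 11-input majority function.
# Assumes scikit-learn and NumPy; sizes and seed are arbitrary illustrative choices.

import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
N_INPUTS = 11

def majority_examples(m):
    """Draw m random Boolean input vectors labeled by the true majority function."""
    X = rng.integers(0, 2, size=(m, N_INPUTS))
    y = (X.sum(axis=1) > N_INPUTS / 2).astype(int)
    return X, y

X_test, y_test = majority_examples(500)

for size in [20, 40, 60, 80, 100]:
    X_train, y_train = majority_examples(size)
    p = Perceptron(max_iter=1000).fit(X_train, y_train)
    t = DecisionTreeClassifier().fit(X_train, y_train)
    print(f"{size:3d} examples  perceptron: {p.score(X_test, y_test):.2f}"
          f"  decision tree: {t.score(X_test, y_test):.2f}")
```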


