- To AND two decisions together, we place them in series in the tree. The first part of Figure tr.02 illustrates a tree with two decisions, both of which must be true in order for action 1 to be carried out. This tree has the logic "if A AND B, then carry out action 1, otherwise carry out action 2".
- To OR two decisions together, we again place the decisions in series, but with the two actions swapped over from the AND example above. The second part of Figure tr.02 illustrates this. If either test returns true, then action 1 is carried out; only if neither test passes is action 2 run. This tree has the logic "if A OR B, then carry out action 1, otherwise carry out action 2". A short code sketch after the figure shows both arrangements.
Figure tr.02 Trees representing "AND" and "OR"
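The series arrangement is easy to express directly in code. The following is a minimal sketch, assuming a hypothetical Decision class with make_and_tree and make_or_tree helpers; it simply mirrors the two trees in Figure tr.02 and is not code from the text.

```python
# Minimal sketch (illustrative only): AND and OR built purely from tree structure.
# The Decision class and the two builder functions are hypothetical names.

class Decision:
    def __init__(self, test, if_true, if_false):
        self.test = test          # callable returning True or False
        self.if_true = if_true    # subtree or leaf action taken when the test passes
        self.if_false = if_false  # subtree or leaf action taken when the test fails

    def decide(self):
        branch = self.if_true if self.test() else self.if_false
        # A branch is either another Decision or a leaf action (a string here).
        return branch.decide() if isinstance(branch, Decision) else branch

def make_and_tree(test_a, test_b):
    # "if A AND B then action 1, otherwise action 2":
    # the second decision sits on the true branch of the first.
    return Decision(test_a, Decision(test_b, "action 1", "action 2"), "action 2")

def make_or_tree(test_a, test_b):
    # "if A OR B then action 1, otherwise action 2":
    # the same series arrangement, but with the two actions swapped over.
    return Decision(test_a, "action 1", Decision(test_b, "action 1", "action 2"))

if __name__ == "__main__":
    a, b = (lambda: True), (lambda: False)
    print(make_and_tree(a, b).decide())  # action 2 (B fails, so AND fails)
    print(make_or_tree(a, b).decide())   # action 1 (A passes, so OR succeeds)
```

Note that the only difference between the two builders is where the actions sit; the arrangement of the decisions themselves is identical.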
This ability of simple decision trees to build up any logical combination of tests is used in other decision-making systems. To extend decision tree induction to a wider variety of problems, a number of issues must be addressed. We briefly mention several here; a full understanding is best obtained by doing the associated exercises:
- Missing data: In many domains, not all the attribute values will be known for every example. The values might have gone unrecorded, or they might be too expensive to obtain. This gives rise to two problems:
First, given a complete decision tree, how should one classify an example that is missing one of the test attributes?
Second, how should one modify the information-gain formula when some examples have unknown values for the attribute?
- Multivalued attributes: When an attribute has many possible values, the information gain measure gives an inappropriate indication of the attribute's usefulness. In the extreme case, an attribute such as ExactTime has a different value for every example, which means each subset of examples is a singleton with a unique classification, and the information gain measure would have its highest value for this attribute. But choosing this split first is unlikely to yield the best tree. One solution is to use the gain ratio (a sketch appears after this list). Another possibility is to allow a Boolean test of the form A = Vk, picking out just one of the possible values for an attribute and leaving the remaining values to possibly be tested later in the tree.
- Continuous and integer-valued input attributes: Attributes such as Height and Weight have an infinite set of possible values. Rather than generate infinitely many branches, decision-tree learning algorithms typically find the split point that gives the highest information gain. For example, at a given node in the tree, it might be the case that testing on Weight > 160 gives the most information.
Efficient methods exist for finding good split points: start by sorting the values of the attribute, and then consider only split points that lie between two adjacent examples (in sorted order) with different classifications, while keeping track of the running totals of positive and negative examples on each side of the split point (a sketch of this sweep appears after this list). This split point selection is the most expensive part of real-world decision tree learning applications.
- Continuous-valued output attributes: If we are trying to predict a numerical output value, such as the price of an apartment, then we need a regression tree rather than a classification tree. A regression tree has at each leaf a linear function of some subset of numerical attributes, rather than a single value.
For example, the branch for two-bedroom apartments might end with a linear function of square footage, number of bathrooms, and average income for the neighborhood. The learning algorithm must decide when to stop splitting and begin applying linear regression over the attributes (the last sketch below illustrates such a leaf).
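For the multivalued-attribute problem, the gain ratio mentioned above can be sketched as follows. This is an illustrative sketch with assumed data structures (examples as dicts, labels as a parallel list), not the book's algorithm: the information gain of a split is divided by the information content of the split itself, so attributes like ExactTime that shatter the data into many singletons are penalised.

```python
# Sketch of information gain and gain ratio (assumed interfaces, illustrative only).
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Reduction in entropy from splitting on `attribute`.
    `examples` is a list of dicts; `labels` is the parallel list of classes."""
    n = len(examples)
    groups = {}
    for ex, y in zip(examples, labels):
        groups.setdefault(ex[attribute], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(examples, labels, attribute):
    """Information gain normalised by the split's own information content."""
    n = len(examples)
    sizes = Counter(ex[attribute] for ex in examples).values()
    split_info = -sum((s / n) * log2(s / n) for s in sizes)
    return information_gain(examples, labels, attribute) / split_info if split_info else 0.0
```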
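The split-point search for continuous attributes can likewise be sketched. The helper names below are assumptions for illustration; the sweep sorts the examples once and scores only boundaries between adjacent examples with different classifications, keeping running class counts as described above.

```python
# Sketch (assumed helper names) of choosing a numeric split point such as Weight > 160.
from collections import Counter
from math import log2

def entropy_from_counts(counts):
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values() if c)

def best_split_point(values, labels):
    """Return (threshold, information_gain) for the best test `value > threshold`."""
    pairs = sorted(zip(values, labels))
    total = Counter(labels)
    n = len(pairs)
    base = entropy_from_counts(total)
    left = Counter()          # running class counts to the left of the candidate split
    best = (None, 0.0)
    for i in range(1, n):
        v_prev, y_prev = pairs[i - 1]
        left[y_prev] += 1
        v_cur, y_cur = pairs[i]
        # Only boundaries between examples with different classifications can be optimal.
        if y_prev == y_cur or v_prev == v_cur:
            continue
        right = total - left
        remainder = (i / n) * entropy_from_counts(left) + ((n - i) / n) * entropy_from_counts(right)
        gain = base - remainder
        if gain > best[1]:
            best = ((v_prev + v_cur) / 2, gain)
    return best

# e.g. best_split_point([150, 155, 162, 170], ["no", "no", "yes", "yes"])
# picks a threshold near 158.5, analogous to the Weight > 160 test in the text.
```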
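Finally, a regression-tree leaf of the kind described in the last item might look like the following sketch, which simply fits a least-squares linear function over the numeric attributes of the examples that reach the leaf; the attribute names in the usage comment are hypothetical.

```python
# Tiny sketch (illustrative only) of a regression-tree leaf: a linear function of a
# few numeric attributes, fitted to the training examples routed to that leaf.
import numpy as np

def fit_leaf(X, y):
    """Least-squares fit y ~ X @ w + b for the examples that reach this leaf.
    X: array of shape (n_examples, n_numeric_attributes)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])   # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    w, b = coef[:-1], coef[-1]
    return lambda x: float(np.dot(x, w) + b)       # the leaf's prediction function

# Hypothetical "two-bedroom apartments" leaf over [sq_footage, bathrooms, avg_income]:
# predict = fit_leaf(X_two_bedroom, prices); predict([950, 1.5, 72000])
```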