CMSC 671 Homework #6

Out 11/15/01, due 12/6/01 
(note later deadline than previously announced)

This is it -- the last homework! Can you believe it??

1. Probability theory (15 pts.)

(R&N 14.5) Using the basic laws of probability theory (the three axioms of probability, the definition of conditional probability, the product rule, and/or Bayes' rule), prove the following theorems:

(a) (7 pts.) Prove the conditionalized version of the general product rule:

P(A, B | E) = P(A | B, E) P(B | E)
(b) (8 pts.) Prove the conditionalized version of Bayes' rule:
P(A | B, C) = P(B | A, C) P(A | C) / P(B | C)
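
A reminder that may help with both parts: the definition of conditional probability,

     P(A | B) = P(A, B) / P(B)        (whenever P(B) > 0)

is the usual starting point. Both theorems follow by applying it to compound events -- e.g., treating "B, E" as a single conditioning event.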

2. Bayesian networks (45 pts.)

(Adapted from R&N 15.2) In the local nuclear power station, there is an alarm that senses when a temperature gauge exceeds a given threshold. The gauge measures the core temperature. Consider the Boolean variables A (alarm sounds), FA (alarm is faulty), and FG (gauge is faulty), and the discrete-valued nodes G (gauge reading) and T (actual core temperature).

In parts (c) through (f), you should simplify the probability equations by using the following shorthand:
     T  = temperature is high       ~T  = temperature is not high (normal)
     G  = gauge G reads high        ~G  = gauge G reads normal
     A  = alarm goes off            ~A  = alarm doesn't go off
     FG = gauge G is faulty         ~FG = gauge G is working
     FA = alarm is faulty           ~FA = alarm is working
 
(a) (5 pts.) Draw a belief net for this domain, given that the gauge is more likely to fail when the core temperature gets too high.

(b) (2 pts.) Is your network a polytree? Why or why not?

(c) (4 pts.) Suppose that there are just two possible actual and measured temperatures, Normal and High, and that the gauge gives the incorrect temperature x% of the time when it is working, but y% of the time when it is faulty. Give the conditional probability table associated with G. Please specify your CPT in the following form:
 
     P(G | T, FG)    T = Normal    T = Normal    T = High    T = High
                         FG           ~FG           FG          ~FG
        ~G
         G
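
If you want to sanity-check your table, here is a minimal Python sketch (not part of the assignment) of the same CPT. The function name and representation are made up for illustration, and x and y are taken as fractions rather than percentages:

     # Hypothetical helper: P(G = high | T, FG), where the gauge misreads
     # with probability x when working and y when faulty (both as fractions).
     def p_gauge_high(temp_high, gauge_faulty, x, y):
         wrong = y if gauge_faulty else x          # chance the gauge misreads
         return (1 - wrong) if temp_high else wrong

     # Sanity check: each column of your CPT should sum to 1, since
     # P(G | t, fg) + P(~G | t, fg) = 1.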

(d) (4 pts.) Suppose the alarm works unless it is faulty, in which case it never goes off. Give the conditional probability table associated with A. Use the same CPT table format, this time for P(A | G, FA).

(e) (25 pts.) Suppose the alarm and gauge are working, and the alarm sounds. Calculate the probability that the core temperature is too high. For this problem, you will also need to know that P(T) = p, P(FG | T) = g, and P(FG | ~T) = h.  Hint: Because A and FA are conditionally independent of T given G,
       P(T | ~FG, G, A, ~FA) = P(T | ~FG, G).
Now go back and look carefully at the behavior of the alarm, given the gauge, and think about what G must be.  Next, start applying Bayes' Rule and/or the definition of conditional probability to move the terms around until you arrive at known terms (e.g., probabilities that are expressed explicitly in the CPTs).
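
For instance, one legitimate next step is to apply the conditionalized Bayes' rule you proved in problem 1(b), conditioning throughout on ~FG:

     P(T | ~FG, G) = P(G | T, ~FG) P(T | ~FG) / P(G | ~FG)

The first factor in the numerator comes straight from the CPT in part (c); the remaining factors take further expansion in terms of p, g, and h.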
 

(f) (5 pts.) Suppose we add a second temperature gauge H, connected so that the alarm goes off (if it's working) when either gauge reads high. Where do H and FH (the event of H failing) go in the network? What is the new CPT associated with A?

3. Decision tree learning (40 pts.)

The following table gives a data set for deciding whether to play or cancel a ball game, depending on the weather conditions.
 
Outlook    Temp (F)   Humidity (%)   Windy?   Class
sunny         75           70         true    Play
sunny         80           90         true    Don't Play
sunny         85           85         false   Don't Play
sunny         72           95         false   Don't Play
sunny         69           70         false   Play
overcast      72           90         true    Play
overcast      83           78         false   Play
overcast      64           65         true    Play
overcast      81           75         false   Play
rain          71           80         true    Don't Play
rain          65           70         true    Don't Play
rain          75           80         false   Play
rain          68           80         false   Play
rain          70           96         false   Play

(a) (15 pts.) At the root node for a decision tree in this domain, what is the information gain associated with each attribute? (Use a threshold of 75 for humidity and temperature; i.e., humidity <= 75 / humidity > 75 and temperature <= 75 / temperature > 75.)
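
If you would like to check your arithmetic, here is a rough Python 3 sketch (not part of the assignment). The data layout and the entropy / info_gain helpers are my own; temperature and humidity are pre-discretized at the 75 thresholds:

     from math import log2
     from collections import Counter

     # Rows are (outlook, temp<=75?, humidity<=75?, windy?, class).
     DATA = [
         ("sunny",    True,  True,  True,  "Play"),
         ("sunny",    False, False, True,  "Don't Play"),
         ("sunny",    False, False, False, "Don't Play"),
         ("sunny",    True,  False, False, "Don't Play"),
         ("sunny",    True,  True,  False, "Play"),
         ("overcast", True,  False, True,  "Play"),
         ("overcast", False, False, False, "Play"),
         ("overcast", True,  True,  True,  "Play"),
         ("overcast", False, True,  False, "Play"),
         ("rain",     True,  False, True,  "Don't Play"),
         ("rain",     True,  True,  True,  "Don't Play"),
         ("rain",     True,  False, False, "Play"),
         ("rain",     True,  False, False, "Play"),
         ("rain",     True,  False, False, "Play"),
     ]
     ATTRS = {"Outlook": 0, "Temp<=75": 1, "Humidity<=75": 2, "Windy?": 3}

     def entropy(labels):
         # Entropy (in bits) of a list of labels.
         n = len(labels)
         return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

     def info_gain(rows, col):
         # Class entropy minus its expected entropy after splitting on col.
         n = len(rows)
         before = entropy([r[-1] for r in rows])
         after = 0.0
         for v in {r[col] for r in rows}:
             part = [r[-1] for r in rows if r[col] == v]
             after += (len(part) / n) * entropy(part)
         return before - after

     for name, col in ATTRS.items():
         print(name, round(info_gain(DATA, col), 3))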

(b) (15 pts.) Again at the root node, what is the gain ratio associated with each attribute (using the same thresholds as in (a))?
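
The sketch above extends directly to this part: the gain ratio is the information gain divided by the split information, i.e., the entropy of the attribute's own value distribution (Quinlan's C4.5 normalization). The gain_ratio helper below is again illustrative:

     # Gain ratio = information gain / split information. Split information
     # is the entropy of the attribute's value distribution; it is nonzero
     # here because every attribute takes at least two values.
     def gain_ratio(rows, col):
         return info_gain(rows, col) / entropy([r[col] for r in rows])

     for name, col in ATTRS.items():
         print(name, round(gain_ratio(DATA, col), 3))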

(c) (10 pts.) Suppose you build a decision tree that splits on the Outlook attribute at the root node. How many child nodes are there at the first level of the decision tree? Which branches require a further split in order to create leaf nodes with instances belonging to a single class? For each of these branches, which attribute can you split on to complete the decision-tree-building process at the next level (i.e., so that at level 2, there are only leaf nodes)? Draw the resulting decision tree, showing the decisions (class predictions) at the leaves.