Outlook | Temp (F) | Humidity (%) | Windy? | Class |
sunny | 75 | 70 | true | Play |
sunny | 80 | 90 | true | Don't Play |
sunny | 85 | 85 | false | Don't Play |
sunny | 72 | 95 | false | Don't Play |
sunny | 69 | 70 | false | Play |
overcast | 72 | 90 | true | Play |
overcast | 83 | 78 | false | Play |
overcast | 64 | 65 | true | Play |
overcast | 81 | 75 | false | Play |
rain | 71 | 80 | true | Don't Play |
rain | 65 | 70 | true | Don't Play |
rain | 75 | 80 | false | Play |
rain | 68 | 80 | false | Play |
rain | 70 | 96 | false | Play |
(a) Optimal Parameter Estimation (10 pts.) Given the data set above, give the conditional probability tables for a Bayes Net in which the following variables are conditionally dependent (CD): Temp is CD on Outlook, Humidity is CD on Outlook, Windy? is CD on Humidity, and Class is CD on Temp and Windy?. For this problem, discretize Temp into < 70 and ≥ 70, Humidity into < 75 and ≥ 75, and Outlook into (sunny) and (overcast or rain), so that all variables in this BN are binary.
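To make the counting concrete, here is a minimal Python sketch of maximum-likelihood CPT estimation. The records and discretization thresholds come from the problem above; the helper names (`discretize`, `cpt`) are illustrative, not part of the assignment.

```python
# Minimal sketch: maximum-likelihood CPT estimation by counting.
from collections import Counter

records = [
    ("sunny", 75, 70, True, "Play"), ("sunny", 80, 90, True, "Don't Play"),
    ("sunny", 85, 85, False, "Don't Play"), ("sunny", 72, 95, False, "Don't Play"),
    ("sunny", 69, 70, False, "Play"), ("overcast", 72, 90, True, "Play"),
    ("overcast", 83, 78, False, "Play"), ("overcast", 64, 65, True, "Play"),
    ("overcast", 81, 75, False, "Play"), ("rain", 71, 80, True, "Don't Play"),
    ("rain", 65, 70, True, "Don't Play"), ("rain", 75, 80, False, "Play"),
    ("rain", 68, 80, False, "Play"), ("rain", 70, 96, False, "Play"),
]

def discretize(outlook, temp, humidity, windy, cls):
    """Map one record onto the binary variables required by the problem."""
    return {"Outlook": "sunny" if outlook == "sunny" else "overcast/rain",
            "Temp": "<70" if temp < 70 else ">=70",
            "Humidity": "<75" if humidity < 75 else ">=75",
            "Windy": windy, "Class": cls}

data = [discretize(*r) for r in records]

def cpt(child, parents):
    """P(child | parents) by maximum likelihood: joint count / parent count."""
    joint, marginal = Counter(), Counter()
    for d in data:
        pa = tuple(d[p] for p in parents)
        joint[(pa, d[child])] += 1
        marginal[pa] += 1
    return {key: joint[key] / marginal[key[0]] for key in joint}

# The four CPTs named in the problem statement:
for child, parents in [("Temp", ["Outlook"]), ("Humidity", ["Outlook"]),
                       ("Windy", ["Humidity"]), ("Class", ["Temp", "Windy"])]:
    print(child, "given", parents, "->", cpt(child, parents))
```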
(b) Structure learning (15 pts.) Given the Bayes Net you just constructed, show one refinement operation you could make to its structure (add, remove, or reverse a CD link) and show the change in the likelihood of the data under your new model. This will require you to update the affected CPTs and compute the likelihood of the data under both the old and the new model.
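Continuing the sketch above (it reuses `data`, `cpt`, and `Counter`), the likelihood comparison can be checked mechanically. The structure dictionaries and the particular refinement chosen below are illustrative assumptions, not the required answer.

```python
# Continuing the sketch above: log-likelihood of the data under a
# structure, so old and new models can be compared. A structure maps
# each variable to its parent list; an empty list marks a root node.
import math

def root_table(child):
    """P(child) for a parentless node, by maximum likelihood."""
    counts = Counter(d[child] for d in data)
    return {v: c / len(data) for v, c in counts.items()}

def log_likelihood(structure):
    tables = {c: cpt(c, ps) if ps else root_table(c)
              for c, ps in structure.items()}
    ll = 0.0
    for d in data:
        for child, parents in structure.items():
            if parents:
                pa = tuple(d[p] for p in parents)
                ll += math.log(tables[child][(pa, d[child])])
            else:
                ll += math.log(tables[child][d[child]])
    return ll

old = {"Outlook": [], "Temp": ["Outlook"], "Humidity": ["Outlook"],
       "Windy": ["Humidity"], "Class": ["Temp", "Windy"]}
new = dict(old, Windy=[])  # one possible refinement: remove Humidity -> Windy
print(log_likelihood(old), log_likelihood(new))
```

Note that with maximum-likelihood CPTs, removing a link can never increase the likelihood of the training data, and adding one can never decrease it, which is worth keeping in mind when you interpret the change.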
The diagram below shows a gridworld domain in which the agent starts at the upper left location. The upper and lower rows are both "one-way streets," since only the actions shown by arrows are available. Actions that attempt to move the agent into a wall (the outer borders, or the thick black wall between all but the leftmost cell of the top and bottom rows) leave the agent in the same state it was in with probability 1, and have reward -2. If the agent tries to move right from the upper right or lower right location, it is teleported, with probability 1, to the far left end of the corresponding row, with the reward marked there (-10 and +20, respectively). All other actions have their expected effect (move up, down, left, or right) with probability .9, and leave the agent in the same state it was in with probability .1. These actions all have reward -1, except for the transitions marked in the upper left and lower left cells. (Note that a marked transition gives the indicated reward only if the action succeeds in moving the agent in that direction.)
(a) MDP (10 pts) Give the MDP for this domain, restricted to state transitions starting from each of the states in the top row, by filling in a state-action-state transition table (showing only transitions with non-zero probability). Refer to each state by its row and column indices, so the upper left state is [1,1] and the lower right state is [2,4].
To get you started, here are the first few lines of the table:
State s | Action a | New state s' | p(s'|s,a) | r(s,a,s') |
[1,1] | Up | [1,1] | 1.0 | -2 |
[1,1] | Right | [1,1] | 0.1 | -1 |
[1,1] | Right | [1,2] | 0.9 | +20 |
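The remaining rows are the exercise, but a compact way to sanity-check a completed table is to encode it as a mapping from (state, action) to outcome lists and assert that each action's outgoing probabilities sum to 1. The encoding below is one illustrative choice, seeded with only the starter rows above.

```python
# One possible encoding of the transition table; only the starter rows
# from the problem are entered here, the rest are the exercise.
T = {
    ((1, 1), "Up"):    [((1, 1), 1.0, -2)],
    ((1, 1), "Right"): [((1, 1), 0.1, -1), ((1, 2), 0.9, +20)],
}

def check(table):
    """Sanity check: outgoing probabilities for each (s, a) sum to 1."""
    for (s, a), outcomes in table.items():
        total = sum(p for _, p, _ in outcomes)
        assert abs(total - 1.0) < 1e-9, (s, a, total)

check(T)
```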
(b) Value function (20 pts)
Suppose the agent follows a randomized policy π (where each available action in any given state has equal probability) and uses a discount factor of γ = .85. Given the partial value function (Vπ; Uπ in Russell & Norvig's terminology) shown below, fill in the missing Vπ values. Show and explain your work.
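Each missing value is pinned down by the Bellman expectation equation for a fixed policy, which for the uniform-random π here reads

$$ V^\pi(s) = \sum_a \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\,\bigl[\, r(s, a, s') + \gamma\, V^\pi(s') \,\bigr], \qquad \gamma = 0.85. $$

Writing this equation once for each state whose value is missing yields a small linear system in the unknown values.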
(c) Policy (15 pts)
Given the value function Vπ computed in (b), what new policy π' would policy iteration produce at the next iteration? Show your answer as a diagram (arrows on the grid) or as a state-action table.
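For reference, the improvement step is the one-step greedy lookahead π'(s) = argmax_a Σ_{s'} p(s'|s,a) [r(s,a,s') + γ Vπ(s')]. A minimal Python sketch follows, assuming the transition encoding `T` from part (a) and a dictionary `V` of the values from part (b); both names are illustrative.

```python
# Minimal sketch of the greedy improvement step of policy iteration.
# `T` maps (state, action) -> [(next_state, prob, reward), ...] and
# `V` maps states to their V^pi values; both are assumed inputs here.
GAMMA = 0.85

def q_value(T, V, s, a):
    """One-step lookahead: Q(s, a) = sum over s' of p * (r + gamma * V[s'])."""
    return sum(p * (r + GAMMA * V[s2]) for s2, p, r in T[(s, a)])

def improved_policy(T, V, actions):
    """pi'(s) = argmax over a of Q(s, a), restricted to the actions available in s."""
    return {s: max(acts, key=lambda a: q_value(T, V, s, a))
            for s, acts in actions.items()}
```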