Data Mining Exercise: December 2013

An Exercise on Mining Association Rules

An insurance company is planning to launch a new insurance product by packaging some of the add-ons it has been offering with its motor insurance policies. It has listed the following five add-ons for this exercise.

a) Depreciation waiver benefit
b) Daily cash allowance while the car is under repair
c) Cover for personal luggage loss
d) On road assistance to car and car passengers
e) Medical Extension.

From the records, the following numbers of incidences of the various combinations of add-ons were found.

f(a) = 489, f(b) = 823, f(c) = 56, f(d) = 372, f(e) = 649, f(a,b) = 682, f(a,c) = 68,
f(a,d) = 532, f(a,e) = 89, f(b,c) = 312, f(b,d) = 279, f(b,e) = 24, f(c,d) = 52, f(c,e) = 10,
f(d,e) = 189, f(a,b,c) = 227, f(a,b,d) = 295, f(a,b,e) = 378, f(a,c,d) = 1, f(a,c,e) = 10,
f(a,d,e) = 512, f(b,c,d) = 2, f(b,c,e) = 13, f(b,d,e) = 187, f(c,d,e) = 10, f(a,b,c,d) = 0, f(a,c,d,e) = 5, f(b,c,d,e) = 2, f(a,b,d,e) = 198, f(a,b,c,e) = 4, f(a,b,c,d,e) = 85.

Here f(x,y,z) means the number of policies in which add-ons x, y and z have been taken and the remaining add-ons have not been taken.

Work on these data to find some association rules. The association rules will be of the following form:

“A person opting for add-ons x,y is likely to opt for w as well”.

Let us call the first set, i.e. x,y, the current set and the second set, w, the associated set. Each set may contain any number of add-ons from a, b, c, d, e, but an add-on cannot be in both sets.

Let there be a threshold value and a threshold probability for accepting an association as an association rule. The threshold value is the minimum number of incidences of the current set, and the threshold probability is the minimum probability that someone who has taken the current-set add-ons also takes the associated-set add-ons.

For this exercise, take the threshold value as 2000 and the threshold probability as 0.4.
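One way to approach this is sketched below in Python. It is only a sketch under stated assumptions: since f(...) counts policies with exactly the listed add-ons, the number of incidences of a current set is assumed to be the sum of f over every combination that contains it, and the probability is assumed to be the incidences of (current set plus associated set) divided by the incidences of the current set. The names COUNTS, support and find_rules are illustrative and do not come from the exercise.

```python
from itertools import combinations

# Exclusive counts from the exercise: each key is the exact set of add-ons taken.
F = {
    "a": 489, "b": 823, "c": 56, "d": 372, "e": 649,
    "ab": 682, "ac": 68, "ad": 532, "ae": 89, "bc": 312,
    "bd": 279, "be": 24, "cd": 52, "ce": 10, "de": 189,
    "abc": 227, "abd": 295, "abe": 378, "acd": 1, "ace": 10,
    "ade": 512, "bcd": 2, "bce": 13, "bde": 187, "cde": 10,
    "abcd": 0, "acde": 5, "bcde": 2, "abde": 198, "abce": 4,
    "abcde": 85,
}
COUNTS = {frozenset(key): n for key, n in F.items()}
ITEMS = "abcde"


def support(items):
    """Policies containing all of `items` (assumption: sum over supersets)."""
    wanted = frozenset(items)
    return sum(n for taken, n in COUNTS.items() if wanted <= taken)


def find_rules(min_support=2000, min_prob=0.4):
    """Return (current set, associated set, support, probability) tuples."""
    rules = []
    for size in range(1, len(ITEMS)):
        for current in combinations(ITEMS, size):
            cur_support = support(current)
            if cur_support < min_support:
                continue  # current set is below the threshold value
            rest = [i for i in ITEMS if i not in current]
            for k in range(1, len(rest) + 1):
                for assoc in combinations(rest, k):
                    prob = support(current + assoc) / cur_support
                    if prob >= min_prob:
                        rules.append((current, assoc, cur_support, prob))
    return rules


if __name__ == "__main__":
    for current, assoc, sup, prob in find_rules():
        print(f"{','.join(current)} -> {','.join(assoc)}: "
              f"support={sup}, probability={prob:.3f}")
```

With only five add-ons there are just 31 combinations, so brute-force enumeration is enough here; a pruning scheme such as Apriori would matter only for a much larger item list.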

An Exercise on Clustering

Use the k-means algorithm to cluster the following 9 examples into 3 clusters:

A1=(2,10), A2=(2,6), A3=(8,5), A4=(4,7), A5=(7,5), A6=(6,3), A7=(1,2), A8=(3,10), A9=(6,4). Suppose that the initial seeds (centers of each cluster) are A1, A4 and A7.
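The procedure can be checked with a small plain-Python sketch. It assumes Euclidean distance (the exercise does not name a distance measure) and stops when the cluster assignment no longer changes; the function name kmeans and its parameters are illustrative only.

```python
import math

# The nine examples from the exercise.
POINTS = {
    "A1": (2, 10), "A2": (2, 6), "A3": (8, 5),
    "A4": (4, 7), "A5": (7, 5), "A6": (6, 3),
    "A7": (1, 2), "A8": (3, 10), "A9": (6, 4),
}


def kmeans(points, seeds, max_iter=100):
    """Basic k-means; returns ({point name: cluster index}, [centers])."""
    centers = [points[name] for name in seeds]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center
        # (Euclidean distance is an assumption).
        new_assignment = {
            name: min(range(len(centers)),
                      key=lambda i: math.dist(p, centers[i]))
            for name, p in points.items()
        }
        if new_assignment == assignment:
            break  # no point changed cluster, so the centers are stable
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for i in range(len(centers)):
            members = [points[n] for n, c in assignment.items() if c == i]
            if members:
                centers[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment, centers


if __name__ == "__main__":
    assignment, centers = kmeans(POINTS, ["A1", "A4", "A7"])
    for i, center in enumerate(centers):
        names = [n for n, c in assignment.items() if c == i]
        print(f"cluster {i + 1}: center={center}, members={names}")
```

Working the iterations by hand alongside the program is a useful check on the distance calculations.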

An Exercise on Classification

An insurance organization wants to classify its customers based on the following parameters:

Annual Income -- < Rs. 3 lakhs; Rs. 3 lakhs to < Rs. 8 lakhs; Rs. 8 lakhs to < Rs. 20 lakhs; Rs. 20 lakhs and above.

Premium paid in current policies -- < Rs. 5 K; Rs. 5 K to < Rs. 8 K; Rs. 8 K to < Rs. 15 K; Rs. 15 K and above.

Past Claim -- Yes; No.

A customer is classified as an Excellent customer if his annual income is Rs. 3 lakhs or more, the premium paid is Rs. 8 K or more and he has no past claim.

A customer is classified as a Good customer if (i) he has a past claim but his annual income is Rs. 8 lakhs or more and he has paid a premium of Rs. 15 K or more; or (ii) he has no past claim and the premium paid by him is Rs. 8 K or more.

In all other cases, a customer is classified as a Valuable customer.

Check the consistency of this system and suggest an algorithm to do such classification.
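One possible way to check consistency is to enumerate every combination of the three parameters (4 x 4 x 2 = 32 cases) and see which rules fire for each. The Python sketch below does this; the band indices 0 to 3, the function names, and in particular the choice in classify to let Excellent take precedence over Good when both rules apply are assumptions made for illustration, not part of the exercise.

```python
from itertools import product

# Band index 0..3 follows the order in which the bands are listed above.
INCOME_BANDS = ["< 3 lakhs", "3 to < 8 lakhs", "8 to < 20 lakhs", ">= 20 lakhs"]
PREMIUM_BANDS = ["< 5 K", "5 K to < 8 K", "8 K to < 15 K", ">= 15 K"]


def matching_labels(income, premium, past_claim):
    """Return every label whose stated rule holds for this case."""
    labels = []
    # Excellent: income >= 3 lakhs, premium >= 8 K, no past claim.
    if income >= 1 and premium >= 2 and not past_claim:
        labels.append("Excellent")
    # Good (i): past claim, income >= 8 lakhs, premium >= 15 K.
    good_i = past_claim and income >= 2 and premium >= 3
    # Good (ii): no past claim, premium >= 8 K.
    good_ii = (not past_claim) and premium >= 2
    if good_i or good_ii:
        labels.append("Good")
    return labels


def find_conflicts():
    """List the cases in which more than one rule fires."""
    cases = product(range(4), range(4), (False, True))
    return [case for case in cases if len(matching_labels(*case)) > 1]


def classify(income, premium, past_claim):
    """Assumed tie-break: the first matching label wins, else Valuable."""
    labels = matching_labels(income, premium, past_claim)
    return labels[0] if labels else "Valuable"


if __name__ == "__main__":
    for income, premium, past_claim in find_conflicts():
        print(INCOME_BANDS[income], "|", PREMIUM_BANDS[premium],
              "| past claim:", past_claim,
              "->", matching_labels(income, premium, past_claim))
```

Any case printed by the script is one where the rules as stated assign two labels, so the rule set needs either a precedence order or tighter conditions before it can be used as a classifier.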

An Exercise on Simulation

The probability that a health policy becomes a claim policy is 0.3 if the policy holder is more than 50 years old; otherwise it is 0.2. The policy is designed for people with a particular lifestyle, of whom 20% are above 50 years of age.

The claim amount distribution is linear, varying between Rs. 1000 and Rs. 25000 with probability 0.5, between Rs. 25000 and Rs. 100000 with probability 0.3, and between Rs. 100000 and Rs. 150000 with probability 0.2.

10% of the premium collected is accounted for as administrative cost.

Simulate this for 1000 policies, taking the premium as Rs. 6000 per policy. Experiment by changing the claim distribution pattern and other variables in this model.
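A minimal Monte Carlo sketch of this model in Python follows. It rests on two assumptions that the exercise leaves open: the claim amount is taken to be uniformly distributed within each band, and each policy produces at most one claim. The names BANDS, draw_claim and simulate, and all parameter names, are illustrative.

```python
import random

# (probability, lower bound, upper bound) for each claim-amount band, in Rs.
BANDS = [
    (0.5, 1_000, 25_000),
    (0.3, 25_000, 100_000),
    (0.2, 100_000, 150_000),
]


def draw_claim(rng):
    """Pick a band by its probability, then a uniform amount inside it
    (uniform within a band is an assumption)."""
    u = rng.random()
    cumulative = 0.0
    for prob, low, high in BANDS:
        cumulative += prob
        if u < cumulative:
            return rng.uniform(low, high)
    # Guard against floating-point rounding in the cumulative sum.
    return rng.uniform(BANDS[-1][1], BANDS[-1][2])


def simulate(n_policies=1000, premium=6000, admin_rate=0.10,
             p_over_50=0.20, p_claim_over_50=0.3, p_claim_other=0.2,
             seed=None):
    """Simulate one portfolio; at most one claim per policy is assumed."""
    rng = random.Random(seed)
    claims = 0
    claim_total = 0.0
    for _ in range(n_policies):
        over_50 = rng.random() < p_over_50
        p_claim = p_claim_over_50 if over_50 else p_claim_other
        if rng.random() < p_claim:
            claims += 1
            claim_total += draw_claim(rng)
    collected = n_policies * premium
    admin_cost = admin_rate * collected
    return {
        "premium_collected": collected,
        "admin_cost": admin_cost,
        "claims": claims,
        "claim_total": claim_total,
        "surplus": collected - admin_cost - claim_total,
    }


if __name__ == "__main__":
    for key, value in simulate(seed=2013).items():
        print(f"{key}: {value:,.0f}")
```

Changing BANDS or the keyword arguments of simulate is one way to run the experiments the exercise asks for. As a rough cross-check under the same assumptions, the expected claim cost per policy is (0.2 x 0.3 + 0.8 x 0.2) x (0.5 x 13000 + 0.3 x 62500 + 0.2 x 125000) = 0.22 x 50250 = Rs. 11055, so repeated runs should show claim totals well above the premium collected.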

Monday, December 2, 2013
