4 Churn prediction and retention strategy

In the last chapter, we discussed the choice of the variables with the highest impact on the churn rate. In this chapter, we use those variables to predict whether a customer will decide not to renew their museum card in 2014. After selecting the best-performing prediction model, we evaluate the best strategy, given our budget, for reaching by phone and/or email the customers most likely to churn.

4.1 Prediction models

The goal of the prediction model is to predict which museum card holders will churn in 2014, based on their entrance history, demographic information and the amount paid in 2013. Any prediction model needs to be trained on one set of data, called the training set, and evaluated on new data that it did not see during training, called the test set. This guards against overfitting, that is, against the model learning the training data by heart and failing to generalize to unseen data.

We split the overall dataset into a 60% training set and a 40% test set. As Table 4.1 shows, churners make up about 28% of the test set (7,978 of 28,671 customers), so the dataset is not severely imbalanced. Consequently, no special treatment of the class imbalance was deemed necessary when training the models.

Table 4.1: Composition of test set in terms of churn
churn	n
0	20693
1	7978
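A minimal sketch of the 60/40 split described above follows. It assumes each customer is a record with a `churn` field (1 for churners, 0 otherwise); the helper names and the fixed seed are illustrative, and since the text does not say whether the original split was stratified, this is a plain random split.

```python
import random

def train_test_split(records, train_frac=0.6, seed=42):
    """Randomly split records into a training set and a test set."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = records[:]          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def churn_share(records):
    """Proportion of records labelled as churners (churn == 1)."""
    return sum(r["churn"] for r in records) / len(records)

# Share of churners in the test set of Table 4.1
print(round(7978 / (20693 + 7978), 3))  # 0.278
```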

To choose a model, we compared the performance of some of the most widely used classification algorithms: logistic regression (whose inference analysis was presented in the previous chapter), conditional inference trees, random forest and C5.0 with boosting.

The performance of the models on the test set can be seen in Figure 4.1. The ROC curve is a plot of the true positive rate, or the proportion of churners correctly classified as such, against the false positive rate, or the proportion of non-churners incorrectly classified as churners. The closer the curve is to the top left corner, the better the model. The area under the curve (AUC) is a measure of the overall performance of the model, with a value of 1 indicating a perfect model and a value of 0.5 indicating a model that is no better than random guessing.

Figure 4.1: ROC curves of models
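The ROC curve and its AUC can be computed from predicted churn probabilities with a short sketch like the one below (plain Python with hypothetical helper names, not the code used for Figure 4.1). The classification threshold is swept from the highest score to the lowest, tied scores are processed together, and the AUC is the area under the resulting points by the trapezoidal rule.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs, sweeping the threshold downwards."""
    pos = sum(labels)              # number of churners (label 1)
    neg = len(labels) - pos        # number of non-churners (label 0)
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # all cases tied at this score cross the threshold together
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

print(auc(roc_points([0.9, 0.7, 0.6, 0.2], [1, 0, 1, 0])))  # 0.75
```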

As Figure 4.1 shows, the logistic regression performs best over most of the range, closely followed by the conditional inference tree and the C5.0 model with boosting. Table 4.2 summarises these results numerically in terms of precision, recall, AUC, F1 score and accuracy for each model.

Table 4.2: Performance of the models
model	precision	recall	auc	f1	accuracy
Logistic	0.7832	0.9159	0.7730	0.8444	0.7563
Tree	0.7837	0.9203	0.7723	0.8465	0.7591
C5Boost	0.7869	0.9146	0.7669	0.8459	0.7596
RandomForest	0.7919	0.8792	0.7569	0.8333	0.7461

Beyond the AUC, which was already discussed, accuracy is the simplest measure of performance. The boosting model has the highest accuracy, followed by the conditional inference tree and the logistic regression. However, accuracy can be misleading when the classes are imbalanced, as they are here to some extent. For this reason, the F1 score is also reported: it is the harmonic mean of precision, the share of the model's positive predictions that are correct, and recall, the share of actual positive cases that the model identifies. On this measure, the conditional inference tree has the highest value.
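Precision, recall and F1 depend on which class is treated as positive, so the sketch below (a hypothetical helper, not the code behind Table 4.2) takes the positive label as an explicit argument.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1 and accuracy, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": correct / len(pairs)}

print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Calling it with `positive=0` on the same predictions generally gives different precision, recall and F1 (accuracy does not change), which is why the positive class should be stated alongside these metrics.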

For this project, however, the cost of a false negative is not the same as the cost of a false positive. Failing to identify a churning customer results in a loss of revenue far greater than the cost of contacting a customer who was not going to churn. Consequently, recall is the most important metric.

The conditional inference tree has the highest recall, so it is the model used to define the retention strategy.

4.2 Retention strategy

The goal of the retention strategy is to define a set of rules for deciding which customers to contact and how to contact them. The rules are based on the churn probability predicted by the model, the cost of contacting a customer and the customer’s value to the museum association in 2014. Moreover, the association is constrained by a budget of 5,000€ to spend on contacting customers.

The cost of contacting a customer varies by method. The cost of sending an email is 0.15€, while the cost of calling a customer is 1€. The difference in cost is also reflected in the response rate, which is 15% for emails and 35% for phone calls.

For the value of a customer, the starting point is the amount the customer paid for the subscription in 2013. This assumes that the customer will pay the same amount in 2014 and will be eligible for the same discounts. From this revenue we deduct the amount the association had to pay to the museums (50% of the ordinary ticket price) for every museum visit the customer made in 2013. The result is the customer’s net revenue for the association, projected to 2014.
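As a sketch, the net value described above can be written as a small function. The argument names are hypothetical, and, as in the text, it assumes that the 2013 subscription price and visiting pattern repeat in 2014.

```python
MUSEUM_SHARE = 0.5  # share of the ordinary ticket price paid to the museum per visit

def net_value(amount_paid_2013, visit_ticket_prices):
    """Projected 2014 net revenue of one card holder for the association.

    amount_paid_2013: subscription price paid in 2013 (assumed unchanged in 2014).
    visit_ticket_prices: ordinary ticket price of each museum visit made in 2013.
    """
    return amount_paid_2013 - MUSEUM_SHARE * sum(visit_ticket_prices)

print(net_value(50.0, [10.0, 12.0, 8.0]))  # 35.0
```

Note that the result can be negative for card holders who visit museums very often.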

4.2.1 Phone-only and email-only strategies

Initially, a simpler approach was considered: customers were sorted in descending order of churn probability, and the association would contact them through a single channel, either by phone only or by email only.

The customers most likely to churn are contacted first. If a contacted customer is a churner, the association gains the response rate (35% for phone calls, 15% for emails) multiplied by the customer’s net value, that is, the subscription revenue minus 50% of the admission tickets, and pays the cost of contacting the customer. If the contacted customer is not a churner, the campaign generates no revenue, since the customer would have renewed their subscription anyway, but the association still pays the contact cost. The cumulative profit is obtained by summing the profit of each customer contacted, in order. Figure 4.2 shows the two extreme cases.
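The single-channel profit calculation just described can be sketched as follows. Costs and response rates are those given in the text; the input format, a list of (churn probability, is churner, net value) tuples, and the function name are assumptions for illustration.

```python
from itertools import accumulate

# (cost per contact in €, response rate) as stated in the text
CHANNELS = {"phone": (1.00, 0.35), "email": (0.15, 0.15)}

def cumulative_profit(customers, channel):
    """Cumulative profit of contacting customers one by one through a single channel.

    customers: list of (p_churn, is_churner, net_value) tuples.
    """
    cost, response_rate = CHANNELS[channel]
    # most likely churners are contacted first
    ranked = sorted(customers, key=lambda c: c[0], reverse=True)
    # churners yield response rate x net value; everyone costs the contact fee
    gains = [(response_rate * value if churner else 0.0) - cost
             for _, churner, value in ranked]
    return list(accumulate(gains))

customers = [(0.9, True, 100.0), (0.2, False, 80.0), (0.6, True, 40.0)]
print(cumulative_profit(customers, "phone"))
```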

Moving right along each curve in Figure 4.2, more and more customers are contacted and the cumulative profit rises, but the additional customers are increasingly non-churners. Each new contact therefore brings in less and less revenue; this is why both curves first flatten out and then decline, once the cost of contacting customers exceeds the revenue generated by the churners among them. Because of its higher cost per contact, the phone-only strategy declines more steeply than the email-only strategy.

The chart also shows in red the area within the 5,000€ budget. All customers could be reached by email within the budget, but only about 17% of them by phone. Although the phone-only strategy reaches a higher cumulative profit when there is no budget constraint, it is not the best strategy in this case: within the budget, the maximum cumulative profit of the email-only strategy is slightly higher than that of the phone-only strategy.

Figure 4.2: Phone-only and email-only cumulative profits with area under budget

Since the customers are sorted by churn probability, it is straightforward to find the score cutoff at which the association would exhaust its budget. Tables 4.3 and 4.4 summarise the results, with the cutoffs highlighted in pink, and report the cumulative number of customers contacted and of churners, together with the cumulative cost, revenue and profit. With a phone cutoff between 45% and 50%, the association could contact between 4,371 and 5,500 customers, generating a cumulative profit between 20,777€ and 23,522€. The email cutoff with the highest profit within the budget lies between 5% and 10%, with a profit between 24,923€ and 24,993€. As stated above, this would be the better of the two strategies in this simpler case. However, it would not be efficient, since more than 1,000€ of the budget would remain unspent.

Table 4.3: Phone-only strategy cutoffs
scoreBin	contacted	churners	cum_cost (€)	cum_revenue (€)	cum_profit (€)
0.9-0.8	360	294	360	2758	2398
0.8-0.7	824	607	824	6303	5479
0.7-0.6	2114	1464	2114	13553	11439
0.6-0.5	4371	2721	4371	25148	20777
0.5-0.45	5500	3250	5500	29022	23522
0.45-0.4	6640	3759	6640	33802	27162
0.4-0.35	9730	4903	9730	44296	34566
0.35-0.3	11511	5495	11511	48878	37367
0.3-0.25	13143	5985	13143	53625	40482
0.25-0.2	16223	6694	16223	58221	41998
0.2-0.15	18333	7107	18333	61713	43380
0.15-0.1	22097	7617	22097	65887	43790
0.1-0.05	26072	7907	26072	67441	41369
0.05-0.0	28671	7978	28671	67577	38906

Table 4.4: Email-only strategy cutoffs
scoreBin	contacted	churners	cum_cost (€)	cum_revenue (€)	cum_profit (€)
0.9-0.8	360	294	54	1182	1128
0.8-0.7	824	607	124	2701	2578
0.7-0.6	2114	1464	317	5809	5492
0.6-0.5	4371	2721	656	10778	10122
0.5-0.45	5500	3250	825	12438	11613
0.45-0.4	6640	3759	996	14487	13491
0.4-0.35	9730	4903	1460	18984	17525
0.35-0.3	11511	5495	1727	20948	19221
0.3-0.25	13143	5985	1971	22982	21011
0.25-0.2	16223	6694	2433	24952	22518
0.2-0.15	18333	7107	2750	26448	23698
0.15-0.1	22097	7617	3315	28237	24923
0.1-0.05	26072	7907	3911	28903	24993
0.05-0.0	28671	7978	4301	28962	24661
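Reading the budget-constrained cutoff off Tables 4.3 and 4.4 amounts to keeping the score bins whose cumulative cost fits in the budget and picking the most profitable one. A minimal sketch using a few rows copied from the tables (the function name is hypothetical):

```python
def best_bin_within_budget(rows, budget=5000):
    """Most profitable score bin whose cumulative cost stays within the budget.

    rows: (score_bin, cum_cost, cum_profit) tuples, as in Tables 4.3 and 4.4.
    """
    affordable = [row for row in rows if row[1] <= budget]
    return max(affordable, key=lambda row: row[2]) if affordable else None

phone = [("0.7-0.6", 2114, 11439), ("0.6-0.5", 4371, 20777), ("0.5-0.45", 5500, 23522)]
email = [("0.15-0.1", 3315, 24923), ("0.1-0.05", 3911, 24993), ("0.05-0.0", 4301, 24661)]
print(best_bin_within_budget(phone))  # ('0.6-0.5', 4371, 20777)
print(best_bin_within_budget(email))  # ('0.1-0.05', 3911, 24993)
```

For the phone strategy the budget runs out inside the next bin (0.5-0.45), which is why the cutoff is reported as a range.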

The limitations of these two extreme cases are evident. First, as just noted, the budget is not used efficiently. Second, customers are sorted only by their churn probability, whereas weighting this probability by the customer’s net value would be more appropriate, and combining phone calls and emails could be more rewarding. This is why a more sophisticated strategy was devised.

4.3 The combined strategy

The strategy that we designed, and that we advise the association to follow, works as follows.

First, the customers are now sorted by the product of their probability of churn and their net value. This means that the customers who are most likely to churn and are most valuable to the association are contacted first. The customers who do not bring any value are the last ones to be contacted, if the budget allows it.
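Sorting by the product of churn probability and net value only requires a different sort key. A sketch with hypothetical field names:

```python
def rank_by_expected_value(customers):
    """Sort customers by churn probability times net value, highest first.

    customers: dicts with the hypothetical keys "id", "p_churn" and "net_value".
    Customers with zero or negative net value rank below everyone with a positive score.
    """
    return sorted(customers, key=lambda c: c["p_churn"] * c["net_value"], reverse=True)

customers = [
    {"id": "a", "p_churn": 0.9, "net_value": 10.0},  # likely churner, low value
    {"id": "b", "p_churn": 0.3, "net_value": 50.0},  # less likely, but valuable
    {"id": "c", "p_churn": 0.5, "net_value": 0.0},   # brings no value
]
print([c["id"] for c in rank_by_expected_value(customers)])  # ['b', 'a', 'c']
```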

Second, the association uses a combination of phone calls and emails to reach the customers most likely to leave its service. From an economic point of view, we think the best foundation for this strategy is to call the most valuable customers first and then switch to emails for the less valuable ones. This ordering still leaves open the option of contacting all customers, or a large share of them, by email if that makes financial sense.

Setting discretionary cutoffs for phone calls and emails, and for the number of people to contact, was not deemed appropriate, because such cutoffs would rest on the analyst’s intuition rather than on data and might not apply to the datasets of future years. Data-driven decision-making is crucial for a sustainable strategy that can be adapted as circumstances change: it keeps the approach objective and relevant as customer behavior and business conditions evolve.

This is why we chose a computational approach and developed an algorithm. For each share of contacted customers, the algorithm tries combinations of phone and email percentages and keeps those that maximize the cumulative profit within the budget constraint. The top five results are shown in Table 4.5.

Table 4.5: Top 5 optimal combinations
perc_contacted	phone_perc	email_perc	cum_cost (€)	cum_profit (€)
0.41	0.19	0.81	4987.471	31039.71
0.34	0.24	0.76	4975.438	31009.04
0.30	0.27	0.73	4962.230	30960.86
0.37	0.22	0.78	4956.359	30952.97
0.38	0.21	0.79	4968.829	30941.49
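A minimal sketch of such a search in Python follows. It is illustrative, not the association’s actual code, and it assumes that customers are already ranked by churn probability times net value, that the first share of the contacted customers is called and the rest emailed, and that profit is computed as in the single-channel case (response rate times net value for churners, minus contact costs). The 1% grid mirrors the percentile iterations described in the text; the prefix sums are an implementation choice that avoids re-scanning the customer list for every combination.

```python
from itertools import accumulate

PHONE_COST, PHONE_RATE = 1.00, 0.35  # € per call, response rate (from the text)
EMAIL_COST, EMAIL_RATE = 0.15, 0.15  # € per email, response rate (from the text)

def grid_search(ranked, budget=5000.0, steps=100, top=5):
    """Best (share contacted, phone share) plans whose cost stays within the budget.

    ranked: (is_churner, net_value) tuples, already sorted by p_churn * net_value.
    The first `phone share` of the contacted customers are called, the rest emailed.
    """
    # prefix[k] = total net value of the churners among the first k customers
    prefix = [0.0] + list(accumulate(v if churner else 0.0 for churner, v in ranked))
    n = len(ranked)
    results = []
    for i in range(1, steps + 1):
        n_contacted = round(n * i / steps)
        for j in range(0, steps + 1):
            n_phone = round(n_contacted * j / steps)
            cost = n_phone * PHONE_COST + (n_contacted - n_phone) * EMAIL_COST
            if cost > budget:
                continue  # plan not affordable
            revenue = (PHONE_RATE * prefix[n_phone]
                       + EMAIL_RATE * (prefix[n_contacted] - prefix[n_phone]))
            results.append((revenue - cost, i / steps, j / steps, cost))
    results.sort(reverse=True)  # highest profit first
    return results[:top]
```

Each returned tuple holds the profit, the share contacted, the phone share and the cost, which correspond to the columns of Table 4.5.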

First, none of the top combinations in Table 4.5 relies exclusively on phone calls or on emails. Each combines the two, with between 19% and 27% of the contacted customers reached by phone and the rest by email.

The best combination lets the association contact 41% of the customers, 19% of them by phone and 81% by email. Its cumulative profit is about 31,040€, roughly 6,000€ more than the best email-only result and more than 7,000€ above the phone-only strategy.

Note that the budget is not perfectly exhausted: the cumulative cost is 4,987€ against a budget of 5,000€. This gap is not a deliberate “choice” of the algorithm; it arises because the algorithm iterates over percentiles, so each step adds many customers at once. The algorithm stops where it does because the next step would exceed the budget. The unused amount is therefore a matter of precision rather than a decision not to spend the whole budget, and the results clearly show the algorithm using as much of the budget as possible.

Figure 4.3: Optimal strategy

Figure 4.3 shows the cumulative profit curve of the optimal strategy, with the phone, email and out-of-budget segments depicted in different colors.

These findings show that combining the two contact campaigns, phone and email, outperforms both extreme cases. They also show that the association can contact only the customers who are more likely to churn and more valuable, while leaving out those who are less likely to churn and less valuable. This is a key aspect of the strategy, since it saves the association money and time while still maximizing profit.