Note: For this blog I’ve tried adding in R function definitions as I go. If you find them useful, let me know. If they’re ugly then also let me know and I’ll just provide links to a repo with the functions in it.

Probabilistic CLV Model

Using the mighty power of Probability Theory we can now formulate a (relatively simple) probabilistic model for CLV.


  • \(T\) = random variable indicating cancellation time
  • \(r\) = retention rate

RECALL: The assumptions for an SRM are still being applied here. The retention rate (\(r\)) must be constant over time and across customers. Also the event that a customer cancels in period \(t\) is independent of the event that a customer cancels in any other period.

Under these assumptions, \(T\) has a geometric distribution, which has the following probability mass function (PMF):

\[f(t) = P(T = t) = r^{t-1}(1-r) \tag{3}\]

The rest of the calculations in this section are based on the fact that, under the current assumptions, time until cancellation (\(T\)) follows a geometric distribution.

“But what does this PMF mean for our model?” Good question. Here are the answers:

  • Customers must be retained for \(t-1\) periods, then terminate their relationship with the company
  • Because termination is assumed to be independent, we can multiply retention probabilities such that:
    • \(r^{t-1}\) = probability of retaining customer for \(t-1\) periods
    • \((1-r)\) = probability of customer terminating relationship in time period \(t\)

So what can we do with this?

Assuming this distribution, let \(t\) represent a realisation of a termination time \(T\). So this PMF can be used to calculate the probability that a customer will terminate their relationship in a specific time period \(t\) (for a known retention rate \(r\)).

The following shows the probability that a customer will terminate their relationship in the 5th time period, given a retention rate of 0.8:

# P(T = t): retained for t - 1 periods, then cancels in period t
geom_dist_pmf <- function(r, t){
  r^(t-1) * (1-r)
}

geom_dist_pmf(0.8, 5)
## [1] 0.08192

So there is an 8.192% chance that the customer will make 4 payments, and then terminate their relationship in the 5th time period.
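
As a sanity check, base R already ships a geometric PMF, `dgeom()`. The sketch below assumes the mapping between the two parameterisations is `x = t - 1` and `prob = 1 - r`, because `dgeom()` counts the number of failures (here, retained periods) before the first success (here, the cancellation). The names `by_hand` and `built_in` are just for illustration.

```r
# PMF from equation (3): retained for t - 1 periods, then cancels in period t
geom_dist_pmf <- function(r, t) {
  r^(t - 1) * (1 - r)
}

r <- 0.8
t <- 1:10

# dgeom() counts the retained periods before the cancellation,
# so we pass x = t - 1 and prob = 1 - r
by_hand  <- geom_dist_pmf(r, t)
built_in <- dgeom(t - 1, prob = 1 - r)

# TRUE if the two vectors agree up to floating point tolerance
all.equal(by_hand, built_in)
```

If the two agree, you can use whichever form you find easier to read.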

This is useful, but it may be of more interest to know a customer's probability of continuing a relationship until a certain time period. This can also be thought of as the customer surviving until this time period.

Survival Function

There will be a lot of definitions here, and then an example that makes them much clearer. The survival function is defined as:

\[S(t) = P(T \ge t) = r^{t-1} \tag{4}\]

There are a few ways to interpret this equation:

  • The probability that a customer terminates at time \(t\) or later
  • The probability that a customer survives the first \(t-1\) time periods
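
If you're wondering where equation (4) comes from, here is a quick sketch, assuming \(0 < r < 1\): sum the PMF in equation (3) over every period from \(t\) onwards, then apply the geometric series formula.

\[S(t) = \sum_{k=t}^{\infty} r^{k-1}(1-r) = (1-r)\,r^{t-1}\sum_{j=0}^{\infty} r^{j} = (1-r)\,r^{t-1}\cdot\frac{1}{1-r} = r^{t-1}\]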

We can also find quantiles of \(T\). Let \(P_\alpha\) represent the \(\alpha\) quantile of the distribution of \(T\). With this we can find the time period by which a proportion \(\alpha\) of the initial customers will have terminated their relationship. In mathematical notation, we are looking for the value \(P_\alpha\) that satisfies:

\[P(T \le P_\alpha) = \alpha\]

\[P(T \ge P_\alpha) = 1 - \alpha\]

\(P_\alpha\) can be found by solving:

\[S(P_\alpha) = P(T \ge P_\alpha) = r^{P_\alpha - 1} = 1 - \alpha \tag{5}\]

Which gives

\[P_\alpha = 1 + \frac{\log(1-\alpha)}{\log(r)} \tag{6}\]

Then the expected value of a geometric distribution can be found with:

\[E(T) = \frac{1}{1-r} \tag{7}\]

These equations can be understood more intuitively with an example.

EX 3:

For a retention rate of 80%, what is the probability of a customer terminating their relationship with the company in the 5th time period?

geom_dist_pmf(0.8, 5)
## [1] 0.08192

So this gives the probability that a customer will terminate their relationship at exactly \(t = 5\).

With this same retention rate (80%), what is the probability that a customer will still have a relationship with the company at time \(t = 5\) or later?

# S(t) = P(T >= t): retained through the first t - 1 periods
survival <- function(r, t){
  r ^ (t-1)
}

survival(0.8, 5)
## [1] 0.4096

This gives the probability that the customer still has a relationship with the company at the start of period \(t = 5\), i.e. that they were retained through the first 4 periods.

At what time period will 10% of the initial customers have terminated their relationship with the company? 50%? 90%?

# alpha quantile of T, from equation (6)
survival_quantiles <- function(alpha, r){
  1 + (log(1-alpha) / log(r))
}

survival_quantiles(0.1, 0.8)
## [1] 1.472165
survival_quantiles(0.5, 0.8)
## [1] 4.106284
survival_quantiles(0.9, 0.8)
## [1] 11.31885

So 10% of customers will have terminated their relationship after \(1.47\) time periods, 50% will be gone by \(t = 4.11\), and 90% will be gone by \(t = 11.32\).
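
The quantiles above are fractional, but customers can only cancel in whole periods. One way to read them (my interpretation, not something the formulas state) is to round up: `ceiling()` of the quantile gives the first whole period \(t\) for which at least \(\alpha\) of customers have left before period \(t\) begins. The columns `period` and `share_gone` are names made up for this sketch.

```r
survival <- function(r, t) r^(t - 1)
survival_quantiles <- function(alpha, r) 1 + log(1 - alpha) / log(r)

r <- 0.8
alpha <- c(0.1, 0.5, 0.9)

# first whole period by whose start at least alpha of customers have left
period <- ceiling(survival_quantiles(alpha, r))

# share_gone = P(T < period) = 1 - S(period)
data.frame(alpha = alpha,
           period = period,
           share_gone = 1 - survival(r, period))
```

In each row `share_gone` should be at least `alpha`, while one period earlier it would still fall short.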

What is the expected (mean) time until termination for customers in a company with an 80% retention rate?

geom_expected <- function(r){
  1 / (1-r)
}

geom_expected(0.8)
## [1] 5

So this company can expect 4 payments from each of their customers, and then they will terminate their relationship in the 5th time period.
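
If equation (7) feels like it came out of nowhere, you can check it numerically. The sketch below computes the mean the long way, as a probability-weighted sum of termination times, and compares it with the closed form. The cut-off of 1000 periods is an arbitrary choice of mine; for \(r = 0.8\) the terms beyond it are negligible.

```r
geom_dist_pmf <- function(r, t) r^(t - 1) * (1 - r)

r <- 0.8
t <- 1:1000  # truncation point (an assumption; the tail is negligible here)

mean_by_sum <- sum(t * geom_dist_pmf(r, t))  # probability-weighted average of T
mean_closed <- 1 / (1 - r)                   # closed form from equation (7)

c(by_sum = mean_by_sum, closed_form = mean_closed)
```

The two numbers should agree up to floating point error.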

What will be the expected time until termination if the retention rate is raised to 90%?

geom_expected(0.9)
## [1] 10

If the retention rate were 90%, the company could expect 9 payments from each customer before they leave. This is 5 more payments than if the retention rate were 80%, so srsly just tell your business all they have to do is retain more customers.

This highlights an important use case of the survival function - understanding the effect of changing retention rates.

Then CLV is still calculated as the sum of present values of future cash flows. But the annuity formulas from previous sections cannot be used here. \(T\) is now considered a random variable, which means that it has a distribution. We will use the expected value of CLV as a summary statistic to understand this distribution. The formulas for expected CLV given below cover the two possible ways in which contract payments can be made:

  1. Payment occurs at the end of payment period

\[E(CLV) = \frac{mr}{1+d-r} \tag{8}\]

  2. Payment occurs at the beginning of payment period

\[E(CLV) = \frac{m(1+d)}{1+d-r} \tag{9}\]

We will need these later, so I’ll define the R functions here:

# expected clv for geometric T, cash flow at end of period
expected_clv_ordinary <- function(r, m, d){
  m * r / (1 + d - r)
}

# expected clv for geometric T, cash flow at beginning of period
expected_clv_immediate <- function(r, m, d){
  m * (1 + d) / (1 + d - r)
}
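
To see what these two functions are doing, here is a rough numerical check. It assumes each closed form is the sum of a discounted, retention-weighted payment stream, where the payment at time \(t\) is received with probability \(r^t\) and discounted by \((1+d)^t\). The stream starts at \(t = 1\) when cash arrives at the end of the period and at \(t = 0\) when it arrives at the beginning. The values of `r`, `m` and `d` and the 2000-period cut-off are illustrative choices.

```r
expected_clv_ordinary  <- function(r, m, d) m * r / (1 + d - r)
expected_clv_immediate <- function(r, m, d) m * (1 + d) / (1 + d - r)

r <- 0.8; m <- 25; d <- 0.01
x <- r / (1 + d)  # per-period factor: retention probability times discounting

# cash flow at end of period: payments at t = 1, 2, ...
series_ordinary  <- sum(m * x^(1:2000))
# cash flow at beginning of period: payments at t = 0, 1, 2, ...
series_immediate <- sum(m * x^(0:2000))

c(series = series_ordinary,  closed_form = expected_clv_ordinary(r, m, d))
c(series = series_immediate, closed_form = expected_clv_immediate(r, m, d))
```

Each pair should match up to floating point error, and the two cases differ by exactly the one certain payment of `m` at time 0.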

EX 4:

Find the expected time until termination and expected CLV of a single customer assuming a monthly discount rate of 1%, monthly cash flows of $25 at the beginning of each month, and retention rates of 70%, 75%,… 95%, and 98%. Plot CLV against the retention rate.

library(tidyverse)

# expected termination time and expected CLV for each retention rate
clv_ex <- tibble(r = c(0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.98))
clv_ex <- clv_ex %>%
  mutate(`E(T)` = geom_expected(r),
         `E(CLV)` = expected_clv_immediate(r, 25, 0.01))
clv_ex
## # A tibble: 7 x 3
##       r `E(T)` `E(CLV)`
##   <dbl>  <dbl>    <dbl>
## 1  0.7    3.33     81.5
## 2  0.75   4        97.1
## 3  0.8    5.      120. 
## 4  0.85   6.67    158. 
## 5  0.9   10.      230. 
## 6  0.95  20.0     421. 
## 7  0.98  50.0     842.
# plot expected CLV against the retention rate
clv_ex %>%
  ggplot(aes(x = r, y = `E(CLV)`)) +
  geom_line() +
  labs(title = 'Expected CLV with Changing Retention Rates',
       x = 'Retention Rate (r)')

This plot shows the effects of an increasing retention rate. You can see that marginal increases have a much higher impact the closer the retention rate is to 100%. If you know the retention rate of your business, and can predict how much it will cost to increase that retention rate, the analysis above can quantify the expected returns.
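
To put rough numbers on that, the sketch below computes the extra expected CLV from one additional percentage point of retention at a few starting rates, using the same $25 monthly cash flow and 1% discount rate as the example. The starting rates and the one-point step are arbitrary choices for illustration.

```r
expected_clv_immediate <- function(r, m, d) m * (1 + d) / (1 + d - r)

r <- c(0.70, 0.80, 0.90, 0.95)

# extra expected CLV from one more percentage point of retention
gain <- expected_clv_immediate(r + 0.01, 25, 0.01) -
  expected_clv_immediate(r, 25, 0.01)

data.frame(r = r, gain_per_point = round(gain, 2))
```

The gain from each extra point grows as `r` increases, which is the steepening you see on the right of the plot.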

Here comes another example.

EX 5: (from Berger and Nasr, 1998, Case 1)

A company pays, on average, $50 per customer yearly on promotional expenses. The yearly retention rate is 75%. The yearly gross contribution per customer is expected to amount to $260. An appropriate annual discount rate is 20%. Find CLV, assuming that you have just acquired a customer, receive the first payment at time 0, and will incur marketing costs at the end of each year.

This has a bit of a funky solution. We’re going to use the expected number of customers in two ways. The first will be to calculate their expected payments to the company, which start at time 0, so the beginning-of-period formula applies. The second will be to calculate their expected cost, which is incurred at the end of each year, so the end-of-period formula applies. Both streams are discounted back to the present as per usual, and the costs are then subtracted from the payments to give a profit contribution.

Luckily we’ve already defined some functions that make this super easy to calculate:

expected_clv_immediate(0.75, 260, 0.2) - expected_clv_ordinary(0.75, 50, 0.2)
## [1] 610

So the expected lifetime value of this customer that you’ve just acquired is $610.
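
If you want to see the two pieces separately, here is the same calculation split into its components. The names `revenue` and `costs` are just labels for this sketch.

```r
expected_clv_ordinary  <- function(r, m, d) m * r / (1 + d - r)
expected_clv_immediate <- function(r, m, d) m * (1 + d) / (1 + d - r)

# gross contribution of $260 a year, first payment at time 0
revenue <- expected_clv_immediate(0.75, 260, 0.2)
# promotional expenses of $50 a year, incurred at the end of each year
costs <- expected_clv_ordinary(0.75, 50, 0.2)

round(c(revenue = revenue, costs = costs, clv = revenue - costs), 2)
```

The expected discounted contributions come to roughly $693 and the expected discounted promotional costs to roughly $83, which leaves the $610 above.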

So what’s next?

So far we’ve only considered cases where we already know the retention rate \(r\). The next blog will detail ways in which this retention rate can be calculated from transaction data.