# Datascience

• This blog post was inspired by a conversation that I had recently with some people who were interested in hiring a data scientist. This is a company of very switched-on people, but they’ve never hired a data scientist before. So there needed to be a lot of communication. A lot. Of communication. Here’s a list of the ‘soft skill’ lessons that I learned from these conversations: Always trust that you will come to a mutual understanding; it will just take time. Assume everyone is equally intelligent, but also assume everyone has been given a different definition of every technical term that is mentioned. Assume everyone is using the wrong word for everything all the time. Don’t waste time being frustrated; take every agreement as a victory and keep moving forward. People absorb ideas and explanations better in writing: listening and thinking at the same time is hard. You and the client are probably using the same words to describe different things and confusing each other; ask questions to clarify. If people aren’t understanding you, you need to find a better way to explain yourself. Everything will always be much clearer tomorrow. But on top of the regular… ‘challenges’ of communication, a much more specific misunderstanding happened that I thought was worth writing about.

• So we have seen the importance of retention rates, and how they affect expected CLV. So the next question is: how do we estimate the retention rates? So far we’ve been assuming that $$r$$ is already known. Warning: This gets real math, real fast, but it has a nice elegant result at the end. Also remember that we’re still operating with the assumptions that customers pay for a subscription, and if they stop paying (i.

• Note: For this blog I’ve tried adding in R function definitions as I go. If you find them useful, let me know. If they’re ugly then also let me know and I’ll just provide links to a repo with the functions in it. Probabilistic CLV Model: Using the mighty power of Probability Theory we can now formulate a (relatively simple) probabilistic model for CLV. Let $$T$$ = random variable indicating cancellation time and $$r$$ = retention rate. RECALL: The assumptions for an SRM are still being applied here.
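
As a rough illustration of what such a model can look like (this is an assumption for illustration, not necessarily the formulation used in the post): if a customer is retained independently in each period with the same probability $$r$$, and $$T$$ counts the period in which they cancel, then $$T$$ follows a geometric distribution:

$$P(T = t) = r^{t-1}(1 - r), \quad t = 1, 2, \dots$$

Under that same assumption the expected customer lifetime, in periods, is

$$E[T] = \sum_{t=1}^{\infty} t \, r^{t-1}(1 - r) = \frac{1}{1 - r}$$

so, for example, $$r = 0.9$$ would give an expected lifetime of 10 periods.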

• The simple retention model (SRM) is applicable in situations where a customer enters into a contract. The contract specifies that the customer must make regular payments at equal time intervals (most commonly monthly). The number of payments to be made is also set at the start of the contract (e.g. 12 monthly payments in a year-long contract). Each payment must be of the same amount. In this model, customers are not allowed to cancel the contract before it expires.
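
Because every payment is fixed and cancellation is not allowed, the value of a contract under these assumptions is just a sum of equal payments. Here is a minimal Python sketch of that arithmetic. The function name `srm_contract_value` and the optional `discount_rate` parameter are hypothetical and not taken from the post; discounting is an added assumption, and it is off by default.

```python
def srm_contract_value(payment, n_payments, discount_rate=0.0):
    """Value of a fixed contract: n equal payments, no early cancellation.

    discount_rate is a hypothetical per-period rate (an assumption, not
    from the post). Payments are assumed to fall at the end of periods
    1..n_payments. With the default of 0.0 the result is simply
    payment * n_payments.
    """
    if payment < 0 or n_payments < 0:
        raise ValueError("payment and n_payments must be non-negative")
    if discount_rate <= -1:
        raise ValueError("discount_rate must be greater than -1")
    # Sum each payment, discounted back from the period in which it is made.
    return sum(
        payment / (1 + discount_rate) ** t
        for t in range(1, n_payments + 1)
    )


# Example: a year-long contract with 12 monthly payments of 50.
total = srm_contract_value(50.0, 12)
```

With no discounting, the 12 payments of 50 add up to 600, which is the whole contract value because the customer cannot leave early.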

• In the past, companies were not particularly focused on considering their customers as human beings with emotions and the free will to make choices. Instead they were just viewed as a series of transactions: discrete-time cash flow events that either generated a profit or a loss for the company. So the focus of a business was just to decrease the cost to serve these customers and increase the profit margins of their products.