Cohort Analysis with Python
In simple terms, a cohort is a group of subjects that share a defining feature. We can observe a cohort's behaviour over time and compare it with other cohorts. This article walks through a cohort analysis in Python.
What is Cohort Analysis?
A cohort is a group of people, or subjects of study, that share something in common within a specific period. In a business problem, for example, a cohort is typically a group of customers or users, and the quantity tracked for it might be the number of purchases made.
Cohort analysis makes it easier to analyze user behaviour and market trends without looking at every customer's data individually.
Why use cohort analysis?
Cohort analysis matters for the growth of any kind of business because it organizes information in a specific, logical way. Its main strength is that it helps a company answer targeted questions by examining only the most relevant data. The advantages for a business are:
- It helps to understand user behaviour that might affect the business in various ways
- It can help to analyze the customer churn rate
- It helps to identify the points where customer engagement can be increased
Types of cohorts
There are mainly three kinds of cohorts:
- Time Cohort
- Behaviour Cohort
- Size Cohort
Time cohorts are customers who signed up for a product or service during a specific period. This sort of analysis can show how these customers behave towards the company’s product or service over time.
Behavioural cohorts are customers who have purchased a product or subscribed to a service in the past. They group customers according to the product or service they signed up for. Every customer has different needs, and the services they choose vary accordingly. Understanding this behaviour helps the company design products and services tailored to the needs of each segment.
Size cohorts group customers by size, that is, by how much they purchase from the company. This categorization can inform acquisition decisions, help the firm manage what it spends on its products, and keep supply in line with market demand.
Cohort analysis project in Python
The following section is a self-paced data science tutorial. It starts by importing the required libraries and the dataset.
pandas, numpy, and datetime are imported for data cleaning, matplotlib and seaborn for visualization, and scikit-learn for machine learning algorithms.
The dataset used here is the Online Retail dataset. We load it and then view its first 10 rows.
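The loading step might look like the sketch below. The column names are assumed from the public UCI Online Retail dataset, the three rows are made-up stand-ins for the real file, and `load_retail` is a hypothetical helper name. In practice you would pass the path to your own copy of the data instead of the in-memory sample.

```python
import io

import pandas as pd

# Made-up sample rows; column names are assumed from the UCI Online Retail dataset.
SAMPLE_CSV = """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
1001,A100,MUG,6,2010-12-01 08:26:00,2.55,17850,United Kingdom
1002,B200,LAMP,2,2010-12-01 09:01:00,4.25,13047,United Kingdom
1003,A100,MUG,3,2011-01-04 10:00:00,2.55,12583,France
"""


def load_retail(source):
    """Read the retail data from a path or file-like object, parsing invoice dates."""
    return pd.read_csv(source, parse_dates=["InvoiceDate"])


df = load_retail(io.StringIO(SAMPLE_CSV))

# head(10) returns at most the first 10 rows (here only 3 exist)
preview = df.head(10)
print(preview)
```

Parsing `InvoiceDate` at load time saves a separate conversion later, when the invoice month is derived.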
Next, we inspect the column types and non-null counts of the dataset with the .info() method.
We then check the dataset for null (missing) values.
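These two checks can be sketched as follows. The small frame is invented for illustration, and its column names are assumptions rather than the article's actual data.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the retail data
df = pd.DataFrame({
    "InvoiceNo": ["1001", "1002", "1003", "1004"],
    "Description": ["MUG", None, "LAMP", "MUG"],
    "Quantity": [6, 2, 1, 3],
    "CustomerID": [17850.0, np.nan, 13047.0, np.nan],
})

# Prints each column's dtype and non-null count
df.info()

# Number of missing values per column
missing_counts = df.isnull().sum()

# Share of missing values per column, as a fraction of all rows
missing_share = df.isnull().mean().round(2)

print(missing_counts)
print(missing_share)
```

Looking at the share as well as the count helps decide whether a column with gaps, such as the customer identifier, can simply have its incomplete rows dropped.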
Once that is done, we clean the data and remove the duplicated rows.
Note that the minimum unit price is zero and the minimum quantity is negative.
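One possible cleaning pass is sketched below. Dropping rows without a customer ID and filtering out non-positive quantities and prices are assumptions about how to handle the issues noted above, not necessarily the exact steps used in the original notebook. The data and the `clean_retail` helper are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows: one exact duplicate, one zero price, one negative quantity,
# and one missing customer ID
df = pd.DataFrame({
    "InvoiceNo": ["1001", "1001", "1002", "C1003", "1004", "1005"],
    "Quantity": [6, 6, 2, -1, 3, 4],
    "UnitPrice": [2.55, 2.55, 0.0, 4.25, 1.50, 3.00],
    "CustomerID": [17850.0, 17850.0, 13047.0, 13047.0, np.nan, 12583.0],
})

# Summary statistics reveal the zero minimum price and negative minimum quantity
print(df[["Quantity", "UnitPrice"]].describe())


def clean_retail(frame):
    """Drop duplicate rows, rows without a customer, and non-positive quantities or prices."""
    cleaned = frame.drop_duplicates()
    cleaned = cleaned.dropna(subset=["CustomerID"])
    # Assumption: rows with Quantity <= 0 or UnitPrice <= 0 are not regular purchases
    cleaned = cleaned[(cleaned["Quantity"] > 0) & (cleaned["UnitPrice"] > 0)]
    return cleaned.reset_index(drop=True)


clean = clean_retail(df)
print(clean)
```

Rows without a customer ID cannot be assigned to any cohort, which is why they are removed before the labels are built.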
Once the data cleaning is over, we can start the cohort analysis, but a few labels need to be created first:
- Billing period: A string representation of the year and month of a single transaction/invoice.
- Cohort group: A string representation of the year and month of a customer’s first purchase. This label is common to all invoices for a particular customer.
- Cohort period/Cohort index: An integer representation of a customer’s stage in their “lifetime”. The number represents the number of months since the first purchase.
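The three labels above can be derived roughly as follows. The column names `InvoiceMonth`, `CohortMonth`, and `CohortIndex` and the helper `add_cohort_labels` are illustrative choices, and the invoices are made up. Months are kept as timestamps (first day of the month) rather than strings so that month arithmetic stays simple; `dt.strftime("%Y-%m")` gives the string form when needed.

```python
import pandas as pd

# Made-up invoices for three customers
df = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 3],
    "InvoiceDate": pd.to_datetime([
        "2010-12-05", "2011-01-20", "2011-03-02",
        "2011-01-15", "2011-02-10", "2011-03-30",
    ]),
})


def add_cohort_labels(frame):
    """Attach billing period, cohort group, and cohort index to every invoice row."""
    out = frame.copy()
    # Billing period: the invoice date truncated to the first day of its month
    out["InvoiceMonth"] = out["InvoiceDate"].dt.to_period("M").dt.to_timestamp()
    # Cohort group: month of the customer's first purchase, shared by all their invoices
    out["CohortMonth"] = out.groupby("CustomerID")["InvoiceMonth"].transform("min")
    # Cohort index: whole months elapsed since the first purchase (0 = first month)
    year_diff = out["InvoiceMonth"].dt.year - out["CohortMonth"].dt.year
    month_diff = out["InvoiceMonth"].dt.month - out["CohortMonth"].dt.month
    out["CohortIndex"] = year_diff * 12 + month_diff
    return out


labelled = add_cohort_labels(df)
print(labelled)
```

Some tutorials start the cohort index at 1 instead of 0. This sketch uses 0 so that the value literally equals the number of months since the first purchase.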
The final step is a heatmap of customer retention. Retention is a useful metric for understanding how many customers remain active: it shows the percentage of active customers relative to the total number of customers.
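A retention table of this kind can be computed and plotted along the following lines. This is a sketch on made-up, already labelled data. The column names and the helpers `retention_table` and `plot_retention` are assumptions for illustration, and the plotting function requires matplotlib and seaborn to be installed.

```python
import pandas as pd

# Made-up labelled data: one row per invoice, with cohort labels already attached
labelled = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3, 4, 4],
    "CohortMonth": ["2010-12", "2010-12", "2010-12", "2010-12", "2010-12",
                    "2011-01", "2011-01"],
    "CohortIndex": [0, 1, 0, 2, 0, 0, 1],
})


def retention_table(frame):
    """Share of each cohort's customers who are active in each month since first purchase."""
    counts = (
        frame.groupby(["CohortMonth", "CohortIndex"])["CustomerID"]
        .nunique()
        .unstack("CohortIndex")
    )
    # Cohort size = number of distinct customers in the first month (index 0)
    cohort_sizes = counts[0]
    return counts.divide(cohort_sizes, axis=0)


def plot_retention(retention):
    """Draw the retention table as an annotated heatmap."""
    # Imported here so the table can be computed without the plotting libraries
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(10, 6))
    sns.heatmap(retention, annot=True, fmt=".0%", cmap="Blues",
                vmin=0.0, vmax=1.0, ax=ax)
    ax.set_title("Customer retention by cohort")
    return ax


retention = retention_table(labelled)
print(retention)
```

Each row is one cohort and each column is a number of months since the first purchase. The first column is always 100% by construction, and later cohorts have empty cells for months that have not happened yet. Calling `plot_retention(retention)` followed by `plt.show()` renders the heatmap.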
I hope this article gave you some insight into the concepts of cohort analysis.
GitHub Link — https://github.com/advait27/cohort-analysis.git