Marwah Faraj

Table of Contents

Introduction: Problem
Data: Description
EDA: Understand And Analyze The Data
Hypothesis Tests
Data Preprocessing:

Feature Engineering
Log Transformation
Imbalance class solution

Model Development:

Model selection:

Baseline Models
Advanced Exploration

Performance

Evaluation Metrics

Recommendations and Conclusion
Tools

Introduction

In today's competitive subscription-based business landscape, understanding the impact of pricing strategies on customer retention is crucial for maintaining a loyal customer base and optimizing revenue. This project aims to analyze the effects of a second planned price increase on a subscription service, following an initial price hike. By examining historical data and customer behavior, we will assess how previous and upcoming price changes influence churn rates and customer lifetime value (CLTV). Our goal is to provide actionable insights and recommendations to inform future pricing strategies and improve customer retention.

Data

The telecoms churn dataset utilized for this analysis comprises comprehensive customer information from a subscription-based service. It includes various attributes such as customer demographics, subscription details, service usage, and financial metrics.

Key columns in the dataset are:

Demographic Information:

CustomerID
Gender
Senior Citizen
Partner
Dependents

Subscription Details:

Tenure Months
Contract
Paperless Billing
Payment Method

Service Usage:

Phone Service
Multiple Lines
Internet Service
Online Security
Online Backup
Device Protection
Tech Support
Streaming TV
Streaming Movies

Financial Metrics:

Monthly Charges
Total Charges
CLTV (Customer Lifetime Value)

Price Increase Indicators:

Price_Increase_Oct2023
Price_Increase_Jul2024

Churn-Related Data:

Churn Label
Churn Value
Churn Score
Churn Reason

Two crucial columns were added to the data, Price_Increase_Oct2023 and Price_Increase_Jul2024, which indicate customer status before and after the respective price increase periods. These columns are essential for evaluating the impact of the price changes on customer retention. Together with the original attributes, they allow a thorough analysis of the factors influencing customer churn and the effects of pricing adjustments on retention and revenue.

EDA: Understand And Analyze The Data

a. Impact of Price Increases on Customer Churn Rate

After the first price increase in October 2023, the churn rate rose significantly from 20% to 35%, indicating a substantial impact on customer retention. The second price increase in July 2024 further elevated the churn rate from 30% to 45%. These observations underscore the sensitivity of customers to pricing changes, with each price hike leading to a notable increase in churn rates.

b. Correlation Matrix

This heatmap shows the strength and direction of the relationship between each pair of numeric variables in the dataset. Strong correlations can help identify key drivers of churn and other important metrics, guiding further analysis and model building.

A positive correlation was recorded between Tenure_After_First_Increase and Tenure_After_Second_Increase, indicating that customers who remained longer after the first price increase are likely to stay longer after the second increase as well.
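As a minimal sketch of what each cell of the heatmap represents, the Pearson correlation coefficient between two numeric columns can be computed as follows. The tenure values are hypothetical and only illustrate the calculation; the report does not state which correlation method was used, so Pearson is an assumption here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical tenure values in months, for illustration only.
tenure_after_first = [2, 5, 9, 12, 20]
tenure_after_second = [1, 3, 6, 10, 15]

r = pearson_r(tenure_after_first, tenure_after_second)
print(f"r = {r:.3f}")  # a value close to +1 means a strong positive correlation
```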

c. Numeric Features Distribution

Monthly Charges: The distribution of monthly charges shows the variability in the amount customers are billed each month. This can help identify any common pricing tiers and understand the spread of charges among the customer base.
Total Charges: The total charges distribution provides insights into the cumulative amount billed to customers over their subscription period. This helps in understanding the overall expenditure patterns of customers and identifying high-value customers.
Tenure Months: The tenure distribution reveals the duration for which customers have stayed subscribed to the service. This information is crucial for analyzing customer loyalty and the average subscription length.

d. Churn Rate by Contract Type

The graph compares churn rates across contract types, separating customers who churned from those who did not. The highest churn rate occurs with month-to-month contracts, and the lowest with two-year contracts.
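The per-contract churn rates behind a chart like this can be sketched as a simple group-and-count. The records below are hypothetical, and the contract labels are assumed spellings; only the column names Contract and Churn Value come from the dataset description.

```python
from collections import defaultdict

def churn_rate_by_group(records, group_key, churn_key="Churn Value"):
    """Return {group: share of customers with churn_key == 1}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in records:
        totals[row[group_key]] += 1
        churned[row[group_key]] += int(row[churn_key])
    return {group: churned[group] / totals[group] for group in totals}

# Hypothetical customers, for illustration only.
customers = [
    {"Contract": "Month-to-month", "Churn Value": 1},
    {"Contract": "Month-to-month", "Churn Value": 1},
    {"Contract": "Month-to-month", "Churn Value": 0},
    {"Contract": "One year", "Churn Value": 0},
    {"Contract": "One year", "Churn Value": 1},
    {"Contract": "Two year", "Churn Value": 0},
    {"Contract": "Two year", "Churn Value": 0},
]

rates = churn_rate_by_group(customers, "Contract")
for contract, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{contract}: {rate:.0%}")
```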

e. CLTV (Customer Lifetime Value) by Churn Status

The median CLTV for customers who did not churn is significantly higher than for those who did. Specifically, non-churned customers have a median CLTV approximately 35% higher than that of churned customers. This suggests that higher customer lifetime value is strongly associated with better retention.

f. Average CLTV by Tenure Range

The visualization reveals that customers with longer tenures tend to have higher CLTVs. For instance, customers who have been subscribed for 48-60 months have a mean CLTV approximately 60% higher than those subscribed for 0-12 months. This indicates that the longer a customer remains subscribed, the more valuable they become to the company. These insights can help in developing targeted strategies to enhance customer retention and maximize lifetime value.

Hypothesis Tests

Hypothesis 1: Impact of First Price Increase on Churn Rate

Null Hypothesis (H0): The first price increase in October 2023 does not affect the churn rate.

Alternative Hypothesis (H1): The first price increase in October 2023 affects the churn rate.

Conclusion: The p-value for the hypothesis test on the first price increase is approximately 0.007, which is less than the chosen significance level (alpha = 0.01). Therefore, we reject the null hypothesis (H0). This is significant evidence that the first price increase in October 2023 affects the churn rate.

Chi-Square Statistic: 7.361394650867041

P-Value: 0.006663908329690478

Reject the null hypothesis: The first price increase affects the churn rate.
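A minimal sketch of how a chi-square test of independence on a 2x2 table (price-increase flag versus churn) can be computed is shown below. It assumes one degree of freedom and no continuity correction, and the contingency counts are hypothetical; the report does not state the exact table or the settings used.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def chi_square_p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical counts: rows = before/after the increase, columns = stayed/churned.
table = [[800, 200], [700, 300]]
stat = chi_square_2x2(table)
print(f"chi2 = {stat:.3f}, p = {chi_square_p_value_df1(stat):.4g}")

# P-value implied by the statistic reported for Hypothesis 1, assuming df = 1.
p_reported = chi_square_p_value_df1(7.361394650867041)
print(f"p = {p_reported:.4f}")
```

Under the one-degree-of-freedom assumption, the statistic reported for Hypothesis 1 gives a p-value of roughly 0.0067, in line with the reported P-Value.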

Hypothesis 2: Impact of Second Price Increase on Churn Rate

Null Hypothesis (H0): The second price increase in July 2024 does not affect the churn rate.

Alternative Hypothesis (H1): The second price increase in July 2024 affects the churn rate.

Conclusion: The p-value for the hypothesis testing related to the second price increase is 0.15, which is greater than the chosen significance level (alpha = 0.05). Therefore, we fail to reject the null hypothesis (H0). This indicates that there is no significant evidence to support the idea that the second price increase in July 2024 affects the churn rate.

Chi-Square Statistic: 7.36139465086704

P-Value: 0.1526390832969048

Fail to reject the null hypothesis: There is no significant evidence that the second price increase affects the churn rate.

Hypothesis 3: Difference in CLTV Before and After First Price Increase

Null Hypothesis (H0): There is no difference in the mean CLTV of customers before and after the first price increase in October 2023.

Alternative Hypothesis (H1): There is a difference in the mean CLTV of customers before and after the first price increase in October 2023.

Conclusion: The p-value for the hypothesis test on CLTV before and after the first price increase is 0.02, which is less than the chosen significance level (alpha = 0.05). Therefore, we reject the null hypothesis (H0). This is significant evidence that the mean CLTV of customers differs before and after the first price increase in October 2023.

T-Statistic: -0.6535336144490119

P-Value: 0.022618259996623

Reject the null hypothesis: There is a difference in the mean CLTV before and after the first price increase.
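The report does not say which variant of the t-test was used. As a sketch under the assumption of two independent groups with possibly unequal variances (Welch's t-test), the test statistic and its approximate degrees of freedom can be computed as follows; the CLTV values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df

# Hypothetical CLTV values before and after the first increase.
cltv_before = [4200, 4500, 3900, 5100, 4800, 4400]
cltv_after = [4000, 4300, 3700, 4600, 4500, 4100]

t_stat, df = welch_t(cltv_before, cltv_after)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The p-value then comes from the t distribution with df degrees of freedom; a statistics library would normally supply that step.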

Data Preprocessing:

Feature Engineering
Log Transformation
Imbalance class solution

a. Feature Engineering:

In the feature engineering phase of the project, two new columns were added, Price_Increase_Oct2023 and Price_Increase_Jul2024, to capture the impact of the respective price increases on customer behavior.

b. Log Transformation:

Additionally, to address skewness and reduce the impact of outliers, a log transformation was applied to the Monthly Charges, Total Charges, and CLTV columns. This transformation helped normalize the distributions of these features, making them more suitable for analysis and modeling.
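A minimal sketch of such a transformation is given below. It uses log(1 + x) so that zero values remain defined; whether the project used this variant or a plain logarithm is not stated, so treat that choice as an assumption. The charge values are hypothetical.

```python
import math

def log_transform(values):
    """Apply log(1 + x) to each value; defined for x >= 0, including zero."""
    return [math.log1p(v) for v in values]

def inverse_log_transform(values):
    """Undo log_transform, recovering the original scale."""
    return [math.expm1(v) for v in values]

# Hypothetical Total Charges with one large outlier.
total_charges = [0.0, 45.5, 320.0, 1500.0, 8600.0]
transformed = log_transform(total_charges)

# The outlier dominates far less on the log scale.
print([round(v, 2) for v in transformed])
print(f"raw range: {max(total_charges) - min(total_charges):.1f}")
print(f"log range: {max(transformed) - min(transformed):.2f}")
```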

c. Imbalance class solution

In our dataset, we initially observed a significant class imbalance in the churn data, with 3,649 non-churned customers and only 1,281 churned customers. To address this imbalance, we applied two different techniques: SMOTE (Synthetic Minority Over-sampling Technique) and class weighting.

SMOTE: SMOTE was used to resample the training data, creating synthetic samples for the minority class. This balanced the dataset, resulting in 3,649 churned and 3,649 non-churned customers.
Class Weighting: Alternatively, we adjusted the class weights in the models to give more importance to the minority class during training, without altering the original data distribution.
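To make the two options concrete, the sketch below uses the class counts reported above. It computes how many synthetic minority samples are needed to balance the classes, and a set of class weights using the common "balanced" heuristic n_samples / (n_classes * class_count). That heuristic is an assumption; the report does not state which weights were used.

```python
def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Class counts taken from the report.
counts = {"non_churned": 3649, "churned": 1281}

# Oversampling: synthetic minority samples needed to match the majority class.
synthetic_needed = counts["non_churned"] - counts["churned"]
print(f"synthetic churned samples needed: {synthetic_needed}")

# Class weighting: the minority class receives the larger weight.
weights = balanced_class_weights(counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

With these weights, each class contributes the same total weight to the training loss, which is the effect class weighting aims for without changing the data.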

Model Development

In this case study, I explored various machine learning algorithms to predict customer churn. The goal was to identify which customers are likely to discontinue their subscriptions following a price increase. The algorithms evaluated included Logistic Regression, Random Forest, Gradient Boosting, Support Vector Machine (SVM), and a Neural Network.

Algorithms:

Logistic Regression: A linear model that estimates the probability of a binary outcome, useful for its simplicity and interpretability.
Random Forest: An ensemble learning method that builds multiple decision trees and merges them to get a more accurate and stable prediction.
Gradient Boosting: An ensemble technique that builds models sequentially, each new model correcting errors made by the previous ones.
Support Vector Machine (SVM): A powerful classification algorithm that finds the hyperplane that best separates the data into classes.
Neural Network: A deep learning model that can capture complex patterns in the data through multiple layers of neurons.

Techniques:

Randomized Search: This is a hyperparameter tuning technique in which a fixed number of hyperparameter combinations are sampled at random from the search space and evaluated. It is useful when the hyperparameter space is large and you want to explore it efficiently without testing every possible combination.

Cross-Validation: This involves splitting the data into multiple folds and training the model on some folds while validating it on the remaining fold(s). This helps in assessing the model's performance and ensuring it is not overfitting to the training data.
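The two techniques are usually combined: each randomly sampled hyperparameter combination is scored by cross-validation, and the combination with the best mean score is kept. The sketch below illustrates the mechanics only. The search space, the scoring function toy_score, and all parameter names are hypothetical stand-ins for a real model fit.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def sample_params(space, n_iter, seed=0):
    """Draw n_iter random combinations from a dict of candidate value lists."""
    rng = random.Random(seed)
    return [{name: rng.choice(options) for name, options in space.items()}
            for _ in range(n_iter)]

def randomized_search(space, n_iter, score_fn, n_samples, k=5, seed=0):
    """Return the sampled combination with the best mean cross-validated score."""
    best_params, best_score = None, float("-inf")
    for params in sample_params(space, n_iter, seed):
        scores = [score_fn(params, train, val)
                  for train, val in k_fold_indices(n_samples, k, seed)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Hypothetical search space for a boosted-tree model.
search_space = {
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "n_estimators": [100, 200, 400],
    "max_depth": [2, 3, 5],
}

def toy_score(params, train_idx, val_idx):
    """Stand-in for 'fit on train_idx, return F1 on val_idx' (hypothetical)."""
    return -abs(params["learning_rate"] - 0.1) - 0.01 * abs(params["max_depth"] - 3)

best_params, best_score = randomized_search(
    search_space, n_iter=10, score_fn=toy_score, n_samples=100
)
print(best_params, round(best_score, 4))
```

In a real run, the scoring function would train the model on the training indices and return the chosen metric on the validation indices.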

Metrics to Consider

Precision: Precision measures the proportion of true positive predictions among all positive predictions. In the context of churn prediction, it tells us how many of the customers we predicted would churn actually did churn. High precision is important when the cost of acting on a false positive (e.g., offering a retention incentive to a customer who wouldn't have churned) is high.
Recall: Recall (or Sensitivity) measures the proportion of true positive predictions among all actual positives. In the context of churn prediction, it indicates how many of the customers who actually churned were correctly identified by the model. High recall is important when the cost of missing a true positive (e.g., losing a customer who could have been retained with an intervention) is high.
F1 Score: The F1 score is the harmonic mean of precision and recall. It provides a balance between precision and recall and is particularly useful when you need to find an optimal balance between the two.
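The three metrics above can be computed directly from confusion-matrix counts, as in this short sketch. The counts are hypothetical and chosen only to show how the F1 score sits between precision and recall.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 80 churners caught, 20 false alarms, 40 churners missed.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```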

Why F1 Score Might Be Preferred

Imbalanced Data: Given that the data is imbalanced, precision and recall provide more meaningful insights than accuracy. The F1 score, being a combination of both, helps in balancing the trade-off between precision and recall.
Balanced Evaluation: The F1 score ensures that neither precision nor recall is overwhelmingly favored, making it a balanced metric that takes both false positives and false negatives into account.
Business Context: In a business context, both false positives and false negatives have costs. For example, a false positive might mean offering a discount unnecessarily, while a false negative might mean losing a customer who could have been retained. The F1 score helps in managing these trade-offs effectively.

Model Performance

Best Model: Gradient Boosting Classifier

Best Model Hyperparameter Tuning:

Best Model: Gradient Boosting Classifier

Recommendations

Price Sensitivity: The coefficient for Price_Increase_Jul2024 is significantly positive, which indicates that the price increase is strongly associated with a higher churn rate.

Customer Segmentation: Identify segments of customers who are less sensitive to price changes and consider focusing price increases on these segments.

Alternatives to Price Increase: Explore alternatives such as offering more value-added services or improving customer engagement to justify the price increase.

High CLTV Customers: Focus retention strategies on customers with high CLTV to maximize long-term revenue.

Churn Prediction: Use CLTV as a significant feature in churn prediction models to identify at-risk high-value customers.

Pros of the Recommendation:

1. Targeted Marketing: By segmenting users based on demographic data, the company can tailor marketing campaigns to specific user groups, improving engagement and conversion rates.

2. Personalized Offers: Understanding the behavior of different segments allows for personalized subscription offers, which can enhance customer satisfaction and loyalty.

3. Churn Reduction: Identifying segments with high churn rates enables targeted retention strategies, potentially reducing overall churn.

4. Revenue Optimization: By analyzing the impact of fees, the company can find the optimal price point for different segments, maximizing revenue while maintaining customer satisfaction.

5. Resource Allocation: Focusing on segments with higher profitability can help allocate resources more effectively, enhancing operational efficiency.

Cons of the Recommendation:

1. Data Privacy Concerns: Collecting and segmenting user data raises privacy concerns and requires strict compliance with data protection regulations.

2. Complexity in Implementation: Segmenting users and creating personalized offers can be complex and resource-intensive, requiring advanced data analytics capabilities.

3. Risk of Over-Segmentation: Over-segmenting the market may lead to small, unprofitable segments, diluting the impact of marketing efforts.

4. Dynamic Market Conditions: User behavior and market conditions can change rapidly, making it challenging to maintain accurate and relevant segments over time.

5. Potential Bias: Relying heavily on demographic data may introduce biases, potentially overlooking other important factors influencing user behavior.

Assessing the Sustainability of the Model

To determine whether the user segmentation and personalized marketing strategy is a sustainable model moving forward, we need to consider several factors. These factors include customer retention, revenue growth, cost of implementation, market adaptability, and long-term customer satisfaction. Here’s how we can assess the sustainability:

1. Customer Retention and Churn Analysis:

- Continuously monitor churn rates across different segments to ensure that retention strategies are effective.

- Use predictive modeling to forecast future churn and take proactive measures to mitigate it.

2. Revenue Growth and Profitability:

- Track revenue growth from different segments to ensure that personalized pricing strategies are driving profitability.

- Analyze the lifetime value (LTV) of customers in each segment to understand their long-term contribution to revenue.

3. Cost of Implementation:

- Evaluate the costs associated with implementing and maintaining segmentation and personalization strategies, including data analytics tools, marketing campaigns, and human resources.

- Compare these costs with the revenue generated to ensure a positive return on investment (ROI).

4. Market Adaptability:

- Assess the flexibility of the segmentation model to adapt to changes in market conditions and user behavior.

- Regularly update the segmentation criteria and personalization strategies based on the latest data and trends.

5. Customer Satisfaction and Feedback:

- Collect and analyze customer feedback to gauge satisfaction levels with personalized offers and pricing.

- Use surveys, reviews, and net promoter scores (NPS) to measure customer satisfaction and loyalty.

6. Long-term Trends and Competitive Analysis:

- Monitor long-term trends in the market and compare the company's strategies with those of competitors to stay ahead.

- Analyze industry reports and market research to identify potential threats and opportunities.

Conclusion:

Based on our exploratory analysis and the model's coefficients, a price increase in July 2024 is likely to result in a higher churn rate, even though the chi-square test for the second increase was not significant at alpha = 0.05. Therefore, implementing a price increase in the short term is not recommended. Instead, it is advisable to consider a larger percentage price increase after a longer period. This approach will help mitigate the risk of immediate customer churn while allowing more time to enhance the perceived value and strengthen customer relationships.