Introduction to Survival Analysis Part 2 ― Survival Model (Cox Regression)

In the previous post, I discussed how we can use a Survival Analysis technique called the ‘Survival Curve’ to analyze how customer retention rates change over time across different cohorts.

Introduction to Survival Analysis Part 1 ― Survival Curve (blog.exploratory.io)

In order to analyze the customer retention or churn even further, the next question would be:

“what makes the customers stop using the service (churn) or stay (retain)?”

For this, we can build a ‘Survival Model’ by using an algorithm called Cox Regression, also known as the Proportional Hazards Model.

The previous Retention Analysis with the Survival Curve focuses on the time to event (churn), while analysis with a Survival Model focuses on the relationship between the time to event and the variables (e.g. age, country, operating system, etc.).

Let’s take a look step by step.

Build Survival Model

Since we want to keep the previous Retention Cohort Analysis result, instead of starting from scratch, we can create a branch at the step before ‘Group By’ so that we can start from the step where we have the data we need.


In the newly created branch, you can select ‘Build Survival Analysis Model (Cox Regression)’ from the Add button menu.


Similar to what we did when calculating the survival curves, we can set the ‘weeks_on_service’ column as Survival Time, the ‘is_churned’ column as Survival Status, and, as Predictors, the columns whose impact on the ‘churn’ event we want to see.


In this case, we are interested in seeing how the operating system (OS) the customers use and the countries they live in affect the ‘churn’ event, so we can select the ‘country’ and ‘os’ columns here.

Once you hit the Run button, you will see the model being built and get a summary result like the one below.
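For readers who want to see roughly what this step does outside the UI, here is a minimal sketch in R using `coxph` from the `survival` package. It is not necessarily what the tool runs internally. The data is simulated purely for illustration (the country and OS values and the effect sizes are made up), and only the column names `weeks_on_service`, `is_churned`, `country`, and `os` come from this post.

```r
library(survival)  # provides Surv() and coxph()

set.seed(42)
n <- 500

# Simulated stand-in for the customer data; values are made up.
customers <- data.frame(
  country = factor(sample(c("Canada", "Germany", "Japan"), n, replace = TRUE)),
  os      = factor(sample(c("Windows", "Mac", "Linux"), n, replace = TRUE))
)

# Arbitrary effect sizes, used only to generate churn times.
rate <- 0.05 * exp(0.5 * (customers$country == "Germany") -
                   0.5 * (customers$os == "Mac"))
time_to_churn <- rexp(n, rate = rate)

# Customers who have not churned within 30 weeks are censored.
observation_weeks <- 30
customers$weeks_on_service <- pmin(time_to_churn, observation_weeks)
customers$is_churned <- time_to_churn <= observation_weeks

# Survival Time and Survival Status go into Surv(); Predictors go on the right.
fit <- coxph(Surv(weeks_on_service, is_churned) ~ country + os,
             data = customers)
print(summary(fit))
```

The formula mirrors the three inputs chosen in the dialog: the time column and the status column inside `Surv()`, and the predictor columns after the `~`.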


Understand Model Summary Information

Summary of Fit

Under the Summary of Fit section, there are several metrics to evaluate the model.


‘Number of Events’ is the number of customers who have churned in this scenario.

The other metrics come from three statistical tests:

Likelihood Ratio test
Score (Logrank) test
Wald test

The P value for the Likelihood Ratio test (Likelihood Ratio Test P Value) is less than 0.05, so we can reject the ‘null hypothesis’ that none of the parameters listed below has any impact on the outcome at the 95% confidence level.
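As a hedged illustration, the same three tests can be read from the summary of a `coxph` fit in R. The data below is simulated and the effect size is arbitrary, so the printed numbers will not match the ones in this post. Only the column names are borrowed from it.

```r
library(survival)

set.seed(1)
n <- 300

# Simulated data for illustration only.
os <- factor(sample(c("Windows", "Mac"), n, replace = TRUE))
time_to_churn <- rexp(n, rate = 0.05 * exp(-0.7 * (os == "Mac")))
customers <- data.frame(
  os               = os,
  weeks_on_service = pmin(time_to_churn, 30),  # censor at 30 weeks
  is_churned       = time_to_churn <= 30
)

fit <- coxph(Surv(weeks_on_service, is_churned) ~ os, data = customers)
s <- summary(fit)

print(s$nevent)    # number of events (churned customers)
print(s$logtest)   # likelihood ratio test: statistic, df, p-value
print(s$sctest)    # score (logrank) test
print(s$waldtest)  # Wald test
```

Each of the three test entries is a small named vector holding the test statistic, the degrees of freedom, and the p-value.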

Parameter Estimates

This is where we can find the answers for “what makes the customers leave or stay?”


Under the Parameter Estimates section, positive Estimate values mean that the parameter increases the risk of the event, which in this case is ‘churn’. So these are the parameters that could potentially make the customers churn more; in this view, those are countries like ‘Iceland’, ‘Georgia’, and ‘Croatia’. Negative values mean the opposite, that is, ‘not churn’.

When we click on the ‘Estimate’ column header to sort in descending order, we can see that countries like ‘El Salvador’, ‘Uruguay’, and ‘Nepal’ would make them churn less.


However, we need to be careful and think about how confident we can be in these numbers. For that, we can check their P values to see whether those impacts are ‘statistically significant’ or not. Typically, we need a P value of less than 0.05 (95% confidence) to call an impact ‘statistically significant’, so that we can reject the null hypothesis that a given parameter has no impact on the outcome. We can click on the P Value column header to sort from the lowest.


With this information, we can now say that countries like ‘United Kingdom’ and ‘Japan’ are making customers churn less, while countries like ‘Germany’ and ‘Croatia’ are making them churn more.

We can also take a look at the ‘Confidence Interval’ columns, Conf Low and Conf High. If the range contains 0 (crosses 0), the effect could be either positive or negative. Basically, the model is saying that it can’t make up its mind given the data it has! ;)

To understand this better, we can extract this Parameter Estimates information into a data frame by selecting ‘Extract Parameter Estimates’ from the ‘Add’ button menu.
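As a rough illustration of this check, the base R sketch below flags terms whose confidence interval contains 0. The terms and numbers are invented and are not taken from the model in this post. The column names `conf_low` and `conf_high` are assumed to correspond to Conf Low and Conf High.

```r
# Hypothetical parameter estimates; values are made up for illustration.
estimates <- data.frame(
  term      = c("countryA", "countryB", "osC"),
  estimate  = c(0.40, -0.10, -0.35),
  conf_low  = c(0.15, -0.60, -0.55),
  conf_high = c(0.65,  0.40, -0.15)
)

# An interval that contains 0 leaves the direction of the effect undecided.
estimates$crosses_zero <- estimates$conf_low <= 0 & estimates$conf_high >= 0

print(estimates)
```

Here only the second row is flagged, because its interval runs from a negative lower bound to a positive upper bound.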


This will get you a data frame with Parameter Estimates information.


And, we can quickly go to Viz view and visualize this data.

First, we can use Scatter and assign ‘Term’ to X-Axis and ‘Estimate’ to Y-Axis.


Next, we can assign the confidence interval columns by checking the ‘Show Range’ check box inside the Range menu dialog.


The labels on the X-Axis are hard to read because every value has ‘country’ or ‘os’ at the beginning. We can remove this by using the ‘str_replace’ function from the ‘stringr’ package inside a ‘mutate’ command like below.

mutate(term = str_replace(term, "country", ""))

As you can see, some countries have wide confidence intervals. As we saw before, some of the variables (countries and OS) have large P values, so we can filter them out by using a filter command like below.

filter(p_value < 0.05)

As you can see, these variables’ confidence intervals vary, but each range stays on only one side, either positive or negative. With this view, we can say that countries like India and Japan, or an operating system like Mac, decrease the risk of churn, while countries like Germany, Croatia, and the Philippines increase the risk.

But how should we interpret the numbers in the ‘estimate’ column? Also, how come we are not seeing ‘windows’ in this output? For this, we will need to take a look at something called the Hazard Ratio.
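The two cleanup steps above can be sketched end to end with `dplyr` and `stringr` on a small data frame. The numbers below are made up for illustration and are not the estimates from this post. The pattern `^(country|os)` is an assumed variation on the command shown earlier, so that both prefixes are stripped in one step.

```r
library(dplyr)
library(stringr)

# Hypothetical extract of the Parameter Estimates table; values are made up.
estimates <- data.frame(
  term     = c("countryJapan", "countryGermany", "countryNepal", "osMac"),
  estimate = c(-0.30, 0.45, -0.80, -0.25),
  p_value  = c(0.010, 0.020, 0.600, 0.030)
)

cleaned <- estimates %>%
  # Strip a leading "country" or "os" so only the value remains as the label.
  mutate(term = str_replace(term, "^(country|os)", "")) %>%
  # Keep only the terms that are significant at the 5% level.
  filter(p_value < 0.05)

print(cleaned)
```

Anchoring the pattern with `^` keeps it from touching the same letters if they appear later in a label.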

Hazard Ratio

You can find the ‘Hazard Ratio’ column at the end of the Parameter Estimates table in the summary view. It is simply the result of exponentiating the estimate values (which are log hazard ratios). Hazard Ratio values should b
