Getting "Perfect separation detected, results not available" while building the Logistic Regression model

As part of an assignment I am building a logistic regression model, but fitting it fails with the error "Perfect separation detected, results not available".

**X_train :-**
        year   amt_spnt       rank
1  -1.723034  -0.418500   0.272727
2   0.716660   2.088507  -0.636364
3   1.174102  -0.558333  -1.545455
4  -0.503187  -1.297451   1.181818
5   1.326583  -0.628250  -1.545455
**y_train :-** 
1    0
2    1
3    1
4    0
5    1
Name: result, dtype: int64
**Logistic Model code:-** 
import statsmodels.api as sm
logm1 = sm.GLM(y_train,(sm.add_constant(X_train)), family = sm.families.Binomial())
logm1.fit().summary()
**Dataset before and after scaling**

This is a model specification issue: because of perfect separation, the fit cannot converge. Perfect separation means that one (or more) of your independent variables perfectly distinguishes the cases where the dependent variable is 0 from those where it is 1. See the following example:

Y 0 0 0 0 0 0 1 1 1 1

X 1 2 3 4 4 4 5 6 7 8

If X <= 4, Y = 0

If X > 4, Y = 1
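
To illustrate why the fit cannot converge on data like this, here is a minimal sketch in plain Python (no statsmodels). It evaluates the log-likelihood of a logistic model on the toy data above for ever steeper slopes. The `log_likelihood` helper and the cutoff of 4.5 are assumptions made for the illustration, not part of the original answer.

```python
import math

# Toy data from the example above: Y = 0 when X <= 4, Y = 1 when X > 4.
X = [1, 2, 3, 4, 4, 4, 5, 6, 7, 8]
Y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]


def log_likelihood(slope, cutoff=4.5):
    """Bernoulli log-likelihood for a model with logit = slope * (x - cutoff).

    The cutoff of 4.5 is an assumed decision boundary placed between the
    largest X with Y = 0 and the smallest X with Y = 1.
    """
    total = 0.0
    for x, y in zip(X, Y):
        z = slope * (x - cutoff)
        # log P(y | x) equals log(sigmoid(z)) for y = 1 and log(sigmoid(-z)) for y = 0.
        signed = z if y == 1 else -z
        total += -math.log1p(math.exp(-signed))
    return total


slopes = [0.5, 1, 2, 5, 10, 20]
values = [log_likelihood(b) for b in slopes]
for b, ll in zip(slopes, values):
    print(f"slope={b:>4}: log-likelihood={ll:.6f}")
```

The log-likelihood keeps rising toward 0 as the slope grows, so no finite coefficient maximizes it. That is the situation the error message is reporting.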

The short answer is to find such a variable among your independent variables and remove it from the model.
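
One way to look for such a variable is to check, column by column, whether the values for the two classes overlap. The sketch below does this for the five `X_train` rows shown in the question. `separating_columns` is a hypothetical helper written for this illustration. It only detects complete separation by a single variable, not separation caused by a combination of variables.

```python
# The five X_train rows shown in the question (already scaled).
X_train = {
    "year":     [-1.723034, 0.716660, 1.174102, -0.503187, 1.326583],
    "amt_spnt": [-0.418500, 2.088507, -0.558333, -1.297451, -0.628250],
    "rank":     [0.272727, -0.636364, -1.545455, 1.181818, -1.545455],
}
y_train = [0, 1, 1, 0, 1]


def separating_columns(features, labels):
    """Return the columns whose values split the two classes with no overlap."""
    found = []
    for name, values in features.items():
        zeros = [v for v, y in zip(values, labels) if y == 0]
        ones = [v for v, y in zip(values, labels) if y == 1]
        # Complete separation: every class-0 value lies strictly below every
        # class-1 value, or the other way round.
        if max(zeros) < min(ones) or max(ones) < min(zeros):
            found.append(name)
    return found


flagged = separating_columns(X_train, y_train)
print(flagged)  # ['year', 'rank']
```

On these five rows, `year` and `rank` each separate the classes on their own, while `amt_spnt` does not.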

Thank you for your response. But I can't see such a feature in my dataset. I have edited my question to include the full dataset (10 rows in total) before and after scaling. Could you please help me find out whether any variable is causing the issue? – Upendra Dama Apr 12, 2020 at 16:45

Hi, I simplified what a perfect separation issue is. Your data does not seem to have the issue I describe above; rather, it is a quasi-complete separation issue, which is caused by a combination of independent variables. I do not often use `statsmodels` for modeling, so I tried the model in other software, and it turns out that "year" is the variable that causes the perfect separation issue. After I removed "year", the model still did not converge, and it is "rank" that then causes the perfect separation issue. – Neo Apr 12, 2020 at 19:40

In statistics, this usually happens because your sample size is small and one IV, or a combination of IVs, can almost perfectly predict the DV. There are usually three ways to handle this issue: 1. increase the sample size so that one or a combination of IVs is less likely to predict the DV perfectly; 2. delete the IVs that cause perfect separation, in this case "year" and "rank"; 3. recode the IVs that cause perfect separation. It would be helpful if you could give a little background on what the DV and IVs are and how they are measured. – Neo Apr 12, 2020 at 19:40
