Introduction to Machine Learning (6): Evaluating Classification Results

Date: 2020-10-05

Contents

  • 1. Confusion Matrix
  • 2. Precision and Recall
  • 3. The Precision-Recall Trade-off
  • 4. ROC Curve
  • 5. Confusion Matrix in Multiclass Problems

1. Confusion Matrix


2. Precision and Recall



As the figure above shows, looking at accuracy alone is far from enough.
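One way to see why accuracy can mislead is a minimal sketch on a heavily skewed dataset. The counts below are made up for illustration and are not taken from the figure: a "classifier" that always predicts the negative class reaches 99.9% accuracy while finding none of the positives.

```python
import numpy as np

# Illustrative, made-up label distribution: 10 positives among 10,000 samples.
y_true = np.zeros(10000, dtype=int)
y_true[:10] = 1

# A "classifier" that always predicts the negative class.
y_pred = np.zeros_like(y_true)

accuracy = np.mean(y_pred == y_true)

# Recall = TP / (TP + FN): the share of actual positives that were found.
tp = np.sum((y_pred == 1) & (y_true == 1))
fn = np.sum((y_pred == 0) & (y_true == 1))
recall = tp / (tp + fn)

print(accuracy)  # 0.999
print(recall)    # 0.0
```

Accuracy looks excellent, yet recall shows that the model is useless for the class of interest.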

Test data:

from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X = digits.data
y = digits.target.copy()

# Turn the 10-class task into a heavily skewed binary one: 9 vs. not 9.
y[digits.target==9] = 1
y[digits.target!=9] = 0

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

Prediction with logistic regression:

from sklearn.linear_model import LogisticRegression
 
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
log_reg.score(X_test, y_test)           # accuracy on the test set; the original run reported 0.9755555555555555
y_predict = log_reg.predict(X_test)
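With predictions in hand, the confusion matrix, precision and recall can be computed. The following is a sketch rather than the author's original code: it rebuilds the same 9-vs-rest setup so that it runs on its own, and `max_iter=10000` is an added assumption whose only purpose is to avoid convergence warnings in recent sklearn versions.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Same skewed setup as above: digit 9 is the positive class.
digits = datasets.load_digits()
X = digits.data
y = (digits.target == 9).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

# max_iter is raised only to avoid convergence warnings (an assumption,
# not part of the original code).
log_reg = LogisticRegression(max_iter=10000)
log_reg.fit(X_train, y_train)
y_predict = log_reg.predict(X_test)

# For binary labels {0, 1}, sklearn lays the matrix out as [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, y_predict)
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # of the samples predicted as 9, how many really are 9
recall = tp / (tp + fn)     # of the real 9s, how many were found

print(cm)
print(precision, precision_score(y_test, y_predict))
print(recall, recall_score(y_test, y_predict))
```

The hand-computed values should agree with `precision_score` and `recall_score`, which makes the definitions easy to check against the matrix.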

3. The Precision-Recall Trade-off

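In the figure, samples to the left of the threshold are classified as 0 and samples to the right as 1; stars are samples whose true label is 1, circles are samples whose true label is 0.

As precision increases, recall decreases.

For high precision, the model predicts 1 only when it is very confident. Some samples that would previously have been predicted as 1 are now predicted as 0, so recall drops.

For high recall, the decision threshold is lowered: even a 10% probability is reported as positive (for example, "has the disease").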
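The trade-off can be reproduced on a toy example. The scores and labels below are hypothetical values chosen for illustration; the helper `precision_recall_at` is likewise not from the original article.

```python
import numpy as np

# Hypothetical decision scores and true labels, sorted by score for readability.
scores = np.array([-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0])
labels = np.array([0, 0, 1, 0, 1, 0, 1, 1])

def precision_recall_at(threshold):
    """Predict 1 when score >= threshold, then compute precision and recall."""
    predicted = scores >= threshold
    tp = np.sum(predicted & (labels == 1))
    fp = np.sum(predicted & (labels == 0))
    fn = np.sum(~predicted & (labels == 1))
    return tp / (tp + fp), tp / (tp + fn)

results = {t: precision_recall_at(t) for t in (-1.5, 0.0, 1.5)}
for t, (p, r) in results.items():
    print(f"threshold={t:+.1f}  precision={p:.2f}  recall={r:.2f}")
# threshold=-1.5  precision=0.67  recall=1.00
# threshold=+0.0  precision=0.75  recall=0.75
# threshold=+1.5  precision=1.00  recall=0.50
```

Raising the threshold moves precision from 0.67 to 1.00 while recall falls from 1.00 to 0.50, which is the behaviour described above.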

Computing the curve by hand:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
 
digits = datasets.load_digits()
X = digits.data
y = digits.target.copy()
 
y[digits.target==9] = 1
y[digits.target!=9] = 0
 
from sklearn.model_selection import train_test_split
 
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)
 
from sklearn.linear_model import LogisticRegression
 
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
decision_scores = log_reg.decision_function(X_test)
 
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
 
precisions = []
recalls = []
# Sweep the threshold over the range of decision scores in steps of 0.1.
thresholds = np.arange(np.min(decision_scores), np.max(decision_scores), 0.1)
for threshold in thresholds:
    # Predict 1 when the decision score reaches the threshold, otherwise 0.
    y_predict = np.array(decision_scores >= threshold, dtype='int')
    precisions.append(precision_score(y_test, y_predict))
    recalls.append(recall_score(y_test, y_predict))

Using the function provided by sklearn:

from sklearn.metrics import precision_recall_curve
precisions, recalls, thresholds = precision_recall_curve(y_test, decision_scores)
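One detail that is easy to trip over when plotting against the threshold: `precision_recall_curve` returns one more precision and recall value than thresholds, because it appends a final point (precision 1, recall 0) that has no threshold of its own. A self-contained sketch, again assuming the 9-vs-rest setup and an added `max_iter=10000`:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X = digits.data
y = (digits.target == 9).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

# max_iter is an assumption added to avoid convergence warnings.
log_reg = LogisticRegression(max_iter=10000)
log_reg.fit(X_train, y_train)
decision_scores = log_reg.decision_function(X_test)

precisions, recalls, thresholds = precision_recall_curve(y_test, decision_scores)

# precisions and recalls carry one extra trailing point without a threshold.
print(precisions.shape, recalls.shape, thresholds.shape)

# Drop that trailing point to align the arrays with thresholds,
# e.g. before plotting precision and recall against the threshold.
precision_at_threshold = precisions[:-1]
recall_at_threshold = recalls[:-1]
```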



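The horizontal axis is precision (P) and the vertical axis is recall (R).

A model whose PR curve lies further out, that is, encloses a larger area with the x and y axes, is the better model.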
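The enclosed area can be estimated numerically. The sketch below uses `sklearn.metrics.auc`, which applies the trapezoidal rule, and integrates precision over recall (the more common axis convention, so the axes are swapped relative to the plot described above). Using the area this way to compare models is an illustration, not a step from the original article; `max_iter=10000` is again an added assumption.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, precision_recall_curve
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X = digits.data
y = (digits.target == 9).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

# max_iter is an assumption added to avoid convergence warnings.
log_reg = LogisticRegression(max_iter=10000)
log_reg.fit(X_train, y_train)
decision_scores = log_reg.decision_function(X_test)

precisions, recalls, _ = precision_recall_curve(y_test, decision_scores)

# Trapezoidal area under the curve, with recall as x and precision as y.
pr_area = auc(recalls, precisions)
print(pr_area)
```

A value closer to 1 means the curve hugs the outer corner; computing the same number for two models on the same test set gives a single figure to compare them by.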

4. ROC Curve
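An ROC curve plots the true positive rate (TPR, the same quantity as recall) against the false positive rate (FPR) as the decision threshold varies. A minimal sketch with sklearn's `roc_curve` and `roc_auc_score`, assuming the same 9-vs-rest digits setup used in the earlier sections; `max_iter=10000` is an added assumption.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X = digits.data
y = (digits.target == 9).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

# max_iter is an assumption added to avoid convergence warnings.
log_reg = LogisticRegression(max_iter=10000)
log_reg.fit(X_train, y_train)
decision_scores = log_reg.decision_function(X_test)

# FPR = FP / (FP + TN), TPR = TP / (TP + FN), one pair per threshold.
fprs, tprs, thresholds = roc_curve(y_test, decision_scores)

# Area under the ROC curve; closer to 1 is better, 0.5 matches random guessing.
roc_area = roc_auc_score(y_test, decision_scores)
print(roc_area)
```

Plotting `fprs` on the x axis against `tprs` on the y axis gives the curve itself; the area under it is a common single-number summary for comparing models.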




5. Confusion Matrix in Multiclass Problems

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
 
digits = datasets.load_digits()
X = digits.data
y = digits.target
 
from sklearn.model_selection import train_test_split
 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=666)
 
from sklearn.linear_model import LogisticRegression
 
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
log_reg.score(X_test, y_test)

y_predict = log_reg.predict(X_test)

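from sklearn.metrics import precision_score
 
# precision_score defaults to average='binary', which raises an error for
# multiclass targets, so an averaging mode must be given explicitly.
precision_score(y_test, y_predict, average='micro')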
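The multiclass confusion matrix itself can be obtained with `confusion_matrix`. The sketch below also derives a row-normalised error matrix to show which digits are confused most often; that step and the names `err_matrix`, `true_digit` and `predicted_digit` are illustrative additions, and `max_iter=10000` is an assumption to avoid convergence warnings.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X, y = digits.data, digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=666)

# max_iter is an assumption added to avoid convergence warnings.
log_reg = LogisticRegression(max_iter=10000)
log_reg.fit(X_train, y_train)
y_predict = log_reg.predict(X_test)

# Rows are true digits, columns are predicted digits.
cfm = confusion_matrix(y_test, y_predict)

# Error rates: divide each row by the number of true samples of that digit,
# then zero the diagonal so that only the mistakes remain.
row_sums = cfm.sum(axis=1, keepdims=True)
err_matrix = cfm / row_sums
np.fill_diagonal(err_matrix, 0)

# The largest remaining entry is the most frequent kind of confusion.
true_digit, predicted_digit = np.unravel_index(np.argmax(err_matrix), err_matrix.shape)
print(f"most often confused: true {true_digit} predicted as {predicted_digit}")

# For single-label multiclass data, micro-averaged precision equals accuracy.
micro_precision = precision_score(y_test, y_predict, average='micro')
accuracy = np.trace(cfm) / cfm.sum()
print(micro_precision, accuracy)
```

Looking at the error matrix rather than a single score points to the specific pairs of classes where the model needs improvement.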


 