How to see the full output of the correlation method in Python

Asked by: darklord84 Asked: 4/14/2021 Updated: 4/14/2021 Views: 586

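Q:

I am trying to find the correlation between all the columns of the mushroom dataset. When I run the correlation method on the columns I get some correlation values, but for many columns the values are hidden behind "...". How can I see those values?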


import pandas as pd
import numpy as np
from sklearn import preprocessing

df = pd.read_csv("mushrooms.csv")
print(df.head())

le = preprocessing.LabelEncoder()
for col in df.columns:
    df[col] = le.fit_transform(df[col])
df.head()

correlation_df = df.corr()
print(correlation_df)

Output: note that after cap-color the column data is shown as "...". There are about 23 columns, but I can only see correlation data for about 8 of them.

class                     1.000000   0.052951     0.178446  -0.031384  ...  -0.411771           0.171961    0.298686  0.217179
cap-shape                 0.052951   1.000000    -0.050454  -0.048203  ...  -0.025457          -0.073416    0.063413 -0.042221
cap-surface               0.178446  -0.050454     1.000000  -0.019402  ...  -0.106407           0.230364    0.021555  0.163887
cap-color                -0.031384  -0.048203    -0.019402   1.000000  ...   0.162513          -0.293523   -0.144770  0.033925
bruises                  -0.501530  -0.035374     0.070228  -0.000764  ...   0.692973          -0.285008    0.088137 -0.075095
odor                     -0.093552  -0.021935     0.045233  -0.387121  ...  -0.281387           0.469055   -0.043623 -0.026610
gill-attachment           0.129200   0.078865    -0.034180   0.041436  ...  -0.146689          -0.029524    0.165575 -0.030304
gill-spacing             -0.348387   0.013196    -0.282306   0.144259  ...  -0.195897           0.047323   -0.529253 -0.154680
gill-size                 0.540024   0.054050     0.208100  -0.169464  ...  -0.460872           0.622991    0.147682  0.161418
gill-color               -0.530566  -0.006039    -0.161017   0.084659  ...   0.629398          -0.416135   -0.034090 -0.202972
stalk-shape              -0.102019   0.063794    -0.014123  -0.456496  ...  -0.291444           0.258831    0.087383 -0.269216
stalk-root               -0.379361   0.030191    -0.126245   0.321274  ...   0.210155          -0.536996   -0.306747 -0.007668
stalk-surface-above-ring -0.334593  -0.030417     0.089090  -0.060837  ...   0.390091           0.100764    0.079604 -0.058076
stalk-surface-below-ring -0.298801  -0.032591     0.107965  -0.047710  ...   0.394644           0.130974    0.046797 -0.039628
stalk-color-above-ring   -0.154003  -0.031659     0.066050   0.002364  ...  -0.048878           0.271533   -0.240261  0.042561
stalk-color-below-ring   -0.146730  -0.030390     0.068885   0.008057  ...  -0.034284           0.254518   -0.242792  0.041594
veil-type                      NaN        NaN          NaN        NaN  ...        NaN                NaN         NaN       NaN
veil-color                0.145142   0.072560    -0.016603   0.036130  ...  -0.143673          -0.003600    0.124924 -0.040581
ring-number              -0.214366  -0.106534    -0.026147  -0.005822  ...   0.058312           0.338417   -0.242020  0.235835
ring-type                -0.411771  -0.025457    -0.106407   0.162513  ...   1.000000          -0.487048    0.211763 -0.212080
spore-print-color         0.171961  -0.073416     0.230364  -0.293523  ...  -0.487048           1.000000   -0.126859  0.185954
population                0.298686   0.063413     0.021555  -0.144770  ...   0.211763          -0.126859    1.000000 -0.174529
habitat                   0.217179  -0.042221     0.163887   0.033925  ...  -0.212080           0.185954   -0.174529  1.000000
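Tags: python pandas dataframe correlation

Comments

0 votes teddcp 4/14/2021
Are the other 8 columns string/object type columns? Correlation only applies to numeric columns. Please provide a link to the dataset or a snapshot of it.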

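A:

0 votes Sulphur 4/14/2021 #1

Regarding the part: "for many columns the values are hidden behind '...'. How can I see those values?"

This happens because, by default, pandas hides columns when there are too many to display. I am not sure which of print(df.head()), df.head(), or print(correlation_df) your output image corresponds to, but you can view the hidden columns by slicing with .iloc.

Example: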

# df is the dataframe with all columns
df_1 = df.iloc[:,0:11] # all rows of column 0-10
df_2 = df.iloc[:,11:21] # all rows for columns 11-20
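
Slicing with .iloc works, but pandas can also be asked not to truncate in the first place. The following is a minimal sketch of two ways to do that, not part of the original answer. It runs on a made-up 23-column frame: demo_df and its col_N column names are hypothetical stand-ins for the encoded mushroom data, and the width of 1000 characters is an arbitrary assumption chosen to be wide enough for one row.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the label-encoded mushroom data: 23 numeric columns.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame(
    rng.integers(0, 5, size=(50, 23)),
    columns=[f"col_{i}" for i in range(23)],
)
correlation_df = demo_df.corr()

# Option 1: lift the column limit temporarily; the previous settings
# are restored when the with-block exits.
with pd.option_context("display.max_columns", None, "display.width", 1000):
    print(correlation_df)

# Option 2: render the whole frame to a string, which ignores the
# display limits used by print().
full_text = correlation_df.to_string()
print(full_text)
```

To make the change last for the whole session instead, pd.set_option("display.max_columns", None) sets the same option without the with-block.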
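
0 votes Scrapper 4/14/2021 #2

Alternatively, you can use a heatmap to get a clear view of the correlations. It gives you a correlation plot in which values are distinguished by color, which makes the matrix easier to read.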

import seaborn as sns
import matplotlib.pyplot as plt
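# a large figure leaves room for the annotations of a 23x23 matrix
f, ax = plt.subplots(figsize=(20, 20))
sns.heatmap(df.corr(), annot=True, ax=ax)
plt.show()  # needed when running as a script rather than in a notebook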