python - pandas: pandas.DataFrame.describe returns information on only one column
For a Kaggle dataset (the competition rules prohibit me from sharing the data here, but it is readily accessible on Kaggle), running
```python
import pandas
df_train = pandas.read_csv("01 - data/act_train.csv.zip")
df_train.describe()
```
I get:

```
>>> df_train.describe()
            outcome
count  2.197291e+06
mean   4.439544e-01
std    4.968491e-01
min    0.000000e+00
25%    0.000000e+00
50%    0.000000e+00
75%    1.000000e+00
max    1.000000e+00
```
whereas, for the same dataset, `df_train.columns` gives me:

```
>>> df_train.columns
Index(['people_id', 'activity_id', 'date', 'activity_category', 'char_1',
       'char_2', 'char_3', 'char_4', 'char_5', 'char_6', 'char_7', 'char_8',
       'char_9', 'char_10', 'outcome'],
      dtype='object')
```
and `df_train.dtypes` gives me:

```
>>> df_train.dtypes
people_id            object
activity_id          object
date                 object
activity_category    object
char_1               object
char_2               object
char_3               object
char_4               object
char_5               object
char_6               object
char_7               object
char_8               object
char_9               object
char_10              object
outcome               int64
dtype: object
```
What am I missing? Why does pandas `describe` summarize only one column of the dataset?
By default, `describe` works only on columns with a numeric dtype, and `outcome` is the only `int64` column here; everything else is `object`. Add the keyword argument `include='all'`. From the documentation:
> If include is the string 'all', the output column-set will match the input one.
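A minimal sketch of the fix, assuming a small made-up frame that mimics the Kaggle schema (string columns plus one integer `outcome` column); the column values are hypothetical and only there for illustration:

```python
import pandas as pd

# Hypothetical data shaped like act_train.csv: mostly string columns,
# plus one integer column. Values are invented for this example.
df = pd.DataFrame({
    "people_id": ["ppl_1", "ppl_1", "ppl_2", "ppl_3"],
    "activity_category": ["type 1", "type 2", "type 2", "type 2"],
    "outcome": [0, 1, 1, 0],
})

# Default call: only the numeric column is summarized.
default_summary = df.describe()
print(default_summary.columns.tolist())  # ['outcome']

# include='all': every input column appears in the output. String columns
# get count/unique/top/freq; numeric rows (mean, std, ...) are NaN for them.
all_summary = df.describe(include="all")
print(all_summary.columns.tolist())
print(all_summary)
```

If you only want the string columns, `df.describe(include=[object])` restricts the summary to `object` columns instead.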
To clarify, the default arguments to `describe` are `include=None, exclude=None`. The resulting behavior is:
> None to both (default). The result will include only numeric-typed columns or, if none are, only categorical columns.
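To see the fallback half of that rule, here is a small sketch using a hypothetical frame that has no numeric columns at all; with the default `include=None, exclude=None`, `describe()` then summarizes the string columns:

```python
import pandas as pd

# Hypothetical frame with string columns only (no numeric dtype).
strings_only = pd.DataFrame({
    "char_1": ["type 1", "type 2", "type 2"],
    "char_2": ["type 5", "type 5", "type 9"],
})

# No numeric columns exist, so the default call falls back to the
# non-numeric columns and reports count/unique/top/freq for each.
summary = strings_only.describe()
print(summary.index.tolist())  # ['count', 'unique', 'top', 'freq']
print(summary)
```

In the question's data, `outcome` is numeric, so the fallback never triggers and every `object` column is dropped from the default summary.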
Also, from the Notes section:
> The output DataFrame index depends on the requested dtypes:
>
> - For numeric dtypes, it will include: count, mean, std, min, max, and lower, 50, and upper percentiles.
> - For object dtypes (e.g. timestamps or strings), the index will include the count, unique, most common, and frequency of the most common. Timestamps also include the first and last items.
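The different row labels are easy to check on two single Series. This is a sketch with made-up values; the default lower/middle/upper percentiles are 25%, 50% and 75%:

```python
import pandas as pd

# Hypothetical columns; values are illustrative only.
numbers = pd.Series([0, 1, 1, 0, 1], name="outcome")
labels = pd.Series(["a", "b", "b", "c", "b"], name="char_1")

# Numeric dtype: count, mean, std, min, the default percentiles, max.
numeric_rows = numbers.describe().index.tolist()
print(numeric_rows)  # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# String (object) dtype: count, unique, most common value ('top'),
# and how often that value occurs ('freq').
object_rows = labels.describe().index.tolist()
print(object_rows)  # ['count', 'unique', 'top', 'freq']
```

This is why `describe(include='all')` on a mixed frame shows the union of both row sets, with NaN wherever a row does not apply to a column's dtype.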