dataframe - Python pandas: delete rows from a DataFrame whose group size is below a value
I have a DataFrame called df (this is just an example; the real data is large, so please consider computing speed), as follows:
name  id  text
tom   1   a1
lucy  2   b1
john  3   c1
tick  4   d1
tom   1   a2
lucy  2   b2
john  3   c2
tick  4   d2
tom   1   a3
lucy  2   b3
john  3   c3
tick  4   d3
tom   1   a4
tick  4   d4
tom   1   a5
lucy  2   b5
tick  4   d5
The DataFrame can be grouped by name (tom, john, lucy, tick). I want to delete the rows of every group (grouped by name) whose size is less than 5. That is, since the lucy and john groups each have fewer than 5 rows, I want to delete their rows and get a new df that contains only the tick and tom data.
Could you tell me how to do it, please? Thanks!
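I think you can use filter for this. It takes one line after the setup:
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'name': ['tom', 'lucy', 'john', 'tick', 'tom', 'lucy', 'john', 'tick', 'tom',
             'lucy', 'john', 'tick', 'tom', 'tick', 'tom', 'lucy', 'tick'],
    'id': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 4, 1, 2, 4],
    'text': ['a1', 'b1', 'c1', 'd1', 'a2', 'b2', 'c2', 'd2', 'a3',
             'b3', 'c3', 'd3', 'a4', 'd4', 'a5', 'b5', 'd5'],
})

# Keep only the groups that have at least 5 rows
df.groupby('name').filter(lambda x: len(x) >= 5)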
The output contains only tick and tom:
    id  name text
0    1   tom   a1
3    4  tick   d1
4    1   tom   a2
7    4  tick   d2
8    1   tom   a3
11   4  tick   d3
12   1   tom   a4
13   4  tick   d4
14   1   tom   a5
16   4  tick   d5
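Since the question mentions speed on large data, here is a possible alternative that may help: filter calls the Python lambda once per group, which can get slow when there are many groups. A minimal sketch, assuming the same sample df as above, is to compute each row's group size with transform('size') and select rows with a boolean mask. The helper name drop_small_groups and its min_size parameter are made up for illustration, and I have not benchmarked this on your real data.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'name': ['tom', 'lucy', 'john', 'tick', 'tom', 'lucy', 'john', 'tick', 'tom',
             'lucy', 'john', 'tick', 'tom', 'tick', 'tom', 'lucy', 'tick'],
    'id': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 4, 1, 2, 4],
    'text': ['a1', 'b1', 'c1', 'd1', 'a2', 'b2', 'c2', 'd2', 'a3',
             'b3', 'c3', 'd3', 'a4', 'd4', 'a5', 'b5', 'd5'],
})


def drop_small_groups(frame, key, min_size):
    """Hypothetical helper: keep rows whose `key` group has at least `min_size` rows."""
    # transform('size') returns, for every row, the size of the group it belongs to
    group_sizes = frame.groupby(key)[key].transform('size')
    # Boolean mask keeps the original row order and index
    return frame[group_sizes >= min_size]


result = drop_small_groups(df, 'name', 5)
print(result)
```

This should select the same rows as the filter version (only tom and tick here), with the original index preserved.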