dataframe - Python pandas: delete the groups in a DataFrame whose size is below a value


I have a DataFrame called df (this is just an example; the real data is big, so please consider computing speed), as follows:

    name   id     text
    tom    1      a1
    lucy   2      b1
    john   3      c1
    tick   4      d1
    tom    1      a2
    lucy   2      b2
    john   3      c2
    tick   4      d2
    tom    1      a3
    lucy   2      b3
    john   3      c3
    tick   4      d3
    tom    1      a4
    tick   4      d4
    tom    1      a5
    lucy   2      b5
    tick   4      d5

The DataFrame can be grouped by name (tom, john, lucy, tick). I want to delete the rows of every group (by name) whose size is less than 5. That is, since the groups for lucy (4 rows) and john (3 rows) have fewer than 5 rows, I want to delete their data and get a new df that contains only the tick and tom rows.

Could you tell me how to do it, please? Thanks!

I think you can use filter for this. It takes one line:

    import pandas as pd

    # Sample data from the question
    df = pd.DataFrame({
        'name': ['tom', 'lucy', 'john', 'tick', 'tom', 'lucy', 'john', 'tick', 'tom',
                 'lucy', 'john', 'tick', 'tom', 'tick', 'tom', 'lucy', 'tick'],
        'id': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 4, 1, 2, 4],
        'text': ['a1', 'b1', 'c1', 'd1', 'a2', 'b2', 'c2', 'd2', 'a3',
                 'b3', 'c3', 'd3', 'a4', 'd4', 'a5', 'b5', 'd5'],
    })

    # Keep only the groups that have at least 5 rows
    df.groupby('name').filter(lambda x: len(x) >= 5)

The output contains only tick and tom:

        id  name text
    0    1   tom   a1
    3    4  tick   d1
    4    1   tom   a2
    7    4  tick   d2
    8    1   tom   a3
    11   4  tick   d3
    12   1   tom   a4
    13   4  tick   d4
    14   1   tom   a5
    16   4  tick   d5
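
Since the question mentions speed on the real data, one possible alternative is a boolean mask built with transform('size'), which avoids calling a Python lambda once per group. This is only a sketch: the names group_sizes and result are my own, and whether it is actually faster depends on the data, so it is worth timing both versions on the real frame.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'name': ['tom', 'lucy', 'john', 'tick', 'tom', 'lucy', 'john', 'tick', 'tom',
             'lucy', 'john', 'tick', 'tom', 'tick', 'tom', 'lucy', 'tick'],
    'id': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 4, 1, 2, 4],
    'text': ['a1', 'b1', 'c1', 'd1', 'a2', 'b2', 'c2', 'd2', 'a3',
             'b3', 'c3', 'd3', 'a4', 'd4', 'a5', 'b5', 'd5'],
})

# For every row, the size of the group that row belongs to
# (aligned with the original index, so it can be used as a mask).
group_sizes = df.groupby('name')['name'].transform('size')

# Keep only rows whose group has at least 5 members.
result = df[group_sizes >= 5]
print(result)
```

Like the filter version, this keeps the original row labels, so the two results can be compared directly.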
