python - Pandas groupby sum using two DataFrames


I have two large pandas DataFrames, and I would like to use them to guide each other in a fast sum operation. The two frames look like this:

frame1:

samplename  gene1  gene2  gene3
sample1         1      2      3
sample2         4      5      6
sample3         7      8      9

(In reality, frame1 is 1,000 rows x ~300,000 columns.)

frame2:

featurename  geneid
feature1     gene1
feature1     gene3
feature2     gene1
feature2     gene2
feature2     gene3

(In reality, frame2 is ~350,000 rows x 2 columns, with ~17,000 unique features.)

I would like to sum the columns of frame1 according to frame2's groups of genes. For example, the output for the two frames above would be:

samplename  feature1  feature2
sample1            4         6
sample2           10        15
sample3           16        24

(In reality, the output is ~1,000 rows x 17,000 columns.)

Is there a way to do this with minimal memory usage?

If you want to decrease memory usage, I think your best option is to iterate over the first DataFrame, since it has only 1k rows.

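import pandas as pd

dfs = []
frame1 = frame1.set_index('samplename')
for idx, row in frame1.iterrows():
    # Attach this sample's values to frame2 by gene, then sum within each feature.
    # Selecting column idx keeps the string geneid column out of the sum.
    dfs.append(frame2.join(row, on='geneid').groupby('featurename')[idx].sum())
# One column per sample; transpose so that samples become rows.
pd.concat(dfs, axis=1).T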

yields

featurename  feature1  feature2
sample1             4         6
sample2            10        15
sample3            16        24
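
For anyone who wants to try the loop above end to end, here is a minimal self-contained sketch built on the toy frames from the question. The helper name sum_by_feature is made up for illustration and is not part of pandas; the sketch also assumes every sample name is unique, because each name is used as a column label after the join.

```python
import pandas as pd

# Toy versions of the two frames from the question.
frame1 = pd.DataFrame({
    "samplename": ["sample1", "sample2", "sample3"],
    "gene1": [1, 4, 7],
    "gene2": [2, 5, 8],
    "gene3": [3, 6, 9],
})
frame2 = pd.DataFrame({
    "featurename": ["feature1", "feature1", "feature2", "feature2", "feature2"],
    "geneid": ["gene1", "gene3", "gene1", "gene2", "gene3"],
})


def sum_by_feature(frame1, frame2):
    """Hypothetical helper: sum frame1's gene columns per feature, one sample at a time."""
    samples = frame1.set_index("samplename")
    per_sample = []
    for name, row in samples.iterrows():
        # row is a Series indexed by gene name; attach its values to frame2 by geneid.
        joined = frame2.join(row, on="geneid")
        # Select only the value column so the string geneid column is not summed.
        per_sample.append(joined.groupby("featurename")[name].sum())
    # Each Series becomes one column; transpose so that samples are rows.
    return pd.concat(per_sample, axis=1).T


result = sum_by_feature(frame1, frame2)
print(result)
```

Each pass through the loop only builds one temporary frame about the size of frame2 plus a single value column, which is why this should stay well below the memory needed to reshape all of frame1 at once. The cost is speed: with 1,000 samples the join and groupby run 1,000 times.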
