python - Pandas groupby sum using two DataFrames
I have two large pandas DataFrames and want to use them to guide each other in a fast sum operation. The two frames look like this:
frame1:
    samplename  gene1  gene2  gene3
    sample1         1      2      3
    sample2         4      5      6
    sample3         7      8      9

(In reality, frame1 is 1,000 rows x ~300,000 columns.)
frame2:
    featurename  geneid
    feature1     gene1
    feature1     gene3
    feature2     gene1
    feature2     gene2
    feature2     gene3

(In reality, frame2 is ~350,000 rows x 2 columns, with ~17,000 unique features.)
I want to sum the columns of frame1 by frame2's groups of genes. For example, the output for the two frames above would be:
    samplename  feature1  feature2
    sample1            4         6
    sample2           10        15
    sample3           16        24

(In reality, the output is ~1,000 rows x 17,000 columns.)
Is there a way to do this with minimal memory usage?
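To make the requested operation concrete, here is a minimal sketch that computes a single output column (feature1) from the example data. The frames are rebuilt from the tables in the question; the variable names `genes` and `feature1` are only illustrative.

```python
import pandas as pd

# Example data copied from the tables in the question.
frame1 = pd.DataFrame({
    "samplename": ["sample1", "sample2", "sample3"],
    "gene1": [1, 4, 7],
    "gene2": [2, 5, 8],
    "gene3": [3, 6, 9],
})
frame2 = pd.DataFrame({
    "featurename": ["feature1", "feature1", "feature2", "feature2", "feature2"],
    "geneid": ["gene1", "gene3", "gene1", "gene2", "gene3"],
})

# Genes that frame2 assigns to feature1 (gene1 and gene3 here).
genes = frame2.loc[frame2["featurename"] == "feature1", "geneid"]

# Sum those gene columns across each row: one value per sample.
feature1 = frame1.set_index("samplename")[genes.tolist()].sum(axis=1)
print(feature1)
```

For sample1 this adds gene1 and gene3 (1 + 3 = 4), which matches the feature1 column of the expected output.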
If you want to decrease memory usage, I think your best option is to iterate over the first DataFrame, since it has only 1k rows.
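    import pandas as pd

    dfs = []
    frame1 = frame1.set_index('samplename')
    for idx, row in frame1.iterrows():
        # row is a Series named after the sample (idx) and indexed by gene
        joined = frame2.join(row, on='geneid')
        # select the sample's column so the string geneid column is not aggregated
        dfs.append(joined.groupby('featurename')[idx].sum())
    pd.concat(dfs, axis=1).T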
yields
    featurename  feature1  feature2
    sample1             4         6
    sample2            10        15
    sample3            16        24
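For anyone who wants to check the row-by-row approach end to end, the following is a self-contained sketch that runs it on the example data from the question. The wrapper name `sum_by_feature` is hypothetical, and the frames are rebuilt by hand from the tables above. It has not been tried at the real data sizes.

```python
import pandas as pd


def sum_by_feature(frame1, frame2):
    """Row-by-row join and groupby; the function name is hypothetical."""
    indexed = frame1.set_index("samplename")
    per_sample = []
    for sample, row in indexed.iterrows():
        # Attach this sample's gene values to the gene -> feature mapping.
        joined = frame2.join(row, on="geneid")
        # Sum this sample's values within each feature.
        per_sample.append(joined.groupby("featurename")[sample].sum())
    # One column per sample after concat; transpose so samples become rows.
    return pd.concat(per_sample, axis=1).T


# Example data copied from the tables in the question.
frame1 = pd.DataFrame({
    "samplename": ["sample1", "sample2", "sample3"],
    "gene1": [1, 4, 7],
    "gene2": [2, 5, 8],
    "gene3": [3, 6, 9],
})
frame2 = pd.DataFrame({
    "featurename": ["feature1", "feature1", "feature2", "feature2", "feature2"],
    "geneid": ["gene1", "gene3", "gene1", "gene2", "gene3"],
})

result = sum_by_feature(frame1, frame2)
print(result)
```

Each iteration only holds frame2 plus one extra column for the current sample, so the very wide frame1 is never reshaped in a single step.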