python - Nearest match merge on two columns (pandas)
Similar to one of my previous questions (merge dataframes on nearest datetime / timestamp), I would like to merge two pandas data frames on two datetime columns using the nearest match.
Let a and b be two dataframes as follows:
    import pandas as pd

    a = pd.DataFrame({"id": ["a", "a", "c", "b", "b"],
                      "init_date": ["01/01/2015", "07/02/2014", "08/02/1999", "01/01/1991", "06/22/2014"],
                      "fin_date": ["04/16/1923", "09/24/1945", "06/24/1952", "11/26/1988", "10/05/1990"]})

    In [15]: a
    Out[15]:
      id    fin_date   init_date
    0  a  04/16/1923  01/01/2015
    1  a  09/24/1945  07/02/2014
    2  c  06/24/1952  08/02/1999
    3  b  11/26/1988  01/01/1991
    4  b  10/05/1990  06/22/2014

    b = pd.DataFrame({"id": ["a", "a", "c", "b", "b"],
                      "date": ["02/15/2015", "06/30/2014", "07/02/1999", "10/05/1990", "06/24/2014"],
                      "fin_date": ["12/10/1926", "01/01/1944", "08/21/1955", "12/12/1987", "11/05/1991"],
                      "value": ["3", "5", "1", "7", "8"]})

    In [11]: b
    Out[11]:
      id        date    fin_date value
    0  a  02/15/2015  12/10/1926     3
    1  a  06/30/2014  01/01/1944     5
    2  c  07/02/1999  08/21/1955     1
    3  b  10/05/1990  12/12/1987     7
    4  b  06/24/2014  11/05/1991     8
The resulting data frame should be the following:
    In [21]: c
    Out[21]:
      id    fin_date   init_date value
    0  a  04/16/1923  01/01/2015     3
    1  a  09/24/1945  07/02/2014     5
    2  c  06/24/1952  08/02/1999     1
    3  b  11/26/1988  01/01/1991     7
    4  b  10/05/1990  06/22/2014     8
The general problem is that I could potentially have no close match on either init_date or fin_date. However, I am interested in a solution for the case shown in the example, where init_date has close but not exact matches.
Note that one difficulty is that one match might be closer in value on init_date than on the final date, while a competing match might be the opposite. In that case, I prefer the one closer on init_date. To my knowledge, after attempting an approach similar to the one in the link, reindexing with "nearest" is not implemented for multi-indexing.
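To make the preference rule concrete, here is a minimal sketch of one way it could be expressed: pair every row of a with every row of b that shares its id, then rank the candidates by the gap on init_date first and the gap on fin_date second. The helper column names (init_gap, fin_gap, fin_date_b) and the result name c_pairs are made up for illustration, and the sketch assumes all dates are in MM/DD/YYYY format. It builds all same-id pairs, so it may not scale to large frames.

```python
import pandas as pd

a = pd.DataFrame({"id": ["a", "a", "c", "b", "b"],
                  "init_date": ["01/01/2015", "07/02/2014", "08/02/1999", "01/01/1991", "06/22/2014"],
                  "fin_date": ["04/16/1923", "09/24/1945", "06/24/1952", "11/26/1988", "10/05/1990"]})
b = pd.DataFrame({"id": ["a", "a", "c", "b", "b"],
                  "date": ["02/15/2015", "06/30/2014", "07/02/1999", "10/05/1990", "06/24/2014"],
                  "fin_date": ["12/10/1926", "01/01/1944", "08/21/1955", "12/12/1987", "11/05/1991"],
                  "value": ["3", "5", "1", "7", "8"]})

# Parse the date strings so that differences can be computed (assumed format).
for col in ("init_date", "fin_date"):
    a[col] = pd.to_datetime(a[col], format="%m/%d/%Y")
for col in ("date", "fin_date"):
    b[col] = pd.to_datetime(b[col], format="%m/%d/%Y")

# Pair each row of a with every row of b that has the same id.
# reset_index() keeps a's original row label in a column named "index";
# b's fin_date is renamed to fin_date_b by the suffix.
pairs = a.reset_index().merge(b, on="id", suffixes=("", "_b"))

# Absolute gaps on both date columns (illustrative column names).
pairs["init_gap"] = (pairs["init_date"] - pairs["date"]).abs()
pairs["fin_gap"] = (pairs["fin_date"] - pairs["fin_date_b"]).abs()

# For each row of a, keep the candidate with the smallest init_date gap,
# using the fin_date gap only to break ties.
best = pairs.sort_values(["index", "init_gap", "fin_gap"]).drop_duplicates("index")

c_pairs = best.set_index("index")[["id", "fin_date", "init_date", "value"]]
c_pairs.index.name = None
print(c_pairs)
```

Sorting on init_gap before fin_gap is what encodes "prefer the one closer on init_date"; swapping the two columns in sort_values would give the opposite priority.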
Thank you, I appreciate the help,
    # Exact-key left merge; b must keep the key columns alongside 'value'.
    pd.merge(a, b[['id', 'fin_date', 'value']], on=['id', 'fin_date'], how='left')
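An exact-key merge only pairs rows whose keys are equal, which is not the case for the dates in the question. As a hedged alternative, here is a sketch using pd.merge_asof with direction="nearest", matching init_date in a against date in b within each id. It only considers the init_date/date pair and ignores fin_date, and it assumes all dates are in MM/DD/YYYY format. The names left, right, and c_asof are just illustrative.

```python
import pandas as pd

a = pd.DataFrame({"id": ["a", "a", "c", "b", "b"],
                  "init_date": ["01/01/2015", "07/02/2014", "08/02/1999", "01/01/1991", "06/22/2014"],
                  "fin_date": ["04/16/1923", "09/24/1945", "06/24/1952", "11/26/1988", "10/05/1990"]})
b = pd.DataFrame({"id": ["a", "a", "c", "b", "b"],
                  "date": ["02/15/2015", "06/30/2014", "07/02/1999", "10/05/1990", "06/24/2014"],
                  "fin_date": ["12/10/1926", "01/01/1944", "08/21/1955", "12/12/1987", "11/05/1991"],
                  "value": ["3", "5", "1", "7", "8"]})

# merge_asof needs real datetimes (assumed format) and sorted "on" keys.
a["init_date"] = pd.to_datetime(a["init_date"], format="%m/%d/%Y")
b["date"] = pd.to_datetime(b["date"], format="%m/%d/%Y")

left = a.reset_index().sort_values("init_date")          # keep a's row order in "index"
right = b[["id", "date", "value"]].sort_values("date")   # only the columns needed

# For each row of a, take the row of b with the same id whose date is nearest to init_date.
c_asof = pd.merge_asof(left, right,
                       left_on="init_date", right_on="date",
                       by="id", direction="nearest")

# Restore the original row order of a.
c_asof = c_asof.sort_values("index").set_index("index")
c_asof.index.name = None
print(c_asof[["id", "fin_date", "init_date", "value"]])
```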