r - Adding a column for the category of glm coefficients in broom results
Is there a way to add a column to the result of the broom package's tidy function that relates the term column to both the original names used in the formula argument and the columns in the data argument?
For example, if I run the following I get:
    library(broom)  # provides tidy()
    library(dplyr)

    mod <- glm(mpg ~ wt + qsec + as.factor(carb), data = mtcars)
    tidy(mod)
    #               term     estimate std.error   statistic      p.value
    # 1      (Intercept) 21.132995090 7.5756463  2.78959633 1.017187e-02
    # 2               wt -4.916303175 0.6747590 -7.28601380 1.584408e-07
    # 3             qsec  0.843355538 0.3930252  2.14580532 4.221188e-02
    # 4 as.factor(carb)2  0.004133826 1.5321134  0.00269812 9.978695e-01
    # 5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01
    # 6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01
    # 7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01
    # 8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01

What I am looking for is this:
    #               term     estimate std.error   statistic      p.value term_base
    # 1      (Intercept) 21.132995090 7.5756463  2.78959633 1.017187e-02
    # 2               wt -4.916303175 0.6747590 -7.28601380 1.584408e-07        wt
    # 3             qsec  0.843355538 0.3930252  2.14580532 4.221188e-02      qsec
    # 4 as.factor(carb)2  0.004133826 1.5321134  0.00269812 9.978695e-01      carb
    # 5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01      carb
    # 6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01      carb
    # 7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01      carb
    # 8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01      carb

I am not bothered whether the first row of the new column is empty, Intercept, or 1. I just need something that can match the term column to the original variable names passed to the formula.
Edit
It would be good if this didn't depend on using as.factor in the formula, e.g. if it would also work on:
    mod <- glm(mpg ~ wt + qsec + carb, data = mtcars %>% mutate(carb = factor(carb)))
    tidy(mod)
    #          term     estimate std.error   statistic      p.value
    # 1 (Intercept) 21.132995090 7.5756463  2.78959633 1.017187e-02
    # 2          wt -4.916303175 0.6747590 -7.28601380 1.584408e-07
    # 3        qsec  0.843355538 0.3930252  2.14580532 4.221188e-02
    # 4       carb2  0.004133826 1.5321134  0.00269812 9.978695e-01
    # 5       carb3 -0.755346006 2.3451222 -0.32209239 7.501715e-01
    # 6       carb4 -0.489721798 2.0628564 -0.23739985 8.143615e-01
    # 7       carb6 -0.886846134 3.4443957 -0.25747510 7.990068e-01
    # 8       carb8 -0.894783610 3.7496630 -0.23863041 8.134180e-01
We can use a regex to create the 'term_base' column:
    # Keep only what sits between the parentheses, then blank out the intercept
    tidy(mod) %>%
      mutate(term_base = sub("Intercept", "", gsub(".*\\(|\\).*", "", term)))
    #              term     estimate std.error   statistic      p.value term_base
    #1      (Intercept) 21.132995090 7.5756463  2.78959633 1.017187e-02
    #2               wt -4.916303175 0.6747590 -7.28601380 1.584408e-07        wt
    #3             qsec  0.843355538 0.3930252  2.14580532 4.221188e-02      qsec
    #4 as.factor(carb)2  0.004133826 1.5321134  0.00269812 9.978695e-01      carb
    #5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01      carb
    #6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01      carb
    #7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01      carb
    #8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01      carb

The as.factor can be removed from 'term' if we mutate 'carb' to a factor before the glm step:
    mtcars %>%
      mutate(carb = factor(carb)) %>%
      glm(formula = mpg ~ wt + qsec + carb, data = .) %>%
      tidy(.) %>%
      mutate(term_base = sub("\\(.*\\)|\\d+", "", term))
    #         term     estimate std.error   statistic      p.value term_base
    #1 (Intercept) 21.132995090 7.5756463  2.78959633 1.017187e-02
    #2          wt -4.916303175 0.6747590 -7.28601380 1.584408e-07        wt
    #3        qsec  0.843355538 0.3930252  2.14580532 4.221188e-02      qsec
    #4       carb2  0.004133826 1.5321134  0.00269812 9.978695e-01      carb
    #5       carb3 -0.755346006 2.3451222 -0.32209239 7.501715e-01      carb
    #6       carb4 -0.489721798 2.0628564 -0.23739985 8.143615e-01      carb
    #7       carb6 -0.886846134 3.4443957 -0.25747510 7.990068e-01      carb
    #8       carb8 -0.894783610 3.7496630 -0.23863041 8.134180e-01      carb
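Both regexes depend on how the terms happen to be spelled. The second one, for instance, strips the first run of digits from any term, so a numeric column named x1 would come back as x, and non-numeric factor levels would not be stripped at all. As a possible alternative that does not parse the term strings, here is a minimal sketch that reads the mapping from the model itself, using the "assign" attribute of the model matrix and the term labels of the formula. Note that term_base_lookup is a hypothetical helper written for this example, not part of broom, and the sketch assumes the term values returned by tidy() match the column names of the model matrix, which is the case for the glm fits above.

```r
library(broom)
library(dplyr)

# Hypothetical helper (not part of broom): returns a named character vector
# mapping each coefficient name to the data column(s) behind its formula term.
term_base_lookup <- function(model) {
  mm     <- model.matrix(model)
  labels <- attr(terms(model), "term.labels")  # e.g. "wt", "as.factor(carb)"
  # all.vars() drops wrapping calls such as as.factor(); variables of an
  # interaction term are joined with ":"
  vars <- vapply(
    labels,
    function(l) paste(all.vars(parse(text = l)), collapse = ":"),
    character(1)
  )
  # "assign" holds the term index of every model-matrix column (0 = intercept)
  setNames(c("", unname(vars))[attr(mm, "assign") + 1], colnames(mm))
}

mod    <- glm(mpg ~ wt + qsec + as.factor(carb), data = mtcars)
lookup <- term_base_lookup(mod)

# Match by coefficient name rather than by row position
res <- tidy(mod) %>%
  mutate(term_base = unname(lookup[term]))
```

Because the lookup is keyed by coefficient name, the same call should work whether carb is wrapped in as.factor() inside the formula or converted to a factor in the data beforehand.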