Introduction
First, we apply labels on the well-known mtcars
dataset:
library(expss)
data(mtcars)
mtcars = apply_labels(mtcars,
mpg = "Miles/(US) gallon",
cyl = "Number of cylinders",
disp = "Displacement (cu.in.)",
hp = "Gross horsepower",
drat = "Rear axle ratio",
wt = "Weight (1000 lbs)",
qsec = "1/4 mile time",
vs = "Engine",
vs = c("V-engine" = 0,
"Straight engine" = 1),
am = "Transmission",
am = c("Automatic" = 0,
"Manual"=1),
gear = "Number of forward gears",
carb = "Number of carburetors"
)
Table construction consists of at least of three functions chained
with magrittr pipe operator: %>%
. At first we need to
specify variables for which statistics will be computed with
tab_cells
. Secondary, we calculate statistics with one of
tab_stat_*
functions. And last, we finalize table creation
with tab_pivot
:
dataset %>% tab_cells(variable) %>% tab_stat_cases() %>% tab_pivot()
.
We can split our statistics by columns with tab_cols
or by
rows with tab_rows
. After that we can sort table with
tab_sort_asc
, drop empty rows/columns with
drop_rc
and transpose with tab_transpose
.
Generally, resulting table is just a data.frame so we can use arbitrary
operations on it. Statistic is always calculated on the last cells,
column/row variables, weight, missing values and subgroup. To define new
cell/column/row variables we can call appropriate function one more
time. tab_pivot
defines how we combine different statistics
and where statistic labels will appear - inside/outside
rows/columns.
Multiple variable and multiple summary statistics
mtcars %>%
tab_cells(mpg, disp, hp, wt, qsec) %>%
tab_cols(total(), am) %>%
tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n) %>%
tab_pivot() %>%
tab_caption("Multiple variable and multiple summary statistics")
Multiple variable and multiple summary statistics
|
|
#Total
|
|
Transmission
|
|
|
|
Automatic
|
Manual
|
Miles/(US) gallon
|
Mean
|
20.1
|
|
17.1
|
24.4
|
Std. dev.
|
6.0
|
|
3.8
|
6.2
|
Valid N
|
32.0
|
|
19.0
|
13.0
|
Displacement (cu.in.)
|
Mean
|
230.7
|
|
290.4
|
143.5
|
Std. dev.
|
123.9
|
|
110.2
|
87.2
|
Valid N
|
32.0
|
|
19.0
|
13.0
|
Gross horsepower
|
Mean
|
146.7
|
|
160.3
|
126.8
|
Std. dev.
|
68.6
|
|
53.9
|
84.1
|
Valid N
|
32.0
|
|
19.0
|
13.0
|
Weight (1000 lbs)
|
Mean
|
3.2
|
|
3.8
|
2.4
|
Std. dev.
|
1.0
|
|
0.8
|
0.6
|
Valid N
|
32.0
|
|
19.0
|
13.0
|
1/4 mile time
|
Mean
|
17.8
|
|
18.2
|
17.4
|
Std. dev.
|
1.8
|
|
1.8
|
1.8
|
Valid N
|
32.0
|
|
19.0
|
13.0
|
Multiple variable and multiple summary statistics - statistic lables
in columns
mtcars %>%
tab_cells(mpg, disp, hp, wt, qsec) %>%
tab_cols(total(), am) %>%
tab_stat_fun(Mean = w_mean, "Valid N" = w_n, method = list) %>%
tab_pivot() %>%
tab_caption("Multiple variable and multiple summary statistics - statistic lables in columns")
Multiple variable and multiple summary statistics - statistic lables in
columns
|
|
#Total
|
|
Transmission
|
|
Mean
|
|
Valid N
|
|
Automatic
|
|
Manual
|
|
|
|
|
|
Mean
|
Valid N
|
|
Mean
|
Valid N
|
Miles/(US) gallon
|
20.1
|
|
32
|
|
17.1
|
19
|
|
24.4
|
13
|
Displacement (cu.in.)
|
230.7
|
|
32
|
|
290.4
|
19
|
|
143.5
|
13
|
Gross horsepower
|
146.7
|
|
32
|
|
160.3
|
19
|
|
126.8
|
13
|
Weight (1000 lbs)
|
3.2
|
|
32
|
|
3.8
|
19
|
|
2.4
|
13
|
1/4 mile time
|
17.8
|
|
32
|
|
18.2
|
19
|
|
17.4
|
13
|
Linear regression by groups
mtcars %>%
tab_cells(sheet(mpg, disp, hp, wt, qsec)) %>%
tab_cols(total(), am) %>%
tab_stat_fun_df(
function(x){
frm = reformulate(".", response = as.name(names(x)[1]))
model = lm(frm, data = x)
sheet('Coef.' = coef(model),
confint(model)
)
}
) %>%
tab_pivot() %>%
tab_caption("Linear regression by groups")
Linear regression by groups
|
|
#Total
|
|
Transmission
|
|
Coef.
|
|
2.5 %
|
|
97.5 %
|
|
Automatic
|
|
Manual
|
|
|
|
|
|
|
|
Coef.
|
2.5 %
|
97.5 %
|
|
Coef.
|
2.5 %
|
97.5 %
|
(Intercept)
|
27.3
|
|
9.6
|
|
45.1
|
|
21.8
|
-1.9
|
45.5
|
|
13.3
|
-21.9
|
48.4
|
Displacement (cu.in.)
|
0.0
|
|
0.0
|
|
0.0
|
|
0.0
|
0.0
|
0.0
|
|
0.0
|
-0.1
|
0.1
|
Gross horsepower
|
0.0
|
|
-0.1
|
|
0.0
|
|
0.0
|
-0.1
|
0.0
|
|
0.0
|
0.0
|
0.1
|
Weight (1000 lbs)
|
-4.6
|
|
-7.2
|
|
-2.0
|
|
-2.3
|
-5.0
|
0.4
|
|
-7.7
|
-12.5
|
-2.9
|
1/4 mile time
|
0.5
|
|
-0.4
|
|
1.5
|
|
0.4
|
-0.7
|
1.6
|
|
1.6
|
-0.2
|
3.4
|
Subtotals at the bottom of the table
mtcars %>%
tab_cells(mpg, qsec) %>%
tab_cols(total(), vs) %>%
tab_rows(subtotal(cyl, 1:2, 3:4, "TOTAL 5 and more" = 5 %thru% hi, position = "bottom")) %>%
tab_stat_mean() %>%
tab_pivot() %>%
tab_caption("Subtotals at the bottom of the table")
Subtotals at the bottom of the table
|
|
|
|
#Total
|
|
Engine
|
|
|
|
|
|
|
V-engine
|
Straight engine
|
Number of cylinders
|
1
|
Miles/(US) gallon
|
Mean
|
|
|
|
|
|
|
1/4 mile time
|
Mean
|
|
|
|
|
|
2
|
Miles/(US) gallon
|
Mean
|
|
|
|
|
|
|
1/4 mile time
|
Mean
|
|
|
|
|
|
3
|
Miles/(US) gallon
|
Mean
|
|
|
|
|
|
|
1/4 mile time
|
Mean
|
|
|
|
|
|
4
|
Miles/(US) gallon
|
Mean
|
|
26.7
|
|
26.0
|
26.7
|
|
1/4 mile time
|
Mean
|
|
19.1
|
|
16.7
|
19.4
|
6
|
Miles/(US) gallon
|
Mean
|
|
19.7
|
|
20.6
|
19.1
|
|
1/4 mile time
|
Mean
|
|
18.0
|
|
16.3
|
19.2
|
8
|
Miles/(US) gallon
|
Mean
|
|
15.1
|
|
15.1
|
|
|
1/4 mile time
|
Mean
|
|
16.8
|
|
16.8
|
|
TOTAL 1/2
|
Miles/(US) gallon
|
Mean
|
|
|
|
|
|
|
1/4 mile time
|
Mean
|
|
|
|
|
|
TOTAL 3/4
|
Miles/(US) gallon
|
Mean
|
|
26.7
|
|
26.0
|
26.7
|
|
1/4 mile time
|
Mean
|
|
19.1
|
|
16.7
|
19.4
|
TOTAL 5 and more
|
Miles/(US) gallon
|
Mean
|
|
16.6
|
|
16.1
|
19.1
|
|
1/4 mile time
|
Mean
|
|
17.2
|
|
16.7
|
19.2
|
Nets
Net, contrary to subtotal
, remove original
categories.
mtcars %>%
tab_cells(mpg) %>%
tab_cols(total(), vs) %>%
tab_rows(net(cyl, 1:2, 3:4, "NET 5 and more" = 5 %thru% hi, prefix = "NET ")) %>%
tab_stat_mean() %>%
tab_pivot() %>%
tab_caption("Nets in rows, custom prefix")
Nets in rows, custom prefix
|
|
|
|
#Total
|
|
Engine
|
|
|
|
|
|
|
V-engine
|
Straight engine
|
Number of cylinders
|
NET 1/2
|
Miles/(US) gallon
|
Mean
|
|
|
|
|
|
NET 3/4
|
Miles/(US) gallon
|
Mean
|
|
26.7
|
|
26.0
|
26.7
|
NET 5 and more
|
Miles/(US) gallon
|
Mean
|
|
16.6
|
|
16.1
|
19.1
|
Significance testing on column percent
Letters marks cells which are significantly greater than cells in the
appropriate columns. -
and +
marks values
which are lower/greater than values in the first column. Significance
testing on column percent should be applied on the result of
tab_stat_cpct
with total row.
mtcars %>%
tab_cells(cyl) %>%
tab_cols(total(), vs) %>%
tab_stat_cpct() %>%
tab_pivot() %>%
significance_cpct(compare_type = c("first_column", "subtable"), sig_level = 0.05) %>%
tab_caption("Significance testing on column percent")
Significance testing on column percent
|
|
#Total
|
|
Engine
|
|
|
|
V-engine
|
|
Straight engine
|
|
|
|
A
|
|
B
|
Number of cylinders
|
4
|
34.4
|
|
5.6 -
|
|
71.4 + A
|
6
|
21.9
|
|
16.7
|
|
28.6
|
8
|
43.8
|
|
77.8 +
|
|
|
#Total cases
|
32
|
|
18
|
|
14
|
Significance testing on means
Significance testing on means should be applied on the result of
tab_stat_mean_sd_n
.
mtcars %>%
tab_cells(mpg, disp, hp, wt, qsec) %>%
tab_cols(total(), am) %>%
tab_stat_mean_sd_n() %>%
tab_pivot() %>%
significance_means(compare_type = c("first_column", "subtable")) %>%
tab_caption("Significance testing on means")
Significance testing on means
|
|
#Total
|
|
Transmission
|
|
|
|
Automatic
|
|
Manual
|
|
|
|
A
|
|
B
|
Miles/(US) gallon
|
Mean
|
20.1
|
|
17.1 -
|
|
24.4 + A
|
Std. dev.
|
6.0
|
|
3.8
|
|
6.2
|
Unw. valid N
|
32.0
|
|
19.0
|
|
13.0
|
Displacement (cu.in.)
|
Mean
|
230.7
|
|
290.4 B
|
|
143.5 -
|
Std. dev.
|
123.9
|
|
110.2
|
|
87.2
|
Unw. valid N
|
32.0
|
|
19.0
|
|
13.0
|
Gross horsepower
|
Mean
|
146.7
|
|
160.3
|
|
126.8
|
Std. dev.
|
68.6
|
|
53.9
|
|
84.1
|
Unw. valid N
|
32.0
|
|
19.0
|
|
13.0
|
Weight (1000 lbs)
|
Mean
|
3.2
|
|
3.8 + B
|
|
2.4 -
|
Std. dev.
|
1.0
|
|
0.8
|
|
0.6
|
Unw. valid N
|
32.0
|
|
19.0
|
|
13.0
|
1/4 mile time
|
Mean
|
17.8
|
|
18.2
|
|
17.4
|
Std. dev.
|
1.8
|
|
1.8
|
|
1.8
|
Unw. valid N
|
32.0
|
|
19.0
|
|
13.0
|
Multiple-response variables with weighting
Here we load data with multiple-responce questions.
mrset
means that we treat set of variables as multiple
response varibale with category encoding. For dichotomy encoding use
mdset
.
data(product_test)
codeframe_likes = num_lab("
1 Liked everything
2 Disliked everything
3 Chocolate
4 Appearance
5 Taste
6 Stuffing
7 Nuts
8 Consistency
98 Other
99 Hard to answer
")
set.seed(1)
product_test = product_test %>%
let(
# recode age by groups
age_cat = recode(s2a, lo %thru% 25 ~ 1, lo %thru% hi ~ 2),
wgt = runif(.N, 0.25, 4),
wgt = wgt/sum(wgt)*.N
) %>%
apply_labels(
age_cat = "Age",
age_cat = c("18 - 25" = 1, "26 - 35" = 2),
a1_1 = "Likes. VSX123",
b1_1 = "Likes. SDF456",
a1_1 = codeframe_likes,
b1_1 = codeframe_likes
)
product_test %>%
tab_cells(mrset(a1_1 %to% a1_6), mrset(b1_1 %to% b1_6)) %>%
tab_cols(total(), age_cat) %>%
tab_weight(wgt) %>%
tab_stat_cpct() %>%
tab_sort_desc() %>%
tab_pivot() %>%
tab_caption("Multiple-response variables with weighting")
Multiple-response variables with weighting
|
|
#Total
|
|
Age
|
|
|
|
18 - 25
|
26 - 35
|
Likes. VSX123
|
Nuts
|
68.0
|
|
82.5
|
53.7
|
Taste
|
39.3
|
|
48.4
|
30.3
|
Chocolate
|
34.5
|
|
31.9
|
37.0
|
Appearance
|
33.6
|
|
26.7
|
40.4
|
Stuffing
|
31.5
|
|
23.6
|
39.3
|
Consistency
|
12.9
|
|
7.7
|
18.0
|
Disliked everything
|
0.7
|
|
|
1.4
|
Liked everything
|
|
|
|
|
Other
|
|
|
|
|
Hard to answer
|
|
|
|
|
#Total cases
|
150
|
|
70
|
80
|
Likes. SDF456
|
Nuts
|
63.3
|
|
62.1
|
64.4
|
Taste
|
31.6
|
|
35.2
|
28.0
|
Appearance
|
30.3
|
|
28.8
|
31.8
|
Chocolate
|
27.2
|
|
27.9
|
26.4
|
Stuffing
|
21.5
|
|
14.3
|
28.7
|
Consistency
|
8.2
|
|
1.7
|
14.7
|
Disliked everything
|
2.9
|
|
3.3
|
2.5
|
Other
|
1.0
|
|
|
1.9
|
Liked everything
|
|
|
|
|
Hard to answer
|
|
|
|
|
#Total cases
|
150
|
|
70
|
80
|
Side-by-side variables comparison
To make side-by-side comparison we use “|” to suppress variable
labels and put these labels to the statistic labels. Statistics labels
we place in columns with tab_pivot
.
product_test %>%
tab_cols(total(), age_cat) %>%
tab_weight(wgt) %>%
# '|' is needed to prevent automatic labels creation from argument
tab_cells("|" = unvr(mrset(a1_1 %to% a1_6))) %>%
tab_stat_cpct(label = var_lab(a1_1)) %>%
tab_cells("|" = unvr(mrset(b1_1 %to% b1_6))) %>%
tab_stat_cpct(label = var_lab(b1_1)) %>%
tab_pivot(stat_position = "inside_columns") %>%
tab_caption("Side-by-side variables comparison")
Side-by-side variables comparison
|
|
#Total
|
|
Age
|
|
Likes. VSX123
|
|
Likes. SDF456
|
|
18 - 25
|
|
26 - 35
|
|
|
|
|
|
Likes. VSX123
|
Likes. SDF456
|
|
Likes. VSX123
|
Likes. SDF456
|
Liked everything
|
|
|
|
|
|
|
|
|
|
Disliked everything
|
0.7
|
|
2.9
|
|
|
3.3
|
|
1.4
|
2.5
|
Chocolate
|
34.5
|
|
27.2
|
|
31.9
|
27.9
|
|
37.0
|
26.4
|
Appearance
|
33.6
|
|
30.3
|
|
26.7
|
28.8
|
|
40.4
|
31.8
|
Taste
|
39.3
|
|
31.6
|
|
48.4
|
35.2
|
|
30.3
|
28.0
|
Stuffing
|
31.5
|
|
21.5
|
|
23.6
|
14.3
|
|
39.3
|
28.7
|
Nuts
|
68.0
|
|
63.3
|
|
82.5
|
62.1
|
|
53.7
|
64.4
|
Consistency
|
12.9
|
|
8.2
|
|
7.7
|
1.7
|
|
18.0
|
14.7
|
Other
|
|
|
1.0
|
|
|
|
|
|
1.9
|
Hard to answer
|
|
|
|
|
|
|
|
|
|
#Total cases
|
150
|
|
150
|
|
70
|
70
|
|
80
|
80
|
Multiple tables in the loop with knitr
To make the task more practical we will create table with means for
variables which have more than 6 unique elements. For other variables we
will calculate column percent table. Note that you need to set
results='asis'
in the chunk options.
# here we specify dataset and banner
banner = mtcars %>%
tab_cols(total(), am)
for(each in colnames(mtcars)){
# note ..$ which is used for indirect reference to variable
# specify variable
curr_table = banner %>%
tab_cells(..$each)
# calculate statistics
if(length(unique(mtcars[[each]]))>6){
curr_table = curr_table %>%
tab_stat_mean_sd_n() %>%
tab_pivot() %>%
significance_means()
} else {
curr_table = curr_table %>%
tab_stat_cpct() %>%
tab_pivot() %>%
significance_cpct()
}
# finalize table
curr_table %>%
tab_caption("Variable name: ", each) %>%
htmlTable() %>%
print()
}
Variable name: mpg
|
|
#Total
|
|
Transmission
|
|
|
|
Automatic
|
|
Manual
|
|
|
|
A
|
|
B
|
Miles/(US) gallon
|
Mean
|
20.1
|
|
17.1
|
|
24.4 A
|
Std. dev.
|
6.0
|
|
3.8
|
|
6.2
|
Unw. valid N
|
32.0
|
|
19.0
|
|
13.0
|
Variable name: cyl
|
|
#Total
|
|
Transmission
|
|
|
|
Automatic
|
|
Manual
|
|
|
|
A
|
|
B
|
Number of cylinders
|
4
|
34.4
|
|
15.8
|
|
61.5 A
|
6
|
21.9
|
|
21.1
|
|
23.1
|
8
|
43.8
|
|
63.2 B
|
|
15.4
|
#Total cases
|
32
|
|
19
|
|
13
|
Variable name: disp
|
|
#Total
|
|
Transmission
|
|
|
|
Automatic
|
|
Manual
|
|
|
|
A
|
|
B
|
Displacement (cu.in.)
|
Mean
|
230.7
|
|
290.4 B
|
|
143.5
|
Std. dev.
|
123.9
|
|
110.2
|
|
87.2
|
Unw. valid N
|
32.0
|
|
19.0
|
|
13.0
|
Variable name: hp
|
|
#Total
|
|
Transmission
|
|
|
|
Automatic
|
|
Manual
|
|
|
|
A
|
|
B
|
Gross horsepower
|
Mean
|
146.7
|
|
160.3
|
|
126.8
|
Std. dev.
|
68.6
|
|
53.9
|
|
84.1
|
Unw. valid N
|
32.0
|
|
19.0
|
|
13.0
|
Variable name: drat
|
|
#Total
|
|
Transmission
|
|
|
|
Automatic
|
|
Manual
|
|
|
|
A
|
|
B
|
Rear axle ratio
|
Mean
|
3.6
|
|
3.3
|
|
4.0 A
|
Std. dev.
|
0.5
|
|
0.4
|
|
0.4
|
Unw. valid N
|
32.0
|
|
19.0
|
|
13.0
|
Variable name: wt
|
|
#Total
|
|
Transmission
|
|
|
|
Automatic
|
|
Manual
|
|
|
|
A
|
|
B
|
Weight (1000 lbs)
|
Mean
|
3.2
|
|
3.8 B
|
|
2.4
|
Std. dev.
|
1.0
|
|
0.8
|
|
0.6
|
Unw. valid N
|
32.0
|
|
19.0
|
|
13.0
|
Variable name: qsec
|
|
#Total
|
|
Transmission
|
|
|
|
Automatic
|
|
Manual
|
|
|
|
A
|
|
B
|
1/4 mile time
|
Mean
|
17.8
|
|
18.2
|
|
17.4
|
Std. dev.
|
1.8
|
|
1.8
|
|
1.8
|
Unw. valid N
|
32.0
|
|
19.0
|
|
13.0
|
Variable name: vs
|
|
#Total
|
|
Transmission
|
|
|
|
Automatic
|
|
Manual
|
|
|
|
A
|
|
B
|
Engine
|
V-engine
|
56.2
|
|
63.2
|
|
46.2
|
Straight engine
|
43.8
|
|
36.8
|
|
53.8
|
#Total cases
|
32
|
|
19
|
|
13
|
Variable name: am
|
|
#Total
|
|
Transmission
|
|
|
|
Automatic
|
|
Manual
|
|
|
|
A
|
|
B
|
Transmission
|
Automatic
|
59.4
|
|
100.0
|
|
|
Manual
|
40.6
|
|
|
|
100.0
|
#Total cases
|
32
|
|
19
|
|
13
|
Variable name: gear
|
|
#Total
|
|
Transmission
|
|
|
|
Automatic
|
|
Manual
|
|
|
|
A
|
|
B
|
Number of forward gears
|
3
|
46.9
|
|
78.9
|
|
|
4
|
37.5
|
|
21.1
|
|
61.5 A
|
5
|
15.6
|
|
|
|
38.5
|
#Total cases
|
32
|
|
19
|
|
13
|
Variable name: carb
|
|
#Total
|
|
Transmission
|
|
|
|
Automatic
|
|
Manual
|
|
|
|
A
|
|
B
|
Number of carburetors
|
1
|
21.9
|
|
15.8
|
|
30.8
|
2
|
31.2
|
|
31.6
|
|
30.8
|
3
|
9.4
|
|
15.8
|
|
|
4
|
31.2
|
|
36.8
|
|
23.1
|
6
|
3.1
|
|
|
|
7.7
|
8
|
3.1
|
|
|
|
7.7
|
#Total cases
|
32
|
|
19
|
|
13
|