---
title: "Examples"
date: "`r Sys.Date()`"
output: 
    html_document:
        toc: true
vignette: >
    %\VignetteIndexEntry{Examples}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{utf8}
---
  
### Introduction

First, we apply labels on the well-known `mtcars` dataset:
```{r, message=FALSE, warning=FALSE}
library(expss)
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (1000 lbs)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)

```
Table construction consists of at least of three functions chained with magrittr pipe operator: `%>%`. At first we need to specify variables for which statistics will be computed with `tab_cells`. Secondary, we calculate statistics with one of `tab_stat_*` functions. And last, we finalize table creation with `tab_pivot`: `dataset %>% tab_cells(variable) %>% tab_stat_cases() %>% tab_pivot()`. We can split our statistics by columns with `tab_cols` or by rows with `tab_rows`. After that we can sort table with `tab_sort_asc`, drop empty rows/columns with `drop_rc` and transpose with `tab_transpose`. Generally, resulting table is just a data.frame so we can use arbitrary operations on it. Statistic is always calculated on the last cells, column/row variables, weight, missing values and subgroup. To define new cell/column/row variables we can call appropriate function one more time. `tab_pivot` defines how we combine different statistics and where statistic labels will appear - inside/outside rows/columns. 

### Simple column percent
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(vs) %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    tab_caption("Simple column percent")

```
### Split by columns and rows
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(vs) %>% 
    tab_rows(am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    tab_caption("Split by columns and rows")

```
### Multiple banners, table is sorted by total
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs, am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    tab_sort_desc() %>% 
    tab_caption("Multiple banners, table is sorted by total")

```
### Nested banners
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs %nest% am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    tab_caption("Nested banners")

```
### Multiple nested banners
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cells(carb) %>% 
    tab_cols(total(), list(cyl, vs) %nest% am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    tab_caption("Multiple nested banners")

```
### Multiple variable and multiple summary statistics
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(total(), am) %>% 
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n) %>%
    tab_pivot() %>% 
    tab_caption("Multiple variable and multiple summary statistics")
```
### Multiple variable and multiple summary statistics - statistic lables in columns
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(total(), am) %>% 
    tab_stat_fun(Mean = w_mean, "Valid N" = w_n, method = list) %>%
    tab_pivot() %>% 
    tab_caption("Multiple variable and multiple summary statistics - statistic lables in columns")
```
### Filter dataset and exclude empty columns
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_subgroup(am == 0) %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs %nest% am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    drop_empty_columns() %>%
    tab_caption("Filter dataset and exclude empty columns")
```
### Total at the top of the table
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs) %>% 
    tab_rows(am) %>% 
    tab_stat_cpct(total_row_position = "above",
                  total_label = c("number of cases", "row %"),
                  total_statistic = c("u_cases", "u_rpct")) %>% 
    tab_pivot() %>% 
    tab_caption("Total at the top of the table")
```
### Three different statistics in each cell - stat. labels in rows
```{r, message=FALSE, warning=FALSE}
mtcars %>%
    tab_cells(am) %>%
    tab_cols(total(), vs) %>%
    tab_total_row_position("none") %>% 
    tab_stat_cpct(label = "col %") %>%
    tab_stat_rpct(label = "row %") %>%
    tab_stat_tpct(label = "table %") %>%
    tab_pivot(stat_position = "inside_rows") %>% 
    tab_caption("Three different statistics in each cell - stat. labels in rows")
```
### Three different statistics in each cell - stat. labels in columns
```{r, message=FALSE, warning=FALSE}
mtcars %>%
    tab_cells(am) %>%
    tab_cols(total(), vs) %>%
    tab_total_row_position("none") %>% 
    tab_stat_cpct(label = "col %") %>%
    tab_stat_rpct(label = "row %") %>%
    tab_stat_tpct(label = "table %") %>%
    tab_pivot(stat_position = "inside_columns") %>% 
    tab_caption("Three different statistics in each cell - stat. labels in columns")
```
### Stacked statistics
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), am) %>% 
    tab_stat_mean() %>%
    tab_stat_se() %>% 
    tab_stat_valid_n() %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    tab_caption("Stacked statistics")

```
### Stacked statistics with section headings
```{r, message=FALSE, warning=FALSE}   
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), am) %>% 
    tab_row_label("#Summary statistics") %>% 
    tab_stat_mean() %>%
    tab_stat_se() %>% 
    tab_stat_valid_n() %>% 
    tab_row_label("#Column percent") %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    tab_caption("Stacked statistics with section headings")
```
### Stacked statistics - different statistics for different variables
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cols(total(), am) %>% 
    tab_cells(mpg, hp, qsec) %>% 
    tab_stat_mean() %>%
    tab_cells(cyl, carb) %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    tab_caption("Stacked statistics - different statistics for different variables")
```
### Linear regression by groups
```{r, message=FALSE, warning=FALSE}    
mtcars %>% 
    tab_cells(sheet(mpg, disp, hp, wt, qsec)) %>% 
    tab_cols(total(), am) %>% 
    tab_stat_fun_df(
        function(x){
            frm = reformulate(".", response = as.name(names(x)[1]))
            model = lm(frm, data = x)
            sheet('Coef.' = coef(model), 
                  confint(model)
            )
        }    
    ) %>% 
    tab_pivot() %>% 
    tab_caption("Linear regression by groups")
```

### Subtotals
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cells(mpg) %>% 
    tab_cols(total(), vs) %>% 
    tab_rows(subtotal(cyl, 1:2, 3:4, "5 and more" = 5 %thru% hi)) %>% 
    tab_stat_mean() %>% 
    tab_pivot() %>% 
    tab_caption("Subtotals in rows")
```

### Subtotals at the bottom of the table
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cells(mpg, qsec) %>% 
    tab_cols(total(), vs) %>% 
    tab_rows(subtotal(cyl, 1:2, 3:4, "TOTAL 5 and more" = 5 %thru% hi, position = "bottom")) %>% 
    tab_stat_mean() %>% 
    tab_pivot() %>% 
    tab_caption("Subtotals at the bottom of the table")
```

### Nets

Net, contrary to `subtotal`, remove original categories.
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cells(mpg) %>% 
    tab_cols(total(), vs) %>% 
    tab_rows(net(cyl, 1:2, 3:4, "NET 5 and more" = 5 %thru% hi, prefix = "NET ")) %>% 
    tab_stat_mean() %>% 
    tab_pivot() %>% 
    tab_caption("Nets in rows, custom prefix")
```
### Nets with complex grouping

```{r, message=FALSE, warning=FALSE} 
mtcars %>% 
    tab_cells(net(mpg, "Low mpg" = less(mean(mpg)), "High mpg" = greater_or_equal(mean(mpg)))) %>% 
    tab_cols(total(), am) %>% 
    tab_stat_cases() %>% 
    tab_pivot() %>% 
    tab_caption("Nets with complex grouping")
```

### Significance testing on column percent

Letters marks cells which are significantly greater than cells in the appropriate columns. `-` and `+` marks values which are lower/greater than values in the first column. Significance testing on column percent should be applied on the result of `tab_stat_cpct` with total row.

```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs) %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    significance_cpct(compare_type = c("first_column", "subtable"), sig_level = 0.05) %>% 
    tab_caption("Significance testing on column percent")
```

### Significance testing on means

Significance testing on means should be applied on the result of `tab_stat_mean_sd_n`.
```{r, message=FALSE, warning=FALSE}
mtcars %>% 
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(total(), am) %>% 
    tab_stat_mean_sd_n() %>%
    tab_pivot() %>% 
    significance_means(compare_type = c("first_column", "subtable")) %>% 
    tab_caption("Significance testing on means")
```

### Multiple-response variables with weighting

Here we load data with multiple-responce questions. `mrset` means that we treat set of variables as multiple response varibale with category encoding. For dichotomy encoding use `mdset`.
```{r, message=FALSE, warning=FALSE}

data(product_test)
codeframe_likes = num_lab("
                          1 Liked everything
                          2 Disliked everything
                          3 Chocolate
                          4 Appearance
                          5 Taste
                          6 Stuffing
                          7 Nuts
                          8 Consistency
                          98 Other
                          99 Hard to answer
                          ")

set.seed(1)
product_test = product_test %>% 
    let( 
        # recode age by groups
        age_cat = recode(s2a, lo %thru% 25 ~ 1, lo %thru% hi ~ 2),
        wgt = runif(.N, 0.25, 4),
        wgt = wgt/sum(wgt)*.N
    )  %>% 
    apply_labels(
        age_cat = "Age",
        age_cat = c("18 - 25" = 1, "26 - 35" = 2),
        a1_1 = "Likes. VSX123",
        b1_1 = "Likes. SDF456",
        a1_1 = codeframe_likes,
        b1_1 = codeframe_likes
    )

product_test %>% 
    tab_cells(mrset(a1_1 %to% a1_6), mrset(b1_1 %to% b1_6)) %>% 
    tab_cols(total(), age_cat) %>% 
    tab_weight(wgt) %>% 
    tab_stat_cpct() %>% 
    tab_sort_desc() %>% 
    tab_pivot() %>% 
    tab_caption("Multiple-response variables with weighting")
```

### Side-by-side variables comparison

To make side-by-side comparison we use "|" to suppress variable labels and put these labels to the statistic labels.
Statistics labels we place in columns with `tab_pivot`.

```{r, message=FALSE, warning=FALSE}
product_test %>% 
    tab_cols(total(), age_cat) %>% 
    tab_weight(wgt) %>% 
    # '|' is needed to prevent automatic labels creation from argument
    tab_cells("|" = unvr(mrset(a1_1 %to% a1_6))) %>% 
    tab_stat_cpct(label = var_lab(a1_1)) %>% 
    tab_cells("|" = unvr(mrset(b1_1 %to% b1_6))) %>% 
    tab_stat_cpct(label = var_lab(b1_1)) %>% 
    tab_pivot(stat_position = "inside_columns") %>% 
    tab_caption("Side-by-side variables comparison")
```

### Multiple tables in the loop with knitr

To make the task more practical we will create table with means for variables which have more than 6 unique elements. For other variables we will calculate column percent table. **Note that you need to set `results='asis'` in the chunk options.**

```{r, message=FALSE, warning=FALSE, results='asis'}

# here we specify dataset and banner
banner = mtcars %>%
    tab_cols(total(), am)

for(each in colnames(mtcars)){
    # note ..$ which is used for indirect reference to variable
    # specify variable
    curr_table = banner %>% 
        tab_cells(..$each)
    # calculate statistics
    if(length(unique(mtcars[[each]]))>6){
        curr_table = curr_table %>% 
            tab_stat_mean_sd_n() %>% 
            tab_pivot() %>% 
            significance_means()
    } else {
        curr_table = curr_table %>% 
            tab_stat_cpct() %>% 
            tab_pivot() %>% 
            significance_cpct()
    }
    # finalize table
    curr_table %>% 
        tab_caption("Variable name: ", each) %>% 
        htmlTable() %>% 
        print()
}


```