Session2a - Data wrangling with tidyverse – Data Carpentry: From Data Wrangling to Data Visualisation

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

### Usage and Adaptation of Data Carpentry Materials:

Most material found in this document has been adapted from [Data Carpentry](https://datacarpentry.org/r-socialsci/) materials, under the [Creative Commons Attribution license](https://creativecommons.org/licenses/by/4.0/). Minor amendments have been made to allow for compatibility.

### Exercise 0

Go ahead and load in tidyverse as usual.

```{r Tidyverse and Here Loading}
library(tidyverse)
```

-------------

For this workshop, we will need to import a tidy data set. We will continue to use the data set from this morning.

```{r DataImport}
interviews <- read_csv("https://raw.githubusercontent.com/datacarpentry/r-socialsci/main/episodes/data/SAFI_clean.csv")
```

### Exercise 1

Using pipes, subset the `interviews` data to include interviews where respondents were members of an irrigation association (`memb_assoc`) and retain only the columns `affect_conflicts`, `liv_count`, and `no_meals`. Save the result in an object called `interviews_restricted`.

```{r Subsetting Task}
# Keep association members only, then keep the three columns of interest
interviews_restricted <- interviews %>%
  filter(memb_assoc == "yes") %>%
  select(affect_conflicts, liv_count, no_meals)
```

-------------

### Exercise 2

Create a new dataframe from the `interviews` data that meets the following criteria: it contains only the `village` column and a new column called `total_meals`, whose value is the total number of meals served in the household per day on average (`no_membrs` times `no_meals`). Only the rows where `total_meals` is greater than 20 should be shown in the final dataframe. Save the result as `interviews_total_meals`.

**Hint**: think about how the commands should be ordered to produce this data frame.

```{r New Variable Task}
# Create the new column first, filter on it, then drop the other columns
interviews_total_meals <- interviews %>%
  mutate(total_meals = no_membrs * no_meals) %>%
  filter(total_meals > 20) %>%
  select(village, total_meals)
```

-------------

### Exercise 3

How many households in the survey have an average of two meals per day? Three meals per day? Are there any other numbers of meals represented?

```{r Counting Task}
interviews %>%
  count(no_meals)
```

-------------

### Exercise 4

Use `group_by()` and `summarise()` to find the mean, min, and max number of household members for each village. Also add the number of observations (hint: see `?n`).

```{r Summary Task}
interviews %>%
  group_by(village) %>%
  summarise(
    mean_no_membrs = mean(no_membrs),
    min_no_membrs = min(no_membrs),
    max_no_membrs = max(no_membrs),
    n = n()
  )
```

-------------

### Exercise 5

Create a new dataframe (named `interviews_months_lack_food`) that has one column for each month and records `TRUE` or `FALSE` for whether each interview respondent was lacking food in that month.

```{r Pivoting Task}
interviews_months_lack_food <- interviews %>%
  # Split the semicolon-separated months into one row per respondent and month
  separate_longer_delim(months_lack_food, delim = ";") %>%
  mutate(months_lack_food_logical = TRUE) %>%
  # Spread the months into columns; missing combinations become FALSE
  pivot_wider(names_from = months_lack_food,
              values_from = months_lack_food_logical,
              values_fill = list(months_lack_food_logical = FALSE))
```
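
To see what the reshaping steps in Exercise 5 do, it can help to run them on a very small table first. The sketch below uses a made-up tibble called `toy` (the values, the helper column `lacked`, and the object name `toy_wide` are illustrative assumptions, not part of the SAFI data) and applies the same split, flag, and widen pattern.

```r
library(tidyverse)

# Hypothetical toy data: three respondents and the months they lacked food
toy <- tibble(
  key_ID = 1:3,
  months_lack_food = c("Jan;Feb", "Feb", "none")
)

toy_wide <- toy %>%
  # One row per respondent and month
  separate_longer_delim(months_lack_food, delim = ";") %>%
  # Flag every month that was reported
  mutate(lacked = TRUE) %>%
  # One column per month; months a respondent did not report become FALSE
  pivot_wider(
    names_from = months_lack_food,
    values_from = lacked,
    values_fill = FALSE
  )

toy_wide
```

Each respondent ends up on a single row again, with one logical column per distinct value found in `months_lack_food` (here `Jan`, `Feb`, and `none`). Note that `separate_longer_delim()` is only available in recent versions of tidyr.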