Introduction

This exercise focuses on loading the raw data and cleaning/processing it for further analysis.

The raw data for this exercise come from the following source: McKay, Brian et al. (2020), Virulence-mediated infectiousness and activity trade-offs and their impact on transmission potential of patients infected with influenza, Dryad, Dataset, https://doi.org/10.5061/dryad.51c59zw4v.


Required Packages

The following R packages are required for this exercise:
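Judging from the namespace-qualified calls used in the rest of this exercise (`here::`, `dplyr::`, `skimr::`), the required packages appear to be here, dplyr, and skimr. That list is inferred from the code rather than taken from the original package list, so treat it as an assumption. A minimal sketch that checks whether they are installed and attaches dplyr (needed for the `%>%` pipe) might look like this:

```r
# Packages inferred from the calls in this exercise (an assumption, not the author's list).
required_packages <- c("here", "dplyr", "skimr")

# Check which packages are installed without attaching them.
is_installed <- vapply(required_packages, requireNamespace, logical(1), quietly = TRUE)
missing_packages <- required_packages[!is_installed]

if (length(missing_packages) > 0) {
  message("Install before continuing: ", paste(missing_packages, collapse = ", "))
} else {
  # dplyr is attached so that the %>% pipe used later is available.
  library(dplyr)
}
```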


Load Raw Data

Load the raw data downloaded from the DOI link provided above.

#path to data
#note the use of the here package (paths relative to the project root) rather than absolute paths
data_location <- here::here("data","flu","SympAct_Any_Pos.Rda")

#load data
#although the file has an .Rda extension, it was saved in RDS format,
#so we read it with the base R function readRDS()
#the usual load() function does not work here (the data is RDS, not RDA)
rawdata <- base::readRDS(data_location)

#take a look at the data
dplyr::glimpse(rawdata)
## Rows: 735
## Columns: 63
## $ DxName1           <fct> "Influenza like illness - Clinical Dx", "Acute tonsi~
## $ DxName2           <fct> NA, "Influenza like illness - Clinical Dx", "Acute p~
## $ DxName3           <fct> NA, NA, NA, NA, NA, NA, NA, NA, "Fever, unspecified"~
## $ DxName4           <fct> NA, NA, NA, NA, NA, NA, NA, NA, "Other fatigue", NA,~
## $ DxName5           <fct> NA, NA, NA, NA, NA, NA, NA, NA, "Headache", NA, NA, ~
## $ Unique.Visit      <chr> "340_17632125", "340_17794836", "342_17737773", "342~
## $ ActivityLevel     <int> 10, 6, 2, 2, 5, 3, 4, 0, 0, 5, 9, 1, 3, 6, 5, 2, 2, ~
## $ ActivityLevelF    <fct> 10, 6, 2, 2, 5, 3, 4, 0, 0, 5, 9, 1, 3, 6, 5, 2, 2, ~
## $ SwollenLymphNodes <fct> Yes, Yes, Yes, Yes, Yes, No, No, No, Yes, No, Yes, Y~
## $ ChestCongestion   <fct> No, Yes, Yes, Yes, No, No, No, Yes, Yes, Yes, Yes, Y~
## $ ChillsSweats      <fct> No, No, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, ~
## $ NasalCongestion   <fct> No, Yes, Yes, Yes, No, No, No, Yes, Yes, Yes, Yes, Y~
## $ CoughYN           <fct> Yes, Yes, No, Yes, No, Yes, Yes, Yes, Yes, Yes, No, ~
## $ Sneeze            <fct> No, No, Yes, Yes, No, Yes, No, Yes, No, No, No, No, ~
## $ Fatigue           <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye~
## $ SubjectiveFever   <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, Yes~
## $ Headache          <fct> Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes~
## $ Weakness          <fct> Mild, Severe, Severe, Severe, Moderate, Moderate, Mi~
## $ WeaknessYN        <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye~
## $ CoughIntensity    <fct> Severe, Severe, Mild, Moderate, None, Moderate, Seve~
## $ CoughYN2          <fct> Yes, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes, Yes, Yes~
## $ Myalgia           <fct> Mild, Severe, Severe, Severe, Mild, Moderate, Mild, ~
## $ MyalgiaYN         <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye~
## $ RunnyNose         <fct> No, No, Yes, Yes, No, No, Yes, Yes, Yes, Yes, No, No~
## $ AbPain            <fct> No, No, Yes, No, No, No, No, No, No, No, Yes, Yes, N~
## $ ChestPain         <fct> No, No, Yes, No, No, Yes, Yes, No, No, No, No, Yes, ~
## $ Diarrhea          <fct> No, No, No, No, No, Yes, No, No, No, No, No, No, No,~
## $ EyePn             <fct> No, No, No, No, Yes, No, No, No, No, No, Yes, No, Ye~
## $ Insomnia          <fct> No, No, Yes, Yes, Yes, No, No, Yes, Yes, Yes, Yes, Y~
## $ ItchyEye          <fct> No, No, No, No, No, No, No, No, No, No, No, No, Yes,~
## $ Nausea            <fct> No, No, Yes, Yes, Yes, Yes, No, No, Yes, Yes, Yes, Y~
## $ EarPn             <fct> No, Yes, No, Yes, No, No, No, No, No, No, No, Yes, Y~
## $ Hearing           <fct> No, Yes, No, No, No, No, No, No, No, No, No, No, No,~
## $ Pharyngitis       <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, No, No, Yes, ~
## $ Breathless        <fct> No, No, Yes, No, No, Yes, No, No, No, Yes, No, Yes, ~
## $ ToothPn           <fct> No, No, Yes, No, No, No, No, No, Yes, No, No, Yes, N~
## $ Vision            <fct> No, No, No, No, No, No, No, No, No, No, No, No, No, ~
## $ Vomit             <fct> No, No, No, No, No, No, Yes, No, No, No, Yes, Yes, N~
## $ Wheeze            <fct> No, No, No, Yes, No, Yes, No, No, No, No, No, Yes, N~
## $ BodyTemp          <dbl> 98.3, 100.4, 100.8, 98.8, 100.5, 98.4, 102.5, 98.4, ~
## $ RapidFluA         <fct> Presumptive Negative For Influenza A, NA, Presumptiv~
## $ RapidFluB         <fct> Presumptive Negative For Influenza B, NA, Presumptiv~
## $ PCRFluA           <fct> NA, NA, NA, NA, NA, NA,  Influenza A Not Detected, N~
## $ PCRFluB           <fct> NA, NA, NA, NA, NA, NA,  Influenza B Not Detected, N~
## $ TransScore1       <dbl> 1, 3, 4, 5, 0, 2, 2, 5, 4, 4, 2, 3, 2, 5, 3, 5, 1, 5~
## $ TransScore1F      <fct> 1, 3, 4, 5, 0, 2, 2, 5, 4, 4, 2, 3, 2, 5, 3, 5, 1, 5~
## $ TransScore2       <dbl> 1, 2, 3, 4, 0, 2, 2, 4, 3, 3, 1, 2, 2, 4, 2, 4, 1, 4~
## $ TransScore2F      <fct> 1, 2, 3, 4, 0, 2, 2, 4, 3, 3, 1, 2, 2, 4, 2, 4, 1, 4~
## $ TransScore3       <dbl> 1, 1, 2, 3, 0, 2, 2, 3, 2, 2, 0, 1, 1, 3, 1, 3, 1, 3~
## $ TransScore3F      <fct> 1, 1, 2, 3, 0, 2, 2, 3, 2, 2, 0, 1, 1, 3, 1, 3, 1, 3~
## $ TransScore4       <dbl> 0, 2, 4, 4, 0, 1, 1, 4, 3, 3, 2, 2, 2, 4, 3, 4, 0, 4~
## $ TransScore4F      <fct> 0, 2, 4, 4, 0, 1, 1, 4, 3, 3, 2, 2, 2, 4, 3, 4, 0, 4~
## $ ImpactScore       <int> 7, 8, 14, 12, 11, 12, 8, 7, 10, 7, 13, 17, 11, 13, 9~
## $ ImpactScore2      <int> 6, 7, 13, 11, 10, 11, 7, 6, 9, 6, 12, 16, 10, 12, 8,~
## $ ImpactScore3      <int> 3, 4, 9, 7, 6, 7, 3, 3, 6, 4, 7, 11, 6, 8, 4, 4, 5, ~
## $ ImpactScoreF      <fct> 7, 8, 14, 12, 11, 12, 8, 7, 10, 7, 13, 17, 11, 13, 9~
## $ ImpactScore2F     <fct> 6, 7, 13, 11, 10, 11, 7, 6, 9, 6, 12, 16, 10, 12, 8,~
## $ ImpactScore3F     <fct> 3, 4, 9, 7, 6, 7, 3, 3, 6, 4, 7, 11, 6, 8, 4, 4, 5, ~
## $ ImpactScoreFD     <fct> 7, 8, 14, 12, 11, 12, 8, 7, 10, 7, 13, 17, 11, 13, 9~
## $ TotalSymp1        <dbl> 8, 11, 18, 17, 11, 14, 10, 12, 14, 11, 15, 20, 13, 1~
## $ TotalSymp1F       <fct> 8, 11, 18, 17, 11, 14, 10, 12, 14, 11, 15, 20, 13, 1~
## $ TotalSymp2        <dbl> 8, 10, 17, 16, 11, 14, 10, 11, 13, 10, 14, 19, 13, 1~
## $ TotalSymp3        <dbl> 8, 9, 16, 15, 11, 14, 10, 10, 12, 9, 13, 18, 12, 16,~
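
The file-format point from the loading step can be checked in isolation. The sketch below uses a hypothetical three-row data frame and a temporary file (neither is part of the exercise data) to mimic a file that carries an .Rda extension but was written with saveRDS(). Under those assumptions, readRDS() should restore the object, while load() should fail because it expects a file written by save().

```r
# Hypothetical stand-in for the real file: one object saved in RDS format
# but given an .Rda extension.
example_df <- data.frame(id = 1:3, temp = c(98.3, 100.4, 100.8))
tmp_file <- tempfile(fileext = ".Rda")
saveRDS(example_df, file = tmp_file)

# readRDS() returns the object itself, so it can be assigned to any name.
restored <- readRDS(tmp_file)

# load() expects a file written by save(); the error is caught and recorded.
load_outcome <- tryCatch(
  suppressWarnings(load(tmp_file)),
  error = function(e) "load() failed"
)
```

The file extension is only a naming convention here: what matters is which function wrote the file.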

Overall Processing

The first step is to conduct some overall processing to create a dataset that will be used in most of the analysis:

#this can be accomplished using the select() function in dplyr / tidyverse

#while we could combine these steps into a single piped operation, keeping each step separate makes debugging easier

#remove variables containing "Score"
data1 <- rawdata %>% dplyr::select(-contains("Score"))

#remove variables containing "Total"
data2 <- data1 %>% dplyr::select(-contains("Total"))

#remove variables containing "FluA"
data3 <- data2 %>% dplyr::select(-contains("FluA"))

#remove variables containing "FluB"
data4 <- data3 %>% dplyr::select(-contains("FluB"))

#remove variables containing "DxName"
#(contains() ignores case by default, but matching the actual column names is clearer)
data5 <- data4 %>% dplyr::select(-contains("DxName"))

#remove variables containing "Activity"
data6 <- data5 %>% dplyr::select(-contains("Activity"))

#remove variable "Unique.Visit"
data7 <- data6 %>% dplyr::select(-contains("Unique.Visit"))

#check to make sure we have the correct columns remaining
dplyr::glimpse(data7)
## Rows: 735
## Columns: 32
## $ SwollenLymphNodes <fct> Yes, Yes, Yes, Yes, Yes, No, No, No, Yes, No, Yes, Y~
## $ ChestCongestion   <fct> No, Yes, Yes, Yes, No, No, No, Yes, Yes, Yes, Yes, Y~
## $ ChillsSweats      <fct> No, No, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, ~
## $ NasalCongestion   <fct> No, Yes, Yes, Yes, No, No, No, Yes, Yes, Yes, Yes, Y~
## $ CoughYN           <fct> Yes, Yes, No, Yes, No, Yes, Yes, Yes, Yes, Yes, No, ~
## $ Sneeze            <fct> No, No, Yes, Yes, No, Yes, No, Yes, No, No, No, No, ~
## $ Fatigue           <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye~
## $ SubjectiveFever   <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, Yes~
## $ Headache          <fct> Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes~
## $ Weakness          <fct> Mild, Severe, Severe, Severe, Moderate, Moderate, Mi~
## $ WeaknessYN        <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye~
## $ CoughIntensity    <fct> Severe, Severe, Mild, Moderate, None, Moderate, Seve~
## $ CoughYN2          <fct> Yes, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes, Yes, Yes~
## $ Myalgia           <fct> Mild, Severe, Severe, Severe, Mild, Moderate, Mild, ~
## $ MyalgiaYN         <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye~
## $ RunnyNose         <fct> No, No, Yes, Yes, No, No, Yes, Yes, Yes, Yes, No, No~
## $ AbPain            <fct> No, No, Yes, No, No, No, No, No, No, No, Yes, Yes, N~
## $ ChestPain         <fct> No, No, Yes, No, No, Yes, Yes, No, No, No, No, Yes, ~
## $ Diarrhea          <fct> No, No, No, No, No, Yes, No, No, No, No, No, No, No,~
## $ EyePn             <fct> No, No, No, No, Yes, No, No, No, No, No, Yes, No, Ye~
## $ Insomnia          <fct> No, No, Yes, Yes, Yes, No, No, Yes, Yes, Yes, Yes, Y~
## $ ItchyEye          <fct> No, No, No, No, No, No, No, No, No, No, No, No, Yes,~
## $ Nausea            <fct> No, No, Yes, Yes, Yes, Yes, No, No, Yes, Yes, Yes, Y~
## $ EarPn             <fct> No, Yes, No, Yes, No, No, No, No, No, No, No, Yes, Y~
## $ Hearing           <fct> No, Yes, No, No, No, No, No, No, No, No, No, No, No,~
## $ Pharyngitis       <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, No, No, Yes, ~
## $ Breathless        <fct> No, No, Yes, No, No, Yes, No, No, No, Yes, No, Yes, ~
## $ ToothPn           <fct> No, No, Yes, No, No, No, No, No, Yes, No, No, Yes, N~
## $ Vision            <fct> No, No, No, No, No, No, No, No, No, No, No, No, No, ~
## $ Vomit             <fct> No, No, No, No, No, No, Yes, No, No, No, Yes, Yes, N~
## $ Wheeze            <fct> No, No, No, Yes, No, Yes, No, No, No, No, No, Yes, N~
## $ BodyTemp          <dbl> 98.3, 100.4, 100.8, 98.8, 100.5, 98.4, 102.5, 98.4, ~
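
The seven removal steps above can also be written as a single pattern match. The following is a minimal base R sketch on a hypothetical two-row toy data frame: the values are made up and only the column names mirror the raw data. It is an alternative formulation for illustration, not the approach used in this exercise.

```r
# Hypothetical toy data: one column per removal pattern, plus two columns to keep.
toy <- data.frame(
  Unique.Visit  = c("a_1", "b_2"),
  ActivityLevel = c(10L, 6L),
  DxName1       = c("x", "y"),
  RapidFluA     = c("neg", NA),
  PCRFluB       = c(NA, "neg"),
  TransScore1   = c(1, 3),
  TotalSymp1    = c(8, 11),
  Fatigue       = c("Yes", "Yes"),
  BodyTemp      = c(98.3, 100.4)
)

# The same name fragments used in the select() calls, combined into one regex.
drop_patterns <- c("Score", "Total", "FluA", "FluB", "DxName", "Activity", "Unique.Visit")
drop_regex <- paste(drop_patterns, collapse = "|")

# Keep only the columns whose names match none of the fragments.
keep_cols <- !grepl(drop_regex, names(toy), ignore.case = TRUE)
toy_kept <- toy[, keep_cols, drop = FALSE]
names(toy_kept)
```

For the real data, the bookkeeping is 63 raw columns minus 31 removed (15 "Score", 4 "Total", 2 "FluA", 2 "FluB", 5 "DxName", 2 "Activity", 1 "Unique.Visit"), which leaves the 32 columns shown above.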
base::summary(data7)
##  SwollenLymphNodes ChestCongestion ChillsSweats NasalCongestion CoughYN  
##  No :421           No :326         No :131      No :170         No : 75  
##  Yes:314           Yes:409         Yes:604      Yes:565         Yes:660  
##                                                                          
##                                                                          
##                                                                          
##                                                                          
##                                                                          
##  Sneeze    Fatigue   SubjectiveFever Headache      Weakness   WeaknessYN
##  No :340   No : 64   No :230         No :115   None    : 49   No : 49   
##  Yes:395   Yes:671   Yes:505         Yes:620   Mild    :224   Yes:686   
##                                                Moderate:341             
##                                                Severe  :121             
##                                                                         
##                                                                         
##                                                                         
##   CoughIntensity CoughYN2      Myalgia    MyalgiaYN RunnyNose AbPain   
##  None    : 47    No : 47   None    : 79   No : 79   No :211   No :642  
##  Mild    :156    Yes:688   Mild    :214   Yes:656   Yes:524   Yes: 93  
##  Moderate:360              Moderate:327                                
##  Severe  :172              Severe  :115                                
##                                                                        
##                                                                        
##                                                                        
##  ChestPain Diarrhea  EyePn     Insomnia  ItchyEye  Nausea    EarPn    
##  No :501   No :636   No :622   No :316   No :553   No :477   No :573  
##  Yes:234   Yes: 99   Yes:113   Yes:419   Yes:182   Yes:258   Yes:162  
##                                                                       
##                                                                       
##                                                                       
##                                                                       
##                                                                       
##  Hearing   Pharyngitis Breathless ToothPn   Vision    Vomit     Wheeze   
##  No :705   No :121     No :438    No :569   No :716   No :656   No :514  
##  Yes: 30   Yes:614     Yes:297    Yes:166   Yes: 19   Yes: 79   Yes:221  
##                                                                          
##                                                                          
##                                                                          
##                                                                          
##                                                                          
##     BodyTemp     
##  Min.   : 97.20  
##  1st Qu.: 98.20  
##  Median : 98.50  
##  Mean   : 98.94  
##  3rd Qu.: 99.30  
##  Max.   :103.10  
##  NA's   :5
#last step is to remove observations (rows) with any missing values
#the summary above shows 5 NA's, all in BodyTemp
processed_data <- stats::na.omit(data7)

#summary of processed data using skimr package
skimr::skim(processed_data)
Data summary
Name processed_data
Number of rows 730
Number of columns 32
_______________________
Column type frequency:
factor 31
numeric 1
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
SwollenLymphNodes 0 1 FALSE 2 No: 418, Yes: 312
ChestCongestion 0 1 FALSE 2 Yes: 407, No: 323
ChillsSweats 0 1 FALSE 2 Yes: 600, No: 130
NasalCongestion 0 1 FALSE 2 Yes: 563, No: 167
CoughYN 0 1 FALSE 2 Yes: 655, No: 75
Sneeze 0 1 FALSE 2 Yes: 391, No: 339
Fatigue 0 1 FALSE 2 Yes: 666, No: 64
SubjectiveFever 0 1 FALSE 2 Yes: 500, No: 230
Headache 0 1 FALSE 2 Yes: 615, No: 115
Weakness 0 1 FALSE 4 Mod: 338, Mil: 223, Sev: 120, Non: 49
WeaknessYN 0 1 FALSE 2 Yes: 681, No: 49
CoughIntensity 0 1 FALSE 4 Mod: 357, Sev: 172, Mil: 154, Non: 47
CoughYN2 0 1 FALSE 2 Yes: 683, No: 47
Myalgia 0 1 FALSE 4 Mod: 325, Mil: 213, Sev: 113, Non: 79
MyalgiaYN 0 1 FALSE 2 Yes: 651, No: 79
RunnyNose 0 1 FALSE 2 Yes: 519, No: 211
AbPain 0 1 FALSE 2 No: 639, Yes: 91
ChestPain 0 1 FALSE 2 No: 497, Yes: 233
Diarrhea 0 1 FALSE 2 No: 631, Yes: 99
EyePn 0 1 FALSE 2 No: 617, Yes: 113
Insomnia 0 1 FALSE 2 Yes: 415, No: 315
ItchyEye 0 1 FALSE 2 No: 551, Yes: 179
Nausea 0 1 FALSE 2 No: 475, Yes: 255
EarPn 0 1 FALSE 2 No: 568, Yes: 162
Hearing 0 1 FALSE 2 No: 700, Yes: 30
Pharyngitis 0 1 FALSE 2 Yes: 611, No: 119
Breathless 0 1 FALSE 2 No: 436, Yes: 294
ToothPn 0 1 FALSE 2 No: 565, Yes: 165
Vision 0 1 FALSE 2 No: 711, Yes: 19
Vomit 0 1 FALSE 2 No: 652, Yes: 78
Wheeze 0 1 FALSE 2 No: 510, Yes: 220

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
BodyTemp 0 1 98.94 1.2 97.2 98.2 98.5 99.3 103.1 ▇▇▂▁▁

The processed data frame now has 730 observations (735 minus the 5 with a missing BodyTemp) and 32 variables, which was our goal.
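
The row count after na.omit() can be verified directly instead of being read off the summary. The sketch below uses a hypothetical four-row data frame (not the study data) and counts incomplete rows with stats::complete.cases() before dropping them:

```r
# Hypothetical toy data with one missing temperature.
toy <- data.frame(
  Fatigue  = factor(c("Yes", "No", "Yes", "Yes")),
  BodyTemp = c(98.3, NA, 100.4, 99.1)
)

# Count rows with at least one missing value before removing them.
n_incomplete <- sum(!stats::complete.cases(toy))

# Drop the incomplete rows, as in the processing step.
toy_complete <- stats::na.omit(toy)

# The row count should fall by exactly the number of incomplete rows.
stopifnot(nrow(toy_complete) == nrow(toy) - n_incomplete)
```

Applied to the real data, the same check would compare nrow(data7) and nrow(processed_data).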

Machine Learning Processing

The analysis that applies machine learning models requires further processing of the data. There are two steps involved: removing redundant feature variables and removing low ("near-zero") variance predictors.


Feature Variable Removal

In the output above, three symptoms have both a severity score and a yes/no variable: weakness, cough, and myalgia. Cough actually has two yes/no variables (CoughYN and CoughYN2). Each yes/no variable is strongly correlated with its severity score (for example, the 49 "No" entries in WeaknessYN match the 49 "None" entries in Weakness), and including such redundant predictors can hurt model performance. The solution is to remove every yes/no variable for which a severity score exists.

#variable names to remove: WeaknessYN, MyalgiaYN, CoughYN, CoughYN2
featadj_data <- dplyr::select(processed_data, -c(WeaknessYN, MyalgiaYN, CoughYN, CoughYN2))
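
One way to confirm that a yes/no variable adds nothing beyond its severity score is to cross-tabulate the two. The sketch below uses hypothetical values, with the yes/no column derived from the severity column by construction, so it only illustrates the check rather than proving anything about the study data:

```r
# Hypothetical severity scores and a yes/no flag derived from them.
severity <- factor(
  c("None", "Mild", "Severe", "Moderate", "None"),
  levels = c("None", "Mild", "Moderate", "Severe")
)
yes_no <- factor(ifelse(severity == "None", "No", "Yes"), levels = c("No", "Yes"))

# Cross-tabulate the two variables.
cross_tab <- table(severity, yes_no)

# If every severity level falls into a single yes/no category,
# the yes/no variable is fully determined by the severity score.
redundant <- all(rowSums(cross_tab > 0) <= 1)
redundant
```

On the real data, the equivalent call would be something like table(processed_data$Weakness, processed_data$WeaknessYN).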

These severity scores are also ordered, so we need to specify the order: None < Mild < Moderate < Severe.

#levels = matches the existing values by name; labels = would relabel them by position
#and would silently mislabel values if the stored level order ever differed

#myalgia
featadj_data$Myalgia <- ordered(featadj_data$Myalgia, levels = c("None", "Mild", "Moderate", "Severe"))

#weakness
featadj_data$Weakness <- ordered(featadj_data$Weakness, levels = c("None", "Mild", "Moderate", "Severe"))

#cough
featadj_data$CoughIntensity <- ordered(featadj_data$CoughIntensity, levels = c("None", "Mild", "Moderate", "Severe"))

#double check to confirm code worked
skimr::skim(featadj_data)
Data summary
Name featadj_data
Number of rows 730
Number of columns 28
_______________________
Column type frequency:
factor 27
numeric 1
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
SwollenLymphNodes 0 1 FALSE 2 No: 418, Yes: 312
ChestCongestion 0 1 FALSE 2 Yes: 407, No: 323
ChillsSweats 0 1 FALSE 2 Yes: 600, No: 130
NasalCongestion 0 1 FALSE 2 Yes: 563, No: 167
Sneeze 0 1 FALSE 2 Yes: 391, No: 339
Fatigue 0 1 FALSE 2 Yes: 666, No: 64
SubjectiveFever 0 1 FALSE 2 Yes: 500, No: 230
Headache 0 1 FALSE 2 Yes: 615, No: 115
Weakness 0 1 TRUE 4 Mod: 338, Mil: 223, Sev: 120, Non: 49
CoughIntensity 0 1 TRUE 4 Mod: 357, Sev: 172, Mil: 154, Non: 47
Myalgia 0 1 TRUE 4 Mod: 325, Mil: 213, Sev: 113, Non: 79
RunnyNose 0 1 FALSE 2 Yes: 519, No: 211
AbPain 0 1 FALSE 2 No: 639, Yes: 91
ChestPain 0 1 FALSE 2 No: 497, Yes: 233
Diarrhea 0 1 FALSE 2 No: 631, Yes: 99
EyePn 0 1 FALSE 2 No: 617, Yes: 113
Insomnia 0 1 FALSE 2 Yes: 415, No: 315
ItchyEye 0 1 FALSE 2 No: 551, Yes: 179
Nausea 0 1 FALSE 2 No: 475, Yes: 255
EarPn 0 1 FALSE 2 No: 568, Yes: 162
Hearing 0 1 FALSE 2 No: 700, Yes: 30
Pharyngitis 0 1 FALSE 2 Yes: 611, No: 119
Breathless 0 1 FALSE 2 No: 436, Yes: 294
ToothPn 0 1 FALSE 2 No: 565, Yes: 165
Vision 0 1 FALSE 2 No: 711, Yes: 19
Vomit 0 1 FALSE 2 No: 652, Yes: 78
Wheeze 0 1 FALSE 2 No: 510, Yes: 220

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
BodyTemp 0 1 98.94 1.2 97.2 98.2 98.5 99.3 103.1 ▇▇▂▁▁
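
When setting the order of the severity scores, the choice between the `levels` and `labels` arguments of ordered() matters. The sketch below uses a hypothetical factor whose stored levels are deliberately out of order (unlike the study data, where the summary output shows them already in the order None, Mild, Moderate, Severe) to show that `levels` matches values by name while `labels` renames them by position:

```r
# Hypothetical factor whose stored level order is scrambled.
x <- factor(
  c("Mild", "Severe", "None", "Moderate"),
  levels = c("Severe", "None", "Moderate", "Mild")
)
severity_order <- c("None", "Mild", "Moderate", "Severe")

# levels = matches existing values by name, so the data are unchanged.
by_levels <- ordered(x, levels = severity_order)

# labels = renames the existing levels by position, which changes the data here.
by_labels <- ordered(x, labels = severity_order)

data.frame(
  original  = as.character(x),
  by_levels = as.character(by_levels),
  by_labels = as.character(by_labels)
)
```

With `labels`, the result is correct only when the existing levels are already stored in the intended order, so `levels` is the more defensive choice.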


Low (“Near-Zero”) Variance Predictors

The skimr output shows that some binary predictors are highly unbalanced, with most patients reporting "No" and only a few reporting "Yes". This can be handled automatically in tidymodels with step_nzv(), but doing it manually can be preferable because it lets us check that each removal makes scientific sense. Here, we remove binary predictors that have fewer than 50 entries in one category. According to the skimr::skim() output, there are two: Hearing (30 "Yes") and Vision (19 "Yes").

#drop Hearing and Vision from the dataset to create processed dataset for ML analysis
ML_processed <- dplyr::select(featadj_data, -c(Hearing, Vision))

#summary of data using skimr package
skimr::skim(ML_processed)
Data summary
Name ML_processed
Number of rows 730
Number of columns 26
_______________________
Column type frequency:
factor 25
numeric 1
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
SwollenLymphNodes 0 1 FALSE 2 No: 418, Yes: 312
ChestCongestion 0 1 FALSE 2 Yes: 407, No: 323
ChillsSweats 0 1 FALSE 2 Yes: 600, No: 130
NasalCongestion 0 1 FALSE 2 Yes: 563, No: 167
Sneeze 0 1 FALSE 2 Yes: 391, No: 339
Fatigue 0 1 FALSE 2 Yes: 666, No: 64
SubjectiveFever 0 1 FALSE 2 Yes: 500, No: 230
Headache 0 1 FALSE 2 Yes: 615, No: 115
Weakness 0 1 TRUE 4 Mod: 338, Mil: 223, Sev: 120, Non: 49
CoughIntensity 0 1 TRUE 4 Mod: 357, Sev: 172, Mil: 154, Non: 47
Myalgia 0 1 TRUE 4 Mod: 325, Mil: 213, Sev: 113, Non: 79
RunnyNose 0 1 FALSE 2 Yes: 519, No: 211
AbPain 0 1 FALSE 2 No: 639, Yes: 91
ChestPain 0 1 FALSE 2 No: 497, Yes: 233
Diarrhea 0 1 FALSE 2 No: 631, Yes: 99
EyePn 0 1 FALSE 2 No: 617, Yes: 113
Insomnia 0 1 FALSE 2 Yes: 415, No: 315
ItchyEye 0 1 FALSE 2 No: 551, Yes: 179
Nausea 0 1 FALSE 2 No: 475, Yes: 255
EarPn 0 1 FALSE 2 No: 568, Yes: 162
Pharyngitis 0 1 FALSE 2 Yes: 611, No: 119
Breathless 0 1 FALSE 2 No: 436, Yes: 294
ToothPn 0 1 FALSE 2 No: 565, Yes: 165
Vomit 0 1 FALSE 2 No: 652, Yes: 78
Wheeze 0 1 FALSE 2 No: 510, Yes: 220

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
BodyTemp 0 1 98.94 1.2 97.2 98.2 98.5 99.3 103.1 ▇▇▂▁▁

We now have a newly processed dataframe with 730 observations and 26 variables to be used for the machine learning analysis.
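
The fewer-than-50 rule can also be applied programmatically instead of reading the counts from the skim output. The sketch below is a base R illustration on synthetic columns: the per-column counts are copied from the summary above, but the rows themselves are artificial and the helper names (`min_count`, `is_sparse_binary`) are hypothetical.

```r
# Synthetic columns reproducing the per-column counts reported by skim().
toy <- data.frame(
  Hearing  = factor(rep(c("No", "Yes"), times = c(700, 30))),
  Vision   = factor(rep(c("No", "Yes"), times = c(711, 19))),
  Vomit    = factor(rep(c("No", "Yes"), times = c(652, 78))),
  Weakness = factor(rep(c("None", "Mild", "Moderate", "Severe"),
                        times = c(49, 223, 338, 120))),
  BodyTemp = rep(98.6, 730)
)

# Threshold from the text: fewer than 50 entries in one category.
min_count <- 50

# Flag two-level factors whose rarer category falls below the threshold.
is_sparse_binary <- vapply(toy, function(col) {
  is.factor(col) && nlevels(col) == 2 && min(table(col)) < min_count
}, logical(1))

sparse_names <- names(toy)[is_sparse_binary]
sparse_names
```

Restricting the check to two-level factors follows the text, which targets binary predictors only; that is why Weakness is not flagged even though its "None" category has 49 entries.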


Save Processed Data

Save the processed data to be referenced in subsequent analyses.

#for the overall processed data:
# location to save file
save_data_location <- here::here("data","flu","processeddata.rds")

# save data as RDS
saveRDS(processed_data, file = save_data_location)

#for the machine learning processed data:
# location to save file
save_data_location2 <- here::here("data","flu","ML_data.rds")

# save data as RDS
saveRDS(ML_processed, file = save_data_location2)
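
One practical benefit of the RDS format is that R-specific attributes, such as the ordered factor levels set earlier, survive the round trip. The sketch below checks this with a hypothetical one-column data frame and a temporary file rather than the project paths:

```r
# Hypothetical data with an ordered severity factor.
severity_order <- c("None", "Mild", "Moderate", "Severe")
toy <- data.frame(Weakness = ordered(c("Mild", "None", "Severe"), levels = severity_order))

# Save to a temporary file and read it back.
tmp_file <- tempfile(fileext = ".rds")
saveRDS(toy, file = tmp_file)
reloaded <- readRDS(tmp_file)

# The factor should still be ordered, with the same level order.
is.ordered(reloaded$Weakness)
levels(reloaded$Weakness)
```

A plain-text format such as CSV would store only the text values, so the ordering would have to be specified again after reading.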