Introduction to R and RStudio

Session - Showing more {dplyr} functions

Zoë Turner

More {dplyr}

The following functions are useful for manipulating data. Each is shown with a short example of what it can do.

select()

Columns can be selected by name

beds_data |> 
  select(org_code, 
         org_name)
# A tibble: 4,558 × 2
   org_code org_name                                    
   <chr>    <chr>                                       
 1 R1A      Worcestershire Health And Care              
 2 R1C      Solent                                      
 3 R1E      Staffordshire And Stoke On Trent Partnership
 4 R1F      Isle Of Wight                               
 5 R1H      Barts Health                                
 6 R1J      Gloucestershire Care Services               
 7 RA2      Royal Surrey County Hospital                
 8 RA3      Weston Area Health                          
 9 RA4      Yeovil District Hospital                    
10 RA7      University Hospitals Bristol                
# ℹ 4,548 more rows

Or by position (including a range from:to)

beds_data |> 
  select(3:5)
# A tibble: 4,558 × 3
   org_name                                     beds_av occ_av
   <chr>                                          <dbl>  <dbl>
 1 Worcestershire Health And Care                   129    117
 2 Solent                                           105     82
 3 Staffordshire And Stoke On Trent Partnership      NA     NA
 4 Isle Of Wight                                     54     42
 5 Barts Health                                      NA     NA
 6 Gloucestershire Care Services                     NA     NA
 7 Royal Surrey County Hospital                      NA     NA
 8 Weston Area Health                                NA     NA
 9 Yeovil District Hospital                          NA     NA
10 University Hospitals Bristol                      NA     NA
# ℹ 4,548 more rows
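
The from:to range also works with column names, which is often easier to read than positions. A minimal sketch, assuming a small made-up tibble (toy_beds is hypothetical and only mirrors the column names and first rows shown above):

```r
library(dplyr)

# Hypothetical toy data with the same column order as beds_data
toy_beds <- tibble(
  date     = as.Date(c("2013-09-01", "2013-09-01")),
  org_code = c("R1A", "R1C"),
  org_name = c("Worcestershire Health And Care", "Solent"),
  beds_av  = c(129, 105),
  occ_av   = c(117, 82)
)

# Select the third to fifth columns by position
by_position <- toy_beds |>
  select(3:5)

# Select the same columns as a range of names
by_name <- toy_beds |>
  select(org_name:occ_av)

names(by_name)
```

Selecting by name keeps working if columns are later added or reordered, whereas positions may then point at different columns.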

Deselecting

beds_data |> 
  select(-org_code)
# A tibble: 4,558 × 4
   date       org_name                                     beds_av occ_av
   <date>     <chr>                                          <dbl>  <dbl>
 1 2013-09-01 Worcestershire Health And Care                   129    117
 2 2013-09-01 Solent                                           105     82
 3 2013-09-01 Staffordshire And Stoke On Trent Partnership      NA     NA
 4 2013-09-01 Isle Of Wight                                     54     42
 5 2013-09-01 Barts Health                                      NA     NA
 6 2013-09-01 Gloucestershire Care Services                     NA     NA
 7 2013-09-01 Royal Surrey County Hospital                      NA     NA
 8 2013-09-01 Weston Area Health                                NA     NA
 9 2013-09-01 Yeovil District Hospital                          NA     NA
10 2013-09-01 University Hospitals Bristol                      NA     NA
# ℹ 4,548 more rows
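
Several columns can be deselected at once by wrapping them in c(), and the minus sign can be combined with selection helpers. A minimal sketch, assuming a made-up tibble (toy_beds is hypothetical):

```r
library(dplyr)

# Hypothetical toy data with the same column names as beds_data
toy_beds <- tibble(
  date     = as.Date(c("2013-09-01", "2013-09-01")),
  org_code = c("R1A", "R1C"),
  org_name = c("Worcestershire Health And Care", "Solent"),
  beds_av  = c(129, 105),
  occ_av   = c(117, 82)
)

# Drop two named columns in one go
no_date_code <- toy_beds |>
  select(-c(date, org_code))

# Drop every column whose name starts with "org"
no_org <- toy_beds |>
  select(-starts_with("org"))

names(no_date_code)
names(no_org)
```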

Select everything()

Re-position a column and then refer to everything else

beds_data |> 
  select(org_name,
         everything())
# A tibble: 4,558 × 5
   org_name                                   date       org_code beds_av occ_av
   <chr>                                      <date>     <chr>      <dbl>  <dbl>
 1 Worcestershire Health And Care             2013-09-01 R1A          129    117
 2 Solent                                     2013-09-01 R1C          105     82
 3 Staffordshire And Stoke On Trent Partners… 2013-09-01 R1E           NA     NA
 4 Isle Of Wight                              2013-09-01 R1F           54     42
 5 Barts Health                               2013-09-01 R1H           NA     NA
 6 Gloucestershire Care Services              2013-09-01 R1J           NA     NA
 7 Royal Surrey County Hospital               2013-09-01 RA2           NA     NA
 8 Weston Area Health                         2013-09-01 RA3           NA     NA
 9 Yeovil District Hospital                   2013-09-01 RA4           NA     NA
10 University Hospitals Bristol               2013-09-01 RA7           NA     NA
# ℹ 4,548 more rows
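
{dplyr} also has relocate(), which moves columns without needing to mention the rest. The sketch below compares it with the everything() approach on a made-up tibble (toy_beds is hypothetical):

```r
library(dplyr)

# Hypothetical toy data with the same column names as beds_data
toy_beds <- tibble(
  date     = as.Date(c("2013-09-01", "2013-09-01")),
  org_code = c("R1A", "R1C"),
  org_name = c("Worcestershire Health And Care", "Solent"),
  beds_av  = c(129, 105),
  occ_av   = c(117, 82)
)

# Move org_name to the front, then keep everything else in its original order
with_everything <- toy_beds |>
  select(org_name,
         everything())

# relocate() moves the named column to the front by default
with_relocate <- toy_beds |>
  relocate(org_name)

names(with_relocate)
```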

Select starts_with()

Select columns whose names start with the same text

beds_data |> 
  select(starts_with("org"))
# A tibble: 4,558 × 2
   org_code org_name                                    
   <chr>    <chr>                                       
 1 R1A      Worcestershire Health And Care              
 2 R1C      Solent                                      
 3 R1E      Staffordshire And Stoke On Trent Partnership
 4 R1F      Isle Of Wight                               
 5 R1H      Barts Health                                
 6 R1J      Gloucestershire Care Services               
 7 RA2      Royal Surrey County Hospital                
 8 RA3      Weston Area Health                          
 9 RA4      Yeovil District Hospital                    
10 RA7      University Hospitals Bristol                
# ℹ 4,548 more rows

ends_with() works the same way for text at the end of column names
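
A minimal sketch of ends_with(), assuming a made-up tibble (toy_beds is hypothetical and only mirrors the column names of beds_data):

```r
library(dplyr)

# Hypothetical toy data with the same column names as beds_data
toy_beds <- tibble(
  date     = as.Date(c("2013-09-01", "2013-09-01")),
  org_code = c("R1A", "R1C"),
  org_name = c("Worcestershire Health And Care", "Solent"),
  beds_av  = c(129, 105),
  occ_av   = c(117, 82)
)

# Keep only the columns whose names end in "_av"
av_cols <- toy_beds |>
  select(ends_with("_av"))

names(av_cols)
```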

contains()

Searches for a string anywhere in the column names, without the need for SQL-style %wildcards%

beds_data |> 
  select(contains("s_a"))
# A tibble: 4,558 × 1
   beds_av
     <dbl>
 1     129
 2     105
 3      NA
 4      54
 5      NA
 6      NA
 7      NA
 8      NA
 9      NA
10      NA
# ℹ 4,548 more rows
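
Only beds_av is returned above because "s_a" does not appear in occ_av. The sketch below, on a made-up tibble (toy_beds is hypothetical), shows how the search string changes the result and that matching ignores case by default:

```r
library(dplyr)

# Hypothetical toy data with the same column names as beds_data
toy_beds <- tibble(
  date     = as.Date(c("2013-09-01", "2013-09-01")),
  org_code = c("R1A", "R1C"),
  org_name = c("Worcestershire Health And Care", "Solent"),
  beds_av  = c(129, 105),
  occ_av   = c(117, 82)
)

# "s_a" only appears in beds_av, so one column is returned
narrow <- toy_beds |>
  select(contains("s_a"))

# "_av" appears in both beds_av and occ_av
wider <- toy_beds |>
  select(contains("_av"))

# contains() has ignore.case = TRUE by default, so "ORG" still matches
upper <- toy_beds |>
  select(contains("ORG"))

names(narrow)
names(wider)
names(upper)
```

For pattern matching beyond a fixed string, matches() accepts a regular expression instead.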

Using n() and n_distinct()

beds_data |> 
  summarise(number = n(), # number of rows for each org_code
            distinct_number = n_distinct(org_name), # distinct number of org_name
            .by = org_code) |> 
  filter(distinct_number > 1) |> 
  arrange(desc(distinct_number))
# A tibble: 22 × 3
   org_code number distinct_number
   <chr>     <int>           <int>
 1 RDE          21               3
 2 RTG          21               3
 3 RA9          21               2
 4 RBN          21               2
 5 RD1          21               2
 6 RD8          21               2
 7 RDU          21               2
 8 RGM          21               2
 9 RGN          21               2
10 RJ7          21               2
# ℹ 12 more rows
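
The difference between n() and n_distinct() is easier to see on a few rows. A minimal sketch, assuming entirely made-up codes and names (toy_orgs, "AAA", "BBB", "Name A" and "Name B" are hypothetical), and dplyr 1.1.0 or later for the .by argument:

```r
library(dplyr)

# Hypothetical data: code AAA appears under two different names, BBB under one
toy_orgs <- tibble(
  org_code = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  org_name = c("Name A", "Name A", "Name B", "Name C", "Name C")
)

name_counts <- toy_orgs |>
  summarise(number = n(),                           # rows per org_code
            distinct_number = n_distinct(org_name), # different names per org_code
            .by = org_code)

# List the code and name pairs behind any code with more than one name
changed_names <- toy_orgs |>
  distinct(org_code, org_name) |>
  filter(n() > 1, .by = org_code)

name_counts
changed_names
```

Here AAA has three rows but two distinct names, which is the same pattern the query above looks for in beds_data: organisation codes recorded under more than one name.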

End session