Introduction to R and RStudio

Session - Import a messy csv

Zoë Turner

Importing a csv that needs tidying

Exploring Mental Health (MH) Inpatient Capacity

The following analysis explores Mental Health inpatient capacity in England.

Background

Maintaining clinical effectiveness and safety when a ward is fully occupied is a serious challenge for staff.

Inappropriate out of area placements add cost and separate patients from their social support networks.

The Data

KH03 returns (bed numbers and occupancy) by organisation, published by NHS England.

Scraped from the NHSE statistics website and partially cleaned.

Accessible spreadsheets

This is a perfect opportunity to mention that there are Government data standards for making spreadsheets and charts accessible.

Importing a messy dataset

Click the file beds_data.csv in the Files pane (bottom right) and then select Import Dataset...

Screenshot of the import wizard with the beds_data.csv data populating it. Notice that something isn't right.
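The same problem can be reproduced in code. This is a minimal sketch, assuming the readr package (2.x) is installed. The file is a hypothetical three-row stand-in for beds_data.csv: the title and source wording are invented, the blank row is assumed to have been exported as empty fields, and the data rows are copied from the final tibble at the end of this session. Read with default settings, the title line is taken as the column headers.

```r
library(readr)

# Hypothetical stand-in for beds_data.csv: title, source and blank rows
# sit above the real column headers (wording of the first two lines invented)
messy_lines <- c(
  "KH03 bed availability and occupancy",
  "Source: NHS England",
  ",,,,",
  "date,org_code,org_name,beds_av,occ_av",
  "01/09/2013,R1A,Worcestershire Health And Care,129,117",
  "01/09/2013,R1C,Solent,105,82",
  "01/09/2013,R1H,Barts Health,,"
)
path <- tempfile(fileext = ".csv")
writeLines(messy_lines, path)

# With no options, the first line of the file becomes the header row
naive <- read_csv(path, show_col_types = FALSE)
names(naive)
```

readr may also warn about parsing issues here, because the data rows contain more fields than the one-field "header".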

Cleaning in the wizard

It’s common for spreadsheets to have metadata (such as a title and source) and blank lines above the column headers.

Screenshot of the beds_data.csv data in the import wizard with the Title and Source rows highlighted.
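Before using the wizard, it can help to look at the raw text of the file. This minimal base R sketch uses the same hypothetical stand-in file (title and source wording invented, data rows taken from the final tibble) and assumes the header row is the first line starting with `date,`. It works out how many lines sit above the headers.

```r
# Hypothetical stand-in for beds_data.csv (first two lines invented)
messy_lines <- c(
  "KH03 bed availability and occupancy",
  "Source: NHS England",
  ",,,,",
  "date,org_code,org_name,beds_av,occ_av",
  "01/09/2013,R1A,Worcestershire Health And Care,129,117",
  "01/09/2013,R1C,Solent,105,82",
  "01/09/2013,R1H,Barts Health,,"
)
path <- tempfile(fileext = ".csv")
writeLines(messy_lines, path)

# Print the first few raw lines without parsing them
top <- readLines(path, n = 5)
print(top)

# The header is the first line that starts with the first column name
header_line <- which(startsWith(top, "date,"))[1]
n_skip <- header_line - 1
n_skip  # 3 for this stand-in file
```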

Skip rows

Screenshot of where to skip rows in the import wizard. In this case skip 3 rows so the column headers appear at the top of the preview.

Move the cursor to another area to update the view immediately.
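The equivalent readr call uses the `skip` argument. This is a minimal sketch, assuming readr 2.x (for `show_col_types`) and the same hypothetical stand-in file. The column headers are now correct, but by default readr does not recognise day-first dates such as 01/09/2013, so the date column is not yet a Date.

```r
library(readr)

# Hypothetical stand-in for beds_data.csv (first two lines invented)
messy_lines <- c(
  "KH03 bed availability and occupancy",
  "Source: NHS England",
  ",,,,",
  "date,org_code,org_name,beds_av,occ_av",
  "01/09/2013,R1A,Worcestershire Health And Care,129,117",
  "01/09/2013,R1C,Solent,105,82",
  "01/09/2013,R1H,Barts Health,,"
)
path <- tempfile(fileext = ".csv")
writeLines(messy_lines, path)

# Skip the title, source and blank rows so the headers are read correctly
beds_raw <- read_csv(path, skip = 3, show_col_types = FALSE)
names(beds_raw)       # "date" "org_code" "org_name" "beds_av" "occ_av"
class(beds_raw$date)  # not "Date" yet
```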

Dates may be a problem

Screenshot with one more thing to fix: click the drop-down menu next to the column called date.
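Dates matter because the same string means a different date depending on whether the day or the month comes first. This is a minimal sketch using readr's `parse_date()`, and the example strings are illustrative. A month-first (US style) format silently swaps day and month whenever the day is 12 or less.

```r
library(readr)

uk_date <- "01/09/2013"  # 1 September 2013, written day first

# Month-first (US style): day and month are silently swapped
parse_date(uk_date, format = "%m/%d/%Y")  # "2013-01-09"

# Day-first (UK style): the intended date
parse_date(uk_date, format = "%d/%m/%Y")  # "2013-09-01"

# A day above 12 cannot be a month, so US-style parsing returns NA with a warning
parse_date("13/09/2013", format = "%m/%d/%Y")
```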

Default US dates on UK data

Screenshot of a wizard pop up to change the date format to %d/%m/%Y.
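Putting both wizard fixes into one readr call might look like the following. This is a minimal sketch, assuming readr 2.x and the same hypothetical stand-in file. Only the date column is given an explicit type, and readr guesses the rest.

```r
library(readr)

# Hypothetical stand-in for beds_data.csv (first two lines invented)
messy_lines <- c(
  "KH03 bed availability and occupancy",
  "Source: NHS England",
  ",,,,",
  "date,org_code,org_name,beds_av,occ_av",
  "01/09/2013,R1A,Worcestershire Health And Care,129,117",
  "01/09/2013,R1C,Solent,105,82",
  "01/09/2013,R1H,Barts Health,,"
)
path <- tempfile(fileext = ".csv")
writeLines(messy_lines, path)

beds <- read_csv(
  path,
  skip = 3,                                # drop the title, source and blank rows
  col_types = cols(
    date = col_date(format = "%d/%m/%Y"),  # UK day/month/year
    .default = col_guess()                 # let readr guess the other columns
  )
)
beds
```

Empty cells, such as the Barts Health bed counts here, are read as NA by default.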

Final data

# A tibble: 4,558 × 5
   date       org_code org_name                                   beds_av occ_av
   <date>     <chr>    <chr>                                        <dbl>  <dbl>
 1 2013-09-01 R1A      Worcestershire Health And Care                 129    117
 2 2013-09-01 R1C      Solent                                         105     82
 3 2013-09-01 R1E      Staffordshire And Stoke On Trent Partners…      NA     NA
 4 2013-09-01 R1F      Isle Of Wight                                   54     42
 5 2013-09-01 R1H      Barts Health                                    NA     NA
 6 2013-09-01 R1J      Gloucestershire Care Services                   NA     NA
 7 2013-09-01 RA2      Royal Surrey County Hospital                    NA     NA
 8 2013-09-01 RA3      Weston Area Health                              NA     NA
 9 2013-09-01 RA4      Yeovil District Hospital                        NA     NA
10 2013-09-01 RA7      University Hospitals Bristol                    NA     NA
# ℹ 4,548 more rows
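After importing, it is worth checking what readr could not parse and how many rows are missing bed numbers, since several of the rows above show NA. This is a minimal sketch, assuming readr 2.x and the same hypothetical stand-in file. `problems()` lists any values that failed to parse, and an empty result means there were none.

```r
library(readr)

# Hypothetical stand-in for beds_data.csv (first two lines invented)
messy_lines <- c(
  "KH03 bed availability and occupancy",
  "Source: NHS England",
  ",,,,",
  "date,org_code,org_name,beds_av,occ_av",
  "01/09/2013,R1A,Worcestershire Health And Care,129,117",
  "01/09/2013,R1C,Solent,105,82",
  "01/09/2013,R1H,Barts Health,,"
)
path <- tempfile(fileext = ".csv")
writeLines(messy_lines, path)

beds <- read_csv(
  path,
  skip = 3,
  col_types = cols(date = col_date(format = "%d/%m/%Y"), .default = col_guess())
)

# Values readr failed to parse (none expected here)
problems(beds)

# Rows with no bed numbers reported
sum(is.na(beds$beds_av))  # 1 in this stand-in file

# Dates covered by the import
range(beds$date)
```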

End session