13.07.2015 Views

data consistency, completeness and cleaning - The INCLEN Trust


Checking for Invalid Numeric Values

  • The techniques for checking invalid numeric data are quite different from the techniques used with character data
    – Examine minimum and maximum values for each numeric variable
    – Internal consistency methods: if most of the data values fall within a certain range, then any values that fall far enough outside that range may be data errors
    – Run a univariate analysis, focusing especially on
      • Counts: the number of non-missing observations, the number of observations not equal to zero and the number of observations greater than zero are of most interest at this stage
      • Extremes: the five lowest and five highest values for each numeric variable
      • Quantiles
      • Mean
      • Standard deviation, to help decide what constitutes reasonable cutoffs for low and high data values
      • Range
      • Graphic displays: a stem-and-leaf plot, a box plot and a normal probability plot
  • Check the medical records for the extreme values and write a note to the data center about the findings to help in further cleaning of these data
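The numeric checks listed above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library, not the tool the slides were written for; the function names `univariate_summary` and `flag_outside_cutoffs`, the sample heart-rate values and the cutoffs of 40 and 180 are all assumptions made up for the example.

```python
import math
import statistics


def univariate_summary(values, n_extremes=5):
    """Summarize one numeric variable; None and NaN are treated as missing."""
    present = sorted(
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    )
    if len(present) < 2:
        raise ValueError("need at least two non-missing values")
    q1, median, q3 = statistics.quantiles(present, n=4)
    return {
        "n_nonmissing": len(present),
        "n_missing": len(values) - len(present),
        "n_nonzero": sum(1 for v in present if v != 0),
        "n_positive": sum(1 for v in present if v > 0),
        "min": present[0],
        "max": present[-1],
        "range": present[-1] - present[0],
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),
        "q1": q1,
        "median": median,
        "q3": q3,
        "lowest": present[:n_extremes],    # the n_extremes smallest values
        "highest": present[-n_extremes:],  # the n_extremes largest values
    }


def flag_outside_cutoffs(values, low, high):
    """Return (position, value) pairs for non-missing values outside [low, high]."""
    return [
        (i, v) for i, v in enumerate(values)
        if v is not None
        and not (isinstance(v, float) and math.isnan(v))
        and (v < low or v > high)
    ]


# Hypothetical heart-rate readings; None marks a missing value.
values = [72, 68, None, 80, 75, 0, 210, 66, 70, 74, 78, 69]

summary = univariate_summary(values)
# Cutoffs are illustrative only; real cutoffs depend on the variable.
suspects = flag_outside_cutoffs(values, low=40, high=180)

for name, stat in summary.items():
    print(f"{name}: {stat}")
print("values to check against the medical records:", suspects)
```

Values returned by `flag_outside_cutoffs` are candidates for review, not confirmed errors; as the slide says, they should be checked against the medical records before anything is changed. The graphic displays (stem-and-leaf plot, box plot, normal probability plot) are left out of the sketch because they need a plotting library.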
