3 Data Entry Accuracy - EMGO



Title of the document: Data Entry Accuracy
Rev. Nr.: 1.2
Effective date: 31 Oct 2013
HB Nr.: 1.3-3

1. Aim
Determining the accuracy of the data entry.

2. Definitions

3. Keywords
Double data entry, entry error

4. Description
Before starting the data cleaning process, the researcher needs to decide whether to carry out an extensive evaluation (full double data entry) or a less extensive one (sampling) to detect data entry errors (typing errors) and interpretation errors made by the data entry clerk. Whether full or partial double data entry is necessary depends on factors such as:
- Irregularities observed during data collection;
- The complexity of the questionnaires entered (high risk of interpretation errors);
- The required reliability of the data (double entry is usually standard practice in drug research);
- Doubts about the reliability and accuracy of the data entry clerk(s);
- Whether checks have been built into the data entry programme to detect inconsistencies and out-of-range values.

Double data entry of all data is the ideal and most reliable situation. However, it is usually not feasible given the time and staffing required. In that case, start by monitoring the data in samples, beginning with the most complex questionnaires. If the decision is made to focus on part of the database, record in the logbook the number of records re-entered and the percentage of errors found.

To evaluate entry errors by sampling, the project leader should draw a small sample of the respondents (approx. 5%) and have their questionnaires or registration forms re-entered into an empty database. In principle, this second entry should be carried out by someone other than the person who did the first. For instance, if a project assistant did the first round of data entry, the researcher can do the second.
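As a minimal sketch of the sampling step, the Python function below draws about 5% of the respondents for re-entry. The procedure does not prescribe any tool or language; the function name `draw_reentry_sample`, the round-up rule (at least one respondent) and the fixed seed (so the selected respondents can be noted in the logbook) are assumptions made for illustration.

```python
import math
import random


def draw_reentry_sample(respondent_ids, fraction=0.05, seed=None):
    """Pick roughly `fraction` of the respondents for a second round of entry.

    Hypothetical helper. At least one respondent is selected, so small studies
    are still checked. A fixed `seed` makes the draw reproducible, so the
    selected IDs can be recorded in the logbook.
    """
    ids = sorted(set(respondent_ids))  # de-duplicate and fix the order
    if not ids:
        return []
    sample_size = max(1, math.ceil(fraction * len(ids)))  # round up
    rng = random.Random(seed)  # private generator; leaves global state alone
    return sorted(rng.sample(ids, sample_size))


if __name__ == "__main__":
    # Hypothetical study with respondent IDs 1..200: 5% means 10 forms.
    chosen = draw_reentry_sample(range(1, 201), seed=2013)
    print(len(chosen), chosen)
```

Drawing at random, rather than taking the first 5% entered, avoids checking only the records typed while the clerk was still fresh.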
The reliability of the entered data can be assessed by comparing the first and second rounds of data entry using dedicated comparison software (see 5. Details). If the number of errors found exceeds 3% per questionnaire (relative to the total number of variables entered), the questionnaire must be double-entered in its entirety. The first and second entries are then compared in the same way. If the second entry is carried out by the same person, the permissible error margin is lower: 1.5% instead of 3%.

This procedure applies both to manual input (for instance in Blaise) and to scanned questionnaires. For scanned questionnaires, the forms in the sample are re-scanned into a separate file.

The project leader ensures that this procedure is carried out on time.
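The comparison and decision rule described above can be sketched in Python as follows. This is not the DIFF programme and does not reproduce its report format. It assumes, hypothetically, that each round of entry for one questionnaire has been exported (for example from the SPSS files) as a mapping from respondent ID to variable values, and it counts every value that differs between the two rounds as one error, since the comparison alone cannot tell which round is wrong. The names `compare_entries`, `needs_full_double_entry` and `ComparisonResult` are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ComparisonResult:
    n_values: int = 0  # variable values compared across re-entered respondents
    # One (respondent ID, variable, first entry, second entry) tuple per mismatch
    differences: list = field(default_factory=list)

    @property
    def error_pct(self) -> float:
        """Mismatches as a percentage of all values compared."""
        if self.n_values == 0:
            return 0.0
        return 100.0 * len(self.differences) / self.n_values


def compare_entries(first: dict, second: dict) -> ComparisonResult:
    """Compare two rounds of entry of ONE questionnaire, value by value.

    Both arguments map respondent ID -> {variable name: entered value}.
    Only respondents present in the second (re-entered) round are checked;
    a variable missing from one round is compared as None.
    """
    result = ComparisonResult()
    for resp_id in sorted(second):
        orig = first.get(resp_id, {})
        redo = second[resp_id]
        for var in sorted(set(orig) | set(redo)):
            result.n_values += 1
            if orig.get(var) != redo.get(var):
                result.differences.append((resp_id, var, orig.get(var), redo.get(var)))
    return result


def needs_full_double_entry(result: ComparisonResult, same_person: bool) -> bool:
    """Thresholds from the procedure: more than 3% mismatches when someone else
    did the second entry, more than 1.5% when the same person did both."""
    threshold = 1.5 if same_person else 3.0
    return result.error_pct > threshold


if __name__ == "__main__":
    # Hypothetical data: 5 respondents x 10 variables = 50 values, 1 typo.
    first = {r: {f"v{j}": j for j in range(10)} for r in range(1, 6)}
    second = {r: dict(values) for r, values in first.items()}
    second[3]["v7"] = 99
    res = compare_entries(first, second)
    for diff in res.differences:  # respondent-level list of differences
        print(diff)
    print(f"{res.error_pct:.1f}% mismatches")
    print("different person -> re-enter all:", needs_full_double_entry(res, False))
    print("same person -> re-enter all:", needs_full_double_entry(res, True))
```

With the hypothetical demo data, 1 of 50 values differs (2%): within the 3% margin when a different person did the second entry, but above the 1.5% margin when the same person did both. Because the threshold applies per questionnaire, the comparison would be run once for each questionnaire.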


5. Details
The second entry is always made using an empty copy of the original input system. This could be, for instance, the input screens in Blaise, but it could also involve optically readable forms of which only a sample is re-scanned.

The original file can be linked to the double-entry file using the DIFF programme, developed by the Data Management department (with the possible assistance of a member of the D&S department). For this, both the original data and the re-entered data must first be converted from the original input programme into SPSS files. Any differences are written to a report file at the respondent level.

Audit questions:
- Has the data entry been assessed? If not, why not? If so, how was the assessment carried out, and what was the result?
- For poor results (more than 3% errors): was the entire questionnaire re-entered, or were other actions taken? If so, which?

6. Appendices/references/links
The Introductiebijeenkomst Datamanagement deel 2 (Data Management introductory session, part 2) explains "Monitoring of data input" in detail.
For comparing (double-)entered files, the DIFF programme is available to EMGO+ researchers from the Data Management department (datamanagement.emgo@vumc.nl).

7. Amendments
V1.2: 31 Oct 2013: availability of a new comparison programme.
V1.1: 1 Jan 2010: English translation.
V1.0: 31 Mar 2004.
