Unspecified counting practices in data collection can cause values to be rounded to certain ‘base’ numbers, which can have serious consequences for data quality. Statistical methods for analysing missing data are commonly used to deal with the issue, but they could actually aggravate the problem. Rounded data are not missing data; rather, some observations have been systematically lumped at certain base numbers, reflecting the rounding process or counting behaviour. A new method for analysing rounded data would therefore be academically valuable. The neural network model developed in this study fills this gap by complementing and enhancing conventional statistical methods. The model detects, analyses, and quantifies the periodic structures that rounding introduces into a data set. The robustness of the model is examined using simulated data sets containing specific rounding base numbers at different rounding levels. The model is also subjected to theoretical and numerical tests to confirm its validity before being used in real applications. Overall, the model performs very well, making it suitable for many applications. The assessment results show the importance of using the right best fit in rounding detection. Detection power and cut-off point estimation also depend on the data distribution and the rounding base numbers. Rounding to prime numbers is easier to detect than rounding to non-prime numbers because of the unique characteristics of primes: the bigger the prime, the easier the detection. This is in complete contrast with non-prime numbers, where the bigger the number, the more “factor” numbers there are to distract rounding detection. Using a uniform best fit on uniform data produces the best result and the lowest cut-off point. However, using a wrong best fit on uniform data also has the worst consequences.
The model performs best on data containing 10-40% rounding levels: lower rounding levels produce an unclear rounding pattern, while higher levels distort the rounding detection. The modulo-test method suffers from the same problem. Applications to real religious census data confirm the modulo-test finding that the data contain base-5 rounding, while applications to data on cigarettes smoked and alcohol consumed show good detection results. The cigarette data appear to contain base-5 rounding, while the alcohol consumption data indicate no rounding patterns, a difference that may be attributed to the ways the two data sets were collected. The modelling applications can be extended to other areas in which rounding is common and can have significant consequences. The model can be refined to include a data-smoothing process and to make it user friendly as an online modelling tool. This will maximize the model’s potential use.
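The idea of a rounding base and a rounding level can be illustrated with a minimal sketch. It is not the neural network model or the modulo-test described above. It only simulates uniform integer data in which a chosen fraction of observations is rounded to the nearest multiple of a base, then compares the observed share of multiples with the share expected without rounding. The function names, the uniform range 1 to 100, and the sample size are assumptions made for illustration.

```python
import random

def simulate_rounded(n, base, level, low=1, high=100, seed=0):
    """Draw n uniform integers in [low, high]; round a fraction `level`
    of them to the nearest multiple of `base` (hypothetical helper)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        x = rng.randint(low, high)
        if rng.random() < level:
            # Round to the nearest multiple of base, ties rounded up.
            x = base * ((x + base // 2) // base)
        values.append(x)
    return values

def multiple_share(values, base):
    """Fraction of values that are exact multiples of `base`."""
    return sum(1 for v in values if v % base == 0) / len(values)

# 30% of observations rounded to base 5 (an assumed rounding level).
data = simulate_rounded(n=10_000, base=5, level=0.3)
for candidate in (3, 5, 7, 10):
    observed = multiple_share(data, candidate)
    expected = 1 / candidate  # rough share for unrounded uniform data
    print(f"base {candidate}: observed {observed:.3f}, expected about {expected:.3f}")
```

In this sketch, an excess of multiples over the roughly `1 / base` share expected for unrounded uniform data hints at rounding. Bases that share multiples with the true base, such as 10 when the true base is 5, also show an excess, which loosely echoes the point about related bases confusing detection.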
Date of Award: 2007
Awarding Institution: University of Northampton
Supervisor: Philip Picton