Comparación de métodos de detección de datos anómalos multivariantes mediante un estudio de simulación | Comparison of multivariate methods for outliers detection by simulation
Resumen
Los valores anómalos son un problema omnipresente en la recolección de datos, son observaciones que se desvían en alguna dirección respecto al comportamiento general del resto del conjunto de datos y pueden afectar los resultados de aplicar métodos estadísticos univariantes o multivariantes. Es fundamental la detección de estos valores, ya sea para eliminarlos o para atenuar sus efectos en el análisis. Se han desarrollado varios métodos para la detección de valores anómalos, entre ellos están la Distancia Robusta de Mahalanobis (DRM) de Rousseeuw y Van Zomeren (1990), la Curtosis-1 de Peña y Prieto (2001) y el método FGR de Filzmoser, Garrett y Reimann (2005). En este artículo se compararon estos tres métodos, en cinco escenarios de correlación considerando variables explicativas con varios porcentajes de anómalos, mediante análisis comparativo de aplicar estos métodos en datos simulados. Los resultados evidencian que la Curtosis-1 es más eficiente que la DRM y el método FGR para la detección de valores anómalos multivariantes, independientemente de la proporción de éstos y la presencia de correlación entre las variables consideradas en el estudio.
Palabras clave: Valores anómalos multivariantes, detección, comparación, simulación.
ABSTRACT
Outliers are a persistent problem in data collection. They are observations that deviate from the general pattern of the rest of the data and can therefore affect the results of univariate and multivariate statistical methods. Detecting these observations is essential, either to eliminate them or to mitigate their effect on the analysis. Several outlier detection methods have been developed, including the Robust Mahalanobis Distance (DRM, from the Spanish Distancia Robusta de Mahalanobis) of Rousseeuw and Van Zomeren (1990), the Kurtosis-1 method of Peña and Prieto (2001), and the FGR method of Filzmoser, Garrett and Reimann (2005). This article compares these three methods on simulated data under five correlation scenarios, considering explanatory variables with several percentages of outliers. The results show that the Kurtosis-1 method is more efficient than DRM and FGR at detecting multivariate outliers, regardless of the proportion of outliers and of the presence of correlation among the variables considered in the study.
Key words: Multivariate outliers, detection, comparison, simulation.