A Comparative Evaluation of AI Imputation Techniques for Enhancing Data Quality in Big Data

Authors

  • Bilal Shah

Abstract

This paper explores a range of AI-driven imputation methods, including machine learning algorithms such as k-nearest neighbors (KNN), decision trees, and random forests, as well as deep learning techniques such as autoencoders and generative adversarial networks (GANs). The study also evaluates hybrid approaches that combine several imputation techniques to optimize results. Key criteria for comparison include the accuracy of imputed values, computational efficiency, scalability, and robustness to different missing-data patterns (e.g., missing at random, missing completely at random). The paper also discusses the challenges of applying these methods to big data, such as handling high dimensionality and very large datasets while keeping data distortion to a minimum. Through experimental analysis and real-world case studies, the paper demonstrates that AI-based imputation techniques outperform traditional methods (e.g., mean imputation, forward fill) in preserving data integrity and improving the performance of predictive models. The study concludes with best practices for selecting and implementing AI imputation strategies so that big data can be used effectively to produce accurate, actionable insights.
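The contrast the abstract draws between KNN imputation and a traditional baseline such as mean imputation can be illustrated with a small sketch. It is not the paper's implementation or experimental setup. It assumes a synthetic two-column dataset in which the second column depends linearly on the first, with values hidden completely at random, and it measures the root-mean-square error (RMSE) of the imputed values against the hidden ground truth. The function names (`mean_impute`, `knn_impute`, `make_data`, `rmse`) and parameters (`k`, `missing_rate`, `seed`) are hypothetical, and the code uses only the Python standard library.

```python
import math
import random


def column_means(rows):
    """Mean of the observed (non-None) values in each column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


def mean_impute(rows):
    """Traditional baseline: replace each None with its column's observed mean."""
    means = column_means(rows)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=5):
    """Fill each None with the average of that column over the k nearest donor rows.

    Distance is Euclidean over the columns observed in both rows, scaled by the
    number of shared columns (a simplifying assumption for this sketch).
    """
    means = column_means(rows)
    filled_rows = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor must have column j observed
                shared = [c for c in range(len(row))
                          if c != j and row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                donors.append((dist, other[j]))
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            # Fall back to the column mean if no donor shares an observed feature.
            filled[j] = sum(v for _, v in nearest) / len(nearest) if nearest else means[j]
        filled_rows.append(filled)
    return filled_rows


def make_data(n=200, missing_rate=0.2, seed=0):
    """Synthetic data: y = 2x + noise, with y hidden completely at random (MCAR)."""
    rng = random.Random(seed)
    truth = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        truth.append([x, 2 * x + rng.gauss(0, 0.5)])
    masked = [list(r) for r in truth]
    hidden = []
    for i, r in enumerate(masked):
        if rng.random() < missing_rate:
            r[1] = None
            hidden.append(i)
    return truth, masked, hidden


def rmse(truth, imputed, hidden, col=1):
    """Root-mean-square error over the entries that were hidden."""
    errs = [(truth[i][col] - imputed[i][col]) ** 2 for i in hidden]
    return math.sqrt(sum(errs) / len(errs))


truth, masked, hidden = make_data()
mean_err = rmse(truth, mean_impute(masked), hidden)
knn_err = rmse(truth, knn_impute(masked), hidden)
print(f"mean imputation RMSE: {mean_err:.3f}")
print(f"kNN imputation RMSE:  {knn_err:.3f}")
```

Because the missing column is strongly correlated with the observed one, the neighbor-based estimate should track the true values far more closely than a single column mean. Mean imputation ignores that relationship entirely. The brute-force neighbor search here is quadratic in the number of rows, which hints at the scalability concerns the abstract raises for big data.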

Published

2023-09-06