AI-Driven Data Preparation: Optimizing Machine Learning Pipelines through Automated Data Preprocessing Techniques
Abstract
Data preparation is a critical stage in machine learning (ML) pipelines and is commonly reported to account for the majority of time spent on model development. Despite its importance, the process is often labor-intensive and prone to human error. In this paper, we explore the use of artificial intelligence (AI) to automate data preparation, focusing on data cleaning, transformation, feature engineering, and missing-value handling. We show how AI-driven approaches can improve data quality and model performance while reducing time-to-market for machine learning applications. We also discuss open challenges, candidate solutions, and future trends in AI-powered data preparation.