AI-Powered Agricultural Yield Forecasting System
Machine Learning Project Report - Phase 2
By Pushkarjay Ajay
An end-to-end machine learning system that predicts crop yield based on environmental conditions, soil properties, and farm management practices. The system integrates multiple datasets, performs intelligent feature engineering, and serves predictions via a REST API.
The distribution of crop yields across the unified dataset is right-skewed, with most yields concentrated between 200 and 800 kg/hectare.
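The right skew can also be checked numerically. The minimal sketch below computes the Fisher-Pearson skewness coefficient with the standard library. A positive value indicates a longer right tail. The yield values are made-up illustrative numbers, not the project's data.

```python
from typing import Sequence


def skewness(values: Sequence[float]) -> float:
    """Fisher-Pearson coefficient of skewness (population moments)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment (variance)
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data has no skew
    return m3 / m2 ** 1.5


# Illustrative yields in kg/hectare (not taken from the project dataset).
yields = [220, 310, 380, 420, 450, 510, 560, 640, 780, 1900]
print(f"skewness = {skewness(yields):.2f}")  # a positive value means right-skewed
```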
Correlation analysis reveals the key relationships between features and yield. Fertilizer amount and irrigation schedule show the strongest positive correlations with yield.
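One way to reproduce such a ranking is to compute the Pearson correlation of each feature column with yield and sort by absolute value. The sketch below does this with the standard library. The feature names and values are hypothetical stand-ins, not the report's actual columns.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have the same length (>= 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def rank_by_correlation(features: Dict[str, List[float]],
                        target: List[float]) -> List[Tuple[str, float]]:
    """Return (feature, r) pairs sorted by absolute correlation with the target."""
    scores = [(name, pearson(col, target)) for name, col in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Toy columns; names are hypothetical stand-ins for the report's features.
features = {
    "fertilizer_kg_ha": [50, 80, 100, 120, 150],
    "irrigation_events": [2, 3, 3, 4, 5],
    "soil_ph": [6.8, 6.1, 7.2, 6.5, 6.9],
}
yield_kg_ha = [300, 420, 500, 560, 700]
for name, r in rank_by_correlation(features, yield_kg_ha):
    print(f"{name:>18}: r = {r:+.3f}")
```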
A comparison of average yields across crop types reveals significant variation: sugarcane shows the highest yields, while pulses show the lowest.
Regional analysis shows yield variations across Indian states. Punjab and Haryana show consistently higher yields due to better irrigation infrastructure.
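Both the crop-level and the state-level comparisons reduce to a grouped mean. The minimal standard-library sketch below assumes each record is a dictionary with hypothetical `crop`, `state` and `yield_kg_ha` fields. The values are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def mean_by(records: Iterable[Mapping], key: str,
            value: str = "yield_kg_ha") -> Dict[str, float]:
    """Average `value` for each distinct `key` (e.g. crop type or state)."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in records:
        totals[row[key]] += row[value]
        counts[row[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Illustrative records with hypothetical field names and values.
records = [
    {"crop": "sugarcane", "state": "Punjab", "yield_kg_ha": 900},
    {"crop": "sugarcane", "state": "Bihar", "yield_kg_ha": 700},
    {"crop": "pulses", "state": "Punjab", "yield_kg_ha": 300},
    {"crop": "pulses", "state": "Bihar", "yield_kg_ha": 200},
]
print(mean_by(records, "crop"))   # {'sugarcane': 800.0, 'pulses': 250.0}
print(mean_by(records, "state"))  # {'Punjab': 600.0, 'Bihar': 450.0}
```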
Scatter plots of key environmental factors against crop yield reveal how each factor relates to yield and suggest an optimal range for each parameter.
The Gradient Boosting model's feature importances show the relative contribution of each feature to yield prediction. Fertilizer amount and rainfall are the most influential predictors.
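Tree ensembles such as scikit-learn's `GradientBoostingRegressor` expose impurity-based scores through `feature_importances_`, and the report does not say which importance measure it plotted. As a model-agnostic cross-check, the sketch below implements a different technique, permutation importance: it shuffles one feature column at a time and measures the drop in R². The `toy_model` function is a hypothetical stand-in for a trained model.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict: Callable[[Row], float], X: List[Row],
                           y: Sequence[float], n_repeats: int = 5,
                           seed: int = 0) -> List[float]:
    """Mean drop in R^2 when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2_score(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical "model": strong dependence on feature 0, weak on 1, none on 2.
def toy_model(row: Row) -> float:
    return 4.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
X = [[data_rng.uniform(0, 100) for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))
```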
A plot of actual vs. predicted yields shows the model's accuracy across different yield ranges. The tight clustering of points around the diagonal indicates strong predictive performance.
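The visual impression of "tight clustering around the diagonal" can be quantified with standard regression metrics. The sketch below computes R², mean absolute error and root-mean-square error from paired actual and predicted values. The numbers are illustrative, not the model's real predictions.

```python
import math
from typing import Dict, Sequence


def regression_report(y_true: Sequence[float],
                      y_pred: Sequence[float]) -> Dict[str, float]:
    """Summary metrics for an actual-vs-predicted comparison."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Illustrative values only (kg/hectare).
actual = [250.0, 400.0, 520.0, 610.0, 780.0]
predicted = [270.0, 390.0, 500.0, 630.0, 760.0]
print(regression_report(actual, predicted))
```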
Before the synthetic dataset was generated, the original data was analyzed for outliers. Issues such as temperatures that had apparently been recorded in Fahrenheit (values above 50°C) were identified and addressed.
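The report does not say exactly how the Fahrenheit readings were corrected. One plausible approach, sketched here as an assumption, treats any reading above 50°C as Fahrenheit and converts it with C = (F - 32) × 5/9. The function and constant names are hypothetical. A genuine extreme reading near the threshold would be misclassified, so a real pipeline would probably cross-check against location and season.

```python
from typing import List, Tuple

# Assumption: readings above this value are implausible as Celsius air temperatures.
FAHRENHEIT_THRESHOLD_C = 50.0


def fix_temperature_units(temps: List[float]) -> Tuple[List[float], List[int]]:
    """Convert readings that look like Fahrenheit to Celsius.

    Returns the cleaned list and the indices that were converted.
    """
    cleaned: List[float] = []
    converted: List[int] = []
    for i, t in enumerate(temps):
        if t > FAHRENHEIT_THRESHOLD_C:
            cleaned.append(round((t - 32.0) * 5.0 / 9.0, 1))  # F -> C
            converted.append(i)
        else:
            cleaned.append(t)
    return cleaned, converted


raw = [24.5, 31.0, 86.0, 28.2, 95.0]
print(fix_temperature_units(raw))  # ([24.5, 31.0, 30.0, 28.2, 35.0], [2, 4])
```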
A synthetic dataset of 75,000 records was generated to ensure complete data coverage with realistic correlations between agricultural features.
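The report does not describe the generator in detail. The sketch below shows one way to build realistic correlations into synthetic rows: irrigation depends on rainfall, and yield is linear in fertilizer, rainfall and irrigation, with a quadratic penalty around an optimal temperature and added Gaussian noise. Every field name, range and coefficient is a hypothetical choice for illustration.

```python
import random
from typing import Dict, List


def generate_records(n: int, seed: int = 7) -> List[Dict[str, float]]:
    """Generate n synthetic farm records with built-in feature/yield relationships.

    All ranges and coefficients are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible dataset
    records = []
    for _ in range(n):
        rainfall_mm = rng.uniform(300, 1500)
        temperature_c = rng.gauss(27, 4)
        fertilizer_kg_ha = rng.uniform(20, 200)
        # Irrigation depends partly on rainfall: drier farms irrigate more.
        irrigation_events = max(0, round(8 - rainfall_mm / 250 + rng.gauss(0, 1)))
        yield_kg_ha = (
            150
            + 1.8 * fertilizer_kg_ha
            + 0.25 * rainfall_mm
            + 20 * irrigation_events
            - 3.0 * (temperature_c - 27) ** 2  # penalty away from an assumed 27 °C optimum
            + rng.gauss(0, 40)                 # unexplained variation
        )
        records.append({
            "rainfall_mm": round(rainfall_mm, 1),
            "temperature_c": round(temperature_c, 1),
            "fertilizer_kg_ha": round(fertilizer_kg_ha, 1),
            "irrigation_events": irrigation_events,
            "yield_kg_ha": round(max(yield_kg_ha, 0.0), 1),  # yields cannot be negative
        })
    return records


sample = generate_records(75_000)
print(len(sample), sample[0])
```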
Four regression algorithms were trained and evaluated using 5-fold cross-validation. Gradient Boosting achieved the best overall performance and was selected for production.
🎯 Why Gradient Boosting? It had the best cross-validated R² (0.9603), faster inference, a smaller model, and better handling of outliers.
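With scikit-learn, the 5-fold evaluation would typically be a call such as `cross_val_score(model, X, y, cv=5, scoring="r2")`. To show what that does, the dependency-free sketch below splits shuffled indices into five folds, trains on four, and scores R² on the held-out fold. The one-feature least-squares `fit_line` is a hypothetical stand-in for the four real regressors, and the data is synthetic.

```python
import random
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[float], float]


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle the indices 0..n-1 once, then build k (train, test) splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, roughly equal folds
    splits = []
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, folds[i]))
    return splits


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x: Sequence[float], y: Sequence[float]) -> Predictor:
    """Ordinary least squares on one feature (stand-in for a real regressor)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def cross_val_r2(fit: Callable[[List[float], List[float]], Predictor],
                 x: Sequence[float], y: Sequence[float],
                 k: int = 5, seed: int = 0) -> List[float]:
    """Train on k-1 folds, score R^2 on the held-out fold, repeat k times."""
    scores = []
    for train, test in k_fold_indices(len(x), k, seed):
        model = fit([x[i] for i in train], [y[i] for i in train])
        scores.append(r2([y[i] for i in test], [model(x[i]) for i in test]))
    return scores


rng = random.Random(1)
fertilizer = [rng.uniform(0, 200) for _ in range(100)]
yields = [100 + 2.0 * f + rng.gauss(0, 10) for f in fertilizer]
scores = cross_val_r2(fit_line, fertilizer, yields)
print([round(s, 3) for s in scores], "mean:", round(sum(scores) / len(scores), 3))
```

Comparing candidate models then amounts to running the same splits for each one and keeping the model with the highest mean score.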
| Method | Endpoint | Description |
|---|---|---|
| GET | / | API information |
| GET | /health | Health check |
| GET | /features | List required features |
| POST | /predict | Single prediction |
| POST | /predict-batch | Batch predictions |
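A client for the prediction endpoints could look like the standard-library sketch below. It builds the JSON POST requests without sending them. The base URL, the feature names in the payload, and the assumption that `/predict-batch` accepts a JSON array are all hypothetical. The real required fields are whatever `/features` returns.

```python
import json
import urllib.request
from typing import Dict, List, Union

BASE_URL = "http://localhost:8000"  # assumption: address of a local deployment
Features = Dict[str, Union[float, str]]


def build_predict_request(features: Features,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /predict."""
    return urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_batch_request(rows: List[Features],
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for /predict-batch (assumed to take a JSON array)."""
    return urllib.request.Request(
        f"{base_url}/predict-batch",
        data=json.dumps(rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical feature names; query /features for the actual schema.
req = build_predict_request({"crop": "rice", "rainfall_mm": 1100.0, "fertilizer_kg_ha": 120.0})
print(req.get_method(), req.full_url)
# Against a running server:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(json.loads(resp.read()))
```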
The system is ready for production deployment and further enhancement.