Multimodal Quality Prediction for Die Castings
Built a multimodal ML system at Xiaomi that predicts the mechanical properties of automotive die-cast components, reduced MAE by 53%, and deployed it to production.
Motivation
As Xiaomi’s automotive production scaled up with its self-developed M1 aluminum alloy, efficient methods for predicting the mechanical properties of ultra-large integrated die-cast components became essential. Traditional quality assurance relies on destructive tensile testing — cutting out samples from finished parts — which is costly, slow, and wastes material. During my internship at Xiaomi AI Lab, I developed an end-to-end multimodal ML system that predicts three critical mechanical properties (Yield Strength, Ultimate Tensile Strength, and Elongation) from production process data alone, enabling real-time quality monitoring without destructive testing.
Multimodal Fusion Architecture
The system fuses three data modalities through a three-stage architecture:
Stage 1 — Feature Extraction (each encoder pre-trained on ~30,000 unlabeled samples):
- Thermal images: Deep CNN encoder extracting surface temperature distribution patterns from high-resolution thermal images of movable and fixed dies, captured before and after mold release agent spraying.
- Die-casting process parameters: MLP encoder processing 28 tabular features including mold temperatures, injection speeds (max/min), metal liquid temperatures, vacuum parameters, and water/oil cooling temperatures.
- Thermocouple time series: LSTM encoder capturing temporal correlations in 16-channel mold temperature evolution during the cooling phase.
Stage 2 — Feature Aggregation: Horizontal concatenation of encoded features from all three modalities, processed through an attention-based aggregation layer to balance contributions.
Stage 3 — Prediction: CatBoost gradient-boosted trees serve as the predictive backbone. CatBoost was selected after a comparative evaluation against SVM (MSE: 42.7), MLP (MSE: 47.6), and Random Forest (MSE: 36.8); it achieved the lowest MSE at 34.3.
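The Stage 2 aggregation can be sketched roughly as follows. This is a minimal NumPy illustration, not the production layer: it assumes every encoder emits an embedding of the same width, and the scoring vector `score_vec`, the softmax gating, and the function name `fuse` are all hypothetical choices made for the example.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse(thermal, tabular, series, score_vec):
    """Weight each modality embedding by an attention score, then concatenate.

    thermal, tabular, series: arrays of shape (batch, dim), one per encoder.
    score_vec: array of shape (dim,), a hypothetical learned scoring vector.
    Returns (fused, weights) with shapes (batch, 3 * dim) and (batch, 3).
    """
    stacked = np.stack([thermal, tabular, series], axis=1)  # (batch, 3, dim)
    scores = stacked @ score_vec                            # (batch, 3)
    weights = softmax(scores)                               # each row sums to 1
    weighted = stacked * weights[..., None]                 # rescale each modality
    fused = weighted.reshape(stacked.shape[0], -1)          # horizontal concatenation
    return fused, weights

# Random embeddings stand in for real encoder outputs.
rng = np.random.default_rng(0)
batch, dim = 4, 8
thermal, tabular, series = (rng.normal(size=(batch, dim)) for _ in range(3))
score_vec = rng.normal(size=dim)
fused, weights = fuse(thermal, tabular, series, score_vec)
```

In this sketch the `fused` matrix is what a downstream regressor such as CatBoost would consume, and `weights` shows how much each modality contributes per sample.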
Semi-Supervised Learning with Distribution-Based Filtering
With only 148 labeled samples (from destructive testing) against 32,865 unlabeled production samples, a purely supervised approach would generalize poorly. The semi-supervised strategy iteratively expands the training set:
- Train initial models on 148 labeled samples
- Predict labels for all 32,865 unlabeled samples
- Filter pseudo-labels: retain only predictions that fall within ±3 standard deviations of the mean of the true (labeled) distribution
- Add accepted pseudo-labeled samples to training set
- Retrain and repeat until convergence
This distribution-aware filtering limits model drift from noisy pseudo-labels while still making use of the large unlabeled data pool.
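The loop above can be sketched as follows, under stated assumptions: a least-squares model stands in for the real regressor, the data is synthetic, and "convergence" is taken to mean that the accepted set stops changing. The helper names (`fit_linear`, `predict_linear`, `pseudo_label_loop`) are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares stand-in for the real regressor (an assumption, for brevity)."""
    A = np.c_[X, np.ones(len(X))]                 # append a bias column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.c_[X, np.ones(len(X))] @ coef

def pseudo_label_loop(X_lab, y_lab, X_unlab, k=3.0, max_rounds=10):
    """Self-training with a mean ± k*std acceptance band from the true labels."""
    lo = y_lab.mean() - k * y_lab.std()
    hi = y_lab.mean() + k * y_lab.std()
    X_train, y_train = X_lab, y_lab
    accepted = np.zeros(len(X_unlab), dtype=bool)
    for _ in range(max_rounds):
        coef = fit_linear(X_train, y_train)
        pseudo = predict_linear(coef, X_unlab)          # predict all unlabeled rows
        new_accepted = (pseudo >= lo) & (pseudo <= hi)  # distribution-based filter
        converged = np.array_equal(new_accepted, accepted)
        accepted = new_accepted
        if converged:                                   # assumed stopping rule
            break
        # Rebuild the training set: true labels plus accepted pseudo-labels.
        X_train = np.vstack([X_lab, X_unlab[accepted]])
        y_train = np.concatenate([y_lab, pseudo[accepted]])
    return coef, accepted

# Synthetic data; only the count of 148 labeled samples mirrors the text.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
X_lab = rng.normal(size=(148, 3))
y_lab = X_lab @ true_w + 100 + rng.normal(scale=0.5, size=148)
X_unlab = rng.normal(size=(2000, 3)) * 2.0   # wider spread, so some rows are rejected
coef, accepted = pseudo_label_loop(X_lab, y_lab, X_unlab)
```

With a least-squares stand-in the loop settles almost immediately, because accepted pseudo-labels have zero residual under the current fit. A nonlinear learner such as gradient-boosted trees would typically shift between rounds.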
Joint Training Mechanism
To exploit materials science knowledge about strength-toughness relationships in aluminum alloys, the three models are trained in sequence, each one feeding the next:
- YS model trained first on fused features
- UTS model trained with YS predictions as additional input features (positive YS-UTS correlation)
- EL model trained with both YS and UTS predictions (negative strength-elongation correlation)
Inference follows the same sequence, so each prediction is conditioned on the upstream ones, which encourages physically consistent predictions across the three properties.
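A minimal sketch of the YS → UTS → EL chain, assuming a least-squares stand-in for each model and synthetic targets whose correlations merely mimic the signs described above. The function names and all numeric values are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares stand-in for each property model (an assumption)."""
    A = np.c_[X, np.ones(len(X))]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.c_[X, np.ones(len(X))] @ coef

def fit_chain(X, ys, uts, el):
    """Train YS first, then UTS on [X, YS_hat], then EL on [X, YS_hat, UTS_hat]."""
    m_ys = fit_linear(X, ys)
    ys_hat = predict_linear(m_ys, X)
    X_uts = np.c_[X, ys_hat]
    m_uts = fit_linear(X_uts, uts)
    uts_hat = predict_linear(m_uts, X_uts)
    m_el = fit_linear(np.c_[X, ys_hat, uts_hat], el)
    return m_ys, m_uts, m_el

def predict_chain(models, X):
    """Inference in the same order as training."""
    m_ys, m_uts, m_el = models
    ys_hat = predict_linear(m_ys, X)
    uts_hat = predict_linear(m_uts, np.c_[X, ys_hat])
    el_hat = predict_linear(m_el, np.c_[X, ys_hat, uts_hat])
    return ys_hat, uts_hat, el_hat

# Synthetic fused features and targets (illustrative values only).
rng = np.random.default_rng(1)
w = np.array([3.0, -2.0, 1.0, 0.5])
X = rng.normal(size=(200, 4))
ys = X @ w + 130 + rng.normal(scale=1.0, size=200)
uts = ys + 120 + rng.normal(scale=1.0, size=200)      # positive YS-UTS relation
el = 20 - 0.05 * ys + rng.normal(scale=0.1, size=200) # negative strength-EL relation
models = fit_chain(X, ys, uts, el)
X_new = rng.normal(size=(50, 4))
ys_hat, uts_hat, el_hat = predict_chain(models, X_new)
```

A linear stand-in gains nothing from the chained inputs, since they are linear combinations of the existing features. The pattern pays off with nonlinear learners, which can use the upstream predictions as informative summary features.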
SHAP Interpretability Analysis
SHAP value analysis revealed physically meaningful feature importance:
- UTS prediction: Water cooling temperature (TempW0-1) is the most critical parameter; higher water temperature is associated with lower predicted UTS, consistent with metallurgical expectations
- YS prediction: Oil temperature (TempO5-1) dominates, directly correlating with yield strength development
- EL prediction: Injection speed parameters (average low/high speed, injection time) are primary drivers, as filling dynamics directly affect porosity and ductility
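To show what a SHAP value measures, the sketch below computes exact Shapley attributions by brute force for a toy three-feature model. It is not the SHAP library or its tree explainer: absent features are simply replaced by a baseline value, and the model, its coefficients, and the inputs are invented for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley attributions for f at x; absent features take baseline values."""
    n = len(x)

    def value(subset):
        # Evaluate f with features in `subset` taken from x, the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Toy model: [water temp, oil temp, injection speed] -> property (made-up weights).
weights = [-0.8, 0.3, 2.0]

def toy_model(z):
    return 250.0 + sum(w * v for w, v in zip(weights, z))

x = [30.0, 60.0, 4.0]
baseline = [25.0, 55.0, 3.5]
phi = shapley_values(toy_model, x, baseline)
# For a linear model each attribution is w_i * (x_i - baseline_i),
# i.e. approximately [-4.0, 1.5, 1.0] here.
```

The attributions sum to the difference between the model output at `x` and at `baseline`, which is the property that makes per-feature rankings like the ones above comparable across features.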
Results
Test set performance (factory real-world data):
| Metric | YS (MPa) | UTS (MPa) | EL (%) |
|---|---|---|---|
| RMSE | 23.08 | 17.72 | 2.35 |
| MAPE | 4.7% | 2.9% | 7.7% |
- Overall R² = 0.907 on the test dataset
- Residual analysis: 82% of YS residuals within ±3 MPa; 69% of UTS within ±10 MPa; 84% of EL within ±2%
- Ablation study confirmed each modality’s contribution: removing thermal image features increased UTS MAE by 2.3 MPa, removing tabular features increased it by 1.1 MPa, and removing time series increased it by 1.6 MPa
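For reference, the metrics reported above can be computed as in the following sketch. The function name `regression_report` and the four sample values are hypothetical and unrelated to the factory data.

```python
import numpy as np

def regression_report(y_true, y_pred, tol):
    """Return RMSE, MAPE (%), R², and the share of residuals within ±tol (%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_pred - y_true
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mape = float(np.mean(np.abs(resid / y_true)) * 100.0)  # assumes no zero targets
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = float(1.0 - ss_res / ss_tot)
    within = float(np.mean(np.abs(resid) <= tol) * 100.0)
    return {"rmse": rmse, "mape": mape, "r2": r2, "within_tol_pct": within}

# Made-up values in MPa, for illustration only.
report = regression_report(
    y_true=[100.0, 110.0, 120.0, 130.0],
    y_pred=[102.0, 108.0, 121.0, 135.0],
    tol=3.0,
)
```

In this toy case three of the four residuals fall within ±3 MPa, so `within_tol_pct` is 75.0 and R² is 1 - 34/500 = 0.932.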
Production deployment: The system was integrated into the factory’s Manufacturing Execution System (MES) for real-time quality monitoring with incremental retraining capability as new products are introduced.