Multimodal Quality Prediction for Die Castings

Built a multimodal ML system at Xiaomi that predicts the mechanical properties of automotive components, reducing MAE by 53%, and deployed it to production.

Motivation

As Xiaomi’s automotive production scaled up with its self-developed M1 aluminum alloy, efficient methods for predicting the mechanical properties of ultra-large integrated die-cast components became essential. Traditional quality assurance relies on destructive tensile testing, in which samples are cut out of finished parts; this is costly, slow, and wasteful of material. During my internship at Xiaomi AI Lab, I developed an end-to-end multimodal ML system that predicts three critical mechanical properties (Yield Strength, Ultimate Tensile Strength, and Elongation) from production process data alone, enabling real-time quality monitoring without destructive testing.

Left: The die-casting manufacturing process with 100+ sensors and 4 thermal imaging cameras. Right: The three-stage multimodal fusion architecture combining CNN, MLP, and LSTM encoders with CatBoost prediction.

Multimodal Fusion Architecture

The system fuses three data modalities through a three-stage architecture:

Stage 1 — Feature Extraction (each encoder pre-trained on ~30,000 unlabeled samples):

  1. Thermal images: Deep CNN encoder extracting surface temperature distribution patterns from high-resolution thermal images of movable and fixed dies, captured before and after mold release agent spraying.
  2. Die-casting process parameters: MLP encoder processing 28 tabular features including mold temperatures, injection speeds (max/min), metal liquid temperatures, vacuum parameters, and water/oil cooling temperatures.
  3. Thermocouple time series: LSTM encoder capturing temporal correlations in 16-channel mold temperature evolution during the cooling phase.

Stage 2 — Feature Aggregation: Horizontal concatenation of encoded features from all three modalities, processed through an attention-based aggregation layer to balance contributions.

Stage 3 — Prediction: CatBoost gradient boosting tree as the predictive backbone, selected after comparative evaluation against SVM (MSE: 42.7), MLP (MSE: 47.6), RandomForest (MSE: 36.8), with CatBoost achieving the lowest MSE of 34.3.
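
The Stage 2 aggregation can be illustrated with a minimal, dependency-free sketch. It is one plausible reading of "attention-based aggregation", not the production layer: each modality embedding gets a softmax weight from a scaled dot product with a query vector, and the weighted embeddings are concatenated. The function names, the `query` vectors (which would be learned in practice), and the toy embedding values are all assumptions for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(embeddings, query):
    """Weight each modality embedding by attention, then concatenate.

    embeddings: modality name -> encoder output (list of floats)
    query:      modality name -> vector of the same length
                (hypothetical stand-in for learned parameters)
    """
    names = list(embeddings)
    # One scalar score per modality: scaled dot product with its query.
    scores = [
        sum(e * q for e, q in zip(embeddings[n], query[n]))
        / math.sqrt(len(embeddings[n]))
        for n in names
    ]
    weights = dict(zip(names, softmax(scores)))
    fused = []
    for n in names:
        fused.extend(weights[n] * v for v in embeddings[n])
    return fused, weights

# Toy encoder outputs; real embeddings would be much wider.
embeddings = {
    "thermal_cnn": [0.2, -0.1, 0.4, 0.3],
    "process_mlp": [0.5, 0.1],
    "thermocouple_lstm": [-0.3, 0.2, 0.1],
}
query = {name: [1.0] * len(vec) for name, vec in embeddings.items()}
fused, weights = fuse(embeddings, query)
```

The fused vector keeps one slot per encoded feature, so a tree model such as CatBoost can consume it directly as a flat feature row.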

Left: Denoising Auto-Encoder architecture for thermocouple time series feature extraction and dimensionality reduction. Right: Joint training exploits materials-science correlations: UTS and YS are positively correlated, while both are negatively correlated with Elongation.

Semi-Supervised Learning with Distribution-Based Filtering

With only 148 labeled samples (from destructive testing) but 32,865 unlabeled production samples, a purely supervised approach would likely overfit and generalize poorly. The semi-supervised strategy iteratively expands the training set:

  1. Train initial models on 148 labeled samples
  2. Predict labels for all 32,865 unlabeled samples
  3. Filter pseudo-labels: only retain predictions within mean ± 3 standard deviations of the true label distribution
  4. Add accepted pseudo-labeled samples to training set
  5. Retrain and repeat until convergence

This distribution-aware filtering prevents model drift from noisy pseudo-labels while effectively leveraging the vast unlabeled data pool.
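
The loop above can be sketched in a few lines. This is an illustrative sketch, not the production code: the one-feature line fit stands in for the real models, the stopping rule (stop when the accepted set no longer changes) is an assumed definition of "convergence", and all names and toy values are hypothetical.

```python
from statistics import mean, stdev

def filter_pseudo_labels(predictions, true_labels, k=3.0):
    """Indices of predictions within mean ± k*std of the labeled targets."""
    mu, sigma = mean(true_labels), stdev(true_labels)
    low, high = mu - k * sigma, mu + k * sigma
    return [i for i, p in enumerate(predictions) if low <= p <= high]

def fit_line(xs, ys):
    """Stand-in model: one-feature least-squares line. Returns predict(x)."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return lambda x: my + slope * (x - mx)

def self_train(fit, labeled_x, labeled_y, unlabeled_x, k=3.0, max_rounds=10):
    """Iteratively add filtered pseudo-labels and retrain."""
    predict = fit(labeled_x, labeled_y)
    accepted = []
    for _ in range(max_rounds):
        preds = [predict(x) for x in unlabeled_x]
        # The filter always uses the TRUE label distribution, never pseudo-labels.
        keep = filter_pseudo_labels(preds, labeled_y, k)
        if keep == accepted:  # assumed convergence criterion
            break
        accepted = keep
        train_x = labeled_x + [unlabeled_x[i] for i in keep]
        train_y = labeled_y + [preds[i] for i in keep]
        predict = fit(train_x, train_y)
    return predict, accepted

# Toy data: five labeled samples, five unlabeled ones.
labeled_x = [1.0, 2.0, 3.0, 4.0, 5.0]
labeled_y = [100.0, 110.0, 120.0, 130.0, 140.0]
unlabeled_x = [0.0, 2.5, 4.5, 9.0, -3.0]
model, accepted = self_train(fit_line, labeled_x, labeled_y, unlabeled_x)
```

In this toy run the two extrapolated samples (`9.0` and `-3.0`) produce predictions outside the ±3σ band of the labeled targets and are rejected, which is the behavior the filter is meant to provide.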

Joint Training Mechanism

Exploiting materials science knowledge about aluminum alloy strength-toughness relationships, models are trained hierarchically:

  1. YS model trained first on fused features
  2. UTS model trained with YS predictions as additional input features (positive YS-UTS correlation)
  3. EL model trained with both YS and UTS predictions (negative strength-elongation correlation)

Inference follows the same sequence, ensuring physically consistent predictions across the three properties.
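
A minimal sketch of this chained setup follows. The 1-nearest-neighbour regressor is only a stand-in so the example runs without dependencies (the source uses CatBoost), and the function names and toy values are hypothetical.

```python
def fit_nearest(rows, targets):
    """Stand-in regressor (1-nearest-neighbour). Returns predict(row)."""
    def predict(row):
        dists = [sum((a - b) ** 2 for a, b in zip(row, r)) for r in rows]
        return targets[dists.index(min(dists))]
    return predict

def fit_chain(fit, rows, ys, uts, el):
    """Train YS -> UTS -> EL, feeding earlier predictions forward."""
    ys_model = fit(rows, ys)
    # UTS model sees the fused features plus the predicted YS.
    rows_ys = [r + [ys_model(r)] for r in rows]
    uts_model = fit(rows_ys, uts)
    # EL model sees the fused features plus predicted YS and UTS.
    rows_ys_uts = [r + [uts_model(r)] for r in rows_ys]
    el_model = fit(rows_ys_uts, el)

    def predict(row):
        # Inference repeats the training order.
        y = ys_model(row)
        u = uts_model(row + [y])
        e = el_model(row + [y, u])
        return {"YS": y, "UTS": u, "EL": e}
    return predict

# Toy fused features and targets (values are made up).
rows = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]
ys = [120.0, 135.0, 150.0]
uts = [240.0, 260.0, 280.0]
el = [12.0, 10.0, 8.0]
predict = fit_chain(fit_nearest, rows, ys, uts, el)
result = predict([0.5, 0.5])
```

One design note: this sketch feeds in-sample predictions to the downstream models for brevity. With a flexible learner, out-of-fold predictions are the usual way to avoid leaking training targets into later stages.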

SHAP Interpretability Analysis

SHAP value analysis revealed physically meaningful feature importance:

  • UTS prediction: Water cooling temperature (TempW0-1) is the most critical parameter — higher water temperature reduces UTS, consistent with metallurgical expectations
  • YS prediction: Oil temperature (TempO5-1) dominates, directly correlating with yield strength development
  • EL prediction: Injection speed parameters (average low/high speed, injection time) are primary drivers, as filling dynamics directly affect porosity and ductility
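
To make the attributions above concrete: SHAP values are Shapley values, which for a handful of features can be computed exactly by averaging each feature's marginal contribution over all feature orderings. The sketch below does this by brute force. It does not use the `shap` library, and the toy model, its coefficients, and the feature names are invented purely for illustration; they are not the fitted production model.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    model:    callable taking a dict of feature name -> value
    x:        the sample to explain
    baseline: reference feature values (e.g. a dataset average)
    """
    names = list(x)
    phi = {n: 0.0 for n in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        prev = model(current)
        for n in order:
            current[n] = x[n]  # switch feature n from baseline to its real value
            value = model(current)
            phi[n] += (value - prev) / len(orders)
            prev = value
    return phi

def toy_uts(f):
    """Made-up UTS model with one interaction term (illustration only)."""
    return (280.0
            - 0.5 * f["water_temp"]
            + 0.1 * f["oil_temp"]
            - 0.02 * f["water_temp"] * f["low_speed"])

x = {"water_temp": 30.0, "oil_temp": 60.0, "low_speed": 0.3}
baseline = {"water_temp": 25.0, "oil_temp": 55.0, "low_speed": 0.2}
phi = shapley_values(toy_uts, x, baseline)
```

The attributions sum to `toy_uts(x) - toy_uts(baseline)`, which is the additivity property that makes per-feature SHAP contributions directly comparable. Brute force is factorial in the number of features, so it is only practical for toy cases like this one.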

Results

Test set performance (factory real-world data):

Metric    YS (MPa)    UTS (MPa)    EL (%)
RMSE      23.08       17.72        2.35
MAPE      4.7%        2.9%         7.7%

  • Overall R² = 0.907 on the test dataset
  • Residual analysis: 82% of YS residuals within ±3 MPa; 69% of UTS within ±10 MPa; 84% of EL within ±2%
  • Ablation study confirmed each modality’s contribution: removing thermal image features increased UTS MAE by 2.3 MPa, removing tabular features by 1.1 MPa, and removing time series features by 1.6 MPa
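
For reference, the metrics reported above follow standard definitions, sketched here in plain Python. The helper names and toy values are hypothetical, and how the "overall" R² was pooled across the three properties is not specified in this write-up, so `r2` below is the ordinary single-target version.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination for a single target."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def share_within(y_true, y_pred, tol):
    """Fraction of residuals with absolute value <= tol."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)

# Toy values only; not the factory test set.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
summary = {
    "rmse": rmse(y_true, y_pred),
    "mape": mape(y_true, y_pred),
    "r2": r2(y_true, y_pred),
    "within_10": share_within(y_true, y_pred, 10.0),
}
```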

Production deployment: The system was integrated into the factory’s Manufacturing Execution System (MES) for real-time quality monitoring with incremental retraining capability as new products are introduced.