Federated Learning with Adaptive Device Selection

Overview

This M.Eng design project at Cornell University (supervised by Prof. Qing Zhao) investigated Federated Learning in heterogeneous environments where edge devices have varying data distributions, computational capabilities, and network bandwidth. Federated Learning enables collaborative model training across distributed devices without sharing raw data, preserving user privacy — but real-world deployments face significant challenges from device and data heterogeneity.

Federated Learning architecture: distributed edge devices collaboratively train a shared global model by exchanging model updates (not raw data) with a central server.

Algorithms Compared

We implemented and compared three major FL algorithms on the MNIST dataset using logistic regression, with 100 communication rounds and 20 local epochs per round (Google Colab, V100 GPUs):

1. FedAvg (McMahan et al., 2017) — The foundational FL algorithm:

Clients selected uniformly at random each round
Optimization target: \(\min_w f(w) = \sum p_k F_k(w)\)
Serves as the baseline for comparison

2. FedProx (Li et al., 2020) — Addresses statistical heterogeneity:

Adds a proximal regularization term to prevent local models from diverging too far from the global model
Modified local objective: \(\min_w h(w; w^t) = F_k(w) + \frac{\mu}{2} \|w - w^t\|^2\)
Designed specifically for Non-IID client environments

3. q-FedAvg (Li et al., 2019) — Prioritizes fairness:

Re-weights the objective to emphasize poorly-performing clients
Modified objective: \(\min_w f(w) = \sum \frac{p_k}{q+1} F_k^{q+1}(w)\)
Parameter \(q\) controls fairness emphasis (larger \(q\) = more weight to high-loss clients)
Reduces variance among devices’ test accuracy by 45%

Left: FedAvg convergence under IID vs Non-IID data with 10 clients --- Non-IID data causes noticeably worse accuracy and higher oscillation. Right: Comparison of all three algorithms under Non-IID setting, showing q-FedAvg's slower but fairer convergence.

Experimental Results

Effect of Data Distribution

Non-IID data consistently produces worse accuracy than IID, with higher oscillation across communication rounds
Prior work (136K smartphone study) showed heterogeneity decreases FL accuracy by 9.2% and increases convergence time by 250%
Increasing client count from 10 to 50 significantly narrows the IID/Non-IID gap, as more clients enable better exploration of the optimization landscape

Algorithm Convergence (Non-IID, 10 clients)

FedAvg and FedProx show similar convergence rates and final accuracy under controlled conditions
q-FedAvg exhibits noticeably slower convergence, a direct consequence of its fairness weighting that prioritizes underperforming clients
All three algorithms reach similar final accuracy, but the paths differ substantially

Key Insight: Fairness-Performance Trade-off

q-FedAvg demonstrates an explicit trade-off — it reduces accuracy variance across clients at the cost of slower overall convergence. This has practical implications: applications requiring equitable performance across all users (e.g., healthcare) may prefer q-FedAvg, while latency-sensitive applications may prefer FedAvg/FedProx.

Comprehensive Literature Survey

We reviewed 40+ papers and categorized FL challenges into four heterogeneity dimensions:

Heterogeneity Type	Description	Key Methods
Data Space	Feature/label size differences	Data augmentation, GANs
Statistical	Non-IID data distributions	FedProx, ISFedAvg, FedPNS
System	Varying compute/network capabilities	FedCS, OCEAN
Fairness	Biased client selection	q-FedAvg, Eiffel, Oort

We also evaluated four client selection paradigms: random (diversity), greedy (performance), clustering (similarity grouping), and multi-armed bandit (adaptive exploration-exploitation balance).

Joint work with Joash Shankar (jcs556) at Cornell University, 2023–2024.