Distribution-Agnostic Linear Unbiased Estimation With Saturated Weights for Heterogeneous Data

Grassi, Francesco; Coluccia, Angelo

doi:10.1109/TSP.2023.3293908

The challenging problem of distribution-agnostic linear (weighted) unbiased estimation of a global parameter from heterogeneous and unbalanced data is addressed. This setup may arise in different signal processing contexts involving the joint processing of non-homogeneous groups of data whose statistical distribution is unknown, with (possibly highly) diverse sample sizes. Since sample estimators of the local variances are inaccurate in the low-sample regime, suitable weighting schemes are required. For this problem, we study a family of estimators based on the idea of trimmed weights, i.e., weights proportional to the sample size but with a proper saturation. Such an approach is theoretically analyzed, showing that it can be linked to the Maximum Entropy principle under uncertainty on the data generative model (as well as to a broader class of cost functions). Different criteria for setting the "cut-off" threshold between the linear and saturated regions are analyzed, also obtaining a reduced-complexity approximation of the optimal minimum-variance estimator for a generalized mixed-effect model. To this aim, a further contribution is that several estimators of a hyperparameter are derived and analyzed. The performance of the proposed approach is assessed against state-of-the-art estimators. An illustrative application to real-world COVID-19 data is also developed.
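The trimmed-weight idea in the abstract can be illustrated with a minimal sketch. It assumes that each group's sample mean is an unbiased estimate of the global parameter and that the weight of group i is proportional to min(n_i, cutoff), normalized to sum to one. The function names and the `cutoff` parameter are hypothetical, chosen only for illustration. The paper's criteria for selecting the threshold and its hyperparameter estimators are not reproduced here.

```python
from statistics import fmean


def saturated_weights(sample_sizes, cutoff):
    """Weights proportional to the sample size, saturated at `cutoff`.

    Assumption for illustration: w_i is proportional to min(n_i, cutoff),
    and the weights are normalized so that they sum to one.
    """
    capped = [min(n, cutoff) for n in sample_sizes]
    total = sum(capped)
    return [c / total for c in capped]


def saturated_weight_estimate(groups, cutoff):
    """Weighted combination of the per-group sample means.

    The weights depend only on the group sizes, not on the data values,
    so the combination stays unbiased whenever each group mean is unbiased.
    """
    weights = saturated_weights([len(g) for g in groups], cutoff)
    return sum(w * fmean(g) for w, g in zip(weights, groups))


# Hypothetical example: three groups with very different sample sizes.
groups = [[1.0], [2.0, 4.0], [3.0] * 10]
weights = saturated_weights([len(g) for g in groups], cutoff=2)
estimate = saturated_weight_estimate(groups, cutoff=2)
```

In this sketch, a very large `cutoff` gives weights proportional to the sample sizes (equivalent to pooling all samples), while a `cutoff` no larger than the smallest group size gives uniform weights (a plain average of the group means). Intermediate values interpolate between the two regimes.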