Distribution-Agnostic Linear Unbiased Estimation With Saturated Weights for Heterogeneous Data

Angelo Coluccia
2023-01-01

Abstract

This paper addresses the challenging problem of distribution-agnostic linear (weighted) unbiased estimation of a global parameter from heterogeneous and unbalanced data. This setup arises in signal processing contexts that involve the joint processing of non-homogeneous groups of data whose statistical distribution is unknown and whose sample sizes may differ widely. Since sample estimators of the local variances are inaccurate in the low-sample regime, suitable weighting schemes are required. For this problem, we study a family of estimators based on trimmed weights, i.e., weights proportional to the sample size up to a saturation level. The approach is analyzed theoretically and shown to be linked to the Maximum Entropy principle under uncertainty on the data generative model (as well as to a broader class of cost functions). Different criteria for setting the "cut-off" threshold between the linear and saturated regions are analyzed, which also yields a reduced-complexity approximation of the optimal minimum-variance estimator for a generalized mixed-effect model. To this end, several estimators of a hyperparameter are derived and analyzed. The performance of the proposed approach is assessed against state-of-the-art estimators, and an illustrative application to real-world COVID-19 data is presented.
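As a rough illustration of the trimmed-weight idea described in the abstract, the following minimal sketch combines per-group sample means using weights proportional to min(n_i, cutoff), normalized to sum to one. The function names, the `cutoff` parameter, and the choice of the local sample mean as the per-group statistic are assumptions made here for illustration; the paper's criteria for choosing the threshold and its hyperparameter estimators are not reproduced.

```python
def saturated_weights(sample_sizes, cutoff):
    """Return weights proportional to min(n_i, cutoff), normalized to sum to 1.

    `cutoff` is a hypothetical name for the threshold between the linear
    and saturated regions; how to choose it is not covered here.
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    if any(n <= 0 for n in sample_sizes):
        raise ValueError("every group needs at least one sample")
    raw = [min(n, cutoff) for n in sample_sizes]
    total = sum(raw)
    return [r / total for r in raw]


def saturated_estimate(groups, cutoff):
    """Weighted average of per-group sample means with saturated weights.

    Assumes each group's sample mean is an unbiased estimate of the same
    global parameter, so any weights summing to 1 keep the result unbiased.
    """
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    weights = saturated_weights(sizes, cutoff)
    return sum(w * m for w, m in zip(weights, means))


if __name__ == "__main__":
    # Two unbalanced groups: 2 samples and 4 samples.
    groups = [[1.0, 3.0], [4.0, 4.0, 4.0, 4.0]]
    for c in (1, 3, 100):
        print(c, saturated_estimate(groups, c))
```

In this sketch the two extremes behave as one would expect: a cutoff at or below the smallest sample size gives equal weights (the plain average of the group means), while a cutoff at or above the largest sample size gives weights proportional to sample size (the pooled mean of all samples). Intermediate values interpolate between the two.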
Use this identifier to cite or link to this document: https://hdl.handle.net/11587/514546