We present a deterministic parallel algorithm for the k-majority problem, that can be used to find in parallel frequent items, i.e. those whose multiplicity is greater than a given threshold, and is therefore useful to process iceberg queries and in many other different contexts of applied mathematics and information theory. The algorithm can be used both in the online (stream) context and in the offline setting, the difference being that in the former case we are restricted to a single scan of the input elements, so that verifying the frequent items that have been determined is not allowed (e.g. network traffic streams passing through internet routers), while in the latter a parallel scan of the input can be used to determine the actual k-majority elements. To the best of our knowledge, this is the first parallel algorithm solving the proposed problem.
Finding frequent items in parallel
CAFARO, Massimo;
2011-01-01
Abstract
We present a deterministic parallel algorithm for the k-majority problem, that can be used to find in parallel frequent items, i.e. those whose multiplicity is greater than a given threshold, and is therefore useful to process iceberg queries and in many other different contexts of applied mathematics and information theory. The algorithm can be used both in the online (stream) context and in the offline setting, the difference being that in the former case we are restricted to a single scan of the input elements, so that verifying the frequent items that have been determined is not allowed (e.g. network traffic streams passing through internet routers), while in the latter a parallel scan of the input can be used to determine the actual k-majority elements. To the best of our knowledge, this is the first parallel algorithm solving the proposed problem.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.