Reliably tracking large network flows in order to determine so-called elephant flows, also known as heavy hitters or frequent items, is a common data mining task. Indeed, this kind of information is crucial in many different contexts and applications. Since storing all of the traffic is impossible, owing to the fact that flows arrive as an unbounded, infinite length stream, many different algorithms have been designed for traffic measurement using only a limited amount of memory. CountMax is a recently published algorithm that solves this problem approximately by combining ideas from the CountMin and MisraGries algorithms. In this paper we introduce CMSS which cleverly combines ideas from the CountMin and Space Saving algorithms. We compare CMSS and CountMax on both synthetic and real datasets. The experimental results show that both algorithms achieve 100% recall, and can retrieve the frequent items without false negatives even with a limited amount of budgeted memory. Regarding the precision, CMSS proves to be robust and independent from all of the parameters, whilst CountMax is severely affected by false positives. Finally, both algorithms provide, in practice, the same frequency estimation quality. It follows that CMSS outperforms CountMax with regard to overall accuracy whilst providing the same frequency estimation quality.

CMSS: Sketching based reliable tracking of large network flows

Cafaro, Massimo
Methodology
;
Epicoco, Italo
Methodology
;
Pulimeno, Marco
Methodology
2019-01-01

Abstract

Reliably tracking large network flows in order to determine so-called elephant flows, also known as heavy hitters or frequent items, is a common data mining task. Indeed, this kind of information is crucial in many different contexts and applications. Since storing all of the traffic is impossible, owing to the fact that flows arrive as an unbounded, infinite length stream, many different algorithms have been designed for traffic measurement using only a limited amount of memory. CountMax is a recently published algorithm that solves this problem approximately by combining ideas from the CountMin and MisraGries algorithms. In this paper we introduce CMSS which cleverly combines ideas from the CountMin and Space Saving algorithms. We compare CMSS and CountMax on both synthetic and real datasets. The experimental results show that both algorithms achieve 100% recall, and can retrieve the frequent items without false negatives even with a limited amount of budgeted memory. Regarding the precision, CMSS proves to be robust and independent from all of the parameters, whilst CountMax is severely affected by false positives. Finally, both algorithms provide, in practice, the same frequency estimation quality. It follows that CMSS outperforms CountMax with regard to overall accuracy whilst providing the same frequency estimation quality.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11587/432026
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 9
  • ???jsp.display-item.citation.isi??? 9
social impact