Real-Time Object Finding for the Visually Impaired Using an Image-to-Speech Wearable Device
R. de Fazio; P. Visconti
2025-01-01
Abstract
This paper introduces a novel wearable assistive device designed to enhance the daily lives of visually impaired people (VIP) by providing continuous, real-time object finding. The proposed device integrates a miniature camera and a system-on-module (SoM) computing unit. Deep learning-based algorithms, specifically Faster R-CNNs, run on the SoM to detect and recognize pertinent objects in the images captured by the camera. An integrated speaker delivers spoken sentences informing the user of the presence of objects in the visual scene. Experimental results obtained in real-world conditions show an 86% mean average precision for object recognition and a 215 ms mean computing time, enabling real-time processing. The proposed image-to-speech device offers a practical and efficient solution to assist VIP in finding everyday objects.
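The record does not include the authors' implementation. The following is a minimal sketch of the camera-to-speech loop the abstract describes, under stated assumptions: a COCO-pretrained Faster R-CNN from torchvision stands in for the authors' trained model, OpenCV reads the camera stream, and pyttsx3 provides offline text-to-speech; the 0.8 confidence threshold is a hypothetical parameter, not a value from the paper.

```python
# Hedged sketch of the image-to-speech pipeline: camera frame -> Faster R-CNN
# detection -> spoken sentence. Model, TTS engine, and threshold are
# illustrative substitutes, not the authors' components.
import cv2
import torch
import pyttsx3
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
preprocess = weights.transforms()          # model-specific input transforms
labels = weights.meta["categories"]        # COCO class names
tts = pyttsx3.init()                       # offline TTS engine

CONFIDENCE = 0.8  # hypothetical detection threshold

cap = cv2.VideoCapture(0)  # stand-in for the device's miniature camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV delivers BGR uint8 frames; convert to an RGB CHW tensor
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    image = torch.from_numpy(rgb).permute(2, 0, 1)
    with torch.no_grad():
        detections = model([preprocess(image)])[0]
    # Keep the distinct class names of confident detections
    found = {
        labels[i]
        for i, s in zip(detections["labels"].tolist(),
                        detections["scores"].tolist())
        if s >= CONFIDENCE
    }
    if found:
        # Convey the detected objects to the user as a spoken sentence
        tts.say("I can see " + ", ".join(sorted(found)))
        tts.runAndWait()
cap.release()
```

Note that the abstract reports a 215 ms mean computing time on the embedded SoM; a full ResNet-50 backbone like the one above would likely require a lighter backbone or hardware acceleration to reach comparable latency on such hardware.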


