Knowledge Location Recognition.ai
A hackathon project to help visually impaired people find objects using voice commands and detect collisions using computer vision.
Video Demo
This project uses two industry-standard computer vision models: YOLO for object detection and MiDaS for relative depth estimation.
The goal was to give visually impaired users a convenient, voice-driven way to navigate their surroundings and locate objects using computer vision.
The project's functionality is split into two features: pathfinding and object detection.
For pathfinding, frames from the camera's video stream are passed to MiDaS, which estimates depth to check whether any object is close to the user. If one is, the frame is sent to YOLO to be classified, and the result goes to an LLM that composes a warning. ElevenLabs text-to-speech then reads the warning aloud.
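The "is anything close?" check on the MiDaS output could be sketched like this. This is an illustrative helper, not the project's actual code: MiDaS produces relative inverse depth (larger values mean nearer), so one simple heuristic is to normalise the map and test whether enough of the central region exceeds a closeness threshold. The function name and the threshold values are assumptions for illustration.

```python
import numpy as np

def obstacle_close(depth_map: np.ndarray,
                   closeness_threshold: float = 0.8,
                   area_fraction: float = 0.2) -> bool:
    """Heuristic: does the centre of the frame look close?

    depth_map: 2D relative inverse-depth output (higher = nearer).
    Returns True when at least `area_fraction` of the central
    region is above `closeness_threshold` after normalisation.
    Thresholds here are illustrative, not tuned values.
    """
    d = depth_map.astype(float)
    # Normalise to [0, 1]; epsilon guards against a flat map.
    d = (d - d.min()) / (d.max() - d.min() + 1e-9)
    h, w = d.shape
    # Only look at the central half of the frame, roughly the
    # region directly in the user's path.
    center = d[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]
    return float((center > closeness_threshold).mean()) >= area_fraction
```

Only frames that pass this gate would need the more expensive YOLO + LLM + text-to-speech steps, which keeps the warning loop responsive.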
For object detection, ElevenLabs speech-to-text lets the user ask for an item by voice. If YOLO detects the item in the frame, its position and depth data are passed to the LLM, which generates a message for the user that is again read aloud.
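Turning a detection's position and depth into a user-facing phrase might look like the sketch below. This is a hypothetical helper, not the project's implementation: the thirds-based left/ahead/right split and the depth wording are illustrative choices for the kind of structured input the LLM stage would receive.

```python
def describe_detection(box_center_x: float,
                       frame_width: int,
                       rel_depth: float) -> str:
    """Compose a short phrase from a YOLO box centre and a
    relative depth value (0..1, higher = nearer).

    Splits the frame into thirds for direction; the proximity
    cutoff of 0.7 is an illustrative assumption.
    """
    third = frame_width / 3
    if box_center_x < third:
        direction = "to your left"
    elif box_center_x < 2 * third:
        direction = "straight ahead"
    else:
        direction = "to your right"
    proximity = "very close" if rel_depth > 0.7 else "a few steps away"
    return f"The item is {direction}, {proximity}."
```

In the described pipeline, a phrase like this (or the raw position and depth numbers) would be handed to the LLM to phrase naturally, then voiced by ElevenLabs text-to-speech.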
GitHub Repo