Our team has been working on this technology for the better part of seven years, beginning while our two cofounders were completing their respective PhDs at Columbia University in New York. The transformational component of our technology is its multimodal approach: it fuses the information from all of the different modalities that occur throughout a piece of video content. At a high level, we are training the computer to watch video the same way a human does. One of the differentiating aspects of our platform is the ability to train custom models without any annotated data. This addresses what academia calls the cold start problem, a core challenge when building computer vision and AI solutions. A prospector or finder (whether in the public or private sector) should be interested in this technology for three reasons. First, by leveraging our technology, an organization can use its workforce more efficiently. A human operator is no longer required to manually tag the large volume of video content being produced and captured, and can instead focus on activities that yield more value. Second, our technology is significantly faster than a human operator. We are able to extract all of the multimodal information found in a 60-minute video in 11 minutes, more than five times faster than real time. This greatly increases the amount of video an organization can analyze. Lastly, our technology is significantly less expensive than other solutions available on the market. We have worked to ensure our algorithms run as efficiently as possible, and we pass the cost savings on to the end customer.
Vidrovr builds computer vision and artificial intelligence (AI) solutions specifically focused on video. We detect and recognize people, objects, actions, on-screen text, and audio within a given video asset. To accomplish these tasks, we leverage our proprietary algorithms, which in turn power derivative solutions.