Rummaging through CCTV footage to find something specific is not just cumbersome but also extremely time-consuming. While it can be manageable for home users, it is a big undertaking for organisations, enterprises, government bodies, and public/private institutions.
Gurugram-based VOGIC AI decided to solve this real-world problem using AI.
AIM caught up with Arijit Biswas, the CEO and co-founder of VOGIC AI, for an exclusive chat about how they’re tackling the problem, some of its tech internals, and how they plan to make it secure.
How Are They Solving Chaos in CCTV Footage?
Biswas explained that CCTV footage is usually unstructured and high in volume, since the cameras run 24/7. Analysing video or image data with so much going on is tough. Pointing to Microsoft Power BI and Excel, which exist for numerical and text data, he said there are no comparably popular tools for video data.
VOGIC AI empowers organisations with tools and pre-built modules through which one can extract information based on contexts like ‘a person walking close to a car’ or ‘a person trying to capture a photo of the car’.
He elaborated on this and said, “The Army might have a different context, the law enforcement agencies might have a different context and a chain of retail stores might have a different context. There are different institutes and physical setups that have different contexts.”
So, whether it comes from drones or satellites, footage from such organisations can be analysed easily with VOGIC AI.
What’s The Tech Under-the-Hood?
The company’s name combines ‘video’ with ‘logic’. When asked how VOGIC AI integrates with the various CCTV platforms nationwide to provide its AI capabilities, Biswas explained that most CCTV vendors like CP Plus, Honeywell, Bosch, and others follow a standardised protocol from the Open Network Video Interface Forum (ONVIF). VOGIC AI’s system is compatible with ONVIF, which lets it work with all kinds of OEMs.
In addition, some companies use a video management system like Milestone. VOGIC AI integrates its solution either directly with the CCTV cameras or via the video management system the customer already uses.
So, what’s the tech stack (or the AI model) making all this possible?
Biswas stated that they use a mix of conventional neural networks and large vision language models (VLMs), which are LLMs for videos, fine-tuned to the context of a CCTV camera. The VLM works with image-text pairs, which helps index the footage.
He further explained that the first layer of indexing (the heavy workload) is done by the neural networks, and the next layer of contextual information is added by the VLMs. The base model for the VLM is LLaVA, which was further trained using CCTV-specific videos to build VOGIC AI’s solution.
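The two-layer indexing described above can be pictured with a minimal Python sketch. It is not VOGIC AI's implementation: `detect_objects` and `describe_scene` are hypothetical stand-ins for the neural network and the VLM, the frame format is invented for illustration, and running the second layer only on frames with detections is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FrameRecord:
    """One indexed frame: timestamp, detected labels, optional caption."""
    timestamp_s: float
    labels: list
    caption: str = ""


def detect_objects(frame):
    """Hypothetical stand-in for the first-layer neural network."""
    return frame["objects"]  # a real detector would infer these from pixels


def describe_scene(frame, labels):
    """Hypothetical stand-in for the second-layer VLM."""
    return frame["description"]  # a real VLM would generate this text


def build_index(frames):
    """Layer 1 runs on every frame; layer 2 adds context where needed."""
    index = []
    for frame in frames:
        labels = detect_objects(frame)
        record = FrameRecord(frame["t"], labels)
        if labels:  # assumption: skip the costly VLM on empty frames
            record.caption = describe_scene(frame, labels)
        index.append(record)
    return index


def search(index, query):
    """Return timestamps whose caption contains every query term."""
    terms = query.lower().split()
    return [r.timestamp_s for r in index
            if all(t in r.caption.lower() for t in terms)]


# Invented sample data standing in for decoded CCTV frames.
sample_frames = [
    {"t": 0.0, "objects": [], "description": "empty car park"},
    {"t": 12.0, "objects": ["person", "car"],
     "description": "a person walking close to a car"},
    {"t": 30.0, "objects": ["car"], "description": "a car leaving the car park"},
]

index = build_index(sample_frames)
hits = search(index, "person car")
```

Once the index exists, a query touches only the stored text rather than the raw video, which is the point of indexing the footage up front.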
What Were The Challenges in Building This?
Biswas highlighted that the primary challenge was to acquire the video data for training, considering it is highly sensitive in nature. He also said that existing AI systems struggle to extract meaningful context from video footage, leading to false alerts.
Lastly, such VLMs are computationally intensive and call for expensive GPUs, which may not be ideal.
While the platform has largely solved these challenges, it is taking further steps, such as building a crowdsourcing platform to encourage individuals to contribute video data, slimming the VLM down to a smaller model, and adding contextual information to the footage.
How Does It Ensure Data Privacy?
For customers with a large CCTV infrastructure, the company deploys most of its code on the customer’s private cloud. The same applies to organisations with their own data centre.
The company also provides a GPU box, which connects to the host’s network and processes data from within it. The company says that at no point in the process is the data extracted or sent anywhere else.
In addition to this, it takes several safety measures, like data anonymisation with face blurring. Though the system can detect whether the subject is male or female, it will not identify the person unless permitted to do so. Customers can toggle this feature as per the data privacy laws in their respective countries.
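As a rough illustration of anonymisation behind a toggle, the Python sketch below pixelates face regions in a tiny grayscale image unless identification is permitted. It rests on assumptions: the article does not say how VOGIC AI blurs faces, the face boxes would come from a separate detector not shown here, and the names `pixelate_region`, `anonymise` and `identification_permitted` are hypothetical.

```python
def pixelate_region(image, box, block=2):
    """Return a copy of a grayscale image (list of rows) with `box` pixelated.

    `box` is (top, left, bottom, right); bottom and right are exclusive.
    Each block-by-block tile inside the box is replaced by its mean value.
    """
    top, left, bottom, right = box
    out = [row[:] for row in image]
    for by in range(top, bottom, block):
        for bx in range(left, right, block):
            cells = [(y, x)
                     for y in range(by, min(by + block, bottom))
                     for x in range(bx, min(bx + block, right))]
            mean = sum(image[y][x] for y, x in cells) // len(cells)
            for y, x in cells:
                out[y][x] = mean
    return out


def anonymise(image, face_boxes, identification_permitted=False):
    """Pixelate every face box unless the customer has permitted identification."""
    if identification_permitted:
        return [row[:] for row in image]  # untouched copy
    for box in face_boxes:
        image = pixelate_region(image, box)
    return image


# Invented 4x4 frame with one "face" in the top-left 2x2 corner.
frame = [
    [0, 10, 5, 5],
    [20, 30, 5, 5],
    [5, 5, 5, 5],
    [5, 5, 5, 5],
]
faces = [(0, 0, 2, 2)]

blurred = anonymise(frame, faces)
visible = anonymise(frame, faces, identification_permitted=True)
```

Keeping the toggle as a single parameter means the default path is the anonymised one, and identification has to be switched on explicitly.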
“We have partnered with companies like Lenovo and Dell, and we are an NVIDIA Inception & Metropolis partner. A lot of innovations that NVIDIA carries out in security and safety are integrated into our systems too,” Biswas added.
Revenue Model and Future Plans
The revenue model for VOGIC AI is straightforward: it charges customers per CCTV camera for continuous deployment. The charge is a monthly license fee, or is based on footage duration when the work involves analysing a lot of archived data.
VOGIC AI is focused on businesses and organisations but also aims to integrate its solution for law enforcement and India’s national security, using a business-to-government (B2G) model. The company has worked on projects like Varanasi Smart City and with a few drone surveillance companies. Its solution is currently being tested by law enforcement agencies.
Biswas also mentioned his plans to explore working with security system integrators outside India, particularly in the Middle East.