Eurosatory 2024 – ST Engineering details its AGIL AI-enabled systems

Sam Cranny-Evans

During an interview with EDR On-Line at the Eurosatory 2024 exhibition in Paris, ST Engineering detailed its suite of artificial intelligence (AI) capabilities, which are used in Singapore to provide decision support and data fusion in civil security.

The system has been developed by ST Engineering for Singapore to process enormous amounts of data from different agencies and fuse it into a centralised command and operations hub.

The system architecture has been developed over time and started with command hubs that were set up for each of the relevant services – such as healthcare, public utilities, homeland security and border control.

Three key AI-enabled products power the network: AGIL Response, AGIL Vision, and Merlin. The system works by using AI at the edge of the network to process and interrogate the data gathered by different types of sensors. For example, CCTV cameras can be fitted with computers carrying the AGIL Vision capability, which may include facial recognition algorithms. Those cameras can feed metadata on individuals back to the central command post, which is equipped with additional AI capabilities that allow all of the CCTV video footage gathered to be interrogated using text prompts. “Using our natural language processing, it would be possible to search all of the footage for a person wearing an Adidas bag,” William Goh, ST Engineering’s Head of International Software and AI, told EDR On-Line.
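The idea of searching camera metadata with a text prompt can be illustrated with a minimal Python sketch. It is not ST Engineering's implementation: the `Detection` record, the `search` function, the camera names and the timestamps are all hypothetical, and plain term matching stands in for the natural language processing Goh describes.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical metadata record sent by an edge camera instead of video."""
    camera_id: str
    timestamp: str
    labels: frozenset  # descriptors inferred at the edge


def search(detections, query):
    """Return detections whose labels contain every comma-separated query term.

    Plain term matching is an assumption standing in for real language processing.
    """
    terms = [t.strip().lower() for t in query.split(",") if t.strip()]
    return [d for d in detections if all(t in d.labels for t in terms)]


# Made-up records for illustration only.
records = [
    Detection("cam-01", "10:02:00", frozenset({"person", "adidas bag"})),
    Detection("cam-02", "10:03:30", frozenset({"person", "red jacket"})),
    Detection("cam-07", "10:05:10", frozenset({"vehicle"})),
]

hits = search(records, "person, adidas bag")
print([d.camera_id for d in hits])
```

In this sketch the command hub only ever handles small descriptor records, so the search runs over metadata rather than over the footage itself.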


In a theoretical scenario, imagine that an individual is observed acting suspiciously around a government or military site and that the security services are first alerted to this via a phone call. That report could be automatically linked through the central command hub, where CCTV data from the area can be interrogated using descriptors to locate the individual. “We can add other data sources into the feed,” Goh explained, “for example, to examine an individual’s web activity once we have identified them as a potential threat.” So, the individual would be identified and any web activity indicating hostile intent would be shared with the central command hub. AGIL Vision and ST Engineering’s other algorithms would provide consolidated data, or the data that is most relevant, and additional systems at the central command hub would interrogate that data further to provide command and decision support. “We have Merlin, which can provide recommendations and assessments. If that individual was carrying a gun or bomb, Merlin would gather information on bed availability from nearby hospitals so that we could prepare for potential casualties and alert them,” Goh said. This kind of capability has many applications outside of homeland security; it can also be used to coordinate a response to natural disasters.

If a threat or hazard has been identified, the system includes a separate AI capability called AGIL Response, which provides recommendations on the type of response and the availability of responders. This might take the form of a connection between the central command hub and a police unit to intercept the identified individual.

At a system architecture level, the network can be described overall as a large cloud computing capability held by the central command hub, equipped with AI-enabled capabilities such as AGIL Vision and AGIL Response to interpret and interrogate the data that is received from thousands of sensors. The sensors may be enabled by AI to interpret or analyse data at the edge. Each of the services, for example the army, police, or health service, is a smaller representation of this model with its own computing capabilities.

The data is transferred from the edge to the command hub over 4G and 5G cellular networks. “Private data is stored on private clouds, and metadata from those private clouds is sent to the central cloud, as opposed to the data itself,” Goh explained. Generative AI – AI that can generate new outputs from inputs such as text or images – is used to translate all of these inputs into usable formats. This helps the commander build a better understanding of the situation and make more informed decisions.


One straightforward example of generative AI within the ST Engineering portfolio is AGIL Vision. “AGIL Vision leverages LLMs [large language models] and generative AI to create a flexible intelligence tool that retrieves more insights for further investigations,” Goh said. It is the tool that would enable command hub personnel to search for an individual with an Adidas bag, as in the example described earlier. “AGIL Vision can run on a computer in a vehicle or on cloud for large-scale deployment,” Goh explained. This concept was demonstrated by ST Engineering at the company’s stand at Eurosatory. A web camera connected to a computer running AGIL Vision was given a watchlist of terms, such as “tank”, and was able to detect a gun, the company’s logo, and somebody making a peace sign with their fingers. As each term was detected, the system provided an alert. “This is not meta-tagged data, the machine is inferring what it is seeing based on some pre-trained models,” Goh explained.
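The watchlist behaviour seen in the demonstration can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the AGIL Vision logic: the `check_watchlist` function is hypothetical, the per-frame labels are made up to stand in for the output of a pre-trained vision model, and the watchlist assumes that every detected item had been entered as a term.

```python
def check_watchlist(frame_labels, watchlist):
    """Return the watchlist terms present in one frame's inferred labels."""
    seen = {label.lower() for label in frame_labels}
    return sorted(term for term in watchlist if term.lower() in seen)


# Assumed watchlist; only "tank" is named in the article.
watchlist = {"tank", "gun", "logo", "peace sign"}

# Made-up labels standing in for what a vision model infers per frame.
frames = [
    ["person", "table"],
    ["person", "gun"],
    ["person", "peace sign", "logo"],
]

alerts = []
for index, labels in enumerate(frames):
    matched = check_watchlist(labels, watchlist)
    if matched:
        # Raise one alert per frame that contains any watchlist term.
        alerts.append((index, matched))
        print(f"ALERT frame {index}: {', '.join(matched)}")
```

The matching step is deliberately trivial; the hard part, which the sketch does not attempt, is the model that turns pixels into labels without pre-tagged data.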

This type of capability is often provided by something known as a foundational model. To provide a succinct explanation of how foundational models work, EDR On-Line asked another foundational model – Google Gemini – to explain what they are. It replied, “Think of a foundational model like a well-educated person. They have a broad knowledge base but might need specific training for a particular job.” Typically, this type of AI is trained through self-supervised learning, and it is not possible to understand how it recognises that it is seeing a gun, but it is possible to know what data it was trained on and what results it produces. So, ST Engineering’s AGIL Vision may have received specific training based upon the threats it would be expected to observe, but it would be able to recognise a large number of terms without this.

“AGIL Vision generates a report from what it sees and the terms that you enter; it uses this to generate a recommendation or assessment of the situation,” Goh explained as he pointed to the AI-generated report, which is shown in the image below. Note that not only has the AI identified the gun, it has also correctly determined that it is a toy and that the author and ST Engineering staff are enjoying a “light-hearted moment.” Based upon the AI’s understanding of the situation, it recommends that no further action is required. Because it generates metadata from video, the system reduces the communications bandwidth required to send information, especially when compared with streaming the video content itself. The local user of the sensor can set the watchlist, or a commander at a higher echelon can establish one for dissemination to other elements of the network.

ST Engineering is working on a similar command and control system for the Singapore Armed Forces, which it is calling Digital C4. The system is intended to provide multi-domain coordination. “It is currently on our technology roadmap and will connect all of the ST Engineering-provided C2 systems from the tactical level up to the strategic level,” Goh said. Digital C4 will require a lot more edge computing than the homeland security capability, as cloud computing cannot be relied upon in the dynamic and changing scenarios of warfare. This will involve vehicle integration of capabilities like AGIL Vision. The company is currently examining the use of VHF, cellular networks, and SATCOM to pass data between the nodes of the C4 network. As mentioned above, edge AI can condense video data into a report and transmit that report instead of the footage itself, which dramatically reduces the bandwidth required.
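The scale of that bandwidth saving can be gauged with some rough arithmetic, sketched here in Python. None of the figures come from ST Engineering: the video bitrate, report size and reporting interval are illustrative assumptions chosen only to show how the comparison works.

```python
def report_bitrate(report_bytes, interval_s):
    """Average bits per second needed to send one report every interval_s seconds."""
    return report_bytes * 8 / interval_s


# All three values are assumptions for illustration, not measured figures.
VIDEO_BPS = 2_000_000      # assumed compressed video stream, 2 Mbit/s
REPORT_BYTES = 2_000       # assumed size of one text report
REPORT_INTERVAL_S = 10     # assumed time between reports, in seconds

report_bps = report_bitrate(REPORT_BYTES, REPORT_INTERVAL_S)
ratio = VIDEO_BPS / report_bps

print(f"Report link load: {report_bps:.0f} bit/s")
print(f"Video needs roughly {ratio:.0f} times more bandwidth")
```

Under these assumptions the report stream needs 1,600 bit/s, about 1,250 times less than the video, which is the kind of gap that matters on a narrowband link such as VHF.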

Many militaries are considering their future approach to warfare and believe that multi-domain operations are key. This creates multiple technical challenges in passing data from a sensor to an effector in real time, from the communications infrastructure to the computing requirements. However, if multi-domain coordination can be achieved in real time, it will shorten planning cycles and engagement times, which is expected to make forces more capable and lethal.

Photos by S. Cranny-Evans