Are AI systems stronger together?
The press’s opinions didn’t make sense. I stared at my screen in confusion, scrolling through a page of run-on sentences and not-so-subtle advertisements pulled from the most popular news articles of 2020: phrases like “Some have already.”, “Watch the latest episode here…”, and “Hollywood heartthrob”.
The machine learning model I fine-tuned extracted dozens of these “opinions” and presented them to me. Contextually, these phrases made sense in their own right — they were part of much larger news articles tackling narratives around the COVID-19 pandemic, education, and the entertainment industry. While my machine learning (ML) model seemed to catch on to some strong sentiments, it wasn’t useful for the bigger picture.
Why? My team was building a pipeline of ML models for news categorization. I was responsible for using a BERT model to find sentences in individual articles that expressed strong opinions. These opinions would then be passed to a second model, the Universal Sentence Encoder (USE), which would calculate the similarity between the opinions and cluster the associated news articles.
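To make the flow concrete, here is a rough sketch of what such a two-stage pipeline could look like. The specific models, threshold, and clustering method are stand-ins for illustration (a public Hugging Face sentiment classifier in place of our fine-tuned BERT, and TF-Hub’s Universal Sentence Encoder), not our exact project setup.

```python
import tensorflow_hub as hub
from transformers import pipeline
from sklearn.cluster import AgglomerativeClustering

# Stage 1: a BERT-style classifier scores sentences; a public sentiment model
# stands in here for the fine-tuned opinion-mining BERT.
opinion_scorer = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Stage 2: the Universal Sentence Encoder embeds opinions so articles can be clustered.
use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def extract_opinions(sentences, min_score=0.9):
    """Keep sentences the classifier scores as strongly opinionated (illustrative cutoff)."""
    results = opinion_scorer(sentences)
    return [s for s, r in zip(sentences, results) if r["score"] >= min_score]

def cluster_articles(opinions_per_article, n_clusters=3):
    """Embed each article's opinions with USE and group similar articles together."""
    vectors = [use(opinions).numpy().mean(axis=0) for opinions in opinions_per_article]
    return AgglomerativeClustering(n_clusters=n_clusters).fit_predict(vectors)
```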
But BERT’s outputs weren’t relevant enough for the USE model to use.
Scratching my head, I realized at that moment that we weren’t just looking for strong opinions; we were looking for strong opinions that made sense.
And then came the Aha! moment. What if we used a smaller USE model to post-process the BERT model’s outputs right before sending them to the main USE model? This smaller model would measure how similar each extracted opinion was to the article it came from and discard any opinion that wasn’t semantically related, all while keeping the original pipeline intact.
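In code, that post-processing step is essentially a similarity filter. Here is a minimal sketch, assuming the standard TF-Hub Universal Sentence Encoder as a stand-in for the smaller USE model; the 0.3 cutoff is an arbitrary illustration rather than a tuned value.

```python
import numpy as np
import tensorflow_hub as hub

use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def filter_opinions(article_text, candidate_opinions, threshold=0.3):
    """Drop BERT-extracted opinions that aren't semantically similar to their article."""
    embeddings = use([article_text] + candidate_opinions).numpy()
    article_vec, opinion_vecs = embeddings[0], embeddings[1:]
    # Cosine similarity between each candidate opinion and the full article.
    sims = opinion_vecs @ article_vec / (
        np.linalg.norm(opinion_vecs, axis=1) * np.linalg.norm(article_vec)
    )
    return [op for op, sim in zip(candidate_opinions, sims) if sim >= threshold]
```

Only the opinions that clear the similarity bar are handed to the main USE model downstream.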

With this tiny trick of using one machine learning model to augment another, I started to see more meaning in the opinions extracted from my model: quotations from interviewees, expressions of disbelief and joy, and personal anecdotes. Standpoints such as “I’ve seen countless posts on social media… makes for a particularly wistful form of escapism”, “…there are concerns about their beliefs on…”, and “’There is just no way you can be…’, said…”.
Composite systems helped my team mine more meaning from our models for our master’s project.
These types of systems, sometimes referred to as ML feedback loops, are popular across academia and industry — in fact, Gartner recently identified composite AI as an emerging technology trend. But what do these workflows look like for the data scientist behind the wheel?
Composite systems are by no means new. Many adjacent ML sub-spaces come to mind here, including ensemble models (e.g., random forests 🌳🌳🌳) and even multi-task learning! It’s going to be exciting to see how these methodologies intersect with composite AI.
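A random forest is already a tiny composite system in spirit: many decision trees whose votes are aggregated into one prediction. A quick scikit-learn illustration on toy data (unrelated to the news project):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data: 500 synthetic samples with two classes.
X, y = make_classification(n_samples=500, random_state=0)

# 100 decision trees, each trained on a bootstrap sample; their votes are aggregated.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.score(X, y))
```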
Here are a few case studies of how composite systems are being used in production today.
Case study 1: Healthcare
Composite AI systems are not limited to combining multiple ML models. This class of AI technologies can also include multiple data processing techniques and additional operations to support objectives like supply chain management and personalization.
SAS shares an interesting use case with Amsterdam University Medical Center: Using a combination of computer vision, data visualization, and machine learning processes to determine if chemotherapy is effective for patients!
The medical researchers note that being able to start with statistical analysis and visualizations, and then drill down into the details of patient scans, is a key benefit of using composite AI. The diagram below illustrates a processing pipeline deployed for colorectal liver metastases morphometry with CT scans.
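Alongside that diagram, here is a toy sketch of how a computer vision, statistics, and visualization hand-off can fit together on a synthetic volume. It is purely illustrative and not the Amsterdam UMC / SAS pipeline; a simple intensity threshold stands in for a trained segmentation model.

```python
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
ct_volume = rng.normal(size=(64, 64, 64))  # synthetic stand-in for a CT scan

# Step 1 (computer vision): a crude threshold stands in for a learned segmentation model.
lesion_mask = ct_volume > 2.5

# Step 2 (statistics): label connected components and measure their volumes.
labels, n_lesions = ndimage.label(lesion_mask)
volumes = ndimage.sum(lesion_mask, labels, index=range(1, n_lesions + 1))

# Step 3 (visualization): summarize the morphometry so clinicians can drill down.
plt.hist(volumes, bins=20)
plt.xlabel("Lesion volume (voxels)")
plt.ylabel("Count")
plt.title("Synthetic lesion morphometry summary")
plt.show()
```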

Case study 2: Voice Assistants
To consume commands, process them, and generate responses, voice assistants need to perform several tasks effectively.
The Webex team illustrated a sample natural language processing pipeline for Webex Assistant (below), composed of ML models for speech-to-text, entity recognition, question answering, and more.
Each of these models can require its own data cleaning steps and robustness evaluations. As the authors demonstrate, if the first speech-to-text component transcribes the voice command incorrectly, the resulting bad data can degrade the entire pipeline!
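As a rough illustration of that coupling, here is a minimal sketch of a chained voice pipeline built from off-the-shelf Hugging Face pipelines. It is not the Webex Assistant implementation; the default public models are placeholders, and any transcription error in step 1 flows straight into steps 2 and 3.

```python
from transformers import pipeline

# Default public models act as placeholders for production components.
asr = pipeline("automatic-speech-recognition")        # speech-to-text
ner = pipeline("ner", aggregation_strategy="simple")  # entity recognition
qa = pipeline("question-answering")                   # question answering

def answer_voice_command(audio_path, context):
    # Step 1: transcribe the audio; any mistake here propagates downstream.
    transcript = asr(audio_path)["text"]
    # Step 2: extract entities from the (possibly noisy) transcript.
    entities = ner(transcript)
    # Step 3: answer the transcribed question against a knowledge context.
    answer = qa(question=transcript, context=context)
    return {"transcript": transcript, "entities": entities, "answer": answer}
```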

Case study 3: Manufacturing
The OECD Framework for Classifying AI Systems cites an exciting example of composite AI systems in the manufacturing domain, taken from the Qlector LEAP solution.
As part of this use case, multiple AI models consume data related to and originating from a manufacturing plant. These models are applied to diverse tasks, such as human profiling, market forecasting, anomaly detection, and more. The outputs are then aggregated into a knowledge graph that supports different types of decision-making.
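Here is a toy sketch of that aggregation pattern, with hypothetical model outputs fed into a small networkx graph; it is not the Qlector LEAP implementation.

```python
import networkx as nx

# Hypothetical outputs from independent models.
demand_forecast = {"widget-A": 1200}                 # market forecasting model
anomalies = [{"machine": "press-3", "score": 0.92}]  # anomaly detection model

kg = nx.DiGraph()
kg.add_node("widget-A", type="product", forecast_units=demand_forecast["widget-A"])
kg.add_node("press-3", type="machine", anomaly_score=anomalies[0]["score"])
kg.add_edge("press-3", "widget-A", relation="produces")

# A downstream planner can query the graph, e.g. flag products whose
# producing machines look anomalous.
at_risk = [
    product for machine, product, data in kg.edges(data=True)
    if data["relation"] == "produces" and kg.nodes[machine].get("anomaly_score", 0) > 0.8
]
print(at_risk)  # ['widget-A']
```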
