Swarm Learning Used to Develop Robust AI Algorithms for Cancer Diagnosis While Protecting Patient Privacy - BioMed Advances

AI systems have been developed for assisting with the diagnosis of medical conditions, but training those algorithms requires large cohorts of patient data. That typically requires hospitals to release medical images, which often contain information that allows patients to be identified. The AI-based systems are often owned and controlled by private companies, so providing medical images with identifiable information carries a privacy risk. There are also restrictions on international transfers of personal data, so training AI systems on large data sets from multiple countries can be problematic.

Scientists at the University of Leeds in the United Kingdom have been investigating a form of artificial intelligence known as swarm learning, which gets around these privacy issues and allows large data sets to be used for training robust AI algorithms. With swarm learning, each partner in the program trains AI models on its own data. For instance, hospitals in the United States could train the algorithms alongside hospitals in the United Kingdom, Europe, and beyond. All partners work separately using their own data, and the results are then combined to create more robust algorithms. No personal data is transferred, which protects patient privacy and avoids monopolistic data governance.

With this approach, all partners send their trained algorithms to a central computer, where they are combined to create an optimized algorithm. No local data or patient information is sent. The optimized algorithm is then returned to each partner, where it can be reapplied to the original data to scan images with improved accuracy.
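The combining step described above can be illustrated with a minimal sketch. It assumes that each partner's trained model is reduced to named lists of numeric parameters and that the models are merged by an average weighted by each partner's number of training samples. This article does not describe the merging procedure actually used in the study, so the weighting scheme, the function name `merge_models`, and the hospital values below are all hypothetical.

```python
from typing import Dict, List

# A model is represented here as named lists of parameters (an assumption
# made for illustration; real models hold far larger tensors).
Weights = Dict[str, List[float]]


def merge_models(partner_weights: List[Weights], sample_counts: List[int]) -> Weights:
    """Combine per-partner parameters with a sample-count-weighted average.

    Only model parameters are passed in; no images or patient data are needed.
    """
    if not partner_weights or len(partner_weights) != len(sample_counts):
        raise ValueError("need one sample count per partner model")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("sample counts must sum to a positive number")

    merged: Weights = {}
    for name, reference in partner_weights[0].items():
        merged[name] = [
            sum(w[name][i] * n for w, n in zip(partner_weights, sample_counts)) / total
            for i in range(len(reference))
        ]
    return merged


# Hypothetical parameters from two partners after local training.
hospital_a = {"layer1": [0.25, 0.5], "bias": [1.0]}
hospital_b = {"layer1": [0.75, 1.0], "bias": [3.0]}

# Hospital B trained on three times as many samples, so it counts three times as much.
merged = merge_models([hospital_a, hospital_b], [100, 300])
print(merged)
```

In this sketch the merged model is the only thing that would be sent back to the partners, which mirrors the point that the underlying images never leave the hospital that holds them.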

The researchers applied this swarm learning (SL) technique to large, multicentric datasets of gigapixel histopathology images from over 5,000 patients in cohorts in Northern Ireland, Germany, and the United States, then validated the predictive performance of the algorithm on two independent datasets in the United Kingdom. They found that AI models trained with the SL-based approach on these large datasets outperformed most locally trained models and performed on a par with algorithms trained using merged datasets.

“We show that AI models trained using SL can predict BRAF mutational status and microsatellite instability directly from hematoxylin and eosin (H&E)-stained pathology slides of colorectal cancer,” explained the researchers. The approach could also be applied to train distributed AI models for any histopathology image analysis task, without having to transfer any personally identifiable data.

You can read more about the study in the paper – Swarm learning for decentralized artificial intelligence in cancer histopathology – which was recently published in Nature Medicine. DOI: 10.1038/s41591-022-01768-5