Key Success Factors for Successful Implementation of AI Based Segmentation Algorithms in Clinical Radiology Practice

Introduction
With the growing availability of patient clinical and imaging data, the development of artificial intelligence tools based on machine learning and deep learning, capable of performing tasks such as image classification or regression, organ segmentation or feature extraction, has soared over the past few years [1]. These developments create many opportunities for radiologists and are likely to impact their routine practice in the long run by providing tools that improve the accuracy and efficiency of diagnosis and prognosis. They will arguably allow radiologists to spend more time on complex problem solving by relieving them of tedious tasks, and help them extract more useful information from medical images [2]. Despite great research interest, many challenges stand in the way of an efficient, safe and ethical implementation of these tools in radiologists' daily practice [3]. Moreover, not all tasks and modalities of medical AI have reached the same level of maturity, nor are they developing at the same pace.

Overview of Segmentation Tools
Image segmentation is a pixel-wise classification of an image to locate specific objects (organs, substructures or lesions), which allows, among other things, quantitative analysis of volume and shape. Segmentation is a very time-consuming procedure: an expert can spend up to four hours segmenting a single case in a complex location such as the head and neck [4]. It also depends on the operator's level of expertise and suffers from high inter- and intra-operator variability [5]. In some contexts such as radiotherapy, tumor segmentation can be of critical importance to limit the irradiation of normal tissues. Segmentation can also be a first step for other image analyses such as local texture feature extraction. For all these reasons, radiologists and patients would benefit greatly from a reliable, automated segmentation tool.
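To make the link between a pixel-wise classification and quantitative analysis concrete, the following minimal Python sketch derives an organ volume from a label map. The label value, the toy array and the voxel spacing are assumptions chosen for illustration; in practice the spacing would be read from the image header.

```python
import numpy as np

def mask_volume_ml(mask, spacing_mm):
    """Volume of a binary mask in millilitres, given the voxel spacing in mm."""
    voxel_mm3 = float(np.prod(spacing_mm))           # volume of one voxel in mm^3
    return float(np.count_nonzero(mask)) * voxel_mm3 / 1000.0  # 1 ml = 1000 mm^3

# Toy label map: 0 = background, 1 = a hypothetical organ label.
labels = np.zeros((10, 20, 20), dtype=np.uint8)
labels[2:7, 5:15, 5:15] = 1                          # 5 * 10 * 10 = 500 voxels

# Assumed spacing: 3 mm slice thickness, 1 mm x 1 mm in-plane resolution.
organ_volume_ml = mask_volume_ml(labels == 1, spacing_mm=(3.0, 1.0, 1.0))
print(f"Organ volume: {organ_volume_ml:.2f} ml")
```

Because every voxel carries a class label, the volume reduces to a voxel count multiplied by the physical voxel size; shape descriptors can be derived from the same mask.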

Comparison of Segmentation Studies for Head and Neck Structures
We will now compare three studies on the automated segmentation of head and neck structures. A second study illustrated another attempt at automated segmentation of 16 head and neck organs and resulted in a very high Dice score of 0.98 for the brain (down to 0.65 for the cochlea) [8]. Compared to the previous study, the advantage of this work is that it uses a much larger training dataset of around 3495 patients and tests performance on an external multicentric dataset. However, the ground truth segmentation was performed by various anonymous operators with no review by an expert, and the training exams all came from the same center, which reduces the robustness of the algorithm and increases bias.
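For reference, the Dice score is twice the overlap between the predicted and reference masks divided by the sum of their sizes. The sketch below is a generic implementation on toy cubes, not the evaluation code of the cited study. It shows one plausible contributor to the gap between large and small structures: the same one-voxel misalignment costs a small object far more than a large one. The cube sizes are arbitrary assumptions.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice coefficient between two binary masks (1.0 if both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, truth).sum() / denom

def cube(size, start, shape=(32, 32, 32)):
    """Binary volume containing one cube of side `size` at offset `start`."""
    m = np.zeros(shape, dtype=bool)
    m[start:start + size, start:start + size, start:start + size] = True
    return m

large = cube(size=20, start=5)   # stands in for a large organ
small = cube(size=2, start=10)   # stands in for a small structure

# Shift each "prediction" by one voxel along the first axis.
dice_large = dice_score(np.roll(large, 1, axis=0), large)
dice_small = dice_score(np.roll(small, 1, axis=0), small)
print(dice_large, dice_small)
```

With these toy shapes the large cube keeps a Dice score of 0.95 while the small cube drops to 0.5, so per-organ scores are best read alongside the size of the structure.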
The final study considered here brings interesting perspectives on the effort made to reduce bias in the ground truth segmentation [5]. The reference segmentation was decided by consensus between two trained expert radiologists (a third expert opinion was requested in case of disagreement). Another point is made regarding the analysis of the performance of the algorithm.
The predicted segmentations were assessed by a panel of 8 radiologists, who judged almost 90% of them to be satisfactory.
The authors also quantitatively analyzed the gain in performance of radiologists assisted by the algorithm and concluded that it reduces intra-operator variability by 36%, inter-operator variability by 55% and the time dedicated to this task by 40%.
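The text above does not state how variability was measured. As one possible illustration only, inter-operator variability can be summarized as the mean pairwise disagreement (1 minus the Dice score) between the masks drawn by different operators. The masks below are synthetic shifted squares, and the function names are hypothetical; the numbers are not taken from the study.

```python
from itertools import combinations
import numpy as np

def dice_score(a, b):
    """Dice coefficient between two non-empty binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def mean_pairwise_disagreement(masks):
    """Average of (1 - Dice) over every pair of operator masks."""
    pairs = list(combinations(masks, 2))
    return sum(1.0 - dice_score(a, b) for a, b in pairs) / len(pairs)

def shifted_square(offset):
    """Synthetic 20 x 20 contour, displaced by `offset` rows."""
    m = np.zeros((40, 40), dtype=bool)
    m[10 + offset:30 + offset, 10:30] = True
    return m

# Toy assumption: assisted operators deviate less from one another.
unassisted = [shifted_square(o) for o in (0, 2, 4)]
assisted = [shifted_square(o) for o in (0, 1, 2)]

before = mean_pairwise_disagreement(unassisted)
after = mean_pairwise_disagreement(assisted)
relative_reduction = 1.0 - after / before
print(f"Relative reduction in variability: {relative_reduction:.0%}")
```

The same function applied to repeated contours from a single operator would give an intra-operator figure under the same assumption.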

Research to Clinical Practice
A few factors can be highlighted to ensure the reproducibility of results as well as an effective and safe transfer of automated segmentation algorithms to clinical practice. First, it must be noted that current deep learning methods (such as the different versions of the U-Net) have the potential to reach very high Dice scores, but will at some point only reproduce the bias of the training database and ground truth segmentation if those are not built with enough care. Therefore, it is essential to promote the development of large, high-quality databases with a great variety of cases (multicentric, different operators and imaging machines) to ensure generalization of the results. A special effort should be made in defining the ground truth segmentation to avoid human bias. This can be done by using a consensus between two experts or by implementing a systematic and independent review by an expert. Finally, it is important to also conduct a review of performance by human experts against predefined criteria, rather than relying on only one or two metrics (Dice score, Hausdorff distance), which tend to be less meaningful as the algorithm gets better.
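The two metrics named above capture different kinds of error, which is one reason a single number can mislead. The sketch below uses a brute-force NumPy implementation of the symmetric Hausdorff distance, suitable only for small toy masks, together with the Dice score. The masks and the stray pixel are assumptions made up for this illustration.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice coefficient between two non-empty binary masks."""
    pred, truth = np.asarray(pred, dtype=bool), np.asarray(truth, dtype=bool)
    return 2.0 * np.logical_and(pred, truth).sum() / (pred.sum() + truth.sum())

def hausdorff_distance(mask_a, mask_b, spacing=(1.0, 1.0)):
    """Symmetric Hausdorff distance between the foreground pixels of two masks."""
    a = np.argwhere(mask_a) * np.asarray(spacing)   # physical coordinates of A
    b = np.argwhere(mask_b) * np.asarray(spacing)   # physical coordinates of B
    if len(a) == 0 or len(b) == 0:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    # Pairwise distances; memory grows as |A| * |B|, so keep inputs small.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

truth = np.zeros((64, 64), dtype=bool)
truth[10:30, 10:30] = True          # reference contour: 400 pixels

pred = truth.copy()
pred[60, 60] = True                 # one stray false-positive pixel far away

dice = dice_score(pred, truth)
hd = hausdorff_distance(pred, truth)
print(f"Dice = {dice:.4f}, Hausdorff = {hd:.2f} pixels")
```

Here the Dice score stays above 0.99 while the Hausdorff distance is about 44 pixels, because a single distant outlier dominates it. Whether such an error matters clinically is the kind of question an expert review against predefined criteria can answer.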