Machine Learning for Breast Cancer Diagnosis in Developing Countries
Summary
In this thesis, we investigated the feasibility of automated breast thermography, based on computer-aided diagnosis, for breast cancer detection in developing countries. We wanted to explore alternatives and think creatively about solving the problems of our setting (developing countries). Existing breast cancer detection technology poses several challenges in this setting. These include limited human expertise, high equipment costs, and social and cultural barriers such as privacy concerns and stigma. The technology developed during this thesis attempted to address some of these issues by combining the benefits of breast thermography with advances in machine learning. The chapters of this thesis are briefly summarized below.
Chapter 2 is a review paper focused on two aims. The first was to understand the challenges associated with conventional breast imaging modalities. The second was to appreciate the potential benefits of infrared thermography as an imaging modality for breast cancer detection. We first explored the biological relationship between abnormal heat patterns and the presence of malignancy. We then provided a detailed review of the criteria used for manual interpretation of breast thermal images. We also reviewed the challenges of manual interpretation, such as the high level of expertise required, cognitive overload and subjectivity. This was followed by a survey of the large-scale studies conducted since 1960 that have shown the efficacy of breast thermography for malignancy detection. Finally, we studied recent advances in the hardware and software technologies of infrared breast thermography. These advances aim to achieve better accuracy by combining automated interpretation with highly sensitive infrared cameras.
To automate the crucial preprocessing steps in the imaging protocol, we proposed novel segmentation and view-tagging approaches, which are prerequisites for automated diagnosis. Little data was available in the initial stages of experimentation. For that reason, Chapter 3 proposed a heuristic image processing approach for view tagging and segmentation of the thermal images. However, this technique was highly dependent on the imaging protocol. Any deviation from the protocol resulted in erroneous predictions, making it unsuitable for clinical practice. Hence, in Chapter 4 we proposed a deep learning architecture that accurately predicts the breast region and the view angle by learning high-level semantics from a large training dataset. Interestingly, the Dice index of 0.92 (92%) obtained with the deep learning approach was very close to the inter-observer Dice agreement between two human experts. The obtained Dice index was also approximately 25% higher than that of the earlier heuristic segmentation. This demonstrates the strength of deep learning architectures when large training datasets are available.
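The Dice index quoted above measures the overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|). A value of 1 means perfect overlap and 0 means none. The same formula can compare two experts' masks, which yields the inter-observer agreement. The minimal sketch below uses plain nested lists of 0/1 values as masks. The function name and the convention that two empty masks score 1.0 are illustrative assumptions, not taken from the thesis.

```python
def dice_index(mask_a, mask_b):
    """Dice index 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape.

    Masks are nested lists (rows) of 0/1 or bool values. Hypothetical helper
    for illustration; not the thesis's own code.
    """
    if len(mask_a) != len(mask_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(mask_a, mask_b)
    ):
        raise ValueError("masks must have the same shape")
    intersection = size_a = size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            a, b = bool(a), bool(b)
            intersection += a and b  # pixel labelled in both masks
            size_a += a
            size_b += b
    if size_a + size_b == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / (size_a + size_b)


# Usage: a predicted breast mask against a (hypothetical) expert annotation.
predicted = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
expert = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
print(dice_index(predicted, expert))  # 2*3 / (4 + 3) = 6/7
```

Note that a statement such as "25% higher" can mean either a relative or an absolute difference in Dice. The summary does not say which.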
Chapter 5 described the automatic extraction of blood vessels from the detected breast regions in order to study their vascular properties. Vascular changes are among the earliest changes that occur in the body at the onset of cancer. Examining this vascularity can therefore help detect cancerous regions early. Automated extraction of these vessels was challenging because the heat signature at the vessel boundaries diffuses as it is transmitted to the breast surface. Hence, we proposed a three-level vessel enhancement technique, followed by shape and temperature filters, to detect potential vessel pixels. We also defined a new diffusivity metric to remove highly diffuse vessels, which allowed the prominent vessel structures to be identified. The results of the proposed approach were better than those of vessel extraction techniques developed for other imaging modalities. They were also less dependent on the threshold parameters.
Chapter 6 explored different convolutional neural network (deep learning) architectures for extracting thermally active regions (hotspots) from the segmented breast regions. Hotspot extraction helps to localize the abnormal region and to understand its nature and properties. We observed that encoder-decoder architectures outperformed point-wise architectures in inference time, Dice index and accuracy for hotspot extraction in thermal images. Although the encoder-decoder architectures achieved high accuracies, their results were still inferior to those of an adaptive histogram thresholding approach. This may be due to the small size of the labeled dataset used for training. Therefore, in Chapter 7, we used adaptive histogram thresholding for hotspot detection. Drawing on domain knowledge, we then extracted medically interpretable features to characterize the detected hotspot regions. These features were used to train a random forest (RF) classifier that predicts the probability of malignancy from the hotspot properties.
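The summary does not specify which adaptive histogram thresholding method was used. As a stand-in, the sketch below uses Otsu's method, a standard histogram-based threshold. The threshold is computed only over temperatures inside the segmented breast region, so it adapts to each subject's own temperature distribution. The function names, the 64-bin histogram and the "warmer than the threshold" rule are assumptions for illustration. The random forest step is not shown.

```python
def otsu_threshold(values, bins=64):
    """Otsu's threshold: the histogram split maximizing between-class variance."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # flat region: no meaningful split exists
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(h * c for h, c in zip(hist, centres))
    best_t, best_between = lo, -1.0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]              # pixel count of the cooler class
        sum0 += hist[i] * centres[i]
        w1 = total - w0            # pixel count of the warmer class
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_between:
            best_between, best_t = between, lo + (i + 1) * width  # upper edge of bin i
    return best_t


def extract_hotspot(thermal, breast_mask, bins=64):
    """Hypothetical helper: mark breast pixels warmer than the Otsu threshold."""
    inside = [t for row_t, row_m in zip(thermal, breast_mask)
              for t, m in zip(row_t, row_m) if m]
    if not inside:
        return [[0] * len(row) for row in thermal]
    thr = otsu_threshold(inside, bins)
    return [[1 if m and t > thr else 0 for t, m in zip(row_t, row_m)]
            for row_t, row_m in zip(thermal, breast_mask)]


# Usage: a toy 4x4 temperature map (degrees C); the last row lies outside the breast.
thermal = [[30.0, 30.5, 31.0, 30.2],
           [30.4, 34.5, 35.0, 30.1],
           [30.3, 34.8, 34.9, 30.6],
           [25.0, 25.0, 25.0, 25.0]]
mask = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
print(extract_hotspot(thermal, mask))
```

Restricting the histogram to the breast mask matters in this setup. Cooler background pixels would otherwise pull the threshold down and enlarge the detected hotspot.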
Chapter 8 extended the scope of thermal imaging into prognosis by proposing an automated technique for estimating the hormonal status of the detected malignant regions from breast thermal images. Hormonal status is an important prognostic biomarker for treatment planning and survival prediction. This preliminary study achieved an accuracy of 80% in estimating hormonal status by combining machine learning with thermography. If these results are replicated in a large-scale population, the approach could eliminate the need for an additional invasive procedure to obtain these parameters. This would reduce costs and enable prompt initiation of treatment.
Furthermore, in Chapter 9 we investigated the robustness of different machine learning classifiers for predicting breast cancer risk from common risk factors. Such risk estimation could help identify high-risk women and provide them with personalized care and diagnosis, either to reduce their risk of breast cancer or to detect it early. Incomplete or inaccurate risk factor information is a major problem and can lead to inaccurate risk prediction. It arises from fear of social stigmatization or from patients' lack of knowledge about the risk factors. Hence, we evaluated three prominent machine learning classifiers for their robustness to incomplete and inaccurate information. The proposed custom neural network architecture was found to be superior to logistic regression and random forests, especially when less than 50% of the data was incorrect or missing.
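One common way to test robustness to incomplete or inaccurate inputs is to start from a clean test set and corrupt a chosen fraction of its risk-factor entries. Each classifier's performance is then tracked as that fraction grows. The summary does not describe the exact protocol used in Chapter 9, so the sketch below is an assumption. It shows only the corruption step, and the function and risk-factor names are hypothetical.

```python
import random


def corrupt_records(records, fraction, mode, choices, seed=0):
    """Return a copy of `records` with `fraction` of all entries corrupted.

    records: list of dicts mapping a (hypothetical) risk-factor name to its value
    mode:    "missing"   -> the value is replaced by None
             "incorrect" -> the value is replaced by a different allowed value
    choices: dict mapping each risk-factor name to its allowed values
             (needs at least two values for "incorrect" mode)
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if mode not in ("missing", "incorrect"):
        raise ValueError("mode must be 'missing' or 'incorrect'")
    rng = random.Random(seed)  # fixed seed so corruption is reproducible
    cells = [(i, key) for i, rec in enumerate(records) for key in rec]
    out = [dict(rec) for rec in records]  # leave the clean records untouched
    for i, key in rng.sample(cells, round(fraction * len(cells))):
        if mode == "missing":
            out[i][key] = None
        else:
            alternatives = [v for v in choices[key] if v != records[i][key]]
            out[i][key] = rng.choice(alternatives)
    return out


# Usage: hide 30% of the entries in a toy set of ten identical records.
records = [{"age_group": "40-49", "family_history": "no"} for _ in range(10)]
choices = {"age_group": ["<40", "40-49", "50+"], "family_history": ["yes", "no"]}
noisy = corrupt_records(records, 0.3, "missing", choices)
print(sum(v is None for rec in noisy for v in rec.values()))  # 6 of 20 entries
```

In an evaluation loop, each classifier would be scored on corrupted copies at increasing fractions (for example 0.1 to 0.9). Comparing the resulting curves shows which model degrades most gracefully.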
The neural network architecture outperformed the other classifiers even with incomplete or inaccurate information. However, its risk estimates were still not accurate, because the risk factors correlate only weakly with the actual screening outcome. Therefore, in Chapter 10, we proposed a novel combination of machine learning, breast thermography and non-imaging features such as lumps and nipple discharge. This combination supports breast cancer pre-screening through a personalized risk assessment. Breast thermography made the risk prediction personalized because it uses the heat emitted by the individual subject for risk estimation. We further used the predicted risk score to assign individuals to four risk cohorts, in which the likelihood of malignancy increased monotonically with the risk group level. These initial results are encouraging. They could help create a screening regime tailored to each individual's risk, so that malignancies are detected at an early stage.
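The cohort step described above can be sketched as follows. A risk score is mapped to one of four cohorts using ascending cut-offs. The observed malignancy rate is then computed per cohort, and the code checks that the rate rises strictly from one cohort to the next. The summary does not give the actual cut-offs, so the values 0.25, 0.5 and 0.75 are hypothetical. The score is assumed to lie in [0, 1], and the function names are illustrative.

```python
from bisect import bisect_right


def assign_cohort(score, cutoffs):
    """Map a risk score to cohort 1..len(cutoffs)+1.

    A score equal to a cut-off falls into the higher cohort.
    """
    return bisect_right(cutoffs, score) + 1


def malignancy_rate_by_cohort(scores, outcomes, cutoffs):
    """Fraction of malignant cases (outcome 1) in each cohort; None if empty."""
    n_cohorts = len(cutoffs) + 1
    counts = [0] * n_cohorts
    positives = [0] * n_cohorts
    for score, outcome in zip(scores, outcomes):
        c = assign_cohort(score, cutoffs) - 1
        counts[c] += 1
        positives[c] += outcome
    return [p / n if n else None for p, n in zip(positives, counts)]


def increases_monotonically(rates):
    """True if every non-empty cohort has a strictly higher rate than the previous one."""
    observed = [r for r in rates if r is not None]
    return all(a < b for a, b in zip(observed, observed[1:]))


# Usage with hypothetical cut-offs and a toy set of scored cases.
cutoffs = [0.25, 0.5, 0.75]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
rates = malignancy_rate_by_cohort(scores, outcomes, cutoffs)
print(rates, increases_monotonically(rates))
```

Using fixed cut-offs keeps the cohort definition stable across screening rounds. Quantile-based cut-offs would instead give cohorts of equal size on the development data.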
Chapter 11 summarized the overall research, discussed the potential implications of this thesis in the breast cancer continuum of care and shared our thoughts on further research needed to improve early detection of breast cancer in developing and underdeveloped countries.