Gene Selection and Cancer Classification Using a Multidimensional Fuzzy Deep Learning Approach for Gene Expression Data

  • Mahmood Khalsan

Student thesis: Doctoral Thesis

Abstract

Deep learning approaches are powerful techniques commonly used to develop cancer prediction models from gene expression and mutation data. This thesis provides a comprehensive review of recent cancer studies that have used gene expression data from several cancer types (i.e., breast, lung, kidney, liver, gallbladder, gastric, and thyroid) for survival prediction, tumour identification, and stratification, together with an overview of biomarker studies associated with these cancer types. The thesis covers multiple aspects of machine-learning-based cancer studies, including cancer classification, cancer prediction, identification of biomarker genes, and microarray and RNA-Seq data. It discusses the technical issues with current cancer classification models and the corresponding measurement tools for determining gene expression activity levels in cancerous and noncancerous tissues. The work not only highlights these issues but also attempts to address those arising in previous studies. One notable issue in using gene expression data for cancer classification is the high dimensionality of the available datasets.
As a result, an initial fusion approach combining three feature selection methods was developed to reduce the number of genes and increase the accuracy of cancer classification by using the concept of intersection. The approach selects an optimal subset of genes to serve as identifiers for classification, thereby reducing the dimensionality of the available gene expression data. Across the cancer datasets employed, results ranged from 95.5% to 98% for accuracy, 94.4% to 100% for precision, 94% to 100% for recall, and 95.7% to 98% for f1-score.
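The intersection idea can be illustrated with a minimal sketch. The abstract does not name the three feature selection methods, so the three scoring functions below (`mean_difference`, `variance_score`, `signal_to_noise`), the toy expression values, and the top-`k` cut-off are all assumptions chosen for illustration, not the methods used in the thesis.

```python
from statistics import mean, pvariance


def split_by_class(values, labels):
    """Split one gene's expression values into class-1 and class-0 lists."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return pos, neg


def mean_difference(expr, labels):
    """Hypothetical filter 1: absolute difference between class means."""
    scores = {}
    for gene, values in expr.items():
        pos, neg = split_by_class(values, labels)
        scores[gene] = abs(mean(pos) - mean(neg))
    return scores


def variance_score(expr, labels):
    """Hypothetical filter 2: overall variance (labels are ignored)."""
    return {gene: pvariance(values) for gene, values in expr.items()}


def signal_to_noise(expr, labels):
    """Hypothetical filter 3: class-mean gap divided by within-class spread."""
    scores = {}
    for gene, values in expr.items():
        pos, neg = split_by_class(values, labels)
        spread = pvariance(pos) ** 0.5 + pvariance(neg) ** 0.5
        scores[gene] = abs(mean(pos) - mean(neg)) / (spread + 1e-9)
    return scores


def top_k(scores, k):
    """Return the set of the k highest-scoring gene names."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def intersect_selection(expr, labels, k):
    """Keep only the genes that appear in the top k of every filter."""
    filters = (mean_difference, variance_score, signal_to_noise)
    return set.intersection(*(top_k(f(expr, labels), k) for f in filters))


# Toy data: 4 genes measured on 6 samples (3 tumour = 1, 3 normal = 0).
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "g1": [9.0, 9.5, 10.0, 1.0, 1.5, 2.0],  # large, clean class difference
    "g2": [5.0, 5.1, 4.9, 5.0, 5.1, 4.9],   # flat across all samples
    "g3": [8.0, 1.0, 9.0, 2.0, 8.5, 1.5],   # high variance, weak class signal
    "g4": [6.0, 6.2, 6.1, 3.0, 3.1, 2.9],   # moderate gap, low overall variance
}
selected = intersect_selection(expr, labels, k=2)
print(selected)
```

Because a gene must rank highly under every filter, the intersection is never larger than the smallest individual selection, which is how this kind of fusion shrinks the gene set.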
A new fuzzy gene selection (FGS) approach was then developed to identify significant genes, facilitating cancer classification and reducing the dimensionality of the available gene expression data. The method classified cancer accurately on most of the datasets employed. The results indicate that FGS enhanced the performance of five common classifiers on the majority of the cancer expression datasets employed, particularly when FGS was combined with a multilayer perceptron (MLP). The average results were 97%, 97.3%, 96.5%, and 96.77% for accuracy, precision, recall, and f1-score, respectively. Next, Fuzzy gene selection-wrapper plus is presented as the second significant contribution of this thesis. It attempts to reduce the number of genes selected by FGS while maintaining the accuracy attained. The results showed that it reduced the number of genes selected by FGS by up to 82% without sacrificing accuracy or the other evaluation metrics.
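The general wrapper idea of shrinking a gene set while holding accuracy can be sketched as a greedy backward elimination. This is not the thesis's Fuzzy gene selection-wrapper plus algorithm, whose details the abstract does not give. The nearest-centroid classifier, the leave-one-out scoring, the function names, and the toy data are all assumptions made for illustration.

```python
def nearest_centroid_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`."""
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        centroids = {}
        for cls in set(labels):
            # Class centroid computed without the held-out sample i.
            rows = [s for j, (s, l) in enumerate(zip(samples, labels))
                    if l == cls and j != i]
            centroids[cls] = {g: sum(r[g] for r in rows) / len(rows)
                              for g in genes}
        predicted = min(
            centroids,
            key=lambda c: sum((x[g] - centroids[c][g]) ** 2 for g in genes),
        )
        correct += predicted == y
    return correct / len(samples)


def backward_eliminate(samples, labels, genes, score=nearest_centroid_accuracy):
    """Drop genes one at a time while accuracy stays at or above the baseline."""
    kept = list(genes)
    baseline = score(samples, labels, kept)
    changed = True
    while changed and len(kept) > 1:
        changed = False
        for gene in list(kept):
            trial = [g for g in kept if g != gene]
            if score(samples, labels, trial) >= baseline:
                kept = trial      # the gene was not needed, so remove it
                changed = True
                break             # rescan the reduced gene set
    return kept, baseline


# Toy data: 6 samples, 3 candidate genes (hypothetical values).
labels = [1, 1, 1, 0, 0, 0]
samples = [
    {"g1": 9.0, "g2": 5.0, "g3": 8.0},
    {"g1": 9.5, "g2": 5.1, "g3": 1.0},
    {"g1": 10.0, "g2": 4.9, "g3": 9.0},
    {"g1": 1.0, "g2": 5.0, "g3": 2.0},
    {"g1": 1.5, "g2": 5.1, "g3": 8.5},
    {"g1": 2.0, "g2": 4.9, "g3": 1.5},
]
kept, accuracy = backward_eliminate(samples, labels, ["g1", "g2", "g3"])
print(kept, accuracy)
```

In practice the accuracy used to decide removals would normally be measured on data held out from the final evaluation, since scoring and selecting on the same samples tends to give optimistic results.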
Finally, a novel fuzzy classifier was developed to enhance the accuracy and improve the generalisation of the proposed algorithm on most of the datasets employed. The fuzzy classifier consistently achieved the highest results on all datasets employed compared with classical classifiers. The average results across all datasets were 98%, 98.2%, 96.5%, and 97% for accuracy, precision, recall, and f1-score, respectively. The thesis integrates the developed approaches (Fuzzy gene selection-wrapper plus and the fuzzy classifier) into a single, fully automated end-to-end model called multidimensional fuzzy deep learning.
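For readers less familiar with the four evaluation metrics reported above, the sketch below computes them from binary confusion-matrix counts using their standard definitions. The counts are hypothetical and are not taken from the thesis, and the function name `classification_metrics` is an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and f1-score from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for 100 samples: 50 tumour and 50 normal.
metrics = classification_metrics(tp=48, fp=1, fn=2, tn=49)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The f1-score is the harmonic mean of precision and recall, so it always lies between the two and sits closer to the lower one.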
Date of Award: 4 Apr 2024
Original language: English
Awarding Institution
  • University of Northampton
Supervisor: Michael Opoku Agyeman (Director of Studies), Mu Mu (Supervisor), Suraj Ajit (Supervisor), EMAN SALIH AL-SHAMERY (Supervisor) & Lee Machado (Advisor)

Keywords

  • Gene Selection
  • Deep learning
  • Cancer Classification
