Volume 23 - Issue 4

Mini Review. Biomedical Science and Research. Creative Commons, CC-BY.

Predictive Modeling of Metabolomics Data to Identify Potential Biomarkers in Renal Cell Carcinoma

*Corresponding author: Prasad V Bharatam, Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research (NIPER), Sector-67, S. A. S. Nagar (Mohali), India.

Received: August 06, 2024; Published: August 13, 2024

DOI: 10.34297/AJBSR.2024.23.003110

Abstract

Renal Cell Carcinoma (RCC) is a rare human cancer whose prevalence is rapidly rising; early recognition is essential for effective treatment. In this work, a statistical approach to biomarker identification using metabolomics data was attempted. The metabolomics data for RCC were extracted from the Metabolomics Workbench database (study ID ST001706). The study comprised 50 metabolites and 256 subjects: 174 normal individuals and 82 RCC patients. From the 50 metabolites, the top metabolites (as biomarkers) were identified using statistical techniques such as the t-test, Principal Component Analysis (PCA) and Partial Least Squares (PLS) analysis. These analyses identified eight biomarkers: Trigonelline, Hippuric acid, 4-hydroxyhippuric acid, 4-aminohippuric acid, Mannitol, Pyruvic acid, Scyllo-Inositol and Deoxycholic acid. The Gaussian software was used to obtain the 3D structures of the metabolites and to calculate their electronic parameters. Relative quantification of these biomarkers was done using a heatmap. ROC curve analysis was performed to characterize the biomarkers in early RCC. The biological significance of the identified top metabolites was evaluated by identifying the metabolic pathways in which they are involved.

Keywords: Renal cell carcinoma, Biomarkers, Metabolomics

Abbreviations: RCC: Renal Cell Carcinoma; MIOX: Myo-Inositol Oxygenase; SMILES: Simplified Molecular Input Line Entry System; HOMO: Highest Occupied Molecular Orbital; LUMO: Lowest Unoccupied Molecular Orbital; AUROC: Area Under Receiver Operating Characteristics; LC–MS: Liquid Chromatography–Mass Spectrometry; AUC: Area Under Curve; ROC: Receiver Operator Characteristic; PCA: Principal Component Analysis; PLS: Partial Least Square; DFT: Density Functional Theory

Introduction

Metabolomics is a relatively new approach to evaluating the composition of a biofluid (plasma or urine) or tissue, in which small molecule metabolites are studied. This collection of small molecules, when analyzed and interpreted, can provide a distinct signature that can be used for diagnosis as well as to determine gross metabolic differences between a normal and a diseased state [1]. Renal Cell Carcinoma (RCC) is also known as hypernephroma, renal cancer, or kidney cancer. It is the third most common urological malignancy, accounting for 2-3% of all malignancies [2]. RCC is a fast-growing cancer that frequently spreads to other organs such as the lungs [2]. Among urogenital cancers, RCC has the highest mortality rate, and its prevalence has steadily increased. RCC is curable with surgery if detected in time, although a minority of patients is at risk of recurrence [3]. Symptoms of RCC are usually absent in the early stages. As the disease progresses, the patient may develop symptoms such as a lump on the back, hematuria, lower back pain, unexplained weight loss, fatigue, anemia, and hypertension. Risk factors for the disease include a family history of RCC, dialysis treatment, hypertension, obesity and cigarette smoking. RCC is treated with surgery, radiation therapy, chemotherapy, immunotherapy, and targeted therapy [4]. In metabolomic studies, data-driven technology provides numerous insights into metabolic modelling and helps with pharmaceutical research, nutrition, and toxicity [5].

In addition to biomarker discovery, the metabolomics approach can identify new druggable targets: understanding the metabolic disorder and the altered biochemical pathways that accompany disease progression can provide insight into possible new treatments by identifying inhibitors of the altered pathways among new and already existing drugs. As a result, metabolomics lends itself to a two-pronged approach to the clinical problem, addressing disease symptoms and providing novel treatment methods [6]. Bifarin, et al., reported machine learning-enabled RCC status prediction using multiplatform urine-based metabolomics. The study cohort consisted of 105 RCC patients and 179 controls separated into two subcohorts: the model cohort and the test cohort. The model cohort was used to choose discriminating features using univariate, wrapper, and embedded techniques [7]. Falegan, et al., collected preoperative fasting urine and serum samples from patients with clinical renal masses and performed metabolomic and multivariate statistical analysis using 1H NMR and GC-MS (gas chromatography-mass spectrometry). RCC had higher levels of glycolytic and Tricarboxylic Acid (TCA) cycle intermediates compared to benign masses [8]. Zheng, et al., used serum metabolome data from 104 participants, including healthy individuals and early-stage RCC patients, to train and validate the SOM model. For the early detection of RCC, a biomarker cluster of seven metabolites (alanine, creatine, choline, isoleucine, lactate, leucine, and valine) was identified. Using this biomarker cluster, the trained SOM model was able to classify 22 test subjects into the appropriate categories [9].

Bowei Xi, et al., published a chapter on statistical analysis and modelling of mass spectrometry-based metabolomics data, in which the multivariate statistical techniques were used in metabolomics studies, ranging from biomarker selection to model building and validation [10]. Ska, et al., conducted a classification study on diagnostic statistics double-check validation [11]. Kim, et al., found that quinolinate, 4-hydroxybenzoate, and gentisate are differentially expressed with a false discovery rate of 0.26, and these metabolites are implicated in common amino acid and oxidative metabolism pathways, which is consistent with high tumour protein breakdown and utilization and the Warburg effect [12]. Because targeted therapy for RCC has adverse effects, it is critical to identify potential targets for early diagnosis and treatment. Because metabolomics data is complex, there is a need for metabolomics research using statistical modelling to predict meaningful insights. This research used statistical methods and metabolomics data to find potential biomarkers for the early detection of Renal Cell Carcinoma. The following sections provide more information.

Methodology

Data Collection

Bifarin, Gaul, et al. (Machine Learning-Enabled Renal Cell Carcinoma Status Prediction Using Multiplatform Urine-Based Metabolomics, J. Proteome Res. 2021, 20, 7, 3629–3641) collected data from Liquid Chromatography–Mass Spectrometry (LC–MS) and Nuclear Magnetic Resonance (NMR), and discovered potential metabolomic panels for RCC using Machine Learning (ML). The study cohort consisted of 82 RCC patients and 174 controls. The data for this study are available in the Metabolomics Workbench database (https://www.metabolomicsworkbench.org/) with Study IDs ST001705 and ST001706 [6].

Data Analysis

Several statistical techniques, namely median normalization, the t-test, PCA and PLS, were used to analyze the data.

Median normalization assumes that the samples of a data set are separated by a constant. It brings the sample values onto a common scale so that they share the same median. Using the median instead of the mean reduces the influence of outliers in the data [13]. The t-test is used to evaluate whether a factor has an effect on the samples and whether the groups differ from each other. An unpaired t-test compares the means of two independent or unrelated groups to see whether there is a statistically significant difference between them [14]. Principal Component Analysis (PCA) is a dimensionality reduction method for extracting the important variables (in the form of principal components) from the many variables in a data set [15]. Instead of using the original data, Partial Least Squares (PLS) reduces the data to a smaller set of uncorrelated components and performs least squares regression on these components. It addresses the multicollinearity problem by constructing latent vectors that explain both the independent and dependent variables. The method is also applicable when more than one dependent variable needs to be predicted [16].

The dataset initially spanned a wide range of values, so the data were first preprocessed using median normalization. The significant metabolites were then identified by applying an unpaired t-test to the normalized data. Metabolites with a significance value (p-value) of less than 0.05 were selected and employed in further analysis. Following the t-test, these significant metabolites were analyzed using PCA. The principal components were identified, and the top 20 metabolites were extracted based on the variable importance ranking produced by the statistical software; these were then validated and considered as candidate RCC biomarkers. Following PCA, the top metabolites from the t-test were validated using the PLS method. The top metabolites reported after PCA and PLS were compared to the literature, and the metabolites common to all three were considered potential biomarkers, which were subsequently studied to learn more about their toxic effects in Renal Cell Carcinoma. Quantum chemical studies were performed on the identified biomarkers using the Gaussian 09 software to study their toxicity. All geometry optimizations were carried out using Density Functional Theory (DFT), employing the 6-31+G(d) basis set and the B3LYP functional [17]. The global electrophilicity index (ω) of the identified biomarkers was then calculated; it measures the stabilization energy when an optimal electronic charge transfer from the environment to the system occurs. To assess toxicity, the Toxtree tool [18] was used, and the biological significance of the biomarkers associated with Renal Cell Carcinoma was studied by identifying the molecular pathways in which the metabolites are involved.

Results and Discussion

Identifying Biomarkers using Predictive Statistical Modelling

Data Preprocessing by Median Normalization: Data preprocessing consisted of imputing the missing values followed by median normalization. Missing values were identified and replaced with median values. Median normalization was carried out for each metabolite. The normalized values were used in further analysis to identify biomarkers.
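The preprocessing step can be illustrated with a minimal sketch. It assumes that missing values are encoded as `None`, that imputation uses the median of the observed values of the same metabolite, and that normalization divides each value by the metabolite median; the text does not specify these details, and the function names and sample intensities are hypothetical.

```python
from statistics import median


def impute_with_median(values):
    """Replace missing entries (None) with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def median_normalize(values):
    """Divide every value by the column median so the scaled median is 1.0.

    Scaling by division is an assumption; subtracting the median is
    another common convention.
    """
    centre = median(values)
    if centre == 0:
        raise ValueError("median is zero; scaling by it is undefined")
    return [v / centre for v in values]


# Hypothetical intensities of one metabolite across five samples
raw = [10.0, None, 30.0, 20.0, 40.0]
filled = impute_with_median(raw)
scaled = median_normalize(filled)
```

After this step every metabolite has a median of 1.0, which places metabolites with very different absolute intensities on a comparable scale.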

t-test: An unpaired t-test was carried out on the normalized dataset using the Statistical Analysis module of the MetaboAnalyst tool [19] (https://www.metaboanalyst.ca/). Metabolites with a p-value of less than 0.05 were considered significant, i.e. significantly associated with Renal Cell Carcinoma. Based on the p-value, 33 metabolites were identified as significant and taken forward for further processing (Table 1).

Biomedical Science &, Research

Table 1: t-test results Showing 33 Significant Metabolites.
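For readers who want to see the calculation behind the screening step, the sketch below computes an unpaired t statistic for one metabolite. It is not the MetaboAnalyst implementation: it assumes equal group variances (Student's pooled form) and, to stay within the Python standard library, approximates the two-sided p-value with a normal distribution, which is only reasonable for large groups such as the 174 controls and 82 patients here. The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def unpaired_t_test(group_a, group_b):
    """Student's unpaired t-test with pooled variance.

    Returns (t, degrees_of_freedom, approximate two-sided p-value).
    The p-value uses a normal approximation to the t distribution,
    an assumption made here to avoid third-party dependencies.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("both groups are constant; t is undefined")
    t = (mean(group_a) - mean(group_b)) / std_err
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Hypothetical normalized values of one metabolite in two groups
control = [1.0, 3.0, 5.0]
rcc = [2.0, 4.0, 6.0]
t_stat, dof, p_value = unpaired_t_test(rcc, control)
```

A metabolite would pass the filter described above when its p-value falls below 0.05; with many metabolites tested at once, a multiple-testing correction is a common additional safeguard.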

Principal Component Analysis (PCA): PCA was performed on the top 33 metabolites obtained after the t-test using the statistical package Statistica 13.3 [27] (TIBCO Software Inc.). The PCA results in Statistica report a significance value for each metabolite in the power column, which was interpreted as the probability that the metabolite is responsible for the disease; the ranking in the variable importance column is based on this power value (Table 2). The top 20 ranked metabolites were considered for further analysis.

Table 2: PCA Results Showing Top 20 Metabolites.
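The idea of ranking variables from a PCA can be sketched as follows. The definition of the Statistica "power" and variable importance columns is not given in the text, so this example uses a stand-in: it extracts the first principal component by power iteration on the covariance matrix and ranks metabolites by the absolute value of their loading. All names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def first_principal_component(rows, iterations=200):
    """Return (loadings, explained_variance_ratio) of the first component.

    rows: list of samples, each a list of metabolite values.
    Uses power iteration on the sample covariance matrix.
    """
    n, p = len(rows), len(rows[0])
    col_means = [mean(row[j] for row in rows) for j in range(p)]
    centred = [[row[j] - col_means[j] for j in range(p)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    total_var = sum(cov[i][i] for i in range(p))
    if total_var == 0:
        raise ValueError("data have no variance")
    # Non-uniform start vector; it must not be orthogonal to the
    # leading eigenvector for power iteration to converge to it.
    vec = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit vector gives the leading eigenvalue
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    return vec, eigenvalue / total_var


def rank_by_loading(loadings):
    """Column indices ordered from largest to smallest absolute loading."""
    return sorted(range(len(loadings)),
                  key=lambda j: abs(loadings[j]), reverse=True)


# Hypothetical data: five samples, two metabolites
samples = [[1.0, 0.0], [2.0, 0.1], [3.0, 0.0], [4.0, 0.1], [5.0, 0.0]]
pc1, ratio = first_principal_component(samples)
order = rank_by_loading(pc1)
```

In this toy data set the first metabolite carries almost all of the variance, so it dominates the first component and is ranked first.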

PLS Results: PLS analysis was performed to validate the results from PCA. PLS was also performed in Statistica; the variable importance of each metabolite in the dataset is given in the VIP column, and the metabolites were ranked according to it (Table 3). The 20 top-ranked metabolites were selected as significant metabolites for the identification of biomarkers. The top metabolites obtained after both PCA and PLS were compared with the literature, and the metabolites present in all three, i.e. PCA, PLS and the literature, were considered potential biomarkers. Eight metabolites were common to all three: Trigonelline, Hippuric acid, 4-hydroxyhippuric acid, 4-aminohippuric acid, Mannitol, Pyruvic acid, Scyllo-Inositol and Deoxycholic acid. These were further explored to understand their toxic effects in Renal Cell Carcinoma.

Table 3: PLS Results Showing Top 20 Metabolites.
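The selection rule described above, keeping only metabolites that appear in the PCA ranking, the PLS ranking and the literature, amounts to a three-way set intersection. The sketch below uses shortened, hypothetical lists (the entries "Metabolite X", "Metabolite Y" and "Metabolite Z" are placeholders, not results from the study).

```python
def common_candidates(pca_top, pls_top, literature):
    """Return metabolites present in all three collections, sorted by name."""
    return sorted(set(pca_top) & set(pls_top) & set(literature))


# Hypothetical, shortened lists for illustration only
pca_top = ["Pyruvic acid", "Mannitol", "Metabolite X"]
pls_top = ["Mannitol", "Pyruvic acid", "Metabolite Y"]
literature = ["Pyruvic acid", "Mannitol", "Metabolite Z"]

candidates = common_candidates(pca_top, pls_top, literature)
```

Matching by name assumes that the three sources spell each metabolite identically; in practice, mapping names to database identifiers first avoids missed matches.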

Electronic Structure Analysis of Identified Biomarkers

Quantum chemical calculations were performed on the 8 biomarkers using the Gaussian 09 software. All geometry optimizations were carried out using Density Functional Theory (DFT), employing the 6-31+G (d, p) basis set and the B3LYP functional, to obtain the HOMO and LUMO values, which were then used to calculate the global electrophilicity index (Table 4). The electrophilicity index values of these biomarkers are high (>2), which denotes that all of them are toxic metabolites and highly responsible for the disease. Hence, electrophilicity is considered responsible for the observed RCC.

Table 4: Global Electrophilicity index values of Biomarkers.
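The text does not state the formula used to obtain the global electrophilicity index from the HOMO and LUMO energies. A commonly used definition, assumed here, is ω = μ²/(2η), with the chemical potential μ ≈ (E_HOMO + E_LUMO)/2 and the chemical hardness η ≈ E_LUMO − E_HOMO. Some authors define η with an additional factor of one half, which doubles ω, so values are only comparable within one convention. The orbital energies below are hypothetical and are not taken from Table 4.

```python
def global_electrophilicity(e_homo, e_lumo):
    """Global electrophilicity index from frontier orbital energies.

    Assumed convention (energies in eV, result in eV):
        mu    = (E_HOMO + E_LUMO) / 2   chemical potential
        eta   =  E_LUMO - E_HOMO        chemical hardness
        omega = mu**2 / (2 * eta)
    """
    mu = (e_homo + e_lumo) / 2
    eta = e_lumo - e_homo
    if eta <= 0:
        raise ValueError("E_LUMO must lie above E_HOMO")
    return mu ** 2 / (2 * eta)


# Hypothetical orbital energies in eV
omega = global_electrophilicity(e_homo=-7.0, e_lumo=-1.0)
```

For these illustrative energies μ = −4 eV and η = 6 eV, giving ω = 16/12 ≈ 1.33 eV.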

Toxtree Tool

Toxtree applies the Cramer rules, a decision tree approach, to the SMILES notation of each biomarker and returns its toxicity class. The software makes its decision based on information available in the literature and classifies the metabolites as high, intermediate or low toxicity [18]. The toxicity of the top 8 metabolites according to Toxtree is shown in Table 5.

Table 5: Toxicity of top metabolites according to Toxtree software.

Relative Quantification of Metabolites by Heat Map

A heatmap is used to identify features that are unusually high or low, using stronger intensities of one color to represent lower levels of the variable and increasing intensities of a different color to represent higher levels [19]. The heat map of the Renal Cell Carcinoma biomarkers was constructed using the MetaboAnalyst 5.0 tool (Figure 1); the upregulated and downregulated metabolites are listed in Table 6.

The concentration of the upregulated metabolites, Mannitol, Deoxycholic acid and Pyruvic acid, is increased in RCC patients compared to normal healthy individuals. Similarly, the concentration of the downregulated metabolites, Trigonelline, Scyllo-Inositol, 4-hydroxyhippuric acid, 4-aminohippuric acid and Hippuric acid, is decreased in RCC patients compared to normal healthy individuals. From these variations in metabolite concentrations between diseased patients and normal healthy individuals, it can be concluded that one set of metabolites is significantly downregulated while another set is significantly upregulated in diseased individuals, which can help in the prognosis of Renal Cell Carcinoma. These results are consistent with the existing literature.

Figure 1: Heat Map of RCC Biomarkers.

Table 6: Up regulated and down regulated metabolites.
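The direction of regulation that a heatmap shows visually can also be expressed as a number. The sketch below labels a metabolite by the sign of the log2 fold change between the group means; this is a simple stand-in for illustration rather than the MetaboAnalyst heatmap procedure, it assumes strictly positive concentrations, and the function name and values are hypothetical.

```python
from math import log2
from statistics import mean


def regulation(control, rcc):
    """Label a metabolite by the sign of log2(mean RCC / mean control).

    Returns (label, log2_fold_change). Values must be strictly positive.
    """
    fold = log2(mean(rcc) / mean(control))
    if fold > 0:
        return "up", fold
    if fold < 0:
        return "down", fold
    return "unchanged", fold


# Hypothetical concentrations of two metabolites
label_a, fold_a = regulation(control=[1.0, 1.0], rcc=[2.0, 2.0])
label_b, fold_b = regulation(control=[4.0, 4.0], rcc=[1.0, 1.0])
```

A log2 fold change of 1 corresponds to a doubling of the mean concentration in patients, and −2 to a four-fold decrease.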

ROC Curve Analysis

A Receiver Operating Characteristic (ROC) analysis was performed using MetaboAnalyst 5.0 on these 8 biomarkers to assess their diagnostic accuracy and characterise them in the early stage of RCC [20]. The AUC, sensitivity, specificity, and 95% confidence intervals of the eight identified potential urinary biomarkers for early RCC diagnosis are shown in Table 7. To evaluate the diagnostic accuracy of these potential urinary biomarkers, a predictive model for patient classification was constructed using each identified biomarker. Pyruvic acid and Deoxycholic acid showed AUC values of 0.854 (Figure 2) and 0.807 (Figure 3) respectively; they were regarded as the best biomarkers, with high predictive accuracy and better separation of the control and RCC groups than the other 6 biomarkers. Overall, the panel of 8 metabolites showed potential clinical diagnostic value, with an AUC of 0.923 (Figure 4) and 81% predictive accuracy (Figure 5). A combination of biomarkers was thus observed to be more helpful than single biomarkers for early diagnosis of the disease.

Table 7: ROC Curve Analysis.

Figure 2: AUC–ROC Curve of Pyruvic acid.

Figure 3: AUC–ROC Curve of Deoxycholic acid.

Figure 4: AUC–ROC Curve of All Metabolite Panel.

Figure 5: Predictive Accuracy with All Metabolite Panel.
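The AUC values reported in the ROC analysis have a direct interpretation: the AUC of a single biomarker equals the probability that a randomly chosen patient has a higher value than a randomly chosen control. The sketch below computes it in that pairwise (Mann-Whitney) form. It is an illustration with hypothetical data, not the MetaboAnalyst procedure, and it does not produce confidence intervals.

```python
def roc_auc(control_scores, case_scores):
    """AUC as the probability that a random case scores above a random control.

    Ties count as one half (Mann-Whitney U formulation).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical biomarker values
perfect = roc_auc(control_scores=[1, 2, 3], case_scores=[4, 5, 6])
chance = roc_auc(control_scores=[1, 2], case_scores=[1, 2])
```

For a downregulated metabolite this function returns a value below 0.5; the discriminating power is then 1 minus the returned value, since lower concentrations indicate disease.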

The Metabolic Pathways in Which Identified Metabolites are Involved

Trigonelline: Márcia S Monteiro, et al., in ‘Nuclear Magnetic Resonance metabolomics reveals an excretory metabolic signature of renal cell carcinoma’ reported that trigonelline could be related to certain foods (e.g. coffee), but that it can also be produced by endogenous niacin methylation. Reduced excretion of trigonelline has been reported in patients with liver cancer, ovarian cancer, pancreatic ductal adenocarcinoma, and lung cancer. This change, together with the decreasing tendency of trigonellinamide, shows that nicotinate and nicotinamide metabolism is disturbed. Putative interpretation of the metabolites that change in RCC compared to controls revealed possible unspecific effects involving hippurate, trigonelline, and trigonellinamide, emphasizing the importance of diet and gut microflora, as well as of nicotinate and nicotinamide metabolism and anti-oxidative mechanisms, as less specific systemic cancer effects [20]. The nicotinate and nicotinamide metabolism pathway, in which trigonelline is involved, was collected from the KEGG Pathway Database:

https://www.kegg.jp/kegg-bin/show_pathway?hsa00760.

Pyruvic Acid: Márcia S Monteiro, et al., in ‘Nuclear Magnetic Resonance metabolomics reveals an excretory metabolic signature of renal cell carcinoma’ reported that the increased levels of excreted pyruvic acid imply enhanced glycolysis activity. Increased glycolytic flux and altered TCA cycle function are well-known cancer hallmarks, affecting not only cellular energetic efficiency but also anabolic/biosynthetic efficiency, because intermediates in these pathways are diverted to the synthesis of proteins, nucleic acids, lipids, and cholesterol, and generally aid in the maintenance of the cellular redox, genetic, and epigenetic status required for cancer cell proliferation [20]. The glycolysis pathway, in which pyruvic acid is involved, was collected from the KEGG Pathway Database:

https://www.kegg.jp/kegg-bin/show_pathway?hsa00010

Deoxycholic Acid: Márcia S Monteiro, et al., in ‘Nuclear Magnetic Resonance metabolomics reveals an excretory metabolic signature of renal cell carcinoma’ reported that higher levels of bile acids such as deoxycholic acid are present in both the blood and urine of hepatocellular carcinoma patients, and that the bile acid resonances observed imply that RCC has an influence on endogenous cholesterol metabolism. Bile acids are a significant indication of liver injury. However, at this point, the exact connection between these compounds and RCC is unclear [20]. The secondary bile acid biosynthesis pathway, in which deoxycholic acid is involved, was collected from the KEGG Pathway Database:

https://www.kegg.jp/kegg-bin/show_pathway?ko00121+K23231

Scyllo-Inositol/Myo-Inositol: Piotr Popławski, et al., in ‘Integrated transcriptomic and metabolomic analysis shows that disturbances in metabolism of tumor cells contribute to poor survival of RCC patients’ reported that in RCC tumors the myo-inositol level was reduced 16-fold and was associated with decreased MIOX. Given the lower myo-inositol levels in RCC tumors, a decrease in MIOX expression may appear counter-intuitive. The decreased myo-inositol level in RCC tumors may result from increased excretion in urine. This study suggested that myo-inositol and the myo-inositol pathway may offer attractive targets for the potential treatment of RCC patients [21]. The inositol phosphate pathway, in which myo-inositol is involved, was collected from the KEGG Pathway Database:

https://www.kegg.jp/kegg-bin/show_pathway?hsa00562

Mannitol: Leuthold, et al., conducted a comprehensive metabolomic and lipidomic profiling study of human and porcine kidney tissue using an LC-QTOF-MS analytical approach. PCA showed differentiation in aqueous extracted metabolites from RCC compared to adjacent nontumor tissue. The level of mannitol was increased in the RCC tissue. The study concluded that disruption of the fructose and mannose metabolism pathway, in which mannitol is involved, is responsible for RCC [22]. The fructose and mannose metabolism pathway was collected from the KEGG Pathway Database:

https://www.kegg.jp/kegg-bin/highlight_pathway?map=hsa00051

Hippuric Acid, 4-Hydroxyhippuric Acid and 4-Aminohippuric Acid: Daniela Rodrigues, Márcia Monteiro, Carmen Jerónimo, Rui Henrique, Luís Belo, Maria de Lourdes Bastos, Paula Guedes de Pinho and Márcia Carvalho, in ‘Renal cell carcinoma: a critical analysis of metabolomic biomarkers emerging from current model systems’ (Translational Research, 2017, 180, 1-11), reported that hippuric acid, 4-hydroxyhippuric acid and 4-aminohippuric acid, which are downstream metabolites of phenylalanine, showed potential as RCC biomarkers. Hippuric acid was found to be diminished in the urine of RCC patients, suggesting that cancer cells rapidly metabolize it, which might cause impaired secretion into the renal tubules. Hippuric acid was also found to be decreased in other renal diseases, emphasizing its poor specificity as an individual biomarker for RCC detection [23]. The phenylalanine pathway, in which hippuric acid, 4-hydroxyhippuric acid and 4-aminohippuric acid are involved, was collected from the KEGG Pathway Database:

https://www.kegg.jp/kegg-bin/highlight_pathway?map=hsa00360

The above-mentioned metabolic pathways have been associated with Renal Cell Carcinoma in the literature, indicating that alterations in these pathways can cause Renal Cell Carcinoma.

Conclusion

Renal Cell Carcinoma (RCC) is a heterogeneous disease that is usually asymptomatic until a late stage. There is an urgent need to identify RCC-specific biomarkers that may be exploited clinically for diagnostic and prognostic purposes. In this study, metabolomic data were statistically explored for the identification of biomarkers in Renal Cell Carcinoma. The raw dataset obtained from the Metabolomics Workbench was preprocessed using MetaboAnalyst 5.0, a web-based interface for metabolomics data analysis, and further statistical methods such as the t-test, PCA, and PLS were applied using the Statistica software to identify significant metabolites. The top-ranking metabolites obtained after both PCA and PLS were compared with the literature, and the eight metabolites common to all three were considered potential biomarkers. Trigonelline, Hippuric acid, 4-hydroxyhippuric acid, 4-aminohippuric acid, Mannitol, Pyruvic acid, Scyllo-Inositol and Deoxycholic acid are considered the biomarkers of Renal Cell Carcinoma [24].

These metabolites were further explored to understand their effect on the body. Quantum chemical studies were performed using the Gaussian software to evaluate their electronic parameters and toxicity. A toxicity prediction tool called Toxtree was used to classify the biomarkers based on toxicity; it placed 2 metabolites, Trigonelline and Deoxycholic acid, in the high toxicity class (Class III) and 6 metabolites, Hippuric acid, 4-hydroxyhippuric acid, 4-aminohippuric acid, Mannitol, Pyruvic acid and Scyllo-Inositol, in the low toxicity class (Class I). Relative quantification using the heatmap showed that Mannitol, Deoxycholic acid and Pyruvic acid are upregulated metabolites and that Trigonelline, Scyllo-Inositol, 4-hydroxyhippuric acid, 4-aminohippuric acid and Hippuric acid are downregulated metabolites. The ROC curve analysis showed that a combination of biomarkers is more helpful than single biomarkers for early diagnosis of the disease. The toxic metabolites were further explored for their metabolic pathways and their association with Renal Cell Carcinoma using the literature. It can be predicted that the disturbance in the identified metabolic pathways due to these eight metabolites may be the cause of the observed Renal Cell Carcinoma.

Funding

The authors thank the Department of Biotechnology, Ministry of Science and Technology, Government of India, for funding under DBT project number BT/PR40164/BTIS/137/17/2021.

Acknowledgements

None.

Conflict of Interest

None.

References
