Analyzing and exploring Graph Attention Networks and protein-based language models for predicting Porhyromonas gingivalis resistant efflux protein sequences

Yadalam, Pradeep Kumar; Natarajan, Prabhu Manickam; Shetty, Naresh; Marrapodi, Maria Maddalena; Uzunçıbuk, Hande; Cicciù, Marco; Minervini, Giuseppe

doi:10.17219/dmp/186143

Background. Antimicrobial resistance (AMR) must be predicted to combat antibiotic-resistant illnesses. Based on high-priority AMR genomes, it is possible to track resistance and focus treatment to stop global outbreaks. Large language models (LLMs) are essential for identifying Porhyromonas gingivalis multi-resistant efflux genes to prevent resistance. Antibiotic resistance is a serious problem; however, by studying specific bacterial genomes, we can predict how resistance develops and find better kinds of treatment. Objectives. This paper explores using advanced models to predict the sequences of proteins that make P. gingivalis resistant to treatment. Understanding this approach could help prevent AMR more effectively. Material and methods. This research utilized multi-drug-resistant efflux protein sequences from P. gingivalis, identified through UniProt ID A0A0K2J2N6_PORGN, and formatted as FASTA sequences for analysis. These sequences underwent rigorous detection and quality assurance processes to ensure their suitability for computational analysis. The study employed the DeepBIO framework, which integrates LLMs with deep attention networks to process FASTA sequences. Results. The analysis revealed that the Long Short-Term Memory (LSTM)-attention, ProtBERT and BERTGAT models achieved sensitivity scores of 0.9 across the board, with accuracy rates of 89.5%, 88.5% and 90.5%, respectively. These results highlight the effectiveness of the models in identifying P. gingivalis strains resistant to multiple drugs. Furthermore, the study assessed the specificity of the LSTM-attention, ProtBERT and BERTGAT models, which achieved scores of 0.89, 0.87 and 0.90, respectively. Specificity, or the genuine negative rate, measures the ability of a model to accurately identify non-resistant cases, which is crucial for minimizing false positives in AMR detection. Conclusions. When utilized clinically, this LLM approach will help prevent AMR, which is a global problem. Understanding this approach may enable researchers to develop more effective treatment strategies that target specific resistant genes, reducing the likelihood of resistance development. Ultimately, this approach could play a pivotal role in preventing AMR on a global scale.