Date: 19 April 2023
Time: 12:00 (noon)
Venue: Online
Speaker: Dr Petr Simecek
Natural language processing (NLP) has been transformed by large language models such as GPT-3.5, OPT, and BLOOM. Recently, similar neural network architectures have been adapted to genomics and proteomics, opening the way for advances in these domains. In this presentation, we will discuss existing DNA and protein language models, namely DNABERT, ProtBert-BFD, and ESM-2, and illustrate how they can be fine-tuned for specific tasks. We will also explain how the models' embeddings capture both evolutionary and functional information. To conclude, we will demonstrate this methodology on the problem of detecting a topological knot in the protein backbone. Specifically, we will classify proteins as knotted or unknotted based solely on their sequence.
Please register to attend.
The Data Science Research Platform (DSRP) at the University of Malta conducts research in the interdisciplinary field of data science. The group aims to use signal processing, machine learning and statistics to develop innovative techniques and to extract useful knowledge effectively from a variety of data sources, for the benefit of the wider public.
For more information about the DSRP, please visit the UM webpage.