Please use this identifier to cite or link to this item: /library/oar/handle/123456789/122825

| Field | Value |
|---|---|
| Title: | SADIP: semi-automated data integration system for protein databases |
| Authors: | Aquilina, Jurgen (2022) |
| Keywords: | Bioinformatics; Databases; SQL (Computer program language) |
| Issue Date: | 2022 |
| Citation: | Aquilina, J. (2022). SADIP: semi-automated data integration system for protein databases (Bachelor's dissertation). |
| Abstract: | Biologists commonly need to combine information from different biological databases by manually following cross-references (hyperlinks), using the distinct access methods and data formats provided by each database. Past research in data integration has outlined several approaches which can integrate biological databases to provide a unified view. One approach is known as data warehousing. The current state of the art in biological data warehousing requires bespoke software development and maintenance for each database. In our view, this is infeasible given the large number of constantly changing biological databases with varying access methods and data formats. This project aims to develop a tool which can automatically integrate biological information from different databases into a data warehouse, using user-defined configurations. This tool was applied to construct a property graph database with integrated information from 10 protein databases. This allows bioinformaticians to specify complex queries through the Structured Query Language (SQL). In addition, a web-based user interface was developed which provides biologists with all integrated information related to a single protein, identified by its UniProtKB identifier. The results obtained for the configuration used show that developing such a tool is feasible. However, the developed prototype requires further amendments to improve its flexibility, robustness, and security. Further results show that the data warehouse provides biologists with a considerable amount of valuable information but should be extended to incorporate a wider variety of biological information. Finally, the results highlighted performance deficiencies for nested information and structural domains. |
| Description: | B.Sc. IT (Hons)(Melit.) |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/122825 |
| Appears in Collections: | Dissertations - FacICT - 2022; Dissertations - FacICTCIS - 2022 |
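The abstract describes the tool only at a high level: records from several sources are integrated into a property graph according to user-defined configurations, and the result can be queried with SQL. The Python sketch below illustrates that general idea under loudly stated assumptions; it is not the dissertation's implementation. The configuration format, the functions `build_graph`, `load_sqlite` and `structures_for`, the `node`/`edge` table layout and the sample records are all hypothetical, and an in-memory SQLite database merely stands in for whatever graph store the project actually used.

```python
import json
import sqlite3

# Hypothetical user-defined configuration: one entry per source database.
# "id_field" names the record's own identifier, "properties" lists fields
# copied onto the node, and "xrefs" maps a field of cross-references to the
# (target source, relationship type) used for the resulting edges.
CONFIG = {
    "uniprot": {
        "label": "Protein",
        "id_field": "accession",
        "properties": ["name"],
        "xrefs": {"pdb_ids": ("pdb", "HAS_STRUCTURE")},
    },
    "pdb": {
        "label": "Structure",
        "id_field": "pdb_id",
        "properties": ["method"],
        "xrefs": {},
    },
}

# Hypothetical sample records, standing in for data fetched from each source.
SAMPLE = {
    "uniprot": [
        {"accession": "P12345", "name": "Example protein", "pdb_ids": ["1ABC", "2XYZ"]},
    ],
    "pdb": [
        {"pdb_id": "1ABC", "method": "X-ray diffraction"},
    ],
}


def build_graph(records_by_source, config):
    """Convert raw records into property-graph nodes and edges, driven only by config."""
    nodes, edges = [], []
    for source, records in records_by_source.items():
        spec = config[source]
        for rec in records:
            # Prefix IDs with the source name so identifiers from different sources cannot collide.
            node_id = f"{source}:{rec[spec['id_field']]}"
            props = {k: rec[k] for k in spec["properties"] if k in rec}
            nodes.append((node_id, spec["label"], props))
            for field, (target, rel) in spec["xrefs"].items():
                for ref in rec.get(field, []):
                    edges.append((node_id, rel, f"{target}:{ref}"))
    return nodes, edges


def load_sqlite(nodes, edges):
    """Store the graph as node/edge tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (id TEXT PRIMARY KEY, label TEXT, props TEXT)")
    conn.execute("CREATE TABLE edge (src TEXT, rel TEXT, dst TEXT)")
    conn.executemany(
        "INSERT INTO node VALUES (?, ?, ?)",
        [(node_id, label, json.dumps(props, sort_keys=True)) for node_id, label, props in nodes],
    )
    conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", edges)
    return conn


def structures_for(conn, accession):
    """Return IDs of loaded structure nodes linked to a UniProtKB accession."""
    rows = conn.execute(
        """
        SELECT d.id
        FROM edge AS e
        JOIN node AS d ON d.id = e.dst
        WHERE e.src = ? AND e.rel = 'HAS_STRUCTURE'
        ORDER BY d.id
        """,
        (f"uniprot:{accession}",),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    nodes, edges = build_graph(SAMPLE, CONFIG)
    conn = load_sqlite(nodes, edges)
    print(structures_for(conn, "P12345"))  # ['pdb:1ABC']
```

Because the cross-reference to `2XYZ` has no matching record in the hypothetical `pdb` sample, the inner join in `structures_for` drops it and only `pdb:1ABC` is returned. Keeping cross-reference handling in the configuration, rather than in per-source code, is one way to let a new source be added by editing data instead of writing bespoke software, which is the motivation the abstract gives.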
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| 2208ICTICT391305069209_1.PDF (Restricted Access) | | 3.09 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
