OAR@UM Collection: /library/oar/handle/123456789/118476
2026-06-22T14:21:32Z

/library/oar/handle/123456789/130108 2024-12-20T14:05:13Z 2024-01-01T00:00:00Z
Title: BERT for sentiment analysis of Japanese Twitter
Abstract: This publication introduces novel, open-source resources for sentiment analysis on Japanese Twitter. BERT for Japanese Twitter is a pre-trained model that is highly competent in the target domain and adaptable to a variety of tasks. Japanese Twitter Sentiment 1k (JTS1k) is a compact sentiment analysis dataset optimized for balance and reliability. This combination of pre-trained model and dataset was used to fine-tune a sentiment analysis model that broadly applies to Japanese social networking services (SNS): BERT for Japanese SNS Sentiment. The primary focus of this project is domain adaptation. Using an established Japanese BERT model as a foundation, domain adaptation was achieved by optimizing the vocabulary and continuing pre-training on a large Twitter corpus. Similar methodology was used to develop Twitter Multilingual RoBERTa (XLM-T) (Barbieri et al., 2022), which is the state-of-the-art multilingual Twitter model. By using a monolingual approach, this study developed a more efficient model that outperformed XLM-T in the target language. This project explored fundamental elements of corpus construction, corpus refinement, dataset annotation, preprocessing, pre-training, fine-tuning, and benchmarking. It concludes with a demonstration that the sentiment model is valid, useful, and sensitive to changes in public sentiment that correlate with real-world events.
Description: M.A.(Melit.)
2024-01-01T00:00:00Z

/library/oar/handle/123456789/127773 2024-10-21T07:16:35Z 2024-01-01T00:00:00Z
Title: Visually grounded language generation : data, models and explanations beyond descriptive captions
Abstract: Vision and Language are two essential capabilities by which we can talk about what we see and communicate it to others, ultimately allowing us to perform tasks and understand the world. Modeling such interaction is critical to creating agents able to understand, at least to some extent, the world we perceive. This challenge is generally known as multimodal grounding and corresponds to the capability of a model to create meaningful connections between different modalities to solve a task. Ungrounded models do not properly interleave the two modalities, yet they can perform well on downstream tasks, leading to misleading and potentially harmful behaviors. Among other fields, Explainable Artificial Intelligence research has moved forward in recent years, proposing methods that help scrutinize the inner workings of these models and therefore also assess their grounding capabilities. However, these methods have some relevant limitations, especially on generative models, and they are still unpopular in Vision and Language research. Vision and Language research has mostly focused on performing and evaluating tasks involving the identification and recognition of objects and entities, as they represent the most basic meaningful information in a visual scene and can be used as building blocks to compose complex multimodal relations, especially in the visual modality. However, in the textual modality, objects represent only a limited amount of linguistic information, as language is enriched by words and expressions that do not always correspond to concrete physical objects. Some linguistic expressions can represent complex contexts and situational knowledge that goes beyond the objects visible in the images. For example, describing a picture as a “picnic” (high-level) triggers a whole set of expectations about the scene, making the mention of the objects and entities totally redundant and uninformative, e.g. “people eating food on the grass” (low-level). The latter description is object-centric and is most likely generated by an automatic captioning system, whereas the former is more human-like and naturally used by humans. The general lack of interest in this aspect by the research community has created a potential gap in the overall assessment of the capability of large-scale models to fully understand the “language” in “vision and language”, preventing a potential gain in overall output quality for multimodal models in generative settings. In this thesis, we pursue this direction with the aim of discovering whether large pre-trained Vision and Language models can handle high-level linguistic descriptions and to what extent they are able to effectively ground them in the visual modality; implications for both language understanding and generation are of interest in this work. Moving away from object-centric descriptions, we potentially change the paradigm used to assess multimodal grounding. We analyze potential changes in terms of tasks and evaluation methods, introducing an explainability framework designed to complement the currently available tools for assessing models’ multimodal grounding capabilities in generative settings.
Description: Ph.D.(Melit.)
2024-01-01T00:00:00Z

/library/oar/handle/123456789/127771 2025-02-17T13:34:47Z 2024-01-01T00:00:00Z
Title: Xenophobic hate speech in Malta : a critical discourse analytic perspective of Times of Malta comments
Abstract: This thesis seeks to uncover the ideologies and values imbued within the language used to describe migrants in below-the-line newspaper comment data in Malta. More precisely, it seeks to understand how the representation of social actors involved in discourse about migration reveals axiological information and stance pertaining to migrants. The complexity of this research goal necessitates an interdisciplinary methodological approach within which to frame the analysis. Hence, the research of this thesis is embedded within Critical Discourse Analysis and Corpus Assisted Discourse Studies. A number of corpora were constructed using data from the Times of Malta online newspaper forum to investigate the linguistic constructions of the social actors represented in the data, and the way that those constructions reveal the associated ideologies and values. Specifically, topic modelling was used to extract a subset of the Times of Malta comment data directly pertaining to migration, while multiple annotators were used to formulate a sub-corpus of xenophobic hate speech within that subset. Subsequently, the data were analysed within the scope of van Leeuwen’s (2008) semantic representation of social actors, and Halliday and Matthiessen’s (2014) system of transitivity. Further, following the analysis of the annotated hate speech data, broader generalisations were made using corpus methods in an attempt to extrapolate the findings to the full Times of Malta dataset. Through the analysis, this thesis shows that the Maltese are represented as the undisputed in-group (they are patriots and protectors), while a specific group of migrants are the definite out-group, who are not welcome on the island. The language used to describe this latter group consistently represents them in a position of subordination through which they are described as unwelcome guests who should return from whence they came. The Maltese, on the other hand, are consistently represented as the dominant group whose resources are being depleted by these undesirable people. In this respect, the discourse examined offers valuable insight into real-world treatment of the out-group whereby they are excluded from housing and other resources, in addition to facing discrimination daily.
Description: Ph.D.(Melit.)
2024-01-01T00:00:00Z

/library/oar/handle/123456789/119074 2024-03-06T10:19:13Z 2024-01-01T00:00:00Z
Title: Getting to the root of the Maltese broken plural
Abstract: This study argues that the Maltese broken plural is derived from a tri- or quadriliteral root, as opposed to from an existing word form. Additionally, this study argues that the ‘pattern’ (that is, the proposed skeletal CV morph) is not a morph, but rather an epiphenomenon of the derivation. To support these arguments, the present study sketches a decompositional, late-insertionist derivation of the Maltese broken plural utilizing the frameworks of Distributed Morphology and Optimality Theory. It is argued that the [+plural] feature projects in two different nodes in the morphosyntax (in the n head and in the Num head), resulting in the derivation of either a sound plural or a broken plural. Vocalic melody allomorphs are specified to a set of root morphemes and compete with one another for insertion at Spell-Out. On the phonological branch of the derivation, Optimality Theory is able to capture the attested variation in prosodic structure of the broken plurals by positioning the vocalic melodies within the root morph, as per the constraints on syllabic well-formedness. Thus, it is the interaction between the constraints, vocalic melody, and root that gives rise to prosodic variation, not a ‘pattern’ morph.
Description: M.A.(Melit.)
2024-01-01T00:00:00Z