OAR@UM Collection: /library/oar/handle/123456789/104240
Wed, 05 Nov 2025 05:08:39 GMT

Title: Autoencoder-based voice conversion for ASR data augmentation
/library/oar/handle/123456789/108015
Abstract: Automatic Speech Recognition (ASR) has progressed rapidly in languages such as English, for which abundant data are available. Understudied languages such as Maltese, in common with other under-resourced languages, have less data available for system training. State-of-the-art systems based on deep neural architectures rely on the availability of large datasets running into hundreds of hours. Recently, the MASRI project at the University of Malta compiled a corpus of around 8 hours of speech. This is the largest corpus of Maltese speech data for ASR, but it is still insufficient for training such deep neural architectures with the same expectation of success as in high-resource languages. One possible way of augmenting the currently available training data for ASR in Maltese is to fine-tune the recogniser to work on a single target voice. Synthetic data produced by the FITA synthesiser can yield limitless hours of training data. However, all real speech utterances to be transcribed would then require voice conversion to the same target voice before being fed to the ASR. This study investigated the possibility of using autoencoder-based voice conversion techniques to perform this operation. The first part of the work consisted of creating an aligned parallel dataset of Mel spectrograms of utterances from the multi-speaker MASRI corpus and the Maltese speech synthesiser. In the second stage, several autoencoder architectures were trained on this parallel dataset to study the possibility of converting several human voices to one target synthetic voice.
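The frame-to-frame conversion idea behind such a basic autoencoder can be sketched as follows. This is a minimal illustrative sketch, not the dissertation's actual model: the Mel dimension, bottleneck size, toy data, and training loop are all assumptions.

```python
import numpy as np

# Toy sketch of a basic dense autoencoder mapping a source speaker's
# Mel-spectrogram frames to a target voice's aligned frames.
# All dimensions and the synthetic "aligned parallel dataset" are assumed.

rng = np.random.default_rng(0)

N_MELS = 80      # Mel bins per frame (assumed)
HIDDEN = 32      # bottleneck size (assumed)
N_FRAMES = 256   # number of aligned frame pairs in the toy batch

# Stand-in for an aligned parallel dataset: target frames are a fixed
# linear warp of the source frames plus a little noise.
src = rng.normal(size=(N_FRAMES, N_MELS))
warp = rng.normal(scale=0.1, size=(N_MELS, N_MELS))
tgt = src @ warp + 0.01 * rng.normal(size=(N_FRAMES, N_MELS))

# Encoder/decoder weights
W1 = rng.normal(scale=0.1, size=(N_MELS, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_MELS))
b2 = np.zeros(N_MELS)

def forward(x):
    h = np.tanh(x @ W1 + b1)   # encoder
    return h, h @ W2 + b2      # linear decoder

def mse(pred, ref):
    return float(np.mean((pred - ref) ** 2))

lr = 0.01
_, pred = forward(src)
loss_before = mse(pred, tgt)

for _ in range(500):
    h, pred = forward(src)
    err = (pred - tgt) / N_FRAMES          # scaled MSE gradient w.r.t. pred
    gW2 = h.T @ err                        # backprop through decoder
    gb2 = err.sum(axis=0)
    dh = err @ W2.T * (1 - h ** 2)         # backprop through tanh encoder
    gW1 = src.T @ dh
    gb1 = dh.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(src)
loss_after = mse(pred, tgt)
print(loss_before, loss_after)  # training reduces the reconstruction error
```

In practice the predicted Mel spectrogram would still need a vocoder to be rendered back into audio; this sketch only shows the spectrogram-to-spectrogram mapping.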
Three autoencoder architectures, titled Basic-AE, U-Net-AE and AE-DNN, were selected for evaluation. Unseen data from the created dataset was used to test them. Each architecture was evaluated by inspecting the output Mel spectrogram and the corresponding audio predicted by each model, by objective evaluation with Log Spectral Distortion (LSD), and by subjective Mean Opinion Score (MOS) evaluation from a survey testing voice similarity and intelligibility. Although direct comparison was not possible since different datasets were used, the LSD results outperformed similar work using the AE-DNN technique, while the MOS results were slightly lower than those of comparable works.
Description: M.Sc. (Melit.)
Sat, 01 Jan 2022 00:00:00 GMT

Title: Despeckling of synthetic aperture radar images
/library/oar/handle/123456789/108012
Abstract: Synthetic Aperture Radar (SAR) is an active imaging technique based on the transmission and backscattering of microwave signals. SAR imagery is commonly obtained through spaceborne systems and has many applications in areas such as topography, forestry and earthquake monitoring. When the intensity of the scattered signals is measured, different objects in the captured area produce distinct backscatter, and where the surface roughness is high in comparison to the signal wavelength, coherent scattering creates random constructive and destructive interference. The result is an undesirable noise known as speckle, which significantly degrades the quality of SAR images and hinders the performance of analysis tasks such as classification and segmentation. This study first investigates the statistical characteristics of speckle noise and compares the effects of commonly used pre-processing techniques on raw SAR data.
To do this, Sentinel-1 images are collected from a range of dates and temporally averaged to obtain a clean estimate. By taking the ratio image between the noisy and clean images, the noise signal is extracted and analysed. This understanding of the noise model is then used to generate synthetic speckle and apply it to natural images, creating a suitable dataset. A series of machine learning models are trained on the pairs of clean and noisy data, with a focus on integrating features from Convolutional Neural Networks (CNNs), Denoising Autoencoders (AEs) and residual neural networks. To evaluate the models' performance on images with synthetically generated noise, reference-based image quality metrics such as the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM) are used. Where no ground truth is available, metrics more common in remote sensing, such as the Equivalent Number of Looks (ENL) and the Mean of Ratio (MoR), are used instead. The best-performing models are analysed further and compared with modern despeckling implementations. On the chosen test set of 102 synthetically noisy images, the best trained model obtained an average PSNR of 26.3502 dB and an average SSIM of 0.9034. This performance also carries over to the chosen SAR images, where good smoothing and detail preservation were observed.
Description: M.Sc. (Melit.)
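The multiplicative speckle model, the ratio-image extraction, and the ENL/MoR/PSNR metrics described in the abstract above can be sketched as follows. The toy image, the number of looks, and the gamma speckle model are illustrative assumptions, not the dissertation's actual data.

```python
import numpy as np

# Sketch of an L-look multiplicative speckle model: intensity speckle is
# gamma-distributed with unit mean, so the ratio of noisy to clean images
# isolates the noise, as in the analysis described above.

rng = np.random.default_rng(1)
L = 4  # number of looks (assumed)

clean = rng.uniform(0.2, 1.0, size=(128, 128))                  # stand-in clean image
speckle = rng.gamma(shape=L, scale=1.0 / L, size=clean.shape)   # E[s] = 1, var = 1/L
noisy = clean * speckle                                         # multiplicative noise

ratio = noisy / clean                    # ratio image isolates the speckle
mor = ratio.mean()                       # Mean of Ratio, ~1 for unit-mean speckle
enl = ratio.mean() ** 2 / ratio.var()    # Equivalent Number of Looks, ~L

def psnr(ref, test, peak=1.0):
    # Reference-based metric usable when the clean image is known.
    mse = np.mean((ref - test) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

print(f"MoR={mor:.3f}  ENL={enl:.2f}  PSNR={psnr(clean, noisy):.2f} dB")
```

A despeckling model trained on such clean/noisy pairs would then be judged by how much it raises PSNR (and SSIM) over the noisy input while keeping ENL high in homogeneous regions.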
Sat, 01 Jan 2022 00:00:00 GMT

Title: A comparative study of algorithms utilizing predictive models in the scheduling of shared demand responsive transport
/library/oar/handle/123456789/108011
Abstract: The project investigates the effect of predicting future demand on the efficiency of Demand Responsive Transport (DRT) systems, defined as services offered by a centralised operator that dispatches shared vehicles to serve on-demand travel requests. Efficient DRT systems promise to play a crucial part in mitigating climate change and traffic congestion in urban areas. A literature review of research in this field is carried out, and the reviewed methods are grouped into three classes: sampling-based approaches, routing-based approaches, and approaches with implicit consideration of future demand. The implementation of two variants of a sampling-based approach is described. To predict near-future demand, one variant uses a simple frequentist model and the other a state-of-the-art Neural Network model. These algorithms are then compared with a myopic baseline algorithm (re-implemented in this study) across a wide range of tests covering variables such as fleet size, vehicle capacity, and the percentage of total ridesharing demand captured by the operator. The experiments are based on simulations using data from the NYC Taxi and Limousine Commission dataset, carried out on the open-source Jargo simulator. The results show that in low-demand scenarios (when the operator captures around 5% of the demand in Manhattan), the demand-aware assignment algorithm can effectively replace algorithms that reactively rebalance the fleet.
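A simple frequentist model of the kind mentioned above could, for illustration, predict near-future demand as the historical average request count per (zone, time-of-day bin). This is a hypothetical sketch: the zone IDs, bin width, and request-log format are all assumptions.

```python
from collections import Counter

BIN_MINUTES = 15  # width of each time-of-day bin (assumed)

def time_bin(minute_of_day):
    return minute_of_day // BIN_MINUTES

def fit(requests):
    """Fit the frequentist model from a log of (zone, minute_of_day, day_index)."""
    counts = Counter()
    days = set()
    for zone, minute, day in requests:
        counts[(zone, time_bin(minute))] += 1
        days.add(day)
    n_days = max(len(days), 1)
    # Expected requests per bin = historical count averaged over observed days.
    return {key: c / n_days for key, c in counts.items()}

def predict(model, zone, minute_of_day):
    return model.get((zone, time_bin(minute_of_day)), 0.0)

# Toy log: zone "A" sees 2 requests in the 08:00 bin on each of 2 days.
log = [("A", 480, 0), ("A", 485, 0), ("A", 481, 1), ("A", 490, 1),
       ("B", 600, 0)]
model = fit(log)
print(predict(model, "A", 487))  # 2.0 expected requests in this bin
print(predict(model, "B", 487))  # 0.0, no history for this (zone, bin)
```

A sampling-based scheduler would draw synthetic future requests from such per-bin rates and include them when assigning vehicles, which is what makes the assignment "demand-aware".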
The demand-aware assignment algorithm was shown to maintain high service rates (>90%) while reducing the total distance travelled compared with the reactive rebalancing algorithm, therefore contributing to lower emissions and traffic congestion. In higher-demand scenarios, however, the demand-aware assignment algorithm did not fare any better than the rebalancing algorithm, in some cases even reducing overall efficiency. In conclusion, demand-aware algorithms are better suited to low-demand periods, a rather understudied niche where machine learning could improve the level of service by making more efficient use of the available resources.
Description: M.Sc. (Melit.)
Sat, 01 Jan 2022 00:00:00 GMT

Title: Development and optimisation of an IoT system
/library/oar/handle/123456789/107856
Abstract: In recent years, the Internet of Things (IoT) has been adopted in multiple applications, in which a specific environment is monitored through a wireless network of sensors. Low-cost, off-the-shelf sensors have also advanced in accuracy, reliability, and accessibility, making them suitable for deployment within such a network. A persistent problem in this field is the lack of optimisation in how the integrated devices operate, resulting in overheads such as excessive data transmission and battery consumption. This dissertation proposes a new approach to implementing an IoT-based system using off-the-shelf devices, with a focus on device optimisation. This includes the means of communication and the data aggregation methods between the devices distributed within the IoT system, mainly targeting the gateway.
This was achieved by minimising the overall amount of data transferred to the data server, discarding any redundant or repeated information collected by the sensing devices. The solution was applied to a vibration monitoring and sensing system, and a similar IoT system can easily be replicated and adapted to other scenarios. We observe that the optimised gateways perform as well as an unoptimised system, while saving up to 14% of battery consumption per device over a 10-hour testing period.
Description: M.Sc. (Melit.)
Sat, 01 Jan 2022 00:00:00 GMT
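The gateway-side suppression of redundant readings described above could, in its simplest form, forward a value only when it differs meaningfully from the last transmitted one. This is a hypothetical sketch; the class name, tolerance value, and readings are illustrative, not the dissertation's implementation.

```python
class DeduplicatingGateway:
    """Toy gateway that drops readings too close to the last transmitted value."""

    def __init__(self, tolerance=0.05):
        self.tolerance = tolerance
        self.last_sent = {}   # sensor_id -> last transmitted value
        self.sent = []        # stand-in for the uplink to the data server

    def ingest(self, sensor_id, value):
        """Forward the reading only if it changed beyond the tolerance."""
        prev = self.last_sent.get(sensor_id)
        if prev is None or abs(value - prev) > self.tolerance:
            self.last_sent[sensor_id] = value
            self.sent.append((sensor_id, value))
            return True   # transmitted
        return False      # suppressed as redundant

gw = DeduplicatingGateway(tolerance=0.05)
readings = [0.10, 0.11, 0.12, 0.30, 0.31, 0.90]
flags = [gw.ingest("vib-1", v) for v in readings]
print(flags)  # [True, False, False, True, False, True]
```

Here only 3 of 6 readings reach the server; fewer radio transmissions is what drives the kind of battery saving the abstract reports.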