OAR@UM Collection:

Sparse vector autoregression with application to multivariate cryptocurrency time series

2019-01-01T00:00:00Z

Title: Sparse vector autoregression with application to multivariate cryptocurrency time series Abstract: The vector autoregressive (VAR) model as proposed by Christopher A. Sims (1980b) has been widely used in the context of high dimensional time series. It has been praised for its ability to capture temporal and cross-sectional dependencies which may exist between different time series. However, over the years, improvements were made to address the problem of noisy estimates caused by correlations which are insignificant. We discuss the Least Absolute Shrinkage and Selection Operator (LASSO) method, which simultaneously estimates and regularizes VAR models with the aim of shrinking the insignificant parameters of the VAR coefficient matrices. We look into the estimation procedures and properties of unregularized VAR models and see how they compare to regularized VAR models, also known as Sparse-VAR. Amongst the properties, we discuss Granger causality and its implications for the multivariate time series. The performance of these models is illustrated by applying them to time series of cryptocurrency prices. We study the main differences between the unregularized and regularized VAR models and, for the latter, analyse the effects that different values of the LASSO shrinkage parameter have on the estimated VAR transition matrices. We also see how the different models interpret the dependencies between different cryptocurrencies and check whether historical values of one cryptocurrency have any impact on predicting other cryptocurrency prices. We then apply time series cross-validation to the available dataset in order to compare the predictive performance of the unregularized and regularized models. The findings indicate that Sparse-VAR is able to make slight improvements in the quality of the forecasts produced.
We also see that the method used to estimate the LASSO shrinkage parameter plays an important part in reducing prediction errors. Description: B.SC.(HONS)STATS.&OP.RESEARCH
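
As an illustration of the LASSO-regularized VAR idea described in this abstract, the following is a minimal sketch and not the dissertation's own implementation. It assumes a VAR of order 1 with no intercept, mean-zero series, and plain coordinate descent; the function names `lasso_var1` and `simulate_var1` and the penalty value `lam` are hypothetical choices made for this example.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_var1(series, lam, n_sweeps=200):
    """Fit y_t = A y_{t-1} + e_t with an L1 penalty on the entries of A.

    series: list of T observations, each a list of k values (assumed mean zero).
    Minimises (1 / 2n) * squared error + lam * sum |A_ij|, one equation at a time.
    """
    X, Y = series[:-1], series[1:]
    n, k = len(X), len(X[0])
    col_sq = [sum(X[t][j] ** 2 for t in range(n)) / n for j in range(k)]
    A = [[0.0] * k for _ in range(k)]
    for i in range(k):
        resid = [Y[t][i] for t in range(n)]  # residuals while row i of A is zero
        for _ in range(n_sweeps):
            for j in range(k):
                if col_sq[j] == 0.0:
                    continue
                # Correlation of lagged series j with the partial residual.
                rho = sum(X[t][j] * (resid[t] + A[i][j] * X[t][j]) for t in range(n)) / n
                new = soft_threshold(rho, lam) / col_sq[j]
                delta = new - A[i][j]
                if delta != 0.0:
                    for t in range(n):
                        resid[t] -= delta * X[t][j]
                    A[i][j] = new
    return A


def simulate_var1(A, T, seed=0):
    """Simulate T observations from a VAR(1) with standard normal noise."""
    rng = random.Random(seed)
    k = len(A)
    y = [0.0] * k
    out = []
    for _ in range(T):
        y = [sum(A[i][j] * y[j] for j in range(k)) + rng.gauss(0.0, 1.0) for i in range(k)]
        out.append(y)
    return out
```

Setting `lam=0.0` reduces the fit to ordinary least squares (the unregularized VAR), while larger values of `lam` push more entries of the transition matrix to exactly zero. A zero entry in row i, column j means that series j does not help predict series i in the fitted model, which is how sparsity connects to Granger causality.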

A study on sparse methods for PLS-DA

2019-01-01T00:00:00Z

Title: A study on sparse methods for PLS-DA Abstract: The term Discriminant Analysis (DA) refers to a collection of multivariate statistical techniques used to classify entities into a number of pre-defined groups. DA techniques follow two main steps: the discrimination step and the classification step. The former involves the formation of a boundary which maximizes separation between the groups considered. The latter then uses the information obtained from the discrimination step to predict the group membership of any new entities. The main focus will be on Fisher's Linear Discriminant Analysis (LDA), which considers a linear boundary for separation between groups. LDA encounters a number of issues, such as the presence of multicollinearity in the attributes, when dealing with high-dimensional data, where the sample size n is smaller than the number of attributes p. A possible solution to this problem is to introduce regularization techniques such as Dimension Reduction (DR) methods that reduce the p-dimensional attributes to a lower dimension q, where q < p. Amongst the most popular of this group of methods is the Partial Least Squares (PLS) method, which extends LDA to a high-dimensional setting. This hybrid method is known as PLS-DA and it is the main protagonist of this study. Further modification of PLS-DA is considered through a concept known as sparsity. Sparsity in PLS-DA involves the application of penalization methods such as LASSO and Ridge Regression to shrink and select the most influential attributes, producing a technique known as Sparse Partial Least Squares Discriminant Analysis. There are two different sparse PLS-DA methods, known as SPLSDA and sPLS-DA, which differ in the order of variable selection, dimension reduction and classification. We refer to them as Sparse Method 1 and Sparse Method 2, respectively.
Various measures of classification ability, and the parameter estimates chosen, are discussed and applied to two real data sets to determine whether sparsity improves classification ability and interpretability, and whether the two Sparse Methods differ in performance. Description: B.SC.(HONS)STATS.&OP.RESEARCH
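
To make the idea of sparsity in PLS-DA concrete, here is a deliberately simplified sketch. It is neither SPLSDA nor sPLS-DA as studied in the dissertation: it assumes two groups coded +1 and -1, a single PLS component, a soft-threshold set to a fraction `eta` of the largest weight, and nearest-centroid classification on the score. All names (`sparse_pls_da_fit`, `sparse_pls_da_predict`, `eta`) are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam, returning exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def sparse_pls_da_fit(X, y, eta=0.5):
    """One-component sparse PLS-DA sketch.

    X: list of rows (one list of p attributes per entity); y: labels coded +1 / -1.
    eta in [0, 1): fraction of the largest |weight| used as the threshold.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_bar = sum(y) / n
    # The first PLS direction is proportional to the covariance of each attribute with y.
    w = [sum((X[i][j] - means[j]) * (y[i] - y_bar) for i in range(n)) for j in range(p)]
    # Sparsity step: attributes weakly related to the grouping get weight exactly zero.
    cut = eta * max(abs(v) for v in w)
    w = [soft_threshold(v, cut) for v in w]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Dimension reduction: project every entity onto the single sparse direction.
    scores = [sum((X[i][j] - means[j]) * w[j] for j in range(p)) for i in range(n)]
    centroids = {}
    for label in set(y):
        group = [s for s, lab in zip(scores, y) if lab == label]
        centroids[label] = sum(group) / len(group)
    return means, w, centroids


def sparse_pls_da_predict(model, row):
    """Assign a new entity to the group whose score centroid is closest."""
    means, w, centroids = model
    score = sum((row[j] - means[j]) * w[j] for j in range(len(w)))
    return min(centroids, key=lambda g: abs(score - centroids[g]))
```

The attributes whose weight survives the threshold are the ones selected as influential, which is the interpretability gain that sparse methods aim for.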

Using tree-based methods for churn-related problems in I-Gaming

2019-01-01T00:00:00Z

Title: Using tree-based methods for churn-related problems in I-Gaming Abstract: Customer churn occurs when an existing client stops doing business with a company. For example, this could mean closing some type of account, cancelling a subscription or membership, or not renewing a contract. In the betting industry, one main reason for churning is self-exclusion. The focus of this dissertation is on the use of tree-based methods for classification, specifically for classifying the reasons for customer churn, and self-exclusion. The theory behind the classification and regression tree algorithm will be discussed, with particular attention to classification trees. Decision trees will be used as building blocks to understand the theory behind random forests and boosting, two techniques that are constructed from an ensemble of decision trees. The performance of these classification methods will be explored by applying them to two real-life datasets from the betting industry. The performance of these three techniques will be compared to that of logistic regression, the benchmark statistical model for classification. Description: B.SC.(HONS)STATS.&OP.RESEARCH
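
The building block of a classification tree is the search for the split that best separates the classes. The sketch below shows one common way to do this, using the Gini impurity as the splitting criterion; the abstract does not state which criterion the dissertation uses, so that choice is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature index, threshold, weighted Gini) for the best split x[j] <= threshold.

    rows: list of numeric feature lists; labels: class label per row.
    If no split improves on the unsplit node, the result is (None, None, gini(labels)).
    """
    n = len(rows)
    best = (None, None, gini(labels))
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [lab for row, lab in zip(rows, labels) if row[j] <= thr]
            right = [lab for row, lab in zip(rows, labels) if row[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, thr, score)
    return best
```

A full tree applies this search recursively to each child node. Random forests and boosting then combine many such trees, the former by averaging trees grown on bootstrap samples and the latter by fitting trees sequentially.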

Analysing dichotomous and polytomous responses to items related to xenophobia using item response theory

2019-01-01T00:00:00Z

Title: Analysing dichotomous and polytomous responses to items related to xenophobia using item response theory Abstract: Item response theory (IRT) has many research applications, particularly in psychology. The idea behind IRT is that the probability of a response to an item is a mathematical function of person and item parameters. The person parameter is a single latent trait which cannot be measured directly, for example a personality trait such as attitude, ability, perception or behaviour. The item parameters include the difficulty of the item (known as the ‘location’, which represents its location on the difficulty scale) and the discrimination of the item (known as the ‘slope’, which represents how steeply individuals’ responses to an item vary with their latent personality trait). There are different types of IRT models, including dichotomous and polytomous IRT models. The former are appropriate when the response to an item has two possible categories. These include the 1-parameter (1-PL) logistic model, known as the Rasch model, and the 2-parameter (2-PL) logistic model. The latter are appropriate when the response to an item has an ordinal categorical (Likert) scale. These include the Partial Credit model (PCM) and the Rating Scale model (RSM). A number of local and foreign participants will be asked to respond to a number of items related to xenophobia, and to provide other demographic and psychographic details. All the above models will be fitted to this dataset using the facilities of GLLAMM, which is a subroutine of STATA®. Description: M.SC.STATISTICS
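
The response functions named in this abstract can be written down directly. The sketch below uses the standard textbook forms of the 2-PL model and the Partial Credit model; it is an illustration only and does not reproduce the GLLAMM fitting procedure. The function names and the symbols `theta` (latent trait), `a` (slope), `b` (location) and `deltas` (step parameters) are choices made for this example.

```python
import math


def prob_2pl(theta, a, b):
    """2-PL probability of endorsing a dichotomous item.

    theta: person's latent trait; a: item slope (discrimination); b: item location.
    Fixing a = 1 gives the 1-PL form.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def prob_pcm(theta, deltas):
    """Partial Credit model probabilities for response categories 0..m.

    deltas: the m step parameters of the item, one per transition between
    adjacent categories.
    """
    # Category k has log-weight sum_{h <= k} (theta - delta_h); category 0 has 0.
    cumulative = [0.0]
    for delta in deltas:
        cumulative.append(cumulative[-1] + (theta - delta))
    top = max(cumulative)  # subtracted for numerical stability
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]
```

With a single step parameter, `prob_pcm` reduces to the 1-PL model. The Rating Scale model can be obtained from the same function by writing each step parameter as an item location plus a threshold shared across items.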