With money laundering presenting a threat to the world economy, global Anti-Money Laundering (AML) guidelines were drawn up to address, and to a certain extent prevent, the damage caused by this activity.
In fact, the EU has recently adopted the Fifth Anti-Money Laundering Directive (5AMLD) to counter money laundering and the financing of terrorism, a law which applies to exchanges that allow users to trade crypto-to-crypto or crypto-to-fiat, and to wallet service providers.
Given the rise in popularity of cryptocurrencies, new algorithms have been developed to detect money laundering more quickly and more effectively, adding more layers of complexity to the systems used.
A paper written by Dylan Vassallo and Dr Vincent Vella (from the Department of Artificial Intelligence) and Dr Joshua Ellul (from the Centre for Distributed Ledger Technologies and the Department of Computer Science), titled “Application of Gradient Boosting Algorithms for Anti-money Laundering in Cryptocurrencies”, identifies three areas which require further exploration in this regard.
Firstly, there is no comparative analysis investigating tree-based ensembles, which are the type of models used in this domain.
Secondly, the data-sampling techniques applied either remove useful information, in the case of random undersampling, or increase overfitting, in the case of random oversampling, because exact copies of existing samples are created.
Thirdly, model drift in cryptocurrency transactional data is often overlooked.
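To make the second point concrete, the sketch below shows the two naive sampling strategies on a toy imbalanced dataset. It is only an illustration and not the authors' code: the function names, the "licit"/"illicit" labels and the ten-row dataset are all assumptions made for the example.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the smallest one.

    The dropped majority-class rows may have carried useful information.
    """
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(group) for group in groups.values())
    return [(row, label)
            for label, group in groups.items()
            for row in rng.sample(group, target)]


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows until every class is as large as the largest one.

    The added rows are exact copies, which encourages overfitting.
    """
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(group) for group in groups.values())
    balanced = []
    for label, group in groups.items():
        copies = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend((row, label) for row in group + copies)
    return balanced


# Hypothetical toy data: 8 licit transactions (ids 0-7), 2 illicit (ids 8-9).
rows = list(range(10))
labels = ["licit"] * 8 + ["illicit"] * 2

under = random_undersample(rows, labels)
over = random_oversample(rows, labels)

print(Counter(label for _, label in under))
print(Counter(label for _, label in over))
```

After undersampling, six of the eight licit rows are gone. After oversampling, the illicit class has eight rows but still only two distinct transactions, so a model can simply memorise them.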
The article sheds light on techniques that help mitigate these problems: it compares the performance of state-of-the-art online and offline gradient boosting algorithms, and aims to improve the detection of illicit activities through data sampling techniques and more effective handling of concept drift.
The researchers propose an innovative adaptation of XGBoost, named Adaptive Stacked eXtreme Gradient Boosting (ASXGB), that improves the handling of concept drift.
By comparing ASXGB against state-of-the-art adaptive learners, they plan to reduce the false-negative rate.
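For readers unfamiliar with the metric, the false-negative rate is the share of genuinely illicit transactions that a model fails to flag, FN / (FN + TP). The snippet below is a minimal sketch of that standard definition. The function name, the labels and the sample predictions are hypothetical and are not taken from the paper.

```python
def false_negative_rate(y_true, y_pred, positive="illicit"):
    """Return FN / (FN + TP): the fraction of positive cases the model missed."""
    false_negatives = sum(
        1 for truth, pred in zip(y_true, y_pred)
        if truth == positive and pred != positive
    )
    true_positives = sum(
        1 for truth, pred in zip(y_true, y_pred)
        if truth == positive and pred == positive
    )
    total_positives = false_negatives + true_positives
    if total_positives == 0:
        raise ValueError("y_true contains no positive cases")
    return false_negatives / total_positives


# Hypothetical example: 4 illicit transactions, one of which is missed.
y_true = ["illicit", "illicit", "illicit", "illicit", "licit", "licit"]
y_pred = ["illicit", "licit", "illicit", "illicit", "licit", "illicit"]

fnr = false_negative_rate(y_true, y_pred)
print(fnr)  # 0.25
```

The licit transaction wrongly flagged as illicit is a false positive and does not affect this metric. In an AML setting, a false negative is the more costly error because it means laundering goes undetected.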
It is proposed that future work should also take into consideration potential memory issues attributed to the proposed method.
You may read the paper, published on Springer Link, in its entirety.
All the software developed leading to this study has been open-sourced and made publicly available on GitHub, together with sampled data found on Google Drive.
