Please use this identifier to cite or link to this item:
/library/oar/handle/123456789/140085
| Title: | Analysing script features for performance forecasting in film production : a data-driven approach leveraging NLP |
| Authors: | Cini, Karl (2025) |
| Keywords: | Motion picture plays; Natural language processing (Computer science); Machine Learning; Regression analysis; Sentiment analysis |
| Issue Date: | 2025 |
| Citation: | Cini, K. (2025). Analysing script features for performance forecasting in film production : a data-driven approach leveraging NLP (Master’s dissertation). |
| Abstract: | The film industry has always been an important entertainment avenue for audiences of all ages. Following the impact of the COVID-19 pandemic, the industry has undergone significant changes due to the rise of streaming platforms, yet thousands of films are produced every year and billions of dollars are generated annually worldwide. With the number of viewers expected to reach 1.9 billion by 2029, the film industry remains a strong inspiration for writers and producers alike. Despite the shift to streaming, demand for good-quality scripts remains a core element of this industry, rendering the screenplay a pivotal tool in deciding whether or not to green-light a movie. This dissertation explores the application of Natural Language Processing (NLP) and Machine Learning (ML) techniques to analyse movie scripts, with the aim of extracting valuable insights and patterns that can predict the audience rating as collated by the Internet Movie Database (IMDb). The research investigates methods for concatenating features extracted from scripts with information known at the evaluation stage, compiling a vector that serves as the basis for training an ML model. It attempts to address the problem producers face in deciding which movies to invest their funds in. Given a sound method to sift through and rank the various script projects presented to them, producers can focus on the scripts that are likely to perform better. Methods adopted in this research include the use of lexicons for the extraction of linguistic features, the analysis of emotional arcs in movies, embedding strategies for the script, and statistical features generated from sentiment analysis. These features are concatenated with writer-, producer-, director- and actor-specific factors to train various regression models. 
Using a forward rolling-window training strategy, the models are tested on previously unseen data. The best-performing model achieves an R² of 0.5255, a MAPE of 0.1183, an RMSE of 0.7604, and an MAE of 0.5859 on the regression metrics, and a one-away score of 0.8476, an accuracy of 0.8375, and an F1-score of 0.7913 on the classification metrics. |
| Description: | M.Sc.(Melit.) |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/140085 |
| Appears in Collections: | Dissertations - FacICT - 2025 |
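
The evaluation terms used in the abstract can be illustrated with a minimal Python sketch. This is not the dissertation's code: the function names, the use of release years as the rolling unit, the fixed training span, and the reading of "one-away" as "predicted class within one step of the true class" are all assumptions made for illustration.

```python
from math import sqrt

def rolling_window_splits(years, train_span):
    """Yield (train_years, test_year) pairs in which training data
    always precedes the test year (hypothetical helper)."""
    ordered = sorted(set(years))
    for i in range(train_span, len(ordered)):
        yield ordered[i - train_span:i], ordered[i]

def regression_metrics(y_true, y_pred):
    """Compute R^2, MAPE, RMSE and MAE from two equal-length lists."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "mape": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }

def one_away_score(true_classes, pred_classes):
    """Share of predictions at most one ordinal class away from the truth
    (assumed definition; the abstract does not spell it out)."""
    hits = sum(abs(t - p) <= 1 for t, p in zip(true_classes, pred_classes))
    return hits / len(true_classes)

if __name__ == "__main__":
    # Toy data only; these are not results from the dissertation.
    for train_years, test_year in rolling_window_splits(range(2018, 2023), 3):
        print(train_years, "->", test_year)
    print(regression_metrics([6.0, 7.0, 8.0], [6.5, 7.0, 7.5]))
    print(one_away_score([1, 2, 3, 4], [1, 3, 1, 4]))
```

Training only on windows that precede the test period keeps information from later releases out of the model, which is presumably the motivation for a forward rolling strategy over a random split.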
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 2518ICTICT501205075940_1.PDF | | 4.65 MB | Adobe PDF | View/Open |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
