Deep Learning for Effective Classification and Information Extraction of Financial Documents

Valentin-Adrian Serbanescu, Maruf A. Dhali*

*Corresponding author for this work

Research output: Academic, peer-reviewed

Abstract

The financial and accounting sectors face growing demands to manage large volumes of documents in today’s digital environment. Meeting this demand is crucial for accurate archiving, for maintaining efficiency and competitiveness, and for ensuring operational excellence in the industry. This study proposes and analyzes machine learning-based pipelines that classify scanned and photographed financial documents, such as invoices, receipts, and bank statements, and extract information from them. It also addresses the challenges of processing financial documents with deep learning techniques. The research explores several models: LeNet5, VGG19, and MobileNetV2 for document classification, and RoBERTa, LayoutLMv3, and GraphDoc for information extraction. The models are trained and tested on financial documents from previously available benchmark datasets and on a new dataset of financial documents in Romanian. Results show that MobileNetV2 excels in classification (accuracies of 99.24% with data augmentation and 93.33% without), while RoBERTa and LayoutLMv3 lead in extraction (F1-scores of 0.7761 and 0.7426, respectively). Despite the challenges posed by the imbalanced dataset and cross-language documents, the proposed pipeline shows potential for automating the processing of financial documents in the relevant sectors.
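
The abstract reports accuracy for classification and F1-score for extraction but does not say how either is computed. As a rough illustration only, the sketch below shows one common way to compute the two metrics. It assumes exact-match (field, value) pairs and micro-averaging for F1; the function names, field names, and toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted class equals the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def micro_f1(gold_fields, pred_fields):
    """Micro-averaged F1 over exact-match (field, value) pairs.

    Each argument holds one list of (field_name, value) tuples per document.
    This averaging scheme is an assumption, not the paper's stated protocol.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_fields, pred_fields):
        gold_counts, pred_counts = Counter(gold), Counter(pred)
        # Multiset intersection counts pairs present in both gold and prediction.
        overlap = sum((gold_counts & pred_counts).values())
        tp += overlap
        fp += sum(pred_counts.values()) - overlap
        fn += sum(gold_counts.values()) - overlap
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: four classified documents, two extracted documents.
labels = ["invoice", "receipt", "bank_statement", "invoice"]
preds = ["invoice", "receipt", "invoice", "invoice"]

gold = [
    [("total", "120.00"), ("date", "2024-01-05")],
    [("total", "15.50")],
]
extracted = [
    [("total", "120.00"), ("date", "2024-01-06")],
    [("total", "15.50"), ("vendor", "ACME")],
]

print(f"accuracy = {accuracy(labels, preds):.4f}")
print(f"micro-F1 = {micro_f1(gold, extracted):.4f}")
```

With an imbalanced dataset, as mentioned in the abstract, plain accuracy can hide weak performance on rare classes, so per-class or macro-averaged scores are often reported alongside it.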
Original language: English
Title of host publication: Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods
Editors: Modesto Castrillon-Santana, Maria De Marsico, Ana Fred
Place of publication: Porto, Portugal
Publisher: SciTePress
Pages: 749-756
Number of pages: 7
Volume: 1
ISBN (electronic): 978-989-758-730-6
Status: Published - 2025

Publication series

ISSN (electronic): 2184-4313
