Table of contents detection in financial documents

Published in COLING, 2020

Recommended citation: Kosmajac, Dijana and Saeidi, Mozhgan and Taylor, Stacey. (2020). "Table of contents detection in financial documents." Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation, pp. 169-173. https://aclanthology.org/2020.fnp-1.29/

This paper addresses specific characteristics of document structure detection. Title detection and table-of-contents generation are two important components of this task. We show that, using Tesseract with Levenshtein distance, we classified titles with F1 scores of 0.73 and 0.87, and tables of contents with harmonic means of 0.36 and 0.39, in English and French respectively.
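
To illustrate how Levenshtein distance can help match noisy OCR output against known headings, here is a minimal Python sketch. It is not the pipeline from the paper: the function names (`levenshtein`, `similarity`, `best_match`), the length-based normalisation, and the 0.8 threshold are all assumptions chosen for illustration, and the OCR step itself is omitted.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Distance normalised to [0, 1]; 1.0 means identical (assumed scheme)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(ocr_line: str, candidates: Sequence[str],
               threshold: float = 0.8) -> Optional[str]:
    """Return the closest candidate heading, or None below the threshold.

    The threshold value is a hypothetical default, not taken from the paper.
    """
    if not candidates:
        return None
    score, candidate = max(
        ((similarity(ocr_line.lower(), c.lower()), c) for c in candidates),
        key=lambda pair: pair[0],
    )
    return candidate if score >= threshold else None
```

For example, `best_match("Rapp0rt annuel", ["Rapport annuel", "Bilan consolidé"])` tolerates the single OCR substitution and selects the first heading, while an unrelated line falls below the threshold and yields `None`.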