TY - JOUR
T1 - Document understanding for a broad class of documents
AU - Aiello, Marco
AU - Monz, Christof
AU - Todoran, Leon
AU - Worring, Marcel
N1 - Relation: http://www.rug.nl/informatica/organisatie/overorganisatie/iwi
Rights: University of Groningen. Research Institute for Mathematics and Computing Science (IWI)
PY - 2002
Y1 - 2002
AB - We present a document analysis system able to assign logical labels and extract the reading order in a broad set of documents. All information sources, from geometric features and spatial relations to textual features and content, are employed in the analysis. To deal effectively with these information sources, we define a document representation general and flexible enough to represent complex documents. To handle such a broad document class, the system uses only generic document knowledge, which is identified explicitly. The proposed system integrates components based on computer vision, artificial intelligence, and natural language processing techniques. The system is fully implemented, and experimental results on heterogeneous collections of documents are presented for each component and for the entire system.
KW - Natural language processing
KW - Qualitative spatial reasoning
KW - Reading order detection
KW - Logical object classification
KW - Document understanding
U2 - 10.1007/s10032-002-0080-x
DO - 10.1007/s10032-002-0080-x
M3 - Article
VL - 5
SP - 1
EP - 16
JO - International Journal on Document Analysis and Recognition
JF - International Journal on Document Analysis and Recognition
SN - 1433-2825
ER -