This is neat. Over at Docsumo, I've had fun building one of the pipelines [0] to extract tables from any kind of document.
Our older pipelines used image-processing-based approaches. However, they baked in too many assumptions (for instance, about header text, column types, etc.).
Now we've moved on to an ML-based approach, training generic table structure recognition models that can be applied to a variety of documents.
[0] - https://docsumo.com/free-tools/extract-tables-from-pdf-image...