I am working on an OCR project and need to build a dataset from roughly 1,247 pages across 6 books. To train a model, I need to crop each page image into individual text lines and transcribe each line. What is the best approach for accurately detecting text lines in scans of old book pages?
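For reference, here is a minimal sketch of one common baseline for this step, the horizontal projection profile. It assumes each page has already been binarized and deskewed into a 2D grid where 1 marks an ink pixel. The function names (`find_line_bands`, `crop_lines`) and the `min_ink` and `min_height` thresholds are illustrative assumptions, not part of any library, and this is only a starting point: old pages with skew, bleed-through, or marginalia may need more robust layout analysis.

```python
def find_line_bands(page, min_ink=1, min_height=2):
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    page: list of rows, each a list of 0/1 ints where 1 marks an ink pixel.
    min_ink: hypothetical threshold, rows with fewer ink pixels count as blank.
    min_height: hypothetical threshold, shorter bands are dropped as specks.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in page]

    bands = []
    start = None  # top row of the band currently being scanned, if any
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y  # entering a run of inked rows
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))  # leaving a tall-enough run
            start = None

    # Close a band that runs to the bottom edge of the page.
    if start is not None and len(profile) - start >= min_height:
        bands.append((start, len(profile)))
    return bands


def crop_lines(page, bands):
    """Slice the page into one sub-image (list of rows) per detected band."""
    return [page[top:bottom] for top, bottom in bands]
```

Each cropped band can then be saved as its own image and paired with its transcription. The `min_height` filter is a deliberately crude way to ignore isolated specks, which are common on aged paper.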
Best practices