How do I extract a paragraph's position, the same way we extract positions in Word documents?

1 comment
Hi Praveen,

The most powerful way to extract a paragraph’s position and other data from a PDF document is the iText 7 add-on pdf2Data, which also has an online demo: https://pdf2data.online/

This Stack Overflow answer by iText’s Alexey Subach may also help: https://stackoverflow.com/questions/55807256/how-can-i-get-the-position-of-the-specified-keyword-in-itext7

While pdf2Data is the optimal approach, you can do basic extractions with iText 7 Core using a regular expression:

    PdfDocument pdfDocument = new PdfDocument(new PdfReader(inputFile));
    ILocationExtractionStrategy strategy = new RegexBasedLocationExtractionStrategy("regular expression");
    PdfCanvasProcessor canvasProcessor = new PdfCanvasProcessor(strategy);
    // Process the first page; repeat for other pages as needed
    canvasProcessor.processPageContent(pdfDocument.getPage(1));
    pdfDocument.close();
    // Now contains all the locations of the matching text
    strategy.getResultantLocations();

If you want an answer for your specific case, it is better to post a more detailed question on Stack Overflow, pointing out what you have tried and where you are stuck. If you have a commercial license, you also have access to iText customer support over Jira.

Kind regards,
Kenneth Holvoet
iText Software
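If the matches come back as several rectangles (for example, one per line of a paragraph), one way to get a single paragraph position is to merge them into one enclosing box. The sketch below is only an illustration of that idea: `Box` and `ParagraphBounds` are hypothetical stand-ins, not iText types, and it assumes each rectangle is given as a lower-left corner plus width and height in PDF coordinates (origin at the bottom-left of the page).

```java
import java.util.List;

// Hypothetical stand-in for a matched text rectangle (not an iText class):
// lower-left corner plus size, in PDF user-space units.
final class Box {
    final float x, y, width, height;

    Box(float x, float y, float width, float height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }
}

final class ParagraphBounds {
    // Returns the smallest box that encloses every input box.
    static Box union(List<Box> boxes) {
        if (boxes.isEmpty()) {
            throw new IllegalArgumentException("no boxes to merge");
        }
        float left = Float.MAX_VALUE, bottom = Float.MAX_VALUE;
        float right = -Float.MAX_VALUE, top = -Float.MAX_VALUE;
        for (Box b : boxes) {
            left = Math.min(left, b.x);
            bottom = Math.min(bottom, b.y);
            right = Math.max(right, b.x + b.width);
            top = Math.max(top, b.y + b.height);
        }
        return new Box(left, bottom, right - left, top - bottom);
    }
}
```

In practice, the input list would be built from whatever rectangles the extraction step reports for the lines of one paragraph; how lines are grouped into paragraphs is left open here.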