How do I extract a paragraph's position from a PDF, the same way we extract positions in Word documents?

Question


Praveen kumar V.


How do I extract a paragraph's position from a PDF, the same way we extract positions in Word documents?

Asked about 4 years ago

Other Integrated Development Environments (IDE)


Kenneth H. · Answer 1 · 2021-09-23T06:04:08-05:00

Hi Praveen,

The most powerful way to extract a paragraph’s position and other data from a PDF document is the iText 7 add-on pdf2Data, which also has an online demo: https://pdf2data.online/

Maybe this Stack Overflow answer by iText’s Alexey Subach can help you: https://stackoverflow.com/questions/55807256/how-can-i-get-the-position-of-the-specified-keyword-in-itext7

While pdf2Data is the optimal approach, you can do basic extractions with iText 7 Core using a regular expression:

    PdfDocument pdfDocument = new PdfDocument(new PdfReader(inputFile));
    ILocationExtractionStrategy strategy = new RegexBasedLocationExtractionStrategy("regular expression");
    PdfCanvasProcessor canvasProcessor = new PdfCanvasProcessor(strategy);
    canvasProcessor.processPageContent(pdfDocument.getPage(1));
    pdfDocument.close();
    strategy.getResultantLocations(); // now contains all the locations of the matching text

If you want an answer for your specific case, then it is better to post a more detailed question on Stack Overflow pointing out what you have tried and where you are stuck. If you have a commercial license, you will also have access to iText customer support over Jira.

Kind regards,
Kenneth Holvoet
iText Software
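The regular-expression approach in the answer yields locations for matched text, not for whole paragraphs. One possible way to get from line-level rectangles to a paragraph position is to merge lines that sit close together vertically. The sketch below is only an illustration of that idea in plain Java: `ParagraphBounds`, `Box`, `group`, and the `maxGap` threshold are hypothetical names and are not part of iText. It assumes each line's rectangle has already been reduced to x, y, width, and height in PDF coordinates (origin at the bottom left), and that a larger vertical gap means a new paragraph, which will not hold for every layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of iText: merges line boxes into paragraph boxes.
final class ParagraphBounds {

    // Axis-aligned box in PDF user space (origin bottom-left, y grows upward).
    static final class Box {
        final float x, y, width, height;

        Box(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        float top() { return y + height; }

        float right() { return x + width; }

        // Smallest box that covers both this box and the other one.
        Box union(Box other) {
            float left = Math.min(x, other.x);
            float bottom = Math.min(y, other.y);
            float r = Math.max(right(), other.right());
            float t = Math.max(top(), other.top());
            return new Box(left, bottom, r - left, t - bottom);
        }
    }

    // Walks the lines from the top of the page down. A vertical gap larger
    // than maxGap (an assumed, layout-dependent threshold) starts a new paragraph.
    static List<Box> group(List<Box> lines, float maxGap) {
        List<Box> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparingDouble((Box b) -> -b.top()));

        List<Box> paragraphs = new ArrayList<>();
        Box current = null;
        for (Box line : sorted) {
            if (current != null && current.y - line.top() <= maxGap) {
                current = current.union(line);
            } else {
                if (current != null) {
                    paragraphs.add(current);
                }
                current = line;
            }
        }
        if (current != null) {
            paragraphs.add(current);
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        // Made-up line boxes: two lines close together, then one further down.
        List<Box> lines = Arrays.asList(
                new Box(72f, 700f, 400f, 12f),
                new Box(72f, 686f, 380f, 12f),
                new Box(72f, 650f, 300f, 12f));
        for (Box p : group(lines, 6f)) {
            System.out.println(p.x + ", " + p.y + ", " + p.width + ", " + p.height);
        }
    }
}
```

The threshold is the weak point of this heuristic: it usually needs tuning per document, and multi-column pages would need the lines split by column first. For anything beyond simple layouts, the pdf2Data add-on mentioned in the answer is likely the better fit.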
