How do I extract the paragraph position, the same way as we extract positions in Word documents?

Question

Praveen kumar V.

How do I extract the paragraph position, the same way as we extract positions in Word documents?

Asked about 4 years ago

Other Integrated Development Environments (IDE)

Kenneth H. · Answer 1 · 2021-09-23T06:04:08-05:00

Hi Praveen,

The most powerful way to extract the position of a paragraph, and other data, from a PDF document is the pdf2Data add-on for iText 7, which also has an online demo: https://pdf2data.online/

This Stack Overflow answer by Alexey Subach of iText may also help: https://stackoverflow.com/questions/55807256/how-can-i-get-the-position-of-the-specified-keyword-in-itext7

Although pdf2Data is the optimal approach, you can do basic extraction with iText 7 Core using a regular expression:

    import com.itextpdf.kernel.pdf.PdfDocument;
    import com.itextpdf.kernel.pdf.PdfReader;
    import com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor;
    import com.itextpdf.kernel.pdf.canvas.parser.listener.ILocationExtractionStrategy;
    import com.itextpdf.kernel.pdf.canvas.parser.listener.RegexBasedLocationExtractionStrategy;

    // inputFile is the path to the source PDF
    PdfDocument pdfDocument = new PdfDocument(new PdfReader(inputFile));
    // "regular expression" is a placeholder; substitute your own pattern
    ILocationExtractionStrategy strategy = new RegexBasedLocationExtractionStrategy("regular expression");
    PdfCanvasProcessor canvasProcessor = new PdfCanvasProcessor(strategy);
    canvasProcessor.processPageContent(pdfDocument.getPage(1)); // first page only
    pdfDocument.close();
    strategy.getResultantLocations(); // now contains all locations of the matching text

If you have a commercial license, you will also have access to iText customer support through Jira.

Kind regards,
Kenneth Holvoet
iText Software
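As a follow-up to the regex approach above: a paragraph usually spans several lines, so you may end up with several rectangles rather than one. Below is a minimal sketch of merging them into a single bounding box that could serve as the "paragraph position". It uses plain Java only. `Box` and `ParagraphBounds` are hypothetical names, not iText classes. `Box` is assumed to follow the same convention as iText's `Rectangle`: lower-left corner plus width and height, in PDF user space where y grows upward. The sample coordinates are made up for illustration.

```java
import java.util.List;

final class ParagraphBounds {

    // Hypothetical stand-in for a text rectangle (not an iText class).
    // (x, y) is the lower-left corner; y grows upward as in PDF user space.
    static final class Box {
        final float x, y, width, height;

        Box(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        @Override
        public String toString() {
            return "Box[x=" + x + ", y=" + y + ", width=" + width + ", height=" + height + "]";
        }
    }

    // Returns the smallest box that encloses every box in the list.
    static Box union(List<Box> boxes) {
        if (boxes.isEmpty()) {
            throw new IllegalArgumentException("no boxes to merge");
        }
        float left = Float.POSITIVE_INFINITY;
        float bottom = Float.POSITIVE_INFINITY;
        float right = Float.NEGATIVE_INFINITY;
        float top = Float.NEGATIVE_INFINITY;
        for (Box b : boxes) {
            left = Math.min(left, b.x);
            bottom = Math.min(bottom, b.y);
            right = Math.max(right, b.x + b.width);
            top = Math.max(top, b.y + b.height);
        }
        return new Box(left, bottom, right - left, top - bottom);
    }

    public static void main(String[] args) {
        // Three made-up line rectangles belonging to one paragraph.
        List<Box> lines = List.of(
                new Box(72f, 700f, 200f, 12f),
                new Box(72f, 686f, 180f, 12f),
                new Box(72f, 672f, 90f, 12f));
        System.out.println(union(lines));
    }
}
```

With iText you would presumably fill the list from the rectangle of each location returned by `getResultantLocations()`. Deciding which locations belong to the same paragraph is a separate problem that this sketch does not attempt to solve.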
