Display and Extract Text from PDFs in React.js Using `react-pdf`
To read a PDF file in a React.js application, you can use libraries such as `pdfjs-dist` (Mozilla's PDF.js) or `react-pdf`. Below, I’ll walk you through a simple example using `react-pdf`, a React wrapper around `pdfjs-dist` that makes it easy to load and display PDF files. For extracting text specifically, it is usually simpler to call `pdfjs-dist` directly.
Reading PDF using react-pdf
Install Dependencies
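First, install `react-pdf` and `pdfjs-dist`. `react-pdf` already depends on `pdfjs-dist`; the direct install is only needed because the text-extraction example below imports it, so keep its version in line with the one `react-pdf` uses and both will share a single copy of PDF.js.

```bash
npm install react-pdf pdfjs-dist
```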
Set Up the Component
Create a component that will load and display the PDF file.
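```jsx
// src/components/PdfReader.js
import React, { useState } from 'react';
// react-pdf v7+ exports its components from the package root
// (the old 'react-pdf/dist/esm/entry.webpack' entry point was removed).
import { Document, Page, pdfjs } from 'react-pdf';
// Styles for the text and annotation layers that <Page> renders by default
// (paths as of react-pdf v8+).
import 'react-pdf/dist/Page/TextLayer.css';
import 'react-pdf/dist/Page/AnnotationLayer.css';

// PDF.js parses documents in a Web Worker; load the worker that matches the
// bundled pdfjs-dist version. pdfjs-dist releases before 4.x ship
// `pdf.worker.min.js` instead of `.mjs`.
pdfjs.GlobalWorkerOptions.workerSrc = `//unpkg.com/pdfjs-dist@${pdfjs.version}/build/pdf.worker.min.mjs`;

const PdfReader = () => {
  const [file, setFile] = useState(null);
  const [numPages, setNumPages] = useState(null);

  // Called once PDF.js has parsed the document.
  const onDocumentLoadSuccess = ({ numPages }) => {
    setNumPages(numPages);
  };

  const handleFileChange = (event) => {
    setNumPages(null); // forget the previous file's page count
    setFile(event.target.files[0] ?? null);
  };

  return (
    <div>
      <input type="file" accept="application/pdf" onChange={handleFileChange} />
      {file && (
        <Document file={file} onLoadSuccess={onDocumentLoadSuccess}>
          {/* One <Page> per page; renders nothing until numPages is known */}
          {Array.from({ length: numPages ?? 0 }, (_, index) => (
            <Page key={`page_${index + 1}`} pageNumber={index + 1} />
          ))}
        </Document>
      )}
    </div>
  );
};

export default PdfReader;
```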
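The `Array.from` call in the component turns the page count into 1-based page numbers. Pulled out as a plain function, the not-yet-loaded case becomes explicit: `numPages` starts as `null`, and written as `new Array(numPages)`, a `null` count gives the one-element array `[null]` rather than an empty one. A minimal sketch; the name `pageNumbers` is hypothetical and not part of `react-pdf`:

```javascript
// Hypothetical helper: map a page count to 1-based page numbers.
// Returns an empty array while the count is still unknown (null/undefined).
function pageNumbers(numPages) {
  // Array.from with a { length } object avoids the new Array(null) pitfall,
  // which yields [null] instead of [].
  return Array.from({ length: numPages ?? 0 }, (_, index) => index + 1);
}

console.log(pageNumbers(3)); // [ 1, 2, 3 ]
console.log(pageNumbers(null)); // []
```

In the component, `pageNumbers(numPages).map((n) => <Page key={`page_${n}`} pageNumber={n} />)` would render the same pages.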
Use the Component
Use the `PdfReader` component in your application:

```jsx
// src/App.js
import React from 'react';
import PdfReader from './components/PdfReader';

function App() {
  return (
    <div className="App">
      <h1>PDF Reader</h1>
      <PdfReader />
    </div>
  );
}

export default App;
```
Extracting Text from PDF
If you need the text content of a PDF rather than a rendered view, work with `pdfjs-dist` directly, since `react-pdf` is geared toward display.
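Read PDF using `pdfjs-dist`
Using `pdfjs-dist`, you can load the PDF and read the text content of each page.

```jsx
import React, { useState } from 'react';
import * as pdfjsLib from 'pdfjs-dist';

// PDF.js parses documents in a Web Worker. Load the worker that matches the
// installed pdfjs-dist version (4.x ships `.mjs` workers; earlier versions
// use `pdf.worker.min.js`).
pdfjsLib.GlobalWorkerOptions.workerSrc = `//cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjsLib.version}/pdf.worker.min.mjs`;

const PdfTextExtractor = () => {
  const [text, setText] = useState('');
  const [error, setError] = useState(null);

  const extractText = async (file) => {
    // File.arrayBuffer() replaces the FileReader callback.
    const data = new Uint8Array(await file.arrayBuffer());
    const pdf = await pdfjsLib.getDocument({ data }).promise;
    try {
      const pages = [];
      for (let i = 1; i <= pdf.numPages; i++) {
        const page = await pdf.getPage(i);
        const textContent = await page.getTextContent();
        // Each item is a text run; skip entries without `str` (marked content).
        pages.push(
          textContent.items
            .filter((item) => 'str' in item)
            .map((item) => item.str)
            .join(' ')
        );
      }
      return pages.join('\n\n'); // blank line between pages
    } finally {
      await pdf.destroy(); // release the document and its worker resources
    }
  };

  const handleFileChange = async (event) => {
    const file = event.target.files[0];
    if (!file) return; // the file dialog was cancelled
    setError(null);
    try {
      setText(await extractText(file));
    } catch (err) {
      setError(err.message);
      setText('');
    }
  };

  return (
    <div>
      <input type="file" accept="application/pdf" onChange={handleFileChange} />
      {error && <p>Could not read PDF: {error}</p>}
      <div>
        <h3>Extracted Text:</h3>
        {/* pre-wrap keeps the line breaks between pages */}
        <p style={{ whiteSpace: 'pre-wrap' }}>{text}</p>
      </div>
    </div>
  );
};

export default PdfTextExtractor;
```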
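The extractor joins every text item on a page with a single space, which flattens line breaks. In recent `pdfjs-dist` releases each text item also carries a `hasEOL` flag marking the end of a line, and when `getTextContent` is called with `includeMarkedContent` the list can contain marked-content entries that have no `str`. A minimal sketch of a join that keeps those line breaks, assuming that item shape; `itemsToText` is a hypothetical helper, not a PDF.js function:

```javascript
// Hypothetical helper: turn pdf.js getTextContent().items into plain text.
// Assumes text items shaped like { str, hasEOL }; entries without a string
// `str` (marked-content markers) are skipped. Items without hasEOL (older
// pdf.js versions) are simply joined with spaces.
function itemsToText(items) {
  return items
    .filter((item) => typeof item.str === "string")
    .map((item) => item.str + (item.hasEOL ? "\n" : " "))
    .join("")
    .replace(/[ \t]+/g, " ") // collapse runs of spaces/tabs
    .replace(/ *\n */g, "\n") // no stray spaces around line breaks
    .trim();
}

const sample = [
  { str: "Hello", hasEOL: false },
  { str: "world", hasEOL: true },
  { str: "Bye", hasEOL: false },
];
console.log(JSON.stringify(itemsToText(sample))); // "Hello world\nBye"
```

Inside the page loop, `itemsToText(textContent.items)` could stand in for the `map(...).join(' ')` expression. Like the original join, it may still insert a space inside a word that PDF.js split across two runs (for example at a font change).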
Use the Text Extractor Component
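Integrate this into your app the same way as `PdfReader`: import `PdfTextExtractor` (for example from `./components/PdfTextExtractor`, if you save it next to `PdfReader`) and render `<PdfTextExtractor />` inside `App`.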
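With these two components you can either view a PDF or extract its text in a React.js application. Keep in mind that parsing large PDFs can be CPU- and memory-intensive, and that text extraction is imperfect: scanned PDFs contain images rather than text, so `getTextContent()` returns nothing for them without OCR, and complex layouts such as multi-column pages or tables may come out in an unexpected reading order.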
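One way to keep long documents manageable is to extract page by page, report progress, and stop early when the user picks another file. The sketch below assumes only the parts of the PDF.js document object that the extractor already uses (`numPages`, `getPage`, and `getTextContent`), so it also runs against a stub; `extractPages`, its `onProgress` callback, and the `AbortSignal`-based cancellation are our own assumptions, not PDF.js APIs:

```javascript
// Hypothetical helper: extract text page by page from a pdf.js-like document,
// e.g. the object returned by `await pdfjsLib.getDocument({ data }).promise`.
// `pdf` only needs numPages and getPage(n) -> { getTextContent() }.
// onProgress(done, total) runs after each page; an aborted signal stops the
// loop before the next page is fetched.
async function extractPages(pdf, { onProgress = () => {}, signal } = {}) {
  const pages = [];
  for (let n = 1; n <= pdf.numPages; n++) {
    if (signal?.aborted) {
      const err = new Error("Text extraction was aborted");
      err.name = "AbortError";
      throw err;
    }
    const page = await pdf.getPage(n);
    const { items } = await page.getTextContent();
    pages.push(
      items
        .filter((item) => typeof item.str === "string") // text runs only
        .map((item) => item.str)
        .join(" ")
    );
    onProgress(n, pdf.numPages);
  }
  return pages;
}
```

In `PdfTextExtractor`, the file-change handler could create an `AbortController`, abort the previous one whenever a new file is chosen, and pass a state setter as `onProgress` to show "page n of m". Fetching pages one at a time keeps only one page's text content in flight, at the cost of not parallelising the work.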