Display and Extract Text from PDFs in React.js Using `react-pdf`
To read text from a PDF file in a React.js application, you can use libraries like `pdfjs-dist` or `react-pdf`. Below, I'll guide you through a simple example using the `react-pdf` library, which is a wrapper around `pdfjs-dist` and makes it easy to load and display PDF files. For extracting text specifically, you may need to rely directly on `pdfjs-dist`.
Reading PDF using react-pdf
- Install Dependencies

  First, install `react-pdf` and `pdfjs-dist`:

  ```bash
  npm install react-pdf pdfjs-dist
  ```
- Set Up Component

  Create a component that loads and displays the PDF file.

  ```jsx
  // src/components/PdfReader.js
  import React, { useState } from 'react';
  // Note: this entry point ships with older react-pdf releases (it was dropped in v7).
  // With newer releases, import from 'react-pdf' and configure the pdf.js worker yourself.
  import { Document, Page } from 'react-pdf/dist/esm/entry.webpack';

  const PdfReader = () => {
    const [file, setFile] = useState(null);
    const [numPages, setNumPages] = useState(null);

    // Called by <Document> once the PDF has been parsed
    const onDocumentLoadSuccess = ({ numPages }) => {
      setNumPages(numPages);
    };

    const handleFileChange = (event) => {
      setNumPages(null); // forget the page count of the previous file
      setFile(event.target.files[0]);
    };

    return (
      <div>
        <input type="file" accept="application/pdf" onChange={handleFileChange} />
        {file && (
          <Document file={file} onLoadSuccess={onDocumentLoadSuccess}>
            {/* Render nothing until numPages is known, then one <Page> per page */}
            {Array.from({ length: numPages || 0 }, (_, index) => (
              <Page key={`page_${index + 1}`} pageNumber={index + 1} />
            ))}
          </Document>
        )}
      </div>
    );
  };

  export default PdfReader;
  ```
- Use the Component

  Use the `PdfReader` component in your application.

  ```jsx
  // src/App.js
  import React from 'react';
  import PdfReader from './components/PdfReader';

  function App() {
    return (
      <div className="App">
        <h1>PDF Reader</h1>
        <PdfReader />
      </div>
    );
  }

  export default App;
  ```
Extracting Text from PDF
If you specifically need to extract text content from a PDF, you can work directly with `pdfjs-dist`, as `react-pdf` is more tailored to viewing.
- Read PDF using `pdfjs-dist`

  Using `pdfjs-dist`, you can access the PDF content and extract text.

  ```jsx
  import React, { useState } from 'react';
  import * as pdfjsLib from 'pdfjs-dist';

  // The worker must match the installed pdfjs-dist version. The worker file name
  // can differ between versions, so check that this URL resolves for yours.
  pdfjsLib.GlobalWorkerOptions.workerSrc = `//cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjsLib.version}/pdf.worker.js`;

  const PdfTextExtractor = () => {
    const [text, setText] = useState('');

    const extractText = (file) => {
      const fileReader = new FileReader();
      fileReader.onload = async () => {
        try {
          const typedArray = new Uint8Array(fileReader.result);
          const pdf = await pdfjsLib.getDocument({ data: typedArray }).promise;
          const pageStrings = [];
          // pdf.js page numbers start at 1
          for (let i = 1; i <= pdf.numPages; i++) {
            const page = await pdf.getPage(i);
            const textContent = await page.getTextContent();
            pageStrings.push(textContent.items.map((item) => item.str).join(' '));
          }
          setText(pageStrings.join('\n')); // keep pages on separate lines
        } catch (error) {
          setText(`Could not read PDF: ${error.message}`);
        }
      };
      fileReader.readAsArrayBuffer(file);
    };

    const handleFileChange = (event) => {
      const file = event.target.files[0];
      if (file) extractText(file); // the user may cancel the file dialog
    };

    return (
      <div>
        <input type="file" accept="application/pdf" onChange={handleFileChange} />
        <div>
          <h3>Extracted Text:</h3>
          <p style={{ whiteSpace: 'pre-wrap' }}>{text}</p>
        </div>
      </div>
    );
  };

  export default PdfTextExtractor;
  ```
- Use the Text Extractor Component

  Integrate this into your app the same way you set up `PdfReader`.
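Joining every text item with a space, as the extractor above does, flattens each page into a single line. If you need line breaks, one option is to group items by their vertical position. The following is a minimal sketch, not something `pdfjs-dist` provides: `itemsToLines` and its `tolerance` parameter are hypothetical names, and it assumes each item carries the `str` and `transform` fields returned by `getTextContent()`, with `transform[4]` and `transform[5]` holding the x and y position. It makes no attempt to handle multi-column layouts or rotated text.

```javascript
// Group pdf.js text items into lines using their baseline y coordinate.
// Assumption: item.transform is a 6-element matrix where transform[4] is x
// and transform[5] is y. `tolerance` (in PDF units) is a hypothetical knob
// for how far apart two baselines may be and still count as the same line.
function itemsToLines(items, tolerance = 2) {
  const lines = [];
  for (const item of items) {
    if (!item.str || item.str.trim() === '') continue; // skip empty items
    const x = item.transform[4];
    const y = item.transform[5];
    // Reuse an existing line whose baseline is close enough, else start one
    let line = lines.find((candidate) => Math.abs(candidate.y - y) <= tolerance);
    if (!line) {
      line = { y, items: [] };
      lines.push(line);
    }
    line.items.push({ x, str: item.str });
  }
  return lines
    .sort((a, b) => b.y - a.y) // PDF y grows upward, so larger y is higher on the page
    .map((line) =>
      line.items
        .sort((a, b) => a.x - b.x) // left to right within a line
        .map((entry) => entry.str)
        .join(' ')
    );
}
```

Inside the page loop you could then call `itemsToLines(textContent.items).join('\n')` in place of the `map(...).join(' ')` expression.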
Using either approach, you should now be able to view a PDF file or extract its text in your React.js application. Note that working with PDFs can be resource-intensive, and some complex PDFs might not render or extract precisely due to limitations in the underlying libraries.
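One way to soften the cost on large files is to process one page at a time rather than building a single string up front. The sketch below is an illustration under assumptions, not an API of either library: `readPageTexts` is a hypothetical helper, and `pdf` is assumed to be a loaded document of the kind `getDocument(...).promise` resolves to, using only `numPages`, `getPage()` and `getTextContent()`.

```javascript
// Yield the text of each page as soon as it is available.
// `pdf` is assumed to expose numPages and getPage(n), and each page
// getTextContent(), as in the extractor component.
async function* readPageTexts(pdf) {
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const textContent = await page.getTextContent();
    yield {
      pageNumber,
      text: textContent.items.map((item) => item.str).join(' '),
    };
  }
}
```

A caller can consume it with `for await (const { pageNumber, text } of readPageTexts(pdf)) { ... }`, updating state after each page so the UI shows progress, and can stop early by breaking out of the loop.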