Display and Extract Text from PDFs in React.js Using `react-pdf`

To read text from a PDF file in a React.js application, you can use pdfjs-dist or react-pdf. Below, I’ll guide you through a simple example using react-pdf, a React wrapper around pdfjs-dist that makes it easy to load and display PDF files. react-pdf is built for viewing, so to extract the text itself you will call pdfjs-dist directly, as covered in the second half.

Reading PDF using `react-pdf`

Install Dependencies

First, you need to install react-pdf and pdfjs-dist.
```
npm install react-pdf
npm install pdfjs-dist
```

Set Up Component

Create a component that will load and display the PDF file.

// src/components/PdfReader.js
import React, { useState } from 'react';
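// Note: this entry point ships with older react-pdf releases (v5/v6) and sets up the PDF.js worker for webpack.
// Newer releases dropped it: import from 'react-pdf' instead and set pdfjs.GlobalWorkerOptions.workerSrc yourself.
import { Document, Page } from 'react-pdf/dist/esm/entry.webpack'; // Import PDF components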

const PdfReader = () => {
  const [file, setFile] = useState(null);
  const [numPages, setNumPages] = useState(null);

  const onDocumentLoadSuccess = ({ numPages }) => {
    setNumPages(numPages);
  };

  const handleFileChange = (event) => {
    setFile(event.target.files[0]);
  };

  return (
    <div>
      <input type="file" onChange={handleFileChange} />
      {file && (
        <Document file={file} onLoadSuccess={onDocumentLoadSuccess}>
          {/* numPages is null until the document has loaded, so fall back to 0 pages */}
          {Array.from({ length: numPages || 0 }, (_, index) => (
            <Page key={`page_${index + 1}`} pageNumber={index + 1} />
          ))}
        </Document>
      )}
    </div>
  );
};

export default PdfReader;
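
The component above renders one Page per page number once the page count is known. As a minimal sketch, that list can be built by a small standalone function; `pageNumbers` is a hypothetical name used only for illustration and is not part of react-pdf.

```javascript
// Hypothetical helper (not part of react-pdf): turn a page count into [1, 2, ..., n].
// Returns an empty list while the count is still unknown (null or undefined).
function pageNumbers(numPages) {
  const count = Number.isInteger(numPages) && numPages > 0 ? numPages : 0;
  return Array.from({ length: count }, (_, index) => index + 1);
}

// Possible use inside the JSX of the component (sketch only):
// {pageNumbers(numPages).map((n) => <Page key={`page_${n}`} pageNumber={n} />)}
```

Keeping this logic outside the JSX makes the "not loaded yet" case explicit and easy to unit test.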

Use the Component

Use the PdfReader component in your application.

// src/App.js
import React from 'react';
import PdfReader from './components/PdfReader';

function App() {
  return (
    <div className="App">
      <h1>PDF Reader</h1>
      <PdfReader />
    </div>
  );
}

export default App;

Extracting Text from PDF

If you need the text content of a PDF rather than a rendered view, work directly with pdfjs-dist; react-pdf is geared toward viewing.

Read PDF using pdfjs-dist

Using pdfjs-dist, you can access the PDF content and extract text.

import React, { useState } from 'react';
import * as pdfjsLib from 'pdfjs-dist';

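// The worker must come from the same pdfjs-dist version as the library, hence ${pdfjsLib.version}.
// Depending on the installed version, the worker file may have a different name or extension (for example .mjs).
pdfjsLib.GlobalWorkerOptions.workerSrc = `//cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjsLib.version}/pdf.worker.js`;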

const PdfTextExtractor = () => {
  const [text, setText] = useState("");

  const extractText = async (file) => {
    const fileReader = new FileReader();
    fileReader.onload = async function() {
      const typedarray = new Uint8Array(this.result);
      // Pass the bytes through the documented `data` parameter
      const pdf = await pdfjsLib.getDocument({ data: typedarray }).promise;

      let extractedText = '';
      for (let i = 1; i <= pdf.numPages; i++) {
        const page = await pdf.getPage(i);
        const textContent = await page.getTextContent();
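        // Join the text items of this page, then end the page with a newline so pages do not run together
        extractedText += textContent.items.map(item => item.str).join(' ') + '\n';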
      }
      setText(extractedText);
    };
    fileReader.readAsArrayBuffer(file);
  };
  
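  const handleFileChange = (event) => {
    const selected = event.target.files[0];
    // The file list is empty when the user cancels the dialog
    if (selected) extractText(selected);
  };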
  
  return (
    <div>
      <input type="file" onChange={handleFileChange} />
      <div>
        <h3>Extracted Text:</h3>
        <p>{text}</p>
      </div>
    </div>
  );
};

export default PdfTextExtractor;
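
Joining every item with a space loses the line structure of the page. The following is a minimal sketch of a line-aware alternative, assuming each entry in `textContent.items` carries a `str` string and, in reasonably recent pdfjs-dist versions, an optional `hasEOL` flag marking the end of a line. The names `itemsToText` and `joinPages` are hypothetical helpers, not pdfjs-dist APIs.

```javascript
// Hypothetical helper: join the text items of one page, keeping line breaks.
// Assumption: items look like { str: string, hasEOL?: boolean }.
function itemsToText(items) {
  let out = '';
  for (const item of items) {
    if (typeof item.str !== 'string') continue; // skip entries without text
    out += item.str + (item.hasEOL ? '\n' : ' ');
  }
  return out
    .replace(/[ \t]+/g, ' ')  // collapse runs of spaces
    .replace(/ ?\n ?/g, '\n') // drop stray spaces around line breaks
    .trim();
}

// Hypothetical helper: join per-page strings with a blank line between pages,
// skipping pages that produced no text.
function joinPages(pageTexts) {
  return pageTexts.filter((text) => text.length > 0).join('\n\n');
}
```

In the extraction loop, each page's `textContent.items` could be passed to `itemsToText`, the results collected in an array, and the final string built with `joinPages`. If the installed version does not set `hasEOL`, the helper simply falls back to space-separated text.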

Use the Text Extractor Component

Integrate this into your app similarly to how you set up the PdfReader.

With these two approaches, you can display a PDF with react-pdf or extract its text with pdfjs-dist in your React.js application. Note that working with PDFs can be resource-intensive, since every page is parsed in the browser. Text extraction also has limits: complex layouts may come back out of reading order, and scanned pages that contain only images yield no text at all.
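
Because parsing is resource-intensive, it can help to reject obviously unsuitable input before handing it to pdfjs-dist. Below is a minimal sketch, assuming the file has already been read into a `Uint8Array`; `checkPdfBytes` and the 20 MB default limit are hypothetical choices for illustration. The check relies on the fact that PDF files start with the bytes `%PDF-`. It is deliberately strict, so files with stray bytes before the header would be rejected even if a lenient reader could open them.

```javascript
// Hypothetical pre-check (not a pdfjs-dist API): validate bytes before parsing.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

function checkPdfBytes(bytes, maxBytes = 20 * 1024 * 1024) {
  // Assumed size limit; tune it to what the target devices can handle.
  if (bytes.length > maxBytes) return { ok: false, reason: 'too-large' };
  // Strict header check: the first five bytes must spell "%PDF-".
  const hasMagic = PDF_MAGIC.every((byte, i) => bytes[i] === byte);
  if (!hasMagic) return { ok: false, reason: 'not-pdf' };
  return { ok: true, reason: null };
}
```

In the extractor, this could run on `typedarray` right after the FileReader finishes, showing a message instead of calling `getDocument` when `ok` is false.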
