Implementing and Customizing Typo Tolerance in Typesense for Improved Search Results

Typesense provides built-in support for typo tolerance, making it easier for users to find relevant results even if they make spelling mistakes or typos in their search queries. This feature helps improve the search experience by returning more accurate results despite minor inaccuracies in the search terms.

Implementing Typo Tolerance in Typesense

Below is a step-by-step guide on how to leverage typo tolerance in Typesense.

1. Configure Typo Tolerance

Typo tolerance in Typesense is enabled by default, and its behavior can be customized per request through search parameters. These include num_typos, the maximum number of typographical errors (0, 1, or 2) that will be tolerated, and typo_tokens_threshold, which controls when Typesense starts looking for typo-corrected variations of the query, among others.
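
Typesense also accepts a comma-separated num_typos value, one entry per field listed in query_by, so different fields can have different typo budgets. The sketch below builds such a parameter object; buildTypoParams and the fieldTypos map are hypothetical names used only for illustration, not part of the Typesense client.

```javascript
// Build search parameters with a separate typo budget for each queried field.
// `fieldTypos` maps a field name to its allowed typos (0, 1, or 2).
// This helper is an illustrative assumption, not a Typesense API.
function buildTypoParams(query, fieldTypos) {
  const fields = Object.keys(fieldTypos);
  for (const field of fields) {
    if (![0, 1, 2].includes(fieldTypos[field])) {
      throw new RangeError(`num_typos for "${field}" must be 0, 1, or 2`);
    }
  }
  return {
    q: query,
    query_by: fields.join(','),
    // Entries are positional: the first value applies to the first query_by field, and so on
    num_typos: fields.map(field => fieldTypos[field]).join(',')
  };
}
```

For example, buildTypoParams('gatsby', { title: 2, author: 0 }) produces parameters that tolerate typos in title while requiring exact matches in author, which can be useful for fields such as names or identifiers.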

2. Create and Configure Collections

First, you need to create a collection with an appropriate schema. Here’s an example schema for a books collection:

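const Typesense = require('typesense');

// Connection details below are placeholders; replace them with your own server's values
const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'xyz',
  connectionTimeoutSeconds: 2
});

const booksSchema = {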
  name: 'books',
  fields: [
    { name: 'title', type: 'string' },
    { name: 'author', type: 'string' },
    { name: 'publication_year', type: 'int32' },
    { name: 'genres', type: 'string[]', facet: true }
  ],
  default_sorting_field: 'publication_year'
};

// Create the collection
client.collections().create(booksSchema).then(data => {
  console.log(data);
}).catch(error => {
  console.error(error);
});

3. Index Documents

Here’s an example of indexing documents into the books collection:

const documents = [
  {
    'id': '1',
    'title': 'The Great Gatsby',
    'author': 'F. Scott Fitzgerald',
    'publication_year': 1925,
    'genres': ['Fiction', 'Classic']
  },
  {
    'id': '2',
    'title': 'To Kill a Mockingbird',
    'author': 'Harper Lee',
    'publication_year': 1960,
    'genres': ['Fiction', 'Drama']
  }
];

client.collections('books').documents().import(documents).then(result => {
  console.log(result);
}).catch(error => {
  console.error(error);
});

4. Perform Typo-Tolerant Search Queries

When performing search queries, you can specify the typo tolerance settings in the query parameters. Here’s an example:

const performSearch = async (query, queryBy) => {
  try {
    const result = await client.collections('books').documents().search({
      q: query,
      query_by: queryBy,
      num_typos: 2,  // Tolerate up to 2 typos
      typo_tokens_threshold: 3  // Look for typo-corrected variations only if fewer than 3 results are found
    });

    console.log(result);
    return result;
  } catch (error) {
    console.error('Search Error', error);
  }
};

// Example usage
performSearch('Grat Gatsby', 'title');

With these settings, the query should match "The Great Gatsby": "Grat" is a single missing character away from "Great", which is within the allowed number of typos.
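
Logging the whole response can be noisy. As a minimal sketch, the helper below pulls out the total match count and one field from each hit, assuming the usual search response shape with a found count and a hits array whose entries carry a document object. summarizeHits is a hypothetical name introduced here.

```javascript
// Reduce a search response to its match count and one field per hit.
// Assumes the response has `found` and `hits[].document`; the helper name is hypothetical.
function summarizeHits(result, field = 'title') {
  const hits = result.hits || [];
  return {
    found: result.found || 0,
    values: hits.map(hit => hit.document[field])
  };
}

// Example usage (requires a running Typesense server):
// performSearch('Grat Gatsby', 'title').then(result => console.log(summarizeHits(result)));
```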

Advanced Typo Tolerance Settings

Typesense offers several advanced configurations for controlling typo tolerance behavior:

  • num_typos: Defines the maximum number of typos tolerated (0, 1, or 2). The default is 2.
  • typo_tokens_threshold: If fewer than this many results are found for a query, Typesense starts looking for typo-corrected variations of the query tokens until enough results are found. Setting it to 0 disables typo tolerance. The default is 1.
  • min_len_1typo: Sets the minimum word length for 1-typo correction to be enabled. The default is 4.
  • min_len_2typo: Sets the minimum word length for 2-typo correction to be enabled. The default is 7.

Here is an example of utilizing these settings in your search query:

const performAdvancedSearch = async (query, queryBy) => {
  try {
    const result = await client.collections('books').documents().search({
      q: query,
      query_by: queryBy,
      num_typos: 2,
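      typo_tokens_threshold: 3,  // Try typo corrections when fewer than 3 results are found
      min_len_1typo: 4,          // Words need at least 4 characters for 1-typo correction
      min_len_2typo: 7           // Words need at least 7 characters for 2-typo correction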
    });

    console.log(result);
    return result;
  } catch (error) {
    console.error('Advanced Search Error', error);
  }
};

// Example usage: "Mokingbrd" (9 characters) is two deletions away from "Mockingbird"
performAdvancedSearch('Kill a Mokingbrd', 'title');
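
To make the interaction between num_typos and the minimum-length settings concrete, here is a simplified model of how many typos a single query word could be allowed. It is only an illustration of the documented thresholds, assuming minimum lengths of 4 and 7; it is not Typesense's internal implementation, and maxTyposForToken is a hypothetical helper.

```javascript
// Simplified model: how many typos may a word of a given length contain?
// The option names and default values here are assumptions made for illustration.
function maxTyposForToken(token, { numTypos = 2, minLen1Typo = 4, minLen2Typo = 7 } = {}) {
  let allowed = 0;
  if (token.length >= minLen1Typo) allowed = 1;
  if (token.length >= minLen2Typo) allowed = 2;
  // The global num_typos cap always wins over the length-based allowance
  return Math.min(allowed, numTypos);
}
```

Under this model a three-letter word gets no typo correction at all, a four-letter word such as "Grat" gets one, and longer words get up to two unless num_typos caps them lower. This is why very short misspelled words tend not to be corrected.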

Monitoring and Analytics

To ensure your typo tolerance settings are optimal, keep track of search performance through monitoring and analytics. Gather data on:

  • Common Typos: Identify frequent typos to fine-tune your settings.
  • Search Latency: Monitor the impact of typo tolerance on search performance.
  • User Behavior: Analyze how often typo tolerance corrects user queries.
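
One possible way to collect these numbers is to log, for each query, the result count with typo tolerance disabled (num_typos: 0) next to the count and search time of the normal search, taken from the found and search_time_ms fields of the response. The sketch below summarizes such a log; the entry shape and the summarizeTypoImpact name are assumptions made for this example.

```javascript
// Summarize logged searches. Each entry is assumed to look like:
//   { q: 'grat gatsby', strictFound: 0, tolerantFound: 1, searchTimeMs: 3 }
// where strictFound comes from a search with num_typos: 0.
function summarizeTypoImpact(entries) {
  if (entries.length === 0) {
    return { total: 0, rescued: 0, rescueRate: 0, avgSearchTimeMs: 0, rescuedQueries: [] };
  }
  // A query is "rescued" when only the typo-tolerant search returned results
  const rescuedQueries = entries
    .filter(e => e.strictFound === 0 && e.tolerantFound > 0)
    .map(e => e.q);
  const totalTime = entries.reduce((sum, e) => sum + e.searchTimeMs, 0);
  return {
    total: entries.length,
    rescued: rescuedQueries.length,
    rescueRate: rescuedQueries.length / entries.length,
    avgSearchTimeMs: totalTime / entries.length,
    rescuedQueries
  };
}
```

The rescuedQueries list doubles as a record of common misspellings, and a rising avgSearchTimeMs after loosening the settings suggests the extra typo matching is costing latency.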

Conclusion

By enabling and fine-tuning typo tolerance settings in Typesense, you can significantly enhance the search experience by ensuring users get relevant results even when their search queries contain typos. This approach can be particularly beneficial for applications with diverse user bases and high search volumes.

Implementing typo tolerance is straightforward with Typesense's built-in settings, and customizing it for your specific needs can help you deliver a more accurate and user-friendly search experience.