Implementing and Customizing Typo Tolerance in Typesense for Improved Search Results
Typesense provides built-in support for typo tolerance, so users can still find relevant results when their search queries contain spelling mistakes. This improves the search experience by matching the intended terms despite minor errors in what was typed.
Implementing Typo Tolerance in Typesense
Below is a step-by-step guide on how to leverage typo tolerance in Typesense.
1. Configure Typo Tolerance
Typo tolerance in Typesense is enabled by default, and its behavior can be customized using various parameters in your search query. These include num_typos, which sets the maximum number of typos tolerated (0, 1, or 2), and typo_tokens_threshold, which controls when Typesense starts looking for typo-corrected variations, among others.
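Typesense also accepts a comma-separated num_typos value with one entry per query_by field, which lets you be strict on some fields and lenient on others. The sketch below builds such parameters from a plain object; the helper name buildTypoParams and the field names are assumptions made for illustration, not part of the Typesense client.

```javascript
// Build search parameters with a separate typo budget per queried field.
// buildTypoParams is a hypothetical helper; the fields passed in are examples.
function buildTypoParams(q, typosByField) {
  const fields = Object.keys(typosByField);
  for (const field of fields) {
    const n = typosByField[field];
    if (![0, 1, 2].includes(n)) {
      throw new RangeError(`num_typos for "${field}" must be 0, 1 or 2`);
    }
  }
  return {
    q,
    query_by: fields.join(','),
    // One value per query_by field, in the same order as query_by.
    num_typos: fields.map((f) => typosByField[f]).join(','),
  };
}

// Lenient on titles, stricter on author names.
console.log(buildTypoParams('Harper Le', { title: 2, author: 1 }));
```

The returned object can be passed to a search call as its parameters, together with any other options you need.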
2. Create and Configure Collections
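Next, create a collection with an appropriate schema. Here’s an example that sets up a client and defines a schema for a books collection:
const Typesense = require('typesense');

// Connection details are placeholders; replace them with your own.
const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'xyz',
  connectionTimeoutSeconds: 2
});

const booksSchema = {
  name: 'books',
  fields: [
    { name: 'title', type: 'string' },
    { name: 'author', type: 'string' },
    { name: 'publication_year', type: 'int32' },
    { name: 'genres', type: 'string[]', facet: true }
  ],
  default_sorting_field: 'publication_year'
};
// Create the collection
client.collections().create(booksSchema).then(data => {
  console.log(data);
}).catch(error => {
  console.error(error);
});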
3. Index Documents
Here’s an example of indexing documents into the books collection:
const documents = [
  {
    'id': '1',
    'title': 'The Great Gatsby',
    'author': 'F. Scott Fitzgerald',
    'publication_year': 1925,
    'genres': ['Fiction', 'Classic']
  },
  {
    'id': '2',
    'title': 'To Kill a Mockingbird',
    'author': 'Harper Lee',
    'publication_year': 1960,
    'genres': ['Fiction', 'Drama']
  }
];
client.collections('books').documents().import(documents).then(result => {
  console.log(result);
}).catch(error => {
  console.error(error);
});
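A bulk import reports a result per document, so it can help to summarize which documents failed rather than only logging the raw response. The following is a minimal sketch, assuming each result looks like { success: true } or { success: false, error: '...' } and arrives in the same order as the input documents. The helper summarizeImport and the sample data are hypothetical, and depending on the client version, failures may instead surface as a rejected promise.

```javascript
// Summarize per-document import results.
// Assumes results[i] corresponds to docs[i] and carries a boolean `success`
// plus an `error` message on failure.
function summarizeImport(docs, results) {
  const failed = [];
  results.forEach((res, i) => {
    if (!res.success) {
      failed.push({ id: docs[i] ? docs[i].id : undefined, error: res.error });
    }
  });
  return { imported: results.length - failed.length, failed };
}

// Hypothetical data shaped like a bulk-import response.
const sampleDocs = [{ id: '1' }, { id: '2' }];
const sampleResults = [
  { success: true },
  { success: false, error: 'Field `publication_year` must be an int32.' },
];
console.log(summarizeImport(sampleDocs, sampleResults));
```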
4. Perform Typo-Tolerant Search Queries
When performing search queries, you can specify the typo tolerance settings in the query parameters. Here’s an example:
const performSearch = async (query, queryBy) => {
  try {
    const result = await client.collections('books').documents().search({
      q: query,
      query_by: queryBy,
      num_typos: 2, // Tolerate up to 2 typos
      typo_tokens_threshold: 3 // Look for typo-corrected variations if fewer than 3 results are found
    });
    console.log(result);
    return result;
  } catch (error) {
    console.error('Search Error', error);
  }
};
// Example usage
performSearch('Grat Gatsby', 'title');
This example would correctly return results for "The Great Gatsby" despite the typo in the search query.
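A typo is usually counted as a single-character edit between what was typed and the indexed word. To make that concrete, here is a small sketch that computes plain Levenshtein distance. It is an illustration only: the function is not part of Typesense, and Typesense's internal matching may count edits differently.

```javascript
// Levenshtein distance: the minimum number of single-character insertions,
// deletions, or substitutions needed to turn string a into string b.
function levenshtein(a, b) {
  // prev[j] is the distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein('grat', 'great')); // 1 (one missing letter)
```

Under this measure, "grat" is one edit away from "great", which is within a num_typos setting of 2.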
Advanced Typo Tolerance Settings
Typesense offers several advanced configurations for controlling typo tolerance behavior:
- num_typos: The maximum number of typos tolerated (0, 1, or 2). The default is 2.
- typo_tokens_threshold: If fewer than this many results are found for the query, Typesense starts looking for typo-corrected variations until the threshold is met. Setting it to 0 disables typo tolerance. The default is 1.
- min_len_1typo: The minimum word length for 1-typo correction to apply. The default is 4.
- min_len_2typo: The minimum word length for 2-typo correction to apply. The default is 7.
Here is an example of using these settings in your search query:
const performAdvancedSearch = async (query, queryBy) => {
  try {
    const result = await client.collections('books').documents().search({
      q: query,
      query_by: queryBy,
      num_typos: 2,
      typo_tokens_threshold: 3,
      min_len_1typo: 4, // Words need at least 4 characters before 1 typo is tolerated
      min_len_2typo: 8 // Words need at least 8 characters before 2 typos are tolerated
    });
    console.log(result);
    return result;
  } catch (error) {
    console.error('Advanced Search Error', error);
  }
};
// Example usage: "Mockingbrd" is long enough for typo correction to apply
performAdvancedSearch('To Kill a Mockingbrd', 'title');
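The length thresholds are easier to reason about with a small model. The sketch below is not Typesense's implementation; it is a simplified illustration, assuming a word qualifies for one typo once it reaches the 1-typo minimum length and for two typos once it reaches the 2-typo minimum length, capped by num_typos. The function maxTyposForToken and its option names are hypothetical, and the fallback values of 4 and 7 are assumed from the Typesense documentation.

```javascript
// Simplified model of how word length limits the typo budget.
// Option names are hypothetical; fallback values are assumptions.
function maxTyposForToken(token, opts = {}) {
  const { numTypos = 2, minLen1Typo = 4, minLen2Typo = 7 } = opts;
  let allowed = 0;
  if (token.length >= minLen1Typo) allowed = 1;
  if (token.length >= minLen2Typo) allowed = 2;
  // The per-query cap always wins over the length-based allowance.
  return Math.min(allowed, numTypos);
}

for (const token of ['Kal', 'Grat', 'Mockingbird']) {
  console.log(token, maxTyposForToken(token));
}
```

In this model a three-letter word such as "Kal" gets no typo budget at all, which is why very short misspelled words are unlikely to be corrected.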
Monitoring and Analytics
To ensure your typo tolerance settings are optimal, keep track of search performance through monitoring and analytics. Gather data on:
- Common Typos: Identify frequent typos to fine-tune your settings.
- Search Latency: Monitor the impact of typo tolerance on search performance.
- User Behavior: Analyze how often typo tolerance corrects user queries.
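One lightweight way to gather this data is to record a few fields from each search and aggregate them periodically. The sketch below assumes you log entries of the hypothetical shape { q, found, search_time_ms }, where found and search_time_ms are copied from the search response. The helper summarizeSearchLog is an illustration, not a Typesense feature.

```javascript
// Aggregate logged searches into simple health metrics.
// Each entry is assumed to look like { q, found, search_time_ms }.
function summarizeSearchLog(entries) {
  const total = entries.length;
  if (total === 0) {
    return { total: 0, avgLatencyMs: 0, zeroResultRate: 0, zeroResultQueries: [] };
  }
  const latencySum = entries.reduce((sum, e) => sum + e.search_time_ms, 0);
  const zero = entries.filter((e) => e.found === 0);
  return {
    total,
    avgLatencyMs: latencySum / total,
    zeroResultRate: zero.length / total,
    // Queries that found nothing are candidates for looser typo settings or synonyms.
    zeroResultQueries: zero.map((e) => e.q),
  };
}

// Hypothetical log entries.
console.log(summarizeSearchLog([
  { q: 'grat gatsby', found: 1, search_time_ms: 4 },
  { q: 'xyzzy', found: 0, search_time_ms: 2 },
]));
```

Comparing these numbers before and after a settings change gives a rough view of whether the change helped.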
Conclusion
By enabling and fine-tuning typo tolerance settings in Typesense, you can significantly enhance the search experience by ensuring users get relevant results even when their search queries contain typos. This approach can be particularly beneficial for applications with diverse user bases and high search volumes.
Implementing typo tolerance is straightforward with Typesense's built-in settings, and customizing it for your specific needs can help you deliver a more accurate and user-friendly search experience.