The process of breaking text up into tokens is called tokenization. Analyzers are
the components that control tokenization.
Sitecore’s Content Search API comes configured with the standard analyzer by default.
However, it’s possible to configure a synonym analyzer if you need this
functionality (that is, searching for a synonym of a word in content finds
that content). Sitecore ships with its own implementation of a synonym
analyzer: Sitecore.ContentSearch.LuceneProvider.Analyzers.SynonymAnalyzer.
The key to the synonym analyzer is providing it with a list of synonyms, which must be
defined in your own custom XML file. This is because Sitecore includes
its own synonym engine implementation, which uses XML files to store the synonym
mappings.
Configuring the Synonym Analyzer
1. In ContentSearch.Lucene.DefaultIndexConfiguration.config,
change the inner defaultAnalyzer parameter reference from
the standard analyzer to the synonym analyzer,
Sitecore.ContentSearch.LuceneProvider.Analyzers.SynonymAnalyzer.
2. Unlike the standard analyzer, the synonym analyzer requires an implementation of
ISynonymEngine as its parameter:
<param hint="engine" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.XmlSynonymEngine,
Sitecore.ContentSearch.LuceneProvider">
3. Sitecore’s implementation of that engine reads from XML files, and it
requires the path to the XML file as its only parameter:
<param hint="xmlSynonymFilePath">C:\inetpub\wwwroot\yoursite\Data\synonyms.xml
</param>
4. Putting it all together:
<param desc="defaultAnalyzer"
type="Sitecore.ContentSearch.LuceneProvider.Analyzers.SynonymAnalyzer,
Sitecore.ContentSearch.LuceneProvider">
<param hint="engine"
type="Sitecore.ContentSearch.LuceneProvider.Analyzers.XmlSynonymEngine,
Sitecore.ContentSearch.LuceneProvider">
<param
hint="xmlSynonymFilePath">C:\inetpub\wwwroot\yoursite\Data\synonyms.xml
</param>
</param>
</param>
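For comparison, this is roughly what the inner defaultAnalyzer entry from step 1 looks like before the change. It is a sketch recalled from a stock Sitecore 7 Lucene configuration, so the type string and the version value are assumptions; check them against your own config file before editing.

```xml
<!-- Assumed stock entry: the inner defaultAnalyzer pointing at Lucene's standard analyzer. -->
<!-- Step 1 swaps this element for the SynonymAnalyzer block shown in step 4. -->
<param desc="defaultAnalyzer"
       type="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net">
  <!-- Assumed value; your installation may reference a different Lucene version. -->
  <param hint="version">Lucene_30</param>
</param>
```

It is worth keeping a copy of the original element so the change can be reverted if the synonym analyzer does not behave as expected.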
Defining Synonyms in XML
All terms listed in the same group are synonyms of each other. For example, if “quick” and “rapid” are in the same group and a content item
has the word “quick” in its CMS content, a search for the word “rapid” returns that content item as a result.
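A synonyms file along these lines would produce the behavior described above. This is a minimal sketch: the synonyms, group, and syn element names are an assumption about what XmlSynonymEngine expects, and the terms are only illustrative, so verify the layout against your Sitecore version.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch of synonyms.xml; the element names are assumed, not confirmed by this post. -->
<synonyms>
  <!-- Every term in a group is treated as a synonym of the others. -->
  <group>
    <syn>fast</syn>
    <syn>quick</syn>
    <syn>rapid</syn>
  </group>
  <!-- A second, unrelated group; terms here do not match the group above. -->
  <group>
    <syn>slow</syn>
    <syn>sluggish</syn>
  </group>
</synonyms>
```

Save the file at the path given in the xmlSynonymFilePath parameter so the engine can load it.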