Getting Started
Hunspell revolves around dictionary files (.dic) and affix files (.aff). The dictionary file contains all the words of a given language, with references to rules indicating how the words may be inflected. The affix file contains those rules, so the .dic and .aff files go hand in hand.
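To make the relationship concrete, here is a tiny, made-up pair of files (not taken from any real dictionary). The affix file declares a single suffix rule with the flag S: strip nothing (0), append "s", for any word ending (.).

```
SET UTF-8

SFX S Y 1
SFX S 0 s .
```

The matching dictionary file starts with an approximate word count, followed by one word per line. The /S after each word refers to the rule above, so this pair accepts "cat", "cats", "dog" and "dogs".

```
2
cat/S
dog/S
```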
Where to find a dictionary?
This package doesn't ship with any dictionaries, so you'll need to get them elsewhere.
The WeCantSpell.Hunspell package, which is used internally in this package, refers to the titoBouzout/Dictionaries GitHub repository. Here you'll find dictionaries for many different languages.
However, most of these dictionaries are several years old, so you may be able to find newer Hunspell dictionaries from other sources. For instance, the Danish dictionary in this repository is by the group Stavekontrolden. So instead of using a six-year-old dictionary, you can grab the most recent version from their website.
Loading a dictionary
This package is a wrapper for the WeCantSpell.Hunspell package, and builds on top of its implementation of Hunspell.
Our wrapper is represented by the HunspellTextAnalyzer class. If you have the .dic and .aff files on disk, you can load them as shown in the example below:
@using System.Web.Hosting
@using Skybrud.TextAnalysis.Hunspell
@{

    // Map the path to the dictionary and affix files
    string dic = HostingEnvironment.MapPath("~/App_Data/Hunspell/da-DK.dic");
    string aff = HostingEnvironment.MapPath("~/App_Data/Hunspell/da-DK.aff");

    // Load a new text analyzer (Hunspell wrapper)
    HunspellTextAnalyzer analyzer = HunspellTextAnalyzer.CreateFromFiles(dic, aff);

}
Websites and multilingual sites
This package targets .NET Standard, allowing it to be used in a number of different applications and scenarios. We built this package to improve the user experience of text-based search in either ASP.NET or ASP.NET Core.
Loading a dictionary takes a bit of time. Not much, but enough to matter if you load the dictionary over and over again for each request. So in a web context, it's worth keeping a loaded HunspellTextAnalyzer instance around, either for a period of time or for the lifetime of the application. On the other hand, this uses a bit more memory. I don't have any exact numbers, but this is usually a price we are happy to pay for faster access to the Hunspell dictionaries.
A given site or web application may also use more than one language, so the HunspellRepository class shown below illustrates a way to load and access dictionaries based on a given culture.
If, for instance, the HunspellRepository class is hooked up with dependency injection, you can control the lifetime of its instances. For example, something like services.AddSingleton<HunspellRepository>() will ensure dictionaries stay loaded from the first time they're requested until the application is shut down.
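As a rough illustration of what the singleton lifetime means, the sketch below registers a class and resolves it twice. It assumes the Microsoft.Extensions.DependencyInjection NuGet package, and the empty HunspellRepository class here is only a stand-in so the snippet compiles on its own.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

// Stand-in for illustration only; the real example class is listed further down
public class HunspellRepository { }

public static class Program {

    public static void Main() {

        // Register the repository with a singleton lifetime
        var services = new ServiceCollection();
        services.AddSingleton<HunspellRepository>();

        using (ServiceProvider provider = services.BuildServiceProvider()) {

            // Both calls return the same instance, so anything it has loaded stays loaded
            HunspellRepository first = provider.GetRequiredService<HunspellRepository>();
            HunspellRepository second = provider.GetRequiredService<HunspellRepository>();

            Console.WriteLine(ReferenceEquals(first, second)); // prints True

        }

    }

}
```

In an ASP.NET Core application the same AddSingleton call would normally go where the application configures its services. The repository class itself could look like this: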
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Web.Hosting;
using Skybrud.TextAnalysis.Hunspell;

namespace HunspellTests {

    /// <summary>
    /// Class representing a repository for loading and accessing culture specific instances of
    /// <see cref="HunspellTextAnalyzer"/>.
    /// </summary>
    public class HunspellRepository {

        private readonly Dictionary<string, HunspellTextAnalyzer> _analyzers;

        /// <summary>
        /// Initializes a new repository.
        /// </summary>
        public HunspellRepository() {
            _analyzers = new Dictionary<string, HunspellTextAnalyzer>();
        }

        /// <summary>
        /// Returns the <see cref="HunspellTextAnalyzer"/> for the current culture, or <c>null</c> if unable to load a
        /// new text analyzer.
        /// </summary>
        /// <returns>An instance of <see cref="HunspellTextAnalyzer"/> if successful; otherwise, <c>null</c>.</returns>
        public HunspellTextAnalyzer GetAnalyzer() {
            return GetAnalyzer(CultureInfo.CurrentCulture);
        }

        /// <summary>
        /// Returns the <see cref="HunspellTextAnalyzer"/> for the specified <paramref name="cultureInfo"/>, or
        /// <c>null</c> if unable to load a new text analyzer.
        /// </summary>
        /// <param name="cultureInfo">The culture info.</param>
        /// <returns>An instance of <see cref="HunspellTextAnalyzer"/> if successful; otherwise, <c>null</c>.</returns>
        public HunspellTextAnalyzer GetAnalyzer(CultureInfo cultureInfo) {

            // Base the file name on the culture name
            string filename = cultureInfo.Name;

            // Have we already loaded the analyzer for this culture?
            if (_analyzers.TryGetValue(filename, out HunspellTextAnalyzer analyzer)) return analyzer;

            // Map the path to the Hunspell directory
            string dir = HostingEnvironment.MapPath("~/App_Data/Hunspell");

            // Build the paths to the dictionary and affix files
            string dicPath = Path.Combine(dir, filename + ".dic");
            string affPath = Path.Combine(dir, filename + ".aff");

            // Return null if either file is missing
            if (!File.Exists(dicPath) || !File.Exists(affPath)) return null;

            // Initialize a new analyzer
            analyzer = HunspellTextAnalyzer.CreateFromFiles(dicPath, affPath);

            // Add the analyzer to the internal dictionary
            _analyzers.Add(filename, analyzer);

            // Return the analyzer
            return analyzer;

        }

        /// <summary>
        /// Gets the <see cref="HunspellTextAnalyzer"/> for the specified <paramref name="cultureInfo"/>.
        /// </summary>
        /// <param name="cultureInfo">The culture info.</param>
        /// <param name="analyzer">When this method returns, holds the loaded <see cref="HunspellTextAnalyzer"/> if
        /// successful; otherwise, <c>null</c>.</param>
        /// <returns><c>true</c> if successful; otherwise, <c>false</c>.</returns>
        public bool TryGetAnalyzer(CultureInfo cultureInfo, out HunspellTextAnalyzer analyzer) {
            analyzer = GetAnalyzer(cultureInfo);
            return analyzer != null;
        }

    }

}
The HunspellRepository class is an example of how to set this up. The class is not part of this package.
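One thing to keep in mind when sharing a single repository across requests: the example stores analyzers in a plain Dictionary<TKey, TValue>, which isn't safe for concurrent writes. The sketch below shows one possible way around that, using ConcurrentDictionary and Lazy<T>. The CultureCache<T> class and its members are hypothetical names made up for this illustration, and the class is generic so it compiles without the package. In practice T would be HunspellTextAnalyzer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Globalization;

// Hypothetical helper for illustration; not part of this package
public class CultureCache<T> where T : class {

    private readonly ConcurrentDictionary<string, Lazy<T>> _items =
        new ConcurrentDictionary<string, Lazy<T>>();

    private readonly Func<CultureInfo, T> _load;

    public CultureCache(Func<CultureInfo, T> load) {
        _load = load ?? throw new ArgumentNullException(nameof(load));
    }

    public T Get(CultureInfo culture) {
        // GetOrAdd may create several Lazy wrappers under contention, but only one is stored,
        // and Lazy<T> (thread-safe by default) runs the loader once for that stored wrapper
        return _items.GetOrAdd(culture.Name, _ => new Lazy<T>(() => _load(culture))).Value;
    }

}
```

The loader passed to the constructor would contain the file lookup and the call to HunspellTextAnalyzer.CreateFromFiles from the example above. Note one difference in behavior: this sketch also caches a null result, whereas the example repository checks for the files again on every call until they exist.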