NAME
    Lingua::Stem::Any - Unified interface to any stemmer on CPAN

VERSION
    This document describes Lingua::Stem::Any v0.02.

SYNOPSIS
        use Lingua::Stem::Any;

        # create German stemmer using the default source module
        $stemmer = Lingua::Stem::Any->new(language => 'de');

        # create German stemmer explicitly using Lingua::Stem::Snowball
        $stemmer = Lingua::Stem::Any->new(
            language => 'de',
            source   => 'Lingua::Stem::Snowball',
        );

        # get stem for word
        $stem = $stemmer->stem($word);

        # get list of stems for list of words
        @stems = $stemmer->stem(@words);

DESCRIPTION
    This module aims to provide a simple unified interface to any stemmer on
    CPAN. When a language is requested without a source, the module selects
    a default source module that is available for that language.

  Attributes
    language
        The following language codes are currently supported.

            ┌────────────┬────┐
            │ Bulgarian  │ bg │
            │ Czech      │ cs │
            │ Danish     │ da │
            │ Dutch      │ nl │
            │ English    │ en │
            │ Finnish    │ fi │
            │ French     │ fr │
            │ Galician   │ gl │
            │ German     │ de │
            │ Hungarian  │ hu │
            │ Italian    │ it │
            │ Latin      │ la │
            │ Norwegian  │ no │
            │ Persian    │ fa │
            │ Portuguese │ pt │
            │ Romanian   │ ro │
            │ Russian    │ ru │
            │ Spanish    │ es │
            │ Swedish    │ sv │
            │ Turkish    │ tr │
            └────────────┴────┘

        Language codes use the two-letter ISO 639-1 format. They are
        accepted case-insensitively but are always returned in lowercase.

            # instantiate a stemmer object
            $stemmer = Lingua::Stem::Any->new(language => $language);

            # get current language
            $language = $stemmer->language;

            # change language
            $stemmer->language($language);

        Country codes such as "cz" for the Czech Republic are not supported,
        nor are IETF language tags such as "pt-PT" or "pt-BR".

    source
        The following source modules are currently supported.

            ┌────────────────────────┬──────────────────────────────────────────────┐
            │ Module                 │ Languages                                    │
            ├────────────────────────┼──────────────────────────────────────────────┤
            │ Lingua::Stem::Snowball │ da nl en fi fr de hu it no pt ro ru es sv tr │
            │ Lingua::Stem::UniNE    │ bg cs fa                                     │
            │ Lingua::Stem           │ da de en fr gl it no pt ru sv                │
            └────────────────────────┴──────────────────────────────────────────────┘

        A module name is used to specify the source. If no source is
        specified, the first available source in the above list with support
        for the current language is used.

            # get current source
            $source = $stemmer->source;

            # change source
            $stemmer->source('Lingua::Stem::UniNE');
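
        The default selection rule described above can be sketched in plain
        Perl. This is an illustration only, not the module's actual
        implementation: "pick_source" and its availability callback are
        hypothetical names, and the sketch assumes that a source counts as
        available when it can be loaded with "require".

```perl
use strict;
use warnings;

# Source modules in priority order with the language codes each supports,
# copied from the table above.
my @SOURCES = (
    [ 'Lingua::Stem::Snowball' => [qw( da nl en fi fr de hu it no pt ro ru es sv tr )] ],
    [ 'Lingua::Stem::UniNE'    => [qw( bg cs fa )] ],
    [ 'Lingua::Stem'           => [qw( da de en fr gl it no pt ru sv )] ],
);

# Hypothetical helper, not part of Lingua::Stem::Any: returns the first
# source that supports $language and passes the $is_available check.
sub pick_source {
    my ($language, $is_available) = @_;

    # assumption: by default a source is available if it can be loaded
    $is_available ||= sub {
        my ($module) = @_;
        (my $file = "$module.pm") =~ s{::}{/}g;
        return eval { require $file; 1 } ? 1 : 0;
    };

    my $code = lc $language;    # language codes are case-insensitive

    for my $entry (@SOURCES) {
        my ($module, $codes) = @$entry;
        next unless grep { $_ eq $code } @$codes;
        return $module if $is_available->($module);
    }

    return;    # no available source supports this language
}

# pretend every source is installed
my $all = sub { 1 };
print pick_source('DE', $all), "\n";    # Lingua::Stem::Snowball
print pick_source('gl', $all), "\n";    # Lingua::Stem
print pick_source('bg', $all), "\n";    # Lingua::Stem::UniNE
```

        Passing the availability check as a callback keeps the sketch
        runnable on systems where none of the source modules are installed.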

    casefold
        Boolean value specifying whether to apply Unicode casefolding to
        words before stemming them. This is enabled by default. When
        normalization is also enabled, casefolding is performed first.

    normalize
        Boolean value specifying whether to apply Unicode NFC normalization
        to words before stemming them. This is enabled by default. When
        casefolding is also enabled, normalization is performed afterward.
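
        To illustrate the effect of these two attributes, the following
        sketch applies casefolding and then NFC normalization using only
        core Perl. The helper name "prepare_word" is hypothetical and is
        not part of this module; the sketch only mirrors the order of
        operations described above and assumes Perl 5.16 or later for
        "fc".

```perl
use strict;
use warnings;
use feature 'fc';                  # fc() requires Perl 5.16 or later
use Unicode::Normalize qw( NFC );  # core module

# Hypothetical helper, not part of Lingua::Stem::Any: casefold first,
# then normalize to NFC. Both steps are enabled unless switched off.
sub prepare_word {
    my ($word, %opt) = @_;
    my $casefold  = exists $opt{casefold}  ? $opt{casefold}  : 1;
    my $normalize = exists $opt{normalize} ? $opt{normalize} : 1;

    $word = fc($word)  if $casefold;
    $word = NFC($word) if $normalize;

    return $word;
}

binmode STDOUT, ':encoding(UTF-8)';

# "E" followed by a combining acute accent: two code points for one letter
my $decomposed = "E\x{301}COLE";
my $prepared   = prepare_word($decomposed);

print length($decomposed), "\n";    # 6
print length($prepared), "\n";      # 5
print $prepared, "\n";              # lowercase, with a precomposed e-acute
```

        Without normalization, the decomposed and precomposed spellings of
        the same word would reach the stemmer as different strings.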

  Methods
    stem
        Accepts a list of strings, stems each string, and returns a list of
        stems. The list returned will always have the same number of
        elements in the same order as the list provided. When no stemming
        rules apply to a word, the original word is returned.

            @stems = $stemmer->stem(@words);

            # get the stem for a single word
            $stem = $stemmer->stem($word);

        The words should be provided as character strings and the stems are
        returned as character strings. Byte strings in arbitrary character
        encodings are not supported.
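
        When input arrives as encoded bytes, it can be decoded to a
        character string before stemming. The sketch below uses the core
        Encode module and assumes the input is UTF-8; the stemming step is
        guarded so that the example still runs where Lingua::Stem::Any is
        not installed.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# bytes as they might arrive from a file or socket:
# the German word "Haeuser" spelled with a-umlaut, encoded as UTF-8
my $bytes = "H\xC3\xA4user";

# decode to a character string before stemming
my $word = decode('UTF-8', $bytes);

print length($bytes), "\n";    # 7 (bytes)
print length($word), "\n";     # 6 (characters)

# stem only if the module is installed, so this sketch runs either way
if (eval { require Lingua::Stem::Any; 1 }) {
    my $stemmer = Lingua::Stem::Any->new(language => 'de');
    my $stem    = $stemmer->stem($word);

    # encode back to bytes at the output boundary
    print encode('UTF-8', $stem), "\n";
}
```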

    stem_in_place
        Accepts an array reference, stems each element, and replaces them
        with the resulting stems.

            $stemmer->stem_in_place(\@words);

        This method is provided as a potential optimization for stemming
        large arrays of words. Its return value is unspecified and should
        not be relied upon.
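
        As a usage sketch, the following compares "stem" with
        "stem_in_place" on the same input. The sample words are arbitrary,
        and the stemming code is guarded so that the example runs even
        where Lingua::Stem::Any is not installed.

```perl
use strict;
use warnings;

my @words = qw( running jumps easily );

if (eval { require Lingua::Stem::Any; 1 }) {
    my $stemmer = Lingua::Stem::Any->new(language => 'en');

    # stem() leaves @words untouched and returns a new list
    my @stems = $stemmer->stem(@words);

    # stem_in_place() overwrites the elements of @words;
    # its return value is deliberately ignored
    $stemmer->stem_in_place(\@words);

    # print the result of each approach side by side for comparison
    print "$stems[$_] $words[$_]\n" for 0 .. $#words;
}
```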

    languages
        Returns a list of supported two-letter language codes in lowercase.
        When a source module name is provided, only the languages supported
        by that source are returned.

            # all languages
            @languages = $stemmer->languages;

            # languages supported by Lingua::Stem::Snowball
            @languages = $stemmer->languages('Lingua::Stem::Snowball');

    sources
        Returns a list of supported source module names. When a language
        code is provided, only the sources that support that language are
        returned.

            # all sources
            @sources = $stemmer->sources;

            # sources that support English
            @sources = $stemmer->sources('en');

TODO
    *   optional stem caching

    *   custom stemming exceptions

SEE ALSO
    Lingua::Stem::Snowball, Lingua::Stem::UniNE, Lingua::Stem

ACKNOWLEDGEMENTS
    This module is brought to you by Shutterstock
    <http://www.shutterstock.com/> (@ShutterTech
    <https://twitter.com/ShutterTech>). Additional open source projects from
    Shutterstock can be found at code.shutterstock.com
    <http://code.shutterstock.com/>.

AUTHOR
    Nick Patch <patch@cpan.org>

COPYRIGHT AND LICENSE
    © 2013 Nick Patch

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.