DLESE Tools v1.6.0
java.lang.Object
  extended by org.dlese.dpc.index.Stemmer
public class Stemmer
Stemmer implements the Porter stemming algorithm. The Stemmer class reduces a single word, or each word in an array, to its morphological root form. For example, the algorithm converts the words 'ocean', 'oceans' and 'oceanic' to the single root 'ocean'.
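Porter's algorithm decides whether to strip a suffix by looking at the "measure" m of what would remain: a word is written as [C](VC)^m[V], where C and V are runs of consonants and vowels. The minimal sketch below is written from Porter's published definitions, not taken from the DLESE source, and the class and method names are illustrative. It shows why 'oceanic' becomes 'ocean': the rule that removes a final -ic applies only when m > 1, and m('ocean') = 2.

```java
/**
 * Illustrative sketch (not the DLESE code): computes Porter's measure m,
 * where a word has the form [C](VC)^m[V].
 */
public class PorterMeasure {

    // Porter's definition: a consonant is any letter other than a, e, i, o, u,
    // and other than a 'y' that is preceded by a consonant.
    static boolean isConsonant(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !isConsonant(w, i - 1);
            default:
                return true;
        }
    }

    // Counts VC sequences: each place where a vowel run is followed by a consonant.
    static int measure(String w) {
        int m = 0;
        boolean prevVowel = false;
        for (int i = 0; i < w.length(); i++) {
            boolean consonant = isConsonant(w, i);
            if (consonant && prevVowel) {
                m++;
            }
            prevVowel = !consonant;
        }
        return m;
    }

    public static void main(String[] args) {
        // The -ic rule needs m > 1 for the remaining stem, so "oceanic" -> "ocean".
        System.out.println("ocean m=" + measure("ocean"));     // ocean m=2
        System.out.println("tree m=" + measure("tree"));       // tree m=0
        System.out.println("trouble m=" + measure("trouble")); // trouble m=1
    }
}
```

By the same test, a word such as 'tree' (m = 0) is too short for most suffix rules to touch.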
The static methods getStem(String term) and getStems(String[] terms) can be used to quickly convert a word or words to their root form. Example code:
```java
import org.dlese.dpc.index.Stemmer;

public class StemmerExample {
    public static void main(String[] args) {
        String word = "oceanic";
        String stem = Stemmer.getStem(word); // stem now equals "ocean"

        String string = "A group of words that need to be stemmed";
        String[] words = string.split("\\s+"); // split on white space
        String[] stems = Stemmer.getStems(words);
        for (int i = 0; i < stems.length; i++) {
            // ... do something with the stems, for example print them:
            System.out.println(stems[i]);
        }
    }
}
```
For more information about the Porter stemming algorithm, see http://www.tartarus.org/~martin/PorterStemmer .
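As an illustration of the kind of rule the algorithm applies, its first step (1a) rewrites plural endings: sses becomes ss, ies becomes i, ss is kept, and any other final s is dropped. The sketch below implements only that step, from the published rule table; it is not the DLESE code, the class name is illustrative, and lowercase input is assumed. The full algorithm applies several further steps, such as the one that turns 'cooled' into 'cool'.

```java
/**
 * Illustrative sketch (not the DLESE code) of Porter step 1a:
 *   sses -> ss, ies -> i, ss -> ss, s -> (removed)
 * The first matching rule wins; lowercase input is assumed.
 */
public class PorterStep1a {

    static String step1a(String w) {
        if (w.endsWith("sses")) {
            return w.substring(0, w.length() - 2); // caresses -> caress
        }
        if (w.endsWith("ies")) {
            return w.substring(0, w.length() - 2); // ponies -> poni
        }
        if (w.endsWith("ss")) {
            return w;                              // caress -> caress
        }
        if (w.endsWith("s")) {
            return w.substring(0, w.length() - 1); // oceans -> ocean
        }
        return w;
    }

    public static void main(String[] args) {
        String[] words = {"caresses", "ponies", "caress", "cats", "oceans", "rains"};
        for (String w : words) {
            System.out.println(w + " -> " + step1a(w));
        }
    }
}
```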
Constructor Summary
Constructor | Description |
---|---|
Stemmer() | Constructor for the Stemmer object. |
Method Summary
Modifier and Type | Method | Description |
---|---|---|
void | add(char ch) | Adds a character to the word being stemmed. |
void | add(char[] w, int wLen) | Adds wLen characters, taken from the char[] array w, to the word being stemmed. |
char[] | getResultBuffer() | Returns a reference to a character buffer containing the results of the stemming process. |
int | getResultLength() | Returns the length of the word resulting from the stemming process. |
static String | getStem(String term) | Gets the stem of the given English word. |
static String[] | getStems(String[] terms) | Gets the stems of the given English words. |
static void | main(String[] args) | Test program for demonstrating the Stemmer. |
void | stem() | Stems the word placed into the Stemmer buffer through calls to add(). |
static String | stemWordsInLuceneClause(String string) | Stems each of the words in a given Lucene clause String, returning the same String with the word parts in stemmed form. |
static String | stemWordsInString(String string) | Stems each of the words or tokens in a given String, returning a String of the stemmed tokens, separated by spaces, with all other characters removed. |
String | toString() | After a word has been stemmed, returns it as a String. Alternatively, getResultBuffer() and getResultLength() give access to the internal buffer, which is generally more efficient. |
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Constructor Detail
public Stemmer()
Method Detail
public static final String getStem(String term)
Parameters:
term - A term in English.
public static final String[] getStems(String[] terms)
Parameters:
terms - A group of terms in English.
public static final String stemWordsInString(String string)
Example: the String
    oceans and rain AND 44rains http://dlese.org/oceans
is transformed to
    ocean and rain AND 44rain http dlese org ocean
Parameters:
string - A word, phrase, or any arbitrary String.
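The example above can be approximated by splitting on every character that is not a letter or digit, stemming each token, and joining the results with single spaces. The sketch below assumes exactly that tokenization (the DLESE implementation may differ) and takes the stemmer as a parameter; the class name is hypothetical. The stand-in stemmer in main only drops a plural 's', which happens to be enough to reproduce the documented output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch of tokenize, stem, and join. Assumption: a token is a
 * run of ASCII letters or digits; every other character is a separator.
 */
public class StemWordsSketch {

    static String stemWordsInString(String s, UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>();
        for (String token : s.split("[^A-Za-z0-9]+")) {
            if (!token.isEmpty()) { // split() yields "" when s starts with a separator
                out.add(stemmer.apply(token));
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Stand-in stemmer, not Porter: drops a final "s" unless it follows another "s".
        UnaryOperator<String> plural = w ->
            w.endsWith("s") && !w.endsWith("ss") ? w.substring(0, w.length() - 1) : w;
        // prints: ocean and rain AND 44rain http dlese org ocean
        System.out.println(stemWordsInString(
            "oceans and rain AND 44rains http://dlese.org/oceans", plural));
    }
}
```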
public static final String stemWordsInLuceneClause(String string)
Example: the String
    titles:("oceans AND oceans44 OR 44oceans and oceanic")^20 or cooled
is transformed to
    titles:("ocean AND oceans44 OR 44ocean and ocean")^20 or cool
Parameters:
string - A word, phrase, Lucene clause, or any arbitrary String.
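The Lucene example suggests that field names such as titles: are left alone while query words are stemmed in place, with quotes, parentheses, operators and boosts kept as they were. The sketch below assumes that rule (a word directly followed by ':' is a field name) and copies everything else unchanged; the class name and the word pattern are hypothetical, not taken from the DLESE source. Its stub stemmer is a lookup table covering only the example's words. Note that 'oceans44' would also come back unchanged from a real Porter stemmer, since none of Porter's suffix rules match a word ending in digits.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of stemming words in place inside a Lucene clause. */
public class LuceneClauseSketch {

    // Assumed definition of a "word": a run of ASCII letters and digits.
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");

    static String stemWordsInLuceneClause(String clause, UnaryOperator<String> stemmer) {
        Matcher m = WORD.matcher(clause);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Assumption: a word immediately followed by ':' is a field name.
            boolean isField = m.end() < clause.length() && clause.charAt(m.end()) == ':';
            String word = isField ? m.group() : stemmer.apply(m.group());
            // appendReplacement copies the text before the word unchanged;
            // quoteReplacement keeps any '$' or '\' in the new word literal.
            m.appendReplacement(out, Matcher.quoteReplacement(word));
        }
        m.appendTail(out); // copies whatever follows the last word
        return out.toString();
    }

    public static void main(String[] args) {
        // Stub stemmer for the documented example only; not the Porter algorithm.
        UnaryOperator<String> stub = w -> {
            switch (w) {
                case "titles":   return "titl"; // would show up if field names were stemmed
                case "oceans":   return "ocean";
                case "44oceans": return "44ocean";
                case "oceanic":  return "ocean";
                case "cooled":   return "cool";
                default:         return w;
            }
        };
        System.out.println(stemWordsInLuceneClause(
            "titles:(\"oceans AND oceans44 OR 44oceans and oceanic\")^20 or cooled", stub));
    }
}
```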
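public void add(char ch)
Parameters:
ch - The character to add to the word being stemmed.
public void add(char[] w, int wLen)
Parameters:
w - The array containing the characters to add.
wLen - The number of characters to add from w.
public String toString()
Overrides:
toString in class Object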
public int getResultLength()
public char[] getResultBuffer()
public void stem()
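The instance methods follow a buffer-based protocol: characters go in through add(), stem() works on the buffered word (the method descriptions suggest in place), and the result is read either as a new String via toString() or, without allocating, through getResultBuffer() plus getResultLength(). The hypothetical WordBuffer class below mimics only that protocol so it can run on its own; its stem() merely drops a plural 's' and is not the Porter algorithm, and its buffer-growth strategy is an assumption. It shows why the length matters: the returned array can be longer than the word it holds.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for the Stemmer's buffer protocol (not the DLESE class).
 * Its stem() only drops a plural 's' so that the sketch runs on its own.
 */
public class WordBuffer {
    private char[] b = new char[8]; // internal buffer; may be longer than the word
    private int len = 0;            // number of valid characters in b

    void add(char ch) {
        ensureCapacity(len + 1);
        b[len++] = ch;
    }

    // Appends the first wLen characters of w (assumed meaning of the arguments).
    void add(char[] w, int wLen) {
        ensureCapacity(len + wLen);
        System.arraycopy(w, 0, b, len, wLen);
        len += wLen;
    }

    private void ensureCapacity(int needed) {
        if (needed > b.length) {
            b = Arrays.copyOf(b, Math.max(needed, 2 * b.length));
        }
    }

    // Placeholder, not Porter: shortens the word by dropping a final 's' that
    // does not follow another 's'. Only len changes; the array is reused.
    void stem() {
        if (len > 1 && b[len - 1] == 's' && b[len - 2] != 's') {
            len--;
        }
    }

    char[] getResultBuffer() { return b; }
    int getResultLength() { return len; }

    @Override
    public String toString() { return new String(b, 0, len); }

    public static void main(String[] args) {
        WordBuffer wb = new WordBuffer();
        for (char c : "oceans".toCharArray()) {
            wb.add(c);
        }
        wb.stem();

        // Allocation-free read: only the first getResultLength() chars are valid.
        char[] buf = wb.getResultBuffer();
        int n = wb.getResultLength();
        System.out.println(n + " of " + buf.length + " chars used"); // 5 of 8 chars used
        System.out.println(wb);                                       // ocean
    }
}
```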
public static void main(String[] args)
Parameters:
args - The command line arguments.