DLESE Tools v1.6.0
java.lang.Object
    org.dlese.dpc.index.writer.FileIndexingServiceWriter
public abstract class FileIndexingServiceWriter
Abstract class for creating customized Lucene Documents for different file formats such as DLESE-IMS, ADN-item, ADN-collection, etc. Concrete subclasses may be used with a FileIndexingService to enable automatic updating of the index whenever the source file changes. This class, along with the FileIndexingService, may be used with a SimpleLuceneIndex to provide simple search support over files.
Note: after creating a new concrete FileIndexingServiceWriter, add a switch in RepositoryManager, method putDirInIndex(DirInfo, String), to select it for indexing.
The Lucene fields that are created by this class are:
doctype - The document format type (e.g. dlese_ims, adn, oai_dc, etc.) defined by concrete classes, with '0' appended to support wildcard searching.
readerclass - The class that is used to read typed Documents created by the concrete classes, for example "ItemDocReader".
default - The default field containing content added by concrete classes. Generally this is the field assigned in the Lucene index for default searching.
docsource - The absolute path to the file, which is used by the FileIndexingService for updating/deleting and may be used by beans or other classes that wish to have access to the source file.
docdir - The absolute path to the directory where the file resides, which is used by the FileIndexingService for updating/deleting and may be used by beans or other classes.
modtime - The file modification time, which is used by the FileIndexingService to determine whether the file has changed and needs to be updated, and may be used by beans or other classes that wish to query the modtime for the record.
filecontent - The full content of the file, stored but not indexed.
deleted - Set to 'true' if the file or record for this document has been deleted; otherwise this field does not exist. Stored.
valid - Set to 'true' if the file or record for this document is valid, otherwise 'false'. This field may also be omitted. Not stored.
validationreport - Contains a report that provides validation information about the underlying file. This field may be omitted. Not stored.
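The sketch below models how the type-, path- and time-based fields listed above could be derived from a source file. It is a simplified illustration, not the DLESE implementation: plain strings in a Map stand in for Lucene Field objects, StandardFieldsSketch and its methods are hypothetical names, and storing modtime as raw milliseconds is an assumption (the actual encoding is not documented here). The lowercase check reflects the requirement stated for getDocType().

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified model of some of the standard fields listed above.
// Plain strings in a Map stand in for Lucene Field objects.
class StandardFieldsSketch {

    // doctype: the concrete class's key with '0' appended for wildcard searching.
    static String docTypeField(String docType) {
        // getDocType() keys must be lowercase, so reject anything else here.
        if (!docType.equals(docType.toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException("doctype must be lowercase: " + docType);
        }
        return docType + "0";
    }

    // Collects doctype, readerclass, docsource, docdir and modtime for one file.
    static Map<String, String> standardFields(File sourceFile, String docType, String readerClass) {
        File abs = sourceFile.getAbsoluteFile();
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("doctype", docTypeField(docType));
        fields.put("readerclass", readerClass);
        fields.put("docsource", abs.getPath());   // absolute path to the file
        fields.put("docdir", abs.getParent());    // absolute path to its directory
        // Assumption: raw milliseconds; the real modtime encoding may differ.
        fields.put("modtime", Long.toString(abs.lastModified()));
        return fields;
    }
}
```

For example, standardFields(new File("records", "item.xml"), "adn", "ItemDocReader") yields a doctype of "adn0" and a docdir ending in "records".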
Constructor Summary | |
---|---|
FileIndexingServiceWriter() | |
Method Summary | |
---|---|
protected void | abortIndexing() - Aborts the indexing process by returning a null index document. |
protected abstract void | addCustomFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document previousRecordDoc, File sourceFile) - Adds additional custom fields that are unique to the document format being indexed. |
protected void | addDocToRemove(String field, String value) - Removes a matching item from the index during the FileIndexingService update. |
protected void | addToAdminDefaultField(String value) - Adds the given String to a text field referenced in the index by the field name 'admindefault'. |
protected void | addToDefaultField(String value) - Adds the given String to the 'default' and 'stems' fields as text and stemmed text, respectively. |
FileIndexingServiceData | create(File sourceFile, org.apache.lucene.document.Document existingLuceneDoc, FileIndexingPlugin plugin, HashMap sessionAttr) - Creates the Lucene Document for the given resource, or returns null if unable to create it. |
protected abstract void | destroy() - This method is called at the conclusion of processing and may be used for tear-down. |
HashMap | getConfigAttributes() - Gets the configuration attributes that were set when the writer was created. |
org.apache.lucene.document.Document | getDeletedDoc(org.apache.lucene.document.Document previousRecordDoc) - Creates a Lucene Document equal to the existing FileIndexingService Document, except that the field "deleted" is set to "true" and the field "modtime" is set to the current time. |
abstract String | getDocGroup() - Gets the specifier associated with this group of files, or null if no group association exists. |
String | getDocsource() - Gets the absolute path to the file, which is indexed under the 'docsource' field. |
abstract String | getDocType() - Gets a unique document type key for this kind of record, corresponding to the format type. |
String | getFileContent() - Gets the full content of the file as a String. |
FileIndexingPlugin | getFileIndexingPlugin() - Gets the FileIndexingPlugin that has been set for use during indexing, or null if none. |
FileIndexingService | getFileIndexingService() - Gets the fileIndexingService attribute of the FileIndexingServiceWriter object. |
org.apache.lucene.document.Document | getLuceneDoc() - Gets the Lucene Document that this Writer is building. |
org.apache.lucene.document.Document | getPreviousRecordDoc() - Gets the previous Document that currently resides in the index for the given resource, or null if none was previously present. |
abstract String | getReaderClass() - Gets the fully qualified name of the concrete DocReader class that is used to read this type of Document, for example "org.dlese.dpc.index.reader.ItemDocReader". |
HashMap | getSessionAttributes() - Gets a Map of attributes used in a single indexing session. |
File | getSourceDir() - Gets the sourceDir that holds the file being indexed. |
File | getSourceFile() - Gets the sourceFile that is being indexed. |
protected String | getValidationReport() - Gets a report detailing any errors found in the validation of the file, or null if no error was found. |
abstract void | init(File source, org.apache.lucene.document.Document previousRecordDoc) - This method is called prior to processing and may be used for any necessary set-up. |
protected boolean | isMakingDeletedDoc() - Returns true if a deleted document is being created in the current execution. |
boolean | isValidationEnabled() - Returns true if the files being indexed should be validated, otherwise false. |
protected void | prtln(String s) - Outputs a line of text to standard out, with datestamp, if debug is set to true. |
protected void | prtlnErr(String s) - Outputs a line of text to error out, with datestamp. |
void | setConfigAttributes(HashMap attributes) - Sets the configuration attributes; called by the factory method that creates the FileIndexingServiceWriter. |
static void | setDebug(boolean db) - Sets the debug attribute of the FileIndexingServiceWriter object. |
void | setFileIndexingPlugin(FileIndexingPlugin plugin) - Sets the FileIndexingPlugin that will be used during the indexing process to index additional fields. |
void | setFileIndexingService(FileIndexingService fileIndexingService) - Sets the fileIndexingService attribute of the FileIndexingServiceWriter object. |
protected void | setIsMakingDeletedDoc(boolean isMakingDeletedDoc) - Sets whether this DocWriter is making a deleted document. |
void | setValidationEnabled(boolean validateFiles) - Sets whether or not to validate the files being indexed and create a validation report, which is indexed. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public FileIndexingServiceWriter()
Method Detail |
---|
public abstract String getDocType() throws Exception
Gets a unique document type key for this kind of record, corresponding to the format type. The key is indexed using the StandardAnalyzer, so it must be lowercase and should not contain any stop words.
Specified by:
getDocType in interface DocWriter
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
public abstract String getDocGroup() throws Exception
Gets the specifier associated with this group of files, or null if no group association exists.
Throws:
Exception - If an error occurred.
public abstract String getReaderClass()
Gets the fully qualified name of the concrete DocReader class that is used to read this type of Document, for example "org.dlese.dpc.index.reader.ItemDocReader".
Specified by:
getReaderClass in interface DocWriter
Returns:
The name of the DocReader.
public abstract void init(File source, org.apache.lucene.document.Document previousRecordDoc) throws Exception
This method is called prior to processing and may be used for any necessary set-up.
… FileIndexingService.addDirectory(java.lang.String, java.lang.Class, java.util.HashMap, org.dlese.dpc.index.writer.FileIndexingPlugin, int) method.
Parameters:
source - The source file being indexed
previousRecordDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
Throws:
Exception - If an error occurred during set-up.
protected abstract void destroy()
This method is called at the conclusion of processing and may be used for tear-down.
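init() runs before processing, destroy() runs at its conclusion, and abortIndexing() causes a null document to be returned. The following is a minimal, self-contained model of that sequence under stated assumptions: a Map stands in for the Lucene Document, the hypothetical create() here takes only a File, and calling destroy() from a finally block is an assumed design choice, not documented behavior of FileIndexingServiceWriter.create().

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the writer lifecycle. A Map stands in for the Lucene
// Document, and create() takes only a File; the real signature is richer.
abstract class WriterLifecycleSketch {
    private boolean aborted;

    protected abstract void init(File source) throws Exception;          // set-up
    protected abstract void addCustomFields(Map<String, String> newDoc, File source) throws Exception;
    protected abstract void destroy();                                    // tear-down

    // Mirrors abortIndexing(): the current run will yield no document.
    protected void abortIndexing() {
        aborted = true;
    }

    // Returns the built document, or null if indexing was aborted.
    Map<String, String> create(File source) throws Exception {
        aborted = false;
        Map<String, String> doc = new HashMap<>();
        init(source);
        try {
            addCustomFields(doc, source);
        } finally {
            destroy(); // Assumption: tear-down runs even if addCustomFields() throws.
        }
        return aborted ? null : doc;
    }
}
```

A concrete writer implements the three abstract methods; calling abortIndexing() from addCustomFields() makes this model's create() return null, in line with create() returning null when it is unable to build a document.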
protected abstract void addCustomFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document previousRecordDoc, File sourceFile) throws Exception
Adds additional custom fields that are unique to the document format being indexed. Use the add method of the Document class to add a Field.
The following Lucene Field types are available for indexing with the Document:
Field.Text(String name, String value) -- tokenized, indexed, stored
Field.UnStored(String name, String value) -- tokenized, indexed, not stored
Field.Keyword(String name, String value) -- not tokenized, indexed, stored
Field.UnIndexed(String name, String value) -- not tokenized, not indexed, stored
Field(String name, String string, boolean store, boolean index, boolean tokenize) -- full control over storing, indexing and tokenizing
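Example code:
import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

protected void addCustomFields(Document newDoc, Document previousRecordDoc, File sourceFile) throws Exception {
    String customContent = "Some content";
    newDoc.add(Field.Text("mycustomfield", customContent)); // tokenized, indexed, stored
}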
Parameters:
newDoc - The new Document that is being created for this resource
previousRecordDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
sourceFile - The sourceFile that is being indexed
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
public String getFileContent() throws IOException
Gets the full content of the file as a String.
Throws:
IOException - If an error occurs.
public HashMap getConfigAttributes()
Gets the configuration attributes that were set when the writer was created.
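public void setConfigAttributes(HashMap attributes)
Sets the configuration attributes; called by the factory method that creates the FileIndexingServiceWriter.
Parameters:
attributes - The configuration attributes
public HashMap getSessionAttributes()
Gets a Map of attributes used in a single indexing session.
public File getSourceFile()
Gets the sourceFile that is being indexed.
public String getDocsource()
Gets the absolute path to the file, which is indexed under the 'docsource' field.
public File getSourceDir()
Gets the sourceDir that holds the file being indexed.
public org.apache.lucene.document.Document getLuceneDoc()
Gets the Lucene Document that this Writer is building.
public org.apache.lucene.document.Document getPreviousRecordDoc()
Gets the previous Document that currently resides in the index for the given resource, or null if none was previously present.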
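public void setFileIndexingService(FileIndexingService fileIndexingService)
Sets the fileIndexingService attribute of the FileIndexingServiceWriter object.
Parameters:
fileIndexingService - The new fileIndexingService.
public FileIndexingService getFileIndexingService()
Gets the fileIndexingService attribute of the FileIndexingServiceWriter object.
public boolean isValidationEnabled()
Returns true if the files being indexed should be validated, otherwise false.
public void setValidationEnabled(boolean validateFiles)
Sets whether or not to validate the files being indexed and create a validation report, which is indexed. This is set by the FileIndexingService prior to indexing. If true, the method getValidationReport() will be called; otherwise it will not.
Parameters:
validateFiles - True to validate, else false.
See Also:
getValidationReport(), FileIndexingService.setValidationEnabled(boolean validateFiles)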
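To illustrate how setValidationEnabled(), getValidationReport() and the 'valid' and 'validationreport' fields relate, here is a hypothetical sketch. It assumes that a null report means the file is valid, which follows from getValidationReport() returning null when no error was found. ValidationSketch and its Supplier parameter (standing in for getValidationReport()) are illustrative only, not part of the DLESE API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: derive the 'valid' and 'validationreport' field values.
// Assumption: a null report (no errors found) means the file is valid.
class ValidationSketch {
    static Map<String, String> validationFields(boolean validationEnabled, Supplier<String> reportSource) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (!validationEnabled) {
            return fields; // report is never requested; both fields are omitted
        }
        String report = reportSource.get(); // stands in for getValidationReport()
        fields.put("valid", report == null ? "true" : "false");
        if (report != null) {
            fields.put("validationreport", report);
        }
        return fields;
    }
}
```

Passing the report as a Supplier keeps the model faithful to the documented rule that getValidationReport() is not called at all when validation is disabled.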
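protected String getValidationReport() throws Exception
Gets a report detailing any errors found in the validation of the file, or null if no error was found.
Throws:
Exception - If an error occurs.
protected void addToDefaultField(String value)
Adds the given String to the 'default' and 'stems' fields as text and stemmed text, respectively.
Parameters:
value - A text string to be added to the indexed fields named 'default' and 'stems'
protected void addToAdminDefaultField(String value)
Adds the given String to a text field referenced in the index by the field name 'admindefault'.
Parameters:
value - A text string to be added to the indexed field named 'admindefault'.
public org.apache.lucene.document.Document getDeletedDoc(org.apache.lucene.document.Document previousRecordDoc) throws Throwable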
Creates a Lucene Document equal to the existing FileIndexingService Document, except that the field "deleted" is set to "true" and the field "modtime" is set to the current time.
Design note: Subclasses that require more involved logic for deletes should override this method. The overriding method should call this super method first and then check isMakingDeletedDoc() to act as appropriate.
Parameters:
previousRecordDoc - An existing FileIndexingService Document that currently resides in the index for the given file
Throws:
Throwable - Thrown if an error occurs
protected void setIsMakingDeletedDoc(boolean isMakingDeletedDoc)
Sets whether this DocWriter is making a deleted document. This is set by the getDeletedDoc method.
Parameters:
isMakingDeletedDoc - Sets the making deleted doc status
protected final boolean isMakingDeletedDoc()
Returns true if a deleted document is being created in the current execution.
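The design note for getDeletedDoc() describes an override pattern: call the super method first, then check isMakingDeletedDoc(). The following self-contained sketch models that pattern with a Map in place of a Lucene Document. BaseWriterSketch and ItemWriterSketch are hypothetical names, and removing 'filecontent' is only a placeholder for whatever format-specific delete logic a real subclass needs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the getDeletedDoc() design note. A Map stands in for a
// Lucene Document; the real signatures and field encodings differ.
class BaseWriterSketch {
    private boolean makingDeletedDoc;

    protected void setIsMakingDeletedDoc(boolean value) { makingDeletedDoc = value; }
    protected final boolean isMakingDeletedDoc() { return makingDeletedDoc; }

    // Copies the previous record, marks it deleted and stamps the current time.
    Map<String, String> getDeletedDoc(Map<String, String> previousRecordDoc) {
        setIsMakingDeletedDoc(true);
        Map<String, String> doc = new HashMap<>(previousRecordDoc);
        doc.put("deleted", "true");
        doc.put("modtime", Long.toString(System.currentTimeMillis()));
        return doc;
    }
}

class ItemWriterSketch extends BaseWriterSketch {
    @Override
    Map<String, String> getDeletedDoc(Map<String, String> previousRecordDoc) {
        Map<String, String> doc = super.getDeletedDoc(previousRecordDoc); // super method first
        if (isMakingDeletedDoc()) {
            // Placeholder for format-specific delete logic (hypothetical).
            doc.remove("filecontent");
        }
        return doc;
    }
}
```

The base method copies rather than mutates the previous record, so the document still in the index is left untouched until the writer replaces it.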
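protected void abortIndexing()
Aborts the indexing process by returning a null index document.
protected void addDocToRemove(String field, String value)
Removes a matching item from the index during the FileIndexingService update.
Parameters:
field - The field to search in.
value - The matching value for the item to remove.
public FileIndexingServiceData create(File sourceFile, org.apache.lucene.document.Document existingLuceneDoc, FileIndexingPlugin plugin, HashMap sessionAttr) throws Throwable
Creates the Lucene Document for the given resource, or returns null if unable to create it. This method is called by the class FileIndexingService.
Parameters:
sourceFile - The source file to be indexed
existingLuceneDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
plugin - The FileIndexingPlugin being used, or null
sessionAttr - Attributes used in a given indexing session
Throws:
Throwable - Thrown if an error occurs
public void setFileIndexingPlugin(FileIndexingPlugin plugin)
Sets the FileIndexingPlugin that will be used during the indexing process to index additional fields.
Parameters:
plugin - A FileIndexingPlugin to use during indexing.
public FileIndexingPlugin getFileIndexingPlugin()
Gets the FileIndexingPlugin that has been set for use during indexing, or null if none.
protected final void prtlnErr(String s)
Outputs a line of text to error out, with datestamp.
Parameters:
s - The text that will be output to error out.
protected final void prtln(String s)
Outputs a line of text to standard out, with datestamp, if debug is set to true.
Parameters:
s - The String that will be output.
public static final void setDebug(boolean db)
Sets the debug attribute of the FileIndexingServiceWriter object.
Parameters:
db - The new debug value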
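prtln(), prtlnErr() and setDebug() together form a simple debug switch. Because setDebug() is static, one flag governs every writer in the JVM. A rough sketch of that pattern follows. DebugLogSketch is a hypothetical class name, the datestamp format is an assumption, and prtln() returns a boolean here (unlike the real void method) only so its behavior can be checked.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical sketch of the prtln()/prtlnErr()/setDebug() pattern.
class DebugLogSketch {
    // Static, so one switch is shared by every writer, as with setDebug().
    private static boolean debug = true;

    static void setDebug(boolean db) {
        debug = db;
    }

    // Prefixes a datestamp; the format is an assumption.
    static String stamp(String s) {
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date()) + " " + s;
    }

    // Prints to standard out only when debug is on; returns whether it printed.
    static boolean prtln(String s) {
        if (!debug) {
            return false;
        }
        System.out.println(stamp(s));
        return true;
    }

    // Always prints to standard error, regardless of the debug flag.
    static void prtlnErr(String s) {
        System.err.println(stamp(s));
    }
}
```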