DLESE Tools
v1.6.0

org.dlese.dpc.index.writer
Class SimpleXMLFileIndexingWriter

java.lang.Object
  extended by org.dlese.dpc.index.writer.FileIndexingServiceWriter
      extended by org.dlese.dpc.index.writer.XMLFileIndexingWriter
          extended by org.dlese.dpc.index.writer.SimpleXMLFileIndexingWriter
All Implemented Interfaces:
DocWriter

public class SimpleXMLFileIndexingWriter
extends XMLFileIndexingWriter

This is the default writer for generic XML formats. Creates a Lucene Document from any valid XML file by stripping the XML tags to extract and index the content. The full content of all Elements and Attributes is indexed in the default and admindefault fields and is stemmed and indexed in the stems field. The reader for this type of Document is XMLDocReader.

Author:
John Weatherley
See Also:
FileIndexingService, XMLDocReader
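
The tag-stripping idea in the class description can be illustrated with a minimal sketch that uses only the JDK's DOM parser. This is not the DLESE implementation: the class name XmlTextExtractor and the method extractText are hypothetical, and stemming and the Lucene field handling are left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of DLESE Tools.
class XmlTextExtractor {

    /**
     * Returns the attribute values and element text of an XML document,
     * separated by single spaces. Elements are visited in document order;
     * the order of attributes within one element is parser dependent.
     */
    static String extractText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            collect(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML: " + e.getMessage(), e);
        }
    }

    // Depth-first walk: attribute values first, then text and child elements.
    private static void collect(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.ELEMENT_NODE) {
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                append(attrs.item(i).getNodeValue(), out);
            }
        } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            append(node.getNodeValue(), out);
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    // Skips whitespace-only values and joins the rest with a single space.
    private static void append(String value, StringBuilder out) {
        String trimmed = (value == null) ? "" : value.trim();
        if (trimmed.isEmpty()) {
            return;
        }
        if (out.length() > 0) {
            out.append(' ');
        }
        out.append(trimmed);
    }

    public static void main(String[] args) {
        System.out.println(extractText("<record id=\"r1\"><title>Ocean currents</title></record>"));
    }
}
```

In the real writer the extracted text would feed the default, admindefault, and stems fields; the sketch stops at producing the plain string.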

Constructor Summary
SimpleXMLFileIndexingWriter()
          Constructor for the SimpleXMLFileIndexingWriter object
 
Method Summary
protected  String[] _getIds()
          Returns null so that ID handling is delegated to the superclass.
protected  void addFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, File sourceFile)
          Nothing to do here.
protected  void destroy()
          Does nothing.
 String getDescription()
          Gets the description attribute of the SimpleXMLFileIndexingWriter object
 String getDocType()
          Gets the XML format for this document, for example "oai_dc", "adn", "dlese_ims", or "dlese_anno".
 String getReaderClass()
          Gets the name of the concrete DocReader class that is used to read this type of Document, which is "org.dlese.dpc.index.reader.XMLDocReader".
 String getTitle()
          Gets the title attribute of the SimpleXMLFileIndexingWriter object
 String[] getUrls()
          Gets the urls attribute of the SimpleXMLFileIndexingWriter object
protected  String getValidationReport()
          Gets a report detailing any errors found in the validation of the data, or null if no error was found.
protected  Date getWhatsNewDate()
          Returns the date used to determine "What's new" in the library, which is null (unknown).
protected  String getWhatsNewType()
          Returns null (unknown).
 boolean indexFullContentInDefaultAndStems()
          Indicates that the entire XML content should be placed in the default and stems search fields.
 void init(File sourceFile, org.apache.lucene.document.Document existingDoc)
          This method is called prior to processing and may be used for any necessary set-up.
 
Methods inherited from class org.dlese.dpc.index.writer.XMLFileIndexingWriter
addCustomFields, getBoundingBox, getCollections, getDeletedDoc, getDocGroup, getDom4jDoc, getFieldContent, getFieldContent, getFieldName, getIds, getIndex, getMyAnnoResultDocs, getMyCollectionDoc, getOaiModtime, getPrimaryId, getRecordDataService, getRelatedIds, getRelatedIdsMap, getRelatedUrls, getRelatedUrlsMap, getTermStringFromStringArray, getXmlIndexer, getXmlIndexerFieldsConfig
 
Methods inherited from class org.dlese.dpc.index.writer.FileIndexingServiceWriter
abortIndexing, addDocToRemove, addToAdminDefaultField, addToDefaultField, create, getConfigAttributes, getDocsource, getFileContent, getFileIndexingPlugin, getFileIndexingService, getLuceneDoc, getPreviousRecordDoc, getSessionAttributes, getSourceDir, getSourceFile, isMakingDeletedDoc, isValidationEnabled, prtln, prtlnErr, setConfigAttributes, setDebug, setFileIndexingPlugin, setFileIndexingService, setIsMakingDeletedDoc, setValidationEnabled
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimpleXMLFileIndexingWriter

public SimpleXMLFileIndexingWriter()
Constructor for the SimpleXMLFileIndexingWriter object

Method Detail

getDocType

public String getDocType()
                  throws Exception
Gets the XML format for this document, for example "oai_dc", "adn", "dlese_ims", or "dlese_anno".

Specified by:
getDocType in interface DocWriter
Specified by:
getDocType in class FileIndexingServiceWriter
Returns:
The docType value
Throws:
Exception - If an error occurs.

getReaderClass

public String getReaderClass()
Gets the name of the concrete DocReader class that is used to read this type of Document, which is "org.dlese.dpc.index.reader.XMLDocReader".

Specified by:
getReaderClass in interface DocWriter
Specified by:
getReaderClass in class FileIndexingServiceWriter
Returns:
The String "org.dlese.dpc.index.reader.XMLDocReader".

init

public void init(File sourceFile,
                 org.apache.lucene.document.Document existingDoc)
          throws Exception
This method is called prior to processing and may be used for any necessary set-up. This method should throw an Exception with an appropriate message if an error occurs.

Specified by:
init in class XMLFileIndexingWriter
Parameters:
sourceFile - The sourceFile being indexed.
existingDoc - The Document that already exists in the index for this file.
Throws:
Exception - If an error occurs.

getWhatsNewDate

protected Date getWhatsNewDate()
                        throws Exception
Returns the date used to determine "What's new" in the library, which is null (unknown).

Specified by:
getWhatsNewDate in class XMLFileIndexingWriter
Returns:
The what's new date for the item, which is null for this writer.
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getWhatsNewType

protected String getWhatsNewType()
                          throws Exception
Returns null (unknown).

Specified by:
getWhatsNewType in class XMLFileIndexingWriter
Returns:
null.
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

destroy

protected void destroy()
Does nothing.

Specified by:
destroy in class FileIndexingServiceWriter

getValidationReport

protected String getValidationReport()
                              throws Exception
Gets a report detailing any errors found in the validation of the data, or null if no error was found. This method performs schema validation over the XML.

Overrides:
getValidationReport in class FileIndexingServiceWriter
Returns:
Null if no data validation errors were found, otherwise a String that details the nature of the error.
Throws:
Exception - If an error occurs while performing the validation.
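
The contract described above (null when the data is valid, otherwise a String describing the problem) can be sketched with the JDK's javax.xml.validation API. This is an illustration under assumptions, not the DLESE implementation: the class ValidationReportSketch, the static method validationReport, and passing the schema as an in-memory string are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of DLESE Tools.
class ValidationReportSketch {

    /**
     * Validates xml against the W3C XML Schema in xsd.
     * Returns null if the document is valid, otherwise the validator's message.
     * Throws IllegalStateException if the validation itself cannot be performed.
     */
    static String validationReport(String xml, String xsd) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // A broken schema is a failure to validate, not a data error.
            throw new IllegalStateException("Could not load schema: " + e.getMessage(), e);
        }
        try {
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return null; // no data validation errors
        } catch (SAXException e) {
            // With no ErrorHandler set, the first error is thrown as a SAXException.
            return (e.getMessage() != null) ? e.getMessage() : e.toString();
        } catch (IOException e) {
            throw new IllegalStateException("Could not read the XML input", e);
        }
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:element name=\"title\" type=\"xs:string\"/></xs:schema>";
        System.out.println(validationReport("<title>Ocean currents</title>", xsd));
        System.out.println(validationReport("<name>Ocean currents</name>", xsd));
    }
}
```

Separating schema-loading failures from document errors mirrors the distinction the method documentation draws between a thrown Exception and a returned report.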

_getIds

protected String[] _getIds()
Returns null so that ID handling is delegated to the superclass.

Specified by:
_getIds in class XMLFileIndexingWriter
Returns:
Null

getUrls

public String[] getUrls()
Gets the urls attribute of the SimpleXMLFileIndexingWriter object

Specified by:
getUrls in class XMLFileIndexingWriter
Returns:
The urls value

getDescription

public String getDescription()
Gets the description attribute of the SimpleXMLFileIndexingWriter object

Specified by:
getDescription in class XMLFileIndexingWriter
Returns:
The description value

getTitle

public String getTitle()
Gets the title attribute of the SimpleXMLFileIndexingWriter object

Specified by:
getTitle in class XMLFileIndexingWriter
Returns:
The title value

indexFullContentInDefaultAndStems

public boolean indexFullContentInDefaultAndStems()
Indicates that the entire XML content should be placed in the default and stems search fields.

Specified by:
indexFullContentInDefaultAndStems in class XMLFileIndexingWriter
Returns:
True

addFields

protected void addFields(org.apache.lucene.document.Document newDoc,
                         org.apache.lucene.document.Document existingDoc,
                         File sourceFile)
                  throws Exception
Nothing to do here. All functionality handled by super.

Specified by:
addFields in class XMLFileIndexingWriter
Parameters:
newDoc - The new Document that is being created for this resource
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
sourceFile - The source file being indexed
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
