DLESE Tools
v1.6.0

org.dlese.dpc.index.writer
Class ItemFileIndexingWriter

java.lang.Object
  extended by org.dlese.dpc.index.writer.FileIndexingServiceWriter
      extended by org.dlese.dpc.index.writer.XMLFileIndexingWriter
          extended by org.dlese.dpc.index.writer.ItemFileIndexingWriter
All Implemented Interfaces:
DocWriter
Direct Known Subclasses:
ADNFileIndexingWriter, DleseIMSFileIndexingWriter

public abstract class ItemFileIndexingWriter
extends XMLFileIndexingWriter

Abstract class for writing a Lucene Document for a collection of item-level metadata records of a specific format (DLESE IMS, ADN-Item, ADN-Collection, etc). The reader for this type of Document is XMLDocReader or ItemDocReader.


The Lucene Document fields that are created by this class are (in addition to the ones listed for FileIndexingServiceWriter):

title - The title for the resource. Stored.
description - The description for the resource. Stored.
url - The URL to the resource. Stored.
Stored. Prefixed with a '0' to support wildcard searching.
metadatapfx - The metadata prefix (format) for this record, for example 'adn' or 'oai_dc'. Stored. Prefixed with a '0' to support wildcard searching.
accessionstatus - The accession status for this record. Stored. Prefixed with a '0' to support wildcard searching.
annotypes - Annotation types that refer to this record. Keyword.
annopathways - Annotation pathways that refer to this record. Keyword.
associatedids - A list of record IDs that refer to the same resource. Keyword.
valid - Indicates whether the record is valid [true | false]. Not stored.
validationreport - Text describing an error in the validation of the data for this record. Stored. Only indexed if there was a validation error indicated by the valid field containing false.
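
The '0' prefix described for several of the fields above can be pictured with a minimal sketch. The class and method names are hypothetical and are not part of the DLESE API; the sketch only assumes what the field descriptions state, namely that a literal '0' is placed at the start of the value. One plausible reading, also an assumption, is that a query such as 0* can then match every value in the field.

```java
// Hypothetical helper for illustration; not part of the DLESE API.
final class WildcardPrefix {
    private static final String PREFIX = "0";

    /** Returns the value as it would be indexed: '0' followed by the raw value. */
    static String encode(String raw) {
        return PREFIX + (raw == null ? "" : raw);
    }

    /** Strips the leading '0' again, returning the original value. */
    static String decode(String indexed) {
        if (indexed == null || !indexed.startsWith(PREFIX)) {
            return indexed;
        }
        return indexed.substring(PREFIX.length());
    }
}
```

A reader of the stored field would need to remove the prefix before displaying the value, which is what decode illustrates.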

Author:
John Weatherley
See Also:
ItemDocReader, XMLDocReader, RecordDataService, FileIndexingServiceWriter

Constructor Summary
ItemFileIndexingWriter()
           
 
Method Summary
protected  void addFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, File sourceFile)
          Adds fields to the index that are common to all item-level documents.
protected abstract  void addFrameworkFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc)
          Adds fields to the index that are unique to the given framework.
protected abstract  void destroy()
          This method is called at the conclusion of processing and may be used for tear-down.
protected abstract  Date getAccessionDate()
          Returns the accession date for the item, or null if this item is not accessioned.
protected abstract  String getAccessionStatus()
          Returns the accession status of this record, for example 'accessioned'.
protected abstract  MmdRec[] getAllMmdRecs()
          Returns the MmdRecs for all records associated with this resource, including myMmdRec.
protected abstract  MmdRec[] getAssociatedMmdRecs()
          Returns the MmdRecs for records in other collections that catalog the same resource.
protected abstract  String getContent()
          Returns the content of the item this record catalogs, or null if not available.
protected abstract  String getContentType()
          Returns the content type of the item this record catalogs, or null if not available.
protected abstract  Date getCreationDate()
          Returns the date this item was first created, or null if not available.
protected abstract  String getCreator()
          Returns the item creator's full name.
protected abstract  String getCreatorLastName()
          Returns the item creator's last name.
abstract  String getDocType()
          Returns a unique document type key for this kind of record, corresponding to the format type.
protected abstract  boolean getHasRelatedResource()
          Returns true if the item has one or more related resources, false otherwise.
protected abstract  String getKeywords()
          Returns the item's keywords sorted and separated by the '+' symbol.
protected  ResultDocList getMyAnnoResultDocs()
          Gets the annotations for this record, null or zero length if none available.
protected abstract  MmdRec getMyMmdRec()
          Returns the MmdRec for this record only.
abstract  String getReaderClass()
          Gets the fully qualified name of the concrete DocReader class that is used to read this type of Document, for example "org.dlese.dpc.index.reader.ItemDocReader".
protected abstract  String[] getRelatedResourceIds()
          Returns the IDs of related resources that are cataloged by ID, or null if none are present
protected abstract  String[] getRelatedResourceUrls()
          Returns the URLs of related resources that are cataloged by URL, or null if none are present
protected abstract  String getValidationReport()
          Gets a report detailing any errors found in the validation of the data, or null if no error was found.
protected  Date getWhatsNewDate()
          Returns the date used to determine "What's new" in the library, which is the item's accession date.
protected  String getWhatsNewType()
          Returns 'itemnew', 'itemannoinprogress', or 'itemannocomplete', whichever came most recently.
 void init(File source, org.apache.lucene.document.Document existingDoc)
          Initialize the subclasses and record data service data.
abstract  void initItem(File source, org.apache.lucene.document.Document existingDoc)
          This method is called prior to processing and may be used for any necessary set-up.
 
Methods inherited from class org.dlese.dpc.index.writer.XMLFileIndexingWriter
_getIds, addCustomFields, getBoundingBox, getCollections, getDeletedDoc, getDescription, getDocGroup, getDom4jDoc, getFieldContent, getFieldContent, getFieldName, getIds, getIndex, getMyCollectionDoc, getOaiModtime, getPrimaryId, getRecordDataService, getRelatedIds, getRelatedIdsMap, getRelatedUrls, getRelatedUrlsMap, getTermStringFromStringArray, getTitle, getUrls, getXmlIndexer, getXmlIndexerFieldsConfig, indexFullContentInDefaultAndStems
 
Methods inherited from class org.dlese.dpc.index.writer.FileIndexingServiceWriter
abortIndexing, addDocToRemove, addToAdminDefaultField, addToDefaultField, create, getConfigAttributes, getDocsource, getFileContent, getFileIndexingPlugin, getFileIndexingService, getLuceneDoc, getPreviousRecordDoc, getSessionAttributes, getSourceDir, getSourceFile, isMakingDeletedDoc, isValidationEnabled, prtln, prtlnErr, setConfigAttributes, setDebug, setFileIndexingPlugin, setFileIndexingService, setIsMakingDeletedDoc, setValidationEnabled
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ItemFileIndexingWriter

public ItemFileIndexingWriter()
Method Detail

getKeywords

protected abstract String getKeywords()
                               throws Exception
Returns the item's keywords sorted and separated by the '+' symbol. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'keywords' and is also indexed in the 'default' field.

Returns:
The keywords String
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
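
As an illustration of the return format described for getKeywords, the following sketch sorts a list of keywords and joins them with '+'. The class name and the choice of natural (case-sensitive) String ordering are assumptions; the documentation above does not specify the sort order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of the DLESE API.
final class KeywordJoiner {
    /** Returns the keywords sorted and separated by '+', or an empty String if there are none. */
    static String join(List<String> keywords) {
        if (keywords == null || keywords.isEmpty()) {
            return ""; // an empty String is documented as acceptable
        }
        List<String> sorted = new ArrayList<>(keywords); // leave the caller's list untouched
        Collections.sort(sorted);                        // natural ordering (an assumption)
        return String.join("+", sorted);
    }
}
```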

getCreatorLastName

protected abstract String getCreatorLastName()
                                      throws Exception
Returns the item creator's last name. An empty String or null is acceptable. The String is tokenized, stored and indexed under the 'default' field only.

Returns:
The creator's last name String
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getCreator

protected abstract String getCreator()
                              throws Exception
Returns the item creator's full name. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'creator'.

Returns:
Creator's full name
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getAccessionStatus

protected abstract String getAccessionStatus()
                                      throws Exception
Returns the accession status of this record, for example 'accessioned'. The String is tokenized, stored and indexed under the field key 'accessionstatus'.

Returns:
The accession status.
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getAccessionDate

protected abstract Date getAccessionDate()
                                  throws Exception
Returns the accession date for the item, or null if this item is not accessioned.

Returns:
The accession date for the item, or null if this item is not accessioned.
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getCreationDate

protected abstract Date getCreationDate()
                                 throws Exception
Returns the date this item was first created, or null if not available.

Returns:
The item creation date or null
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getContent

protected abstract String getContent()
Returns the content of the item this record catalogs, or null if not available. For example, the full HTML text of a Web page.

Returns:
The content of the item, or null

getAssociatedMmdRecs

protected abstract MmdRec[] getAssociatedMmdRecs()
Returns the MmdRecs for records in other collections that catalog the same resource. Does not include myMmdRec.

Returns:
The associated MmdRecs, null or empty if none

getAllMmdRecs

protected abstract MmdRec[] getAllMmdRecs()
Returns the MmdRecs for all records associated with this resource, including myMmdRec.

Returns:
All MmdRecs for this resource, null or empty if none

getMyMmdRec

protected abstract MmdRec getMyMmdRec()
Returns the MmdRec for this record only.

Returns:
The MmdRec for this record, or null

getContentType

protected abstract String getContentType()
Returns the content type of the item this record catalogs, or null if not available. For example "text/html" or "html".

Returns:
The content type of the item, or null

getHasRelatedResource

protected abstract boolean getHasRelatedResource()
                                          throws Exception
Returns true if the item has one or more related resources, false otherwise.

Returns:
True if the item has one or more related resources, false otherwise.
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getRelatedResourceIds

protected abstract String[] getRelatedResourceIds()
                                           throws Exception
Returns the IDs of related resources that are cataloged by ID, or null if none are present.

Returns:
Related resource IDs, or null if none are available
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getRelatedResourceUrls

protected abstract String[] getRelatedResourceUrls()
                                            throws Exception
Returns the URLs of related resources that are cataloged by URL, or null if none are present.

Returns:
Related resource URLs, or null if none are available
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

addFrameworkFields

protected abstract void addFrameworkFields(org.apache.lucene.document.Document newDoc,
                                           org.apache.lucene.document.Document existingDoc)
                                    throws Exception
Adds fields to the index that are unique to the given framework.

Example code:
protected void addFrameworkFields(Document newDoc, Document existingDoc) throws Exception {
  String customContent = "Some content";
  // Field's store/index arguments vary by Lucene version; supply the ones your version requires.
  newDoc.add(new Field("mycustomfield", customContent));
}

Parameters:
newDoc - The new Document that is being created for this resource
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getDocType

public abstract String getDocType()
                           throws Exception
Returns a unique document type key for this kind of record, corresponding to the format type. For example "adn", "dlese_ims", or "dlese_anno". The string is parsed using the Lucene StandardAnalyzer, so it must be lowercase and should not contain any stop words.

Specified by:
getDocType in interface DocWriter
Specified by:
getDocType in class FileIndexingServiceWriter
Returns:
The docType String
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
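
The constraints on the docType key (lowercase, no stop words) can be checked up front with something like the sketch below. Everything in it is an assumption made for illustration: the class name, the small stop-word subset, and the choice to split the key on non-alphanumeric characters. The stop-word list and tokenization actually applied depend on the Lucene version in use.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical pre-flight check; not part of the DLESE or Lucene APIs.
final class DocTypeCheck {
    // Illustrative subset of English stop words; the analyzer's real list may differ.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "or", "the", "to"));

    /** Returns true if the key is non-empty, lowercase, and contains no listed stop word. */
    static boolean isUsable(String docType) {
        if (docType == null || docType.isEmpty()) {
            return false;
        }
        if (!docType.equals(docType.toLowerCase(Locale.ROOT))) {
            return false; // must already be lowercase
        }
        // Assumption: treat any non-alphanumeric character as a token boundary.
        for (String part : docType.split("[^a-z0-9]+")) {
            if (STOP_WORDS.contains(part)) {
                return false;
            }
        }
        return true;
    }
}
```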

getReaderClass

public abstract String getReaderClass()
Gets the fully qualified name of the concrete DocReader class that is used to read this type of Document, for example "org.dlese.dpc.index.reader.ItemDocReader".

Specified by:
getReaderClass in interface DocWriter
Specified by:
getReaderClass in class FileIndexingServiceWriter
Returns:
The name of the DocReader.

initItem

public abstract void initItem(File source,
                              org.apache.lucene.document.Document existingDoc)
                       throws Exception
This method is called prior to processing and may be used for any necessary set-up. This method should throw an Exception with an appropriate message if an error occurs.

Parameters:
source - The source file being indexed
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
Throws:
Exception - If an error occurred during set-up.

destroy

protected abstract void destroy()
This method is called at the conclusion of processing and may be used for tear-down.

Specified by:
destroy in class FileIndexingServiceWriter

getValidationReport

protected abstract String getValidationReport()
                                       throws Exception
Gets a report detailing any errors found in the validation of the data, or null if no error was found. This could be implemented by simply performing XML schema validation on the file, or can involve more customized validation of the data if necessary. This method is called after all other methods that access the data (XMLFileIndexingWriter.getTitle(), addFrameworkFields(Document, Document), etc.) so that data verification can be done during those calls, if needed.

Overrides:
getValidationReport in class FileIndexingServiceWriter
Returns:
Null if no data validation errors were found, otherwise a String that details the nature of the error.
Throws:
Exception - If error in performing the validation.
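
The null-or-report contract of getValidationReport can be sketched with the JDK's own XML parser. This example only checks well-formedness, which is weaker than the XML schema validation mentioned above; the class name and the report wording are assumptions, and a real implementation would typically validate against the framework's schema instead.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical stand-in for a getValidationReport() implementation.
final class ValidationReportSketch {
    /** Returns null if the XML parses cleanly, otherwise a String describing the error. */
    static String report(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return null; // no error found
        } catch (SAXException | IOException e) {
            return "Validation error: " + e.getMessage();
        } catch (ParserConfigurationException e) {
            // The parser itself could not be created; this is not a data error.
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }
}
```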

init

public void init(File source,
                 org.apache.lucene.document.Document existingDoc)
          throws Exception
Initialize the subclasses and record data service data.

Specified by:
init in class XMLFileIndexingWriter
Parameters:
source - The source file being indexed.
existingDoc - A Document that previously existed in the index for this item, if present
Throws:
Exception - Thrown if error reading the XML map

getMyAnnoResultDocs

protected ResultDocList getMyAnnoResultDocs()
                                     throws Exception
Gets the annotations for this record, null or zero length if none available. Overrides method in XMLFileIndexingWriter because IDs need initializing.

Overrides:
getMyAnnoResultDocs in class XMLFileIndexingWriter
Returns:
The myAnnoResultDocs value
Throws:
Exception - If error

addFields

protected final void addFields(org.apache.lucene.document.Document newDoc,
                               org.apache.lucene.document.Document existingDoc,
                               File sourceFile)
                        throws Exception
Adds fields to the index that are common to all item-level documents. These include the title, description, id and url as well as accession status, annotation references, and collection(s).

Specified by:
addFields in class XMLFileIndexingWriter
Parameters:
newDoc - The new Document that is being created for this resource
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
sourceFile - The sourceFile that is being indexed.
Throws:
Exception - If an error occurs
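
Because addFields is final while addFrameworkFields is abstract, the class reads as a template method: the common fields are written in one fixed place and subclasses contribute only their framework-specific fields. The sketch below illustrates that shape with stand-in types. A Map plays the role of the Lucene Document, all class names are hypothetical, and the assumption that addFields invokes addFrameworkFields is an inference from the signatures, not something the documentation above states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for the abstract writer; a Map plays the role of the Lucene Document.
abstract class ItemWriterSketch {
    protected abstract String getAccessionStatus();

    protected abstract void addFrameworkFields(Map<String, String> newDoc);

    // Final, like addFields: common fields first, then the subclass hook (assumed order).
    protected final void addFields(Map<String, String> newDoc) {
        newDoc.put("accessionstatus", "0" + getAccessionStatus());
        addFrameworkFields(newDoc);
    }
}

// Hypothetical concrete writer that supplies only framework-specific pieces.
final class AdnWriterSketch extends ItemWriterSketch {
    @Override
    protected String getAccessionStatus() {
        return "accessioned";
    }

    @Override
    protected void addFrameworkFields(Map<String, String> newDoc) {
        newDoc.put("mycustomfield", "Some content");
    }

    /** Builds a document the way the indexing service might drive the writer. */
    static Map<String, String> build() {
        Map<String, String> doc = new LinkedHashMap<>();
        new AdnWriterSketch().addFields(doc);
        return doc;
    }
}
```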

getWhatsNewDate

protected Date getWhatsNewDate()
                        throws Exception
Returns the date used to determine "What's new" in the library, which is the item's accession date.

Specified by:
getWhatsNewDate in class XMLFileIndexingWriter
Returns:
The what's new date for the item
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getWhatsNewType

protected String getWhatsNewType()
                          throws Exception
Returns 'itemnew', 'itemannoinprogress', or 'itemannocomplete', whichever came most recently.

Specified by:
getWhatsNewType in class XMLFileIndexingWriter
Returns:
The string 'itemnew' or 'itemannoinprogress' or 'itemannocomplete'.
Throws:
Exception - If error getting the what's new type.
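
The "whichever came most recently" rule for getWhatsNewType can be sketched as a comparison of three dates. The class name, the parameter names, and the decision to return null when no date is available are assumptions; only the three type strings come from the documentation above.

```java
import java.util.Date;

// Hypothetical helper for illustration; not part of the DLESE API.
final class WhatsNewSketch {
    /** Returns the type whose date is latest; null dates are ignored. */
    static String mostRecentType(Date itemNew, Date annoInProgress, Date annoComplete) {
        Date[] dates = {itemNew, annoInProgress, annoComplete};
        String[] types = {"itemnew", "itemannoinprogress", "itemannocomplete"};

        String latestType = null; // stays null if every date is null (an assumption)
        Date latest = null;
        for (int i = 0; i < dates.length; i++) {
            if (dates[i] != null && (latest == null || dates[i].after(latest))) {
                latest = dates[i];
                latestType = types[i];
            }
        }
        return latestType;
    }
}
```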
