DLESE Tools
v1.6.0

org.dlese.dpc.index.writer
Class ADNFileIndexingWriter

java.lang.Object
  extended by org.dlese.dpc.index.writer.FileIndexingServiceWriter
      extended by org.dlese.dpc.index.writer.XMLFileIndexingWriter
          extended by org.dlese.dpc.index.writer.ItemFileIndexingWriter
              extended by org.dlese.dpc.index.writer.ADNFileIndexingWriter
All Implemented Interfaces:
DocWriter

public class ADNFileIndexingWriter
extends ItemFileIndexingWriter

Creates a Lucene Document from an ADN-item metadata source file.

The Lucene Document fields created by this class, in addition to the ones listed for FileIndexingServiceWriter, are:

doctype - Set to 'adn'. Stored. Note: the actual indexing of this field happens in the superclass FileIndexingServiceWriter.
additional fields - A number of additional fields are defined. See the Java code for method addFrameworkFields(Document, Document) for details.

Author:
John Weatherley, Ryan Deardorff

Constructor Summary
ADNFileIndexingWriter()
          Create an ADNFileIndexingWriter that indexes the given collection in field collection.
ADNFileIndexingWriter(boolean isDupDoc)
          Create an ADNFileIndexingWriter that indexes the given collection in field collection.
 
Method Summary
protected  String[] _getIds()
          Gets the id(s) for this item.
protected  void addFrameworkFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc)
          Adds custom fields to the index that are unique to this framework.
protected  void destroy()
          Release map resources for GC after processing.
protected  void finalize()
          Perform finalization...
protected  Date getAccessionDate()
          Returns the accession date for the item, or null if this item is not accessioned.
protected  String getAccessionStatus()
          Returns the accession status of this record, for example 'accessioned'.
protected  MmdRec[] getAllMmdRecs()
          Returns the MmdRecs for all records that catalog this resource, including myMmdRec.
protected  MmdRec[] getAssociatedMmdRecs()
          Returns the MmdRecs for records in other collections that catalog the same resource, not including myMmdRec.
protected  String getAudienceBeneficiary()
          The audience beneficiary.
protected  String getAudienceInstructionalGoal()
          The audience instructionalGoal.
protected  String getAudienceTeachingMethod()
          The audience teachingMethod.
protected  String getAudienceToolFor()
          The audience tool for.
protected  String getAudienceTypicalAgeRange()
          The audience typical age range.
protected  BoundingBox getBoundingBox()
          Gets the boundingBox attribute of the ADNFileIndexingWriter object
 String[] getCollections()
          Returns unique collection keys for the item being indexed, separated by spaces.
protected  String getContent()
          Returns the content of the item this record catalogs, or null if not available.
protected  String[] getContentStandards()
          Gets the contentStandards attribute of the ADNFileIndexingWriter object
protected  String getContentType()
          Returns the content type of the item this record catalogs, or null if not available.
protected  String getCost()
          Returns the item's cost.
protected  Date getCreationDate()
          Returns the date this item was first created, or null if not available.
protected  String getCreator()
          Returns the item creator's full name.
protected  String getCreatorEmailAlt()
          Gets the creator's alternate email.
protected  String getCreatorEmailPrimary()
          Gets the creator's primary email.
protected  String getCreatorLastName()
          Returns the item creator's last name.
 String getDescription()
          Gets the description attribute of the ADNFileIndexingWriter object
 String getDocType()
          Gets the docType attribute of the ADNFileIndexingWriter, which is 'adn'.
protected  String getEventNames()
          Gets all event names as text.
protected  String[] getGradeRange()
          Gets the gradeRange attribute of the ADNFileIndexingWriter object
protected  boolean getHasRelatedResource()
          Returns true if the item has one or more related resources, false otherwise.
protected  String getKeywords()
          Returns the item's keywords sorted and separated by the '+' symbol.
protected  MmdRec getMyMmdRec()
          Returns the MmdRec for this record only.
static long getNumInstances()
          Gets the numInstances attribute of the ADNFileIndexingWriter class
protected  String getOrganizationEmail()
          Gets the organization email.
protected  String getOrganizationInstDepartment()
          Gets the organization's institution department name.
protected  String getOrganizationInstName()
          Gets the organization's institution name.
protected  String getPersonInstDepartment()
          Gets the person's institution department name.
protected  String getPersonInstName()
          Gets the person's institution name.
protected  String getPlaceNames()
          Gets all place names as text.
 String getReaderClass()
          Gets the name of the concrete DocReader class that is used to read this type of Document, which is "ItemDocReader".
protected  String[] getRelatedResourceIds()
          Returns the IDs of related resources that are cataloged by ID, or null if none are present
protected  String[] getRelatedResourceUrls()
          Returns the URLs of related resources that are cataloged by URL, or null if none are present
protected  String[] getResourceTypes()
          Gets the resourceTypes attribute of the ADNFileIndexingWriter object
protected  String[] getSubjects()
          Gets the subjects attribute of the ADNFileIndexingWriter object
protected  String getTemporalCoverageNames()
          Gets all temporal coverage names as text.
 String getTitle()
          Gets the title attribute of the ADNFileIndexingWriter object
protected  String getUrlMirrors()
          Gets the mirror URLs encoded as terms, if any.
 String[] getUrls()
          Gets the url(s) from the ADN record(s).
protected  String getValidationReport()
          Gets a report detailing any errors found in the validation of the data, or null if no error was found.
 boolean indexFullContentInDefaultAndStems()
          Default and stems fields handled here, so do not index full content.
 void initItem(File source, org.apache.lucene.document.Document existingDoc)
          Initialize the XML map, MmdRecs and other data prior to processing
 void setIsSingleDoc(boolean isSingleDoc)
          Sets whether this writer should write a single record doc rather than a multi-item doc.
 
Methods inherited from class org.dlese.dpc.index.writer.ItemFileIndexingWriter
addFields, getMyAnnoResultDocs, getWhatsNewDate, getWhatsNewType, init
 
Methods inherited from class org.dlese.dpc.index.writer.XMLFileIndexingWriter
addCustomFields, getDeletedDoc, getDocGroup, getDom4jDoc, getFieldContent, getFieldContent, getFieldName, getIds, getIndex, getMyCollectionDoc, getOaiModtime, getPrimaryId, getRecordDataService, getRelatedIds, getRelatedIdsMap, getRelatedUrls, getRelatedUrlsMap, getTermStringFromStringArray, getXmlIndexer, getXmlIndexerFieldsConfig
 
Methods inherited from class org.dlese.dpc.index.writer.FileIndexingServiceWriter
abortIndexing, addDocToRemove, addToAdminDefaultField, addToDefaultField, create, getConfigAttributes, getDocsource, getFileContent, getFileIndexingPlugin, getFileIndexingService, getLuceneDoc, getPreviousRecordDoc, getSessionAttributes, getSourceDir, getSourceFile, isMakingDeletedDoc, isValidationEnabled, prtln, prtlnErr, setConfigAttributes, setDebug, setFileIndexingPlugin, setFileIndexingService, setIsMakingDeletedDoc, setValidationEnabled
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ADNFileIndexingWriter

public ADNFileIndexingWriter()
Create an ADNFileIndexingWriter that indexes the given collection in field collection.


ADNFileIndexingWriter

public ADNFileIndexingWriter(boolean isDupDoc)
Create an ADNFileIndexingWriter that indexes the given collection in field collection.

Parameters:
isDupDoc - False to force this to be processed as a non-dup
Method Detail

finalize

protected void finalize()
                 throws Throwable
Perform finalization... closing resources, etc.

Overrides:
finalize in class Object
Throws:
Throwable - If error

getNumInstances

public static long getNumInstances()
Gets the numInstances attribute of the ADNFileIndexingWriter class

Returns:
The numInstances value

initItem

public void initItem(File source,
                     org.apache.lucene.document.Document existingDoc)
              throws Exception
Initialize the XML map, MmdRecs and other data prior to processing

Specified by:
initItem in class ItemFileIndexingWriter
Parameters:
source - The source file being indexed.
existingDoc - A Document that previously existed in the index for this item, if present
Throws:
Exception - Thrown if error reading the XML map

destroy

protected void destroy()
Release map resources for GC after processing.

Specified by:
destroy in class ItemFileIndexingWriter

getCollections

public String[] getCollections()
                        throws Exception
Returns unique collection keys for the item being indexed, separated by spaces. For example 'dcc', 'comet' or 'dwel'. Since this may be a multi-doc, it may have multiple collections, so this method overrides the default getCollections() method.

Overrides:
getCollections in class XMLFileIndexingWriter
Returns:
The collection keys
Throws:
Exception - If error
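
The "unique collection keys" behavior described above can be illustrated with a minimal sketch. It is not the class's actual code: the class name UniqueCollectionsSketch and the helper uniqueKeys are hypothetical, and the sketch only assumes that a multi-doc may contribute the same key more than once and that first-seen order is kept.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; not part of org.dlese.dpc.
final class UniqueCollectionsSketch {

    // Removes duplicate and empty keys while keeping first-seen order.
    static String[] uniqueKeys(String[] keys) {
        Set<String> unique = new LinkedHashSet<>();
        if (keys != null) {
            for (String key : keys) {
                if (key != null && !key.isEmpty()) {
                    unique.add(key);
                }
            }
        }
        return unique.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Keys as they might be gathered from several records of one resource.
        String[] gathered = {"dcc", "comet", "dcc", "dwel"};
        System.out.println(String.join(" ", uniqueKeys(gathered))); // prints "dcc comet dwel"
    }
}
```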

getAccessionStatus

protected String getAccessionStatus()
                             throws Exception
Returns the accession status of this record, for example 'accessioned'. The String is tokenized, stored and indexed under the field key 'accessionstatus'.

Specified by:
getAccessionStatus in class ItemFileIndexingWriter
Returns:
The accession status.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getHasRelatedResource

protected boolean getHasRelatedResource()
                                 throws Exception
Returns true if the item has one or more related resources, false otherwise.

Specified by:
getHasRelatedResource in class ItemFileIndexingWriter
Returns:
True if the item has one or more related resources, false otherwise.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getRelatedResourceIds

protected String[] getRelatedResourceIds()
                                  throws Exception
Returns the IDs of related resources that are cataloged by ID, or null if none are present

Specified by:
getRelatedResourceIds in class ItemFileIndexingWriter
Returns:
Related resource IDs, or null if none are available
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getRelatedResourceUrls

protected String[] getRelatedResourceUrls()
                                   throws Exception
Returns the URLs of related resources that are cataloged by URL, or null if none are present

Specified by:
getRelatedResourceUrls in class ItemFileIndexingWriter
Returns:
Related resource URLs, or null if none are available
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getAccessionDate

protected Date getAccessionDate()
                         throws Exception
Returns the accession date for the item, or null if this item is not accessioned. If this is a multi-doc, returns the oldest accession date of the bunch, corresponding to the first time this resource appeared in the library.

Specified by:
getAccessionDate in class ItemFileIndexingWriter
Returns:
The accession date for the item, or null if this item is not accessioned.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.
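
The multi-doc rule described above (return the oldest accession date of the bunch, or null if the item is not accessioned) can be sketched as follows. This is an illustration under stated assumptions, not the class's implementation: the class OldestAccessionDateSketch and the helper oldest are hypothetical, and a null entry is assumed to stand for a record that has no accession date.

```java
import java.util.Date;

// Hypothetical illustration only; not part of org.dlese.dpc.
final class OldestAccessionDateSketch {

    // Returns the earliest non-null date, or null if there is none.
    static Date oldest(Date[] dates) {
        Date oldest = null;
        if (dates == null) {
            return null;
        }
        for (Date candidate : dates) {
            if (candidate == null) {
                continue; // assumed to mean "this record is not accessioned"
            }
            if (oldest == null || candidate.before(oldest)) {
                oldest = candidate;
            }
        }
        return oldest;
    }

    public static void main(String[] args) {
        Date[] dates = {new Date(2000L), null, new Date(1000L)};
        System.out.println(oldest(dates).getTime()); // prints 1000
    }
}
```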

getCreationDate

protected Date getCreationDate()
                        throws Exception
Returns the date this item was first created, or null if not available.

Specified by:
getCreationDate in class ItemFileIndexingWriter
Returns:
The item creation date or null
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getReaderClass

public String getReaderClass()
Gets the name of the concrete DocReader class that is used to read this type of Document, which is "ItemDocReader".

Specified by:
getReaderClass in interface DocWriter
Specified by:
getReaderClass in class ItemFileIndexingWriter
Returns:
The String "org.dlese.dpc.index.reader.ItemDocReader".

indexFullContentInDefaultAndStems

public boolean indexFullContentInDefaultAndStems()
Default and stems fields handled here, so do not index full content.

Specified by:
indexFullContentInDefaultAndStems in class XMLFileIndexingWriter
Returns:
False

getAssociatedMmdRecs

protected MmdRec[] getAssociatedMmdRecs()
Returns the MmdRecs for records in other collections that catalog the same resource, not including myMmdRec.

Specified by:
getAssociatedMmdRecs in class ItemFileIndexingWriter
Returns:
The associated MmdRecs, or null if none

getAllMmdRecs

protected MmdRec[] getAllMmdRecs()
Returns the MmdRecs for all records that catalog this resource, including myMmdRec.

Specified by:
getAllMmdRecs in class ItemFileIndexingWriter
Returns:
All MmdRecs for this resource, null or empty if none

getMyMmdRec

protected MmdRec getMyMmdRec()
Returns the MmdRec for this record only.

Specified by:
getMyMmdRec in class ItemFileIndexingWriter
Returns:
The MmdRec for this record, or null

getValidationReport

protected String getValidationReport()
                              throws Exception
Gets a report detailing any errors found in the validation of the data, or null if no error was found.

Specified by:
getValidationReport in class ItemFileIndexingWriter
Returns:
Null if no data validation errors were found, otherwise a String that details the nature of the error.
Throws:
Exception - If error in performing the validation.

getDocType

public final String getDocType()
Gets the docType attribute of the ADNFileIndexingWriter, which is 'adn'.

Specified by:
getDocType in interface DocWriter
Specified by:
getDocType in class ItemFileIndexingWriter
Returns:
The docType, which is 'adn'.

_getIds

protected String[] _getIds()
                    throws Exception
Gets the id(s) for this item. If multiple IDs exist, the first one is the primary.

Specified by:
_getIds in class XMLFileIndexingWriter
Returns:
The id value
Throws:
Exception - If an error occurs

getTitle

public final String getTitle()
                      throws Exception
Gets the title attribute of the ADNFileIndexingWriter object

Specified by:
getTitle in class XMLFileIndexingWriter
Returns:
The title value
Throws:
Exception - If an error occurs

getDescription

public final String getDescription()
                            throws Exception
Gets the description attribute of the ADNFileIndexingWriter object

Specified by:
getDescription in class XMLFileIndexingWriter
Returns:
The description value
Throws:
Exception - If an error occurs

getUrls

public final String[] getUrls()
                       throws Exception
Gets the url(s) from the ADN record(s).

Specified by:
getUrls in class XMLFileIndexingWriter
Returns:
The urls value
Throws:
Exception - If an error occurs

getKeywords

protected String getKeywords()
                      throws Exception
Returns the item's keywords sorted and separated by the '+' symbol. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'keywords' and is also indexed in the 'default' field.

Specified by:
getKeywords in class ItemFileIndexingWriter
Returns:
The keywords String
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.
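
The return format described above (keywords sorted and separated by the '+' symbol) can be shown with a short sketch. It is an assumption-laden illustration, not the class's code: the class KeywordJoinSketch and the helper joinKeywords are hypothetical, and the sort order is assumed to be Java's natural String ordering.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of org.dlese.dpc.
final class KeywordJoinSketch {

    // Sorts a copy of the keywords and joins them with '+'.
    static String joinKeywords(String[] keywords) {
        if (keywords == null || keywords.length == 0) {
            return ""; // the documentation allows an empty String or null
        }
        String[] sorted = keywords.clone();
        Arrays.sort(sorted); // natural (case-sensitive) ordering, an assumption
        return String.join("+", sorted);
    }

    public static void main(String[] args) {
        System.out.println(joinKeywords(new String[] {"volcano", "ash", "lava"})); // prints "ash+lava+volcano"
    }
}
```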

getCreatorLastName

protected String getCreatorLastName()
                             throws Exception
Returns the item creator's last name. An empty String or null is acceptable. The String is tokenized, stored and indexed under the 'default' field only.

Specified by:
getCreatorLastName in class ItemFileIndexingWriter
Returns:
The creator's last name String
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getCreator

protected String getCreator()
                     throws Exception
Returns the item creator's full name. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'creator'.

Specified by:
getCreator in class ItemFileIndexingWriter
Returns:
Creator's full name
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getCost

protected String getCost()
                  throws Exception
Returns the item's cost. The String is stored and indexed under the field key 'cost'.

Returns:
Resource cost
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getBoundingBox

protected BoundingBox getBoundingBox()
                              throws Exception
Gets the boundingBox attribute of the ADNFileIndexingWriter object

Overrides:
getBoundingBox in class XMLFileIndexingWriter
Returns:
The boundingBox value
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getContent

protected String getContent()
Returns the content of the item this record catalogs, or null if not available. For example the full HTML text of the Web page.

Specified by:
getContent in class ItemFileIndexingWriter
Returns:
The content of the item, or null

getContentType

protected String getContentType()
Returns the content type of the item this record catalogs, or null if not available. For example "text/html" or "html".

Specified by:
getContentType in class ItemFileIndexingWriter
Returns:
The content type of the item, or null

addFrameworkFields

protected final void addFrameworkFields(org.apache.lucene.document.Document newDoc,
                                        org.apache.lucene.document.Document existingDoc)
                                 throws Exception
Adds custom fields to the index that are unique to this framework.

Specified by:
addFrameworkFields in class ItemFileIndexingWriter
Parameters:
newDoc - The new Document to which the framework fields are added
existingDoc - A Document that previously existed in the index for this item, if present
Throws:
Exception - If an error occurs

setIsSingleDoc

public void setIsSingleDoc(boolean isSingleDoc)
Sets whether this writer should write a single record doc rather than a multi-item doc.

Parameters:
isSingleDoc - The new isSingleDoc value

getGradeRange

protected String[] getGradeRange()
Gets the gradeRange attribute of the ADNFileIndexingWriter object

Returns:
The gradeRange value

getResourceTypes

protected String[] getResourceTypes()
Gets the resourceTypes attribute of the ADNFileIndexingWriter object

Returns:
The resourceTypes value

getContentStandards

protected String[] getContentStandards()
Gets the contentStandards attribute of the ADNFileIndexingWriter object

Returns:
The contentStandards value

getSubjects

protected String[] getSubjects()
Gets the subjects attribute of the ADNFileIndexingWriter object

Returns:
The subjects value

getCreatorEmailPrimary

protected String getCreatorEmailPrimary()
                                 throws Exception
Gets the creator's primary email.

Returns:
The creator's primary email.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getCreatorEmailAlt

protected String getCreatorEmailAlt()
                             throws Exception
Gets the creator's alternate email.

Returns:
The creator's alternate email.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getOrganizationEmail

protected String getOrganizationEmail()
                               throws Exception
Gets the organization email.

Returns:
The organization email.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getOrganizationInstName

protected String getOrganizationInstName()
                                  throws Exception
Gets the organization's institution name. ADN XPath: lifecycle/contributors/contributor/organization/instName

Returns:
The organization name.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getOrganizationInstDepartment

protected String getOrganizationInstDepartment()
                                        throws Exception
Gets the organization's institution department name. ADN XPath: lifecycle/contributors/contributor/organization/instDept

Returns:
The organization's institution department name.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getPersonInstName

protected String getPersonInstName()
                            throws Exception
Gets the person's institution name. ADN XPath: lifecycle/contributors/contributor/person/instName

Returns:
The institution name.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getPersonInstDepartment

protected String getPersonInstDepartment()
                                  throws Exception
Gets the person's institution department name. ADN XPath: lifecycle/contributors/contributor/person/instDept

Returns:
The institution department name.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getUrlMirrors

protected String getUrlMirrors()
                        throws Exception
Gets the mirror URLs encoded as terms, if any.

Returns:
The URL mirrors encoded as terms, or empty string.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getAudienceToolFor

protected String getAudienceToolFor()
                             throws Exception
The audience tool for.

Returns:
The audience tool for.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getAudienceBeneficiary

protected String getAudienceBeneficiary()
                                 throws Exception
The audience beneficiary.

Returns:
The audience beneficiary.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getAudienceTypicalAgeRange

protected String getAudienceTypicalAgeRange()
                                     throws Exception
The audience typical age range.

Returns:
The audience typical age range.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getAudienceInstructionalGoal

protected String getAudienceInstructionalGoal()
                                       throws Exception
The audience instructionalGoal.

Returns:
The audience instructionalGoal.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getAudienceTeachingMethod

protected String getAudienceTeachingMethod()
                                    throws Exception
The audience teachingMethod.

Returns:
The audience teachingMethod.
Throws:
Exception - This method should throw an Exception with appropriate error message if an error occurs.

getPlaceNames

protected String getPlaceNames()
Gets all place names as text. Place names are extracted from the following XPaths: general/simplePlacesAndEvents/placeAndEvent/place, geospatialCoverages/geospatialCoverage/boundBox/bbPlaces/place/name and geospatialCoverages/geospatialCoverage/detGeos/detGeo/detPlaces/place/name.

Returns:
All place names as text.
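
The three XPaths listed above can be exercised with a small, self-contained sketch. It uses the JDK's javax.xml.xpath API rather than whatever XML library the class itself uses, and it makes several assumptions: the class PlaceNamesSketch, the helper placeNames, the sample record with an itemRecord root, and the single-space separator between names are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not part of org.dlese.dpc.
final class PlaceNamesSketch {

    // The XPaths named in the getPlaceNames() description, relative to the record root.
    static final String[] PLACE_XPATHS = {
        "general/simplePlacesAndEvents/placeAndEvent/place",
        "geospatialCoverages/geospatialCoverage/boundBox/bbPlaces/place/name",
        "geospatialCoverages/geospatialCoverage/detGeos/detGeo/detPlaces/place/name"
    };

    // Collects the text of every matching node, separated by single spaces (an assumption).
    static String placeNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            StringBuilder names = new StringBuilder();
            for (String path : PLACE_XPATHS) {
                NodeList nodes = (NodeList) xpath.evaluate(
                        path, doc.getDocumentElement(), XPathConstants.NODESET);
                for (int i = 0; i < nodes.getLength(); i++) {
                    String text = nodes.item(i).getTextContent().trim();
                    if (text.isEmpty()) {
                        continue;
                    }
                    if (names.length() > 0) {
                        names.append(' ');
                    }
                    names.append(text);
                }
            }
            return names.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not read place names", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample record containing only the elements needed here.
        String xml = "<itemRecord>"
                + "<general><simplePlacesAndEvents><placeAndEvent>"
                + "<place>Hawaii</place><event>Eruption</event>"
                + "</placeAndEvent></simplePlacesAndEvents></general>"
                + "<geospatialCoverages><geospatialCoverage><boundBox><bbPlaces>"
                + "<place><name>Pacific Ocean</name></place>"
                + "</bbPlaces></boundBox></geospatialCoverage></geospatialCoverages>"
                + "</itemRecord>";
        System.out.println(placeNames(xml)); // prints "Hawaii Pacific Ocean"
    }
}
```

The same pattern would apply to the event-name and temporal-coverage XPaths by swapping the path list.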

getEventNames

protected String getEventNames()
Gets all event names as text. Event names are extracted from the following XPaths: general/simplePlacesAndEvents/placeAndEvent/event, geospatialCoverages/geospatialCoverage/boundBox/bbEvents/event/name and geospatialCoverages/geospatialCoverage/detGeos/detGeo/detEvents/event/name.

Returns:
All event names as text.

getTemporalCoverageNames

protected String getTemporalCoverageNames()
Gets all temporal coverage names as text. Temporal coverage names are extracted from the following XPaths: general/simpleTemporalCoverages/description, and temporalCoverages/timeAndPeriod/periods/period/name.

Returns:
All temporal coverage names as text.
