|
DLESE Tools v1.6.0 |
java.lang.Object
  org.dlese.dpc.index.writer.FileIndexingServiceWriter
    org.dlese.dpc.index.writer.XMLFileIndexingWriter
public abstract class XMLFileIndexingWriter
Creates a Lucene Document from any XML file by stripping the XML tags to extract and index the content. The reader for this type of Document is XMLDocReader.
In addition to the fields listed for FileIndexingServiceWriter, this class creates the following Lucene Document fields:
collection
- The collection associated with this resource.
See Also: FileIndexingService, XMLDocReader
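The tag-stripping idea in the class description can be illustrated with a small, self-contained sketch. It uses the JDK's built-in DOM parser rather than dom4j, and the class and method names (`XmlTextExtractor`, `extractText`) are hypothetical; it is not the code this class actually runs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper, not part of DLESE Tools: extracts the text content of an XML string. */
class XmlTextExtractor {

    /** Parses the XML and returns the text of all elements, separated by single spaces. */
    static String extractText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        collect(dom.getDocumentElement(), out);
        return out.toString();
    }

    /** Walks the tree depth-first and appends every non-empty text node, dropping the tags. */
    private static void collect(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<record><title>Plate Tectonics</title>"
                + "<description>An intro lesson.</description></record>";
        System.out.println(extractText(xml));
    }
}
```

The extracted string is the kind of content that would be handed to a tokenized, indexed field for full-text search.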
Constructor Summary | |
---|---|
XMLFileIndexingWriter()
Constructor for the XMLFileIndexingWriter. |
Method Summary | |
---|---|
protected abstract String[] |
_getIds()
Return unique IDs for the item being indexed, one for each collection that catalogs the resource. |
protected void |
addCustomFields(org.apache.lucene.document.Document newDoc,
org.apache.lucene.document.Document existingDoc,
File sourceFile)
Adds the full content of the XML to the default search field. |
protected abstract void |
addFields(org.apache.lucene.document.Document newDoc,
org.apache.lucene.document.Document existingDoc,
File sourceFile)
Adds additional fields that are unique to the document format being indexed. |
protected BoundingBox |
getBoundingBox()
Return the geospatial BoundingBox footprint that represents the resource being indexed, or null if none applies. |
protected String[] |
getCollections()
Returns unique collection keys for the item being indexed. |
org.apache.lucene.document.Document |
getDeletedDoc(org.apache.lucene.document.Document existingDoc)
Creates a Lucene Document for the XML that is equal to the existing Document. |
abstract String |
getDescription()
Return a description for the document being indexed, or null if none applies. |
String |
getDocGroup()
Gets the collection specifier, for example 'dcc', 'comet'. |
protected Document |
getDom4jDoc()
Gets the dom4j Document for use by sub-classes |
protected String |
getFieldContent(String[] values,
String useVocabMapping,
String metadataFormat)
Gets the vocab encoded keys for the given values, separated by the '+' symbol. |
protected String |
getFieldContent(String value,
String useVocabMapping,
String metadataFormat)
Gets the encoded vocab key for the given content. |
protected String |
getFieldName(String vocabFieldString,
String metadataFormat)
Gets the field ID, for example 'gr', for a given vocab, for example 'gradeRange'. |
String[] |
getIds()
Returns the ids for the item being indexed. |
protected SimpleLuceneIndex |
getIndex()
Gets the index used by this XML File Indexer |
protected ResultDocList |
getMyAnnoResultDocs()
Gets the annotations for this record, null or zero length if none available. |
protected DleseCollectionDocReader |
getMyCollectionDoc()
Gets the DleseCollectionDocReader for the collection of which this item is a part, or null if not available. |
static String |
getOaiModtime(File sourceFile,
org.apache.lucene.document.Document existingDoc)
Gets the oaiModtime for the given File or Document, set to 3 minutes in the future to account for any delay in indexing updates. |
String |
getPrimaryId()
Returns the unique primary record ID for the item being indexed. |
protected RecordDataService |
getRecordDataService()
Gets the recordDataService used by this XML File Indexer |
List |
getRelatedIds()
Gets the ids of related records. |
Map |
getRelatedIdsMap()
Gets the ids of related records. |
List |
getRelatedUrls()
Gets the urls of related records. |
Map |
getRelatedUrlsMap()
Gets the urls of related records. |
protected String |
getTermStringFromStringArray(String[] vals)
Gets the appropriate terms from a string array of metadata fields. |
abstract String |
getTitle()
Return a title for the document being indexed, or null if none applies. |
abstract String[] |
getUrls()
Return the URL(s) to the resource being indexed, or null if none apply. |
protected abstract Date |
getWhatsNewDate()
Returns the date used to determine "What's new" in the library, or null if none is available. |
protected abstract String |
getWhatsNewType()
Returns the type of category for "What's new" in the library, or null if none is available. |
protected XMLIndexer |
getXmlIndexer()
Gets the XMLIndexer for use by sub-classes |
protected XMLIndexerFieldsConfig |
getXmlIndexerFieldsConfig()
Gets the XMLIndexerFieldsConfig to use for XML indexing, or null if none available. |
abstract boolean |
indexFullContentInDefaultAndStems()
Return true to have the full XML content indexed in the 'default' and 'stems' fields, false if handled by the sub-class. |
abstract void |
init(File source,
org.apache.lucene.document.Document existingDoc)
This method is called prior to processing and may be used for any necessary set-up. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public XMLFileIndexingWriter()
Method Detail |
---|
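public String[] getIds() throws Exception
Returns the ids for the item being indexed.
Throws:
Exception - If error
See Also:
getIds()

public String getPrimaryId() throws Exception
Returns the unique primary record ID for the item being indexed.
Throws:
Exception - If error
See Also:
getIds()

public List getRelatedIds() throws IllegalStateException, Exception
Gets the ids of related records.
Throws:
IllegalStateException - If called prior to calling the indexFields method
Exception - If error

public List getRelatedUrls() throws IllegalStateException, Exception
Gets the urls of related records.
Throws:
IllegalStateException - If called prior to calling the indexFields method
Exception - If error

public Map getRelatedIdsMap() throws IllegalStateException, Exception
Gets the ids of related records.
Throws:
IllegalStateException - If called prior to calling the indexFields method
Exception - If error

public Map getRelatedUrlsMap() throws IllegalStateException, Exception
Gets the urls of related records.
Throws:
IllegalStateException - If called prior to calling the indexFields method
Exception - If error

protected String[] getCollections() throws Exception
Returns unique collection keys for the item being indexed.
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public String getDocGroup() throws Exception
Gets the collection specifier, for example 'dcc', 'comet'.
getDocGroup in class FileIndexingServiceWriter
Throws:
Exception - If an error occurred

protected BoundingBox getBoundingBox() throws Exception
Return the geospatial BoundingBox footprint that represents the resource being indexed, or null if none applies.
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public abstract void init(File source, org.apache.lucene.document.Document existingDoc) throws Exception
This method is called prior to processing and may be used for any necessary set-up.
init in class FileIndexingServiceWriter
Parameters:
source - The source file being indexed
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
Throws:
Exception - If an error occurred during set-up.

protected abstract String[] _getIds() throws Exception
Return unique IDs for the item being indexed, one for each collection that catalogs the resource.
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public abstract String getTitle() throws Exception
Return a title for the document being indexed, or null if none applies.
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public abstract String getDescription() throws Exception
Return a description for the document being indexed, or null if none applies.
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public abstract String[] getUrls() throws Exception
Return the URL(s) to the resource being indexed, or null if none apply.
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public abstract boolean indexFullContentInDefaultAndStems()
Return true to have the full XML content indexed in the 'default' and 'stems' fields, false if handled by the sub-class.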
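As a rough illustration of the abstract methods documented so far (init, _getIds, getTitle, getDescription, getUrls, indexFullContentInDefaultAndStems), the sketch below implements them against a stand-in base class. `SketchXmlWriter` mirrors only those signatures and replaces the Lucene Document parameter with `Object` so that the example compiles without DLESE Tools or Lucene on the classpath. `SimpleRecordWriter` and its rule for deriving an ID from the file name are invented for illustration.

```java
import java.io.File;

/** Stand-in for part of the abstract contract described on this page; NOT the real DLESE class. */
abstract class SketchXmlWriter {
    public abstract void init(File source, Object existingDoc) throws Exception;
    protected abstract String[] _getIds() throws Exception;
    public abstract String getTitle() throws Exception;
    public abstract String getDescription() throws Exception;
    public abstract String[] getUrls() throws Exception;
    public abstract boolean indexFullContentInDefaultAndStems();
}

/** Hypothetical concrete writer for a made-up record format. */
class SimpleRecordWriter extends SketchXmlWriter {
    private File source;

    @Override
    public void init(File source, Object existingDoc) {
        // Set-up before processing: remember which file is being indexed
        this.source = source;
    }

    @Override
    protected String[] _getIds() {
        // Assumption for illustration only: the file name without ".xml" is the record ID
        String name = source.getName();
        String id = name.endsWith(".xml") ? name.substring(0, name.length() - 4) : name;
        return new String[] { id };
    }

    @Override
    public String getTitle() {
        return null; // null signals that no title applies
    }

    @Override
    public String getDescription() {
        return null; // null signals that no description applies
    }

    @Override
    public String[] getUrls() {
        return null; // null signals that no URLs apply
    }

    @Override
    public boolean indexFullContentInDefaultAndStems() {
        return true; // let the base class index the full XML content
    }

    public static void main(String[] args) {
        SimpleRecordWriter writer = new SimpleRecordWriter();
        writer.init(new File("EXAMPLE-000-001.xml"), null);
        System.out.println(writer._getIds()[0]);
    }
}
```

A real subclass would extend XMLFileIndexingWriter itself and would also need to implement the remaining abstract methods, such as getWhatsNewDate, getWhatsNewType and addFields.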
protected abstract Date getWhatsNewDate() throws Exception
Returns the date used to determine "What's new" in the library, or null if none is available.
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

protected abstract String getWhatsNewType() throws Exception
Returns the type of category for "What's new" in the library, or null if none is available.
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

protected abstract void addFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, File sourceFile) throws Exception
Adds additional fields that are unique to the document format being indexed. Use the add method of the Lucene Document class to add a Field.
The following Lucene Field types are available for indexing with the Document:
Field.Text(String name, String value) -- tokenized, indexed, stored
Field.UnStored(String name, String value) -- tokenized, indexed, not stored
Field.Keyword(String name, String value) -- not tokenized, indexed, stored
Field.UnIndexed(String name, String value) -- not tokenized, not indexed, stored
Field(String name, String string, boolean store, boolean index, boolean tokenize) -- allows full control over whether the value is stored, indexed and tokenized
Example code:
// Assumes imports of java.io.File, org.apache.lucene.document.Document and org.apache.lucene.document.Field
protected void addFields(Document newDoc, Document existingDoc, File sourceFile) throws Exception {
    // Add a tokenized, indexed and stored field to the new Document
    String customContent = "Some content";
    newDoc.add(Field.Text("mycustomfield", customContent));
}
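The four named field types above differ only in three flags. The sketch below restates that list as a plain Java enum so the combinations can be compared at a glance. `FieldKind` and `forFlags` are hypothetical names, not Lucene classes, and the sketch does not call Lucene. Later Lucene releases changed the Field API, so check the version on your classpath before relying on the factory methods listed here.

```java
/** Hypothetical summary of the store/index/tokenize flags listed above; not a Lucene class. */
enum FieldKind {
    TEXT(true, true, true),        // Field.Text: tokenized, indexed, stored
    UNSTORED(false, true, true),   // Field.UnStored: tokenized, indexed, not stored
    KEYWORD(true, true, false),    // Field.Keyword: not tokenized, indexed, stored
    UNINDEXED(true, false, false); // Field.UnIndexed: not tokenized, not indexed, stored

    final boolean stored;
    final boolean indexed;
    final boolean tokenized;

    FieldKind(boolean stored, boolean indexed, boolean tokenized) {
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }

    /** Returns the named kind matching the flags, or null if only the general constructor fits. */
    static FieldKind forFlags(boolean store, boolean index, boolean tokenize) {
        for (FieldKind kind : values()) {
            if (kind.stored == store && kind.indexed == index && kind.tokenized == tokenize) {
                return kind;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // An identifier-like value is usually kept whole (not tokenized) but searchable and retrievable
        System.out.println(forFlags(true, true, false));
    }
}
```

As a rule of thumb, free text that should be searchable word by word fits the tokenized kinds, while identifiers and keys that must match exactly fit the non-tokenized ones.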
Parameters:
newDoc - The new Document that is being created for this resource
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
sourceFile - The sourceFile that is being indexed
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

protected void addCustomFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, File sourceFile) throws Exception
Adds the full content of the XML to the default search field.
addCustomFields in class FileIndexingServiceWriter
Parameters:
newDoc - The new Document that is being created for this resource
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
sourceFile - The source file that is being indexed
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public org.apache.lucene.document.Document getDeletedDoc(org.apache.lucene.document.Document existingDoc) throws Throwable
Creates a Lucene Document for the XML that is equal to the existing Document.
getDeletedDoc in class FileIndexingServiceWriter
Parameters:
existingDoc - An existing FileIndexingService Document that currently resides in the index for the given file
Throws:
Throwable - Thrown if an error occurs

protected ResultDocList getMyAnnoResultDocs() throws Exception
Gets the annotations for this record, null or zero length if none available.
Throws:
Exception - If error

protected XMLIndexerFieldsConfig getXmlIndexerFieldsConfig()
Gets the XMLIndexerFieldsConfig to use for XML indexing, or null if none available.

protected String getFieldContent(String[] values, String useVocabMapping, String metadataFormat) throws Exception
Gets the vocab encoded keys for the given values, separated by the '+' symbol.
Parameters:
values - The values to encode
useVocabMapping - The vocab mapping to use, for example 'contentStandards'
metadataFormat - The metadata format, for example 'adn'
Throws:
Exception - If error

protected String getFieldContent(String value, String useVocabMapping, String metadataFormat) throws Exception
Gets the encoded vocab key for the given content.
Parameters:
value - The value to encode
useVocabMapping - The vocab mapping to use, for example 'contentStandard'
metadataFormat - The metadata format, for example 'adn'
Throws:
Exception - If error

protected String getFieldName(String vocabFieldString, String metadataFormat) throws Exception
Gets the field ID, for example 'gr', for a given vocab, for example 'gradeRange'.
Parameters:
vocabFieldString - The field, for example 'gradeRange'
metadataFormat - The metadata format, for example 'adn'
Throws:
Exception - If error

protected String getTermStringFromStringArray(String[] vals)
Gets the appropriate terms from a string array of metadata fields.
Parameters:
vals - Metadata fields that must be delimited by colons.
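The vocabulary helpers above (getFieldName and the two getFieldContent overloads) translate human-readable metadata into short encoded keys. The following sketch shows the general shape of that translation with in-memory maps. Only the 'gradeRange' to 'gr' pair comes from this page. The value keys, the `VocabSketch` class, and the decision to skip unknown values are assumptions made for illustration, and the real methods consult the vocabulary configuration for the given metadata format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the vocab lookups described above; not the DLESE implementation. */
class VocabSketch {
    private static final Map<String, String> FIELD_IDS = new HashMap<>();
    private static final Map<String, String> VALUE_KEYS = new HashMap<>();

    static {
        FIELD_IDS.put("gradeRange", "gr");       // example pair taken from this page
        VALUE_KEYS.put("Primary", "01");         // invented key
        VALUE_KEYS.put("Middle school", "02");   // invented key
        VALUE_KEYS.put("High school", "03");     // invented key
    }

    /** Returns the short field ID for a vocab field, or null if it is unknown. */
    static String fieldName(String vocabField) {
        return FIELD_IDS.get(vocabField);
    }

    /** Encodes each known value and joins the keys with '+'; unknown values are skipped. */
    static String fieldContent(String[] values) {
        List<String> keys = new ArrayList<>();
        for (String value : values) {
            String key = VALUE_KEYS.get(value);
            if (key != null) {
                keys.add(key);
            }
        }
        return String.join("+", keys);
    }

    public static void main(String[] args) {
        String name = fieldName("gradeRange");
        String content = fieldContent(new String[] { "Primary", "High school" });
        System.out.println(name + " = " + content);
    }
}
```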
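protected XMLIndexer getXmlIndexer() throws Exception
Gets the XMLIndexer for use by sub-classes.
Throws:
Exception - If error

protected Document getDom4jDoc() throws Exception
Gets the dom4j Document for use by sub-classes.
Throws:
Exception - If error

protected DleseCollectionDocReader getMyCollectionDoc()
Gets the DleseCollectionDocReader for the collection of which this item is a part, or null if not available.

public static final String getOaiModtime(File sourceFile, org.apache.lucene.document.Document existingDoc)
Gets the oaiModtime for the given File or Document, set to 3 minutes in the future to account for any delay in indexing updates.
Parameters:
sourceFile - The source file
existingDoc - The existing Doc

protected RecordDataService getRecordDataService()
Gets the recordDataService used by this XML File Indexer.

protected SimpleLuceneIndex getIndex()
Gets the index used by this XML File Indexer.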
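The three-minute offset described for getOaiModtime above can be sketched with java.time. This is a minimal illustration under stated assumptions: it takes a modification time in epoch milliseconds, ignores the existingDoc argument, and returns an ISO-8601 UTC string. The page does not state the real return format or how the existing Document is consulted, and `OaiModtimeSketch` and `modtimeFor` are hypothetical names.

```java
import java.io.File;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the "3 minutes in the future" rule; not the DLESE implementation. */
class OaiModtimeSketch {
    // Offset mentioned on this page, meant to cover the delay before index updates become visible
    static final Duration INDEXING_DELAY = Duration.ofMinutes(3);

    /** Returns the modification time plus the indexing delay as an ISO-8601 UTC string (assumed format). */
    static String modtimeFor(long lastModifiedMillis) {
        return Instant.ofEpochMilli(lastModifiedMillis).plus(INDEXING_DELAY).toString();
    }

    public static void main(String[] args) {
        // File.lastModified() returns 0 when the file does not exist
        File source = new File("record.xml");
        System.out.println(modtimeFor(source.lastModified()));
    }
}
```

Dating the record slightly in the future means a harvester that asks for changes since its last visit should still see the record once the index update has completed.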