DLESE Tools
v1.6.0

org.dlese.dpc.index
Class FileIndexingService

java.lang.Object
  extended by org.dlese.dpc.index.FileIndexingService

public final class FileIndexingService
extends Object

Indexes files into a SimpleLuceneIndex and automatically updates the index whenever changes to the files are made. This class uses a FileIndexingServiceWriter to create the Lucene Documents that are placed in the SimpleLuceneIndex. This class looks for changes made to items in a directory of files and updates the index automatically by adding, updating or deleting items as appropriate. The frequency of update checks is configurable. There should be only one instance of this class for each SimpleLuceneIndex that is being populated with this class.

Author:
John Weatherley

Field Summary
static int INDEXING_ABORTED
          Indicates that indexing was aborted by request
static int INDEXING_DIR_DOES_NOT_EXIST
          Indicates that indexing directory does not exist
static int INDEXING_DIR_READ_ERROR
          Indicates a read error on the directory
static int INDEXING_ERROR
          Indicates that indexing completed with a severe error
static int INDEXING_ITEM_ERROR
          Indicates that indexing completed successfully, but one or more items were indexed with errors
static int INDEXING_SUCCESS
          Indicates that indexing completed normally
 
Constructor Summary
FileIndexingService(SimpleLuceneIndex index, long updateFrequency, boolean saveDeletes, String idFieldToRemove, String fileIndexingServiceDataDir, int maxNumFilesToIndex)
          Indexes files to the given SimpleLuceneIndex, checking for changes in the files and reindexing them at the given update frequency.
FileIndexingService(SimpleLuceneIndex index, long updateFrequency, String fileIndexingServiceDataDir, int maxNumFilesToIndex)
          Indexes files to the given SimpleLuceneIndex, checking for changes in the files and reindexing them at the given update frequency.
 
Method Summary
 boolean addDirectory(File srcDir, Class documentWriterClass, HashMap documentWriterConfigAttributes, FileIndexingPlugin plugin, int indexingPriority)
          Adds a directory of files to be monitored for changes, or replaces the current one if one exists with the same absolute path.
 boolean addDirectory(String sourceFileDirectory, Class documentWriterClass, HashMap documentWriterConfigAttributes, FileIndexingPlugin plugin, int indexingPriority)
          Adds a directory of files to be monitored for changes, or replaces the current one if one exists with the same absolute path.
 void changeUpdateFrequency(long updateFrequency)
          Changes the frequency of reindexing to the new value.
 boolean deleteDirectory(File srcDir)
          Deletes the files in the given directory from the index and removes it from the configuration.
 boolean deleteDirectory(String sourceFileDirectory)
          Deletes the files in the given directory from the index and removes it from the configuration.
 Object getAttribute(String key)
          Gets an attribute Object from this FileIndexingService.
 HashMap getConfiguredDirectories()
          Gets a HashMap of all directories that are configured in this FileIndexingService, keyed by absolute path.
static String getDateStamp()
          Returns a string for the current time and date, suitable for display in log files and output to standard out.
 ArrayList getIndexingMessages()
          Gets the last 10 indexing status messages.
 long getLastSyncTime()
          Gets the lastSyncTime attribute of the FileIndexingService object
 int getNumRecordsToAdd()
          Gets the numRecordsToAdd attribute of the FileIndexingService object
 int getNumRecordsToDelete()
          Gets the numRecordsToDelete attribute of the FileIndexingService object
 int getNumRecordsToReplace()
          Gets the numRecordsToReplace attribute of the FileIndexingService object
static String getSimpleDateStamp()
          Returns a string for the current time and date, suitable for display in log files and output to standard out.
 long getUpdateFrequency()
          Gets the updateFrequency attribute of the FileIndexingService object
 void indexFile(File fileToIndex, FileIndexingPlugin plugin)
          Indexes a single file.
 void indexFiles(boolean reindexAll, File directory, FileIndexingObserver observer)
          Updates the index to reflect the files in the directory indicated, which must have been previously added to this FileIndexingService using addDirectory(java.lang.String, java.lang.Class, java.util.HashMap, org.dlese.dpc.index.writer.FileIndexingPlugin, int).
 void indexFiles(boolean reindexAll, FileIndexingObserver observer)
          Updates the index to reflect the files in the directories this service is monitoring, with the option to run the update in the background.
 boolean isDirectoryConfigured(File srcDir)
          Determines whether the given directory is configured for indexing.
 boolean isIndexing()
          Determines whether indexing is in progress.
 void reindexDocs(org.apache.lucene.document.Document[] docs, boolean reindexAll)
          Reindexes the given Documents.
 void reindexDocs(ResultDocList docs, boolean reindexAll)
          Reindexes the Documents in the given ResultDocs.
 int reindexDocs(String query, boolean reindexAll)
          Reindexes Documents managed by this FileIndexingService that match the given Lucene query.
 int reindexDocs(String field, String[] terms, boolean reindexAll)
          Re-indexes all documents that match the given terms within the given field.
 int reindexDocs(String field, String term, boolean reindexAll)
          Re-indexes all documents that match the given term within the given field.
 void removeDocs(String field, String[] terms, FileIndexingServiceWriter docWriter)
          Removes all documents that match the given terms within the given field.
 void removeDocs(String field, String[] terms, FileIndexingServiceWriter docWriter, boolean saveDeletes)
          Removes all documents that match the given terms within the given field.
 void removeDocs(String field, String term, FileIndexingServiceWriter docWriter)
          Removes all documents that match the given term within the given field.
 void removeDocs(String field, String term, FileIndexingServiceWriter docWriter, boolean saveDeletedRecords)
          Removes all documents that match the given term within the given field.
 void setAttribute(String key, Object attribute)
          Sets an attribute Object that will be available for access here and from the FileIndexingServiceWriters.
static void setDebug(boolean db)
          Sets the debug attribute object
 void setValidationEnabled(boolean validateFiles)
          Sets whether or not to validate the files being indexed and create a validation report, which is indexed.
 void startTester(String docRoot, String sourceFileDirectory)
          Starts a FileMoveTester iff one is not already initialized.
 void startTimerThread(long updateFrequency)
          Start or restarts the timer thread with the given update frequency.
 void stopIndexing()
          Stops the indexing process if it is currently taking place.
 void stopTester()
          Stops the FileMoveTester
 void stopTimerThread()
          Stops the indexing timer thread.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INDEXING_SUCCESS

public static final int INDEXING_SUCCESS
Indicates that indexing completed normally

See Also:
Constant Field Values

INDEXING_ABORTED

public static final int INDEXING_ABORTED
Indicates that indexing was aborted by request

See Also:
Constant Field Values

INDEXING_ERROR

public static final int INDEXING_ERROR
Indicates that indexing completed with a severe error

See Also:
Constant Field Values

INDEXING_ITEM_ERROR

public static final int INDEXING_ITEM_ERROR
Indicates that indexing completed successfully, but one or more items were indexed with errors

See Also:
Constant Field Values

INDEXING_DIR_DOES_NOT_EXIST

public static final int INDEXING_DIR_DOES_NOT_EXIST
Indicates that indexing directory does not exist

See Also:
Constant Field Values

INDEXING_DIR_READ_ERROR

public static final int INDEXING_DIR_READ_ERROR
Indicates a read error on the directory

See Also:
Constant Field Values
Constructor Detail

FileIndexingService

public FileIndexingService(SimpleLuceneIndex index,
                           long updateFrequency,
                           String fileIndexingServiceDataDir,
                           int maxNumFilesToIndex)
Indexes files to the given SimpleLuceneIndex, checking for changes in the files and reindexing them at the given update frequency. Validation of files is enabled by default.

Parameters:
index - The SimpleLuceneIndex that will be populated and updated with Documents created from files
updateFrequency - The frequency by which files are checked for updates, in seconds. Zero or less indicates no updates should be performed.
fileIndexingServiceDataDir - The directory where serialized data will be stored
maxNumFilesToIndex - Max number of files to index per iteration
See Also:
setValidationEnabled(boolean validateFiles)

FileIndexingService

public FileIndexingService(SimpleLuceneIndex index,
                           long updateFrequency,
                           boolean saveDeletes,
                           String idFieldToRemove,
                           String fileIndexingServiceDataDir,
                           int maxNumFilesToIndex)
Indexes files to the given SimpleLuceneIndex, checking for changes in the files and reindexing them at the given update frequency. Validation of files is enabled by default.

Parameters:
index - The SimpleLuceneIndex that will be populated and updated with Documents created from files
updateFrequency - The frequency by which files are checked for updates, in seconds. Zero or less indicates no updates should be performed.
saveDeletes - True to save removed documents in the index and mark them deleted, else they will be removed from the index.
idFieldToRemove - An ID field whose documents should be removed if duplicates are found.
fileIndexingServiceDataDir - Dir where persistent data files will be stored
maxNumFilesToIndex - The number of files to index per iteration
See Also:
setValidationEnabled(boolean validateFiles)
Method Detail

setAttribute

public void setAttribute(String key,
                         Object attribute)
Sets an attribute Object that will be available for access here and from the FileIndexingServiceWriters.

Parameters:
key - The key used to reference the attribute.
attribute - Any Java Object.
See Also:
FileIndexingServiceWriter

stopIndexing

public void stopIndexing()
Stops the indexing process if it is currently taking place. This method may take a few seconds to complete.


getAttribute

public Object getAttribute(String key)
Gets an attribute Object from this FileIndexingService.

Parameters:
key - The key used to reference the attribute.
Returns:
The Java Object that is stored under the given key or null if none exists.
See Also:
FileIndexingServiceWriter

changeUpdateFrequency

public void changeUpdateFrequency(long updateFrequency)
Changes the frequency of reindexing to the new value. Same as startTimerThread(long updateFrequency).

Parameters:
updateFrequency - The frequency by which files are checked for changes and reindexed.

startTimerThread

public void startTimerThread(long updateFrequency)
Start or restarts the timer thread with the given update frequency. Same as changeUpdateFrequency(long updateFrequency).

Parameters:
updateFrequency - The number of seconds between index updates.
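
The start-or-restart behavior described above can be sketched with a standard scheduled executor. This is only an illustration and not the DLESE implementation: the class name UpdateTimerSketch and its methods are hypothetical, and it assumes that a frequency of zero or less leaves the timer stopped, as documented for the constructor.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a restartable update timer; not the DLESE implementation. */
final class UpdateTimerSketch {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "index-update-timer");
                t.setDaemon(true); // polling alone should not keep the JVM alive
                return t;
            });
    private final Runnable updateTask;
    private ScheduledFuture<?> current;
    private long updateFrequencySeconds;

    UpdateTimerSketch(Runnable updateTask) {
        this.updateTask = updateTask;
    }

    /** Starts or restarts the timer. Assumption: zero or less means no periodic updates. */
    synchronized void start(long seconds) {
        stop(); // cancel any previous schedule before applying the new frequency
        updateFrequencySeconds = seconds;
        if (seconds > 0) {
            current = executor.scheduleWithFixedDelay(
                    updateTask, seconds, seconds, TimeUnit.SECONDS);
        }
    }

    /** Cancels the schedule without interrupting an update that is already running. */
    synchronized void stop() {
        if (current != null) {
            current.cancel(false);
            current = null;
        }
    }

    synchronized boolean isRunning() {
        return current != null;
    }

    synchronized long getUpdateFrequency() {
        return updateFrequencySeconds;
    }

    public static void main(String[] args) {
        UpdateTimerSketch timer = new UpdateTimerSketch(() -> System.out.println("checking files"));
        timer.start(60);
        System.out.println("running: " + timer.isRunning());
        timer.start(0); // restart with a non-positive frequency stops the timer
        System.out.println("running: " + timer.isRunning());
    }
}
```

Cancelling before rescheduling is what makes a single method serve as both "start" and "change frequency", which matches the equivalence the two method descriptions state.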

stopTimerThread

public void stopTimerThread()
Stops the indexing timer thread.


setValidationEnabled

public void setValidationEnabled(boolean validateFiles)
Sets whether or not to validate the files being indexed and create a validation report, which is indexed. If set to true, the files will be validated, otherwise they will not. Default is true.

Parameters:
validateFiles - True to validate, else false.
See Also:
FileIndexingServiceWriter.getValidationReport()

addDirectory

public boolean addDirectory(String sourceFileDirectory,
                            Class documentWriterClass,
                            HashMap documentWriterConfigAttributes,
                            FileIndexingPlugin plugin,
                            int indexingPriority)
Adds a directory of files to be monitored for changes, or replaces the current one if one exists with the same absolute path.

Parameters:
sourceFileDirectory - The file directory that will be monitored for updates.
documentWriterClass - The FileIndexingServiceWriter class used to create Lucene Documents for the files in this directory.
documentWriterConfigAttributes - Configuration attributes for the document writer.
plugin - The FileIndexingPlugin to use when indexing the files in this directory.
indexingPriority - The indexing priority for this directory.
Returns:
True if the directory was added successfully.

addDirectory

public boolean addDirectory(File srcDir,
                            Class documentWriterClass,
                            HashMap documentWriterConfigAttributes,
                            FileIndexingPlugin plugin,
                            int indexingPriority)
Adds a directory of files to be monitored for changes, or replaces the current one if one exists with the same absolute path.

Parameters:
srcDir - The file directory that will be monitored for updates.
documentWriterClass - The FileIndexingServiceWriter class used to create Lucene Documents for the files in this directory.
documentWriterConfigAttributes - Configuration attributes for the document writer.
plugin - The FileIndexingPlugin to use when indexing the files in this directory.
indexingPriority - The indexing priority for this directory.
Returns:
True if the directory was added successfully.
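
The add-or-replace semantics, keyed by absolute path, can be illustrated with a plain map. The sketch below is a simplified stand-in and not the service's own code: DirectoryRegistrySketch and DirectoryConfig are hypothetical names, and the configuration holds only a writer class name and a priority for brevity.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry of monitored directories, keyed by absolute path. */
final class DirectoryRegistrySketch {

    /** Hypothetical per-directory settings; the real service keeps its own configuration. */
    static final class DirectoryConfig {
        final String writerClassName;
        final int indexingPriority;

        DirectoryConfig(String writerClassName, int indexingPriority) {
            this.writerClassName = writerClassName;
            this.indexingPriority = indexingPriority;
        }
    }

    private final Map<String, DirectoryConfig> configured = new HashMap<>();

    /** Adds the directory, or replaces the entry that has the same absolute path. */
    boolean addDirectory(File srcDir, DirectoryConfig config) {
        if (srcDir == null || config == null) {
            return false;
        }
        configured.put(srcDir.getAbsolutePath(), config); // put() overwrites an existing key
        return true;
    }

    boolean isDirectoryConfigured(File srcDir) {
        return srcDir != null && configured.containsKey(srcDir.getAbsolutePath());
    }

    /** Returns true only if the directory was configured and has now been removed. */
    boolean deleteDirectory(File srcDir) {
        return srcDir != null && configured.remove(srcDir.getAbsolutePath()) != null;
    }

    /** Returns a copy so callers cannot modify the registry directly. */
    Map<String, DirectoryConfig> getConfiguredDirectories() {
        return new HashMap<>(configured);
    }

    public static void main(String[] args) {
        DirectoryRegistrySketch registry = new DirectoryRegistrySketch();
        File dir = new File("records");
        registry.addDirectory(dir, new DirectoryConfig("WriterA", 1));
        registry.addDirectory(new File(dir.getAbsolutePath()), new DirectoryConfig("WriterB", 2));
        System.out.println(registry.getConfiguredDirectories().keySet());
    }
}
```

Note that getAbsolutePath() does not resolve "." or ".." segments or symbolic links, so two spellings of the same directory could still produce two keys in this sketch.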

isIndexing

public boolean isIndexing()
Determines whether indexing is in progress.

Returns:
True if indexing is in progress, false if not

isDirectoryConfigured

public boolean isDirectoryConfigured(File srcDir)
Determines whether the given directory is configured for indexing.

Parameters:
srcDir - A directory of indexable files.
Returns:
True if this directory is already configured for indexing, false otherwise.

getConfiguredDirectories

public HashMap getConfiguredDirectories()
Gets a HashMap of all directories that are configured in this FileIndexingService, keyed by absolute path.

Returns:
The configuredDirectories value

deleteDirectory

public boolean deleteDirectory(String sourceFileDirectory)
Deletes the files in the given directory from the index and removes it from the configuration. Assumes the directory was previously added to the index using the addDirectory(java.lang.String, java.lang.Class, java.util.HashMap, org.dlese.dpc.index.writer.FileIndexingPlugin, int) method.

Parameters:
sourceFileDirectory - The directory of files needing to be removed from the index.
Returns:
True if the directory of files existed in the index and was removed.

deleteDirectory

public boolean deleteDirectory(File srcDir)
Deletes the files in the given directory from the index and removes it from the configuration. Assumes the directory was previously added to the index using the addDirectory(java.lang.String, java.lang.Class, java.util.HashMap, org.dlese.dpc.index.writer.FileIndexingPlugin, int) method.

Parameters:
srcDir - The directory of files needing to be removed from the index.
Returns:
True if the directory of files existed in the index and was removed.

getUpdateFrequency

public long getUpdateFrequency()
Gets the updateFrequency attribute of the FileIndexingService object

Returns:
The updateFrequency value

getLastSyncTime

public long getLastSyncTime()
Gets the lastSyncTime attribute of the FileIndexingService object

Returns:
The lastSyncTime value

getNumRecordsToDelete

public int getNumRecordsToDelete()
Gets the numRecordsToDelete attribute of the FileIndexingService object

Returns:
The numRecordsToDelete value

getNumRecordsToAdd

public int getNumRecordsToAdd()
Gets the numRecordsToAdd attribute of the FileIndexingService object

Returns:
The numRecordsToAdd value

getNumRecordsToReplace

public int getNumRecordsToReplace()
Gets the numRecordsToReplace attribute of the FileIndexingService object

Returns:
The numRecordsToReplace value

indexFiles

public void indexFiles(boolean reindexAll,
                       FileIndexingObserver observer)
Updates the index to reflect the files in the directories this service is monitoring, with the option to run the update in the background. Any new, deleted or modified files that appear in the directories will be reflected in the index.

Parameters:
reindexAll - True to reindex all files regardless of file mod date, False to reindex only those files that have changed since the last indexing.
observer - The FileIndexingObserver that will be notified when indexing is complete, or null to use none

indexFiles

public void indexFiles(boolean reindexAll,
                       File directory,
                       FileIndexingObserver observer)
Updates the index to reflect the files in the directory indicated, which must have been previously added to this FileIndexingService using addDirectory(java.lang.String, java.lang.Class, java.util.HashMap, org.dlese.dpc.index.writer.FileIndexingPlugin, int). Any new, deleted or modified files that appear in the directory will be reflected in the index.

Parameters:
reindexAll - True to reindex all files regardless of file mod date, False to reindex only those files that have changed since the last indexing.
directory - The directory to index.
observer - The FileIndexingObserver that will be notified when indexing is complete, or null to use none
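
One way to picture how new, modified and deleted files translate into index operations is to compare the modification times recorded at the last indexing with those currently on disk. The sketch below is an assumption about the general technique, not the service's actual algorithm; IndexSyncPlanSketch and its fields are hypothetical names, and the three lists loosely correspond to the counts reported by getNumRecordsToAdd(), getNumRecordsToReplace() and getNumRecordsToDelete().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical plan of index operations derived from two file snapshots. */
final class IndexSyncPlanSketch {
    final List<String> toAdd = new ArrayList<>();
    final List<String> toReplace = new ArrayList<>();
    final List<String> toDelete = new ArrayList<>();

    /**
     * @param indexed    path to modification time recorded when the file was last indexed
     * @param onDisk     path to modification time currently found in the directory
     * @param reindexAll true to replace every known file regardless of modification time
     */
    static IndexSyncPlanSketch plan(Map<String, Long> indexed,
                                    Map<String, Long> onDisk,
                                    boolean reindexAll) {
        IndexSyncPlanSketch plan = new IndexSyncPlanSketch();
        // TreeMap gives a stable, sorted iteration order for the resulting lists.
        for (Map.Entry<String, Long> entry : new TreeMap<>(onDisk).entrySet()) {
            Long previous = indexed.get(entry.getKey());
            if (previous == null) {
                plan.toAdd.add(entry.getKey());          // new file
            } else if (reindexAll || entry.getValue() > previous) {
                plan.toReplace.add(entry.getKey());      // modified, or forced reindex
            }
        }
        for (String path : new TreeMap<>(indexed).keySet()) {
            if (!onDisk.containsKey(path)) {
                plan.toDelete.add(path);                 // file no longer present
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Long> indexed = new HashMap<>();
        indexed.put("a.xml", 100L);
        indexed.put("b.xml", 100L);
        Map<String, Long> onDisk = new HashMap<>();
        onDisk.put("b.xml", 200L);
        onDisk.put("c.xml", 150L);
        IndexSyncPlanSketch plan = plan(indexed, onDisk, false);
        System.out.println("add=" + plan.toAdd + " replace=" + plan.toReplace
                + " delete=" + plan.toDelete);
    }
}
```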

indexFile

public void indexFile(File fileToIndex,
                      FileIndexingPlugin plugin)
               throws FileIndexingServiceException
Indexes a single file. The operation is executed serially to completion.

Parameters:
fileToIndex - The file to index.
plugin - A FileIndexingPlugin or null.
Throws:
FileIndexingServiceException - If unable to index

removeDocs

public final void removeDocs(String field,
                             String term,
                             FileIndexingServiceWriter docWriter)
Removes all documents that match the given term within the given field. Removed documents will either be saved in the index and marked as deleted (indicated by the Lucene field "deleted" being indexed as "true"), or removed from the index altogether, as determined by the parameter passed in at the constructor. This is useful for removing a single document that is indexed with a unique ID field, or for removing a group of documents matching the same term for a given field. For example, you might pass in the ID of a record that needs to be removed along with the ID field that it is indexed under, or the file path corresponding to a record along with the field "docsource". Note that this is the same as SimpleLuceneIndex.removeDocs(String,String) but is synchronized with other operations occurring in this FileIndexingService and handles deletes accordingly.

Parameters:
field - The field that is searched.
term - The term that is matched for removal.
docWriter - The FileIndexingServiceWriter to use

removeDocs

public final void removeDocs(String field,
                             String[] terms,
                             FileIndexingServiceWriter docWriter)
Removes all documents that match the given terms within the given field. Removed documents will either be saved in the index and marked as deleted (indicated by the Lucene field "deleted" being indexed as "true"), or removed from the index altogether, as determined by the parameter passed in at the constructor. This is useful for removing multiple documents that are indexed with a unique ID field. For example, you might pass in an array of IDs needing to be removed. Note that this is the same as SimpleLuceneIndex.removeDocs(String,String[]) but is synchronized with other operations occurring in this FileIndexingService and handles deletes accordingly.

Parameters:
field - The field that is searched.
terms - The terms that are matched for removal.
docWriter - The FileIndexingServiceWriter to use

removeDocs

public final void removeDocs(String field,
                             String term,
                             FileIndexingServiceWriter docWriter,
                             boolean saveDeletedRecords)
Removes all documents that match the given term within the given field. Removed documents will either be saved in the index and marked as deleted (indicated by the Lucene field "deleted" being indexed as "true"), or removed from the index altogether, as determined by the parameter passed in to this method. This is useful for removing a single document that is indexed with a unique ID field, or for removing a group of documents matching the same term for a given field. For example, you might pass in the ID of a record that needs to be removed along with the ID field that it is indexed under, or the file path corresponding to a record along with the field "docsource". Note that this is the same as SimpleLuceneIndex.removeDocs(String,String) but is synchronized with other operations occurring in this FileIndexingService and handles deletes accordingly.

Parameters:
field - The field that is searched.
term - The term that is matched for removal.
saveDeletedRecords - True to save the removed documents in the index and mark them deleted, else they will be removed from the index.
docWriter - The FileIndexingServiceWriter to use

removeDocs

public final void removeDocs(String field,
                             String[] terms,
                             FileIndexingServiceWriter docWriter,
                             boolean saveDeletes)
Removes all documents that match the given terms within the given field. Removed documents will either be saved in the index and marked as deleted (indicated by the Lucene field "deleted" being indexed as "true"), or removed from the index altogether, as determined by the parameter passed in to this method. This is useful for removing multiple documents that are indexed with a unique ID field. For example, you might pass in an array of IDs needing to be removed. Note that this is the same as SimpleLuceneIndex.removeDocs(String,String[]) but is synchronized with other operations occurring in this FileIndexingService and handles deletes accordingly.

Parameters:
field - The field that is searched.
terms - The terms that are matched for removal.
saveDeletes - True to save the removed documents in the index and mark them deleted, else they will be removed from the index.
docWriter - Writer used to perform the delete.
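
The two removal modes can be illustrated with an in-memory model in which each document is a map of field names to values. This is a hedged sketch only: RemoveDocsSketch is a hypothetical class that stands in for the Lucene index, it omits the docWriter parameter, and its return value is added for illustration (the documented methods return void). It does mirror the documented behavior of either marking matches with the "deleted" field set to "true" or removing them outright.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the two removal modes; not backed by Lucene. */
final class RemoveDocsSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    synchronized void add(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    /**
     * Removes or marks every document whose field value equals one of the terms.
     * Returns the number of matching documents (an addition for this sketch).
     */
    synchronized int removeDocs(String field, String[] terms, boolean saveDeletes) {
        List<String> wanted = Arrays.asList(terms);
        int affected = 0;
        for (Iterator<Map<String, String>> it = docs.iterator(); it.hasNext();) {
            Map<String, String> doc = it.next();
            if (!wanted.contains(doc.get(field))) {
                continue;
            }
            affected++;
            if (saveDeletes) {
                doc.put("deleted", "true"); // keep the document, flagged as deleted
            } else {
                it.remove();                // drop the document altogether
            }
        }
        return affected;
    }

    synchronized int size() {
        return docs.size();
    }

    synchronized int countMarkedDeleted() {
        int count = 0;
        for (Map<String, String> doc : docs) {
            if ("true".equals(doc.get("deleted"))) {
                count++;
            }
        }
        return count;
    }

    /** Convenience factory for a document with a single "id" field. */
    static Map<String, String> docWithId(String id) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", id);
        return doc;
    }

    public static void main(String[] args) {
        RemoveDocsSketch index = new RemoveDocsSketch();
        index.add(docWithId("1"));
        index.add(docWithId("2"));
        index.removeDocs("id", new String[] {"1"}, true);
        System.out.println(index.size() + " docs, " + index.countMarkedDeleted() + " marked deleted");
    }
}
```

Making every method synchronized on the same object is a simple stand-in for the coordination with other indexing operations that the description mentions.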

reindexDocs

public int reindexDocs(String field,
                       String term,
                       boolean reindexAll)
Re-indexes all documents that match the given term within the given field. Requires that the file for the given document is still in its original location. If the file is not in its original location, the index will remove the document without updating it, and the document will not be marked as deleted. This is useful for updating a single document that is indexed with a unique ID field, or for updating a group of documents matching the same term for a given field. For example, you might pass in the ID of a record that needs updating along with the ID field that it is indexed under, or the file path corresponding to a record that needs updating along with the field "docsource".

Parameters:
field - The field that is searched.
term - The term that is matched for updates.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.
Returns:
The number of matching documents to be updated.
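
The per-document rule described above (reindex, skip, or remove when the file has gone missing) can be written as a small decision function. This is a sketch under stated assumptions rather than the service's code: ReindexDecisionSketch and its Action enum are hypothetical, and "modified since the last update" is assumed to mean a file modification time later than the time the document was last indexed.

```java
import java.io.File;

/** Hypothetical decision rule for a single matching document. */
final class ReindexDecisionSketch {

    enum Action { REINDEX, SKIP, REMOVE }

    /**
     * @param fileExists       whether the source file is still at its original location
     * @param fileLastModified modification time of the source file, in milliseconds
     * @param lastIndexedTime  time the document was last indexed, in milliseconds
     * @param reindexAll       true to reindex regardless of modification time
     */
    static Action decide(boolean fileExists, long fileLastModified,
                         long lastIndexedTime, boolean reindexAll) {
        if (!fileExists) {
            return Action.REMOVE; // removed outright, not marked as deleted
        }
        if (reindexAll || fileLastModified > lastIndexedTime) {
            return Action.REINDEX;
        }
        return Action.SKIP;
    }

    /** Convenience overload that reads the file state from disk. */
    static Action decide(File sourceFile, long lastIndexedTime, boolean reindexAll) {
        return decide(sourceFile.isFile(), sourceFile.lastModified(), lastIndexedTime, reindexAll);
    }

    public static void main(String[] args) {
        File missing = new File("no-such-record-file.xml");
        System.out.println(decide(missing, System.currentTimeMillis(), true));
    }
}
```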

reindexDocs

public int reindexDocs(String field,
                       String[] terms,
                       boolean reindexAll)
Re-indexes all documents that match the given terms within the given field. This is useful for updating multiple documents that are indexed with a unique ID field. For example, you might pass in an array of IDs needing to be updated along with the ID field that they are indexed under, or an array of file paths corresponding to records that need updating along with the field "docsource".

Parameters:
field - The field that is searched.
terms - The terms that are matched for updates.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.
Returns:
The number of matching documents to be updated.

reindexDocs

public int reindexDocs(String query,
                       boolean reindexAll)
Reindexes Documents managed by this FileIndexingService that match the given Lucene query.

Parameters:
query - A Lucene search query.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.
Returns:
The number of matching documents to be updated.

reindexDocs

public void reindexDocs(org.apache.lucene.document.Document[] docs,
                        boolean reindexAll)
Reindexes the given Documents.

Parameters:
docs - Lucene Documents from the same index that is managed by this FileIndexingService.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.

reindexDocs

public void reindexDocs(ResultDocList docs,
                        boolean reindexAll)
Reindexes the Documents in the given ResultDocs.

Parameters:
docs - Lucene ResultDocs from the same index that is managed by this FileIndexingService.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.

getIndexingMessages

public ArrayList getIndexingMessages()
Gets the last 10 indexing status messages.

Returns:
The indexingMessages.
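
Keeping only the most recent ten messages is commonly done with a bounded queue. The sketch below shows that technique and makes no claim about how the service stores its messages: RecentMessagesSketch is a hypothetical class, and returning the messages oldest first is an assumption, since the ordering is not documented here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;

/** Hypothetical bounded buffer that retains the 10 most recent status messages. */
final class RecentMessagesSketch {
    private static final int MAX_MESSAGES = 10;
    private final Deque<String> messages = new ArrayDeque<>();

    /** Appends a message, discarding the oldest one once the limit is reached. */
    synchronized void add(String message) {
        if (messages.size() == MAX_MESSAGES) {
            messages.removeFirst();
        }
        messages.addLast(message);
    }

    /** Returns a snapshot copy, oldest message first (ordering is an assumption). */
    synchronized ArrayList<String> getIndexingMessages() {
        return new ArrayList<>(messages);
    }

    public static void main(String[] args) {
        RecentMessagesSketch log = new RecentMessagesSketch();
        for (int i = 0; i < 12; i++) {
            log.add("indexing pass " + i);
        }
        System.out.println(log.getIndexingMessages());
    }
}
```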

startTester

public void startTester(String docRoot,
                        String sourceFileDirectory)
Starts a FileMoveTester iff one is not already initialized. The FileMoveTester simulates moving files in and out of the sourceFile directory, for testing purposes only. Warning: FileMoveTester moves metadata files. Only use with test records!

Parameters:
docRoot - The context document root as obtained by calling getServletContext().getRealPath("/");
sourceFileDirectory - The source file directory that the tester moves files in and out of.

stopTester

public void stopTester()
Stops the FileMoveTester


getSimpleDateStamp

public static String getSimpleDateStamp()
Returns a string for the current time and date, suitable for display in log files and output to standard out.

Returns:
The dateStamp value

getDateStamp

public static String getDateStamp()
Returns a string for the current time and date, suitable for display in log files and output to standard out.

Returns:
The dateStamp value

setDebug

public static void setDebug(boolean db)
Sets the debug attribute object

Parameters:
db - The new debug value
