\CatdocXlsExtractor
Extracts text from a Microsoft Excel document in XLS format, using the xls2csv command-line utility. This utility is part of the catdoc suite of command-line utilities, which can be downloaded at:
{@link http://wagner.pp.ru/~vitus/software/catdoc/}
The path to the xls2csv binary will be detected automatically if it lives at
/usr/bin/xls2csv or /usr/local/bin/xls2csv. If your xls2csv binary is in a
non-standard place, you can set it in your _ss_environment.php file like so:
<code>
define('XLS2CSV_BINARY_LOCATION', '/home/username/bin/xls2csv');
</code>
Or, if using _config.php, you can also set it directly on the class:
<code>
CatdocXlsExtractor::$binary_location = '/home/username/bin/xls2csv';
</code>
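The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual get_binary_path() code: the function name find_xls2csv_binary() and the order of precedence (explicit setting, then the constant, then the two standard paths) are assumptions made for the example.

```php
<?php
// Hypothetical helper, NOT the real get_binary_path() implementation.
// Assumed lookup order: explicit setting, then the XLS2CSV_BINARY_LOCATION
// constant, then the two standard locations named in the description.
function find_xls2csv_binary($configured = null)
{
    $candidates = array();
    if ($configured) {
        $candidates[] = $configured;
    }
    if (defined('XLS2CSV_BINARY_LOCATION')) {
        $candidates[] = XLS2CSV_BINARY_LOCATION;
    }
    $candidates[] = '/usr/bin/xls2csv';
    $candidates[] = '/usr/local/bin/xls2csv';

    foreach ($candidates as $path) {
        // Accept the first candidate that exists and can be executed.
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }
    return false; // no usable binary found
}
```

Returning false when nothing is found mirrors the String|Boolean return type listed for get_binary_path() in the synopsis.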
- Author: Darren Inwood <darren.inwood@chrometoaster.com>
Synopsis
class CatdocXlsExtractor
extends ZendSearchLuceneTextExtractor
{
- // members
- public static array $extensions = ;
- public static $binary_location;
- // Inherited members from ZendSearchLuceneTextExtractor
- public static array $extensions;
- public static integer $priority;
- // methods
- public static String extract()
- protected static String|Boolean get_binary_path()
- // Inherited methods from ZendSearchLuceneTextExtractor
- public abstract static String extract()
}
Hierarchy
Extends
- ZendSearchLuceneTextExtractor
Members
public
- $binary_location
  Holds the location of the xls2csv binary. Should be a full filesystem path.
- $extensions
  The extensions that can be handled by this text extractor.
Inherited from ZendSearchLuceneTextExtractor
public
- $extensions
  An array of strings representing file extensions that can be handled by this TextExtractor. Do not include a dot in your extensions. Extensions should be in lower case; they are matched against scanned file names regardless of case.
- $priority
  Controls the order in which text extractor classes are tried for a specific file extension. Default is 100. To make your custom extractor run before a built-in one, set this to less than 100; to make it run afterwards, set it to more than 100.
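To illustrate how $extensions and $priority interact, here is a small sketch of selecting extractors for a file. The registry array, the function extractors_for(), and the class names ExampleBuiltInExtractor and ExampleCustomExtractor are all hypothetical; the module's real lookup code may work differently.

```php
<?php
// Hypothetical registry and lookup, for illustration only.
// Each entry maps a class name to its extensions (lower case, no dot)
// and its priority (lower values are tried first; 100 is the default).
function extractors_for($filename, array $registry)
{
    // Lower-case the file's extension so "XLS", "Xls" and "xls" all match.
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    $matches = array();
    foreach ($registry as $class => $info) {
        if (in_array($ext, $info['extensions'], true)) {
            $matches[$class] = $info['priority'];
        }
    }
    asort($matches); // sort by priority, keeping class names as keys
    return array_keys($matches);
}

$registry = array(
    'ExampleBuiltInExtractor' => array('extensions' => array('xls'), 'priority' => 100),
    'ExampleCustomExtractor'  => array('extensions' => array('xls'), 'priority' => 50),
    'ExamplePdfExtractor'     => array('extensions' => array('pdf'), 'priority' => 100),
);
$order = extractors_for('Report.XLS', $registry);
```

With these example values, the custom extractor (priority 50) is listed ahead of the built-in one (priority 100), and the PDF extractor is skipped because its extension does not match.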
Methods
protected
- get_binary_path() — Try to detect where the xls2csv binary has been installed.
public
- extract() — Returns a string containing the text in the given Microsoft Excel XLS document.
Inherited from ZendSearchLuceneTextExtractor
public
- extract() — Returns text for a given full filesystem path. If a file cannot be processed, you should return an empty string.
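As a rough sketch of the contract described above (text on success, an empty string when the file cannot be processed), an extraction step that shells out to xls2csv might look like this. The function name extract_xls_text() and its two parameters are assumptions for the example, not the class's actual extract() signature or body.

```php
<?php
// Hypothetical standalone function; the class's real extract() may differ.
// $binary is the full path to xls2csv (or false if none was detected),
// $path is the full filesystem path of the XLS file to read.
function extract_xls_text($binary, $path)
{
    if (!$binary || !is_file($path)) {
        return ''; // nothing to run or nothing to read
    }
    // Quote both arguments so unusual file names cannot break the command.
    $command = escapeshellarg($binary) . ' ' . escapeshellarg($path) . ' 2>/dev/null';
    $output = shell_exec($command);
    // shell_exec() yields null or false on failure or empty output.
    return is_string($output) ? $output : '';
}
```

Falling back to an empty string in every failure case keeps the function in line with the inherited extract() description.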