SilverStripe\TextExtraction\Extractor\TikaServerTextExtractor
Enables text extraction of file content via the Tika REST server.
See http://tika.apache.org/1.7/gettingstarted.html
Synopsis
class TikaServerTextExtractor
extends FileTextExtractor
{
- // members
- private static integer $priority = 80;
- private static $server_endpoint;
- protected TikaRestClient $client = NULL;
- protected array $supportedMimes = [];
- // Inherited members from FileTextExtractor
- protected static $sorted_extractor_classes;
- // methods
- public TikaRestClient getClient()
- public string getServerEndpoint()
- public float getVersion()
- public boolean isAvailable()
- public boolean supportsExtension()
- public boolean supportsMime()
- public string getContent()
- // Inherited methods from FileTextExtractor
- protected static array get_extractor_classes()
- protected static FileTextExtractor get_extractor()
- public static FileTextExtractor|null for_file()
- protected static string getPathFromFile()
- public abstract boolean isAvailable()
- public abstract boolean supportsExtension()
- public abstract boolean supportsMime()
- public abstract string getContent()
}
Hierarchy
Extends
- SilverStripe\TextExtraction\Extractor\FileTextExtractor
Members
private
- $priority — integer
  Tika server is pretty efficient so use it immediately if available
- $server_endpoint — string
  Server endpoint
protected
- $client — SilverStripe\TextExtraction\Rest\TikaRestClient
- $sorted_extractor_classes — array
  Cache of extractor class names, sorted by priority
- $supportedMimes — array
  Cache of supported mime types
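The `$priority` member and the `$sorted_extractor_classes` cache suggest that extractors are tried in priority order. The following is a minimal sketch of such an ordering, assuming that a higher number means the extractor is tried earlier. The function name `sortExtractorsByPriority` and the two `Hypothetical...` class names with their priorities are invented for illustration; only the value 80 comes from the synopsis above.

```php
<?php
declare(strict_types=1);

/**
 * Sort extractor class names by priority, highest first.
 * Illustrative helper only; not part of the documented class.
 *
 * @param array<string,int> $priorities Map of class name => priority
 * @return string[] Class names in the order they would be tried
 */
function sortExtractorsByPriority(array $priorities): array
{
    // arsort() orders by value, descending, and preserves the class-name keys
    arsort($priorities, SORT_NUMERIC);

    return array_keys($priorities);
}

$order = sortExtractorsByPriority([
    'HypotheticalCliExtractor'      => 50, // made-up class and priority
    'TikaServerTextExtractor'       => 80, // priority taken from the synopsis
    'HypotheticalFallbackExtractor' => 10, // made-up class and priority
]);
```

Under these assumptions the Tika server extractor comes first in `$order`, which matches the note that it should be used immediately if available.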
Methods
public
- getClient()
- getContent()
- getServerEndpoint()
- getVersion() — Get the version of Tika installed, or 0 if not installed
- isAvailable()
- supportsExtension()
- supportsMime()
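`getVersion()` is documented as returning the installed Tika version as a float, or 0 if Tika is not installed. The sketch below shows one way such a value could be derived and then used for an availability check. It assumes the server reports a banner of the form `Apache Tika 1.7`; the function names `parseTikaVersion` and `isTikaUsable`, the banner format, and the 1.7 minimum are assumptions for this example, not the class's confirmed behaviour.

```php
<?php
declare(strict_types=1);

/**
 * Extract a numeric version from a version banner.
 * Returns 0.0 when the banner is empty or not recognised.
 * Hypothetical helper; the banner format is an assumption.
 */
function parseTikaVersion(string $banner): float
{
    // Capture "major.minor" only; any patch component is ignored
    if (preg_match('/Apache Tika (\d+(?:\.\d+)?)/', $banner, $matches) === 1) {
        return (float) $matches[1];
    }

    return 0.0;
}

/**
 * Decide whether the server looks usable: an endpoint must be configured
 * and the reported version must meet the (assumed) minimum.
 */
function isTikaUsable(?string $endpoint, string $banner, float $minimum = 1.7): bool
{
    if ($endpoint === null || $endpoint === '') {
        return false;
    }

    return parseTikaVersion($banner) >= $minimum;
}
```

A float cannot order versions such as 1.10 and 1.7 correctly, because 1.10 becomes 1.1. Where that matters, comparing the version strings with PHP's `version_compare()` is the more robust choice.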
Inherited from SilverStripe\TextExtraction\Extractor\FileTextExtractor
protected
- getPathFromFile() — Some text extractors (like pdftotext) may require a physical file to read from, so write the current file contents to a temp file and return its path
- get_extractor() — Get the text file extractor for the given class
- get_extractor_classes() — Gets the list of prioritised extractor classes
public
- for_file() — Given a File object, decide which extractor instance to use to handle it
- getContent() — Given a File instance, extract the contents as text.
- isAvailable() — Checks if the extractor is supported on the current environment, for example if the correct binaries or libraries are available.
- supportsExtension() — Determine if this extractor supports the given extension.
- supportsMime() — Determine if this extractor supports the given mime type.
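The inherited methods describe a selection step: `for_file()` decides which extractor handles a file, and each extractor reports whether it is available and whether it supports a given extension or mime type. The sketch below illustrates that idea with stand-in types. `ExtractorSketch`, `StubExtractor`, and `pickExtractor` are hypothetical names, and the exact order of checks in the real `for_file()` is not stated in this page, so this is an assumption rather than the library's implementation.

```php
<?php
declare(strict_types=1);

/** Stand-in for the three capability checks listed above. */
interface ExtractorSketch
{
    public function isAvailable(): bool;
    public function supportsExtension(string $extension): bool;
    public function supportsMime(string $mime): bool;
}

/** Hypothetical extractor whose capabilities are set through its constructor. */
final class StubExtractor implements ExtractorSketch
{
    /** @var bool */
    private $available;
    /** @var string[] */
    private $extensions;
    /** @var string[] */
    private $mimes;

    public function __construct(bool $available, array $extensions, array $mimes)
    {
        $this->available = $available;
        $this->extensions = $extensions;
        $this->mimes = $mimes;
    }

    public function isAvailable(): bool
    {
        return $this->available;
    }

    public function supportsExtension(string $extension): bool
    {
        return in_array(strtolower($extension), $this->extensions, true);
    }

    public function supportsMime(string $mime): bool
    {
        return in_array(strtolower($mime), $this->mimes, true);
    }
}

/**
 * Return the first extractor, in the given priority order, that is
 * available and supports the file's extension or mime type.
 *
 * @param ExtractorSketch[] $extractorsByPriority Highest priority first
 */
function pickExtractor(array $extractorsByPriority, string $extension, string $mime): ?ExtractorSketch
{
    foreach ($extractorsByPriority as $extractor) {
        if (!$extractor->isAvailable()) {
            continue; // e.g. server unreachable or binary missing
        }
        if ($extractor->supportsExtension($extension) || $extractor->supportsMime($mime)) {
            return $extractor;
        }
    }

    return null; // nothing can handle this file
}

// A high-priority extractor that is currently unavailable, and a fallback
$tika = new StubExtractor(false, ['pdf'], ['application/pdf']);
$cli  = new StubExtractor(true, ['pdf'], []);
$chosen = pickExtractor([$tika, $cli], 'pdf', 'application/pdf');
```

In this sketch the unavailable first extractor is skipped and the fallback is chosen, and a `null` result mirrors the `FileTextExtractor|null` return type shown for `for_file()`.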