SilverStripe\TextExtraction\Extractor\TikaServerTextExtractor

Enables text extraction of file content via the Tika REST server.

See http://tika.apache.org/1.7/gettingstarted.html

Synopsis

class TikaServerTextExtractor extends FileTextExtractor {

// members
private static integer $priority = 80;
private static $server_endpoint;
protected TikaRestClient $client = NULL;
protected array $supportedMimes = [];

// Inherited members from FileTextExtractor
protected static $sorted_extractor_classes;

// methods
public TikaRestClient getClient()
public string getServerEndpoint()
public float getVersion()
public boolean isAvailable()
public boolean supportsExtension()
public boolean supportsMime()
public string getContent()

// Inherited methods from FileTextExtractor
protected static array get_extractor_classes()
protected static FileTextExtractor get_extractor()
public static FileTextExtractor|null for_file()
protected static string getPathFromFile()
public abstract boolean isAvailable()
public abstract boolean supportsExtension()
public abstract boolean supportsMime()
public abstract string getContent()

}

Hierarchy

Extends

SilverStripe\TextExtraction\Extractor\FileTextExtractor

Members

private

$priority — integer
The Tika server is efficient, so it is given a high priority and used first whenever it is available
$server_endpoint — string
URL of the Tika server endpoint

protected

$client — SilverStripe\TextExtraction\Rest\TikaRestClient
$sorted_extractor_classes — array
Cache of extractor class names, sorted by priority (static, inherited from FileTextExtractor)
$supportedMimes — array
Cache of supported mime types
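The two private statics above look like ordinary SilverStripe configuration properties. Assuming they follow that convention, a minimal sketch of pointing the extractor at a server from a project's _config.php could look like the following. The URL is an assumption for illustration (9998 is the port a stock Tika server listens on); substitute the address of your own server.

```php
<?php

use SilverStripe\Core\Config\Config;
use SilverStripe\TextExtraction\Extractor\TikaServerTextExtractor;

// Assumption: server_endpoint is read through the SilverStripe config system,
// and a Tika server is reachable at this (illustrative) local address.
Config::modify()->set(
    TikaServerTextExtractor::class,
    'server_endpoint',
    'http://localhost:9998'
);
```

Leaving the endpoint unset would presumably make isAvailable() report false, so other extractors are tried instead.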

Methods

public

getClient()
getContent()
getServerEndpoint()
getVersion() — Get the version of Tika installed, or 0 if not installed
isAvailable()
supportsExtension()
supportsMime()

Inherited from SilverStripe\TextExtraction\Extractor\FileTextExtractor

protected

getPathFromFile() — Some text extractors (like pdftotext) may require a physical file to read from, so write the current file contents to a temp file and return its path
get_extractor() — Get the text file extractor for the given class
get_extractor_classes() — Gets the list of prioritised extractor classes
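To illustrate what "prioritised extractor classes" means in practice, here is a small self-contained sketch of priority-based selection. It is not the module's code: the class names and the helper functions sortByPriority() and pickExtractor() are hypothetical, and it assumes that a larger priority number means "try this extractor first", which matches the remark that the Tika server (priority 80) should be used immediately if available.

```php
<?php

// Hypothetical stand-ins for extractor classes; only the priority matters here.
$priorities = [
    'PdfExtractorStub'  => 50,
    'TikaServerStub'    => 80, // mirrors the documented $priority = 80
    'HtmlExtractorStub' => 10,
];

/**
 * Return class names ordered from highest to lowest priority.
 * Assumption: a larger number means "try this extractor first".
 */
function sortByPriority(array $priorities): array
{
    arsort($priorities); // sort by value, descending, preserving keys
    return array_keys($priorities);
}

/**
 * Return the first class, in priority order, that the callback accepts,
 * or null when none is usable.
 */
function pickExtractor(array $sortedClasses, callable $isUsable): ?string
{
    foreach ($sortedClasses as $class) {
        if ($isUsable($class)) {
            return $class;
        }
    }
    return null;
}

$sorted = sortByPriority($priorities);

// Simulate the Tika server being unavailable: the next class in line is chosen.
$chosen = pickExtractor($sorted, function (string $class): bool {
    return $class !== 'TikaServerStub';
});
```

In the real class hierarchy the usability check would correspond to isAvailable() combined with supportsExtension() or supportsMime().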

public

for_file() — Given a File object, decide which extractor instance to use to handle it
getContent() — Given a File instance, extract the contents as text.
isAvailable() — Checks if the extractor is supported on the current environment, for example if the correct binaries or libraries are available.
supportsExtension() — Determine if this extractor supports the given extension.
supportsMime() — Determine if this extractor supports the given mime type.
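The public methods listed above are typically used together: for_file() selects an extractor and getContent() returns the text. The following is a hedged usage sketch, assuming a bootstrapped SilverStripe 4 project with at least one File record and that getContent() accepts the File instance, as its description suggests. The 200-character preview is purely illustrative.

```php
<?php

use SilverStripe\Assets\File;
use SilverStripe\TextExtraction\Extractor\FileTextExtractor;

// Assumption: runs inside a bootstrapped SilverStripe project.
$file = File::get()->first();

if ($file) {
    // for_file() returns the selected extractor, or null if none can handle the file.
    $extractor = FileTextExtractor::for_file($file);

    if ($extractor !== null) {
        $text = (string) $extractor->getContent($file);
        echo substr($text, 0, 200), PHP_EOL; // short preview of the extracted text
    } else {
        echo 'No extractor available for ', $file->Name, PHP_EOL;
    }
}
```

Whether TikaServerTextExtractor is the instance returned depends on its priority relative to other installed extractors and on isAvailable() succeeding in the current environment.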