\HTMLPurifier_Lexer_DirectLex

Our in-house implementation of a parser.

A pure-PHP parser, DirectLex has no dependencies, making
it a reasonably good default for PHP 4. Written with efficiency in mind,
it can be four times faster than HTMLPurifier_Lexer_PEARSax3, although it
is still considerably slower than HTMLPurifier_Lexer_DOMLex.

Synopsis

class HTMLPurifier_Lexer_DirectLex extends HTMLPurifier_Lexer {

// members
public boolean $tracksLineNumbers = true;
protected string $_whitespace = "\x20\x09\x0D\x0A";

// Inherited members from HTMLPurifier_Lexer
public boolean $tracksLineNumbers;
protected array $_special_entity2str;

// methods
protected void scriptCallback()
public void tokenizeHTML()
protected void substrCount()
public void parseAttributeString()

// Inherited methods from HTMLPurifier_Lexer
public static Concrete create()
public void __construct()
public void parseData()
public HTMLPurifier_Token tokenizeHTML()
protected static void escapeCDATA()
protected static void escapeCommentedCDATA()
protected static void removeIEConditional()
protected static void CDATACallback()
public void normalize()
public void extractBody()

}

Hierarchy

Extends

HTMLPurifier_Lexer

Tasks

Line	Task
13+	Reread XML spec and document differences.
263+	Consider making protected
314+	Consider making protected

Members

protected

$_special_entity2str
Most common entity to raw value conversion table for special entities.
$_whitespace
Whitespace characters (space, tab, CR, LF) used as the mask for strspn() and strcspn().
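
To illustrate how a whitespace mask like $_whitespace is typically used with strspn() and strcspn(), here is a minimal sketch. The input string and variable names are hypothetical and are not taken from DirectLex itself; only the mask value matches the member shown in the synopsis.

```php
<?php
// Same four characters as $_whitespace: space, tab, CR, LF.
$whitespace = "\x20\x09\x0D\x0A";

// Hypothetical input: the inside of a tag, with leading whitespace.
$segment = " \t\nhref=\"a.html\" title=\"x\"";

// strspn(): length of the leading run made up only of mask characters,
// i.e. how much whitespace to skip.
$skip = strspn($segment, $whitespace);

// strcspn(): length of the run, starting at $skip, that contains no mask
// characters, i.e. the distance to the next whitespace character.
$wordLength = strcspn($segment, $whitespace, $skip);

$firstWord = substr($segment, $skip, $wordLength);
echo $firstWord, "\n"; // prints: href="a.html"
```

Both functions scan in a single pass without regular expressions, which is presumably why a hand-written lexer would prefer them.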

public

$tracksLineNumbers
Whether or not this lexer implements line-number/column-number tracking.

Methods

protected

scriptCallback() — Callback function for script CDATA fudge
substrCount() — PHP 5.0.x-compatible substr_count() that supports the offset and length arguments.
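
The idea behind a substr_count() replacement with offset and length support can be sketched as follows. This is an assumption about the general approach (slice first, then count), not the method's actual body; the function name count_in_range() is hypothetical.

```php
<?php
/**
 * Hypothetical stand-in for a substr_count() with offset and length:
 * cut the requested window out of the haystack, then count in that window.
 */
function count_in_range(string $haystack, string $needle, int $offset, int $length): int
{
    $window = substr($haystack, $offset, $length);
    return substr_count($window, $needle);
}

// Example use: count the newlines inside a slice of the input, which is
// the kind of bookkeeping a lexer needs for line-number tracking.
$html = "a\nb\nc\nd";
echo count_in_range($html, "\n", 2, 4), "\n"; // window is "b\nc\n", prints 2
```

On current PHP versions substr_count() accepts offset and length directly, so a wrapper like this only matters for very old runtimes.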

public

parseAttributeString() — Takes the inside of an HTML tag and makes an assoc array of attributes.
tokenizeHTML() — Lexes an HTML string into tokens.
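
What "makes an assoc array of attributes" means can be shown with a simplified, self-contained sketch. It uses a regular expression rather than DirectLex's own scanning logic, so it is an illustration of the input and output shapes only. The function name parse_attributes_sketch() is hypothetical, and giving valueless attributes their own name as value is an assumption of this sketch.

```php
<?php
/**
 * Hypothetical helper: turn the inside of a tag into name => value pairs.
 * Handles double-quoted, single-quoted, unquoted and valueless attributes.
 */
function parse_attributes_sketch(string $inside): array
{
    $attributes = [];
    // Group 1: name. Groups 2-4: double-quoted, single-quoted, unquoted value.
    $pattern = '/([^\s=]+)(?:\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\']+)))?/';
    preg_match_all($pattern, $inside, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $name = strtolower($m[1]);
        if (strpos($m[0], '=') === false) {
            // Valueless attribute such as "disabled": reuse the name.
            $attributes[$name] = $name;
        } else {
            // At most one of the three value groups is non-empty.
            $attributes[$name] = ($m[2] ?? '') . ($m[3] ?? '') . ($m[4] ?? '');
        }
    }
    return $attributes;
}

print_r(parse_attributes_sketch('href="a.html" TITLE=\'x y\' width=10 disabled alt=""'));
```

A real lexer also has to cope with malformed input (unterminated quotes, stray characters), which this sketch does not attempt.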

Inherited from HTMLPurifier_Lexer

protected

CDATACallback() — Callback function for escapeCDATA() that does the work.
escapeCDATA() — Translates CDATA sections into regular sections (through escaping).
escapeCommentedCDATA() — Special CDATA case that is especially convoluted for <script>
removeIEConditional() — Special Internet Explorer conditional comments should be removed.
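
The relationship between escapeCDATA() and its callback can be sketched in a few lines: a regular expression finds each CDATA section, and a callback replaces it with the escaped text. The pattern and the function name escape_cdata_sketch() below are assumptions for illustration, not a copy of the inherited methods.

```php
<?php
/**
 * Hypothetical sketch: replace each <![CDATA[...]]> section with its
 * contents, escaped so that it reads as ordinary character data.
 */
function escape_cdata_sketch(string $html): string
{
    return preg_replace_callback(
        '/<!\[CDATA\[(.+?)\]\]>/s',           // non-greedy, spans newlines
        function (array $matches): string {   // plays the role of the callback
            return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
        },
        $html
    );
}

echo escape_cdata_sketch('<p><![CDATA[1 < 2 & 3]]></p>'), "\n";
// prints: <p>1 &lt; 2 &amp; 3</p>
```

After this step the rest of the lexer no longer needs to know about CDATA syntax, since the section's text is indistinguishable from regular escaped content.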

public

create() — Retrieves or sets the default Lexer as a Prototype Factory.
extractBody() — Takes a string of HTML (fragment or document) and returns the content
normalize() — Takes a piece of HTML and normalizes it by converting entities, fixing encoding, extracting bits, and other good stuff.
parseData() — Parses special entities into the proper characters.
tokenizeHTML() — Lexes an HTML string into tokens.