\HTMLPurifier_Lexer_DirectLex
Our in-house implementation of a parser.
A pure PHP parser, DirectLex has no dependencies beyond PHP itself, which makes
it a reasonable default for PHP 4. Because it was written with efficiency in mind,
it can be four times faster than HTMLPurifier_Lexer_PEARSax3, although it is
considerably slower than HTMLPurifier_Lexer_DOMLex.
Synopsis
class HTMLPurifier_Lexer_DirectLex
extends HTMLPurifier_Lexer
{
    // members
    public boolean $tracksLineNumbers = true;
    protected string $_whitespace = "\x20\x09\x0D\x0A";
    // Inherited members from HTMLPurifier_Lexer
    public boolean $tracksLineNumbers;
    protected array $_special_entity2str;
    // methods
    protected string scriptCallback()
    public HTMLPurifier_Token[] tokenizeHTML()
    protected int substrCount()
    public array parseAttributeString()
    // Inherited methods from HTMLPurifier_Lexer
    public static HTMLPurifier_Lexer create()
    public void __construct()
    public string parseData()
    public HTMLPurifier_Token[] tokenizeHTML()
    protected static string escapeCDATA()
    protected static string escapeCommentedCDATA()
    protected static string removeIEConditional()
    protected static string CDATACallback()
    public string normalize()
    public string extractBody()
}
Hierarchy
Extends
- HTMLPurifier_Lexer
Tasks
Source line | Task |
---|---|
13+ | Reread XML spec and document differences. |
263+ | Consider making protected |
314+ | Consider making protected |
Members
protected
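- $_special_entity2str (inherited from HTMLPurifier_Lexer)
  Conversion table from the most common special entities to their raw characters.
- $_whitespace
  Whitespace characters (space, tab, carriage return and line feed) used with strspn() and strcspn().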
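The way a whitespace set like $_whitespace pairs with strcspn() and strspn() can be shown with a small sketch. The constant WHITESPACE and the function split_first_word() are hypothetical names chosen for this illustration; they demonstrate the two calls stepping over a tag name and the whitespace after it, not how DirectLex itself is written.

```php
<?php
// Hypothetical constant holding the same four characters as
// DirectLex::$_whitespace: space, tab, carriage return, line feed.
const WHITESPACE = "\x20\x09\x0D\x0A";

/**
 * Hypothetical helper: split a string at its first run of whitespace.
 * Returns [leading word, remainder with that whitespace run removed].
 */
function split_first_word(string $s): array
{
    // strcspn(): length of the initial segment containing NO whitespace.
    $wordLength = strcspn($s, WHITESPACE);
    // strspn(): length of the whitespace run that starts right after it.
    $gapLength = strspn($s, WHITESPACE, $wordLength);

    $word = substr($s, 0, $wordLength);
    // Cast guards against older PHP versions returning false for an empty tail.
    $rest = (string) substr($s, $wordLength + $gapLength);
    return [$word, $rest];
}
```

Called on the inside of a tag such as `img \t src="a.png"`, it separates the element name from the attribute text that follows it.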
public
- $tracksLineNumbers
  Whether or not this lexer implements line-number/column-number tracking. DirectLex sets it to true, overriding the declaration inherited from HTMLPurifier_Lexer.
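What line/column tracking computes can be sketched as follows. This is a hypothetical, non-incremental helper (line_and_column() is not part of HTML Purifier): it recomputes the position from the start of the input, whereas a lexer would normally update it as its cursor advances. The choice of 1-based lines and 0-based columns is an assumption made here.

```php
<?php
/**
 * Hypothetical helper: locate byte $offset in $html.
 * Lines are 1-based and columns 0-based (conventions chosen for this sketch).
 */
function line_and_column(string $html, int $offset): array
{
    $before = substr($html, 0, $offset);
    // Every "\n" before the offset starts a new line.
    $line = substr_count($before, "\n") + 1;
    // The column counts bytes since the last newline (or since the start).
    $lastNewline = strrpos($before, "\n");
    $column = ($lastNewline === false) ? $offset : $offset - $lastNewline - 1;
    return [$line, $column];
}
```

Counting newlines over a slice of the input is also where an offset-and-length substr_count(), such as substrCount(), becomes useful.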
Methods
protected
- scriptCallback() — Callback that escapes the contents of <script> elements (the script CDATA workaround).
- substrCount() — PHP 5.0.x-compatible substr_count() that supports the offset and length arguments.
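Both protected helpers can be approximated in a few lines. The sketch below is hypothetical: the names substr_count_compat(), script_callback() and escape_scripts() are illustrative, and the <script> regular expression is an assumption rather than the pattern DirectLex uses. Cutting the window out with substr() first is one way to obtain offset/length behaviour on PHP versions whose substr_count() lacks those parameters.

```php
<?php
/**
 * Hypothetical stand-in for substrCount(): count non-overlapping
 * occurrences of $needle inside the window [$offset, $offset + $length).
 * $needle must be non-empty, as with substr_count() itself.
 */
function substr_count_compat(string $haystack, string $needle, int $offset, int $length): int
{
    $window = (string) substr($haystack, $offset, $length);
    return substr_count($window, $needle);
}

/**
 * Hypothetical stand-in for scriptCallback(): $m holds the opening tag,
 * the script body and the closing tag; only the body is escaped.
 */
function script_callback(array $m): string
{
    return $m[1] . htmlspecialchars($m[2], ENT_COMPAT, 'UTF-8') . $m[3];
}

/** Hypothetical driver: apply script_callback() to every <script> element. */
function escape_scripts(string $html): string
{
    return preg_replace_callback('#(<script[^>]*>)(.*?)(</script>)#is', 'script_callback', $html);
}
```

Escaping the script body this way keeps a stray `<` inside JavaScript from being read as the start of a tag.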
public
- parseAttributeString() — Takes the inside of an HTML tag and makes an assoc array of attributes.
- tokenizeHTML() — Lexes an HTML string into tokens; overrides HTMLPurifier_Lexer::tokenizeHTML().
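A much-simplified picture of the two public methods is sketched below, assuming PHP 7.2 or later (for PREG_UNMATCHED_AS_NULL). The functions parse_attribute_string() and tokenize() and their array-based token format are hypothetical; real HTMLPurifier_Token objects, entity handling and error recovery are omitted. Treating a bare attribute name as `name="name"` follows HTML's minimized-attribute convention, and keeping the first of duplicate attributes is a choice made for this sketch.

```php
<?php
/**
 * Hypothetical, simplified counterpart of parseAttributeString().
 * Does not decode entities or recover from malformed input.
 */
function parse_attribute_string(string $string): array
{
    // name, then optionally "=" and a double-quoted, single-quoted or bare value
    $pattern = '/([^\s=]+)(?:\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+)))?/';
    preg_match_all($pattern, $string, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

    $attributes = [];
    foreach ($matches as $m) {
        // First matched value group wins; a bare name means name="name".
        $value = $m[2] ?? $m[3] ?? $m[4] ?? $m[1];
        if (!array_key_exists($m[1], $attributes)) { // keep the first duplicate
            $attributes[$m[1]] = $value;
        }
    }
    return $attributes;
}

/**
 * Hypothetical cursor-based tokenizer producing ['start', name, attrs],
 * ['end', name] and ['text', data]. Comments, self-closing tags and ">"
 * inside attribute values are not handled.
 */
function tokenize(string $html): array
{
    $tokens = [];
    $cursor = 0;
    $length = strlen($html);
    while ($cursor < $length) {
        $open = strpos($html, '<', $cursor);
        if ($open === false) {                 // no more tags: the rest is text
            $tokens[] = ['text', substr($html, $cursor)];
            break;
        }
        if ($open > $cursor) {                 // text before the tag
            $tokens[] = ['text', substr($html, $cursor, $open - $cursor)];
        }
        $close = strpos($html, '>', $open);
        if ($close === false) {                // unterminated "<": keep it as text
            $tokens[] = ['text', substr($html, $open)];
            break;
        }
        $inside = substr($html, $open + 1, $close - $open - 1);
        if ($inside !== '' && $inside[0] === '/') {
            $tokens[] = ['end', trim(substr($inside, 1))];
        } else {
            $nameLength = strcspn($inside, " \t\r\n");
            $tokens[] = [
                'start',
                substr($inside, 0, $nameLength),
                parse_attribute_string((string) substr($inside, $nameLength)),
            ];
        }
        $cursor = $close + 1;
    }
    return $tokens;
}
```

Scanning forward with strpos() from a cursor, rather than matching the whole document with one large regular expression, is the general style a pure PHP lexer like this tends to use; the details here are illustrative only.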
Inherited from HTMLPurifier_Lexer
protected
- CDATACallback() — Callback for escapeCDATA() that escapes the contents of a single CDATA section.
- escapeCDATA() — Translates CDATA sections into ordinary text by escaping their contents.
- escapeCommentedCDATA() — Handles an especially convoluted CDATA form used inside <script>, where the CDATA section is wrapped in comment markers.
- removeIEConditional() — Removes Internet Explorer conditional comments.
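The CDATA and conditional-comment handling can be illustrated with regular expressions. In this sketch cdata_callback(), escape_cdata() and remove_ie_conditional() are hypothetical names, and the patterns are assumptions written for the example rather than quotations of HTMLPurifier_Lexer.

```php
<?php
/** Hypothetical counterpart of CDATACallback(): escape one section's contents. */
function cdata_callback(array $matches): string
{
    // $matches[1] is the text between "<![CDATA[" and "]]>".
    return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
}

/** Hypothetical counterpart of escapeCDATA(): replace each CDATA section with escaped text. */
function escape_cdata(string $string): string
{
    return preg_replace_callback('/<!\[CDATA\[(.*?)\]\]>/s', 'cdata_callback', $string);
}

/** Hypothetical counterpart of removeIEConditional(): drop conditional-comment blocks. */
function remove_ie_conditional(string $string): string
{
    // Matches "<!--[if ...]>" up to the nearest "<![endif]-->", contents included.
    return preg_replace('#<!--\[if [^>]+\]>.*?<!\[endif\]-->#si', '', $string);
}
```

After escaping, the former CDATA content is plain character data, so the tokenizer no longer needs a special case for it.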
public
- create() — Retrieves or sets the default Lexer as a Prototype Factory.
- extractBody() — Takes a string of HTML (fragment or document) and returns the contents of its body element, or the input unchanged if there is none.
- normalize() — Takes a piece of HTML and normalizes it by converting entities, fixing the encoding, extracting relevant parts such as the body, and performing other cleanup.
- parseData() — Parses special entities into the proper characters.
- tokenizeHTML() — Lexes an HTML string into tokens.
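Two of the public inherited methods lend themselves to short sketches. The table SPECIAL_ENTITY2STR and the functions parse_special_entities() and extract_body() are hypothetical; the table only assumes the kind of entries $_special_entity2str holds, and the real parseData() handles more than this fixed list.

```php
<?php
/** Hypothetical table; the real $_special_entity2str may contain other entries. */
const SPECIAL_ENTITY2STR = [
    '&quot;' => '"',
    '&amp;'  => '&',
    '&lt;'   => '<',
    '&gt;'   => '>',
    '&#39;'  => "'",
];

/** Hypothetical counterpart of parseData(), restricted to the table above. */
function parse_special_entities(string $string): string
{
    // strtr() tries the longest key first and never rescans replaced text,
    // so "&amp;lt;" becomes "&lt;" rather than "<".
    return strtr($string, SPECIAL_ENTITY2STR);
}

/** Hypothetical counterpart of extractBody(). */
function extract_body(string $html): string
{
    // The greedy (.*) runs to the LAST "</body>" in the string.
    if (preg_match('#<body[^>]*>(.*)</body>#is', $html, $m)) {
        return $m[1];
    }
    return $html; // no body element: treat the input as a fragment
}
```

Using strtr() with an array, rather than chained str_replace() calls, avoids double-decoding sequences such as `&amp;lt;`.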