\HTMLPurifier_Lexer_PH5P
Experimental HTML5-based parser using Jeroen van der Meer's PH5P library.
Occupies space in the HTML5 pseudo-namespace, which may cause conflicts.
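A minimal sketch of how this lexer might be selected, assuming HTML Purifier is installed and that the `Core.LexerImpl` configuration directive accepts the string `'PH5P'`. The `require_once` path is an assumption about the local installation. Because the parser is experimental, this is an illustration, not a recommended default.

```php
<?php
// Assumption: HTML Purifier is installed and its autoloader is reachable
// at this path; adjust it for your own setup.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Ask the lexer factory for the PH5P implementation instead of the default.
$config->set('Core.LexerImpl', 'PH5P');

$purifier = new HTMLPurifier($config);

// Purify a fragment with an unclosed <b>; the HTML5-based parser decides
// how the tag soup is repaired.
echo $purifier->purify('<p>Hello <b>world</p>');
```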
Synopsis
class HTMLPurifier_Lexer_PH5P
extends HTMLPurifier_Lexer_DOMLex
{
- // Inherited members from HTMLPurifier_Lexer
- public boolean $tracksLineNumbers;
- protected array $_special_entity2str;
- // methods
- public void tokenizeHTML()
- // Inherited methods from HTMLPurifier_Lexer_DOMLex
- public void __construct()
- public void tokenizeHTML()
- protected void tokenizeDOM()
- protected void transformAttrToAssoc()
- public void muteErrorHandler()
- public void callbackUndoCommentSubst()
- public void callbackArmorCommentEntities()
- protected void wrapHTML()
- // Inherited methods from HTMLPurifier_Lexer
- public static HTMLPurifier_Lexer create()
- public void __construct()
- public void parseData()
- public HTMLPurifier_Token[] tokenizeHTML()
- protected static void escapeCDATA()
- protected static void escapeCommentedCDATA()
- protected static void CDATACallback()
- public void normalize()
- public void extractBody()
Hierarchy
Extends
- HTMLPurifier_Lexer_DOMLex
Tasks
| Line | Task |
|---|---|
| 252+ | Consider making protected |
| 286+ | Consider making protected |
Members
protected
- $_special_entity2str
Most common entity to raw value conversion table for special entities.
public
- $tracksLineNumbers
Whether or not this lexer implements line-number/column-number tracking.
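To illustrate the idea behind `$_special_entity2str`, here is a minimal sketch of a lookup table that maps special entities to their raw characters. The table contents and the helper name `decodeSpecialEntities` are assumptions made for illustration and are not copied from the library.

```php
<?php
// Illustrative subset of a special-entity table (assumed, not the library's).
$specialEntityToStr = [
    '&quot;' => '"',
    '&amp;'  => '&',
    '&lt;'   => '<',
    '&gt;'   => '>',
    '&#39;'  => "'",
];

/**
 * Hypothetical helper: replace each known special entity with its raw value.
 * strtr() with an array never rescans replaced text, so "&amp;lt;" becomes
 * "&lt;" and is not decoded a second time.
 */
function decodeSpecialEntities(string $text, array $table): string
{
    return strtr($text, $table);
}

$decoded = decodeSpecialEntities('1 &lt; 2 &amp;&amp; &quot;a&quot;', $specialEntityToStr);
echo $decoded, "\n";
```

A table lookup like this is cheap and predictable, which is presumably why only the most common special entities are kept in a dedicated member.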
Methods
public
- tokenizeHTML()
Inherited from HTMLPurifier_Lexer_DOMLex
protected
- tokenizeDOM() — Recursive function that tokenizes a node, putting it into an accumulator.
- transformAttrToAssoc() — Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.
- wrapHTML() — Wraps an HTML fragment in the necessary HTML document markup so that it can be parsed as a complete document.
public
- callbackArmorCommentEntities() — Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them
- callbackUndoCommentSubst() — Callback function for undoing escaping of stray angle brackets in comments
- muteErrorHandler() — An error handler that mutes all errors
- tokenizeHTML()
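The conversion that `transformAttrToAssoc()` describes can be sketched with PHP's DOM extension. The function name `attrMapToAssoc` is hypothetical and this is not the library's implementation. It only shows one way to turn a `DOMNamedNodeMap` of `DOMAttr` objects into an associative array.

```php
<?php
/**
 * Hypothetical helper: flatten a DOMNamedNodeMap of DOMAttr objects
 * into a name => value array. A missing map yields an empty array.
 */
function attrMapToAssoc(?DOMNamedNodeMap $map): array
{
    $assoc = [];
    if ($map === null) {
        return $assoc;
    }
    foreach ($map as $attr) {
        // Each item is a DOMAttr exposing ->name and ->value.
        $assoc[$attr->name] = $attr->value;
    }
    return $assoc;
}

$doc = new DOMDocument();
$doc->loadHTML('<a href="/docs" title="Docs">x</a>');

$link  = $doc->getElementsByTagName('a')->item(0);
$attrs = attrMapToAssoc($link->attributes);

print_r($attrs);
```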
Inherited from HTMLPurifier_Lexer
protected
- CDATACallback() — Callback function for escapeCDATA() that does the work.
- escapeCDATA() — Translates CDATA sections into regular sections (through escaping).
- escapeCommentedCDATA() — Special CDATA case that is especially convoluted for <script>
public
- create() — Retrieves or sets the default Lexer as a Prototype Factory.
- extractBody() — Takes a string of HTML (fragment or document) and returns the body content: the contents of the <body> element for a full document, or the input itself for a fragment.
- normalize() — Takes a piece of HTML and normalizes it by converting entities, fixing encoding, extracting bits, and other good stuff.
- parseData() — Parses special entities into the proper characters.
- tokenizeHTML() — Lexes an HTML string into tokens.
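As a rough illustration of what `extractBody()` is described as doing, the sketch below pulls the contents of a `<body>` element out of a document and passes fragments through untouched. The function name `extractBodySketch` and the regular expression are assumptions for illustration. The library's own pattern may differ.

```php
<?php
/**
 * Hypothetical sketch: return the inner content of <body> when present,
 * otherwise return the input unchanged (it is treated as a fragment).
 */
function extractBodySketch(string $html): string
{
    // "i" makes the tag match case-insensitive; "s" lets "." span newlines.
    if (preg_match('#<body[^>]*>(.*)</body>#is', $html, $matches)) {
        return $matches[1];
    }
    return $html;
}

$fromDocument = extractBodySketch('<html><body class="x"><p>Hi</p></body></html>');
$fromFragment = extractBodySketch('<p>Hi</p>');

echo $fromDocument, "\n", $fromFragment, "\n";
```

Both calls yield the same markup, so later stages can treat documents and fragments uniformly.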