\HTMLPurifier_Lexer_PH5P

Experimental HTML5-based parser using Jeroen van der Meer's PH5P library.

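The bundled PH5P code declares classes in the HTML5 pseudo-namespace (global class names beginning with HTML5), so it may conflict with other code that defines classes under the same names.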
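
A minimal sketch of how this lexer is typically selected, assuming HTML Purifier is installed and that its autoloader is reachable at the path shown (the path and the sample markup are assumptions, not part of this reference). The Core.LexerImpl directive chooses the lexer implementation, and the value 'PH5P' requests this class.

```php
<?php
// Assumption: adjust this path to wherever HTML Purifier lives in your project.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Ask HTML Purifier to lex input with the experimental PH5P-based lexer.
$config->set('Core.LexerImpl', 'PH5P');

$purifier = new HTMLPurifier($config);

// Sample input with unclosed tags; the exact output depends on the configuration.
echo $purifier->purify('<p>Unclosed <b>markup');
```

Because the lexer is experimental, it is probably best enabled only where HTML5-style parsing of malformed input is specifically wanted.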

Synopsis

class HTMLPurifier_Lexer_PH5P extends HTMLPurifier_Lexer_DOMLex {

// Inherited members from HTMLPurifier_Lexer
public boolean $tracksLineNumbers;
protected array $_special_entity2str;

// methods
public void tokenizeHTML()

// Inherited methods from HTMLPurifier_Lexer_DOMLex
public void __construct()
public void tokenizeHTML()
protected void tokenizeDOM()
protected void transformAttrToAssoc()
public void muteErrorHandler()
public void callbackUndoCommentSubst()
public void callbackArmorCommentEntities()
protected void wrapHTML()

// Inherited methods from HTMLPurifier_Lexer
public static HTMLPurifier_Lexer create()
public void __construct()
public void parseData()
public HTMLPurifier_Token[] tokenizeHTML()
protected static void escapeCDATA()
protected static void escapeCommentedCDATA()
protected static void CDATACallback()
public void normalize()
public void extractBody()

}

Hierarchy

Extends

HTMLPurifier_Lexer_DOMLex

Tasks

Line	Task
252+	Consider making protected
286+	Consider making protected

Members

protected

$_special_entity2str
Most common entity to raw value conversion table for special entities.

public

$tracksLineNumbers
Whether or not this lexer implements line-number/column-number tracking.

Methods

public

tokenizeHTML()

Inherited from HTMLPurifier_Lexer_DOMLex

protected

tokenizeDOM() — Recursive function that tokenizes a node, putting it into an accumulator.
transformAttrToAssoc() — Converts a DOMNamedNodeMap of DOMAttr objects into an associative array.
wrapHTML() — Wraps an HTML fragment in the HTML document markup needed to parse it.
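
The conversion described for transformAttrToAssoc() can be sketched with PHP's built-in DOM extension. This is an illustrative stand-alone function, not the library's protected method; the name attrMapToAssoc and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: turn a DOMNamedNodeMap of DOMAttr nodes into
// an associative array of attribute name => attribute value.
function attrMapToAssoc(DOMNamedNodeMap $nodeMap): array
{
    $assoc = [];
    foreach ($nodeMap as $attr) {
        // Each item is a DOMAttr exposing public name and value properties.
        $assoc[$attr->name] = $attr->value;
    }
    return $assoc;
}

// Usage: parse a small fragment and read the attributes of its <a> element.
$doc = new DOMDocument();
$doc->loadHTML('<a href="/docs" title="Docs">link</a>');
$anchor = $doc->getElementsByTagName('a')->item(0);
$attributes = attrMapToAssoc($anchor->attributes);
```

An element without attributes yields an empty array, which keeps downstream token construction uniform.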

public

callbackArmorCommentEntities() — Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst() doesn't clobber them.
callbackUndoCommentSubst() — Callback function for undoing the escaping of stray angle brackets in comments.
muteErrorHandler() — An error handler that mutes all errors.
tokenizeHTML()

Inherited from HTMLPurifier_Lexer

protected

CDATACallback() — Callback function for escapeCDATA() that does the work.
escapeCDATA() — Translates CDATA sections into regular sections (through escaping).
escapeCommentedCDATA() — Special CDATA case that is especially convoluted for <script>.

public

create() — Retrieves or sets the default Lexer as a Prototype Factory.
extractBody() — Takes a string of HTML (fragment or document) and returns the content of its <body> element.
normalize() — Takes a piece of HTML and normalizes it by converting entities, fixing encoding, extracting bits, and other good stuff.
parseData() — Parses special entities into the proper characters.
tokenizeHTML() — Lexes an HTML string into tokens.
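
The pairing of escapeCDATA() and CDATACallback() listed above can be illustrated with a small stand-alone sketch: a regular expression finds each CDATA section, and a callback replaces it with its escaped contents. The function name escapeCdataSketch and the exact pattern are assumptions made for illustration; the library's own implementation may differ.

```php
<?php
// Hypothetical sketch: rewrite <![CDATA[...]]> sections as escaped text.
function escapeCdataSketch(string $html): string
{
    return preg_replace_callback(
        // Non-greedy match so adjacent CDATA sections are handled separately;
        // the s flag lets the section span multiple lines.
        '/<!\[CDATA\[(.+?)\]\]>/s',
        function (array $matches): string {
            // The callback does the work: escape the raw contents.
            return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
        },
        $html
    );
}

$escaped = escapeCdataSketch('a <![CDATA[<b> & c]]> d');
```

Escaping the contents means a later parsing stage sees ordinary text rather than markup, which is the effect the method summary describes.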