\StaticSiteCrawler
Extends PHPCrawler, essentially to override its handleDocumentInfo() method.
Synopsis
class StaticSiteCrawler
extends PHPCrawler
{
- // members
- protected $urlList;
- protected boolean $verbose = false;
- protected $utils;
- private static $log_file = NULL;
- // methods
- public void __construct()
- public mixed handleDocumentInfo()
- protected void initCrawlerProcess()
}
Hierarchy
Extends
- PHPCrawler
Tasks
Line | Task |
---|---|
756+ | Can we use PHPCrawlerDocumentInfo#error_occured instead of manually checking server status codes? |
756+ | The comments below state that badly formatted URLs never reach our caching logic. That is wrong. Move the preg_replace() call for "fixing" $mossBracketRegex into StaticSiteUrlProcessor#postProcessUrl(). |
781 | Is this needed? |
Members
private
- $log_file — string
  Set this by using the yml config system
protected
- $urlList — array
- $utils
- $verbose — boolean
Methods
protected
- initCrawlerProcess()
public
- __construct()
- handleDocumentInfo() — After checking the raw HTTP status code reported by PHPCrawler, saves each URL to our cache file
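
The status check and cache write described for handleDocumentInfo() could look roughly like the sketch below. It does not reproduce the actual method body: the helper names isCacheableStatus() and cacheUrl(), the "url,status" line format, and the rule that only 2xx and 3xx responses are kept are all assumptions made for illustration.

```php
<?php
// Hypothetical helpers for illustration only; they are not part of
// StaticSiteCrawler or PHPCrawler.

/**
 * Decide whether a response should be written to the cache.
 * Assumption: 2xx and 3xx responses are kept, everything else is skipped.
 */
function isCacheableStatus(int $statusCode): bool
{
    return $statusCode >= 200 && $statusCode < 400;
}

/**
 * Append one "url,status" line to the cache file.
 * Returns true if a line was written, false if the URL was skipped
 * or the write failed.
 */
function cacheUrl(string $cacheFile, string $url, int $statusCode): bool
{
    if (!isCacheableStatus($statusCode)) {
        return false;
    }
    $line = $url . ',' . $statusCode . PHP_EOL;
    // LOCK_EX guards against interleaved writes from parallel processes.
    return file_put_contents($cacheFile, $line, FILE_APPEND | LOCK_EX) !== false;
}
```

Inside an overridden handleDocumentInfo(), a call such as cacheUrl($file, $DocInfo->url, $DocInfo->http_status_code) would be the natural place to use it, assuming the cache file path is available to the crawler.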