Use PHP to Get the Links from an HTML Page

This snippet collects the links from an HTML page. getElementsByTagName('a') returns a DOMNodeList object rather than an array, so $links cannot be indexed like an array. Instead, $linksArray is initialized as an empty array and the href attribute of each link is appended to it in a foreach loop.

$linksArray = array();
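$page = new DOMDocument;
$page->preserveWhiteSpace = false; // set before loading; it has no effect once the document is parsed
$page->loadHTML(file_get_contents("http://lage.us/PHP-Get-Links-From-Page.html"));
$links = $page->getElementsByTagName('a'); // DOMNodeList of all <a> elements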
 
if ($links->length > 0) {
    foreach ($links as $link) {
        $linksArray[] = $link->getAttribute('href');
    }
}
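
Real pages often contain markup that makes loadHTML() emit warnings, and some <a> tags have no href at all. The following is a minimal sketch of the same loop wrapped in a function, assuming the HTML is already available as a string. The function name extractLinks is hypothetical and not part of the original snippet; libxml_use_internal_errors() and hasAttribute() are standard PHP calls.

```php
<?php
// Hypothetical helper (not from the original snippet): returns the href
// values of all <a> elements found in an HTML string.
function extractLinks(string $html): array
{
    $hrefs = [];
    if (trim($html) === '') {
        return $hrefs; // loadHTML() rejects empty input, so return early
    }

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $page = new DOMDocument();
    $page->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($page->getElementsByTagName('a') as $link) {
        // Skip anchors without an href attribute, e.g. <a name="top">.
        if ($link->hasAttribute('href')) {
            $hrefs[] = $link->getAttribute('href');
        }
    }
    return $hrefs;
}

$html = '<p><a href="/">Home</a> <a name="top">Top</a> <a href="php.html">PHP</a></p>';
print_r(extractLinks($html));
```

For the sample string above, the returned array holds the two href values / and php.html; the anchor without an href is skipped.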

Example:

$linksArray = array();
$page = new DOMDocument;
$page->preserveWhiteSpace = false; // set before loading; it has no effect once the document is parsed
$page->loadHTML(file_get_contents("http://lage.us/PHP-Get-Links-From-Page.html"));
$links = $page->getElementsByTagName('a'); // DOMNodeList of all <a> elements

if ($links->length > 0) {
    foreach ($links as $link) {
        $linksArray[] = $link->getAttribute('href');
    }
}

print "<pre>";
print_r($linksArray);
print "</pre>";

Produces the result:

Array
(
    [0] => /
    [1] => php.html
    [2] => html.html
    [3] => javascript.html
    [4] => css.html
    [5] => PHP-load-CSV-into-2d-array.html
    [6] => PHP-Convert-2d-Array-to-CSV.html
    [7] => PHP-CSV-to-Array.html
    [8] => PHP-Insert-Element-Into-Array.html
    [9] => PHP-Remove-Last-Character-From-String.html
    [10] => PHP-Round-2d-Array-By-Key.html
    [11] => PHP-String-Contains-Substring.html
    [12] => PHP-Get-Contents-of-Directory.html
    [13] => PHP-Script-Time-to-Execute.html
    [14] => PHP-Loop-for-Period-of-Time.html
    [15] => PHP-Looping-Structures.html
    [16] => PHP-Get-Links-From-Page.html
    [17] => http://www.indoorclimbing.com/
    [18] => http://www.ziplinerider.com/
    [19] => http://antiqueable.com/
    [20] => http://escaperoomplayer.com/
    [21] => http://trampoline.jumpcenters.com/
    [22] => http://inflatable.jumpcenters.com/
    [23] => disclaimer.html
    [24] => privacy-policy.html
    [25] => terms-of-use.html
)
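
The result mixes relative links (php.html) with absolute ones (http://antiqueable.com/). As a rough sketch, assuming only links with their own scheme should count as absolute, the array can be split using parse_url(). The helper name isAbsoluteLink and the shortened sample array are illustrative assumptions, not part of the original example.

```php
<?php
// Hypothetical helper: true when the link carries its own scheme (http, https, ...).
function isAbsoluteLink(string $href): bool
{
    // parse_url() returns null when the component is missing and
    // false for seriously malformed URLs; both count as "not absolute".
    $scheme = parse_url($href, PHP_URL_SCHEME);
    return is_string($scheme) && $scheme !== '';
}

// A shortened stand-in for the $linksArray produced above.
$linksArray = ['/', 'php.html', 'http://antiqueable.com/', 'disclaimer.html'];

// array_values() reindexes the filtered results from 0.
$absolute = array_values(array_filter($linksArray, 'isAbsoluteLink'));
$relative = array_values(array_filter($linksArray, function ($href) {
    return !isAbsoluteLink($href);
}));

print_r($absolute);
print_r($relative);
```

With the sample array, $absolute contains only http://antiqueable.com/ and $relative contains the other three entries.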