Extracting URLs from a website is useful in many cases; generating a sitemap from a website URL is one of them. You can easily get all URLs from a web page using PHP. Here we'll provide a short and simple code snippet to extract all URLs from a web page in PHP.
The following PHP code helps to get all the links from a web page URL. The file_get_contents() function is used to fetch the web page content from the URL, and the fetched content is stored in the $urlContent variable. All the URLs or links are then extracted from the web page's HTML content using the DOMDocument class. Each link is validated with FILTER_VALIDATE_URL and is printed only if it is a valid URL.
// Get the web page content from the URL
$urlContent = file_get_contents('http://php.net');

// Load the HTML into a DOMDocument (suppress warnings for malformed markup)
$dom = new DOMDocument();
@$dom->loadHTML($urlContent);

// Query all anchor tags in the document body
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url  = $href->getAttribute('href');
    $url  = filter_var($url, FILTER_SANITIZE_URL);

    // Print the link only if it is a valid absolute URL
    if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
        echo '<a href="' . $url . '">' . $url . '</a><br />';
    }
}
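If the same extraction is needed in more than one place, the logic above can be wrapped in a small helper that returns the valid URLs as an array instead of printing them. This is only a minimal sketch; the function name getAllLinks() is an illustrative choice, not part of the original snippet.

// Minimal reusable wrapper around the same approach.
// The function name getAllLinks() is only an example.
function getAllLinks($pageUrl)
{
    $links = array();

    $urlContent = file_get_contents($pageUrl);
    if ($urlContent === false) {
        return $links; // request failed, return an empty array
    }

    $dom = new DOMDocument();
    @$dom->loadHTML($urlContent);

    $xpath = new DOMXPath($dom);
    $hrefs = $xpath->evaluate("/html/body//a");

    for ($i = 0; $i < $hrefs->length; $i++) {
        $url = filter_var($hrefs->item($i)->getAttribute('href'), FILTER_SANITIZE_URL);
        if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
            $links[] = $url;
        }
    }

    return $links;
}

// Example usage:
print_r(getAllLinks('http://php.net'));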
What if we want only those href values that point to files with a .pdf extension, or whose path falls between a given start point and end point?
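One way to do this, sketched from the snippet above rather than taken from a tested script, is to inspect the file extension of each URL's path before printing it.

// Keep only links that point to PDF files.
// Assumes $hrefs has been built with DOMXPath as in the main example.
for ($i = 0; $i < $hrefs->length; $i++) {
    $url = filter_var($hrefs->item($i)->getAttribute('href'), FILTER_SANITIZE_URL);

    if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
        // Compare the file extension of the URL path against 'pdf'
        $path = (string) parse_url($url, PHP_URL_PATH);
        if (strtolower(pathinfo($path, PATHINFO_EXTENSION)) === 'pdf') {
            echo '<a href="' . $url . '">' . $url . '</a><br />';
        }
    }
}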
I want to scrape Facebook links from multiple sites. How can I achieve this?
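A possible approach, again only a sketch, is to run the extraction over a list of page URLs and keep the links whose host is facebook.com. The getAllLinks() helper sketched earlier is assumed here, and the site URLs below are placeholders.

// Collect Facebook links from several pages.
// The getAllLinks() helper from the earlier sketch is assumed; the site URLs are placeholders.
$sites = array('http://example.com', 'http://example.org');
$facebookLinks = array();

foreach ($sites as $site) {
    foreach (getAllLinks($site) as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        // Keep only URLs whose host is facebook.com or a subdomain of it
        if ($host !== null && preg_match('/(^|\.)facebook\.com$/i', $host)) {
            $facebookLinks[] = $url;
        }
    }
}

print_r(array_unique($facebookLinks));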