How to Create a PHP Crawler


Hi, in this tutorial we will show you how to create a crawler in PHP that finds all the links on a given web page. Web crawlers are bots that search for information on websites by scraping their HTML content. Crawlers are everywhere, moving through many web pages every second. The Google web crawler, for example, visits your domain and scans the pages of your website, extracting page titles, descriptions, keywords, and links, then reports back to Google and adds that information to its huge database. Today I would like to teach you how to make your own basic crawler: not one that scans the whole Internet, but one that is able to extract all the links from a given website URL.
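To make the idea of "extracting titles and links" concrete, here is a minimal sketch that pulls the title and the link targets out of an HTML string. It uses PHP's bundled DOMDocument class instead of the Simple HTML DOM library used in the rest of this tutorial, purely so that it runs without any download. The function name extractTitleAndLinks and the sample markup are made up for illustration.

```php
<?php
// A tiny stand-in for a fetched page; a real crawler would download this.
$pageHtml = '<html><head><title>Example page</title></head><body>'
    . '<a href="/about">About</a> <a href="https://example.org/">Partner</a>'
    . '</body></html>';

// Hypothetical helper: returns the page title and every href found in <a> tags.
function extractTitleAndLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titleNode = $doc->getElementsByTagName('title')->item(0);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Anchors without an href (for example named anchors) are skipped.
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }

    return [
        'title' => $titleNode ? trim($titleNode->textContent) : '',
        'links' => $links,
    ];
}

print_r(extractTitleAndLinks($pageHtml));
```

The links come back exactly as written in the markup, so relative ones such as /about still need to be combined with the page address before they can be visited.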


Getting Started

First, download the Simple HTML DOM library from the project's website; we will use it as a helper class. Download the zip file, unzip it, and copy the simple_html_dom.php file to your project folder. It contains the functions we'll be using to traverse the elements of a web page more easily.

Creating an HTML Form to Get the URL

In this step we create a simple HTML form that accepts the URL of the website we want to crawl.

<html>
<head>
<title>Web crawler</title>
<link type="text/css" rel="stylesheet" href="style.css" />
</head>
<body>
<form method="post" action="index.php">
<p style="text-align:center;"><input type="text" name="target" class="input" /></p>
<p style="text-align:center;"><input type="submit" name="crawl" class="button" value="Submit" /></p>
</form>
</body>
</html>
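Whatever the visitor types into the target field arrives in PHP as untrusted text, so it is worth checking it before fetching anything. The following is a minimal sketch of one way to do that with PHP's built-in filter_var and parse_url. The helper name isCrawlableUrl is an assumption made for this example and is not part of the tutorial's source code.

```php
<?php
// Hypothetical helper (not part of the original tutorial): returns true only
// for absolute http:// or https:// URLs that include a host name.
function isCrawlableUrl(string $url): bool
{
    $url = trim($url);

    // Reject anything PHP does not consider a well-formed URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Allow only web schemes, so ftp://, file:// and similar are refused.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    $host = parse_url($url, PHP_URL_HOST);

    return in_array($scheme, ['http', 'https'], true) && !empty($host);
}

// Example values as they might arrive in $_POST['target'].
foreach (['https://example.com', 'example.com', 'ftp://example.com'] as $candidate) {
    echo $candidate, ' => ', isCrawlableUrl($candidate) ? 'ok' : 'rejected', PHP_EOL;
}
```

Only the first value is accepted. The second has no scheme, and the third uses a scheme the crawler should not follow.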


Styling the Form Using CSS

In this step we add some CSS to make the form more attractive.

.input{
    border: 1px solid red;
    width:50%;
    margin:auto;
    height:40px;
    font-size:100%;
    box-sizing: border-box;
    outline: none;
    background:#FAFAFA;
}
.button{
    background-color: red;
    border: none;
    color: white;
    padding:15px 30px 15px 30px;
    text-align: center;
    text-decoration: none;
    font-size: 17px;
    margin: 4px 2px;
    cursor: pointer;
    border-radius:4px;
}
.links{
    text-align:center;
    font-family:arial;
    color:#00BFFF;
}

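Importing simple_html_dom.php and Extracting the Links with PHP

<?php
include_once('simple_html_dom.php');
$html = new simple_html_dom();
if (isset($_POST['crawl'])) {
    // Strip whitespace and any trailing slash so links can be joined cleanly.
    $crawl = rtrim(trim($_POST['target'] ?? ''), '/');
    // Accept only URLs that start with http:// or https://.
    if (strpos($crawl, 'http://') === 0 || strpos($crawl, 'https://') === 0) {
        $html->load_file($crawl);
        foreach ($html->find('a') as $link) {
            $href = $link->href;
            if (empty($href)) {
                continue; // skip anchors without an href attribute
            }
            if (strpos($href, 'http://') === 0 || strpos($href, 'https://') === 0) {
                // Absolute link: print it unchanged.
                $url = $href;
            } else {
                // Relative link: prefix it with the submitted URL.
                $url = $crawl . '/' . ltrim($href, '/');
            }
            // Escape the output so crawled content cannot inject HTML.
            echo "<p class='links'>" . htmlspecialchars($url) . "</p>";
        }
    } else {
        echo "Invalid URL";
    }
}
?>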
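The script above turns a relative link into a full address by simply prefixing it with the submitted URL. That works for a site's home page but can produce wrong addresses when the crawled page sits in a subfolder. The sketch below shows a slightly more careful approach using only built-in PHP functions. The helper name resolveLink is hypothetical, and it covers the common cases only: it does not collapse "../" segments or implement every rule of URL resolution.

```php
<?php
// Hypothetical helper (not in the original tutorial): turn an href found on
// the page at $baseUrl into an absolute URL, or null if it should be skipped.
function resolveLink(string $baseUrl, string $href): ?string
{
    $href = trim($href);

    // Skip empty links and in-page anchors such as "#top".
    if ($href === '' || $href[0] === '#') {
        return null;
    }

    // The href carries its own scheme: keep http(s), skip mailto:, tel:, etc.
    if (preg_match('#^[a-z][a-z0-9+.\-]*:#i', $href)) {
        return preg_match('#^https?://#i', $href) ? $href : null;
    }

    $parts = parse_url($baseUrl);
    if ($parts === false || empty($parts['scheme']) || empty($parts['host'])) {
        return null;
    }
    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative link: //cdn.example.com/app.js
    if (strpos($href, '//') === 0) {
        return $parts['scheme'] . ':' . $href;
    }

    // Root-relative link: /about
    if ($href[0] === '/') {
        return $origin . $href;
    }

    // Path-relative link: resolve against the directory of the base path.
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $href;
}

// Example: links as they might appear on https://example.com/blog/index.html
$base = 'https://example.com/blog/index.html';
foreach (['/about', 'post-1.html', '//cdn.example.com/app.js', '#top'] as $href) {
    var_dump(resolveLink($base, $href));
}
```

Returning null for anchors and non-web schemes lets the calling loop skip those entries instead of printing addresses that cannot be crawled.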

If you run into any problem with this source code, you can download the complete source code as a zip file using the download button below, or leave a comment.

Download Source Code
