Scraping websites with PHP cURL under proxy
PHP script, by Script, September 18, 2009 at 1:50 pm
Scraping websites with PHP cURL is damn easy. Just do it the right way: send your requests through a proxy. Here is a simple function that does the job.
Simple PHP cURL scraper:
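<?php

function getPage($proxy, $url, $referer, $agent, $header, $timeout) {
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, $header);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);

    $result = array();
    $result['EXE'] = curl_exec($ch);     // page contents, or false on failure
    $result['INF'] = curl_getinfo($ch);  // details about the transfer
    $result['ERR'] = curl_error($ch);    // empty string if no error occurred

    curl_close($ch);

    return $result;
}

?>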
PHP cURL functions used:
- curl_init – initializes a cURL session.
- curl_setopt – sets an option for a cURL transfer.
- curl_exec – performs a cURL session.
- curl_getinfo – gets information about the last transfer.
- curl_error – returns a string containing the last error for the current session.
- curl_close – closes a cURL session.
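When curl_exec() fails it just returns false, and curl_error() tells you why in plain text. A minimal sketch, assuming you also want a numeric code to branch on: it pairs curl_error() with curl_errno(), which the original function does not use. The helper name fetchWithError() and the bogus:// URL are hypothetical, chosen so the failure happens inside libcurl before any connection is attempted.

```php
<?php
// Sketch: report why a transfer failed with both curl_errno() and
// curl_error(). fetchWithError() is a hypothetical helper, not part of
// the original script.
function fetchWithError(string $url, int $timeout = 5): array
{
    $ch = curl_init();                                  // start a cURL session
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        // return body as a string
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // seconds to wait for a connect

    $body  = curl_exec($ch);   // string on success, false on failure
    $errno = curl_errno($ch);  // 0 when the last transfer succeeded
    $error = curl_error($ch);  // '' when the last transfer succeeded
    curl_close($ch);

    return ['body' => $body, 'errno' => $errno, 'error' => $error];
}

// An unsupported (hypothetical) scheme fails locally, with no network access.
$r = fetchWithError('bogus://example.com/');
if ($r['errno'] !== 0) {
    echo "cURL error {$r['errno']}: {$r['error']}\n";
}
```

A numeric code is handier than the message text when you want to tell a timeout (CURLE_OPERATION_TIMEDOUT) apart from, say, a refused proxy connection.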
curl_setopt options used:
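- CURLOPT_URL – the URL to scrape.
- CURLOPT_HEADER – include the response headers in the output? Use 1 to include them, 0 to leave them out.
- CURLOPT_RETURNTRANSFER – return the transfer as a string instead of outputting it directly? Use 1, i.e. return.
- CURLOPT_PROXY – the HTTP proxy to tunnel requests through, given as host:port.
- CURLOPT_HTTPPROXYTUNNEL – tunnel through the given HTTP proxy (curl sends it a CONNECT request)? Use 1, i.e. tunnel.
- CURLOPT_CONNECTTIMEOUT – the number of seconds to wait while trying to connect (0 waits indefinitely).
- CURLOPT_REFERER – the contents of the "Referer:" header to send with the HTTP request.
- CURLOPT_USERAGENT – the contents of the "User-Agent:" header to send with the HTTP request.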
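The same eight options can also be set in one go with curl_setopt_array(), which returns false as soon as one option cannot be set. A minimal sketch: buildProxyOptions() is a hypothetical helper whose parameters mirror the getPage() call in the usage example below, and 127.0.0.1:8080, example.com and the user agent string are placeholders, not values from the original script.

```php
<?php
// Sketch: collect the proxy scraping options into one array.
// buildProxyOptions() is a hypothetical helper; all values passed to it
// below are placeholders.
function buildProxyOptions(
    string $proxy,
    string $url,
    string $referer,
    string $agent,
    bool $header,
    int $timeout
): array {
    return [
        CURLOPT_URL             => $url,
        CURLOPT_HEADER          => $header,  // include response headers?
        CURLOPT_RETURNTRANSFER  => true,     // return instead of printing
        CURLOPT_PROXY           => $proxy,   // "host:port" of the proxy
        CURLOPT_HTTPPROXYTUNNEL => true,     // tunnel through the proxy
        CURLOPT_CONNECTTIMEOUT  => $timeout, // seconds to wait for a connect
        CURLOPT_REFERER         => $referer, // "Referer:" header
        CURLOPT_USERAGENT       => $agent,   // "User-Agent:" header
    ];
}

$options = buildProxyOptions(
    '127.0.0.1:8080',                          // placeholder proxy
    'http://www.example.com/',
    'http://www.example.com/',
    'Mozilla/5.0 (compatible; ExampleScraper)', // placeholder user agent
    true,
    5
);

$ch = curl_init();
// curl_setopt_array() stops and returns false at the first option it cannot set.
if (!curl_setopt_array($ch, $options)) {
    echo "Could not set all cURL options\n";
}
curl_close($ch);
```

Keeping the options in an array makes it easy to reuse them for several handles and swap only CURLOPT_PROXY or CURLOPT_URL between requests.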
Scraper usage:
<?php

$result = getPage(
    '[proxy IP]:[port]', // use valid proxy
    'http://www.google.com/search?q=twitter',
    'http://www.google.com/',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8',
    1, // include response headers in the output
    5  // connect timeout in seconds
);

if (empty($result['ERR'])) {
    // Job's done! Parse, save, etc.
    // ...
} else {
    // WTF? Captcha or network problems?
    // ...
}

?>
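Since the usage example passes 1 as the header argument, the string curl returns starts with the response headers, so you need to cut them off before parsing the page. A minimal sketch, assuming $info is the array returned by curl_getinfo($ch) for the same transfer: its 'header_size' entry gives the total size of the headers received and 'http_code' the status code. splitResponse() is a hypothetical helper, and the response below is hand-written so the example runs offline.

```php
<?php
// Sketch: split a response fetched with CURLOPT_HEADER enabled into its
// status code, header block and body. splitResponse() is hypothetical.
function splitResponse(string $raw, array $info): array
{
    $size = (int) $info['header_size'];        // header bytes before the body
    return [
        'status'  => (int) $info['http_code'],
        'headers' => substr($raw, 0, $size),
        'body'    => (string) substr($raw, $size), // cast: PHP 7 returns false for ''
    ];
}

// Offline example with a hand-written response and matching info array.
$headers = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";
$raw     = $headers . '<p>Hello</p>';
$info    = ['http_code' => 200, 'header_size' => strlen($headers)];

$parts = splitResponse($raw, $info);
if ($parts['status'] === 200) {
    echo $parts['body'], "\n"; // prints <p>Hello</p>
}
```

Checking the status code first is a cheap way to catch the "captcha" case: a blocked request often comes back with a non-200 status instead of the page you asked for.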
Note: enable the cURL extension in php.ini if it is not already loaded.
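To find out whether the note applies to your setup, you can ask PHP directly (running php -m on the command line also lists the loaded modules). A minimal sketch using extension_loaded() and curl_version(); curlAvailable() is a hypothetical helper name.

```php
<?php
// Sketch: check whether the cURL extension is loaded before any curl_*
// call is made. curlAvailable() is a hypothetical helper.
function curlAvailable(): bool
{
    return extension_loaded('curl');
}

if (curlAvailable()) {
    // curl_version() returns an array describing the linked libcurl.
    echo 'cURL is available, libcurl ', curl_version()['version'], "\n";
} else {
    echo "cURL is not loaded: enable the extension in php.ini.\n";
}
```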



