- It is headless, meaning you don't see anything graphical. That means I can run it on a server, from the command line, without needing X installed. It also means it causes less load.
- It embeds WebKit, rather than attempting to interface with many browsers and control them as a user would.
There are two things that PhantomJS makes difficult, and I will show techniques for both here. The first is that authorization is kind-of-broken. The second is timeouts for requests that never finish (e.g. an HTTP streaming web service). But first, the basic example, without auth or timeouts, and using GET:
var page=require('webpage').create();
var callback=function(status){
  if (status=='success'){
    console.log(page.plainText);
  }else{
    console.log('Failed to load.');
  }
  phantom.exit();
};
var url="http://example.com/something?name=value";
page.open(url,callback);

Now here is the same code with basic auth (the page.customHeaders line) and a five-second time-out (the two lines that use timer):
var page=require('webpage').create();
page.customHeaders={'Authorization': 'Basic '+btoa('username:password')};
var callback=function(status){
  if(timer)window.clearTimeout(timer);
  if (status=='success' || status=='timedout') {
    console.log(page.plainText);
  }else{
    console.log('Failed to load.');
  }
  phantom.exit();
};
var timer=window.setTimeout(callback,5000,'timedout');
var url="http://example.com/something?name=value";
page.open(url,callback);

Don't use page.settings.userName = 'username'; page.settings.password = 'password'; because it has a bug as of PhantomJS 1.9.0 (it uses two connections for GET requests and doesn't work at all for POST requests). Instead, make your own basic auth header as shown here (thanks to Igor Semenko, on the PhantomJS mailing list, for this trick).
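To make the header trick a bit more concrete, here is a minimal sketch of a helper that builds the header value. The name basicAuthHeader is my own invention for illustration, not something PhantomJS provides, and the Buffer fallback is only there so the snippet can also be tried outside PhantomJS (e.g. in Node.js).

```javascript
// Hypothetical helper (not part of PhantomJS): build a Basic auth header value.
function basicAuthHeader(username, password) {
  var credentials = username + ':' + password;
  // btoa() is available in PhantomJS and in browsers; the Buffer branch is an
  // assumption-labelled fallback for environments that lack btoa().
  var encoded = (typeof btoa === 'function')
    ? btoa(credentials)
    : Buffer.from(credentials, 'binary').toString('base64');
  return 'Basic ' + encoded;
}

console.log(basicAuthHeader('username', 'password'));
// prints: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

In a PhantomJS script it would be used as page.customHeaders={'Authorization': basicAuthHeader('username','password')};. Note that btoa() only accepts Latin-1 characters, so credentials outside that range would need extra encoding.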
For the time-out code I still call the same callback, but pass a status of "timedout" instead of "success" (so the callback could react differently if timing out was a bad thing; here I treat them the same). So, if the URL finishes loading within 5000ms, callback is called (by the page.open() call) with status equal to "success". If it has not finished within 5000ms, callback is called (by the JavaScript timer) with status equal to "timedout".
I explicitly clear the timer immediately on entering callback(). This is not really necessary, as we're about to shut down (the phantom.exit() call) anyway. But it feels safer, because otherwise callback() might be called twice (e.g. if the page loaded in exactly 5000ms); the more computation being done in callback(), especially if it is asynchronous, the more likely this becomes. (To be precise: clearing the timer catches the case where the page loads in just under 5000ms and triggers the callback before the timer does. But if the timer gets in first, the page then loads in just over 5000ms, and the callback computation takes a while, we may still get two calls. I think calling page.close() in callback() might prevent this, but that is untested.)
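One way to rule out the double call altogether is a simple "already finished" flag, so that whichever of the page load or the timer arrives second is ignored. This is only a sketch of that idea, not something taken from the PhantomJS docs: the helper name openWithTimeout and its start parameter are hypothetical, and it is a different approach from the page.close() idea mentioned above.

```javascript
// Hypothetical helper: run start(done) and guarantee that callback fires
// exactly once, either with the status passed to done() or with 'timedout'
// after ms milliseconds, whichever happens first.
function openWithTimeout(start, ms, callback) {
  var finished = false;
  var timer = null;
  function finish(status) {
    if (finished) return;              // ignore the loser of the race
    finished = true;
    if (timer !== null) clearTimeout(timer);
    callback(status);
  }
  timer = setTimeout(finish, ms, 'timedout');
  start(finish);
}

// Stand-in for page.open(): reports success twice to simulate a double call.
var statuses = [];
openWithTimeout(function (done) {
  done('success');
  done('success');
}, 5000, function (status) {
  statuses.push(status);
});
console.log(statuses.length); // prints: 1
```

In a PhantomJS script the start function would wrap the real call, e.g. openWithTimeout(function(done){ page.open(url,done); }, 5000, callback);.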
Finally, here is the same code using POST instead of GET:
var page=require('webpage').create();
page.customHeaders={'Authorization': 'Basic '+btoa('username:password')};
var callback=function(status){
  if(timer)window.clearTimeout(timer);
  if (status=='success' || status=='timedout') {
    console.log(page.plainText);
  }else{
    console.log('Failed to load.');
  }
  phantom.exit();
};
var timer=window.setTimeout(callback,5000,'timedout');
var url="http://example.com/something";
var data="name=value";
page.open(url,'post',data,callback);

The differences from the GET version are the url (no query string), the new data variable, and the extra 'post' and data arguments to page.open(). It couldn't be easier!
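If the POST data has more than one field, or values containing characters such as spaces or ampersands, the data string needs to be URL-encoded. Here is a small sketch of how it could be built from an object; formEncode is a hypothetical helper name of my own, and I am assuming the server accepts %20 for spaces (strict form encoding uses + instead).

```javascript
// Hypothetical helper: turn {name: 'value', ...} into "name=value&...".
function formEncode(params) {
  return Object.keys(params).map(function (key) {
    // Encode both key and value so '&', '=' and spaces cannot break the body.
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  }).join('&');
}

console.log(formEncode({name: 'value'})); // prints: name=value
console.log(formEncode({q: 'a b&c'}));    // prints: q=a%20b%26c
```

The result can then be passed as the data argument, e.g. var data=formEncode({name:'value'});.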