Monday, April 8, 2013

PhantomJS: POST, auth and timeouts

I recently discovered PhantomJS, which (for me) is an alternative to Selenium, with two key differences that (again, for me) make it very useful:
  1. It is headless, meaning you don't see anything graphical. That means I can run it on a server, from the commandline, without needing X installed. It also means it causes less load.
  2. It embeds webkit, rather than attempting to interface with many browsers, and control them as a user would.
The second point allows me to POST to a URL, which is great for testing how web services work in a real browser. Selenium refuses to offer this because it is not something a user can do with a browser. (The workaround in Selenium was to make a temporary page that uses an AJAX call to POST to the URL, then does something with what is returned.)

There are two things that PhantomJS makes difficult, which I will show techniques for here. The first is that authorization is kind-of-broken. The second is timeouts for requests that never finish (e.g. an http streaming web service). But, first, the basic example, without auth or timeouts, and using GET:
var page=require('webpage').create();
var callback=function(status){
    if (status=='success'){
        console.log(page.plainText);
        }else{
        console.log('Failed to load.');
        }
    phantom.exit();
    };
var url="http://example.com/something?name=value";
page.open(url,callback);
Now here is the same code with basic auth (shown in orange), and a five second time-out (shown in red):
var page=require('webpage').create();
page.customHeaders={'Authorization': 'Basic '+btoa('username:password')};
var callback=function(status){
    if(timer)window.clearTimeout(timer);
    if (status=='success' || status=='timedout') {
        console.log(page.plainText);
        }else{
        console.log('Failed to load.');
        }
    phantom.exit();
    };
var timer=window.setTimeout(callback,5000,'timedout');
var url="http://example.com/something?name=value";
page.open(url,callback);
Don't use page.settings.userName = 'username';page.settings.password = 'password'; because it has a bug as of PhantomJS 1.9.0 (it uses two connections for GET requests and doesn't work at all for POST requests). Instead make your own basic auth header as shown here (thanks to Igor Semenko, on the PhantomJS mailing list for this trick).

For the time-out code I still call the same callback, but pass a status of "timedout" instead of "success" (so the callback could react differently, if timedout was a bad thing - here I treat them the same). So, if the URL finishes loading within 5000ms, then callback is called (by the page.open() call) with status equal to "success". If it has not finished within 5000ms then callback is called (by the javascript timer), with status equal to "timedout".

I explicitly clear the timer immediately when entering callback(). This is not really necessary, as we're about to shutdown (the phantom.exit() call) anyway. But it feels safer because otherwise callback() might be called twice (i.e. if the page loaded in exactly 5000ms); the more computation being done in callback(), especially if asynchronous, the more this might occur. (Well to be precise: that catches the case when page loads in just under 5000ms and triggers the callback before the timer does. But, if the timer gets in first, and then the page loads in just over 5000ms, and callback computation takes a while, then we may still get two calls. I think calling page.close() in callback() might prevent this, but that is untested.)

Finally, here is the same code using POST instead of GET:
var page=require('webpage').create();
page.customHeaders={'Authorization': 'Basic '+btoa('username:password')};
var callback=function(status){
    if(timer)window.clearTimeout(timer);
    if (status=='success' || status=='timedout') {
        console.log(page.plainText);
        }else{
        console.log('Failed to load.');
        }
    phantom.exit();
    };
var timer=window.setTimeout(callback,5000,'timedout');
var url="http://example.com/something";
var data="name=value";
page.open(url,'post',data,callback);
The differences are shown in red. It couldn't be easier!

4 comments:

Anonymous said...

Thanks! Was having this exact problem, only difference was I was encountering it with resources requiring NTLM authentication. Workaround seems to be to use a proxy to do the auth for us sadly.

AJB said...

Hello is there a way to translate your code into java?

Darren Cook said...

AJB: PhantomJS uses JavaScript, so a translation to Java makes no sense. If you meant doing scripted control of a browser, the Selenium project is written in Java. (Though as I mention at the top of this blog post, I find PhantomJS/SlimerJS/CasperJS more useful.)

Anil Katakam said...

How to do ADFS Authentication like this?