Showing posts with label selenium. Show all posts

Monday, April 8, 2013

PhantomJS: POST, auth and timeouts

I recently discovered PhantomJS, which (for me) is an alternative to Selenium, with two key differences that (again, for me) make it very useful:
  1. It is headless, meaning you don't see anything graphical. That means I can run it on a server, from the command line, without needing X installed. It also means it causes less load.
  2. It embeds WebKit, rather than attempting to interface with many browsers and control them as a user would.
The second point allows me to POST to a URL, which is great for testing how web services work in a real browser. Selenium refuses to offer this because it is not something a user can do with a browser. (The workaround in Selenium was to make a temporary page that uses an AJAX call to POST to the URL, then does something with what is returned.)

There are two things that PhantomJS makes difficult, which I will show techniques for here. The first is that authorization is kind-of-broken. The second is timeouts for requests that never finish (e.g. an HTTP streaming web service). But, first, the basic example, without auth or timeouts, and using GET:
var page=require('webpage').create();
var callback=function(status){
    if (status=='success'){
        console.log(page.plainText);
        }else{
        console.log('Failed to load.');
        }
    phantom.exit();
    };
var url="http://example.com/something?name=value";
page.open(url,callback);
Now here is the same code with basic auth (the page.customHeaders line), and a five second time-out (the timer variable, the clearTimeout() call, and the extra 'timedout' status check):
var page=require('webpage').create();
page.customHeaders={'Authorization': 'Basic '+btoa('username:password')};
var callback=function(status){
    if(timer)window.clearTimeout(timer);
    if (status=='success' || status=='timedout') {
        console.log(page.plainText);
        }else{
        console.log('Failed to load.');
        }
    phantom.exit();
    };
var timer=window.setTimeout(callback,5000,'timedout');
var url="http://example.com/something?name=value";
page.open(url,callback);
Don't use page.settings.userName = 'username';page.settings.password = 'password'; because it has a bug as of PhantomJS 1.9.0 (it uses two connections for GET requests and doesn't work at all for POST requests). Instead make your own basic auth header as shown here (thanks to Igor Semenko, on the PhantomJS mailing list, for this trick).
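As a small sketch of what that header line is doing: basic auth is just the string "username:password", base64-encoded, with "Basic " in front. The helper name basicAuthHeader is made up for illustration, and the Buffer fallback is only there so the snippet can also be tried under Node.js versions that lack btoa().

```javascript
// Hypothetical helper (not part of PhantomJS): builds the value for an
// Authorization header using HTTP basic auth.
function basicAuthHeader(username, password) {
    var raw = username + ':' + password;
    // btoa() is what the script above uses; fall back to Buffer under Node.js.
    var encoded = (typeof btoa === 'function')
        ? btoa(raw)
        : Buffer.from(raw).toString('base64');
    return 'Basic ' + encoded;
}

console.log(basicAuthHeader('username', 'password'));
// prints: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

In the PhantomJS script it would be used as page.customHeaders={'Authorization': basicAuthHeader('username','password')};. Note that btoa() only accepts Latin-1 characters, so credentials containing other characters would need extra handling.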

For the time-out code I still call the same callback, but pass a status of "timedout" instead of "success" (so the callback could react differently, if timedout was a bad thing - here I treat them the same). So, if the URL finishes loading within 5000ms, then callback is called (by the page.open() call) with status equal to "success". If it has not finished within 5000ms then callback is called (by the javascript timer), with status equal to "timedout".

I explicitly clear the timer immediately on entering callback(). This is not really necessary, as we're about to shut down (the phantom.exit() call) anyway. But it feels safer, because otherwise callback() might be called twice (e.g. if the page loaded in almost exactly 5000ms); the more computation being done in callback(), especially if asynchronous, the more likely this is. (To be precise: clearing the timer only covers the case where the page loads in just under 5000ms and triggers the callback before the timer does. If the timer gets in first, then the page finishes loading in just over 5000ms, and the callback computation takes a while, we may still get two calls. I think calling page.close() in callback() might prevent this, but that is untested.)
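One way to rule out the double call, as a sketch: wrap the callback so that only the first caller (page load or timer) gets through. The once() helper and the simulated calls below are made up for illustration; nothing here is PhantomJS API, and it has not been tested inside PhantomJS itself.

```javascript
// Hypothetical helper: returns a wrapper that runs fn at most once;
// any later calls are silently ignored.
function once(fn) {
    var called = false;
    return function () {
        if (called) return;
        called = true;
        return fn.apply(this, arguments);
    };
}

// Stand-in for the real callback, which would print page.plainText
// and then call phantom.exit().
var statuses = [];
var callback = once(function (status) {
    statuses.push(status);
});

// Simulate the race: the timer fires first, then the page load finishes.
callback('timedout');
callback('success');

console.log(statuses); // only the first status, 'timedout', was recorded
```

In the real script the same wrapped callback would be handed to both window.setTimeout() and page.open(), so whichever fires second does nothing.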

Finally, here is the same code using POST instead of GET:
var page=require('webpage').create();
page.customHeaders={'Authorization': 'Basic '+btoa('username:password')};
var callback=function(status){
    if(timer)window.clearTimeout(timer);
    if (status=='success' || status=='timedout') {
        console.log(page.plainText);
        }else{
        console.log('Failed to load.');
        }
    phantom.exit();
    };
var timer=window.setTimeout(callback,5000,'timedout');
var url="http://example.com/something";
var data="name=value";
page.open(url,'post',data,callback);
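The differences are in the last three lines: the query string moves out of the URL and into data, and page.open() takes 'post' and data as extra arguments. It couldn't be easier!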
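For anything beyond a single name=value pair, the POST body needs URL-encoding. Here is a minimal sketch, assuming the service expects the usual application/x-www-form-urlencoded format; the formEncode name is hypothetical.

```javascript
// Hypothetical helper: turns {name: 'value', ...} into "name=value&..."
// with each key and value percent-encoded.
function formEncode(params) {
    var pairs = [];
    for (var key in params) {
        if (Object.prototype.hasOwnProperty.call(params, key)) {
            pairs.push(encodeURIComponent(key) + '=' +
                       encodeURIComponent(params[key]));
        }
    }
    return pairs.join('&');
}

var data = formEncode({name: 'value', q: 'a b&c'});
console.log(data); // name=value&q=a%20b%26c
```

The resulting string can be passed as the data argument to page.open(). encodeURIComponent() writes a space as %20 rather than +, which servers normally accept, but that is worth checking against the service being tested.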

Monday, May 28, 2012

php-webdriver bindings for selenium: how to add time-outs

Not all webpages finish loading. In particular I've a page that keeps streaming data back to the client, and never finishes. (For instance it might be used from an AJAX call.) I want to test this from Selenium, but have been hitting problems. The main problem is that Selenium's get() function, which is used to fetch a fresh URL, does not return until the page has finished loading [1]. In my case that meant never, and so my test script locked up!

However, all is not lost; you can specify a page load timeout. It is hidden in the protocol docs, but I've added it to the php-webdriver-bindings library I use (v0.9). See the three functions below [2]; just paste them into the bottom of WebDriver.php.

I also needed one bug fix in WebDriver.php's public function get($url). It currently ends with:
    $response=curl_exec($session);

Just after that line you should add this:
    return $this->extractValueFromJsonResponse($response);


The time-out, and that bug fix, can be used like this:

require_once "/usr/local/src/selenium/php-webdriver-bindings-0.9.0/phpwebdriver/WebDriver.php";
$webdriver = new WebDriver("localhost", "4444");
$webdriver->connect("firefox");
$webdriver->setPageLoadTimeout(2000);   //2 seconds
$url="http://example.com/forever.php"; //A page that never finishes loading
$obj=$webdriver->get($url);
if($obj===null){
    $current_url=$webdriver->getCurrentUrl();
    if(!$current_url){
        //Selenium-server not running
        }
    else{
        //It worked! (it completed loading in under two seconds)
        }
    }
elseif($obj->class=='org.openqa.selenium.TimeoutException'){
    //It timed out
    }
elseif($obj->class=='org.openqa.selenium.remote.UnreachableBrowserException'){
    //Browser was closed (or selenium-server was shut down)
    }
else{
    echo "FAILED:";print_r($obj);
    }

This is useful stuff. There is still one problem left for me: I wanted to load two seconds' worth of data and then look at it. But I cannot. The browser refuses to listen to Selenium while it is loading a page! So though get() returned control to my script after two seconds, I cannot do anything with that control (except close the browser window), because the URL is still actually loading. And it will do that forever!! (I've played with an interesting alternative approach, which also fails, but suggests that a solution is possible. But that is out of the scope of this post, which is to show how to add the time limit functions to php-webdriver-bindings.)

[1]: This is browser-specific behaviour, not part of Selenium's design. Firefox and Chrome, at least, behave this way.


[2]: Consider this code released, with no warranty, under the MIT license, and permission granted to use in the php-webdriver-bindings project with no attribution required.

    /**
     * Set wait for a page to load.
     *
     * This timeout is for the get() function. (Firefox and Chrome, at least, won't return from get()
     * until a page is fully loaded.  If remote server is streaming content, they would never return
     * without this time-out.)
     *
     * @param Number $timeout Number of milliseconds to wait.
     * @author Darren Cook, 2012
     * @internal http://code.google.com/p/selenium/wiki/JsonWireProtocol#/session/:sessionId/timeouts
     */
    public function setPageLoadTimeout($timeout) {
        $request = $this->requestURL . "/timeouts";       
        $session = $this->curlInit($request);
        $args = array('type'=>'page load', 'ms' => $timeout);
        $jsonData = json_encode($args);
        $this->preparePOST($session, $jsonData);
        curl_exec($session);       
    }

    /**
     * Set wait for a script to finish.
     *
     * @param Number $timeout Number of milliseconds to wait.
     * @author Darren Cook, 2012
     * @internal http://code.google.com/p/selenium/wiki/JsonWireProtocol#/session/:sessionId/timeouts
     */
    public function setAsyncScriptTimeout($timeout) {
        $request = $this->requestURL . "/timeouts";       
        $session = $this->curlInit($request);
        $args = array('type'=>'script', 'ms' => $timeout);
        $jsonData = json_encode($args);
        $this->preparePOST($session, $jsonData);
        curl_exec($session);       
    }

    /**
     * Set implicit wait.
     *
     * This is for waiting for page elements to appear. Not useful for scripts or
     * waiting for the initial get() call to time out.
     *
     * @param Number $timeout Number of milliseconds to wait.
     * @author Darren Cook, 2012
     * @internal http://code.google.com/p/selenium/wiki/JsonWireProtocol#/session/:sessionId/timeouts
     */
    public function setImplicitWaitTimeout($timeout) {
        $request = $this->requestURL . "/timeouts";       
        $session = $this->curlInit($request);
        $args = array('type'=>'implicit', 'ms' => $timeout);
        $jsonData = json_encode($args);
        $this->preparePOST($session, $jsonData);
        curl_exec($session);       
    }