Showing posts with label SlimerJS. Show all posts

Thursday, November 7, 2013

Saving downloaded files in SlimerJS (and Casper and Phantom)

It seems a common request is to be able to see not just the HTML of the main page that PhantomJS/SlimerJS are downloading, but also all the other files (images, CSS, JavaScript, fonts, etc.) that are being fetched. You can use onResourceReceived to see them being fetched, but not their body.

The situation with PhantomJS is a bit confusing: I believe there is a patch to allow this, but it hasn't been applied yet. There is also a download API being proposed (or possibly already implemented), but that appears to be for the special case of files that have a Content-Disposition: attachment header. (?)

In SlimerJS it is possible to use response.body inside the onResourceReceived handler. However, to avoid using too much memory, the body is empty by default. You first have to assign an array of regexes to page.captureContent to say which files you want to receive; each regex is applied to the mime-type. In the example code below I use /.*/ to mean "get everything". Using [/^image\/.+$/] should get just images, etc. (note that the slash inside the regex has to be escaped).
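
To try out patterns before handing them to page.captureContent, here is a small sketch that runs in any JavaScript engine. The helper name wouldCapture is hypothetical and is not part of SlimerJS; it only mimics the rule described above, on the assumption that a resource is captured when at least one regex in the array matches its mime-type.

```javascript
// Hypothetical helper, not a SlimerJS API: returns true if any regex
// in the list matches the given mime-type.
function wouldCapture(patterns, mimeType) {
    for (var i = 0; i < patterns.length; i++) {
        if (patterns[i].test(mimeType)) return true;
    }
    return false;
}

var everything  = [ /.*/ ];
var imagesOnly  = [ /^image\/.+$/ ];         // the slash must be escaped
var textAndJson = [ /^text\//, /\/json$/ ];  // several patterns can be combined

console.log(wouldCapture(everything,  "font/woff"));         // true
console.log(wouldCapture(imagesOnly,  "image/png"));         // true
console.log(wouldCapture(imagesOnly,  "text/css"));          // false
console.log(wouldCapture(textAndJson, "application/json"));  // true
```

Narrow patterns are worth the effort on image-heavy pages, since every captured body is held in memory until the handler has seen it.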

The code sample below will download and save all files. It is complete; you just have to edit the url at the top.

var url="http://...";

var fs=require('fs');
var page = require('webpage').create();

fs.makeTree('contents');

page.captureContent = [ /.*/ ];

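page.onResourceReceived = function(response) {
    //console.log('Response (#' + response.id + ', stage "' + response.stage + '"): ' + JSON.stringify(response));
    // Only act once the whole body has arrived, and skip empty responses.
    if(response.stage!="end" || !response.bodySize)return;

    // Use the last path segment as the file name. URLs that end in "/"
    // have none, so fall back to "index.html" instead of crashing on null.
    var matches = response.url.match(/[/]([^/]+)$/);
    var fname = "contents/"+(matches ? matches[1] : "index.html");

    console.log("Saving "+response.bodySize+" bytes to "+fname);
    fs.write(fname,response.body);
};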

page.onResourceRequested = function(requestData, networkRequest) {
//console.log('Request (#' + requestData.id + '): ' + JSON.stringify(requestData));
};

page.open(url,function(){
    phantom.exit();
    });


It is verbose in that it says what it is saving. If you want it much more verbose, to see what other information is passing back and forth, there are two logging lines commented out.
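
The script takes the file name straight from the end of the URL, which is fragile: query strings end up in the name, and two files with the same name in different directories overwrite each other. Below is a sketch of a more defensive naming function. The name fnameFromUrl, the "index" fallback and the idea of prefixing the response id are all my own assumptions, not anything SlimerJS provides.

```javascript
// Hypothetical helper: derive a local file name from a resource URL.
// "id" is expected to be response.id, used here only to keep names unique.
function fnameFromUrl(url, id) {
    // Drop the query string and fragment; they make awkward file names.
    var path = url.replace(/[?#].*$/, "");
    // Take the last path segment, if there is one.
    var matches = path.match(/\/([^\/]+)$/);
    // URLs such as "http://example.com/" have no final segment.
    var name = matches ? matches[1] : "index";
    return "contents/" + id + "_" + name;
}

console.log(fnameFromUrl("http://example.com/css/site.css?v=3", 7)); // contents/7_site.css
console.log(fnameFromUrl("http://example.com/", 1));                 // contents/1_index
```

Inside the handler you would then write var fname = fnameFromUrl(response.url, response.id); in place of the two lines that build fname.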

WARNING: this works in SlimerJS 0.9 (and should work in 0.8.x), but the API may change in future (to keep in sync with PhantomJS).


Tuesday, October 15, 2013

SlimerJS: getting it to work with self-signed HTTPS

SlimerJS (as of 0.8.3) lacks the PhantomJS commandline options for saying "relax about bad certificates". Unfortunately the self-signed SSL certificate that developers typically use during development counts as a bad certificate.

Here are the steps needed to handle this:

1. slimerjs --createprofile AllowSSL
  Make a note of the directory it has created.
  (You can call your new profile anything, "AllowSSL" is just for example.)

2. Go to normal desktop Firefox, browse to the URL in question, see the complaint, add it as a security exception.
  Chances are, if you have been testing your website already, that you've already done this and you can skip this step.

3. Go to your Firefox profile directory and look for the file called "cert_override.txt". Copy it to the directory you created in step 1.

4. Have a look at the copy you just made of "cert_override.txt".
  If it only has the entry you added in step 2, you are done.
  Otherwise, remove the entries you don't want.
  (The file format is easy: one certificate per line.)

5. Now when you need to run slimerjs you must run it with the "-P AllowSSL" commandline parameter (using whatever profile name you chose in step 1).
  E.g. "slimerjs -P AllowSSL httpstest.js"

If you are using SlimerJS with CasperJS (requires CasperJS 1.1 or later), do the same, e.g.
   casperjs test --engine=slimerjs -P AllowSSL tests_involving_https.js
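
If your "cert_override.txt" has a lot of entries, the pruning in step 4 can be scripted. The sketch below is only that: keepOverrides is a hypothetical helper, and it assumes that each entry line begins with host:port followed by a tab and that lines starting with "#" are comments. Check your own file before relying on it. The sample data is a made-up placeholder, not a real certificate entry.

```javascript
// Hypothetical helper: keep comment lines plus the entries whose
// "host:port" prefix is in the given list. Assumes tab-separated fields.
function keepOverrides(text, hosts) {
    var lines = text.split(/\r?\n/);
    var kept = [];
    for (var i = 0; i < lines.length; i++) {
        var line = lines[i];
        if (line === "") continue;                                   // skip blank lines
        if (line.charAt(0) === "#") { kept.push(line); continue; }   // keep comments
        var hostPort = line.split("\t")[0];
        if (hosts.indexOf(hostPort) !== -1) kept.push(line);
    }
    return kept.join("\n") + "\n";
}

// Placeholder content, for illustration only.
var sample = "# header\n" +
             "localhost:443\tOID\tAA:BB\tU\tdata\n" +
             "example.org:443\tOID\tCC:DD\tU\tdata\n";

console.log(keepOverrides(sample, ["localhost:443"]));
```

You would read the copied file, pass its text through the function, and write the result back into the profile directory created in step 1.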