Showing posts with label graphics. Show all posts

Thursday, October 13, 2016

Applying R's imager library to MNIST digits

Introduction

For machine learning, when it comes to training data, the more the better! Well, there are a whole bunch of disclaimers on that statement, but as long as the data is representative of the type of data the machine learning model will be used on in production, doubling the amount of unique training data will be more useful than doubling the number of epochs (if deep learning) or trees (if random forest, GBM, etc.).
I’m going to concentrate in this article on the MNIST data set, and look at how to use R’s imager library to increase the number of training samples. I take a deep look at MNIST in my new book, Practical Machine Learning with H2O, which (not surprisingly) I highly recommend. But, briefly, the MNIST data is handwritten digits, and the challenge is to identify which of the 10 digits, 0 to 9, each one is.
There are 60,000 training samples, plus 10,000 test samples (and everyone uses the same test set - so watch out for the inadvertent over-fitting in published papers). The samples are 28x28 pixels, each a 0 to 255 greyscale value. I usually split the 60,000 samples into 50K for training, 10K for validation (to make sure I am not over-fitting on the test data).
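As a rough illustration of that 50K/10K split, here is a minimal base R sketch. It only builds index vectors. The random split, the seed, and the names n_total, valid_idx and train_idx are my assumptions for the example; simply taking the last 10,000 rows would be another reasonable choice.

```r
# Sketch only: split 60,000 row indices into 50,000 training and 10,000 validation.
# The data itself is assumed to be loaded elsewhere; only the indices are built here.
set.seed(123)                                      # arbitrary seed, for reproducibility
n_total <- 60000
valid_idx <- sort(sample(n_total, 10000))          # random 10K held out for validation
train_idx <- setdiff(seq_len(n_total), valid_idx)  # the remaining 50K

length(train_idx)   # 50000
length(valid_idx)   # 10000
```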

imager and parallel

I used this R library for dealing with images: http://dahtah.github.io/imager/ which is based on a C++ library called CImg. Most function calls are just wrappers around the C++ code, which means they are fairly quick. It is well-documented with a good starter tutorial.
I used version 0.20 for all my development. I have just seen that 0.30 is now out, and in particular offers native parallelization. This is great! Out of scope for this article, but I used imager in conjunction with R’s parallel functions, and found the latter quite clunky, with time spent copying data structures limiting scalability. On the other hand, the docs say these new parallel options work best on large images, and the 28x28 MNIST images are certainly not that. So maybe I am still stuck using parApply() and friends.

The Approach

In the 20,000 unseen samples (10K valid, 10K test), there are often examples of bad handwriting that we don’t get to see in our 50,000 training samples. Therefore I am most interested in generating bad handwriting samples, not nice neat ones.
I spent a lot of time experimenting with what imager can produce, and settled on generating these effects, each with a random element:
  • rotate
  • warp (make it “scruffier”)
  • shift (move it 1 pixel up, down, left or right)
  • bold (make it fatter - my code)
  • dilate (make it fatter - cimg code)
  • erode (make it thinner)
  • erodedilate (one or the other)
  • scratches (add lines)
  • blotches (remove blobs)
I also defined “all” and “all2” which combined most of them.
In the full code I create the image in an imager (cimg) object called im, then copy it to im2. Each subsequent operation is performed on im2. im is left unchanged, but can be referred to for the initial state.

Rotate

The code to rotate comes in two parts. Here is the first part:
needSharpen = FALSE
angle = rnorm(1, 0, 8)
if(angle < -1.0 || angle > 1.0){
  im2 = imrotate(im2, angle, interpolation = 2)
  nPix = (width(im2) - width(im)) / 2
  im2 = crop.borders(im2 , nPix = nPix)
  needSharpen = TRUE
  }
The use of rnorm(sd=8) means the angle will be within +/-8° about 68% of the time, and beyond +/-16° only about 5% of the time. If my goal was simply more training samples, I’d perhaps have used a smaller sd, and/or clipped to a maximum rotation of 10°. But, as mentioned earlier, I wanted more scruffy handwriting. The if() block is a CPU optimization: if the rotation is less than 1°, don’t bother doing anything.
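Those percentages can be checked with base R’s pnorm(). This is just the arithmetic for a normal distribution with mean 0 and sd 8; the variable names are mine.

```r
# Probabilities for angle ~ Normal(mean = 0, sd = 8), using base R's pnorm().
sd_angle <- 8
p_within_8  <- pnorm(8, 0, sd_angle) - pnorm(-8, 0, sd_angle)   # within +/-8 degrees
p_beyond_16 <- 2 * pnorm(-16, 0, sd_angle)                      # beyond +/-16 degrees
p_skipped   <- pnorm(1, 0, sd_angle) - pnorm(-1, 0, sd_angle)   # within +/-1 degree: if() block skipped

round(c(within_8 = p_within_8, beyond_16 = p_beyond_16, skipped = p_skipped), 3)
# within_8 beyond_16   skipped
#    0.683     0.046     0.099
```

So, with this sd, roughly one image in ten falls inside the +/-1° band and skips the rotation entirely.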
The imrotate() command takes the current im2 and replaces it with one that is rotated. This creates a larger image. To see what is going on, try running this (self-contained) script (see the inline comments for what is happening):
library(imager)

# Make 28x28 "mid-grey" square
im <- as.cimg(rep(128, 28*28), x = 28, y = 28)

#Prepare to plot side-by-side
par(mfrow = c(1,2))

#Show initial square 28x28
plot(im)

#Show rotated square, 34x34
plot(imrotate(im, angle = 16))
The output is like this:
You can see the image has become larger, to contain the rotated version. (That image also shows how imager’s plot command will scale the colours based on the range, and that my choice of 128 could have been any non-zero number. When there is only a single value (128), it chooses a grey. After rotating we have 0 for the background, 128 for the square, so it does 0 as black, 128 as white.)
For rotating MNIST digits:
  • I want to keep the 28x28 size
  • All the interesting content is in the middle, so clipping is fine.
So I call crop.borders(), which takes an argument nPix saying how many pixels to remove on each side. If it has grown from 28 to 34 pixels square, nPix will be 3.
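The nPix arithmetic, spelled out in base R. The helper name border_to_trim is hypothetical; the widths are the ones from the example above.

```r
# Arithmetic behind the crop: how many pixels to trim from each side.
# Widths are taken from the example in the text (28 original, 34 after rotating).
border_to_trim <- function(rotated_width, original_width) {
  (rotated_width - original_width) / 2
}

n_pix <- border_to_trim(34, 28)
n_pix            # 3
34 - 2 * n_pix   # 28, back to the original size
```

Note that an odd difference gives a fractional result (a 33-pixel image would give 2.5); I have not checked how crop.borders() treats a non-integer nPix.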

Feeling A Bit Vague…

Here is what one of the MNIST digits looks like rotated 30° at a time, 11 times.




In a perfect world, 12 rotations would give you exactly the image you started with. But you can see the effect of each rotation is to blur it slightly. If we did another lap, even your clever mammalian brain would no longer recognize it as a 4.
The author of the imager library provided a match.hist() function (see it, and the surrounding discussion, here: https://github.com/dahtah/imager/issues/17 ) which does a good (but not perfect) job. Here are the histograms of the image before rotation, after rotation, and then after match.hist:




You can judge the success from looking at the image on the right, or by seeing how the bars on the rightmost histogram match those of the leftmost one. (Yes, even though the bumps are very small, their height, and where they are, really matter!)

You better sharpen up…

You will have noticed the earlier rotate code set needSharpen to TRUE. That is used by the following code. Some of the time it uses the imager library’s imsharpen(), some of the time match.hist(), and some of the time a hack I wrote to make dim pixels dimmer, and bright pixels brighter.
if(needSharpen){
  if(runif(1) < 0.8){
    im2 <- imsharpen(im2, amplitude = 55)
    }
  if(runif(1) < 0.3){
    im2 <- ifelse(im2 < 128, im2-16, im2)
    im2 <- ifelse(im2 < 0, 0, im2)
    im2 <- ifelse(im2 > 200, im2+8, im2)
    im2 <- ifelse(im2 > 150, im2+8, im2)
    im2 <- ifelse(im2 > 100, im2+8, im2)
    im2 <- ifelse(im2 > 255, 255, im2)
  }else{
    im2 <- match.hist(im2, im)
    }
  }
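To see what the dim-dimmer/bright-brighter hack does to individual pixel values, here is the same ifelse() cascade applied to a plain numeric vector rather than a cimg object. The function name stretch_contrast and the sample values are mine, purely for illustration.

```r
# The same ifelse() cascade as above, wrapped in a function (name is mine)
# and applied to a plain numeric vector instead of a cimg object.
stretch_contrast <- function(v) {
  v <- ifelse(v < 128, v - 16, v)   # dim pixels get dimmer
  v <- ifelse(v < 0, 0, v)          # clamp at black
  v <- ifelse(v > 200, v + 8, v)    # brightest pixels: +8 here...
  v <- ifelse(v > 150, v + 8, v)    # ...and again here...
  v <- ifelse(v > 100, v + 8, v)    # ...and again here
  v <- ifelse(v > 255, 255, v)      # clamp at white
  v
}

stretch_contrast(c(10, 100, 120, 130, 160, 210, 250))
# 0 84 112 138 176 234 255
```

Because the three "+8" steps run one after another, a pixel above 200 gains 24 in total, one between 151 and 200 gains 16, and so on. A value such as 120 first drops by 16 and then, still being above 100, gets 8 back.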

The Others

The other image modifications, listed earlier, use imwarp(), imshift(), pmax() with imshift() (for a bold effect), dilate_square(), and erode_square(). The blotches and scratches were done by putting random noise on an image, then using pmax() or pmin() to combine them.
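As a taste of the pmax() idea behind the bold effect, here is a stand-in on a plain base R matrix. This is not the imager code from my script: shift_right() is a hypothetical helper standing in for imshift(), and the 5x5 image is made up for the example.

```r
# Stand-in for the "bold" idea on a plain matrix (rows = y, columns = x).
# shift_right() is a hypothetical helper, not an imager function.
shift_right <- function(m, by = 1) {
  pad <- matrix(0, nrow = nrow(m), ncol = by)           # black columns entering on the left
  cbind(pad, m[, seq_len(ncol(m) - by), drop = FALSE])  # drop the columns pushed off the right
}

img <- matrix(0, nrow = 5, ncol = 5)
img[, 3] <- 255                       # a one-pixel-wide vertical stroke

bold <- pmax(img, shift_right(img))   # keep the brighter pixel at each position
bold[1, ]
# 0 0 255 255 0
```

The stroke is now two pixels wide. Scratches and blotches follow the same pattern, with a noise image in place of the shifted copy, and pmin() instead of pmax() when pixels are to be darkened rather than brightened.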
If there is interest I can write another article going into the details.

Timings

On a 2.8GHz single-core, I recorded these timings to process 60,000 28x28 images. (It was a 36-core 8xlarge EC2 machine, but my R script, and imager (at the time), only used one thread.)
  • 304s to run “bold”.
  • 296s to run “shift”
  • 417s for warp
  • 447s for rotate
  • 517s to 539s for all and all2
warp and rotate need to do the sharpen step, which is why they take longer.

Summary

I made 20 files, so over 95% of my training data was generated. As you will discover if you read the book, this generated data gave a very useful boost in model strength, though of course dramatically increased learning-time due to having 1.2 million training rows instead of 50,000. An interesting property was that the model found the generated data harder to learn: I got lower error rates on the unseen valid and test data sets than on the seen training data. This is a consequence of my deliberate decision to bias towards noisy data and scruffy handwriting.
Generating additional training data is a good way to prevent over-fitting, but generating data that is still representative is always a challenge. Normally you’d check mean, sd, the distribution, etc. to see how you’ve done, and usually your only technique is to jitter - add a bit of random noise. With images you have a rich set of geometric transformations to choose from, and you often use your eyes to see how it has done. Though, as we saw from the image histograms, there are some objective techniques available too.

Wednesday, March 9, 2016

Comparison Of Three WebGL Libraries

For many people, WebGL is a technology for making browser-based games, but I am more interested in all the other uses: data visualization, data presentation, making web sites look fantastic, new and interesting user experience (UX), etc. (I have spent many years using Flash for similar things.)

What is WebGL?

WebGL is an API to allow browsers to use a GPU to speed up 2D and 3D graphics; you write in a mix of JavaScript and a shader language. Because it is low-level and complex I recommend against writing in raw WebGL; use a library instead.

It is supported on just about any popular OS/browser combination, including working on tablets and mobile phones. Your device does not need to have a dedicated GPU to run WebGL.

What libraries are there?

There are actually quite a few choices, but for this article I will focus on the three libraries I have made (non-trivial) WebGL applications with:
  • Three.JS
  • Babylon.JS
  • Superpowers
The first two are fairly low-level (Babylon.JS has a few more abstractions built-in), meaning you will be thinking in terms of vertices, faces, 3D coordinates, cameras, lighting, etc. A 3D graphics background will be useful. Superpowers is higher-level, but more focused on games development. Some Blender (or equivalent) skills will also come in handy, whichever library you go for.

Three.js And Its Resources


Three.JS is the most established WebGL library, with some published books, many demos (http://threejs.org/, https://stemkoski.github.io/Three.js/ and others), even a Udacity course.

However it has scant regard for backwards compatibility, meaning that frequently the code in the published books (or the source code of older demos and tutorials) will not work with the latest library version. It has a relatively aggressive developer community, who think that having an uncommented demo of a feature counts as documentation.

It uses the MIT license (the most liberal open source - fine for commercial use), hosted on github; bug reports to github, but support questions to StackOverflow’s [three.js] tag.

Babylon.js And Its Resources

Babylon.JS is now two years old, and was developed at Microsoft in France, though it is open source (Apache license, so fine for commercial use). It is primarily intended for making games, but is flexible enough for other work.

Like Three.JS, it has plenty of demos, and again they are often undocumented. There is an active web forum; explanations and experiments there often link to the Babylon Playground, which is a live coding editor. There is also a very useful eight hour training video course (free), presented by the two Davids who created Babylon.JS. (There is a just-released book, https://www.packtpub.com/game-development/babylonjs-essentials, but I’ve not seen it, so cannot comment.)

Superpowers And Its Resources

Superpowers is a bit different: it is a gaming system, with its own IDE. It is very new, only released as open source (ISC license, which is basically the nice liberal MIT license again) in the middle of January 2016, though appears to have a year’s closed development behind it. (The IDE is cross-platform; it has been running nicely for me on Linux, I’ve not tried it on other platforms.)

Some of the initial batch of demos and games have been released on GitHub (kind of as open-source - the licenses are a bit vague, especially regarding re-use of assets), which has been my main source of learning. A few tutorials have also appeared recently (GameFromScratch.com, and https://itch.io/board/11494/tutorials-guides).

What grabbed my attention was the quality and completeness of the Fat Kevin game, combined with the fact that I could download all source and assets for it, to learn from. (The Discover Superpowers demo is similar, but simpler, so easier to learn from.)

Support is through forums on itch.io, with separate English and French sections. This requires yet another user account; I find it a shame they didn’t use StackOverflow, Github, or at least HTML5 Game Devs (as Babylon did). I’d not heard of itch.io (“an open marketplace for independent digital creators with a focus on independent video games”) before, but I think their choice tells you how they see Superpowers being used.

The coding language is TypeScript, basically JavaScript 1.8 plus types; it is worth specifying those types, as then the IDE’s helpful autocomplete can work. Note that Superpowers is closely tied to the IDE - you need to be clicking and dragging things; doing everything in code is not realistic (though this might just be the style of the initial few games). Superpowers is built on Three.JS, but I’m not seeing anything exposed, so I don’t think you can take a Three.JS example and use it directly.

Conclusion

Which library to choose? I suggest you try out the demos for each of these, and choose the library that has demos that cover all the things you want to do. If the choice comes down to Three.JS vs. Babylon.JS, and you cannot find a killer reason to choose one over the other, that is because it doesn’t really matter: each can do 95+% of what the other can. Follow your hunch, choose one or the other, and dive in and learn it.

Finally I should say that WebGL for website development is hard: your programmer(s) will need 3D experience, as will your graphic designer(s). If you are using RWD/mobile-first to target both mobile and desktop, it is even more complex. My company, QQ Trend Ltd., can help (contact me at dc [at] qqtrend.com).

Tuesday, June 22, 2010

fclib 0.4.20 release

I just put up another fclib release. Fclib is an ad hoc collection of PHP libraries, started about 10 years ago; the i18n-related functions are perhaps of most interest to people (with functions for Japanese, Chinese and Arabic language processing).

This new release has some minor bug fixes, a new utf8 function (for truncating a string), and a new file, modify_images.inc, which is a high-level interface to the gd functions for making thumbnails, cropping, resizing and basic drawing edits. It can operate on one image or a batch of images.

(The previous release was 11 months ago, described here)

Friday, June 11, 2010

jquery: dcookorg_annotator V0.3 released

I know, I'm behind in my jquery plugin announcements, so I'm going to do three in one.

First up is annotator, for annotating an image (or anything):
http://dcook.org/software/jquery/annotator/

You can have any number of annotations, can drag them anywhere, and can resize them. In the default mode each annotation has a text box appear underneath it for adding a comment.

New in version 0.3, and shown in the screenshot to the left, are some hooks for attaching a form to each annotation, so you can create a custom form for each one (or any other idea you have)!

MIT-license open-source, and tested in all of IE6, IE7, IE8, Safari, Firefox 3 and Firefox 3.5.




Next up is selector_aspect, which is for selecting part of an image while maintaining a fixed aspect ratio. E.g. useful for cropping an image.
http://dcook.org/software/jquery/selector_aspect/

A simple straightforward plug-in, with not many options.

Third is get_percentage_position, which is used to get the size of one div (or any DOM object) in terms of another div (or any DOM object), and also to get the relative position in the same terms. Not very glamorous, but useful in conjunction with the selector_aspect plugin, for instance. It is available here:
http://dcook.org/software/jquery/get_percentage_position/

Finally, a reminder that my first jquery plugin, to run a magnifier over an image, is introduced here:
http://darrendev.blogspot.com/2010/04/jquery-plugin-image-magnifier.html

and available here:
http://dcook.org/software/jquery/magnifier/

and all my jquery plugins are being kept here:
http://dcook.org/software/jquery/

Monday, April 19, 2010

jquery plugin: image magnifier

I've just released my first fully-fledged and useful jquery plugin:

     http://dcook.org/software/jquery/magnifier/

It allows you to magnify an image and examine just one part of it.
Handles on the edge of the "magnifying glass" allow resizing it, which alters the degree of magnification.
The plugin is fully documented, with numerous usage examples.
It runs on all major browsers and operating systems.
Naturally it is open source (MIT).

Monday, July 27, 2009

New fclib and dcflash releases

A few days ago I updated my dcflash project on SourceForge. This is an open source (MIT license) ActionScript library, with a data visualization emphasis:
http://sourceforge.net/projects/dcflash/

In particular there is now an AS3 version. In fact the AS2 and AS3 releases currently have only a couple of classes in common. The AS3 classes are mostly for loading images, movies, and setting up slideshows; none of the chart classes from the AS2 version (or the only partially ported and unreleased AS1 library) are there yet. As I say in the release notes, I'm open to offers to port them to AS3. (A look at http://dcook.org/work/charts/ will show you the kind of things in the AS1 and AS2 libraries.)

And now today I've put up a new release of fclib. This is an open source (MIT license again) PHP library. Fairly ad hoc, but naturally lots of internationalization-related classes. The new 0.4.19 release contains some Arabic classes, but other than that it is mostly just tweaks and small improvements:
http://dcook.org/software/fclib/

Tuesday, June 24, 2008

AS3 vertical gradient fills

The examples for gradient fills are either squares, or rectangles with the gradient going from left to right:
http://livedocs.adobe.com/flex/3/langref/flash/display/Graphics.html#beginGradientFill()

So, how do you make a horizontal rectangle, with a vertical linear gradient? Everything I've tried (i.e. rotating it 90 degrees) gives me a solid colour. Here is one failure example (270 degree rotation, then translated back the height).

s.graphics.beginGradientFill(GradientType.LINEAR,
[0x666666,0x666666],[0.9,0.1],[0,255],
new Matrix(0,-1,1,0,height,0));
s.graphics.drawRect(x,y,width,height);


(I'm actually giving up and will instead make this 9 pixel high gradient (a fancy drop shadow effect) by drawing 9 lines! But, please, somebody must have some sample code they can share?)