Tuesday, August 9, 2016

H2O Upgrade: Detailed Steps

I wanted to upgrade both the R and Python clients to the latest version of H2O (3.10.0.3, as of August 9th, 2016). Here are the exact steps; I think you will find them relevant even if you only need to update one of those clients. If you are following this at a later date, follow the spirit of it rather than copying and pasting: all the version numbers will have changed. This was on Linux Mint, but it should apply equally well to other Linux distros.
Make sure you first close any R or Python clients that are using the H2O library, and then separately shut down H2O if it is still running after that.
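To double-check that no H2O server is still running, you can see whether anything is listening on its port. A minimal sketch, assuming H2O was started on localhost with its default port, 54321 (if you gave h2o.init() a different port, pass that instead); the helper name is hypothetical:

```python
import socket

H2O_DEFAULT_PORT = 54321  # H2O's default port; change it if you started H2O elsewhere


def port_in_use(host: str = "localhost", port: int = H2O_DEFAULT_PORT,
                timeout: float = 1.0) -> bool:
    """Return True if some process accepts TCP connections on host:port."""
    try:
        # create_connection only succeeds if something is listening
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    if port_in_use():
        print("Something is still listening on port 54321: shut H2O down first.")
    else:
        print("Port 54321 is free.")
```

A free port only tells you nothing is listening there; it does not tell you an H2O process isn't running on some other port.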

The First Time

The instructions below are all for upgrading, which means I know I already have all the dependencies in place. If this is your first H2O install, well, I’d first recommend you buy my new book: Practical Machine Learning with H2O, published by O’Reilly. (Coming really soon, as I type this! Let me know if you are interested, and I'll send the best discount code I can find.)
As a quick guide, from R I recommend you use CRAN:
install.packages("h2o")
and from Python I recommend you use pip:
pip install h2o
Both of these approaches get all the dependencies for you; you may end up a version or two behind the very latest, but it won’t matter.
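Not sure whether you are doing a first install or an upgrade? From Python you can ask the installed package metadata. A minimal sketch using the standard library's importlib.metadata (Python 3.8 or later), which only checks the interpreter you run it with; the helper name is hypothetical. On the R side, packageVersion("h2o") answers the same question.

```python
from importlib import metadata  # standard library, Python 3.8+
from typing import Optional


def installed_version(dist_name: str = "h2o") -> Optional[str]:
    """Return the installed version of a Python distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_version()
    if current is None:
        print("No h2o found: a first install, so use pip install h2o")
    else:
        print(f"h2o {current} is installed: follow the upgrade steps")
```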

The Download

cd /usr/local/src/
wget http://download.h2o.ai/versions/h2o-3.10.0.3.zip
unzip h2o-3.10.0.3.zip
This creates an “h2o-3.10.0.3” directory, with python and R subdirectories.
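Since the version number appears in every path from here on, here is a small sketch that derives the names used in this post from a single version string. It assumes later releases keep the same download URL and file-naming pattern, which you should confirm against the actual download before relying on it; the function name is hypothetical.

```python
from typing import Dict

SRC_ROOT = "/usr/local/src"  # the directory used in this post


def h2o_artifacts(version: str) -> Dict[str, str]:
    """Derive the download URL and local paths from a version like "3.10.0.3".

    Assumption: later releases keep the same URL and naming pattern.
    """
    base = f"h2o-{version}"
    src_dir = f"{SRC_ROOT}/{base}"
    return {
        "zip_url": f"http://download.h2o.ai/versions/{base}.zip",
        "src_dir": src_dir,
        "r_tarball": f"{src_dir}/R/h2o_{version}.tar.gz",
        "py_wheel": f"{src_dir}/python/h2o-{version}-py2.py3-none-any.whl",
    }


if __name__ == "__main__":
    for key, value in h2o_artifacts("3.10.0.3").items():
        print(f"{key}: {value}")
```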

R

I installed the h2o package as root the first time, so I will continue to do that, hence the sudo:
cd /usr/local/src/h2o-3.10.0.3/R/
sudo R
Then:
remove.packages("h2o")
install.packages("h2o_3.10.0.3.tar.gz", repos = NULL, type = "source")
Then press Ctrl-D to exit.
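If you would rather script the R step than type into an interactive session, Rscript -e runs a line of R code and exits, so there is no Ctrl-D step. A minimal sketch that only builds the command (the function name is hypothetical); it passes repos = NULL, type = "source", which is how install.packages is told the argument is a local source tarball:

```python
import shlex
from typing import List


def r_upgrade_command(tarball: str, use_sudo: bool = True) -> List[str]:
    """Build a non-interactive equivalent of the `sudo R` session above."""
    r_code = (
        'remove.packages("h2o"); '
        f'install.packages("{tarball}", repos = NULL, type = "source")'
    )
    cmd = ["Rscript", "-e", r_code]
    # Prepend sudo to match installing as root, as in the manual steps
    return ["sudo"] + cmd if use_sudo else cmd


if __name__ == "__main__":
    cmd = r_upgrade_command("h2o_3.10.0.3.tar.gz")
    # Print rather than run; to execute it from the R subdirectory, use
    # subprocess.run(cmd, check=True, cwd="/usr/local/src/h2o-3.10.0.3/R/")
    print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```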

Python

cd /usr/local/src/h2o-3.10.0.3/python/
sudo pip uninstall h2o
sudo pip install -U h2o-3.10.0.3-py2.py3-none-any.whl
(The -U means upgrade any dependencies; the first time I forgot it, and ended up with some very weird errors when trying to do anything in Python.)
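Incidentally, the wheel filename explains why one file serves both Python 2 and 3: under the PEP 427 convention it reads name-version-pythontag-abitag-platformtag.whl, so py2.py3-none-any means pure Python, usable on Python 2 or 3, with no compiled ABI, on any platform. A simplified sketch of that split (the names are hypothetical, and it skips the validation a full parser, such as the one in the packaging project, performs):

```python
from typing import NamedTuple, Tuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python: Tuple[str, ...]
    abi: Tuple[str, ...]
    platform: Tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split name-version[-build]-python-abi-platform.whl into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # 5 parts normally, 6 when the optional build tag is present
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    name, version = parts[0], parts[1]
    py_tag, abi_tag, plat_tag = parts[-3:]
    # Each tag may be a dot-separated set: "py2.py3" means Python 2 or Python 3
    return WheelTags(name, version,
                     tuple(py_tag.split(".")),
                     tuple(abi_tag.split(".")),
                     tuple(plat_tag.split(".")))


if __name__ == "__main__":
    print(parse_wheel_filename("h2o-3.10.0.3-py2.py3-none-any.whl"))
```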

The Test

I started RStudio, and ran:
library(h2o)
h2o.init(nthreads=-1)
as.h2o(iris)
I then started ipython and ran:
import h2o, pandas
h2o.init()
iris = h2o.get_frame("iris")
print(iris)
As well as making sure the data arrived, I also check that the h2o.init() call in both cases reports a cluster version of “3.10.0.3”.
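Eyeballing the version is fine for one machine. If you script the check, compare the numbers as integers rather than as strings, because as strings "3.10.0.3" sorts before "3.9.1.1". A minimal sketch; the helper names are hypothetical, and getting the reported version string out of your client (for example, copying it from the h2o.init() output) is left to you:

```python
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn "3.10.0.3" into (3, 10, 0, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def check_cluster_version(reported: str, expected: str) -> None:
    """Raise if the running cluster is not the version just installed."""
    if version_tuple(reported) != version_tuple(expected):
        raise RuntimeError(
            f"cluster reports {reported}, expected {expected}: "
            "an old H2O server may still be running"
        )


if __name__ == "__main__":
    print("3.10.0.3" < "3.9.1.1")  # True: string comparison gets this wrong
    print(version_tuple("3.10.0.3") < version_tuple("3.9.1.1"))  # False
    check_cluster_version("3.10.0.3", "3.10.0.3")  # passes silently
```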

AWS Scripts

If you use the AWS scripts (https://github.com/h2oai/h2o-3/tree/master/ec2) and want to make sure EC2 instances start with exactly the same version as you have installed locally, the file to edit is h2o-cluster-download-h2o.sh. (If you are not using those scripts, just skip this section.)
First find the h2oBranch= line and set it to “rel-turing” (notice the “g” on the end; there is also a version without the “g”!). Then comment out the two curl calls that follow, and instead set h2oVersion to whatever version you installed above, and h2oBuild to the last number in that version. So, for 3.10.0.3, I set:
h2oBranch=rel-turing

#echo "Fetching latest build number for branch ${h2oBranch}..."
#curl --silent -o latest https://h2o-release.s3.amazonaws.com/h2o/${h2oBranch}/latest
h2oBuild=3

#echo "Fetching full version number for build ${h2oBuild}..."
#curl --silent -o project_version https://h2o-release.s3.amazonaws.com/h2o/${h2oBranch}/${h2oBuild}/project_version
h2oVersion=3.10.0.3
The rest of that script, and the other EC2 scripts, can be left untouched.

Summary

Well, that was easy! No excuses! Having said that, I recommend you upgrade cautiously: I have seen some hard-to-test-for regressions (e.g. model learning no longer scaling as well over a cluster) when grabbing the latest version.