
Set up a Data Science Ubuntu VM for Development

The steps below build on information from this site and this site. I had to modify and re-sequence some of it before my VM was fully configured.

STEP 1: INSTALL VMWARE TOOLS AFTER CREATING THE UBUNTU VM

  • Extract VMwareTools.x.x.x-xxxx.tar.gz:
    • Power on the virtual machine.
    • Log in to the virtual machine using an account with administrator or root privileges. Select:
      • For Fusion: Virtual Machine > Install VMware Tools.
      • For Workstation: VM > Install VMware Tools.
      • For Player: Player > Manage > Install VMware Tools.
    • Open the VMware Tools CD mounted on the Ubuntu desktop.
    • Right-click the file name that is similar to VMwareTools.x.x.x-xxxx.tar.gz, click Extract to, and select Ubuntu Desktop to save the extracted contents.
    • The vmware-tools-distrib folder is extracted to the Ubuntu Desktop.
  • Install VMware Tools in Ubuntu:
    • Open a Terminal window.
    • In the Terminal, run this command to navigate to the vmware-tools-distrib folder:
      • cd Desktop/vmware-tools-distrib
    • Run this command to install VMware Tools (the -d option accepts the default answer to every installer prompt):
      • sudo ./vmware-install.pl -d
  • Restart the Ubuntu virtual machine after the VMware Tools installation completes.

Anaconda is a quick and easy way to install Python libraries. I chose not to use it because of lessons learnt about package management and versioning during a recent Deep Learning project. The following steps therefore install all required packages manually, in the sequence that I found works best.

STEP 2: INSTALL PRE-REQUISITE TOOLS AND PYTHON LIBRARIES ON THE VM

$ sudo apt-get -y install git curl vim tmux htop ranger
$ sudo apt-get -y install python-dev python-pip
$ sudo apt-get -y install python-serial python-setuptools python-smbus
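A small Python 3 check can confirm that the command-line tools from the first apt-get line are now on the PATH. This is only a sketch: the tool list is copied from the command above, and missing_tools is a hypothetical helper built on the standard library's shutil.which.

```python
import shutil

# Command-line tools installed by the first apt-get line in this step.
REQUIRED_TOOLS = ("git", "curl", "vim", "tmux", "htop", "ranger")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All pre-requisite tools are installed")
```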

 

STEP 3: SET UP A VIRTUAL ENVIRONMENT FOR DATA SCIENCE ON THE VM

$ sudo pip install virtualenv
$ cd ~/
$ mkdir venv
$ pushd venv
$ virtualenv data-science
$ popd
$ source ~/venv/data-science/bin/activate
$ pip install --upgrade setuptools
$ pip install virtualenvwrapper
$ pip install cython
$ pip install nose
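Before installing packages, you can confirm that the data-science environment is actually active by comparing interpreter prefixes. This Python 3 sketch covers two cases. Older virtualenv releases set sys.real_prefix. Newer virtualenv releases and the standard venv module instead make sys.prefix differ from sys.base_prefix. The function name is illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtualenv/venv."""
    # Older virtualenv releases mark the environment with sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases leave base_prefix pointing at the
    # system Python, so it differs from prefix inside an environment.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtualenv:", in_virtualenv())
```

After source ~/venv/data-science/bin/activate, running this with python should report True. In a fresh shell without the environment, it should report False.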

 

STEP 4: INSTALL PYTHON DATA SCIENCE LIBRARIES ON THE VM

$ sudo apt-get -y install python-numpy python-scipy python-matplotlib
$ sudo apt-get -y install ipython ipython-notebook
$ sudo apt-get -y install python-pandas python-sympy python-nose
$ pip install jupyter
$ sudo apt-get -y install libfreetype6-dev libpng12-dev libjs-mathjax
$ sudo apt-get -y install fonts-mathjax libgcrypt11-dev libxft-dev
$ pip install matplotlib
$ sudo apt-get -y install libatlas-base-dev gfortran
$ pip install Seaborn && pip install statsmodels
$ pip install scikit-learn && pip install numexpr
$ pip install bottleneck && pip install pandas
$ pip install SQLAlchemy && pip install pyzmq
$ pip install jinja2 && pip install tornado
$ pip install nltk && pip install gensim
$ pip install tensorflow && pip install keras
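Once the installs finish, the following Python 3 sketch tries to import each library and reports its version, so a failed build is easy to spot. The module list is my own mapping of the package names above to their import names (for example, scikit-learn imports as sklearn and pyzmq as zmq). Adjust it to match what you actually installed, and run it from inside the activated virtualenv.

```python
import importlib

# Import names for the libraries installed above. Several differ from the
# package names: scikit-learn -> sklearn, SQLAlchemy -> sqlalchemy, pyzmq -> zmq.
MODULES = ("numpy", "scipy", "matplotlib", "pandas", "sympy", "nose",
           "seaborn", "statsmodels", "sklearn", "numexpr", "bottleneck",
           "sqlalchemy", "zmq", "jinja2", "tornado", "nltk", "gensim",
           "tensorflow", "keras")


def check_imports(names):
    """Map each module name to (imported_ok, version string or error message)."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # ImportError, or any error raised while importing
            results[name] = (False, str(exc))
        else:
            version = getattr(module, "__version__", "unknown version")
            results[name] = (True, str(version))
    return results


if __name__ == "__main__":
    for name, (ok, detail) in check_imports(MODULES).items():
        print("%-12s %-4s %s" % (name, "OK" if ok else "FAIL", detail))
```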

 

STEP 5: CONFIGURE NETWORKING, SSH, FIREWALL & JUPYTER ON THE VM

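  • On the host O/S, open VM > Settings > Network Adapter from the VMware Workstation menu and set the network connection to Bridged (automatic).
  • Log in to the VM and run the following commands. They install the SSH server, allow SSH connections from the host through the ufw firewall, and start the Jupyter notebook server. Replace host_ip with the host's IP address.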
$ sudo apt-get install openssh-server
$ sudo ufw allow from host_ip to any port 22
$ nice jupyter notebook

Once the Jupyter notebook server is running in the Terminal, note the token query-string parameter at the end of the notebook URL, for example http://localhost:8888/?token=62gwj3k28djdkelsnab7293kkl0172hdu3ks9al7sj3kb1j
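The standard library's urllib.parse can pull the token out of that URL programmatically. It can also rewrite the URL for the host side of the tunnel. This Python 3 sketch assumes the URL format shown above and host port 8000, the port used for the SSH tunnel in Step 6. Both function names are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def notebook_token(url):
    """Return the `token` query-string value from a Jupyter URL, or None."""
    values = parse_qs(urlparse(url).query).get("token")
    return values[0] if values else None


def host_url(url, host_port=8000):
    """Rewrite a VM-side notebook URL to the host end of the SSH tunnel."""
    # _replace is the documented way to change one field of a ParseResult.
    return urlparse(url)._replace(netloc="localhost:%d" % host_port).geturl()
```

For example, host_url applied to the URL above gives the same address with localhost:8000 in place of localhost:8888, ready to paste into a browser on the Windows host.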

 

STEP 6: SSH TUNNEL FROM HOST WINDOWS O/S TO GUEST UBUNTU VM

Configure PuTTY on the Windows host to enable SSH tunnelling. The screenshot below shows the session settings for the Windows host.

[Screenshot 1: PuTTY session settings on the Windows host]

The screenshot below shows the port-forwarding settings. http://localhost:8000 on the Windows host forwards to the Jupyter server at http://localhost:8888 on the Ubuntu VM. In PuTTY, these go under Connection > SSH > Tunnels, with source port 8000 and destination localhost:8888.

[Screenshot 2: PuTTY port-forwarding (SSH tunnel) settings]

 

STEP 7: BROWSE TO JUPYTER NOTEBOOK ON HOST O/S

Browse to http://localhost:8000 on the host O/S and verify that you can access the Jupyter notebook. The first time you connect, the page asks for a token; enter the token you noted in Step 5.

If you have connectivity problems, the likely cause is the ufw firewall on the Ubuntu guest; see Step 5 for the ufw configuration.
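To narrow down a connectivity problem, a plain TCP check tells you whether anything is listening at each hop. This Python 3 sketch uses only the socket module. port_open is a hypothetical helper, and the hosts and ports are the ones from Steps 5 and 6.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or unknown host
        return False


if __name__ == "__main__":
    # On the Windows host, 8000 is the local end of the PuTTY tunnel.
    print("Tunnel endpoint reachable:", port_open("localhost", 8000))
```

On the Windows host, port_open("localhost", 8000) checks the local end of the tunnel. port_open(vm_ip, 22), with vm_ip standing in for the VM's bridged IP address, checks whether ufw is letting SSH through.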


Configure Theano & CUDA for Deep Learning on a Mac

STEP 1 – Install Theano
After installing Anaconda, run the following command in Terminal:

$ conda install theano pygpu


STEP 2 – Install the correct CUDA driver based on your model of Mac
Follow this link to download the CUDA driver version that matches your Mac. Click the driver link and open the Supported Products tab to check which Mac hardware that driver supports.

For some older Macs, the version 6.5.45 driver is the best choice.


STEP 3 – Install the CUDA toolkit, but don’t upgrade the driver
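Download the matching version of the CUDA toolkit for your Mac from the archive here. During installation, keep the driver you installed in Step 2 rather than letting the toolkit installer upgrade it.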

For some older Macs, the version 6.5 toolkit is the best choice.


STEP 4 – Install Xcode
Download Xcode from the Apple site and install it on your Mac.


STEP 5 – Install the Xcode Command Line Tools
Open a Terminal and run the following command:

$ xcode-select --install

When prompted, choose Install to add the command line tools.


STEP 6 – Check the cc compiler
Open a Terminal and run the following command to confirm that the C compiler is installed and to print its version:

$ /usr/bin/cc --version

 

STEP 7 – Update your .bash_profile file
Add the following to ~/.bash_profile. Then open a new Terminal (or run source ~/.bash_profile) so the settings take effect:


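# CUDA 6.5 toolkit locations used by Theano and the CUDA tools
export LD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-6.5/lib/
export CUDA_ROOT=/Developer/NVIDIA/CUDA-6.5/
# Theano settings; THEANO_FLAGS takes precedence over the same options in ~/.theanorc
export THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32'

# Put nvcc on the PATH and let the macOS dynamic loader find the CUDA libraries
export PATH=/Developer/NVIDIA/CUDA-6.5/bin${PATH:+:${PATH}}
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-6.5/lib\
${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}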
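After opening a new Terminal, a short Python 3 check can confirm that the variables above are actually set. This sketch assumes the CUDA 6.5 paths shown above and only inspects PATH, DYLD_LIBRARY_PATH, CUDA_ROOT and THEANO_FLAGS. check_cuda_env is a hypothetical name, and you can pass it a plain dictionary instead of os.environ to try it out.

```python
import os

# Assumed CUDA 6.5 toolkit location, matching the .bash_profile lines above.
CUDA_HOME = "/Developer/NVIDIA/CUDA-6.5"


def _entries(value):
    """Split a colon-separated path list, ignoring empty parts and trailing slashes."""
    return [p.rstrip("/") for p in value.split(":") if p]


def check_cuda_env(env=None, cuda_home=CUDA_HOME):
    """Return a list of problems with the CUDA/Theano variables; empty means all set."""
    env = os.environ if env is None else env
    problems = []
    if cuda_home + "/bin" not in _entries(env.get("PATH", "")):
        problems.append("PATH does not include %s/bin" % cuda_home)
    if cuda_home + "/lib" not in _entries(env.get("DYLD_LIBRARY_PATH", "")):
        problems.append("DYLD_LIBRARY_PATH does not include %s/lib" % cuda_home)
    if env.get("CUDA_ROOT", "").rstrip("/") != cuda_home:
        problems.append("CUDA_ROOT is not set to %s" % cuda_home)
    if "device=gpu" not in env.get("THEANO_FLAGS", "").split(","):
        problems.append("THEANO_FLAGS does not contain device=gpu")
    return problems


if __name__ == "__main__":
    for problem in check_cuda_env() or ["All CUDA/Theano variables look correct"]:
        print(problem)
```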

 

STEP 8 – Test the NVCC compiler
Run the following in Terminal to print the nvcc compiler version:


$ /Developer/NVIDIA/CUDA-6.5/bin/nvcc -V
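To script this check, the sketch below runs nvcc -V and extracts the release number. It assumes that nvcc lives at the CUDA 6.5 path used in .bash_profile. It also assumes the output contains a "release X.Y" fragment, for example "Cuda compilation tools, release 6.5, ...". The path constant and nvcc_release are illustrative. It uses subprocess.run, so it needs Python 3.5 or later.

```python
import re
import subprocess

# Assumed nvcc location for the CUDA 6.5 toolkit; adjust for other versions.
NVCC = "/Developer/NVIDIA/CUDA-6.5/bin/nvcc"


def nvcc_release(output):
    """Return 'X.Y' from a 'release X.Y' fragment in `nvcc -V` output, or None."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        result = subprocess.run([NVCC, "-V"], stdout=subprocess.PIPE,
                                universal_newlines=True)
    except FileNotFoundError:
        print("nvcc not found at", NVCC)
    else:
        print("nvcc reports CUDA release", nvcc_release(result.stdout))
```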

 

STEP 9 – Switch to the Samples Directory 
Change to the samples directory that was installed as part of the toolkit:


cd /Developer/NVIDIA/CUDA-6.5/samples/

 

STEP 10 – Make the Samples
Run the following commands one at a time and make sure none of them reports errors:


make -C 0_Simple/vectorAdd
make -C 0_Simple/vectorAddDrv
make -C 1_Utilities/deviceQuery
make -C 1_Utilities/bandwidthTest

 

STEP 11 – Run the Samples
Change to the directory that contains the compiled samples:


cd /Developer/NVIDIA/CUDA-6.5/samples/bin/x86_64/darwin/release

Run each of the following in turn and check the output. deviceQuery should list your NVIDIA GPU's properties, and both programs should finish with a Result = PASS line.


./deviceQuery
./bandwidthTest
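These two checks can also be scripted. This Python 3 sketch runs each compiled sample from the directory above and looks for a Result = PASS line in its output. Treat that string match, the directory constant and the helper names as assumptions to verify against your own output.

```python
import os
import subprocess

# Directory where `make` placed the compiled samples (see Step 11).
SAMPLES_BIN = "/Developer/NVIDIA/CUDA-6.5/samples/bin/x86_64/darwin/release"


def sample_passed(output):
    """True if a CUDA sample's output contains the 'Result = PASS' summary line."""
    return "Result = PASS" in output


def run_sample(name, bin_dir=SAMPLES_BIN):
    """Run one compiled sample; return (passed, combined output or error text)."""
    try:
        result = subprocess.run([os.path.join(bin_dir, name)],
                                stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                                universal_newlines=True)
    except OSError as exc:  # e.g. the sample was not built or is not executable
        return False, str(exc)
    return sample_passed(result.stdout), result.stdout


if __name__ == "__main__":
    for sample in ("deviceQuery", "bandwidthTest"):
        passed, _ = run_sample(sample)
        print(sample, "PASS" if passed else "FAIL")
```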

 

STEP 12 – Configure Theano to use the GPU
Create a file named .theanorc in your home directory and add the following to it:


[blas]
ldflags =

[global]
floatX = float32
device = gpu

[nvcc]
fastmath = True

[cuda]
# Set to where the CUDA toolkit is installed (matches CUDA_ROOT in .bash_profile).
root = /Developer/NVIDIA/CUDA-6.5
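To confirm that the file contains the settings you expect, you can read it back with Python's configparser module, which understands this INI-style format. This is a sketch: parse_theanorc and read_theanorc are hypothetical helpers. They only pull out the [global] device and floatX values and the [cuda] root. They do not show what Theano itself resolves once THEANO_FLAGS is applied.

```python
import configparser
import os


def parse_theanorc(text):
    """Return the [global] device/floatX and [cuda] root settings from .theanorc text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # fallback=None covers both a missing option and a missing section.
    return {
        "device": parser.get("global", "device", fallback=None),
        "floatX": parser.get("global", "floatX", fallback=None),
        "cuda_root": parser.get("cuda", "root", fallback=None),
    }


def read_theanorc(path=None):
    """Read ~/.theanorc (or `path`) and parse it with parse_theanorc()."""
    path = path or os.path.expanduser("~/.theanorc")
    with open(path) as handle:
        return parse_theanorc(handle.read())


if __name__ == "__main__":
    try:
        settings = read_theanorc()
    except FileNotFoundError:
        print("No ~/.theanorc found")
    else:
        print(settings)
        if settings["device"] != "gpu" or settings["floatX"] != "float32":
            print("Theano is not configured for float32 on the GPU")
```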

 


STEP 13 – Run the following to confirm that Theano now uses the GPU


from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

# Build a shared vector and compile a function that applies exp() to it
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())

# Time repeated calls to the compiled function
t0 = time.time()
for _ in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))

# Any element-wise op that is not a Gpu op means the graph ran on the CPU
if numpy.any([isinstance(node.op, tensor.Elemwise) and
              ('Gpu' not in type(node.op).__name__)
              for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')