Set up a Data Science Ubuntu VM for Development

The steps below draw on information from two other sites, with some modifications and resequencing that I needed before my VM was fully configured.

STEP 1: INSTALL VMWARE TOOLS AFTER CREATING THE UBUNTU VM

  • Extract VMwareTools.x.x.x-xxxx.tar.gz:
    • Power on the virtual machine.
    • Log in to the virtual machine using an account with administrator or root privileges. Select:
      • For Fusion: Virtual Machine > Install VMware Tools.
      • For Workstation: VM > Install VMware Tools.
      • For Player: Player > Manage > Install VMware Tools
    • Open the VMware Tools CD mounted on the Ubuntu desktop.
    • Right-click the file name that is similar to VMwareTools.x.x.x-xxxx.tar.gz, click Extract to, and select Ubuntu Desktop to save the extracted contents.
    • The vmware-tools-distrib folder is extracted to the Ubuntu Desktop.
  • Install VMware Tools in Ubuntu:
    • Open a Terminal window.
    • In the Terminal, run this command to navigate to the vmware-tools-distrib folder:
      • cd Desktop/vmware-tools-distrib
    • Run this command to install VMware Tools:
      • sudo ./vmware-install.pl -d
  • Restart the Ubuntu virtual machine after the VMware Tools installation completes.

While Anaconda is a quick and easy way of getting Python libraries installed, I chose not to go with this option following some lessons learnt in package management and versioning during a recent Deep Learning project. The following steps therefore install all required packages manually and in a sequence that I found works best for me.
 

STEP 2: INSTALL PRE-REQUISITE PYTHON LIBRARIES ON THE VM

$ sudo apt-get -y install git curl vim tmux htop ranger
$ sudo apt-get -y install python-dev python-pip
$ sudo apt-get -y install python-serial python-setuptools python-smbus

 

STEP 3: SET UP A VIRTUAL ENVIRONMENT FOR DATA SCIENCE ON THE VM

$ sudo pip install virtualenv
$ cd ~/
$ mkdir venv
$ pushd venv
$ virtualenv data-science
$ popd
$ source ~/venv/data-science/bin/activate
$ pip install --upgrade setuptools
$ pip install virtualenvwrapper
$ pip install cython
$ pip install nose
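
Before installing anything else, it can help to confirm that the data-science environment is really active, since a pip install run outside it would target the system Python instead. The following is a minimal sketch, assuming the activate script exports VIRTUAL_ENV (which virtualenv's activate script does); venv_name is a hypothetical helper name, not part of virtualenv.

```shell
# venv_name is a hypothetical helper, not part of virtualenv itself.
# It prints the name of the active virtualenv, or fails if none is active.
venv_name() {
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "no virtualenv active" >&2
    return 1
  fi
  basename "$VIRTUAL_ENV"
}

# After sourcing ~/venv/data-science/bin/activate this should print
# "data-science"; otherwise it prints a reminder instead.
venv_name || echo "run: source ~/venv/data-science/bin/activate"
```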

 

STEP 4: INSTALL PYTHON DATA SCIENCE LIBRARIES ON THE VM

$ sudo apt-get -y install python-numpy python-scipy python-matplotlib
$ sudo apt-get -y install ipython ipython-notebook
$ sudo apt-get -y install python-pandas python-sympy python-nose
$ pip install jupyter
$ sudo apt-get -y install libfreetype6-dev libpng12-dev libjs-mathjax
$ sudo apt-get -y install fonts-mathjax libgcrypt11-dev libxft-dev
$ pip install matplotlib
$ sudo apt-get -y install libatlas-base-dev gfortran
$ pip install seaborn && pip install statsmodels
$ pip install scikit-learn && pip install numexpr
$ pip install bottleneck && pip install pandas
$ pip install SQLAlchemy && pip install pyzmq
$ pip install jinja2 && pip install tornado
$ pip install nltk && pip install gensim
$ pip install tensorflow && pip install keras
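
With this many packages it is easy to miss one that failed to build. The sketch below is one way to check which libraries can actually be imported. It is an illustration rather than part of the original sequence: check_modules and the PYTHON_BIN variable are hypothetical names, and it assumes the interpreter on the PATH is the virtualenv's.

```shell
# check_modules is a hypothetical helper: it tries to import each named
# module with the interpreter in PYTHON_BIN (default: python, i.e. the
# first one on the PATH) and reports which imports succeed.
check_modules() {
  missing=0
  for mod in "$@"; do
    if "${PYTHON_BIN:-python}" -c "import $mod" >/dev/null 2>&1; then
      echo "ok      $mod"
    else
      echo "MISSING $mod"
      missing=1
    fi
  done
  # Non-zero exit status if at least one import failed.
  return "$missing"
}

# Note that some import names differ from the pip package names:
# scikit-learn -> sklearn, SQLAlchemy -> sqlalchemy, pyzmq -> zmq.
```

Inside the activated environment, a call such as check_modules numpy scipy matplotlib pandas seaborn statsmodels sklearn tensorflow keras prints one line per library and returns a non-zero status if any import fails.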

 

STEP 5: CONFIGURE NETWORKING, SSH, FIREWALL & JUPYTER ON THE VM

  • On the host O/S, browse to the following setting via the VMware Workstation dropdown menu: VM -> Settings -> Network Adapter, and set it to bridged (automatic).
  • Log in to the VM and run the following to enable SSH, allow SSH connections from the host (replace host_ip with the host's IP address) and start Jupyter:
$ sudo apt-get install openssh-server
$ sudo ufw allow from host_ip to any port 22
$ nice jupyter notebook

Once the Jupyter notebook server is running in the terminal after the last command, make a note of the token query-string parameter at the end of the notebook URL (for example, http://localhost:8888/?token=62gwj3k28djdkelsnab7293kkl0172hdu3ks9al7sj3kb1j).
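
If you would rather not copy the token by hand, it can be cut out of the URL with plain shell parameter expansion. This is a small sketch under the assumption that the URL has the ?token=... form shown above; extract_token is a hypothetical helper name.

```shell
# extract_token is a hypothetical helper that prints the value of the
# token query-string parameter from a Jupyter notebook URL.
extract_token() {
  case "$1" in
    *token=*) ;;                                  # URL must carry a token
    *) echo "no token in URL" >&2; return 1 ;;
  esac
  token="${1##*token=}"          # drop everything up to and including token=
  printf '%s\n' "${token%%&*}"   # drop any later query parameters
}

# Using the example URL from this step:
extract_token 'http://localhost:8888/?token=62gwj3k28djdkelsnab7293kkl0172hdu3ks9al7sj3kb1j'
```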

 

STEP 6: SSH TUNNEL FROM HOST WINDOWS O/S TO GUEST UBUNTU VM

Set up the following in PuTTY on Windows to enable SSH tunnelling. The screenshot below shows the settings for the Windows host.

[Screenshot 1: PuTTY settings for the Windows host]

The screenshot below shows the port forwarding settings, i.e. http://localhost:8000 on the Windows host O/S forwards to the Jupyter server at http://localhost:8888 on the Ubuntu VM.

[Screenshot 2: PuTTY port forwarding settings]
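
For readers without PuTTY, the same forwarding can be expressed with the OpenSSH command-line client. This is a sketch rather than the setup used above: it assumes an ssh client is available on the host, and vm_user and vm_ip are placeholders for the Ubuntu account name and the VM's bridged IP address. The hypothetical helper tunnel_command only prints the command so it can be inspected before running it.

```shell
# tunnel_command is a hypothetical helper that only prints the ssh
# command; it does not open a connection.
#   -N  do not run a remote command, just forward ports
#   -L  forward local port 8000 to localhost:8888 as seen from the VM
tunnel_command() {
  vm_user="$1"
  vm_ip="$2"
  printf 'ssh -N -L 8000:localhost:8888 %s@%s\n' "$vm_user" "$vm_ip"
}

# Placeholder account and address; substitute your own values.
tunnel_command vm_user vm_ip
```

The 8000 and 8888 values mirror the port forwarding described above, so the browser URL on the host stays http://localhost:8000.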

 

STEP 7: BROWSE TO JUPYTER NOTEBOOK ON HOST O/S

Browse to http://localhost:8000 on your host O/S and verify that you can access the Jupyter notebook. When run for the first time, the web page will request a token. Enter the token you saved in Step 5.

If you have any problems with connectivity, the likely cause is ufw, the firewall on the Ubuntu guest O/S. See Step 5 above for the ufw configuration.
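
Before changing firewall rules, it may be worth ruling out the simpler case where sshd or Jupyter is not listening at all. The sketch below is a diagnostic of my own rather than part of the steps above: is_listening is a hypothetical helper that reads the output of ss -tln (from the iproute2 package on Ubuntu) and reports whether anything is listening on a given TCP port.

```shell
# is_listening is a hypothetical helper: it reads `ss -tln` output on
# stdin and succeeds if some socket is listening on the given port.
is_listening() {
  awk -v suffix=":$1" '
    $1 == "LISTEN" {
      addr = $4                      # the Local Address:Port column
      if (substr(addr, length(addr) - length(suffix) + 1) == suffix) found = 1
    }
    END { exit found ? 0 : 1 }
  '
}

# On the Ubuntu VM: check sshd (port 22) and Jupyter (port 8888).
if command -v ss >/dev/null 2>&1; then
  for port in 22 8888; do
    if ss -tln | is_listening "$port"; then
      echo "port $port: listening"
    else
      echo "port $port: not listening"
    fi
  done
fi
```

If both ports are listening on the VM and the connection from the host still fails, the ufw rule from Step 5 is the next thing to check.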
