
Friday, February 21, 2020

Survey: What Degree is Best for Data Science?



TL;DR
Just answer 4 questions about the best degree for Data Science here:
https://www.surveymonkey.com/r/7FGGWS7

Ask the question "What's the best degree for Data Science?" and you should not expect a single answer, or even just a few (unless everything I know about people practicing data science is wrong). Stephanie Glen analyzed various sources on the topic to show just that:


Source: Best Degree for Data Science (in One Picture)
https://www.datasciencecentral.com/profiles/blogs/best-degree-for-data-science-in-one-picture

To replicate her analysis with answers from data science practitioners, I put together a 1-minute anonymous survey asking the same question: https://www.surveymonkey.com/r/7FGGWS7
There you will find 4 questions: 2 on what degree you have and 2 on what degree you recommend. After collecting 100+ responses I will share the results. Thank you for participating!

Tuesday, February 11, 2020

H2O.ai Academic Program for Professors and Students: Quick Start with Driverless AI and Paperspace

If you are a professor teaching, or a student enrolled in, a machine learning program (or a non-technical program with a hands-on machine learning lab), becoming a member of the H2O.ai Academic Program gets you a free software license for non-commercial education and research use. In November 2018 H2O.ai (my employer) made its ground-breaking automated machine learning (AutoML) platform Driverless AI available to academia for free.

What Does Driverless AI Do?

H2O.ai defines Driverless AI as  
"an artificial intelligence platform for automatic machine learning"
To find out how Driverless AI automates machine learning activities into an integrated, repeatable workflow, visit the User Guide. The workflow encompasses feature engineering, model validation, hyper-parameter tuning, model selection and ensembles, custom recipes for transformers, models and scorers, automated model documentation, and finally model deployment. There is also the MLI (Machine Learning Interpretability) module, which offers tools for both white-box and black-box model interpretability, model debugging, disparate impact analysis, and what-if (sensitivity) analysis.


H2O.ai Academic Program

To sign up for the H2O.ai Academic Program, launched back in October 2018, start by filling out this form, provided the following conditions hold:
  • the intended use is non-commercial, for education and research purposes only, and
  • you belong to a higher education institution or are a student currently enrolled in a higher education degree program, and
  • if you are a student, your academic status can be verified by sending a photo of your current student ID to academic@h2o.ai (required).
Upon approval H2O.ai will issue a free license for Driverless AI for non-commercial use only. While waiting to be approved, apply for access to the H2O.ai Community Slack channel here (and don't forget to join #academic).

Driverless AI Installation Options

After receiving a license key, follow the installation instructions for Mac OS X or Windows 10 Pro (the WSL Ubuntu option is highly preferred) to run Driverless AI on your workstation or laptop. While this approach suffices for small datasets, serious problems demand installing and running Driverless AI on modern data center hardware with multiple CPUs and one or several GPUs for best results.

Several economical cloud providers offer such hardware. For general guidelines and instructions for native DEB installation on Linux Ubuntu see here. The steps below can be traced back to that documentation.

Why Paperspace

Paperspace offers a robust choice of configurations to provision and run Linux Ubuntu VMs with a single GPU (no multi-GPU systems are available). Pricing appears competitive enough to suit a thrifty academic budget, starting at around $0.50/hour for GPU systems with 30G of memory, which should comfortably host Driverless AI. It also features a simple, streamlined interface to deploy and manage VMs.


Step-by-Step Guide

Spinning up Linux VM

1. Create Paperspace Account

Start by creating an account at paperspace.com:


2. Create a Cloud VM

After creating the account, proceed to create a cloud VM:


3. Start Adding New Machine

Under Core -> Compute -> Machines on the left, select (+) to add a new machine:


4. Machine Location

Choose the region closest to your location - in my case it was "East Coast (NY2)":


5. Choose Type of Operating System

Scroll down to "Choose OS" and click on "Linux Templates":


6. Choose OS Version

Keep the default Ubuntu 16.04 server image:


7. Pick Machine Type (How Much to Pay)

Scroll down to choose a machine profile (keep the hourly rate): for a GPU VM pick type "P4000" or a more expensive machine type with a GPU, while for a CPU-only system pick "C6" or higher (if this instance type is not enabled, instructions to enable it should pop up):
 

8. Enable Public IP

Scroll down to "Public IP" and enable it, keeping other settings unchanged except maybe for "Storage" and "Auto-Shutdown". While 50G of storage suffices for many applications, increase it if you plan to use larger datasets or create massive numbers of models: allocate at least 20 times the size of the largest dataset you plan to use. Lastly, change the auto-shutdown timeout according to your needs:
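As a rough sketch of the sizing rule above, the helper below multiplies the largest dataset size by 20. The function name `required_storage_gb` and the 5 GB example are illustrative assumptions, not part of Paperspace or Driverless AI.

```shell
# Rule of thumb from this guide: storage >= 20 x largest dataset.
# required_storage_gb is a hypothetical helper; sizes are whole gigabytes.
required_storage_gb() {
  largest_dataset_gb="$1"
  echo $(( largest_dataset_gb * 20 ))
}

# Example: a 5 GB dataset suggests at least 100 GB of storage.
required_storage_gb 5
```

In this example the result is above the 50G mentioned earlier, so the "Storage" setting would need to be raised.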


9. Apply 5NXWB5R Promo Code with Payment

Scroll down to payment to enter credit card information, then enter promotion code 5NXWB5R (Paperspace should credit your account $10.00) before finally creating the VM with the "Create Your Paperspace" button:


10. Creating VM

While the new system initializes, its state appears as "Provisioning":


11. Wait for System to Start

Wait a minute or two until the system state changes to "On/Ready", then click on the small gear inside the box in the upper right corner to move to the system console:


12. System Console

The system console displays detailed information about the VM, including the public IP address assigned to it:


13. Notification from Paperspace

Next, find the email from Paperspace with the system password:
With the public IP address and password you can ssh (on Mac OS X or Linux) or connect using PuTTY (on Windows) to the Paperspace VM and install Driverless AI following the steps for a vanilla Ubuntu system. This example continues with that install and shows all steps in detail.

Installing Prerequisites

14. Terminal Access to VM

Connect with ssh to the Paperspace VM from a Mac OS terminal using the public IP and password from steps 12 and 13 (ssh is used here on Mac OS X - for other OSes adjust accordingly):
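A minimal sketch of the connection step, assuming the VM's default login user is `paperspace` (an assumption here; use the user name from the Paperspace email if it differs). The address 203.0.113.10 is a documentation placeholder, and `ssh_target` and `connect_vm` are hypothetical helper names.

```shell
# Placeholder values: replace VM_IP with the public IP from the system console (step 12).
VM_USER="paperspace"      # assumed default user on the Paperspace Linux template
VM_IP="203.0.113.10"      # documentation-only address, not a real VM

# Print the user@host string that ssh expects.
ssh_target() {
  printf '%s@%s\n' "$VM_USER" "$VM_IP"
}

# Open an interactive session; ssh prompts for the password from step 13.
connect_vm() {
  ssh "$(ssh_target)"
}
```

After defining these in a local terminal, running `connect_vm` starts the login.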




15. Change Paperspace-assigned password (optional):
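One way to do this, assuming you are logged in on the VM as the user whose password Paperspace assigned, is the standard `passwd` command. The wrapper name `change_vm_password` is hypothetical and only keeps the step callable by name.

```shell
# passwd with no arguments changes the password of the current user;
# it prompts for the current password and then twice for the new one.
change_vm_password() {
  passwd
}
```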





16. Install core packages (optional):
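A hedged sketch of this step using `apt-get`. The package list below is an assumption (common download and repository tools), not the exact list from the original post; `install_core_packages` is a hypothetical helper name.

```shell
# Assumed package list: adjust to what your setup actually needs.
CORE_PACKAGES="curl wget apt-utils software-properties-common"

install_core_packages() {
  # Refresh the package index before installing.
  sudo apt-get update
  # CORE_PACKAGES is intentionally unquoted so each name becomes a separate argument.
  sudo apt-get install -y $CORE_PACKAGES
}
```

Run `install_core_packages` on the VM to execute both commands.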



17. Add support for NVIDIA GPU libraries (CUDA 10):
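The driver and CUDA 10 install commands are best taken from the installation documentation referenced earlier. As a sketch of verifying the result afterwards, the check below assumes the NVIDIA driver provides the `nvidia-smi` tool; `check_gpu` is a hypothetical helper name.

```shell
check_gpu() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # List each GPU with its driver version in CSV form.
    nvidia-smi --query-gpu=name,driver_version --format=csv
  else
    echo "nvidia-smi not found: NVIDIA driver is not installed or not on PATH"
    return 1
  fi
}
```

On a CPU-only machine type this check is expected to report that `nvidia-smi` is missing.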


18. Install other prerequisites and open the port Driverless AI listens on:
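A sketch of the port-opening part only, assuming `ufw` is the firewall front end in use on the Ubuntu image (an assumption; skip this if no firewall is active). Port 12345 matches the URL used later in this guide, and `open_dai_port` is a hypothetical helper name. The other prerequisites are not reproduced here.

```shell
DAI_PORT=12345   # port from the Driverless AI URL used later in this guide

open_dai_port() {
  # Allow inbound TCP connections to the Driverless AI port.
  sudo ufw allow "${DAI_PORT}/tcp"
  # Show the resulting firewall state for a quick check.
  sudo ufw status
}
```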



Installing Driverless AI 


19. H2O Download Page

Leave (do not close) the ssh terminal, switch to a browser, and locate the H2O.ai download page. Choose the latest version of the Driverless AI product:

20. Download Link

Go to the Linux (X86) tab, then right-click on the "Download" link for the DEB package to copy the link location:

21. Back to Terminal Access

Return to the ssh terminal session connected to the Paperspace VM. If the session timed out or became inactive, repeat step 14.

22. Download and install Driverless AI DEB package:
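A minimal sketch of this step with `wget` and `dpkg`. The URL below is a placeholder to be replaced with the link copied from the download page; `deb_name_from_url` and `download_and_install_dai` are hypothetical helper names.

```shell
# Placeholder: paste the link copied from the H2O.ai download page.
DAI_DEB_URL="https://example.com/path/to/dai_X.Y.Z_amd64.deb"

# Derive the local file name from a URL, dropping any query string.
deb_name_from_url() {
  basename "${1%%\?*}"
}

download_and_install_dai() {
  deb_file="$(deb_name_from_url "$DAI_DEB_URL")"
  # Download the package into the current directory.
  wget -O "$deb_file" "$DAI_DEB_URL"
  # Install the downloaded DEB package.
  sudo dpkg -i "$deb_file"
}
```

With the real link in `DAI_DEB_URL`, running `download_and_install_dai` on the VM performs the download and the install.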



23. Install Completed

After the installer finishes successfully, it displays the following helpful information:


24. Start Driverless AI

Check that Driverless AI is installed but inactive, then start it and check its status and logs again:
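A sketch of these checks, assuming a systemd-based install where the service unit is named `dai` (an assumption; confirm the name in the installer output). The helper names are hypothetical.

```shell
DAI_SERVICE="dai"   # assumed systemd unit name created by the DEB package

dai_status() {
  # systemctl exits non-zero for an inactive unit, which is expected before the first start.
  sudo systemctl status "$DAI_SERVICE" --no-pager
}

start_dai() {
  sudo systemctl start "$DAI_SERVICE"
}

dai_logs() {
  # Show the last 50 journal lines for units whose names start with the service name.
  sudo journalctl -u "${DAI_SERVICE}*" --no-pager -n 50
}
```

A typical sequence on the VM would be `dai_status`, `start_dai`, `dai_status`, and then `dai_logs`.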


25. Web Access

Open a browser and enter a URL with the public IP address, like this: http://209.51.170.97:12345 (ignore 127.0.0.1 in the screenshots, as I was using port forwarding when taking them):
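As a small sketch, the helper below builds the browser URL for a given host, and a second helper shows the port-forwarding alternative with `ssh -L`. The user name `paperspace` and both helper names are assumptions.

```shell
DAI_PORT=12345

# Build the browser URL for a given host (public IP or 127.0.0.1).
dai_url() {
  printf 'http://%s:%s\n' "$1" "$DAI_PORT"
}

# Optional: forward local port 12345 to the VM over ssh instead of exposing the port.
# The user name "paperspace" is an assumption; pass the VM's public IP as the argument.
forward_dai_port() {
  ssh -N -L "${DAI_PORT}:localhost:${DAI_PORT}" "paperspace@$1"
}

dai_url 209.51.170.97   # prints http://209.51.170.97:12345
```

With `forward_dai_port` running in a separate local terminal, the address to open is the one printed by `dai_url 127.0.0.1`.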


26. License Agreement

Scroll down to accept the license agreement:


27. Login to Driverless AI

Driverless AI displays the login screen - enter credentials h2oai/h2oai:


28. Activate License

Driverless AI prompts you with "Enter License" to activate the software license:


29. License Key

Enter the Driverless AI license key received by enrolling in the H2O.ai Academic Program and press Save:


30. All Done

The Driverless AI platform is now fully enabled to help with your research, your studies, or both:


Resources