Monday, January 29, 2018

Azure Machine Learning Workbench: Utilizing Different Environments

Today, we're going to continue looking at the Azure Machine Learning (AML) Workbench.  In the previous post, we created a new Classifying_Iris project and walked through the basic layout of the Workbench.  In this post, we'll be walking through the rest of the code in the Quick CLI Reference section of the Dashboard.  This will focus on running our code utilizing different environments.

One of the biggest advantages of the cloud for modern data science is the ability to endlessly scale your resources in order to solve the problem at hand.  In some cases, like small-scale development, it's acceptable to run a process on our local machine.  However, as we need more processing power, we need to be able to run our code in more powerful environments, such as Azure Virtual Machines or HDInsight clusters.  Let's see how AML Workbench helps us accomplish this.

If you are new to the AML Workbench and haven't read the previous post, it is highly recommended that you do so.  The rest of this post will build on what we learned in the previous one.

Here's the first piece of code we will run.

az ml experiment submit -c local iris_sklearn.py
This code runs the "iris_sklearn.py" Python script using our local machine.  We'll cover exactly what this script does in a later post.  All we need to know for now is that it's running on our local machine using Python.  As we mentioned before, using the local machine is great if we're just trying to do something small without having to worry about connecting to remote resources.  Here's the output.

OUTPUT BEGIN

RunId: Classifying_Iris_1509458498714

Executing user inputs .....
===========================

Python version: 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Iris dataset shape: (150, 5)
Regularization rate is 0.01
LogisticRegression(C=100.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
Accuracy is 0.6792452830188679

==========================================
Serialize and deserialize using the outputs folder.

Export the model to model.pkl
Import the model from model.pkl
New sample: [[3.0, 3.6, 1.3, 0.25]]
Predicted class is ['Iris-setosa']
Plotting confusion matrix...
Confusion matrix in text:
[[50  0  0]
 [ 1 37 12]
 [ 0  4 46]]
Confusion matrix plotted.
Plotting ROC curve....
ROC curve plotted.
Confusion matrix and ROC curve plotted. See them in Run History details page.

Execution Details
=================

RunId: Classifying_Iris_1509458498714

OUTPUT END
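Based solely on the output above, a script like "iris_sklearn.py" can be sketched as follows.  This is a hypothetical reconstruction, not the actual template script: we're assuming the standard scikit-learn iris dataset and a 35% test split, and noting that scikit-learn's C parameter is the inverse of the regularization rate (0.01 becomes C = 100).

```python
# Hypothetical sketch of an iris_sklearn.py-style script, inferred from the
# output above.  The real template script may load the data differently.
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

reg_rate = 0.01                      # the "Regularization rate" from the output
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, random_state=42)

# scikit-learn's C is the inverse of the regularization rate (0.01 -> C = 100)
clf = LogisticRegression(C=1.0 / reg_rate, solver="liblinear")
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print("Accuracy is", accuracy)
print("Confusion matrix in text:")
print(confusion_matrix(y_test, clf.predict(X_test)))

# Serialize and deserialize the model, as in the output above
with open("model.pkl", "wb") as f:
    pickle.dump(clf, f)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

print("Predicted class is", model.predict([[3.0, 3.6, 1.3, 0.25]]))
```

Running this locally should print an accuracy in the same neighborhood as the output above, though the exact split (and therefore the exact numbers) will differ from the template script.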

Here's the next piece of code.

az ml experiment submit -c docker-python iris_sklearn.py
This code runs the same "iris_sklearn.py" script as before.  However, this time it uses a Python-enabled Docker container.  Docker is a technology that allows us to package an entire environment into a single object.  This is extremely useful when we are trying to deploy code across distributed systems.  For instance, some organizations will wrap their applications in Docker images, then deploy containers built from those images.  This makes the applications much easier to manage because they can update the master image and roll that update out to all of the deployed containers.  You can read more about Docker and containers here, here and here.  Unfortunately, we're unable to install Docker on our machine, so we'll have to skip this one.  Let's take a look at the next piece of code.

az ml experiment submit -c docker-spark iris_pyspark.py
This code runs a new script called "iris_pyspark.py".  We'll save the in-depth analysis of the code for a later post.  To heavily summarize, PySpark is a way to harness Spark's big data analytical functionality from within Python.  Running the script in a Spark-enabled Docker container lets us develop and test Spark code locally without needing a remote Spark cluster.  Let's take a look at the next piece of code.

az ml computetarget attach --name myvm --address <ip address or FQDN> --username <username> --password <pwd> --type remotedocker

az ml experiment prepare -c myvm
az ml experiment submit -c myvm iris_pyspark.py
This is where things start to get interesting.  Previously, we were running everything on our local machine.  This is great when data is small.  However, it becomes unusable when we need to point to larger data sources.  Fortunately, the AML Workbench allows us to attach to a remote virtual machine in cases where we need additional resources.

Another important thing to notice is that we were able to seamlessly run the same code on our local machine and on the virtual machine.  This means that we can develop against small samples on our local machine, then effortlessly run the same code on a larger virtual machine when we want to test against a larger dataset.  This is exactly why containers are becoming so popular.  They make it effortless to move code from a less powerful environment, like a local machine, up to a more powerful one, like a large virtual machine.

Another advantage of this approach is that we can now manage resource costs by limiting virtual machine usage.  The entire team can share the same virtual machine, using it only when they need the extra power.  We can even turn the VM off when we aren't using it, saving even more money.  You can read more about Azure Virtual Machines here.

Let's move to the final piece of code.

az ml computetarget attach --name myhdi --address <ip address or FQDN of the head node> --username <username> --password <pwd> --type cluster

az ml experiment prepare -c myhdi
az ml experiment submit -c myhdi iris_pyspark.py
This code expands on the same concepts as the previous one.  In some cases, we have very large resource needs, and even a powerful virtual machine may not have enough juice.  For those cases, we can use containers to deploy to an Azure HDInsight cluster.  This allows us to take the same code we ran on our local machine and execute it at full scale using the power of Hadoop.  You can read more about HDInsight clusters here.

This post has opened our eyes to the power and flexibility that the AML Workbench can provide.  While it's more complicated than its AML Studio counterpart, the power and flexibility it provides via containers can make all the difference for some organizations.  Stay tuned for the next post, where we'll walk through the built-in data preparation capabilities of the Azure Machine Learning Workbench.  Thanks for reading.  We hope you found this informative.

Brad Llewellyn
Data Science Consultant
Valorem
@BreakingBI
www.linkedin.com/in/bradllewellyn
llewellyn.wb@gmail.com

Friday, January 26, 2018

Azure Machine Learning Webinars

As some of you may know, we've been giving Azure Machine Learning presentations for about a year now.  As promised, we wanted to include links to the videos, as well as any supplemental material for the presentations.

Azure Machine Learning Studio: Making Data Science Easy(er)

https://www.youtube.com/watch?v=QMj_dL64xCA

There are no supplemental materials for this presentation.

Azure Machine Learning Studio: Four Tips from the Pros

https://www.youtube.com/watch?v=d25wmQ_dSQg
https://drive.google.com/open?id=12xodphzcK1Oy7TBDDSzHPXe8GIiBIgbr

R Code for Creating Interaction Features

<R CODE START>

#####################
## Import Data
#####################

ignore <- c("income")

dat1 <- maml.mapInputPort(1)
dat.full <- dat1[,-which(names(dat1) %in% ignore)]

dat2 <- maml.mapInputPort(2)

vars.dummy <- names(dat.full)
vars.orig <- names(dat2[,-which(names(dat2) %in% ignore)])

temp <- dat.full[,1]
dat.int <- data.frame(temp)

################################################
## Loop through all possible combinations
################################################

for(i in 1:(length(vars.dummy) - 1)){
    ## Start at i + 1 so each pair of columns is combined only once
    for(j in (i + 1):length(vars.dummy)){

        var1 <- vars.dummy[i]
        var2 <- vars.dummy[j]
        
        base1 <- substr(var1, 1, regexpr("-", var1) - 1)
        base2 <- substr(var2, 1, regexpr("-", var2) - 1)
        
        if( base1 != base2 ){
            val1 <- dat.full[,which(names(dat.full) %in% var1)]
            val2 <- dat.full[,which(names(dat.full) %in% var2)]
            dat.int[,length(dat.int) + 1] <- val1 * val2
            names(dat.int)[length(dat.int)] <- paste(var1, " * ", var2)
        }
    }
}

###################
## Output Data
###################

dat.out <- data.frame(dat1, dat.int[,-1])
maml.mapOutputPort("dat.out");

<R CODE END>
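For readers who prefer Python, the pairwise-interaction idea in the R script above can be sketched with pandas.  The column names below are hypothetical; we assume dummy columns are named "base-level" so that dummies from the same original variable share a base name, mirroring the substr/regexpr check in the R code.

```python
# Sketch of the pairwise-interaction idea from the R script above, in pandas.
# Column names are hypothetical stand-ins for indicator (dummy) columns.
from itertools import combinations

import pandas as pd

dat = pd.DataFrame({
    "color-red":  [1, 0, 1],
    "color-blue": [0, 1, 0],
    "size-small": [1, 1, 0],
})

def base(col):
    # Everything before the first "-" is the original variable's name
    return col.split("-", 1)[0]

interactions = {}
for var1, var2 in combinations(dat.columns, 2):
    # Skip pairs of dummies that came from the same original variable
    if base(var1) != base(var2):
        interactions[f"{var1} * {var2}"] = dat[var1] * dat[var2]

dat_out = pd.concat([dat, pd.DataFrame(interactions)], axis=1)
print(list(dat_out.columns))
```

Using itertools.combinations generates each unordered pair exactly once, which is the same effect as starting the inner loop at i + 1 in the R version.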

SQL Code for Combining Tune Model Hyperparameters Results

<SQL CODE 1 START>

SELECT
'Two-Class Locally Deep Support Vector Machine - Binning' AS [Model Type]
,'LD-SVM Tree Depth' AS [Par 1 Name]
,[LD-SVM Tree Depth] AS [Par 1 Value]
,'Lambda W' AS [Par 2 Name]
,[Lambda W] AS [Par 2 Value]
,'Lambda Theta' AS [Par 3 Name]
,[Lambda Theta] AS [Par 3 Value]
,'Lambda Theta Prime' AS [Par 4 Name]
,[Lambda Theta Prime] AS [Par 4 Value]
,'Sigma' AS [Par 5 Name]
,[Sigma] AS [Par 5 Value]
,'Num Iterations' AS [Par 6 Name]
,[Num Iterations] AS [Par 6 Value]
,'None' AS [Par 7 Name]
,'' AS [Par 7 Value]
,[Accuracy]
,[Precision]
,[Recall]
,[F-Score]
,[AUC]
,[Average Log Loss]
,[Training Log Loss]
,[Precision] * [Recall] AS [Precision * Recall]
FROM t1
UNION ALL
SELECT
'Two-Class Neural Network - Binning' AS [Model Type]
,'Learning rate' AS [Par 1 Name]
,[Learning rate] AS [Par 1 Value]
,'None' AS [Par 2 Name]
,0 AS [Par 2 Value]
,'Number of iterations' AS [Par 3 Name]
,[Number of iterations] AS [Par 3 Value]
,'None' AS [Par 4 Name]
,0 AS [Par 4 Value]
,'None' AS [Par 5 Name]
,0 AS [Par 5 Value]
,'None' AS [Par 6 Name]
,0 AS [Par 6 Value]
,'LossFunction' AS [Par 7 Name]
,[LossFunction] AS [Par 7 Value]
,[Accuracy]
,[Precision]
,[Recall]
,[F-Score]
,[AUC]
,[Average Log Loss]
,[Training Log Loss]
,[Precision] * [Recall] AS [Precision * Recall]
FROM t2
UNION ALL
SELECT
'Two-Class Decision Jungle - Replicate' AS [Model Type]
,'Number of optimization steps per decision DAG layer' AS [Par 1 Name]
,[Number of optimization steps per decision DAG layer] AS [Par 1 Value]
,'Maximum width of the decision DAGs' AS [Par 2 Name]
,[Maximum width of the decision DAGs] AS [Par 2 Value]
,'Maximum depth of the decision DAGs' AS [Par 3 Name]
,[Maximum depth of the decision DAGs] AS [Par 3 Value]
,'Number of decision DAGs' AS [Par 4 Name]
,[Number of decision DAGs] AS [Par 4 Value]
,'None' AS [Par 5 Name]
,0 AS [Par 5 Value]
,'None' AS [Par 6 Name]
,0 AS [Par 6 Value]
,'None' AS [Par 7 Name]
,'' AS [Par 7 Value]
,[Accuracy]
,[Precision]
,[Recall]
,[F-Score]
,[AUC]
,[Average Log Loss]
,[Training Log Loss]
,[Precision] * [Recall] AS [Precision * Recall]
FROM t3

<SQL CODE 1 END>


<SQL CODE 2 START>

SELECT
'Two-Class Locally Deep Support Vector Machine - Gaussian' AS [Model Type]
,'LD-SVM Tree Depth' AS [Par 1 Name]
,[LD-SVM Tree Depth] AS [Par 1 Value]
,'Lambda W' AS [Par 2 Name]
,[Lambda W] AS [Par 2 Value]
,'Lambda Theta' AS [Par 3 Name]
,[Lambda Theta] AS [Par 3 Value]
,'Lambda Theta Prime' AS [Par 4 Name]
,[Lambda Theta Prime] AS [Par 4 Value]
,'Sigma' AS [Par 5 Name]
,[Sigma] AS [Par 5 Value]
,'Num Iterations' AS [Par 6 Name]
,[Num Iterations] AS [Par 6 Value]
,'None' AS [Par 7 Name]
,'' AS [Par 7 Value]
,[Accuracy]
,[Precision]
,[Recall]
,[F-Score]
,[AUC]
,[Average Log Loss]
,[Training Log Loss]
,[Precision] * [Recall] AS [Precision * Recall]
FROM t1
UNION ALL
SELECT
'Two-Class Neural Network - Gaussian' AS [Model Type]
,'Learning rate' AS [Par 1 Name]
,[Learning rate] AS [Par 1 Value]
,'None' AS [Par 2 Name]
,0 AS [Par 2 Value]
,'Number of iterations' AS [Par 3 Name]
,[Number of iterations] AS [Par 3 Value]
,'None' AS [Par 4 Name]
,0 AS [Par 4 Value]
,'None' AS [Par 5 Name]
,0 AS [Par 5 Value]
,'None' AS [Par 6 Name]
,0 AS [Par 6 Value]
,'LossFunction' AS [Par 7 Name]
,[LossFunction] AS [Par 7 Value]
,[Accuracy]
,[Precision]
,[Recall]
,[F-Score]
,[AUC]
,[Average Log Loss]
,[Training Log Loss]
,[Precision] * [Recall] AS [Precision * Recall]
FROM t2
UNION ALL
SELECT
'Two-Class Decision Jungle - Bagging' AS [Model Type]
,'Number of optimization steps per decision DAG layer' AS [Par 1 Name]
,[Number of optimization steps per decision DAG layer] AS [Par 1 Value]
,'Maximum width of the decision DAGs' AS [Par 2 Name]
,[Maximum width of the decision DAGs] AS [Par 2 Value]
,'Maximum depth of the decision DAGs' AS [Par 3 Name]
,[Maximum depth of the decision DAGs] AS [Par 3 Value]
,'Number of decision DAGs' AS [Par 4 Name]
,[Number of decision DAGs] AS [Par 4 Value]
,'None' AS [Par 5 Name]
,0 AS [Par 5 Value]
,'None' AS [Par 6 Name]
,0 AS [Par 6 Value]
,'None' AS [Par 7 Name]
,'' AS [Par 7 Value]
,[Accuracy]
,[Precision]
,[Recall]
,[F-Score]
,[AUC]
,[Average Log Loss]
,[Training Log Loss]
,[Precision] * [Recall] AS [Precision * Recall]
FROM t3

<SQL CODE 2 END>


<SQL CODE 3 START>

SELECT
'Two-Class Locally Deep Support Vector Machine - Min-Max' AS [Model Type]
,'LD-SVM Tree Depth' AS [Par 1 Name]
,[LD-SVM Tree Depth] AS [Par 1 Value]
,'Lambda W' AS [Par 2 Name]
,[Lambda W] AS [Par 2 Value]
,'Lambda Theta' AS [Par 3 Name]
,[Lambda Theta] AS [Par 3 Value]
,'Lambda Theta Prime' AS [Par 4 Name]
,[Lambda Theta Prime] AS [Par 4 Value]
,'Sigma' AS [Par 5 Name]
,[Sigma] AS [Par 5 Value]
,'Num Iterations' AS [Par 6 Name]
,[Num Iterations] AS [Par 6 Value]
,'None' AS [Par 7 Name]
,'' AS [Par 7 Value]
,[Accuracy]
,[Precision]
,[Recall]
,[F-Score]
,[AUC]
,[Average Log Loss]
,[Training Log Loss]
,[Precision] * [Recall] AS [Precision * Recall]
FROM t1
UNION ALL
SELECT
'Two-Class Neural Network - Min-Max' AS [Model Type]
,'Learning rate' AS [Par 1 Name]
,[Learning rate] AS [Par 1 Value]
,'None' AS [Par 2 Name]
,0 AS [Par 2 Value]
,'Number of iterations' AS [Par 3 Name]
,[Number of iterations] AS [Par 3 Value]
,'None' AS [Par 4 Name]
,0 AS [Par 4 Value]
,'None' AS [Par 5 Name]
,0 AS [Par 5 Value]
,'None' AS [Par 6 Name]
,0 AS [Par 6 Value]
,'LossFunction' AS [Par 7 Name]
,[LossFunction] AS [Par 7 Value]
,[Accuracy]
,[Precision]
,[Recall]
,[F-Score]
,[AUC]
,[Average Log Loss]
,[Training Log Loss]
,[Precision] * [Recall] AS [Precision * Recall]
FROM t2
UNION ALL
SELECT
'Two-Class Boosted Decision Tree' AS [Model Type]
,'Number of leaves' AS [Par 1 Name]
,[Number of leaves] AS [Par 1 Value]
,'Minimum leaf instances' AS [Par 2 Name]
,[Minimum leaf instances] AS [Par 2 Value]
,'Learning rate' AS [Par 3 Name]
,[Learning rate] AS [Par 3 Value]
,'Number of trees' AS [Par 4 Name]
,[Number of trees] AS [Par 4 Value]
,'None' AS [Par 5 Name]
,0 AS [Par 5 Value]
,'None' AS [Par 6 Name]
,0 AS [Par 6 Value]
,'None' AS [Par 7 Name]
,'' AS [Par 7 Value]
,[Accuracy]
,[Precision]
,[Recall]
,[F-Score]
,[AUC]
,[Average Log Loss]
,[Training Log Loss]
,[Precision] * [Recall] AS [Precision * Recall]
FROM t3

<SQL CODE 3 END>


<SQL CODE 4 START>

SELECT
'Two-Class Decision Forest - Replicate' AS [Model Type]
,'Minimum number of samples per leaf node' AS [Par 1 Name]
,[Minimum number of samples per leaf node] AS [Par 1 Value]
,'Number of random splits per node' AS [Par 2 Name]
,[Number of random splits per node] AS [Par 2 Value]
,'Maximum depth of the decision trees' AS [Par 3 Name]
,[Maximum depth of the decision trees] AS [Par 3 Value]
,'Number of decision trees' AS [Par 4 Name]
,[Number of decision trees] AS [Par 4 Value]
,'None' AS [Par 5 Name]
,0 AS [Par 5 Value]
,'None' AS [Par 6 Name]
,0 AS [Par 6 Value]
,'None' AS [Par 7 Name]
,'' AS [Par 7 Value]
,[Accuracy]
,[Precision]
,[Recall]
,[F-Score]
,[AUC]
,[Average Log Loss]
,[Training Log Loss]
,[Precision] * [Recall] AS [Precision * Recall]
FROM t1
UNION ALL
SELECT
'Two-Class Averaged Perceptron' AS [Model Type]
,'Learning rate' AS [Par 1 Name]
,[Learning rate] AS [Par 1 Value]
,'Maximum number of iterations' AS [Par 2 Name]
,[Maximum number of iterations] AS [Par 2 Value]
,'None' AS [Par 3 Name]
,0 AS [Par 3 Value]
,'None' AS [Par 4 Name]
,0 AS [Par 4 Value]
,'None' AS [Par 5 Name]
,0 AS [Par 5 Value]
,'None' AS [Par 6 Name]
,0 AS [Par 6 Value]
,'None' AS [Par 7 Name]
,'' AS [Par 7 Value]
,[Accuracy]
,[Precision]
,[Recall]
,[F-Score]
,[AUC]
,[Average Log Loss]
,[Training Log Loss]
,[Precision] * [Recall] AS [Precision * Recall]
FROM t2
UNION ALL
SELECT
'Two-Class Support Vector Machine' AS [Model Type]
,'Number of iterations' AS [Par 1 Name]
,[Number of iterations] AS [Par 1 Value]
,'Lambda' AS [Par 2 Name]
,[Lambda] AS [Par 2 Value]
,'None' AS [Par 3 Name]
,0 AS [Par 3 Value]
,'None' AS [Par 4 Name]
,0 AS [Par 4 Value]
,'None' AS [Par 5 Name]
,0 AS [Par 5 Value]
,'None' AS [Par 6 Name]
,0 AS [Par 6 Value]
,'None' AS [Par 7 Name]
,'' AS [Par 7 Value]
,[Accuracy]
,[Precision]
,[Recall]
,[F-Score]
,[AUC]
,[Average Log Loss]
,[Training Log Loss]
,[Precision] * [Recall] AS [Precision * Recall]
FROM t3

<SQL CODE 4 END>


<SQL CODE 5 START>

SELECT
'Two-Class Decision Forest - Bagging' AS [Model Type]
,'Minimum number of samples per leaf node' AS [Par 1 Name]
,[Minimum number of samples per leaf node] AS [Par 1 Value]
,'Number of random splits per node' AS [Par 2 Name]
,[Number of random splits per node] AS [Par 2 Value]
,'Maximum depth of the decision trees' AS [Par 3 Name]
,[Maximum depth of the decision trees] AS [Par 3 Value]
,'Number of decision trees' AS [Par 4 Name]
,[Number of decision trees] AS [Par 4 Value]
,'None' AS [Par 5 Name]
,0 AS [Par 5 Value]
,'None' AS [Par 6 Name]
,0 AS [Par 6 Value]
,'None' AS [Par 7 Name]
,'' AS [Par 7 Value]
,[Accuracy]
,[Precision]
,[Recall]
,[F-Score]
,[AUC]
,[Average Log Loss]
,[Training Log Loss]
,[Precision] * [Recall] AS [Precision * Recall]
FROM t1
UNION ALL
SELECT
'Two-Class Logistic Regression' AS [Model Type]
,'OptimizationTolerance' AS [Par 1 Name]
,[OptimizationTolerance] AS [Par 1 Value]
,'L1Weight' AS [Par 2 Name]
,[L1Weight] AS [Par 2 Value]
,'L2Weight' AS [Par 3 Name]
,[L2Weight] AS [Par 3 Value]
,'MemorySize' AS [Par 4 Name]
,[MemorySize] AS [Par 4 Value]
,'None' AS [Par 5 Name]
,0 AS [Par 5 Value]
,'None' AS [Par 6 Name]
,0 AS [Par 6 Value]
,'None' AS [Par 7 Name]
,'' AS [Par 7 Value]
,[Accuracy]
,[Precision]
,[Recall]
,[F-Score]
,[AUC]
,[Average Log Loss]
,[Training Log Loss]
,[Precision] * [Recall] AS [Precision * Recall]
FROM t2

<SQL CODE 5 END>


<SQL CODE 6 START>

SELECT * FROM t1
UNION ALL
SELECT * FROM t2
UNION ALL
SELECT * FROM t3

<SQL CODE 6 END>


<SQL CODE 7 START>

SELECT * FROM t1
UNION ALL
SELECT * FROM t2

<SQL CODE 7 END>


<SQL CODE 8 START>

SELECT * FROM t1
UNION ALL
SELECT * FROM t2

<SQL CODE 8 END>

<SQL CODE 9 START>

SELECT * FROM t1
ORDER BY [AUC] DESC

<SQL CODE 9 END>
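If you'd rather assemble these result sets outside of SQL, the UNION ALL plus ORDER BY pattern above maps directly onto pandas.  The frames below are stand-ins for t1, t2 and t3 with made-up values.

```python
# Pandas equivalent of the UNION ALL + ORDER BY pattern above: stack the
# per-model result sets and rank them by AUC, highest first.
import pandas as pd

# Stand-ins for t1, t2 and t3 (values here are made up for illustration)
t1 = pd.DataFrame({"Model Type": ["LD-SVM"], "AUC": [0.91]})
t2 = pd.DataFrame({"Model Type": ["Neural Network"], "AUC": [0.95]})
t3 = pd.DataFrame({"Model Type": ["Decision Jungle"], "AUC": [0.88]})

combined = pd.concat([t1, t2, t3], ignore_index=True)   # UNION ALL
ranked = combined.sort_values("AUC", ascending=False)   # ORDER BY [AUC] DESC
print(ranked.iloc[0]["Model Type"])
```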


Monday, January 8, 2018

Azure Machine Learning Workbench: Getting Started

Today, we're going to take a look at one of the newest Data Science offerings from Microsoft.  Of course, we're talking about the Azure Machine Learning (AML) Workbench!  Join us as we dive in and see what this new tool is all about.

Before we install the AML Workbench, let's talk about what it is.  The AML Workbench is a local environment for developing data science solutions that can be easily deployed and managed using Microsoft Azure.  It doesn't appear to be related to AML Studio in any way.  Throughout this series, we'll walk through all of the different things we can do with the AML Workbench.  For today, we're just going to get our feet wet.

Now, we need to create an Azure Machine Learning Experimentation resource in the Azure portal.  You can find complete instructions here.  We will also include a Workspace and a Model Management Account.  This appears to be free for the first two users.  However, we're not sure whether they charge separately for the storage account.  Maybe someone can let us know in the comments.  Now, let's boot this baby up!
Azure Machine Learning Workbench
New Project
In the top-left corner, we can see the Workspace we created in the Azure portal.  Let's add a new Project to this.
Create New Project
Now, we have to add the details for our new project.  Strangely, the project name can't include spaces.  We felt like we were past the point where names had to be simple, but maybe it's a Git thing.  Either way, we'll call our new project "Classifying_Iris" and use the "Classifying Iris" template at the bottom of the screen.  Let's see what's inside this project.
Project Dashboard
The first thing we see is the Project Dashboard.  This is a great place to create (or read) quality documentation on exactly what the project does, links to external resources, etc.
iris_sklearn
Following the QuickStart instructions, we were able to run the "iris_sklearn.py" code.  Unfortunately, it's not immediately obvious what this does.  Fortunately, the Exploring Results section tells us to check the Run History.  We can find this icon on the left side of the screen.
Run History
iris_sklearn Run History
This is pretty cool stuff actually.  This view lets us see how long our code took to run, as well as which parameters were passed in.  This would be extremely helpful if we were running repeated experiments.  In our case, it doesn't show much though.
Job History
If we click on the Job Name in the Jobs section on the right side of the screen, we can see a more detailed result set.
Run Properties
This is what we were looking for!  This gives us all kinds of information about the run.  This could be extremely useful for showing the results of an experiment to bosses or colleagues.
Logs
Further down the page, we see the Logs section.  This is where we can access all the granular information we would need if we needed to debug a particular issue.

The next section of the instructions is the Quick CLI Reference.  This gives us a bunch of code we can use to run these scripts from the Command Line (or Powershell).  Let's open a new command line window.
Open Command Prompt
In the top-left corner of the window, we can select "Open Command Prompt" from the "File" menu.
Command Prompt
In the command prompt, we can copy the first line of code from the instructions.

pip install matplotlib
This code will install the Python library "matplotlib".  This library contains quite a few functions for creating graphs in Python.  You can read more about it here.  Now that we have the library installed, let's copy the next line of code.
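As a quick aside, we can verify that the install worked by rendering a small plot, for instance a confusion matrix like the ones our experiments produce (the values below are made up).  The Agg backend renders straight to a file, so no display is needed.

```python
# Quick smoke test for the matplotlib install: plot a small confusion matrix
# similar to the ones the experiment produces (values here are made up).
import matplotlib
matplotlib.use("Agg")          # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

cm = np.array([[50, 0, 0],
               [1, 37, 12],
               [0, 4, 46]])

fig, ax = plt.subplots()
im = ax.imshow(cm, cmap="Blues")
for (i, j), value in np.ndenumerate(cm):
    ax.text(j, i, str(value), ha="center", va="center")
ax.set_xlabel("Predicted class")
ax.set_ylabel("True class")
fig.colorbar(im)
fig.savefig("confusion_matrix.png")
```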

az login
This command logs the Azure Command Line Interface (CLI) into our Azure account.  When we run this command, we get the following response.
To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code ######### to authenticate.
When we follow the instructions, we can log into our Azure subscription.
Azure Login
The next piece of code we need to run is as follows.

python run.py
This piece of code will run the "run.py" script from our project.  We'll look at this script in a later post.  For now, let's see the output from this script.  Please note that the "run.py" script is iterative and creates a large amount of output.  You can skip to the OUTPUT END header if you don't want to see the output.
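Judging by the run history below, "run.py" appears to submit "iris_sklearn.py" repeatedly, halving the regularization rate each time (10.0, 5.0, 2.5, ...).  Here's a hypothetical sketch of that loop; the rate-passing convention and the real script's details are assumptions on our part.

```python
# Hypothetical sketch of a run.py-style driver, inferred from the output
# below: it submits iris_sklearn.py once per regularization rate, halving
# the rate each time.  The real template script may differ.
import subprocess

rates = []
rate = 10.0
while rate > 0.005:          # 10.0, 5.0, 2.5, ... matches the run history
    rates.append(rate)
    rate = rate / 2.0

for r in rates:
    command = ["az", "ml", "experiment", "submit",
               "-c", "local", "iris_sklearn.py", str(r)]
    print(" ".join(command))
    # subprocess.run(command)   # uncomment to actually submit the runs
```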

OUTPUT BEGIN

RunId: Classifying_Iris_1509457170414

Executing user inputs .....
===========================

Python version: 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Iris dataset shape: (150, 5)
Regularization rate is 10.0
LogisticRegression(C=0.1, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
Accuracy is 0.6415094339622641

==========================================
Serialize and deserialize using the outputs folder.

Export the model to model.pkl
Import the model from model.pkl
New sample: [[3.0, 3.6, 1.3, 0.25]]
Predicted class is ['Iris-setosa']
Plotting confusion matrix...
Confusion matrix in text:
[[50  0  0]
 [ 0 31 19]
 [ 0  4 46]]
Confusion matrix plotted.
Plotting ROC curve....
ROC curve plotted.
Confusion matrix and ROC curve plotted. See them in Run History details page.

Execution Details
=================
RunId: Classifying_Iris_1509457170414

RunId: Classifying_Iris_1509457188739

Executing user inputs .....
===========================

Python version: 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Iris dataset shape: (150, 5)
Regularization rate is 5.0
LogisticRegression(C=0.2, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
Accuracy is 0.6415094339622641

==========================================
Serialize and deserialize using the outputs folder.

Export the model to model.pkl
Import the model from model.pkl
New sample: [[3.0, 3.6, 1.3, 0.25]]
Predicted class is ['Iris-setosa']
Plotting confusion matrix...
Confusion matrix in text:
[[50  0  0]
 [ 0 32 18]
 [ 0  4 46]]
Confusion matrix plotted.
Plotting ROC curve....
ROC curve plotted.
Confusion matrix and ROC curve plotted. See them in Run History details page.

Execution Details
=================
RunId: Classifying_Iris_1509457188739

RunId: Classifying_Iris_1509457195895

Executing user inputs .....
===========================

Python version: 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Iris dataset shape: (150, 5)
Regularization rate is 2.5
LogisticRegression(C=0.4, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
Accuracy is 0.660377358490566

==========================================
Serialize and deserialize using the outputs folder.

Export the model to model.pkl
Import the model from model.pkl
New sample: [[3.0, 3.6, 1.3, 0.25]]
Predicted class is ['Iris-setosa']
Plotting confusion matrix...
Confusion matrix in text:
[[50  0  0]
 [ 0 33 17]
 [ 0  4 46]]
Confusion matrix plotted.
Plotting ROC curve....
ROC curve plotted.
Confusion matrix and ROC curve plotted. See them in Run History details page.

Execution Details
=================
RunId: Classifying_Iris_1509457195895

RunId: Classifying_Iris_1509457203051

Executing user inputs .....
===========================

Python version: 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Iris dataset shape: (150, 5)
Regularization rate is 1.25
LogisticRegression(C=0.8, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
Accuracy is 0.6415094339622641

==========================================
Serialize and deserialize using the outputs folder.

Export the model to model.pkl
Import the model from model.pkl
New sample: [[3.0, 3.6, 1.3, 0.25]]
Predicted class is ['Iris-setosa']
Plotting confusion matrix...
Confusion matrix in text:
[[50  0  0]
 [ 1 33 16]
 [ 0  5 45]]
Confusion matrix plotted.
Plotting ROC curve....
ROC curve plotted.
Confusion matrix and ROC curve plotted. See them in Run History details page.

Execution Details
=================
RunId: Classifying_Iris_1509457203051

RunId: Classifying_Iris_1509457210237

Executing user inputs .....
===========================

Python version: 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Iris dataset shape: (150, 5)
Regularization rate is 0.625
LogisticRegression(C=1.6, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
Accuracy is 0.660377358490566

==========================================
Serialize and deserialize using the outputs folder.

Export the model to model.pkl
Import the model from model.pkl
New sample: [[3.0, 3.6, 1.3, 0.25]]
Predicted class is ['Iris-setosa']
Plotting confusion matrix...
Confusion matrix in text:
[[50  0  0]
 [ 1 36 13]
 [ 0  5 45]]
Confusion matrix plotted.
Plotting ROC curve....
ROC curve plotted.
Confusion matrix and ROC curve plotted. See them in Run History details page.

Execution Details
=================
RunId: Classifying_Iris_1509457210237

RunId: Classifying_Iris_1509457217482

Executing user inputs .....
===========================

Python version: 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Iris dataset shape: (150, 5)
Regularization rate is 0.3125
LogisticRegression(C=3.2, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
Accuracy is 0.660377358490566

==========================================
Serialize and deserialize using the outputs folder.

Export the model to model.pkl
Import the model from model.pkl
New sample: [[3.0, 3.6, 1.3, 0.25]]
Predicted class is ['Iris-setosa']
Plotting confusion matrix...
Confusion matrix in text:
[[50  0  0]
 [ 1 36 13]
 [ 0  4 46]]
Confusion matrix plotted.
Plotting ROC curve....
ROC curve plotted.
Confusion matrix and ROC curve plotted. See them in Run History details page.

Execution Details
=================
RunId: Classifying_Iris_1509457217482

RunId: Classifying_Iris_1509457225704

Executing user inputs .....
===========================

Python version: 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Iris dataset shape: (150, 5)
Regularization rate is 0.15625
LogisticRegression(C=6.4, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
Accuracy is 0.6792452830188679

==========================================
Serialize and deserialize using the outputs folder.

Export the model to model.pkl
Import the model from model.pkl
New sample: [[3.0, 3.6, 1.3, 0.25]]
Predicted class is ['Iris-setosa']
Plotting confusion matrix...
Confusion matrix in text:
[[50  0  0]
 [ 1 36 13]
 [ 0  3 47]]
Confusion matrix plotted.
Plotting ROC curve....
ROC curve plotted.
Confusion matrix and ROC curve plotted. See them in Run History details page.

Execution Details
=================
RunId: Classifying_Iris_1509457225704

RunId: Classifying_Iris_1509457234132

Executing user inputs .....
===========================

Python version: 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Iris dataset shape: (150, 5)
Regularization rate is 0.078125
LogisticRegression(C=12.8, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
Accuracy is 0.6792452830188679

==========================================
Serialize and deserialize using the outputs folder.

Export the model to model.pkl
Import the model from model.pkl
New sample: [[3.0, 3.6, 1.3, 0.25]]
Predicted class is ['Iris-setosa']
Plotting confusion matrix...
Confusion matrix in text:
[[50  0  0]
 [ 1 36 13]
 [ 0  3 47]]
Confusion matrix plotted.
Plotting ROC curve....
ROC curve plotted.
Confusion matrix and ROC curve plotted. See them in Run History details page.

Execution Details
=================
RunId: Classifying_Iris_1509457234132

RunId: Classifying_Iris_1509457242301

Executing user inputs .....
===========================

Python version: 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Iris dataset shape: (150, 5)
Regularization rate is 0.0390625
LogisticRegression(C=25.6, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
Accuracy is 0.6981132075471698

==========================================
Serialize and deserialize using the outputs folder.

Export the model to model.pkl
Import the model from model.pkl
New sample: [[3.0, 3.6, 1.3, 0.25]]
Predicted class is ['Iris-setosa']
Plotting confusion matrix...
Confusion matrix in text:
[[50  0  0]
 [ 1 37 12]
 [ 0  3 47]]
Confusion matrix plotted.
Plotting ROC curve....
ROC curve plotted.
Confusion matrix and ROC curve plotted. See them in Run History details page.

Execution Details
=================
RunId: Classifying_Iris_1509457242301

RunId: Classifying_Iris_1509457249742

Executing user inputs .....
===========================

Python version: 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Iris dataset shape: (150, 5)
Regularization rate is 0.01953125
LogisticRegression(C=51.2, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
Accuracy is 0.6981132075471698

==========================================
Serialize and deserialize using the outputs folder.

Export the model to model.pkl
Import the model from model.pkl
New sample: [[3.0, 3.6, 1.3, 0.25]]
Predicted class is ['Iris-setosa']
Plotting confusion matrix...
Confusion matrix in text:
[[50  0  0]
 [ 1 37 12]
 [ 0  3 47]]
Confusion matrix plotted.
Plotting ROC curve....
ROC curve plotted.
Confusion matrix and ROC curve plotted. See them in Run History details page.

Execution Details
=================
RunId: Classifying_Iris_1509457249742

RunId: Classifying_Iris_1509457257076

Executing user inputs .....
===========================

Python version: 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Iris dataset shape: (150, 5)
Regularization rate is 0.009765625
LogisticRegression(C=102.4, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
Accuracy is 0.6792452830188679

==========================================
Serialize and deserialize using the outputs folder.

Export the model to model.pkl
Import the model from model.pkl
New sample: [[3.0, 3.6, 1.3, 0.25]]
Predicted class is ['Iris-setosa']
Plotting confusion matrix...
Confusion matrix in text:
[[50  0  0]
 [ 1 37 12]
 [ 0  4 46]]
Confusion matrix plotted.
Plotting ROC curve....
ROC curve plotted.
Confusion matrix and ROC curve plotted. See them in Run History details page.

Execution Details
=================
RunId: Classifying_Iris_1509457257076

OUTPUT END
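Before moving on, notice the pattern in the output above: the regularization rate is halved on each run (0.15625, 0.078125, 0.0390625, ...), so C = 1/rate doubles from 6.4 up to 102.4. We haven't opened the driver code yet, so the following is only a sketch of how a sweep like this might be submitted from Python; the assumption that "iris_sklearn.py" accepts the rate as a command-line argument is ours, not confirmed by the output.

```python
import subprocess

# Hypothetical sweep driver: halve the regularization rate on each run.
# Assumption: iris_sklearn.py reads the rate from its command-line arguments.
reg_rate = 0.15625
for _ in range(5):
    cmd = ["az", "ml", "experiment", "submit", "-c", "local",
           "iris_sklearn.py", str(reg_rate)]
    print(" ".join(cmd))   # or actually submit: subprocess.run(cmd, check=True)
    reg_rate /= 2          # C = 1 / reg_rate doubles on each iteration
```

Each submission shows up as its own run (with its own RunId) in the output above, which is why we see five nearly identical blocks with different regularization rates.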

As we said before, we'll dig deeper into this code in a later post.  For now, let's take another look at the run history.
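One detail worth previewing from the output: every run serializes the trained model to model.pkl, reloads it, and scores a new sample.  Here's a minimal sketch of that round trip, assuming scikit-learn and the standard pickle module; we haven't walked through the actual script yet, so treat this as an illustration rather than the script's exact code (for one thing, load_iris uses integer class labels, while the script prints string labels like 'Iris-setosa').

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a simple model, mirroring what the output suggests iris_sklearn.py does.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(C=6.4, solver="liblinear")
clf.fit(X, y)

# Export the model to model.pkl, then import it back.
with open("model.pkl", "wb") as f:
    pickle.dump(clf, f)
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

# Score the same new sample shown in the output; class 0 is Iris-setosa.
print(restored.predict([[3.0, 3.6, 1.3, 0.25]]))
```

Writing the pickle into the outputs folder (as the script does) is what lets AML Workbench attach the trained model to each run in the run history.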

Run History 2
Now, we can see all of the runs that just took place.  This gives us an easy visual overview of what our code accomplished across the sweep.

This seems like a good place to stop for today.  At first glance, the AML Workbench is much more developer-oriented than its Studio counterpart.  There's a ton of information here, and it's going to take some more time for us to get comfortable with it.  Stay tuned for the next post, where we'll dig into the Python code behind these runs.  Thanks for reading.  We hope you found this informative.

Brad Llewellyn
Data Science Consultant
Valorem
@BreakingBI
www.linkedin.com/in/bradllewellyn
llewellyn.wb@gmail.com