
Advanced analytics with Python and Tableau 10.1 integration

Category: Development (Python) | Posted by: 店小二03 | Date: 2017 | Author: 红领巾

After introducing R support in Tableau 8.1, Tableau 10.1 now also supports Python. This is great news, especially for data scientists who use reports to visualize the results of more sophisticated analytical processes. Such reports can now bring the analytics much closer to end users while preserving the same level of user-friendliness.



In this post I use a simple modelling example to describe exactly how the integration of Tableau and Python works.

Technical Setting & Basic Functionalities

While the R integration uses Rserve, and you only need a running Rserve session to enable a connection from Tableau, the Python integration requires you to install and set up TabPy Server (installation instructions can be found in the Tableau GitHub repository). The setup guide covers installing TabPy, installing Python 2.7 with Anaconda, and connecting Tableau.

In terms of functionality, the Python integration is very similar to the R integration as far as script calls from Tableau are concerned, but because TabPy-client also lets you deploy endpoints, it is far more powerful. Before going through a few examples of how to use Python in Tableau, a quick remark: it is not possible to use the Python integration and the R integration at the same time. There is only one connector, so if you want to use both, you need to connect directly from R to Python or vice versa (for example using the rpy2 package). Furthermore, Tableau Public does not currently support TabPy.

Let’s go through a few examples now, which will cover:

- passing data to Python
- fitting a scikit-learn model
- using a fitted model to predict
- saving and loading a model
- passing user-defined parameters to Python

We will use the iris dataset that is already included in scikit-learn and create a model using the Naive Bayes estimator. The dataset contains 5 columns (sepal width, sepal length, petal width, petal length and the category). First, let’s visualize the iris dataset using only two of the four attributes, with color encoding the category (iris type).
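To get the data into Tableau, one option is to export it from scikit-learn as a CSV file. The following is a minimal sketch of that step. The column names and the helper `to_csv_text` are illustrative choices, not something prescribed by Tableau or TabPy.

```python
import csv
import io

from sklearn.datasets import load_iris

# Load the iris data bundled with scikit-learn: 150 rows, 4 measurements each.
iris = load_iris()

# Column names are illustrative; they mirror the field names used in this post.
header = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width", "Category"]

# scikit-learn orders the features as sepal length, sepal width,
# petal length, petal width; the target is an integer class 0, 1 or 2.
rows = [list(map(float, x)) + [int(y)] for x, y in zip(iris.data, iris.target)]


def to_csv_text(header, rows):
    """Serialise the rows to CSV text that Tableau can read as a text file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(header, rows)
```

Writing `csv_text` to a file of your choice gives a data source with one row per flower, which is the row-level granularity the calculations below rely on.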



Now the data is ready and we can start calling Python. However, for the calculations to be done for each individual row in the dataset, we have to make sure that we are not working with aggregated measures in Tableau.



To use Python we have to create a new calculated field and call SCRIPT_XX, where XX defines the return data type. The available options are BOOL, INT, REAL and STR. There are some rules you have to consider when calling Python:

- Only one calculated field can be returned from a calculation. If we want multiple values returned, we need to build a delimited string and define other calculated fields to access the desired content.
- We have to return the same number of records as the number of input rows (i.e. if the calculation was executed for all 20 records at once, the return vector must contain 20 elements).
- Python script calls are table calculations, so be careful which dimension is used for the calculation, because Tableau makes a separate call to TabPy for each partition.

Model creation

For now let us create a Naive Bayes model from the input data and predict the same data using the fitted model.

SCRIPT_REAL("
import numpy as np
from sklearn.naive_bayes import GaussianNB
# create the model
model = GaussianNB()
# transform input data
data_x = np.transpose(np.array([_arg1, _arg2, _arg3, _arg4]))
data_y = np.array(_arg5)
# fit the model
model.fit(data_x, data_y)
# predict the category for input data
predicted_category = model.predict(data_x)
# transform output
return list(np.round(predicted_category, decimals=2))
", ATTR([Petal Length]),
ATTR([Petal Width]),
ATTR([Sepal Length]),
ATTR([Sepal Width]),
ATTR([Category]))

_argX refers to the individual input arguments (columns from the Tableau sheet). In this example all of the input arguments are vectors. We have to wrap them in ATTR(), because SCRIPT_XX requires some sort of aggregation function even though we are not working with aggregated data. Also, for a call to Python to succeed, the script must contain a return statement.
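To try the script body outside Tableau, it can help to wrap it in an ordinary function whose parameters play the role of _arg1 to _arg5. This is only a local stand-in under the assumption that each argument arrives as a list with one entry per row. The function name `script_body` and the six sample rows are made up for the illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def script_body(_arg1, _arg2, _arg3, _arg4, _arg5):
    """Stand-in for the text inside SCRIPT_REAL; each _argN is a list."""
    data_x = np.transpose(np.array([_arg1, _arg2, _arg3, _arg4]))
    data_y = np.array(_arg5)
    model = GaussianNB().fit(data_x, data_y)
    predicted = model.predict(data_x)
    # Return one plain Python value per input row.
    return [float(v) for v in predicted]


# Six made-up rows (two per category), passed column by column.
petal_length = [1.4, 1.3, 4.7, 4.5, 6.0, 5.9]
petal_width = [0.2, 0.3, 1.4, 1.5, 2.5, 2.1]
sepal_length = [5.1, 4.9, 7.0, 6.4, 6.3, 7.1]
sepal_width = [3.5, 3.0, 3.2, 3.1, 3.3, 3.0]
category = [0, 0, 1, 1, 2, 2]

result = script_body(petal_length, petal_width, sepal_length, sepal_width, category)
```

The returned list has the same length as the input columns, which matches the rule that the number of returned records must equal the number of input rows.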

To visualize the output, we will compare the original categories with the predicted categories from the model.



We can see that the Python script executed successfully and that we misclassified some observations (6 out of 150), which is exactly the same result we get in Python.
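For comparison, the same fit-and-predict step can be run in plain Python. The sketch below uses the iris data bundled with scikit-learn and counts how many training rows the fitted model labels differently from the original category. The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()

# Fit on the full dataset and predict the same rows, as in the Tableau example.
model = GaussianNB().fit(iris.data, iris.target)
predicted = model.predict(iris.data)

# Count the rows where the prediction differs from the true category.
misclassified = int(np.sum(predicted != iris.target))
print("Misclassified %d of %d observations" % (misclassified, len(iris.target)))
```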

Save, load and deploy

For most applications we don’t want to fit the model and predict on the same data in a single call, so we will now save the model and only load it for predictions. This works the same way as it would in pure Python.

import pickle
# fit the model
model.fit(data_x, data_y)
# save the model (binary mode keeps the pickle intact)
with open('C:\\temp\\model', 'wb') as f:
    pickle.dump(model, f)
# load the model
with open('C:\\temp\\model', 'rb') as f:
    model = pickle.load(f)

Remember that for some models it might be more beneficial to save them using joblib.dump and joblib.load as described in the scikit-learn documentation.
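A joblib version of the same save-and-load round trip could look like the sketch below. It assumes the standalone `joblib` package is importable, and it uses a temporary directory and the file name `model.joblib` purely for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
model = GaussianNB().fit(iris.data, iris.target)

# A temporary directory stands in for a fixed location on disk.
model_dir = tempfile.mkdtemp()
model_path = os.path.join(model_dir, "model.joblib")

# Save the fitted model, then load it back as a separate object.
joblib.dump(model, model_path)
restored = joblib.load(model_path)

# The restored model should predict exactly what the original predicts.
same_predictions = bool(
    (restored.predict(iris.data) == model.predict(iris.data)).all()
)
```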

However, this method is only suitable for testing and experimenting. For production use you should deploy functions as described in the Tableau client documentation and define them as endpoints. When you deploy a function that contains a model, Tableau automatically saves the model definition (using pickle) so that it can be used when the function is executed. This should be the preferred way of deploying functions that are exposed to end users. An alternative is to create your own package.
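As a rough sketch of the deployment route, the function below wraps a fitted model and could be published as an endpoint. It assumes the `tabpy_client` package and a TabPy server listening on `http://localhost:9004/`. The endpoint name `predict_category` and the helper `deploy` are made up for this example, so check the client documentation for the exact calls in your version.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
model = GaussianNB().fit(iris.data, iris.target)


def predict_category(petal_length, petal_width, sepal_length, sepal_width):
    """Predict the iris category for columns passed in as lists."""
    # scikit-learn's iris features are ordered sepal length, sepal width,
    # petal length, petal width, so rebuild the rows in that order.
    rows = list(zip(sepal_length, sepal_width, petal_length, petal_width))
    return [int(v) for v in model.predict(rows)]


def deploy(url="http://localhost:9004/"):
    """Publish predict_category as an endpoint (needs a running TabPy server)."""
    import tabpy_client  # assumption: the TabPy client package is installed

    client = tabpy_client.Client(url)
    client.deploy(
        "predict_category",
        predict_category,
        "Predicts the iris category",
        override=True,
    )
```

Calling `deploy()` is left to the reader, since it only works against a live server. The function itself can be tested locally first, for example with `predict_category([1.4], [0.2], [5.1], [3.5])`.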

The purpose of both methods is to simplify calls in calculated fields to:

// with import
SCRIPT_REAL("
# load the function from your own package (custom_package is a placeholder name)
from custom_package import custom_function
# execute function
return custom_function(_arg1)
", ATTR([First Argument]))

// with Tableau deployed function
SCRIPT_REAL("
# query the exposed endpoint
return tabpy.query('endpoint_name', _arg1)['response']
", ATTR([First Argument]))

The calculated field contains a simple function call and you can make changes to the underlying model/function without the need to modify each dashboard (with the exception of changes in input parameters of the function).

Custom parameters

If you want to allow end users to change the input parameters of a function in order to visualize various scenarios (e.g. best case, worst case), you can create standard Tableau parameters and add them to the function call.

SCRIPT_REAL("
# load the function from your own package (custom_package is a placeholder name)
from custom_package import custom_function
# execute function
return custom_function(_arg1, _arg2)
", ATTR([First Argument]),
[Custom Parameter])
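As an illustration only, a custom_function that accepts such a parameter could look like the sketch below. It assumes Tableau hands the parameter over as a list that repeats the chosen value once per row, and the scaling logic is invented for the example.

```python
def custom_function(values, scenario):
    """Hypothetical example: scale every value by a user-chosen factor."""
    # Assumption: the parameter arrives as a list repeating one value per
    # row, so the first element carries the user's choice. A plain scalar
    # is accepted too, which makes the function easier to test locally.
    factor = scenario[0] if isinstance(scenario, (list, tuple)) else scenario
    return [float(v) * float(factor) for v in values]


# Three rows and a parameter value of 1.5 repeated for each row.
result = custom_function([10.0, 20.0, 30.0], [1.5, 1.5, 1.5])
```

Reading only the first element keeps the output at one value per input row, whichever form the parameter takes.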

There are two things you should be careful about when using custom parameters:

The custom parameter is passed to Python as a vec
