TabPy: how do I know what models are available?


TabPy allows deploying functions that can be called from the Tableau side (more technical details are on the Using Deployed Functions documentation page). With that feature, you may build up a repository of models (this is what I call deployed functions in this post) that grows over time. At some point the question arises: how do I get the list of all available models?

There are at least four ways to find out which models are available on a specific TabPy instance. Let’s look at them.

Option 0. Index Page (updated Feb 02, 2020)

Starting with the TabPy v0.9.0 release, you can simply open your TabPy instance's host:port in a browser, and the page will display the deployed models along with some other information.

Option 1. TabPy Logs

TabPy writes logging data to the console, and the same entries are preserved in a log file. The lines you are looking for contain “Load endpoint: ” followed by a model name (endpoint here is just another name for a deployed function). In the screenshot below you can see the models PCA, Sentiment Analysis, ttest, add, and anova being loaded on TabPy startup and becoming available for use in Tableau calculations.
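If the log is long, a few lines of Python can pull the model names out of it. This is a minimal sketch that relies only on the “Load endpoint: ” text described above. It assumes the model name runs to the end of the line and ignores whatever precedes it (timestamps, log levels). The log file path in the usage comment is a placeholder.

```python
import re

# Matches the "Load endpoint: <model name>" text described above.
# Assumption: the model name runs to the end of the line; anything before
# it (timestamps, log levels) is ignored.
LOAD_ENDPOINT = re.compile(r"Load endpoint: (.+?)\s*$")


def models_from_log(lines):
    """Return model names found in "Load endpoint: " lines, in first-seen order."""
    models = []
    for line in lines:
        match = LOAD_ENDPOINT.search(line)
        # Skip duplicates, e.g. when the log covers several TabPy restarts.
        if match and match.group(1) not in models:
            models.append(match.group(1))
    return models


# Usage ("tabpy.log" is a placeholder for your actual log file path):
# with open("tabpy.log", encoding="utf-8") as log:
#     print(models_from_log(log))
```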

Option 2. TabPy State File

On startup, TabPy reports the state file location in its log output (see the example in the screenshot below). The state file is where TabPy, which is built around the Tornado web server, keeps its state.

The log entry shows the location of the state file. In the file itself, the [Query Objects Service Versions] section lists all the deployed models. As you can see, it contains the same models that appear in the TabPy logs.
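Because the state file uses INI-style [section] headers, Python's standard configparser should be able to read that section. This is a minimal sketch under that assumption. It also assumes that each entry in the section is "model name = version", as the section name suggests. The path in the usage comment is a placeholder for the location reported in the log.

```python
import configparser

# Section name as it appears in the state file.
SECTION = "Query Objects Service Versions"


def models_from_state(text):
    """Return {model name: version} from the state file's model section."""
    # Interpolation is disabled so '%' characters elsewhere cannot cause errors;
    # strict=False tolerates duplicate keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep model names as written (default lowercases them)
    parser.read_string(text)
    if not parser.has_section(SECTION):
        return {}
    return dict(parser.items(SECTION))


# Usage (replace the path with the location reported in the TabPy log):
# with open("/path/to/state/file", encoding="utf-8") as f:
#     print(models_from_state(f.read()))
```

Reading the file is fine for a quick check. However, the file format is an implementation detail that may change, which is the argument for using the API instead.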

Option 3. Use the API, Luke

In the Invoking TabPy API with Postman post, I explained how to use Postman to call the TabPy API. Using the API is the best approach for this scenario as well. The implementation may (and will) change in the future (what is logged, how and where; how functions are deployed and preserved; where the state is kept; and so on), but the API hides all those details.

As documented, the /endpoints method returns a list of all deployed models, and the Postman collection file in the TabPy repository includes this method. Simply call it (specifying your TabPy address) and you’ll see something like this:

The returned JSON lists all the deployed models and their properties.
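The same call can be scripted. This minimal sketch uses only the Python standard library. It makes several assumptions:

- TabPy listens on the default port 9004.
- The returned JSON object is keyed by model name, with each value holding that model's properties.
- The "description" property name is an assumption and is read defensively.

```python
import json
from urllib.request import urlopen


def fetch_endpoints(base_url="http://localhost:9004"):
    """GET /endpoints and return the decoded JSON object."""
    with urlopen(base_url.rstrip("/") + "/endpoints") as response:
        return json.loads(response.read().decode("utf-8"))


def model_names(payload):
    """Model names, assuming the JSON object is keyed by model name."""
    return sorted(payload)


def descriptions(payload):
    """Map each model to its description ("" if the property is absent)."""
    return {name: props.get("description", "") for name, props in payload.items()}


# Usage against a running TabPy (address is a placeholder):
# print(model_names(fetch_endpoints("http://localhost:9004")))
```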

Invoking TabPy API with Postman

The TabPy server provides a REST API that can be used outside of Tableau (e.g. for debugging purposes). More details about it can be found here:

At the moment, TabPy only supports API v1, but new versions with added or changed functionality may appear in the future.

Additionally, read Using Python in Tableau Calculations for how parameters and data are passed to TabPy.

You can call the API from the command line using curl or a similar tool, e.g.:

c:\Users\TabPyUser>curl -X POST http://my-tabpy-server:9004/evaluate -d "{\"data\":{\"_arg1\":[1, 2, 3],\"_arg2\":[3, -1, 5]},\"script\":\"return [x + y for x, y in zip(_arg1, _arg2)]\"}"
[4, 1, 8]

But as you can see, this takes a lot of typing, is hard to read, and makes it hard to spot mistakes in the address, headers, request body and so on.

Postman is a tool created specifically for invoking REST APIs, and the TabPy repository includes a Postman collection file with all the supported methods.

Install Postman, download the https://github.com/tableau/TabPy/blob/master/misc/TabPy.postman_collection.json file, load it into Postman, and you have a UI ready for making TabPy calls:

After loading the file, open the TabPy collection and click the method you want to exercise. Replace the {{endpoint}} variable with your TabPy server name and port (or define the variable so you can reuse it) and click the Send button.

Note that some methods are GET and some are POST. For POST methods you can specify the request body as shown in the screenshot above.

If your TabPy instance is configured for secure connections, simply use https:// instead of http:// in the URL.
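Outside Postman, the /evaluate request from the curl example can also be built in Python. Here json.dumps takes care of the quoting that makes the curl version hard to read. This is a sketch using only the standard library. The server address is the placeholder from the curl example, and the helper names are hypothetical. For a secure instance, pass an https:// base URL.

```python
import json
from urllib.request import Request, urlopen


def build_evaluate_request(base_url, script, **args):
    """Build a POST /evaluate request; keyword args become the "data" entries."""
    body = {"data": args, "script": script}
    return Request(
        base_url.rstrip("/") + "/evaluate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def evaluate(base_url, script, **args):
    """Send the request and decode TabPy's JSON response."""
    with urlopen(build_evaluate_request(base_url, script, **args)) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, mirroring the curl example (server address is a placeholder):
# evaluate("http://my-tabpy-server:9004",
#          "return [x + y for x, y in zip(_arg1, _arg2)]",
#          _arg1=[1, 2, 3], _arg2=[3, -1, 5])
```

Against the server from the curl example, this call should produce the same [4, 1, 8] that curl printed.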

If TabPy is configured with authentication (read how to configure TabPy authentication at https://github.com/tableau/TabPy/blob/master/docs/server-config.md#authentication), use the Auth tab of the request as shown in the screenshot below.

Note that TabPy currently supports only the Basic Auth method, as specified on the TabPy Authentication page.
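When calling TabPy from a script rather than Postman, HTTP Basic authentication means sending an Authorization header with the base64-encoded "username:password" pair. This is a minimal sketch, and the credentials and URL in the test are placeholders.

```python
import base64
from urllib.request import Request


def basic_auth_header(username, password):
    """Value for the HTTP Authorization header using the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def authorized_request(url, username, password, **kwargs):
    """Request with the Basic Auth header attached up front."""
    request = Request(url, **kwargs)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

The header is attached preemptively rather than through urllib's HTTPBasicAuthHandler. That handler only sends credentials after a 401 challenge, which depends on how the server responds. Remember that Basic Auth credentials are only encoded, not encrypted, so combine it with https://.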

Out-of-the-box models in TabPy

Did you know that TabPy comes with some ready-to-use data science models, installed in your Python environment as part of the TabPy package?

But first, what are TabPy models? They are simply Python functions “preserved” in TabPy and available for use in Tableau scripts. The Deploying a Function page explains how to deploy a function into TabPy, and the Using Deployed Functions page shows how to use deployed functions in Tableau calculations.

The documentation and examples mentioned above should be enough to get you started creating, deploying and using TabPy models (or deployed functions, if you prefer that term).
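As a rough illustration of what "deploying a function" looks like, here is a small sketch. It assumes the TabPy client class is importable as tabpy.tabpy_tools.client.Client. It also assumes Client.deploy() accepts a name, the function, a description and an override flag, as in the Deploying a Function documentation. The function and model name "add" are illustrative. The client is passed in as a parameter so the function itself can be tried without a server.

```python
def add(x, y):
    """Element-wise sum of two equal-length lists (e.g. _arg1 and _arg2)."""
    return [a + b for a, b in zip(x, y)]


def deploy_add(client):
    """Deploy `add` as the model "add" through a TabPy client.

    Assumption: `client` behaves like tabpy.tabpy_tools.client.Client, whose
    deploy() takes a name, the function, a description and an override flag
    (override=True replaces an existing model with the same name).
    """
    client.deploy("add", add, "Adds two lists element-wise", override=True)


# Usage against a running TabPy (address is a placeholder):
# from tabpy.tabpy_tools.client import Client
# deploy_add(Client("http://localhost:9004/"))
```

Once the model is deployed, a Tableau calculation following the pattern on the Using Deployed Functions page would call it. An example is SCRIPT_REAL("return tabpy.query('add', _arg1, _arg2)['response']", SUM([Sales]), SUM([Profit])), where the field names are illustrative.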

As I mentioned above, TabPy ships with some models that only need to be deployed. Deploying them is as easy as running the tabpy_deploy_models command in your terminal window after installing the TabPy package. All the models are deployed at once. Remember that TabPy must be running for the models to be deployed.

The following models are available at the time of writing:

  • Principal Component Analysis (PCA).
  • Sentiment Analysis.
  • T-Test.
  • Analysis of Variance (ANOVA).

An explanation of each model and how to invoke it in Tableau calculations can be found on the Predeployed Functions page.

Python packages: tabpy, tabpy_server and tabpy_tools

TabPy is an open-source Python web server used to extend Tableau calculations or Tableau Prep data processing with Python scripts. You can read more about using TabPy with Tableau on the Using Python in Tableau Calculations page, in the Tableau blog post Building advanced analytics applications with TabPy, or in the Building Data Science Applications with TabPy video tutorial.

When you first start using TabPy and search for information on how to install, configure and use it, you may find many articles and blog posts that are outdated and somewhat contradict each other. You will find people explaining how to install and use TabPy, deploy models to it and use its other features while mentioning tabpy, tabpy_server, tabpy_tools and maybe even some other buzzwords. So what are those, and what are the actual steps?

First, the most recent, up-to-date installation steps can be found on the project GitHub page TabPy Installation Instructions. As you can see there, installing TabPy is as simple as running one command: pip install tabpy. This installs the latest approved TabPy package into your Python environment. To run it, simply execute the tabpy command.

Now, what are the tabpy_server and tabpy_client packages you may find mentioned? They are old versions of TabPy from when it was split into two packages. Neither is recommended anymore, and if you have them installed, you should uninstall them.
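To check whether any of those legacy packages are still in your environment, you can compare the installed distribution names against them. This is a small sketch using importlib.metadata from the standard library. It assumes the legacy packages were published on PyPI as tabpy-server and tabpy-client. Names are normalized so that tabpy_server and tabpy-server compare equal.

```python
import re
from importlib.metadata import distributions

# Legacy distributions discussed above, in PEP 503 normalized form.
# Assumption: they were published on PyPI as tabpy-server and tabpy-client.
OBSOLETE = {"tabpy-server", "tabpy-client"}


def normalize(name):
    """PEP 503 name normalization: tabpy_server and TabPy-Server compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_obsolete(names):
    """Return the obsolete TabPy distributions present in `names`, sorted."""
    return sorted({normalize(n) for n in names} & OBSOLETE)


def installed_names():
    """Names of distributions installed in the current Python environment."""
    return [d.metadata.get("Name") for d in distributions() if d.metadata.get("Name")]


if __name__ == "__main__":
    print(find_obsolete(installed_names()) or "No obsolete TabPy packages found")
```

If the script reports anything, uninstalling with pip (e.g. pip uninstall tabpy-server tabpy-client) should leave only the tabpy package in place.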

For a bit more background: long ago, before TabPy became a single package, it was built as an application (tabpy_server) and a library (tabpy_client), which were distributed as source code via GitHub. Those days are gone. You no longer need to clone the GitHub repository, configure environment variables, run setup/startup scripts or perform other black magic steps.

Summary: you only need the tabpy package. Treat any posts and articles that mention tabpy_server and/or tabpy_tools as obsolete and ignore them.