Use the Python Connector

This applies to: Visual Data Discovery

You can use the Python connector with an arbitrary Python script as the connection parameter.

The Python connector functions in raw data mode only and does not support pushdown of aggregations. Use filter operations to minimize the amount of data in each request. This improves performance and shortens loading times for dashboards and visuals.

Python Script Conventions

Data sources are resolved from the Python script using the following conventions:

  • Each function definition is a separate data source.
  • Private functions (names that start with an underscore, _) are not resolved as data sources.
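The conventions above can be illustrated with a small sketch. It is not the connector's own resolution code: SCRIPT, sales, inventory, _load and public_function_names are hypothetical names, and the sketch simply applies the "public function = data source" rule with the standard inspect module.

```python
import inspect

# A hypothetical connector script: two public functions and one private helper.
SCRIPT = '''
def _load():
    return [1, 2, 3]

def sales():
    return {"amount": _load()}

def inventory():
    return [{"item": "a", "count": 1}]
'''

def public_function_names(source):
    """Return the names that would count as data sources under the stated rule."""
    namespace = {}
    exec(source, namespace)  # interpret the script in an isolated namespace
    return sorted(
        name
        for name, obj in namespace.items()
        if inspect.isfunction(obj) and not name.startswith("_")
    )

names = public_function_names(SCRIPT)
```

Under these assumptions, names contains sales and inventory but not _load.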

Conventions for return values of functions include:

  • pandas DataFrames.
  • Dictionaries: each key is a string (the column name) and each value is a list: return {"column1": [1, 2, 3], "column2": ["one", "two", "three"]}.
  • List of dictionaries: return [{"column1": 1, "column2": "one"}, {"column1": 2, "column2": "two"}].
  • List of lists: return [[1, 2], [3, 4]]. Each enclosed list resolves as a row.
  • List: return [1, 2, 3, 4]. Resolves as a single column with index 1.
  • A single value of any of the supported types: return 1 or return decimal.Decimal("3.14").

Regardless of the return type, use uniform columns: all columns the same size, each containing values of a single type. Mixed sizes or types can cause unexpected behavior.

Conversion Values

The connector applies the following rules when reading values:

Python Type          Connector Field Type
str                  STRING
int                  INTEGER
float                DOUBLE
decimal.Decimal      DOUBLE
datetime.date        DATE
datetime.datetime    DATE
arbitrary object     STRING

Python Script Writing Tips

Avoid using top-level statements.

Top-level statements are executed in a single thread for all users. You can add function calls at the top level to validate a connection, but do not leave function calls at the top level when you save your script. See How the Connector Works.
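A minimal sketch of a script that follows this tip: the top level holds only an import and function definitions, and the work happens inside the functions. The names daily_visits and _load_rows and the sample rows are hypothetical.

```python
import datetime

def _load_rows():
    # Stand-in for an expensive lookup. It runs only when a data source
    # function is invoked, never while the script itself is interpreted.
    start = datetime.date(2024, 1, 1)  # fixed date keeps the sample deterministic
    return [
        {"day": start + datetime.timedelta(days=offset), "visits": 10 * offset}
        for offset in range(3)
    ]

def daily_visits():
    # Public function: resolved as a data source.
    return _load_rows()
```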

Avoid overriding internal names.

The connector runs its own Python code to invoke data source functions and convert data. Because this code executes in the same namespace as your script, overriding any of the names listed below may produce unexpected results. Avoid using the following names in your code:

  • __convert
  • __convert_list_of_dicts_to_dict_of_lists
  • f
  • __fork
  • __emulate
  • all_functions

The connector also imports the following modules. Using these names for your own variables or functions may produce unexpected results.

  • pandas
  • numbers
  • datetime
  • multiprocessing
  • queue
  • inspect
  • types
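One way to catch accidental overrides before saving a script is a quick static check with the standard ast module. The sketch below is not part of the connector. RESERVED_NAMES is copied from the two lists above, find_name_conflicts is a hypothetical helper, and it inspects only simple top-level definitions, assignments and imports.

```python
import ast

RESERVED_NAMES = {
    "__convert", "__convert_list_of_dicts_to_dict_of_lists", "f", "__fork",
    "__emulate", "all_functions", "pandas", "numbers", "datetime",
    "multiprocessing", "queue", "inspect", "types",
}

def find_name_conflicts(source):
    """Return reserved names that the script rebinds at the top level."""
    conflicts = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound = [node.name]
        elif isinstance(node, ast.Assign):
            bound = [t.id for t in node.targets if isinstance(t, ast.Name)]
        elif isinstance(node, ast.Import):
            # A plain "import datetime" binds the same module, so only
            # aliases ("import x as f") are treated as rebinding a name.
            bound = [alias.asname for alias in node.names if alias.asname]
        elif isinstance(node, ast.ImportFrom):
            bound = [alias.asname or alias.name for alias in node.names]
        else:
            bound = []
        conflicts.update(name for name in bound if name in RESERVED_NAMES)
    return sorted(conflicts)

sample = (
    "from datetime import datetime\n"
    "f = 1\n"
    "def all_functions():\n"
    "    return [1]\n"
    "def report():\n"
    "    return [1]\n"
)
conflicts = find_name_conflicts(sample)
```

For the sample script, the check flags all_functions, datetime (rebound to the class by the from-import) and f, and leaves report alone.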

Your Python script has limited access to the file system because the container runs as a non-root user. Write access is available on the following folders:

  • /opt/zoomdata/logs
  • /opt/zoomdata/temp
  • /opt/zoomdata/lib
  • /opt/zoomdata/wrappers

How the Connector Works

Python code is executed using JEP, which interprets it in the same process as the Java application. To work around some Global Interpreter Lock issues in Python, some queries use process-based parallelism. Based on the request type, the connector functions in one of two ways:

  • Interpret the script in the same process for validation and describe requests.
  • Interpret the script in the same process and invoke the function in a subprocess for fetch data requests.

New subprocesses are created by forking the Java process. Each request starts a new Python interpreter. Scripts are always interpreted first in one parent process, so limit top-level statements to imports and function definitions for optimal performance.
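The effect of invoking a function in a forked subprocess can be illustrated with the standard multiprocessing module. This is only an analogy, not the connector's implementation: run_in_subprocess, _invoke, counter and count_calls are hypothetical names, and the "fork" start method is available on POSIX systems only.

```python
import multiprocessing

def _invoke(func, result_queue):
    # Runs in the child process and sends the function result to the parent.
    result_queue.put(func())

def run_in_subprocess(func, timeout=10):
    """Call func in a forked child process and return its result."""
    context = multiprocessing.get_context("fork")  # POSIX only
    result_queue = context.Queue()
    child = context.Process(target=_invoke, args=(func, result_queue))
    child.start()
    result = result_queue.get(timeout=timeout)  # read before join to avoid blocking
    child.join()
    return result

counter = {"calls": 0}  # module state in the parent process

def count_calls():
    counter["calls"] += 1  # changes only the child's copy of the state
    return {"calls": [counter["calls"]]}

first = run_in_subprocess(count_calls)
second = run_in_subprocess(count_calls)
```

In this sketch both calls return the same value, and counter in the parent process is unchanged, because each child works on its own copy of the module state.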

Function invocation happens in a separate process, so global variables aren't available. For example:

x = 40

def side_effect():
	x = x + 1
	return {"result": [x]}

While the script is valid Python, using it as a connection parameter and setting side_effect as an entity for the source returns an error such as:

UnboundLocalError: local variable 'x' referenced before assignment
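In standard Python, the assignment x = x + 1 inside the function makes x a local name, which is what raises UnboundLocalError when the right-hand side is evaluated. A minimal sketch of a variant that keeps all state inside the function follows. next_value is a hypothetical name, and the value 40 is carried over from the example above.

```python
def next_value():
    base = 40  # state lives inside the function, not at module level
    return {"result": [base + 1]}

def side_effect_local():
    # Same pattern as the failing example, reproduced to show the error type.
    try:
        x = x + 1  # x is local here, so reading it first fails
    except UnboundLocalError as error:
        return type(error).__name__
    return "no error"
```

The first function has no dependency on module-level state, so it behaves the same way regardless of which process invokes it.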

Logging

Output from your Python scripts is not preserved. Anything written to stdout or stderr, for example with print("Message"), is not retained.

Python Script Conversion

In Symphony 23.3 and later, the connector converts all values returned from public functions to a pandas DataFrame. To resolve each field type, the connector relies on the kind attribute of each column's dtype, as reported by DataFrame.dtypes.

The following table includes the conversion rules used:

Numpy kind (character code)    Numpy kind (type name)     Field Type
b                              boolean                    STRING
i                              signed integer             INTEGER
u                              unsigned integer           INTEGER
f                              floating-point             DOUBLE
c                              complex floating-point     STRING
m                              timedelta                  STRING
M                              datetime                   DATE
O                              object                     STRING
S                              (byte-)string              STRING
U                              Unicode                    STRING
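To preview how a returned DataFrame might be typed, the table can be applied as a lookup on each column's dtype kind. The sketch below is an illustration under that assumption, not the connector's code. KIND_TO_FIELD_TYPE, field_type_for_kind and describe_frame are hypothetical names, and the STRING fallback for unlisted kinds is an assumption.

```python
# Mapping copied from the table above.
KIND_TO_FIELD_TYPE = {
    "b": "STRING", "i": "INTEGER", "u": "INTEGER", "f": "DOUBLE",
    "c": "STRING", "m": "STRING", "M": "DATE", "O": "STRING",
    "S": "STRING", "U": "STRING",
}

def field_type_for_kind(kind):
    """Map a NumPy dtype kind character to a field type name."""
    return KIND_TO_FIELD_TYPE.get(kind, "STRING")  # fallback is an assumption

def describe_frame(frame):
    """Map each column of a DataFrame-like object to a field type.

    Expects frame.dtypes to yield (column name, dtype) pairs through items(),
    as a pandas DataFrame does, with each dtype exposing a kind attribute.
    """
    return {
        column: field_type_for_kind(dtype.kind)
        for column, dtype in frame.dtypes.items()
    }
```

Passing a pandas DataFrame to describe_frame returns a dictionary of column names to field types, which can help spot columns that would resolve as STRING, such as booleans or timedeltas.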