Use the Python Connector

This applies to: Visual Data Discovery

You can use the Python connector using arbitrary Python scripts as connection parameter.

The Python connector functions in raw data mode only and does not support push down of aggregations. Minimize the amount of data in your request using filter operations. This will increase performance speed and improve loading time for dashboards and visuals.

Python Script Conventions

Data sources are resolved from Python script using the following conventions:

Each function definition is a separate data source
Private functions (that start with an underscore _) are not resolved as a data source

Conventions for return values of functions include:

Pandas dataframes.
Dictionaries: The key is a string (column name) and values a list: return {"column1": [1, 2, 3], "column2": ["one", "two", "three"]}.
List of dictionaries: return [{"column1": 1, "column2": "one"}, {"column1": 2, "column2": "two"}].
List of lists: return [[1, 2], [3, 4]]. Each enclosed list resolves as a row.
List: return [1, 2, 3, 4]. Resolves as a single column with index 1.
Single value of any of supported types: return 1 OR return decimal.Decimal("3.14")

Regardless of return type, you must use uniform columns of the same size, containing the same value type, to prevent unexpected behavior.

Conversion Values

The connector applies the following rules when reading values:

Python Type	Connector Field Type
str	STRING
int	INTEGER
float	DOUBLE
decimal.Decimal	DOUBLE
datetime.date	DATE
datetime.datetime	DATE
arbitray object	STRING

Python Script Writing Tips

Avoid using top level statements.

Top level statement are executed in a single thread for all users. You can add function calls in a top level statement to validate a connection, but don't call functions when you save your script. See How the Connector Works.

Avoid overriding internal names.

Python is used within your environment to invoke data source functions and convert data. Since this code is executed in the same namespace as your scripts, if you try to override the names listed below, you may receive unexpected results. Avoid using the following names in your code:

__convert
__convert_list_of_dicts_to_dict_of_lists
f
__fork
__emulate
all_functions

To use this connector, we import these modules. Attempting to use these names for variables and functions may return unexpected results.

pandas
numbers
datetime
multiprocessing
queue
inspect
types

Your python script has limited access to the file system; the container is run as a non-root user. Edit access is still available on the folders below:

Folders with write access include:

/opt/zoomdata/logs
/opt/zoomdata/temp
/opt/zoomdata/lib
/opt/zoomdata/wrappers

How the Connector Works

Python code is executed using JEP to interpret Python code in the same process where the Java app is running. To circumvent some Global Interpreter Lock issues in Python, some queries use processed based parallelism. Based on the request type, the connector functions in one of two ways:

Interpret the script in the same process for validation and describe requests.
Interpret the script in the same process and invoke the function in a sub process for fetch data requests.

New sub processes are created by forking the Java process. Each request starts a new Python interpreter. Scripts are always interpreted first in one parent process. Limit top level statements to imports and function definitions for optimal performance.

Function invocation happens in a separate process, so global variables aren't available. For example:

x = 40

def side_effect():
	x = x + 1
	return {"result": [x]}

While the script is valid, using it as a connection parameter and attempting to set side_effect as an entity for the source will return an error such as:

UnboundLocalError: local variable 'x' referenced before assignment

Logging

Outputs of your Python scripts are not preserved. Statements such as print("Message") to write data to stdout or stderr will not be retained.

Python Script Conversion

Python script types conversion, in versions 23.3+ of Symphony, converts all values returned from public functions to Pandas Dataframe. To resolve the type, the connector relies on DataFrame.dtypes.kind.

The following table includes the conversion rules used:

Numpy kind (character code)	Numpy kind (type name)	Field Type
b	boolean	STRING
i	signed integer	INTEGER
u	unsigned integer	INTEGER
f	floating-point	DOUBLE
c	complex floating-point	STRING
m	timedelta	STRING
M	datetime	DATE
O	object	STRING
S	(byte-)string	STRING
U	Unicode	STRING