Manage the Spark SQL Connector

This applies to: Visual Data Discovery

The Symphony Spark SQL connector lets you access the data available in Spark SQL databases using the Symphony client. The Symphony Spark SQL connector supports Spark SQL versions 2.3 through 3.0.

Before you can establish a connection from Symphony to Spark SQL storage, a connector server needs to be installed and configured. See Manage Connectors and Connector Servers for general instructions and Connect to Spark SQL for details specific to the Spark SQL connector.

After the connector has been set up, you can create data source configurations that specify the necessary connection information and identify the data you want to use. See Manage Visual Data Discovery Data Source Configurations for more information. After data sources are configured, they can be used to create dashboards and visuals from your data. See Create Data Discovery Dashboards.

Feature Support

Connector support for specific features is shown in the following table.

Key: Y - Supported; N - Not Supported; N/A - Not Applicable

Feature Supported? Notes
Admin-Defined Functions Y  
Box Plots Y
Custom SQL Queries Y If you need to access a BigQuery partition, explicitly include an alias for the built-in partition column in your select clause. For example: select *, _PARTITIONTIME as pt from projectId.datasetId.tableId
Derived Fields (Row-Level Expressions) Y
Distinct Counts Y
Fast Distinct Values N/A  
Group By Multiple Fields Y  
Group By Time Y  
Group By UNIX Time Y  
Histogram Floating Point Values Y  
Histograms Y  
Kerberos Authentication Y To enable Kerberos authentication, see Connect to Spark SQL Sources on a Kerberized HDP Cluster.
Last Value Y  
Live Mode and Playback Y  
Multivalued Fields N/A
Nested Fields N/A  
Partitions Y  
Pushdown Joins for Fusion Data Sources Y  
Schemas Y  
Text Search N/A
TLS N  
User Delegation N
Wildcard Filters Y  
Wildcard Filters, Case-Insensitive Mode Y
Wildcard Filters, Case-Sensitive Mode Y
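
The note for Custom SQL Queries in the table above can be illustrated with a small helper that assembles such a query. This is only a sketch: the function name partition_query and its parameters are hypothetical and are not part of Symphony, and the table path is the placeholder used in the note.

```python
def partition_query(table: str,
                    partition_column: str = "_PARTITIONTIME",
                    alias: str = "pt") -> str:
    """Build a custom SQL query that exposes a built-in partition
    column under an explicit alias (hypothetical helper)."""
    return f"select *, {partition_column} as {alias} from {table}"


# Placeholder table path taken from the note in the feature table.
query = partition_query("projectId.datasetId.tableId")
print(query)
```

The resulting string could then be pasted into the custom SQL query field of a data source configuration.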

Connect to Spark SQL

When you establish a connection to Spark SQL, you need to configure the partition settings. For each partitioned field, you can select one of the following options:

  • No
  • Date - this option is available for fields of the Time type. If you select this option, the list of partitioned columns is displayed in the Configure column.
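
As general background on why a date partition is useful: when data is partitioned by date, a query that filters on a time range only needs to read the partitions that fall inside that range. The sketch below illustrates this idea in plain Python. It is not Symphony's or Spark's implementation, and the function name prune_partitions and the sample dates are assumptions made for illustration.

```python
from datetime import date


def prune_partitions(partitions, start, end):
    """Return only the date partitions that fall inside [start, end].

    Illustrative only: a real engine applies this kind of pruning
    internally when a query filters on the partition column.
    """
    return [p for p in partitions if start <= p <= end]


# Seven hypothetical daily partitions, 2021-01-01 through 2021-01-07.
partitions = [date(2021, 1, day) for day in range(1, 8)]

# A time filter covering three days touches only three partitions.
selected = prune_partitions(partitions, date(2021, 1, 3), date(2021, 1, 5))
print(selected)
```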

Numeric and time-based fields can be edited using the Fields tab:

  • Number fields - you can select a default aggregation function.
  • Time fields - you can define the default time pattern and granularity. If the time field provides granularities of hour, minute, and second, a time zone label can be applied.
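
To make the relationship between a time pattern and a granularity concrete, here is a minimal sketch in Python. The pattern tells a parser how to read the raw value, and the granularity decides how finely the parsed time is grouped. The pattern below uses Python's strptime directives as a stand-in, because the pattern syntax Symphony expects is not described here; the function name truncate and the sample timestamp are likewise assumptions.

```python
from datetime import datetime

# Stand-in pattern using Python strptime directives (an assumption,
# not Symphony's own pattern syntax).
PATTERN = "%Y-%m-%d %H:%M:%S"


def truncate(ts: datetime, granularity: str) -> datetime:
    """Round a timestamp down to the start of the given granularity."""
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "minute":
        return ts.replace(second=0, microsecond=0)
    if granularity == "second":
        return ts.replace(microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


parsed = datetime.strptime("2021-03-04 15:42:07", PATTERN)
by_hour = truncate(parsed, "hour")
by_day = truncate(parsed, "day")
print(by_hour, by_day)
```

Values that truncate to the same timestamp fall into the same group, which is why a coarser granularity produces fewer, larger groups.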

When you create a data source, Symphony saves a specific number of distinct values for each attribute field, based on a data sample from your data set. You can filter the data on your visual by these values. If you want the filter to use all distinct values (that is, values from the whole data source), select Refresh in the Statistics column while you edit the data source.
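
The difference between distinct values taken from a sample and distinct values taken from the whole data source can be sketched as follows. How Symphony draws its sample is not described here, so taking the first few rows is purely an assumption, as are the function name distinct_values and the sample data.

```python
def distinct_values(rows, field, sample_size=None):
    """Collect the sorted distinct values of a field.

    If sample_size is given, only the first sample_size rows are
    inspected (an assumed sampling strategy, for illustration only).
    """
    sample = rows if sample_size is None else rows[:sample_size]
    return sorted({row[field] for row in sample})


# Hypothetical data set with one attribute field.
cities = ["Austin", "Boston", "Austin", "Chicago", "Denver"]
rows = [{"city": name} for name in cities]

sampled = distinct_values(rows, "city", sample_size=3)  # sample only
full = distinct_values(rows, "city")                    # whole data source
print(sampled, full)
```

Here the sample misses two of the four values, which is the situation that refreshing the statistics is meant to address.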