Manage the HDFS Connector

This applies to: Visual Data Discovery

Symphony can connect to Cloudera’s open source Hadoop platform, Cloudera Distribution including Apache Hadoop (CDH). CDH provides unified batch processing, interactive SQL, interactive search, and role-based access controls, along with enterprise-grade continuous availability. Specifically, Symphony connects to CDH’s fault-tolerant storage system, the Hadoop Distributed File System (HDFS).

The Symphony HDFS connector uses its own embedded Apache Spark engine and supports Apache Spark 2.2.

By default, the HDFS connector is not included with Symphony. You or your administrator must download and enable it before you can configure it.

After the connector has been set up, you can create data source configurations that specify the necessary connection information and identify the data you want to use. See Manage Visual Data Discovery Data Source Configurations for more information. After data sources are configured, they can be used to create dashboards and visuals from your data. See Create Data Discovery Dashboards.

Feature Support

Connector support for specific features is shown in the following table.

Key: Y - Supported; N - Not Supported; N/A - Not Applicable

Feature	Supported?
Admin-Defined Functions	Y
Box Plots	Y
Custom SQL Queries	Y
Derived Fields (Row-Level Expressions)	Y
Distinct Counts	Y
Fast Distinct Values	N/A
Group By Multiple Fields	Y
Group By Time	Y
Group By UNIX Time	Y
Histogram Floating Point Values	Y
Histograms	Y
Kerberos Authentication	Y
Last Value	Y
Live Mode and Playback	Y
Multivalued Fields	N/A
Nested Fields	N/A
Partitions	N/A
Pushdown Joins for Fusion Data Sources	N
Schemas	Y
Text Search	N/A
TLS	N
User Delegation	N
Wildcard Filters	Y
Wildcard Filters, Case-Insensitive Mode	Y
Wildcard Filters, Case-Sensitive Mode	Y