Connect to Hive Sources on a Kerberized HDP Cluster


A secure Hortonworks Data Platform (HDP) cluster uses Kerberos authentication to validate and confirm access requests. You can set up Symphony to connect to the secure HDP cluster using the following instructions.

Prepare the Hive Cluster

  • Kerberos authentication works only when the clocks on all instances are closely synchronized. Enable the Network Time Protocol (NTP) service in your network. For more information, see the topic Using the Network Time Protocol to Synchronize Time.
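As a quick check of the time requirement above, the following sketch reports whether a host's clock appears to be synchronized. It assumes a systemd-based host where timedatectl is available; the function name check_time_sync is hypothetical and is not part of Symphony or HDP. On hosts without timedatectl it reports "unknown" rather than guessing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether this host's clock is NTP-synchronized.
# Assumes systemd's timedatectl; prints synchronized, not-synchronized, or unknown.
check_time_sync() {
  local state=""
  if command -v timedatectl >/dev/null 2>&1; then
    # Prints "yes" or "no" when the query succeeds; errors are ignored
    state=$(timedatectl show --property=NTPSynchronized --value 2>/dev/null || true)
  fi
  case "$state" in
    yes) echo "synchronized" ;;
    no)  echo "not-synchronized" ;;
    *)   echo "unknown" ;;
  esac
}
```

Run check_time_sync on the Symphony host and on each Hive node. Any result other than "synchronized" is worth investigating before you continue.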

Configure Symphony Microservices

Obtain Kerberos Credentials

Each microservice must have its own unique identifier, called a principal. Perform the following steps:

  1. Install the Kerberos client on the CentOS or Ubuntu machine where the Symphony server resides.

  2. Generate the Kerberos principal and corresponding keytab for the Symphony microservice. Before you proceed, make sure that:

    • The Symphony microservice is running on a node with a valid Kerberos configuration file: /etc/krb5.conf, or the equivalent location for your Linux distribution.
    • The Kerberos realm in your environment is the same as the realm specified in the kdc.conf file on the Hive server.

  3. Check the Kerberos configuration (that is, krb5.conf) and the validity of the principal and keytab pair using the MIT Kerberos client:

    kinit -V -k -t <composer_principal>.keytab <composer_principal@KERBEROS.REALM>

  4. Make the keytab accessible for Symphony's Hive connector:

    sudo mkdir -p /etc/zoomdata
    sudo mv <composer_principal>.keytab /etc/zoomdata
    sudo chown zoomdata:zoomdata /etc/zoomdata/<composer_principal>.keytab
    sudo chmod 600 /etc/zoomdata/<composer_principal>.keytab
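After moving the keytab, you may want to confirm that its permissions are as restrictive as intended. The following is a minimal sketch, assuming a Linux host with GNU coreutils (for stat -c). The function name check_keytab_mode is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a keytab is accessible only by its owner.
# Prints "ok" and returns 0 when the file mode is 600 or 400.
check_keytab_mode() {
  local keytab=${1:-}
  local mode
  if [ ! -f "$keytab" ]; then
    echo "missing: $keytab"
    return 1
  fi
  mode=$(stat -c '%a' "$keytab")   # GNU coreutils syntax; an assumption for CentOS/Ubuntu
  case "$mode" in
    600|400) echo "ok" ;;
    *) echo "too permissive: $mode"; return 1 ;;
  esac
}
```

For example, run check_keytab_mode against the keytab path under /etc/zoomdata. To list the principals stored in the keytab, the MIT Kerberos client also provides klist -k -t followed by the keytab path.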

Configure a Hive Connector

  1. Create the file /etc/zoomdata/edc-hive.properties, or update it if it already exists, so that it contains the following properties:

    kerberos.krb5.conf.location=/etc/krb5.conf
    kerberos.service.account.authentication=true
    kerberos.service.account.principal=<composer_principal@KERBEROS.REALM>
    kerberos.service.account.keytab.location=/etc/zoomdata/<composer_principal>.keytab

  2. Restart the Hive connector:

    sudo systemctl restart zoomdata-edc-hive
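If the connector does not authenticate after the restart, a missing property is one possible cause. The following sketch checks that a properties file defines each of the four Kerberos keys listed in step 1. The function name check_hive_props is hypothetical, and the check only confirms that the keys are present, not that their values are correct.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a properties file defines the four
# Kerberos keys from step 1. Prints each missing key and returns
# non-zero if any key is absent.
check_hive_props() {
  local file=${1:-}
  local missing=0
  local key
  for key in \
    kerberos.krb5.conf.location \
    kerberos.service.account.authentication \
    kerberos.service.account.principal \
    kerberos.service.account.keytab.location
  do
    # Exact match on the text before the first "=" of any line
    if ! awk -F= -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$file" 2>/dev/null; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run check_hive_props /etc/zoomdata/edc-hive.properties. No output and an exit status of 0 mean that all four keys are defined.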

Connect to the Kerberized Hive Source

You are now ready to create the Hive source:

  1. Open a new browser window and log in to Symphony.
  2. Select Sources.
  3. Select Hive.
  4. Specify the name of your source and add a description (if desired). Then select Next.
  5. On the Connection page, define the connection source. You can use an existing connection, if available, or create a new one. To create a new connection, select the Input New Credentials option button and specify the connection name and JDBC URL. Make sure that you enter the JDBC URL in the correct format:

    jdbc:hive2://<hive_host>:10000/;principal=<hive_principal@KERBEROS.REALM>

    Replace the placeholders as follows:

    • <hive_host>: Specify the IP address or host name of the Hive node to which you are connecting.

    • <hive_principal@KERBEROS.REALM>: Enter the principal of the Hive node you are connecting to. To get the list of all Hive principals, navigate to Ambari > Admin > Kerberos > Advanced > Hive.

      The principal in the JDBC URL is the principal of the Hive node. The <hive_principal@KERBEROS.REALM> principal is unrelated to the <composer_principal@KERBEROS.REALM> principal specified for the Symphony connector.

  6. Select Validate. After the connection is validated, select Next.
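Because the JDBC URL is easy to mistype, the following sketch assembles it from its parts in the format shown in step 5. The function name hive_jdbc_url is hypothetical, and the host and principal in the example are placeholder values rather than values from your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Hive JDBC URL in the format from step 5.
# Arguments: <hive_host> <hive_principal@KERBEROS.REALM> [port, default 10000]
hive_jdbc_url() {
  local host=${1:-}
  local principal=${2:-}
  local port=${3:-10000}
  if [ -z "$host" ] || [ -z "$principal" ]; then
    echo "usage: hive_jdbc_url <hive_host> <hive_principal@REALM> [port]" >&2
    return 1
  fi
  # The principal must carry a realm, for example name@KERBEROS.REALM
  case "$principal" in
    *@*) ;;
    *) echo "principal must include a realm: $principal" >&2; return 1 ;;
  esac
  printf 'jdbc:hive2://%s:%s/;principal=%s\n' "$host" "$port" "$principal"
}
```

For example, hive_jdbc_url hive01.example.com hive/hive01.example.com@EXAMPLE.COM prints jdbc:hive2://hive01.example.com:10000/;principal=hive/hive01.example.com@EXAMPLE.COM. If the Beeline client is installed, you can try the same URL with beeline -u after obtaining a ticket with kinit to test the connection outside Symphony.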

You can continue configuring the data source as described in Manage Visual Data Discovery Data Source Configurations.

After you have completed the configuration, Symphony accesses Hive as <composer_principal@KERBEROS.REALM>, authenticated by its keytab at /etc/zoomdata/<composer_principal>.keytab.