Manage the BigQuery Connector

This applies to: Visual Data Discovery

The Symphony BigQuery connector lets you access the data available in Google BigQuery storage using the Symphony client. The Symphony BigQuery connector supports the current version of this software as a microservice (SaaS) product.

The Symphony BigQuery connector is a cloud connector that connects to Google BigQuery via the BigQuery API. See Manage Connectors and Connector Servers for general instructions and Connect to BigQuery in Visual Data Discovery for details specific to the BigQuery connector.

After the connector has been set up, you can create data source configurations that specify the necessary connection information and identify the data you want to use. See Manage Visual Data Discovery Data Source Configurations for more information. After data sources are configured, they can be used to create dashboards and visuals from your data. See Create Data Discovery Dashboards.

Feature Support

Connector support for specific features is shown in the following table.

Key: Y - Supported; N - Not Supported; N/A - not applicable

Feature Supported?
Admin-Defined Functions Y
Box Plots Y
Custom SQL Queries Y
Derived Fields (Row-Level Expressions) Y
Distinct Counts Y
Fast Distinct Values N/A
Group By Multiple Fields Y
Group By Time Y
Group By UNIX Time Y
Histogram Floating Point Values Y
Histograms Y
Kerberos Authentication N/A
Last Value Y
Live Mode and Playback Y
Multivalued Fields N/A
Nested Fields N/A
Partitions Y
Pushdown Joins for Fusion Data Sources Y
Schemas Y
Text Search N/A
TLS N/A
User Delegation N
Wildcard Filters Y
Wildcard Filters, Case-Insensitive Mode Y
Wildcard Filters, Case-Sensitive Mode Y

Connect to BigQuery in Visual Data Discovery

When connecting to BigQuery, provide the following information:

  • Key Path: you have to specify the absolute path to the file that must be available for the connector.
  • Public Project IDs.

For more information about these values, refer to Google BigQuery's documentation.

Authorize the BigQuery Connection

To authorize the BigQuery connection, you need to create a security key for it. Before you can create the security key, you must access or create a BigQuery microservice account.

To create a BigQuery microservice account, perform the following steps:

  1. Login to your Google API Console.

  2. Select the required project from the list.

  3. Make sure that current account is linked to a billing account. To check this, select the menu () icon and then select Billing.

  4. On the API Manager page, select Credentials:

  5. On the Credentials page, select Manage service accounts.

  6. On the Service Accounts page, select Create Service Account and specify the following:

    • Service account name
    • Role - grant this microservice account role based access to the project. From the list, select the BigQuery category and then select BigQuery Data Viewer and BigQuery User roles.
    • Service account ID

  7. Select Create.

After you have created an account, create a security key for it.

To create a security key:

  1. On the Service Accounts page, find the required account.

  2. From the menu, select Create key.

  3. In the Create private key dialog, select JSON for the key type and select Create. The local copy of the key is saved on your computer.

    For more information, see the following Google resource: BigQuery Introduction to Authentication.

  4. Move the file with the key to the server, on which the connector is running.

Connect to BigQuery Using OAuth

To create a BigQuery connection use one of the available authentication methods:

  • Key authentication flow requires a security key to be generated at BigQuery and placed to the Symphony instance;

  • OAuth 2.0 requires providing OAuth client_id and client_secrets generated for a user that will serve for data retrieval, such as an integration user. Users are asked to authenticate via a separate authentication form. Users provide their individual credentials when accessing the data retrieved using this connection.

If both authentication methods are selected, connection via OAuth will have higher priority over key authentication except for the scheduled overrides setup.

  Authentication Flow  
Key Path Key Authentication Absolute path to the key authentication file obtained from BigQuery and placed to Composer instance
Public Project Ids List of public project IDs that will be queried for the data
OAuth 2.0 Enabled OAuth 2.0 TRUE/FALSE
Project Id

Billing project ID that will be queried for the data.

Optional if keys authentication is used

Mandatory if OAuth 2.0 connection is enabled

OAuth 2.0 Client Id client_id: Obtain from BigQuery. See https://cloud.google.com/bigquery/docs/authentication/end-user-installed.
OAuth 2.0 Client Secrets

client_secrets: Obtain from BigQuery. See https://cloud.google.com/bigquery/docs/authentication/end-user-installed.

Scheduled Override Options

To maintain Visual Data Discovery's ability to perform scheduled operations such as scheduled dashboard reports, alerts notifications, and more when using OAuth 2.0 authentication flow, you can setup scheduled overrides with key authentication method by providing a key path.

Additional OAuth 2.0 parameters available for override, however, already have prepopulated BigQuery values and do not require manual editing:

  • OAUTH2.AUTHORIZATION_URI
  • OAUTH2.TOKEN_URI
  • OAUTH2.SCOPES

Scheduled source refresh is not available when you use OAuth 2.0 authentication.

To avoid frequent authentication requests for users, Symphony operates with long-lived tokens and preemptively refreshes the tokens when they are close to expiration.

Users' OAuth sessions are terminated when the OAuth token is revoked, if the connection is deleted, or connection details are modified.

Connect to Google BigQuery in Managed Dashboards

This applies to: Managed Dashboards, Managed Reports

Google BigQuery is a web service for querying massive datasets that take advantage of Google's cloud infrastructure.

Create a data connector in Symphony to extract data from your Google project via the BigQuery API.

Using Google BigQuery

See the following sections:

Setup

Create or choose a project

Go to https://console.cloud.google.com/ and sign into your Google account.

Once signed in, you can create or choose an existing project to use. The current project is often listed at the top of the screen and you can click to see a list of projects and switch between them or to create a new one.

The cloud resource manager page should also list your projects and allow you create a new one. A project acts as a container in the Google Cloud Platform which stores information about billing and authorized users and contains BigQuery data.

Google cloud projects

Enable the BigQuery API

To check that the BigQuery API is enabled, select the menu button in the top-left corner, and go to the API & services dashboard.

List of enabled APIs for the project

Ensure your project is selected as the current project in the top-left corner. You can enable the BigQuery API on this page if not already listed.

Service account credentials

Service account credentials are needed to allow Symphony to access your data.

To create or manage service accounts, in the API Manager, select Credentials in the left navigation. On the right, select Create credentials and select Service account.

Add service account credentials

Enter a name and role for your service account.

When done, add a key for your service account. For example, click to edit the service account, go to the Keys tab if applicable, then select Add Key. Choose a JSON key type.

When finished, a certificate file should be downloaded. This file will be needed later to set up a data connector in Symphony.

P12 file type

If you require the P12 key type instead of the default JSON, you will require some additional information.

A popup should be displayed showing the password for the private key. Copy and paste this password into Notepad or similar for later use.

You will also need the service account email address. Copy this from the service account details and paste it with your password.

Service account email address

Enable billing

Billing must be enabled for your Google project in order to load data into the project. If you do not have this set up yet, you can sign up for a free trial from https://cloud.google.com/bigquery/.

Next, go back to the Google Developers Console for your project. Select the menu button in the top-left corner and then select Billing.

Ensure billing is enabled, and you may need to log out and log back in for the changes to take effect.

Public datasets

Google provides public datasets which you can copy and query using BigQuery. See https://cloud.google.com/bigquery/sample-tables for more details.

For example, you can create a new dataset in your Google project, and copy tables from a public dataset such as Shakespeare (which is one of the smaller datasets).

Project has a dataset with a copy of the Shakespeare dataset

New Data Connector

In Symphony Managed Dashboards, create a new data connector from the main menu.

Set the Data Provider drop-down to Google BigQuery.

Select Choose File and select the certificate file that you downloaded.

Choose the certificate file

If the certificate file is of a P12 file type, you now have to provide the Service Account E-Mail Address and Password For The .P12 File that you copied for later reference.

Leave the Catalog Projects field blank and select a Billed Project from the dropdown.

Select Submit to create the data connector and begin discovery.

The Catalog Projects field can be used instead of the billed project to specify free catalogs such as publicdata,gdelt-bq,fh-bigquery (comma separated). If you specify catalog projects but you also want to include the billed project, you must add the billed project to the Catalog Projects field.

Advanced

Expand the Advanced section for additional options.

Use the option Use Legacy SQL to switch the data connector from using standard SQL to legacy SQL.

Advanced options

Use the option Allow Large Results to avoid data connector errors such as Response too large to return. To use this option, you have to create a dataset for storing large results on the Google Cloud Platform.

Tip: Check the option to Expire new tables in one day when creating a dataset in the Google Cloud Platform.

See Also