Distinct Counts

This applies to: Visual Data Discovery

Distinct count functionality determines the number of unique values in a column or expression within a selected table by comparing all the records that a data source configuration pulls from the data store. When distinct counts are used, analysis returns the number of unique values rather than the total number of records. For example, a distinct count could return the number of:

  • Unique customers in a sales database

  • Unique UPC codes for a category of products

  • Trucks in a company's fleet

To illustrate, consider a single collection with a string field that contains the following three values:

  1. Apple
  2. Orange
  3. Apple

The distinct count returns 2, since there are only two distinct values (“Apple” and “Orange”), while an ordinary count returns 3 to reflect the total number of records. SQL-based connectors might produce a query that looks like this:

select count(distinct myField) from myCollection
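The difference between the two counts can be checked with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for a SQL-based data store; the table and field names are taken from the query above, and nothing here reflects how any particular connector is implemented.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a SQL-based data store
# (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myCollection (myField TEXT)")
conn.executemany(
    "INSERT INTO myCollection (myField) VALUES (?)",
    [("Apple",), ("Orange",), ("Apple",)],
)

# Ordinary count: one per record.
total_count = conn.execute(
    "SELECT COUNT(myField) FROM myCollection"
).fetchone()[0]

# Distinct count: one per unique value.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT myField) FROM myCollection"
).fetchone()[0]

print(total_count, distinct_count)  # prints: 3 2
conn.close()
```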

Support for this feature by connector is shown in the following table.

Key: Y - Supported; N - Not Supported; N/A - Not Applicable

Connector | Supported? | Notes
Amazon Redshift | Y |
Amazon S3 | Y |
Apache Drill | Y |
Apache Phoenix | Y |
Apache Phoenix Query Server (QS) | Y |
Apache Solr | Y |
BigQuery | Y | If you need to access a BigQuery partition, explicitly include an alias for the built-in partition column in your select clause, such as select *, _PARTITIONTIME as pt from projectId.datasetId.tableId.
Cloudera Impala | Y | Cloudera Impala connectors can receive only a single distinct count field in a query.
Cloudera Search | Y |
Couchbase | Y |
Dremio | Y |
Elasticsearch 7.0 | Y |
Elasticsearch 8.0 | Y |
File Upload | Y |
HDFS | Y |
Hive | Y |
Jira | Y |
MemSQL | Y |
Microsoft SQL Server | Y |
MongoDB | Y |
MySQL | Y |
Oracle | Y |
PostgreSQL | Y |
Python | Y |
Real Time Sales | Y |
Salesforce | Y |
SAP Hana | Y |
SAP IQ | Y |
Spark SQL | Y |
Snowflake | Y |
Teradata | Y |
TIBCO DV | Y |
Trino | Y |
File Upload (Upload API) | Y |
Vertica | Y |
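
For a connector that accepts only a single distinct count field per query, as noted for Cloudera Impala, one possible approach is to issue a separate query for each field. The sketch below only builds the query strings. The function name `single_distinct_queries` and the field names `customerId` and `upcCode` are hypothetical, and the sketch does not describe how the product itself generates queries.

```python
from typing import List


def single_distinct_queries(table: str, fields: List[str]) -> List[str]:
    """Build one COUNT(DISTINCT ...) query per field (hypothetical helper).

    Each returned query contains exactly one distinct count field.
    Identifiers are inserted as-is, so they must come from a trusted source.
    """
    return [f"select count(distinct {field}) from {table}" for field in fields]


# Hypothetical field names, used for illustration only.
queries = single_distinct_queries("myCollection", ["customerId", "upcCode"])
for query in queries:
    print(query)
```

Running one query per field costs an extra round trip for each additional field, but keeps every request within the single-field limit.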