Distinct Counts
This applies to: Visual Data Discovery
Distinct count functionality determines the number of unique values in a column or expression within a selected table by comparing all the records pulled from the data store by a data source configuration. When distinct counts are used, unique value results are returned when analyzing data. For example, distinct counts could return the number of:
-
Unique customers in a sales database
-
Unique UPC codes for a category of products
-
The number of trucks in a company's fleet
For example, given a single collection and string field with the following three values:
- Apple
- Orange
- Apple
The distinct count returns 2, since there are only two distinct values (“Apple” and “Orange”), while an ordinary count returns 3 to reflect the total number of records. SQL-based connectors might produce a query that looks like this:
select count(distinct myField) from myCollection
Support for this feature by connector is shown in the following table.
Key:Y - Supported; N - Not Supported; N/A - not applicable
| Connector | Supported? | Notes |
|---|---|---|
| Amazon Redshift | Y | |
| Amazon S3 | Y | |
| Apache Drill | Y | |
| Apache Phoenix | Y | |
| Apache Phoenix Query Server (QS) | Y | |
| Apache Solr | Y | |
| BigQuery | Y | If you need to access a BigQuery partition, explicitly include an alias for the built in partition column in your select clause, such as select *, _PARTITIONTIME as pt from projectId.datasetId.tableId. |
| Cloudera Impala | Y | Cloudera Impala connectors can receive only a single distinct count field in a query. |
| Cloudera Search | Y | |
| Couchbase | Y | |
| Dremio | Y | |
| Elasticsearch 7.0 | Y | |
| Elasticsearch 8.0 | Y | |
| File Upload | Y | |
| HDFS | Y | |
| Hive | Y | |
| Jira | Y | |
| MemSQL | Y | |
| Microsoft SQL Server | Y | |
| MongoDB | Y | |
| MySQL | Y | |
| Oracle | Y | |
| PostgreSQL | Y | |
| Python | Y | |
| Real Time Sales | Y | |
| Salesforce | Y | |
| SAP Hana | Y | |
| SAP IQ | Y | |
| Spark SQL | Y | |
| Snowflake | Y | |
| Teradata | Y | |
| TIBCO DV | Y | |
| Trino | Y | |
| File Upload (Upload API) | Y | |
| Vertica | Y |