
Examine failed row samples

Last modified on 05-Jul-24

When a Soda scan results in a failed check, Soda Cloud displays details of the scan results in each check’s Check History view. To offer more insight into the data that failed a check during a scan, Soda Cloud can display failed rows samples in a check’s history.

From the Checks dashboard, select an individual check to access its result history page, then click the Failed rows tab (pictured below) to see the failed rows samples associated with a failed check result.

[Image: failed-rows, showing the Failed rows tab on a check's result history page]

Troubleshoot

Problem: You open the check that failed during a scan but cannot click the Failed rows tab.
Solution: In the chart that shows the check’s scan results over time, click a failed data point. Selecting a data point identifies one individual scan result, so that Soda Cloud can display the failed rows associated with that scan.


Implicitly send failed rows samples

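Soda implicitly collects 100 failed row samples for certain checks, without any extra configuration. The missing_percent and duplicate_count checks used in the examples on this page are checks of this kind.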

Explicitly send failed rows samples

Define a failed rows check to explicitly send samples of rows that failed a check to Soda Cloud.
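
As a minimal sketch, a failed rows check in SodaCL might look like the following. The dataset reuses dim_customer from the other examples on this page; the check name, the columns total_children and number_cars_owned, and the fail condition itself are hypothetical placeholders, not values taken from this documentation.

```yaml
checks for dim_customer:
  # Rows that satisfy the fail condition are treated as failed rows,
  # and samples of them are sent to Soda Cloud.
  - failed rows:
      # Hypothetical check name
      name: Customers with inconsistent household data
      # Hypothetical condition and column names; replace with your own
      fail condition: total_children = '2' and number_cars_owned >= 3
```

Because this check sends samples explicitly, the settings described in the sections on disabling samples still determine whether the rows reach Soda Cloud.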

Disable failed row samples

Where your datasets contain sensitive or private information, you may not want to send failed row samples from your data source to Soda Cloud. In such a circumstance, you can disable the feature completely in Soda Cloud.

To prevent Soda Cloud from receiving any sample data or failed row samples for any datasets in any data sources to which you have connected your Soda Cloud account, proceed as follows:

  1. As an Admin, log in to your Soda Cloud account and navigate to your avatar > Organization Settings.
  2. In the Organization tab, check the box to “Disable collecting samples and failed rows for metrics in Soda Cloud”, then Save.

Alternatively, if you use Soda Library, you can adjust the configuration in your configuration.yml to disable all samples, as in the following example.

data_source my_datasource:
  type: postgres
  ...
  sampler:
    disable_samples: True

See also: Set a sample limit for a data source


Disable sampling for specific columns

For checks that implicitly or explicitly collect failed rows samples, you can add a configuration to your data source connection details to prevent Soda from collecting failed rows samples from specific columns that contain sensitive data.

Refer to Disable failed rows sampling for specific columns.
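
The following is a sketch of what such a data source configuration might look like, assuming the sampler block accepts an exclude_columns mapping of dataset names to column lists, as covered in the referenced page. The column names first_name and email_address are hypothetical examples of sensitive columns; confirm the exact keys against the referenced page before relying on it.

```yaml
data_source my_datasource:
  type: postgres
  # Remaining connection details omitted
  sampler:
    # Assumed key: maps each dataset to the columns to leave out of samples
    exclude_columns:
      dim_customer:
        # Hypothetical sensitive columns
        - first_name
        - email_address
```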


Disable failed row samples for individual checks

For checks that implicitly or explicitly collect failed rows samples, you can set the samples limit to 0 to prevent Soda from collecting and sending failed rows samples for an individual check, as in the following example.

checks for dim_customer:
  - missing_percent(email_address) < 50:
      samples limit: 0


Specify columns for failed row sampling

Add a samples columns configuration to an individual check to specify the columns for which Soda implicitly collects failed row sample values. Soda collects the check’s failed row samples only for the columns you specify in the list, as in the duplicate_count example below.

This configuration applies to the checks for which Soda implicitly collects failed row samples; see Implicitly send failed rows samples above.

Note that the comma-separated list of samples columns does not support wildcard characters (%).

checks for dim_customer:
  - duplicate_count(email_address) < 50:
      samples columns: [last_name, first_name]
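
The two per-check keys shown on this page, samples limit and samples columns, can plausibly be set together on one check. The sketch below assumes that combination is accepted; the limit of 20 is an arbitrary illustrative value, not a recommendation from this documentation.

```yaml
checks for dim_customer:
  - duplicate_count(email_address) < 50:
      # Illustrative value: collect at most 20 failed row samples for this check
      samples limit: 20
      # Only these columns appear in the collected samples
      samples columns: [last_name, first_name]
```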

See also: About failed row samples


Documentation always applies to the latest version of Soda products