Examine failed row samples
Last modified on 05-Jul-24
When a Soda scan results in a failed check, Soda Cloud displays details of the scan results in each check’s Check History view. To offer more insight into the data that failed a check during a scan, Soda Cloud can display failed rows samples in a check’s history.
From the Checks dashboard, select an individual check to access its result history page, then click the Failed Rows tab to see the failed rows samples associated with a failed check result.
Troubleshoot
Problem: You open the check that failed during a scan but cannot click the Failed Rows tab.
Solution: Click a failed data point in the chart that shows the check’s scan results over time. This selects a specific scan result so that Soda Cloud can display the failed rows associated with that individual scan.
Implicitly send failed rows samples
Soda automatically collects 100 failed row samples for the following checks, without any extra configuration:
- reference check
- checks that use a missing metric
- checks that use a validity metric
- checks that use a duplicate_count or duplicate_percent metric
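As a rough illustration, the checks sketched below are of the kinds listed above, so each would collect failed row samples without further configuration. The dataset dim_customer and the column email_address come from the examples elsewhere on this page; dim_customer_orders, customer_key, the thresholds, and the valid format setting are assumptions made for the sake of the example.

```yaml
checks for dim_customer:
  # check that uses a missing metric
  - missing_count(email_address) = 0
  # check that uses a validity metric (valid format is an assumed setting)
  - invalid_count(email_address) = 0:
      valid format: email
  # check that uses a duplicate metric
  - duplicate_count(email_address) = 0
  # reference check (dim_customer_orders and customer_key are hypothetical)
  - values in (customer_key) must exist in dim_customer_orders (customer_key)
```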
Explicitly send failed rows samples
Define a failed rows check to explicitly send samples of rows that failed a check to Soda Cloud.
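A minimal sketch of a failed rows check might look like the following. The check name, the columns total_children and number_cars_owned, and the condition itself are hypothetical; only the dataset name is borrowed from the other examples on this page.

```yaml
checks for dim_customer:
  - failed rows:
      # hypothetical name and condition, for illustration only
      name: Customers with an unexpected children/cars combination
      fail condition: total_children = '2' and number_cars_owned >= 3
```

Rows that satisfy the fail condition are the ones treated as failed rows, and samples of them are what Soda sends to Soda Cloud.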
Disable failed row samples
Where your datasets contain sensitive or private information, you may not want to send failed row samples from your data source to Soda Cloud. In such a circumstance, you can disable the feature completely in Soda Cloud.
To prevent Soda Cloud from receiving any sample data or failed row samples for any datasets in any data sources to which you have connected your Soda Cloud account, proceed as follows:
- As an Admin, log in to your Soda Cloud account and navigate to your avatar > Organization Settings.
- In the Organization tab, check the box to “Disable collecting samples and failed rows for metrics in Soda Cloud”, then Save.
Alternatively, if you use Soda Library, you can adjust the configuration in your configuration.yml to disable all samples, as in the following example.
data_source my_datasource:
  type: postgres
  ...
  sampler:
    disable_samples: True
See also: Set a sample limit for a data source
Disable sampling for specific columns
For checks that implicitly or explicitly collect failed rows samples, you can add a configuration to your data source connection details to prevent Soda from collecting failed rows samples from specific columns that contain sensitive data.
Refer to Disable failed rows sampling for specific columns.
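As a hedged sketch of what such a data source configuration could look like: the exclude_columns key and the use of % wildcards in dataset and column names are assumptions based on the linked reference and should be verified there. The dataset and column names are hypothetical.

```yaml
data_source my_datasource:
  type: postgres
  ...
  sampler:
    # assumed key: maps dataset names to columns to leave out of samples
    exclude_columns:
      dim_customer: [email_address, last_name]
      # assumed wildcard support for dataset and column name patterns
      customer_%: [birthdate, credit%]
```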
Disable failed row samples for individual checks
For checks that implicitly or explicitly collect failed rows samples, you can set the samples limit to 0 to prevent Soda from collecting and sending failed rows samples for an individual check, as in the following example.
checks for dim_customer:
  - missing_percent(email_address) < 50:
      samples limit: 0
Specify columns for failed row sampling
Add a samples columns configuration to an individual check to specify the columns for which Soda must implicitly collect failed row sample values. Soda only collects the check’s failed row samples for the columns you specify in the list, as in the duplicate_count example below.
Soda implicitly collects failed row samples for the following checks:
- reference check
- checks that use a missing metric
- checks that use a validity metric
- checks that use a duplicate_count or duplicate_percent metric
Note that the comma-separated list of samples columns does not support wildcard characters (%).
checks for dim_customer:
  - duplicate_count(email_address) < 50:
      samples columns: [last_name, first_name]
See also: About failed row samples
Go further
- Sign up for a Soda Cloud account.
- Learn more about creating and tracking Soda Incidents.
- Need help? Join the Soda community on Slack.
Documentation always applies to the latest version of Soda products