Module 4 – Lesson 2: Data Quality Dimensions

This lesson is all about data quality. You need a solid understanding of the quality dimensions of the data feeding AI systems, because that data directly affects the performance and reliability of AI models, including any prediction models you create in Salesforce. Why are we spending so much time on data? Because it is one of the core elements of a successful AI implementation. Remember: AI + Data + CRM + Trust.

The AI Fundamentals Podcast

Episode 16: Data Quality Dimensions

Data Quality Dimensions

Age

Definition: The currency or recency of data, which refers to how old the data is.

Importance: AI models perform better when they are trained on up-to-date information. Outdated data can lead to poor predictions and decisions.

Example: A sales forecast based on customer purchase data from five years ago might not be as relevant as data from the past quarter.
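
As a rough illustration of checking data age, the sketch below splits records into fresh and stale groups by their last-modified date. The field name `last_modified` and the 90-day threshold are assumptions chosen for this example, not Salesforce defaults.

```python
from datetime import date

def record_age_days(last_modified: date, today: date) -> int:
    """Return how many days have passed since the record was last modified."""
    return (today - last_modified).days

def split_by_age(records, today, max_age_days=90):
    """Split records into (fresh, stale) using a hypothetical age threshold."""
    fresh, stale = [], []
    for record in records:
        age = record_age_days(record["last_modified"], today)
        (fresh if age <= max_age_days else stale).append(record)
    return fresh, stale

# Hypothetical purchase records: one recent, one roughly five years old.
records = [
    {"id": "A1", "last_modified": date(2024, 5, 1)},
    {"id": "A2", "last_modified": date(2019, 6, 1)},
]
fresh, stale = split_by_age(records, today=date(2024, 6, 1))
print([r["id"] for r in fresh], [r["id"] for r in stale])
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check repeatable.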

Completeness

Definition: The degree to which all necessary data points are present.

Importance: Missing data can skew AI models, making them less reliable. Completeness ensures that the dataset has all the required fields for effective analysis.

Example: If customer records are missing key fields like email addresses, Einstein’s email marketing predictions will be less accurate.
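
One simple way to quantify completeness is the share of records in which each required field is filled in. The sketch below assumes records are plain dictionaries and treats `None`, empty strings, and whitespace-only strings as missing; the field names are illustrative.

```python
def completeness(records, required_fields):
    """Return the fraction of records with a non-empty value for each field."""
    total = len(records)
    scores = {}
    for field in required_fields:
        # None, "" and whitespace-only values count as missing.
        # Note: falsy values such as 0 are also treated as missing here.
        filled = sum(1 for r in records if str(r.get(field) or "").strip())
        scores[field] = filled / total if total else 0.0
    return scores

# Hypothetical customer records, two of which lack an email address.
customers = [
    {"name": "Ana Lopez", "email": "ana@example.com"},
    {"name": "Ben Ito", "email": ""},
    {"name": "Cy Park", "email": None},
    {"name": "Di Roy", "email": "di@example.com"},
]
scores = completeness(customers, ["name", "email"])
print(scores)
```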

Accuracy

Definition: The correctness of the data, meaning how closely it matches the real-world values it describes.

Importance: Inaccurate data can lead to wrong conclusions, making AI recommendations faulty. Ensuring accuracy means the data reflects real-world scenarios.

Example: A contact’s address being listed incorrectly could affect the accuracy of Einstein’s lead scoring model.

Consistency

Definition: The uniformity of data across different systems or datasets.

Importance: Inconsistent data can cause errors in AI models. Consistency ensures that the same information appears the same way wherever it is stored or accessed.

Example: A customer’s name being stored as “John Smith” in one system and “J. Smith” in another can lead to confusion and duplicated predictions.
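
A consistency check can be sketched as a field-by-field comparison of the same record in two systems. The example below assumes both systems share a customer ID and ignores differences in letter case and spacing; the system names and IDs are made up for illustration.

```python
def normalize(value):
    """Lowercase the value and collapse runs of whitespace."""
    return " ".join(str(value or "").lower().split())

def find_inconsistencies(system_a, system_b, field):
    """List (id, value_a, value_b) for shared IDs whose field values differ."""
    mismatches = []
    for key in sorted(system_a.keys() & system_b.keys()):
        a = system_a[key].get(field)
        b = system_b[key].get(field)
        if normalize(a) != normalize(b):
            mismatches.append((key, a, b))
    return mismatches

# Hypothetical exports from a CRM and a billing system, keyed by customer ID.
crm = {"C001": {"name": "John Smith"}, "C002": {"name": "Mary Jones"}}
billing = {"C001": {"name": "J. Smith"}, "C002": {"name": "mary  jones"}}
mismatches = find_inconsistencies(crm, billing, "name")
print(mismatches)
```

Only C001 is reported, because "Mary Jones" and "mary  jones" match once case and spacing are normalized.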

Duplication

Definition: The presence of repeated or redundant data entries.

Importance: Duplicates can skew AI models and inflate metrics, leading to inaccurate predictions. Data deduplication helps in maintaining clean, usable datasets.

Example: Two records for the same account could result in Einstein forecasting more revenue than is actually likely.
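
Duplicate detection often starts by grouping records on a normalized key. The sketch below groups hypothetical account records by name and website and returns the IDs that collide. Real matching rules are usually fuzzier than this exact-match approach.

```python
from collections import defaultdict

def duplicate_groups(records, key_fields):
    """Return lists of record IDs that share the same normalized key."""
    groups = defaultdict(list)
    for record in records:
        # Normalize case and whitespace so trivial variations still match.
        key = tuple(
            " ".join(str(record.get(f, "")).lower().split()) for f in key_fields
        )
        groups[key].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical accounts: 001 and 002 describe the same company.
accounts = [
    {"id": "001", "name": "Acme Corp", "website": "acme.example"},
    {"id": "002", "name": "acme corp ", "website": "ACME.example"},
    {"id": "003", "name": "Globex", "website": "globex.example"},
]
dupes = duplicate_groups(accounts, ["name", "website"])
print(dupes)
```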

Usage

Definition: The frequency and context in which the data is utilized.

Importance: Understanding how often and in what scenarios data is used helps ensure that only relevant data is considered for AI models, enhancing model performance.

Example: Regularly updated data from a CRM is more useful for predictive modeling than rarely accessed archival data.

Validity

Definition: The conformity of data to predefined formats and rules.

Importance: Invalid data can cause errors in model training and prediction. Validity ensures that the data is correct in structure and meaning.

Example: Storing phone numbers in one consistent format (+1 XXX-XXX-XXXX) helps automated tools such as Einstein Bots communicate with clients reliably.
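
A format rule like the one above can be expressed as a regular expression. This is a minimal sketch that checks only the +1 XXX-XXX-XXXX shape from the example; it does not verify that the number exists or handle other country formats.

```python
import re

# Matches exactly "+1 " followed by 3 digits, a hyphen, 3 digits, a hyphen, 4 digits.
PHONE_PATTERN = re.compile(r"\+1 \d{3}-\d{3}-\d{4}")

def is_valid_phone(value: str) -> bool:
    """Return True if the whole string matches the expected phone format."""
    return bool(PHONE_PATTERN.fullmatch(value))

# Illustrative inputs: only the first one follows the expected format.
for phone in ["+1 415-555-0100", "4155550100", "+1 415-555-01000"]:
    print(phone, is_valid_phone(phone))
```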

Uniqueness

Definition: The singularity of data points in the dataset, ensuring no duplication of records.

Importance: Uniqueness is vital to avoid redundancy, ensuring each data point contributes distinctly to AI models without skewing results.

Example: Having two identical customer profiles could lead to double counting in sales projections.
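
Uniqueness can be summarized as the ratio of distinct key values to total records, where 1.0 means every record is unique. The sketch below assumes email address is the identifying field, which is an illustrative choice rather than a rule.

```python
def uniqueness_ratio(records, key_field):
    """Return distinct key values divided by total records (1.0 = all unique)."""
    if not records:
        return 1.0
    keys = [record[key_field] for record in records]
    return len(set(keys)) / len(keys)

# Hypothetical customer profiles: one email address appears twice.
profiles = [
    {"email": "a@example.com"},
    {"email": "b@example.com"},
    {"email": "a@example.com"},
    {"email": "c@example.com"},
]
ratio = uniqueness_ratio(profiles, "email")
print(ratio)
```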

Timeliness

Definition: The relevance of data at the time it is accessed or used.

Importance: AI models need timely data to generate relevant predictions. If the data is outdated, the models may produce irrelevant insights.

Example: Real-time data from active sales campaigns can help Einstein Analytics provide more relevant sales trend insights than data from past campaigns.

Each of the dimensions discussed – Age, Completeness, Accuracy, Consistency, Duplication, Usage, Validity, Uniqueness, and Timeliness – plays an important role in ensuring that Einstein and other AI tools provide accurate and actionable insights. These dimensions also shape the responses from Agentforce whenever an agent action uses data to make decisions or provide responses.

Now Drop In To Focus

What is “Age” in data quality?
Age refers to the currency or recency of data. Up-to-date information ensures AI models make relevant predictions.
Why is “Completeness” important?
Completeness ensures all necessary data points are present. Missing data can skew AI models and reduce reliability.
What does “Accuracy” mean in data quality?
Accuracy refers to the correctness of data. Inaccurate data leads to faulty AI recommendations and wrong conclusions.
Why is “Consistency” critical?
Consistency ensures uniform data across systems. Inconsistencies can lead to errors and duplicated predictions in AI.
What is “Duplication” and its impact?
Duplication refers to redundant data entries. It can skew AI metrics, inflate results, and lead to inaccurate predictions.
How does “Usage” affect AI models?
Usage refers to how often and in what context data is utilized. Frequent, relevant data enhances AI performance.
What does “Validity” mean?
Validity ensures data conforms to predefined formats. Invalid data can cause errors in AI model training and predictions.
Why is “Timeliness” important?
Timeliness ensures data is relevant when accessed. Outdated data can lead to irrelevant AI insights and predictions.

Want to see how your data quality stacks up? Salesforce offers an app that does just that: the Data Quality Analysis Dashboards app.

Quiz Time!

Take this quiz to test your knowledge!
