How to customize IBM Watson Knowledge Catalog to support any kind of external assets and improve your data governance.

Photo by DreamQuest on pixabay

In a previous article I showed how to automatically catalog a large number of data sets by using IBM Cloud Pak for Data, and in particular Watson Knowledge Catalog.

A good enterprise catalog is the keystone of data governance. It is the place where the existence of all data and governance assets can be documented, as well as their relationships with each other. Capturing the relationships between the assets is essential in order to answer data lineage questions, determine the dependency graph of any given asset, or do an impact analysis.
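To make the idea of an impact analysis concrete, here is a minimal sketch that treats the catalog's relationships as a directed graph and walks it breadth-first. The asset names and the `DOWNSTREAM` mapping are hypothetical and invented for illustration only; this is not how Watson Knowledge Catalog stores or queries relationships.

```python
from collections import deque

# Hypothetical relationships: each asset maps to the assets derived from it.
DOWNSTREAM = {
    "crm.customers": ["staging.customers_clean"],
    "staging.customers_clean": ["mart.churn_features", "report.customer_counts"],
    "mart.churn_features": ["model.churn_v1"],
    "report.customer_counts": [],
    "model.churn_v1": [],
}


def impacted_assets(graph, start):
    """Return every asset reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:  # guard against cycles and repeated paths
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(impacted_assets(DOWNSTREAM, "staging.customers_clean"))
# ['mart.churn_features', 'report.customer_counts', 'model.churn_v1']
```

Reversing the direction of the edges would answer the lineage question instead: which upstream assets a given asset depends on.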

For example, a well maintained catalog should help…

Take the right action on your data, based on what the data really represent and not on what you think they are

Photo by Annie Spratt on Unsplash

In a previous article I showed how to create, with IBM Cloud Pak for Data, an automatic process to discover data and ingest them into a catalog while enforcing governance policies. One of the key elements of this process is the ability to recognize what kind of data are ingested. This is called Data Classification, not to be confused with classification in the ML context.

In this article I will go deeper in this particular…

Getting Started

Understand the standard data quality dimensions used by IBM Cloud Pak for Data and IBM InfoSphere Information Server.

Image by saulhm from Pixabay (https://pixabay.com/users/saulhm-31267/)

In a previous article, I explained in detail how IBM Cloud Pak for Data and IBM InfoSphere Information Analyzer compute a unified data quality score for each analyzed data set.

In short, the data quality score of a data set is computed by applying algorithms which look for different types of data quality issues. A data quality issue is identified whenever data do not fulfil a given expectation. Such issues can be reported for individual cells of the data set, for complete rows, for columns, or for the data set as a whole. …
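As a rough illustration of the idea, the sketch below flags cell-level issues and reports the share of cells without any issue. It is a deliberately simplified assumption, not the formula used by IBM Cloud Pak for Data or Information Analyzer; the function names, checks, and sample rows are all hypothetical.

```python
def is_missing(value):
    """A cell counts as missing when it is None or an empty string."""
    return value is None or value == ""


def quality_score(rows, checks):
    """Share of cells without a flagged issue (assumed scoring rule).

    rows:   list of dicts, one per row
    checks: column name -> predicate returning True when the cell has an issue
    """
    total = 0
    flagged = 0
    for row in rows:
        for column, value in row.items():
            total += 1
            check = checks.get(column)
            if check is not None and check(value):
                flagged += 1
    return 1.0 if total == 0 else 1 - flagged / total


rows = [
    {"name": "Ada", "age": 36},
    {"name": "", "age": 41},    # missing name
    {"name": "Bob", "age": -5},  # age outside the expected range
    {"name": "Eve", "age": 29},
]
checks = {
    "name": is_missing,
    "age": lambda v: is_missing(v) or v < 0,
}

print(quality_score(rows, checks))  # 0.75: 2 of 8 cells are flagged
```

Issues reported for whole rows, columns, or the data set would need their own weighting, which this sketch leaves out.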

From individual data quality metrics to a unified score.

In this article, I will explain the concepts behind computing a unified data quality score as it is used in IBM Cloud Pak for Data and IBM Information Server / Information Analyzer to quantify the quality of structured data.

Picture by tookapic on https://pixabay.com/users/tookapic-1386459/

The need for a simple data quality score

Measuring data quality is not a new field. IBM Information Analyzer and other data profiling tools have been on the market for more than a decade to help data engineers better understand what they have in their data and what they may have to fix. …

How to ingest data sources into IBM Watson Knowledge Catalog while complying with the governance rules

Picture by Bo Mei on https://pixabay.com/users/bomei615-2623913/

AI projects, and data analytics in general, require good data in order to be successful. On the other hand, the large amount of data that exist in any organization makes them difficult to find, and governance policies and regulations make them difficult to share.

In this article, I am going to show how you can use IBM Cloud Pak for Data to solve that problem by cataloging a large number of data sets in a short amount of time and making them available to users, while automatically ensuring that data protection policies are enforced.

In order to set…

Yannick Saillet

Software Architect, Master Inventor @IBM — Architect for Data Profiling and Data Quality in Watson Knowledge Catalog on IBM Cloud Pak for Data, IBM Cloud.
