Data lineage is the process of tracking the flow of data over time, providing a clear understanding of where the data originated, how it has changed, and its ultimate destination within the data pipeline². Data lineage tools provide a record of data throughout its lifecycle, including source information and any data transformations that have been applied during any ETL or ELT processes².
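One way to picture this definition is as a directed graph in which datasets are nodes and transformations are edges. The sketch below is illustrative only: the dataset names, the transformation labels, and the helper functions `build_upstream_index` and `trace_origins` are all hypothetical, and the pipeline is assumed to contain no cycles.

```python
from collections import defaultdict

# Hypothetical pipeline: each edge is (source, transformation, target).
EDGES = [
    ("crm.customers", "deduplicate", "staging.customers"),
    ("erp.orders", "filter_cancelled", "staging.orders"),
    ("staging.customers", "join_on_customer_id", "mart.customer_orders"),
    ("staging.orders", "join_on_customer_id", "mart.customer_orders"),
]


def build_upstream_index(edges):
    """Map each dataset to the (source, transformation) pairs that feed it."""
    index = defaultdict(list)
    for source, transformation, target in edges:
        index[target].append((source, transformation))
    return index


def trace_origins(dataset, index):
    """Return the datasets with no upstream inputs that ultimately feed `dataset`.

    Assumes the lineage graph is acyclic.
    """
    parents = index.get(dataset, [])
    if not parents:
        return {dataset}
    origins = set()
    for source, _transformation in parents:
        origins |= trace_origins(source, index)
    return origins


upstream = build_upstream_index(EDGES)
print(sorted(trace_origins("mart.customer_orders", upstream)))
# prints ['crm.customers', 'erp.orders']
```

Walking the same index in the opposite direction would answer the "ultimate destination" question, which is the basis of impact analysis.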

Different Types of Data Lineage

Data lineage types are classified by how the lineage is generated, who its intended user is, and how it is documented¹. Different types exist because there are many questions you may want to ask about your data, and many stakeholders who benefit from lineage visibility¹. For example, a lineage view built for compliance is not necessarily the same one you would use for root-cause analysis or data quality¹.

Business Data Lineage Types

Descriptive Data Lineage

Descriptive data lineage involves the manual documentation of data sources, transformations, and destinations using natural language or diagrams. This type of lineage is ideal for business users and analysts who want an easy-to-understand representation of their data pipeline. While descriptive data lineage is easy to communicate and understand, it can be prone to errors and inconsistencies and may be difficult to maintain and update.

Business Lineage

Business lineage involves the documentation of the business meaning, context, and impact of the data using business terms and rules. This type of lineage is useful for business users and analysts who want to understand data quality, compliance, and value. However, business lineage may not capture all the technical details or transformations in the data pipeline.

Technical Data Lineage Types

Automated Data Lineage

Automated data lineage involves the automated extraction of metadata from various data sources and systems using scanners or APIs. This type of lineage is useful for data engineers and developers who want accurate and consistent documentation that is easy to maintain and update. However, automated data lineage may not capture all the details or context and may require additional tools or integrations.
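As a rough illustration of what a scanner does, the sketch below pulls table-level lineage out of a single SQL statement. It is a deliberately simplified assumption of how extraction might work: the regular expressions, the `extract_table_lineage` helper, and the table names are hypothetical, and the approach ignores subqueries, CTEs, comments, and quoted identifiers. Production scanners generally rely on a full SQL parser rather than pattern matching.

```python
import re

# Finds the table being written to by "INSERT INTO <table>" or "CREATE TABLE <table>".
TARGET_RE = re.compile(r"\b(?:INSERT\s+INTO|CREATE\s+TABLE)\s+([\w.]+)", re.IGNORECASE)
# Finds tables being read via "FROM <table>" or "JOIN <table>".
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_table_lineage(sql):
    """Return sorted (source_table, target_table) pairs for one SQL statement."""
    target_match = TARGET_RE.search(sql)
    if target_match is None:
        return []  # a plain SELECT writes nothing, so there is no lineage edge
    target = target_match.group(1)
    sources = sorted(set(SOURCE_RE.findall(sql)))
    return [(source, target) for source in sources if source != target]


SQL = """
INSERT INTO mart.customer_orders
SELECT c.customer_id, o.order_total
FROM staging.customers AS c
JOIN staging.orders AS o ON o.customer_id = c.customer_id
"""

for source, target in extract_table_lineage(SQL):
    print(f"{source} -> {target}")
# prints:
# staging.customers -> mart.customer_orders
# staging.orders -> mart.customer_orders
```

The gaps in this sketch mirror the limitation noted above: anything the scanner cannot parse, such as logic hidden in application code, never reaches the lineage record.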

Design Lineage

Design lineage involves the documentation of the intended design and structure of the data pipeline using schemas or diagrams. This type of lineage is ideal for data architects and modelers who want to plan and validate their data pipeline. However, design lineage may not reflect the actual implementation or later changes in the data pipeline.
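The gap between intended and actual lineage can be checked with a simple set comparison. The sketch below assumes both the designed and the observed lineage are available as sets of (source, target) pairs; the table names and the `lineage_drift` helper are hypothetical.

```python
# Hypothetical lineage edges as (source, target) pairs.
DESIGNED = {
    ("staging.customers", "mart.customer_orders"),
    ("staging.orders", "mart.customer_orders"),
}
OBSERVED = {
    ("staging.customers", "mart.customer_orders"),
    ("staging.refunds", "mart.customer_orders"),
}


def lineage_drift(designed, observed):
    """Report edges that exist only in the design or only in the running pipeline."""
    return {
        "missing_in_implementation": sorted(designed - observed),
        "undocumented_in_design": sorted(observed - designed),
    }


drift = lineage_drift(DESIGNED, OBSERVED)
print(drift["missing_in_implementation"])
# prints [('staging.orders', 'mart.customer_orders')]
print(drift["undocumented_in_design"])
# prints [('staging.refunds', 'mart.customer_orders')]
```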

Operational Lineage

Operational lineage involves the documentation of the technical details, dependencies, and performance of the data pipeline using code or logs. This type of lineage is useful for data engineers and developers who want to debug, optimize, and monitor their data pipeline. However, operational lineage may not capture all the business meaning or context of the data.
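To make the "code or logs" part concrete, the sketch below derives lineage edges and basic run diagnostics from job logs. It assumes each job run emits one JSON log line; the field names (`job`, `inputs`, `outputs`, `seconds`, `status`) and the job names are hypothetical and do not correspond to any particular tool's log format.

```python
import json

# Hypothetical run logs: one JSON object per job run.
LOG_LINES = [
    '{"job": "load_customers", "inputs": ["crm.customers"], '
    '"outputs": ["staging.customers"], "seconds": 42.0, "status": "ok"}',
    '{"job": "load_orders", "inputs": ["erp.orders"], '
    '"outputs": ["staging.orders"], "seconds": 310.5, "status": "ok"}',
    '{"job": "build_mart", "inputs": ["staging.customers", "staging.orders"], '
    '"outputs": ["mart.customer_orders"], "seconds": 95.2, "status": "failed"}',
]


def parse_runs(lines):
    """Decode each log line into a dict describing one job run."""
    return [json.loads(line) for line in lines]


def edges_from_runs(runs):
    """Expand each run into (input, job, output) lineage edges."""
    return [
        (source, run["job"], target)
        for run in runs
        for source in run["inputs"]
        for target in run["outputs"]
    ]


def failed_jobs(runs):
    """Names of jobs whose run did not finish with status 'ok'."""
    return [run["job"] for run in runs if run["status"] != "ok"]


def slowest_job(runs):
    """Name of the job with the longest recorded run time."""
    return max(runs, key=lambda run: run["seconds"])["job"]


runs = parse_runs(LOG_LINES)
print(len(edges_from_runs(runs)))  # prints 4
print(failed_jobs(runs))           # prints ['build_mart']
print(slowest_job(runs))           # prints load_orders
```

The failed and slowest jobs support debugging and optimization, while the derived edges show which downstream datasets a failure affects.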

The following table summarizes the common types of data lineage, their target audiences, features, benefits, and shortcomings:

| Type | Target Audience | Features | Benefits | Shortcomings |
| --- | --- | --- | --- | --- |
| Descriptive data lineage | Business users and analysts | Manual documentation of data sources, transformations, and destinations using natural language or diagrams¹ | Easy to understand and communicate¹ | Prone to errors and inconsistencies; hard to maintain and update¹ |
| Automated data lineage | Data engineers and developers | Automated extraction of metadata from various data sources and systems using scanners or APIs¹⁵ | Accurate and consistent; easy to maintain and update¹⁵ | May not capture all the details or context; may require additional tools or integrations¹⁵ |
| Design lineage | Data architects and modelers | Documentation of the intended design and structure of the data pipeline using schemas or diagrams¹⁶ | Useful for planning and validating the data pipeline¹⁶ | May not reflect the actual implementation or changes in the data pipeline¹⁶ |
| Business lineage | Business users and analysts | Documentation of the business meaning, context, and impact of the data using business terms and rules¹⁷ | Useful for understanding the data quality, compliance, and value¹⁷ | May not capture all the technical details or transformations in the data pipeline¹⁷ |
| Operational lineage | Data engineers and developers | Documentation of the technical details, dependencies, and performance of the data pipeline using code or logs¹⁶ | Useful for debugging, optimizing, and monitoring the data pipeline¹⁶ | May not capture all the business meaning or context of the data¹⁶ |

Note: superscript numbers refer to the numbered sources listed at the end of this article.

In addition to these types of data lineage, there is a related concept called data provenance, which refers to the first instance or source of the data². Data provenance is usually discussed alongside data lineage, but it focuses specifically on where the data came from and how it was created².

In conclusion, understanding the types of data lineage available to your organization is critical to maintaining a reliable and accurate data ecosystem. Each type has its own features, benefits, and limitations, so choose the type, or combination of types, that best suits your organization's needs. Implementing data lineage best practices helps you protect the quality, compliance, and value of your data and make informed decisions based on reliable information.

(1) Ultimate Guide to Data Lineage | MANTA. https://getmanta.com/ultimate-guide-to-data-lineage.

(2) Data Lineage Done Right ⚡ - Data Lineage Tool - MANTA. https://getmanta.com/.

(3) Data Lineage Definition - What is data lineage? - Precisely. https://www.precisely.com/glossary/data-lineage.

(4) Data Lineage Metrics: Why is it Important? - Precisely. https://www.precisely.com/blog/datagovernance/what-is-data-lineage-metrics-and-why-is-it-important.

(5) Precisely - Better data. Better decisions. https://www.precisely.com/.

(6) Data Governance Technology Tools and Software | MANTA. https://getmanta.com/solutions/data-lineage-for-data-governance.

(7) About the MANTA Data Lineage Platform & Tool | MANTA. https://getmanta.com/about-the-manta-platform.
