Data lineage is the process of tracking the flow of data over time, providing a clear understanding of where the data originated, how it has changed, and its ultimate destination within the data pipeline². Data lineage tools provide a record of data throughout its lifecycle, including source information and any data transformations that have been applied during any ETL or ELT processes².
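As a rough illustration of what such a record might contain, the sketch below models lineage as an ordered list of hops, each holding a source, a transformation, and a destination. The class names and the dataset names (`crm.customers`, `staging.customers_raw`, `warehouse.dim_customer`) are hypothetical and do not come from any particular lineage tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineageStep:
    """One hop in a dataset's history: where it came from, what was done, where it went."""
    source: str
    transformation: str
    destination: str


@dataclass
class LineageRecord:
    """Ordered hops for one dataset, from its origin to its final destination."""
    steps: List[LineageStep] = field(default_factory=list)

    def add(self, source: str, transformation: str, destination: str) -> None:
        self.steps.append(LineageStep(source, transformation, destination))

    def origin(self) -> str:
        # The source of the first recorded hop.
        return self.steps[0].source

    def final_destination(self) -> str:
        # The destination of the last recorded hop.
        return self.steps[-1].destination

    def describe(self) -> List[str]:
        # Human-readable summary, one line per hop.
        return [
            f"{s.source} --[{s.transformation}]--> {s.destination}"
            for s in self.steps
        ]


# Hypothetical two-step ETL flow.
record = LineageRecord()
record.add("crm.customers", "extract", "staging.customers_raw")
record.add("staging.customers_raw", "deduplicate on email", "warehouse.dim_customer")

for line in record.describe():
    print(line)
```

A real lineage tool would typically store this as a graph rather than a single chain, since one dataset can feed many others.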
Different Types of Data Lineage
Data lineage types are classified by how the lineage is generated, who its intended user is, and how it is documented¹. Different types exist because there are multiple questions you may want to ask about data and multiple stakeholders who can benefit from lineage visibility¹. For example, a lineage view that serves compliance is not necessarily the same one that serves root-cause analysis or data quality¹.
Business Data Lineage Types
Descriptive Data Lineage
Descriptive data lineage involves the manual documentation of data sources, transformations, and destinations using natural language or diagrams. This type of lineage is ideal for business users and analysts who want an easy-to-understand representation of their data pipeline. While descriptive data lineage is easy to communicate and understand, it can be prone to errors and inconsistencies and may be difficult to maintain and update.
Business Lineage
Business lineage involves the documentation of the business meaning, context, and impact of the data using business terms and rules. This type of lineage is useful for business users and analysts who want to understand data quality, compliance, and value. However, business lineage may not capture all the technical details or transformations in the data pipeline.
Technical Data Lineage Types
Automated Data Lineage
Automated data lineage involves the automated extraction of metadata from various data sources and systems using scanners or APIs. This type of lineage is useful for data engineers and developers who want accurate and consistent documentation that is easy to maintain and update. However, automated data lineage may not capture all the details or context and may require additional tools or integrations.
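To make the idea of automated extraction concrete, here is a deliberately naive sketch that pulls table-level lineage out of a single SQL statement with regular expressions. It is an illustration only: the function name and the table names are hypothetical, and a production scanner would use a full SQL parser and handle subqueries, CTEs, and multi-statement scripts, which this sketch does not.

```python
import re
from typing import Dict, Set

# Naive patterns for one INSERT ... SELECT statement (assumption: no subqueries or CTEs).
TARGET_RE = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_table_lineage(sql: str) -> Dict[str, Set[str]]:
    """Map each INSERT target table to the set of tables the statement reads from."""
    targets = TARGET_RE.findall(sql)
    sources = set(SOURCE_RE.findall(sql))
    # Copy the set per target so callers can modify one entry without affecting another.
    return {target: set(sources) for target in targets}


sql = """
INSERT INTO warehouse.daily_sales
SELECT o.order_date, SUM(o.amount)
FROM staging.orders o
JOIN staging.customers c ON c.id = o.customer_id
GROUP BY o.order_date
"""

lineage = extract_table_lineage(sql)
print(lineage)
```

The limitation noted above shows up here too: the output says which tables feed `warehouse.daily_sales`, but nothing about what the numbers mean to the business.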
Design Lineage
Design lineage involves the documentation of the intended design and structure of the data pipeline using schemas or diagrams. This type of lineage is ideal for data architects and modelers who want to plan and validate their data pipeline. However, design lineage may not reflect the actual implementation or later changes in the data pipeline.
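One way to surface the gap between design and implementation is to compare the designed edges against the edges actually observed in the running pipeline. The sketch below assumes both are available as sets of (source, destination) pairs; the function name and all dataset names are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (source dataset, destination dataset)


def lineage_drift(designed: Set[Edge], observed: Set[Edge]) -> Dict[str, Set[Edge]]:
    """Return edges present in only one of the two lineage views."""
    return {
        # Planned in the design but never seen in the running pipeline.
        "missing_in_implementation": designed - observed,
        # Seen in the running pipeline but absent from the design.
        "undocumented_in_design": observed - designed,
    }


designed_edges = {
    ("staging.orders", "warehouse.fact_orders"),
    ("staging.customers", "warehouse.dim_customer"),
}
observed_edges = {
    ("staging.orders", "warehouse.fact_orders"),
    ("staging.orders_archive", "warehouse.fact_orders"),
}

drift = lineage_drift(designed_edges, observed_edges)
print(drift)
```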
Operational Lineage
Operational lineage involves the documentation of the technical details, dependencies, and performance of the data pipeline using code or logs. This type of lineage is useful for data engineers and developers who want to debug, optimize, and monitor their data pipeline. However, operational lineage may not capture all the business meaning or context of the data.
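As a sketch of how operational lineage might be derived from logs, the example below parses job-run log lines into structured records and then asks which failed runs write to a given table. The log format (a timestamp followed by `key=value` pairs) is an assumption made for illustration, not the format of any specific scheduler, and the job and table names are hypothetical.

```python
from typing import Dict, List


def parse_run_log(line: str) -> Dict[str, object]:
    """Parse one log line of the assumed form '<timestamp> key=value key=value ...'."""
    timestamp, *pairs = line.split()
    fields = dict(pair.split("=", 1) for pair in pairs)
    return {
        "timestamp": timestamp,
        "job": fields["job"],
        "status": fields["status"],
        "duration_s": float(fields["duration_s"]),
        "inputs": fields["inputs"].split(","),
        "outputs": fields["outputs"].split(","),
    }


def failed_runs_writing(table: str, runs: List[Dict[str, object]]) -> List[str]:
    """Names of jobs that did not succeed and that list the table among their outputs."""
    return [
        str(run["job"])
        for run in runs
        if run["status"] != "success" and table in run["outputs"]
    ]


log_lines = [
    "2024-01-05T02:00:00 job=load_orders status=success duration_s=42.0 "
    "inputs=staging.orders outputs=warehouse.fact_orders",
    "2024-01-05T02:10:00 job=build_sales status=failed duration_s=3.5 "
    "inputs=warehouse.fact_orders,warehouse.dim_customer outputs=warehouse.daily_sales",
]

runs = [parse_run_log(line) for line in log_lines]
print(failed_runs_writing("warehouse.daily_sales", runs))
```

Because each record carries inputs, outputs, status, and duration, the same data can support debugging, dependency checks, and basic performance monitoring.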
Here is a table that summarizes some of the common types of data lineage, their target audience, features, benefits and shortcomings:
| Type | Target Audience | Features | Benefits | Shortcomings |
| --- | --- | --- | --- | --- |
| Descriptive data lineage | Business users and analysts | Manual documentation of data sources, transformations, and destinations using natural language or diagrams¹ | Easy to understand and communicate¹ | Prone to errors and inconsistencies; hard to maintain and update¹ |
| Automated data lineage | Data engineers and developers | Automated extraction of metadata from various data sources and systems using scanners or APIs¹ ⁵ | Accurate and consistent; easy to maintain and update¹ ⁵ | May not capture all the details or context; may require additional tools or integrations¹ ⁵ |
| Design lineage | Data architects and modelers | Documentation of the intended design and structure of the data pipeline using schemas or diagrams¹ ⁶ | Useful for planning and validating the data pipeline¹ ⁶ | May not reflect the actual implementation or changes in the data pipeline¹ ⁶ |
| Business lineage | Business users and analysts | Documentation of the business meaning, context, and impact of the data using business terms and rules¹ ⁷ | Useful for understanding the data quality, compliance, and value¹ ⁷ | May not capture all the technical details or transformations in the data pipeline¹ ⁷ |
| Operational lineage | Data engineers and developers | Documentation of the technical details, dependencies, and performance of the data pipeline using code or logs¹ ⁶ | Useful for debugging, optimizing, and monitoring the data pipeline¹ ⁶ | May not capture all the business meaning or context of the data¹ ⁶ |
Note: Superscript numbers refer to the numbered sources listed at the end of this article.
In addition to these types of data lineage, there is a related concept called data provenance, which refers to the first instance or source of the data². Data provenance is typically discussed alongside data lineage, but it focuses specifically on where the data came from and how it was created².
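The distinction can be sketched in code: given a lineage graph, provenance asks only for the original sources, not every intermediate hop. The example below assumes lineage is stored as a mapping from each dataset to the datasets it is derived from; the mapping, the function name, and the dataset names are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical lineage graph: each dataset maps to the datasets it is derived from.
UPSTREAM: Dict[str, List[str]] = {
    "warehouse.daily_sales": ["warehouse.fact_orders", "warehouse.dim_customer"],
    "warehouse.fact_orders": ["staging.orders"],
    "warehouse.dim_customer": ["staging.customers"],
    "staging.orders": ["erp.orders"],
    "staging.customers": ["crm.customers"],
}


def original_sources(dataset: str, upstream: Dict[str, List[str]]) -> Set[str]:
    """Walk upstream from a dataset and return the datasets that have no parents."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # Guard against revisiting shared or cyclic dependencies.
        seen.add(current)
        parents = upstream.get(current, [])
        if parents:
            stack.extend(parents)
        else:
            roots.add(current)  # No recorded parents: treat as an original source.
    return roots


print(original_sources("warehouse.daily_sales", UPSTREAM))
```

Note that a dataset with no recorded parents is reported as its own origin, which may simply mean its upstream lineage was never captured.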
In conclusion, understanding the different types of data lineage available to your organization is critical to maintaining a reliable and accurate data ecosystem. Each type has its own unique features, benefits, and limitations, and it's up to you to choose the type that best suits your organization's needs. By implementing data lineage best practices, you can ensure the quality, compliance, and value of your data, and make informed decisions based on reliable information.
(1) Ultimate Guide to Data Lineage | MANTA. https://getmanta.com/ultimate-guide-to-data-lineage.
(2) Data Lineage Done Right ⚡ - Data Lineage Tool - MANTA. https://getmanta.com/.
(3) Data Lineage Definition - What is data lineage? - Precisely. https://www.precisely.com/glossary/data-lineage.
(4) Data Lineage Metrics: Why is it Important? - Precisely. https://www.precisely.com/blog/datagovernance/what-is-data-lineage-metrics-and-why-is-it-important.
(5) Precisely - Better data. Better decisions. https://www.precisely.com/.
(6) Data Governance Technology Tools and Software | MANTA. https://getmanta.com/solutions/data-lineage-for-data-governance.
(7) About the MANTA Data Lineage Platform & Tool | MANTA. https://getmanta.com/about-the-manta-platform.