Understand Your Data's Complete Story

 

Automated Code-Level Lineage

 

  

 

Understanding the lineage of data is a crucial aspect of data management. To achieve this, various data lineage techniques have been developed, each with its own strengths and weaknesses.

Trying to make the right choices about data lineage? Read the ultimate guide to data lineage.

Three primary data lineage techniques

The three primary categories of data lineage techniques are pattern-based, code-based, and hybrid.

  • Pattern-based techniques use metadata and rules to infer data relationships and dependencies. For example, a tool may find columns with the same name in two tables and infer a link between them.

  • Code-based techniques, on the other hand, analyze the code or scripts responsible for creating or transforming data.

  • Hybrid techniques, as their name suggests, combine the pattern-based and code-based approaches to offer a more comprehensive understanding of data lineage. MANTA takes this approach because it is the only one of the three that gives a complete view of lineage.
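To make the difference concrete, here is a minimal Python sketch that runs a pattern-based and a code-based pass over the same toy example. Everything in it is an assumption made for illustration: the table names, the columns, the SQL script, and the helper functions are hypothetical, and the regex handles only a single simple INSERT ... SELECT shape. It is not how MANTA or any other lineage tool works internally; real code-based tools use full parsers for each language.

```python
import re
from itertools import combinations

# Hypothetical schema metadata: table name -> column names.
SCHEMA = {
    "crm.customers": ["customer_id", "email", "created_at"],
    "dw.dim_customer": ["customer_id", "email_hash", "created_at"],
    "dw.audit_log": ["event_id", "created_at"],
}

# Hypothetical transformation script that loads dw.dim_customer.
SCRIPT = """
INSERT INTO dw.dim_customer (customer_id, email_hash, created_at)
SELECT customer_id, SHA2(email, 256), created_at
FROM crm.customers;
"""

# Matches only the simple "INSERT INTO t (cols) SELECT exprs FROM s" shape.
INSERT_SELECT = re.compile(
    r"INSERT\s+INTO\s+(?P<target>[\w.]+)\s*\((?P<tcols>[^)]*)\)\s*"
    r"SELECT\s+(?P<exprs>.*?)\s+FROM\s+(?P<source>[\w.]+)",
    re.IGNORECASE | re.DOTALL,
)


def pattern_based_links(schema):
    """Guess links from metadata only: same column name in two tables."""
    links = set()
    for (t1, cols1), (t2, cols2) in combinations(sorted(schema.items()), 2):
        for col in set(cols1) & set(cols2):
            links.add((f"{t1}.{col}", f"{t2}.{col}"))
    return links


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def code_based_links(script, schema):
    """Read links from the code: (source column, target column) -> expression."""
    links = {}
    for m in INSERT_SELECT.finditer(script):
        target, source = m["target"], m["source"]
        target_cols = [c.strip() for c in m["tcols"].split(",")]
        for tcol, expr in zip(target_cols, split_top_level(m["exprs"])):
            # Any identifier in the expression that is a source column feeds tcol.
            for name in re.findall(r"[A-Za-z_]\w*", expr):
                if name in schema.get(source, []):
                    links[(f"{source}.{name}", f"{target}.{tcol}")] = expr
    return links


if __name__ == "__main__":
    print("Pattern-based guesses:")
    for link in sorted(pattern_based_links(SCHEMA)):
        print("  ", link)
    print("Code-based findings:")
    for link, expr in sorted(code_based_links(SCRIPT, SCHEMA).items()):
        print("  ", link, "via", expr)
```

In this toy case the pattern-based pass links every created_at column to every other one, including the unrelated dw.audit_log table, and it misses the email to email_hash flow because the names differ. The code-based pass finds that flow and records the SHA2 expression behind it, but it only sees what the script actually contains.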

By applying these techniques, organizations can trace where their data comes from and how it is transformed. That makes it easier to verify the integrity and accuracy of the data and to base decisions on reliable information.

Pros and cons of these alternative data lineage techniques

Pattern-based

Pros:

  • Fast and easy to implement.

  • Does not require access to code or scripts.

  • Technology independent.

  • May identify manual transformations such as importing a file.

Cons:

  • May not capture all data flows and dependencies.

  • May produce false positives or negatives.

  • May not reflect the latest changes in code or data.

  • Will lose details such as actual transformation logic applied.

Code-based

Pros:

  • Provides the most accurate and complete representation of data flows and transformations.

  • Reflects the current state of code and data.

  • Can handle various code sources and languages.

  • Can identify indirect data flows and dependencies.

  • Captures all the details of business logic such as aggregations and transformations.

Cons:

  • May be slow and difficult to implement.

  • Requires access to code or scripts and the ability to harvest them.

  • May not capture metadata or business context.

Hybrid

Pros:

  • Combines the strengths of pattern-based and code-based techniques.

  • Provides a comprehensive and accurate view of data lineage.

  • Integrates metadata and code analysis for better data understanding.

Cons:

  • May be more complex and costly to implement.

  • Requires access to both metadata and code or scripts.

  • May require manual validation or reconciliation of results.
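The reconciliation step listed under the hybrid cons can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any product: the edge sets, the table and file names, and the reconcile function are all hypothetical. The idea is that an edge found by both techniques is treated as confirmed, an edge found only in code is kept as is, and an edge found only by pattern matching is queued for a person to check, since it could be either a false positive or a real manual step that no script records.

```python
# Hypothetical lineage edges as (source, target) pairs.
PATTERN_EDGES = {
    ("crm.customers.customer_id", "dw.dim_customer.customer_id"),
    ("crm.customers.created_at", "dw.audit_log.created_at"),  # name match only
    ("exports/customers.csv", "crm.customers"),  # manual file import
}
CODE_EDGES = {
    ("crm.customers.customer_id", "dw.dim_customer.customer_id"),
    ("crm.customers.email", "dw.dim_customer.email_hash"),  # renamed in code
}


def reconcile(pattern_edges, code_edges):
    """Merge both edge sets and label each edge by the evidence behind it."""
    report = {}
    for edge in pattern_edges | code_edges:
        if edge in pattern_edges and edge in code_edges:
            report[edge] = "confirmed"
        elif edge in code_edges:
            report[edge] = "code-only"
        else:
            # Pattern-only: a possible false positive or an undocumented
            # manual step, so a person has to decide.
            report[edge] = "needs-review"
    return report


if __name__ == "__main__":
    for edge, status in sorted(reconcile(PATTERN_EDGES, CODE_EDGES).items()):
        print(f"{status:13} {edge[0]} -> {edge[1]}")
```

The labels are a design choice for this sketch. A real tool might instead attach a confidence score to each edge or keep the transformation expression alongside code-derived edges.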

 

 

Are you tired of manually tracking data lineage?

It's time-consuming, prone to errors, and often incomplete. But don't worry, there's a solution: data lineage tools.

With these tools, the process of discovering, documenting, and visualizing your data lineage can be automated, saving you time and providing a more accurate and complete view of your data flows. What's more, data lineage tools can integrate seamlessly with other data governance solutions, making it easier than ever to manage your data effectively.

 

Get Started Today!

   

Phone: +27 11 485 4856