Automated Data Lineage Generation for Risk Models for a Multinational Banking and Financial Services Corporation
Client : A multinational banking and financial services corporation
Objective
- Analyze model execution scripts to understand and map data flow and lineage across various systems
- Identify inconsistencies such as circular referencing of feeds, sourcing from non-authorized data, white paper non-compliance, etc.
CRISIL's solution
Model Scripts
- SQL
- SAS
Automated Parser for SQL Scripts
Manual Parsing for SAS Scripts
Template to Record Data Flow
- Version
- Script Name
- Source Database, Table and Column
- Target Database, Table and Column
- Transformation applied (For e.g., multiplication by defined rates, categorical transformations, etc.)
Data Lineage Tool
- Ability to query on model and version
- Map data control checks such as circular referencing of feeds, sourcing and white paper inconsistencies, etc.
- Central repository for data dictionary
Data Flow Diagram
- Visual flow diagram representation of how data is sourced, consumed and processed in a model
- Directed graph representation enables quick audit and validation of the model process
- Technology Stack:
- Parser for SQL build using Python
- Data flow diagram build using NetworkX library in Python
Client impact
- Accurately map data lineage and version control across model scripts
- Ability to map and tag data control checks
- Enable quick data audits by visual representation
- Automated data dictionary generation and audits
Questions