Using natural language processing (NLP) on quarterly earnings call transcripts to generate signals and insights on the airlines sector for a US-based investment bank
Client: US-based investment bank
Objective
- Leverage analytical tools and technologies on company filings.
- Analyse quarterly earnings call transcripts to generate signals and insights for the airlines sector
Platforms used
- Python, NLP algorithms
CRISIL's solution
- Data extraction: Set up scripts to convert statements to text format
- Data processing
- Pre-processing of statements to remove stop words and special characters
- Lexicon normalisation using lemmatisation
- Feature extraction
- Extracted meaningful word pairs (bi-grams) using TF-IDF to represent the major challenges and factors affecting the airlines sector (financial, competition, fuel price, fares/surcharges, customer satisfaction)
- Variable creation
- Prepared a set of independent variables pertaining to the above factors using:
- Word frequencies and sentence counts on presentation and Q&A sections
- Sentiment on sentences by the speaker containing important factors
- Tone of executives and analysts
- Output
- Created derived variables for year-on-year (YoY) change (%) to observe the impact over time
- Data science
- Applied dimension reduction techniques for variable reduction
- Created a chart for each individual feature over time to draw insights
- Built a regression model and back-tested the model
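The processing and feature-extraction steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the solution's actual code: the stop-word set, the `LEMMAS` lookup table (a stand-in for a real lemmatiser), the function names and the sample sentences are all hypothetical, and the TF-IDF formula is the plain unsmoothed variant (term frequency times log(N / document frequency)).

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny resources. A real pipeline would use a full
# stop-word list and a proper lemmatiser from an NLP library.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on",
              "we", "our", "is", "are", "was", "were", "by", "for"}
LEMMAS = {"prices": "price", "fares": "fare", "costs": "cost",
          "surcharges": "surcharge", "rose": "rise"}


def preprocess(text):
    """Lower-case, strip special characters, drop stop words, normalise word forms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]


def bigrams(tokens):
    """Adjacent word pairs, formed after stop words have been removed."""
    return list(zip(tokens, tokens[1:]))


def tfidf_bigrams(documents):
    """Return one {bigram: tf-idf score} dict per document."""
    counts = [Counter(bigrams(preprocess(doc))) for doc in documents]
    n_docs = len(counts)
    # Document frequency: in how many documents each bigram appears.
    doc_freq = Counter(bg for c in counts for bg in c)
    scores = []
    for c in counts:
        total = sum(c.values()) or 1
        scores.append({
            bg: (n / total) * math.log(n_docs / doc_freq[bg])
            for bg, n in c.items()
        })
    return scores


# Invented sentences standing in for transcript text.
transcripts = [
    "Fuel prices rose and fuel prices hurt our margins.",
    "Customer satisfaction improved, and fares were stable.",
    "Competition on fares increased while fuel prices were flat.",
]
scores = tfidf_bigrams(transcripts)

# Show the three highest-scoring bigrams of the second transcript.
for pair, score in sorted(scores[1].items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(score, 3))
```

Because the score multiplies by log(N / document frequency), a bigram that occurs in every transcript scores zero, so pairs that distinguish one call from the others rise to the top.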
Client impact
- Certain features showed a strong correlation with adjusted returns
- Feature charts over time provided insights into the impact on the airlines sector
- Performed sentiment analysis, reduced dimensionality, and built and back-tested a regression model to drive the analysis
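As a rough illustration of how a YoY change (%) variable and its correlation with adjusted returns might be computed, consider the sketch below. It uses only the Python standard library. The function names, the quarterly layout (four periods per year) and every number are hypothetical placeholders, not figures or results from this engagement.

```python
import math


def yoy_change_pct(values, periods_per_year=4):
    """Percentage change versus the same period one year earlier.

    Returns None where no prior-year value exists or the base is zero.
    """
    out = []
    for i, value in enumerate(values):
        if i < periods_per_year or values[i - periods_per_year] == 0:
            out.append(None)
        else:
            base = values[i - periods_per_year]
            out.append((value - base) / abs(base) * 100.0)
    return out


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Invented quarterly counts of one feature (e.g. mentions of a bigram)
# over two years, followed by invented adjusted returns (%) for year two.
mentions = [10, 12, 8, 10, 15, 12, 10, 5]
adjusted_returns = [-4.0, 1.0, -2.0, 2.5]

yoy = yoy_change_pct(mentions)
feature = [v for v in yoy if v is not None]  # keep quarters with a prior-year base
print(yoy)
print(round(pearson(feature, adjusted_returns), 3))
```

A correlation computed on a handful of quarters like this is only a screening signal; in the workflow described above, such features fed a regression model that was then back-tested.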