This morning, I assembled a quick text-parsing tool to help me quickly analyze the contents of a text document.
Process
- Select Third Avenue Funds 2015 Semi-Annual Report
- Parse word frequency utilizing WriteWords Word Frequency Counter
- Input values in xls
- Source stopwords [1] list from Ranks NL
- Input stopwords in xls
- Exclude stopwords from selected text parse
- Sort words by frequency
Output
Use Cases
- Input N # of pdfs to analyze key themes amongst a range of reports (eg, sector commentary (energy, technology, healthcare); specific equity commentary)
- Conduct a look-back analysis to understand the frequency by which key themes occur over a specific period of time (eg, corporate actions (restructuring programs, buybacks, dividends) to understand key drivers of performance going forward
- Correlation analysis: which themes correlate with greatest increases in stock and greatest drawdown? (eg, mentioning of poor weather conditions for Motorcycle sales for Harley Davidson)
- Quickly source and isolate specific equities mentioned over a period of time for a specific fund (analyze investor reports from Q1 2010 to Q4 2014) and understand how investment strategies evolve over time
Back-solving
- Input word or phrase and observe frequency of occurrence over time period over a range of select documents
[1] Also a good reference: Text Classification for Sentiment Analysis – Stopwords and Collocations