Text Mining in Excel

This morning, I assembled a quick text-parsing tool to help me quickly analyze the contents of a text document.

Process

  1. Select Third Avenue Funds 2015 Semi-Annual Report
  2. Parse word frequency utilizing WriteWords Word Frequency Counter
  3. Input values in xls
  4. Source stopwords [1] list from Ranks NL
  5. Input stopwords in xls
  6. Exclude stopwords from selected text parse
  7. Sort words by frequency

Output

cata.3rdave

Use Cases

  • Input N # of pdfs to analyze key themes amongst a range of reports (eg, sector commentary (energy, technology, healthcare); specific equity commentary)
  • Conduct a look-back analysis to understand the frequency by which key themes occur over a specific period of time (eg, corporate actions (restructuring programs, buybacks, dividends) to understand key drivers of performance going forward
  • Correlation analysis: which themes correlate with greatest increases in stock and greatest drawdown? (eg, mentioning of poor weather conditions for Motorcycle sales for Harley Davidson)
  • Quickly source and isolate specific equities mentioned over a period of time for a specific fund (analyze investor reports from Q1 2010 to Q4 2014) and understand how investment strategies evolve over time

Back-solving

  • Input word or phrase and observe frequency of occurrence over time period over a range of select documents

[1] Also a good reference: Text Classification for Sentiment Analysis – Stopwords and Collocations