【 SPEECH 】Text Mining in Accounting
Written by Oliver Pan
On May 19, 2025, the Financial Statement Analysis course at National Sun Yat-Sen University, coordinated by Professor Wil Martens, hosted a lecture by Ph.D. graduate Andrew Argue. Andrew applies natural language processing to study how firms position themselves, using examples from crowdfunding platforms and Foodpanda to illustrate the concept of “optimal distinctiveness.”
He demonstrated how to extract the “Management Discussion and Analysis” (MD&A) section from SEC filings, using Python or WRDS, and how to run sentiment classification on it, a walkthrough intended to help students overcome data-sourcing challenges. Next, Andrew outlined the text preprocessing pipeline (building a corpus, cleaning raw text, removing stop words, applying stemming or lemmatization, and tokenization), emphasizing the trade-off between speed and accuracy.
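The preprocessing steps named above can be sketched in a few lines of Python. This is a minimal illustration, not the code shown in the lecture: it uses only the standard library, and the stop-word list, the suffix-stripping rule, and the two sample sentences are hypothetical stand-ins for what NLTK or spaCy would provide. Tokenization is run before stop-word removal here because the filter operates on individual tokens.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in",
              "is", "are", "we", "our", "that", "for"}


def clean(text):
    """Lowercase the text and replace anything that is not a letter with a space."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text on whitespace."""
    return text.split()


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Clean, tokenize, filter stop words, then stem."""
    tokens = remove_stop_words(tokenize(clean(text)))
    return [naive_stem(t) for t in tokens]


# Hypothetical two-document corpus for illustration.
corpus = [
    "Revenues increased 12% in 2024, and we expanded our operations.",
    "The company is facing declining margins.",
]
processed = [preprocess(doc) for doc in corpus]
print(processed)
```

The crude stemmer is fast but produces non-words such as "revenu" and "fac"; a lemmatizer would return dictionary forms at a higher computational cost, which is one concrete form of the speed versus accuracy trade-off.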
Finally, he showed how a manager’s tone during unscripted Q&A sessions in earnings calls can predict short-term stock movements under high uncertainty, cautioning that sentiment models complement rather than replace numerical analysis. Students gained hands-on experience with text-cleaning techniques and plan to use Python libraries (e.g., NLTK, spaCy) to build their own sentiment-analysis workflows.
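This summary does not say how tone was measured in the lecture. One simple possibility, shown here purely as an assumption, is a word-list score: count positive and negative words in a manager's answer and take the normalized difference. The word lists, the function name net_tone, and the sample answer below are hypothetical; a real study would rely on an established finance dictionary or a trained classifier.

```python
import re

# Hypothetical mini word lists, for illustration only.
POSITIVE = {"strong", "growth", "improve", "improved", "confident", "gain"}
NEGATIVE = {"weak", "decline", "risk", "uncertain", "loss", "challenging"}


def net_tone(text):
    """Return (positive - negative) / (positive + negative), or 0.0 if no hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Hypothetical Q&A answer from an earnings call.
answer = "We are confident about growth, although the outlook remains uncertain."
print(net_tone(answer))
```

The score ranges from -1 (only negative words) to +1 (only positive words). It would sit alongside financial ratios as an additional input, not replace them.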