Monday, January 14, 2013

Language Analysis

I have been working for a while on a project that analyzes different modalities of the English language using Google Books, Twitter, IRC chat, and phone conversations. The project is really quite interesting and has taught me a fair amount about statistics and linguistics. Lately I have been building a confusion matrix across the different corpora to determine how similar or different these modalities of the language are. I am finally wrapping up the project with what I consider respectable data and respectable results.
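To give a rough idea of what a corpus-by-corpus comparison matrix looks like, here is a minimal sketch. It is not the method used in this project: it assumes each corpus is a plain string, uses naive whitespace tokenization, and scores each pair of corpora by the cosine similarity of their word-frequency vectors (a swapped-in, illustrative measure). The helper names (`word_counts`, `cosine`, `similarity_matrix`) and the toy corpora are hypothetical.

```python
from collections import Counter
from math import sqrt


def word_counts(text):
    # Naive tokenization (an assumption): lowercase, then split on whitespace.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    # Indexing a Counter with a missing word returns 0 without inserting it.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(corpora):
    # corpora: dict mapping a corpus name to its raw text.
    # Returns the sorted names and a square matrix where entry [i][j]
    # is the similarity between corpus i and corpus j.
    names = sorted(corpora)
    counts = {name: word_counts(corpora[name]) for name in names}
    matrix = [[cosine(counts[r], counts[c]) for c in names] for r in names]
    return names, matrix


if __name__ == "__main__":
    # Hypothetical toy corpora, standing in for the real data sets.
    toy = {
        "twitter": "lol see you later",
        "irc": "see you later lol brb",
        "books": "it was the best of times",
    }
    names, m = similarity_matrix(toy)
    for name, row in zip(names, m):
        print(name, [round(v, 3) for v in row])
```

Each row and column corresponds to one corpus, so the diagonal is always 1.0 and the matrix is symmetric; the more vocabulary two modalities share, the closer their off-diagonal entry gets to 1.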

My goal now is to write up a nice little paper and send it off to an open access journal. I spent much of yesterday coding and making figures. I still love Perl despite its faults.