WhatsApp Chat Analysis – 21st May, 2024
(Updated on 29/08/24. Added more context and the code!)
I LOVE analyzing data. Deriving meaningful information from data and visualizing it is something I absolutely enjoy doing.
So one evening, I decided it would be fun to export a group chat, analyze it, and send back the results on the group. It was a hit, yay! Everyone loved it and their feedback and suggestions helped me incorporate more features.
The analysis I did was VERY BASIC. And a lot of the code is, umm, stolen, er, borrowed from dozens of tutorials and mixed with some good ole elbow grease. Also, it's worth noting that I CANNOT code, especially not in Python, which is what I used. So it was a fun experiment. It's not something I technically coded. It's more like something I tried to code. It is extremely hacky.
Wrote some Python code to take an exported WhatsApp group chat (whatsapp_chat.txt) and split the text based on the sender of each message. Used #regex for this. I know that Python regex and ECMAScript regex (the one I'm a tiny bit familiar with) are different. That knowledge, however, did not prevent me from making silly errors that took forever for me to recognize :D
The whatsapp_organizesplit.py script uses regex to extract messages from the main whatsapp_chat.txt and saves them into individual text files, named after each person, in a folder called “output”.
Used regex yet again to remove the timestamps. But before doing so, I counted the timestamp matches to find out the total number of messages, and saved this value. A regex generator helped me find an expression to match timestamps.
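To make the split-and-count step concrete, here is a minimal sketch. It is not the actual whatsapp_organizesplit.py, and it assumes one particular export format (`21/05/24, 7:45 pm - Alice: Hello`); the real format varies by phone, locale and app version, so the pattern would need adjusting. The names `split_by_sender`, `count_messages` and `write_outputs` are made up for this example.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed export line format (it varies by phone, locale and app version):
#   21/05/24, 7:45 pm - Alice: Hello there
STAMP = r"\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}(?:\s?[AaPp][Mm])? - "
STAMP_RE = re.compile("^" + STAMP)
MESSAGE_RE = re.compile("^" + STAMP + r"(?P<sender>[^:]+): (?P<text>.*)$")


def split_by_sender(chat_text):
    """Return {sender: [message, ...]} with the timestamps removed."""
    messages = defaultdict(list)
    last_sender = None
    for line in chat_text.splitlines():
        match = MESSAGE_RE.match(line)
        if match:
            last_sender = match.group("sender")
            messages[last_sender].append(match.group("text"))
        elif STAMP_RE.match(line):
            # Timestamp but no "sender: " part, e.g. "Alice created group".
            last_sender = None
        elif last_sender is not None:
            # No timestamp at all: continuation of a multi-line message.
            messages[last_sender][-1] += "\n" + line
    return dict(messages)


def count_messages(messages):
    """Each message had exactly one timestamp, so the list length is the count."""
    return {sender: len(texts) for sender, texts in messages.items()}


def write_outputs(messages, folder="output"):
    """Save each sender's messages to <folder>/<sender>.txt."""
    out_dir = Path(folder)
    out_dir.mkdir(exist_ok=True)
    for sender, texts in messages.items():
        (out_dir / f"{sender}.txt").write_text("\n".join(texts), encoding="utf-8")
```

Capturing the sender and the text in one pass means the timestamp never reaches the output files, so a separate timestamp-removal pass is not needed in this version.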
Now we have text files for every participant. These have no timestamps. I calculated total words for each sender. I used the following Python libraries to further process the data.
- wordcloud
- textblob
- nltk
- emoji
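The per-sender word totals need nothing beyond plain Python. A minimal sketch, assuming the messages are already in a dict that maps each sender to a list of message strings (the function name `total_words` is made up for this example):

```python
def total_words(messages):
    """Count whitespace-separated words per sender.

    `messages` is assumed to map a sender name to a list of message strings.
    """
    return {
        sender: sum(len(text.split()) for text in texts)
        for sender, texts in messages.items()
    }


sample = {"Alice": ["Hello there", "Bye"], "Bob": ["Hi"]}
print(total_words(sample))
```

Splitting on whitespace is crude: an emoji or a link each count as one "word", which is probably fine for a just-for-fun group ranking.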
Found out -
- Most common words per person
- Most common words for the entire group
- Most common emojis per person
- Most common emojis for the entire group
- Sentiment / Polarity with simple sentiment analysis using textblob. I don't understand it at all, but it ticks all the buzzwords, which excited people on the group, so I'm happy :D
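For some intuition about what a polarity number means: lexicon-based sentiment analysis, in its simplest form, gives known words a score between -1 (negative) and +1 (positive) and averages them. The sketch below is a toy version with an invented five-word lexicon. It is not textblob's code or word list, and real analyzers typically also deal with things like negation ("not good") and intensifiers ("very good").

```python
import re

# Hypothetical mini-lexicon, invented for this example only.
TOY_LEXICON = {"love": 0.5, "great": 0.8, "good": 0.7, "bad": -0.7, "awful": -1.0}


def toy_polarity(text):
    """Average the scores of the known words; 0.0 if no word is known."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[word] for word in words if word in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("What a great day"))
print(toy_polarity("see you at 5"))
```

With textblob itself, the corresponding call is `TextBlob(text).sentiment.polarity`, which returns a float between -1.0 and 1.0.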
Saved all of these to text files.
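The word and emoji counts in the list above boil down to counting things and keeping the top few. Here is a minimal sketch using only the standard library. It makes two loud assumptions: the stopword set is a tiny made-up one (nltk's English stopword list would be a fuller choice), and emojis are approximated as characters in the Unicode category "So" (Symbol, other) instead of using the emoji library. That approximation also catches symbols like © and splits multi-part emojis into pieces. All function names are hypothetical.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical, tiny stopword list for illustration.
STOPWORDS = {"the", "a", "an", "and", "to", "is", "it", "of", "in", "i", "you"}


def common_words(texts, top=10):
    """Most frequent non-stopwords across a list of messages."""
    words = re.findall(r"[a-z']+", " ".join(texts).lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(top)


def common_emojis(texts, top=10):
    """Most frequent emoji-like characters (rough stand-in: category "So")."""
    counts = Counter(
        char
        for text in texts
        for char in text
        if unicodedata.category(char) == "So"
    )
    return counts.most_common(top)


def group_texts(messages):
    """Flatten {sender: [message, ...]} into one list for group-wide counts."""
    return [text for texts in messages.values() for text in texts]
```

Per-person results come from calling these on one sender's messages, and group results from calling them on `group_texts(messages)`. Each result is a list of `(item, count)` pairs, which is easy to write out to a text file line by line.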
Code is on GitHub.