Dataset

To analyse the sentiments of a WhatsApp chat, I collected data from my personal WhatsApp chats. To export your own chats, follow the steps below:

  1. For iPhone:
    1. Open your chat with a person or a group
    2. Tap the profile of the person or the group
    3. Scroll down to the option to export the chat
  2. For Android:
    1. Open your chat with a person or a group
    2. Tap the three dots at the top
    3. Tap More
    4. Tap Export chat

I've started this task by defining some helper functions, because the data exported from WhatsApp is not ready to be used for any kind of data science task.

import re
import pandas as pd
import numpy as np
import emoji
from collections import Counter
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# Check whether a line starts with a date and time, i.e. begins a new message
def date_time(s):
    pattern = '^([0-9]+)(\\/)([0-9]+)(\\/)([0-9]+), ([0-9]+):([0-9]+)[ ]?(AM|PM|am|pm)? -'
    return re.match(pattern, s) is not None

# Check whether the text after the timestamp starts with "Author: "
def find_author(s):
    # Split only on the first ": " so that a colon inside the message body
    # (a URL, a time such as 10:30) does not hide the author
    return len(s.split(": ", 1)) == 2

# Split one line into date, time, author and message
def getDatapoint(line):
    # Cut at the first ' - ' only, so a dash inside the message is preserved
    dateTime, _, message = line.partition(' - ')
    date, time = dateTime.split(", ")
    if find_author(message):
        author, message = message.split(": ", 1)
    else:
        # Lines without "Author: " (for example, system notices) have no author
        author = None
    return date, time, author, message

In this step, it doesn't matter whether you are using a group chat or a conversation with one person. The functions defined above prepare either kind of export for sentiment analysis.
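
To make the parsing step concrete, here is a minimal sketch of how one exported line breaks down into date, time, author, and message. It uses a single regular expression with named groups rather than the helpers above. The names `LINE_RE` and `parse_line` and the sample lines are made up for illustration, and real exports vary by phone and locale, so treat the pattern as an assumption.

```python
import re

# Hypothetical pattern for lines such as "4/6/22, 01:00 am - Kirti: Hello".
# The exact date and time format depends on the phone's locale settings.
LINE_RE = re.compile(
    r"^(?P<date>\d+/\d+/\d+), "
    r"(?P<time>\d+:\d+(?: ?(?:AM|PM|am|pm))?) - "
    r"(?:(?P<author>[^:]+): )?"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return (date, time, author, message), or None for a continuation line."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return (match["date"], match["time"], match["author"], match["message"])


# Made-up sample lines, not taken from a real chat
print(parse_line("4/6/22, 01:00 am - Kirti: Meet at 10:30?"))
print(parse_line("4/6/22, 01:05 am - Messages are end-to-end encrypted."))
print(parse_line("this line continues the previous message"))
```

In this sketch the first call keeps "10:30" inside the message, the second returns `None` as the author because the line has no "Author: " part, and the third returns `None` because the line does not start with a timestamp.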

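data = []
conversation = 'WhatsApp Chat with Sapna.txt'
with open(conversation, encoding="utf-8") as fp:
    fp.readline()  # skip the first line of the export
    messageBuffer = []
    date, time, author = None, None, None
    for line in fp:
        line = line.strip()
        if date_time(line):
            # A new message starts: store the previous one first
            if len(messageBuffer) > 0:
                data.append([date, time, author, ' '.join(messageBuffer)])
            messageBuffer.clear()
            date, time, author, message = getDatapoint(line)
            messageBuffer.append(message)
        else:
            # Continuation of a multi-line message
            messageBuffer.append(line)
    # Store the last message, which no later line would otherwise flush
    if len(messageBuffer) > 0:
        data.append([date, time, author, ' '.join(messageBuffer)])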
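
The buffering idea used when reading the file can be tried on a few in-memory lines without touching a real export. The following is a minimal sketch under assumptions: `group_messages` and `STARTS_MESSAGE` are hypothetical names, and the sample lines are invented.

```python
import re

# Hypothetical check for "date, time -" at the start of a line
STARTS_MESSAGE = re.compile(r"^\d+/\d+/\d+, \d+:\d+(?: ?(?:AM|PM|am|pm))? -")


def group_messages(lines):
    """Join continuation lines onto the message they belong to."""
    messages = []
    buffer = []
    for raw in lines:
        line = raw.strip()
        if STARTS_MESSAGE.match(line):
            # A timestamp starts a new message, so store the previous one
            if buffer:
                messages.append(" ".join(buffer))
            buffer = [line]
        else:
            # No timestamp: this line continues the current message
            buffer.append(line)
    # Store whatever is still buffered when the input ends
    if buffer:
        messages.append(" ".join(buffer))
    return messages


# Invented sample lines for illustration only
sample = [
    "4/6/22, 01:00 am - Kirti: Shopping list:",
    "milk",
    "bread",
    "4/6/22, 01:02 am - Shivam: Got it",
]
for message in group_messages(sample):
    print(message)
```

The final `if buffer:` matters: without it, the last message in the input would never be stored, because no later timestamp line arrives to trigger the flush.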

Now here is how we can analyze the sentiments of WhatsApp chat using Python:

df = pd.DataFrame(data, columns=["Date", 'Time', 'Author', 'Message'])
df['Date'] = pd.to_datetime(df['Date'])

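data = df.dropna().copy()  # copy so that adding columns does not raise a pandas warning
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download("vader_lexicon")  # VADER needs its lexicon to be available locally
sentiments = SentimentIntensityAnalyzer()
# Score each message once and reuse the result for all three columns
scores = [sentiments.polarity_scores(i) for i in data["Message"]]
data["Positive"] = [s["pos"] for s in scores]
data["Negative"] = [s["neg"] for s in scores]
data["Neutral"] = [s["neu"] for s in scores]
print(data.head())
Output: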
        Date      Time        Author  ... Positive  Negative  Neutral
0 2022-04-06  01:00 am         Kirti  ...      0.0     0.000    1.000
1 2022-04-06  01:02 am         Kirti  ...      0.0     0.000    1.000
2 2022-04-06  01:06 am        Shivam  ...      0.0     0.000    1.000
3 2022-04-06  01:07 am         Kirti  ...      0.0     0.383    0.617
4 2022-04-06  01:12 am        Shivam  ...      0.0     0.000    1.000

Now, let's add up the positive, negative, and neutral scores across all messages and see which sentiment dominates the chat:

x = sum(data["Positive"])
y = sum(data["Negative"])
z = sum(data["Neutral"])

def sentiment_score(a, b, c):
    if (a>b) and (a>c):
        print("Positive 😊 ")
    elif (b>a) and (b>c):
        print("Negative 😠 ")
    else:
        print("Neutral 🙂 ")
sentiment_score(x, y, z)
Output:
Positive 😊
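
The comparison above can also be written as a function that returns a label instead of printing it, which makes the rule easier to test and reuse. This is a minimal sketch, not the code that produced the result above: `overall_sentiment` is a hypothetical name and the totals in the example are invented numbers.

```python
def overall_sentiment(positive, negative, neutral):
    """Return the label whose summed score is strictly the largest.

    Anything else, including ties, falls back to "Neutral", which mirrors
    the if/elif/else comparison.
    """
    if positive > negative and positive > neutral:
        return "Positive"
    if negative > positive and negative > neutral:
        return "Negative"
    return "Neutral"


# Invented totals for illustration only
print(overall_sentiment(12.5, 3.1, 8.4))
print(overall_sentiment(2.0, 2.0, 1.0))
```

The second call is worth noticing: a tie between the positive and negative totals yields "Neutral" even when the neutral total is the smallest, because the rule only checks for a strict winner among the first two labels.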

So, for the data I used, the summed positive score is the highest, which indicates that the messages between me and Kirti are mostly positive.