AI Notebooks - Tutorial - Sentiment analysis on Tweets using Hugging Face

How to use Hugging Face models to analyse the sentiment of Tweets

Last updated 1st September, 2022.

Objective

This tutorial shows how to use Hugging Face pre-trained models to analyse the sentiment of Tweets. Hugging Face is a company known for open-source libraries such as Transformers and Datasets, which are used to build NLP systems. These libraries can be used for classification, question answering, translation and many other NLP tasks.

Why use existing models?

  • Someone may already have faced the same problem as you, so a model may already exist for the task you are trying to address
  • Not enough data: you may not have enough data to train a model from scratch
  • Not enough computing power
  • Lack of knowledge in the field
  • Time saving!

What is NLP?

Natural language processing (NLP) is a branch of artificial intelligence that aims to give computer programs the ability to understand natural human language.

USE CASE: all OVHcloud French Tweets posted on October 16, 2021, i.e. 1 day after the company's IPO and 3 days after an incident.

Hugging Face models allow us to display the sentiment of the Tweets by topic.

To do this, we will compare three sentiment analysis models on our Tweets: two models trained for French and one multilingual model.

We will also use a French zero-shot classification model to classify the Tweets by topic.
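The approach described above can be sketched with the `pipeline` helper of the Transformers library. This is a minimal sketch and not the code of the notebooks: the two model identifiers are assumed Hugging Face Hub names, the candidate topics are invented for illustration, and the `transformers` import is placed inside `load_pipelines` so that the counting logic can be read and tried without downloading any model.

```python
from collections import Counter, defaultdict

# Assumed Hugging Face Hub identifiers; check them against the notebooks.
SENTIMENT_MODEL = "philschmid/pt-tblard-tf-allocine"
TOPIC_MODEL = "BaptisteDoyen/camembert-base-xnli"
# Hypothetical topics, chosen only for this example.
TOPICS = ["bourse", "incident", "autre"]


def load_pipelines():
    """Build the two pipelines (the models are downloaded on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    sentiment = pipeline("sentiment-analysis", model=SENTIMENT_MODEL)
    topic = pipeline("zero-shot-classification", model=TOPIC_MODEL)
    return sentiment, topic


def sentiments_by_topic(tweets, sentiment, topic, topics=TOPICS):
    """Count the sentiment labels of the Tweets for each topic.

    `sentiment(text)` must return [{"label": ..., "score": ...}] and
    `topic(text, candidate_labels=...)` must return a dict whose "labels"
    list is sorted by decreasing score, as the Transformers pipelines do.
    """
    counts = defaultdict(Counter)
    for text in tweets:
        best_topic = topic(text, candidate_labels=topics)["labels"][0]
        label = sentiment(text)[0]["label"]
        counts[best_topic][label] += 1
    return {name: dict(counter) for name, counter in counts.items()}
```

With Transformers installed, a typical call would be `sentiment, topic = load_pipelines()` followed by `sentiments_by_topic(tweets, sentiment, topic)`, where `tweets` is a list of strings.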


Requirements

Instructions

In this tutorial, we collect our Tweets and store them as a .csv file.
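As a rough illustration, such a .csv file can be read with Python's standard `csv` module. The column name `text` and the file name `tweets.csv` are assumptions made for this sketch; adapt them to the layout of your own export.

```python
import csv
import io


def load_tweets(csv_file, text_column="text"):
    """Return the non-empty Tweet texts found in an open .csv file.

    `text_column` is a hypothetical column name: change it to match your file.
    """
    reader = csv.DictReader(csv_file)
    tweets = []
    for row in reader:
        text = (row.get(text_column) or "").strip()
        if text:  # skip rows that contain no text
            tweets.append(text)
    return tweets


# In-memory sample standing in for a real file such as "tweets.csv".
# With a real file, open it first:
#     with open("tweets.csv", newline="", encoding="utf-8") as f:
#         tweets = load_tweets(f)
sample = io.StringIO(
    "id,text\n"
    "1,Bravo pour l'introduction en bourse !\n"
    "2,\n"
    "3,Service indisponible ce matin\n"
)
example_tweets = load_tweets(sample)
print(example_tweets)
```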

If you want to store your data (the Tweets) in an object container beforehand, follow the next step.

Uploading your dataset to Public Cloud Storage

If you want to upload it from the OVHcloud Control Panel, go to the Object Storage section and create a new object container by clicking Object Storage > Create an object container.


If you prefer to use the CLI, follow this guide. Choose the region, the name of your container and the path where your data is located, then run the following command:

ovhai data upload <region> <container> <paths>

Launch and access a Jupyter notebook

The first step is to create a Jupyter notebook with OVHcloud AI Notebooks.

First, install the ovhai CLI, then choose the name of the notebook (<notebook-name>), the Hugging Face image (huggingface-transformers) and the number of GPUs (<nb-gpus>) to use on your notebook. You can also attach your data, previously stored in the object storage (<container@region/prefix:mount_path:permission>). Then run the following command:

ovhai notebook run huggingface-transformers jupyterlab \
    --name <notebook-name> \
    --gpu <nb-gpus> \
    --volume <container@region/prefix:mount_path:permission>

Experiment with sentiment analysis on Tweets using the OVHcloud example notebooks

Once the ai-training-examples repository has been cloned, find the notebook(s) of your choice.

  • Notebook for sentiment analysis with CamemBERT (pt-tblard-tf-allocine):

ai-training-examples > notebooks > natural-language-processing > text-classification > hugging-face > sentiment-analysis-twitter > CamemBERT > hugging_face_camembert_sentiment_analysis_tweets.ipynb

  • Notebook for sentiment analysis with BARThez (barthez-sentiment-classification):

ai-training-examples > notebooks > natural-language-processing > text-classification > hugging-face > sentiment-analysis-twitter > BARThez > hugging_face_barthez_sentiment_analysis_tweets.ipynb

  • Notebook for sentiment analysis with BERT (bert-base-multilingual-uncased-sentiment):

ai-training-examples > notebooks > natural-language-processing > text-classification > hugging-face > sentiment-analysis-twitter > BERT > hugging_face_bert_sentiment_analysis_tweets.ipynb

Instructions are directly shown inside the notebooks. You can run them with the standard "Play" button inside the notebook interface.

Testing the different models

Testing 3 models...

Sentiment Analysis with pt-tblard-tf-allocine

Théophile Blard, French sentiment analysis with BERT, (2020), GitHub repository

Tweets are divided into 2 classes according to their sentiment: positive or negative.

camemBERT_results

Sentiment Analysis with barthez-sentiment-classification

Moussa Kamal Eddine, Antoine J.-P. Tixier and Michalis Vazirgiannis, BARThez: a Skilled Pretrained French Sequence-to-Sequence Model, (2020), GitHub repository

Tweets are divided into 2 classes according to their sentiment: positive or negative.

BARThez_results

Sentiment Analysis with bert-base-multilingual-uncased-sentiment

Refer to NLP Town

Tweets are divided into 5 classes, from 1 to 5 stars, according to their sentiment: 1 star corresponds to a very negative tweet while 5 stars correspond to a very positive tweet.
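If you want to read the 5-star output as a coarser sentiment, the star ratings can be folded into fewer classes. The sketch below assumes that the model returns labels of the form `1 star` to `5 stars`; the thresholds (3 stars treated as neutral) are an arbitrary choice made for illustration and are not prescribed by the notebooks.

```python
def stars_from_label(label):
    """Extract the star count from a label such as '4 stars'."""
    stars = int(label.split()[0])
    if not 1 <= stars <= 5:
        raise ValueError(f"unexpected star rating: {label!r}")
    return stars


def coarse_sentiment(label):
    """Map 1-2 stars to negative, 3 stars to neutral and 4-5 stars to positive."""
    stars = stars_from_label(label)
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


for example in ["1 star", "3 stars", "5 stars"]:
    print(example, "->", coarse_sentiment(example))
```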

BERT_results

Comparing the models' performance

We have tested three Hugging Face models, based on BARThez, BERT and CamemBERT. Two of them can be compared directly on our dataset because they use the same positive and negative classes: BARThez and CamemBERT.

We can label our data manually and compare these labels with the predictions of the models. We can then display the success rate of each model to see which one performed best on our dataset.

The confusion matrix will also give us information about false positives or false negatives.
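The comparison described above can be sketched in plain Python. The label names and the toy data below are made up for illustration; they are not the Tweets or the results of this tutorial.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels,
                     positive="positive", negative="negative"):
    """Count the four outcomes of a binary classification."""
    pairs = Counter(zip(true_labels, predicted_labels))
    return {
        "true_positive": pairs[(positive, positive)],
        "false_negative": pairs[(positive, negative)],
        "false_positive": pairs[(negative, positive)],
        "true_negative": pairs[(negative, negative)],
    }


def success_rate(true_labels, predicted_labels):
    """Percentage of predictions that match the manual labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both lists must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Toy data: manual labels versus the predictions of a hypothetical model.
manual = ["positive", "negative", "negative", "positive", "negative"]
model = ["positive", "negative", "positive", "positive", "negative"]

matrix = confusion_matrix(manual, model)
rate = success_rate(manual, model)
print(matrix)
print(f"Success rate: {rate:.2f} %")
```

Here the single disagreement is a negative Tweet predicted as positive, so it is counted as a false positive.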

Confusion matrix - BARThez x real sentiments

BARThez_matrix

Success rate: 87.02 %

Confusion matrix - CamemBERT x real sentiments

CamemBERT_matrix

Success rate: 78.63 %

Conclusion

To sum up, the results of these pre-trained models are satisfactory. On our dataset, the BARThez-based model performs better than the CamemBERT-based model (87.02 % versus 78.63 % success rate).

However, this ranking depends on the dataset you use.

Do not hesitate to test several models!

Experimenting with the notebooks

A preview of the three notebooks can be found on GitHub here.

Go further

  • You can also deploy a Flask app to classify text sentiments with Hugging Face models. Check this documentation.
  • If you are interested in NLP (Natural Language Processing), familiarise yourself with speech to text by following this tutorial.

Feedback

Please send us your questions, feedback and suggestions to improve the service:

