AI Notebooks - Tutorial - Sentiment analysis on Tweets using Hugging Face
How to use Hugging Face models to analyse Twitter sentiments
Last updated 1st September, 2022.
The purpose of this tutorial is to show how to use Hugging Face pre-trained models to analyse sentiment in Tweets. Hugging Face is a company known for its open-source software, such as the Transformers and Datasets libraries, used for building NLP systems. This software can be used for classification, question answering, translation and many other NLP tasks.
Why use existing models?
What is NLP?
Natural Language Processing (NLP) is a branch of Machine Learning that aims to give computer programs the ability to understand natural human language.
USE CASE: all OVHcloud French Tweets posted on October 16, 2021, i.e. 1 day after the company's IPO and 3 days after an incident.
Hugging Face allows us to show the sentiment of these Tweets according to their topic.
To do this, we will compare 3 models on the sentiment analysis of Tweets: two sentiment analysis models trained on French and one multilingual model.
We will also use a model to classify the Tweets according to their topic: a Zero-Shot classification model working on French.
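As a minimal sketch of how Zero-Shot classification assigns a topic to a Tweet, the snippet below uses the Transformers `pipeline` API. The model name and candidate topics are assumptions for illustration, not necessarily those used in the notebooks:

```python
def top_topic(result):
    """Return the most likely topic from a zero-shot pipeline result.

    The pipeline returns candidate labels sorted by decreasing score,
    so the first label is the best topic.
    """
    return result["labels"][0]


if __name__ == "__main__":
    # transformers is imported here so the helper above stays usable
    # without it; the model name below is an assumption (any French or
    # multilingual zero-shot model can be substituted).
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification",
                          model="BaptisteDoyen/camembert-base-xnli")
    topics = ["bourse", "incident", "cloud"]  # hypothetical topics
    result = classifier("OVHcloud entre en bourse aujourd'hui !", topics)
    print(top_topic(result))
```

The model itself is never told these topics during training; it scores each candidate label against the Tweet at inference time, which is what makes the approach "zero-shot".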
In this tutorial, we retrieve our Tweets and build our dataset as a .csv file.
Beforehand, if you want to store your data (Tweets) in an object container, follow the next step.
If you want to upload it from the OVHcloud Control Panel, go to the Object Storage section and create a new object container by clicking Object Storage > Create an object container.
If you want to run it with the CLI, just follow this guide. Choose the region, the name of your container and the path where your data is located, then use the following command:
ovhai data upload <region> <container> <paths>
The first step consists in creating a Jupyter Notebook with OVHcloud AI Notebooks.
First, you have to install the OVHAI CLI, then choose the name of the notebook (<notebook-name>), the Hugging Face image (huggingface-transformers) and the number of GPUs (<nb-gpus>) to use on your notebook. You can also attach your data, previously stored in the object storage (<container@region/prefix:mount_path:permission>), and use the following command:
ovhai notebook run huggingface-transformers jupyterlab \
    --name <notebook-name> \
    --gpu <nb-gpus> \
    --volume <container@region/prefix:mount_path:permission>
Once the ai-training-examples repository has been cloned, find the notebook(s) of your choice.
ai-training-examples > notebooks > natural-language-processing > text-classification > hugging-face > sentiment-analysis-twitter > CamemBERT > hugging_face_camembert_sentiment_analysis_tweets.ipynb
ai-training-examples > notebooks > natural-language-processing > text-classification > hugging-face > sentiment-analysis-twitter > BARThez > hugging_face_barthez_sentiment_analysis_tweets.ipynb
ai-training-examples > notebooks > natural-language-processing > text-classification > hugging-face > sentiment-analysis-twitter > BERT > hugging_face_bert_sentiment_analysis_tweets.ipynb
Instructions are directly shown inside the notebooks. You can run them with the standard "Play" button inside the notebook interface.
Testing 3 models...
Théophile Blard, French sentiment analysis with BERT, (2020), GitHub repository
Tweets are divided into 2 classes according to their sentiment: positive or negative.
Eddine, Moussa Kamal and Tixier, Antoine J-P and Vazirgiannis, Michalis, BARThez: a Skilled Pretrained French Sequence-to-Sequence Model, (2020), GitHub repository
Tweets are divided into 2 classes according to their sentiment: positive or negative.
Refer to NLP Town
Tweets are divided into 5 classes, from 1 to 5 stars, according to their sentiment: 1 star corresponds to a very negative tweet while 5 stars correspond to a very positive tweet.
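Since this multilingual model predicts stars while the two French models predict binary classes, comparing them requires mapping star labels to sentiments. The thresholds below are our own assumption (1-2 stars negative, 4-5 stars positive, 3 stars neutral), and the `pipeline` usage is a hedged sketch of how the nlptown model can be called:

```python
def stars_to_sentiment(label):
    """Map an nlptown-style label ("1 star" ... "5 stars") to a
    sentiment class.

    The thresholds are an assumption: 1-2 stars -> negative,
    4-5 stars -> positive, 3 stars -> neutral (no clear polarity).
    """
    stars = int(label.split()[0])
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return "neutral"


if __name__ == "__main__":
    # transformers is imported here so the mapping helper above stays
    # importable without it.
    from transformers import pipeline

    classifier = pipeline(
        "sentiment-analysis",
        model="nlptown/bert-base-multilingual-uncased-sentiment")
    out = classifier("J'adore ce service !")[0]
    print(out["label"], "->", stars_to_sentiment(out["label"]))
```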
Previously, we tested 3 Hugging Face models based on BARThez, BERT and CamemBERT. Two of them can be compared on our dataset: BARThez and CamemBERT.
It is possible to label our data manually and compare our labels with the models' predictions. We can then display the success rate of each model to see which one performed best on our dataset.
The confusion matrix will also give us information about false positives or false negatives.
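The evaluation described above can be sketched in plain Python. The labels below are hypothetical toy data, not the real Tweet dataset:

```python
def success_rate(y_true, y_pred):
    """Percentage of predictions matching the manual labels."""
    assert len(y_true) == len(y_pred)
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


def confusion_matrix(y_true, y_pred, labels=("negative", "positive")):
    """matrix[i][j] counts Tweets whose true label is labels[i] and
    predicted label is labels[j]; the off-diagonal cells are the
    false positives and false negatives."""
    idx = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[idx[t]][idx[p]] += 1
    return matrix


if __name__ == "__main__":
    truth = ["positive", "negative", "positive", "negative"]
    preds = ["positive", "positive", "positive", "negative"]
    print(success_rate(truth, preds))          # 75.0
    print(confusion_matrix(truth, preds))      # [[1, 1], [0, 2]]
```

The off-diagonal cell `[0][1]` here is the single false positive: a negative Tweet predicted as positive.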
BARThez success rate: 87.02 %
CamemBERT success rate: 78.63 %
To sum up, we find that the results of these pre-trained models are satisfying. The BARThez-based model performs better on our dataset than the CamemBERT-based model.
However, it depends on the dataset you use.
Do not hesitate to test several models!
A preview of the three notebooks can be found on GitHub here.