Arctic Monkeys Lyrics Generator with Data Augmentation

AM : Don’t believe the hype. 
AI : Well..


Text generators are pretty cool, right? When I first came across something like a ‘Shakespeare generator’ around two years back, I was awestruck.

Generally, a text generator is a language model built on a Recurrent Neural Network (RNN) or LSTM that tries to predict the next words from the preceding seed words.

So I decided to create a lyric generator based on Arctic Monkeys lyrics. The idea was divided into three major components:

  • Creating a Data Corpus and Cleaning the Data
  • Text Data Augmentation
  • Language Models and Generators

Creating Data Corpus

I found the blog post by Jonathan Dayton really helpful. It uses the Spotify API to get the artist’s ID on Spotify, list all the album IDs, and then collect all the track IDs. The lyrics of every song are then saved using the Genius API. Here’s a code snippet for getting the data.

Dependencies :

  • lyricsgenius
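
A minimal sketch of the lyric-fetching step, not the original snippet. It assumes a Genius API token stored in a hypothetical GENIUS_TOKEN environment variable, and the helper names fetch_lyrics and clean_lyrics are made up for illustration. The cleaning shown here only strips section headers such as [Chorus] and blank lines; the real cleaning may well have done more.

```python
import os
import re
from typing import List


def clean_lyrics(raw: str) -> str:
    """Drop bracketed section headers like "[Verse 1]" and empty lines."""
    lines = []
    for line in raw.splitlines():
        line = re.sub(r"\[.*?\]", "", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)


def fetch_lyrics(artist_name: str, max_songs: int = 5) -> List[str]:
    """Fetch and clean lyrics for one artist (needs network access and a token)."""
    import lyricsgenius  # imported lazily so clean_lyrics works without it

    # Assumption: the token lives in the GENIUS_TOKEN environment variable.
    genius = lyricsgenius.Genius(os.environ["GENIUS_TOKEN"])
    artist = genius.search_artist(artist_name, max_songs=max_songs)
    return [clean_lyrics(song.lyrics) for song in artist.songs]
```

Calling fetch_lyrics("Arctic Monkeys") would return one cleaned string per song, which can then be joined and written to a text file.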

Text Augmentation

The dataset consisted of 144 songs, which comes to 167887 words. I really wanted to make a comment about the number of songs Alex has written, and these don’t even include the songs from The Last Shadow Puppets and his solo album. I am getting distracted!

Since the dataset isn’t as large as you would want for a language modelling task, text augmentation can be applied.

The two types of text augmentation used here were:

  • Substitution: replaces the current word with one predicted by the language model.
  • Insertion: uses the surrounding words as features to predict a new word to insert.

I used nlpaug for this, and a really good overview can be found in the article Data Augmentation library for text by Edward Ma.

nlpaug has character, word and flow augmenters. To generate synthetic lyrics, I believe word-level augmenters are the most useful, and flow augmenters like ‘naf.Sequential’ are used to apply different augmentations one after another.

I used two augmenters: BertAug and FasttextAug. Both insert or substitute words. BertAug uses the BERT language model to predict the replacement word, or the new word in the case of insertion, from the surrounding context. FasttextAug draws its replacement or inserted words from fastText word embeddings, which are static rather than contextualised.
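
A rough sketch of how the insert-then-substitute flow could be wired up. BertAug and FasttextAug are the class names from the nlpaug version used here; as far as I know, newer releases expose the same functionality as ContextualWordEmbsAug and WordEmbsAug, which is what this sketch assumes. The function names, the bert-base-uncased checkpoint and the augment_corpus helper are illustrative assumptions, not the original code.

```python
from typing import Callable, List


def build_bert_augmenter():
    """Chain a BERT insert and a BERT substitute (needs nlpaug and transformers)."""
    import nlpaug.augmenter.word as naw
    import nlpaug.flow as naf

    return naf.Sequential([
        naw.ContextualWordEmbsAug(model_path="bert-base-uncased", action="insert"),
        naw.ContextualWordEmbsAug(model_path="bert-base-uncased", action="substitute"),
    ])


def build_fasttext_augmenter(vec_path: str):
    """Same flow with fastText vectors, e.g. wiki-news-300d-1M-subword.vec."""
    import nlpaug.augmenter.word as naw
    import nlpaug.flow as naf

    return naf.Sequential([
        naw.WordEmbsAug(model_type="fasttext", model_path=vec_path, action="insert"),
        naw.WordEmbsAug(model_type="fasttext", model_path=vec_path, action="substitute"),
    ])


def augment_corpus(lines: List[str], augment: Callable) -> List[str]:
    """Return the original lines followed by one augmented copy of each."""
    augmented = []
    for line in lines:
        result = augment(line)
        # Depending on the nlpaug version, augment() returns a str or a list.
        augmented.append(result[0] if isinstance(result, list) else result)
    return list(lines) + augmented
```

With nlpaug installed, augment_corpus(lines, build_bert_augmenter().augment) would give back the original lines plus one augmented version of each.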

Results after BERTAug insert and substitute

in : there is always somebody taller with more of a wit
out: it is always somebody taller with more of temper wit

weeeirrrdddd.. but sounds about right.

Results after FasttextAug insert and substitute

in : there is always somebody taller with more of a wit
out: There is invariably somebody tall with more of a wit

Another interesting thing: there were no ValueError exceptions for unknown words with FasttextAug, thanks to sub-word embeddings (I used wiki-news-300d-1M-subword.vec to load the model).

Except for, well, “i.d.s.t. i.d.s.t. i.d.s.t i.d.s.t”, “choo-choo! choo-choo! choo-choo!” and “shoo-wop shoo-wop shoo-wop”. I honestly don’t blame it.

After augmentation there were 334524 words in the corpus, which means the new dataset is roughly twice the size of the original.
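
As a quick check of that ratio, here is the arithmetic with the two word counts quoted above. The count_words helper is an assumption about how the words were counted (simple whitespace splitting).

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed counting method)."""
    return len(text.split())


original_words = 167887   # word count before augmentation
augmented_words = 334524  # word count after augmentation

ratio = augmented_words / original_words
print(f"{ratio:.2f}x the original size")  # prints "1.99x the original size"
```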

Creating the augmented dataset did take quite some time (around an hour). I have uploaded the .txt file of the final corpus to Google Drive.

LSTM Model

An ideal text generation model takes a seed word or sentence and, given a history of words w_0, …, w_k, predicts the next p words w_{k+1}, …, w_{k+p}. Because Recurrent Neural Networks and LSTMs have memory, they compute the next output based on the previous state.

LSTMs are special because they have input, output and forget gates, plus a cell memory. This gives them the ability to store information over longer time intervals. Here I used an LSTM with 0.5 dropout and 0.5 recurrent dropout.
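
To make the setup concrete, here is a small sketch of how a next-word LSTM could be put together. Only the two dropout values come from the text; the use of Keras, the embedding size, the number of units and the helper names are all assumptions for illustration.

```python
from typing import Dict, List, Tuple


def make_sequences(
    tokens: List[str], seq_len: int
) -> Tuple[List[List[int]], List[int], Dict[str, int]]:
    """Turn a token list into (seed window, next word) training pairs."""
    vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)))}
    ids = [vocab[word] for word in tokens]
    inputs = [ids[i:i + seq_len] for i in range(len(ids) - seq_len)]
    targets = [ids[i + seq_len] for i in range(len(ids) - seq_len)]
    return inputs, targets, vocab


def build_lstm(vocab_size: int, embed_dim: int = 64, units: int = 128):
    """Next-word classifier with 0.5 dropout and 0.5 recurrent dropout (needs TensorFlow)."""
    import tensorflow as tf  # imported lazily; make_sequences needs no dependencies

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embed_dim),
        tf.keras.layers.LSTM(units, dropout=0.5, recurrent_dropout=0.5),
        tf.keras.layers.Dense(vocab_size, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model
```

Each training example is a window of seq_len word ids and the id of the word that follows it, so the model learns to predict one word at a time; longer lyrics come from feeding its own predictions back in as the new seed.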

There are now non-recurrent models built on transformers, like OpenAI’s GPT-2 text generation model, that perform really well at language modelling.



Fine-Tuning using OpenAI’s GPT-2

I used gpt_2_simple: “A simple Python package that wraps existing model fine-tuning and generation scripts for OpenAI’s GPT-2 text generation model”.

The original dataset and the BertAug and FasttextAug outputs were combined and exported to a text file, am_corpus.txt.
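
A minimal sketch of the combine-and-fine-tune step, assuming the gpt_2_simple calls download_gpt2, start_tf_sess, finetune and generate. The "124M" model name, the step count, the generation settings and the helper names are assumptions, not the settings actually used here.

```python
from typing import Iterable


def merge_corpora(texts: Iterable[str]) -> str:
    """Join the original and augmented corpora into one newline-separated text."""
    return "\n".join(t.strip() for t in texts if t.strip()) + "\n"


def finetune_and_generate(corpus_path: str = "am_corpus.txt", steps: int = 500) -> None:
    """Fine-tune GPT-2 on the corpus and print one sample (needs gpt-2-simple)."""
    import gpt_2_simple as gpt2  # imported lazily; heavy TensorFlow dependency

    model_name = "124M"  # assumption: the smallest released GPT-2 checkpoint
    gpt2.download_gpt2(model_name=model_name)
    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, dataset=corpus_path, model_name=model_name, steps=steps)
    # No prefix is passed, so the model starts from its own random prefix.
    gpt2.generate(sess, length=100, temperature=0.7)
```

The merged string would be written to am_corpus.txt before calling finetune_and_generate; passing prefix="..." to generate would seed the lyrics with a chosen line instead.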

Results with Random Prefix:

Generated Lyrics

To evaluate this result I used ROUGE, which stands for Recall-Oriented Understudy for Gisting Evaluation. I found the article What Is ROUGE And How It Works For Evaluation Of Summarization Tasks? really helpful for understanding ROUGE.
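
For intuition, ROUGE-N counts the n-grams shared between a reference text and a generated text. The sketch below is a plain-Python version for illustration, not the scoring library used for the results here, and the example pair simply reuses the augmentation sample from earlier.

```python
from collections import Counter
from typing import Dict


def rouge_n(reference: str, candidate: str, n: int = 1) -> Dict[str, float]:
    """Compute ROUGE-N recall, precision and F1 on whitespace tokens."""

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    overlap = sum((ref & cand).values())  # matches, clipped to the smaller count
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


scores = rouge_n(
    "there is always somebody taller with more of a wit",
    "it is always somebody taller with more of temper wit",
)
print(scores["recall"])  # prints 0.8 (8 of the 10 reference words are matched)
```

Recall asks how much of the reference shows up in the generated text, while precision asks how much of the generated text comes from the reference.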

Essentially, the lyrics generated by the GPT-2 model are more meaningful than the ones from the LSTM model! Though to be fair, the LSTM model didn’t get a fighting chance with just 5 epochs to train.


Edit : Added evaluation metrics and more results.
