Final writeup: https://docs.google.com/document/d/10MCC2Ao4J1ZkdynuzIISE_GtXBolavjqvsTlLhFNU10/edit#

Album Cover Generation
Demetri – jtsatsar, Daniel Flores – dflores3, Benjamin Smith – bsmith28
https://www.researchgate.net/publication/318987126_Album_Cover_Generation_from_Genre_Tags
Introduction: What problem are you trying to solve and why? Our project aims to use a Generative Adversarial Network (GAN) to generate rich, expressive album cover artwork based on genre descriptors. Cover art is an important part of an album that complements the musical experience. Thousands of independent musicians release new albums every day, and many of them cannot commission an artist to create an album cover. Computer-generated art can provide a simple, accessible alternative. Also, art is cool.
The paper, Album Cover Generation from Genre Tags, creates unique images that reflect the characteristics of a song based on genre labels. The paper was inspired by a desire to replicate the multi-sensory experience of pairing visual effects with music, and that same desire inspired us to choose it to reimplement. Music and art are human creative outlets, and we want to remove the limitations that prevent musicians from having an album cover by building a network that can easily generate one.
In this project, the generative model, which uses a Generative Adversarial Network, poses an unsupervised learning problem: we are generating images rather than predicting anything. The architecture also includes a discriminator, which poses a classification problem, since its job is to classify images by genre tag.
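Concretely, the two networks are trained against each other. As a reminder (our summary of the standard GAN formulation, not a formula taken from the paper), the generator G maps a noise vector z to an image, the discriminator D outputs the probability that an image is real, and training solves the minimax game:

```latex
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

In the conditional variant needed for genre tags, both networks additionally receive the tag y, i.e. D(x, y) and G(z, y), so the generator learns a distribution of covers per genre.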
Data: The model will be pre-trained on unlabeled data from the One Million Audio Cover Images for Research (OMACIR) dataset. We plan on collecting labeled data from either Bandcamp or Napster’s API to train the model.
Methodology: The model is a Deep Convolutional Generative Adversarial Network (DCGAN). The generator consists of one fully-connected layer and four deconvolutional layers; the discriminator consists of four convolutional layers and three fully-connected layers. The model will first be pre-trained on the OMACIR dataset to reduce overfitting. Then it will be trained on a labeled dataset from a music API so that album covers can be generated given some feature of the album, e.g. its genre. The hardest part of implementing the model will probably be collecting the data from APIs and pre-processing it.
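To make the layer counts concrete, the following sketch works out the spatial sizes through the network, assuming the usual DCGAN recipe (64x64 RGB covers, stride-2 layers with 4x4 kernels and padding 1); the paper's exact sizes may differ, so this is an illustration rather than the paper's specification.

```python
# Sketch of DCGAN layer sizing under our assumptions (64x64 output,
# stride-2 conv/deconv layers, kernel 4, padding 1).

def deconv_out(size, stride=2, kernel=4, pad=1):
    """Output spatial size of a strided transposed convolution."""
    return (size - 1) * stride - 2 * pad + kernel

def conv_out(size, stride=2, kernel=4, pad=1):
    """Output spatial size of a strided convolution."""
    return (size + 2 * pad - kernel) // stride + 1

# Generator: latent vector -> fully-connected layer reshaped to a 4x4
# feature map -> 4 deconvolutional layers that double the resolution.
size = 4
for _ in range(4):
    size = deconv_out(size)
print(size)  # 64: a 64x64 image after 4 stride-2 deconvolutions

# Discriminator: 64x64 image -> 4 convolutional layers that halve the
# resolution -> flattened into the 3 fully-connected layers.
size = 64
for _ in range(4):
    size = conv_out(size)
print(size)  # 4: a 4x4 feature map entering the fully-connected layers
```

This is why DCGANs pair the single fully-connected layer with exactly four stride-2 layers for 64x64 images: each layer doubles (or halves) the resolution between 4 and 64.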
Metrics: The model’s discriminator can be tested on data from an API to determine how often it correctly labels an album cover’s genre. The generator can be tested by generating album covers for each of the genres and measuring the discriminator’s accuracy on these generated covers. If the discriminator achieves a similar accuracy on the generated covers as on the covers from the API, that is a good sign the generator is performing well. We can also visualize the covers our generator outputs and judge their quality ourselves.
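The comparison above can be sketched as a small evaluation routine. The prediction lists here are toy data, and the genre names are made up for illustration; in practice the predictions would come from our trained discriminator.

```python
# Minimal sketch of the proposed metric: compare the discriminator's
# genre accuracy on real covers vs. generated covers.

def genre_accuracy(predicted_genres, true_genres):
    """Fraction of covers whose predicted genre tag matches the label."""
    correct = sum(p == t for p, t in zip(predicted_genres, true_genres))
    return correct / len(true_genres)

# Toy example (hypothetical predictions, not real model output).
labels     = ["rock", "jazz", "pop", "metal"]
real_preds = ["rock", "jazz", "pop", "metal"]   # on real covers
gen_preds  = ["rock", "jazz", "rock", "metal"]  # on generated covers

acc_real = genre_accuracy(real_preds, labels)  # 1.0
acc_gen  = genre_accuracy(gen_preds, labels)   # 0.75

# A small gap suggests generated covers carry genre-recognizable features.
gap = abs(acc_real - acc_gen)
print(acc_real, acc_gen, gap)
```

A small gap is only a proxy: a weak discriminator can also produce a small gap, which is why we pair this metric with visual inspection.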
Base goal: creating a model which can create any sort of image
Target goal: creating a model which can create album covers for a given genre which plausibly match that genre
Stretch goal: creating a model which can create album covers given a phrase which plausibly matches that phrase
Ethics: Computer-generated art raises questions about the ownership of work produced by technology. For instance, in 2018 a GAN-generated piece of artwork sold for over $400,000 at a Christie’s auction. The work was created using code from an open-source developer who received no compensation from the sale. There is nothing blatantly unethical about this transaction, as the developer chose to release his code for free, but a case could be made that the art was created by him and therefore belongs to him, at least in part. This, in turn, raises questions about the nature of art: is the ‘artist’ the one who physically creates the work, or the one who decided to have it made, giving it context and meaning? Normally these roles belong to the same person or group, but with computers they can split, and that becomes an issue.
Sources:
https://supervisorconnect.it.monash.edu/projects/research/ethics-ai-art
https://www.theverge.com/2018/10/23/18013190/ai-art-portrait-auction-christies-belamy-obvious-robbie-barrat-gans
The dataset is a large collection of album covers and descriptors spanning many genres and decades. Among these album covers, there are almost certainly some whose content is harmful and offensive, including racist imagery, violence, sexual material, and more. It is therefore possible that our model will generate similar art. This raises concerns about how to detect and prevent the generation of such images, how to decide what to prevent or remove, and the possible harm caused by what cannot be caught.
Division of labor: Briefly outline who will be responsible for which part(s) of the project. We will all work on building the model together. Demetri will collect the data from APIs, Dan will preprocess the data, and Ben will create a visualizer for the album covers.