Final writeup: https://docs.google.com/document/d/10MCC2Ao4J1ZkdynuzIISE_GtXBolavjqvsTlLhFNU10/edit#

Album Cover Generation
Demetri – jtsatsar, Daniel Flores – dflores3, Benjamin Smith – bsmith28
https://www.researchgate.net/publication/318987126_Album_Cover_Generation_from_Genre_Tags
Introduction: What problem are you trying to solve and why? Our project aims to use a Generative Adversarial Network (GAN) to generate rich, expressive album cover artwork based on genre descriptors. Cover art is an important part of an album that complements the musical experience. Thousands of independent musicians release new albums every day, and many of them cannot commission an artist to create an album cover. Computer-generated art can provide a simple, accessible alternative. Also, art is cool.
The paper, Album Cover Generation from Genre Tags, creates unique images that reflect the characteristics of a song based on genre labels. The paper was inspired by a desire to replicate the multi-sensory experience of pairing visual effects with music, and that same desire inspired us to choose it to reimplement. Music and art are human creative outlets, and we want to remove the limitations that prevent musicians from having an album cover by building a network that can easily generate one.
In this project, the generative model, which uses a Generative Adversarial Network, poses an unsupervised learning problem: we are generating images rather than predicting anything. The architecture also includes a discriminator, which poses a classification problem, since its job is to classify images by genre tag.
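Concretely, the two networks are trained against each other. As a reminder (our summary of the standard GAN formulation, not a formula taken from the paper), the generator G maps a noise vector z to an image, the discriminator D outputs the probability that an image is real, and training solves the minimax game:

```latex
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

In the conditional variant needed for genre tags, both networks additionally receive the tag y, i.e. D(x, y) and G(z, y), so the generator learns a distribution of covers per genre.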
Data: The model will be pre-trained on unlabeled data from the One Million Audio Cover Images for Research (OMACIR) dataset. We plan on collecting labeled data from either Bandcamp or Napster’s API to train the model.
Methodology: The model is a Deep Convolutional Generative Adversarial Network (DCGAN). The generator consists of one fully-connected layer and four deconvolutional layers; the discriminator consists of four convolutional layers and three fully-connected layers. The model will first be pre-trained on the OMACIR dataset to reduce overfitting. Then it will be trained on a labeled dataset from a music API so that album covers can be generated given some feature of the album, e.g. its genre. The hardest part of implementing the model will probably be collecting the data from APIs and pre-processing it.
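To make the layer counts concrete, the following sketch works out the spatial sizes through the network, assuming the usual DCGAN recipe (64x64 RGB covers, stride-2 layers with 4x4 kernels and padding 1); the paper's exact sizes may differ, so this is an illustration rather than the paper's specification.

```python
# Sketch of DCGAN layer sizing under our assumptions (64x64 output,
# stride-2 conv/deconv layers, kernel 4, padding 1).

def deconv_out(size, stride=2, kernel=4, pad=1):
    """Output spatial size of a strided transposed convolution."""
    return (size - 1) * stride - 2 * pad + kernel

def conv_out(size, stride=2, kernel=4, pad=1):
    """Output spatial size of a strided convolution."""
    return (size + 2 * pad - kernel) // stride + 1

# Generator: latent vector -> fully-connected layer reshaped to a 4x4
# feature map -> 4 deconvolutional layers that double the resolution.
size = 4
for _ in range(4):
    size = deconv_out(size)
print(size)  # 64: a 64x64 image after 4 stride-2 deconvolutions

# Discriminator: 64x64 image -> 4 convolutional layers that halve the
# resolution -> flattened into the 3 fully-connected layers.
size = 64
for _ in range(4):
    size = conv_out(size)
print(size)  # 4: a 4x4 feature map entering the fully-connected layers
```

This is why DCGANs pair the single fully-connected layer with exactly four stride-2 layers for 64x64 images: each layer doubles (or halves) the resolution between 4 and 64.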
Metrics: The model’s discriminator can be tested on data from an API to determine how often it correctly labels an album cover’s genre. The generator can be tested by generating album covers for each of the genres and measuring the discriminator’s accuracy on these generated covers. If the discriminator achieves a similar accuracy on the generated covers as on the covers from the API, that is a good sign the generator is performing well. We can also visualize the covers our generator outputs and judge their quality ourselves.
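The comparison above can be sketched as a small evaluation routine. The prediction lists here are toy data, and the genre names are made up for illustration; in practice the predictions would come from our trained discriminator.

```python
# Minimal sketch of the proposed metric: compare the discriminator's
# genre accuracy on real covers vs. generated covers.

def genre_accuracy(predicted_genres, true_genres):
    """Fraction of covers whose predicted genre tag matches the label."""
    correct = sum(p == t for p, t in zip(predicted_genres, true_genres))
    return correct / len(true_genres)

# Toy example (hypothetical predictions, not real model output).
labels     = ["rock", "jazz", "pop", "metal"]
real_preds = ["rock", "jazz", "pop", "metal"]   # on real covers
gen_preds  = ["rock", "jazz", "rock", "metal"]  # on generated covers

acc_real = genre_accuracy(real_preds, labels)  # 1.0
acc_gen  = genre_accuracy(gen_preds, labels)   # 0.75

# A small gap suggests generated covers carry genre-recognizable features.
gap = abs(acc_real - acc_gen)
print(acc_real, acc_gen, gap)
```

A small gap is only a proxy: a weak discriminator can also produce a small gap, which is why we pair this metric with visual inspection.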
Base goal: creating a model which can create any sort of image
Target goal: creating a model which can create album covers for a given genre which plausibly match that genre
Stretch goal: creating a model which can create album covers given a phrase which plausibly matches that phrase
Ethics: Computer-generated art raises questions about the ownership of work produced by technology. For instance, in 2018 a GAN-generated piece of artwork sold for over $400,000 at a Christie’s auction. The work was created using code from an open-source developer who received no compensation from the sale. There is nothing blatantly unethical about this transaction, as the developer chose to release his code for free, but a case could be made that the art was created by him and therefore belongs to him, at least in part. This, in turn, raises questions about the nature of art: is the ‘artist’ the one who physically creates the work, or the one who decided to have it made, giving it context and meaning? Normally these roles belong to the same person or group, but with computers they can split, and that becomes an issue.
Sources:
https://supervisorconnect.it.monash.edu/projects/research/ethics-ai-art
https://www.theverge.com/2018/10/23/18013190/ai-art-portrait-auction-christies-belamy-obvious-robbie-barrat-gans
The dataset is a large collection of album covers and descriptors spanning many genres and decades. Among these album covers, there are almost certainly some whose content is harmful and offensive, including racist imagery, violence, sexual material, and more. It is therefore possible that our model will generate similar art. This raises concerns about how to detect and prevent the generation of such images, how to decide what to prevent or remove, and the possible harm caused by what cannot be caught.
Division of labor: Briefly outline who will be responsible for which part(s) of the project. We will all work on building the model together. Demetri will collect the data from APIs, Dan will preprocess the data, and Ben will create a visualizer for the album covers.