Our comprehensive final write-up can be found at the following link: https://docs.google.com/document/d/1fjfgPrKO8JK0WgU6e6kFm1oFqGkueevF7CLuKE0gXpU/edit?usp=sharing

Title

DeepAnimation

Who

Zichuan Wang, Tongyu Zhou, Yicheng Shi, Jing Qian

Introduction

We propose DeepAnimation, a vector-based generation model that predicts the next possible keyframes for animation given an SVG input in real time. The goal is to help users create web-friendly, scalable, and lightweight animations by recommending exemplars from existing commercial-grade ones about “where,” “how,” or “when” the animation should occur. We achieve this via SVG-to-GIF search using a VAE network. This model helps users who are unfamiliar with the underlying parametric concepts of vectorized images swiftly iterate on animations. It can be used to fabricate web posters and other graphical elements or to enrich the virtual conversation experience. We additionally present a GUI demonstrating how recommendations can be made as the user is drawing.

Related Work

Animating vector graphics over web pages lets end users modify graphical content more easily. Unlike raster images, vector graphics are typically smaller in file size while maintaining high quality, and they are robust to scaling. These benefits allow them to be widely used for representing charts, icons, maps, and clip art across different devices over the internet. Vector graphics can be further animated by using code (typically JavaScript) to update their vertex positions. For example, one can draw a virtual flower first and then animate it by translating or rotating its vertices. However, such a method fails when the topology changes. Recent work by Dalstein et al. addresses this issue by introducing the Vector Animation Complex (VAC).

Due to the scalability, compactness, and ease with which SVGs can be queried (thanks to their DOM specification), recent papers have focused on learning-based models to enhance ways to author and manipulate SVGs. Carlier et al. proposed DeepSVG for complex SVG icon generation and interpolation in order to create smooth transitions between two disparate SVGs. This architecture effectively disentangles high-level shapes from the low-level commands that encode the shapes themselves. Similarly, Lopes et al. devised another generative strategy with SVG-VAE, which uses a variational autoencoder to learn latent representations of vector-based fonts and exploits this representation to perform style propagation. As these two prior methods both require explicit vector graphics supervision, Reddy et al. more recently introduced a new neural network, Im2Vec, which can generate complex vector graphics with only raster-based training data. Other learning approaches for SVG include Bahari et al.'s SVG-Net, a transformer-based model that represents the information of a scene (e.g., the rotation of wheels) using SVG and can predict the movement of the scene described by the paths in the SVG input.

Data

We collected GIF and SVG data from icons8.com by web scraping. Our dataset includes about 1,500 GIFs and their corresponding SVG images (the first frame of each animation), as well as group information that indicates properties of each image (e.g., color, windows-10, UI). To feed the data into our network, we may need to extract each frame from the GIF at a specified frame rate. We also need to resize and rescale the images to the same size. To improve the robustness of our model, we may apply transformations to the data, e.g., cropping or shearing. These modified frames will then be vectorized into SVG format to feed into the model.
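As a concrete illustration, here is a minimal frame-extraction sketch using Pillow; the file path, target size, and frame step are placeholders rather than the exact values in our pipeline:

```python
# Minimal preprocessing sketch (assumes Pillow is available; the path, target
# size, and frame_step below are illustrative placeholders).
from PIL import Image, ImageSequence

def extract_frames(gif_path, size=(64, 64), frame_step=2):
    """Extract every `frame_step`-th frame from a GIF and resize it."""
    gif = Image.open(gif_path)
    frames = []
    for i, frame in enumerate(ImageSequence.Iterator(gif)):
        if i % frame_step == 0:
            frames.append(frame.convert("RGB").resize(size))
    return frames

frames = extract_frames("icons8/example.gif")  # hypothetical file path
```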

Methodology

Our model will take a vector image as input and generate a series of possible subsequent keyframes (also in vector form) that contain the possible movements of elements in the scene. For example, if a user draws a face, our model can make the face gradually smile. To do this, we may need to train a GAN to learn the distribution of our dataset. Given the first frame, we will use the subsequent frames extracted from the GIF as labels, and the GAN will learn what movements are possible for the scene. Similar to style transfer tasks, possible loss functions are content loss and style loss. In the end, for an unseen input image, the GAN will predict the possible changing patterns, and users can modify the output by adjusting parameters in the latent space.
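As a rough sketch of this idea, one possible conditional generator in PyTorch could map the first frame plus a noise vector to K keyframes; the layer sizes, latent dimension, and number of keyframes below are placeholders, not our final architecture:

```python
# Sketch of a conditional keyframe generator (PyTorch). Sizes are placeholders.
import torch
import torch.nn as nn

class KeyframeGenerator(nn.Module):
    def __init__(self, in_ch=3, latent_dim=64, k_frames=8):
        super().__init__()
        self.k_frames = k_frames
        # Encode the first frame into a feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
        # Decode frame features + noise into K keyframes (flattened here).
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim * 2, 256), nn.ReLU(),
            nn.Linear(256, k_frames * in_ch * 64 * 64), nn.Tanh(),
        )

    def forward(self, first_frame, z):
        feat = self.encoder(first_frame)               # (B, latent_dim)
        out = self.decoder(torch.cat([feat, z], dim=1))
        return out.view(-1, self.k_frames, 3, 64, 64)  # (B, K, C, H, W)

gen = KeyframeGenerator()
keyframes = gen(torch.randn(1, 3, 64, 64), torch.randn(1, 64))
```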

Metrics

During training, we can compare the generated vector keyframes to the raster keyframes from the GIF. We may use content loss and style loss to evaluate the visual quality of the output image. To be more flexible, we may also evaluate the visual effects by human judgment.
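For reference, a minimal sketch of the standard style-transfer losses over VGG16 features (the chosen layers are an assumption, not tuned values) could look like this:

```python
# Content/style loss sketch over VGG16 features (standard style-transfer form).
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

features = vgg16(pretrained=True).features.eval()  # newer torchvision: weights="IMAGENET1K_V1"

def gram(x):
    # Gram matrix of feature maps, normalized by their size.
    b, c, h, w = x.shape
    f = x.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def vgg_feats(x, layers=(3, 8, 15)):
    # Collect activations at a few early/mid VGG layers (assumed choice).
    feats = []
    for i, layer in enumerate(features):
        x = layer(x)
        if i in layers:
            feats.append(x)
    return feats

def content_style_loss(pred, target):
    pf, tf = vgg_feats(pred), vgg_feats(target)
    content = sum(F.mse_loss(p, t) for p, t in zip(pf, tf))
    style = sum(F.mse_loss(gram(p), gram(t)) for p, t in zip(pf, tf))
    return content, style
```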

  • Base goal

We will train a model that can generate vector-based keyframes based on the input SVG as the first frame.

  • Target goal

Hopefully our model can generate smooth vectorized animations with artistic style.

  • Stretch goal

We will implement a GUI for others to try our model if everything works.

Ethics

  • Why is Deep Learning a good approach to this problem?

Because the latent space of a GAN is very diverse, we can use this property to help designers create new vectorized animations. Designers sometimes fall back on their own stereotypes and may find it hard to create new artistic movements every day. However, we can leverage deep models to learn from existing patterns and create new ones.

  • If there is an issue about your algorithm you would like to discuss or explain further, feel free to do so.

There may be bias in the data we use. For example, there are many icons related to Windows in our icon dataset. Therefore, the model may show a bias towards Windows-style icons when the input image is about electronic devices. A broader issue is that if the data used by others has a bias regarding gender or race, then the trained model may inherit that bias.

Division of labor

Jing Qian

  • writing proposal
  • model architecture design
  • writing code for model structure
  • model training
  • parameter finetuning
  • writing code for visualization and interactive demo
  • writing report paper

Tongyu Zhou

  • writing proposal
  • model architecture design
  • writing code for model structure
  • model training
  • parameter finetuning
  • writing code for visualization and interactive demo
  • writing report paper

Zichuan Wang

  • writing proposal
  • model architecture design
  • web scraping
  • writing code for model structure
  • writing code for testing
  • model training
  • parameter finetuning
  • writing report paper

Yicheng Shi

  • writing proposal
  • model architecture design
  • writing code for data loading
  • writing code for model structure
  • writing code for training
  • model training
  • parameter finetuning
  • writing report paper


Updates


Check-in for Nov 30, 2021

Introduction:

We propose DeepAnimation, a vector-based generation model that predicts the next possible keyframes for animation given an SVG input in real time. The goal is to help users create web-friendly, scalable, and lightweight animations by recommending exemplars from existing commercial-grade ones about “where,” “how,” or “when” the animation should occur. This model helps users who are unfamiliar with the underlying parametric concepts of vectorized images swiftly iterate on animations. It can be used to fabricate web posters and other graphical elements or to enrich the virtual conversation experience. We additionally present a GUI demonstrating how recommendations can be made as the user is drawing.

Challenges: What has been the hardest part of the project you’ve encountered so far?

The hardest part of the project so far has been controlling the latent vector to get meaningful animation generation. If we could interpret directions of change in the latent space, such as those used in age editing, we would be able to control the latent vector in our favor. Currently, our model pipeline takes in a user-created SVG file on our web-based interface (see the image below). Once a user creates a drawing, it is transformed into a point in a high-dimensional space. Then, we generate a circular path that passes through this high-dimensional point. This circular path serves as the basis for creating looping animations. Finally, the decoder takes a fixed number (N) of points uniformly sampled from the high-dimensional circle to generate a sequence of images as the final result.
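A minimal sketch of this sampling step is shown below; `encoder` and `decoder` stand in for our trained VAE networks, and the particular circle construction (radius, orthogonal second axis) is one possible choice rather than the exact implementation:

```python
# Sketch of the circular latent-path sampling used to build looping animations.
import math
import torch

def circular_keyframes(encoder, decoder, user_image, n_frames=16, radius=1.0):
    z = encoder(user_image)                      # latent point for the user's drawing
    axis1 = z / (z.norm() + 1e-8)                # first axis of the circle's plane
    u = torch.randn_like(z)                      # random second axis, made orthogonal to z
    u = u - (u * axis1).sum() * axis1
    u = u / (u.norm() + 1e-8)
    center = z - radius * axis1                  # circle passes through z at angle 0
    frames = []
    for t in torch.linspace(0, 2 * math.pi, n_frames + 1)[:-1]:
        p = center + radius * (torch.cos(t) * axis1 + torch.sin(t) * u)
        frames.append(decoder(p))                # decode each latent point into a frame
    return frames
```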

Insights: Are there any concrete results you can show at this point?

We pre-trained an MNIST network that extends the variational autoencoder base code used in our assignment. Preliminary results show that the transitions between animation frames are smooth, indicating that the high-dimensional circle provides a recognizable, sequential pattern from the user inputs. Here is the link to the suggested animation from a user sketching the number “5” (result showing the transition from a user-input sketch of “5” to an automatically generated “3”).

How is your model performing compared with expectations?

For our generative model, we rely on visualization and human judgment for comparison. The model meets our expectations in terms of transitions and the encoder-decoder handling of customized input. However, at this stage the model still lacks the ability to consider the relationship between the input shape and the output animation.

Are you on track with your project?

We are lagging a bit behind schedule since there is not much prior work in this area, and we are trying to make genuine innovations here.

What do you need to dedicate more time to? What are you thinking of changing, if anything?

We need to spend more time on the core model development, specifically on inferring the animation from the input SVG.
