Our comprehensive final write-up can be found here: https://docs.google.com/document/d/1fjfgPrKO8JK0WgU6e6kFm1oFqGkueevF7CLuKE0gXpU/edit?usp=sharing
Title
DeepAnimation
Who
Zichuan Wang, Tongyu Zhou, Yicheng Shi, Jing Qian
Introduction
We propose DeepAnimation, a vector-based generation model that predicts the next possible keyframes for an animation given an SVG input in real time. The goal is to help users create web-friendly, scalable, and lightweight animations by recommending exemplars from existing commercial-grade animations that suggest "where," "how," and "when" the animation should occur. We achieve this via SVG-to-GIF search using a VAE network. This model helps users unfamiliar with the underlying parametric concepts of vectorized images swiftly iterate on animations. It can be used to create web posters and other graphical elements or to enrich the virtual conversation experience. We additionally present a GUI demonstrating how recommendations can be made as the user is drawing.
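The SVG-to-GIF search step can be sketched as a nearest-neighbor lookup in the VAE latent space. The function below is a minimal illustration, assuming an encoder (not shown) has already mapped the query SVG and each GIF exemplar to a latent vector; `recommend_gifs` and the toy latent codes are hypothetical names for this sketch, not part of our trained model.

```python
import numpy as np

def recommend_gifs(query_latent, gif_latents, k=3):
    """Return indices of the k GIF exemplars whose VAE latent codes
    are closest (by cosine similarity) to the query SVG's latent code."""
    q = query_latent / np.linalg.norm(query_latent)
    g = gif_latents / np.linalg.norm(gif_latents, axis=1, keepdims=True)
    sims = g @ q                   # cosine similarity per exemplar
    return np.argsort(-sims)[:k]   # highest-similarity exemplars first

# Toy example: four exemplars in a 2-D latent space.
latents = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
query = np.array([1.0, 0.05])
print(recommend_gifs(query, latents, k=2))  # -> [0 2]
```

In the real system the latent vectors would come from the trained VAE encoder, and the returned indices would map back to GIF exemplars shown in the GUI.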
Related Work
Animating vector graphics over web pages lets end-users modify graphical content more easily. Unlike raster images, vector graphics are typically smaller in file size, retain high quality, and are robust to scaling. These benefits allow them to be widely used for representing charts, icons, maps, and clip art across different devices over the internet. Vector graphics can be further animated by using (typically JavaScript) code to update their vertex positions. For example, one can first draw a virtual flower and then animate it by translating or rotating its vertices. However, this method fails when the topology changes. Recent work by Dalstein et al. addresses this issue by introducing the Vector Animation Complex (VAC).
Due to the scalability, compactness, and ease with which SVGs can be queried (thanks to their DOM specification), recent papers have focused on learning-based models to enhance ways to author and manipulate SVGs. Carlier et al. proposed DeepSVG for complex SVG icon generation and interpolation in order to create smooth transitions between two disparate SVGs. This architecture effectively disentangles high-level shapes from the low-level commands that encode the shape itself. Similarly, Lopes et al. devised another generative strategy with SVG-VAE, which uses a variational autoencoder to learn latent representations of vector-based fonts and exploits this representation to perform style propagation. As these two prior methods both require explicit vector graphics supervision, Reddy et al. more recently introduced a new neural network, Im2Vec, which can generate complex vector graphics with only raster-based training data. Other learning approaches for SVG include Bahari et al.'s SVG-Net, a transformer-based model that represents the information of a scene (e.g., the rotation of wheels) using SVG. SVG-Net can predict the movement of the scene described by the paths in the SVG input.
Data
We collected GIF and SVG data from icons8.com by web scraping. Our dataset includes about 1,500 GIFs and their corresponding SVG images (the first frame), as well as group information that indicates some properties of the image, e.g., color, windows-10, UI, etc. To feed the data into our network, we may need to extract each frame from the GIF at a specified frame rate. We also need to resize and rescale images to the same dimensions. To improve the robustness of our model, we may apply transformations to the data, e.g., cropping and shearing. These modified frames will then be vectorized into SVG format to feed into the model.
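The frame-extraction step above can be sketched as follows. This is a minimal illustration of choosing which GIF frames to keep when downsampling to a target frame rate; `frame_indices` and the example rates are our own hypothetical names and numbers, not fixed choices of the pipeline.

```python
def frame_indices(n_frames, src_fps, target_fps):
    """Select which frames of a GIF to keep when downsampling
    from src_fps to target_fps (e.g., 24 fps source -> 8 fps)."""
    step = src_fps / target_fps   # keep every `step`-th frame
    idx, out = 0.0, []
    while int(idx) < n_frames:
        out.append(int(idx))
        idx += step
    return out

print(frame_indices(24, 24, 8))  # -> [0, 3, 6, 9, 12, 15, 18, 21]
```

The selected frames would then be decoded (e.g., with an image library), resized, augmented, and vectorized before training.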
Methodology
Our model will take a vector image as input and generate a series of possible subsequent keyframes (also in vector format) that capture the possible movements of elements in the scene. For example, if a user draws a face, our model can make the face gradually smile. To do this, we may need to train a GAN to learn the distribution of our dataset. Given the first frame, we will use subsequent frames extracted from the GIF as labels, and the GAN will learn what movements are possible for the scene. Similar to style transfer tasks, possible loss functions are content loss and style loss. In the end, for an unseen input image, the GAN will predict plausible motion patterns, and users can modify the output based on parameters in the latent space.
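The combined content and style objective mentioned above can be sketched as in neural style transfer, where style statistics are compared via Gram matrices. This is a minimal numpy sketch assuming features are flattened (H*W, C) activation maps; the function names and the style weight of 10.0 are illustrative choices, not tuned values.

```python
import numpy as np

def gram_matrix(features):
    """features: (H*W, C) activation map; the Gram matrix captures
    channel correlations, i.e., style statistics."""
    return features.T @ features / features.shape[0]

def content_loss(gen_feat, target_feat):
    """Mean squared error between generated and target features."""
    return float(np.mean((gen_feat - target_feat) ** 2))

def style_loss(gen_feat, target_feat):
    """Mean squared error between the two Gram matrices."""
    g1, g2 = gram_matrix(gen_feat), gram_matrix(target_feat)
    return float(np.mean((g1 - g2) ** 2))

def total_loss(gen_feat, target_feat, style_weight=10.0):
    """Weighted sum of content and style terms."""
    return content_loss(gen_feat, target_feat) + \
        style_weight * style_loss(gen_feat, target_feat)
```

In training, `gen_feat` and `target_feat` would be activations of a generated keyframe and the corresponding rasterized GIF frame from a shared feature extractor.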
Metrics
During training, we can compare the generated vector keyframes to the raster keyframes from the GIF. We may use content loss and style loss to evaluate the visual quality of the output image. To be more flexible, we may also evaluate the visual effects by human judgment.
- Base goal
We will train a model that can generate vector-based keyframes based on the input SVG as the first frame.
- Target goal
Ideally, our model can generate smooth, vectorized animations with artistic style.
- Stretch goal
We will implement a GUI for others to try our model if everything works.
Ethics
- Why is Deep Learning a good approach to this problem?
Because the latent space of a GAN is very diverse, we can use this property to help designers create new vectorized animations. Designers sometimes fall into habitual patterns and may find it hard to invent new artistic movements every day. We can leverage deep models to learn from existing patterns and create new ones.
- If there is an issue about your algorithm you would like to discuss or explain further, feel free to do so.
There may be bias in the data we use. For example, there are many Windows-related icons in our dataset, so the model may show a bias towards Windows-style icons when the input image depicts an electronic device. A broader issue is that if data used by others is biased with respect to gender or race, then the trained model may inherit that bias.
Division of labor
Jing Qian
- writing proposal
- model architecture design
- writing code for model structure
- model training
- parameter finetuning
- writing code for visualization and interactive demo
- writing report paper
Tongyu Zhou
- writing proposal
- model architecture design
- writing code for model structure
- model training
- parameter finetuning
- writing code for visualization and interactive demo
- writing report paper
Zichuan Wang
- writing proposal
- model architecture design
- web scraping
- writing code for model structure
- writing code for testing
- model training
- parameter finetuning
- writing report paper
Yicheng Shi
- writing proposal
- model architecture design
- writing code for data loading
- writing code for model structure
- writing code for training
- model training
- parameter finetuning
- writing report paper
Built With
- python
- tensorflow