Ashish Patel 🇮🇳’s Post

𝗗𝗮𝘆-𝟮𝟭𝟵 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿 𝗩𝗶𝘀𝗶𝗼𝗻 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴

𝗗𝗮𝗿𝗸𝗚𝗔𝗡: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis with GANs, by Sony Computer Science Laboratories (CSL), Paris, France

Follow me for similar posts: 🇮🇳 Ashish Patel

Interesting Facts:
🔸 This paper was published at ISMIR 2021 and has more than one citation.

-------------------------------------------------------------------
𝗔𝗺𝗮𝘇𝗶𝗻𝗴 𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵 : https://lnkd.in/eSnUHzRk
-------------------------------------------------------------------

𝗜𝗠𝗣𝗢𝗥𝗧𝗔𝗡𝗖𝗘
🔸 Generative Adversarial Networks (GANs) have achieved excellent audio synthesis quality in recent years. However, making them operable with semantically meaningful controls remains an open challenge.
🔸 An obvious approach is to control the GAN by conditioning it on metadata contained in audio datasets. Unfortunately, audio datasets often lack the desired annotations, especially in the musical domain.
🔸 A way to circumvent this lack of annotations is to generate them, for example, with an automatic audio-tagging system. The output probabilities of such systems (so-called "soft labels") carry rich information about the characteristics of the respective audio clips and can be used to distill the knowledge from a teacher model into a student model.
🔸 In this work, the authors perform knowledge distillation from a large audio-tagging system into an adversarial audio synthesizer that they call DarkGAN.
🔸 Results show that DarkGAN can synthesize musical audio with acceptable quality and exhibits moderate attribute control, even with out-of-distribution input conditioning. The authors release the code and provide audio examples on the accompanying website.

#computervision #artificialintelligence #data
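
To make the "soft labels" idea concrete, here is a minimal Python sketch of how a teacher's outputs could condition a generator. It is an illustration under stated assumptions, not the authors' code: the function names (`soft_labels`, `generator_input`, `distillation_loss`) are hypothetical, the per-tag sigmoid with a temperature is an assumed choice for a multi-label tagger, and the binary cross-entropy is one common way to compare teacher and student predictions.

```python
import math
import random


def soft_labels(logits, temperature=1.0):
    """Map teacher logits to per-tag probabilities.

    Assumption: the tagger is multi-label, so each tag gets its own sigmoid.
    A temperature above 1 pulls the probabilities toward 0.5.
    """
    return [1.0 / (1.0 + math.exp(-z / temperature)) for z in logits]


def generator_input(noise, labels):
    """Condition the generator by concatenating noise with the soft labels."""
    return list(noise) + list(labels)


def distillation_loss(teacher_probs, student_probs, eps=1e-7):
    """Mean binary cross-entropy between teacher soft labels and student predictions."""
    total = 0.0
    for t, s in zip(teacher_probs, student_probs):
        s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(s) + (1.0 - t) * math.log(1.0 - s))
    return total / len(teacher_probs)


# Hypothetical teacher logits for three tags of one audio clip.
teacher_logits = [2.0, -1.0, 0.0]
labels = soft_labels(teacher_logits, temperature=2.0)

# Random latent noise of an arbitrary illustrative size.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(8)]

z = generator_input(noise, labels)
print(len(z), [round(p, 3) for p in labels])
```

In a real setup the student would be the GAN itself: the generator receives the conditioning vector, and an auxiliary head on the discriminator would be trained to predict the teacher's soft labels from the generated audio. That wiring is also an assumption here, not a detail stated in the post.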
