Data2vec

8/8/2023

Data2vec is the first high-performance self-supervised algorithm that learns in the same way for speech, vision, and text. It removes the dependence on modality-specific targets in the learning task, so a single algorithm can work with completely different types of input. The method is introduced in the paper "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" (Baevski et al.).

According to a post on Meta's blog, data2vec simplifies how the different algorithms work by training models to predict their own representations of the input data, regardless of the modality. It achieves higher accuracy with one hour of audio training data than wav2vec achieves with 10 hours, even though wav2vec itself already uses self-supervised learning. Yann LeCun argues that self-supervised learning (SSL) is the key to artificial general intelligence (AGI), and data2vec demonstrates how SSL can work across different domains.

The core idea, according to the paper on arXiv, is to predict latent representations of the full input data based on a masked view of the input, in a self-distillation setup using a standard Transformer architecture. Directly predicting representations is not straightforward: it requires a robust normalization of the features that stays reliable across modalities. Instead of predicting modality-specific targets such as words, visual tokens, or units of human speech, which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input.
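To make the masked-prediction idea more concrete, here is a minimal sketch in plain Python. It is not Meta's implementation. The names `encode`, `ema_update`, and `masked_prediction_loss` are hypothetical, and the tiny two-weight "encoder" only stands in for a Transformer. The text above does not spell out how the teacher is updated or which loss is used, so this sketch assumes a teacher that is a moving average of the student and a smooth L1 regression loss.

```python
def encode(weights, inputs):
    """Toy stand-in for a Transformer: each output mixes its own input
    with the mean of the whole sequence, so it is 'contextualized'."""
    w_self, w_ctx = weights
    context = sum(inputs) / len(inputs)
    return [w_self * x + w_ctx * context for x in inputs]


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss for one scalar pair (assumed loss, see lead-in)."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def ema_update(teacher, student, tau):
    """Assumed teacher update: t <- tau * t + (1 - tau) * s."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


def masked_prediction_loss(student_w, teacher_w, inputs, mask, mask_value=0.0):
    """Student sees a masked view; teacher sees the full input.
    The loss is averaged over the masked positions only."""
    if not any(mask):
        raise ValueError("at least one position must be masked")
    targets = encode(teacher_w, inputs)                       # full input
    masked_view = [mask_value if m else x for x, m in zip(inputs, mask)]
    preds = encode(student_w, masked_view)                    # masked input
    losses = [smooth_l1(p, t) for p, t, m in zip(preds, targets, mask) if m]
    return sum(losses) / len(losses)


# Usage: one masked position in a four-step sequence.
student = [1.0, 0.5]
teacher = [1.0, 0.5]
loss = masked_prediction_loss(student, teacher, [1.0, 2.0, 3.0, 4.0],
                              [False, True, False, False])
print("loss at masked position:", loss)

# After a (hypothetical) gradient step on the student, the teacher follows slowly.
student = [0.9, 0.6]
teacher = ema_update(teacher, student, tau=0.99)
print("updated teacher weights:", teacher)
```

The point of the sketch is that the targets come from the model itself rather than from a fixed vocabulary of words, visual tokens, or speech units, which is what lets the same objective apply to any modality.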

Meta AI recently open-sourced data2vec, a unified framework for self-supervised deep learning on images, text, and speech audio data. The framework uses the same learning method for speech, NLP, and computer vision: the same architecture and training objective process each modality after appropriate pre-processing, while retaining information about the input data. When evaluated on common benchmarks, models trained using data2vec perform as well as or better than state-of-the-art models trained with modality-specific objectives, InfoQ noted. Despite not being developed specifically for audio, data2vec outperforms other self-supervised methods such as wav2vec.
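The "appropriate pre-processing" step can be pictured as turning every modality into a sequence of tokens that one shared masking and prediction routine can consume. The sketch below is only an illustration under simplifying assumptions. All function names are hypothetical, words stand in for sub-word units, raw pixel patches and fixed-length sample frames stand in for learned feature extractors, and the real masking strategy may differ per modality.

```python
def text_to_tokens(text):
    """Split on whitespace; each word becomes one token."""
    return text.split()


def image_to_tokens(image, patch):
    """Cut a 2D grid (list of rows) into non-overlapping patch x patch
    blocks and flatten each block into one token."""
    rows, cols = len(image), len(image[0])
    tokens = []
    for r in range(0, rows - patch + 1, patch):
        for c in range(0, cols - patch + 1, patch):
            tokens.append([image[r + i][c + j]
                           for i in range(patch) for j in range(patch)])
    return tokens


def audio_to_tokens(samples, frame_len):
    """Chop a waveform into non-overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def mask_tokens(tokens, mask, mask_token="<MASK>"):
    """Shared step: replace the selected positions with a mask token."""
    return [mask_token if m else t for t, m in zip(tokens, mask)]


# Usage: three modalities, one masking routine.
text = text_to_tokens("self supervised learning")
image = image_to_tokens([[r * 4 + c for c in range(4)] for r in range(4)], patch=2)
audio = audio_to_tokens(list(range(6)), frame_len=2)

for name, tokens in (("text", text), ("image", image), ("audio", audio)):
    mask = [i == 1 for i in range(len(tokens))]  # mask the second token
    print(name, mask_tokens(tokens, mask))
```

Once the inputs share this sequence-of-tokens shape, the learning objective itself no longer needs to know which modality it is looking at.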
