Title | Ambient Sound Provides Supervision for Visual Learning (in Computer Vision – ECCV 2016, Lecture Notes in Computer Science) |
Publication Type | Conference Proceedings |
Year of Publication | 2016 |
Authors | Owens, A, Isola, P, McDermott, JH, Freeman, WT, Torralba, A |
Conference Name | 14th European Conference on Computer Vision |
Pagination | 801 - 816 |
Date Published | 10/2016 |
Conference Location | Amsterdam, The Netherlands (proceedings published in Cham) |
ISBN Number | 978-3-319-46447-3 |
ISSN | 0302-9743 |
Keywords | convolutional networks, Sound, unsupervised learning |
Abstract | The sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds. |
URL | http://link.springer.com.ezproxy.canberra.edu.au/10.1007/978-3-319-46448-0 |
DOI | 10.1007/978-3-319-46448-0_48 |
Research Area:
CBMM Relationship:
- CBMM Funded