Diff-SSL-G-Comp: Towards a Large-Scale and Diverse Dataset
for Virtual Analog Modeling

Yicheng Gu1,2,★ Runsong Zhang2,★ Lauri Juvela1 Zhizheng Wu2
1 Acoustic Lab, Department of Information and Communication Engineering, Aalto University, Espoo, Finland
2 School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, China
★ Equal Contribution
Abstract

Virtual Analog (VA) modeling aims to simulate the behavior of hardware circuits via algorithms to replicate their tone digitally. The Dynamic Range Compressor (DRC) is an audio processing module, essential in music production, that controls the dynamics of a track by attenuating loud sounds and amplifying quiet ones. In recent years, neural-network-based VA modeling has shown great potential for producing high-fidelity models. However, due to limited data quantity and diversity, the generalization ability of these models across different parameter settings and input sounds remains limited. To tackle this problem, we present Diff-SSL-G-Comp, the first large-scale and diverse dataset for modeling the SSL 500 G-Bus Compressor. Specifically, we manually collected 175 unmastered songs from the Cambridge Multitrack Library. We recorded the compressed audio under 220 parameter combinations, resulting in an extensive 2528-hour dataset with diverse genres, instruments, tempos, and keys. Moreover, to facilitate the use of our proposed dataset, we conducted benchmark experiments on various open-source black-box and grey-box models, as well as white-box plugins. We also conducted ablation studies on different data subsets to illustrate the effectiveness of improved data diversity and quantity. The dataset and demos are on our project page: http://www.yichenggu.com/DiffSSLGComp/.
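To make the DRC behavior described above concrete, the sketch below implements a minimal static feed-forward compressor gain curve in NumPy. It omits attack/release smoothing and any circuit modeling; the threshold, ratio, and makeup values are illustrative defaults, not SSL G-Bus settings from the dataset.

```python
# Minimal static feed-forward compressor sketch (no attack/release
# envelope, no knee). Parameter values are illustrative only and do
# not correspond to the SSL 500 G-Bus or any model in the paper.
import numpy as np

def compress(x, threshold_db=-20.0, ratio=4.0, makeup_db=6.0):
    """Apply a static compression gain to a mono signal in [-1, 1].

    Levels above `threshold_db` are reduced according to `ratio`;
    `makeup_db` restores overall loudness after gain reduction.
    """
    eps = 1e-12
    level_db = 20.0 * np.log10(np.abs(x) + eps)       # instantaneous level
    over_db = np.maximum(level_db - threshold_db, 0)  # amount above threshold
    gain_db = -over_db * (1.0 - 1.0 / ratio)          # gain reduction in dB
    return x * 10.0 ** ((gain_db + makeup_db) / 20.0)

# A loud sample is attenuated; a quiet one only receives makeup gain,
# so the dynamic range between them shrinks.
loud = compress(np.array([0.9]))
quiet = compress(np.array([0.01]))
```

A real hardware compressor such as the G-Bus adds time-dependent behavior (attack, release, program-dependent gain smoothing) and circuit nonlinearities, which is precisely what neural VA models must learn.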

Diff-SSL-G-Comp

Overview

Diff-SSL-G-Comp is constructed by processing 175 unmastered songs from the Cambridge Multitrack Library with 220 different parameter combinations. It comprises 2528 hours of processed data with diverse genres, instruments, tempos, and keys, as illustrated below.


The statistics of the unmastered songs used as input signals by genres and instruments are shown below.

The statistics of the unmastered songs used as input signals by tempos and keys are shown below.


The figure below compares the acoustic and semantic diversity of Diff-SSL-G-Comp against existing datasets, which mainly consist of noise and test signals. The more scattered pattern of the cluster representing real-world recordings indicates that our dataset covers a richer range of acoustic characteristics and semantic content than existing datasets.


Data Preview

To better understand the diversity and quality of the dataset, we have sampled a few examples below for preview.


Demos

In this section, we demonstrate the virtual analog modeling performance of representative black-box and grey-box models trained on Diff-SSL-G-Comp, as well as samples generated by white-box commercial plugins.

All the experimental checkpoints demonstrated in the paper can be found on our Google Drive, and all the samples generated by the commercial plugins can be found on the Hugging Face dataset page. Model configurations and usage guides are also attached. We highly recommend that researchers run these experiments themselves, since the differences between compression models are often subtle and hard to hear.

Samples generated by black-box and grey-box models
Samples generated by white-box commercial plugins