| 1 | +# Loudness Normalization Analysis & Recommendations |
| 2 | + |
| 3 | +## Current Issue |
| 4 | +Processed audio from all three test files exhibits: |
| 5 | +- **Hot/slightly distorted output** - audible harshness |
| 6 | +- **Clipping at points** - digital distortion |
| 7 | +- **Over-aggressive processing** - despite close level matching between presenters |
| 8 | + |
| 9 | +## Root Cause Analysis |
| 10 | + |
| 11 | +### Current Processing Chain |
| 12 | +``` |
| 13 | +highpass → adeclick → afftdn → agate → acompressor → dynaudnorm → alimiter |
| 14 | +``` |
| 15 | + |
| 16 | +### The Problem: Dynaudnorm Adaptive Tuning |
| 17 | + |
| 18 | +The current adaptive tuning for `dynaudnorm` is **too aggressive**: |
| 19 | + |
| 20 | +1. **Target RMS Conversion** - Converting LUFS to linear RMS value: |
| 21 | + ```go |
| 22 | + targetLUFS := config.TargetI // -16.0 LUFS |
| 23 | + targetDBFS := targetLUFS + 23.0 // ~+7.0 dBFS (!!) |
| 24 | + config.DynaudnormTargetRMS = math.Pow(10, targetDBFS/20.0) // Very hot target |
| 25 | + ``` |
| 26 | + **Issue**: The +23 dB LUFS→dBFS offset is a rough approximation; for a -16 LUFS target it yields +7 dBFS (linear RMS ≈ 2.24), which is above full scale (1.0) |
| 27 | + |
| 28 | +2. **Maximum Gain Based on Quiet Input**: |
| 29 | + ```go |
| 30 | + if measurements.InputI < -40.0 { |
| 31 | + config.DynaudnormMaxGain = 25.0 // Allow very high gain |
| 32 | + ``` |
| 33 | + **Issue**: 25x gain (about 28 dB) is excessive and can amplify noise and artifacts |
| 34 | +
|
| 35 | +3. **Aggressive Compression**: |
| 36 | + ```go |
| 37 | + if measurements.InputLRA > 15.0 { |
| 38 | + config.DynaudnormCompress = 7.0 // Mild compression |
| 39 | + ``` |
| 40 | + **Issue**: Even "mild" compression (7.0) on highly dynamic content causes pumping |
| 41 | +
|
| 42 | +4. **No Noise Floor Protection**: |
| 43 | + - Threshold is derived from noise floor but may still normalize quiet passages too aggressively |
| 44 | + - Creates distortion when trying to match hot RMS targets |
| 45 | +
|
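The arithmetic behind items 1 and 2 can be checked directly. The following is a minimal sketch, not code from the project: `lufsToLinearRMS` and `gainToDB` are hypothetical helper names, and the +23 dB offset is simply the one quoted in the snippet above.

```go
package main

import (
	"fmt"
	"math"
)

// lufsToLinearRMS mirrors the conversion quoted above (hypothetical helper):
// add a fixed dB offset to the LUFS target, then convert dB to a linear value.
func lufsToLinearRMS(targetLUFS, offsetDB float64) float64 {
	return math.Pow(10, (targetLUFS+offsetDB)/20.0)
}

// gainToDB converts a linear gain factor to decibels.
func gainToDB(gain float64) float64 {
	return 20 * math.Log10(gain)
}

func main() {
	// -16 LUFS + 23 dB = +7 dBFS, which is above digital full scale.
	fmt.Printf("target RMS (linear): %.3f\n", lufsToLinearRMS(-16.0, 23.0))
	// A linear max gain of 25 expressed in dB.
	fmt.Printf("25x max gain in dB: %.2f\n", gainToDB(25.0))
}
```

With these inputs the target works out to roughly 2.24 linear (full scale is 1.0) and the maximum gain to roughly 27.96 dB, which is consistent with the hot, clipped output described under "Current Issue".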
| 46 | +--- |
| 47 | +
|
| 48 | +## Recommended Solution: Loudnorm + Dynaudnorm Combination |
| 49 | +
|
| 50 | +### Strategy |
| 51 | +Use **both** filters in sequence with complementary roles: |
| 52 | +1. **Loudnorm** - Gentle, standards-compliant LUFS normalization |
| 53 | +2. **Dynaudnorm** - Fine-tuning for consistent perceived loudness across segments |
| 54 | +
|
| 55 | +### New Filter Order |
| 56 | +``` |
| 57 | +highpass → adeclick → afftdn → agate → acompressor → deesser → loudnorm → dynaudnorm → alimiter |
| 58 | +``` |
| 59 | +
|
| 60 | +**Key change**: Deesser moved BEFORE loudnorm to prevent amplification of sibilance problems. |
| 61 | +
|
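One way to keep the new order explicit in code is to assemble the `-af` argument from an ordered slice. This is only a sketch under assumptions: `buildChain` and `indexOf` are hypothetical helpers, and the bare filter names stand in for the fully parameterized filter strings built elsewhere.

```go
package main

import (
	"fmt"
	"strings"
)

// buildChain joins per-filter strings into a single ffmpeg -af argument.
// Empty entries are skipped so optional filters can be switched off.
func buildChain(filters []string) string {
	kept := make([]string, 0, len(filters))
	for _, f := range filters {
		if f != "" {
			kept = append(kept, f)
		}
	}
	return strings.Join(kept, ",")
}

// indexOf returns the position of the first filter with the given name, or -1.
func indexOf(filters []string, name string) int {
	for i, f := range filters {
		if f == name || strings.HasPrefix(f, name+"=") {
			return i
		}
	}
	return -1
}

func main() {
	order := []string{
		"highpass", "adeclick", "afftdn", "agate", "acompressor",
		"deesser", "loudnorm", "dynaudnorm", "alimiter",
	}
	fmt.Println(buildChain(order))
	// Guard the key ordering constraint: deesser must precede loudnorm.
	fmt.Println(indexOf(order, "deesser") < indexOf(order, "loudnorm"))
}
```

A check like the last line could live in a unit test so that a later reordering of the chain does not silently put the deesser after the normalizers again.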
| 62 | +--- |
| 63 | +
|
| 64 | +## Filter Configuration Recommendations |
| 65 | +
|
| 66 | +### 1. Loudnorm (Primary Normalization) |
| 67 | +
|
| 68 | +**Purpose**: Provide accurate, gentle LUFS-based normalization using the EBU R128 standard |
| 69 | +
|
| 70 | +**Recommended Settings**: |
| 71 | +```go |
| 72 | +loudnormFilter := fmt.Sprintf( |
| 73 | + "loudnorm=I=-18.0:TP=-2.0:LRA=11.0:"+ |
| 74 | + "measured_I=%.2f:measured_TP=%.2f:measured_LRA=%.2f:"+ |
| 75 | + "measured_thresh=%.2f:offset=%.2f:"+ |
| 76 | + "linear=false:print_format=summary", |
| 77 | + measurements.InputI, measurements.InputTP, measurements.InputLRA, |
| 78 | + measurements.InputThresh, measurements.TargetOffset, |
| 79 | +) |
| 80 | +``` |
| 81 | +
|
| 82 | +**Key Parameters**: |
| 83 | +- **I=-18.0 LUFS** (instead of -16.0) |
| 84 | + - Rationale: Gentler target leaves headroom for dynaudnorm fine-tuning |
| 85 | + - Prevents loudnorm from being too aggressive |
| 86 | + - -18.0 is still podcast-appropriate (Spotify uses -14 to -18) |
| 87 | +
|
| 88 | +- **TP=-2.0 dBTP** (instead of -0.3) |
| 89 | + - Rationale: More conservative true peak ceiling |
| 90 | + - Prevents inter-sample peaks before dynaudnorm |
| 91 | + - Final limiting happens at alimiter (-1.5 dBTP) |
| 92 | +
|
| 93 | +- **LRA=11.0 LU** (instead of 7.0) |
| 94 | + - Rationale: Wider loudness range preserves natural dynamics |
| 95 | + - Prevents "squashing" before dynaudnorm |
| 96 | + - Still within broadcast standards (7-20 LU) |
| 97 | +
|
| 98 | +- **linear=false** (dynamic mode) |
| 99 | + - Rationale: Adapts to source material rather than forcing linear scaling |
| 100 | + - Better for varied podcast content |
| 101 | + - Prevents distortion on narrow-LRA sources |
| 102 | +
|
| 103 | +**Benefits**: |
| 104 | +- Standards-compliant LUFS targeting |
| 105 | +- Gentle, musical processing |
| 106 | +- Accurate measurements from Pass 1 |
| 107 | +- Leaves headroom for fine-tuning |
| 108 | +
|
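The `measured_*` values in the filter string above have to come from a first pass. A minimal sketch of reading them, assuming Pass 1 runs loudnorm with `print_format=json` and that the JSON object has already been cut out of ffmpeg's stderr: the `Measurements` struct and `parseLoudnorm` are hypothetical names chosen to match the fields used above, and the sample numbers are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// loudnormJSON holds the fields of loudnorm's JSON report that Pass 2 needs.
// ffmpeg prints every value as a string, so numbers are parsed separately.
type loudnormJSON struct {
	InputI       string `json:"input_i"`
	InputTP      string `json:"input_tp"`
	InputLRA     string `json:"input_lra"`
	InputThresh  string `json:"input_thresh"`
	TargetOffset string `json:"target_offset"`
}

// Measurements is a hypothetical struct mirroring the names used above.
type Measurements struct {
	InputI, InputTP, InputLRA, InputThresh, TargetOffset float64
}

// parseLoudnorm decodes the JSON object and converts each field to float64.
func parseLoudnorm(raw []byte) (Measurements, error) {
	var j loudnormJSON
	if err := json.Unmarshal(raw, &j); err != nil {
		return Measurements{}, err
	}
	var m Measurements
	fields := []struct {
		src string
		dst *float64
	}{
		{j.InputI, &m.InputI}, {j.InputTP, &m.InputTP},
		{j.InputLRA, &m.InputLRA}, {j.InputThresh, &m.InputThresh},
		{j.TargetOffset, &m.TargetOffset},
	}
	for _, f := range fields {
		v, err := strconv.ParseFloat(f.src, 64)
		if err != nil {
			return Measurements{}, fmt.Errorf("parse %q: %w", f.src, err)
		}
		*f.dst = v
	}
	return m, nil
}

func main() {
	// Illustrative values only, not measurements from the test files.
	sample := []byte(`{"input_i":"-27.61","input_tp":"-4.47","input_lra":"18.06",` +
		`"input_thresh":"-39.20","target_offset":"0.58"}`)
	m, err := parseLoudnorm(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", m)
}
```

Parsing the strings explicitly also surfaces silent or near-silent inputs, where loudnorm reports `-inf`; `strconv.ParseFloat` accepts that spelling, so callers can detect it and skip normalization rather than formatting an infinite value into the filter string.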
| 109 | +--- |
| 110 | +
|
| 111 | +### 2. Dynaudnorm (Fine-Tuning) |
| 112 | +
|
| 113 | +**Purpose**: Smooth out remaining loudness variations between segments/speakers |
| 114 | +
|
| 115 | +**Recommended Settings** (Conservative, Non-Adaptive): |
| 116 | +```go |
| 117 | +dynaudnormFilter := fmt.Sprintf( |
| 118 | + "dynaudnorm=f=500:g=31:p=0.95:m=5.0:r=0.0:t=0.0:n=1:c=0:b=0:s=0.0", |
| 119 | + // f=500: Frame length 500ms (default, balanced) |
| 120 | + // g=31: Gaussian filter size 31 (default, smooth) |
| 121 | + // p=0.95: Peak target 0.95 (default, 5% headroom) |
| 122 | + // m=5.0: Max gain 5x (reduced from 10x, less aggressive) |
| 123 | + // r=0.0: No RMS targeting (disabled, let loudnorm handle this) |
| 124 | + // t=0.0: Normalize all frames (but see gain staging safety check) |
| 125 | + // n=1: Coupled channels (mono so no effect) |
| 126 | + // c=0: No DC correction |
| 127 | + // b=0: Standard boundary mode |
| 128 | + // s=0.0: No compression (preserve dynamics - acompressor handles this) |
| 129 | +) |
| 130 | +``` |
| 131 | +
|
| 132 | +**Note on `s` parameter (compression)**: |
| 133 | +- Kept at `s=0.0` to avoid double-compression with acompressor |
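| 134 | +- A nonzero `s` might help even out vocal characteristics, but in dynaudnorm smaller `s` values compress harder and the ffmpeg documentation advises against values below 3.0, so `s=3.0` is the strongest recommended setting and conflicts with the goal of reducing aggression |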
| 135 | +- Can be tested as optional parameter if needed |
| 136 | +
|
| 137 | +**Gain Staging Safety Check**: |
| 138 | +```go |
| 139 | +// Prevent cascading gain from loudnorm + dynaudnorm |
| 140 | +loudnormGain := math.Abs(measurements.TargetOffset) // Actual needed gain from Pass 1 |
| 141 | +dynaudnormGainDB := 20 * math.Log10(config.DynaudnormMaxGain) |
| 142 | +totalGain := loudnormGain + dynaudnormGainDB |
| 143 | + |
| 144 | +// Safety limit: if total potential gain exceeds 30dB, reduce dynaudnorm's contribution |
| 145 | +if totalGain > 30.0 { |
| 146 | + // Calculate safe maximum for dynaudnorm based on loudnorm's workload |
| 147 | + config.DynaudnormMaxGain = math.Max(3.0, math.Pow(10, (30.0-loudnormGain)/20.0)) |
| 148 | +} |
| 149 | +``` |
| 150 | +
|
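To see how the safety check behaves, it helps to pull it into a pure function and run a few loudnorm gains through it. This is a sketch only: `safeMaxGain` and the two constants are hypothetical names, while the 30 dB limit and the 3.0 floor are taken from the snippet above.

```go
package main

import (
	"fmt"
	"math"
)

const (
	totalGainLimitDB = 30.0 // combined loudnorm + dynaudnorm ceiling from the snippet above
	minMaxGain       = 3.0  // floor applied to dynaudnorm's linear max gain
)

// safeMaxGain returns the dynaudnorm max gain (linear) after the safety check.
// loudnormGainDB is the gain loudnorm is expected to apply, in dB.
func safeMaxGain(loudnormGainDB, maxGain float64) float64 {
	dynGainDB := 20 * math.Log10(maxGain)
	if loudnormGainDB+dynGainDB <= totalGainLimitDB {
		return maxGain
	}
	return math.Max(minMaxGain, math.Pow(10, (totalGainLimitDB-loudnormGainDB)/20.0))
}

func main() {
	for _, g := range []float64{10, 20, 25} {
		fmt.Printf("loudnorm %.0f dB -> dynaudnorm m=%.3f\n", g, safeMaxGain(g, 5.0))
	}
}
```

With `m=5.0` (about 14 dB), a 10 dB loudnorm gain leaves the setting untouched, 20 dB reduces it to about 3.16, and 25 dB hits the 3.0 floor. Note that the floor itself is about 9.5 dB, so in that last case the combined potential gain is still roughly 34.5 dB; the 30 dB figure is a soft limit rather than a hard guarantee.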
| 151 | +**Key Changes from Current**: |
| 152 | +- **Removed RMS targeting** (`r=0.0`) - loudnorm already handled LUFS |
| 153 | +- **Reduced max gain** (`m=5.0` instead of adaptive 10-25) - prevents over-amplification |
| 154 | +- **Safety-limited max gain** - backs off when loudnorm is already applying large gain |
| 155 | +- **No compression** (`s=0.0`) - acompressor already handled this |
| 156 | +- **Conservative** - fixed parameters, minimal adaptive behavior |
| 157 | +
|
| 158 | +**Benefits**: |
| 159 | +- Smooths remaining level variations |
| 160 | +- Doesn't try to re-normalize (loudnorm did that) |
| 161 | +- Gentle, transparent processing |
| 162 | +- Reduced risk of over-amplification or cascading gain |
| 163 | +
|
| 164 | +--- |
| 165 | +
|
| 166 | +### 3. Acompressor (Pre-Normalization) |
| 167 | +
|
| 168 | +**Purpose**: Control dynamic range BEFORE normalization to prevent distortion |
| 169 | +
|
| 170 | +#### Current vs Recommended |
| 171 | +
|
| 172 | +| Parameter | Current | Recommended | Rationale | |
| 173 | +|-----------|---------|-------------|-----------| |
| 174 | +| **threshold** | -20 dB | **-18 dB** | Higher threshold = less compression on normal speech | |
| 175 | +| **ratio** | 2.5:1 | **3.0:1** | Slightly higher ratio for better peak control | |
| 176 | +| **attack** | 15 ms | **20 ms** | Slower attack preserves transients | |
| 177 | +| **release** | 80 ms | **100 ms** | Slower release sounds more natural | |
| 178 | +| **makeup** | 3 dB | **2 dB** | Less makeup (loudnorm will handle gain) | |
| 179 | +| **knee** | 2.5 | **3.0** | Softer knee for smoother compression | |
| 180 | +| **detection** | RMS | **RMS** | Keep RMS for smooth, musical compression | |
| 181 | +| **mix** | 1.0 | **0.85** | Parallel compression (15% dry signal) | |
| 182 | +
|
| 183 | +**Recommended Configuration**: |
| 184 | +```go |
| 185 | +acompressorFilter := fmt.Sprintf( |
| 186 | + "acompressor=threshold=%.6f:ratio=3.0:attack=20:release=100:"+ |
| 187 | + "makeup=%.2f:knee=3.0:detection=rms:mix=0.85", |
| 188 | + dbToLinear(-18.0), // -18 dB threshold |
| 189 | + dbToLinear(2.0), // 2 dB makeup |
| 190 | +) |
| 191 | +``` |
| 192 | +
|
| 193 | +**Adaptive Tuning Improvements**: |
| 194 | +
|
| 195 | +Based on measurements from Pass 1: |
| 196 | +
|
| 197 | +```go |
| 198 | +// Adaptive compression based on measured dynamic range |
| 199 | +if measurements.DynamicRange > 0 { |
| 200 | + if measurements.DynamicRange > 30.0 { |
| 201 | + // Very dynamic content (expressive delivery) |
| 202 | + config.CompRatio = 2.0 // Gentle ratio |
| 203 | + config.CompThreshold = -16.0 // Higher threshold |
| 204 | + config.CompMakeup = 1.0 // Minimal makeup |
| 205 | + } else if measurements.DynamicRange > 20.0 { |
| 206 | + // Moderately dynamic (typical podcast) |
| 207 | + config.CompRatio = 3.0 |
| 208 | + config.CompThreshold = -18.0 |
| 209 | + config.CompMakeup = 2.0 |
| 210 | + } else { |
| 211 | + // Already compressed/consistent |
| 212 | + config.CompRatio = 4.0 // Stronger ratio for peaks |
| 213 | + config.CompThreshold = -20.0 // Lower threshold |
| 214 | + config.CompMakeup = 3.0 // More makeup |
| 215 | + } |
| 216 | +} |
| 217 | + |
| 218 | +// Adaptive attack/release based on loudness range |
| 219 | +if measurements.InputLRA > 15.0 { |
| 220 | + // Wide loudness range = preserve transients |
| 221 | + config.CompAttack = 25 // Slower attack |
| 222 | + config.CompRelease = 150 // Slower release |
| 223 | +} else if measurements.InputLRA > 10.0 { |
| 224 | + // Moderate range |
| 225 | + config.CompAttack = 20 |
| 226 | + config.CompRelease = 100 |
| 227 | +} else { |
| 228 | + // Narrow range = tighter control |
| 229 | + config.CompAttack = 15 |
| 230 | + config.CompRelease = 80 |
| 231 | +} |
| 232 | + |
| 233 | +// Adaptive parallel compression mix based on recording quality AND dynamic range |
| 234 | +var mixFactor float64 |
| 235 | + |
| 236 | +// Noise floor indicates recording quality (affects artifact audibility) |
| 237 | +if measurements.NoiseFloor < -50 { |
| 238 | + mixFactor = 0.95 // Clean recording baseline - can use more compression |
| 239 | +} else if measurements.NoiseFloor < -40 { |
| 240 | + mixFactor = 0.85 // Moderate quality |
| 241 | +} else { |
| 242 | + mixFactor = 0.75 // Noisy - gentler processing to mask pumping artifacts |
| 243 | +} |
| 244 | + |
| 245 | +// Dynamic range indicates content characteristics (affects how much compression needed) |
| 246 | +if measurements.DynamicRange > 30 { |
| 247 | + // Very dynamic - preserve more dry signal |
| 248 | + config.CompMix = mixFactor - 0.10 |
| 249 | +} else if measurements.DynamicRange > 20 { |
| 250 | + // Moderate dynamics |
| 251 | + config.CompMix = mixFactor |
| 252 | +} else { |
| 253 | + // Already compressed - can use more |
| 254 | + config.CompMix = math.Min(1.0, mixFactor + 0.10) |
| 255 | +} |
| 256 | +``` |
| 257 | +
|
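The two-stage mix selection is easier to reason about, and to unit test, as a pure function. The sketch below restates the logic above; `compMix` is a hypothetical name and the inputs in `main` are illustrative values, not measurements from the test files.

```go
package main

import (
	"fmt"
	"math"
)

// compMix picks the acompressor wet/dry mix from the noise floor (dB)
// and the measured dynamic range (dB), following the thresholds above.
func compMix(noiseFloorDB, dynamicRangeDB float64) float64 {
	var base float64
	switch {
	case noiseFloorDB < -50:
		base = 0.95 // clean recording
	case noiseFloorDB < -40:
		base = 0.85 // moderate quality
	default:
		base = 0.75 // noisy recording
	}
	switch {
	case dynamicRangeDB > 30:
		return base - 0.10 // very dynamic: keep more dry signal
	case dynamicRangeDB > 20:
		return base
	default:
		return math.Min(1.0, base+0.10) // already compressed: capped at fully wet
	}
}

func main() {
	cases := [][2]float64{{-55, 35}, {-45, 25}, {-35, 35}, {-55, 15}}
	for _, c := range cases {
		fmt.Printf("noise %.0f dB, range %.0f dB -> mix %.2f\n", c[0], c[1], compMix(c[0], c[1]))
	}
}
```

The resulting mix spans 0.65 (noisy and very dynamic) to 1.0 (clean and already compressed), so the "parallel" dry path disappears entirely in the clean, low-dynamics case.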
| 258 | +**Benefits of Adaptive Compression**: |
| 259 | +- Uses **actual measurements** (Dynamic Range, LRA, Noise Floor) not derived values |
| 260 | +- Gentler on naturally expressive delivery |
| 261 | +- Tighter control on already-compressed sources |
| 262 | +- Parallel compression preserves naturalness and masks artifacts |
| 263 | +- **Quality-aware mixing**: noisy recordings get gentler processing |
| 264 | +- **Content-aware mixing**: dynamic content preserves more dry signal |
| 265 | +- Better peak control BEFORE normalization = less distortion |
| 266 | +
|
| 267 | +--- |
| 268 | +
|
| 269 | +## Implementation Recommendations |
| 270 | +
|
| 271 | +### Phase 1: Implemented Changes |
| 272 | +1. ✅ **Re-enable loudnorm** with gentler settings (-18 LUFS, -2 TP, 11 LRA) |
| 273 | +2. ✅ **Remove dynaudnorm adaptive tuning** - use fixed conservative values (m=5.0, r=0.0, s=0.0) |
| 274 | +3. ✅ **Add gain staging safety check** - prevent cascading gain from loudnorm + dynaudnorm |
| 275 | +4. ✅ **Move deesser before loudnorm** - prevent amplification of sibilance problems |
| 276 | +5. ✅ **Implement adaptive compression mixing** - based on noise floor + dynamic range |
| 277 | +6. ✅ **Improve adaptive compression** - better tuning based on DynamicRange and LRA |
| 278 | +
|
| 279 | +### Phase 2: Testing & Validation |
| 280 | +1. **Process all three test files** with Phase 1 configuration |
| 281 | +2. **Measure output levels** using ffmpeg ebur128 |
| 282 | +3. **Listen critically** for distortion, harshness, clipping |
| 283 | +4. **Compare perceived loudness** between speakers |
| 284 | +5. **Gather feedback** for fine-tuning |
| 285 | +
|
| 286 | +### Phase 3: Optional Refinements (if needed) |
| 287 | +1. Test `dynaudnorm s=3.0` if additional vocal matching needed |
| 288 | +2. Adjust compression mix ratios based on real-world results |
| 289 | +3. Fine-tune loudnorm LRA target if dynamic range issues persist |
| 290 | +4. Optimize gain staging threshold if over/under amplification detected |
| 291 | +
|
| 292 | +--- |
| 293 | +
|
| 294 | +## Expected Results |
| 295 | +
|
| 296 | +### Before (Current Issue): |
| 297 | +- ❌ Hot, distorted output |
| 298 | +- ❌ Clipping at peaks |
| 299 | +- ❌ Over-aggressive processing |
| 300 | +- ❌ Unnatural "squashed" sound |
| 301 | +
|
| 302 | +### After (Recommended Approach): |
| 303 | +- ✅ Clean -16 LUFS output (via gentle -18 → fine-tune) |
| 304 | +- ✅ No clipping (proper headroom at each stage) |
| 305 | +- ✅ Natural dynamics preserved |
| 306 | +- ✅ Consistent loudness between speakers |
| 307 | +- ✅ Professional broadcast quality |
| 308 | +
|
| 309 | +--- |
| 310 | +
|
| 311 | +## Testing Protocol |
| 312 | +
|
| 313 | +1. **Process all three test files** with new configuration |
| 314 | +2. **Load into Audacity** and verify: |
| 315 | + - Peak levels below -1.5 dBTP |
| 316 | + - Integrated loudness around -16 LUFS (use a loudness meter; LUFS is not an RMS level) |
| 317 | + - No visible clipping |
| 318 | +3. **Listen critically** for: |
| 319 | + - Natural dynamics |
| 320 | + - No distortion/harshness |
| 321 | + - Consistent volume between speakers |
| 322 | + - No pumping/breathing artifacts |
| 323 | +4. **Measure with ffmpeg**: |
| 324 | + ```bash |
| 325 | + ffmpeg -i output.flac -af ebur128=framelog=verbose -f null - 2>&1 | grep "I:" |
| 326 | + ``` |
| 327 | + Target: I: -16.0 LUFS ±1.0 |
| 328 | +
|
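The ±1.0 LU check can be automated once the ebur128 output has been captured. A minimal sketch, assuming the summary contains a line of the form `I: -16.4 LUFS`: `integratedLoudness` and `withinTarget` are hypothetical helpers, and the sample text is an illustrative stand-in for real ffmpeg output.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strconv"
)

// integratedRe matches lines such as "I:         -16.4 LUFS".
var integratedRe = regexp.MustCompile(`\bI:\s+(-?\d+(?:\.\d+)?)\s+LUFS`)

// integratedLoudness returns the last integrated loudness value in the log.
// The last match is used because the summary is printed at the end.
func integratedLoudness(log string) (float64, error) {
	matches := integratedRe.FindAllStringSubmatch(log, -1)
	if len(matches) == 0 {
		return 0, fmt.Errorf("no integrated loudness found")
	}
	return strconv.ParseFloat(matches[len(matches)-1][1], 64)
}

// withinTarget reports whether measured is within tol of target.
func withinTarget(measured, target, tol float64) bool {
	return math.Abs(measured-target) <= tol
}

func main() {
	// Illustrative excerpt; real output comes from ffmpeg's stderr.
	sample := "  Integrated loudness:\n    I:         -16.4 LUFS\n    Threshold: -26.8 LUFS\n"
	i, err := integratedLoudness(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("I = %.1f LUFS, within target: %v\n", i, withinTarget(i, -16.0, 1.0))
}
```

Running the same check over all three test files gives a quick pass/fail signal before the critical listening step.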
| 329 | +--- |
| 330 | +
|
| 331 | +## Summary |
| 332 | +
|
| 333 | +**The core issue**: Aggressive dynaudnorm RMS targeting combined with high max gain causes distortion. |
| 334 | +
|
| 335 | +**The solution**: Use loudnorm for primary LUFS normalization with gentle settings, then dynaudnorm for subtle fine-tuning only. |
| 336 | +
|
| 337 | +**The benefit**: Standards-compliant, professional-quality output without distortion. |