
Commit 071b488

docs: add loudness normalization analysis and strategy documentation
Add detailed technical documentation covering audio processing approaches:

- LOUDNESS_ANALYSIS.md: Documents current issues with audio processing, root cause analysis of dynaudnorm settings, and recommendations for improved filter configuration
- NORMALIZATION_STRATEGY.md: Explains the evolution from loudnorm to dynaudnorm-only approach and proposes speechnorm integration to address input level disparities

These documents provide comprehensive guidance for audio engineers working on the normalization pipeline.
1 parent a6b36f7 commit 071b488

2 files changed

Lines changed: 671 additions & 0 deletions


docs/LOUDNESS_ANALYSIS.md

Lines changed: 337 additions & 0 deletions
# Loudness Normalization Analysis & Recommendations

## Current Issue
Processed audio from all three test files exhibits:
- **Hot/slightly distorted output** - audible harshness
- **Clipping at points** - digital distortion
- **Over-aggressive processing** - despite close level matching between presenters

## Root Cause Analysis

### Current Processing Chain
```
highpass → adeclick → afftdn → agate → acompressor → dynaudnorm → alimiter
```

### The Problem: Dynaudnorm Adaptive Tuning

The current adaptive tuning for `dynaudnorm` is **too aggressive**:

1. **Target RMS Conversion** - Converting LUFS to a linear RMS value:
```go
targetLUFS := config.TargetI                                // -16.0 LUFS
targetDBFS := targetLUFS + 23.0                             // ~+7.0 dBFS (!!)
config.DynaudnormTargetRMS = math.Pow(10, targetDBFS/20.0) // Very hot target
```
**Issue**: The +23 dB offset is not a valid LUFS→dBFS conversion. LUFS is already referenced to digital full scale (K-weighted, so only roughly comparable to dBFS RMS for speech), so adding 23 dB turns a -16 LUFS target into +7 dBFS. As a linear value that is about 2.24: above full scale, and outside the 0.0 to 1.0 range FFmpeg documents for dynaudnorm's `r` (target RMS) option.

2. **Maximum Gain Based on Quiet Input**:
```go
if measurements.InputI < -40.0 {
	config.DynaudnormMaxGain = 25.0 // Allow very high gain
}
```
**Issue**: A 25x gain (about 28 dB) is excessive and amplifies noise and processing artifacts along with the speech

3. **Aggressive Compression**:
```go
if measurements.InputLRA > 15.0 {
	config.DynaudnormCompress = 7.0 // Mild compression
}
```
**Issue**: Even "mild" compression (`s=7.0`; in dynaudnorm, lower `s` values compress harder) causes pumping on highly dynamic content

4. **Insufficient Noise Floor Protection**:
- The threshold is derived from the noise floor, but quiet passages can still be normalized too aggressively
- Pushing those passages toward the hot RMS target creates distortion

---

## Recommended Solution: Loudnorm + Dynaudnorm Combination

### Strategy
Use **both** filters in sequence with complementary roles:
1. **Loudnorm** - Gentle, standards-compliant LUFS normalization
2. **Dynaudnorm** - Fine-tuning for consistent perceived loudness across segments

### New Filter Order
```
highpass → adeclick → afftdn → agate → acompressor → deesser → loudnorm → dynaudnorm → alimiter
```

**Key change**: The deesser runs BEFORE loudnorm, so sibilance is tamed before normalization gain is applied to it.
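
A sketch of how the proposed order could be assembled into a single FFmpeg `-af` argument. `buildFilterChain` is a hypothetical helper, and every option value shown (including the 80 Hz highpass cutoff and alimiter's `limit=0.841`, about -1.5 dBFS) is a placeholder rather than the pipeline's actual configuration. In FFmpeg's filtergraph syntax, filters in a linear chain are separated by commas.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFilterChain joins individual FFmpeg filter specs into one -af argument.
// Filters in a linear FFmpeg filter chain are separated by commas.
func buildFilterChain(stages []string) string {
	return strings.Join(stages, ",")
}

func main() {
	// Stage order from the proposed chain; option values are placeholders
	// (assumed), see the per-filter sections for the recommended settings.
	chain := buildFilterChain([]string{
		"highpass=f=80",
		"adeclick",
		"afftdn",
		"agate",
		"acompressor",
		"deesser",
		"loudnorm=I=-18:TP=-2:LRA=11",
		"dynaudnorm=f=500:g=31:p=0.95:m=5",
		"alimiter=limit=0.841",
	})
	fmt.Println(chain)
}
```

Keeping the stages in an ordered slice makes the deesser-before-loudnorm ordering explicit and easy to test.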

---

## Filter Configuration Recommendations

### 1. Loudnorm (Primary Normalization)

**Purpose**: Provide accurate, gentle LUFS-based normalization following the EBU R128 standard

**Recommended Settings**:
```go
loudnormFilter := fmt.Sprintf(
	"loudnorm=I=-18.0:TP=-2.0:LRA=11.0:"+
		"measured_I=%.2f:measured_TP=%.2f:measured_LRA=%.2f:"+
		"measured_thresh=%.2f:offset=%.2f:"+
		"linear=false:print_format=summary",
	measurements.InputI, measurements.InputTP, measurements.InputLRA,
	measurements.InputThresh, measurements.TargetOffset,
)
```

**Key Parameters**:
- **I=-18.0 LUFS** (instead of -16.0)
  - Rationale: Gentler target leaves headroom for dynaudnorm fine-tuning
  - Prevents loudnorm from being too aggressive
  - -18.0 is still podcast-appropriate (common podcast targets range from roughly -14 to -19 LUFS)

- **TP=-2.0 dBTP** (instead of -0.3; -2.0 is also loudnorm's default)
  - Rationale: More conservative true peak ceiling
  - Prevents inter-sample peaks before dynaudnorm
  - Final limiting happens at alimiter (-1.5 dBFS; alimiter limits sample peaks, not true peaks)

- **LRA=11.0 LU** (instead of 7.0)
  - Rationale: Wider loudness range preserves natural dynamics
  - Prevents "squashing" before dynaudnorm
  - Still well within the LRA range loudnorm accepts (its default is 7.0 LU)

- **linear=false** (dynamic mode)
  - Rationale: Adapts to source material rather than forcing linear scaling
  - Better for varied podcast content
  - Prevents distortion on narrow-LRA sources

**Benefits**:
- Standards-compliant LUFS targeting
- Gentle, musical processing
- Accurate measurements from Pass 1
- Leaves headroom for fine-tuning
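
The `measured_*` and `offset` values above come from a first loudnorm pass. As a sketch, assuming Pass 1 runs loudnorm with `print_format=json` and the JSON block has already been cut out of FFmpeg's stderr, they could be parsed as below. loudnorm emits these numbers as JSON strings; the `Measurements` type and `parsePass1` function are hypothetical and only mirror the field names used in this document.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// loudnormPass1 mirrors the keys loudnorm prints with print_format=json.
// loudnorm reports each number as a JSON string.
type loudnormPass1 struct {
	InputI       string `json:"input_i"`
	InputTP      string `json:"input_tp"`
	InputLRA     string `json:"input_lra"`
	InputThresh  string `json:"input_thresh"`
	TargetOffset string `json:"target_offset"`
}

// Measurements is a hypothetical struct holding the parsed Pass 1 values.
type Measurements struct {
	InputI, InputTP, InputLRA, InputThresh, TargetOffset float64
}

// parsePass1 converts the loudnorm JSON block into numeric measurements.
func parsePass1(raw []byte) (Measurements, error) {
	var p loudnormPass1
	if err := json.Unmarshal(raw, &p); err != nil {
		return Measurements{}, err
	}
	fields := []string{p.InputI, p.InputTP, p.InputLRA, p.InputThresh, p.TargetOffset}
	vals := make([]float64, len(fields))
	for i, f := range fields {
		v, err := strconv.ParseFloat(f, 64)
		if err != nil {
			return Measurements{}, fmt.Errorf("field %d (%q): %w", i, f, err)
		}
		vals[i] = v
	}
	return Measurements{vals[0], vals[1], vals[2], vals[3], vals[4]}, nil
}

func main() {
	// Illustrative values only, not measurements from the test files.
	raw := []byte(`{"input_i":"-27.50","input_tp":"-6.10","input_lra":"9.80",
		"input_thresh":"-38.20","target_offset":"0.35"}`)
	m, err := parsePass1(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", m)
}
```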

---

### 2. Dynaudnorm (Fine-Tuning)

**Purpose**: Smooth out remaining loudness variations between segments/speakers

**Recommended Settings** (Conservative, Non-Adaptive):
```go
// Build this string after the gain staging safety check below has run, so that m
// reflects any reduction it applies (set config.DynaudnormMaxGain = 5.0 before the check).
dynaudnormFilter := fmt.Sprintf(
	"dynaudnorm=f=500:g=31:p=0.95:m=%.2f:r=0.0:t=0.0:n=1:c=0:b=0:s=0.0",
	config.DynaudnormMaxGain,
)
// f=500:  Frame length 500 ms (default, balanced)
// g=31:   Gaussian filter size 31 (default, smooth)
// p=0.95: Peak target 0.95 (default, 5% headroom)
// m=5.0:  Max gain 5x unless the safety check lowers it (reduced from 10x, less aggressive)
// r=0.0:  No RMS targeting (disabled, let loudnorm handle this)
// t=0.0:  Normalize all frames (but see gain staging safety check)
// n=1:    Coupled channels (mono, so no effect)
// c=0:    No DC correction
// b=0:    Standard boundary mode
// s=0.0:  No compression (preserve dynamics; acompressor handles this)
```

**Note on `s` parameter (compression)**:
- Kept at `s=0.0` to avoid double-compression with acompressor
- While `s=3.0` might help even out vocal characteristics, it is not light compression: smaller `s` values compress harder, and FFmpeg's documentation advises against values below 3.0 because audible distortion may appear. This conflicts with the goal of reducing aggression
- Can be tested as an optional parameter if needed, preferably starting from a higher (gentler) value

**Gain Staging Safety Check**:
```go
// Prevent cascading gain from loudnorm + dynaudnorm.
// loudnorm's gain is roughly its target minus the measured input loudness.
// Pass 1's target_offset is only a small correction term, not that gain, and
// math.Abs would count attenuation as gain, so clamp at 0 instead.
loudnormTargetI := -18.0 // loudnorm I target from the settings above
loudnormGainDB := math.Max(0, loudnormTargetI-measurements.InputI)
dynaudnormGainDB := 20 * math.Log10(config.DynaudnormMaxGain)
totalGainDB := loudnormGainDB + dynaudnormGainDB

// Safety limit: if total potential gain exceeds 30 dB, reduce dynaudnorm's contribution
if totalGainDB > 30.0 {
	// Give dynaudnorm the rest of the 30 dB budget, but never less than 3x (~9.5 dB)
	config.DynaudnormMaxGain = math.Max(3.0, math.Pow(10, (30.0-loudnormGainDB)/20.0))
}
```

**Key Changes from Current**:
- **Removed RMS targeting** (`r=0.0`) - loudnorm already handled LUFS
- **Reduced max gain** (`m=5.0` instead of adaptive 10-25) - prevents over-amplification
- **Safety-limited max gain** - backs off if loudnorm is doing the heavy lifting
- **No compression** (`s=0.0`) - acompressor already handled this
- **Conservative** - fixed parameters, minimal adaptive behavior

**Benefits**:
- Smooths remaining level variations
- Doesn't try to re-normalize (loudnorm did that)
- Gentle, transparent processing
- Lower risk of over-amplification or cascading gain
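
Several problems in this analysis (an `r` above 1.0, `s` values below the recommended minimum) could be caught before FFmpeg ever runs. A sketch of such a pre-flight check, assuming a hypothetical `DynaudnormParams` struct; the numeric ranges follow FFmpeg's dynaudnorm documentation and should be re-checked against the FFmpeg version in use.

```go
package main

import "fmt"

// DynaudnormParams is a hypothetical struct for the dynaudnorm options
// discussed in this document. Ranges follow FFmpeg's documented limits.
type DynaudnormParams struct {
	FrameLenMs int     // f: 10..8000
	GaussSize  int     // g: 3..301, must be odd
	Peak       float64 // p: 0.0..1.0
	MaxGain    float64 // m: 1.0..100.0
	TargetRMS  float64 // r: 0.0..1.0 (0 disables RMS targeting)
	Compress   float64 // s: 0.0..30.0 (0 disables; lower = stronger)
}

// Validate reports the first out-of-range option, if any.
func (d DynaudnormParams) Validate() error {
	switch {
	case d.FrameLenMs < 10 || d.FrameLenMs > 8000:
		return fmt.Errorf("f=%d out of range [10, 8000]", d.FrameLenMs)
	case d.GaussSize < 3 || d.GaussSize > 301 || d.GaussSize%2 == 0:
		return fmt.Errorf("g=%d must be odd and in [3, 301]", d.GaussSize)
	case d.Peak < 0 || d.Peak > 1:
		return fmt.Errorf("p=%.2f out of range [0, 1]", d.Peak)
	case d.MaxGain < 1 || d.MaxGain > 100:
		return fmt.Errorf("m=%.2f out of range [1, 100]", d.MaxGain)
	case d.TargetRMS < 0 || d.TargetRMS > 1:
		return fmt.Errorf("r=%.4f out of range [0, 1]", d.TargetRMS)
	case d.Compress < 0 || d.Compress > 30:
		return fmt.Errorf("s=%.2f out of range [0, 30]", d.Compress)
	}
	return nil
}

// Warnings flags settings that are legal but discouraged.
func (d DynaudnormParams) Warnings() []string {
	var w []string
	if d.Compress > 0 && d.Compress < 3.0 {
		w = append(w, "s below 3.0 may cause audible distortion")
	}
	return w
}

func main() {
	recommended := DynaudnormParams{FrameLenMs: 500, GaussSize: 31, Peak: 0.95, MaxGain: 5}
	fmt.Println("recommended:", recommended.Validate())

	// The current +23 dB conversion yields r ≈ 2.24.
	current := recommended
	current.TargetRMS = 2.2387
	fmt.Println("current:", current.Validate())
}
```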

---

### 3. Acompressor (Pre-Normalization)

**Purpose**: Control dynamic range BEFORE normalization to prevent distortion

#### Current vs Recommended

| Parameter | Current | Recommended | Rationale |
|-----------|---------|-------------|-----------|
| **threshold** | -20 dB | **-18 dB** | Higher threshold = less compression on normal speech |
| **ratio** | 2.5:1 | **3.0:1** | Slightly more ratio to control peaks better |
| **attack** | 15 ms | **20 ms** | Slower attack preserves transients |
| **release** | 80 ms | **100 ms** | Slower release sounds more natural |
| **makeup** | 3 dB | **2 dB** | Less makeup (loudnorm will handle gain) |
| **knee** | 2.5 | **3.0** | Softer knee for smoother compression |
| **detection** | RMS | **RMS** | Keep RMS for smooth, musical compression |
| **mix** | 1.0 | **0.85** | Parallel compression (15% dry signal) |

**Recommended Configuration**:
```go
// dbToLinear(db) = 10^(db/20): acompressor takes threshold and makeup as linear values.
acompressorFilter := fmt.Sprintf(
	"acompressor=threshold=%.6f:ratio=3.0:attack=20:release=100:"+
		"makeup=%.2f:knee=3.0:detection=rms:mix=0.85",
	dbToLinear(-18.0), // -18 dB threshold
	dbToLinear(2.0),   // 2 dB makeup
)
```
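
One way to keep the table's dB and millisecond values as the single source of truth is to render the filter string from a settings struct. This is a sketch; `CompressorSettings` and its `Filter` method are hypothetical names, and the conversion assumes the same 10^(dB/20) rule as `dbToLinear` above.

```go
package main

import (
	"fmt"
	"math"
)

// dbToLinear converts decibels to a linear amplitude ratio (0 dB -> 1.0).
func dbToLinear(db float64) float64 { return math.Pow(10, db/20.0) }

// CompressorSettings is a hypothetical struct holding the table's values in
// human-friendly units (dB and milliseconds).
type CompressorSettings struct {
	ThresholdDB float64
	Ratio       float64
	AttackMs    float64
	ReleaseMs   float64
	MakeupDB    float64
	Knee        float64
	Detection   string // "rms" or "peak"
	Mix         float64
}

// Filter renders the settings as an FFmpeg acompressor spec. acompressor
// expects threshold and makeup as linear values, so both are converted here.
func (c CompressorSettings) Filter() string {
	return fmt.Sprintf(
		"acompressor=threshold=%.6f:ratio=%.1f:attack=%g:release=%g:"+
			"makeup=%.2f:knee=%.1f:detection=%s:mix=%.2f",
		dbToLinear(c.ThresholdDB), c.Ratio, c.AttackMs, c.ReleaseMs,
		dbToLinear(c.MakeupDB), c.Knee, c.Detection, c.Mix,
	)
}

func main() {
	recommended := CompressorSettings{
		ThresholdDB: -18, Ratio: 3, AttackMs: 20, ReleaseMs: 100,
		MakeupDB: 2, Knee: 3, Detection: "rms", Mix: 0.85,
	}
	fmt.Println(recommended.Filter())
}
```

For the recommended values this reproduces the string above (`threshold=0.125893`, `makeup=1.26`), and the adaptive tuning below could fill the same struct instead of editing format strings.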

**Adaptive Tuning Improvements**:

Based on measurements from Pass 1:

```go
// Adaptive compression based on measured dynamic range
if measurements.DynamicRange > 0 {
	if measurements.DynamicRange > 30.0 {
		// Very dynamic content (expressive delivery)
		config.CompRatio = 2.0       // Gentle ratio
		config.CompThreshold = -16.0 // Higher threshold
		config.CompMakeup = 1.0      // Minimal makeup
	} else if measurements.DynamicRange > 20.0 {
		// Moderately dynamic (typical podcast)
		config.CompRatio = 3.0
		config.CompThreshold = -18.0
		config.CompMakeup = 2.0
	} else {
		// Already compressed/consistent
		config.CompRatio = 4.0       // Stronger ratio for peaks
		config.CompThreshold = -20.0 // Lower threshold
		config.CompMakeup = 3.0      // More makeup
	}
}

// Adaptive attack/release based on loudness range
if measurements.InputLRA > 15.0 {
	// Wide loudness range = preserve transients
	config.CompAttack = 25   // Slower attack
	config.CompRelease = 150 // Slower release
} else if measurements.InputLRA > 10.0 {
	// Moderate range
	config.CompAttack = 20
	config.CompRelease = 100
} else {
	// Narrow range = tighter control
	config.CompAttack = 15
	config.CompRelease = 80
}

// Adaptive parallel compression mix based on recording quality AND dynamic range
var mixFactor float64

// Noise floor indicates recording quality (affects artifact audibility)
if measurements.NoiseFloor < -50 {
	mixFactor = 0.95 // Clean recording baseline - can use more compression
} else if measurements.NoiseFloor < -40 {
	mixFactor = 0.85 // Moderate quality
} else {
	mixFactor = 0.75 // Noisy - gentler processing to mask pumping artifacts
}

// Dynamic range indicates content characteristics (affects how much compression needed)
if measurements.DynamicRange > 30 {
	// Very dynamic - preserve more dry signal
	config.CompMix = mixFactor - 0.10
} else if measurements.DynamicRange > 20 {
	// Moderate dynamics
	config.CompMix = mixFactor
} else if measurements.DynamicRange > 0 {
	// Already compressed - can use more
	config.CompMix = math.Min(1.0, mixFactor+0.10)
} else {
	// No dynamic range measurement (same guard as above) - use the quality baseline
	config.CompMix = mixFactor
}
```

**Benefits of Adaptive Compression**:
- Uses **actual measurements** (Dynamic Range, LRA, Noise Floor) rather than derived values
- Gentler on naturally expressive delivery
- Tighter control on already-compressed sources
- Parallel compression preserves naturalness and masks artifacts
- **Quality-aware mixing**: noisy recordings get gentler processing
- **Content-aware mixing**: dynamic content preserves more dry signal
- Better peak control BEFORE normalization = less distortion

---

## Implementation Recommendations

### Phase 1: Implemented Changes
1. ✅ **Re-enable loudnorm** with gentler settings (-18 LUFS, -2 TP, 11 LRA)
2. ✅ **Remove dynaudnorm adaptive tuning** - use fixed conservative values (m=5.0, r=0.0, s=0.0)
3. ✅ **Add gain staging safety check** - prevent cascading gain from loudnorm + dynaudnorm
4. ✅ **Move deesser before loudnorm** - prevent amplification of sibilance problems
5. ✅ **Implement adaptive compression mixing** - based on noise floor + dynamic range
6. ✅ **Improve adaptive compression** - better tuning based on DynamicRange and LRA

### Phase 2: Testing & Validation
1. **Process all three test files** with Phase 1 configuration
2. **Measure output levels** using ffmpeg ebur128
3. **Listen critically** for distortion, harshness, clipping
4. **Compare perceived loudness** between speakers
5. **Gather feedback** for fine-tuning

### Phase 3: Optional Refinements (if needed)
1. Test `dynaudnorm s=3.0` if additional vocal matching is needed (lower `s` values compress harder, so a higher value is a gentler first test)
2. Adjust compression mix ratios based on real-world results
3. Fine-tune loudnorm LRA target if dynamic range issues persist
4. Optimize the gain staging threshold if over- or under-amplification is detected

---

## Expected Results

### Before (Current Issue):
- ❌ Hot, distorted output
- ❌ Clipping at peaks
- ❌ Over-aggressive processing
- ❌ Unnatural "squashed" sound

### After (Recommended Approach):
- ✅ Output near -16 LUFS (loudnorm to -18 LUFS, then dynaudnorm fine-tuning; dynaudnorm has no loudness target with `r=0.0`, so confirm the final level by measurement)
- ✅ No clipping (proper headroom at each stage)
- ✅ Natural dynamics preserved
- ✅ Consistent loudness between speakers
- ✅ Professional broadcast quality

---

## Testing Protocol

1. **Process all three test files** with new configuration
2. **Load into Audacity** and verify:
   - Sample peaks at or below -1.5 dBFS (alimiter limits sample peaks, so true peaks may read slightly higher)
   - Integrated loudness around -16 LUFS (use a loudness meter)
   - No visible clipping
3. **Listen critically** for:
   - Natural dynamics
   - No distortion/harshness
   - Consistent volume between speakers
   - No pumping/breathing artifacts
4. **Measure with ffmpeg**:
   ```bash
   ffmpeg -i output.flac -af ebur128=framelog=verbose -f null - 2>&1 | grep "I:"
   ```
   Target: I: -16.0 LUFS ±1.0
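
To automate the ±1.0 LU check, the captured ffmpeg output can be parsed for the integrated-loudness value. This is a sketch assuming the ebur128 summary format printed by current FFmpeg builds (`I: <value> LUFS`); `parseIntegrated` and `withinTarget` are hypothetical helpers. With `framelog=verbose`, per-frame lines are logged at verbose level and should normally be hidden at FFmpeg's default log level, but the sketch takes the last match in case they were captured, since the summary is printed last.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// integratedRe matches an "I: <value> LUFS" field as printed by ebur128,
// e.g. the summary line "    I:         -16.4 LUFS".
var integratedRe = regexp.MustCompile(`I:\s+(-?[0-9]+(?:\.[0-9]+)?)\s+LUFS`)

// parseIntegrated returns the last integrated-loudness value in ebur128 output.
// The summary is printed last, so the last match is the whole-file figure even
// if per-frame lines (which also contain "I:") were captured.
func parseIntegrated(output string) (float64, error) {
	matches := integratedRe.FindAllStringSubmatch(output, -1)
	if len(matches) == 0 {
		return 0, fmt.Errorf("no integrated loudness (I:) value found")
	}
	return strconv.ParseFloat(matches[len(matches)-1][1], 64)
}

// withinTarget reports whether lufs lies within tol LU of target.
func withinTarget(lufs, target, tol float64) bool {
	d := lufs - target
	return d >= -tol && d <= tol
}

func main() {
	// Illustrative excerpt of an ebur128 summary, not a real measurement.
	summary := "  Integrated loudness:\n    I:         -16.4 LUFS\n    Threshold: -26.6 LUFS\n"
	lufs, err := parseIntegrated(summary)
	if err != nil {
		panic(err)
	}
	fmt.Printf("I = %.1f LUFS, within -16.0 ±1.0: %v\n", lufs, withinTarget(lufs, -16.0, 1.0))
}
```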

---

## Summary

**The core issue**: Aggressive dynaudnorm RMS targeting combined with high max gain causes distortion.

**The solution**: Use loudnorm for primary LUFS normalization with gentle settings, then dynaudnorm for subtle fine-tuning only.

**The benefit**: Standards-compliant, professional-quality output without distortion.
