stereo.html@ 98827

Last change on this file since 98827 was 96468, checked in by vboxsync, 2 years ago
libs/libvorbis-1.3.7: Re-exporting, hopefully this time everything is there. bugref:10275
Property svn:eol-style set to `native`; property svn:keywords set to `Author Date Id Revision`
File size: 16.2 KB

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
<title>Ogg Vorbis Documentation</title>

<style type="text/css">
body {
  margin: 0 18px 0 18px;
  padding-bottom: 30px;
  font-family: Verdana, Arial, Helvetica, sans-serif;
  color: #333333;
  font-size: .8em;
}

a {
  color: #3366cc;
}

img {
  border: 0;
}

#xiphlogo {
  margin: 30px 0 16px 0;
}

#content p {
  line-height: 1.4;
}

h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a {
  font-weight: bold;
  color: #ff9900;
  margin: 1.3em 0 8px 0;
}

h1 {
  font-size: 1.3em;
}

h2 {
  font-size: 1.2em;
}

h3 {
  font-size: 1.1em;
}

li {
  line-height: 1.4;
}

#copyright {
  margin-top: 30px;
  line-height: 1.5em;
  text-align: center;
  font-size: .8em;
  color: #888888;
  clear: both;
}
</style>

</head>

<body>

<div id="xiphlogo">
  <a href="https://xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.Org"/></a>
</div>

<h1>Ogg Vorbis stereo-specific channel coupling discussion</h1>

<h2>Abstract</h2>

<p>The Vorbis audio CODEC provides channel coupling
mechanisms designed to reduce effective bitrate by both eliminating
interchannel redundancy and eliminating stereo image information
labeled inaudible or undesirable according to spatial psychoacoustic
models. This document describes both the coupling mechanisms
available within the Vorbis specification and the specific stereo
coupling models used by the reference <tt>libvorbis</tt> codec
provided by xiph.org.</p>

<h2>Mechanisms</h2>

<p>In encoder release beta 4 and earlier, Vorbis supported multichannel
encoding, but the channels were encoded entirely separately, with no
cross-analysis or redundancy elimination between channels. This
multichannel strategy is very similar to mp3's <em>dual stereo</em>
mode, and Vorbis uses the same name for its analogous uncoupled
multichannel modes.</p>

<p>However, the Vorbis specification provides for, and Vorbis release 1.0
rc1 and later implement, a coupled channel strategy. Vorbis has two
specific mechanisms that may be used alone or in conjunction to
implement channel coupling. The first is <em>channel interleaving</em>
via residue backend type 2, and the second is <em>square polar
mapping</em>. These two general mechanisms are particularly well
suited to coupling due to the structure of Vorbis encoding, as we'll
explore below. Using both, we can implement totally
<em>lossless stereo image coupling</em> [bit-for-bit decode-identical
to uncoupled modes], as well as various lossy models that seek to
eliminate inaudible or unimportant aspects of the stereo image in
order to reduce bitrate. The exact coupling implementation is
generalized to allow the encoder a great deal of flexibility in
implementing a stereo or surround model without requiring any
significant complexity increase over the combinatorially simpler
mid/side joint stereo of mp3 and other current audio codecs.</p>

<p>A particular Vorbis bitstream may apply channel coupling directly to
more than a pair of channels; polar mapping is hierarchical, such that
polar coupling may be extended to an arbitrary number of channels
and is not restricted to stereo, quadraphonics, ambisonics or 5.1
surround. However, this document restricts itself to the stereo
coupling case.</p>

<a name="sqpm"></a>
<h3>Square Polar Mapping</h3>

<h4>maximal correlation</h4>

<p>Recall that a Vorbis I stream is built by first generating, from the
input audio, a spectral 'floor' function that serves as an
MDCT-domain whitening filter. This floor is meant to represent the
rough envelope of the frequency spectrum, using whatever metric the
encoder cares to define. The floor is subtracted from the log
frequency spectrum, effectively normalizing the spectrum by frequency.
Each input channel is associated with a unique floor function.</p>

<p>The basic idea behind any stereo coupling is that the left and right
channels usually correlate. This correlation is even stronger if one
first accounts for energy differences in any given frequency band
across left and right; think, for example, of individual instruments
mixed into different portions of the stereo image, or a stereo
recording with a dominant feature not perfectly in the center. The
floor functions, each specific to a channel, provide the perfect means
of normalizing left and right energies across the spectrum to maximize
correlation before coupling. This feature of the Vorbis format is
deliberate, not a convenient accident.</p>
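
<p>A toy sketch of this normalization in C, assuming each channel's floor
is available as a positive linear amplitude per MDCT bin (a real Vorbis
floor is a coarse curve coded in the bitstream). Dividing by the floor is
the linear-domain equivalent of subtracting it from the log spectrum. The
helper name <tt>floor_normalize</tt> is hypothetical and not part of
<tt>libvorbis</tt>. If one channel is the other at half the level and each
floor tracks its own channel, the two residues come out identical:</p>

<pre>
#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;

/* Hypothetical helper (not a libvorbis API): whiten one channel by
   dividing each MDCT coefficient by that channel's floor amplitude,
   the linear-domain equivalent of subtracting the floor from the
   log spectrum. */
static void floor_normalize(const float *mdct, const float *floor_amp,
                            float *residue, size_t n)
{
  for (size_t i = 0; i &lt; n; i++) {
    assert(floor_amp[i] &gt; 0.0f);  /* floor amplitudes must be positive */
    residue[i] = mdct[i] / floor_amp[i];
  }
}
</pre>

<p>For example, left coefficients {4, -2, 8, 1} over a floor of
{2, 1, 4, 0.5}, and right coefficients {2, -1, 4, 0.5} over a floor of
{1, 0.5, 2, 0.25}, both normalize to the residue {2, -2, 2, 2}.</p>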

<p>Because we strive to maximally correlate the left and right channels
and generally succeed in doing so, the left and right residues are
typically nearly identical. We could use channel interleaving (discussed
below) alone to efficiently remove the redundancy between the left and
right channels as a side effect of entropy encoding, but a polar
representation gives additional benefits when left/right correlation
is strong.</p>

<h4>point and diffuse imaging</h4>

<p>The first advantage of a polar representation is that it effectively
separates the spatial audio information into a 'point image'
(magnitude) at a given frequency and located somewhere in the sound
field, and a 'diffuse image' (angle) that fills a large amount of
space simultaneously. Even if we preserve only the magnitude (point)
data, a detailed and carefully chosen floor function in each channel
provides us with a free, fine-grained, frequency-relative intensity
stereo*. Angle information represents diffuse sound fields, such as
reverberation that fills the entire space simultaneously.</p>

<p>*<em>Because the Vorbis model supports a number of different possible
stereo models and these models may be mixed, we do not use the term
'intensity stereo' when talking about Vorbis; instead we use the terms
'point stereo', 'phase stereo' and subcategories of each.</em></p>

<p>The majority of a stereo image is representable by polar magnitude
alone, as strong sounds tend to be produced at near-point sources;
even non-diffuse, fast, sharp echoes track very accurately using
magnitude representation almost alone (for those experimenting with
Vorbis tuning, this strategy works much better with the precise,
piecewise control of floor 1; the continuous approximation of floor 0
results in unstable imaging). Reverberation and diffuse sounds tend
to contain less energy and be psychoacoustically dominated by the
point sources embedded in them. Thus, we again tend to concentrate
more of the represented energy into a smaller, more predictable set of
values. Separating representation of point and diffuse imaging also
allows us to model and manipulate point and diffuse qualities
separately.</p>

<h4>controlling bit leakage and symbol crosstalk</h4>

<p>Because polar representation concentrates represented energy into
fewer large values, we reduce bit 'leakage' during cascading
(multistage VQ encoding) as a secondary benefit. A single large,
monolithic VQ codebook is more efficient than a cascaded book due to
entropy 'crosstalk' among symbols between different stages of a
multistage cascade. Polar representation is a way of further
concentrating entropy into predictable locations so that codebook
design can take steps to improve multistage codebook efficiency. It
also allows us to cascade various elements of the stereo image
independently.</p>

<h4>eliminating trigonometry and rounding</h4>

<p>Rounding and computational complexity are potential problems with a
polar representation. Because our encoding process involves
quantization, mixing a polar representation with quantization can make
it impossible, depending on implementation, to construct a coupled
stereo mechanism that produces bit-identical decompressed output
compared to an uncoupled encoding, should the encoder desire it.</p>

<p>Vorbis uses a mapping that preserves the most useful qualities of
polar representation, relies only on addition/subtraction (during
decode; high-quality encoding still requires some trig), and makes it
trivial, before or after quantization, to represent an angle/magnitude
through a one-to-one mapping from possible left/right value
permutations. We do this by basing our polar representation on the
unit square rather than the unit circle.</p>

<p>Given a magnitude and angle, we recover left and right using the
following function (note that A/B may be left/right or right/left
depending on the coupling definition used by the encoder):</p>

<pre>
/* decode one magnitude/angle pair back into A and B */
if(magnitude>0){
  if(angle>0){
    A=magnitude;
    B=magnitude-angle;
  }else{
    B=magnitude;
    A=magnitude+angle;
  }
}else{
  if(angle>0){
    A=magnitude;
    B=magnitude+angle;
  }else{
    B=magnitude;
    A=magnitude-angle;
  }
}
</pre>

<p>The function is antisymmetric for positive and negative magnitudes in
order to eliminate a redundant value when quantizing. For example, if
we're quantizing to integer values, we can visualize a magnitude of 5
and an angle of -2 (which the function above decodes to A=3, B=5) as
follows:</p>

<p><img src="squarepolar.png" alt="square polar"/></p>

<p>This representation neither loses nor replicates any values; if A
and B each range over the integers -5 through 5, there are 121 possible
Cartesian pairs. Represented in square polar notation as
magnitude, angle pairs, the possible values are:</p>

<pre>
   0, 0

  -1,-2  -1,-1  -1, 0  -1, 1

   1,-2   1,-1   1, 0   1, 1

  -2,-4  -2,-3  -2,-2  -2,-1  -2, 0  -2, 1  -2, 2  -2, 3

   2,-4   2,-3  ... following the pattern ...

  ...   5, 1   5, 2   5, 3   5, 4   5, 5   5, 6   5, 7   5, 8   5, 9

</pre>

<p>...for a grand total of 121 possible values, the same number as in
Cartesian representation. (Note that, for example, <tt>5,-10</tt>
decodes to the same A/B pair as <tt>-5,10</tt>, so there is no reason
to represent both; <tt>2,10</tt> would place B outside the square
bounded by the magnitude, so it cannot occur and there is no reason to
account for it.) The mapping is also exactly reversible.</p>
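
<p>The counting argument and the reversibility claim can be checked
mechanically for integer values. In the C sketch below, the decoder
repeats the function above. This document does not give the encoder, so
<tt>square_polar_encode</tt> is only one possible inverse, derived by hand
from the four decode cases: the magnitude is whichever of A and B has the
larger absolute value (B on ties), and the angle is their signed
difference. All function names here are hypothetical:</p>

<pre>
#include &lt;assert.h&gt;
#include &lt;stdlib.h&gt;

/* Square polar decode, as in the function above. */
static void square_polar_decode(int magnitude, int angle, int *A, int *B)
{
  if (magnitude &gt; 0) {
    if (angle &gt; 0) { *A = magnitude; *B = magnitude - angle; }
    else           { *B = magnitude; *A = magnitude + angle; }
  } else {
    if (angle &gt; 0) { *A = magnitude; *B = magnitude + angle; }
    else           { *B = magnitude; *A = magnitude - angle; }
  }
}

/* Hypothetical inverse, derived by hand from the four decode cases:
   the magnitude is whichever of A/B has the larger absolute value
   (B on ties); the angle is A-B for a positive magnitude, else B-A. */
static void square_polar_encode(int A, int B, int *magnitude, int *angle)
{
  int m = (abs(A) &gt; abs(B)) ? A : B;
  *magnitude = m;
  *angle = (m &gt; 0) ? A - B : B - A;
}

/* Round-trip every integer pair in [-range, range] x [-range, range];
   count pairs that do not decode back, or whose angle leaves the
   listed range -2|m| .. 2|m|-1 (just 0 when m is 0). */
static int count_round_trip_failures(int range)
{
  assert(range &gt;= 0);
  int failures = 0;
  for (int A = -range; A &lt;= range; A++) {
    for (int B = -range; B &lt;= range; B++) {
      int m, ang, A2, B2;
      square_polar_encode(A, B, &amp;m, &amp;ang);
      square_polar_decode(m, ang, &amp;A2, &amp;B2);
      int lo = -2 * abs(m);
      int hi = (m == 0) ? 0 : 2 * abs(m) - 1;
      if (A2 != A || B2 != B || ang &lt; lo || ang &gt; hi)
        failures++;
    }
  }
  return failures;
}

/* Number of (magnitude, angle) values in that listed range:
   1 for magnitude 0, and 4|m| angles for every other magnitude m. */
static int count_polar_values(int range)
{
  assert(range &gt;= 0);
  int total = 0;
  for (int m = -range; m &lt;= range; m++)
    total += (m == 0) ? 1 : 4 * abs(m);
  return total;
}
</pre>

<p>For a range of 5, <tt>count_polar_values(5)</tt> gives 121 and
<tt>count_round_trip_failures(5)</tt> gives 0; encoding A=3, B=5 yields
magnitude 5 and angle -2, the pictured example.</p>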

<h3>Channel interleaving</h3>

<p>We can remap an A/B vector using polar mapping into a magnitude/angle
vector, and it's clear that, in general, this concentrates energy in
the magnitude vector and reduces the amount of information to encode
in the angle vector. Encoding these vectors independently with
residue backend #0 or residue backend #1 will result in bitrate
savings. However, there are still implicit correlations between the
magnitude and angle vectors. The most obvious is that the amplitude
of the angle is bounded by its corresponding magnitude value (in the
square polar mapping, the absolute value of an angle never exceeds
twice the absolute value of its magnitude).</p>
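
<p>A small numeric illustration of this concentration, as a C sketch. It
uses a hand-derived inverse of the decode function (the hypothetical
<tt>square_polar_encode</tt>; this document does not specify the
encoder) and a plain sum of squares as a crude stand-in for energy. For
two nearly identical integer residue vectors, almost all of that sum
lands in the magnitude vector:</p>

<pre>
#include &lt;assert.h&gt;
#include &lt;stdlib.h&gt;

/* Hypothetical inverse of the square polar decode function:
   magnitude = whichever of A/B has the larger absolute value (B on ties),
   angle = A-B for a positive magnitude, B-A otherwise. */
static void square_polar_encode(int A, int B, int *magnitude, int *angle)
{
  int m = (abs(A) &gt; abs(B)) ? A : B;
  *magnitude = m;
  *angle = (m &gt; 0) ? A - B : B - A;
}

/* Couple two integer residue vectors of length n and report the sum of
   squares of the resulting magnitude and angle vectors. */
static void coupled_energy(const int *A, const int *B, int n,
                           long *mag_energy, long *ang_energy)
{
  assert(n &gt;= 0);
  *mag_energy = 0;
  *ang_energy = 0;
  for (int i = 0; i &lt; n; i++) {
    int m, ang;
    square_polar_encode(A[i], B[i], &amp;m, &amp;ang);
    *mag_energy += (long)m * m;
    *ang_energy += (long)ang * ang;
  }
}
</pre>

<p>For A = {10, -7, 3, 12} and B = {9, -7, 4, 11}, the angles are
1, 0, -1 and 1, so <tt>coupled_energy</tt> reports 309 for the magnitude
vector and only 3 for the angle vector.</p>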

<p>Entropy coding the results, then, further benefits from the entropy
model being able to compress magnitude and angle simultaneously. For
this reason, Vorbis implements residue backend #2, which pre-interleaves
a number of input vectors (in the stereo case, two, A and B) into a
single output vector (with the elements in the order of
A<sub>0</sub>, B<sub>0</sub>, A<sub>1</sub>, B<sub>1</sub>, A<sub>2</sub> ... A<sub>n-1</sub>, B<sub>n-1</sub>)
before entropy encoding. Thus each vector to be coded by the vector
quantization backend consists of matching magnitude and angle
values.</p>
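
<p>The interleaving step itself is simple. A minimal C sketch, assuming
each channel's vector is a plain array of equal length <tt>n</tt>; the
function names are hypothetical. In the coupled case the two inputs are
the magnitude and angle vectors, so each adjacent pair in the output is a
matching magnitude/angle pair:</p>

<pre>
#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;

/* Hypothetical helper: interleave `channels` vectors of length n into
   out[] (channels * n values): A_0, B_0, A_1, B_1, ... for stereo. */
static void interleave(const float *const *in, size_t channels, size_t n,
                       float *out)
{
  assert(channels &gt; 0);
  for (size_t i = 0; i &lt; n; i++)
    for (size_t c = 0; c &lt; channels; c++)
      out[i * channels + c] = in[c][i];
}

/* Inverse step: split an interleaved vector back into its channels. */
static void deinterleave(const float *in, size_t channels, size_t n,
                         float *const *out)
{
  assert(channels &gt; 0);
  for (size_t i = 0; i &lt; n; i++)
    for (size_t c = 0; c &lt; channels; c++)
      out[c][i] = in[i * channels + c];
}
</pre>

<p>Interleaving {1, 2, 3} and {10, 20, 30} gives {1, 10, 2, 20, 3, 30},
and <tt>deinterleave</tt> recovers the original two vectors.</p>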

<p>The astute reader, at this point, will notice that in the theoretical
case in which we can use monolithic codebooks of arbitrarily large
size, we can directly interleave and encode left and right without
polar mapping; the polar mapping does not appear to lend any benefit
whatsoever to the efficiency of the entropy coding. In fact, it is
perfectly possible and reasonable to build a Vorbis encoder that
dispenses with polar mapping entirely and merely interleaves the
channels. Libvorbis-based encoders may configure such an encoding, and
it will work as intended.</p>

<p>However, when we leave the ideal/theoretical domain, we notice that
polar mapping does give additional practical benefits, as discussed in
the above section on polar mapping and summarized again here:</p>

<ul>
<li>Polar mapping aids in controlling entropy 'leakage' between stages
of a cascaded codebook.</li>
<li>Polar mapping separates the stereo image
into point and diffuse components which may be analyzed and handled
differently.</li>
</ul>

<h2>Stereo Models</h2>

<h3>Dual Stereo</h3>

<p>Dual stereo refers to stereo encoding where the channels are entirely
separate; they are analyzed and encoded as entirely distinct entities.
This terminology is familiar from mp3.</p>

<h3>Lossless Stereo</h3>

<p>Using polar mapping and/or channel interleaving, it's possible to
couple Vorbis channels losslessly, that is, to construct a stereo
coupling encoding that both saves space and decodes bit-identically
to dual stereo. OggEnc 1.0 and later use this mode in all
high-bitrate encoding.</p>

<p>Overall, this stereo mode is overkill; however, it offers a safe
alternative to users concerned about the slightest possible
degradation to the stereo image or who require archival-quality
audio.</p>

<h3>Phase Stereo</h3>

<p>Phase stereo is the least aggressive means of gracefully dropping
resolution from the stereo image; it affects only diffuse imaging.</p>

<p>It's often claimed that the human ear is deaf to signal phase above
about 4 kHz; this is nearly true and a passable rule of thumb, but it
can be demonstrated that even an average listener can tell the
difference between high-frequency in-phase and out-of-phase noise.
Obviously, then, the statement is not entirely true. However, it's
also the case that one must resort to nearly such an extreme
demonstration before finding the counterexample.</p>

<p>'Phase stereo' is simply a more aggressive quantization of the polar
angle vector; above 4 kHz it's generally quite safe to quantize noise
and noisy elements to only a handful of allowed phases, or to thin the
phase with respect to the magnitude. The phases of high-amplitude
pure tones may or may not be preserved more carefully (they are
relatively rare and L/R tend to be in phase, so there is generally
little reason not to spend a few more bits on them).</p>
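
<p>As a rough illustration only, the C sketch below thins the angle of
one integer magnitude/angle pair by snapping it to the nearest multiple
of the magnitude's absolute value, but only at or above a cutoff
frequency. The helper name, the integer domain, the rounding rule and
the single cutoff are all assumptions; this is not the rc2 encoder's
actual scheme. Through the square polar decode function, the snapped
angles yield channels that are equal, have one channel at zero, or are
equal and opposite:</p>

<pre>
#include &lt;assert.h&gt;

/* Hypothetical phase-thinning step for one integer magnitude/angle pair.
   Below cutoff_hz the angle is kept as is; at or above it, the angle is
   rounded to the nearest multiple of |magnitude| (ties away from zero). */
static int thin_angle(int magnitude, int angle, double hz, double cutoff_hz)
{
  assert(hz &gt;= 0.0 &amp;&amp; cutoff_hz &gt;= 0.0);
  if (hz &lt; cutoff_hz)
    return angle;                  /* keep full phase resolution */
  int step = magnitude &lt; 0 ? -magnitude : magnitude;
  if (step == 0)
    return 0;                      /* zero magnitude: no phase to keep */
  int q = angle / step;            /* C division truncates toward zero */
  int r = angle - q * step;        /* remainder has the sign of angle */
  if (2 * r &gt;= step)
    q++;
  else if (2 * r &lt;= -step)
    q--;
  return q * step;
}
</pre>

<p>With a 4 kHz cutoff and a magnitude of 5, an angle of 3 is kept at
1 kHz but becomes 5 at 8 kHz, while an angle of -2 becomes 0, which
decodes to A = B = 5 (in phase).</p>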

<h4>example: eight phase stereo</h4>

<p>Vorbis may implement phase stereo coupling by preserving the entirety
of the magnitude vector (essential to fine amplitude and energy
resolution overall) and quantizing the angle vector to one of only
four possible values. Given that the magnitude vector may be positive
or negative, this results in left and right phase having eight
possible permutations, thus 'eight phase stereo':</p>

<p><img src="eightphase.png" alt="eight phase"/></p>

<p>Left and right may be in phase (positive or negative), the most common
case by far, or out of phase by 90 or 180 degrees.</p>

<h4>example: four phase stereo</h4>

<p>Similarly, four phase stereo takes the quantization one step further;
it allows only in-phase and 180 degree out-of-phase signals:</p>

<p><img src="fourphase.png" alt="four phase"/></p>

<h3>Point Stereo</h3>

<p>Point stereo eliminates the possibility of out-of-phase signal
entirely. Any diffuse quality to a sound source tends to collapse
inward to a point somewhere within the stereo image. A practical
example would be balanced reverberation within a large, live space;
normally the sound is diffuse and soft, giving a sonic impression of
volume. In point stereo, the reverberation would still exist, but
would sound fairly firmly centered within the image (assuming the
reverberation was centered overall; if the reverberation is stronger
to the left, then the point of localization in point stereo would be
to the left). This effect is most noticeable at low and mid
frequencies and when using headphones (which grant perfect stereo
separation). Point stereo is a graceful but generally easy-to-detect
degradation of the sound quality and is thus used in frequency ranges
where it is least noticeable.</p>
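
<p>A minimal C sketch of point stereo for a single residue pair, under
simplifying assumptions: the encoder keeps the square polar magnitude
(the value with the larger absolute value) and forces the angle to zero,
for which the decode function returns A = B = magnitude, and each floor
is a plain positive per-bin gain. A real encoder may choose the retained
magnitude differently, for example to preserve energy, and the helper
name is hypothetical. Because each channel still applies its own floor,
an out-of-phase pair comes out in phase, yet still leaning toward the
louder side:</p>

<pre>
#include &lt;assert.h&gt;

/* Hypothetical point-stereo step for one residue pair (resA, resB):
   keep the square polar magnitude (the value with the larger absolute
   value; comparing squares avoids needing math.h), drop the angle
   (an angle of 0 decodes to A = B = magnitude), then reapply each
   channel's own floor gain. */
static void point_stereo_pair(float resA, float resB,
                              float floorA, float floorB,
                              float *outA, float *outB)
{
  assert(floorA &gt; 0.0f &amp;&amp; floorB &gt; 0.0f);
  float magnitude = (resA * resA &gt; resB * resB) ? resA : resB;
  *outA = magnitude * floorA;
  *outB = magnitude * floorB;
}
</pre>

<p>For residues 3 and -2 (out of phase) with floors 2 and 0.5, the
outputs are 6 and 1.5: both positive, with the first channel four times
louder.</p>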

<h3>Mixed Stereo</h3>

<p>Mixed stereo is the simultaneous use of more than one of the above
stereo encoding models, generally using more aggressive modes at
higher frequencies, at lower amplitudes, or for nearly in-phase
sound.</p>

<p>It is also the case that near-DC frequencies should be encoded using
lossless coupling to avoid frame blocking artifacts.</p>
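
<p>One way to express such a mixed model is a per-bin decision, sketched
below in C. Everything here is an illustrative assumption rather than
<tt>libvorbis</tt> tuning: the enum and function names, the single
cutoff frequency, and the rule of using lossless coupling below the
cutoff (which keeps the near-DC bins lossless) and point stereo above
it, in the spirit of the lossless/point mixed modes Vorbis 1.0 uses. A
fuller model would also weigh amplitude and phase, as described
above:</p>

<pre>
#include &lt;assert.h&gt;

typedef enum { COUPLE_LOSSLESS, COUPLE_POINT } couple_mode;

/* Center frequency in Hz of MDCT bin i, for a block whose half_n
   coefficients evenly span 0 .. rate/2. */
static double bin_center_hz(int i, int half_n, double rate)
{
  assert(half_n &gt; 0 &amp;&amp; i &gt;= 0 &amp;&amp; i &lt; half_n);
  return (i + 0.5) * rate / (2.0 * half_n);
}

/* Hypothetical mixed-stereo policy: lossless coupling below the cutoff
   (so near-DC bins stay lossless for any sensible cutoff), point stereo
   at or above it. */
static couple_mode choose_mode(double hz, double point_cutoff_hz)
{
  assert(hz &gt;= 0.0 &amp;&amp; point_cutoff_hz &gt; 0.0);
  return (hz &lt; point_cutoff_hz) ? COUPLE_LOSSLESS : COUPLE_POINT;
}
</pre>

<p>With 1024 coefficients at 44.1 kHz and a hypothetical 6 kHz cutoff,
bin 0 (about 11 Hz) is coupled losslessly and bin 1000 (about 21.5 kHz)
uses point stereo.</p>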

<h3>Vorbis Stereo Modes</h3>

<p>Vorbis, as of 1.0, uses lossless stereo and a number of mixed modes
constructed out of lossless and point stereo. Phase stereo was used
in the rc2 encoder, but is not currently used for simplicity's sake. It
will likely be re-added to the stereo model in the future.</p>

<div id="copyright">
The Xiph Fish Logo is a
trademark (&trade;) of Xiph.Org.<br/>

These pages &copy; 1994 - 2005 Xiph.Org. All rights reserved.
</div>

</body>
</html>

source: vbox/trunk/src/libs/libvorbis-1.3.7/doc/stereo.html@ 98827
