ANV
===

Debugging
---------

Here are a few debug environment variables specific to ANV:

:envvar:`ANV_ENABLE_PIPELINE_CACHE`
   If defined to ``0`` or ``false``, this will disable pipeline
   caching, forcing ANV to reparse and recompile any VkShaderModule
   (SPIR-V) it is given.
:envvar:`ANV_DISABLE_SECONDARY_CMD_BUFFER_CALLS`
   If defined to ``1`` or ``true``, this will prevent usage of
   self-modifying command buffers to implement
   ``vkCmdExecuteCommands``. As a result of this, it will also disable
   :ext:`VK_KHR_performance_query`.
:envvar:`ANV_ALWAYS_BINDLESS`
   If defined to ``1`` or ``true``, this forces all descriptor sets to
   use the internal `Bindless model`_.
:envvar:`ANV_QUEUE_THREAD_DISABLE`
   If defined to ``1`` or ``true``, this disables support for timeline
   semaphores.
:envvar:`ANV_USERSPACE_RELOCS`
   If defined to ``1`` or ``true``, this forces ANV to always do
   kernel relocations in command buffers. This should only have an
   effect on hardware that doesn't support soft-pinning (Ivybridge,
   Haswell, Cherryview).
:envvar:`ANV_PRIMITIVE_REPLICATION_MAX_VIEWS`
   Specifies up to how many view shaders can be lowered to handle
   :ext:`VK_KHR_multiview`. Beyond this number, multiview is implemented
   using instanced rendering. If unspecified, the value defaults to
   ``2``.

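As a rough illustration of how a boolean variable of this kind can be
interpreted, here is a minimal C sketch. The helper name ``env_flag``
and the exact set of accepted spellings are assumptions made for this
example; this is not the parsing code ANV itself uses.

.. code-block:: c

   #include <stdbool.h>
   #include <stdlib.h>
   #include <string.h>

   /* Hypothetical helper (not part of Mesa): interpret an environment
    * variable as a boolean. Returns `dflt` when the variable is unset
    * or holds a value this sketch does not recognize.
    */
   static bool
   env_flag(const char *name, bool dflt)
   {
      const char *val = getenv(name);

      if (val == NULL)
         return dflt;
      if (strcmp(val, "1") == 0 || strcmp(val, "true") == 0)
         return true;
      if (strcmp(val, "0") == 0 || strcmp(val, "false") == 0)
         return false;
      return dflt;
   }

With ``ANV_ENABLE_PIPELINE_CACHE=0`` in the environment, a call such as
``env_flag("ANV_ENABLE_PIPELINE_CACHE", true)`` returns ``false`` in
this sketch.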

Experimental features
---------------------

.. _`Bindless model`:

Binding Model
-------------

Here is the ANV bindless binding model that was implemented for the
descriptor indexing feature of Vulkan 1.2:

.. graphviz::

   digraph G {
     fontcolor="black";
     compound=true;

     subgraph cluster_1 {
       label = "Binding Table (HW)";

       bgcolor="cornflowerblue";

       node [ style=filled,shape="record",fillcolor="white",
              label="RT0" ] n0;
       node [ label="RT1" ] n1;
       node [ label="dynbuf0"] n2;
       node [ label="set0" ] n3;
       node [ label="set1" ] n4;
       node [ label="set2" ] n5;

       n0 -> n1 -> n2 -> n3 -> n4 -> n5 [style=invis];
     }
     subgraph cluster_2 {
       label = "Descriptor Set 0";

       bgcolor="burlywood3";
       fixedsize = true;

       node [ style=filled,shape="record",fillcolor="white", fixedsize = true, width=4,
              label="binding 0 - STORAGE_IMAGE\n anv_storage_image_descriptor" ] n8;
       node [ label="binding 1 - COMBINED_IMAGE_SAMPLER\n anv_sampled_image_descriptor" ] n9;
       node [ label="binding 2 - UNIFORM_BUFFER\n anv_address_range_descriptor" ] n10;
       node [ label="binding 3 - UNIFORM_TEXEL_BUFFER\n anv_storage_image_descriptor" ] n11;

       n8 -> n9 -> n10 -> n11 [style=invis];
     }
     subgraph cluster_5 {
       label = "Vulkan Objects"

       fontcolor="black";
       bgcolor="darkolivegreen4";

       subgraph cluster_6 {
         label = "VkImageView";

         bgcolor=darkolivegreen3;
         node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
                label="surface_state" ] n12;
       }
       subgraph cluster_7 {
         label = "VkSampler";

         bgcolor=darkolivegreen3;
         node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
                label="sample_state" ] n13;
       }
       subgraph cluster_8 {
         label = "VkImageView";
         bgcolor="darkolivegreen3";

         node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
                label="surface_state" ] n14;
       }
       subgraph cluster_9 {
         label = "VkBuffer";
         bgcolor=darkolivegreen3;

         node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
                label="address" ] n15;
       }
       subgraph cluster_10 {
         label = "VkBufferView";

         bgcolor=darkolivegreen3;
         node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
                label="surface_state" ] n16;
       }

       n12 -> n13 -> n14 -> n15 -> n16 [style=invis];
     }

     subgraph cluster_11 {
       subgraph cluster_12 {
         label = "CommandBuffer state stream";

         bgcolor="gold3";
         node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
                label="surface_state" ] n17;
         node [ label="surface_state" ] n18;
         node [ label="surface_state" ] n19;

         n17 -> n18 -> n19 [style=invis];
       }
     }

     n3 -> n8 [lhead=cluster_2];

     n8 -> n12;
     n9 -> n13;
     n9 -> n14;
     n10 -> n15;
     n11 -> n16;

     n0 -> n17;
     n1 -> n18;
     n2 -> n19;
   }


The HW binding table is generated when the draw or dispatch commands
are emitted. Here are the types of entries one can find in the binding
table:

- The currently bound descriptor sets, one entry per descriptor set
  (our limit is 8).

- For dynamic buffers, one entry per dynamic buffer.

- For draw commands, render target entries if needed.

The entries of the HW binding table for descriptor sets are
RENDER_SURFACE_STATE similar to what you would have for a normal
uniform buffer. The shader first emits reads of this buffer to get
the information it needs to access a surface, sampler, etc., and then
emits the appropriate message using the information gathered from the
descriptor set buffer.

Each binding type entry gets an associated structure in memory
(``anv_storage_image_descriptor``, ``anv_sampled_image_descriptor``,
``anv_address_range_descriptor``, ``anv_storage_image_descriptor``).
This is the information read by the shader.

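To make the entry types listed above concrete, here is a small C
sketch that computes binding table indices. It assumes the entries are
ordered as drawn in the diagram (render targets, then dynamic buffers,
then descriptor sets); that ordering, the ``ex_`` names and the struct
are all hypothetical and only serve the illustration.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define EX_MAX_SETS 8 /* descriptor set limit mentioned above */

   /* Hypothetical summary of what one binding table has to hold. */
   struct ex_bt_layout {
      uint32_t rt_count;     /* render target entries (draws only) */
      uint32_t dynbuf_count; /* one entry per dynamic buffer */
      uint32_t set_count;    /* one entry per bound descriptor set */
   };

   /* Index of render target `rt` in the table. */
   static uint32_t
   ex_bt_rt_index(const struct ex_bt_layout *l, uint32_t rt)
   {
      assert(rt < l->rt_count);
      return rt;
   }

   /* Index of dynamic buffer `buf`, placed after the render targets. */
   static uint32_t
   ex_bt_dynbuf_index(const struct ex_bt_layout *l, uint32_t buf)
   {
      assert(buf < l->dynbuf_count);
      return l->rt_count + buf;
   }

   /* Index of descriptor set `set`, placed after the dynamic buffers. */
   static uint32_t
   ex_bt_set_index(const struct ex_bt_layout *l, uint32_t set)
   {
      assert(set < l->set_count && l->set_count <= EX_MAX_SETS);
      return l->rt_count + l->dynbuf_count + set;
   }

   /* Total number of entries in the table. */
   static uint32_t
   ex_bt_entry_count(const struct ex_bt_layout *l)
   {
      return l->rt_count + l->dynbuf_count + l->set_count;
   }

For the situation in the diagram (2 render targets, 1 dynamic buffer,
3 descriptor sets), ``set0`` lands at index 3 and the table has 6
entries.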

.. _`Binding tables`:

Binding Tables
--------------

Binding tables are arrays of 32-bit offset entries referencing surface
states. This is how shaders can refer to a binding table entry to read
or write a surface. For example, fragment shaders will often refer to
entry 0 as the first render target.

The way binding tables are managed is fairly awkward.

Each shader stage must have its binding table programmed through
a corresponding instruction
``3DSTATE_BINDING_TABLE_POINTERS_*`` (each stage has its own).

.. graphviz::

   digraph structs {
     node [shape=record];
     struct3 [label="{ binding tables\n area | { <bt4> BT4 | <bt3> BT3 | ... | <bt0> BT0 } }|{ surface state\n area |{<ss0> ss0|<ss1> ss1|<ss2> ss2|...}}"];
     struct3:bt0 -> struct3:ss0;
     struct3:bt0 -> struct3:ss1;
   }


The value programmed in the ``3DSTATE_BINDING_TABLE_POINTERS_*``
instructions is not a 64-bit pointer but an offset from the address
programmed in ``STATE_BASE_ADDRESS::Surface State Base Address`` or
``3DSTATE_BINDING_TABLE_POOL_ALLOC::Binding Table Pool Base Address``
(available on Gfx11+). The offset value in
``3DSTATE_BINDING_TABLE_POINTERS_*`` is also limited to a few bits
(not a full 32-bit value), meaning that as we use more and more binding
tables we need to reposition ``STATE_BASE_ADDRESS::Surface State Base
Address`` to make space for new binding table arrays.

To make things even more awkward, the binding table entries are also
relative to ``STATE_BASE_ADDRESS::Surface State Base Address``, so as
we change ``STATE_BASE_ADDRESS::Surface State Base Address`` we need
to add that offset to the binding table entries.

The way we deal with this is that we allocate 4GB of address space
(since the binding table entries can address 4GB of surface state
elements). We reserve the first gigabyte exclusively for binding
tables, so that anywhere we position our binding table in that first
gigabyte, it can always refer to the surface states in the next 3GB.

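The arithmetic behind this reservation can be sketched in C as
follows. The names are hypothetical and offsets are expressed relative
to the start of the 4GB range; the sketch only shows why any base
address inside the first gigabyte yields an entry that fits in 32
bits. It is not ANV's allocator.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define EX_GB           (1ull << 30)
   #define EX_BT_AREA_SIZE (1 * EX_GB) /* first gigabyte: binding tables */
   #define EX_HEAP_SIZE    (4 * EX_GB) /* whole reserved range */

   /* Hypothetical helper: value to store in a binding table entry for
    * a surface state located at `surface_offset`, when Surface State
    * Base Address currently points at `base_offset`. Both offsets are
    * relative to the start of the reserved range.
    */
   static uint32_t
   ex_binding_table_entry(uint64_t base_offset, uint64_t surface_offset)
   {
      /* The base only ever moves within the binding table area... */
      assert(base_offset < EX_BT_AREA_SIZE);
      /* ...and surface states live in the remaining 3GB. */
      assert(surface_offset >= EX_BT_AREA_SIZE &&
             surface_offset < EX_HEAP_SIZE);

      /* The difference is positive and below 4GB, so it fits. */
      return (uint32_t)(surface_offset - base_offset);
   }

Moving the base forward by some amount shrinks every entry by the same
amount, which is the adjustment described above.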

.. _`Descriptor Set Memory Layout`:

Descriptor Set Memory Layout
----------------------------

Here is a representation of how the descriptor set bindings, with each
element in each binding, are mapped to the descriptor set memory:

.. graphviz::

   digraph structs {
     node [shape=record];
     rankdir=LR;

     struct1 [label="Descriptor Set | \
                     <b0> binding 0\n STORAGE_IMAGE \n (array_length=3) | \
                     <b1> binding 1\n COMBINED_IMAGE_SAMPLER \n (array_length=2) | \
                     <b2> binding 2\n UNIFORM_BUFFER \n (array_length=1) | \
                     <b3> binding 3\n UNIFORM_TEXEL_BUFFER \n (array_length=1)"];
     struct2 [label="Descriptor Set Memory | \
                     <b0e0> anv_storage_image_descriptor|\
                     <b0e1> anv_storage_image_descriptor|\
                     <b0e2> anv_storage_image_descriptor|\
                     <b1e0> anv_sampled_image_descriptor|\
                     <b1e1> anv_sampled_image_descriptor|\
                     <b2e0> anv_address_range_descriptor|\
                     <b3e0> anv_storage_image_descriptor"];

     struct1:b0 -> struct2:b0e0;
     struct1:b0 -> struct2:b0e1;
     struct1:b0 -> struct2:b0e2;
     struct1:b1 -> struct2:b1e0;
     struct1:b1 -> struct2:b1e1;
     struct1:b2 -> struct2:b2e0;
     struct1:b3 -> struct2:b3e0;
   }

Each binding in the descriptor set is allocated an array of
``anv_*_descriptor`` data structures. The type of ``anv_*_descriptor``
used for a binding is selected based on the ``VkDescriptorType`` of
the binding.

The value of ``anv_descriptor_set_binding_layout::descriptor_offset``
is a byte offset from the start of the descriptor set memory to the
associated binding. ``anv_descriptor_set_binding_layout::array_size``
is the number of ``anv_*_descriptor`` elements in the descriptor set
memory from that offset for the binding.

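A minimal C sketch of this addressing scheme follows. The struct below
is a simplified stand-in, not the real
``anv_descriptor_set_binding_layout``: the ``descriptor_stride`` field,
the descriptor sizes used in the example and the absence of any
alignment padding are assumptions made for illustration.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical, simplified binding layout. */
   struct ex_binding_layout {
      uint32_t descriptor_offset; /* byte offset of the binding in set memory */
      uint32_t array_size;        /* number of descriptors in the binding */
      uint32_t descriptor_stride; /* assumed: bytes per descriptor element */
   };

   /* Place bindings back to back, as in the diagram above, and return
    * the total size of the descriptor set memory in bytes.
    */
   static uint32_t
   ex_layout_bindings(struct ex_binding_layout *bindings, uint32_t count)
   {
      uint32_t offset = 0;

      for (uint32_t i = 0; i < count; i++) {
         bindings[i].descriptor_offset = offset;
         offset += bindings[i].array_size * bindings[i].descriptor_stride;
      }
      return offset;
   }

   /* Byte offset of array element `element` of a binding. */
   static uint32_t
   ex_descriptor_offset(const struct ex_binding_layout *b, uint32_t element)
   {
      assert(element < b->array_size);
      return b->descriptor_offset + element * b->descriptor_stride;
   }

With made-up strides of 32, 16, 16 and 32 bytes for the four bindings
in the diagram, binding 1 starts at byte 96 and its second element at
byte 112.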

Pipeline state emission
-----------------------

Vulkan initially started by baking as much state as possible in
pipelines. But extension after extension, more and more state has
become potentially dynamic.

ANV tries to limit the number of times an instruction has to be packed
to reprogram part of the 3D pipeline state. The packing happens in 2
places:

- ``genX_pipeline.c`` where the non-dynamic state is emitted in the
  pipeline batch. Chunks of the batches are copied into the command
  buffer as a result of calling ``vkCmdBindPipeline()``, depending on
  what changes from the previously bound graphics pipeline

- ``genX_gfx_state.c`` where the dynamic state is added to already
  packed instructions from ``genX_pipeline.c``

The rule to know where to emit an instruction programming the 3D
pipeline is as follows:

- If any field of the instruction can be made dynamic, it should be
  emitted in ``genX_gfx_state.c``

- Otherwise, the instruction can be emitted in ``genX_pipeline.c``

When a piece of state programming is dynamic, it should have a
corresponding field in ``anv_gfx_dynamic_state`` and the
``genX(cmd_buffer_flush_gfx_runtime_state)`` function should be
updated to ensure we minimize the number of times an instruction is
emitted. Each instruction should have an associated
``ANV_GFX_STATE_*`` mask so that the dynamic emission code can tell
when to re-emit an instruction.

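The dirty-mask idea can be sketched in C as below. Everything prefixed
with ``ex_`` or ``EX_`` is hypothetical: the real fields live in
``anv_gfx_dynamic_state`` and the real bits are the ``ANV_GFX_STATE_*``
masks, neither of which is reproduced here.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical dirty bits, one per instruction to re-emit. */
   enum {
      EX_GFX_STATE_VIEWPORT = 1u << 0,
      EX_GFX_STATE_BLEND    = 1u << 1,
   };

   /* Hypothetical dynamic state with a dirty mask. */
   struct ex_gfx_state {
      uint32_t viewport_count;
      uint32_t blend_enable;
      uint32_t dirty;
   };

   /* Setters only flag an instruction when the value really changes. */
   static void
   ex_set_viewport_count(struct ex_gfx_state *s, uint32_t count)
   {
      if (s->viewport_count != count) {
         s->viewport_count = count;
         s->dirty |= EX_GFX_STATE_VIEWPORT;
      }
   }

   static void
   ex_set_blend_enable(struct ex_gfx_state *s, uint32_t enable)
   {
      if (s->blend_enable != enable) {
         s->blend_enable = enable;
         s->dirty |= EX_GFX_STATE_BLEND;
      }
   }

   /* Flush: count the instructions that would be packed and emitted,
    * then clear the mask. A real driver would emit commands here.
    */
   static uint32_t
   ex_flush(struct ex_gfx_state *s)
   {
      uint32_t emitted = 0;

      if (s->dirty & EX_GFX_STATE_VIEWPORT)
         emitted++;
      if (s->dirty & EX_GFX_STATE_BLEND)
         emitted++;

      s->dirty = 0;
      return emitted;
   }

Setting the same value twice in a row leaves the mask untouched, so
the second flush emits nothing.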

Generated indirect draws optimization
-------------------------------------

Indirect draws have traditionally been implemented on Intel HW by
loading the indirect parameters from memory into HW registers using
the command streamer's ``MI_LOAD_REGISTER_MEM`` instruction before
dispatching a draw call to the 3D pipeline.

On recent products, it was found that the command streamer has become
a performance bottleneck, because it cannot dispatch draw calls fast
enough to keep the 3D pipeline busy.

The solution to this problem is to change the way we deal with
indirect draws. Instead of loading HW registers with values using the
command streamer, we generate an entire set of ``3DPRIMITIVE``
instructions using a shader. The generated instructions contain the
entire draw call parameters. This way the command streamer executes
only ``3DPRIMITIVE`` instructions and doesn't do any data loading from
memory or touch HW registers, feeding the 3D pipeline as fast as it
can.

In ANV this is implemented in 2 different ways:

By generating instructions directly into the command stream using a
side batch buffer. When ANV encounters the first indirect draw, it
generates a jump into the side batch. The side batch contains a draw
call using a generation shader for each indirect draw. We keep adding
more generation draws into the batch until we have to stop due to
command buffer end, secondary command buffer calls or a barrier
containing the access flag ``VK_ACCESS_INDIRECT_COMMAND_READ_BIT``.
The side batch buffer jumps back right after the instruction where it
was called. Here is a high level diagram showing how the generation
batch buffer writes in the main command buffer:

.. graphviz::

   digraph commands_mode {
     rankdir = "LR"
     "main-command-buffer" [
       label = "main command buffer|...|draw indirect0 start|<f0>jump to\ngeneration batch|<f1>|<f2>empty instruction0|<f3>empty instruction1|...|draw indirect0 end|...|draw indirect1 start|<f4>empty instruction0|<f5>empty instruction1|...|<f6>draw indirect1 end|..."
       shape = "record"
     ];
     "generation-command-buffer" [
       label = "generation command buffer|<f0>|<f1>write draw indirect0|<f2>write draw indirect1|...|<f3>exit jump"
       shape = "record"
     ];
     "main-command-buffer":f0 -> "generation-command-buffer":f0;
     "generation-command-buffer":f1 -> "main-command-buffer":f2 [color="#0000ff"];
     "generation-command-buffer":f1 -> "main-command-buffer":f3 [color="#0000ff"];
     "generation-command-buffer":f2 -> "main-command-buffer":f4 [color="#0000ff"];
     "generation-command-buffer":f2 -> "main-command-buffer":f5 [color="#0000ff"];
     "generation-command-buffer":f3 -> "main-command-buffer":f1;
   }

By generating instructions into a ring buffer of commands, when the
draw count is high. This solution allows smaller batches to be
emitted. Here is a high level diagram showing how things are
executed:

.. graphviz::

   digraph ring_mode {
     rankdir=LR;
     "main-command-buffer" [
       label = "main command buffer|...| draw indirect |<f1>generation shader|<f2> jump to ring|<f3> increment\ndraw_base|<f4>..."
       shape = "record"
     ];
     "ring-buffer" [
       label = "ring buffer|<f0>generated draw0|<f1>generated draw1|<f2>generated draw2|...|<f3>exit jump"
       shape = "record"
     ];
     "main-command-buffer":f2 -> "ring-buffer":f0;
     "ring-buffer":f3 -> "main-command-buffer":f3;
     "ring-buffer":f3 -> "main-command-buffer":f4;
     "main-command-buffer":f3 -> "main-command-buffer":f1;
     "main-command-buffer":f1 -> "ring-buffer":f1 [color="#0000ff"];
     "main-command-buffer":f1 -> "ring-buffer":f2 [color="#0000ff"];
   }
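
The loop in the ring mode diagram can be modelled on the CPU with the
C sketch below. It only counts passes through the ring. The names, the
fixed ring capacity and the assumption that ``draw_base`` advances by
the number of draws generated in each pass are all hypothetical; on
real hardware the loop runs in the command streamer, not in C.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical result of simulating the ring mode loop. */
   struct ex_ring_result {
      uint32_t passes;          /* number of jumps into the ring */
      uint32_t last_pass_draws; /* draws generated in the final pass */
   };

   static struct ex_ring_result
   ex_ring_mode(uint32_t draw_count, uint32_t ring_capacity)
   {
      struct ex_ring_result res = { 0, 0 };
      uint32_t draw_base = 0;

      assert(ring_capacity > 0);

      while (draw_base < draw_count) {
         uint32_t remaining = draw_count - draw_base;
         uint32_t n = remaining < ring_capacity ? remaining : ring_capacity;

         /* "generation shader": n draws are written into the ring.
          * "jump to ring": the ring executes them and jumps back.
          */
         res.passes++;
         res.last_pass_draws = n;

         /* "increment draw_base", then loop back to the shader. */
         draw_base += n;
      }
      return res;
   }

For example, 10 draws with a ring holding 4 draws take 3 passes in
this model, the last one generating 2 draws.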
