ANV
===

Debugging
---------

Here are a few debug environment variables specific to ANV:

:envvar:`ANV_ENABLE_PIPELINE_CACHE`
   If defined to ``0`` or ``false``, this will disable pipeline
   caching, forcing ANV to reparse and recompile any VkShaderModule
   (SPIR-V) it is given.
:envvar:`ANV_DISABLE_SECONDARY_CMD_BUFFER_CALLS`
   If defined to ``1`` or ``true``, this will prevent usage of
   self-modifying command buffers to implement
   ``vkCmdExecuteCommands``. As a result of this, it will also disable
   :ext:`VK_KHR_performance_query`.
:envvar:`ANV_ALWAYS_BINDLESS`
   If defined to ``1`` or ``true``, this forces all descriptor sets to
   use the internal `Bindless model`_.
:envvar:`ANV_QUEUE_THREAD_DISABLE`
   If defined to ``1`` or ``true``, this disables support for timeline
   semaphores.
:envvar:`ANV_USERSPACE_RELOCS`
   If defined to ``1`` or ``true``, this forces ANV to always do
   kernel relocations in command buffers. This should only have an
   effect on hardware that doesn't support soft-pinning (Ivybridge,
   Haswell, Cherryview).
:envvar:`ANV_PRIMITIVE_REPLICATION_MAX_VIEWS`
   Specifies up to how many view shaders can be lowered to handle
   :ext:`VK_KHR_multiview`. Beyond this number, multiview is implemented
   using instanced rendering. If unspecified, the value defaults to
   ``2``.

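As a hedged illustration, the variables above can be set per process
when launching a Vulkan application. The sketch below is not part of
Mesa: the helper name ``anv_debug_env`` is an assumption made for this
example, and only variable names from the list above are used.

.. code-block:: python

   import os


   def anv_debug_env(overrides, base=None):
       """Return a copy of an environment with ANV debug variables applied.

       ``overrides`` maps variable names from the list above to Python
       values. Booleans are rendered as "1"/"0", anything else via str().
       ``anv_debug_env`` is a hypothetical helper, not a Mesa API.
       """
       env = dict(os.environ if base is None else base)
       for name, value in overrides.items():
           if not name.startswith("ANV_"):
               raise ValueError(f"not an ANV variable: {name}")
           if isinstance(value, bool):
               value = "1" if value else "0"
           env[name] = str(value)
       return env


   # Start from an empty base so the result only holds the overrides.
   env = anv_debug_env(
       {
           "ANV_ENABLE_PIPELINE_CACHE": False,  # force shader recompilation
           "ANV_PRIMITIVE_REPLICATION_MAX_VIEWS": 4,
       },
       base={},
   )

Omitting ``base`` inherits the current environment, and the resulting
dictionary can be passed as the ``env`` argument of ``subprocess.run``
when starting the application under test.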

Experimental features
---------------------

.. _`Bindless model`:

Binding Model
-------------

Here is the ANV bindless binding model that was implemented for the
descriptor indexing feature of Vulkan 1.2:

.. graphviz::

  digraph G {
    fontcolor="black";
    compound=true;

    subgraph cluster_1 {
      label = "Binding Table (HW)";

      bgcolor="cornflowerblue";

      node [ style=filled,shape="record",fillcolor="white",
             label="RT0" ] n0;
      node [ label="RT1" ] n1;
      node [ label="dynbuf0"] n2;
      node [ label="set0" ] n3;
      node [ label="set1" ] n4;
      node [ label="set2" ] n5;

      n0 -> n1 -> n2 -> n3 -> n4 -> n5 [style=invis];
    }
    subgraph cluster_2 {
      label = "Descriptor Set 0";

      bgcolor="burlywood3";
      fixedsize = true;

      node [ style=filled,shape="record",fillcolor="white", fixedsize = true, width=4,
             label="binding 0 - STORAGE_IMAGE\n anv_storage_image_descriptor" ] n8;
      node [ label="binding 1 - COMBINED_IMAGE_SAMPLER\n anv_sampled_image_descriptor" ] n9;
      node [ label="binding 2 - UNIFORM_BUFFER\n anv_address_range_descriptor" ] n10;
      node [ label="binding 3 - UNIFORM_TEXEL_BUFFER\n anv_storage_image_descriptor" ] n11;

      n8 -> n9 -> n10 -> n11 [style=invis];
    }
    subgraph cluster_5 {
      label = "Vulkan Objects"

      fontcolor="black";
      bgcolor="darkolivegreen4";

      subgraph cluster_6 {
        label = "VkImageView";

        bgcolor=darkolivegreen3;
        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="surface_state" ] n12;
      }
      subgraph cluster_7 {
        label = "VkSampler";

        bgcolor=darkolivegreen3;
        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="sample_state" ] n13;
      }
      subgraph cluster_8 {
        label = "VkImageView";
        bgcolor="darkolivegreen3";

        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="surface_state" ] n14;
      }
      subgraph cluster_9 {
        label = "VkBuffer";
        bgcolor=darkolivegreen3;

        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="address" ] n15;
      }
      subgraph cluster_10 {
        label = "VkBufferView";

        bgcolor=darkolivegreen3;
        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="surface_state" ] n16;
      }

      n12 -> n13 -> n14 -> n15 -> n16 [style=invis];
    }

    subgraph cluster_11 {
      subgraph cluster_12 {
        label = "CommandBuffer state stream";

        bgcolor="gold3";
        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="surface_state" ] n17;
        node [ label="surface_state" ] n18;
        node [ label="surface_state" ] n19;

        n17 -> n18 -> n19 [style=invis];
      }
    }

    n3 -> n8 [lhead=cluster_2];

    n8 -> n12;
    n9 -> n13;
    n9 -> n14;
    n10 -> n15;
    n11 -> n16;

    n0 -> n17;
    n1 -> n18;
    n2 -> n19;
  }

The HW binding table is generated when the draw or dispatch commands
are emitted. Here are the types of entries one can find in the binding
table:

- The currently bound descriptor sets, one entry per descriptor set
  (our limit is 8).

- For dynamic buffers, one entry per dynamic buffer.

- For draw commands, render target entries if needed.

The entries of the HW binding table for descriptor sets are
RENDER_SURFACE_STATE, similar to what you would have for a normal
uniform buffer. The shader first emits reads of this buffer to get
the information it needs to access a surface, sampler, etc., and then
emits the appropriate message using the information gathered from the
descriptor set buffer.

Each binding type entry gets an associated structure in memory
(``anv_storage_image_descriptor``, ``anv_sampled_image_descriptor``,
``anv_address_range_descriptor``, ``anv_storage_image_descriptor``).
This is the information read by the shader.

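To make the layout concrete, here is a small illustrative model (not
ANV code) of how the three kinds of entries could sit in one binding
table. The ordering is an assumption taken from the diagram above
(render targets, then dynamic buffers, then descriptor sets), and the
function name and string labels are hypothetical.

.. code-block:: python

   MAX_SETS = 8  # descriptor set limit quoted in the list above


   def build_binding_table(num_render_targets, num_dynamic_buffers, bound_sets):
       """Return one label per binding table entry.

       Hypothetical model: the entry order follows the diagram, not a
       documented hardware or driver requirement.
       """
       if len(bound_sets) > MAX_SETS:
           raise ValueError("too many descriptor sets bound")
       table = [f"RT{i}" for i in range(num_render_targets)]
       table += [f"dynbuf{i}" for i in range(num_dynamic_buffers)]
       table += [f"set{s}" for s in bound_sets]
       return table


   table = build_binding_table(2, 1, [0, 1, 2])

With two render targets, one dynamic buffer and three bound sets, this
reproduces the six entries drawn in the diagram. In this model a shader
accessing set 0 would go through ``table[3]`` to reach the descriptor
set buffer before reading the descriptor itself.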

.. _`Binding tables`:

Binding Tables
--------------

Binding tables are arrays of 32-bit offset entries referencing surface
states. This is how shaders can refer to a binding table entry to read
or write a surface. For example, fragment shaders will often refer to
entry 0 as the first render target.

The way binding tables are managed is fairly awkward.

Each shader stage must have its binding table programmed through
a corresponding instruction
``3DSTATE_BINDING_TABLE_POINTERS_*`` (each stage has its own).

.. graphviz::

  digraph structs {
    node [shape=record];
    struct3 [label="{ binding tables\n area | { <bt4> BT4 | <bt3> BT3 | ... | <bt0> BT0 } }|{ surface state\n area |{<ss0> ss0|<ss1> ss1|<ss2> ss2|...}}"];
    struct3:bt0 -> struct3:ss0;
    struct3:bt0 -> struct3:ss1;
  }

The value programmed in the ``3DSTATE_BINDING_TABLE_POINTERS_*``
instructions is not a 64-bit pointer but an offset from the address
programmed in ``STATE_BASE_ADDRESS::Surface State Base Address`` or
``3DSTATE_BINDING_TABLE_POOL_ALLOC::Binding Table Pool Base Address``
(available on Gfx11+). The offset value in
``3DSTATE_BINDING_TABLE_POINTERS_*`` is also limited to a few bits
(not a full 32-bit value), meaning that as we use more and more binding
tables we need to reposition ``STATE_BASE_ADDRESS::Surface State Base
Address`` to make space for new binding table arrays.

To make things even more awkward, the binding table entries are also
relative to ``STATE_BASE_ADDRESS::Surface State Base Address``, so as
we change ``STATE_BASE_ADDRESS::Surface State Base Address`` we need
to add that offset to the binding table entries.

The way we deal with this is that we allocate 4GB of address space
(since the binding table entries can address 4GB of surface state
elements). We reserve the first gigabyte exclusively for binding
tables, so that anywhere we position our binding table in that first
gigabyte, it can always refer to the surface states in the next 3GB.

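The arithmetic behind this split can be checked with a simplified
model. The sketch below is not ANV code: it assumes a binding table
entry is simply the distance from the surface state base address to
the surface state, with both positions given as byte offsets from the
start of the 4GB region. The function name and the 64-byte spacing in
the example values are illustrative assumptions.

.. code-block:: python

   GiB = 1 << 30
   POOL_SIZE = 4 * GiB     # total reserved address space
   BT_AREA_SIZE = 1 * GiB  # first gigabyte: binding tables only


   def binding_table_entry(surface_state_offset, surface_state_base):
       """Return the 32-bit entry for a surface state (simplified model).

       Both arguments are byte offsets from the start of the 4GB region.
       """
       if not 0 <= surface_state_base < BT_AREA_SIZE:
           raise ValueError("base address must stay in the binding table area")
       if not BT_AREA_SIZE <= surface_state_offset < POOL_SIZE:
           raise ValueError("surface states live in the upper 3GB")
       entry = surface_state_offset - surface_state_base
       # The base sits below every surface state and the region is 4GB,
       # so the difference always fits in an unsigned 32-bit value.
       assert 0 < entry < (1 << 32)
       return entry


   # Extreme cases: base near the top of the first gigabyte, and base at 0.
   lowest = binding_table_entry(BT_AREA_SIZE, BT_AREA_SIZE - 64)
   highest = binding_table_entry(POOL_SIZE - 64, 0)

Whatever base is chosen inside the first gigabyte, every surface state
in the remaining 3GB stays reachable. Moving the base changes each
entry by the same amount, which is why the entries have to be adjusted
whenever the base address is repositioned.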

.. _`Descriptor Set Memory Layout`:

Descriptor Set Memory Layout
----------------------------

Here is a representation of how the descriptor set bindings, and each
element of each binding, are mapped to the descriptor set memory:

.. graphviz::

  digraph structs {
    node [shape=record];
    rankdir=LR;

    struct1 [label="Descriptor Set | \
                    <b0> binding 0\n STORAGE_IMAGE \n (array_length=3) | \
                    <b1> binding 1\n COMBINED_IMAGE_SAMPLER \n (array_length=2) | \
                    <b2> binding 2\n UNIFORM_BUFFER \n (array_length=1) | \
                    <b3> binding 3\n UNIFORM_TEXEL_BUFFER \n (array_length=1)"];
    struct2 [label="Descriptor Set Memory | \
                    <b0e0> anv_storage_image_descriptor|\
                    <b0e1> anv_storage_image_descriptor|\
                    <b0e2> anv_storage_image_descriptor|\
                    <b1e0> anv_sampled_image_descriptor|\
                    <b1e1> anv_sampled_image_descriptor|\
                    <b2e0> anv_address_range_descriptor|\
                    <b3e0> anv_storage_image_descriptor"];

    struct1:b0 -> struct2:b0e0;
    struct1:b0 -> struct2:b0e1;
    struct1:b0 -> struct2:b0e2;
    struct1:b1 -> struct2:b1e0;
    struct1:b1 -> struct2:b1e1;
    struct1:b2 -> struct2:b2e0;
    struct1:b3 -> struct2:b3e0;
  }

Each binding in the descriptor set is allocated an array of
``anv_*_descriptor`` data structures. The type of ``anv_*_descriptor``
used for a binding is selected based on the ``VkDescriptorType`` of
the binding.

The value of ``anv_descriptor_set_binding_layout::descriptor_offset``
is a byte offset from the start of the descriptor set memory to the
associated binding. ``anv_descriptor_set_binding_layout::array_size``
is the number of ``anv_*_descriptor`` elements in the descriptor set
memory from that offset for the binding.

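As a rough sketch of how ``descriptor_offset`` and ``array_size``
locate an element, the model below lays out the four bindings from the
diagram. It is not ANV code: the structure sizes are hypothetical
placeholders, the helper names are invented for this example, and the
bindings are assumed to be packed back to back with no alignment
padding.

.. code-block:: python

   # Descriptor type to structure name, as shown in the diagram above.
   STRUCT_FOR_TYPE = {
       "STORAGE_IMAGE": "anv_storage_image_descriptor",
       "COMBINED_IMAGE_SAMPLER": "anv_sampled_image_descriptor",
       "UNIFORM_BUFFER": "anv_address_range_descriptor",
       "UNIFORM_TEXEL_BUFFER": "anv_storage_image_descriptor",
   }

   # Hypothetical sizes in bytes, NOT taken from the ANV sources.
   STRUCT_SIZE = {
       "anv_storage_image_descriptor": 8,
       "anv_sampled_image_descriptor": 8,
       "anv_address_range_descriptor": 16,
   }


   def layout_bindings(bindings):
       """Lay out (descriptor_type, array_size) pairs in binding order."""
       layouts, offset = [], 0
       for desc_type, array_size in bindings:
           stride = STRUCT_SIZE[STRUCT_FOR_TYPE[desc_type]]
           layouts.append({
               "descriptor_offset": offset,
               "array_size": array_size,
               "stride": stride,
           })
           offset += stride * array_size
       return layouts


   def element_offset(layout, index):
       """Byte offset of array element ``index`` within the set memory."""
       if not 0 <= index < layout["array_size"]:
           raise IndexError("descriptor array index out of range")
       return layout["descriptor_offset"] + index * layout["stride"]


   layouts = layout_bindings([
       ("STORAGE_IMAGE", 3),
       ("COMBINED_IMAGE_SAMPLER", 2),
       ("UNIFORM_BUFFER", 1),
       ("UNIFORM_TEXEL_BUFFER", 1),
   ])

With these placeholder sizes the second element of binding 1 is found
at ``element_offset(layouts[1], 1)``, that is, the binding's
``descriptor_offset`` plus one element stride.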

Pipeline state emission
-----------------------

Vulkan initially started by baking as much state as possible in
pipelines. But extension after extension, more and more state has
become potentially dynamic.

ANV tries to limit the number of times an instruction has to be packed
to reprogram part of the 3D pipeline state. Packing happens in two
places:

- ``genX_pipeline.c``, where the non-dynamic state is emitted in the
  pipeline batch. Chunks of the batches are copied into the command
  buffer as a result of calling ``vkCmdBindPipeline()``, depending on
  what changes from the previously bound graphics pipeline.

- ``genX_gfx_state.c``, where the dynamic state is added to already
  packed instructions from ``genX_pipeline.c``.

The rule to know where to emit an instruction programming the 3D
pipeline is as follows:

- If any field of the instruction can be made dynamic, it should be
  emitted in ``genX_gfx_state.c``.

- Otherwise, the instruction can be emitted in ``genX_pipeline.c``.

When a piece of state programming is dynamic, it should have a
corresponding field in ``anv_gfx_dynamic_state`` and the
``genX(cmd_buffer_flush_gfx_runtime_state)`` function should be
updated to ensure we minimize the number of times an instruction is
emitted. Each instruction should have an associated
``ANV_GFX_STATE_*`` mask so that the dynamic emission code can tell
when to re-emit an instruction.

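The dirty-mask idea can be illustrated with a toy model. This is only
a sketch of the general technique, not the ANV implementation: the
``GfxState`` members are hypothetical stand-ins for ``ANV_GFX_STATE_*``
bits, and ``DynamicState`` is an invented class.

.. code-block:: python

   from enum import IntFlag, auto


   class GfxState(IntFlag):
       # Hypothetical stand-ins for ANV_GFX_STATE_* mask bits.
       VIEWPORT = auto()
       DEPTH_BIAS = auto()
       LINE_WIDTH = auto()


   class DynamicState:
       def __init__(self):
           self.values = {}
           self.dirty = GfxState(0)

       def set(self, state, value):
           """Record a value and mark the state dirty only if it changed."""
           if self.values.get(state) != value:
               self.values[state] = value
               self.dirty |= state

       def flush(self):
           """Return the states to re-emit, then clear the dirty mask."""
           emitted = [s for s in GfxState if s in self.dirty]
           self.dirty = GfxState(0)
           return emitted


   state = DynamicState()
   state.set(GfxState.VIEWPORT, (0, 0, 640, 480))
   state.set(GfxState.LINE_WIDTH, 1.0)
   first = state.flush()

   state.set(GfxState.LINE_WIDTH, 1.0)  # unchanged: nothing to re-emit
   state.set(GfxState.DEPTH_BIAS, 0.5)
   second = state.flush()

The second flush only reports the depth bias, because setting the line
width to the value it already had leaves its bit clear.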

Generated indirect draws optimization
-------------------------------------

Indirect draws have traditionally been implemented on Intel HW by
loading the indirect parameters from memory into HW registers using
the command streamer's ``MI_LOAD_REGISTER_MEM`` instruction before
dispatching a draw call to the 3D pipeline.

On recent products, it was found that the command streamer is a
performance bottleneck, because it cannot dispatch draw calls fast
enough to keep the 3D pipeline busy.

The solution to this problem is to change the way we deal with
indirect draws. Instead of loading HW registers with values using the
command streamer, we generate an entire set of ``3DPRIMITIVE``
instructions using a shader. The generated instructions contain the
entire draw call parameters. This way the command streamer executes
only ``3DPRIMITIVE`` instructions and doesn't do any data loading from
memory or touch HW registers, feeding the 3D pipeline as fast as it
can.

In ANV this is implemented in two different ways:

By generating instructions directly into the command stream using a
side batch buffer. When ANV encounters the first indirect draw, it
generates a jump into the side batch. The side batch contains a draw
call using a generation shader for each indirect draw. We keep adding
generation draws to the batch until we have to stop due to the end of
the command buffer, secondary command buffer calls, or a barrier
containing the access flag ``VK_ACCESS_INDIRECT_COMMAND_READ_BIT``.
The side batch buffer jumps back right after the instruction where it
was called. Here is a high-level diagram showing how the generation
batch buffer writes into the main command buffer:

.. graphviz::

  digraph commands_mode {
    rankdir = "LR"
    "main-command-buffer" [
      label = "main command buffer|...|draw indirect0 start|<f0>jump to\ngeneration batch|<f1>|<f2>empty instruction0|<f3>empty instruction1|...|draw indirect0 end|...|draw indirect1 start|<f4>empty instruction0|<f5>empty instruction1|...|<f6>draw indirect1 end|..."
      shape = "record"
    ];
    "generation-command-buffer" [
      label = "generation command buffer|<f0>|<f1>write draw indirect0|<f2>write draw indirect1|...|<f3>exit jump"
      shape = "record"
    ];
    "main-command-buffer":f0 -> "generation-command-buffer":f0;
    "generation-command-buffer":f1 -> "main-command-buffer":f2 [color="#0000ff"];
    "generation-command-buffer":f1 -> "main-command-buffer":f3 [color="#0000ff"];
    "generation-command-buffer":f2 -> "main-command-buffer":f4 [color="#0000ff"];
    "generation-command-buffer":f2 -> "main-command-buffer":f5 [color="#0000ff"];
    "generation-command-buffer":f3 -> "main-command-buffer":f1;
  }

By generating instructions into a ring buffer of commands, when the
draw count is high. This solution allows smaller batches to be
emitted. Here is a high-level diagram showing how things are
executed:

.. graphviz::

  digraph ring_mode {
    rankdir=LR;
    "main-command-buffer" [
      label = "main command buffer|...| draw indirect |<f1>generation shader|<f2> jump to ring|<f3> increment\ndraw_base|<f4>..."
      shape = "record"
    ];
    "ring-buffer" [
      label = "ring buffer|<f0>generated draw0|<f1>generated draw1|<f2>generated draw2|...|<f3>exit jump"
      shape = "record"
    ];
    "main-command-buffer":f2 -> "ring-buffer":f0;
    "ring-buffer":f3 -> "main-command-buffer":f3;
    "ring-buffer":f3 -> "main-command-buffer":f4;
    "main-command-buffer":f3 -> "main-command-buffer":f1;
    "main-command-buffer":f1 -> "ring-buffer":f1 [color="#0000ff"];
    "main-command-buffer":f1 -> "ring-buffer":f2 [color="#0000ff"];
  }
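
One way to read the ring mode diagram is as a loop that refills the
ring until every draw has been generated. The sketch below models only
that control flow and is not ANV code: ``ring_mode_passes`` and the
``ring_size`` parameter are hypothetical, and the real driver performs
these steps with GPU commands rather than on the CPU.

.. code-block:: python

   def ring_mode_passes(draw_count, ring_size):
       """Yield (draw_base, draws_in_pass) for each trip through the ring.

       Hypothetical model: each pass runs the generation shader for up
       to ``ring_size`` draws, jumps to the ring to execute them, then
       advances ``draw_base`` before looping back.
       """
       if ring_size <= 0:
           raise ValueError("ring_size must be positive")
       draw_base = 0
       while draw_base < draw_count:
           yield draw_base, min(ring_size, draw_count - draw_base)
           draw_base += ring_size  # the "increment draw_base" step


   passes = list(ring_mode_passes(draw_count=10, ring_size=4))

For ten draws and a ring holding four generated draws, this model makes
three passes, the last one covering the two remaining draws.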