.. _context:

Context
=======

A Gallium rendering context encapsulates the state which affects 3D
rendering such as blend state, depth/stencil state, texture samplers,
etc.

Note that resource/texture allocation is not per-context but per-screen.

Methods
-------

CSO State
^^^^^^^^^

All Constant State Object (CSO) state is created, bound, and destroyed
with triplets of methods that all follow a specific naming scheme.
For example, ``create_blend_state``, ``bind_blend_state``, and
``destroy_blend_state``.

CSO objects handled by the context object:

* :ref:`Blend`: ``*_blend_state``
* :ref:`Sampler`: Texture sampler states are bound separately for fragment,
  vertex, geometry and compute shaders with the ``bind_sampler_states``
  function. The ``start`` and ``num_samplers`` parameters indicate a range
  of samplers to change. NOTE: at this time, ``start`` is always zero and
  the CSO module will always replace all samplers at once (no sub-ranges).
  This may change in the future.
* :ref:`Rasterizer`: ``*_rasterizer_state``
* :ref:`depth-stencil-alpha`: ``*_depth_stencil_alpha_state``
* :ref:`Shader`: These are create, bind and destroy methods for vertex,
  fragment and geometry shaders.
* :ref:`vertexelements`: ``*_vertex_elements_state``

Resource Binding State
^^^^^^^^^^^^^^^^^^^^^^

This state describes how resources in various flavors (textures,
buffers, surfaces) are bound to the driver.

* ``set_constant_buffer`` sets a constant buffer to be used for a given shader
  type. ``index`` indicates which buffer to set (some APIs may allow
  multiple ones to be set, and binding a specific one later, though drivers
  are mostly restricted to the first one right now).
  If ``take_ownership`` is true, the buffer reference is passed to the driver, so
  that the driver doesn't have to increment the reference count.

* ``set_inlinable_constants`` sets inlinable constants for constant buffer 0.

  These are constants that the driver would like to inline into the IR
  of the current shader before recompiling it. Drivers can determine which
  constants they prefer to inline in ``finalize_nir`` and store that
  information in ``shader_info::*inlinable_uniform*``. When the state tracker
  or frontend uploads constants to a constant buffer, it can pass
  inlinable constants separately via this call.

  Any ``set_constant_buffer`` call invalidates inlinable constants, so
  ``set_inlinable_constants`` must be called after it. Binding a shader also
  invalidates this state.

  There is no ``PIPE_CAP`` for this. Drivers shouldn't set the ``shader_info``
  fields if they don't implement ``set_inlinable_constants``.

* ``set_framebuffer_state``

* ``set_vertex_buffers``

Non-CSO State
^^^^^^^^^^^^^

These pieces of state are too small, variable, and/or trivial to have CSO
objects. They are all set with simple, one-method binding calls, e.g.
``set_blend_color``.

* ``set_stencil_ref`` sets the stencil front and back reference values
  which are used as comparison values in the stencil test.
* ``set_blend_color``
* ``set_sample_mask`` sets the per-context multisample sample mask. Note
  that this takes effect even if multisampling is not explicitly enabled if
  the framebuffer surface(s) are multisampled. Also, this mask is AND-ed
  with the optional fragment shader sample mask output (when emitted).
* ``set_sample_locations`` sets the sample locations used for rasterization.
  ``get_sample_position`` still returns the default locations. When NULL,
  the default locations are used.
* ``set_min_samples`` sets the minimum number of samples that must be run.
* ``set_clip_state``
* ``set_polygon_stipple``
* ``set_scissor_states`` sets the bounds for the scissor test, which culls
  pixels before blending to render targets. If the :ref:`Rasterizer` does
  not have the scissor test enabled, then the scissor bounds never need to
  be set since they will not be used. Note that scissor xmin and ymin are
  inclusive, but xmax and ymax are exclusive. The inclusive ranges in x
  and y would be [xmin..xmax-1] and [ymin..ymax-1]. The number of scissors
  should be the same as the number of set viewports and can be up to
  ``PIPE_MAX_VIEWPORTS``.
* ``set_viewport_states``
* ``set_window_rectangles`` sets the window rectangles to be used for
  rendering, as defined by :ext:`GL_EXT_window_rectangles`. There are two
  modes, include and exclude, which define whether the supplied
  rectangles are to be used for including fragments or excluding
  them. All of the rectangles are ORed together, so in exclude mode,
  any fragment inside any rectangle would be culled, while in include
  mode, any fragment outside all rectangles would be culled. xmin/ymin
  are inclusive, while xmax/ymax are exclusive (same as scissor states
  above). Note that this only applies to draws, not clears or
  blits. (Blits have their own way to pass the requisite rectangles
  in.)
* ``set_tess_state`` configures the default tessellation parameters:

  * ``default_outer_level`` is the default value for the outer tessellation
    levels. This corresponds to GL's ``PATCH_DEFAULT_OUTER_LEVEL``.
  * ``default_inner_level`` is the default value for the inner tessellation
    levels. This corresponds to GL's ``PATCH_DEFAULT_INNER_LEVEL``.

* ``set_patch_vertices`` sets the number of vertices per input patch
  for tessellation.

* ``set_debug_callback`` sets the callback to be used for reporting
  various debug messages, eventually reported via :ext:`GL_KHR_debug` and
  similar mechanisms.

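The half-open bounds used by ``set_scissor_states`` and the include/exclude
modes of ``set_window_rectangles`` can be illustrated with a small CPU-side
sketch. The ``struct rect`` type and both function names are hypothetical
stand-ins chosen for this example; they are not Gallium structures or entry
points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scissor or window rectangle (not a Gallium type). */
struct rect {
   int xmin, ymin; /* inclusive */
   int xmax, ymax; /* exclusive */
};

/* True if pixel (x, y) lies inside the half-open rectangle, i.e. in
 * [xmin..xmax-1] x [ymin..ymax-1]. */
bool rect_contains(const struct rect *r, int x, int y)
{
   return x >= r->xmin && x < r->xmax && y >= r->ymin && y < r->ymax;
}

/* Window-rectangle test. The rectangles are ORed together: in include mode
 * a fragment outside all rectangles is culled, in exclude mode a fragment
 * inside any rectangle is culled. Returns true if the fragment is culled. */
bool window_rects_cull(const struct rect *rects, unsigned count,
                       bool include, int x, int y)
{
   assert(count == 0 || rects != NULL);

   bool inside_any = false;
   for (unsigned i = 0; i < count; i++)
      inside_any = inside_any || rect_contains(&rects[i], x, y);

   return include ? !inside_any : inside_any;
}
```

With a scissor of ``{0, 0, 4, 4}``, pixel (3, 3) passes the test while pixel
(4, 0) does not, which matches the exclusive upper bounds described above.
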
Samplers
^^^^^^^^

``pipe_sampler_state`` objects control how textures are sampled (coordinate wrap
modes, interpolation modes, etc). Samplers are only required for texture
instructions for which ``nir_tex_instr_need_sampler`` returns true. Drivers must
ignore samplers for other texture instructions. Frontends may or may not bind
samplers when no texture instruction uses them. Notably, frontends may not bind
samplers for texture buffer objects, which are never accessed with samplers.

Sampler Views
^^^^^^^^^^^^^

These are the means to bind textures to shader stages. To create one, specify
its format, swizzle and LOD range in a sampler view template.

If the texture format is different from the template format, the texture
is said to be cast to another format. Casting can be done only between compatible
formats, that is, formats that have matching component order and sizes.

Swizzle fields specify the way in which fetched texel components are placed
in the result register. For example, ``swizzle_r`` specifies what is going to be
placed in the first component of the result register.

The ``first_level`` and ``last_level`` fields of the sampler view template specify
the LOD range the texture is going to be constrained to. Note that these
values are in addition to the respective ``min_lod``, ``max_lod`` values in the
``pipe_sampler_state`` (that is, if ``min_lod`` is 2.0 and ``first_level`` is 3,
the first mip level used for sampling from the resource is effectively level 5).

The ``first_layer`` and ``last_layer`` fields specify the layer range the
texture is going to be constrained to. Similar to the LOD range, ``first_layer``
is added to the array index which is used for sampling.

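The way the view's level and layer ranges offset the values used for sampling
can be sketched as follows. This is only an illustration: ``struct view_range``
and the two helper functions are hypothetical, the LOD is assumed to be already
resolved to an integer level, and clamping to ``last_level`` / ``last_layer``
is an assumption about how "constrained to" is enforced.

```c
#include <assert.h>

/* Hypothetical subset of a sampler view; field names follow the text. */
struct view_range {
   unsigned first_level, last_level;
   unsigned first_layer, last_layer;
};

/* Resource mip level selected for a view-relative LOD. The view's
 * first_level is added to the LOD; the clamp to last_level is an assumption. */
unsigned effective_level(const struct view_range *v, unsigned view_lod)
{
   assert(v->first_level <= v->last_level);
   unsigned level = v->first_level + view_lod;
   return level > v->last_level ? v->last_level : level;
}

/* Resource layer selected for a view-relative array index, same scheme. */
unsigned effective_layer(const struct view_range *v, unsigned array_index)
{
   assert(v->first_layer <= v->last_layer);
   unsigned layer = v->first_layer + array_index;
   return layer > v->last_layer ? v->last_layer : layer;
}
```

For the example in the text, a view with ``first_level`` 3 sampled at LOD 2
reads level 5 of the resource.
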
* ``set_sampler_views`` binds an array of sampler views to a shader stage.
  Every binding point acquires a reference
  to its respective sampler view and releases a reference to the previous
  sampler view.

  Sampler views outside of ``[start_slot, start_slot + num_views)`` are
  unmodified. If ``views`` is NULL, the behavior is the same as if
  ``views[n]`` was NULL for the entire range, i.e. releasing the reference
  for all the sampler views in the specified range.

* ``create_sampler_view`` creates a new sampler view. ``texture`` is associated
  with the sampler view, which results in the sampler view holding a reference
  to the texture. The format specified in the template must be compatible
  with the texture format.

* ``sampler_view_destroy`` destroys a sampler view and releases its reference
  to the associated texture.

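The reference handling described for ``set_sampler_views`` can be modelled
with a few lines of C. Everything here is a hypothetical stand-in
(``struct view``, ``struct stage_bindings``, ``MAX_VIEWS`` and both
functions); a real driver would use its own types and would destroy a view
when its count reaches zero.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VIEWS 8 /* illustrative limit, not a Gallium constant */

/* Hypothetical reference-counted sampler view. */
struct view {
   int refcount;
};

/* Hypothetical per-stage binding table. */
struct stage_bindings {
   struct view *slots[MAX_VIEWS];
};

/* Point *dst at src: acquire a reference on src, release the one held on
 * the previous occupant. Acquiring first keeps rebinding the same view safe. */
void view_reference(struct view **dst, struct view *src)
{
   if (src)
      src->refcount++;
   if (*dst)
      (*dst)->refcount--; /* a real driver would destroy the view at zero */
   *dst = src;
}

/* Bind views to [start_slot, start_slot + num_views). Slots outside that
 * range are untouched; views == NULL unbinds the whole range. */
void bind_views(struct stage_bindings *b, unsigned start_slot,
                unsigned num_views, struct view **views)
{
   assert(start_slot + num_views <= MAX_VIEWS);
   for (unsigned i = 0; i < num_views; i++)
      view_reference(&b->slots[start_slot + i], views ? views[i] : NULL);
}
```

Passing ``NULL`` for ``views`` releases every reference in the range, which is
the unbind behavior described in the list above.
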
Hardware Atomic buffers
^^^^^^^^^^^^^^^^^^^^^^^

Buffers containing HW atomics are required to support the feature
on some drivers.

Drivers that require this need to implement the ``set_hw_atomic_buffers`` method.

Shader Resources
^^^^^^^^^^^^^^^^

Shader resources are textures or buffers that may be read or written
from a shader without an associated sampler. This means that they
have no support for floating point coordinates, address wrap modes or
filtering.

There are two types of shader resources: buffers and images.

Buffers are specified using the ``set_shader_buffers`` method.

Images are specified using the ``set_shader_images`` method. When binding
images, the ``level``, ``first_layer`` and ``last_layer`` fields of
``pipe_image_view`` specify the mipmap level and the range of layers the
image will be constrained to.

Surfaces
^^^^^^^^

These are the means to use resources as color render targets or depth/stencil
attachments. To create one, specify the mip level, the range of layers, and
the bind flags (either ``PIPE_BIND_DEPTH_STENCIL`` or ``PIPE_BIND_RENDER_TARGET``).
Note that layer values are in addition to what is indicated by the geometry
shader output variable XXX_FIXME (that is, if ``first_layer`` is 3 and the geometry
shader indicates index 2, layer 5 of the resource will be used). The
``first_layer`` and ``last_layer`` parameters are only used for 1D array, 2D array,
cube, and 3D textures; otherwise they are 0.

* ``create_surface`` creates a new surface.

* ``surface_destroy`` destroys a surface and releases its reference to the
  associated resource.

Stream output targets
^^^^^^^^^^^^^^^^^^^^^

Stream output, also known as transform feedback, allows writing the primitives
produced by the vertex pipeline to buffers. This is done after the geometry
shader, or after the vertex shader if no geometry shader is present.

The stream output targets are views into buffer resources which can be bound
as stream outputs and specify a memory range where it's valid to write
primitives. The pipe driver must implement memory protection such that any
primitives written outside of the specified memory range are discarded.

Two stream output targets can use the same resource at the same time, but
only with disjoint memory ranges.

Additionally, the stream output target internally maintains the offset
into the buffer which is incremented every time something is written to it.
The internal offset is equal to how much data has already been written.
It can be stored in device memory and the CPU actually doesn't have to query
it.

The stream output target can be used in a draw command to provide
the vertex count. The vertex count is derived from the internal offset
discussed above.

* ``create_stream_output_target`` creates a new target.

* ``stream_output_target_destroy`` destroys a target. Users of this should
  use ``pipe_so_target_reference`` instead.

* ``set_stream_output_targets`` binds stream output targets. The parameter
  ``offset`` is an array which specifies the internal offset of each buffer. The
  internal offset is, besides writing, used for reading the data during the
  draw_auto stage, i.e. it specifies how much data there is in the buffer
  for the purposes of the draw_auto stage. -1 means the buffer should
  be appended to, and everything else sets the internal offset.

* ``stream_output_target_offset`` retrieves the internal stream offset from
  a streamout target. This is used to implement Vulkan pause/resume support
  which needs to pass the internal offset to the API.

NOTE: The currently-bound vertex or geometry shader must be compiled with
the properly-filled-in structure ``pipe_stream_output_info`` describing which
outputs should be written to buffers and how. The structure is part of
``pipe_shader_state``.

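The internal-offset bookkeeping above can be modelled on the CPU as a minimal
sketch. ``struct so_target``, ``SO_APPEND`` and the three functions are
hypothetical names introduced for illustration, and deriving the vertex count
as ``internal_offset / stride`` assumes a fixed per-vertex stride, which the
text does not spell out.

```c
#include <assert.h>

#define SO_APPEND ((unsigned)-1) /* models the "-1" offset described above */

/* Hypothetical target state; a real driver keeps its own representation. */
struct so_target {
   unsigned buffer_size;     /* size of the writable range, in bytes */
   unsigned internal_offset; /* bytes written so far */
};

/* Apply the per-target offset passed at bind time: SO_APPEND keeps the
 * current internal offset, any other value overwrites it. */
void so_bind(struct so_target *t, unsigned offset)
{
   if (offset != SO_APPEND)
      t->internal_offset = offset;
}

/* Write one primitive of prim_bytes bytes. A primitive that would land
 * outside the range is discarded. Returns 1 if written, 0 if discarded. */
int so_write_primitive(struct so_target *t, unsigned prim_bytes)
{
   if (t->internal_offset > t->buffer_size ||
       prim_bytes > t->buffer_size - t->internal_offset)
      return 0;
   t->internal_offset += prim_bytes;
   return 1;
}

/* Vertex count for a draw that takes its count from the target
 * (assumes a fixed stride per vertex). */
unsigned so_vertex_count(const struct so_target *t, unsigned stride)
{
   assert(stride > 0);
   return t->internal_offset / stride;
}
```

Binding with ``SO_APPEND`` after a pause therefore resumes writing where the
previous pass stopped, while any other offset restarts from that position.
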
Clearing
^^^^^^^^

Clear is one of the most difficult concepts to nail down to a single
interface (due to both different requirements from APIs and also driver/HW
specific differences).

``clear`` initializes some or all of the surfaces currently bound to
the framebuffer to particular RGBA, depth, or stencil values.
Currently, this does not take into account color or stencil write masks (as
used by GL), and always clears the whole surfaces (no scissoring as used by
GL clear or explicit rectangles like D3D9 uses). It can, however, also clear
only depth or stencil in a combined depth/stencil surface.
If a surface includes several layers then all layers will be cleared.

``clear_render_target`` clears a single color render target with the specified
color value. While it is only possible to clear one surface at a time (which can
include several layers), this surface need not be bound to the framebuffer.
If ``render_condition_enabled`` is false, any current rendering condition is ignored
and the clear will be unconditional.

``clear_depth_stencil`` clears a single depth, stencil or depth/stencil surface
with the specified depth and stencil values (for combined depth/stencil buffers,
it is also possible to only clear one or the other part). While it is only
possible to clear one surface at a time (which can include several layers),
this surface need not be bound to the framebuffer.
If ``render_condition_enabled`` is false, any current rendering condition is ignored
and the clear will be unconditional.

``clear_texture`` clears a non-``PIPE_BUFFER`` resource's specified level
and bounding box with a clear value provided in that resource's native
format.

``clear_buffer`` clears a ``PIPE_BUFFER`` resource with the specified clear value
(which may be multiple bytes in length). Logically this is a memset with a
multi-byte element value starting at ``offset`` bytes from the resource start, going
for ``size`` bytes. It is guaranteed that ``size % clear_value_size == 0``.

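The "memset with a multi-byte element" semantics of ``clear_buffer`` can be
written down as a CPU reference. ``clear_buffer_ref`` is a hypothetical helper
operating on plain memory, not the driver entry point; it is only meant to
pin down which bytes end up with which value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* CPU reference for the clear_buffer semantics: repeat a multi-byte
 * pattern over [offset, offset + size) of buf. */
void clear_buffer_ref(unsigned char *buf, size_t offset, size_t size,
                      const void *clear_value, size_t clear_value_size)
{
   /* The interface guarantees that size is a multiple of the element size. */
   assert(clear_value_size > 0 && size % clear_value_size == 0);

   for (size_t i = 0; i < size; i += clear_value_size)
      memcpy(buf + offset + i, clear_value, clear_value_size);
}
```

Clearing 8 bytes at offset 4 with the 4-byte pattern ``{1, 2, 3, 4}`` writes
the pattern twice and leaves the first 4 bytes of the buffer untouched.
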
Evaluating Depth Buffers
^^^^^^^^^^^^^^^^^^^^^^^^

``evaluate_depth_buffer`` is a hint to decompress the current depth buffer
assuming the current sample locations to avoid problems that could arise when
using programmable sample locations.

If a depth buffer is rendered with different sample location state than
what is current at the time of reading the depth buffer, the values may differ
because depth buffer compression can depend on the sample locations.

Uploading
^^^^^^^^^

For simple single-use uploads, use ``pipe_context::stream_uploader`` or
``pipe_context::const_uploader``. The latter should be used for uploading
constants, while the former should be used for uploading everything else.
``PIPE_USAGE_STREAM`` is implied in both cases, so don't use the uploaders
for static allocations.

Usage:

Call ``u_upload_alloc`` or ``u_upload_data`` as many times as you want. After you are
done, call ``u_upload_unmap``. If the driver doesn't support persistent mappings,
``u_upload_unmap`` makes sure the previously mapped memory is unmapped.

Gotchas:

- Always fill the memory immediately after ``u_upload_alloc``. Any following call
  to ``u_upload_alloc`` or ``u_upload_data`` can unmap memory returned by a previous
  ``u_upload_alloc``.
- Don't interleave calls using ``stream_uploader`` and ``const_uploader``. If you use
  one of them, do the upload, unmap, and only then can you use the other one.

Drawing
^^^^^^^

``draw_vbo`` draws a specified primitive. The primitive mode and other
properties are described by ``pipe_draw_info``.

The ``mode``, ``start``, and ``count`` fields of ``pipe_draw_info`` specify
the mode of the primitive and the vertices to be fetched, in the range from
``start`` to ``start + count - 1``, inclusive.

Every instance with instanceID in the range from ``start_instance`` to
``start_instance + instance_count - 1``, inclusive, will be drawn.

If ``index_size`` != 0, all vertex indices will be looked up from the index
buffer.

In an indexed draw, ``min_index`` and ``max_index`` respectively provide a lower
and upper bound of the indices contained in the index buffer inside the range
from ``start`` to ``start + count - 1``. This allows the driver to
determine which subset of vertices will be referenced during the draw call
without having to scan the index buffer. Providing an over-estimation of
the true bounds, for example, a ``min_index`` and ``max_index`` of 0 and
0xffffffff respectively, must give exactly the same rendering, albeit with less
performance due to unreferenced vertex buffers being unnecessarily DMA'ed or
processed. Providing an underestimation of the true bounds will result in
undefined behavior, but should not result in program or system failure.

In the case of a non-indexed draw, ``min_index`` should be set to
``start`` and ``max_index`` should be set to ``start + count - 1``.

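To make the meaning of ``min_index`` and ``max_index`` concrete, the exact
bounds for an indexed draw can be computed by scanning the referenced part of
the index buffer. The helper below is a hypothetical sketch for 32-bit
indices; it ignores primitive restart, and its name and signature are
assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Exact bounds of the indices stored in [start, start + count) of a 32-bit
 * index buffer. Any wider pair, such as 0 and 0xffffffff, is a valid
 * over-estimation; a narrower pair is an underestimation. */
void index_bounds(const uint32_t *indices, unsigned start, unsigned count,
                  uint32_t *min_index, uint32_t *max_index)
{
   assert(count > 0);

   *min_index = UINT32_MAX;
   *max_index = 0;
   for (unsigned i = start; i < start + count; i++) {
      if (indices[i] < *min_index)
         *min_index = indices[i];
      if (indices[i] > *max_index)
         *max_index = indices[i];
   }
}
```

A caller that cannot afford the scan can pass the over-estimated bounds
instead, at the performance cost described above.
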
``index_bias`` is a value added to every vertex index after lookup and before
fetching vertex attributes.

When drawing indexed primitives, the primitive restart index can be
used to draw disjoint primitive strips. For example, several separate
line strips can be drawn by designating a special index value as the
restart index. The ``primitive_restart`` flag enables/disables this
feature. The ``restart_index`` field specifies the restart index value.

When primitive restart is in use, array indexes are compared to the
restart index before adding the ``index_bias`` offset.

If a given vertex element has ``instance_divisor`` set to 0, it
contains per-vertex data and the effective vertex attribute address needs
to be recalculated for every index.

  attribAddr = ``stride`` * index + ``src_offset``

If a given vertex element has a non-zero ``instance_divisor``,
it contains per-instance data and the effective vertex attribute
address needs to be recalculated for every ``instance_divisor``-th instance.

  attribAddr = ``stride`` * (instanceID / ``instance_divisor``) + ``src_offset``

In the above formulas, ``src_offset`` is taken from the given vertex element
and ``stride`` is taken from a vertex buffer associated with the given
vertex element. The division is an integer division that rounds down.

The calculated attribAddr is used as an offset into the vertex buffer to
fetch the attribute data.

The value of ``instanceID`` can be read in a vertex shader through a system
value register declared with the INSTANCEID semantic name.

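The index handling and the two attribAddr formulas can be combined into a
short sketch. ``struct element`` flattens the vertex element and its buffer
stride into one hypothetical type, and both function names are invented for
this example; treating the per-instance division as an integer division is an
assumption based on the "every ``instance_divisor``-th instance" wording.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, flattened view of one vertex element plus its buffer stride. */
struct element {
   unsigned stride;           /* from the associated vertex buffer */
   unsigned src_offset;       /* from the vertex element */
   unsigned instance_divisor; /* 0 means per-vertex data */
};

/* Returns false if raw_index is the restart index. Otherwise stores the
 * biased index: the restart comparison happens before index_bias is added. */
bool resolve_index(uint32_t raw_index, bool primitive_restart,
                   uint32_t restart_index, int32_t index_bias,
                   uint32_t *vertex_index)
{
   assert(vertex_index != NULL);
   if (primitive_restart && raw_index == restart_index)
      return false;
   *vertex_index = raw_index + (uint32_t)index_bias;
   return true;
}

/* Byte offset of the attribute within its vertex buffer. */
unsigned attrib_addr(const struct element *e, uint32_t vertex_index,
                     uint32_t instance_id)
{
   if (e->instance_divisor == 0)
      return e->stride * vertex_index + e->src_offset;
   return e->stride * (instance_id / e->instance_divisor) + e->src_offset;
}
```

With a stride of 16 and a ``src_offset`` of 4, vertex 3 of a per-vertex
element is fetched from offset 52, while instance 5 of an element with
``instance_divisor`` 2 is fetched from offset 36.
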
Queries
^^^^^^^

Queries gather some statistic from the 3D pipeline over one or more
draws. Queries may be nested, though not all Gallium frontends exercise this.

Queries can be created with ``create_query`` and deleted with
``destroy_query``. To start a query, use ``begin_query``, and when finished,
use ``end_query`` to end the query.

``create_query`` takes a query type (``PIPE_QUERY_*``), as well as an index,
which is the vertex stream for ``PIPE_QUERY_PRIMITIVES_GENERATED`` and
``PIPE_QUERY_PRIMITIVES_EMITTED``, and allocates a query structure.

``begin_query`` will clear/reset previous query results.

``get_query_result`` is used to retrieve the results of a query. If
the ``wait`` parameter is TRUE, then the ``get_query_result`` call
will block until the results of the query are ready (and TRUE will be
returned). Otherwise, if the ``wait`` parameter is FALSE, the call
will not block and the return value will be TRUE if the query has
completed or FALSE otherwise.

``get_query_result_resource`` is used to store the result of a query into
a resource without synchronizing with the CPU. This write will optionally
wait for the query to complete, and will optionally write whether the value
is available instead of the value itself.

``set_active_query_state`` sets whether all current non-driver queries except
TIME_ELAPSED are active or paused.

The interface currently includes the following types of queries:

``PIPE_QUERY_OCCLUSION_COUNTER`` counts the number of fragments which
are written to the framebuffer without being culled by
:ref:`depth-stencil-alpha` testing or shader KILL instructions.
The result is an unsigned 64-bit integer.
This query can be used with ``render_condition``.

In cases where a boolean result of an occlusion query is enough,
``PIPE_QUERY_OCCLUSION_PREDICATE`` should be used. It is just like
``PIPE_QUERY_OCCLUSION_COUNTER`` except that the result is a boolean
value of FALSE for cases where COUNTER would result in 0 and TRUE
for all other cases.
This query can be used with ``render_condition``.

In cases where a conservative approximation of an occlusion query is enough,
``PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE`` should be used. It behaves
like ``PIPE_QUERY_OCCLUSION_PREDICATE``, except that it may return TRUE in
additional, implementation-dependent cases.
This query can be used with ``render_condition``.

``PIPE_QUERY_TIME_ELAPSED`` returns the amount of time, in nanoseconds,
the context takes to perform operations.
The result is an unsigned 64-bit integer.

``PIPE_QUERY_TIMESTAMP`` returns a device/driver internal timestamp,
scaled to nanoseconds, recorded after all commands issued prior to
``end_query`` have been processed.
This query does not require a call to ``begin_query``.
The result is an unsigned 64-bit integer.

``PIPE_QUERY_TIMESTAMP_DISJOINT`` can be used to check the
internal timer resolution and whether the timestamp counter has become
unreliable due to things like throttling. A timestamp query (within the
timestamp_disjoint query) should be trusted only if this is FALSE.
The result is a 64-bit integer specifying the timer resolution in Hz,
followed by a boolean value indicating whether the timestamp counter
is discontinuous or disjoint.

``PIPE_QUERY_PRIMITIVES_GENERATED`` returns a 64-bit integer indicating
the number of primitives processed by the pipeline (regardless of whether
stream output is active or not).

``PIPE_QUERY_PRIMITIVES_EMITTED`` returns a 64-bit integer indicating
the number of primitives written to stream output buffers.

``PIPE_QUERY_SO_STATISTICS`` returns two 64-bit integers corresponding to
the result of
``PIPE_QUERY_PRIMITIVES_EMITTED`` and
the number of primitives that would have been written to stream output buffers
if they had infinite space available (primitives_storage_needed), in this order.
XXX the 2nd value is equivalent to ``PIPE_QUERY_PRIMITIVES_GENERATED`` but it is
unclear if it should be increased if stream output is not active.

``PIPE_QUERY_SO_OVERFLOW_PREDICATE`` returns a boolean value indicating
whether a selected stream output target has overflowed as a result of the
commands issued between ``begin_query`` and ``end_query``.
This query can be used with ``render_condition``. The output stream is
selected by the stream number passed to ``create_query``.

``PIPE_QUERY_SO_OVERFLOW_ANY_PREDICATE`` returns a boolean value indicating
whether any stream output target has overflowed as a result of the commands
issued between ``begin_query`` and ``end_query``. This query can be used
with ``render_condition``, and its result is the logical OR of multiple
``PIPE_QUERY_SO_OVERFLOW_PREDICATE`` queries, one for each stream output
target.

``PIPE_QUERY_GPU_FINISHED`` returns a boolean value indicating whether
all commands issued before ``end_query`` have completed. However, this
does not imply serialization.
This query does not require a call to ``begin_query``.

``PIPE_QUERY_PIPELINE_STATISTICS`` returns an array of the following
64-bit integers:

* Number of vertices read from vertex buffers.
* Number of primitives read from vertex buffers.
* Number of vertex shader threads launched.
* Number of geometry shader threads launched.
* Number of primitives generated by geometry shaders.
* Number of primitives forwarded to the rasterizer.
* Number of primitives rasterized.
* Number of fragment shader threads launched.
* Number of tessellation control shader threads launched.
* Number of tessellation evaluation shader threads launched.

If a shader type is not supported by the device/driver,
the corresponding values should be set to 0.

``PIPE_QUERY_PIPELINE_STATISTICS_SINGLE`` returns a single counter from
the ``PIPE_QUERY_PIPELINE_STATISTICS`` group. The specific counter must
be selected when calling ``create_query`` by passing one of the
``PIPE_STAT_QUERY`` enums as the query's ``index``.

Gallium does not guarantee the availability of any query types; one must
always check the capabilities of the :ref:`Screen` first.

533 | Conditional Rendering
|
---|
534 | ^^^^^^^^^^^^^^^^^^^^^
|
---|
535 |
|
---|
536 | A drawing command can be skipped depending on the outcome of a query
|
---|
537 | (typically an occlusion query, or streamout overflow predicate).
|
---|
538 | The ``render_condition`` function specifies the query which should be checked
|
---|
539 | prior to rendering anything. Functions always honoring render_condition include
|
---|
540 | (and are limited to) draw_vbo and clear.
|
---|
541 | The blit, clear_render_target and clear_depth_stencil functions (but
|
---|
542 | not resource_copy_region, which seems inconsistent) can also optionally honor
|
---|
543 | the current render condition.
|
---|
544 |
|
---|
545 | If ``render_condition`` is called with ``query`` = NULL, conditional
|
---|
546 | rendering is disabled and drawing takes place normally.
|
---|
547 |
|
---|
548 | If ``render_condition`` is called with a non-null ``query`` subsequent
|
---|
549 | drawing commands will be predicated on the outcome of the query.
|
---|
550 | Commands will be skipped if ``condition`` is equal to the predicate result
|
---|
551 | (for non-boolean queries such as OCCLUSION_QUERY, zero counts as FALSE,
|
---|
552 | non-zero as TRUE).
|
---|
553 |
|
---|
554 | If ``mode`` is PIPE_RENDER_COND_WAIT the driver will wait for the
|
---|
555 | query to complete before deciding whether to render.
|
---|
556 |
|
---|
557 | If ``mode`` is PIPE_RENDER_COND_NO_WAIT and the query has not yet
|
---|
558 | completed, the drawing command will be executed normally. If the query
|
---|
559 | has completed, drawing will be predicated on the outcome of the query.
|
---|
560 |
|
---|
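The predicate rules for the WAIT and NO_WAIT modes can be summarised in a small standalone C model. This is only a sketch: ``draw_is_skipped`` and its parameters are hypothetical names invented for illustration and are not part of the Gallium interface. A real driver evaluates the predicate in hardware or in its own command stream.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the render condition decision; not a Gallium API. */
static bool draw_is_skipped(bool query_done, uint64_t query_result,
                            bool condition, bool no_wait)
{
    /* NO_WAIT: an unfinished query means the draw executes normally. */
    if (no_wait && !query_done)
        return false;

    /* WAIT: the driver would block here until the result is available,
     * so this model assumes query_result is final from this point on. */
    bool predicate = query_result != 0; /* zero is FALSE, non-zero is TRUE */

    /* The command is skipped when the predicate equals the condition. */
    return predicate == condition;
}
```

With an occlusion query and ``condition`` set to FALSE, for example, this model skips the draw when no samples passed.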
561 | If ``mode`` is PIPE_RENDER_COND_BY_REGION_WAIT or
|
---|
562 | PIPE_RENDER_COND_BY_REGION_NO_WAIT rendering will be predicated as above
|
---|
563 | for the non-REGION modes but in the case that an occlusion query returns
|
---|
564 | a non-zero result, regions which were occluded may be omitted by subsequent
|
---|
565 | drawing commands. This can result in better performance with some GPUs.
|
---|
566 | Normally, if the occlusion query returned a non-zero result subsequent
|
---|
567 | drawing happens normally so fragments may be generated, shaded and
|
---|
568 | processed even where they're known to be obscured.
|
---|
569 |
|
---|
570 | The ``render_condition_mem`` function specifies that drawing is dependent
|
---|
571 | on a value in memory. A buffer resource and offset denote which 32-bit
|
---|
572 | value to use for the query. This is used for the Vulkan API.
|
---|
573 |
|
---|
574 | Flushing
|
---|
575 | ^^^^^^^^
|
---|
576 |
|
---|
577 | ``flush``
|
---|
578 |
|
---|
579 | PIPE_FLUSH_END_OF_FRAME: Whether the flush marks the end of frame.
|
---|
580 |
|
---|
581 | PIPE_FLUSH_DEFERRED: It is not required to flush right away, but it is required
|
---|
582 | to return a valid fence. If fence_finish is called with the returned fence
|
---|
583 | and the context is still unflushed, and the ctx parameter of fence_finish is
|
---|
584 | equal to the context where the fence was created, fence_finish will flush
|
---|
585 | the context.
|
---|
586 |
|
---|
587 | PIPE_FLUSH_ASYNC: The flush is allowed to be asynchronous. Unlike
|
---|
588 | ``PIPE_FLUSH_DEFERRED``, the driver must still ensure that the returned fence
|
---|
589 | will finish in finite time. However, subsequent operations in other contexts of
|
---|
590 | the same screen are no longer guaranteed to happen after the flush. Drivers
|
---|
591 | which use this flag must implement pipe_context::fence_server_sync.
|
---|
592 |
|
---|
593 | PIPE_FLUSH_HINT_FINISH: Hints to the driver that the caller will immediately
|
---|
594 | wait for the returned fence.
|
---|
595 |
|
---|
596 | Additional flags may be set together with ``PIPE_FLUSH_DEFERRED`` for even
|
---|
597 | finer-grained fences. Note that as a general rule, GPU caches may not have been
|
---|
598 | flushed yet when these fences are signaled. Drivers are free to ignore these
|
---|
599 | flags and create normal fences instead. At most one of the following flags can
|
---|
600 | be specified:
|
---|
601 |
|
---|
602 | PIPE_FLUSH_TOP_OF_PIPE: The fence should be signaled as soon as the next
|
---|
603 | command is ready to start executing at the top of the pipeline, before any of
|
---|
604 | its data is actually read (including indirect draw parameters).
|
---|
605 |
|
---|
606 | PIPE_FLUSH_BOTTOM_OF_PIPE: The fence should be signaled as soon as the previous
|
---|
607 | command has finished executing on the GPU entirely (but data written by the
|
---|
608 | command may still be in caches and inaccessible to the CPU).
|
---|
609 |
|
---|
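The combination rule for the fine-grained fence flags can be written as a small check. This is a sketch under stated assumptions: the ``EX_FLUSH_*`` names and their bit values are placeholders made up for this example (the real ``PIPE_FLUSH_*`` values are defined in the Gallium headers), and the requirement that ``PIPE_FLUSH_DEFERRED`` be present is one reading of "may be set together with".

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only; the real PIPE_FLUSH_*
 * definitions live in the Gallium headers and may differ. */
enum {
    EX_FLUSH_DEFERRED       = 1u << 0,
    EX_FLUSH_TOP_OF_PIPE    = 1u << 1,
    EX_FLUSH_BOTTOM_OF_PIPE = 1u << 2,
};

/* Hypothetical helper: checks the fine-grained fence flag combination. */
static bool fine_grained_flags_are_valid(unsigned flags)
{
    bool top = (flags & EX_FLUSH_TOP_OF_PIPE) != 0;
    bool bottom = (flags & EX_FLUSH_BOTTOM_OF_PIPE) != 0;

    if (!top && !bottom)
        return true;   /* no fine-grained fence requested */
    if (top && bottom)
        return false;  /* at most one of the two may be set */

    /* Assumed: fine-grained flags only make sense alongside DEFERRED. */
    return (flags & EX_FLUSH_DEFERRED) != 0;
}
```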
610 |
|
---|
611 | ``flush_resource``
|
---|
612 |
|
---|
613 | Flush the resource cache, so that the resource can be used
|
---|
614 | by an external client. Possible usage:
|
---|
615 | - flushing a resource before presenting it on the screen
|
---|
616 | - flushing a resource if some other process or device wants to use it
|
---|
617 | This shouldn't be used to flush caches if the resource is only managed
|
---|
618 | by a single pipe_screen and is not shared with another process.
|
---|
619 | (i.e. you shouldn't use it to flush caches explicitly if you want to e.g.
|
---|
620 | use the resource for texturing)
|
---|
621 |
|
---|
622 | Fences
|
---|
623 | ^^^^^^
|
---|
624 |
|
---|
625 | ``pipe_fence_handle``, and related methods, are used to synchronize
|
---|
626 | execution between multiple parties. Examples include CPU <-> GPU synchronization,
|
---|
627 | renderer <-> windowing system, multiple external APIs, etc.
|
---|
628 |
|
---|
629 | A ``pipe_fence_handle`` can either be 'one time use' or 're-usable'. A 'one time use'
|
---|
630 | fence behaves like a traditional GPU fence. Once it reaches the signaled state it
|
---|
631 | is forever considered to be signaled.
|
---|
632 |
|
---|
633 | Once a re-usable ``pipe_fence_handle`` becomes signaled, it can be reset
|
---|
634 | back into an unsignaled state. The ``pipe_fence_handle`` will be reset to
|
---|
635 | the unsignaled state by performing a wait operation on said object, i.e.
|
---|
636 | ``fence_server_sync``. As a corollary to this behavior, a re-usable
|
---|
637 | ``pipe_fence_handle`` can only have one waiter.
|
---|
638 |
|
---|
639 | This behavior is useful in producer <-> consumer chains. It helps avoid
|
---|
640 | unnecessarily sharing a new ``pipe_fence_handle`` each time a new frame is
|
---|
641 | ready. Instead, the fences are exchanged once ahead of time, and access is synchronized
|
---|
642 | through GPU signaling instead of direct producer <-> consumer communication.
|
---|
643 |
|
---|
644 | ``fence_server_sync`` inserts a wait command into the GPU's command stream.
|
---|
645 |
|
---|
646 | ``fence_server_signal`` inserts a signal command into the GPU's command stream.
|
---|
647 |
|
---|
648 | There are no guarantees that the wait/signal commands will be flushed when
|
---|
649 | calling ``fence_server_sync`` or ``fence_server_signal``. An explicit
|
---|
650 | call to ``flush`` is required to make sure the commands are emitted to the GPU.
|
---|
651 |
|
---|
652 | The Gallium implementation may implicitly ``flush`` the command stream during a
|
---|
653 | ``fence_server_sync`` or ``fence_server_signal`` call if necessary.
|
---|
654 |
|
---|
655 | Resource Busy Queries
|
---|
656 | ^^^^^^^^^^^^^^^^^^^^^
|
---|
657 |
|
---|
658 | ``is_resource_referenced``
|
---|
659 |
|
---|
660 |
|
---|
661 |
|
---|
662 | Blitting
|
---|
663 | ^^^^^^^^
|
---|
664 |
|
---|
665 | These methods emulate classic blitter controls.
|
---|
666 |
|
---|
667 | These methods operate directly on ``pipe_resource`` objects, and stand
|
---|
668 | apart from any 3D state in the context. Each method is assumed to have an
|
---|
669 | implicit memory barrier around itself. They do not need any explicit
|
---|
670 | ``memory_barrier``. Blitting functionality may be moved to a separate
|
---|
671 | abstraction at some point in the future.
|
---|
672 |
|
---|
673 | ``resource_copy_region`` blits a region of a resource to a region of another
|
---|
674 | resource, provided that both resources have the same format, or compatible
|
---|
675 | formats, i.e., formats for which copying the bytes from the source resource
|
---|
676 | unmodified to the destination resource will achieve the same effect as a
|
---|
677 | textured quad blitter. The source and destination may be the same resource,
|
---|
678 | but overlapping blits are not permitted.
|
---|
679 | This can be considered the equivalent of a CPU memcpy.
|
---|
680 |
|
---|
681 | ``blit`` blits a region of a resource to a region of another resource, including
|
---|
682 | scaling, format conversion, and up-/downsampling, as well as a destination clip
|
---|
683 | rectangle (scissors) and window rectangles. It can also optionally honor the
|
---|
684 | current render condition (but either way the blit itself never contributes
|
---|
685 | anything to queries currently gathering data).
|
---|
686 | As opposed to manually drawing a textured quad, this lets the pipe driver choose
|
---|
687 | the optimal method for blitting (like using a special 2D engine), and usually
|
---|
688 | offers, for example, accelerated stencil-only copies even where
|
---|
689 | PIPE_CAP_SHADER_STENCIL_EXPORT is not available.
|
---|
690 |
|
---|
691 |
|
---|
692 | Transfers
|
---|
693 | ^^^^^^^^^
|
---|
694 |
|
---|
695 | These methods are used to get data to/from a resource.
|
---|
696 |
|
---|
697 | ``transfer_map`` creates a memory mapping and the transfer object
|
---|
698 | associated with it.
|
---|
699 | The returned pointer points to the start of the mapped range according to
|
---|
700 | the box region, not the beginning of the resource. If transfer_map fails,
|
---|
701 | the returned pointer to the buffer memory is NULL, and the pointer
|
---|
702 | to the transfer object remains unchanged (i.e. it can be non-NULL).
|
---|
703 |
|
---|
704 | When mapping an MSAA surface, the samples are implicitly resolved to
|
---|
705 | single-sampled for reads (returning the first sample for depth/stencil/integer,
|
---|
706 | averaged for others). See u_transfer_helper's U_TRANSFER_HELPER_MSAA_MAP for a
|
---|
707 | way to get that behavior using a resolve blit.
|
---|
708 |
|
---|
709 | ``transfer_unmap`` removes the memory mapping for and destroys
|
---|
710 | the transfer object. The pointer into the resource should be considered
|
---|
711 | invalid and discarded.
|
---|
712 |
|
---|
713 | ``texture_subdata`` and ``buffer_subdata`` perform a simplified
|
---|
714 | transfer for simple writes. Basically transfer_map, data write, and
|
---|
715 | transfer_unmap all in one.
|
---|
716 |
|
---|
717 |
|
---|
718 | The box parameter to some of these functions defines a 1D, 2D or 3D
|
---|
719 | region of pixels. This is self-explanatory for 1D, 2D and 3D texture
|
---|
720 | targets.
|
---|
721 |
|
---|
722 | For PIPE_TEXTURE_1D_ARRAY and PIPE_TEXTURE_2D_ARRAY, the box::z and box::depth
|
---|
723 | fields refer to the array dimension of the texture.
|
---|
724 |
|
---|
725 | For PIPE_TEXTURE_CUBE, the box::z and box::depth fields refer to the
|
---|
726 | faces of the cube map (z + depth <= 6).
|
---|
727 |
|
---|
728 | For PIPE_TEXTURE_CUBE_ARRAY, the box::z and box::depth fields refer to both
|
---|
729 | the face and array dimension of the texture (face = z % 6, array = z / 6).
|
---|
730 |
|
---|
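The cube map conventions above come down to simple integer arithmetic. The following standalone C sketch spells it out; ``cube_layer``, ``cube_array_layer`` and ``cube_box_is_valid`` are hypothetical helper names used only for this illustration.

```c
#include <stdbool.h>

/* Hypothetical result type: which face of which cube a z layer addresses. */
struct cube_layer {
    unsigned face;        /* 0..5 */
    unsigned array_index; /* index of the cube within the array */
};

/* PIPE_TEXTURE_CUBE_ARRAY: face = z % 6, array = z / 6. */
static struct cube_layer cube_array_layer(unsigned z)
{
    struct cube_layer layer = { z % 6, z / 6 };
    return layer;
}

/* PIPE_TEXTURE_CUBE: the box must stay within the six faces. */
static bool cube_box_is_valid(unsigned z, unsigned depth)
{
    return z + depth <= 6;
}
```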
731 |
|
---|
732 | .. _transfer_flush_region:
|
---|
733 |
|
---|
734 | transfer_flush_region
|
---|
735 | %%%%%%%%%%%%%%%%%%%%%
|
---|
736 |
|
---|
737 | If a transfer was created with ``FLUSH_EXPLICIT``, it will not automatically
|
---|
738 | be flushed on write or unmap. Flushes must be requested with
|
---|
739 | ``transfer_flush_region``. Flush ranges are relative to the mapped range, not
|
---|
740 | the beginning of the resource.
|
---|
741 |
|
---|
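Because flush ranges are relative to the mapped range, converting one to a resource offset means adding the start of the mapping. The sketch below models this for a one-dimensional buffer mapping. ``flush_range_to_resource_offset`` and its parameters are hypothetical names, and rejecting ranges that leave the mapping is an assumption of this example rather than documented behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper for a 1D buffer mapping that starts at map_x and
 * covers map_width bytes. flush_x is relative to the mapped range. */
static bool flush_range_to_resource_offset(uint32_t map_x, uint32_t map_width,
                                           uint32_t flush_x, uint32_t flush_width,
                                           uint32_t *resource_offset)
{
    /* Assumed: a flush range must lie inside the mapped range. */
    if (flush_x > map_width || flush_width > map_width - flush_x)
        return false;

    *resource_offset = map_x + flush_x;
    return true;
}
```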
742 |
|
---|
743 |
|
---|
744 | .. _texture_barrier:
|
---|
745 |
|
---|
746 | texture_barrier
|
---|
747 | %%%%%%%%%%%%%%%
|
---|
748 |
|
---|
749 | This function flushes all pending writes to the currently-set surfaces and
|
---|
750 | invalidates all read caches of the currently-set samplers. This can be used
|
---|
751 | for both regular textures as well as for framebuffers read via FBFETCH.
|
---|
752 |
|
---|
753 |
|
---|
754 |
|
---|
755 | .. _memory_barrier:
|
---|
756 |
|
---|
757 | memory_barrier
|
---|
758 | %%%%%%%%%%%%%%%
|
---|
759 |
|
---|
760 | This function flushes caches according to which of the PIPE_BARRIER_* flags
|
---|
761 | are set.
|
---|
762 |
|
---|
763 |
|
---|
764 |
|
---|
765 | .. _resource_commit:
|
---|
766 |
|
---|
767 | resource_commit
|
---|
768 | %%%%%%%%%%%%%%%
|
---|
769 |
|
---|
770 | This function changes the commit state of a part of a sparse resource. Sparse
|
---|
771 | resources are created by setting the ``PIPE_RESOURCE_FLAG_SPARSE`` flag when
|
---|
772 | calling ``resource_create``. Initially, sparse resources only reserve a virtual
|
---|
773 | memory region that is not backed by memory (i.e., it is uncommitted). The
|
---|
774 | ``resource_commit`` function can be called to commit or uncommit parts (or all)
|
---|
775 | of a resource. The driver manages the underlying backing memory.
|
---|
776 |
|
---|
777 | The contents of newly committed memory regions are undefined. Calling this
|
---|
778 | function to commit an already committed memory region is allowed and leaves its
|
---|
779 | content unchanged. Similarly, calling this function to uncommit an already
|
---|
780 | uncommitted memory region is allowed.
|
---|
781 |
|
---|
782 | For buffers, the given box must be aligned to multiples of
|
---|
783 | ``PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE``. As an exception to this rule, if the size
|
---|
784 | of the buffer is not a multiple of the page size, changing the commit state of
|
---|
785 | the last (partial) page requires a box that ends at the end of the buffer
|
---|
786 | (i.e., box->x + box->width == buffer->width0).
|
---|
787 |
|
---|
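The buffer alignment rule, including the exception for a trailing partial page, can be sketched as a standalone C check. ``sparse_buffer_box_is_valid`` is a hypothetical helper and not a Gallium function; ``page_size`` stands for the value reported through ``PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE`` and ``buffer_size`` for ``buffer->width0``.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check of a 1D commit box [x, x + width) against the
 * alignment rule described above. */
static bool sparse_buffer_box_is_valid(uint32_t x, uint32_t width,
                                       uint32_t buffer_size, uint32_t page_size)
{
    if (page_size == 0)
        return false;
    /* The box must lie inside the buffer. */
    if (x > buffer_size || width > buffer_size - x)
        return false;
    /* The start must always be page-aligned. */
    if (x % page_size != 0)
        return false;

    uint32_t end = x + width;
    /* The end is page-aligned, or the box ends exactly at the end of the
     * buffer (the exception covering the last, partial page). */
    return end % page_size == 0 || end == buffer_size;
}
```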
788 |
|
---|
789 |
|
---|
790 | .. _pipe_transfer:
|
---|
791 |
|
---|
792 | PIPE_MAP
|
---|
793 | ^^^^^^^^^^^^^
|
---|
794 |
|
---|
795 | These flags control the behavior of a transfer object.
|
---|
796 |
|
---|
797 | ``PIPE_MAP_READ``
|
---|
798 | Resource contents are read back (or accessed directly) at transfer create time.
|
---|
799 |
|
---|
800 | ``PIPE_MAP_WRITE``
|
---|
801 | Resource contents will be written back at transfer_unmap time (or modified
|
---|
802 | as a result of being accessed directly).
|
---|
803 |
|
---|
804 | ``PIPE_MAP_DIRECTLY``
|
---|
805 | A transfer should directly map the resource. May return NULL if not supported.
|
---|
806 |
|
---|
807 | ``PIPE_MAP_DISCARD_RANGE``
|
---|
808 | The memory within the mapped region is discarded. Cannot be used with
|
---|
809 | ``PIPE_MAP_READ``.
|
---|
810 |
|
---|
811 | ``PIPE_MAP_DISCARD_WHOLE_RESOURCE``
|
---|
812 | Discards all memory backing the resource. It should not be used with
|
---|
813 | ``PIPE_MAP_READ``.
|
---|
814 |
|
---|
815 | ``PIPE_MAP_DONTBLOCK``
|
---|
816 | Fail if the resource cannot be mapped immediately.
|
---|
817 |
|
---|
818 | ``PIPE_MAP_UNSYNCHRONIZED``
|
---|
819 | Do not synchronize pending operations on the resource when mapping. The
|
---|
820 | interaction of any writes to the map and any operations pending on the
|
---|
821 | resource are undefined. Cannot be used with ``PIPE_MAP_READ``.
|
---|
822 |
|
---|
823 | ``PIPE_MAP_FLUSH_EXPLICIT``
|
---|
824 | Written ranges will be notified later with :ref:`transfer_flush_region`.
|
---|
825 | Cannot be used with ``PIPE_MAP_READ``.
|
---|
826 |
|
---|
827 | ``PIPE_MAP_PERSISTENT``
|
---|
828 | Allows the resource to be used for rendering while mapped.
|
---|
829 | PIPE_RESOURCE_FLAG_MAP_PERSISTENT must be set when creating
|
---|
830 | the resource.
|
---|
831 | If COHERENT is not set, memory_barrier(PIPE_BARRIER_MAPPED_BUFFER)
|
---|
832 | must be called to ensure the device can see what the CPU has written.
|
---|
833 |
|
---|
834 | ``PIPE_MAP_COHERENT``
|
---|
835 | If PERSISTENT is set, this ensures any writes done by the device are
|
---|
836 | immediately visible to the CPU and vice versa.
|
---|
837 | PIPE_RESOURCE_FLAG_MAP_COHERENT must be set when creating
|
---|
838 | the resource.
|
---|
839 |
|
---|
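Several of the flags above exclude ``PIPE_MAP_READ``. A caller could validate a flag combination before mapping, as in the following sketch. The ``EX_MAP_*`` names and bit values are placeholders invented for this example (the real ``PIPE_MAP_*`` values come from the Gallium headers), and treating the softer "should not" for ``PIPE_MAP_DISCARD_WHOLE_RESOURCE`` as a hard error is an assumption.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only; not the real PIPE_MAP_*. */
enum {
    EX_MAP_READ                   = 1u << 0,
    EX_MAP_WRITE                  = 1u << 1,
    EX_MAP_DISCARD_RANGE          = 1u << 2,
    EX_MAP_DISCARD_WHOLE_RESOURCE = 1u << 3,
    EX_MAP_UNSYNCHRONIZED         = 1u << 4,
    EX_MAP_FLUSH_EXPLICIT         = 1u << 5,
};

/* Hypothetical helper: rejects flag sets that mix READ with flags the
 * text above documents as incompatible with it. */
static bool map_flags_are_consistent(unsigned flags)
{
    const unsigned no_read = EX_MAP_DISCARD_RANGE |
                             EX_MAP_DISCARD_WHOLE_RESOURCE |
                             EX_MAP_UNSYNCHRONIZED |
                             EX_MAP_FLUSH_EXPLICIT;

    if ((flags & EX_MAP_READ) && (flags & no_read))
        return false;
    return true;
}
```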
840 | Compute kernel execution
|
---|
841 | ^^^^^^^^^^^^^^^^^^^^^^^^
|
---|
842 |
|
---|
843 | A compute program can be defined, bound or destroyed using
|
---|
844 | ``create_compute_state``, ``bind_compute_state`` or
|
---|
845 | ``destroy_compute_state`` respectively.
|
---|
846 |
|
---|
847 | Any of the subroutines contained within the compute program can be
|
---|
848 | executed on the device using the ``launch_grid`` method. This method
|
---|
849 | will execute as many instances of the program as elements in the
|
---|
850 | specified N-dimensional grid, hopefully in parallel.
|
---|
851 |
|
---|
852 | The compute program has access to four special resources:
|
---|
853 |
|
---|
854 | * ``GLOBAL`` represents a memory space shared among all the threads
|
---|
855 | running on the device. An arbitrary buffer created with the
|
---|
856 | ``PIPE_BIND_GLOBAL`` flag can be mapped into it using the
|
---|
857 | ``set_global_binding`` method.
|
---|
858 |
|
---|
859 | * ``LOCAL`` represents a memory space shared among all the threads
|
---|
860 | running in the same working group. The initial contents of this
|
---|
861 | resource are undefined.
|
---|
862 |
|
---|
863 | * ``PRIVATE`` represents a memory space local to a single thread.
|
---|
864 | The initial contents of this resource are undefined.
|
---|
865 |
|
---|
866 | * ``INPUT`` represents a read-only memory space that can be
|
---|
867 | initialized at ``launch_grid`` time.
|
---|
868 |
|
---|
869 | These resources use a byte-based addressing scheme, and they can be
|
---|
870 | accessed from the compute program by means of the LOAD/STORE TGSI
|
---|
871 | opcodes. Additional resources to be accessed using the same opcodes
|
---|
872 | may be specified by the user with the ``set_compute_resources``
|
---|
873 | method.
|
---|
874 |
|
---|
875 | In addition, normal texture sampling is allowed from the compute
|
---|
876 | program: ``bind_sampler_states`` may be used to set up texture
|
---|
877 | samplers for the compute stage and ``set_sampler_views`` may
|
---|
878 | be used to bind a number of sampler views to it.
|
---|
879 |
|
---|
880 | Compute kernel queries
|
---|
881 | ^^^^^^^^^^^^^^^^^^^^^^
|
---|
882 |
|
---|
883 | .. _get_compute_state_info:
|
---|
884 |
|
---|
885 | get_compute_state_info
|
---|
886 | %%%%%%%%%%%%%%%%%%%%%%
|
---|
887 |
|
---|
888 | This function allows frontends to query kernel information defined inside
|
---|
889 | ``pipe_compute_state_object_info``.
|
---|
890 |
|
---|
891 | .. _get_compute_state_subgroup_size:
|
---|
892 |
|
---|
893 | get_compute_state_subgroup_size
|
---|
894 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
---|
895 |
|
---|
896 | This function returns the chosen subgroup size when ``launch_grid`` is
|
---|
897 | called with the given block size. This doesn't need to be implemented when
|
---|
898 | only one size is reported through ``PIPE_COMPUTE_CAP_SUBGROUP_SIZES`` or
|
---|
899 | ``pipe_compute_state_object_info::simd_sizes``.
|
---|
900 |
|
---|
901 | Mipmap generation
|
---|
902 | ^^^^^^^^^^^^^^^^^
|
---|
903 |
|
---|
904 | If PIPE_CAP_GENERATE_MIPMAP is true, ``generate_mipmap`` can be used
|
---|
905 | to generate mipmaps for the specified texture resource.
|
---|
906 | It replaces texel image levels base_level+1 through
|
---|
907 | last_level for the layer range first_layer through last_layer.
|
---|
908 | It returns TRUE if mipmap generation succeeds, otherwise it
|
---|
909 | returns FALSE. Mipmap generation may fail when it is not supported
|
---|
910 | for particular texture types or formats.
|
---|
911 |
|
---|
912 | Device resets
|
---|
913 | ^^^^^^^^^^^^^
|
---|
914 |
|
---|
915 | Gallium frontends can query or request notifications of when the GPU
|
---|
916 | is reset for whatever reason (application error, driver error). When
|
---|
917 | a GPU reset happens, the context becomes unusable and all related state
|
---|
918 | should be considered lost and undefined. Despite that, context
|
---|
919 | notifications are single-shot, i.e. subsequent calls to
|
---|
920 | ``get_device_reset_status`` will return PIPE_NO_RESET.
|
---|
921 |
|
---|
922 | * ``get_device_reset_status`` queries whether a device reset has happened
|
---|
923 | since the last call or since the last notification by callback.
|
---|
924 | * ``set_device_reset_callback`` sets a callback which will be called when
|
---|
925 | a device reset is detected. The callback is only called synchronously.
|
---|
926 |
|
---|
927 | Bindless
|
---|
928 | ^^^^^^^^
|
---|
929 |
|
---|
930 | If PIPE_CAP_BINDLESS_TEXTURE is TRUE, the following ``pipe_context`` functions
|
---|
931 | are used to create/delete bindless handles, and to make them resident in the
|
---|
932 | current context when they are going to be used by shaders.
|
---|
933 |
|
---|
934 | * ``create_texture_handle`` creates a 64-bit unsigned integer texture handle
|
---|
935 | that is going to be directly used in shaders.
|
---|
936 | * ``delete_texture_handle`` deletes a 64-bit unsigned integer texture handle.
|
---|
937 | * ``make_texture_handle_resident`` makes a 64-bit unsigned integer texture handle
|
---|
938 | resident in the current context to be accessible by shaders for texture
|
---|
939 | mapping.
|
---|
940 | * ``create_image_handle`` creates a 64-bit unsigned integer image handle that
|
---|
941 | is going to be directly used in shaders.
|
---|
942 | * ``delete_image_handle`` deletes a 64-bit unsigned integer image handle.
|
---|
943 | * ``make_image_handle_resident`` makes a 64-bit unsigned integer image handle
|
---|
944 | resident in the current context to be accessible by shaders for image loads,
|
---|
945 | stores and atomic operations.
|
---|
946 |
|
---|
947 | Using several contexts
|
---|
948 | ----------------------
|
---|
949 |
|
---|
950 | Several contexts from the same screen can be used at the same time. Objects
|
---|
951 | created on one context cannot be used in another context, but the objects
|
---|
952 | created by the screen methods can be used by all contexts.
|
---|
953 |
|
---|
954 | Transfers
|
---|
955 | ^^^^^^^^^
|
---|
956 | A transfer on one context is not expected to synchronize properly with
|
---|
957 | rendering on other contexts, thus only areas not yet used for rendering should
|
---|
958 | be locked.
|
---|
959 |
|
---|
960 | A flush is required after transfer_unmap for other contexts to see the
|
---|
961 | uploaded data, unless:
|
---|
962 |
|
---|
963 | * Using persistent mapping. Combined with coherent mapping, unmapping the
|
---|
964 | resource is also not required to use it in other contexts. Without coherent
|
---|
965 | mapping, memory_barrier(PIPE_BARRIER_MAPPED_BUFFER) should be called on the
|
---|
966 | context that has mapped the resource. No flush is required.
|
---|
967 |
|
---|
968 | * Mapping the resource with PIPE_MAP_DIRECTLY.
|
---|