Perfetto Tracing
================

Mesa has experimental support for `Perfetto <https://perfetto.dev>`__ for
GPU performance monitoring. Perfetto supports multiple
`producers <https://perfetto.dev/docs/concepts/service-model>`__ each with
one or more data-sources. Perfetto already provides various producers and
data-sources for things like:

- CPU scheduling events (``linux.ftrace``)
- CPU frequency scaling (``linux.ftrace``)
- System calls (``linux.ftrace``)
- Process memory utilization (``linux.process_stats``)

There are also various domain-specific producers.

Mesa's Perfetto support adds further producers so that GPU performance
(frequency, utilization, performance counters, etc.) can be visualized on the
same timeline, which helps to understand, tune, and debug system-level
performance:

- pps-producer: A system-wide daemon that can collect global performance
  counters.
- mesa: Per-process producer within Mesa to capture render-stage traces
  on the GPU timeline, track events on the CPU timeline, etc.

The exact supported features vary per driver:

.. list-table:: Supported data-sources
   :header-rows: 1

   * - Driver
     - PPS Counters
     - Render Stages
   * - Freedreno
     - ``gpu.counters.msm``
     - ``gpu.renderstages.msm``
   * - Turnip
     - ``gpu.counters.msm``
     - ``gpu.renderstages.msm``
   * - Intel
     - ``gpu.counters.i915``
     - ``gpu.renderstages.intel``
   * - Panfrost
     - ``gpu.counters.panfrost``
     -

Run
---

To capture a trace with Perfetto you need to take the following steps:

1. Build Perfetto from sources available at ``subprojects/perfetto`` following
   `this guide <https://perfetto.dev/docs/quickstart/linux-tracing>`__.

2. Create a `trace config <https://perfetto.dev/docs/concepts/config>`__, which is
   a text file in protobuf text format with extension ``.cfg``, or use one of
   the config files under the ``src/tool/pps/cfg`` directory. More examples of
   config files can be found in ``subprojects/perfetto/test/configs``.

3. Change directory to ``subprojects/perfetto`` and run a
   `convenience script <https://perfetto.dev/docs/quickstart/linux-tracing#capturing-a-trace>`__
   to start the tracing service:

   .. code-block:: console

      cd subprojects/perfetto
      CONFIG=<path/to/gpu.cfg> OUT=out/linux_clang_release ./tools/tmux -n

4. Start other producers you may need, e.g. ``pps-producer``.

5. Start ``perfetto`` under the tmux session initiated in step 3.

6. Once tracing has finished, you can detach from tmux with :kbd:`Ctrl+b`,
   :kbd:`d`, and the convenience script should automatically copy the trace
   files into ``$HOME/Downloads``.

7. Go to `ui.perfetto.dev <https://ui.perfetto.dev>`__ and upload
   ``$HOME/Downloads/trace.protobuf`` by clicking on **Open trace file**.

8. Alternatively you can open the trace in `AGI <https://gpuinspector.dev/>`__
   (which despite the name can be used to view non-Android traces).

To be a bit more explicit, here is a listing of commands reproducing
the steps above:

.. code-block:: console

   # Configure Mesa with perfetto
   mesa $ meson setup build -Dperfetto=true -Dvulkan-drivers=intel,broadcom -Dgallium-drivers=
   # Build mesa
   mesa $ meson compile -C build

   # Within the Mesa repo, build perfetto
   mesa $ cd subprojects/perfetto
   perfetto $ ./tools/install-build-deps
   perfetto $ ./tools/gn gen --args='is_debug=false' out/linux
   perfetto $ ./tools/ninja -C out/linux

   # Start perfetto
   perfetto $ CONFIG=../../src/tool/pps/cfg/gpu.cfg OUT=out/linux/ ./tools/tmux -n

   # In parallel from the Mesa repo, start the PPS producer
   mesa $ ./build/src/tool/pps/pps-producer

   # Back in the perfetto tmux, press enter to start the capture

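As an illustration of step 2, a trace config for GPU counters might look
roughly like the sketch below. The field names come from Perfetto's
``TraceConfig`` message and the data-source name is the Intel one from the
table above. The buffer size, sampling period, and duration are arbitrary
assumptions, not the values shipped in ``src/tool/pps/cfg``.

.. code-block:: text

   # Sketch of a trace config; all numeric values are assumptions.
   buffers {
     size_kb: 16384
     fill_policy: RING_BUFFER
   }

   data_sources {
     config {
       # Pick the PPS counter data-source matching your driver.
       name: "gpu.counters.i915"
       gpu_counter_config {
         # Assumed sampling period: one sample every 100 microseconds.
         counter_period_ns: 100000
       }
     }
   }

   # Stop tracing after 10 seconds.
   duration_ms: 10000

Replacing ``name`` with another entry from the table (for example
``gpu.counters.msm``) should select the corresponding driver's counters.
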
CPU Tracing
~~~~~~~~~~~

Mesa's CPU tracepoints (``MESA_TRACE_*``) use Perfetto track events when
Perfetto is enabled. They use the ``mesa.default`` and ``mesa.slow``
categories.

Currently, only EGL and Freedreno have CPU tracepoints.

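To record these track events, the trace config needs a ``track_event``
data-source with the Mesa categories enabled. The fragment below is a minimal
sketch that assumes the standard Perfetto ``TrackEventConfig`` fields and that
only the two categories named above are of interest:

.. code-block:: text

   data_sources {
     config {
       name: "track_event"
       track_event_config {
         # Disable every category first, then enable only the Mesa ones.
         disabled_categories: "*"
         enabled_categories: "mesa.default"
         enabled_categories: "mesa.slow"
       }
     }
   }
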
Vulkan data sources
~~~~~~~~~~~~~~~~~~~

The Vulkan API gives the application control over recording of command
buffers as well as when they are submitted to the hardware. As a
consequence, we need to ensure command buffers are properly
instrumented for the Perfetto driver data sources prior to Perfetto
actually collecting traces.

This can be achieved by setting the :envvar:`MESA_GPU_TRACES`
environment variable before starting a Vulkan application:

.. code-block:: console

   MESA_GPU_TRACES=perfetto ./build/my_vulkan_app

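On the Perfetto side, the matching render-stage data-source presumably also
has to be listed in the trace config. A minimal sketch, using the Intel
data-source name from the table above (Freedreno and Turnip would use
``gpu.renderstages.msm`` instead):

.. code-block:: text

   data_sources {
     config {
       # Render-stage data-source name taken from the driver table.
       name: "gpu.renderstages.intel"
     }
   }
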
Driver Specifics
~~~~~~~~~~~~~~~~

Below are driver-specific instructions for the PPS producer.

Freedreno / Turnip
^^^^^^^^^^^^^^^^^^

The Freedreno PPS driver needs root access to read system-wide
performance counters, so you can simply run it with sudo:

.. code-block:: console

   sudo ./build/src/tool/pps/pps-producer

Intel
^^^^^

The Intel PPS driver needs root access to read system-wide
`RenderBasic <https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2023-0/gpu-metrics-reference.html>`__
performance counters, so you can simply run it with sudo:

.. code-block:: console

   sudo ./build/src/tool/pps/pps-producer

Another option, which allows access to system-wide data without root
permissions, is to run the following:

.. code-block:: console

   sudo sysctl dev.i915.perf_stream_paranoid=0

Alternatively, granting the ``CAP_PERFMON`` capability to the binary should
work too.

A particular metric set can also be selected to capture a different
set of HW counters:

.. code-block:: console

   INTEL_PERFETTO_METRIC_SET=RasterizerAndPixelBackend ./build/src/tool/pps/pps-producer

Vulkan applications can also be instrumented to be Perfetto producers.
To enable this for a given application, set the environment variable as
follows:

.. code-block:: console

   PERFETTO_TRACE=1 my_vulkan_app

Panfrost
^^^^^^^^

The Panfrost PPS driver uses unstable ioctls that behave correctly on
kernel versions `5.4.23+ <https://lwn.net/Articles/813601/>`__ and
`5.5.7+ <https://lwn.net/Articles/813600/>`__.

To run the producer, follow these two steps:

1. Enable Panfrost unstable ioctls via the kernel module parameter:

   .. code-block:: console

      modprobe panfrost unstable_ioctls=1

   Alternatively you could add ``panfrost.unstable_ioctls=1`` to your kernel
   command line, or
   ``echo 1 > /sys/module/panfrost/parameters/unstable_ioctls``.

2. Run the producer:

   .. code-block:: console

      ./build/src/tool/pps/pps-producer

Troubleshooting
---------------

Tmux
~~~~

If the convenience script ``tools/tmux`` keeps copying artifacts to your
``SSH_TARGET`` without starting the tmux session, make sure you have ``tmux``
installed on your system:

.. code-block:: console

   apt install tmux

Missing counter names
~~~~~~~~~~~~~~~~~~~~~

If the trace viewer shows a list of counters with a description like
``gpu_counter(#)`` instead of their proper names, you may have lost data
because the trace buffer filled up and wrapped.

To prevent this data loss, you can tweak the trace config file in one of the
following ways:

- Increase the size of the buffer in use:

  .. code-block:: javascript

     buffers {
         size_kb: 2048,
         fill_policy: RING_BUFFER,
     }

- Periodically flush the trace buffer into the output file:

  .. code-block:: javascript

     write_into_file: true
     file_write_period_ms: 250

- Discard new traces when the buffer fills:

  .. code-block:: javascript

     buffers {
         size_kb: 2048,
         fill_policy: DISCARD,
     }