Perfetto Tracing
================

Mesa has experimental support for `Perfetto <https://perfetto.dev>`__ for
GPU performance monitoring. Perfetto supports multiple
`producers <https://perfetto.dev/docs/concepts/service-model>`__, each with
one or more data-sources. Perfetto already provides various producers and
data-sources for things like:

- CPU scheduling events (``linux.ftrace``)
- CPU frequency scaling (``linux.ftrace``)
- System calls (``linux.ftrace``)
- Process memory utilization (``linux.process_stats``)

It also provides various domain-specific producers.

The Mesa Perfetto support adds further producers, which allow visualizing
GPU performance (frequency, utilization, performance counters, etc.) on the
same timeline, to better understand and tune/debug system-level performance:

- pps-producer: A system-wide daemon that can collect global performance
  counters.
- mesa: Per-process producer within Mesa to capture render-stage traces
  on the GPU timeline, track events on the CPU timeline, etc.

The exact supported features vary per driver:

.. list-table:: Supported data-sources
   :header-rows: 1

   * - Driver
     - PPS Counters
     - Render Stages
   * - Freedreno
     - ``gpu.counters.msm``
     - ``gpu.renderstages.msm``
   * - Turnip
     - ``gpu.counters.msm``
     - ``gpu.renderstages.msm``
   * - Intel
     - ``gpu.counters.i915``
     - ``gpu.renderstages.intel``
   * - Panfrost
     - ``gpu.counters.panfrost``
     -

Run
---

To capture a trace with Perfetto you need to take the following steps:

1. Build Perfetto from sources available at ``subprojects/perfetto`` following
   `this guide <https://perfetto.dev/docs/quickstart/linux-tracing>`__.

2. Create a `trace config <https://perfetto.dev/docs/concepts/config>`__, which is
   a text file in protobuf text format (a JSON-like syntax) with extension
   ``.cfg``, or use one of the config files under the ``src/tool/pps/cfg``
   directory. More examples of config files can be found in
   ``subprojects/perfetto/test/configs``.

3. Change directory to ``subprojects/perfetto`` and run a
   `convenience script <https://perfetto.dev/docs/quickstart/linux-tracing#capturing-a-trace>`__
   to start the tracing service:

   .. code-block:: console

      cd subprojects/perfetto
      CONFIG=<path/to/gpu.cfg> OUT=out/linux_clang_release ./tools/tmux -n

4. Start other producers you may need, e.g. ``pps-producer``.

5. Start ``perfetto`` under the tmux session initiated in step 3.

6. Once tracing has finished, you can detach from tmux with :kbd:`Ctrl+b`,
   :kbd:`d`, and the convenience script should automatically copy the trace
   files into ``$HOME/Downloads``.

7. Go to `ui.perfetto.dev <https://ui.perfetto.dev>`__ and upload
   ``$HOME/Downloads/trace.protobuf`` by clicking on **Open trace file**.

8. Alternatively you can open the trace in `AGI <https://gpuinspector.dev/>`__
   (which despite the name can be used to view non-Android traces).

To be a bit more explicit, here is a listing of commands reproducing
the steps above:

.. code-block:: console

   # Configure Mesa with perfetto
   mesa $ meson setup build -Dperfetto=true -Dvulkan-drivers=intel,broadcom -Dgallium-drivers=
   # Build mesa
   mesa $ meson compile -C build

   # Within the Mesa repo, build perfetto
   mesa $ cd subprojects/perfetto
   perfetto $ ./tools/install-build-deps
   perfetto $ ./tools/gn gen --args='is_debug=false' out/linux
   perfetto $ ./tools/ninja -C out/linux

   # Start perfetto
   perfetto $ CONFIG=../../src/tool/pps/cfg/gpu.cfg OUT=out/linux/ ./tools/tmux -n

   # In parallel from the Mesa repo, start the PPS producer
   mesa $ ./build/src/tool/pps/pps-producer

   # Back in the perfetto tmux, press enter to start the capture
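
As a rough sketch of what a hand-written trace config can look like, the
snippet below writes a minimal config that enables the Intel data-sources
listed in the table of supported data-sources. The file name ``my-gpu.cfg``,
the buffer size, the sampling period and the trace duration are illustrative
assumptions, not recommended values.

.. code-block:: sh

   # Hypothetical output path; any writable location works.
   cfg=my-gpu.cfg

   # Write a minimal trace config in protobuf text format.
   cat > "$cfg" <<'EOF'
   buffers {
     size_kb: 16384
     fill_policy: RING_BUFFER
   }

   data_sources {
     config {
       name: "gpu.counters.i915"
       gpu_counter_config {
         # Assumed sampling period of 100 us.
         counter_period_ns: 100000
       }
     }
   }

   data_sources {
     config {
       name: "gpu.renderstages.intel"
     }
   }

   # Assumed trace duration of 5 seconds.
   duration_ms: 5000
   EOF

   echo "wrote $cfg"

The resulting file can then be passed to the convenience script through the
``CONFIG`` variable in place of ``gpu.cfg``.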

CPU Tracing
~~~~~~~~~~~

Mesa's CPU tracepoints (``MESA_TRACE_*``) use Perfetto track events when
Perfetto is enabled. They use the ``mesa.default`` and ``mesa.slow`` categories.

Currently, only EGL and Freedreno have CPU tracepoints.
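
A minimal sketch of enabling both categories in a trace config follows. It
assumes the tracepoints are collected through Perfetto's standard
``track_event`` data-source, which is not stated above, and it uses a
hypothetical config path ``my-cpu.cfg``.

.. code-block:: sh

   # Hypothetical config path; point this at the config passed to Perfetto.
   cfg=my-cpu.cfg

   # Append a track_event data-source that enables both Mesa categories.
   cat >> "$cfg" <<'EOF'
   data_sources {
     config {
       name: "track_event"
       track_event_config {
         enabled_categories: "mesa.default"
         enabled_categories: "mesa.slow"
       }
     }
   }
   EOF

   echo "appended track_event data-source to $cfg"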

Vulkan data sources
~~~~~~~~~~~~~~~~~~~

The Vulkan API gives the application control over recording of command
buffers as well as when they are submitted to the hardware. As a
consequence, we need to ensure command buffers are properly
instrumented for the Perfetto driver data sources prior to Perfetto
actually collecting traces.

This can be achieved by setting the :envvar:`MESA_GPU_TRACES`
environment variable before starting a Vulkan application:

.. code-block:: console

   MESA_GPU_TRACES=perfetto ./build/my_vulkan_app

Driver Specifics
~~~~~~~~~~~~~~~~

Below is driver-specific information and instructions for the PPS producer.

Freedreno / Turnip
^^^^^^^^^^^^^^^^^^

The Freedreno PPS driver needs root access to read system-wide
performance counters, so you can simply run it with sudo:

.. code-block:: console

   sudo ./build/src/tool/pps/pps-producer

Intel
^^^^^

The Intel PPS driver needs root access to read system-wide
`RenderBasic <https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2023-0/gpu-metrics-reference.html>`__
performance counters, so you can simply run it with sudo:

.. code-block:: console

   sudo ./build/src/tool/pps/pps-producer

Another option is to allow access to system-wide data without root
permissions by running the following:

.. code-block:: console

   sudo sysctl dev.i915.perf_stream_paranoid=0

Alternatively, granting the ``CAP_PERFMON`` capability to the binary should work too.
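
To see which of these options applies on a given machine, the current value
of the sysctl can be inspected first. The sketch below only reads the setting
and prints a hint; the messages are illustrative, and the ``/proc/sys`` path
is simply the file behind ``dev.i915.perf_stream_paranoid``.

.. code-block:: sh

   # Read the i915 perf paranoid setting; fall back to "unknown" if the
   # file is absent (for example, when the i915 module is not loaded).
   paranoid=$(cat /proc/sys/dev/i915/perf_stream_paranoid 2>/dev/null || echo unknown)

   case "$paranoid" in
       0)
           msg="unprivileged access to system-wide i915 perf data is allowed"
           ;;
       unknown)
           msg="dev.i915.perf_stream_paranoid not found; is i915 loaded?"
           ;;
       *)
           msg="use sudo, set the sysctl to 0, or grant CAP_PERFMON to pps-producer"
           ;;
   esac

   echo "$msg"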

A particular metric set can also be selected to capture a different
set of HW counters:

.. code-block:: console

   INTEL_PERFETTO_METRIC_SET=RasterizerAndPixelBackend ./build/src/tool/pps/pps-producer

Vulkan applications can also be instrumented to be Perfetto producers.
To enable this for a given application, set the environment variable as
follows:

.. code-block:: console

   PERFETTO_TRACE=1 my_vulkan_app

Panfrost
^^^^^^^^

The Panfrost PPS driver uses unstable ioctls that behave correctly on
kernel versions `5.4.23+ <https://lwn.net/Articles/813601/>`__ and
`5.5.7+ <https://lwn.net/Articles/813600/>`__.

To run the producer, follow these two simple steps:

1. Enable Panfrost unstable ioctls via the kernel module parameter:

   .. code-block:: console

      modprobe panfrost unstable_ioctls=1

   Alternatively you could add ``panfrost.unstable_ioctls=1`` to your kernel
   command line, or run
   ``echo 1 > /sys/module/panfrost/parameters/unstable_ioctls``.

2. Run the producer:

   .. code-block:: console

      ./build/src/tool/pps/pps-producer
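
The version requirement can be checked before running the producer. The
helper below is a hypothetical sketch, not part of Mesa: it treats 5.4.23 and
later within the 5.4 series, 5.5.7 and later within the 5.5 series, and
(as an assumption) every newer kernel series as carrying the fix.

.. code-block:: sh

   # Hypothetical helper: succeeds if the kernel version in $1 is expected
   # to have correctly behaving Panfrost unstable ioctls.
   panfrost_kernel_ok() {
       ver=${1%%[!0-9.]*}          # drop suffixes such as "-generic"
       old_ifs=$IFS; IFS=.
       set -- $ver                 # split "X.Y.Z" into $1 $2 $3
       IFS=$old_ifs
       major=${1:-0} minor=${2:-0} patch=${3:-0}

       # Assumption: any series other than 5.x is fine only if it is newer.
       if [ "$major" -ne 5 ]; then [ "$major" -gt 5 ]; return; fi

       case "$minor" in
           4) [ "$patch" -ge 23 ] ;;
           5) [ "$patch" -ge 7 ] ;;
           *) [ "$minor" -gt 5 ] ;;   # assumption: 5.6 and later have the fix
       esac
   }

   if panfrost_kernel_ok "$(uname -r)"; then
       echo "kernel $(uname -r): Panfrost unstable ioctls should behave correctly"
   else
       echo "kernel $(uname -r): consider upgrading before running pps-producer"
   fi

   # Report the current module parameter, if the module is loaded.
   param=/sys/module/panfrost/parameters/unstable_ioctls
   if [ -r "$param" ]; then
       echo "unstable_ioctls=$(cat "$param")"
   fi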

Troubleshooting
---------------

Tmux
~~~~

If the convenience script ``tools/tmux`` keeps copying artifacts to your
``SSH_TARGET`` without starting the tmux session, make sure you have ``tmux``
installed on your system.

.. code-block:: console

   apt install tmux
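
A quick way to confirm whether ``tmux`` is on the ``PATH`` before running the
convenience script is sketched below; the messages are illustrative.

.. code-block:: sh

   # Check whether a tmux executable can be found on the PATH.
   if command -v tmux >/dev/null 2>&1; then
       tmux_status="found at $(command -v tmux)"
   else
       tmux_status="not found; install it before running tools/tmux"
   fi

   echo "tmux: $tmux_status"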

Missing counter names
~~~~~~~~~~~~~~~~~~~~~

If the trace viewer shows a list of counters with a description like
``gpu_counter(#)`` instead of their proper names, you may have lost data
because the trace buffer filled up and wrapped.

To prevent this loss of data, you can tweak the trace config file in one of
the following ways:

- Increase the size of the buffer in use:

  .. code-block:: javascript

      buffers {
          size_kb: 2048,
          fill_policy: RING_BUFFER,
      }

- Periodically flush the trace buffer into the output file:

  .. code-block:: javascript

      write_into_file: true
      file_write_period_ms: 250

- Discard new traces when the buffer fills:

  .. code-block:: javascript

      buffers {
          size_kb: 2048,
          fill_policy: DISCARD,
      }