perf record with --call-graph=fp --freq=max #1969
Interesting! I wonder if Btw, how much RAM do you have? 😆 I tried to generate a profile for
dwarf is more precise than fp (it knows about inlined functions), but it produces a lot more data and is therefore more prone to just breaking. Because of that it also requires a lower sampling frequency, which in turn makes it less precise. fp generally works better and faster, but can't know about inlined functions.
I always set
That's unfortunate. On my system this recording peaks at 19.6 GB memory usage; 20x blowup for in-memory data structures is pretty typical in my experience. My system has 128 GB of memory but that's not really relevant because the perf-report UI becomes unusably slow when you load this much data into it. Granted, I'd never use this for profiling primary benchmarks. We should probably set a lower frequency for them.
Yup. About half the time I reach for it, perf with dwarf callgraphs is completely unusable due to bugs in perf. It crashes when loading its recordings, with a variety of errors depending on your kernel/perf version. Occasionally it just segfaults.
Makes sense. So I would suggest this:
I can do 3) in a follow-up PR if you don't want to deal with it. Btw, what do you use to postprocess/analyze the
I've been carrying this patch locally for months/years; @Noratrieb asked me to make this PR.
I only find `profile_local perf-record` useful with this patch applied, because otherwise the profile doesn't have enough samples to make anything of the data. And since we're sampling as fast as possible to get a reasonable signal-to-noise ratio on a microbenchmark, we need to use frame pointers, which are now enabled by default in the compiler profile (and in the distributed standard library too!).
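
For readers who want to see the trade-off concretely, here is a minimal sketch of how the two modes could map onto `perf record` arguments. It is not the code in this PR: `CallGraph`, `perf_record_command`, `args_of`, and the `dwarf_freq` parameter are hypothetical names made up for illustration. The only real pieces are perf's `--call-graph` and `--freq` options (`--freq=max` asks for the highest rate the kernel currently allows). The sketch builds the command without running it.

```rust
use std::process::Command;

/// How perf should unwind the stack for each sample.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CallGraph {
    /// Frame-pointer unwinding: cheap per sample, but blind to inlined functions.
    FramePointer,
    /// DWARF unwinding: sees inlined functions, but records far more data per sample.
    Dwarf,
}

/// Builds (but does not run) a `perf record` invocation for `target`.
///
/// `dwarf_freq` is a hypothetical knob: the sampling frequency in Hz used
/// when DWARF unwinding is selected. It is ignored for frame pointers,
/// where we sample as fast as the kernel allows.
pub fn perf_record_command(mode: CallGraph, dwarf_freq: u32, target: &[&str]) -> Command {
    let mut cmd = Command::new("perf");
    cmd.arg("record");
    match mode {
        CallGraph::FramePointer => {
            cmd.arg("--call-graph=fp").arg("--freq=max");
        }
        CallGraph::Dwarf => {
            cmd.arg("--call-graph=dwarf")
                .arg(format!("--freq={dwarf_freq}"));
        }
    }
    // Everything after `--` is the program being profiled and its arguments.
    cmd.arg("--").args(target);
    cmd
}

/// Collects a command's arguments as strings so they can be inspected or logged.
pub fn args_of(cmd: &Command) -> Vec<String> {
    cmd.get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect()
}
```

Calling `perf_record_command(CallGraph::FramePointer, 1000, &["./bench"])` yields the arguments `record --call-graph=fp --freq=max -- ./bench`, where `./bench` is a placeholder target. The fp variant only gives usable stacks if the profiled binary was built with frame pointers.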