What are peoples approaches to profiling for occassional freezes

richtea · June 18, 2023, 4:21pm

This isn’t necessarily something I’m currently facing but its something I’ve found come up quite often when developing games and I wanted to see if others had a better approach to it.

Background

Imagine a long running application (such as a JME application) that has significant every frame CPU usage (Task A). Profiling shows that Task A uses 90% of the total CPU usage. However, there is also a Task B that runs periodically or in response to some event, say once every 3 seconds. Profiling shows that Task B takes up 2% of overall CPU time BUT in frames where it runs at all it takes up 95% of CPU time and takes 0.1s to finish. Task B is the cause of the freezes and Task A is not, but profiling might lead someone to improve Task A. How can Task B be found as the problem.

I imagine a technique to find this might be a binned profile where the time of a method call was weighted by the total size of the time in that bin; i.e. time is most “expensive” when there isn’t much of it available. Or where the profileed application could choose if the last bin was or wasn’t included in the results (based on if it met or failed its frame rate target). But I’m not aware of any profiler that can be set up like that

Question

Does anyone have any techniques for finding slow but rare tasks that cause application freezes.

Currently I use a normal java profiler and “interpret” the results based on what I know is an every frame process and what are rare processes (if a rare process appears at all in the results its a bad sign), but its a bit hit and miss

pspeed · June 18, 2023, 5:20pm

I use a lot of logging and usually if I have some process that kicks off in the background, it logs that event. So when I’m playing the game and I see slowdown, I glance down at the log and see what just happened.

I catch a lot of things that way.

In Mythruna, the world is divided up into a 32x32 grid and every time the player crosses a grid boundary a ton of stuff has to happen. Even without looking these days, I can see the logs move when I cross one of these boundaries and if I dropped frames then I know what it was.

These days, I don’t find the profiler very good for fishing expeditions. As I got more experienced, I tend to write code that avoids the “big fish”. It still happens but it’s rare that I find that one silly thing that is easy to fix. I miss it sometimes as the profiler use to be a magic trick. I do still use it to sanity check memory/time and a fishing expedition never hurts now and then. But my “daily driver” experience is limited these days.

However, if I were trying to profile task B aside from task A (or vice versa) then I’d drill down from that direction. Find the method call, see what calls are taking the most time, drill in from there. The upside down “show me the things that are taking the most time” view is generally useless in my code because there are things called a million times from many places and threads with no way to avoid that. I’m more interested in the things that are calling stuff that they don’t have to.

I also do a LOT of nano timing of my code for the background processing. I also admit that if my main thread were taking 90% CPU then I would be very concerned about that.

richtea · June 18, 2023, 6:17pm

That does sound a lot like my process (I am of course exaggerating with the actual percentages, just for an easy example); I think, like you, the pond is getting fairly free of fish but I’m looking for the last few in the corners.

Just as a clarification, where @pspeed’s answer has made me realise I wasn’t totally clear, by a Task I mean bit of code, not task in the multithreaded sense; I already have those separated out and mostly focus on things on the main thread. As an example of what a Task B might be; when a mob takes a hit it searches for it’s allies to request help but does so in an insanely inefficiently way - it won’t happen every tick but when it does its bad

pspeed · June 18, 2023, 6:45pm

And in these sorts of cases, I usually have a debug log “searching for allies” or “running AI” or whatever general things I’ve logged… and seeing that pop up on the console will at least let me know what caused the frame drops and I can drill in from there.

If anything, I have too much logging. I regularly make code passes to remove it or turn it to trace logs… way more often than I profile.