Debugging Raid Lag in 25s

90 Worgen Hunter
12325
TL;DR version: Healing is strongly implicated in causing significant "input lag" in 25 man raids. Temerity tested a number of theories this week and the data supports this conclusion and discredits DPSing, addons, RPPM, and even Stampede as causes. It also seems to be correlated with peak US raid times (presumably when many raids and LFRs are occurring).

Background:

As many 25 man raiders know, particularly those raiding Heroic content, there are times when the game client becomes very unresponsive. Abilities take far longer than they should to register as used, causing very frustrating gameplay for most classes and specs (particularly those who are riding each GCD rather than having longer spell casts). This happens primarily on the pull of many fights, but it occurs at other times as well.

It does not occur in 10 man raids. It occurs less in 25 Normal raids.

Here is a video emphasizing the issue. The perspective is of a Brewmaster Monk and a Beast Master Hunter. The video is slowed down to one third normal speed, and demonstrates some of the worst lag we've experienced lately. It also shows what a lag-free experience is like. Note the delays between the game client registering a button press (gold outline) and the game acting on it (resource consumption, cooldown trigger). The delay can be upwards of 1 second, and this is the source of our pain.

http://www.youtube.com/watch?v=Ff56_vfZ5sw

The Experiment:

We had many theories (RPPM being most people's prime culprit, followed by Stampede and addons), but we constructed an experiment to definitively determine the cause. We spent nearly two hours testing this in ToT this week, primarily on the training dummies before Jin'rokh, trying many variations:

1) No addons
2) No RPPM trinkets, set bonuses, meta, or enchants
3) No pets
4) No healing
5) Hunters (Stampede)

We did multiple pulls of each theory and privately recorded our own ratings of each pull, on a scale of 1-10 (1 being no lag, 10 being unplayable). Our goal was to see if patterns emerged without any communication tainting the results. After several experiments, we tallied our results and it was clear:

Healing causes the input lag.

Once we had this theory, we tested further. We tried different healers. We tried different numbers of healing cooldowns. We tried more or fewer temporary pets (such as Stampede). We tried pulls with healers doing nothing for the first minute, then going nuts. We even did a pull where only two players DPSed and healers went nuts. The clear result was that healing of any kind caused the lag, made worse by the number of healers redlining it. It had nothing to do with what DPS did, and nothing to do with the addons people were running.

Other guilds have seen the same thing; in fact, every Heroic raiding guild probably sees it, perhaps depending slightly upon their healing composition. Cooldowns and smart heals seem to exacerbate the issue.

Our spreadsheet of results:

https://docs.google.com/spreadsheet/ccc?key=0ArK4-MdVBzk5dHAyTkVNOXJpRnhmWG43MFhubjlWZXc
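
For anyone repeating the test, the blind ratings described above could be tallied along these lines. This is only a minimal sketch: the condition names, player names, and numbers below are placeholders made up for illustration, not our actual results (those are in the spreadsheet).

```python
from collections import defaultdict
from statistics import mean

# Placeholder ratings only (1 = no lag, 10 = unplayable); NOT the thread's data.
# Each tuple: (condition, player, rating), recorded privately per pull.
ratings = [
    ("no addons", "player_a", 7),
    ("no addons", "player_b", 8),
    ("no healing", "player_a", 1),
    ("no healing", "player_b", 2),
    ("no pets", "player_a", 7),
    ("no pets", "player_b", 6),
]


def mean_rating_by_condition(rows):
    """Group ratings by test condition and return the mean for each."""
    grouped = defaultdict(list)
    for condition, _player, rating in rows:
        grouped[condition].append(rating)
    return {condition: mean(values) for condition, values in grouped.items()}


def rank_conditions(rows):
    """Return (condition, mean rating) pairs, least laggy first."""
    means = mean_rating_by_condition(rows)
    return sorted(means.items(), key=lambda item: item[1])


for condition, score in rank_conditions(ratings):
    print(f"{condition:12s} {score:.1f}")
```

A condition whose mean sits far below the others is the one whose removal made the lag go away.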

The Second Experiment:

Our initial testing was conducted at the beginning of our normal raid week (Wednesday from 8pm-10pm PST for these tests). Late Sunday night (~11pm PST) we attempted to reproduce the problem and were much less successful. We were able to see it during Heroism, but it was not as pronounced as on Wednesday, and the game was smooth at other times during healing. From this, we theorize that it is heavily impacted by the load on the servers themselves, by network connectivity at the Blizzard data centers, or both.

This means the lag likely occurs at peak raiding times (and probably most noticeably during the overlap of the East and West US coasts).

Conclusion:

Healing in 25s is very strongly implicated in causing the maligned "input lag" across the entire raid, made much worse during peak US raiding hours. While other factors may contribute in minor ways (composition, hybrid healing, etc), in our tests, healing or its absence dwarfed any other cause, and its absence left us with the silky smooth gameplay from previous expansions. We don't know what is happening inside the game client or on the WoW servers or on the network in between, but this is a huge, huge issue when it comes to player enjoyment. We expect it to get significantly worse in 5.4 when healers get their legendary cloaks and as gear scales.

What You Can Do:

Repeat our tests and see if you encounter the same problem. The easiest test is to simply do two fake raid pulls on the training dummies before Jin'rokh: one without healing, and one with healing. Take 10-15 minutes from your raid week, note the time, day, and battlegroup, and see if you get the same result. The more data we gather, the more easily Blizzard can hopefully fix the problem.
100 Blood Elf Monk
17985
+1

There are significant, noticeable differences, even within this expansion. For example, last tier, when we were able to underheal, there was far less input lag during high damage phases, whereas fights like Empress were nearly unplayable.

It's far more pronounced this tier because we can't really underheal anything and have people going full blast all the time.

I feel like the rampup of smart healing this expansion is part of the problem.
90 Draenei Paladin
14545
This is purely speculation, but it appears that the lag is related to a high number of events per second being processed. So for example if you are writing events to a combat log, you could analyze this by looking at how many events per second are being generated at the time the ability lag is being experienced.
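
A rough sketch of that kind of analysis, in Python, is below. It assumes each combat log line starts with a date and a millisecond timestamp, followed by whitespace and a comma-separated event name; that line shape and the sample lines are assumptions for illustration, so adjust the pattern to whatever your log file actually contains.

```python
import re
from collections import Counter

# Assumed line shape: "8/19 20:15:32.123  SPELL_HEAL,..." (date, time, spaces, event).
LINE_RE = re.compile(r"^(\d+/\d+ \d+:\d+:\d+)\.\d+\s+([A-Z_]+),")


def events_per_second(lines):
    """Count combat log events in each one-second bucket."""
    per_second = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            per_second[match.group(1)] += 1
    return per_second


def busiest_seconds(lines, top=5):
    """Return the `top` seconds with the most events, busiest first."""
    return events_per_second(lines).most_common(top)


# Made-up sample lines; a real log would be read from a file instead.
sample = [
    "8/19 20:15:32.101  SPELL_HEAL,0x01,...",
    "8/19 20:15:32.450  SPELL_PERIODIC_HEAL,0x02,...",
    "8/19 20:15:33.020  SPELL_DAMAGE,0x03,...",
]
print(busiest_seconds(sample))
```

Passing an open file object instead of `sample` would work the same way, and comparing the busiest seconds against the moments the ability lag was felt would show whether the two line up.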

For most of us, this lag got significantly worse in 5.2, so one of our theories is that it has to do with the six new values (resource tracking like HP, mana, etc.) that were added to every combat log event in 5.2. If these values really do come from the server and are computed to be in sync with each particular event, then this would be significantly more data that would have to be pushed from the server to the clients.
Edited by Kihra on 8/19/2013 1:45 PM PDT
100 Tauren Warrior
21385
Really hope the Blues see this. I have 28 ms on Shattered Hand, and raid with about 40-60 FPS in a 25 setting, yet during raids my input lag, with or without mods, can be severe. At its worst, it feels like I am playing at above 200 ms. Many people report the worst lag is at the start of the fight when the combat log gets flooded. This is especially noticeable on a fight like Lei Shen.

We had issues with this even on PTR. Malkorok is a huge issue because of the shields: the absorbs spam the combat log, and there is no way to disable this parsing. Similarly, on live this happens on nearly any fight we do.

I have not run damage meters all expansion as the parsing from these mods also impacts performance, even on high-end gaming machines. This is not nearly as noticeable in 10 mans.
Edited by Arij on 8/19/2013 1:52 PM PDT
90 Human Monk
10260
+1

There are significant, noticeable differences, even within this expansion. For example, last tier, when we were able to underheal, there was far less input lag during high damage phases, whereas fights like Empress were nearly unplayable.

It's far more pronounced this tier because we can't really underheal anything and have people going full blast all the time.

I feel like the rampup of smart healing this expansion is part of the problem.


+1

As a programmer, it's my opinion that smart-heals seem to be the culprit here. Presumably, the game on the server-side uses some synchronization mechanism (synchronizing access to a resource has a significant performance cost) to minimize overhealing from smart heals. As a result, each healing event has to execute sequentially, causing performance issues when the number of healing events increases, as it does in a 25 man setting. The biggest offenders that come to mind are Tranquility and Divine Hymn.

I'm afraid this might be an engine design or architecture issue that can't be fixed by simply upgrading server hardware.
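
To illustrate the ordering dependency I'm guessing at, here is a toy Python sketch. It is not how the server actually works (none of us know that); the `smart_heal` function, the health numbers, and the "most injured first" rule are all assumptions for illustration.

```python
def smart_heal(health, max_health, amount, targets):
    """Heal the `targets` most injured players, based on current health."""
    by_deficit = sorted(
        health, key=lambda name: max_health - health[name], reverse=True
    )
    chosen = by_deficit[:targets]
    for name in chosen:
        # Cap at max health; anything above the cap would be overhealing.
        health[name] = min(max_health, health[name] + amount)
    return chosen


# Hypothetical raid members and health values out of 100.
raid = {"tank": 40, "mage": 90, "rogue": 60}

# The second heal only picks the rogue because it sees the tank already healed.
first = smart_heal(raid, 100, 50, targets=1)
second = smart_heal(raid, 100, 50, targets=1)
print(first, second, raid)
```

If both heals picked their target from the same stale snapshot, both would land on the tank and the second would be mostly overhealing. Avoiding that means each heal has to wait for the previous one to finish, which is the sequential execution described above.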

@OP: Can you repeat the "all healers (No RPPM)" test, except this time have every healer only cast their spec's single-target "Greater Heal" equivalent?
Edited by Snapmonk on 8/19/2013 2:02 PM PDT
100 Blood Elf Monk
17985
I mean, if you really think about smart healing: as a monk, during any given "full blast" phase I am healing 15+ people every second with RM alone. Add in individual Uplift heals on every target, and add in Chi Torpedo on all 25 people.

For high damage periods specifically where all healers are going HAM, you can essentially view it as each individual healer (average 6 or so in a 25m raid) healing 25 people every second-few seconds, each of those heals being an individual combat log event.

Pepper that with events like ELW \ IH \ everything else that players apply + the healer cloak next patch, you have the combat log essentially being destroyed by events being registered every second. That is a fairly large load for the servers to handle, especially when you start adding up the likely hundreds of guilds raiding at peak hours.
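
Putting rough numbers on that, as a back-of-the-envelope sketch in Python. The healer and target counts come from the estimate above; the assumption that every one of the 25 clients receives every event is mine and may not match how the servers actually distribute combat log data.

```python
def heal_events_per_second(healers, targets_per_heal, seconds_between_heals):
    """Heal events generated per second across all healers."""
    return healers * targets_per_heal / seconds_between_heals


def deliveries_per_second(events_per_second, clients):
    """Event deliveries per second, assuming every client receives every event."""
    return events_per_second * clients


# 6 healers each hitting 25 people once per second.
events = heal_events_per_second(healers=6, targets_per_heal=25, seconds_between_heals=1)
print(events, deliveries_per_second(events, clients=25))
```

Under those assumptions, healing alone generates 150 events per second, or 3750 deliveries per second for a single raid, before counting damage, buffs, or procs. At one heal every three seconds it is still 50 events per second.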
90 Worgen Hunter
12325
I suspect it's a combination of smart heals and possibly of the combat log changes (maybe the amount of data, but also possibly that the game simply has to collect more data into the combat log events themselves, too).

We may test a few more variations this week, so if there are other ideas, we'd love to hear them. Of course, other guilds taking a few minutes to test them out helps, too :)

Historically this reminds me of Ultraxion and Ysera's buff, before they fixed it; that lag felt similar (though I played a healer at the time and wasn't watching GCDs the same). It may be worse on Megaera because of the head's buff that spreads healing (which, iirc, isn't tweaked like Ysera's to not spam every event).
90 Blood Elf Monk
13845
Question: has anyone tried monkeying around in the code to turn off the combat log and see if that fixes it? I'm not sure if it's even possible, and it wouldn't be ideal since WoL is pretty much invaluable during progression, but when going for a kill it might be worth it if there was no input lag.
100 Tauren Warrior
21385
I don't think you can simply turn off combat log parsing, other than controlling whether it writes to a .txt or not.
Edited by Arij on 8/19/2013 2:15 PM PDT
100 Blood Elf Monk
17985
Even if you were to "turn off the combat log" - the events still have to register to the server.
90 Undead Monk
13535
All I know is that in T13 and T12, I could use loggerhead (combatlog addon) easily with no lag. T14 was pretty much the same too, some fights would be a little bad, but nothing unplayable.

In T15, I've noticed that it's a lot worse. On some fights where I don't lag at all, the lag becomes pretty noticeable once I turn LH on. So I don't even bother with it anymore.
90 Night Elf Druid
16870
+1

During times of high input lag, as a Resto Druid my playstyle changes completely. All of the timers and short-cooldown abilities that I cut super-close can no longer be fully optimized, and my ability to spam 1-second GCD HoTs such as Rejuvenation is diminished significantly. This has become very noticeable in 5.2 and I really hope the problem finds an appropriate remedy soon.
100 Gnome Mage
20080
Does this really old command do anything? I've always had it macroed into my AI, and I'm not sure if it actually helps at all because I still get some input lag during Megaera's rampage and elsewhere. Haven't raided without it though. I agree, input lag is a top priority issue that needs tackling.

/run COMBATLOG:UnregisterEvent("COMBAT_LOG_EVENT")
Edited by Digerati on 8/19/2013 2:46 PM PDT
90 Pandaren Priest
14930
08/19/2013 02:00 PM Posted by Valen
I suspect it's a combination of smart heals and possibly of the combat log changes (maybe the amount of data, but also possibly that the game simply has to collect more data into the combat log events themselves, too).


It's been a long, long time, but...I could swear CoH was smart during BC, at least towards the tail end of it. And I do not recall it causing input lag like this, even with 3-4 Priests spamming the crap out of it. Wasn't Chain Heal also smart?
100 Tauren Warrior
21385
Megaera is by far the worst for me this tier in terms of the lag I get. It's not as bad on any other fight for me, but exceptionally noticeable there. Especially during rampage.
90 Pandaren Shaman
5330
08/19/2013 02:51 PM Posted by Tiriel
I suspect it's a combination of smart heals and possibly of the combat log changes (maybe the amount of data, but also possibly that the game simply has to collect more data into the combat log events themselves, too).


It's been a long, long time, but...I could swear CoH was smart during BC, at least towards the tail end of it. And I do not recall it causing input lag like this, even with 3-4 Priests spamming the crap out of it. Wasn't Chain Heal also smart?


There's chain heal smart, and then there's 25 people in healing rain, affected by our mastery and with earthliving, then popping Ancestral Guidance and copying all of it again.
100 Human Paladin
10625
+1. I really hope this gets some attention since there have been dozens of threads on this topic - and I was beginning to think input lag was just something we had to put up with.

GC tweeted a long time ago that one of wow's greatest strengths was its responsiveness - I agree and 25H has not been very responsive since 5.2 =/ Nice testing and details there; I really really hope this is helpful to any blues who read this thread.

Historically this reminds me of Ultraxion and Ysera's buff, before they fixed it; that lag felt similar (though I played a healer at the time and wasn't watching GCDs the same). It may be worse on Megaera because of the head's buff that spreads healing (which, iirc, isn't tweaked like Ysera's to not spam every event).


Agree, it reminds me a ton of both Ultrax and Ysera on Madness; felt the same.

Thanks for all the details Valen, really glad you guys took the time to collect all this info.
100 Blood Elf Monk
17985
Smart healing in BC was EXTREMELY limited compared to what it is now. It wasn't even this bad in Cata \ WOTLK.

It ramped up SIGNIFICANTLY this expansion.
100 Orc Hunter
19570
I thought I was the only one that experienced this and just kind of grew to accept it. This sounds exactly like what I've experienced. For example, I'll press a cooldown like Glaive Toss, the GCD for it completes, and I press my next button, like Kill Command. The GCD for Kill Command will go about halfway through, and only then will the cooldown countdown begin on my Glaive Toss.

Great job for looking into this and trying to determine the cause of the problem
90 Pandaren Priest
14930
There's chain heal smart, and then there's 25 people in healing rain, affected by our mastery and with earthliving, then popping Ancestral Guidance and copying all of it again.


Yesss...although Healing Rain isn't smart. But someone else mentioned Ysera's buff back in Cata, and I do remember that at the beginning, it was like healing through jello when someone would get that buff. So maybe it's less "smart healing" and simply "there's too much healing going on in general."

Still think that the most obvious culprit is the changes to the combat log that they implemented in 5.2.

Smart healing in BC was EXTREMELY limited compared to what it is now. It wasn't even this bad in Cata \ WOTLK.

It ramped up SIGNIFICANTLY this expansion.


True. But the OP had said that they tried several different combinations of healers vs. DPS. I may have misread, did they not try "very few healers" vs. "all healers"? It just seems like it's healing in general and not necessarily smart healing.