Writing on Monday Nov 26, but this actually happened a week ago on Tuesday Nov 20.
Many things happened at work today, but the most remarkable was probably that we had a “bad response time” event (which had actually started on Saturday night). Many people looked at it and fussed over it; Bruce’s observation was that the problem lay between the front end and the outside world, since Gladiator showed no change in back-end or total round-trip time. This was confirmed by the Queryint logs, which showed back-end performance getting much better, not worse, starting Friday afternoon. My only guess was that it was a problem with one of our ISPs, and I asked whether we could try switching to the other ISP, at least in New York. This was not done, and no real progress was made until after 5PM, when Bills called an emergency meeting with me and Brett.
During the meeting we found out that, according to Keynote, Sunnyvale response times were way higher than New York’s, and that the 4-sec “average” response was not typical; it was more like 1 sec most of the time, but with some 25 and 40 sec responses mixed in. Since the problem was pretty clearly in Sunnyvale and not in New York, we could place more load on New York to take advantage of the better response there, at least up to the point where times in NY would degrade. Shifting the split that way immediately improved the user experience. The netops team then found a problem with one netswitch in NY; they swapped its management module and everything got better. I waited an hour to confirm and then set the split back to normal.
All of this lasted until about 9:15, but I actually got to leave the office at 6:30 and do the rest from home. This was good, because C had made beef stew and bread sticks (yum), which we all ate greedily.