Recently the company I work for decided to move one of our websites from Flash to HTML5 with an MVC backend. Since there would be a lot of AJAX requests, with the server doing some heavy processing, we needed a good implementation of an AJAX handler for passing data from the server to the browser. We could also see a use for persistent connections to the server in the future, to add technical support and communication between users; basically a hodgepodge of features that may or may not be added someday.
The easiest thing to do was add something like jQuery and use its AJAX calls. There were other choices too, including building our own. After some research I decided on SignalR for a variety of reasons, mainly its robustness, its large community, and its ability to handle different browsers.
After building the site we published a test version using our Rackspace account. So far so good: there was a slight uptick in CC (compute cycles) but nothing unexpected. The only slight problem was that the system kept losing connections, which was easily solved by adding a reconnection function to the original connect. Poor internet connections and intermittent connectivity would cause this, so it was mostly expected. Things were going well, so we launched the full site.
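For readers wondering what a reconnection function can look like, here is a minimal TypeScript sketch. It is not the code we ran; it only illustrates one common way to retry, waiting longer after each failure and giving up after a fixed number of attempts. The `Startable` interface, the option names, and both function names are made up for this example and are not part of SignalR.

```typescript
// Hypothetical shape of "something that can be (re)started".
// This is an illustration, not SignalR's real API surface.
interface Startable {
  start(): Promise<void>;
}

interface BackoffOptions {
  baseMs: number;      // delay before the first retry
  maxMs: number;       // upper bound on any single delay
  maxAttempts: number; // give up after this many retries
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, opts: BackoffOptions): number {
  return Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Try to restart the connection, waiting longer after each failure.
// Resolves true on success, false once maxAttempts is exhausted.
async function reconnectWithBackoff(
  conn: Startable,
  opts: BackoffOptions,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<boolean> {
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    await wait(backoffDelay(attempt, opts));
    try {
      await conn.start();
      return true;
    } catch {
      // Ignore the failure and fall through to the next, longer wait.
    }
  }
  return false;
}
```

With the SignalR 2.x jQuery client, something like this would presumably be called from the `$.connection.hub.disconnected` handler, with `start` wrapping `$.connection.hub.start()`. Capping both the delay and the number of attempts means a client that cannot get back in eventually stops trying instead of retrying forever.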
After a couple of weeks we received the first bill, which was about double our normal one. OK, time to start investigating. Looking at the CC, it was obvious that usage had started to ramp up from the day we first launched the site. We were expecting a jump, but not like this: we went from under one thousand CC a day to more than five thousand per day, and it was still climbing. It seems to have peaked at eighteen thousand a day. Note that this was aggravated by the fact that we could not see the CC usage until about three days after it occurred.
After we spent some time with the helpful people at Rackspace, it was determined that the problem was the reconnection time in SignalR. The connection time was sometimes slow, but the reconnection was running for up to seven minutes. This was just from a sampling of log records; it is possible that in some cases it ran longer. We were unable to reproduce this on test systems, where connection and reconnection times were normally less than a second, but something had to be done.
So we yanked SignalR, replaced all the calls with standard jQuery AJAX calls, and uploaded the new site. Mind you, it would still be days before we could see a difference, and the log files were still showing SignalR requests, but at least they were returning 404 now. We assume that would end as the browsers refreshed their caches.
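To show the shape of what replaced the persistent connection, here is a rough TypeScript sketch of a one-shot JSON request: the browser asks, the server answers, and nothing stays open in between. We used jQuery's AJAX calls on the real site; this sketch swaps in a generic fetch-style function so that it stands on its own. `HttpFn`, `buildJsonInit`, and `postJson` are names invented for the example.

```typescript
// Minimal fetch-like types so the sketch is self-contained.
// In a browser, the global `fetch` should satisfy HttpFn.
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

interface HttpInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

type HttpFn = (url: string, init: HttpInit) => Promise<HttpResponse>;

// Build the options for a JSON POST. Kept separate so it is easy to inspect.
function buildJsonInit(payload: unknown): HttpInit {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Send one request and return the parsed reply.
// Each call is independent; there is no connection to keep alive or restore.
async function postJson(
  http: HttpFn,
  url: string,
  payload: unknown,
): Promise<unknown> {
  const response = await http(url, buildJsonInit(payload));
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with status ${response.status}`);
  }
  return response.json();
}
```

The trade-off is the one we ran into: plain requests have no reconnection logic to misbehave, but the server can no longer push anything to the browser on its own.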
After a week our CC rate is down to the six hundred range and the site is still working well. We will have to either drop some of our future dream features or find another way of implementing them.
In conclusion, I don't know if the problem was SignalR, Rackspace hosting, or something we did. I don't think it was something done on our part per se, because of how few changes it took to fix it: basically the removal of SignalR. In our humble opinion SignalR and Rackspace hosting are not compatible, but I do think there is a deeper problem with SignalR's connection/reconnection function. I would love to dig into this deeper, but we can't simulate a large load and the Rackspace environment here, so it will just have to stand as it is.