Network Issue Announcement

Andy R

I just wanted to post an update about reports of super slow page loads at our site. I was able to speak with some members and get some trace route info that helped identify the problem. The server admins looked into the results and they found an issue between our servers and some members. The server admins then contacted the upstream providers and they responded saying "These issues are occurring because the connection between our networks is full."

So here is a brief overview of what is happening. The network the servers use relies on a system that sends information back and forth over the shortest path to try to speed things up, like taking the shortest road to work. The problem is that sometimes there is the equivalent of an online traffic jam, and even though the packets are trying to take the shortest path, they cannot get through. So they sit there waiting, and the pages load really slowly.

I am looking at options to change to a different network that uses route optimization to find the fastest path (not always the shortest) to speed things up. This is more like watching the news, finding out where traffic is really bad, and picking a new route to work. It might be longer, but there is less traffic and you get there faster.
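
To make the road analogy concrete, here is a minimal sketch of the difference between picking the route with the fewest hops and picking the route with the lowest total delay. The network, node names, and latency figures are all made up for illustration, and real route optimization services are far more involved than this.

```python
import heapq

def best_path(graph, start, goal, cost):
    """Dijkstra's algorithm. `cost` turns a link's latency into the weight to minimize."""
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == goal:
            return total, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, latency_ms in graph[node].items():
            if neighbor not in seen:
                heapq.heappush(queue, (total + cost(latency_ms), neighbor, path + [neighbor]))
    return None

# Hypothetical network: link latencies in milliseconds (invented numbers).
network = {
    "member": {"A": 20, "B": 15},
    "A": {"server": 400},  # short route, but the last link is congested
    "B": {"C": 25},
    "C": {"D": 30},
    "D": {"server": 20},
    "server": {},
}

# "Shortest road": every link counts as 1, so congestion is ignored.
fewest_hops = best_path(network, "member", "server", cost=lambda ms: 1)
# "Fastest road": minimize the total latency instead.
lowest_latency = best_path(network, "member", "server", cost=lambda ms: ms)

print(fewest_hops)
print(lowest_latency)
```

In this toy network the two-hop route goes through the congested link, while the four-hop route adds up to far less delay.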

In the short term there is nothing I can do. I apologize for the issues that this backbone provider is causing. Please rest assured I am looking at solutions and hopefully we can get upgraded to a higher quality network as soon as possible.

If you would like to contact us and let us know you are having problems please use the contact us link at the bottom of the page. We can then run some tests so we have more information to understand where the bottlenecks are occurring.

Thanks for your patience,

Andy R
 
Our ISP is Hawaiian Telcom, which used to be Verizon until bought out by Carlyle a year or two back. So HT probably rents their DSL bandwidth from Verizon and uses whatever fiber they can get between here and the Mainland.

Here's what's become my typical morning routine since the software change:
- Bring up http://early-retirement.org/forums/index.php around 5-6 AM HST to find 30-60 threads with new posts.
- Start clicking through latest posts in each thread. Screen takes a minimum of 2-5 seconds to redraw between viewing posts and going back to "New Posts" list.
- Answer a post and click on "New Posts" link again. Screen takes 5-10 seconds to redraw when putting up my post and again when regenerating list of new posts.
- After a few iterations of the previous steps, board slows down even more-- enabling the playing of a game of Windows Solitaire between screen refreshes.
- At this point, at least twice a week IE6.0 returns "Cannot find server or DNS error" screen.
- Give up in disgust and head over to Early Retirement Status Forum :: Index. Note that the board (using PHPBB) redraws immediately and links take fractions of a second to come up. Same behavior observed at Raddr's board (Raddr's Early Retirement and Financial Strategy Board :: Index).

Andy, in the last six months you've changed software and servers and now you're working the path issue. However my crappy board experience began when the new software was implemented, I can't really see any improvement in servers, and there aren't too many paths to get packets to Hawaii.

Here's another question that wasn't an issue under the previous management-- how's the speed of the board affected by the ad server?

I wonder how many new posts per day there were under the old software version compared to the new software version. I wonder how many users are simultaneously logged in on the new board compared to the old board. I wonder how overall traffic has changed since the new software and since the new servers. There are certainly fewer of my posts per day since the new software, and I'm certainly spending less time here than I used to. Notice that you don't see CFB or Brewer or REW around here as often, either?

This board has a lot of history and a couple dozen veteran posters. However we've also kept accounts at the other boards, too, and it's not too difficult to recreate the coffee-shop atmosphere at those boards. At some point the veterans are going to get tired of waiting for this situation to improve and they're going to wander off to the boards that have a faster response.
 
Our ISP is Hawaiian Telcom, which used to be Verizon until bought out by Carlyle a year or two back. So HT probably rents their DSL bandwidth from Verizon and uses whatever fiber they can get between here and the Mainland.

I can't really see any improvement in servers, and there aren't too many paths to get packets to Hawaii.
I would be happy to work with you to get some trace route information (it will give us the time it takes for packets to travel from router to router and tell us if there are any issues with the connection). If you would like to help me out and go through this, please send me an email and I will let you know the commands to run to get that info so I can give it to the server admins.
how's the speed of the board affected by the ad server?
It should not be affected very much. The ads come from Google and should only add the same amount of time to render as they do on any other site with Google ads.
I wonder how many new posts per day there were under the old software version compared to the new software version.
Attached is a PDF showing posts per month since the site's inception.
I wonder how many users are simultaneously logged in on the new board compared to the old board.
I have also attached a pdf that shows the number of members that visit each day.
At some point the veterans are going to get tired of waiting for this situation to improve and they're going to wander off to the boards that have a faster response.
I understand that you all have options on where you post and want the site to run as fast as possible. As I said "I am looking at solutions and hopefully we can get upgraded to a higher quality network as soon as possible."

If you would like to contact us and let us know you are having problems please use the contact us link at the bottom of the page. We can then run some tests so we have more information to understand where the bottlenecks are occurring.

I am trying to gather as much information as I can about the network issue, so it would be helpful if you would use the contact us form. That will help me understand to what extent the network issue is affecting the user experience.
 

Attachments

  • users per day.pdf
    29 KB · Views: 23
  • posts by month.pdf
    30.8 KB · Views: 22
Nords,

I asked a server admin I know who lives in Hawaii about the issue. Here is the email I sent:
Sent: Friday, August 17, 2007 12:05 PM
Subject: Trace Route from Hawaii
Chris,

I have a member who is experiencing very slow page loads in Hawaii. I was wondering if you could run a trace route for me to early-retirement.org and paste the results for me to analyze?

Also, do you know what kind of connections are used for Hawaii? Is it underwater cable or satellites that connect you all to the mainland? Are there many backbones or just a few?

Thanks for your help.

Andy
His response:
Hi Andy -
Couple of quick comments:
First - Hawaii is one of the "best" connected locations in the world, as it is the mid point fiber connection between Asia and North America. So bandwidth is generally not the issue (specifically).
Second - not all of HI is created equal. It depends very much on WHO you are trying to connect to. Traceroutes do not work the same both ways - i.e., for me to trace to early-retirement.org will give you my route, based on my connection here. The better question is, what ISP does the user use? Is it Hawaiian Tel? Or Oceanic cable... or one of the universities? Leased line? They all have different ISP connections, which govern the routing and possibly the pipes they can traverse.
For my connection (commercial) I get 150ms latency to that domain - which is quick.
Chris
I replied with:
Chris,

Thanks for the info. Here is what the member posted about his ISP
Our ISP is Hawaiian Telcom, which used to be Verizon until bought out by Carlyle a year or two back. So HT probably rents their DSL bandwidth from Verizon and uses whatever fiber they can get between here and the Mainland.

Any comments on that network or thoughts about what could be causing this?

Andy
and he responded with:
What I can tell you is - latency from Hawaiian Tel to that domain is *at least* 500ms, which gives you an idea of how much worse they are compared to other providers.
They are notoriously bad at the moment. They do not even host their own email (could not handle load) and have had much trouble with both retail and enterprise customers. Interestingly - their DNS servers were up and down this morning - and I spent some time working out solutions for my customers.
Chris

Nords, can we please touch base via email so that I can get your trace route info to pass to the server admins? I am trying to understand this issue as well as I can, and it's very likely that your ISP connects to the backbone provider where the traffic jam is happening. A trace route will give me the data I need to see if this theory is true.
 
Man, I can't believe the number of posts per day in the past year. No wonder I don't try to keep up anymore.

Andy, for me sometimes it is fast (today for example) and other times very slow.
 
What I can tell you is - latency from Hawaiian Tel to that domain is *at least* 500ms, which gives you an idea of how much worse they are compared to other providers.
They are notoriously bad at the moment. They do not even host their own email (could not handle load) and have had much trouble with both retail and enterprise customers. Interestingly - their DNS servers were up and down this morning - and I spent some time working out solutions for my customers.
Chris
I e-mailed a tracert run to your SocialKnowledge.net address... about 1165 msec over 15 hops. Let me know if I need to try other parameters or times.

HT's DNS server problem today was statewide for at least 30 minutes (20 minutes of which I spent in the hold queue), and I know what Chris means about the commercial priority over us "regular" customers. HT is not exactly a shining example of a company being taken over by private equity. Oceanic's RoadRunner cable on our street floods out during rainstorms and I gave up on them after a couple years of attempts to get it fixed. If he has any ideas on where to get shorter latency for $31.20/month I'd love to use it!

EDIT: I got your e-mails, thanks for the server IP addresses, and I'll keep an eye on running tracert & speed tests when E-R.org seems slow from here.
 
Link to interesting article:

Slashdot | How Much Are Ad Servers Slowing the Web?


"Most of the times I have a problem with a Web page loading slow or freezing temporarily, I look down at the status bar and see that it's waiting on an ad server, Google Analytics, or the like. It seems to me that on popular Web sites the bottleneck is overwhelmingly on the ad servers now and not on the servers of the site itself. In my opinion we need a better model for serving ads — or else these services need to add more servers/bandwidth. Are there any studies on the delay that 3rd-party ad servers are introducing, or any new models that are being introduced to serve ads?"
 
Slow ad servers should produce slow ads, not slow sites. However, many sites will not draw the page until the ad is loaded. (I presume this is forced through javascript.) So in those cases the ads do adversely affect the user experience.

As far as I can tell, that is not the case here. For me the Google ads fill in after the rest of the page is drawn.
 
Just a follow up on my speed to connect. I just logged on to the site and it took 2 min 45 secs. To get to the point of actually being able to post, an additional 4 plus minutes. And this is fast compared to most days.

I sympathize with Andy as I feel he is trying to get this corrected, but I think I'm gonna take a vacation from here. Someone email me when this gets corrected. Not giving up on the site, but this is just too slow for me.
 
8/18/07-Steve: FYI I am in Central Ohio and can confirm the page switches have really gotten slow over the past 8-10 days or so. I have HS Time Warner Service so it is not inordinately slow; I just have to remember to have a bit of patience waiting for the pages to load.
 
Andy, I've run tracert from my Oahu IP address to other ER & financial discussion boards. (A poster researching early retirement would be quite likely to run across these same boards.) Although I've been on some of these boards for years with a succession of computers, this very rudimentary study of course would need many more data runs under many different times & load conditions to have any statistical significance. I ran these tracert commands on a Saturday afternoon, sequentially, one run each. I have no idea where these servers are located, how they're configured, or how they run. If it'll help your analysis then I can e-mail the DOS screen dumps, but I understand if this isn't much use to you or your server hosts.

Here are the numbers:
E-R.org: 1261, 1349 (both servers)
M*: 800+ before timeouts
Bogleheads: 1412
FIRECalc: 1217
REHP: 1429
FundAlarm: 1303
ER status board: 877
RADDR's board: 893

I ran tracert for a local board (whose servers seem to be in Los Angeles) that's also running vBulletin. (I don't view any other boards using vBulletin.) Hawaii Threads: 458.

You would think that numbers varying from 893-1429 msec would not be very noticeable-- a difference of barely more than half a second. When I go to most of those boards, I can't really tell whether there's a lag. You would also expect Hawaii Threads to be pretty snappy compared to the rest, yet it's almost as slow as E-R.org.

However there's very little lag on FIRECalc, REHP, FundAlarm, the ER status board, and RADDR's board. Even though the routes are roughly the same delay as E-R.org, all of those other boards are faster and some are much faster. The browsing routine is click, new screen, click, new screen, with no lag. The screen draws almost as fast as I click. FundAlarm is particularly snappy because Roy Weitz has been using the same barebones software since 1996. I've noticed that the PHPBB boards are pretty fast, too, particularly RADDR's & the ER status boards.

Coincidentally the slowest boards (almost as slow as E-R.org) seem to be M* and Hawaii Threads. One is clogged with animated ads and the other is the only other vBulletin board I browse.

Here's another interesting behavior on my IE6.0 browser. When I click on an E-R.org link, the status bar fills up its progress box with the little green blocks. Sometimes it's slow, sometimes they zip right into the box. No matter how fast the box fills up, once it's full the browser sits there for at least a second, then it clears the screen. Then it fills in the background color and draws a graphic or two. Then, like a C+ program on a 1980s Silicon Graphics Unix box, it finally fills in the rest. Instead of click/new screen it's click, wait on boxes, draw, draw, fill, click, wait, draw, draw, fill. I have the same behavior from M* and Hawaii Threads. Again I don't know if it's the adserver or the vBulletin bloat. It's not my computer or my graphics card because I don't have this issue with any other discussion boards or websites. I do not believe that this delay is related to my ISP's packet routing, either, because I'm seeing about the same tracert numbers everywhere.

Here's what I think, and again this is just my opinion. First, I think that E-R.org is getting slower as we get more users. I know more people have signed up and I remember that at the end of 2006 we were starting to set records for numbers of people logged in at once. It's a little difficult for me to draw conclusions from the logs you've provided but I suspect the trend is continuing. It may noticeably slow down a server when a couple dozen people are simultaneously trying to view the same thread.

Second, I think E-R.org is crippled by [-]slow, bloated[/-] feature-rich vBulletin software and an adserver. These two changes have dramatically slowed things down from the Dory days.

I think that the only way to speed up the current configuration is to host servers that are so blindingly fast (or have so much capacity) that they can even run vBulletin and handle advertising. Another option would be to scrap vBulletin in favor of something like PHPBB. A third option would be to configure the servers so that the people who see ads are on a different server than everyone else-- the people who shouldn't see ads (like me) would be on a separate server that wouldn't have to coordinate with an adserver. But these changes would only be appreciated by the users, and they might not be feasible from a sysadmin perspective.

While it's easy for me to armchair quarterback, I also appreciate that you have no incentive to change. The only people who notice that the board is slow are those of us who were around when the board was fast. The new posters may notice that things are slow from time to time, but there are many complicating factors that would cause anyone less than a sysadmin to accept the status quo. It's only us veteran posters who find things unacceptably slow all the time. The incentive problem is that us veterans would rarely, if ever, click on an ad while the new users are much more likely to click on ads. There's no incentive to make the veterans happy because they don't generate any revenue.

I like vBulletin's "Ignore Poster" feature but I'm ready to give it up for more responsive software. I've always felt that the vBulletin change was for SocialKnowledge's benefits of standardization & board administration and not for the users. The users have had to deal with a lot of learning and conversion problems and frankly from a user's perspective the conversion hasn't been worth the results. If this was Microsoft Vista then I'd be going back to XP, asking for a refund, and looking into Linux.

I've never had a real job, but if I was maximizing a discussion board's revenue model then I'd want many new users clicking on ad links and I wouldn't care about the turnover. I'd actually prefer that the veterans move along-- they wouldn't generate revenue and their attempts to use the board would just slow down the servers that should be supporting the new ad-clicking users. Of course the board would eventually become filled with new users who have no sense of what's been previously discussed, who don't bother to read the archives, and who keep repeating the same topics ad nauseam. That might also encourage the veterans to move on in search of fresh content (problem solved!). The veterans would feel that their nice quiet bar has expanded into a noisy, chaotic nightclub... but you & I have already had this discussion.

Again it's just my opinion, and among the hordes of new users my opinion is definitely in the minority. While your routing work may help some users (particularly AOL), I think you're just tweaking the margins without solving the root issues.

But I'd be interested to hear what the board's other veteran posters think.
 
The board is fast for me. Latency is low from Seattle, and rendering is quick for both IE and Firefox.

Of course, sometimes the board is unreachable and sometimes I see 500 errors, etc. But I assume that's when you're having various combinations of IT problems or high loads due to the shared server....
 
Andy, I've run tracert from my Oahu IP address to other ER & financial discussion boards... I have no idea where these servers are located, how they're configured, or how they run. If it'll help your analysis then I can e-mail the DOS screen dumps, but I understand if this isn't much use to you or your server hosts.
Thanks for posting those numbers. As you can see, E-R.org is up there at the high end of the latency scale, which I hope we can improve by moving to a route optimized network. I should get some bids on this next week and will then understand the work needed to change over. Unfortunately (or fortunately) I am heading back to the States a week from Monday and will not be available until after Labor Day to schedule a network move. If I can figure out the logistics and get it taken care of earlier I will.

Here is a list of objects downloaded when you go to the /forum/ listing page:

Page Objects

QTY SIZE# TYPE URL
1 42345 SCRIPT http://www.early-retirement.org/forums/clientscript/vbulletin_global.js?v=367
1 16986 HTML http://www.early-retirement.org/forums/
1 16927 SCRIPT http://www.early-retirement.org/forums/clientscript/vbulletin_menu.js?v=367
1 9661 SCRIPT http://www.early-retirement.org/forums/clientscript/vbulletin_md5.js?v=367
1 6795 SCRIPT http://www.early-retirement.org/forums/clientscript/vbulletin_read_marker.js?v=367
1 6232 SCRIPT http://www.google-analytics.com/urchin.js
1 5303 IMG http://www.early-retirement.org/sk/forums/images/misc/vbulletin3_logo_white.gif
4 4751 SCRIPT http://pagead2.googlesyndication.com/pagead/show_ads.js
1 3495 IMG http://www.early-retirement.org/images/erlogo.gif
1 1683 IMG http://www.early-retirement.org/sk/forums/images/statusicon/forum_old_lock.gif
1 1649 IMG http://www.socialknowledge.net/images/community_logo_badge.gif
12 1628 IMG http://www.early-retirement.org/sk/forums/images/statusicon/forum_old.gif
1 1623 IMG http://www.early-retirement.org/sk/forums/images/statusicon/forum_new.gif
1 1461 IMG http://www.early-retirement.org/sk/forums/images/misc/stats.gif
2 1440 IMG http://www.early-retirement.org/sk/forums/images/misc/whos_online.gif
1 1004 IMG http://www.early-retirement.org/sk/forums/images/misc/navbits_start.gif
11 964 IMG http://www.early-retirement.org/sk/forums/images/buttons/lastpost.gif
1 642 IMG http://www.early-retirement.org/sk/forums/images/buttons/collapse_tcat.gif
3 580 IMG http://www.early-retirement.org/sk/forums/images/buttons/collapse_thead.gif
9 43 IMG http://www.early-retirement.org/sk/clear.gif
9 43 SCRIPT http://www.early-retirement.org/sk/clear.gif

As you can see, it's JavaScript that is the culprit on vBulletin pages. I will look for a way to consolidate all the JavaScript files into fewer files and compress them so they will be a fraction of the size. I will see if I can find someone to work on this project for me ASAP. Do we have any JavaScript gurus out there?
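
For anyone who wants to check the split, here is a rough tally of the object list above by type. It is only a sketch: the rows are copied from the list, but it assumes the SIZE column is in bytes and that each distinct URL is downloaded once per page view regardless of QTY, which the list itself does not say.

```python
from collections import defaultdict

# (qty, size, type) rows copied from the object list above.
ROWS = [
    (1, 42345, "SCRIPT"), (1, 16986, "HTML"), (1, 16927, "SCRIPT"),
    (1, 9661, "SCRIPT"), (1, 6795, "SCRIPT"), (1, 6232, "SCRIPT"),
    (1, 5303, "IMG"), (4, 4751, "SCRIPT"), (1, 3495, "IMG"),
    (1, 1683, "IMG"), (1, 1649, "IMG"), (12, 1628, "IMG"),
    (1, 1623, "IMG"), (1, 1461, "IMG"), (2, 1440, "IMG"),
    (1, 1004, "IMG"), (11, 964, "IMG"), (1, 642, "IMG"),
    (3, 580, "IMG"), (9, 43, "IMG"), (9, 43, "SCRIPT"),
]

def bytes_by_type(rows):
    """Sum the SIZE column per TYPE, counting each listed URL once."""
    totals = defaultdict(int)
    for _qty, size, kind in rows:
        totals[kind] += size  # assumption: QTY repeats are served from cache
    return dict(totals)

totals = bytes_by_type(ROWS)
page_bytes = sum(totals.values())
script_share = totals["SCRIPT"] / page_bytes
print(f"{script_share:.0%} of {page_bytes} bytes is script")
```

Under those assumptions scripts come to about 69% of the bytes listed. Counting repeated objects separately would change the ratio.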

As for not having an incentive to make things run as fast as possible, that is incorrect. I want the site to run as fast as possible and have more of an incentive than anyone. I don't want our long time members like yourself to experience the issues you are having and I am working to make things as fast as possible.
 
Andy, here are some tips on improving rendering times with multiple images and scripts:

High Performance Web Sites: Rule 6 – Move Scripts to the Bottom (Yahoo! Developer Network blog)
Thank you for posting that, I will include this in our work next week.

I have just emailed a programmer about doing this to compress the JavaScript of vBulletin. I will also keep moving forward on moving to a better network.
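
As a rough illustration of why compression can shrink script files so much, here is a small sketch using gzip. It assumes "compress" here means gzip transfer compression rather than minification, and the sample text is a made-up stand-in for a real script file, so the actual savings on the vBulletin files will differ.

```python
import gzip

def gzip_savings(text):
    """Return (raw_size, gzipped_size) in bytes for a piece of text."""
    raw = text.encode("utf-8")
    packed = gzip.compress(raw)
    return len(raw), len(packed)

# Made-up stand-in for a JavaScript file; repetitive code compresses well.
sample_js = (
    "function toggle(id) { var el = document.getElementById(id); "
    "el.style.display = 'none'; }\n"
) * 200

raw_size, gzip_size = gzip_savings(sample_js)
print(f"raw: {raw_size} bytes, gzipped: {gzip_size} bytes")
```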

A short term solution could be the use of Google's free Web Accelerator which should cache the common files and not request them on every page load.
 
Andy, your javascript files are being served as MIME type application/octet-stream instead of text/javascript or application/x-javascript. I don't know if this is causing a problem, but it's not the correct MIME type.

Also, I had some weird delays when trying to download them directly.

Code:
GET /forums/clientscript/vbulletin_global.js?v=367 HTTP/1.1
Host: www.early-retirement.org
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.early-retirement.org/forums/f32/network-issue-announcement-29541-2.html
Cookie: [cookies omitted]

HTTP/1.x 200 OK
Content-Type: application/octet-stream
Accept-Ranges: bytes
Content-Length: 42345
Date: Sun, 19 Aug 2007 22:17:30 GMT
Server: Apache 1.99-beta

Apache? I thought you were using lighttpd.
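
The same check can be scripted. This is only a sketch: it parses the response headers pasted above and flags two things, whether the Content-Type is one of the usual JavaScript MIME types and whether any Content-Encoding (such as gzip) came back. The function names are my own, and application/javascript is added to the list as another commonly used type.

```python
JS_MIME_TYPES = {"text/javascript", "application/javascript", "application/x-javascript"}

def parse_headers(raw):
    """Parse 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:  # skip the status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

def check_script_response(raw):
    """Report whether a script response has a JavaScript MIME type and is compressed."""
    headers = parse_headers(raw)
    mime = headers.get("content-type", "").split(";")[0].strip().lower()
    return {
        "mime_ok": mime in JS_MIME_TYPES,
        "compressed": "content-encoding" in headers,
    }

# Response headers copied from the capture above.
RESPONSE = """HTTP/1.x 200 OK
Content-Type: application/octet-stream
Accept-Ranges: bytes
Content-Length: 42345
Date: Sun, 19 Aug 2007 22:17:30 GMT
Server: Apache 1.99-beta"""

report = check_script_response(RESPONSE)
print(report)
```

For the capture above both checks come back false: the type is application/octet-stream, and there is no Content-Encoding header even though the request offered gzip.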
 
Andy, your javascript files are being served as MIME type application/octet-stream instead of text/javascript or application/x-javascript. I don't know if this is causing a problem, but it's not the correct MIME type.
Thanks for the feedback, I will look into this.
Apache? I thought you were using lighttpd.
The web server is indeed lighttpd. That is shown just to throw off people snooping around.
 
I got an IP to the same data center over another network. I have run trace route from down here and it seems to be quite a bit faster (many fewer hops).

It would be great if you all could help me test this other route optimized network vs the one we are on now. In order to do this you will need to run Trace Route.


I was given an IP on another network to check out. I have tested it a handful of times from down here in S America and also from a desktop that I can remote into in Dallas. Both of the trace routes showed much better results for me at this time. It would be very much appreciated if you could run tracert a few times over the next few days for both of the following and let me know your results.

I will need info on our current network which you can get by running this:

tracert early-retirement.org

and then the trace route on one of the proposed new networks:

tracert 208.78.43.1

I VERY much appreciate your feedback. Also, please note I am going to work on getting the JavaScript (which makes up 3/4 of the page load) optimized this week and loading at the end of the page. I do sympathize with you all, and please rest assured it is not my intent to run anyone off with slow page loads. It's been a crazy two months for me, but I'll keep pressing forward getting things as fast as possible. Thanks for your patience during all this.
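
For anyone comparing the two tracert runs above, here is a rough sketch of a script that pulls the hop count and the final hop's average round-trip time out of saved output. It assumes the usual Windows tracert layout (hop number, three timings, then the host), and the sample output is invented using placeholder addresses, not a real trace to either network.

```python
import re

# One hop line: hop number, three timings ("12 ms", "<1 ms" or "*"), then the host.
HOP_LINE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+ ms|\*)\s+){3})(.+)$")
TIME = re.compile(r"<?(\d+) ms")

def summarize_tracert(output):
    """Return the hop count and the average RTT of the last hop that answered."""
    hops = []
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue
        times = [int(ms) for ms in TIME.findall(match.group(2))]
        hops.append((int(match.group(1)), times, match.group(3).strip()))
    answered = [hop for hop in hops if hop[1]]
    last = answered[-1] if answered else None
    return {
        "hop_count": len(hops),
        "final_avg_ms": sum(last[1]) / len(last[1]) if last else None,
    }

# Invented sample output (placeholder addresses), not a real trace.
SAMPLE = """
Tracing route to example.org [203.0.113.7]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  192.168.1.1
  2    12 ms    11 ms    13 ms  10.0.0.1
  3     *        *        *     Request timed out.
  4   150 ms   148 ms   151 ms  203.0.113.7

Trace complete.
"""

result = summarize_tracert(SAMPLE)
print(result)
```

Saving each run to a text file (for example with `tracert early-retirement.org > run1.txt`) and feeding the file contents to `summarize_tracert` would make runs from different days easier to compare.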
 
I will need info on our current network which you can get by running this:
tracert early-retirement.org
and then the trace route on one of the proposed new networks:
tracert 208.78.43.1
Good grief, it's nearly half the time-- 633 vs 1175.
I'll run some more over the next few days.
 
Good grief, it's nearly half the time-- 633 vs 1175.
I'll run some more over the next few days.
Thanks, I got a PM and the speed was nearly the same from that member. I think a lot is going to depend on the peering agreements between the internet backbone providers. Some people are seeing the current server just fine but then others are hitting horrible traffic jams. I know the answer is going to be to switch to the route optimized network but the more data you all can provide in the days to come the better it will help me understand this issue.

Also, please post the number of hops needed to reach the destination. The route optimized network should need significantly fewer in most situations.

BMJ, I had a couple of people check the JavaScript and it was OK for them. Although I am going to get more info this week about compressing the JavaScript files and loading them at the end of the page rendering, they should be caching on nearly all browsers. The headers being sent are not telling the browser to refresh those files.

Nords, just out of curiosity, do you have Firefox installed? I wonder if for some reason the JavaScript (75% of the page load) is not caching for you and is therefore trying to download on every page?
 
Nords, just out of curiosity, do you have Firefox installed? I wonder if for some reason the JavaScript (75% of the page load) is not caching for you and is therefore trying to download on every page?
No, just IE. Never bothered with Firefox-- never had a compelling reason to fiddle with it...
 
I just wanted to let you all know that we will be switching to a route optimized network this weekend. I will post more as soon as the info is available. The servers are not running at very high loads but I also plan to get a couple more in place in September.
 