A large fraction of the web is unusable for people on slow connections

A couple years ago, I took a road trip from Wisconsin to Washington and mostly stayed in rural hotels on the way. I expected the internet in rural areas too sparse to have cable internet to be slow, but I was still surprised that a large fraction of the web was inaccessible. Some blogs with lightweight styling were readable, as were pages by academics who hadn’t updated the styling on their website since 1995. But very few commercial websites were usable (other than Google). When I measured my connection, I found that the bandwidth was roughly comparable to what I got with a 56k modem in the 90s. The latency and packetloss were significantly worse than the average day on dialup: latency varied between 500ms and 1000ms and packetloss varied between 1% and 10%. Those numbers are comparable to what I’d see on dialup on a bad day.

Despite my connection being only a bit worse than it was in the 90s, the vast majority of the web wouldn’t load. Why shouldn’t the web work with dialup or a dialup-like connection? It would be one thing if I tried to watch youtube and read pinterest. It’s hard to serve videos and images without bandwidth. But my online interests are quite boring from a media standpoint. Pretty much everything I consume online is plain text, even if it happens to be styled with images and fancy javascript. In fact, I recently tried using w3m (a terminal-based web browser that, by default, doesn’t support css, javascript, or even images) for a week and it turns out there are only two websites I regularly visit that don’t really work in w3m (twitter and zulip, both fundamentally text based sites, at least as I use them).

More recently, I was reminded of how poorly the web works for people on slow connections when I tried to read a joelonsoftware post while using a flaky mobile connection. The HTML loaded but either one of the five CSS requests or one of the thirteen javascript requests timed out, leaving me with a broken page. Instead of seeing the article, I saw three entire pages of sidebar, menu, and ads before getting to the title because the page required some kind of layout modification to display reasonably[^S]. After refreshing the page twice, the page loaded as it was supposed to and I was able to read the blog post, a fairly compelling post on eliminating dependencies.

Complaining that people don’t care about performance like they used to and that we’re letting bloat slow things down for no good reason is “old man yells at cloud” territory; I probably sound like that dude who complains that his word processor, which used to take 1MB of RAM, takes 1GB of RAM. Sure, that could be trimmed down, but there’s a real cost to spending time doing optimization and even a $300 laptop comes with 2GB of RAM, so why bother? But it’s not quite the same situation it’s not just nerds like me who care about web performance. In the U.S., AOL alone had over 2 million dialup users in 2015. Outside of the U.S., there are even more people with slow connections. I recently chatted with Ben Kuhn, who spends a fair amount of time in Africa, about his internet connection:

I’ve seen ping latencies as bad as ~45 sec and packet loss as bad as 50% on a mobile hotspot in the evenings from Jijiga, Ethiopia. (I’m here now and currently I have 150ms ping with no packet loss but it’s 10am). There are some periods of the day where it ~never gets better than 10 sec and ~10% loss. The internet has gotten a lot better in the past ~year; it used to be that bad all the time except in the early mornings.

Speedtest.net reports 2.6 mbps download, 0.6 mbps upload. I realized I probably shouldn’t run a speed test on my mobile data because bandwidth is really expensive.

Our server in Ethiopia is has a fiber uplink, but it frequently goes down and we fall back to a 16kbps satellite connection, though I think normal people would just stop using the Internet in that case.

If you think browsing on a 56k connection is bad, try a 16k connection from Ethiopia!

Everything we’ve seen so far is anecdotal. Let’s load some websites that programmers might frequent with a variety of simulated connections to get data on page load times. webpagetest lets us see how long it takes a web site to load (and why it takes that long) from locations all over the world. It even lets us simulate different kinds of connections as well as load sites on a variety of mobile devices. The times listed in the table below are the time until the page is “visually complete”; as measured by webpagetest, that’s the time until the above-the-fold content stops changing.

URL Size C Load time in seconds MB FIOS Cable LTE 3G 2G Dial Bad :scream: 0 http://bellard.org 0.01 5 0.40 0.59 0.60 1.2 2.9 1.8 9.5 7.6 1 http://danluu.com 0.02 2 0.20 0.20 0.40 0.80 2.7 1.6 6.4 7.6 2 news.ycombinator.com 0.03 1 0.30 0.49 0.69 1.6 5.5 5.0 14 27 3 danluu.com 0.03 2 0.20 0.40 0.49 1.1 3.6 3.5 9.3 15 4 http://jvns.ca 0.14 7 0.49 0.69 1.2 2.9 10 19 29 108 5 jvns.ca 0.15 4 0.50 0.80 1.2 3.3 11 21 31 97 6 fgiesen.wordpress.com 0.37 12 1.0 1.1 1.4 5.0 16 66 68 FAIL 7 google.com 0.59 6 0.80 1.8 1.4 6.8 19 94 96 236 8 joelonsoftware.com 0.72 19 1.3 1.7 1.9 9.7 28 140 FAIL FAIL 9 bing.com 1.3 12 1.4 2.9 3.3 11 43 134 FAIL FAIL 10 reddit.com 1.3 26 7.5 6.9 7.0 20 58 179 210 FAIL 11 signalvnoise.com 2.1 7 2.0 3.5 3.7 16 47 173 218 FAIL 12 amazon.com 4.4 47 6.6 13 8.4 36 65 265 300 FAIL 13 steve-yegge.blogspot.com 9.7 19 2.2 3.6 3.3 12 36 206 188 FAIL 14 blog.codinghorror.com 23 24 6.5 15 9.5 83 235 FAIL FAIL FAIL

Each row is a website. For sites that support both plain HTTP as well as HTTPS, both were tested; URLs are HTTPS except where explicitly as HTTP. The first two columns show the amount of data transferred over the wire in MB (which includes headers, handshaking, compression, etc.) and the number of TCP connections made. The rest of the columns show the time in seconds to load the page on a variety of connections from fiber (FIOS) to less good connections. “Bad” has the bandwidth of dialup, but with 1000ms ping and 10% packetloss, which is roughly what I saw when using the internet in small rural hotels. “:scream:” simulates a 16kbps satellite connection from Jijiga, Ethiopia. Rows are sorted by the measured amount of data transferred.

The timeout for tests was 6 minutes; anything slower than that is listed as FAIL. Pages that failed to load are also listed as FAIL. A few things that jump out from the table are:

A large fraction of the web is unusable on a bad connection. Even on a good (0% packetloss, no ping spike) dialup connection, some sites won’t load. Some sites will use a lot of data! The web on bad connections

As commercial websites go, Google is basically as good as it gets for people on a slow connection. On dialup, the 50%-ile page load time is a minute and a half. But at least it loads when I was on a slow, shared, satellite connection in rural Montana, virtually no commercial websites would load at all. I could view websites that only had static content via Google cache, but the live site had no hope of loading.

Some sites will use a lot of data

Although only two really big sites were tested here, there are plenty of sites that will use 10MB or 20MB of data. If you’re reading this from the U.S., maybe you don’t care, but if you’re browsing from Mauritania, Madagascar, or Vanuatu, loading codinghorror once will cost you more than 10% of the daily per capita GNI .

Page weight matters

Despite the best efforts of Maciej , the meme that page weight doesn’t matter keeps getting spread around. AFAICT, the top HN link of all time on web page optimization is to an article titled “Ludicrously Fast Page Loads - A Guide for Full-Stack Devs”. At the bottom of the page, the author links to another one of his posts, titled “Page Weight Doesn’t Matter”.

Usually, the boogeyman that gets pointed at is bandwidth: users in low-bandwidth areas (3G, developing world) are getting shafted. But the math doesn’t quite work out. Akamai puts the global connection speed average at 3.9 megabits per second.

The “ludicrously fast” guide fails to display properly on dialup or slow mobile connections because the images time out. On reddit, it also fails under load : “Ironically, that page took so long to load that I closed the window.”, “a lot of … gifs that do nothing but make your viewing experience worse”, “I didn’t even make it to the gifs; the header loaded then it just hung.”, etc.

The flaw in the “page weight doesn’t matter because average speed is fast” is that if you average the connection of someone in my apartment building (which is wired for 1Gb internet) and someone on 56k dialup, you get an average speed of 500 Mb/s. That doesn’t mean the person on dialup is actually going to be able to load a 5MB website. The average speed of 3.9 Mbps comes from a 2014 Akamai report, but it’s just an average. If you look at Akamai’s 2016 report, you can find entire countries where more than 90% of IP addresses are slower than that!

Yes, there are a lot of factors besides page weight that matter, and yes it’s possible to create a contrived page that’s very small but loads slowly, as well as a huge page that loads ok because all of the weight isn’t blocking, but total page weight is still pretty decently correlated with load time.

What about tail latency?

My original plan for this was post was to show 50%-ile, 90%-ile, 99%-ile, etc., tail load times. But the 50%-ile results are so bad that I don’t know if there’s any point to showing the other results. If you were to look at the 90%-ile results, you’d see that most pages fail to load on dialup and the “Bad” and “:scream:” connections are hopeless for almost all sites.

HTTP vs HTTPs URL Size C Load time in seconds kB FIOS Cable LTE 3G 2G Dial Bad :scream: 1 http://danluu.com 21.1 2 0.20 0.20 0.40 0.80 2.7 1.6 6.4 7.6 3 https://danluu.com 29.3 2 0.20 0.40 0.49 1.1 3.6 3.5 9.3 15

You can see that for a very small site that doesn’t load many blocking resources, HTTPS is noticeably slower than HTTP, especially on slow connections. Practically speaking, this doesn’t matter today because virtually no sites are that small, but if you design a web site as if people with slow connections actually matter, this is noticeable.

How to make pages usable on slow connections

The long version is, to really understand what’s going on, considering reading high-performance browser networking , a great book on web performance that’s avaiable for free.

The short version is that most sites are so poorly optimized that someone who has no idea what they’re doing can get a 10x improvement in page load times for a site whose job is to serve up text with the occasional image. When I started this blog in 2013, I used Octopress because Jekyll/Octopress was the most widely recommended static site generator back then. A plain blog post with one or two images took 11s to load on a cable connection because the Octopress defaults included multiple useless javascript files in the header (for never-used-by-me things like embedding flash videos and delicious integration), which blocked page rendering. Just moving those javascript includes to the footer halved page load time, and making a few other tweaks decreased page load time by another order of magnitude . At the time I made those changes, I knew nothing about web page optimization, other than what I heard during a 2-minute blurb on optimization from a 40-minute talk on how the internet works and I was able to get a 20x speedup on my blog in a few hours. You might argue that I’ve now gone too far and removed too much CSS, but I got a 20x speedup for people on fast connections before making changes that affected the site’s appearance (and the speedup on slow connections was much larger).

That’s normal. Popular themes for many different kinds of blogging software and CMSs contain anti-optimizations so blatant that any programmer, even someone with no front-end experience, can find large gains by just pointing webpagetest at their site and looking at the output.

Conclusion

When I was at Google, someone told me a story about a time that “they” completed a big optimization push only to find that measured page load times increased. When they dug into the data, they found that the reason load times had increased was that they got a lot more traffic from Africa after doing the optimizations. The team’s product went from being unusable for people with slow connections to usable, which caused so many users with slow connections to start using the product that load times actually increased.

Last night, at a presentation on the websockets protocol, Gary Bernhardt made the observation that the people who designed the websockets protocol did things like using a variable length field for frame length to save a few bytes. By contrast, if you look at the Alexa top 100 sites, almost all of them have a huge amount of slop in them; it’s plausible that the total bandwidth used for those 100 sites is probably greater than the total bandwidth for all websockets connections combined. Despite that, if we just look at the three top 35 sites tested in this post, two send uncompressed javascript over the wire, two redirect the bare domain to the www subdomain, and two send a lot of extraneous information by not compressing images as much as they could be compressed without sacrificing quality. If you look at twitter, which isn’t in our table but was mentioned above, they actually do an anti-optimization where, if you upload a PNG which isn’t even particularly well optimized, they’ll re-encode it as a jpeg which is larger and has visible artifacts !

“Use bcrypt” has become the mantra for a reasonable default if you’re not sure what to do when storing passwords. The web would be a nicer place if “use webpagetest” caught on in the same way. It’s not always the best tool for the job, but it sure beats the current defaults.

Appendix: experimental caveats

The above tests were done by repeatedly loading pages via a private webpagetest image in AWS west 2, on a c4.xlarge VM, with simulated connections on a first page load in Chrome with no other tabs open and nothing running on the VM other than the webpagetest software and the browser. This is unrealistic in many ways.

In relative terms, this disadvantages sites that have a large edge presence. When I was in rural Montana, I ran some tests and found that I had noticeably better latency to Google than to basically any other site. This is not reflected in the test results.

All tests were run assuming a first page load, but it’s entirely reasonable for sites like Google and Amazon to assume that many or most of their assets are cached. Testing first page load times is perhaps reasonable for sites with a traffic profile like mine, where much of the traffic comes from social media referrals of people who’ve never visited the site before.

A c4.xlarge is a fairly powerful machine. Today, most page loads come from mobile and even the fastest mobile devices aren’t as fast as a c4.xlarge; most mobile devices are much slower than the fastest mobile devices. Most desktop page loads will also be from a machine that’s slower than a c4.xlarge. Although the results aren’t shown, I also ran a set of tests using a t2.micro instance: for simple sites, like mine, the difference was negligible, but for complex sites, like Amazon, page load times were as much as 2x worse. As you might expect, for any particular site, the difference got smaller as the connection got slower.

Firefox, IE, and Edge often have substantially different performance characteristics from Chrome. For that matter, different versions of Chrome can have different performance characteristics. I just used Chrome because it’s the most widely used desktop browser, and running this set of tests took over a full day of VM time with a single-browser.

The simulated bad connections add a constant latency and fixed (10%) packetloss. In reality, poor connections have highly variable latency with peaks that are much higher than the simulated latency and periods of much higher packetloss than can last for minutes, hours, or days.

Appendix: irony

The main table in this post is almost 50kB of HTML (without compression or minification); that’s larger than everything else in this post combined. That table is curiously large because I used a library (pandas) to generate the table instead of just writing a script to do it by hand, and as we know, the default settings for most libraries generate a massive amount of bloat. It didn’t even save time because every single built-in time-saving feature that I wanted to use was buggy, which forced me to write all of the heatmap/gradient/styling code myself anyway! Due to laziness, I left the pandas table generating scaffolding code, resulting in a table that looks like it’s roughly an order of magnitude larger than it needs to be.

Thanks to Leah Hanson and Jason Owen for comments/corrections

本文前端(javascript)相关术语:javascript是什么意思 javascript下载 javascript权威指南 javascript基础教程 javascript 正则表达式 javascript设计模式 javascript高级程序设计 精通javascript javascript教程

主题: ChromeCSSHTML3G2GFirefox
分页:12
转载请注明
本文标题:A large fraction of the web is unusable for people on slow connections
本站链接:http://www.codesec.net/view/531320.html
分享请点击:


1.凡CodeSecTeam转载的文章,均出自其它媒体或其他官网介绍,目的在于传递更多的信息,并不代表本站赞同其观点和其真实性负责;
2.转载的文章仅代表原创作者观点,与本站无关。其原创性以及文中陈述文字和内容未经本站证实,本站对该文以及其中全部或者部分内容、文字的真实性、完整性、及时性,不作出任何保证或承若;
3.如本站转载稿涉及版权等问题,请作者及时联系本站,我们会及时处理。
登录后可拥有收藏文章、关注作者等权限...
技术大类 技术大类 | 前端(javascript) | 评论(0) | 阅读(31)