INTRODUCTION

This file contains miscellaneous pieces of information gleaned by Google
in July 2007, from a study using a parser based on r967 of the HTML5
specification.


NUMBER OF ATTRIBUTES ON ELEMENTS

The following table shows the percentage of elements with a given number
of attributes or fewer:

   elements with <= 0 attributes:  33.5%
   elements with <= 1 attributes:  66.1%
   elements with <= 2 attributes:  81.6%
   elements with <= 3 attributes:  90.2%
   elements with <= 4 attributes:  95.3%
   elements with <= 5 attributes:  98.3%
   elements with <= 6 attributes:  99.4%
   elements with <= 7 attributes:  99.8%
   elements with <= 8 attributes:  99.9%
   elements with <= 9 attributes: 100.0%

The following table shows the percentage of pages whose element with the
most attributes had a given number of attributes or fewer:

   max is <=   0 attributes:  0.3%
   max is <=   1 attributes:  0.8%
   max is <=   2 attributes:  3.1%
   max is <=   3 attributes:  6.8%
   max is <=   4 attributes: 12.8%
   max is <=   5 attributes: 27.9%
   max is <=   6 attributes: 52.1%
   max is <=   7 attributes: 70.5%
   max is <=   8 attributes: 83.8%
   max is <=   9 attributes: 90.5%
   max is <=  10 attributes: 95.9%
   max is <=  11 attributes: 97.8%
   max is <=  12 attributes: 99.0%
   max is <=  13 attributes: 99.5%
   max is <=  14 attributes: 99.6%
   max is <=  16 attributes: 99.7%
   max is <=  20 attributes: 99.8%
   max is <=  26 attributes: 99.9%
   ...
   max is <=  95 attributes: 99.99%
   ...
   max is <= 340 attributes: 99.998%
   ...
   max is <= 627 attributes: 99.9990%

Note that many pages have an element with an extremely large number of
attributes. In particular, many pages include a <meta> element where
they have misquoted the "content" attribute, and so each keyword ends up
generating its own attribute. Such pages often have hundreds or
thousands of keywords.


METHODOLOGY

The sample data consisted of several billion documents from Google's
index. An attempt was made to eliminate duplicates from the sample.
Documents that did not appear, a priori, to be HTML documents (e.g. PDF
documents, Flash animations, Word documents) were not included in the
sample. Documents which, when parsed, resulted in DOMs with more than
65535 elements were only parsed up to the creation of the 65535th
element.

It is hoped that this data will help implementors of HTML5 parsers to
optimise their branches to better serve actual Web content.

For more information, please contact Ian Hickson.


LICENSE

Copyright 2007 Google, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License. You may obtain
a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
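

EXAMPLE: READING THE CUMULATIVE TABLES

The following is a minimal sketch, not part of the original study, of
how an implementor might read the per-element table under NUMBER OF
ATTRIBUTES ON ELEMENTS. It assumes a parser that stores a small, fixed
number of attributes inline on each element and wants to know how many
slots would cover a target share of elements. The names
smallest_capacity and exact_shares are hypothetical, and the inline
storage design is itself an assumption, not something the study
prescribes.

```python
# Cumulative percentage of elements with at most N attributes, copied from
# the per-element table in this file. The final 100.0% is a rounded figure:
# the per-page table shows that some elements have more than 9 attributes.
ELEMENTS_CUMULATIVE = [
    (0, 33.5), (1, 66.1), (2, 81.6), (3, 90.2), (4, 95.3),
    (5, 98.3), (6, 99.4), (7, 99.8), (8, 99.9), (9, 100.0),
]


def smallest_capacity(cumulative, target_percent):
    """Return the smallest attribute count whose cumulative percentage
    reaches target_percent, or None if no row in the table reaches it."""
    for count, percent in cumulative:
        if percent >= target_percent:
            return count
    return None


def exact_shares(cumulative):
    """Convert cumulative percentages into the percentage of elements
    with exactly N attributes, by differencing adjacent rows."""
    shares = []
    previous = 0.0
    for count, percent in cumulative:
        shares.append((count, round(percent - previous, 1)))
        previous = percent
    return shares


if __name__ == "__main__":
    for target in (90.0, 95.0, 99.0):
        print(target, smallest_capacity(ELEMENTS_CUMULATIVE, target))
    for count, share in exact_shares(ELEMENTS_CUMULATIVE):
        print(count, share)
```

Differencing the rows gives the share of elements with exactly N
attributes, which is the figure a parser author would weigh against the
memory cost of each extra inline slot. The per-page table is a reminder
that a fallback path for very large attribute counts is still needed.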