An Investigation of Documents from the World Wide Web - http://www.paulaoki.com/papers/www5-color.pdf Paper by Woodruff, Aoki, Brewer, Gauthier, and Rowe describing their analysis of over 2.6 million HTML documents collected by their Inktomi Web crawler. The authors examined many characteristics of these documents, including size, number and types of tags and attributes, file extensions, and links.