An Investigation of Documents from the World Wide Web - http://www.paulaoki.com/papers/www5-color.pdf Paper by Woodruff, Aoki, Brewer, Gauthier, and Rowe describing their analysis of over 2.6 million HTML documents collected by the Inktomi Web crawler. The authors examined many characteristics of these documents, including size, number and types of tags and attributes, file extensions, and links.