Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi.
What’s in the data set:
Crawl start date: 09 March, 2011
Crawl end date: 23 December, 2011
Number of captures: 2,713,676,341
Number of unique URLs: 2,273,840,159
Number of hosts: 29,032,069
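As a quick sanity check on the figures above, the following minimal Python sketch derives a few ratios from them. The variable names are illustrative only, and the ratios are plain arithmetic on the published totals, not statistics reported by the crawl operators.

```python
# Figures copied from the data set summary above.
captures = 2_713_676_341
unique_urls = 2_273_840_159
hosts = 29_032_069

# Average number of captures per distinct URL (above 1 means some URLs were re-captured).
captures_per_url = captures / unique_urls

# Average number of distinct URLs seen per host.
urls_per_host = unique_urls / hosts

# Captures that were not the first capture of their URL.
repeat_captures = captures - unique_urls

print(f"captures per unique URL: {captures_per_url:.2f}")
print(f"unique URLs per host:    {urls_per_host:.1f}")
print(f"repeat captures:         {repeat_captures:,}")
```

These are averages only; they say nothing about how captures are distributed across individual hosts.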
The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT) crawler software and respected robots.txt directives. The scope of the crawl was not limited except for a few manually excluded sites.
However, this was a somewhat experimental crawl for us, as we were using newly minted software to feed URLs to the crawlers, and we know there were some operational issues with it. For example, in many cases we may not have crawled all of the embedded and linked objects in a page, because the URLs for these resources were added to queues that quickly grew larger than the intended size of the crawl (and therefore we never got to them). We also included repeated crawls of some Argentinian government sites, so results broken down by country will be somewhat skewed.
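The description above notes that the crawlers respected robots.txt directives. As a rough illustration of what such a check involves, here is a minimal sketch using Python's standard `urllib.robotparser` module. It is not how Heritrix implements the check; the robots.txt content, the `is_allowed` helper, the user agent name, and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch this from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "example-crawler" is a made-up user agent name for this sketch.
allowed = is_allowed(ROBOTS_TXT, "example-crawler", "http://example.com/index.html")
blocked = is_allowed(ROBOTS_TXT, "example-crawler", "http://example.com/private/data.html")
print(allowed, blocked)
```

A crawler would typically run a check like this before queueing each URL and skip any URL for which it returns False.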
We have made many changes to how we do these wide crawls since this particular example, but we wanted to make the data available “warts and all” for people to experiment with. We have also done some further analysis of the content.
If you would like access to this set of crawl data, please contact us at info at archive dot org and let us know who you are and what you’re hoping to do with it. We may not be able to say “yes” to all requests, since we’re just figuring out whether this is a good idea, but everyone will be considered.

<th> elements for the header row, and a foreach loop that emitted a series of <td> elements for each row to display in the grid. While this approach certainly works, it does lead to a bit of repetition and inflates the size of our Views. The ASP.NET MVC framework includes an HtmlHelper class that adds support for rendering HTML elements in a View. An instance of this class is available through the Html object, and is often used in a View to create action links
Preserving & Restoring Application State for Silverlight Based Windows Phone Apps
The Windows Phone 7 platform allows only one application to run at a time. For application users who switch context frequently, this can present great pain, unless the application is coded to preserve and restore application state. This article walks through the steps a Windows Phone 7 application developer needs to understand regarding preserving and restoring application state.
Windows Phone 7 Development Just Got Easier
Whether you're just getting started developing for Windows Phone 7 or just need a hand, CodeGuru has got you covered with our newest release, The Windows Phone 7 Quick Reference Card.
.NET Framework: Task Parallel Library Dataflow
Learn about the Task Parallel Library Dataflow, a new member of Microsoft's Technical Computing Initiative built on the Task Parallel Library.
Overview of the Windows Phone 7 Execution Model
Interested in developing applications that will run on Windows Phone 7? As the Windows Phone platform gains popularity amongst users, developers intending to develop applications targeting the platform should get a solid understanding of the execution model of the Windows Phone architecture.
Creating PDF Documents with ASP.NET and iTextSharp
The Portable Document Format (PDF) is a popular file format for documents. Due to their ubiquity and layout capabilities, it's not uncommon for websites to use PDF technology. For example, an eCommerce store may offer a "printable receipt" option that, when selected, displays a PDF file within the browser. Last week's article, Filling in PDF Forms with ASP.NET and iTextSharp, looked at how to work with a special kind of PDF document, namely one that has one or more fields defined. A PDF document can contain various types of user interface elements, which are referred to as fields. For instance, there is a text field, a checkbox field, a combobox field, and more. Typically, the person viewing the PDF on her computer interacts with the document's fields; however, it is possible to
Unobtrusive jQuery Validation Using MVC3 and Razor
Sharpen your form validation skills using unobtrusive validation in MVC3 and Razor.
Creating and Using a jQuery Plug-in in ASP.NET Web Forms
Create a custom plug-in to reuse your jQuery code across web pages.
Passing Large Files in Windows Communication Foundation (WCF) using Streaming and TCP
Take advantage of Streaming and TCP to transfer large amounts of data using Windows Communication Foundation (WCF).
Filling in PDF Forms with ASP.NET and iTextSharp
The Portable Document Format (PDF) is a popular file format for documents. PDF files are a popular document format for two primary reasons: first, because the PDF standard is an open standard, there are many vendors that provide PDF readers across virtually all operating systems, and many proprietary programs, such as Microsoft Word, include a "Save as PDF" option. Consequently, PDFs serve as a sort of common currency of exchange. A person writing a document using Microsoft Word for Windows can save the document as a PDF, which can then be read by others whether or not they are using Windows and whether or not they have Microsoft Word installed. Second, PDF files are self-contained. Each PDF file includes its complete text, fonts, images, input fields, and other content. This means that
Working with URL Routing in ASP.NET Framework 4.0
Take advantage of URL Routing in the ASP.NET framework to improve search engine optimization (SEO).
Unit and Load Testing With Team Foundation Server 2010
Microsoft Team Foundation Server 2010 is a versatile solution for professional software development and application lifecycle management. One of the feature sets supports automated unit testing and load testing of your .NET framework applications. Read on to learn how Microsoft Visual Studio and TFS can help you with your testing needs.
Visual Basic .NET Development 101: Learning and Using Microsoft Visual Studio 2010
Microsoft Visual Studio 2010 is a great integrated development environment for turning out top quality .NET framework applications. VB 2010 Express provides the same basic tools without some of the high-end team development pieces. This first of a two-part article will introduce the user to the VB 2010 Express product.
Microsoft Proposes a More Secure Browser
A secure browser and self-contained OS is Microsoft's answer to the common security problems that bedevil Internet Explorer.
Microsoft Won't Stop in 'Albany' After April
Microsoft kills off its subscription Office and OneCare service and gives users two and a half months before Office subscriptions self-destruct.
Microsoft Looks to 'Elevate America'
The software giant says its initiative of technology training tools and resources could help millions of people in their quest to find a better job.
Will Windows 7 Be a PC Mover?
Microsoft and its PC partners hope Windows 7 can drive PC sales, even in a down economy.
Windows 7's Worst-Kept Secret? Its Release Date
Redmond sticks to its 'by early 2010' mantra, but leaks to a tech enthusiast site point to a much earlier ship date for Vista's replacement.
Free C# Developer Quick Reference Guide Now Available
Register and download your free C# reference card.
40 Ways to Make Your Data Center More Efficient
Managing a collection of computer systems is no easy task. But through better management and proper planning, even the most inefficient data center can change its ways. Download this eBook for 40 steps you can take to get the most out of your data center and its employees.
Solving Storage for Your SMB
Sponsored by EMC
Many storage solutions are aimed at large enterprises and are designed to address their concerns surrounding information lifecycle management and corporate compliance. But small and medium-sized businesses have storage concerns of their own. Download this Internet.com eBook for a guide to choosing a storage server and building a storage strategy for your SMB.
Exploring the Private Cloud for Your Organization
One of the ways around the issues of security and control that make some businesses wary of cloud computing is to build a private cloud, one that remains within the corporate firewall and is wholly controlled internally. Private clouds also increase the agility of an organization's IT infrastructure and make it easier to roll out new technology projects. Download this eBook to get the facts behind the private cloud and learn how your organization can get started.
HTML5: An Introduction
HTML5 is the new standard that is expected to take over the Web. New versions of browsers are already starting to support the advanced features. Learn why HTML5 is important and discover how to start using it today.
Guide to Cutting Data Center Power Costs
There are a number of strategies you can employ to trim the amount of money you're spending on electricity for your data center. Virtualization is an increasingly popular strategy because it consolidates servers; reconfiguring your data center might improve the circulation of air; and simply giving your data center a good cleaning might also work wonders. Download this Internet.com eBook for more power-saving ideas.
Top HTML5 Tutorials from HTMLGoodies
HTML5 is an emerging technology that is slowly changing the face of the web. The latest browsers support it, and developers are eager to begin using it on their websites. HTMLGoodies features many tutorials on the topic, and we've brought them all together here for your perusal.
Getting Started with Joomla!
If you've never heard of Joomla!, it's known as a CMS (content management system), which allows you to build complex web sites and run various applications. In this article we'll look at the many options for setting up Joomla! and how to configure the software.
The Joomla! Demo Site with Cloudaccess.net
In this article, we're going to walk you through the process of installing Joomla! using Cloudaccess.net, a virtual Joomla! application host.
How Can I Select the Best Images for My Website?
Why do you want an image? Is it for a web page as a dominant image, a background or a series of images that are part of a gallery? In this tutorial we will show you how to select the best images for your website.
12 Ways to Create Compelling Graphic Banner Ads for Your Website
As with any design process, the first thing to do is to know what you want to say. In this case, you have the choice of a static or animated banner. In this tutorial we will share 12 tips to help you create compelling banner ads.
Do Web Safe Colors Make a Difference?
Web Safe colors, also known as browser safe colors, were introduced many years ago, when the web was in its infancy. A question that comes up every now and then is whether modern web designers should be using this type of palette or not.
How Can I Create Images for Mobile Devices?
When you think about mobile devices, the idea of creating images would seem to be a straightforward process. Unfortunately, it's not, but in this tutorial we will cut to the chase and show you how it's done!
One Video URL to Rule Them All
HTML5 is changing the landscape on how video is presented on web pages. New formats, codecs, screen sizes, devices and browsers are being developed on a continuous basis. What’s one to do? Enter Vid.ly, which provides a universal video URL.
Web Developer Basics: The HTML5 Video Element
This article begins a "mini-series" of our in-depth coverage of important new elements in HTML5 to help you create media-rich pages that will work in any compliant browser. In this session we introduce the Video element!
Web Developer Class: Creating Forms with CSS3
You only have to look as far as Twitter's relatively new interface to see that stylized text boxes are still a popular design choice for input fields and forms. In this tutorial we will teach you a few different techniques to create styled forms and input fields using CSS3.
Web Developer Basics: Using The HTML5 Canvas Element
This article continues our "mini-series" of important new elements in HTML5 to help you learn to create media-rich pages that will work in any compliant browser.
Using HTML5 Automatically For Placeholders in Email Forms--With No JavaScript
Previously, if a developer wanted to include some sample text in an email field in a form, and then make that text disappear when the user clicked within that field, they had to use some JavaScript to make it happen. Enter HTML5 and the placeholder attribute, which simplifies the whole process--using no JavaScript at all!
Web Developer Basics: Differences Between HTML4 And HTML5
Now that we've seen how to use some of the newer whiz-bang features of the draft HTML5 standard, it's time to take a few steps back and take a look at some of the other differences between HTML4 and HTML5.
HTML5 Primer: How To Use the Audio Tag
In our series on HTML5, we've discussed geolocation, link relations, form and keyboard events, media events, mouse events, global attributes and multimedia. This week we're going to expand upon our discussion on multimedia and delve further into HTML5's <audio> tag.
Add Read More Link When People Copy From Your Site Using JavaScript
You may have visited a website, copied some text from the page, and noticed that the pasted text also included a "read more" note containing the URL of the page you copied from. In this tutorial we will show you how you can do the same thing using a tidbit of JavaScript.
Ten Awesome Things Most People Don't Know About PHP
We all know the magic tricks, we all know the cool little functions. We all know the way to make dates look cool and numbers look awesome. But there are things most people don't think about, and I'm one of them. When I started writing this article I did a little research and found a few things that will also make me change the way I use PHP, and why I use it.
Web Developer Tutorial: HTML5 Microdata
The HTML 5 draft specification includes Microdata. The Microdata spec provides a standardized syntax for adding semantic markup to your web pages to make them more machine readable. This tutorial will discuss microdata and will show why you should be interested in it!
Web Developer Class: Building a Twitter Feed with PHP
In today's world of social networking there will come a time that someone asks you to do Twitter integration for their website. This tutorial will walk you through the process, and includes source code!
Web Developer Basics: Multimedia In HTML5
In the last installment of this series, we talked about link relations in HTML5. In this article, we'll see how support for various multimedia formats in HTML5 will make things much easier for you as a developer...eventually.
Firefox Extension for Web Development: Font Finder
Continuing with the series on Firefox extensions for web development, this article features the Font Finder add-on which allows developers to analyze font information on a Web page and modify it for demonstration purposes.
HP Integrates 3PAR into Cloud Platform
HP has certified 3PAR's utility storage systems with HP BladeSystem Matrix and CloudSystem. The company also introduced an iSCSI SAN blade for the BladeSystem c7000 and a new data deduplication platform.
A Quick Start Guide for Deployment to the Amazon EC2 Cloud
You've heard all the buzz about cloud computing, but how do you actually get started with a cloud service like Amazon EC2?
Cloud Computing vs. Grid Computing: Does It Matter for Developers?
Find out why developers should care whether the infrastructure supporting their apps is in the cloud or in a grid.
Open Source Cloud Computing Platform OpenStack Goes Commercial
The OpenStack cloud computing platform, backed by Rackspace, NASA, Cisco, Dell and others, ramps up commercial support options for cloud and enterprise users.
Salesforce.com CRM Spring 11 Release Reviewed
With new Chatter and Jigsaw integration, Salesforce.com has once again redefined online customer relationship management (CRM) with its Spring 11 release.
DEMO: Enterprise Cloud Services Debut
The aging showcase for emerging technology products and services opens a spot for cloud-oriented technologies.
HP Lands $400 Million Cloud Services Deal
Vendor secures seven-year outsourcing contract to move U.K.-based Centrica to a utility services-based private cloud environment. Deal adds to HP's $2.15 billion enterprise services haul of the past few months.
Gomez: Which Cloud Service Is the Fastest?
New survey ranks speeds of leading cloud services vendors, finding Microsoft and Google are neck and neck.
Pundit Calls Salesforce Cloud Claim 'Wishful Thinking'
Salesforce's study claims that using its cloud computing leads to huge CO2 reductions. Not everyone agrees.
Cloud Computing's Enterprise Challenges: Public, Private, Hybrid?
Keynoters at Cloud Connect conference debate public versus private clouds, but agree that cloud computing in the enterprise is more of a 'when' than 'if' proposition.