Today, we’d like to share some thoughts on one of the major components of any search engine, and something we’ve been focusing on a lot lately: the crawler.
How does a search engine find pages to include in its database? By automatically surfing the web on a massive scale, jumping from page to page and grabbing the information it finds along the way. This is done by a program called a crawler, bot or spider. Entireweb’s crawler, affectionately called Speedy Spider, runs on a large number of machines at once and visits tens of thousands of pages simultaneously, all day, every day.
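The basic "jumping from page to page" loop can be sketched in a few lines. This is a minimal, single-machine illustration of a generic breadth-first crawl, not Speedy Spider’s actual design: it keeps a queue of URLs to visit (the frontier) and a set of URLs already seen, so that no page is visited twice. The FAKE_WEB dictionary and the fetch_links helper are hypothetical stand-ins for real HTTP fetching and link extraction, and a real crawler would also need politeness delays, robots.txt handling and error recovery.

```python
from collections import deque

# Hypothetical toy "web": each URL maps to the URLs it links to.
# A real crawler would download pages over HTTP and parse out their links.
FAKE_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://a.example/", "http://d.example/"],
    "http://c.example/": ["http://d.example/"],
    "http://d.example/": [],
}


def fetch_links(url):
    """Hypothetical stand-in for fetching a page and extracting its links."""
    return FAKE_WEB.get(url, [])


def crawl(seeds, max_pages=1000):
    """Breadth-first crawl from the seed URLs, visiting each URL at most once."""
    frontier = deque(seeds)  # URLs waiting to be visited
    seen = set(seeds)        # URLs already queued or visited
    visited = []             # order in which pages were visited
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


if __name__ == "__main__":
    for page in crawl(["http://a.example/"]):
        print(page)
```

Breadth-first order reaches pages close to the seeds first. One common way to get from this loop to thousands of simultaneous fetches is to split the frontier across many machines, for example by hashing each URL’s host name.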
"Garbage in, garbage out" is an old engineering proverb. To build a great search engine, you need to start at the source: the way you select pages to include, the schedule on which you visit them, and what you do with the content you find on them (often called indexing). This is a problem we’re taking very seriously.
The ideal crawler grabs every page on the web that is of any interest to human searchers, and does so as soon as that page is created or updated. Our goal is to approximate this as closely as we can.
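One common way to approximate the "as soon as it is created or updated" ideal is an adaptive revisit policy: pages that changed since the last visit are checked again sooner, and pages that did not change are checked less often. The sketch below is a generic illustration of that idea, not Entireweb’s scheduler. The halve-or-double rule and the one-hour and 30-day bounds are assumptions chosen for the example.

```python
import hashlib

MIN_INTERVAL_H = 1.0        # assumed: never revisit a page more often than hourly
MAX_INTERVAL_H = 24.0 * 30  # assumed: revisit every page at least every 30 days


def content_hash(text):
    """Fingerprint page content so that changes can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def next_interval(current_h, old_hash, new_hash):
    """Halve the revisit interval if the page changed, otherwise double it."""
    if old_hash != new_hash:
        proposed = current_h / 2
    else:
        proposed = current_h * 2
    # Keep the interval within the assumed bounds.
    return max(MIN_INTERVAL_H, min(MAX_INTERVAL_H, proposed))


if __name__ == "__main__":
    interval = 48.0
    old = content_hash("Welcome to our shop")
    new = content_hash("Welcome to our shop, new autumn sale!")
    interval = next_interval(interval, old, new)
    print(interval)  # 24.0 (page changed, so check it sooner)
    interval = next_interval(interval, new, new)
    print(interval)  # 48.0 (page unchanged, so back off)
```

Frequently updated pages such as news front pages converge toward short intervals, while static pages drift toward the upper bound, which spends crawl capacity where change actually happens.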
We feel there is no point in including pages that don’t carry any tangible information. For example, many search engines include pages that contain little more than advertisements, simply because the search engine may benefit financially from showing such ads. These pages, which include parked domains, scraper sites and various kinds of web spam, do little more than dilute the search results. We’ve also developed new techniques for duplicate-content removal, so that the information you find in our search results will be truly unique, diverse and useful.
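Duplicate-content removal is often built on comparing the overlapping word sequences ("shingles") of two pages. The following sketch uses that generic textbook technique, which is not necessarily what Entireweb uses: it computes the Jaccard similarity of two pages’ 3-word shingle sets and treats pages above an assumed threshold of 0.8 as near-duplicates.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in the lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union; 1.0 if both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, threshold=0.8):
    """Treat two pages as duplicates if their shingle sets overlap heavily."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


if __name__ == "__main__":
    print(is_near_duplicate("Cheap flights to Halmstad, book now",
                            "Cheap flights to Halmstad - book now!"))  # True
    print(is_near_duplicate("Cheap flights to Halmstad, book now",
                            "A guide to search engine crawlers"))      # False
```

Comparing every pair of pages this way does not scale to a web-sized index. Large systems typically compress each shingle set into a short fingerprint (for example with MinHash) and compare only pages whose fingerprints collide.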
The team at Entireweb would like to wish you a happy Halloween!
Take care, and keep following this blog for more news.
Hardware is a big part of building a top-class search engine. To cope with millions of search queries every day, choosing the right hardware is a major decision.
For this new search engine, we have decided to work with Dell as our hardware provider, based on our past experience with both their machines and their excellent customer support. The deal we have struck with Dell gives us the confidence we need while building this search engine.

As in all large-scale projects, hardware is critical, but we are very proud to have built a very cost-effective technology, which means we will not need tens of thousands of servers, as some of our competitors do, to reach our goal. Instead, we have developed a technology that optimizes the way we use our hardware. This will also minimize the environmental impact of running the search engine, a very important goal for us.
Building what we think will be the best search engine in the world, with a very large search index, will primarily mean expanding our current data center here in Halmstad, Sweden, with a large number of Dell servers.
We look forward to sharing more information about our high-performance computing platform on this blog, so stay tuned!
If you’re on Facebook (isn’t everyone these days?), we invite you to join our new Facebook page where you can get updates, let us know what you think and discuss Entireweb with other members!
Show your support and become a fan of Entireweb at http://www.facebook.com/pages/Entireweb/156427702959.
We are overwhelmed by all the positive responses we’ve received since announcing our upcoming search engine, both on this blog and elsewhere on the web. This feels great for everyone on the Entireweb team and will inspire us to work even harder to bring you the best search engine around.
We would also like to update you on some of our thoughts about the overall search experience we’re currently working on.
An important aspect of the new search engine will be the user interface. We’re putting a lot of work into creating a modern, advanced interface that still feels sleek and is truly easy to use. You’ll feel right at home from the start. We won’t change the way you search, but we’ll improve your experience and your search results!
We’ve got some great new features under development that we are really excited about, but we would also like your ideas and suggestions. Let us know by commenting on this post!
Halmstad, Sweden, October 19th, 2009
In the spring of 2010, Entireweb will launch a brand-new search engine that aims to be the best search engine in the world. Built on more than 10 years of experience in the business, this next-generation search engine brings together feedback and comments from around the world. With its unique new features and easy-to-use interface, we promise it will be a very welcome addition to the search engine industry and a worthy competitor to the current market leaders.
From now until the launch of our new search engine at the beginning of next year (2010), we will work hard to complete and perfect what we believe will be the best search engine in the world. We’d also like to get our users involved in the process. We’ve created this blog because we think it’s important, and fun, to share some of what goes on here with you.
On this blog, we will let you know what is happening on our side. This could include improvements to our search, new features we develop and first-hand looks at what goes on inside our team.
More information on this new search engine will be announced gradually in this blog (http://blog.entireweb.com) as development progresses. As always, your feedback is very important, so please keep your comments and suggestions coming!
About Entireweb
Entireweb’s goal is to be a leading supplier of search technology solutions. The international Web search engine www.entireweb.com is not only a highly popular general-purpose search engine used by millions of people around the world, but also a showcase for our search technology and our expertise in ultra-high-performance information retrieval from huge unstructured data sources.
The Web search engine www.entireweb.com is a highly popular general-purpose search engine used by millions of people around the world. Entireweb is now announcing a new search engine, set to launch in the spring of 2010.