How does a Web Crawler work?

The first thing to understand is what a web crawler, or spider, is and how it works. A search engine spider (also known as a crawler, robot, SearchBot, or simply a bot) is a program that most search engines use to find what’s new on the Internet. Google’s web crawler is known as Googlebot. There are many types of web spiders in use, but for now we’re only interested in the bot that actually “crawls” the web and collects documents to build a searchable index for the different search engines. The program starts at a website and follows every hyperlink on each page.
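The "start at a website and follow every hyperlink" idea can be sketched in a few lines of Python. This is only a minimal illustration, not how Googlebot or any real search engine is implemented: the `FAKE_WEB` dictionary, the `example.test` URLs, and the `crawl` and `extract_links` helpers are all made up for this example, and pages are read from memory instead of the network. A real crawler would also respect robots.txt, rate limits, and much more.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return the absolute URLs of all hyperlinks found in html."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(start_url, fetch, max_pages=100):
    """Visit pages breadth-first, following every link exactly once.

    fetch is any function that returns a page's HTML, or None if the
    page cannot be loaded (hypothetical interface for this sketch).
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # broken link: nothing to collect or follow
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# A tiny in-memory "web" standing in for real HTTP requests.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a> <a href="/missing">gone</a>',
}

visited = crawl("http://example.test/", FAKE_WEB.get)
print(visited)
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, which they almost always do.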

[Image: Googlebot]

As the spider crawls from one website to another, nearly everything on the web that is linked from somewhere else will eventually be found and spidered. Search engines may run thousands of instances of their web crawling programs simultaneously, on multiple servers. When a web crawler visits one of your pages, it loads the page’s content into a database. Once a page has been fetched, its text is loaded into the search engine’s index, a massive database of words and where they occur on different web pages. All of this may sound too technical for most people, but it’s important to understand the basics of how a web crawler works.
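The index described above, a database of words and the pages where they occur, is usually called an inverted index. Here is a small Python sketch of the idea. The page texts, the `example.test` URLs, and the `build_index` and `search` function names are invented for illustration, and real search engine indexes are far more elaborate (they store word positions, rankings, and so on).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical page texts, as if a crawler had already fetched them.
pages = {
    "http://example.test/crawler": "How a web crawler works",
    "http://example.test/wordpress": "How to upload images in WordPress",
}

index = build_index(pages)
print(search(index, "web crawler"))
print(search(index, "how"))
```

Looking up a word is then a single dictionary access, which is why a search engine can answer a query without rereading every page it has crawled.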


How to upload images in WordPress

When you install WordPress for the first time and want to insert images or pictures in your articles, your installation must be configured first. Otherwise, every time you try to upload an image and insert it in your post, you will probably receive the following message: “Unable to create directory /home/domains/yourdomain.com/htdocs/wp-content/uploads/2009/09. Is its parent directory writable…
