Category
page 1Free web crawlers
curl
cURL (pronounced like "curl", ) is a free and open source CLI app for uploading and downloading individual files. It can download a URL from a web server over HTTP, and supports a variety of other network protocols, URI schemes, multiple versions of HTTP, and proxying. The project consists of a library (libcurl) and command-line tool (curl), which have been widely ported to different computing platforms. It was created by Daniel Stenberg, who is still the lead developer of the project.
Wget
GNU Wget (or just Wget, formerly Geturl, also written as its package name, wget) is a computer program that retrieves content from web servers. It is part of the GNU Project. Its name derives from "World Wide Web" and "get", a HTTP request method. It supports downloading via HTTP, HTTPS, and FTP.

YaCy
YaCy (pronounced “ya see”) is a free distributed search engine built on the principles of peer-to-peer (P2P) networks, created by Michael Christen in 2003. The engine is written in Java and distributed on several hundred "YaCy-peer" computers, .

HTTrack
HTTrack is a free and open-source Web crawler and offline browser, developed by Xavier Roche and licensed under the GNU General Public License Version 3.
Nutch
open source web crawler software
libwww
Libwww is an early World Wide Web software library providing core functions for web browsers, implementing HTML, HTTP, and other technologies. Tim Berners-Lee, at the European Organization for Nuclear Research (CERN), released libwww (then also called the Common Library) in late 1992, comprising reusable code from the first browsers (WorldWideWeb and Line Mode Browser).
Heritrix
Heritrix is a web crawler designed for web archiving. It was originally written in collaboration between the Internet Archive, National Library of Norway and National Library of Iceland. Heritrix is available under a free software license and written in Java. The main interface is accessible using a web browser, and there is a command-line tool that can optionally be used to initiate crawls.