Category

Web scraping

Wireshark
Wireshark is free and open-source packet analyzer software. It is used for computer network analysis and troubleshooting, software and communications protocol development, and education. Originally named Ethereal, the project was renamed Wireshark in May 2006 due to trademark issues.
curl
cURL (pronounced like "curl") is a free and open-source command-line application for uploading and downloading individual files. It can download a URL from a web server over HTTP, and supports a variety of other network protocols, URI schemes, multiple versions of HTTP, and proxying. The project consists of a library (libcurl) and a command-line tool (curl), both of which have been widely ported to different computing platforms. It was created by Daniel Stenberg, who is still the lead developer of the project.
robots.txt
robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit.
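As a rough illustration of how a crawler might consult these rules, the sketch below uses Python's standard-library `urllib.robotparser`. The rule set, the `ExampleBot` user agent, and the `example.com` URLs are hypothetical and chosen only for demonstration; a real crawler would fetch the site's actual robots.txt instead of parsing an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether a (hypothetical) crawler named ExampleBot may fetch each URL.
allowed_public = parser.can_fetch("ExampleBot", "https://example.com/index.html")
allowed_private = parser.can_fetch("ExampleBot", "https://example.com/private/data.html")
print(allowed_public, allowed_private)
```

Note that Python's parser applies the first matching rule in file order, so the `Disallow` line is listed before the broader `Allow` line here; other crawlers may resolve overlapping rules differently.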
Wget
GNU Wget (or just Wget, formerly Geturl, also written as its package name, wget) is a computer program that retrieves content from web servers. It is part of the GNU Project. Its name derives from "World Wide Web" and "get", an HTTP request method. It supports downloading via HTTP, HTTPS, and FTP.
Firebug
web development add-on for Firefox
web scraping
data scraping used for extracting data from websites
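A minimal sketch of the idea, assuming the page's HTML has already been downloaded: the standard-library `html.parser` module can walk the markup and pull out one kind of data, here the link targets. The `LinkExtractor` class, the `extract_links` helper, and the sample HTML are hypothetical names and data used only for illustration; production scrapers typically rely on dedicated libraries and must handle malformed markup, encodings, and rate limits.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Return all link targets found in an HTML string, in document order."""
    extractor = LinkExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.links


# Hypothetical page content standing in for a downloaded document.
SAMPLE_HTML = (
    '<html><body><a href="/a">A</a><p>text</p>'
    '<a href="https://example.com/b">B</a></body></html>'
)
print(extract_links(SAMPLE_HTML))
```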
Greasemonkey
Greasemonkey is a userscript manager made available as a Mozilla Firefox extension. It enables users to install scripts that make on-the-fly changes to web page content after or before the page is loaded in the browser (also known as augmented browsing).
Scrapy
Scrapy is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler. It is currently maintained by Zyte (formerly Scrapinghub), a web-scraping development and services company.
HTTrack
HTTrack is a free and open-source Web crawler and offline browser, developed by Xavier Roche and licensed under the GNU General Public License Version 3.
Apache Camel
open source framework for enterprise integration
Anubis
anti-web scraping software
Yahoo! Pipes
web application
iMacros
iMacros was a browser-based application for macro recording, editing and playback for web automation and testing. It was provided as a standalone application and extension for Mozilla Firefox, Google Chrome, and Internet Explorer web browsers. Developed by iOpus/Ipswitch, it added record and replay functionality similar to that found in web testing and form filler software. The macros can be combined and controlled via JavaScript. Demo macros and JavaScript code examples were included with the software. Running strictly JavaScript-based macros was removed in later versions of iMacros browser e
Watir
Watir (Web Application Testing in Ruby, pronounced "water") is an open-source family of Ruby libraries for automating web browsers. It drives Internet Explorer, Firefox, Chrome, Opera and Safari, and is available as a RubyGems gem. Watir was primarily developed by Bret Pettichord and Paul Rogers.