Penetration testers can use theHarvester, a Python tool similar to Sublist3r. theHarvester collects emails, subdomains, hosts, employee names, open ports, and banners from a variety of public sources, including search engines such as Google, Bing, Baidu, and DuckDuckGo, as well as social platforms such as Twitter.
Anyone who wants to see what an attacker can learn about their organization through passive reconnaissance can use this tool.
This tutorial covers theHarvester installation and basic use for ethical hacking and penetration testing purposes only, giving you a good taste of what this tool can do in the intelligence-gathering process.
What is theHarvester?
theHarvester’s ability to extensively scrape publicly accessible information, such as business names and email addresses, from across the Internet makes it one of the best command-line tools currently available.
It is primarily used for passive reconnaissance, although it also supports active techniques such as DNS brute forcing and taking screenshots of discovered subdomains.
theHarvester uses a variety of techniques to gather information on its targets, including search engine dorking, DNS reverse lookups, and dictionary-based enumeration of hostnames through DNS brute forcing.
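To make the dictionary-based DNS brute forcing idea concrete, here is a minimal Python sketch. It is not theHarvester's own implementation; the function names `candidate_hosts` and `brute_force` are hypothetical, and the resolver is passed in as a parameter so the logic can be tried without sending real DNS queries.

```python
import socket


def candidate_hosts(domain, words):
    """Prefix each dictionary word to the domain to form candidate hostnames."""
    return [f"{word.strip().lower()}.{domain}" for word in words if word.strip()]


def brute_force(domain, words, resolver=socket.gethostbyname):
    """Return a {hostname: ip} dict for every candidate that resolves.

    `resolver` defaults to socket.gethostbyname; it is injectable so the
    function can be exercised offline (an assumption made for this sketch).
    """
    found = {}
    for host in candidate_hosts(domain, words):
        try:
            found[host] = resolver(host)
        except OSError:
            # socket.gaierror is a subclass of OSError: the name did not resolve.
            continue
    return found
```

Calling `brute_force("example.com", ["www", "mail", "vpn"])` with the default resolver sends one DNS lookup per word, which is why this counts as active rather than passive reconnaissance.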
You can download theHarvester from its official repository on GitHub, and it requires Python 3.7+.
- Users have reported mixed success using Docker to run theHarvester on Windows because of dependency issues.
- Recent Kali Linux releases, including 2022.4, come with theHarvester preinstalled.
On other Linux distributions that use apt, you can install it with the following command.
sudo apt-get install theharvester
If this doesn’t work, you can clone the GitHub repository instead.
git clone https://github.com/laramies/theHarvester.git
Make sure pip for Python 3 is installed, since it is needed to install theHarvester’s dependencies.
sudo apt-get install python3-pip
Now change into the cloned directory, install the prerequisites, and then run theHarvester with python3.
cd theHarvester
sudo pip3 install -r requirements.txt
sudo python3 ./theHarvester.py
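Before or after installing, it can help to confirm that the environment meets the requirements mentioned above. The following is a small sketch, assuming the Python 3.7 minimum stated in this tutorial and that the executable on your PATH is named `theHarvester`; the helper names are hypothetical.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in this tutorial


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def find_theharvester():
    """Return the path of the theHarvester executable, or None if not on PATH."""
    return shutil.which("theHarvester")


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("theHarvester on PATH:", find_theharvester() or "not found")
```

If the tool was cloned from GitHub rather than installed as a package, it will usually not be on PATH, and the second check is expected to report "not found".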
theHarvester Syntax and Options
Running theHarvester with the -h option prints the following command-line syntax:
theHarvester [-h] -d DOMAIN [-l LIMIT] [-S START] [-g] [-p] [-s] [--screenshot SCREENSHOT] [-v] [-e DNS_SERVER] [-t DNS_TLD] [-r] [-n] [-c] [-f FILENAME] [-b SOURCE]
Using theHarvester to Scan
Because search engines dislike data scraping, theHarvester adds pauses between requests to avoid detection.
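The idea of pausing between requests can be illustrated with a short Python sketch. This is not how theHarvester implements its delays; the base and jitter values and the function names are assumptions chosen for illustration only.

```python
import random
import time


def jittered_delays(count, base=2.0, jitter=3.0, rng=None):
    """Return `count` delays in seconds, each between base and base + jitter."""
    if rng is None:
        rng = random.Random()
    return [base + rng.uniform(0, jitter) for _ in range(count)]


def run_with_pauses(tasks, delays, sleep=time.sleep):
    """Call each task in turn, sleeping for the matching delay afterwards.

    `sleep` is injectable so the loop can be tried without actually waiting.
    """
    results = []
    for task, delay in zip(tasks, delays):
        results.append(task())
        sleep(delay)
    return results
```

Randomizing the delay, rather than sleeping for a fixed interval, makes the request pattern look less mechanical, which is the usual motivation for this kind of pause.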
It is advisable to limit the number of returned results, which reduces the number of searches theHarvester has to make. This also gives you a list that is easier to manage.
theHarvester can save your scan data in HTML and XML formats.
NOTE: If no parameters are entered when running a scan, theHarvester will return zero results.
In the sample scan above, theHarvester reported that no IPs, emails, or hosts were found. The cause was that the scan parameters were entered incorrectly.
How to use theHarvester
Note: As the syntax above shows, theHarvester requires a domain [-d] to start a scan; the source [-b] and limit [-l] arguments are the options you will most often combine with it.
The domain or company name to be searched is specified using the -d parameter.
The -l parameter specifies the maximum number of search results to retrieve from the chosen search engine.
The source search engine for your query, such as Google, Bing, Yahoo, etc., is specified by the -b parameter.
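The three parameters described above can be assembled into a command line programmatically. The sketch below only builds and prints the command; `build_command` is a hypothetical helper, and it assumes the executable is named `theHarvester` and that only -d is mandatory, as in the syntax line shown earlier.

```python
import shlex


def build_command(domain, limit=None, source=None):
    """Build a theHarvester argument list from -d, -l, and -b values."""
    if not domain:
        raise ValueError("a target domain is required (-d)")
    cmd = ["theHarvester", "-d", domain]
    if limit is not None:
        if limit <= 0:
            raise ValueError("limit (-l) must be a positive integer")
        cmd += ["-l", str(limit)]
    if source is not None:
        cmd += ["-b", source]
    return cmd


if __name__ == "__main__":
    args = build_command("hackreveal.com", limit=50, source="linkedin")
    # Quote each argument so the printed line is safe to paste into a shell.
    print(" ".join(shlex.quote(a) for a in args))
    # prints: theHarvester -d hackreveal.com -l 50 -b linkedin
```

Keeping the arguments as a list means they can be handed directly to `subprocess.run` without going through a shell, which avoids quoting problems if a domain or source name ever contains unusual characters.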
IMPORTANT: The “all” option has been removed from theHarvester’s list of optional parameters.
WARNING: Increasing your search limit can cause Google or other search engines to temporarily block you. Because the default search LIMIT setting is 500, theHarvester will examine the first 500 search results.
Since Google shows 10 search results on each page, the default setting will scan 50 pages.
For Google [or any other search engine], that is more than enough to recognize you as a scraper and either display a CAPTCHA challenge or block your requests.
To work around this restriction, it is strongly advised to use a proxy service [like Storm Proxies] that routes all requests via a different IP address.
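The page arithmetic in the warning above (a limit of 500 results at 10 results per page gives 50 pages) can be checked with a few lines of Python. The function name `pages_for_limit` is hypothetical, and the 10-results-per-page figure is the one given in this tutorial.

```python
import math


def pages_for_limit(limit, results_per_page=10):
    """Return how many result pages must be fetched to cover `limit` results."""
    if limit < 0 or results_per_page <= 0:
        raise ValueError("limit must be >= 0 and results_per_page must be > 0")
    return math.ceil(limit / results_per_page)


if __name__ == "__main__":
    print(pages_for_limit(500))  # 50 pages at the default LIMIT of 500
    print(pages_for_limit(50))   # 5 pages with -l 50
```

Lowering the limit from 500 to 50 cuts the number of page requests by a factor of ten, which is why a small -l value is the simplest way to stay below a search engine's scraping threshold.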
Let’s use LinkedIn as the source and run the following command to scan the hackreveal.com domain with the limit set to 50:
theHarvester -d hackreveal.com -l 50 -b linkedin
According to the screenshot below, the search result for “hackreveal.com” revealed 75 LinkedIn references.
One of the best OSINT tools available, theHarvester is simple to use and essential to have in your toolbox for both active and passive reconnaissance.