Penetration Testing
CHAPTER 05

Information Gathering and Reconnaissance

Updated: May 15, 2026
25 min read

1. Introduction

In Hollywood movies, hackers hammer on their keyboards for 30 seconds and instantly breach the mainframe. In reality, a professional penetration test is mostly research: a common rule of thumb is 80% research and 20% exploitation. You cannot attack what you do not know exists. The first phase of the Security Assessment Lifecycle is Information Gathering (also known as Reconnaissance). In this chapter, we will explore Open Source Intelligence (OSINT) and learn how to passively extract large amounts of data about a target organization (its infrastructure, its employees, and its hidden subdomains) without ever directly interacting with its servers.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Define OSINT (Open Source Intelligence).
  • Differentiate between Passive and Active Reconnaissance.
  • Utilize public WHOIS databases to find domain ownership.
  • Discover hidden subdomains using tools and search engines.
  • Identify the security risks of oversharing on corporate social media.

3. Beginner-Friendly Explanation

Imagine you are planning to rob a highly secure museum (for an authorized security test, of course).
  • Active Reconnaissance: You walk up to the front door of the museum and start jiggling the handle to see if it's locked. The security guard sees you, writes down your description, and triggers an alarm.
  • Passive Reconnaissance (OSINT): You sit at a coffee shop across the street. You look up the museum's blueprints at the public library. You search LinkedIn to find out who the night security guard is. You read a public forum where an ex-employee complains that the back door's lock has been broken for months.

You just learned exactly how to break into the museum without ever touching the building. This is OSINT.

4. Passive vs. Active Reconnaissance

  • Passive: Gathering data from third-party public sources (Google, LinkedIn, public databases). Because you never contact the target's own systems, the target has no direct way to see that you are investigating them.
  • Active: Sending network packets directly to the target's servers (ping sweeps, port scans). The target's firewall or intrusion detection system can log your IP address. *Active recon requires written authorization.*

5. Open Source Intelligence (OSINT) Techniques

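1. WHOIS Lookups: Every registered domain has a public registration record. It typically lists the registrar, the registration and expiry dates, and the name servers. The registrant's name and email address may also appear, although many registries now redact them for privacy.

```bash
# In the Kali Linux terminal:
whois example.com
```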
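
Raw WHOIS output is long and mostly legal boilerplate. As a minimal sketch, the helper below filters it down to a few fields. The function name `whois_summary` and the list of field labels are assumptions for illustration; the labels fit many .com and .net style records, but other registries use different ones.

```shell
# whois_summary: read raw WHOIS text on stdin and print only a few
# commonly useful fields, with surrounding whitespace trimmed.
# The field labels are an assumption; adjust them for other registries.
whois_summary() {
    grep -Ei '^[[:space:]]*(Domain Name|Registrar|Creation Date|Registry Expiry Date|Name Server):' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Typical use (performs a real lookup, so it is left commented out):
#   whois example.com | whois_summary
```

Reading from standard input keeps the filter separate from the lookup, so it can also be run against WHOIS output saved to a file earlier.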

2. Subdomain Enumeration: A company might secure their main site (company.com), but forget to secure their hidden development site (dev-testing-portal.company.com). We can use tools or search engines to find these.

  • *Google Dorking:* Typing site:company.com -site:www.company.com into Google asks for every indexed page under company.com *except* those on the main www host, which tends to surface other subdomains.
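
Search results come back as a mix of URLs and page text, so the subdomains still have to be pulled out and de-duplicated. The following is a rough sketch of that step; `extract_subdomains` is a hypothetical helper name, and it assumes the results have been pasted or saved as plain text.

```shell
# extract_subdomains DOMAIN: read arbitrary text on stdin (for example,
# URLs copied from search results) and print the unique hostnames that
# end in .DOMAIN, lowercased and sorted. The apex domain itself is skipped.
extract_subdomains() {
    domain=$1
    # Escape dots so they match literally in the regular expression.
    escaped=$(printf '%s' "$domain" | sed 's/\./\\./g')
    tr 'A-Z' 'a-z' \
        | grep -Eo "([a-z0-9-]+\.)+${escaped}" \
        | sort -u
}

# Example use with a saved results file (the file name is hypothetical):
#   extract_subdomains company.com < search-results.txt
```

The pattern is deliberately simple. It would also match a lookalike host such as www.company.com.evil.org, so treat the output as a list of candidates to review by hand.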

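3. theHarvester: A Kali Linux tool that scrapes search engines and public databases to find employee email addresses, along with hostnames and subdomains tied to a domain.

```bash
# Supported -b sources vary by version; see "theHarvester -h".
theHarvester -d example.com -b google
```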

6. Mini Project: Perform OSINT Research Safely

*Ethical Note: Passive reconnaissance using public search engines is generally legal. However, do not run automated tools aggressively against companies you are not authorized to test. This walkthrough stays safe by querying a public lookup service rather than the target itself.*

Step-by-Step Walkthrough:

  1. Open a web browser and go to hunter.io (a public email-finding service).
  2. Type in a large, public organization's domain name. Notice how it reveals the most common email format (e.g., first.last@company.com).
  3. The Security Impact: If an attacker knows the email format, they can go to LinkedIn, find 50 employee names, and generate a list of 50 likely company email addresses. If the email address doubles as the login username, they now have half of the credentials required to attack the corporate VPN.
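
To see how little effort step 3 takes, here is a minimal sketch that turns a list of names into candidate addresses. The helper name `names_to_emails` is hypothetical, and the first.last pattern is only an assumption: a real organization may use a different format, and the generated addresses are unverified guesses.

```shell
# names_to_emails DOMAIN: read "First Last" names on stdin, one per
# line, and print first.last@DOMAIN for each. Middle names are ignored
# and lines with fewer than two words are skipped.
names_to_emails() {
    domain=$1
    tr 'A-Z' 'a-z' | awk -v d="$domain" 'NF >= 2 { print $1 "." $NF "@" d }'
}

# Example use with a hand-made list (the file name is hypothetical):
#   names_to_emails example.com < employee-names.txt
```

For defenders, running the same transformation against their own public staff listings is a quick way to gauge how predictable their usernames are.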

7. Real-World Scenarios

A penetration testing firm was hired to breach a bank. The bank's external firewalls were flawless. During the OSINT phase, the testers searched social media. They found an Instagram post from a new IT employee celebrating their first day. In the background of the photo, written on a whiteboard, was the default password for the company's internal Wi-Fi. The testers drove to the bank's parking lot, connected to the Wi-Fi using the password from the photograph, bypassed the external firewalls entirely, and gained access to the internal network.

8. Best Practices

  • Google Dorks (Advanced Operators): Penetration testers rely heavily on "Google Dorks." Typing site:example.com filetype:pdf restricts the results to PDF documents indexed under that domain. Companies often expose sensitive internal documents, network diagrams, or confidential memos to public search engines by accident.
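
The same operator pair is usually repeated for several document types. As a small sketch, the hypothetical helper `build_dorks` below prints one query per file extension so they can be pasted into a search engine one at a time; the choice of extensions is up to you.

```shell
# build_dorks DOMAIN EXT...: print one "site: filetype:" search query
# for each file extension given after the domain.
build_dorks() {
    domain=$1
    shift
    for ext in "$@"; do
        printf 'site:%s filetype:%s\n' "$domain" "$ext"
    done
}

# Example: build_dorks example.com pdf docx xlsx
```

It only builds the query strings and does not contact any search engine, which keeps it clear of automated-scraping limits.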

9. Security Recommendations

  • Metadata Scrubbing: When a company uploads a PDF or a Word document to its public website, the file usually contains hidden "Metadata." This metadata can reveal the exact version of Microsoft Word used, the internal username of the employee who created it, and internal network file paths. Defenders should run metadata scrubbing tools on documents before publishing them to the web to prevent OSINT data leakage.
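
Before publishing, a defender can run a quick check for leftover author fields. The sketch below is a rough heuristic with a hypothetical helper name, `pdf_info_fields`. It assumes the PDF stores its document information as uncompressed plain-text entries such as `/Author (jsmith)`; metadata kept in compressed streams or XMP packets will not show up, so a dedicated metadata tool such as ExifTool is more reliable.

```shell
# pdf_info_fields FILE: print plain-text document-information entries
# (Author, Creator, Producer, Title) found in FILE, de-duplicated.
# Heuristic only: compressed or encoded metadata is not detected.
pdf_info_fields() {
    grep -a -Eo '/(Author|Creator|Producer|Title)[[:space:]]*\([^)]*\)' "$1" | sort -u
}

# Example use (the file name is hypothetical):
#   pdf_info_fields annual-report.pdf
```

An empty result does not prove the file is clean. It only means this simple pattern found nothing.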

10. Troubleshooting Tips

  • OSINT Tools Failing: If command-line tools like theHarvester fail to return results, the usual cause is that search engines (like Google or Bing) have detected automated scraping and temporarily blocked your IP address (rate limiting). Wait a few minutes and retry with fewer queries, or use a VPN.
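
The "wait, then try again" advice can be scripted. Below is a minimal sketch of a retry wrapper with a doubling delay. The name `retry_with_backoff` is hypothetical, and it assumes the wrapped command exits with a non-zero status when it is blocked, which not every tool does.

```shell
# retry_with_backoff ATTEMPTS BASE_DELAY CMD...: run CMD until it
# succeeds or ATTEMPTS tries have been made, sleeping between tries
# and doubling the delay (in seconds) after each failure.
retry_with_backoff() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
        n=$((n + 1))
    done
}

# Example: up to 4 tries, waiting 60, 120, then 240 seconds between them.
#   retry_with_backoff 4 60 whois example.com
```

Doubling the delay gives the remote service progressively more time to lift a temporary block, rather than hitting it again at a fixed interval.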

11. Exercises

  1. Explain the fundamental difference between Active Reconnaissance and Passive Reconnaissance (OSINT).
  2. How can an attacker weaponize a list of employee names found on a corporate LinkedIn page?

12. FAQs

Q: Is OSINT illegal?
A: Generally, no. By definition, Open Source Intelligence relies on information that is publicly available on the internet. Browsing public websites, reading public DNS records, and searching Google are legal in most jurisdictions. However, using the gathered information to attack a system, for example by attempting to guess a password (Exploitation), is illegal without authorization.

13. Interview Questions

  • Q: Describe the methodology of "Google Dorking." How can advanced search operators be utilized during the reconnaissance phase to identify exposed, sensitive corporate data?
  • Q: Explain the operational intelligence value of Subdomain Enumeration during a Black-Box penetration test. Why are subdomains often more vulnerable than the primary apex domain?

14. Summary

In Chapter 5, we learned that a successful penetration test is won before a single hacking tool is ever launched. We explored the vast landscape of Open Source Intelligence (OSINT), realizing that the internet is a permanent record of corporate data, exposed subdomains, and employee behavior. We differentiated between the silent observation of Passive Reconnaissance and the loud probing of Active Reconnaissance. By understanding how to query WHOIS databases, enumerate hidden infrastructure, and leverage social media data, we built the intelligence profile necessary to identify the weakest links in an organization's perimeter.

15. Next Chapter Recommendation

We have gathered the intelligence from afar. Now, with strict authorization, we must actively scan the target's servers to find exactly which doors are unlocked and vulnerable. Proceed to Chapter 6: Vulnerability Assessment Fundamentals.
