# CHAPTER 5
Information Gathering and Reconnaissance
1. Introduction
In Hollywood movies, hackers smash their keyboards for 30 seconds and instantly breach the mainframe. In reality, a professional penetration test is mostly research: as a rough rule of thumb, about 80% research and 20% exploitation. You cannot attack what you do not know exists. The very first phase of the Security Assessment Lifecycle is Information Gathering (also known as Reconnaissance). In this chapter, we will explore Open Source Intelligence (OSINT) and learn how to passively extract vast amounts of data about a target organization (its infrastructure, its employees, and its hidden subdomains) without ever directly interacting with its servers.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Define OSINT (Open Source Intelligence).
- Differentiate between Passive and Active Reconnaissance.
- Utilize public WHOIS databases to find domain ownership.
- Discover hidden subdomains using tools and search engines.
- Identify the security risks of oversharing on corporate social media.
3. Beginner-Friendly Explanation
Imagine you are planning to rob a highly secure museum (for an authorized security test, of course).
- Active Reconnaissance: You walk up to the front door of the museum and start jiggling the handle to see if it's locked. The security guard sees you, writes down your description, and triggers an alarm.
- Passive Reconnaissance (OSINT): You sit at a coffee shop across the street. You look up the museum's blueprints at the public library. You search LinkedIn to find out who the night security guard is. You read a public forum where an ex-employee complains that the back door's lock has been broken for months.
You just learned exactly how to break into the museum without ever touching the building. This is OSINT.
4. Passive vs. Active Reconnaissance
- Passive: Gathering data from third-party public sources (Google, LinkedIn, public databases). Because no traffic reaches the target's own systems, the target has no way to see that you are investigating them.
- Active: Sending network packets directly to the target's servers (ping sweeps, port scans). The target's firewall or intrusion detection system can log your IP address. *Active recon requires written authorization.*
5. Open Source Intelligence (OSINT) Techniques
1. WHOIS Lookups: Every registered domain has a public registration record. It typically lists the registrar, the registration and expiry dates, and the name servers; the owner's name and email address may also appear, although many registrars now redact them for privacy.
2. Subdomain Enumeration: A company might secure its main site (company.com) but forget to secure a hidden development site (dev-testing-portal.company.com). We can use tools or search engines to find these.
- *Google Dorking:* Typing `site:company.com -www` into Google asks it to show pages indexed under `company.com` *except* those on the main `www` site, which tends to reveal other subdomains.
3. theHarvester: A command-line tool included with Kali Linux that gathers employee email addresses from search engines and other public data sources.
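Once a search or a tool has returned a batch of URLs, the subdomain enumeration step comes down to pulling out the hostnames that sit under the target's apex domain. The following is a minimal sketch of that filtering step, not how any particular tool implements it. The function name `extract_subdomains` and the sample URLs are hypothetical and exist only for illustration.

```python
from urllib.parse import urlsplit


def extract_subdomains(urls, apex):
    """Return the sorted, unique hostnames under `apex` found in `urls`."""
    apex = apex.lower().rstrip(".")
    found = set()
    for url in urls:
        # urlsplit only fills in the hostname when a scheme or "//" is
        # present, so bare entries like "dev.company.com/x" get a prefix.
        if "//" not in url:
            url = "//" + url
        host = urlsplit(url).hostname  # already lowercased by urlsplit
        if host is None:
            continue
        host = host.rstrip(".")
        # Keep true subdomains only: this skips the apex itself and
        # look-alike domains such as "notcompany.com".
        if host.endswith("." + apex):
            found.add(host)
    return sorted(found)


# Hypothetical search-result URLs, not real data.
results = [
    "https://www.company.com/about",
    "https://dev-testing-portal.company.com/login",
    "http://mail.company.com/",
    "https://notcompany.com/",
    "dev-testing-portal.company.com/reset",
]
print(extract_subdomains(results, "company.com"))
```

Matching on `"." + apex` rather than on the bare apex string is the design choice that matters here: a plain suffix check would wrongly accept unrelated domains that merely end with the same letters.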
6. Mini Project: Perform OSINT Research Safely
*Ethical Note: Passive reconnaissance using public search engines is generally legal. However, do not run automated tools aggressively against companies. We will use a safe, designated target.*
Step-by-Step Walkthrough:
- 1. Open a web browser and go to `hunter.io` (a public email-finding service).
- 2. Type in a large, public organization's domain name. Notice how it reveals the email format (e.g., `first.last@company.com`).
- 3. The Security Impact: If an attacker knows the email format, they can go to LinkedIn, find 50 employee names, and generate a list of 50 likely valid company email addresses. Where the corporate VPN uses email addresses as usernames, they now have half of the login credentials required to breach it.
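To see how little effort the name-to-email step takes, here is a small sketch that a security team could run against its own staff list to gauge exposure. It assumes a simple pattern syntax (`{first}`, `{last}`, `{f}`, `{l}`) invented for this example; the function name `candidate_emails` and the names below are hypothetical. It does not handle accents, apostrophes, or hyphenated names.

```python
def candidate_emails(names, domain, pattern="{first}.{last}"):
    """Build likely email addresses from full names and a known format."""
    emails = []
    for full_name in names:
        parts = full_name.lower().split()
        if len(parts) < 2:
            continue  # the pattern needs both a first and a last name
        # Middle names and initials are ignored: first and last token only.
        first, last = parts[0], parts[-1]
        local = pattern.format(first=first, last=last, f=first[0], l=last[0])
        emails.append(f"{local}@{domain}")
    return emails


# Hypothetical employee names for illustration only.
staff = ["Ada Lovelace", "Grace B. Hopper", "Cher"]
print(candidate_emails(staff, "company.com"))
print(candidate_emails(staff, "company.com", pattern="{f}{last}"))
```

The output is a list of guesses, not confirmed mailboxes, which is why the addresses should be described as likely valid rather than valid.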
7. Real-World Scenarios
A penetration testing firm was hired to breach a bank. The bank's external firewalls were flawless. During the OSINT phase, the testers searched social media. They found an Instagram post from a new IT employee celebrating their first day. In the background of the photo, written on a whiteboard, was the default password for the company's internal Wi-Fi. The testers drove to the bank's parking lot, connected to the Wi-Fi using the password from the photograph, bypassed the external firewalls entirely, and gained access to the internal network.
8. Best Practices
- Google Dorks (Advanced Operators): Penetration testers rely heavily on "Google Dorks." Typing `site:example.com filetype:pdf` restricts Google's results to PDF documents indexed under that domain. Companies often expose sensitive internal documents, network diagrams, or confidential memos to public search engines by accident.
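Dork queries are just strings built from operators, so it can help to assemble them in one place and keep a record of exactly what was searched during an engagement. The sketch below composes the `site:` and `filetype:` operators and an optional exclusion term, then URL-encodes the result for pasting into a browser. The helper names `build_dork` and `search_url` are hypothetical, and the code only builds strings; it sends no requests.

```python
from urllib.parse import urlencode


def build_dork(domain, filetype=None, exclude_terms=()):
    """Compose a search query from common advanced operators."""
    parts = [f"site:{domain}"]
    if filetype:
        parts.append(f"filetype:{filetype}")
    # A leading minus sign excludes results containing that term.
    parts.extend(f"-{term}" for term in exclude_terms)
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL for manual use in a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_dork("example.com", filetype="pdf")
print(query)
print(search_url(query))
print(build_dork("company.com", exclude_terms=["www"]))
```

Opening the resulting URL by hand keeps the activity within normal search-engine use; looping over many queries automatically is the kind of scraping that tends to trigger blocking.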
9. Security Recommendations
- Metadata Scrubbing: When a company uploads a PDF or a Word document to their public website, the file contains hidden "Metadata." This metadata often reveals the exact version of Microsoft Word used, the internal username of the employee who created it, and internal network file paths. Defenders must utilize metadata scrubbing tools before publishing documents to the web to prevent OSINT data leakage.
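Defenders can check what a document would leak before it is published. A `.docx` file is a ZIP archive, and its author and application details normally live in `docProps/core.xml` and `docProps/app.xml`. The sketch below reads those fields with the Python standard library only. The function name `read_docx_metadata` is hypothetical, and the demo builds a tiny stand-in archive in memory (with a made-up username) rather than a full Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the Office Open XML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def read_docx_metadata(source):
    """Return author/application fields from a .docx path or file object."""
    meta = {}
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(archive.read("docProps/core.xml"))
            meta["creator"] = core.findtext("dc:creator", namespaces=NS)
            meta["last_modified_by"] = core.findtext("cp:lastModifiedBy", namespaces=NS)
        if "docProps/app.xml" in names:
            app = ET.fromstring(archive.read("docProps/app.xml"))
            meta["application"] = app.findtext("ep:Application", namespaces=NS)
            meta["app_version"] = app.findtext("ep:AppVersion", namespaces=NS)
    return meta


def make_demo_docx():
    """Build a minimal in-memory archive holding only the property parts."""
    core = (
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:creator>jsmith</dc:creator>"
        "<cp:lastModifiedBy>jsmith</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    app = (
        f'<Properties xmlns="{NS["ep"]}">'
        "<Application>Microsoft Office Word</Application>"
        "<AppVersion>16.0000</AppVersion>"
        "</Properties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core)
        archive.writestr("docProps/app.xml", app)
    buffer.seek(0)
    return buffer


print(read_docx_metadata(make_demo_docx()))
```

Running a check like this over every file queued for publication shows which documents still carry usernames or software versions and therefore need scrubbing first.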
10. Troubleshooting Tips
- OSINT Tools Failing: If command-line tools like `theHarvester` fail to return results, it is often because search engines (like Google or Bing) have detected automated scraping and temporarily blocked your IP address (rate limiting). Wait a few minutes before retrying, or use a VPN.
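In your own scripts, the "wait a few minutes" advice can be written as a retry with exponential backoff: each failed attempt waits twice as long as the one before. This is a generic sketch, not something theHarvester exposes. The `RateLimited` exception, the `with_backoff` helper, and the delay values are all assumptions made for illustration, and the demo injects a fake `sleep` so it runs instantly.

```python
import time


class RateLimited(Exception):
    """Raised by a fetch function when the remote service blocks the request."""


def with_backoff(fetch, retries=4, base_delay=60.0, sleep=time.sleep):
    """Call `fetch`, doubling the wait after each rate-limit failure."""
    for attempt in range(retries):
        try:
            return fetch()
        except RateLimited:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** attempt)


# Demo with a stand-in fetch that is blocked twice, then succeeds.
attempts = {"count": 0}


def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "results"


waits = []  # record the delays instead of actually sleeping
print(with_backoff(flaky_fetch, base_delay=1.0, sleep=waits.append))
print(waits)
```

Passing `sleep` as a parameter keeps the function testable; in real use the default `time.sleep` applies and the waits are genuine.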
11. Exercises
- 1. Explain the fundamental difference between Active Reconnaissance and Passive Reconnaissance (OSINT).
- 2. How can an attacker weaponize a list of employee names found on a corporate LinkedIn page?
12. FAQs
Q: Is OSINT illegal? A: Generally, no. By definition, Open Source Intelligence relies on information that is publicly available on the internet. Browsing public websites, reading public DNS records, and searching Google are legal in most jurisdictions. However, using the gathered information to attempt to guess a password (Exploitation) is illegal without authorization.
13. Interview Questions
- Q: Describe the methodology of "Google Dorking." How can advanced search operators be utilized during the reconnaissance phase to identify exposed, sensitive corporate data?
- Q: Explain the operational intelligence value of Subdomain Enumeration during a Black-Box penetration test. Why are subdomains often more vulnerable than the primary apex domain?