If you’re reading this, you’ve probably used tools like Ffuf or Gobuster to fuzz an application, expanding the attack surface and potentially uncovering sensitive files and directories. Unfortunately, we here at Brackish find that a lot of testers approach fuzzing incorrectly. Read on for our Rules of Fuzzing, plus a generic script for building a custom wordlist if you don’t want to use the tools that are widely available.
Brackish Fuzzing Rules
- CREATE A CUSTOM WORDLIST: Don’t just use the default wordlists in Kali or clone SecLists and call it done. There are many prebuilt tools for this – one you probably already know is Cewl. Brackish has a custom Golang tool that uses Natural Language Processing and AI/ML to create custom wordlists based on a client’s external assets, industry, and other parameters. This is what we use in our ASM/CPT tool Pincher.
- FUZZ ALL DIRECTORIES: Don’t just fuzz / and be done. Make sure you also fuzz directories like /static. You may think nothing can hide in there, but Brackish testers have found server-side source code and leaked credentials in these directories.
- USE DIFFERENT HTTP METHODS: You probably just fuzz with GET requests, right? Try other HTTP methods like POST and PUT as well – see the sketch after this list.
- FUZZ RECURSIVELY: Use tools that can fuzz recursively, so every newly discovered directory gets fuzzed in turn, and put them to work.
- FAMILIARIZE YOURSELF WITH DIFFERENT TOOLS: There is a plethora of fuzzing tools out there, each with its own nuances and functionality. And don’t forget you can simply fuzz in Burp Suite, which is handy for authenticated testing.
- USE CANARIES AND PAYLOADS: Fuzz with Blind XSS, SSRF, and RCE payloads or canary tokens. You never know where your payload will show up.
- USE PROPER FILE EXTENSIONS: Match extensions to the target’s technology stack. Why are you fuzzing a Java web application with a wordlist containing .php files?
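The short sketch below ties a few of these rules together: it tries multiple HTTP methods and stack-appropriate extensions for each word, and tags every request with a canary header. The target URL, wordlist filename, extension list, and canary host are all hypothetical placeholders – and in practice a purpose-built fuzzer like Ffuf will be far faster than plain Python.

#!/usr/bin/env python3
import requests

TARGET = 'https://target.example'  # hypothetical target
METHODS = ['GET', 'POST', 'PUT']   # don't stop at GET
EXTENSIONS = ['', '.jsp', '.do']   # match the stack (Java here), not .php
# Hypothetical canary host you control (e.g., a collaborator-style domain)
CANARY_HEADERS = {'Referer': 'https://your-canary-host.example/'}

with open('words.txt') as f:  # assumed wordlist, one word per line
    words = [line.strip() for line in f if line.strip()]

for word in words:
    for ext in EXTENSIONS:
        for method in METHODS:
            url = f'{TARGET}/{word}{ext}'
            try:
                resp = requests.request(method, url, headers=CANARY_HEADERS, timeout=3)
            except requests.RequestException:
                continue
            if resp.status_code != 404:  # filter not-found noise; tune per target
                print(f'{method} {url} -> {resp.status_code}')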
Sample Custom Tool
Here is a very basic Python script that builds a wordlist from a file containing links, one per line.
#!/usr/bin/env python3
import requests
from bs4 import BeautifulSoup
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm
import logging

# Download the NLTK data needed for tokenization and stopword filtering
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)
nltk.download('stopwords', quiet=True)

stop_words = set(stopwords.words('english'))
logging.basicConfig(filename='errors.log', level=logging.ERROR)

# The wordlist is shared across worker threads, so guard it with a lock
wordlist_lock = threading.Lock()
wordlist = []

headers = {
    'User-Agent': (
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/107.0.0.0 Safari/537.36'
    )
}

def process_url(url):
    """Fetch a URL, strip markup, and add its tokens to the shared wordlist."""
    try:
        response = requests.get(url, timeout=3, headers=headers)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        # Drop script and style blocks so only visible text is tokenized
        for element in soup(['script', 'style']):
            element.decompose()
        text = soup.get_text(separator=' ', strip=True)
        tokens = word_tokenize(text)
        tokens = [
            word.lower() for word in tokens
            if word.isalpha() and word.lower() not in stop_words
        ]
        with wordlist_lock:
            wordlist.extend(tokens)
    except Exception as e:
        logging.error(f"Error processing {url}: {e}")

def main():
    with open('links.txt', 'r') as file:
        urls = [line.strip() for line in file if line.strip()]

    max_threads = max(1, min(30, len(urls)))  # cap the pool, but keep at least one worker
    progress_bar = tqdm(total=len(urls), desc="Processing URLs")
    futures = []
    try:
        with ThreadPoolExecutor(max_workers=max_threads) as executor:
            for url in urls:
                futures.append(executor.submit(process_url, url))
            for future in as_completed(futures):
                progress_bar.update(1)
    except KeyboardInterrupt:
        print("\nScript interrupted by user. Saving collected data...")
    finally:
        progress_bar.close()
        # Deduplicate while preserving first-seen order
        unique_words = list(dict.fromkeys(wordlist))
        with open('wordlist.txt', 'w') as f:
            for word in unique_words:
                f.write(f"{word}\n")
        print("Wordlist has been saved to 'wordlist.txt'.")

if __name__ == '__main__':
    main()
While you could just use this script as is, it will most likely be far from optimal. We suggest you think of ways to enhance it. For example, what if you added some OpenAI API calls to analyze the targets and come up with other custom words pertinent to the particular company or industry? How about filtering the wordlist further to remove entries that are unlikely to lead to discoveries? Or, what if you try to parse script tags (hint, hint)? A sketch of that last idea follows. The options are endless.
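Notice that the script above throws script tags away before tokenizing. A rough sketch of the opposite approach, mining inline JavaScript for identifier-like tokens, might look like the following; the URL is a placeholder, and you would fold this logic into process_url() rather than run it standalone.

import re
import requests
from bs4 import BeautifulSoup

# Hypothetical URL for illustration
response = requests.get('https://target.example', timeout=3)
soup = BeautifulSoup(response.content, 'html.parser')

js_words = set()
for script in soup.find_all('script'):
    if script.string:  # inline JavaScript only
        # Pull identifier-like tokens: variable, function, and endpoint names
        js_words.update(re.findall(r'[A-Za-z_][A-Za-z0-9_]{2,}', script.string))
    elif script.get('src'):
        # External bundles are often worth fetching and tokenizing too
        print(f"External script to mine: {script['src']}")

print('\n'.join(sorted(js_words)))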
If you like this post and want to expand the attack surface even more, consider reading about IIS Short File Name enumeration, another way to fuzz and discover assets.
Happy hunting and pentesting!