File analysis
Glossary
Magic Bytes = signature bytes at the beginning of a file that identify its format
End Bytes = trailer bytes that mark the end of the file
Metadata
File Type
file <file>
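The `file` command relies largely on magic bytes. As a rough illustration, the sketch below checks a buffer against the signatures listed in this cheat sheet. `SIGNATURES`, `guess_type` and `guess_file_type` are hypothetical names, and the table is far from exhaustive.

```python
# Minimal magic-byte lookup (illustrative only; signatures taken from this cheat sheet).
SIGNATURES = [
    (bytes.fromhex("89504E470D0A1A0A"), "PNG"),
    (bytes.fromhex("255044462D"), "PDF"),
    (bytes.fromhex("7F454C46"), "ELF"),
    # Only the first three JPEG bytes are fixed; the fourth varies (E0 for JFIF, E1 for Exif).
    (bytes.fromhex("FFD8FF"), "JPEG"),
    (bytes.fromhex("1F8B"), "GZIP"),
    (bytes.fromhex("504B"), "ZIP"),
    (bytes.fromhex("4D5A"), "MZ executable"),
]


def guess_type(data: bytes) -> str:
    """Return the label of the first signature that prefixes data, else 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def guess_file_type(path: str) -> str:
    """Read the first bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return guess_type(fh.read(16))
```

A signature match only says what the header claims; a renamed or polyglot file still needs a closer look.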
Searching within files with grep / findstr
Grep
grep <option> <regex> <folder/file> | <filter>
# -e PATTERN : pattern to search for (can be repeated)
# -E --extended-regexp : interpret the pattern as an extended regular expression
# -i --ignore-case : ignore case
# -R -r, --recursive : recursive search within the directory
# -l --files-with-matches : display the file name, not the matching text
# -v --invert-match : select data that does not match the pattern
# -n --line-number : display the line number
# -o --only-matching : display only the matching portion
# -A : number of lines to display after the match
# -B : number of lines to display before the match
grep -Roni -E "<regex>"
Findstr
findstr /R /S "<regex>" .\*.*
Links
grep -Roni -E "[a-zA-Z]{2,5}://[^]\"\<\>\^\`\{\|\}]*" ./folder/ | sort -u
http & https
grep -Roni -E "(http|https)://[^]\"\<\>\^\`\{\|\}]*" ./folder/ | sort -u
Mails
grep -Roni -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,10}' ./folder/ | sort -u
Domain extraction
grep -Roni -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,10}" ./folder/ | sort -u | grep -oi -E "@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,10}" | sort -u
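The same e-mail / domain extraction can be sketched in Python when the text is already in memory. `EMAIL_RE` and `extract_domains` are hypothetical names; the pattern mirrors the grep expression above, with a capture group around the domain.

```python
import re

# Same character classes as the grep pattern; group 1 captures the domain part.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,10})")


def extract_domains(text: str) -> list[str]:
    """Return the sorted, de-duplicated, lower-cased domains of all e-mail addresses in text."""
    return sorted({match.group(1).lower() for match in EMAIL_RE.finditer(text)})
```

Lower-casing before de-duplication plays the role of `sort -u` combined with `-i`.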
IP Addresses
IPv4 :
grep -Roni -E '\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b' ./folder/ | grep -E '^0' -v | sort -u
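As an alternative to the long octet regex, a looser pattern can be combined with the standard `ipaddress` module for validation. This is only a sketch; `CANDIDATE_RE` and `extract_ipv4` are hypothetical names.

```python
import ipaddress
import re

# Loose candidate pattern; real validation is delegated to ipaddress.
CANDIDATE_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ipv4(text: str) -> list[str]:
    """Return the unique valid IPv4 addresses found in text, in numeric order."""
    found = set()
    for candidate in CANDIDATE_RE.findall(text):
        try:
            found.add(str(ipaddress.IPv4Address(candidate)))
        except ipaddress.AddressValueError:
            continue  # e.g. an octet above 255
    return sorted(found, key=ipaddress.IPv4Address)
```

Version strings such as 1.2.3.4 still match, so results need the same manual review as the grep output.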
Secrets
AWS / Google / JWT / SSH keys / API keys and other secrets:
grep -Roni -E "AKIA|AIza|eyJ|PRIVATE\sKEY|API[[:space:]-]KEY|ghp_|gitlab_|bitbucket_|xox[baprs]|sk_live_|sk_test_|api_key|secret_key|access_token|auth_token|password|private_key|client_secret|jwt_token|db_password|api_secret|encryption_key|app_secret|oauth_token|ssh_key|master_key|session_token|auth_key|service_account_key|refresh_token|service_account|(postgres|mysql)://"
grep -Roni -E "()"
Text Search
grep -Rnil "text-to-find-here" ./folder/
grep -Rni "text-to-find-here" -A 3 -B 3 ./folder/
Searching in Binary Files
grep <...> ./binary.raw --binary-files=text
Regex search in JSON objects
This Python program returns the ids of objects that match a wordlist of regular expressions. The search is restricted to a predefined list of interesting fields within each object.
import sys
import os

if len(sys.argv) < 4:
    print("Usage: python search_string.py '<json_file_path>' regex_wordlist.txt fields.txt")
    sys.exit(1)

files = sys.argv[1]
regex_wordlist = sys.argv[2]
interesting_fields = sys.argv[3]

# Load the regex wordlist and the field selectors, one entry per line (blank lines skipped)
with open(regex_wordlist, "r") as file:
    search_regex = [line.strip() for line in file if line.strip()]
with open(interesting_fields, "r") as file:
    search_fields = [line.strip() for line in file if line.strip()]

all_commands = []
print(search_fields, "\n\n")

# Build one jq command per regex
for regex in search_regex:
    # TODO: replace ".users[]" with your object array selector
    search = "echo '#regex:"+regex+"'; cat "+files+" | jq -C '[.users[] | select( "
    array = []
    # Double the backslashes so they survive inside the jq string literal
    regex = regex.replace('\\','\\\\')
    for s in search_fields:
        array.append('('+s+' and ('+s+' | test("'+regex+'";"i")))')
    search += ' or '.join(array)
    search += ")] | .[].id'"
    all_commands.append(search)
    print(search)

# Run the generated commands
print("\n\n#results:>")
for cmd in all_commands:
    os.system(cmd)
# regex_wordlist.txt
API_KEY
pass[a-zA-Z0-9]+
ADMIN[0-9]{2,3}
# fields.txt
.id
.value
.info.comment
.options[0].details
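The same idea can be sketched in pure Python, without shelling out to jq. This is a simplified variant under stated assumptions: `get_field` and `matching_ids` are hypothetical helpers, the path syntax only covers `.key` and `[index]`, only string values are tested, the object array is passed in directly (already selected), and ids are returned as one merged list rather than per regex.

```python
import re

# Tokens of a jq-like path: ".key" or "[index]".
TOKEN_RE = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def get_field(obj, path):
    """Resolve a path such as '.options[0].details'; return None if any step is absent."""
    for key, index in TOKEN_RE.findall(path):
        try:
            obj = obj[key] if key else obj[int(index)]
        except (KeyError, IndexError, TypeError):
            return None
    return obj


def matching_ids(objects, patterns, fields):
    """Return the ids of objects where any listed field matches any pattern (case-insensitive)."""
    compiled = [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
    ids = []
    for obj in objects:
        values = [get_field(obj, field) for field in fields]
        texts = [value for value in values if isinstance(value, str)]
        if any(rx.search(text) for rx in compiled for text in texts):
            ids.append(obj.get("id"))
    return ids
```

With `json.load`, the `objects` argument would be the list selected from the parsed document, for example `data["users"]`.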
Data Decoding
Base64
echo "<string>" | base64 -d
cat <file> | grep -oE "[A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==" | base64 -d
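A hedged Python counterpart: find Base64-looking runs and decode only those that validate. `B64_RE` and `decode_base64_candidates` are hypothetical names, and the minimum length of 16 characters is an arbitrary assumption meant to reduce noise from ordinary words.

```python
import base64
import binascii
import re

# At least four full 4-character groups, optionally followed by one padded group.
B64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4}){4,}(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def decode_base64_candidates(text: str) -> list[bytes]:
    """Decode every Base64-looking run in text, skipping runs that fail strict validation."""
    decoded = []
    for candidate in B64_RE.findall(text):
        try:
            decoded.append(base64.b64decode(candidate, validate=True))
        except binascii.Error:
            continue
    return decoded
```

A run glued to preceding alphanumeric characters may be matched at the wrong 4-character alignment and decode to garbage, so the output still needs a sanity check.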
Hexadecimal
echo "<hexadecimal>" | xxd -r -p > output
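The `xxd -r -p` conversion has a direct standard-library equivalent. In this small sketch, `decode_hex` is a hypothetical helper that also tolerates the `0x` prefix used for the signatures in this cheat sheet.

```python
def decode_hex(hex_string: str) -> bytes:
    """Convert a hex dump such as '0x255044462D' or '49 45 4e 44' to raw bytes."""
    cleaned = hex_string.strip()
    if cleaned.lower().startswith("0x"):
        cleaned = cleaned[2:]
    # bytes.fromhex ignores ASCII whitespace between byte pairs.
    return bytes.fromhex(cleaned)
```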
Data Extraction
Office Suite Files
Legacy Microsoft Office files (.doc, .xls, .ppt) use the OLE2 binary format. OOXML documents (.docx, .xlsm, etc.) are zip archives that store their content as XML parts. Macros embedded in OOXML files are stored in an OLE2 binary file (typically vbaProject.bin) inside the zip archive.
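Since an OOXML document is a zip archive, its members can be checked for the OLE2 signature (0xD0CF11E0A1B11AE1) before reaching for dedicated tools. A minimal sketch, where `OLE2_MAGIC` and `find_ole2_parts` are hypothetical names:

```python
import zipfile

# Signature found at the start of every OLE2 (Compound File Binary) container.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def find_ole2_parts(ooxml) -> list[str]:
    """Return the names of archive members that start with the OLE2 signature."""
    hits = []
    with zipfile.ZipFile(ooxml) as archive:
        for name in archive.namelist():
            with archive.open(name) as member:
                if member.read(len(OLE2_MAGIC)) == OLE2_MAGIC:
                    hits.append(name)
    return hits
```

A hit such as word/vbaProject.bin is the file to hand to an OLE2 stream dumper; the sketch itself does not parse macros.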
OLE2 Objects
An OLE (Object Linking and Embedding) object is an external file (document, graphic, or video) created using an external application and inserted into another application.
# list the OLE2 streams
oledump <file>
# extract stream <s>
oledump -s <s> -v <file>
RTF Objects
RTF documents do not support macros but can contain embedded files as OLE1 objects
rtfdump <file>
# list the groups in the file
rtfdump <file> -f O
# extract the object from group <g>
rtfdump <file> -s <g> -H -d > out.bin
PDF
Magic Bytes : 0x255044462D = %PDF-
End Bytes: 0x2525454F46 = %%EOF
🗎 Structure
Understanding PDF Files
A PDF file consists of objects linked together by a dictionary.
Scanning the Object Dictionary
pdfid <file>
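In the spirit of a keyword scan over PDF objects, the sketch below counts a few names that often deserve attention. It is not pdfid's implementation: `KEYWORDS` is an assumed short list, `count_pdf_keywords` is a hypothetical name, and names obfuscated with `#xx` hex escapes or hidden in compressed streams are not detected.

```python
import re

# Assumed short list of PDF names worth a closer look.
KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile"]


def count_pdf_keywords(data: bytes) -> dict[str, int]:
    """Count occurrences of each keyword in the raw bytes of a PDF."""
    counts = {}
    for keyword in KEYWORDS:
        # The lookahead keeps /JS from matching longer names such as /JSON.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode()] = len(re.findall(pattern, data))
    return counts
```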
Searching for Malformed Objects
peepdf -fl <file>
Compressed Archives
PKZIP / APK
Magic Bytes : 0x504B = PK
Magic Bytes (empty archive) : 0x504B0506
Tools : unzip, apktool
GZIP
Magic Bytes : 0x1F8B
Tools : gunzip
TAR
Magic Bytes : 0x7573746172 = ustar (at offset 257, not at the start of the file)
ZIP encryption usually protects file contents but not the central directory, so you can often list the files in a ZIP archive (unzip -l) even if it is encrypted.
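Listing the entries of a possibly encrypted ZIP archive needs no password, since only the directory metadata is read. A minimal sketch with the standard `zipfile` module; `list_zip_entries` is a hypothetical name.

```python
import zipfile


def list_zip_entries(archive_path) -> list[tuple[str, bool]]:
    """Return (name, is_encrypted) for each member without decrypting anything."""
    with zipfile.ZipFile(archive_path) as archive:
        # Bit 0 of the general-purpose flag marks an encrypted member.
        return [
            (info.filename, bool(info.flag_bits & 0x1))
            for info in archive.infolist()
        ]
```

Archives that also encrypt their central directory will not yield names this way.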
Image Files
Extracting Image Properties
exiftool <image>
Data found after the End Bytes is ignored by most image viewers.
JPEG, JPG
Magic Bytes : 0xFFD8FFE0 (JFIF; other JPEG variants start with 0xFFD8FF followed by a different marker byte)
End Bytes: 0xFFD9
PNG
Magic Bytes : 0x89504E470D0A1A0A = .PNG.
End Bytes: 0x49454E44 = IEND
Checking PNG File Integrity
pngcheck <img>
pngcheck -v -f <img>
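Because viewers ignore whatever follows the end of the image, appended data is a common hiding place. The sketch below walks the PNG chunk list (length, type, payload, CRC) and returns anything after the IEND chunk. `png_trailing_data` is a hypothetical name and CRCs are not verified here.

```python
import struct

PNG_MAGIC = bytes.fromhex("89504E470D0A1A0A")


def png_trailing_data(data: bytes) -> bytes:
    """Return the bytes that follow the IEND chunk of a PNG (empty if none)."""
    if not data.startswith(PNG_MAGIC):
        raise ValueError("missing PNG signature")
    offset = len(PNG_MAGIC)
    while offset + 8 <= len(data):
        # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        offset += 8 + length + 4  # header + payload + CRC
        if chunk_type == b"IEND":
            return data[offset:]
    raise ValueError("IEND chunk not found")
```

Walking the chunks avoids false positives from the string IEND appearing inside the appended data itself.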
Executables
MS-DOS, OS/2 or MS Windows
Magic Bytes : 0x4D5A = MZ
Magic Bytes : 0x5A4D = ZM (rare variant)
ELF
Magic Bytes : 0x7F454C46 = .ELF
File Recovery / Carving
sudo foremost -v -q -i <file/data> -o <output/directory> #quick mode
sudo foremost -v -i <file/data> -o <output/directory>
sudo photorec <file/data>
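Carving boils down to locating known header and footer bytes in raw data. The sketch below is a deliberately naive illustration for JPEG only and does not reflect how foremost or photorec are implemented; `carve_jpegs`, `SOI` and `EOI` are hypothetical names.

```python
SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the first byte of the next marker
EOI = b"\xff\xd9"      # JPEG end-of-image marker


def carve_jpegs(blob: bytes) -> list[bytes]:
    """Return every SOI..EOI span found in blob, scanning left to right."""
    carved = []
    start = blob.find(SOI)
    while start != -1:
        end = blob.find(EOI, start + len(SOI))
        if end == -1:
            break  # header without footer: stop carving
        carved.append(blob[start:end + len(EOI)])
        start = blob.find(SOI, end + len(EOI))
    return carved
```

An embedded thumbnail carries its own end marker, so a span carved this way can be cut short; dedicated tools handle such cases more carefully.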