Forensic Cheatsheet (File Analysis)

File analysis

Glossary

Magic Bytes = signature bytes at the beginning of a file that identify its format
End Bytes = trailer bytes at the end of the file (not every format has them)
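To look at both ends of a file without a hex editor, a minimal Python sketch can read the first and last few bytes and print them as hex. The helper name head_tail_hex is hypothetical; xxd or a hex editor does the same job.

```python
import os


def head_tail_hex(path: str, n: int = 8) -> tuple[str, str]:
    """Return the first and last n bytes of a file as uppercase hex strings."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(n)               # magic bytes
        f.seek(max(size - n, 0))       # n bytes before EOF (start of file if shorter)
        tail = f.read(n)               # end bytes
    return head.hex().upper(), tail.hex().upper()


if __name__ == "__main__":
    import sys

    magic, end = head_tail_hex(sys.argv[1])
    print("Magic bytes:", magic)
    print("End bytes:  ", end)
```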

Metadata

File Type

file <file>
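file relies on a large signature database (libmagic). As a rough illustration of the idea only, the Python sketch below checks a file header against the signatures listed further down in this cheatsheet, plus the OLE2 signature, which that list does not include. SIGNATURES and identify are hypothetical names, and the table is far from complete.

```python
# (offset, signature bytes, label); TAR's "ustar" sits at offset 257, not 0.
SIGNATURES = [
    (0, bytes.fromhex("255044462D"), "PDF"),
    (0, bytes.fromhex("89504E470D0A1A0A"), "PNG"),
    (0, bytes.fromhex("FFD8FF"), "JPEG"),
    (0, bytes.fromhex("504B0304"), "ZIP (also DOCX/XLSX/APK/JAR)"),
    (0, bytes.fromhex("504B0506"), "ZIP (empty archive)"),
    (0, bytes.fromhex("1F8B"), "GZIP"),
    (0, bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2 (legacy Office .doc/.xls/.ppt)"),
    (0, bytes.fromhex("7F454C46"), "ELF"),
    (0, bytes.fromhex("4D5A"), "MZ (DOS/Windows executable)"),
    (257, b"ustar", "TAR"),
]


def identify(header: bytes) -> list[str]:
    """Return the label of every signature found at its expected offset."""
    return [label for offset, magic, label in SIGNATURES
            if header[offset:offset + len(magic)] == magic]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print(identify(f.read(512)) or ["unknown"])
```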

Searching within files with grep / findstr

Grep

grep <option> <regex> <folder/file> | <filter>
   
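#    -e PATTERN : use PATTERN as the pattern (useful for several patterns, or one starting with "-")
#    -E --extended-regexp : interpret the pattern as an extended regular expression (ERE)
#    -i --ignore-case : ignore case
#    -r --recursive : search the directory recursively
#    -R --dereference-recursive : same as -r, but also follow symbolic links
#    -l --files-with-matches : display only the names of matching files, not the matching text
#    -v --invert-match : select lines that do not match the pattern
#    -n --line-number : display the line number
#    -o --only-matching : display only the matching part of the line
#    -A NUM : number of lines to display after the match
#    -B NUM : number of lines to display before the match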
grep -Roni -E "<regex>" 
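When grep or findstr is unavailable, or the matches need post-processing, the same search can be sketched in Python. Python's re module uses Perl-style syntax and semantics rather than POSIX ERE, so some patterns behave differently; grep_roni is a hypothetical name.

```python
import os
import re


def grep_roni(pattern: str, folder: str):
    """Yield (path, line_number, match) tuples, roughly like `grep -Roni pattern folder`."""
    regex = re.compile(pattern, re.IGNORECASE)          # -i
    for root, _dirs, files in os.walk(folder):           # recursive walk (-r)
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, "r", encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, start=1):   # -n
                        for m in regex.finditer(line):           # -o: one result per match
                            if m.group(0):                       # grep -o skips empty matches
                                yield path, lineno, m.group(0)
            except OSError:
                continue  # unreadable file: skip it silently


if __name__ == "__main__":
    import sys

    for path, lineno, text in grep_roni(sys.argv[1], sys.argv[2]):
        print(f"{path}:{lineno}:{text}")
```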

Findstr

findstr /R /S "<regex>" .\*.*

Links

grep -Roni -E "[a-zA-Z]{2,5}://[^][:space:]\"\<\>\^\`\{\|\}]*" ./folder/ | sort -u

http & https

grep -Roni -E "(http|https)://[^][:space:]\"\<\>\^\`\{\|\}]*" ./folder/ | sort -u

Mails

grep -Roni -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,10}' ./folder/ | sort -u

Domain extraction

grep -Roni -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,10}" ./folder/ | sort -u | grep -oi -E "@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,10}" | sort -u
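A Python sketch that collects the unique addresses and their domains in one pass, reusing the character classes of the grep pattern. The helper name is hypothetical, and domains are lowercased so that case variants collapse, which the grep pipeline does not do.

```python
import re

# Same character classes as the grep command; group 1 captures the domain.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,10})")


def emails_and_domains(text: str) -> tuple[list[str], list[str]]:
    """Return sorted unique e-mail addresses and sorted unique (lowercased) domains."""
    emails, domains = set(), set()
    for m in EMAIL_RE.finditer(text):
        emails.add(m.group(0))
        domains.add(m.group(1).lower())
    return sorted(emails), sorted(domains)
```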

IP Addresses

IPv4 :

grep -Roni -E '\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b' ./folder/ | grep -vE ':[0-9]+:0' | sort -u
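A pure regex either gets long (as above) or lets through values such as 999.1.1.1. A minimal Python alternative, assuming it is acceptable to grab loose candidates first and validate them with the standard ipaddress module; ipv4_addresses is a hypothetical helper, and dropping addresses that start with 0 mirrors the intent of the grep filter.

```python
import ipaddress
import re

# Loose candidates: four dot-separated groups of 1 to 3 ASCII digits.
CANDIDATE_RE = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")


def ipv4_addresses(text: str) -> list[str]:
    """Return sorted unique valid IPv4 addresses, ignoring those that start with 0."""
    found = set()
    for candidate in CANDIDATE_RE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # e.g. 999.1.1.1 (recent Python versions also reject leading zeros)
        if not candidate.startswith("0"):
            found.add(candidate)
    return sorted(found, key=ipaddress.IPv4Address)
```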

Secrets

AWS / Google / JWT / SSH keys / API keys and other secrets:

grep -Roni -E "AKIA|AIza|eyJ|PRIVATE\sKEY|API[[:space:]-]KEY|ghp_|gitlab_|bitbucket_|xox[baprs]|sk_live_|sk_test_|api_key|secret_key|access_token|auth_token|password|private_key|client_secret|jwt_token|db_password|api_secret|encryption_key|app_secret|oauth_token|ssh_key|master_key|session_token|auth_key|service_account_key|refresh_token|service_account|(postgres|mysql)://" ./folder/
# template for your own keyword list
grep -Roni -E "(<keyword1>|<keyword2>)" ./folder/
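The eyJ pattern works because a JWT starts with a base64url-encoded JSON header, and {" followed by a letter always encodes to eyJ. A minimal Python sketch for reading a token found this way; it assumes a signed JWT with three dot-separated parts, does NOT verify the signature (treat the claims as untrusted), and the helper names are hypothetical.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode base64url without padding, as used in JWT segments."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def decode_jwt_unverified(token: str) -> tuple[dict, dict]:
    """Return (header, payload) of a JWT without checking its signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return (json.loads(b64url_decode(header_b64)),
            json.loads(b64url_decode(payload_b64)))
```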

Text Search

grep -Ril "text-to-find-here" ./folder/                # names of the files containing the text
grep -Rni -A 3 -B 3 "text-to-find-here" ./folder/      # matching lines with 3 lines of context

Searching in Binary Files

grep <...> --binary-files=text ./binary.raw    # same as grep -a

Regex search in JSON objects

This Python program prints the ids of the JSON objects that match any regex from a wordlist. The search is restricted to a predefined list of interesting fields within each object, and relies on jq being installed.

import subprocess
import sys

if len(sys.argv) < 4:
    print("Usage: python search_string.py '<json_file_path>' regex_wordlist.txt fields.txt")
    sys.exit(1)

json_path, regex_wordlist, interesting_fields = sys.argv[1:4]

# One regex / one jq path per line; blank lines are skipped (an empty regex matches everything)
with open(regex_wordlist, "r") as f:
    search_regex = [line.strip() for line in f if line.strip()]
with open(interesting_fields, "r") as f:
    search_fields = [line.strip() for line in f if line.strip()]

print(search_fields, "\n\n")

# A field matches if it is a string (jq's test() fails on numbers) and matches $re, ignoring case
conditions = " or ".join(
    f'(({s} | type) == "string" and ({s} | test($re; "i")))' for s in search_fields
)
# TODO: replace ".users[]" with your object array selector
jq_filter = f"[.users[] | select({conditions})] | .[].id"
print(jq_filter)

print("\n\n#results:>")
for regex in search_regex:
    print("#regex:" + regex, flush=True)
    # --arg passes the regex as the jq variable $re: no shell or jq string escaping needed
    subprocess.run(["jq", "-C", "--arg", "re", regex, jq_filter, json_path])
Example regex_wordlist.txt (one regex per line):

API_KEY
pass[a-zA-Z0-9]+
ADMIN[0-9]{2,3}

Example fields.txt (one jq path per line):

.id
.value
.info.comment
.options[0].details

Data Decoding

Base64

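echo "<string>" | base64 -d
# decode every base64-looking string of 20+ characters found in a file, one per line
grep -oE '([A-Za-z0-9+/]{4}){5,}([A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?' <file> | while read -r b64; do echo "$b64" | base64 -d; echo; done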
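A Python sketch of the same extraction (decode_base64_strings is a hypothetical helper). It only keeps runs of at least 20 base64 characters to reduce false positives from ordinary words; long alphanumeric words can still match and decode to junk.

```python
import base64
import binascii
import re

# At least 5 groups of 4 base64 characters (20+ chars), with optional padding.
B64_RE = re.compile(rb"(?:[A-Za-z0-9+/]{4}){5,}(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?")


def decode_base64_strings(data: bytes) -> list[bytes]:
    """Return the decoded bytes of every base64-looking run found in data."""
    decoded = []
    for m in B64_RE.finditer(data):
        try:
            decoded.append(base64.b64decode(m.group(0), validate=True))
        except binascii.Error:
            continue  # defensive: skip anything b64decode rejects
    return decoded


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        for blob in decode_base64_strings(f.read()):
            print(blob)
```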

Hexadecimal

echo "<hexadecimal>" | xxd -r -p > output
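For plain hex (no offsets or ASCII column, unlike a regular xxd dump), Python's bytes.fromhex does the same conversion. A minimal sketch with a hypothetical hex_to_file helper:

```python
def hex_to_file(hex_text: str, out_path: str) -> int:
    """Write the bytes encoded by hex_text to out_path and return the byte count.

    bytes.fromhex() skips ASCII whitespace between byte pairs, so multi-line
    plain hex is accepted; a space inside a pair (e.g. "4 8") raises ValueError.
    """
    data = bytes.fromhex(hex_text)
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)
```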

Data Extraction

Office Suite Files

Legacy Microsoft Office files (.doc, .xls, .ppt) use the OLE2 format (Compound File Binary). OOXML documents (.docx, .xlsm, etc.), the default since Office 2007, are zip archives of XML files. Macros embedded in OOXML files are stored in an OLE2 binary file inside the zip archive (usually vbaProject.bin).
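A minimal Python sketch of this distinction, assuming the standard OLE2 signature D0 CF 11 E0 A1 B1 1A E1 (not listed in this cheatsheet) and the usual vbaProject.bin member name for macro storage; office_container and vba_projects are hypothetical helpers. An extracted vbaProject.bin can then be fed to oledump.

```python
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary signature


def office_container(path: str) -> str:
    """Return 'OLE2', 'ZIP/OOXML' or 'unknown' based on the first 8 bytes."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header == OLE2_MAGIC:
        return "OLE2"
    if header.startswith(b"PK\x03\x04"):
        return "ZIP/OOXML"  # any ZIP matches; OOXML also contains [Content_Types].xml
    return "unknown"


def vba_projects(path: str) -> list[str]:
    """List archive members that hold VBA macros (normally named vbaProject.bin)."""
    with zipfile.ZipFile(path) as z:
        return [n for n in z.namelist() if n.lower().endswith("vbaproject.bin")]
```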

OLE2 Objects

An OLE (Object Linking and Embedding) object is content (a document, image, video, etc.) created with one application and linked into, or embedded in, a document of another application.

# list the OLE2 streams
oledump <file>
# dump stream <s> (-v decompresses the VBA macro source code)
oledump -s <s> -v <file>

RTF Objects

RTF documents do not support macros but can contain embedded files as OLE1 objects.

rtfdump <file>
# list only the groups that contain embedded objects
rtfdump <file> -f O
# extract the object from group <g> (-H: hex-decode, -d: dump raw bytes)
rtfdump <file> -s <g> -H -d > out.bin

PDF

Magic Bytes : 0x255044462D = %PDF-
End Bytes: 0x2525454F46 = %%EOF (possibly followed by a newline)
Structure

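A PDF file is made of numbered objects (dictionaries, arrays, streams, ...) that reference each other. The cross-reference (xref) table gives the byte offset of each object, and the trailer dictionary points to the root object (the document catalog).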

Scanning the Object Dictionary

pdfid <file>    # counts keywords such as /JS, /JavaScript, /OpenAction, /Launch, /EmbeddedFile
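pdfid essentially counts PDF keywords that hint at active content. A rough Python sketch of that idea follows; unlike pdfid, it does not decode obfuscated names such as /J#53, and KEYWORDS and count_keywords are hypothetical names.

```python
import re

KEYWORDS = ["obj", "endobj", "stream", "endstream", "xref", "trailer", "startxref",
            "/Page", "/Encrypt", "/ObjStm", "/JS", "/JavaScript", "/AA",
            "/OpenAction", "/AcroForm", "/Launch", "/EmbeddedFile", "/URI"]


def count_keywords(data: bytes) -> dict[str, int]:
    """Count whole-word occurrences of each keyword in raw PDF bytes."""
    counts = {}
    for kw in KEYWORDS:
        # Not followed by a letter/digit, so "/JS" does not count "/JSON".
        pattern = re.escape(kw.encode()) + rb"(?![A-Za-z0-9])"
        if not kw.startswith("/"):
            # Bare keywords also need a left boundary ("obj" inside "endobj").
            pattern = rb"(?<![A-Za-z0-9])" + pattern
        counts[kw] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        for kw, n in count_keywords(f.read()).items():
            print(f"{kw:14} {n}")
```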

Searching for Malformed Objects

peepdf -fl <file>    # -f: force mode (ignore parsing errors), -l: loose mode (catch malformed objects)

Compressed Archives

PKZIP / APK

Magic Bytes : 0x504B = PK (0x504B0304 for a regular archive)
Magic Bytes (empty archive) : 0x504B0506
Tools : unzip, apktool

GZIP

Magic Bytes : 0x1F8B
Tools : gunzip (or gzip -d); tar -xzf for .tar.gz archives

TAR

Magic Bytes : 0x7573746172 = ustar, at offset 257 (not at the start of the file)

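You can often list the files in a ZIP archive even if it is password-protected: classic ZIP encryption (ZipCrypto, WinZip AES) encrypts the file data but not the file names, so unzip -l <archive> still shows the contents.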
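A minimal Python sketch with the standard zipfile module (list_zip is a hypothetical name) that reads the archive directory without a password and reports which entries are encrypted, using bit 0 of each entry's general purpose flag.

```python
import zipfile


def list_zip(path: str) -> list[tuple[str, int, bool]]:
    """Return (name, uncompressed size, encrypted?) for each member; no password needed."""
    with zipfile.ZipFile(path) as z:
        # Bit 0 of the general purpose flag marks an encrypted entry.
        return [(i.filename, i.file_size, bool(i.flag_bits & 0x1)) for i in z.infolist()]


if __name__ == "__main__":
    import sys

    for name, size, encrypted in list_zip(sys.argv[1]):
        print(f"{'E' if encrypted else ' '} {size:>10} {name}")
```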

Image Files

Extracting Image Properties

exiftool <image>

Data found after the End Bytes is ignored by most image viewers, which makes it a common place to hide appended data (for example a ZIP archive).

JPEG / JPG

Magic Bytes : 0xFFD8FF (followed by 0xE0 for JFIF, 0xE1 for Exif)
End Bytes: 0xFFD9

PNG

Magic Bytes : 0x89504E470D0A1A0A = .PNG.
End Bytes: 0x49454E44AE426082 = IEND chunk type followed by its CRC
Checking PNG File Integrity
pngcheck <img>
pngcheck -v -f <img>
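Anything appended after a PNG's IEND chunk is easy to miss. A minimal Python sketch (hypothetical helper png_trailing_data) walks the chunk structure instead of searching for the string IEND, which could also occur by chance inside compressed image data. It does not check CRCs; pngcheck does that.

```python
import struct

PNG_MAGIC = bytes.fromhex("89504E470D0A1A0A")


def png_trailing_data(data: bytes) -> bytes:
    """Walk the PNG chunks and return any bytes that follow the IEND chunk."""
    if not data.startswith(PNG_MAGIC):
        raise ValueError("not a PNG file")
    pos = len(PNG_MAGIC)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 8 + length + 4
        if ctype == b"IEND":
            return data[pos:]
    raise ValueError("IEND chunk not found (truncated or corrupted PNG)")
```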

Executables

MS-DOS, OS/2 or MS Windows

Magic Bytes : 0x4D5A = MZ
Magic Bytes : 0x5A4D = MZ read as a little-endian 16-bit value (IMAGE_DOS_SIGNATURE)

ELF

Magic Bytes : 0x7F454C46 = .ELF
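MZ alone only identifies the DOS header: a Windows PE file also has a PE\0\0 signature at the offset stored in the 4-byte little-endian e_lfanew field at 0x3C. For ELF, the fifth and sixth header bytes give the word size and byte order. A minimal Python sketch of those checks (executable_info is a hypothetical name):

```python
import struct


def executable_info(header: bytes) -> str:
    """Describe an ELF or MZ/PE file from its first bytes (e.g. the first 4096)."""
    if header[:4] == b"\x7fELF" and len(header) >= 6:
        # e_ident[4] = EI_CLASS, e_ident[5] = EI_DATA
        bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")
        order = {1: "little-endian", 2: "big-endian"}.get(header[5], "unknown byte order")
        return f"ELF {bits} {order}"
    if header[:2] == b"MZ":
        if len(header) >= 0x40:
            # e_lfanew (little-endian uint32 at 0x3C) = file offset of the PE signature
            (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
            if header[pe_offset:pe_offset + 4] == b"PE\x00\x00":
                return "PE (Windows executable)"
        return "MZ only (DOS executable, or PE header outside the bytes read)"
    return "unknown"


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print(executable_info(f.read(4096)))
```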

File Recovery / Carving

sudo foremost -v -q -i <file/data> -o <output/directory> #quick mode
sudo foremost -v -i <file/data> -o <output/directory> 
sudo photorec <file/data> 
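foremost carves files by matching known headers and footers. A deliberately naive Python sketch of that idea for JPEG (hypothetical helper carve_jpegs): it keeps each span from FF D8 FF to the next FF D9, so an embedded thumbnail can truncate a file, and fragmented files are not handled.

```python
import os

JPEG_HEADER = bytes.fromhex("FFD8FF")
JPEG_FOOTER = bytes.fromhex("FFD9")


def carve_jpegs(data: bytes, out_dir: str, max_size: int = 20 * 1024 * 1024) -> list[str]:
    """Naive header/footer carving: write every FFD8FF ... FFD9 span to out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    start = data.find(JPEG_HEADER)
    while start != -1:
        # Look for the first footer after the header, within max_size bytes.
        end = data.find(JPEG_FOOTER, start + len(JPEG_HEADER), start + max_size)
        if end == -1:
            # No footer: skip this header and try the next one.
            start = data.find(JPEG_HEADER, start + 1)
            continue
        end += len(JPEG_FOOTER)
        path = os.path.join(out_dir, f"{start:08x}.jpg")  # named after its offset
        with open(path, "wb") as f:
            f.write(data[start:end])
        saved.append(path)
        start = data.find(JPEG_HEADER, end)
    return saved


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print("\n".join(carve_jpegs(f.read(), sys.argv[2])))
```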

Sources

https://en.wikipedia.org/wiki/List_of_file_signatures