This course teaches more than just reverse engineering because as a malware analyst you need a variety of other skills. You will learn how to classify samples into malware types, how to identify malware families and how to determine file verdicts like clean, malicious, potentially unwanted programs, junk, grayware, or corrupt. Additionally, you will learn how malware persists, how to identify malicious autostart entries and clean infected systems.
The course aims to dispel common myths such as "trojan in a detection name means the file is a trojan horse" or "antivirus detection names are a malware classification".
As a malware analyst with experience working at an antivirus company since 2015, I have trained many beginners in the field. I understand the usual pitfalls and the concepts that you need to grasp to become proficient. I focus on building strong foundations that make you flexible in the face of new malware advancements, rather than providing shortcuts with step-by-step recipes.
I will teach you how to differentiate between different types of files, including installers, wrappers, packed files, non-packed files, hybrid, and native compiled files. You will learn which tools to apply in which situations and how to analyse samples efficiently. To do that I give you example approaches that work for most situations.
This course is ideal for you if you already have some IT background, such as hobby or professional programmers, computer enthusiasts, administrators, computer science students, or gamers with an interest in the inner workings of software or IT security.
If you have a strong interest in the topic but lack the necessary IT background, I recommend that you learn programming first. Please refer to the course requirements for more information.
Tools
All the tools and web services that we use during the course are free:
Ghidra
x64dbg
VirtualBox
SysInternals Suite
PortexAnalyzer CLI and GUI
VirusTotal (without account)
Speakeasy by Mandiant
API Monitor
CyberChef
EXIFTool
Meld
VBinDiff
AnalyzePESig
DnSpy
C# Online Compiler programwiz
TrID
Detect-it-Easy
ReNamer
7zip
Notepad++
HxD
Malpedia
lnk_parser
Requirements
You should have a strong understanding of at least one programming language, such as Python, C, C++, Java, or C#. This is a crucial requirement for the course, not only because we create small scripts during the course but because reverse engineering needs an understanding of software as a foundation. The specific language does not matter, as you cannot learn every language you may encounter during analysis anyway. The concepts of programming must be clear, though.
If you are not there yet, do not buy this course now; start by learning C instead. C is great because it is low-level and will integrate well with x86 assembly language.
Additionally, you must be able to read (not write) x86 assembly to understand everything in the course. Without assembly you will only be able to understand two-thirds of the content. So if you consider starting this course right away and learning assembly alongside it, that should work fine.
During this course we look at samples that use the following execution environments:
x86, x64 assembly
.NET
Batch
PowerShell
Nullsoft scripts
However, you do not need to learn all of these languages. Because an analyst encounters new languages all the time, the skill that matters is using the available documentation, manuals and help provided for those environments and languages. During the course I also show you how to use the documentation for, e.g., PowerShell.
Out of scope
Malware analysis is a broad field, so there are inevitably topics that I will not teach during this course because they would require their own course. Some of these topics are: assembly language, programming, how computers work, URL and website analysis, networks, analysis of malware for platforms other than Windows, mobile malware, IoT malware.
Course overview and requirements
The general process of analysing files and samples with the purpose of creating analysis reports and a verdict.
Overview of building your analysis lab and how to proceed if you already have a VM.
Download links for VirtualBox, Defender Remover, course samples and Windows 10 Evaluation copy
Install a VirtualBox VM with Windows 10 Pro.
Add convenience features to VirtualBox
Remove Windows Defender from the malware lab to ensure smooth malware analysis. Adjust the view settings to see hidden files and system files.
How we handle potential malware samples and make sure that you stay safe.
We create a shared folder setup that allows you to move files between the host and the guest system while also preventing infections from spilling over to the host.
We change the access rights so that the files in shared folders cannot execute.
We make sure that worms do not infect your home network and execute the first sample in the lab. Snapshots make it possible to set the VM back to a clean state after each malware execution.
We disable Windows updates in the VM
Summary of the lab setup and safety instructions when dealing with suspicious and malicious files and URLs.
Understand why triage is an important analysis step and what purposes it serves
Tools and links that we use for this section.
We determine the file type of 5 different files using TrID and Detect-it-Easy.
What a file type actually is and how it is different from file extensions. Can files have no file type or several file types at once?
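As an illustration of the difference between a file type and a file extension, here is a minimal Python sketch that is not part of the course material. It guesses the type from the leading bytes (the "magic") and ignores the file name entirely. The signature table is a small hand-picked assumption; tools like TrID and Detect-it-Easy use far larger databases and look at more than the first bytes.

```python
# A few well-known signatures found at offset 0 of a file.
# This table is illustrative only and far from complete.
SIGNATURES = [
    (b"MZ", "DOS/PE executable"),
    (b"PK\x03\x04", "ZIP container (also DOCX, JAR, APK)"),
    (b"%PDF-", "PDF document"),
    (b"\x4c\x00\x00\x00\x01\x14\x02\x00", "Windows shortcut (LNK)"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
]


def guess_file_type(data: bytes) -> str:
    """Return a type name based on the leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def guess_file_type_of(path) -> str:
    """Read only the first bytes of a file and guess its type."""
    with open(path, "rb") as handle:
        return guess_file_type(handle.read(16))
```

A file named invoice.pdf that starts with the bytes MZ is reported as an executable here, which is exactly why the extension alone cannot be trusted.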
After figuring out a file type, we look for a format specification which is an official documentation of the data layout. What information are we looking for when reading specifications? How do we find the relevant parts of the sample?
The lesson also explains how to deal with some quirks of Windows shortcuts.
Coming from a malware analyst who works for an antivirus company: What are antivirus detection names really? Who creates them? How are they different from CARO naming conventions?
We also cover:
Current naming schemes of antivirus vendors
Default values
Basic components
To interpret antivirus detection names correctly, we must be able to:
distinguish specific from unspecific detection names
know what certain keywords mean
identify names that describe antivirus detection technologies
understand that detection names are not a malware classification
understand why "Trojan" does not mean trojan horse
We put our knowledge about antivirus detection names to the test and interpret the detection names for our LNK sample. We find a candidate for the malware family and alias names of the family on Malpedia.
Actual analysis of the sample's code. You learn how shortcut worms work and why you should not copy shortcut arguments from the properties window.
Full analysis solution for a second sample. You also learn how to update PowerShell Help and interpret unknown PowerShell commands.
Binaries do not only contain the code that the developer has written. What other code is there?
Wrappers create files that carry the whole execution environment with them. How do we identify the used wrapper and how do we extract embedded files?
All tools and links we need for the labs in this section.
Triage: We use trid.exe and Detect-it-Easy, but neither of these tools detects the wrapper correctly. How do we identify the wrapper anyway?
We also check the file's detection names and behavior on VirusTotal to get an idea how to extract it.
Learn how to use Sysinternals Process Monitor and Process Explorer, and how to add proper filters to monitor the file in action. The wrapped file unpacks the payload into TEMP but deletes it faster than we can copy it. So we apply ACLs that prevent deletion operations in TEMP.
Now that we have the payload, we analyse the code in Notepad++. We discover that it creates a PowerShell script that we have not extracted so far. We modify the payload so that it creates the PowerShell script for us without deleting or executing it.
ACLs are not always a working solution, so we use API Monitor this time to extract the payload. Learn how to set up API Monitor to log API calls and how to set breakpoints.
What are installers? What is their structure? How can we extract installation scripts and embedded files from installers?
We identify an NSIS (Nullsoft Scriptable Install System) installer and extract the first layer of this sample as well as the install script.
The second layer is another installer: A 7zip self-extracting archive. We extract the contained files.
We find out how 7zip self-extracting archives are built up and extract the configuration of the second layer sample, so that we know what file is executed by the installer.
After unpacking the 7zip SFX we have a lot of files. We learn how to use PowerShell commands to run trid.exe and Detect-it-Easy on all of the files and print a report. That way we can determine interesting samples.
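The course does this step with PowerShell. As a rough equivalent, the following Python sketch (my own illustration, not the course's script) runs a command-line tool once per file and collects the output into a report. The function names are hypothetical, and the assumption that a tool accepts the file path as its last argument must be checked against the tool's own help.

```python
import subprocess
from pathlib import Path


def run_tool_on_files(folder, tool_cmd):
    """Run tool_cmd plus the file path for every file below folder.

    tool_cmd is a list such as ["trid.exe"]; passing the file path as the
    last argument is an assumption about the tool's command line.
    """
    report = {}
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        result = subprocess.run(
            [*tool_cmd, str(path)],
            capture_output=True,
            text=True,
            errors="replace",  # tool output may not be valid text
            check=False,       # a non-zero exit code should not abort the run
        )
        report[str(path)] = result.stdout.strip()
    return report


def print_report(report):
    """Print one block of tool output per file."""
    for path, output in report.items():
        print(f"=== {path}")
        print(output)
```

A call could look like print_report(run_tool_on_files("unpacked", ["trid.exe"])), where "unpacked" is a placeholder folder name. Run this only inside the analysis VM, since the tools parse untrusted files.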
What are Auto Start Extensibility Points (ASEPs) and how are they used for malware persistence?
The Windows registry is crucial to understand malware persistence on Windows.
Topics covered in this lecture:
structure of a registry entry
root keys and links between root keys
registry hives
value data types and what they are used for
New ASEPs appear all the time. How do we find out things on our own? We reverse how service creation and deletion works by using monitoring tools and sc.exe, and thus find out what we need to do to remove a malicious service.
We use disinfector_trainer to train system disinfection. The first scenario applies Run keys and IFEO. We remove the entries using Sysinternals Autoruns and regedit.exe.
We use disinfector_trainer to train system disinfection. We remove persistence via RunOnce keys, Active Setup, Scheduled tasks and Windows shortcuts.
What is the Portable Executable format? What does endianness mean?
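To make endianness concrete, here is a small Python sketch (an illustration, not course material). It reads the same two bytes once as little-endian and once as big-endian. PE files store their multi-byte header fields in little-endian order, which is why the bytes 4D 5A on disk correspond to the 16-bit value 0x5A4D.

```python
import struct

# The first four bytes of a typical PE file as they appear in a hex editor.
raw = bytes.fromhex("4D5A9000")

# "<" means little-endian, ">" means big-endian, "H" is an unsigned 16-bit value.
little = struct.unpack("<H", raw[:2])[0]
big = struct.unpack(">H", raw[:2])[0]


def read_u32_le(data: bytes, offset: int) -> int:
    """Read an unsigned 32-bit little-endian value at the given offset."""
    return struct.unpack_from("<I", data, offset)[0]
```

Here little is 0x5A4D and big is 0x4D5A. The helper read_u32_le is a hypothetical convenience function for reading DWORD fields from a file buffer.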
The Portable Executable format explained.
Tool and specification links for PortexAnalyzerGUI, DnSpy and the PE specification.
We examine a file with a Portable Executable viewer, namely, PortexAnalyzer. You learn how to interpret values in the MS DOS stub, the COFF file header and when timestamps are inaccurate or wrong.
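What a PE viewer shows for these headers can be reproduced in a few lines. The following Python sketch is my own illustration, not how PortexAnalyzer works internally. It follows the e_lfanew field at offset 0x3C to the PE signature and reads the first three COFF file header fields. The function name and the returned dictionary layout are assumptions made for this example.

```python
import struct
from datetime import datetime, timezone


def read_coff_header(data: bytes) -> dict:
    """Parse machine type, section count and timestamp from a PE file buffer."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF file header starts right after the 4-byte signature:
    # Machine (2 bytes), NumberOfSections (2 bytes), TimeDateStamp (4 bytes).
    machine, num_sections, timestamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    return {
        "machine": machine,
        "number_of_sections": num_sections,
        # Seconds since 1970-01-01 UTC, as written by the linker or a later tool.
        "timestamp": datetime.fromtimestamp(timestamp, tz=timezone.utc),
    }
```

The timestamp is only what was written into the field, so a parsed date is a hint and not proof of when the file was built.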
We examine a file with a Portable Executable viewer, namely, PortexAnalyzer. You learn how to interpret values in the Optional Header and the section table.
We examine a file with PortexAnalyzer and Resource Hacker. We look at resources, debug data and imports of a PE file, and learn what icon groups are, what version information is and what the imphash is.
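For intuition about the imphash, here is a simplified Python sketch that follows the commonly published description of the algorithm. It assumes the imports were already extracted as (DLL name, function name) pairs in import table order, and it does not handle imports by ordinal, which real implementations treat specially. The function name is hypothetical.

```python
import hashlib


def imphash_from_imports(imports) -> str:
    """Compute an imphash-style MD5 from ordered (dll, function) pairs.

    Simplified: imports by ordinal are not handled in this sketch.
    """
    parts = []
    for dll, func in imports:
        lib = dll.lower()
        # The common library extensions are dropped before hashing.
        for ext in (".dll", ".ocx", ".sys"):
            if lib.endswith(ext):
                lib = lib[: -len(ext)]
                break
        parts.append(f"{lib}.{func.lower()}")
    # The order of the imports is part of the hash input.
    return hashlib.md5(",".join(parts).encode()).hexdigest()
```

Because the order of the entries feeds into the hash, two files with the same set of imports in a different order get different values.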
We look at file format anomalies with PortexAnalyzer and create a visualization of the file that shows the byteplot, entropy and PE layout of a specific file.
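The entropy part of such a visualization rests on Shannon entropy over byte values. The following Python sketch is an illustration and not PortexAnalyzer's implementation. It computes the entropy of a buffer and of consecutive blocks. The block size of 256 bytes is an arbitrary assumption.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


def entropy_per_block(data: bytes, block_size: int = 256) -> list:
    """Return the entropy of each consecutive block of the buffer."""
    return [
        shannon_entropy(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]
```

Values close to 8 indicate data that looks random, as compressed or encrypted content does, while long runs of the same byte give values close to 0.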
Language processor types (compilers, interpreters, hybrid compilers) and how they influence our tools of choice when we reverse engineer samples.
Triage of a hybrid-compiled file, a .NET assembly. How .NET works.
Decompilation of .NET assemblies. You learn the basics of using DnSpy: assembly explorer, decompilation options, assembly meta data, finding main, when to disassemble .NET into IL code instead of using decompilation.
You learn the basics of using DnSpy: searching referenced strings in code, finding the developer's code for a file that contains auto generated GUI code, how to view .NET resources
An introduction to the section contents
Analysis types and when to use them: static analysis, dynamic analysis, meta inspection, code inspection.
Understand verdicts that malware analysts give to files as a result of an analysis. What does each verdict mean and when should they be used?
How do you know if a file is clean? We discuss challenging cases and what options you have to determine the verdict.
We discuss tools for binary diffing and certificate analysis.
Download links for the tools we need in the lab.
Scenario: A known software publisher provides a download hash for their file. There are two download locations and for one of them the file hash is different. Is this a case of a maliciously patched software? How do we find the difference between those files?
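The first step in this scenario, confirming that the hashes differ and locating where the files differ, can be sketched in Python as follows. This is an illustration and not the course's workflow, which uses dedicated diffing tools. The function names are hypothetical, and the comparison only covers the common length of both buffers.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a buffer."""
    return hashlib.sha256(data).hexdigest()


def differing_ranges(a: bytes, b: bytes) -> list:
    """Return (start, end) offset ranges, end exclusive, where a and b differ.

    Only the common length is compared; extra trailing bytes in the longer
    buffer must be inspected separately.
    """
    common = min(len(a), len(b))
    ranges = []
    start = None
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new differing run begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    return ranges
```

The resulting offsets can then be matched against the PE layout to see which structure the differing bytes belong to, for example the certificate area as opposed to code.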
How do we identify certificate manipulation in files?
The basics of certificate structure in PE files and which areas of the file are used to calculate the authentihash.
What is strict signature verification and how can we enable it to combat CVE-2013-3900?
Detection names of antivirus software have key words that indicate certain verdicts. What are these key words and what do they mean?
Introduction to writing analysis reports. We look at two types of reports: responding to an antivirus submission and a technical analysis blog article. What components should be added to such reports?
How we classify malware into types, families, variants, ...
What is a malware type and which types describe malware propagation? What is a trojan and why is it not a good term to be used for malware types?
Which types describe payload behavior? How do we determine the malware type if several types fit? What misunderstandings are there about certain malware types?
How can we identify a malware family? What information resources and strategies help us to do that?
Tools and links for the lab
We analyse the main code of a .NET malware and determine the malware type.
The next malware stage is hidden in an image. We use exiftool to extract the hidden and still encrypted data. Meanwhile we take notes for our analysis report.
We use CyberChef to decrypt the final stage and we finalize our report.
We use Obsidian, a free markdown editor, to put the analysis notes into a format that is directly useable for blog articles. I provide some tips on blog article writing, e.g., how to prevent your blog from being detected by antivirus software but still showcase malware code from your analysis.
What is Ghidra? What do we learn in this section?
The download location for Ghidra
How to install Ghidra on your VM
Creating new projects, importing files, autoanalysis of files in Ghidra
Ghidra windows and what they are for: Listing window, decompiler, function graph, program trees
Also: Adjusting the layout of windows and components, basic navigation, different comment types, renaming variables and functions, fragments
Ghidra windows and what they are for: Symbol tree, data type manager, console scripting, defined strings, function call graph
Also: Imports, exports, namespaces, functions, classes, entry point
A short introduction to this section.