Adding Detection Rules
This guide walks through extending the malware detection pipeline with new features, classification rules, and policy actions.
Related: ML Classifier Module · Policy Engine Module · PE Parser Module · Scan Pipeline Flow
1. Extension Points
The detection pipeline has four extension points, each in its own source file so that a change to one stage does not touch the others:
flowchart LR
subgraph ExtensionPoints["Extension Points"]
PE["① PE Parser\n(pe_parser.cpp)\nExtract new data"]
FE["② Feature Extractor\n(features.cpp)\nAdd new features"]
ML["③ Classifier\n(ml.cpp)\nAdd new rules"]
POL["④ Policy Engine\n(policy.cpp)\nAdd new actions"]
end
PE --> FE --> ML --> POL
style PE fill:#4361ee,color:#fff
style FE fill:#e07a5f,color:#fff
style ML fill:#7209b7,color:#fff
style POL fill:#2d6a4f,color:#fff
2. Example: Adding Suspicious API Import Detection
This walkthrough adds a new feature that flags PE files importing suspicious APIs commonly used by malware (e.g., VirtualAllocEx, WriteProcessMemory, CreateRemoteThread).
Step 1: Extend ParsedFile (pe_parser.h)
Add a field to store whether suspicious imports were found:
struct ParsedFile
{
bool is64Bit = false;
DWORD sectionCount = 0;
DWORD importCount = 0;
DWORD textSize = 0;
float textEntropy = 0.0f;
vector<BYTE> textOpcodes;
// NEW: Suspicious API detection
DWORD suspiciousImportCount = 0; // Count of suspicious API imports
};
Step 2: Extract the Data (pe_parser.cpp)
Add a function to check import names against a suspicious API list. This would be called from ParsePE_CPP after ParseImports():
// List of suspicious APIs
static const char* SUSPICIOUS_APIS[] = {
"VirtualAllocEx",
"WriteProcessMemory",
"CreateRemoteThread",
"NtUnmapViewOfSection",
"SetWindowsHookExA", // user32.dll exports the A/W variants, not "SetWindowsHookEx"
"SetWindowsHookExW",
"LoadLibraryA",
"GetProcAddress",
NULL
};
// Count suspicious imports in the parsed PE
static DWORD CountSuspiciousImports(PEParser& parser, BYTE* fileData, DWORD fileSize)
{
DWORD count = 0;
// Walk import name tables and match against SUSPICIOUS_APIS
// ... implementation ...
return count;
}
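The matching step left as a placeholder above can be sketched in isolation. This is a minimal sketch, not the project's implementation: it assumes the import names have already been collected into a list of strings (the hypothetical `ImportNames` type), and `IsSuspiciousApi` and `CountSuspiciousNames` are illustrative names. Walking the actual import name tables depends on the `PEParser` internals, which are not shown here.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the names the import parser would collect;
// the real parser's data structures may differ.
using ImportNames = std::vector<std::string>;

// nullptr-terminated list, mirroring SUSPICIOUS_APIS (shortened here).
static const char* const kSuspiciousApis[] = {
    "VirtualAllocEx",
    "WriteProcessMemory",
    "CreateRemoteThread",
    nullptr
};

// Returns true if `name` exactly matches an entry in the list.
// The comparison is case-sensitive, as PE import names are.
static bool IsSuspiciousApi(const char* name)
{
    assert(name != nullptr);
    for (const char* const* api = kSuspiciousApis; *api != nullptr; ++api) {
        if (std::strcmp(name, *api) == 0)
            return true;
    }
    return false;
}

// Counts how many imported names appear in the suspicious list.
static unsigned CountSuspiciousNames(const ImportNames& imports)
{
    unsigned count = 0;
    for (const std::string& name : imports) {
        if (IsSuspiciousApi(name.c_str()))
            ++count;
    }
    return count;
}
```

A linear scan is adequate for a list this short. Imports resolved by ordinal carry no name and would have to be skipped or handled separately.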
Step 3: Add to FeatureVector (features.h)
typedef struct {
float entropy;
int importCount;
// NEW
int suspiciousImportCount;
} FeatureVector;
Step 4: Extract the Feature (features.cpp)
void ExtractFeatures(const ParsedFile* parsed, FeatureVector* fv)
{
fv->entropy = parsed->textEntropy;
fv->importCount = parsed->importCount;
// NEW
fv->suspiciousImportCount = parsed->suspiciousImportCount;
}
Step 5: Update Classification (ml.cpp)
SCAN_RESULT Classify(const FeatureVector* fv)
{
// Original rule: high entropy + many imports
if (fv->entropy > 6.99f && fv->importCount > 10)
return SCAN_MALICIOUS;
// NEW: Suspicious APIs regardless of entropy
if (fv->suspiciousImportCount >= 3)
return SCAN_SUSPICIOUS;
return SCAN_CLEAN;
}
Step 6: Handle New Verdict (policy.cpp)
void ApplyPolicy(const wchar_t* path, SCAN_RESULT verdict)
{
switch (verdict)
{
case SCAN_MALICIOUS:
wprintf(L"[MALICIOUS] %s\n", path);
break;
// NEW
case SCAN_SUSPICIOUS:
wprintf(L"[SUSPICIOUS] %s - flagged for suspicious API imports\n", path);
break;
case SCAN_CLEAN:
wprintf(L"[CLEAN] %s\n", path);
break;
default:
break;
}
}
3. File Modification Summary
flowchart TB
subgraph Changes["Files to Modify"]
direction TB
H1["pe_parser.h\n+ Add field to ParsedFile"]
C1["pe_parser.cpp\n+ Add extraction logic"]
H2["features.h\n+ Add field to FeatureVector"]
C2["features.cpp\n+ Map new field"]
C3["ml.cpp\n+ Add classification rule"]
C4["policy.cpp\n+ Handle new verdict"]
end
H1 --> C1
H2 --> C2
C2 --> C3
C3 --> C4
style Changes fill:#1a1a2e,color:#fff
4. Other Extension Ideas
Adding a New File Extension to Monitor
To monitor additional file types (e.g., .scr, .cpl, .ocx):
File: Windows File System Minifilter/FsMinifilter.cpp → IsTargetExtension()
Add new extension checks after the existing .exe and .dll checks:
// Check .scr (screensaver - common malware vector)
if ((ext[1] == L's' || ext[1] == L'S') &&
(ext[2] == L'c' || ext[2] == L'C') &&
(ext[3] == L'r' || ext[3] == L'R'))
return TRUE;
Also update: FsMinifilterMonitor/main.cpp → IsExecutableOrDll() to match.
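As the number of monitored extensions grows, the per-character checks become repetitive. One option, sketched here for the user-mode monitor only, is a table-driven comparison. The names `kTargetExtensions`, `EqualsIgnoreCase`, and `IsMonitoredExtension` are hypothetical, and the sketch assumes `ext` points at the leading dot, as the indexing in the snippet above suggests. The kernel driver works with its own string types and may need a different comparison routine.

```cpp
#include <cassert>
#include <cwctype>

// Hypothetical table of monitored extensions, each including the leading dot.
static const wchar_t* const kTargetExtensions[] = {
    L".exe", L".dll", L".scr", L".cpl", L".ocx"
};

// Case-insensitive comparison of two NUL-terminated wide strings.
static bool EqualsIgnoreCase(const wchar_t* a, const wchar_t* b)
{
    while (*a != L'\0' && *b != L'\0') {
        if (std::towlower(static_cast<wint_t>(*a)) !=
            std::towlower(static_cast<wint_t>(*b)))
            return false;
        ++a;
        ++b;
    }
    return *a == *b;  // equal only if both strings end at the same position
}

// Returns true if `ext` (including the leading dot) is in the table.
static bool IsMonitoredExtension(const wchar_t* ext)
{
    assert(ext != nullptr);
    for (const wchar_t* target : kTargetExtensions) {
        if (EqualsIgnoreCase(ext, target))
            return true;
    }
    return false;
}
```

With a table, adding a new extension is a one-line change, which makes it easier to keep the driver and monitor lists in sync.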
Adding File Logging to Policy Engine
void ApplyPolicy(const wchar_t* path, SCAN_RESULT verdict)
{
// Existing console output...
// NEW: File logging
FILE* logFile = _wfopen(L"scan_log.txt", L"a");
if (logFile) {
SYSTEMTIME st;
GetLocalTime(&st);
fwprintf(logFile, L"[%04d-%02d-%02d %02d:%02d:%02d] [%s] %s\n",
st.wYear, st.wMonth, st.wDay,
st.wHour, st.wMinute, st.wSecond,
verdict == SCAN_MALICIOUS ? L"MALICIOUS" :
verdict == SCAN_SUSPICIOUS ? L"SUSPICIOUS" : L"CLEAN",
path);
fclose(logFile);
}
}
Adding Entry Point Analysis
Detect when the PE entry point is outside the .text section (common in packed malware):
// In ParsedFile
bool entryPointOutsideText = false;
// In PEParser::Initialize()
DWORD entryPoint = nt->OptionalHeader.AddressOfEntryPoint;
// Check if entry point RVA falls within .text section bounds
// If not, set entryPointOutsideText = true
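The bounds check described in the comments can be sketched as follows. This is an illustration under stated assumptions: `SectionInfo` is a hypothetical, simplified view of the fields the parser would read from each `IMAGE_SECTION_HEADER`, and `RvaInSection` and `EntryPointOutsideText` are illustrative names, not functions from the project.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified section record; the real parser would fill this
// from IMAGE_SECTION_HEADER (Name, VirtualAddress, Misc.VirtualSize).
struct SectionInfo {
    char          name[8];         // 8 bytes, not guaranteed NUL-terminated
    std::uint32_t virtualAddress;  // RVA where the section starts
    std::uint32_t virtualSize;     // size of the section in memory
};

// Returns true if `rva` lies in [virtualAddress, virtualAddress + virtualSize).
static bool RvaInSection(std::uint32_t rva, const SectionInfo& s)
{
    // Subtract rather than add so malformed headers cannot overflow the sum.
    return rva >= s.virtualAddress && (rva - s.virtualAddress) < s.virtualSize;
}

// Returns true when the entry point is not inside any section named ".text".
static bool EntryPointOutsideText(std::uint32_t entryPointRva,
                                  const std::vector<SectionInfo>& sections)
{
    for (const SectionInfo& s : sections) {
        if (std::strncmp(s.name, ".text", sizeof(s.name)) == 0 &&
            RvaInSection(entryPointRva, s))
            return false;
    }
    return true;
}
```

Matching on the section name is a heuristic: some toolchains name their code section differently, so a variant that checks the executable section flag instead may produce fewer false positives.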
5. Testing Your Changes
Single File Mode
The easiest way to test new rules:
- Build the scanner
- Run scanner.exe and select mode 2 (Single File Scan)
- Enter the path to a known test file
- Verify the output matches your expected verdict
With Known Samples
| Test Case | Expected Entropy | Expected Imports | Expected Verdict |
|---|---|---|---|
| C:\Windows\System32\notepad.exe | ~6.0 | ~20 | CLEAN |
| C:\Windows\System32\calc.exe | ~5.5 | ~15 | CLEAN |
| UPX-packed binary | ~7.5 | ~5 | Depends on rule |
| Custom test binary (high entropy) | ~7.2 | ~15 | MALICIOUS |
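The expectations in the table can also be checked without any files by feeding feature values straight into the classifier. The sketch below copies `Classify()` and `FeatureVector` from the walkthrough so that it compiles on its own; the enum values of `SCAN_RESULT`, the `ClassifierCase` type, and the last test row are assumptions made for illustration, and the feature values are the approximate figures from the table rather than measured ones. The UPX row is left out because its verdict depends on the rule set.

```cpp
#include <cstddef>

// Copied from the walkthrough so the sketch is self-contained; in the project
// these come from the scanner's headers. The enum values are assumed.
enum SCAN_RESULT { SCAN_CLEAN, SCAN_SUSPICIOUS, SCAN_MALICIOUS };

struct FeatureVector {
    float entropy;
    int   importCount;
    int   suspiciousImportCount;
};

SCAN_RESULT Classify(const FeatureVector* fv)
{
    if (fv->entropy > 6.99f && fv->importCount > 10)
        return SCAN_MALICIOUS;
    if (fv->suspiciousImportCount >= 3)
        return SCAN_SUSPICIOUS;
    return SCAN_CLEAN;
}

// One row of the expectation table (hypothetical helper type).
struct ClassifierCase {
    const char*   label;
    FeatureVector features;
    SCAN_RESULT   expected;
};

// Approximate values from the table; the last row is a made-up case
// that exercises the suspicious-import rule.
static const ClassifierCase kCases[] = {
    { "notepad.exe",         { 6.0f, 20, 0 }, SCAN_CLEAN },
    { "calc.exe",            { 5.5f, 15, 0 }, SCAN_CLEAN },
    { "high-entropy binary", { 7.2f, 15, 0 }, SCAN_MALICIOUS },
    { "injector-like",       { 5.0f, 12, 3 }, SCAN_SUSPICIOUS },
};

// Returns the number of cases whose verdict differs from the expectation.
static std::size_t CountClassifierMismatches()
{
    std::size_t mismatches = 0;
    for (const ClassifierCase& c : kCases) {
        if (Classify(&c.features) != c.expected)
            ++mismatches;
    }
    return mismatches;
}
```

Because the entropy rule is evaluated first, a sample that satisfies both rules is reported as malicious, not suspicious. A case like that is worth adding whenever rule order changes.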
Generating Test Files
The project includes test/generate_entropy.py for creating files with specific entropy levels:
python test/generate_entropy.py
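To confirm independently that a generated file lands at the intended level, the entropy of its bytes can be recomputed. The sketch below assumes the scanner measures Shannon entropy in bits per byte, which is consistent with the 0 to 8 scale implied by the thresholds in this guide but is not confirmed here. `ShannonEntropy` is an illustrative name, not the project's function.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shannon entropy of a byte buffer in bits per byte (0.0 to 8.0).
static double ShannonEntropy(const std::vector<std::uint8_t>& data)
{
    if (data.empty())
        return 0.0;

    // Histogram of byte values.
    std::size_t counts[256] = {};
    for (std::uint8_t b : data)
        ++counts[b];

    // H = -sum(p * log2(p)) over byte values that occur at least once.
    const double total = static_cast<double>(data.size());
    double entropy = 0.0;
    for (std::size_t c : counts) {
        if (c == 0)
            continue;
        const double p = static_cast<double>(c) / total;
        entropy -= p * std::log2(p);
    }
    return entropy;
}
```

A buffer of identical bytes scores 0, and a buffer in which all 256 byte values occur equally often scores 8, so packed or encrypted sections typically sit near the top of the range.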
6. Best Practices
flowchart TD
subgraph DOs["✅ Do"]
D1["Add features to FeatureVector\n(not raw data)"]
D2["Keep Classify() fast\n(no I/O or network)"]
D3["Test with both clean\nand malicious samples"]
D4["Use SCAN_SUSPICIOUS\nfor uncertain verdicts"]
end
subgraph DONTs["❌ Don't"]
N1["Don't add blocking logic\nwithout driver changes"]
N2["Don't access the file\ninside Classify()"]
N3["Don't remove existing\nrules without testing"]
N4["Don't forget to update\nboth kernel + monitor\nextension checks"]
end
style DOs fill:#2d6a4f,color:#fff
style DONTs fill:#e63946,color:#fff
- Keep the pipeline stages separate: PE parsing produces data, feature extraction transforms it, classification decides, policy acts
- Never do I/O in Classify(): it runs on the worker thread and should be pure computation
- Test with real-world samples: the entropy threshold was tuned empirically; new rules should be validated similarly
- Use SCAN_SUSPICIOUS for low-confidence rules: reserve SCAN_MALICIOUS for high-confidence detections
- Document your rules: add comments explaining why specific thresholds were chosen
Next Steps
- Understand the current classifier: ML Classifier Module
- Understand what data is available: PE Parser Module
- Full pipeline context: Scan Pipeline Flow