Adding Detection Rules

This guide walks through extending the malware detection pipeline with new features, classification rules, and policy actions.

1. Extension Points

The detection pipeline has four extension points, each implemented in its own file:

flowchart LR
    subgraph ExtensionPoints["Extension Points"]
        PE["① PE Parser\n(pe_parser.cpp)\nExtract new data"]
        FE["② Feature Extractor\n(features.cpp)\nAdd new features"]
        ML["③ Classifier\n(ml.cpp)\nAdd new rules"]
        POL["④ Policy Engine\n(policy.cpp)\nAdd new actions"]
    end

    PE --> FE --> ML --> POL

    style PE fill:#4361ee,color:#fff
    style FE fill:#e07a5f,color:#fff
    style ML fill:#7209b7,color:#fff
    style POL fill:#2d6a4f,color:#fff

2. Example: Adding Suspicious API Import Detection

This walkthrough adds a new feature that flags PE files importing suspicious APIs commonly used by malware (e.g., VirtualAllocEx, WriteProcessMemory, CreateRemoteThread).

Step 1: Extend ParsedFile (pe_parser.h)

Add a field to store how many suspicious imports were found:

struct ParsedFile
{
    bool is64Bit = false;
    DWORD sectionCount = 0;
    DWORD importCount = 0;
    DWORD textSize = 0;
    float textEntropy = 0.0f;
    vector<BYTE> textOpcodes;
    
    // NEW: Suspicious API detection
    DWORD suspiciousImportCount = 0;  // Count of suspicious API imports
};

Step 2: Extract the Data (pe_parser.cpp)

Add a function that checks import names against a list of suspicious APIs. Call it from ParsePE_CPP after ParseImports() and store the result in suspiciousImportCount:

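// List of suspicious APIs, NULL-terminated.
// Names are written as they appear in import tables (note the A/W suffixes).
static const char* SUSPICIOUS_APIS[] = {
    "VirtualAllocEx",
    "WriteProcessMemory",
    "CreateRemoteThread",
    "NtUnmapViewOfSection",
    "SetWindowsHookExA",
    "SetWindowsHookExW",
    // The next two are also imported by many benign programs.
    "LoadLibraryA",
    "GetProcAddress",
    NULL
};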

// Count suspicious imports in the parsed PE
static DWORD CountSuspiciousImports(PEParser& parser, BYTE* fileData, DWORD fileSize)
{
    DWORD count = 0;
    // Walk import name tables and match against SUSPICIOUS_APIS
    // ... implementation ...
    return count;
}
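
The matching step that the stub leaves open could look roughly like the following. This is a minimal sketch, not the project's implementation: it assumes the import names have already been collected into a list of strings (the real code would walk the import name tables through PEParser), the helper names IsSuspiciousApi and CountSuspiciousNames are hypothetical, and the API list is shortened for brevity.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Shortened copy of the suspicious API list, null-terminated as in the guide.
static const char* const kSuspiciousApis[] = {
    "VirtualAllocEx",
    "WriteProcessMemory",
    "CreateRemoteThread",
    nullptr
};

// Hypothetical helper: true if `name` exactly matches an entry in the list.
static bool IsSuspiciousApi(const char* name)
{
    if (name == nullptr)
        return false;
    for (const char* const* api = kSuspiciousApis; *api != nullptr; ++api) {
        if (std::strcmp(name, *api) == 0)
            return true;
    }
    return false;
}

// Hypothetical helper: counts how many of the given import names are suspicious.
// Assumes the caller has already extracted the names from the import tables.
static unsigned CountSuspiciousNames(const std::vector<std::string>& importNames)
{
    unsigned count = 0;
    for (const std::string& name : importNames) {
        if (IsSuspiciousApi(name.c_str()))
            ++count;
    }
    return count;
}
```

Keeping the comparison in a helper that takes plain strings means the rule can be exercised without a real PE file, for example with a list such as { "VirtualAllocEx", "CreateFileW" }.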

Step 3: Add to FeatureVector (features.h)

typedef struct {
    float entropy;
    int importCount;
    
    // NEW
    int suspiciousImportCount;
} FeatureVector;

Step 4: Extract the Feature (features.cpp)

void ExtractFeatures(const ParsedFile* parsed, FeatureVector* fv)
{
    fv->entropy = parsed->textEntropy;
    fv->importCount = parsed->importCount;
    
    // NEW
    fv->suspiciousImportCount = parsed->suspiciousImportCount;
}

Step 5: Update Classification (ml.cpp)

SCAN_RESULT Classify(const FeatureVector* fv)
{
    // Original rule: high entropy + many imports
    if (fv->entropy > 6.99f && fv->importCount > 10)
        return SCAN_MALICIOUS;

    // NEW: Suspicious APIs regardless of entropy.
    // SCAN_SUSPICIOUS must exist in the SCAN_RESULT enum; add it there if it does not.
    if (fv->suspiciousImportCount >= 3)
        return SCAN_SUSPICIOUS;

    return SCAN_CLEAN;
}
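
Because the first matching rule wins, the order of the checks matters: a file that satisfies both rules is reported as SCAN_MALICIOUS. A small table-driven check, sketched below, makes that ordering explicit. The enum and struct are stand-in definitions so that the block compiles on its own (the project's real declarations live in its headers), the enumerator order of SCAN_RESULT is an assumption, and ClassifyCase and CountClassifyMismatches are hypothetical names.

```cpp
// Stand-in declarations (assumed; the real ones come from the project's headers).
enum SCAN_RESULT { SCAN_CLEAN, SCAN_SUSPICIOUS, SCAN_MALICIOUS };

struct FeatureVector {
    float entropy;
    int importCount;
    int suspiciousImportCount;
};

// Same rules as in the guide, copied so the check is self-contained.
SCAN_RESULT Classify(const FeatureVector* fv)
{
    if (fv->entropy > 6.99f && fv->importCount > 10)
        return SCAN_MALICIOUS;
    if (fv->suspiciousImportCount >= 3)
        return SCAN_SUSPICIOUS;
    return SCAN_CLEAN;
}

// One row per scenario: input features and the verdict the rules should give.
struct ClassifyCase {
    FeatureVector input;
    SCAN_RESULT expected;
};

static const ClassifyCase kCases[] = {
    { { 6.0f, 20, 0 }, SCAN_CLEAN },       // ordinary entropy, no suspicious APIs
    { { 7.2f, 15, 0 }, SCAN_MALICIOUS },   // high entropy + many imports
    { { 5.0f,  4, 3 }, SCAN_SUSPICIOUS },  // suspicious APIs only
    { { 7.2f, 15, 5 }, SCAN_MALICIOUS },   // both rules match: first one wins
    { { 7.5f,  5, 0 }, SCAN_CLEAN },       // high entropy alone is not enough
};

// Returns the number of rows whose verdict differs from the expectation.
int CountClassifyMismatches()
{
    int mismatches = 0;
    for (const ClassifyCase& c : kCases) {
        if (Classify(&c.input) != c.expected)
            ++mismatches;
    }
    return mismatches;
}
```

Adding a row whenever a rule or threshold changes gives a quick regression check before running the scanner against real files.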

Step 6: Handle New Verdict (policy.cpp)

void ApplyPolicy(const wchar_t* path, SCAN_RESULT verdict)
{
    switch (verdict)
    {
    case SCAN_MALICIOUS:
        wprintf(L"[MALICIOUS] %s\n", path);
        break;

    // NEW
    case SCAN_SUSPICIOUS:
        wprintf(L"[SUSPICIOUS] %s - flagged for suspicious API imports\n", path);
        break;

    case SCAN_CLEAN:
        wprintf(L"[CLEAN] %s\n", path);
        break;

    default:
        break;
    }
}

3. File Modification Summary

flowchart TB
    subgraph Changes["Files to Modify"]
        direction TB
        H1["pe_parser.h\n+ Add field to ParsedFile"]
        C1["pe_parser.cpp\n+ Add extraction logic"]
        H2["features.h\n+ Add field to FeatureVector"]
        C2["features.cpp\n+ Map new field"]
        C3["ml.cpp\n+ Add classification rule"]
        C4["policy.cpp\n+ Handle new verdict"]
    end

    H1 --> C1
    H2 --> C2
    C1 --> C2
    C2 --> C3
    C3 --> C4

    style Changes fill:#1a1a2e,color:#fff

4. Other Extension Ideas

Adding a New File Extension to Monitor

To monitor additional file types (e.g., .scr, .cpl, .ocx):

File: Windows File System Minifilter/FsMinifilter.cpp — IsTargetExtension()

Add new extension checks after the existing .exe and .dll checks:

// Check .scr (screensaver - common malware vector)
if ((ext[1] == L's' || ext[1] == L'S') &&
    (ext[2] == L'c' || ext[2] == L'C') &&
    (ext[3] == L'r' || ext[3] == L'R'))
    return TRUE;

Also update: FsMinifilterMonitor/main.cpp — IsExecutableOrDll() to match.
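
As more extensions are added, character-by-character checks become repetitive. For the user-mode monitor, a small case-insensitive helper is one option. The sketch below is an illustration only: HasExtension is a hypothetical name, and the code is not suitable for the kernel driver as written, which has to rely on kernel-safe string routines.

```cpp
#include <cstddef>
#include <cwchar>
#include <cwctype>

// Hypothetical helper: true if `path` ends with `ext` (e.g. L".scr"),
// compared case-insensitively. `ext` must include the leading dot.
static bool HasExtension(const wchar_t* path, const wchar_t* ext)
{
    if (path == nullptr || ext == nullptr)
        return false;

    const std::size_t pathLen = std::wcslen(path);
    const std::size_t extLen = std::wcslen(ext);
    if (extLen == 0 || pathLen < extLen)
        return false;

    // Compare the last extLen characters of the path with the extension.
    const wchar_t* tail = path + (pathLen - extLen);
    for (std::size_t i = 0; i < extLen; ++i) {
        if (std::towlower(static_cast<std::wint_t>(tail[i])) !=
            std::towlower(static_cast<std::wint_t>(ext[i])))
            return false;
    }
    return true;
}
```

A check for several types then reads as HasExtension(path, L".scr") || HasExtension(path, L".cpl") || HasExtension(path, L".ocx").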

Adding File Logging to Policy Engine

void ApplyPolicy(const wchar_t* path, SCAN_RESULT verdict)
{
    // Existing console output...
    
    // NEW: File logging
    FILE* logFile = _wfopen(L"scan_log.txt", L"a");
    if (logFile) {
        SYSTEMTIME st;
        GetLocalTime(&st);
        fwprintf(logFile, L"[%04d-%02d-%02d %02d:%02d:%02d] [%s] %s\n",
            st.wYear, st.wMonth, st.wDay,
            st.wHour, st.wMinute, st.wSecond,
            verdict == SCAN_MALICIOUS ? L"MALICIOUS" :
            verdict == SCAN_SUSPICIOUS ? L"SUSPICIOUS" : L"CLEAN",
            path);
        fclose(logFile);
    }
}
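
Separating the formatting from the file I/O makes the log line easy to check in isolation. The sketch below assumes a stand-in SCAN_RESULT enum and uses the portable %ls specifier for wide strings (the snippet above relies on the Microsoft-specific meaning of %s in wide-character printf functions). VerdictLabel, LogTime and FormatLogLine are hypothetical names, not part of the project.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Stand-in declaration (assumed; the real enum comes from the project's headers).
enum SCAN_RESULT { SCAN_CLEAN, SCAN_SUSPICIOUS, SCAN_MALICIOUS };

// Hypothetical plain struct mirroring the SYSTEMTIME fields used for the timestamp.
struct LogTime {
    int year, month, day, hour, minute, second;
};

// Maps each verdict to its own label; unexpected values become "UNKNOWN".
static const wchar_t* VerdictLabel(SCAN_RESULT verdict)
{
    switch (verdict) {
    case SCAN_MALICIOUS:  return L"MALICIOUS";
    case SCAN_SUSPICIOUS: return L"SUSPICIOUS";
    case SCAN_CLEAN:      return L"CLEAN";
    }
    return L"UNKNOWN";
}

// Builds one log line (without the trailing newline) in the guide's format.
// Returns an empty string if formatting fails or the line does not fit.
static std::wstring FormatLogLine(const LogTime& t, SCAN_RESULT verdict, const wchar_t* path)
{
    wchar_t buffer[1024];
    const int written = std::swprintf(
        buffer, sizeof(buffer) / sizeof(buffer[0]),
        L"[%04d-%02d-%02d %02d:%02d:%02d] [%ls] %ls",
        t.year, t.month, t.day, t.hour, t.minute, t.second,
        VerdictLabel(verdict), path != nullptr ? path : L"");
    if (written < 0)
        return std::wstring();
    return std::wstring(buffer, static_cast<std::size_t>(written));
}
```

ApplyPolicy could then fill a LogTime from the SYSTEMTIME it already obtains and write the returned string followed by a newline.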

Adding Entry Point Analysis

Detect when the PE entry point is outside the .text section (common in packed malware):

// In ParsedFile
bool entryPointOutsideText = false;

// In PEParser::Initialize()
DWORD entryPoint = nt->OptionalHeader.AddressOfEntryPoint;
// Check if entry point RVA falls within .text section bounds
// If not, set entryPointOutsideText = true
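
The bounds check described in the comments can be sketched as follows. SectionRange and IsRvaInSection are hypothetical names; in the real parser the two values would come from the VirtualAddress and VirtualSize fields of the .text section header.

```cpp
#include <cstdint>

// Hypothetical view of a section's bounds in memory.
struct SectionRange {
    std::uint32_t virtualAddress;  // RVA where the section starts
    std::uint32_t virtualSize;     // size of the section in memory, in bytes
};

// True if `rva` lies in [virtualAddress, virtualAddress + virtualSize).
// Written as a subtraction so a large virtualSize cannot overflow the sum.
static bool IsRvaInSection(std::uint32_t rva, const SectionRange& section)
{
    return rva >= section.virtualAddress &&
           (rva - section.virtualAddress) < section.virtualSize;
}
```

The flag would then be set with something like entryPointOutsideText = !IsRvaInSection(entryPoint, textRange). AddressOfEntryPoint can be zero for DLLs that have no entry point, so that case probably deserves separate handling rather than being flagged.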

5. Testing Your Changes

Single File Mode

The easiest way to test new rules:

Build the scanner
Run scanner.exe, select mode 2 (Single File Scan)
Enter the path to a known test file
Verify the output matches your expected verdict

With Known Samples

Test Case	Expected Entropy	Expected Imports	Expected Verdict
`C:\Windows\System32\notepad.exe`	~6.0	~20	CLEAN
`C:\Windows\System32\calc.exe`	~5.5	~15	CLEAN
UPX-packed binary	~7.5	~5	Depends on rule
Custom test binary (high entropy)	~7.2	~15	MALICIOUS

Generating Test Files

The project includes test/generate_entropy.py for creating files with specific entropy levels:

python test/generate_entropy.py
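
To confirm that a generated file lands in the intended range, its entropy can be measured independently. The sketch below computes Shannon entropy in bits per byte (0 to 8). That definition is an assumption about how textEntropy is calculated, although the values in the known-samples table are consistent with that scale. ShannonEntropy is a hypothetical name and is not part of the project.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shannon entropy of a byte buffer in bits per byte, in the range [0, 8].
// Returns 0 for an empty buffer.
static double ShannonEntropy(const std::vector<std::uint8_t>& data)
{
    if (data.empty())
        return 0.0;

    // Histogram of byte values.
    std::size_t counts[256] = {};
    for (std::uint8_t b : data)
        ++counts[b];

    // H = -sum(p * log2(p)) over byte values that occur at least once.
    double entropy = 0.0;
    const double total = static_cast<double>(data.size());
    for (std::size_t c : counts) {
        if (c == 0)
            continue;
        const double p = static_cast<double>(c) / total;
        entropy -= p * std::log2(p);
    }
    return entropy;
}
```

A buffer of identical bytes scores 0 and a buffer containing every byte value equally often scores 8, which gives two easy reference points when checking a generated file.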

6. Best Practices

flowchart TD
    subgraph DOs["✅ Do"]
        D1["Add features to FeatureVector\n(not raw data)"]
        D2["Keep Classify() fast\n(no I/O or network)"]
        D3["Test with both clean\nand malicious samples"]
        D4["Use SCAN_SUSPICIOUS\nfor uncertain verdicts"]
    end
    
    subgraph DONTs["❌ Don't"]
        N1["Don't add blocking logic\nwithout driver changes"]
        N2["Don't access the file\ninside Classify()"]
        N3["Don't remove existing\nrules without testing"]
        N4["Don't forget to update\nboth kernel + monitor\nextension checks"]
    end

    style DOs fill:#2d6a4f,color:#fff
    style DONTs fill:#e63946,color:#fff

Keep the pipeline stages separate: PE parsing produces data, feature extraction transforms it, classification decides, policy acts
Never do I/O in Classify(): it runs on the worker thread and should be pure computation
Test with real-world samples: the entropy threshold was tuned empirically; new rules should be validated similarly
Use SCAN_SUSPICIOUS for low-confidence rules: reserve SCAN_MALICIOUS for high-confidence detections
Document your rules: add comments explaining why specific thresholds were chosen

Next Steps

Understand the current classifier: ML Classifier Module
Understand what data is available: PE Parser Module
Full pipeline context: Scan Pipeline Flow