Adding Detection Rules

This guide walks through extending the malware detection pipeline with new features, classification rules, and policy actions.

Related: ML Classifier Module · Policy Engine Module · PE Parser Module · Scan Pipeline Flow


1. Extension Points

The detection pipeline has four extension points, each in its own source file:

flowchart LR
    subgraph ExtensionPoints["Extension Points"]
        PE["① PE Parser\n(pe_parser.cpp)\nExtract new data"]
        FE["② Feature Extractor\n(features.cpp)\nAdd new features"]
        ML["③ Classifier\n(ml.cpp)\nAdd new rules"]
        POL["④ Policy Engine\n(policy.cpp)\nAdd new actions"]
    end

    PE --> FE --> ML --> POL

    style PE fill:#4361ee,color:#fff
    style FE fill:#e07a5f,color:#fff
    style ML fill:#7209b7,color:#fff
    style POL fill:#2d6a4f,color:#fff

2. Example: Adding Suspicious API Import Detection

This walkthrough adds a new feature that flags PE files importing suspicious APIs commonly used by malware (e.g., VirtualAllocEx, WriteProcessMemory, CreateRemoteThread).

Step 1: Extend ParsedFile (pe_parser.h)

Add a field to store whether suspicious imports were found:

struct ParsedFile
{
    bool is64Bit = false;
    DWORD sectionCount = 0;
    DWORD importCount = 0;
    DWORD textSize = 0;
    float textEntropy = 0.0f;
    vector<BYTE> textOpcodes;
    
    // NEW: Suspicious API detection
    DWORD suspiciousImportCount = 0;  // Count of suspicious API imports
};

Step 2: Extract the Data (pe_parser.cpp)

Add a function to check import names against a suspicious API list. This would be called from ParsePE_CPP after ParseImports():

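// List of suspicious APIs (NULL-terminated).
// Import tables store the exported names, so SetWindowsHookEx appears as its
// A/W variants. LoadLibraryA and GetProcAddress are also common in benign
// binaries, so weigh them accordingly when choosing a threshold.
static const char* SUSPICIOUS_APIS[] = {
    "VirtualAllocEx",
    "WriteProcessMemory",
    "CreateRemoteThread",
    "NtUnmapViewOfSection",
    "SetWindowsHookExA",
    "SetWindowsHookExW",
    "LoadLibraryA",
    "GetProcAddress",
    NULL
};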

// Count suspicious imports in the parsed PE
static DWORD CountSuspiciousImports(PEParser& parser, BYTE* fileData, DWORD fileSize)
{
    DWORD count = 0;
    // Walk import name tables and match against SUSPICIOUS_APIS
    // ... implementation ...
    return count;
}
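
The matching step inside CountSuspiciousImports could look roughly like the sketch below. It assumes the import names have already been extracted into an array of C strings; the names kSuspiciousApis, IsSuspiciousApi, and CountSuspiciousNames are hypothetical and not part of the project. Walking the import tables themselves is left out because that depends on the PEParser internals.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical, shortened list for illustration; nullptr marks the end.
static const char* const kSuspiciousApis[] = {
    "VirtualAllocEx",
    "WriteProcessMemory",
    "CreateRemoteThread",
    "NtUnmapViewOfSection",
    nullptr
};

// Returns true when `name` exactly matches a list entry.
// The comparison is case-sensitive, as PE import names are.
static bool IsSuspiciousApi(const char* name)
{
    if (name == nullptr)
        return false;
    for (std::size_t i = 0; kSuspiciousApis[i] != nullptr; ++i) {
        if (std::strcmp(name, kSuspiciousApis[i]) == 0)
            return true;
    }
    return false;
}

// Counts matches over already-extracted import names.
// `names` and `nameCount` are assumptions; null entries are skipped.
static unsigned CountSuspiciousNames(const char* const* names, std::size_t nameCount)
{
    unsigned count = 0;
    for (std::size_t i = 0; i < nameCount; ++i) {
        if (IsSuspiciousApi(names[i]))
            ++count;
    }
    return count;
}
```

Exact matching keeps the check cheap and predictable, at the cost of listing A/W variants separately.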

Step 3: Add to FeatureVector (features.h)

typedef struct {
    float entropy;
    int importCount;
    
    // NEW
    int suspiciousImportCount;
} FeatureVector;

Step 4: Extract the Feature (features.cpp)

void ExtractFeatures(const ParsedFile* parsed, FeatureVector* fv)
{
    fv->entropy = parsed->textEntropy;
    fv->importCount = parsed->importCount;
    
    // NEW
    fv->suspiciousImportCount = parsed->suspiciousImportCount;
}

Step 5: Update Classification (ml.cpp)

SCAN_RESULT Classify(const FeatureVector* fv)
{
    // Original rule: high entropy + many imports
    if (fv->entropy > 6.99f && fv->importCount > 10)
        return SCAN_MALICIOUS;

    // NEW: Suspicious APIs regardless of entropy
    if (fv->suspiciousImportCount >= 3)
        return SCAN_SUSPICIOUS;

    return SCAN_CLEAN;
}
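
Rule order matters here: the first matching rule wins, so a file that satisfies both rules is reported as malicious rather than suspicious. A minimal standalone sketch for checking that ordering follows. The enum values and the struct layout are assumptions made so the snippet compiles on its own; the project's real headers may differ, and ClassifySketch is a hypothetical name.

```cpp
// Assumed definitions for a standalone check; not the project's headers.
enum SCAN_RESULT { SCAN_CLEAN, SCAN_SUSPICIOUS, SCAN_MALICIOUS };

struct FeatureVector {
    float entropy;
    int importCount;
    int suspiciousImportCount;
};

// Same rule order as Classify() in this guide: first match wins.
static SCAN_RESULT ClassifySketch(const FeatureVector& fv)
{
    // High-confidence rule is evaluated first.
    if (fv.entropy > 6.99f && fv.importCount > 10)
        return SCAN_MALICIOUS;

    // Lower-confidence rule only applies when the first did not match.
    if (fv.suspiciousImportCount >= 3)
        return SCAN_SUSPICIOUS;

    return SCAN_CLEAN;
}
```

Feeding this a few hand-built feature vectors (both rules matching, only the API rule matching, neither matching) is a quick way to confirm a new rule does not shadow an existing one.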

Step 6: Handle New Verdict (policy.cpp)

void ApplyPolicy(const wchar_t* path, SCAN_RESULT verdict)
{
    switch (verdict)
    {
    case SCAN_MALICIOUS:
        wprintf(L"[MALICIOUS] %s\n", path);
        break;

    // NEW
    case SCAN_SUSPICIOUS:
        wprintf(L"[SUSPICIOUS] %s - flagged for suspicious API imports\n", path);
        break;

    case SCAN_CLEAN:
        wprintf(L"[CLEAN] %s\n", path);
        break;

    default:
        break;
    }
}

3. File Modification Summary

flowchart TB
    subgraph Changes["Files to Modify"]
        direction TB
        H1["pe_parser.h\n+ Add field to ParsedFile"]
        C1["pe_parser.cpp\n+ Add extraction logic"]
        H2["features.h\n+ Add field to FeatureVector"]
        C2["features.cpp\n+ Map new field"]
        C3["ml.cpp\n+ Add classification rule"]
        C4["policy.cpp\n+ Handle new verdict"]
    end

    H1 --> C1
    H2 --> C2
    C2 --> C3
    C3 --> C4

    style Changes fill:#1a1a2e,color:#fff

4. Other Extension Ideas

Adding a New File Extension to Monitor

To monitor additional file types (e.g., .scr, .cpl, .ocx):

File: Windows File System Minifilter/FsMinifilter.cpp, function IsTargetExtension()

Add new extension checks after the existing .exe and .dll checks:

// Check .scr (screensaver - common malware vector)
if ((ext[1] == L's' || ext[1] == L'S') &&
    (ext[2] == L'c' || ext[2] == L'C') &&
    (ext[3] == L'r' || ext[3] == L'R'))
    return TRUE;

Also update IsExecutableOrDll() in FsMinifilterMonitor/main.cpp so the user-mode monitor matches the same extensions as the driver.
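
As the list of extensions grows, the per-character comparisons become repetitive. One possible refactoring is a small case-insensitive helper, sketched below. It assumes, as in the snippet above, that ext points at the dot, and it takes an explicit length because kernel-mode strings are not guaranteed to be null-terminated. ToLowerAscii and ExtensionEquals are hypothetical names, not existing project functions.

```cpp
#include <cstddef>

// ASCII-only lowercase; enough for extensions such as .exe, .dll, .scr.
static wchar_t ToLowerAscii(wchar_t c)
{
    return (c >= L'A' && c <= L'Z') ? static_cast<wchar_t>(c - L'A' + L'a') : c;
}

// Compares a counted extension (including the dot) against a lowercase,
// null-terminated target such as L".scr". Returns true only on a
// full-length, case-insensitive match.
static bool ExtensionEquals(const wchar_t* ext, std::size_t extLen, const wchar_t* target)
{
    std::size_t i = 0;
    for (; i < extLen && target[i] != L'\0'; ++i) {
        if (ToLowerAscii(ext[i]) != target[i])
            return false;
    }
    // Both strings must end at the same position.
    return i == extLen && target[i] == L'\0';
}
```

With a helper like this, adding .cpl or .ocx becomes one extra call per extension, and the same function can be shared by the driver and the monitor if it is placed in a common header.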

Adding File Logging to Policy Engine

void ApplyPolicy(const wchar_t* path, SCAN_RESULT verdict)
{
    // Existing console output...
    
    // NEW: File logging
    FILE* logFile = _wfopen(L"scan_log.txt", L"a");
    if (logFile) {
        SYSTEMTIME st;
        GetLocalTime(&st);
        fwprintf(logFile, L"[%04d-%02d-%02d %02d:%02d:%02d] [%s] %s\n",
            st.wYear, st.wMonth, st.wDay,
            st.wHour, st.wMinute, st.wSecond,
            verdict == SCAN_MALICIOUS ? L"MALICIOUS" :
            verdict == SCAN_SUSPICIOUS ? L"SUSPICIOUS" : L"CLEAN",
            path);
        fclose(logFile);
    }
}

Adding Entry Point Analysis

Detect when the PE entry point is outside the .text section (common in packed malware):

// In ParsedFile
bool entryPointOutsideText = false;

// In PEParser::Initialize()
DWORD entryPoint = nt->OptionalHeader.AddressOfEntryPoint;
// Check if entry point RVA falls within .text section bounds
// If not, set entryPointOutsideText = true
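
The bounds check itself can be sketched as follows. SectionRange is a hypothetical, simplified stand-in for the VirtualAddress and VirtualSize fields of the .text section header, and both function names are assumptions for illustration only.

```cpp
#include <cstdint>

// Hypothetical, simplified view of a section header's address range.
struct SectionRange {
    std::uint32_t virtualAddress;  // RVA where the section starts
    std::uint32_t virtualSize;     // size of the section in memory
};

// True when `rva` lies in [virtualAddress, virtualAddress + virtualSize).
// The subtraction form avoids overflow when address + size would wrap.
static bool RvaInSection(std::uint32_t rva, const SectionRange& s)
{
    return rva >= s.virtualAddress && (rva - s.virtualAddress) < s.virtualSize;
}

// True when the entry point exists and falls outside the .text range.
static bool EntryPointOutsideText(std::uint32_t entryPointRva, const SectionRange& text)
{
    // An entry point of 0 means "no entry point" (allowed for DLLs),
    // so it is not treated as suspicious here.
    if (entryPointRva == 0)
        return false;
    return !RvaInSection(entryPointRva, text);
}
```

Some legitimate binaries name their code section differently or have several executable sections, so this signal is probably better suited to SCAN_SUSPICIOUS than to SCAN_MALICIOUS.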

5. Testing Your Changes

Single File Mode

The easiest way to test new rules:

  1. Build the scanner
  2. Run scanner.exe, select mode 2 (Single File Scan)
  3. Enter the path to a known test file
  4. Verify the output matches your expected verdict

With Known Samples

| Test Case | Expected Entropy | Expected Imports | Expected Verdict |
|---|---|---|---|
| C:\Windows\System32\notepad.exe | ~6.0 | ~20 | CLEAN |
| C:\Windows\System32\calc.exe | ~5.5 | ~15 | CLEAN |
| UPX-packed binary | ~7.5 | ~5 | Depends on rule |
| Custom test binary (high entropy) | ~7.2 | ~15 | MALICIOUS |

Generating Test Files

The project includes test/generate_entropy.py for creating files with specific entropy levels:

python test/generate_entropy.py
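
To confirm that a generated file has the entropy you expect, it can help to compute the value independently. The sketch below assumes the scanner's textEntropy is Shannon entropy in bits per byte (a 0 to 8 scale), which is consistent with the values in the table above but is not confirmed here. ShannonEntropy is a hypothetical helper, not a project function.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shannon entropy of a byte buffer in bits per byte (0.0 to 8.0).
static double ShannonEntropy(const std::vector<std::uint8_t>& data)
{
    if (data.empty())
        return 0.0;

    // Histogram of byte values.
    std::size_t counts[256] = {};
    for (std::uint8_t b : data)
        ++counts[b];

    const double total = static_cast<double>(data.size());
    double entropy = 0.0;
    for (std::size_t c : counts) {
        if (c == 0)
            continue;  // zero-probability symbols contribute nothing
        const double p = static_cast<double>(c) / total;
        entropy -= p * std::log2(p);
    }
    return entropy;
}
```

A buffer of identical bytes yields 0, and a buffer in which every byte value appears equally often yields 8, which gives two easy reference points when validating test inputs.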

6. Best Practices

flowchart TD
    subgraph DOs["✅ Do"]
        D1["Add features to FeatureVector\n(not raw data)"]
        D2["Keep Classify() fast\n(no I/O or network)"]
        D3["Test with both clean\nand malicious samples"]
        D4["Use SCAN_SUSPICIOUS\nfor uncertain verdicts"]
    end
    
    subgraph DONTs["❌ Don't"]
        N1["Don't add blocking logic\nwithout driver changes"]
        N2["Don't access the file\ninside Classify()"]
        N3["Don't remove existing\nrules without testing"]
        N4["Don't forget to update\nboth kernel + monitor\nextension checks"]
    end

    style DOs fill:#2d6a4f,color:#fff
    style DONTs fill:#e63946,color:#fff

  1. Keep the pipeline stages separate: PE parsing produces data, feature extraction transforms it, classification decides, policy acts
  2. Never do I/O in Classify(): it runs on the worker thread and should be pure computation
  3. Test with real-world samples: the entropy threshold was tuned empirically, and new rules should be validated the same way
  4. Use SCAN_SUSPICIOUS for low-confidence rules: reserve SCAN_MALICIOUS for high-confidence detections
  5. Document your rules: add comments explaining why specific thresholds were chosen

Next Steps