Scan Pipeline Flow

This document traces the complete path of a scan request from receipt by the scanner through PE parsing, feature extraction, ML classification, and policy enforcement.

Related: File Interception Flow · Scanner Module · PE Parser Module · ML Classifier · Policy Engine


1. Pipeline Overview

flowchart LR
    subgraph Input
        Pipe["Named Pipe\n(from Monitor)"]
        CLI["Manual Input\n(Single Scan Mode)"]
    end

    subgraph Queue["Thread-Safe Queue"]
        Q["std::queue\n(max 10,000)"]
        Event["Win32 Event\n(auto-reset)"]
    end

    subgraph Worker["ScanWorker Thread"]
        Deq["Dequeue"]
        Norm["NormalizePath"]
        Valid["Validate\n(file exists, not dir,\nnot self)"]
        Parse["SafeParsePE_SEH"]
        Feat["ExtractFeatures"]
        Class["Classify"]
        Policy["ApplyPolicy"]
    end

    Pipe --> Q
    CLI --> Q
    Q --> Deq --> Norm --> Valid --> Parse --> Feat --> Class --> Policy

    style Queue fill:#e07a5f,color:#fff
    style Worker fill:#2d6a4f,color:#fff

2. Detailed Sequence

sequenceDiagram
    participant M as Monitor
    participant Pipe as Named Pipe
    participant Q as ScanQueue
    participant W as ScanWorker Thread
    participant PP as PEParser
    participant FE as FeatureExtractor
    participant ML as Classifier
    participant POL as PolicyEngine

    M->>Pipe: WriteFile(SCAN_REQUEST)
    Pipe->>Q: Enqueue(req)
    Q->>Q: SetEvent(g_QueueEvent)
    
    W->>Q: Dequeue(&req)
    
    W->>W: NormalizePath(req.filePath)
    Note over W: Convert NT paths<br/>(\Device\HarddiskVolumeX\...)<br/>to Win32 paths (C:\...)

    W->>W: GetFileAttributesW()
    Note over W: Skip if:<br/>- INVALID_FILE_ATTRIBUTES<br/>- FILE_ATTRIBUTE_DIRECTORY<br/>- Scanner.exe or FsMinifilterMonitor.exe

    W->>PP: SafeParsePE_SEH(path, &parsed)
    
    rect rgb(64, 64, 128)
        Note over PP: __try / __except (SEH)
        PP->>PP: LoadFile() → HeapAlloc + ReadFile
        PP->>PP: Validate DOS header (MZ)
        PP->>PP: Validate NT header (PE\0\0)
        PP->>PP: Determine 32/64-bit
        PP->>PP: ParseSections()
        PP->>PP: ParseImports()
        PP->>PP: ExtractMaxEntropySection()
        PP-->>W: ParsedFile struct
    end

    W->>FE: ExtractFeatures(&parsed, &fv)
    FE-->>W: FeatureVector { entropy, importCount }

    W->>ML: Classify(&fv)
    Note over ML: entropy > 6.99 AND<br/>importCount > 10<br/>→ MALICIOUS
    ML-->>W: SCAN_RESULT verdict

    W->>POL: ApplyPolicy(path, verdict)
    Note over POL: Log verdict to console

3. Stage-by-Stage Breakdown

Stage 1: Path Normalization

The kernel driver sends NT-format paths (e.g., \Device\HarddiskVolume3\Windows\malware.exe). The scanner must convert these to Win32 paths for CreateFileW.

flowchart TD
    Input["Raw Path from Kernel"]
    
    Input --> C1{"Starts with\n\\\\?\\ or \\\\.\\<br/>or contains ':'?"}
    C1 -->|Yes| Win32["Already Win32\n→ Use as-is"]
    
    C1 -->|No| C2{"Starts with\n\\??\\?"}
    C2 -->|Yes| Strip["Strip \\??\\ prefix\n→ Win32 path"]
    
    C2 -->|No| C3{"Starts with\n\\Device\\?"}
    C3 -->|Yes| Resolve["QueryDosDeviceW()\nfor each drive letter\n→ Map to Win32"]
    
    C3 -->|No| Fail["❌ Cannot normalize"]

    style Win32 fill:#2d6a4f,color:#fff
    style Strip fill:#2d6a4f,color:#fff
    style Resolve fill:#e07a5f,color:#fff
    style Fail fill:#e63946,color:#fff
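
The decision tree above can be sketched in portable C++. This is an illustration under stated assumptions, not the scanner's actual code: `NormalizePathSketch`, `StartsWith`, and `DeviceMap` are hypothetical names, and the device map stands in for the results the real code would gather by calling QueryDosDeviceW() for each drive letter.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the QueryDosDeviceW() results: maps an NT device
// name (L"\\Device\\HarddiskVolume3") to its drive letter (L"C:").
typedef std::map<std::wstring, std::wstring> DeviceMap;

static bool StartsWith(const std::wstring& s, const std::wstring& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Returns true and fills `out` on success; returns false if no branch applies.
bool NormalizePathSketch(const std::wstring& raw, const DeviceMap& devices,
                         std::wstring& out) {
    // Branch 1: already a Win32 path (\\?\ or \\.\ prefix, or contains ':').
    if (StartsWith(raw, L"\\\\?\\") || StartsWith(raw, L"\\\\.\\") ||
        raw.find(L':') != std::wstring::npos) {
        out = raw;
        return true;
    }

    // Branch 2: strip the \??\ prefix.
    const std::wstring ntPrefix = L"\\??\\";
    if (StartsWith(raw, ntPrefix)) {
        out = raw.substr(ntPrefix.size());
        return true;
    }

    // Branch 3: map a \Device\... prefix to a drive letter.
    if (StartsWith(raw, L"\\Device\\")) {
        for (DeviceMap::const_iterator it = devices.begin();
             it != devices.end(); ++it) {
            const std::wstring& device = it->first;
            // Require a separator right after the device name so that
            // HarddiskVolume3 does not match HarddiskVolume30.
            if (StartsWith(raw, device) && raw.size() > device.size() &&
                raw[device.size()] == L'\\') {
                out = it->second + raw.substr(device.size());
                return true;
            }
        }
    }

    // No branch matched: the caller skips the request.
    return false;
}
```

With a map containing `\Device\HarddiskVolume3` → `C:`, the example path from this stage resolves to `C:\Windows\malware.exe`. Note that under this ordering a path such as `\??\C:\file` takes the first branch, because it contains a colon.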

Stage 2: Pre-Scan Validation

Before investing in PE parsing, the worker applies quick rejection filters:

| Check | Method | Reject If |
|---|---|---|
| File exists | GetFileAttributesW() | Returns INVALID_FILE_ATTRIBUTES |
| Not a directory | GetFileAttributesW() | FILE_ATTRIBUTE_DIRECTORY is set |
| Not self | wcsstr() path check | Path contains \Scanner.exe or \FsMinifilterMonitor.exe |
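
A minimal sketch of these three filters follows. To keep it compilable without `<windows.h>`, it takes the value that GetFileAttributesW() returned as a parameter and redefines the two Win32 constants locally with their documented values; `ShouldSkipSketch` and the `k...` constant names are hypothetical.

```cpp
#include <cstdint>
#include <cwchar>

// Local copies of the Win32 values so the sketch builds on any platform.
const std::uint32_t kInvalidFileAttributes  = 0xFFFFFFFFu;  // INVALID_FILE_ATTRIBUTES
const std::uint32_t kFileAttributeDirectory = 0x00000010u;  // FILE_ATTRIBUTE_DIRECTORY

// attrs: the value GetFileAttributesW(path) returned for this path.
// Returns true if the request should be skipped before PE parsing.
bool ShouldSkipSketch(std::uint32_t attrs, const wchar_t* path) {
    // Check "does not exist" first: the invalid value has every bit set,
    // so it would also pass the directory test below.
    if (attrs == kInvalidFileAttributes) return true;

    // Directories are never scanned.
    if ((attrs & kFileAttributeDirectory) != 0) return true;

    // Self-exclusion: substring match on the scanner and monitor binaries.
    if (std::wcsstr(path, L"\\Scanner.exe") != NULL) return true;
    if (std::wcsstr(path, L"\\FsMinifilterMonitor.exe") != NULL) return true;

    return false;
}
```

One caveat about this sketch: wcsstr() is case-sensitive, so a path spelled `\scanner.exe` would not match here.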

Stage 3: PE Parsing (SEH-Protected)

flowchart TD
    Start["SafeParsePE_SEH()"]
    Start --> SEH["__try block"]
    
    SEH --> Load["LoadFile()\nCreateFileW + ReadFile\n→ heap buffer"]
    Load --> DOS["Validate DOS Header\ne_magic == 'MZ'"]
    DOS --> NT["Validate NT Header\nSignature == 'PE\\0\\0'"]
    NT --> Bits["Check OptionalHeader.Magic\n→ 32-bit or 64-bit"]
    
    Bits --> Sections["ParseSections()\nEnumerate IMAGE_SECTION_HEADER[]"]
    Sections --> Imports["ParseImports()\nWalk IMAGE_IMPORT_DESCRIPTOR[]"]
    Imports --> Entropy["ExtractMaxEntropySection()\nFind section with highest byte entropy"]
    
    Entropy --> Result["ParsedFile {\n  is64Bit, sectionCount,\n  importCount, textSize,\n  textEntropy, textOpcodes[4096]\n}"]

    SEH --> Except["__except handler"]
    Except --> Crash["Return FALSE\n(malformed PE)"]

    style SEH fill:#4361ee,color:#fff
    style Result fill:#2d6a4f,color:#fff
    style Crash fill:#e63946,color:#fff
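
The header checks at the top of this flow can be illustrated on an in-memory buffer. The sketch below is not SafeParsePE_SEH: it replaces the SEH guard with explicit bounds checks so that it runs on any platform, and `ClassifyHeadersSketch`, `PeKind`, and the read helpers are hypothetical names. The offsets and magic values come from the PE format (e_lfanew at 0x3C, optional header Magic 24 bytes after the NT signature, 0x10B for PE32 and 0x20B for PE32+).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum PeKind { PE_INVALID, PE_32BIT, PE_64BIT };

// Bounds-checked little-endian reads; return false instead of faulting.
static bool ReadU16(const std::vector<std::uint8_t>& buf, std::size_t off,
                    std::uint16_t& out) {
    if (off > buf.size() || buf.size() - off < 2) return false;
    out = static_cast<std::uint16_t>(buf[off] | (buf[off + 1] << 8));
    return true;
}

static bool ReadU32(const std::vector<std::uint8_t>& buf, std::size_t off,
                    std::uint32_t& out) {
    if (off > buf.size() || buf.size() - off < 4) return false;
    out = static_cast<std::uint32_t>(buf[off]) |
          (static_cast<std::uint32_t>(buf[off + 1]) << 8) |
          (static_cast<std::uint32_t>(buf[off + 2]) << 16) |
          (static_cast<std::uint32_t>(buf[off + 3]) << 24);
    return true;
}

PeKind ClassifyHeadersSketch(const std::vector<std::uint8_t>& buf) {
    // DOS header: e_magic must be 'MZ' (0x5A4D little-endian).
    std::uint16_t mz = 0;
    if (!ReadU16(buf, 0, mz) || mz != 0x5A4D) return PE_INVALID;

    // e_lfanew (offset 0x3C) points at the NT headers.
    std::uint32_t lfanew = 0;
    if (!ReadU32(buf, 0x3C, lfanew)) return PE_INVALID;

    // NT header: signature must be 'PE\0\0' (0x00004550 little-endian).
    std::uint32_t sig = 0;
    if (!ReadU32(buf, lfanew, sig) || sig != 0x00004550u) return PE_INVALID;

    // Optional header Magic follows the 4-byte signature and the
    // 20-byte IMAGE_FILE_HEADER.
    std::uint16_t magic = 0;
    if (!ReadU16(buf, static_cast<std::size_t>(lfanew) + 24, magic)) {
        return PE_INVALID;
    }
    if (magic == 0x10B) return PE_32BIT;
    if (magic == 0x20B) return PE_64BIT;
    return PE_INVALID;
}
```

A truncated or hostile file fails one of the bounds checks and yields `PE_INVALID`, which plays the same role here as the `__except` path returning FALSE.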

Key Detail: The entropy calculation scans all sections (not just .text) and selects the one with the highest Shannon entropy. This catches packed executables where the payload may be in arbitrarily named sections.
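
For reference, Shannon entropy over bytes is H = -Σ p_i · log2(p_i), where p_i is the fraction of bytes with value i, which gives a range of 0.0 to 8.0 bits per byte. The following sketch shows one way to compute it and to pick the maximum across sections. The function names and the representation of sections as byte vectors are assumptions for illustration; the real ExtractMaxEntropySection() works on the parsed section headers.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shannon entropy of a byte buffer, in bits per byte (0.0 to 8.0).
double ShannonEntropySketch(const std::uint8_t* data, std::size_t size) {
    if (size == 0) return 0.0;

    // Histogram of byte values.
    std::array<std::size_t, 256> counts{};
    for (std::size_t i = 0; i < size; ++i) ++counts[data[i]];

    double entropy = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;  // 0 * log2(0) is taken as 0
        const double p = static_cast<double>(c) / static_cast<double>(size);
        entropy -= p * std::log2(p);
    }
    return entropy;
}

// Highest entropy among all sections; 0.0 if there are no sections.
double MaxSectionEntropySketch(
        const std::vector<std::vector<std::uint8_t> >& sections) {
    double best = 0.0;
    for (const std::vector<std::uint8_t>& s : sections) {
        best = std::max(best, ShannonEntropySketch(s.data(), s.size()));
    }
    return best;
}
```

A section filled with a single repeated byte scores 0.0, while a section in which all 256 byte values occur equally often scores 8.0, which is why compressed or encrypted payloads land near the top of the range.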

Stage 4: Feature Extraction

The feature extractor currently derives two features from the ParsedFile:

| Feature | Source | Type |
|---|---|---|
| entropy | Highest-entropy section’s Shannon entropy | float (0.0–8.0) |
| importCount | Number of IMAGE_IMPORT_DESCRIPTOR entries | int |

Stage 5: Classification

flowchart TD
    FV["FeatureVector"]
    FV --> E{"entropy > 6.99?"}
    E -->|No| Clean["SCAN_CLEAN"]
    E -->|Yes| I{"importCount > 10?"}
    I -->|No| Clean
    I -->|Yes| Malicious["SCAN_MALICIOUS"]

    style Malicious fill:#e63946,color:#fff
    style Clean fill:#2d6a4f,color:#fff

See ML Classifier Module for the classification logic and rationale.
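
The two-threshold rule in the flowchart reduces to a single conjunction. In this sketch the enum ordering, the struct layout, and the name `ClassifySketch` are assumptions; only the thresholds (6.99 and 10) and the verdict names come from this document.

```cpp
// Verdict names as used in this document; the numeric values are assumed.
enum SCAN_RESULT { SCAN_CLEAN, SCAN_SUSPICIOUS, SCAN_MALICIOUS, SCAN_ERROR };

// Assumed layout of the two-feature vector.
struct FeatureVector {
    float entropy;      // highest section entropy, 0.0 to 8.0
    int   importCount;  // number of import descriptors
};

// Both conditions must hold for a MALICIOUS verdict; everything else,
// including an all-zero feature vector, is CLEAN.
SCAN_RESULT ClassifySketch(const FeatureVector* fv) {
    if (fv->entropy > 6.99f && fv->importCount > 10) {
        return SCAN_MALICIOUS;
    }
    return SCAN_CLEAN;
}
```

For example, a file with entropy 7.5 and 20 imports is flagged, while the same entropy with exactly 10 imports is not, because both comparisons are strict.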

Stage 6: Policy Enforcement

| Verdict | Action |
|---|---|
| SCAN_MALICIOUS | Print [MALICIOUS] <path> to console |
| SCAN_CLEAN | Print [CLEAN] <path> to console |
| SCAN_SUSPICIOUS | (Reserved; not currently used) |
| SCAN_ERROR | (Reserved; not currently used) |

See Policy Engine Module for extension points.


4. Queue Backpressure

flowchart TD
    Enq["Enqueue(req)"]
    Enq --> Lock["lock_guard(g_Mutex)"]
    Lock --> Full{"Queue size\n≥ 10,000?"}
    Full -->|Yes| Pop["Pop oldest entry\n(drop)"]
    Full -->|No| Push["Push new entry"]
    Pop --> Push
    Push --> Signal["SetEvent(g_QueueEvent)"]

    style Pop fill:#e63946,color:#fff
    style Signal fill:#2d6a4f,color:#fff

The queue is a bounded FIFO with a drop-oldest (head-drop) strategy:

  • Maximum capacity: 10,000 entries
  • If full, the oldest entry is dropped to make room for the new one
  • This prevents unbounded memory growth under heavy I/O
  • The worker dequeues with a 500 ms timeout, waking on either the event signal or the timeout
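
The behaviour described above can be sketched with standard C++ primitives. This is a portable approximation, not the scanner's implementation: `std::condition_variable` stands in for the Win32 auto-reset event (g_QueueEvent), `wait_for` stands in for the 500 ms timed wait, and `BoundedScanQueueSketch` and `ScanRequest` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>

struct ScanRequest {
    std::wstring filePath;
};

class BoundedScanQueueSketch {
public:
    explicit BoundedScanQueueSketch(std::size_t capacity = 10000)
        : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Pushes a request; if the queue is full, the oldest entry is dropped
    // first. Returns true when an entry was dropped.
    bool Enqueue(const ScanRequest& req) {
        bool dropped = false;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (queue_.size() >= capacity_) {
                queue_.pop();  // discard the oldest request
                dropped = true;
            }
            queue_.push(req);
        }
        ready_.notify_one();  // plays the role of SetEvent(g_QueueEvent)
        return dropped;
    }

    // Waits up to `timeout` for a request. Returns false on timeout.
    bool Dequeue(ScanRequest* out,
                 std::chrono::milliseconds timeout =
                     std::chrono::milliseconds(500)) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!ready_.wait_for(lock, timeout,
                             [this] { return !queue_.empty(); })) {
            return false;
        }
        *out = queue_.front();
        queue_.pop();
        return true;
    }

    std::size_t Size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<ScanRequest> queue_;
};
```

Dropping the oldest entry favours fresh requests: under a burst of file activity the worker scans the most recent paths and silently loses the stalest ones. Returning the `dropped` flag from `Enqueue` is a choice made for this sketch so that a caller could count losses.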

5. Error Handling Summary

| Stage | Error | Recovery |
|---|---|---|
| Path normalization | Unknown path format | Skip request, continue |
| File validation | File doesn’t exist | Skip request |
| PE parsing | Invalid DOS/NT header | SafeParsePE returns false → skip |
| PE parsing | Access violation (malformed PE) | SEH catches → SafeParsePE_SEH returns FALSE |
| PE parsing | File locked | CreateFileW opens with FILE_SHARE_READ, FILE_SHARE_WRITE, and FILE_SHARE_DELETE, which handles most cases |
| Feature extraction | No valid sections | Zero features → classified as CLEAN |

Next Steps