Scan Pipeline Flow
This document traces the complete path of a scan request from receipt by the scanner through PE parsing, feature extraction, ML classification, and policy enforcement.
Related: File Interception Flow · Scanner Module · PE Parser Module · ML Classifier · Policy Engine
1. Pipeline Overview
flowchart LR
subgraph Input
Pipe["Named Pipe\n(from Monitor)"]
CLI["Manual Input\n(Single Scan Mode)"]
end
subgraph Queue["Thread-Safe Queue"]
Q["std::queue\n(max 10,000)"]
Event["Win32 Event\n(auto-reset)"]
end
subgraph Worker["ScanWorker Thread"]
Deq["Dequeue"]
Norm["NormalizePath"]
Valid["Validate\n(file exists, not dir,\nnot self)"]
Parse["SafeParsePE_SEH"]
Feat["ExtractFeatures"]
Class["Classify"]
Policy["ApplyPolicy"]
end
Pipe --> Q
CLI --> Q
Q --> Deq --> Norm --> Valid --> Parse --> Feat --> Class --> Policy
style Queue fill:#e07a5f,color:#fff
style Worker fill:#2d6a4f,color:#fff
2. Detailed Sequence
sequenceDiagram
participant M as Monitor
participant Pipe as Named Pipe
participant Q as ScanQueue
participant W as ScanWorker Thread
participant PP as PEParser
participant FE as FeatureExtractor
participant ML as Classifier
participant POL as PolicyEngine
M->>Pipe: WriteFile(SCAN_REQUEST)
Pipe->>Q: Enqueue(req)
Q->>Q: SetEvent(g_QueueEvent)
W->>Q: Dequeue(&req)
W->>W: NormalizePath(req.filePath)
Note over W: Convert NT paths<br/>(\Device\HarddiskVolumeX\...)<br/>to Win32 paths (C:\...)
W->>W: GetFileAttributesW()
Note over W: Skip if:<br/>- INVALID_FILE_ATTRIBUTES<br/>- FILE_ATTRIBUTE_DIRECTORY<br/>- Scanner.exe or Monitor.exe
W->>PP: SafeParsePE_SEH(path, &parsed)
rect rgb(64, 64, 128)
Note over PP: __try / __except (SEH)
PP->>PP: LoadFile() → HeapAlloc + ReadFile
PP->>PP: Validate DOS header (MZ)
PP->>PP: Validate NT header (PE\0\0)
PP->>PP: Determine 32/64-bit
PP->>PP: ParseSections()
PP->>PP: ParseImports()
PP->>PP: ExtractMaxEntropySection()
PP-->>W: ParsedFile struct
end
W->>FE: ExtractFeatures(&parsed, &fv)
FE-->>W: FeatureVector { entropy, importCount }
W->>ML: Classify(&fv)
Note over ML: entropy > 6.99 AND<br/>importCount > 10<br/>→ MALICIOUS
ML-->>W: SCAN_RESULT verdict
W->>POL: ApplyPolicy(path, verdict)
Note over POL: Log verdict to console
3. Stage-by-Stage Breakdown
Stage 1: Path Normalization
The kernel driver sends NT-format paths (e.g., \Device\HarddiskVolume3\Windows\malware.exe). The scanner must convert these to Win32 paths for CreateFileW.
flowchart TD
Input["Raw Path from Kernel"]
Input --> C1{"Starts with\n\\\\?\\ or \\\\.\\<br/>or contains ':'?"}
C1 -->|Yes| Win32["Already Win32\n→ Use as-is"]
C1 -->|No| C2{"Starts with\n\\??\\?"}
C2 -->|Yes| Strip["Strip \\??\\ prefix\n→ Win32 path"]
C2 -->|No| C3{"Starts with\n\\Device\\?"}
C3 -->|Yes| Resolve["QueryDosDeviceW()\nfor each drive letter\n→ Map to Win32"]
C3 -->|No| Fail["❌ Cannot normalize"]
style Win32 fill:#2d6a4f,color:#fff
style Strip fill:#2d6a4f,color:#fff
style Resolve fill:#e07a5f,color:#fff
style Fail fill:#e63946,color:#fff
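The decision order in the flowchart can be sketched in portable C++. This is a minimal illustration, not the scanner's actual code: `NormalizePathSketch` and `DeviceMap` are hypothetical names, and the device table is passed in as a parameter where the real scanner would build it from `QueryDosDeviceW()`.

```cpp
#include <map>
#include <string>

// Hypothetical device table: NT device name -> drive letter, for example
// "\Device\HarddiskVolume3" -> "C:". Assumed to be filled elsewhere from
// QueryDosDeviceW(); injected here so the sketch stays portable.
using DeviceMap = std::map<std::wstring, std::wstring>;

static bool StartsWith(const std::wstring& s, const std::wstring& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Writes the Win32 path to *out and returns true, or returns false when
// the path format is not recognized.
bool NormalizePathSketch(const std::wstring& raw, const DeviceMap& devices,
                         std::wstring* out) {
    // 1. Already Win32: \\?\ or \\.\ prefix, or the path contains a colon.
    if (StartsWith(raw, L"\\\\?\\") || StartsWith(raw, L"\\\\.\\") ||
        raw.find(L':') != std::wstring::npos) {
        *out = raw;
        return true;
    }
    // 2. NT object-manager alias (\??\ prefix): strip it.
    const std::wstring ntAlias = L"\\??\\";
    if (StartsWith(raw, ntAlias)) {
        *out = raw.substr(ntAlias.size());
        return true;
    }
    // 3. \Device\ path: match against each known device name.
    if (StartsWith(raw, L"\\Device\\")) {
        for (const auto& entry : devices) {
            const std::wstring& device = entry.first;
            // Require a separator after the device name so that
            // HarddiskVolume3 does not match HarddiskVolume30.
            if (StartsWith(raw, device) && raw.size() > device.size() &&
                raw[device.size()] == L'\\') {
                *out = entry.second + raw.substr(device.size());
                return true;
            }
        }
    }
    return false;  // cannot normalize
}
```

Because the first check also accepts any path containing `:`, a path that carries both the `\??\` prefix and a drive letter is returned unchanged by step 1 in this sketch, exactly as the flowchart order implies.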
Stage 2: Pre-Scan Validation
Before investing in PE parsing, the worker applies quick rejection filters:
| Check | Method | Reject If |
|---|---|---|
| File exists | GetFileAttributesW() | Returns INVALID_FILE_ATTRIBUTES |
| Not a directory | GetFileAttributesW() | FILE_ATTRIBUTE_DIRECTORY is set |
| Not self | wcsstr() path check | Path contains \Scanner.exe or \FsMinifilterMonitor.exe |
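The three filters can be expressed as one predicate. The sketch below is an assumption-laden illustration: `ShouldScanSketch` is a hypothetical name, the attribute value is passed in rather than fetched with `GetFileAttributesW()`, and the two constants are redefined locally with the same values as their Win32 counterparts so the code compiles without `<windows.h>`.

```cpp
#include <cstdint>
#include <string>

// Same values as the Win32 constants of the same meaning.
constexpr std::uint32_t kInvalidFileAttributes = 0xFFFFFFFFu;  // INVALID_FILE_ATTRIBUTES
constexpr std::uint32_t kFileAttributeDirectory = 0x10u;       // FILE_ATTRIBUTE_DIRECTORY

// `attrs` is the value GetFileAttributesW(path) would have returned.
bool ShouldScanSketch(const std::wstring& path, std::uint32_t attrs) {
    // Test for "missing" first: the invalid marker has every bit set,
    // so it would also look like a directory.
    if (attrs == kInvalidFileAttributes) return false;
    if (attrs & kFileAttributeDirectory) return false;
    // Self-exclusion by substring match, case-sensitive like wcsstr().
    if (path.find(L"\\Scanner.exe") != std::wstring::npos) return false;
    if (path.find(L"\\FsMinifilterMonitor.exe") != std::wstring::npos) return false;
    return true;
}
```

The ordering of the first two checks matters, since `INVALID_FILE_ATTRIBUTES` includes the directory bit.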
Stage 3: PE Parsing (SEH-Protected)
flowchart TD
Start["SafeParsePE_SEH()"]
Start --> SEH["__try block"]
SEH --> Load["LoadFile()\nCreateFileW + ReadFile\n→ heap buffer"]
Load --> DOS["Validate DOS Header\ne_magic == 'MZ'"]
DOS --> NT["Validate NT Header\nSignature == 'PE\\0\\0'"]
NT --> Bits["Check OptionalHeader.Magic\n→ 32-bit or 64-bit"]
Bits --> Sections["ParseSections()\nEnumerate IMAGE_SECTION_HEADER[]"]
Sections --> Imports["ParseImports()\nWalk IMAGE_IMPORT_DESCRIPTOR[]"]
Imports --> Entropy["ExtractMaxEntropySection()\nFind section with highest byte entropy"]
Entropy --> Result["ParsedFile {\n is64Bit, sectionCount,\n importCount, textSize,\n textEntropy, textOpcodes[4096]\n}"]
SEH --> Except["__except handler"]
Except --> Crash["Return FALSE\n(malformed PE)"]
style SEH fill:#4361ee,color:#fff
style Result fill:#2d6a4f,color:#fff
style Crash fill:#e63946,color:#fff
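The header checks at the top of the flowchart can be illustrated on an in-memory buffer. This is a portable sketch that uses explicit bounds checks instead of the SEH guard described here; `CheckPeHeadersSketch`, `PeKind`, and `ReadAt` are hypothetical names, and the offsets are those of the PE format (e_lfanew at 0x3C, optional header 24 bytes past the signature).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class PeKind { Invalid, Pe32, Pe64 };

// Bounds-checked read of a fixed-size integer. Assumes a little-endian
// host, as on x86/x64 Windows.
template <typename T>
static bool ReadAt(const std::vector<std::uint8_t>& buf, std::size_t off, T* out) {
    if (off > buf.size() || buf.size() - off < sizeof(T)) return false;
    std::memcpy(out, buf.data() + off, sizeof(T));
    return true;
}

PeKind CheckPeHeadersSketch(const std::vector<std::uint8_t>& buf) {
    std::uint16_t mz = 0;
    if (!ReadAt(buf, 0, &mz) || mz != 0x5A4D) return PeKind::Invalid;  // "MZ"

    std::uint32_t lfanew = 0;  // e_lfanew: file offset of the NT headers
    if (!ReadAt(buf, 0x3C, &lfanew)) return PeKind::Invalid;

    std::uint32_t sig = 0;
    if (!ReadAt(buf, lfanew, &sig) || sig != 0x00004550) return PeKind::Invalid;  // "PE\0\0"

    // The optional header follows the 4-byte signature and the
    // 20-byte IMAGE_FILE_HEADER.
    std::uint16_t magic = 0;
    if (!ReadAt(buf, static_cast<std::size_t>(lfanew) + 24, &magic)) return PeKind::Invalid;
    if (magic == 0x10B) return PeKind::Pe32;  // PE32
    if (magic == 0x20B) return PeKind::Pe64;  // PE32+
    return PeKind::Invalid;
}
```

Explicit bounds checks reject truncated files before any out-of-range read happens, which complements an exception-based guard rather than replacing it.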
Key Detail: The entropy calculation scans all sections (not just .text) and selects the one with the highest Shannon entropy. This catches packed executables where the payload may be in arbitrarily named sections.
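Shannon entropy over bytes is H = -Σ p_i · log2(p_i), where p_i is the fraction of bytes with value i, so the result lies between 0.0 and 8.0 bits per byte. The following sketch shows that calculation and the "take the maximum across sections" step; the function names and the representation of sections as byte vectors are assumptions for illustration only.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shannon entropy of a byte buffer, in bits per byte (0.0 to 8.0).
double ShannonEntropySketch(const std::uint8_t* data, std::size_t size) {
    if (size == 0) return 0.0;
    std::array<std::size_t, 256> counts{};  // histogram of byte values
    for (std::size_t i = 0; i < size; ++i) ++counts[data[i]];

    double h = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;  // unused byte values contribute nothing
        const double p = static_cast<double>(c) / static_cast<double>(size);
        h -= p * std::log2(p);
    }
    return h;
}

// Highest entropy across all sections; 0.0 when there are no sections.
double MaxSectionEntropySketch(const std::vector<std::vector<std::uint8_t>>& sections) {
    double best = 0.0;
    for (const auto& s : sections) {
        const double h = ShannonEntropySketch(s.data(), s.size());
        if (h > best) best = h;
    }
    return best;
}
```

A buffer filled with a single byte value scores 0.0, while a buffer in which all 256 byte values occur equally often scores 8.0; compressed or encrypted payloads tend toward the upper end.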
Stage 4: Feature Extraction
The extractor currently derives two features from the ParsedFile:
| Feature | Source | Type |
|---|---|---|
| entropy | Highest-entropy section’s Shannon entropy | float (0.0–8.0) |
| importCount | Number of IMAGE_IMPORT_DESCRIPTOR entries | int |
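As a rough sketch, the extraction step amounts to copying two fields. The struct layouts below are assumptions: only the field names come from the diagrams in this document, the types are guessed, and it is assumed that `textEntropy` holds the value produced by ExtractMaxEntropySection().

```cpp
#include <cstdint>

// Hypothetical subset of ParsedFile; field types are assumed.
struct ParsedFileSketch {
    bool is64Bit = false;
    std::uint32_t sectionCount = 0;
    std::uint32_t importCount = 0;
    std::uint32_t textSize = 0;
    float textEntropy = 0.0f;  // assumed: entropy of the max-entropy section
};

struct FeatureVectorSketch {
    float entropy = 0.0f;
    int importCount = 0;
};

// Mirrors the ExtractFeatures(&parsed, &fv) call shape from the sequence diagram.
bool ExtractFeaturesSketch(const ParsedFileSketch* parsed, FeatureVectorSketch* fv) {
    if (parsed == nullptr || fv == nullptr) return false;
    fv->entropy = parsed->textEntropy;
    fv->importCount = static_cast<int>(parsed->importCount);
    return true;
}
```

A ParsedFile with no valid sections leaves both features at zero under these assumptions.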
Stage 5: Classification
flowchart TD
FV["FeatureVector"]
FV --> E{"entropy > 6.99?"}
E -->|No| Clean["SCAN_CLEAN"]
E -->|Yes| I{"importCount > 10?"}
I -->|No| Clean
I -->|Yes| Malicious["SCAN_MALICIOUS"]
style Malicious fill:#e63946,color:#fff
style Clean fill:#2d6a4f,color:#fff
See ML Classifier Module for the classification logic and rationale.
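The decision tree above reduces to a single conjunction of two strict comparisons. A minimal sketch follows; the thresholds are the ones shown in the flowchart, while the enum ordering, struct layout, and function name are assumptions.

```cpp
// Verdict names follow this document; the numeric values are assumed.
enum ScanResultSketch { SCAN_CLEAN, SCAN_SUSPICIOUS, SCAN_MALICIOUS, SCAN_ERROR };

struct FeatureVectorSketch {
    float entropy;
    int importCount;
};

constexpr float kEntropyThreshold = 6.99f;
constexpr int kImportThreshold = 10;

ScanResultSketch ClassifySketch(const FeatureVectorSketch& fv) {
    // Both comparisons are strict: a value equal to a threshold stays CLEAN.
    if (fv.entropy > kEntropyThreshold && fv.importCount > kImportThreshold) {
        return SCAN_MALICIOUS;
    }
    return SCAN_CLEAN;
}
```

A zeroed feature vector, such as one produced when no valid sections were found, falls through to SCAN_CLEAN.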
Stage 6: Policy Enforcement
| Verdict | Action |
|---|---|
| SCAN_MALICIOUS | Print [MALICIOUS] <path> to console |
| SCAN_CLEAN | Print [CLEAN] <path> to console |
| SCAN_SUSPICIOUS | (Reserved; not currently used) |
| SCAN_ERROR | (Reserved; not currently used) |
See Policy Engine Module for extension points.
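The verdict-to-action mapping can be sketched as a small formatting function plus a print step. The names below are hypothetical, and narrow strings are used for portability where the scanner presumably works with wide paths.

```cpp
#include <iostream>
#include <string>

enum class VerdictSketch { Clean, Suspicious, Malicious, Error };

// Builds the console line for a verdict; returns an empty string for the
// reserved verdicts, which have no action yet.
std::string FormatVerdictLineSketch(const std::string& path, VerdictSketch v) {
    switch (v) {
        case VerdictSketch::Malicious: return "[MALICIOUS] " + path;
        case VerdictSketch::Clean:     return "[CLEAN] " + path;
        case VerdictSketch::Suspicious:
        case VerdictSketch::Error:
            return "";
    }
    return "";
}

void ApplyPolicySketch(const std::string& path, VerdictSketch v) {
    const std::string line = FormatVerdictLineSketch(path, v);
    if (!line.empty()) std::cout << line << '\n';
}
```

Keeping the formatting separate from the output makes the mapping testable and leaves one place to extend when the reserved verdicts gain actions.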
4. Queue Backpressure
flowchart TD
Enq["Enqueue(req)"]
Enq --> Lock["lock_guard(g_Mutex)"]
Lock --> Full{"Queue size\n≥ 10,000?"}
Full -->|Yes| Pop["Pop oldest entry\n(drop)"]
Full -->|No| Push["Push new entry"]
Pop --> Push
Push --> Signal["SetEvent(g_QueueEvent)"]
style Pop fill:#e63946,color:#fff
style Signal fill:#2d6a4f,color:#fff
The queue is a bounded FIFO with a drop-oldest (head-drop) strategy:
- Maximum capacity: 10,000 entries
- If full, the oldest entry is dropped to make room
- This prevents unbounded memory growth under heavy I/O
- The worker dequeues with a 500ms timeout, waking on either the event signal or the timeout
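The behavior listed above can be sketched with standard C++ primitives. This is not the scanner's implementation: `std::condition_variable` stands in for the Win32 auto-reset event so the example is portable, and the class and struct names are hypothetical.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

struct ScanRequestSketch {
    std::wstring filePath;
};

class BoundedScanQueueSketch {
public:
    explicit BoundedScanQueueSketch(std::size_t capacity = 10000)
        : capacity_(capacity) {}

    // Returns true if the oldest entry was dropped to make room.
    bool Enqueue(ScanRequestSketch req) {
        bool dropped = false;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!queue_.empty() && queue_.size() >= capacity_) {
                queue_.pop();  // drop the oldest entry
                dropped = true;
            }
            queue_.push(std::move(req));
        }
        ready_.notify_one();  // stands in for SetEvent(g_QueueEvent)
        return dropped;
    }

    // Waits up to `timeout` for an entry; returns false on timeout.
    bool Dequeue(ScanRequestSketch* out,
                 std::chrono::milliseconds timeout = std::chrono::milliseconds(500)) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!ready_.wait_for(lock, timeout, [this] { return !queue_.empty(); })) {
            return false;
        }
        *out = std::move(queue_.front());
        queue_.pop();
        return true;
    }

    std::size_t Size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<ScanRequestSketch> queue_;
};
```

Dropping the oldest entry favors recent file activity over stale requests; the trade-off is that a file queued during a burst may never be scanned.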
5. Error Handling Summary
| Stage | Error | Recovery |
|---|---|---|
| Path normalization | Unknown path format | Skip request, continue |
| File validation | File doesn’t exist | Skip request |
| PE parsing | Invalid DOS/NT header | SafeParsePE returns false → skip |
| PE parsing | Access violation (malformed PE) | SEH catches → SafeParsePE_SEH returns FALSE |
| PE parsing | File locked | CreateFileW with FILE_SHARE_READ \| FILE_SHARE_WRITE \| FILE_SHARE_DELETE handles most cases |
| Feature extraction | No valid sections | Zero features → classified as CLEAN |
Next Steps
- Understand how the driver reaches this point: File Interception Flow
- Deep-dive into PE parsing: PE Parser Module
- Learn how to add new features: Adding Detection Rules