Design Decisions

This document records the key architectural decisions made during the design of the Windows File System Minifilter project, the alternatives considered, and the rationale for each choice.

Related: System Overview · Driver Architecture · ML Classifier Module


Decision Map

flowchart TB
    subgraph Decisions["Key Design Decisions"]
        D1["DD-1: Minifilter\nvs Legacy Filter"]
        D2["DD-2: Three-Process\nArchitecture"]
        D3["DD-3: Filter Port\nvs DeviceIoControl"]
        D4["DD-4: Named Pipe\nvs Shared Memory"]
        D5["DD-5: Entropy-Based\nClassification"]
        D6["DD-6: Extension-Only\nFiltering"]
        D7["DD-7: Altitude 47777"]
        D8["DD-8: Stream Context\nfor Delete Tracking"]
    end

    D1 --> D3
    D2 --> D4
    D3 --> D2
    D5 --> D6

    style Decisions fill:#1a1a2e,color:#fff

DD-1: Minifilter API vs Legacy File System Filter Driver

| Aspect | Minifilter (Chosen ✅) | Legacy Filter Driver |
| --- | --- | --- |
| API | FltRegisterFilter, high-level callbacks | IoAttachDeviceToDeviceStack, raw IRP handling |
| Complexity | Medium: Filter Manager handles stack ordering | High: must manually manage attachment and IRP forwarding |
| Stability | High: Microsoft-maintained infrastructure | Low: easy to crash the system with incorrect IRP handling |
| Future-proof | Recommended by Microsoft since Vista | Deprecated; no new features |
| Altitude system | Built-in ordering and isolation | Manual stack management |

Rationale: The minifilter framework abstracts away most of the complexity of file system filter development. Filter Manager handles device attachment, IRP routing, and teardown. This dramatically reduces the risk of blue-screen crashes and simplifies maintenance.


DD-2: Three-Process Architecture (Driver → Monitor → Scanner)

flowchart LR
    subgraph Chosen["Chosen: Three-Process ✅"]
        D1["Driver"] --> M1["Monitor"] --> S1["Scanner"]
    end
    
    subgraph Alt["Alternative: Two-Process"]
        D2["Driver"] --> MS["Monitor + Scanner\n(combined)"]
    end

    style Chosen fill:#2d6a4f,color:#fff
    style Alt fill:#6c757d,color:#fff

Rationale:

  • Separation of concerns: The monitor is a thin relay that must stay connected to the kernel port, so it has to be highly reliable. The scanner performs expensive PE parsing that may crash on malformed binaries. Running them as separate processes prevents a scanner crash from dropping the kernel connection.
  • Independent restart: The scanner can be restarted without losing the kernel connection.
  • SEH isolation: The scanner wraps PE parsing in __try/__except (SEH) to catch faults such as access violations. A crash that still escapes terminates only the scanner process and does not affect the monitor or driver.

Trade-off: Adds IPC overhead (named pipe) and operational complexity (two user-mode processes to manage).


DD-3: Filter Communication Port vs DeviceIoControl

| Aspect | Filter Comm Port (Chosen ✅) | DeviceIoControl |
| --- | --- | --- |
| Direction | Kernel → User push model | User → Kernel pull model |
| Latency | Low: driver pushes immediately | Higher: user must poll |
| Complexity | FltSendMessage / FilterGetMessage | DeviceIoControl + IOCTL codes |
| Security | Built-in security descriptor | Must implement manually |

Rationale: The filter communication port provides a natural push model: the driver sends messages as events occur. DeviceIoControl calls are initiated from user mode, so the monitor would have to poll, adding latency and CPU overhead, or the driver would have to hold requests pending until an event occurs. The filter port also integrates cleanly with the minifilter framework and supports connection/disconnection callbacks for lifecycle management.


DD-4: Named Pipe vs Shared Memory (Monitor ↔ Scanner)

| Aspect | Named Pipe (Chosen ✅) | Shared Memory + Events |
| --- | --- | --- |
| Complexity | Low: message-oriented, built-in serialization | High: manual synchronization, ring buffers |
| Message boundaries | Automatic (PIPE_TYPE_MESSAGE) | Manual framing required |
| Cross-process | Built-in | Requires CreateFileMapping + named events |
| Throughput | ~100K msgs/sec (sufficient) | Higher (millions/sec) |
| Debugging | Easy: PipeList, handle inspection | Difficult: memory dumps |

Rationale: Named pipes provide message-oriented IPC with built-in framing and are trivial to implement. The scan workload is I/O-bound (reading PE files from disk), so pipe throughput is not the bottleneck. Shared memory would be over-engineering for this use case.
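
To make the "Manual framing required" entry in the comparison concrete, the following is a minimal Python sketch of what a shared-memory or raw byte-stream transport would have to do itself. The 4-byte little-endian length prefix and the function names `frame` and `unframe` are assumptions chosen for illustration; they are not taken from the project.

```python
import struct
from typing import List, Tuple

# Assumed header format: 4-byte little-endian payload length.
HEADER = struct.Struct("<I")


def frame(payload: bytes) -> bytes:
    """Prefix a message with its length so its boundary survives a byte stream."""
    return HEADER.pack(len(payload)) + payload


def unframe(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split a buffer into complete messages plus any trailing partial data."""
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        if len(buffer) - start < length:
            break  # the body of this message has not fully arrived yet
        messages.append(buffer[start:start + length])
        offset = start + length
    return messages, buffer[offset:]
```

With a message-mode pipe, none of this bookkeeping is needed on the reader side, because each read returns one whole message.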


DD-5: Entropy-Based Heuristic Classification

flowchart LR
    subgraph Chosen["Chosen: Entropy Heuristic ✅"]
        E["Section Entropy\n> 6.99"]
        I["Import Count\n> 10"]
        E --> V{"Both true?"}
        I --> V
        V -->|Yes| MAL["MALICIOUS"]
        V -->|No| CLN["CLEAN"]
    end

    style MAL fill:#e63946,color:#fff
    style CLN fill:#2d6a4f,color:#fff

Rationale:

  • Packed/encrypted malware exhibits high entropy (above roughly 7.0 bits per byte, out of a maximum of 8.0) in code sections because compression and encryption produce near-random byte distributions.
  • Legitimate software typically has structured code sections with lower entropy (5.0–6.5 bits per byte) and uses many standard imports.
  • This heuristic is deliberately simple to serve as a baseline classifier that can be extended with additional features (opcode analysis, API call patterns) without restructuring the pipeline.

Trade-off: High false-positive rate on legitimately packed software (UPX, Themida) and high-entropy data sections. The threshold (6.99) was chosen as an aggressive starting point; see Adding Detection Rules for tuning guidance.
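
The rule in the diagram can be sketched in a few lines of Python. This is an illustrative model and not the scanner's actual code: the function names are hypothetical, and which section's entropy is fed into the rule is an assumption left to the caller. The two thresholds are the ones shown in the diagram.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 6.99  # from the decision diagram
IMPORT_THRESHOLD = 10     # from the decision diagram


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, 8.0 for uniform."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def classify(section_entropy: float, import_count: int) -> str:
    """Return MALICIOUS only when both conditions of the diagram hold."""
    if section_entropy > ENTROPY_THRESHOLD and import_count > IMPORT_THRESHOLD:
        return "MALICIOUS"
    return "CLEAN"
```

Keeping the thresholds as named constants makes the tuning discussed in the trade-off a one-line change.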


DD-6: Extension-Based Filtering (.exe / .dll Only)

Rationale:

  • Focusing on PE executables reduces noise by orders of magnitude, since most file I/O on Windows involves documents, logs, and temporary files.
  • .exe and .dll are the primary vectors for malware execution on Windows.
  • The check is performed at the kernel level (IsTargetExtension()) to avoid sending irrelevant messages across the kernel/user boundary, saving IPC overhead.

Trade-off: Misses malware using non-standard extensions (.scr, .cpl, .ocx). This is an intentional scope limitation for the current version. The function can be trivially extended โ€” see Adding Detection Rules.
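
The driver's IsTargetExtension() is not reproduced in this document, so the following is only a user-mode Python model of the kind of check described: a case-insensitive match on the final extension of the file name. The function name, the tuple of extensions, and the path handling are assumptions made for illustration.

```python
TARGET_EXTENSIONS = (".exe", ".dll")  # current scope per this decision


def is_target_extension(path: str, extensions=TARGET_EXTENSIONS) -> bool:
    """Case-insensitive match on the last extension of the file name."""
    # Look only at the final path component so a dot in a directory name
    # cannot produce a false match.
    name = path.replace("/", "\\").rsplit("\\", 1)[-1]
    dot = name.rfind(".")
    if dot < 0:
        return False  # no extension at all
    return name[dot:].lower() in extensions
```

In this model, widening the scope to cover the extensions named in the trade-off means passing a longer tuple, for example `(".exe", ".dll", ".scr")`.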


DD-7: Altitude 47777 (FSFilter Bottom)

| Altitude Range | Group | Our Position |
| --- | --- | --- |
| 320000–329999 | FSFilter Anti-Virus | - |
| 140000–149999 | FSFilter Encryption | - |
| 40000–49999 | FSFilter Bottom | 47777 ✅ |

Rationale: Altitude 47777 falls in the FSFilter Bottom load-order group (40000–49999), not in FSFilter Activity Monitor, which occupies 360000–389999. Since this driver never blocks, modifies, or redirects any I/O operation, a low altitude is appropriate. Filter Manager invokes pre-operation callbacks from the highest altitude to the lowest, so the driver sees each request after the filters above it have processed it, providing a more accurate view of what actually reaches the file system.
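
The relationship between an altitude, its load-order group, and callback ordering can be sketched as a small lookup. This Python model uses only the three ranges listed in the table above; the function names are hypothetical, and altitudes are treated as plain integers for simplicity.

```python
# Ranges copied from the table in this section: (low, high, group).
ALTITUDE_GROUPS = [
    (320000, 329999, "FSFilter Anti-Virus"),
    (140000, 149999, "FSFilter Encryption"),
    (40000, 49999, "FSFilter Bottom"),
]


def group_for_altitude(altitude: int):
    """Return the load-order group containing the altitude, or None."""
    for low, high, group in ALTITUDE_GROUPS:
        if low <= altitude <= high:
            return group
    return None


def pre_operation_order(altitudes):
    """Pre-operation callbacks run from the highest altitude to the lowest."""
    return sorted(altitudes, reverse=True)
```

For example, given filters at 325000, 145000, and 47777, `pre_operation_order` places 47777 last, which is why this driver observes requests after the anti-virus and encryption filters have seen them.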


DD-8: Stream Context for Delete Tracking

Problem: File deletion in Windows is not a single atomic operation. A delete can be requested in two ways, and it is confirmed only later:

  1. FILE_DELETE_ON_CLOSE flag on CreateFile
  2. FileDispositionInformation / FileDispositionInformationEx set via IRP_MJ_SET_INFORMATION
  3. Actual deletion confirmed only during IRP_MJ_CLEANUP

Solution: A FLT_STREAM_CONTEXT tracks the delete disposition state across these IRPs. The context is created on the first delete-related operation and persists until the file handle is cleaned up.

flowchart LR
    C["IRP_MJ_CREATE\n(DELETE_ON_CLOSE)"] --> CTX["Stream Context\nCreated"]
    S["IRP_MJ_SET_INFORMATION\n(SetDisposition)"] --> CTX
    CTX --> CL["IRP_MJ_CLEANUP"]
    CL --> Q["FltQueryInformationFile"]
    Q -->|STATUS_FILE_DELETED| N["Notify User Mode\n(once only)"]

    style CTX fill:#e07a5f,color:#fff
    style N fill:#2d6a4f,color:#fff

Rationale: This pattern is directly derived from the Microsoft Delete File Detection Sample. The InterlockedIncrement/InterlockedDecrement pattern on NumOps handles the race condition where multiple threads issue concurrent SetDisposition calls on the same file. The IsNotified flag ensures exactly-once notification semantics.
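
The counting and exactly-once behaviour described above can be modelled in user-mode Python. This is a simplified sketch loosely following the description in this section, not the Microsoft sample or the driver's code: a lock stands in for the interlocked operations, the field names mirror NumOps and IsNotified, and everything else (class name, method names, the choice to leave the counter raised after a race) is an assumption.

```python
import threading


class StreamContext:
    """Simplified model of the per-stream delete-tracking state."""

    def __init__(self, delete_on_close: bool = False):
        self._lock = threading.Lock()   # stands in for Interlocked* operations
        self.num_ops = 0                # in-flight SetDisposition operations
        self.set_disp = False           # last recorded delete disposition
        self.delete_on_close = delete_on_close
        self.is_notified = False

    def begin_set_disposition(self) -> bool:
        """Increment NumOps; True when this is the only in-flight operation."""
        with self._lock:
            self.num_ops += 1
            return self.num_ops == 1

    def end_set_disposition(self, delete_file: bool) -> None:
        """Record the outcome and decrement NumOps (non-raced operations only)."""
        with self._lock:
            self.set_disp = delete_file
            self.num_ops -= 1

    def is_delete_candidate(self) -> bool:
        """At cleanup: should the file be queried to confirm deletion?"""
        with self._lock:
            return self.set_disp or self.delete_on_close or self.num_ops > 0

    def try_notify(self) -> bool:
        """Return True for exactly one caller, however many race here."""
        with self._lock:
            if self.is_notified:
                return False
            self.is_notified = True
            return True
```

In this sketch, an operation that detects a race (a `False` result from `begin_set_disposition`) skips `end_set_disposition`, so `num_ops` stays positive and the cleanup path always performs the deletion check rather than trusting a possibly stale `set_disp`.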


Summary Table

| ID | Decision | Key Driver | Risk |
| --- | --- | --- | --- |
| DD-1 | Minifilter API | Stability, simplicity | Lock-in to Filter Manager |
| DD-2 | Three-process architecture | Fault isolation | Operational complexity |
| DD-3 | Filter communication port | Push model, low latency | Single connection limit |
| DD-4 | Named pipe for IPC | Simplicity, message framing | Throughput ceiling |
| DD-5 | Entropy-based classification | Catches packed malware | False positives on packed legitimate software |
| DD-6 | Extension filtering | Noise reduction | Misses non-standard PE extensions |
| DD-7 | Altitude 47777 | Passive monitoring | Sees post-filter I/O only |
| DD-8 | Stream context delete tracking | Correctness | Memory overhead per tracked file |