Skip to content

Knowledge for System Security Research

In this post, I am maintainling a list of what we should know for system security research based on my experience.

Programming Language and Tooling

  • Be familiar with C and Python. Don't use many advanced features of Python in one project, which otherwise introduces difficulty for code reviewers.

  • Learn basic C++ and basic bash. Enable linters, e.g., shellcheck, to remove stupid bugs and enable ChatGPT for functionality you do not remember.

  • Learn Java for object-oriented programming. I suggest that even developing in Python/C++, apply the best practice of OOP in Java that is simple and mostly efficient.

  • Learn how to develop Dockerfile to make your artifact deployable everywhere. Learn Docker's entrypoint/arguments/environment variables/volumns/capabilities and run your program in the Docker container all the time.

  • Learn how to develop Json/Yaml to make your configs universal.

  • To compile C/C++, learn how to use gcc/g++/clang/clang++, how to install and uninstall these compilers, and how to debug any warnings and errors. Learn update-alternatives to have multiple versions of the tooling.

  • Learn what is cross compilation and how to use relative compilers.

  • [Optional] Considering the memory safety issues of C/C++, learn memory safe programming language, e.g., Rust.

(Linux) utilities

  • ls/mv/pwd/cat/echo/mkdir/rm/touch
  • cd/pushd/popd
  • cp/rsync
  • vim
  • git
  • sudo/chmod/chown
  • ps/kill/pkill
  • find/grep
  • tree/htop/df/du/timeout/watch/locate/head/tail/diff/ping/history/man
  • tar/zip
  • ssh/scp/rsync
  • screen
  • apt-get/apt-cache
  • source/bash/hash/ldconfig/update-grub

+PRBLM: Necessarily know what problem is going to address
-PRBLM: Not necessarily know what problem is going to address
+IMPL: Necessarily know how to implement it
-IMPL: Not necessarily know how to implement it

Data Structure and Algorithm

Textbook: Introduction to Algorithms (4th)

  • Sorting algorithms (-PRBLM, -IMPL)

    • Insertion sort, Merge sort, Heapsort, Quick sort, Counting sort, Radix sort, Bucket sort
  • Dynamic sets (+PRBLM, -IMPL)

    • Operations: Search, Insert, Delete, Minimum, Maximum, Successor, Predecessor

    • Data structures: Array (Fast lookup), Stack, Matrices, Queue, Linked List (Fast addition/removal), Rooted tree, Hash table

    • Data structure: Binary search tree (Binary tree, Left <= Right, Fast lookup/addition/removal)

      • Red-black tree

        • Every node in a red-black tree is either red or black, the children of a red node are both black, and every simple path from a node to a descendant leaf contains the same number of black nodes. Red-black trees are one of many search-tree schemes that are ``balanced'' in order to guarantee that basic dynamic-set operations take O(lgn) time in the worst case.

        • Rbtress in Linux kernel: Red-black trees are a type of self-balancing binary search tree, used for storing sortable key/value data pairs. This differs from radix trees (which are used to efficiently store sparse arrays and thus use long integer indexes to insert/access/delete nodes) and hash tables (which are not kept sorted to be easily traversed in order, and must be tuned for a specific size and hash function where rbtrees scale gracefully storing arbitrary keys). Red-black trees are similar to AVL trees, but provide faster real-time bounded worst case performance for insertion and deletion (at most two rotations and three rotations, respectively, to balance the tree), with slightly slower (but still O(log n)) lookup time.

      • AVL tree

        • An AVL tree is a binary search tree that is height balanced: for each node x, the heights of the left and right subtrees of x differ by at most 1.
  • Graph algorithm (+PRBLM, -IMPL)

    • Search, Topological sort, Strong connected components, Minimum spanning tree, Shortest paths, Maximum flow, Matching in bipartitie graphs
  • Advanced data structures (-PRBLM, -IMPL)

    • Augmented red-black tree, B-trees, data structures for disjointed sets
  • Algorithm design and analysis

    • Simulation, Recursive algorithms, Enumeration, Iterative algorithm, Divide and conquer, Dynamic programming, Greedy algorithms, Amortized analysis, Data structure augment
  • Automata, regular expression, context-free grammar, temporal and spatial complexity, the halting problem over Turing machines

Architecture and Computation System

  • x86 shits

    • segments
      • segment selector: cs/ds/ss, es/fs/gs, indexing segment descriptor
      • segment descriptor/cache segment descriptor: base, limit
      • segment descriptor table
        • gdt: all processes can access this
        • ldt: a system can have one or multiple ldts; a process can exclusively use an LDT or use the ldt shared by multiple processes
        • To quickly access GDT and LDT, gdtr and ldtr are leveraged
      • To demystify
        • gdt: array, gdtr stores where the gdt is
        • ldt: array, ldtr stores where the ldt for the current process is
    • logical address must exist (since x86 cannot disable segments)
      • inside the program, a pointer stores the offset to a segment
      • iirc, ldt is not enabled, gdt is transparent
    • logical address -> linear address -> physical address
      • logical address -> segment selector -> gdt/ldt -> linear address
      • w/o paging, linear address = physical address
      • w/ paging, linear address (virtual address) -> page table -> physical address
    • four modes
      • Real mode: linear address = physical address, 1MB
      • Protect mode: the commonly used mode
      • Virtual 8086 mode, provide a real mode on the protect mode (for compatibility)
  • Be familiar with x86/x86, arm/aarch64, and riscv assembly.

  • Learn ISA extension for security (PAC).
  • Learn processor design and pipelines.
  • Learn cache and cache coherence.
  • Learn magnetic storage and solid-state drive (SSD).
  • Learn PCI/USB/Wi-Fi/BE/BLE/BaseBand controllers.
  • Learn hardware for virtualization (VT-x/VT-d/EPT/SMMU/IOMMU).
  • Learn hardware for security (SGX/TrustZone/TDX/SEV/Realm/HSM/TPM).

Operating System

  • Learn process management.
  • Learn memory management.
  • Learn file and filesystems
  • Learn access control.
  • Learn hardening techniques.

Virtualization

VMX = Virtual Machine Extension

Intel VT-x

  • Trap-and-emulation requires all sensitive instructions to be privilege instructions
  • Challenges
    • Not all sensitive instructions are privilege instructions
    • Cannot trigger an exception to be compatible with existing software
  • Add one more mode
    • VMX Root Operation: when the VMM is running, compatible to existing software
    • VMX Non-Root Operation: when the guest is running
      • All sensitive instructions are re-designed
    • VMM Operations are orthogonal to Ring 0 and 3
  • Specifically

    • vmxon -> VMX Root Operation -> vmlaunch/vmresume -> VM-Entry -> VMX Non-Root Operation -> Some sensitive instructions/MMIO/EPT Violation -> VM-Exit -> VMX Root Operation -> vmxoff

    • VM-Exit

      • sysenter won't introduce a VM-Exit, even it is a sensitive instruction
      • cpuid must introduce a VM-Exit
      • The behaviors of some sensitive instructions are controlled by VMCS
    • VMCS (Virtual-Machine Control Structure)

      • VMCS is stored in memory, different from the guest memory
      • VMCS and a physical CPU are 1-1 mapping
        • vmptrld addr_of_vmcs/vmcleaer
        • VMCS migration may be not implemented
      • VMCS includes

        • Guest-state area - registers
          • VM-Exit -> save, VM-Entry -> restore
          • Including LDTR
        • Host-state area
          • VM-Exit -> restore
          • Not including LDTR, CS:RIP is where the VM-Exit is handled
        • VM-Entry control field
          • VMM can inject event (exception/external interrupts/NMI) into the guest vCPU, say when a DMA operation is completed.
        • VM-Execution control field
          • External-interrupt/Exception bitmap -> VM-Exit?
          • HLT/INVLPG/WBINVD/RDPMC/RDTSC/... -> VM-Exit?
          • Unconditional I/O exiting/Use I/O bigmaps/Use MSR bitmaps -> VM-Exit?
        • VM-Exit control field (seems not interesting)
        • VM-Exit information field
          • Exit reason
          • Exit qualification
          • VM-Exit interrupt information/interrupt error code for external interrupts/exceptions, and NMI
          • Guest linear address/instruction length/instruction information for the sensitive instruction that triggers the VM-Exit
      • vmread/vmwrite to configure VMCS

        • VT-x requires that the paging must be enabled in Non-Root operation

EPT

  • GVA -> GPA -> HPA
  • Challenges
    • Shadow page table is very complicated and expensive
  • Add one more hardware: EPT
  • Specifically, given a GVA, 5 queries in total
    • Guest CR3 (GPA) -> EPT TLB -> EPT page table (EPT MMU) -> HPA or EPT Violation (1)
    • GVA + L4 GHA -> L3 GPA or page fault (no VM-Exit) -> HPA or EPT Violation (2)
    • GVA + L3 GHA -> L2 GPA or page fault (no VM-Exit) -> HPA or EPT Violation (3)
    • GVA + L2 GHA -> L1 GPA or page fault (no VM-Exit) -> HPA or EPT Violation (4)
    • GVA + L1 GHA -> GPA or page fault (no VM-Exit) -> HPA or EPT Violation (5)
  • Specifically
    • EPT is configured (enabling and address) in VMCS's VM-Execution control field
  • EPT Violation happens when
    • GPA's address is large
    • Guest is reading a not readable page
    • Guest is writing a not writable page
    • Guest is executing a not executable page

Nested Virtualization

Slide 1 Paper 1

Computer Networking

  • Learn TCP/UDP programming.

Database

  • Learn SQL

Compiler

  • LLVM IR and passes

Software Security

Distributed Systems

Probability

Graph Theory

Game Theory