C++PerformanceInterview

我在 Crypto/HFT 公司的 C++ 效能面試

2026-03-31

在 HFT 場景效能是最優先的，掌握最好的寫法非常重要。這些都是很深度的 domain knowledge，也讓我更發現 C++ 的深奧。題目經過改寫，分析是我事後整理的。

第一題：建構子參數怎麼接

有一個 struct，成員是 std::string，建構子需要把外部傳入的字串存下來。比較以下三種寫法的效能差異。

其中 std::string_view (C++17) 是一個輕量的唯讀字串參考，內部只存 pointer + length，本身不擁有資料也不做任何 allocation，建構成本接近零。

cpp

// 寫法 A：pass by value + move
struct Widget {
    std::string name;
    Widget(std::string n) : name{std::move(n)} {}
};

// 寫法 B：pass by string_view
struct Widget {
    std::string name;
    Widget(std::string_view n) : name{n} {}
};

// 寫法 C：pass by const reference
struct Widget {
    std::string name;
    Widget(const std::string& n) : name{n} {}
};

分析

三種寫法各有不同的 copy / move 成本，且會因 caller 傳的是 lvalue 還是 rvalue 而改變：

cpp

std::string s = "hello";

Widget w1(s);                // lvalue：s 是有名字的變數
Widget w2(std::move(s));     // rvalue：std::move 把 s 轉成 rvalue
Widget w3("hello");          // string literal：直接傳字串常量
Widget w4(ptr);              // const char*：傳 C-style 字串指標

// 以上四種都可以隱式轉成 string_view，
// 這也是為什麼寫法 B (string_view) 在下表中表現最好，
// 但 string_view 不擁有資料，caller 要確保原始字串的 lifetime 夠長。

// 不能隱式轉 string_view 的常見例子：
// std::vector<char>  → 要手動 string_view(v.data(), v.size())
// std::stringstream  → 要先 .str() 拿到 string
// 單個 char           → 要 string_view(&c, 1)

傳入	A: value + move	B: string_view	C: const ref
lvalue string	1 copy (參數 n) + 1 move (into name)	1 copy (view → name 建構)	1 copy (ref → name 建構)
rvalue string	1 move (參數 n) + 1 move (into name)	1 copy (view → name 建構)	1 copy (ref → name 建構)
string literal	1 construct (literal → 參數 n) + 1 move (into name)	1 construct (view → name 建構)	1 construct (literal → temp) + 1 copy (temp → name)
const char*	1 construct (char* → 參數 n) + 1 move (into name)	1 construct (view → name 建構)	1 construct (char* → temp) + 1 copy (temp → name)

Fastest

B (string_view) - 在所有情境下都只做一次 string construction（從 view 建構），沒有多餘的 move 或 copy。 string_view 本身只是 pointer + length，建構零成本。

A (value + move)- rvalue 情境下接近零成本（兩次 move 幾乎免費），但 lvalue 情境多了一次 copy。這是 Modern C++ 常見推薦的 “sink parameter” 模式。

Slower

C (const ref) - lvalue 時跟 B 一樣（1 copy），但 rvalue / literal 時無法利用 move，永遠是 copy。對 string literal 還多一次隱式 temp construction。

實務上如果這個 constructor 只有一個 string 參數，A 是最平衡的選擇 - 程式碼簡單且 rvalue 效能最佳。 B 是效能最好的選擇，但 caller 必須知道傳的是 view（不能假設 ownership transfer）。如果 caller 幾乎都傳 lvalue（例如從 config 讀值），B 和 C 效能相同。

第二題：Lambda 捕獲模式

以下四種 lambda 寫法，分析各自的效能與正確性：

cpp

void process(std::string_view msg) {
    // 寫法 0：static + capture by reference
    static const auto fn0 = [&]() {
        std::cout << msg << "\n";
    };
    fn0();
}

void process(std::string_view msg) {
    // 寫法 1：non-static + capture by reference
    const auto fn1 = [&]() {
        std::cout << msg << "\n";
    };
    fn1();
}

void process(std::string_view msg) {
    // 寫法 2：static constexpr + parameter
    static constexpr auto fn2 = [](std::string_view s) {
        std::cout << s << "\n";
    };
    fn2(msg);
}

void process(std::string_view msg) {
    // 寫法 3：non-static + parameter
    const auto fn3 = [](std::string_view s) {
        std::cout << s << "\n";
    };
    fn3(msg);
}

逐一拆解

Dangerous

寫法 0：static + [&] - Undefined Behavior!
static 意味著 lambda 物件只初始化一次。第一次呼叫時 [&] 捕獲的是第一次 msg 的 reference。之後每次呼叫，msg 的位址不同，但 lambda 仍然持有第一次的 reference - 這是一個 dangling reference。這是最嚴重的 bug。

寫法 1：non-static + [&] - 安全且高效。
每次呼叫都重新建構 lambda（捕獲當前的 msg），reference 永遠有效。 Lambda 物件在 stack 上，compiler 幾乎一定會 inline。

Fastest

寫法 2：static constexpr + parameter - 最快且最安全。
Stateless lambda（no capture），compiler 可以當成普通 function pointer。constexpr 保證編譯期初始化，static 只是一個 hint（stateless lambda 沒有 state 可以 init）。不依賴任何外部 reference，完全沒有 lifetime 問題。

寫法 3：non-static + parameter - 安全，效能與寫法 2 幾乎相同。
也是 stateless lambda，compiler 同樣會 inline。跟寫法 2 的差異在理論上可以忽略 - 少了 static 但 stateless lambda 本身就沒有初始化成本。

前提是 lambda 只需要用到傳入的參數，不依賴外部變數。寫法 2/3 是 stateless lambda（沒有 capture），不會多捕捉到其他變數， compiler 可以直接 inline，不需要額外建構 lambda 物件，也沒有任何 lifetime 風險。

Inline 的實際影響

cpp

// inline 前
fn2(msg);          // call → jump → 執行 → return

// inline 後：compiler 直接展開在 call site
std::cout << msg << "\n";  // 沒有 call/return，省掉 stack frame 開銷

展開後 compiler 還能跨 lambda 邊界做 constant folding、dead code elimination 等優化。在 HFT 場景，每一次多餘的 function call 都可能是幾十 ns 的差距。

第三題：Class 裡的 Lambda

把 lambda 放在 class method 裡，比較 [this] 捕獲 vs 直接傳參數：

cpp

class Handler {
    std::string data;
public:
    // 寫法 0：static + [this]
    void run0() {
        static const auto fn = [this]() {
            std::cout << data << "\n";
        };
        fn();
    }

    // 寫法 1：non-static + [this]
    void run1() {
        const auto fn = [this]() {
            std::cout << data << "\n";
        };
        fn();
    }

    // 寫法 2：static constexpr + string_view parameter
    void run2() {
        static constexpr auto fn = [](std::string_view s) {
            std::cout << s << "\n";
        };
        fn(data);  // string → string_view 隱式轉換
    }

    // 寫法 3：non-static + const string& parameter
    void run3() {
        const auto fn = [](const std::string& s) {
            std::cout << s << "\n";
        };
        fn(data);
    }
};

分析

Dangerous

寫法 0：static + [this] - 跟 Q2 一樣的問題，更危險。
static lambda 只初始化一次，this pointer 被永久固定。如果 Handler 物件被 move、destroy、或不同 instance 呼叫 run0()， lambda 裡的 this 指向的是第一個 instance 的位址 - dangling pointer，UB。
編譯器實際上會把 [this] 展開成類似 struct { Handler* thiz; void operator()() { ... thiz->data ... } }，而 static 讓這個 pointer 永遠不更新。

寫法 1：non-static + [this] - 安全。每次呼叫都重新捕獲當前的 this，compile 後基本等同 inline function call。唯一風險是如果 lambda 被存起來傳到外部（escape），this 可能 dangle - 但在 method 內直接呼叫沒問題。

Fastest

寫法 2：static constexpr + string_view - 最快且最安全。
Stateless lambda（no capture），不依賴任何 instance，compiler 直接 inline。
比寫法 0/1 快的原因：不需要建構 lambda 物件來存 captured pointer（寫法 1 每次呼叫都要把 this 寫進 lambda 物件），也不存在 this-> 間接存取（寫法 1 要先讀 this，再讀 this->data，多一次 indirection）。
string_view 傳遞只是 pointer + size（16 bytes on stack），零 copy，零 heap allocation。

寫法 3：non-static + const string& - 安全，效能接近寫法 2。也是 stateless lambda。const string& 和 string_view 在這個場景下效能幾乎一樣，因為都只傳 reference / pointer，不做 copy。

Pattern：在 class method 裡的 lambda，能不 capture 就不 capture。改用 parameter 傳入需要的資料，lambda 就是 stateless 的， compiler 優化空間最大，也不會有 lifetime 問題。

第四題：Pointer 參數 vs Value 參數

函式需要對兩個字串做複雜操作，比較 pointer 和 value 傳遞：

cpp

// 寫法 A：第二個參數用 pointer
void transform(std::string x, std::string* y) {
    // 對 x 和 *y 做複雜操作
}

// 寫法 B：兩個都 pass by value
void transform(std::string x, std::string y) {
    // 對 x 和 y 做複雜操作
}

差異分析

面向	A: string*	B: string value
Copy 成本	y 不 copy（只傳 8B pointer）	y 會被 copy（可能 heap alloc）
Null safety	y 可能是 nullptr - 需要檢查	不可能是 null
Aliasing	*y 可能 alias x（同一物件）	x 和 y 獨立，不會 alias
副作用	可以修改 caller 的原始字串	修改不影響 caller
Compiler 優化	pointer aliasing 限制優化	value 獨立，optimizer 更自由

Fastest

A (pointer) - 如果函式只需要讀取 y，pointer 省去了整個 copy。這在 HFT 場景很重要 - 一次 string copy 可能觸發 heap allocation，而 heap allocation 是 latency spike 的主要來源之一。

B (value) - 更安全、語意更清楚、compiler 可以更自由地優化。如果函式需要修改自己的副本，或 caller 傳的是 rvalue（move 進來零成本），B 更合理。

第五題：SSO 底層實作

“std::string 底層怎麼實作 Small String Optimization?”

核心：union - 同一塊記憶體，兩種用途

std::string 物件內部有一塊固定大小的 union。當字串長度小於閾值，字元直接存在物件內部（stack / static 上）；超過閾值才 malloc heap 空間：

cpp

// 簡化版 std::string 結構（以 GCC libstdc++ 為概念）
class basic_string {
    size_t _size;

    union {
        // 短字串模式：字元直接存在 16 bytes 裡
        char _short_buf[16];       // 15 chars + \0

        // 長字串模式：pointer + capacity
        struct {
            size_t _capacity;
            char*  _ptr;           // → heap
        } _long;
    };
    // sizeof ≈ 8 (size) + 16 (union) = 24~32（因 alignment）
};

text

string 物件固定 32 bytes（64-bit, GCC）
┌────────┬────────────────────────┐
│  size  │        union           │
│  (8B)  │        (16B)           │
│        ├─ 短字串模式 ───────────┤
│        │  char buf[16]          │  ← 字元直接存這裡
│        ├─ 長字串模式 ───────────┤
│        │  capacity(8B) + ptr(8B)│  ← ptr 指向 heap
└────────┴────────────────────────┘
         ↑ 同一塊記憶體，兩種解讀方式

SSO 閾值（常見 64-bit 實作）：
  GCC libstdc++: ~15 bytes
  Clang libc++:  ~22 bytes
  MSVC:          ~15 bytes

SSO 的代價

SSO 不是免費的 - sizeof(std::string) 通常是 32 bytes（比 sizeof(std::vector) 的 24 bytes 大），因為 union 要容納 inline buffer。這是用「更大的物件」換「更少的 heap allocation」。

第六題：Order Book 資料結構設計

最後一題是系統設計。需求是實作一個 Order Book（訂單簿），支援高精度價格和兩種更新模式。

需求

text

Order Book 結構：
price      quantity
130.000    1          ← 最高 bid
119.000    1
118.000    1
...
111.000    99
110.000    100        ← best bid
------------------------------
109.000    99         ← best ask
108.000    94
...
90.000     3          ← 最低 ask

限制條件：
  - price precision: 18 位小數
  - minimum tick size: 已知（如 0.5）
  - 兩種更新模式：
    Case 1: Snapshot - 一次收到完整 N 層報價，全部覆蓋
    Case 2: Delta    - 只收到有變動的 level
      bid: [ { price: 111, qty: 0 }, { price: 111.5, qty: 60 } ]
      ask: [ { price: 90, qty: 0 }, { price: 108, qty: 100 } ]
      （qty = 0 代表移除該 level）

Price Precision 18 是什麼

在加密貨幣交易所（如 Binance、Coinbase），價格和數量通常用 18 位小數表示，這是因為 Ethereum 的原生單位是 wei（1 ETH = 10¹⁸ wei）， ERC-20 token 預設也是 18 decimals。交易所為了精確表示鏈上資產的最小單位， API 回傳的價格就會帶到 18 位小數。

傳統金融（股票、期貨）通常只需要 2-8 位小數，但 crypto 因為鏈上精度的關係， 18 位是常態。這也直接影響了資料結構的選擇。

為什麼不能用 double 當 key

面試中 interviewer 直接給了 std::map<double, double> bid, ask 的暗示，這正好是要你挑戰的：

cpp

// Bad: floating point 比較不精確
std::map<double, double> bid;  // ✗

// 0.1 + 0.2 == 0.3 是 false - 這在 order book 裡是致命的
// 兩個 "相同" 價格可能被當成不同 key

Precision 18 意味著價格需要精確到 18 位小數。浮點數只有 ~15-16 位有效數字。正確做法是轉成整數：

cpp

// Good: 把價格轉成整數 tick
// 如果 min tick = 0.5, precision = 18
// price 111.5 → tick_index = 111.5 / 0.5 = 223
// 或直接用 fixed-point integer: price * 10^18

using Price = int64_t;  // fixed-point, 或 tick index
using Qty = int64_t;

// bid 要快速找最大 → 降序
std::map<Price, Qty, std::greater<Price>> bid;
// ask 要快速找最小 → 升序（預設）
std::map<Price, Qty> ask;

兩種更新模式的實作

cpp

// Case 1: Snapshot - 全量更新
void applySnapshot(
    std::map<Price, Qty, std::greater<Price>>& bid,
    const std::vector<std::pair<Price, Qty>>& levels
) {
    bid.clear();
    for (const auto& [px, qty] : levels) {
        if (qty > 0) bid[px] = qty;
    }
}

// Case 2: Delta - 增量更新
void applyDelta(
    std::map<Price, Qty, std::greater<Price>>& bid,
    const std::vector<std::pair<Price, Qty>>& deltas
) {
    for (const auto& [px, qty] : deltas) {
        if (qty == 0) {
            bid.erase(px);    // qty=0 代表移除
        } else {
            bid[px] = qty;    // 新增或更新
        }
    }
}

Best Bid/Ask 的頻繁更新優化

Order book 最前面幾層（best bid / best ask）是變動最頻繁的，每一筆 market order 或 cancel 都可能改動 top of book。常見的優化方式：

快取 best bid/ask pointer

不每次都從 map 的 begin() 取，而是額外維護一個指向 best level 的 pointer / iterator。更新時只在 best level 被刪除或有新的更好價格時才重新定位，省掉 tree traversal。

Top-of-book 用 array，深層用 map

前幾層（如 top 5）用固定大小的 sorted array，更新頻繁但數量少，cache friendly。深層 level 變動少，用 std::map 或 flat sorted vector 處理。

Price level pooling

頻繁的 insert/erase 會觸發大量 heap allocation。用 memory pool（pre-allocate 一批 node）避免每次 new/delete，減少 allocator contention 跟 latency spike。

其他進階考量

Sequence number / 防亂序

實際系統中 delta 可能亂序到達。需要 sequence number 來偵測 gap，遇到 gap 時 fallback 到 snapshot 重建。

Lock-free 設計

HFT 場景中 order book 更新在 hot path 上。常見做法是 single-writer（market data thread 獨佔更新）， reader 用 seqlock 或 RCU 做 lock-free read。

C++PerformanceInterviewHFTMove SemanticsLambdaOrder Book