To run quantum programs of practical size, computer architects have been making tremendous efforts to realize fault-tolerant quantum computing (FTQC), which constructs a fault-tolerant logical qubit from many error-prone physical qubits by correcting their errors at runtime. However, building a full-stack FTQC system is extremely challenging: the system consists of heterogeneous stacks, each stack poses its own fault-tolerance challenges, and the interplay among these challenges creates a very complicated overall design-choice problem. Computer architects must therefore fully understand the stack-specific optimizations and their system-level trade-offs, and resolve all of the challenges together. In this article, we first introduce the critical design challenges in architecting an FTQC system built on superconducting technology, and then present our research outcomes that address these challenges, with the goal of building an FTQC system that realizes thousands of logical qubits. We also outline near-future directions for resolving the remaining challenges and provide insights toward realizing more scalable future FTQC systems.