As a data hub in the Asia-Pacific region, Hong Kong's data centers host many businesses with very high data integrity requirements, from financial transactions and cross-border e-commerce to enterprise ERP and SaaS platforms. However, data center space in Hong Kong is scarce, and a single high-end enterprise-grade hard drive generally costs over 20% more than in mainland China, so every drive failure can escalate into a costly data crisis. In this environment, choosing a RAID level for a server is essentially a trade-off among failure probability, recovery capability, and storage cost. Faced with the three most common options (RAID 1, RAID 5, and RAID 10), many operations staff settle for "good enough" and overlook the performance collapse and secondary-failure risk during the recovery phase, which are precisely the variables that need to be quantified when protecting data on Hong Kong servers.
Let's start with the most basic mirroring solution, RAID 1. Its architecture is simple: every write goes to two independent physical disks at the same time, so the disks are exact copies of each other. This design has two direct consequences. On the positive side, RAID 1 offers very high data security: if one drive fails, the other still holds a complete copy, and the system keeps running from the surviving mirror without any parity computation, so business interruption is negligible. On the negative side, disk space utilization is only 50%: two 1TB drives in RAID 1 provide just 1TB of usable capacity, which doubles the storage cost per usable terabyte. Performance is asymmetric. Reads can be served from both disks in parallel, theoretically approaching twice the throughput of a single disk, whereas every write must land on both disks, so write speed is bounded by, and usually slightly below, that of a single disk, because the controller waits for both writes to complete before acknowledging the operation. In the context of Hong Kong servers, RAID 1 is a robust but costly safety solution, suitable for core transaction databases, financial system logs, and other scenarios with very high data integrity requirements but modest storage capacity needs.
RAID 5 adopts a different design philosophy: it spends parity calculations to save storage space. RAID 5 uses striping with distributed parity, spreading data blocks and parity blocks across all disks in the array in rotation. In a 4-disk RAID 5, for example, data blocks D1, D2, and D3 are written to Disk1, Disk2, and Disk3, while parity block P1 is stored on Disk4; in the next stripe the parity block moves to a different disk, and so on. The core advantage is space efficiency: usable capacity is (N-1) × the capacity of a single disk, which gives a 4-disk array 75% utilization, well above the 50% of RAID 1. The price is paid mainly in write performance. Each small write follows the sequence "read old data → read old parity → calculate new parity → write new data → write new parity," which amounts to four disk I/Os plus one parity calculation. This is known as the RAID 5 "write penalty," with a penalty value of 4: a single logical write generates 4 physical I/O operations on the back-end disks. Test data for 4K random writes shows the impact: RAID 5 delivers approximately 45% fewer IOPS than RAID 0, and latency increases by 60%. For read-heavy, write-light applications such as archive storage, static file servers, and video media libraries, RAID 5 is a cost-effective choice thanks to its strong sequential read performance (nearly N times that of a single disk) and high space utilization. For write-intensive loads, however, the write penalty becomes a persistent performance bottleneck.
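The read-modify-write sequence described above can be illustrated with a toy model. This is a minimal sketch, not a real RAID implementation: the `Raid5Stripe` class, its `physical_ios` counter, and the byte strings used as "blocks" are all hypothetical names introduced for illustration. It relies only on the fact that RAID 5 parity is the XOR of the data blocks in a stripe.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Raid5Stripe:
    """One stripe of a toy RAID 5 array: N-1 data blocks plus one parity block."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = xor_blocks(*self.data)
        self.physical_ios = 0  # back-end disk I/Os issued by small writes

    def small_write(self, index: int, new_data: bytes) -> None:
        """Update one data block using the read-modify-write sequence."""
        old_data = self.data[index]      # 1. read old data
        self.physical_ios += 1
        old_parity = self.parity         # 2. read old parity
        self.physical_ios += 1
        # Parity calculation happens in the controller/CPU, not on disk:
        # new_parity = old_parity XOR old_data XOR new_data
        new_parity = xor_blocks(old_parity, old_data, new_data)
        self.data[index] = new_data      # 3. write new data
        self.physical_ios += 1
        self.parity = new_parity         # 4. write new parity
        self.physical_ios += 1


if __name__ == "__main__":
    stripe = Raid5Stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
    stripe.small_write(1, b"\xaa\xbb")
    print("physical I/Os for one logical write:", stripe.physical_ios)
    print("parity still consistent:", stripe.parity == xor_blocks(*stripe.data))
```

One logical write produces four counted I/Os, and the incrementally updated parity matches a full recomputation over the stripe, which is why the controller never needs to read the untouched data blocks.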
RAID 10 combines the mirroring redundancy of RAID 1 with the striping parallelism of RAID 0. The standard construction first pairs disks into mirror pairs (RAID 1) and then stripes across those pairs (RAID 0) to form a single logical volume, so it needs at least four disks and an even total. RAID 10 inherits both the high-speed reads and writes of RAID 0 and the redundancy of RAID 1, and the advantage shows up clearly in performance tests. In one benchmark with four enterprise-grade SAS hard drives, RAID 10 achieved 32,000 4K random read IOPS and 9,800 random write IOPS, while RAID 5 on the same hardware achieved 18,000 and 4,500 respectively, a random write lead of about 118%. Under a typical database load (70% read/30% write mix), RAID 10's transaction throughput (TPS) was more than 2.3 times that of RAID 5. A real-world deployment supports this: in a Hong Kong data center, a RAID 10 array built on enterprise-grade NVMe SSDs raised database random write TPS from 4,800 with a single SSD to 12,300 and concurrent read/write TPS from 5,100 to 13,900, while P99 latency fell from 38ms to 9.6ms. The cost of RAID 10 is equally obvious: disk utilization is fixed at 50%, so 100TB of usable space requires 200TB of raw disk capacity. That is roughly 50% to 67% more raw capacity than a RAID 5 array with the same usable space, depending on the RAID 5 disk count (about 133TB raw with four disks, 120TB with six).
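The capacity arithmetic used so far (50% for mirrored layouts, (N-1)/N for RAID 5) can be checked with a short calculation. The sketch below is illustrative only; the function names and the string labels for RAID levels are assumptions made for this example, and hot spares and filesystem overhead are ignored.

```python
def usable_fraction(level: str, disks: int) -> float:
    """Fraction of raw capacity that is usable for a given RAID level."""
    if level == "raid1":
        if disks != 2:
            raise ValueError("this sketch models RAID 1 as a single two-disk mirror")
        return 0.5
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) / disks
    if level == "raid10":
        if disks < 4 or disks % 2 != 0:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return 0.5
    raise ValueError(f"unsupported level: {level}")


def raw_capacity_needed(level: str, disks: int, usable_tb: float) -> float:
    """Raw capacity (TB) required to provide `usable_tb` of usable space."""
    return usable_tb / usable_fraction(level, disks)


if __name__ == "__main__":
    target_tb = 100
    raid10_raw = raw_capacity_needed("raid10", 4, target_tb)
    for n in (4, 6, 8):
        raid5_raw = raw_capacity_needed("raid5", n, target_tb)
        premium = (raid10_raw / raid5_raw - 1) * 100
        print(f"RAID 5 x{n}: {raid5_raw:.1f} TB raw; RAID 10 premium {premium:.0f}%")
```

Running it for several RAID 5 disk counts shows that the RAID 10 hardware premium is not a single number: it grows as the RAID 5 array it is compared against gets wider.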
Bandwidth and line quality also indirectly affect how well a RAID choice works in practice. During a rebuild, the array generates a large amount of disk I/O. If backup traffic is consuming public network bandwidth at the same time, online services and backup jobs end up competing for network resources as well. High-quality lines such as CN2 GIA, available for Hong Kong servers, can reduce latency and packet loss in cross-border transmission and give backup data a more stable channel while the array rebuilds. RAID selection should therefore be planned together with bandwidth configuration: the longer the rebuild window, the longer backup traffic overlaps with service traffic, and the more urgent the need for bandwidth isolation.
The issue most easily overlooked in daily operations, yet often the most fatal, arises in the failure recovery phase. RAID 5 reads well during normal operation, but a single drive failure puts the array into degraded mode. In this mode, every read that touches the failed disk must be reconstructed in real time by XORing the corresponding data and parity blocks on the remaining disks. With a failed drive and no hot spare, RAID 5's I/O and CPU performance drops sharply: the array needs an immediate rebuild to restore redundancy, yet it has to carry out that rebuild while its performance is already badly degraded. Even more alarming is the risk of a secondary failure during the rebuild: in an array of 12TB high-capacity disks, a complete RAID 5 rebuild takes 12 to 15 hours, during which array performance drops by 40% to 60%. Throughout this trough of more than ten hours, the remaining disks are read end to end under heavy load; if any one of them fails during the rebuild, all data in the array is permanently lost. RAID 10 rebuilds differently: the failed disk's data is copied directly from its mirror partner to the replacement disk, with no parity calculation. The rebuild takes only about 1/5 to 1/4 as long as RAID 5 (2 to 3 hours for the same 12TB disk), and performance fluctuation stays within 15%. For nodes like Hong Kong servers that carry a large amount of real-time business, the difference in rebuild window is sometimes more decisive than the performance difference during normal operation.
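Degraded-mode reconstruction and the length of the rebuild window can both be sketched in a few lines. This is an illustration under stated assumptions, not a model of any particular controller: the function names are hypothetical, and the 250 MB/s sustained rebuild rate in the usage example is an assumed figure chosen only because it lands inside the 12 to 15 hour range quoted above.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_missing(surviving_blocks) -> bytes:
    """Recover the block of the failed disk in one RAID 5 stripe.

    `surviving_blocks` must contain every remaining block of the stripe,
    data and parity alike; their XOR equals the missing block.
    """
    return xor_blocks(*surviving_blocks)


def rebuild_hours(disk_capacity_tb: float, rebuild_rate_mb_s: float) -> float:
    """Rough rebuild window: the whole replacement disk written at a steady rate.

    Uses decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
    """
    seconds = (disk_capacity_tb * 1e12) / (rebuild_rate_mb_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    d1, d2, d3 = b"abcd", b"efgh", b"ijkl"
    parity = xor_blocks(d1, d2, d3)
    # The disk holding d2 has failed: rebuild its block from the survivors.
    print(reconstruct_missing([d1, d3, parity]) == d2)
    # Assumed sustained rebuild rate of 250 MB/s for a 12 TB disk.
    print(f"{rebuild_hours(12, 250):.1f} hours")
```

Every reconstructed block requires reading the matching block from all surviving disks, which is why a RAID 5 rebuild loads the whole array, whereas a mirror rebuild only reads from one partner disk.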
In the specific scenario of data backup, the practical boundaries of the three RAID schemes become clearer. If the server's primary load is high-volume business such as financial transactions, online payments, and order processing, all of which are extremely sensitive to write performance and response latency, RAID 10 is the natural fit: its write path involves no parity calculation and its random write IOPS are excellent, and combined with the RAID controller's Battery Backup Unit (BBU) cache protection and WriteBack mode it can bring fsync latency down to sub-millisecond levels. Deployment examples show that switching from SATA SSDs in RAID 1 to NVMe SSDs in RAID 10 reduced MySQL fsync latency from 3-5 milliseconds to below 300 microseconds and raised TPS from 5,500 to over 10,000. If the business is mainly read-heavy and write-light, such as static file distribution, log archiving, and video storage, RAID 5 strikes a good balance between cost and performance, with space utilization of 75% or higher and strong sequential reads. If the budget is very tight and the data volume is small but integrity is critical, RAID 1 is a low-barrier entry-level solution that provides complete mirror protection with only two drives.
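The workload guidance above can be roughed out with the commonly used write-penalty model, in which each logical write costs a fixed number of back-end I/Os. The sketch below assumes a penalty of 4 for RAID 5 (as described earlier) and 2 for mirrored layouts (one write per mirror copy). The raw IOPS figure is a hypothetical input, and controller cache, queue depth, and stripe alignment are ignored, so measured results such as those quoted in this article can differ substantially from what this first-order model predicts.

```python
# Assumed write penalties: back-end I/Os generated per logical write.
WRITE_PENALTY = {"raid1": 2, "raid5": 4, "raid10": 2}


def effective_iops(raw_iops: float, read_fraction: float, level: str) -> float:
    """Front-end IOPS an array can sustain for a given read/write mix.

    raw_iops:      combined back-end IOPS of all disks (hypothetical input)
    read_fraction: share of requests that are reads, between 0.0 and 1.0
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    write_fraction = 1.0 - read_fraction
    # Each read costs one back-end I/O; each write costs `penalty` I/Os.
    backend_ios_per_request = read_fraction + write_fraction * WRITE_PENALTY[level]
    return raw_iops / backend_ios_per_request


if __name__ == "__main__":
    raw = 10_000  # hypothetical combined back-end IOPS
    for mix in (0.9, 0.7, 0.3):
        r5 = effective_iops(raw, mix, "raid5")
        r10 = effective_iops(raw, mix, "raid10")
        print(f"{mix:.0%} reads: RAID 5 ~ {r5:.0f}, RAID 10 ~ {r10:.0f}")
```

The gap between the two layouts widens as the write share grows, which is the reasoning behind steering write-heavy transactional loads to RAID 10 and read-heavy archives to RAID 5.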
In Hong Kong server deployment practice, several common misconceptions deserve special attention. First, treating RAID as a backup: RAID provides automatic fault tolerance for drive-level hardware failures, but it cannot protect against accidental deletion, ransomware encryption, application-layer data corruption, or physical damage to the whole server. RAID provides availability, not backup; any production environment should follow the 3-2-1 backup principle (3 copies, 2 types of media, 1 copy off-site). Second, neglecting RAID controller cache protection: if a WriteBack policy is used without a BBU (Battery Backup Unit), a sudden power outage may permanently lose the data held in cache, leading to database or file system corruption. If the server does not support a BBU, it is better to use WriteThrough mode and give up some write performance in exchange for data consistency and safety. Third, failing to configure a hot spare disk when the budget allows: a hot spare lets the array start rebuilding automatically as soon as a disk failure is detected, minimizing the time the array spends in a degraded state, which is crucial for reducing the probability of a secondary failure.
Here are some of the most frequently asked questions about RAID selection:
Q1: What's the difference between RAID 1 and RAID 10? Both have mirroring, so are they similar?
RAID 1 is the most basic mirroring scheme, using only two hard drives. Its read performance is up to twice that of a single drive, but write performance is limited by the speed of a single drive because each write operation must be performed on both disks simultaneously. RAID 10 performs mirroring first and then striping, requiring at least four disks. The core difference is that RAID 10 achieves load balancing across multiple mirror groups through striping. Therefore, whether for sequential or random read/write operations, RAID 10's performance is significantly better than RAID 1, especially in high-concurrency scenarios. You can understand it this way: RAID 1 is a single mirror pair, while RAID 10 is a parallel array composed of multiple mirror pairs.
Q2: The usable capacity of RAID 5 is (N-1) × single disk capacity. Does this mean that more disks result in higher space utilization?
The formula is correct, but more disks also mean more risk. A 4-disk RAID 5 has 3 disks' worth of usable space, a utilization rate of 75%; an 8-disk RAID 5 has 7 disks' worth, raising utilization to 87.5%. However, the more disks there are, the higher the probability that one of them fails, and RAID 5 tolerates only a single disk failure. Once the array is degraded, all remaining disks must take part in a long, high-load rebuild. For large-capacity disks (8TB and above), the rebuild can take more than 24 hours, and if any remaining disk fails during that time, all data is lost. For this reason RAID 5 is not recommended for disks larger than 4TB. RAID 6 has lower usable capacity (N-2 disks), but it tolerates two simultaneous disk failures, which makes it the safer choice for large-capacity arrays.
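The trade-off in this answer can be made concrete with a small calculation. The sketch below is a simplified model under loudly stated assumptions: disk failures are treated as independent with a constant rate, the 2% annual failure rate is a hypothetical input rather than a measured value, and unrecoverable read errors and the extra stress of a rebuild are not modeled, so real-world risk is likely higher than these numbers suggest.

```python
HOURS_PER_YEAR = 24 * 365


def raid5_utilization(disks: int) -> float:
    """Usable fraction of a RAID 5 array: (N-1)/N."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) / disks


def second_failure_probability(disks: int, rebuild_hours: float,
                               annual_failure_rate: float) -> float:
    """Probability that at least one surviving disk fails during the rebuild.

    Assumes independent failures at a constant rate (a simplification).
    """
    per_disk = annual_failure_rate * rebuild_hours / HOURS_PER_YEAR
    survivors = disks - 1
    return 1 - (1 - per_disk) ** survivors


if __name__ == "__main__":
    afr = 0.02  # hypothetical 2% annual failure rate per disk
    for n in (4, 8, 12):
        util = raid5_utilization(n)
        risk = second_failure_probability(n, rebuild_hours=24, annual_failure_rate=afr)
        print(f"{n} disks: utilization {util:.1%}, second-failure risk {risk:.4%}")
```

Utilization climbs with each added disk while the chance of losing a second disk during the rebuild climbs with it, and a longer rebuild window scales that risk further.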
Q3: RAID 10 only has a disk utilization rate of 50%. Are there any space-saving and safer alternatives?
If the workload is read-heavy and write-light and the budget is limited, RAID 5 remains an acceptable option, but keep individual disks small (no more than 2TB) so that the rebuild window, and with it the risk, stays short. For write-intensive scenarios, RAID 10's 50% utilization is effectively a necessary premium for performance and safety; compared with the business loss from a single data-loss incident, the premium is usually worthwhile. Another option is RAID 50 (RAID 5 groups striped together with RAID 0), which trades off space utilization against performance but is more complex to configure and requires at least six disks.
Q4: I have SSDs, can I be less concerned about RAID write penalties?
SSDs do have significantly higher IOPS than HDDs, which offsets part of the performance lost to the RAID 5 write penalty. However, two problems remain. First, the CPU resources consumed by RAID 5's parity calculations are not reduced by SSDs. Second, and more importantly, SSDs do not change the structural weaknesses of RAID 5, namely the long rebuild window and the risk of a secondary failure. Large-capacity SSDs (such as NVMe disks of 4TB or more) in RAID 5 do rebuild faster than HDDs, but rebuild time still grows with disk capacity, and the risk cannot be ignored. In addition, the extra writes generated by RAID 5 add to the SSD's write amplification and shorten its lifespan. This is barely noticeable on enterprise-grade high-endurance SSDs but is particularly pronounced on consumer-grade SSDs.