A phrase like “4D detection” is engineered to sound like a breakthrough, so the useful move is to ask what the fourth dimension is. In NIO's 2021 grant it is velocity. The three spatial dimensions — where an object is in x, y, z — are joined by how fast it is moving, recovered in a single frame rather than by comparing positions across successive frames. That is a real and meaningful capability, and it is worth understanding precisely.
The record: on September 7, 2021, NIO USA, Inc. was granted US11113584B2, “Single frame 4D detection using deep fusion of camera image, imaging RADAR and LiDAR point cloud.” The CPC classes span all three modalities: G01S 13/865 (radar combined with lidar) and G01S 13/867 (radar combined with cameras), the camera-vision classes G06K 9/00791 and 9/46, and the autonomous-control class G05D 1/0088. Three sensors, deeply fused.
Here is why single-frame matters. The conventional way to get velocity is to track an object across several frames and measure how its position changed, which takes time and assumes you tracked the same object correctly. Imaging radar, by contrast, measures radial velocity (the component of motion along the line of sight to the sensor) directly via Doppler shift in a single observation. Fusing that with the camera's classification and lidar's precise geometry lets the system know, from one frame, both where something is and how fast it is moving. Less latency, fewer tracking errors.
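To make the contrast concrete, here is a minimal sketch of the two routes to velocity. It is not taken from the patent: the function names are hypothetical, and the 77 GHz carrier is an assumption chosen because it is a common automotive radar band. The Doppler relation itself is the standard monostatic one, v_r = f_d · c / (2 · f_c), and it yields only the radial component of velocity.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0

def velocity_from_tracking(p_prev, p_curr, dt_s):
    """Multi-frame route: displacement between two frames divided by elapsed time.

    Needs at least two frames and assumes both positions belong to the same object.
    """
    return tuple((c - p) / dt_s for p, c in zip(p_prev, p_curr))

def radial_velocity_from_doppler(doppler_shift_hz, carrier_hz):
    """Single-observation route: radial speed from the measured Doppler shift.

    Monostatic radar relation: v_r = f_d * c / (2 * f_c).
    Returns only the line-of-sight component, not the full velocity vector.
    """
    return doppler_shift_hz * SPEED_OF_LIGHT_M_S / (2.0 * carrier_hz)

# Illustrative numbers only (assumptions, not figures from the patent).
carrier_hz = 77e9                      # assumed automotive radar carrier
true_radial_speed = 10.0               # m/s along the line of sight
shift_hz = 2.0 * true_radial_speed * carrier_hz / SPEED_OF_LIGHT_M_S

# One radar observation recovers the radial speed.
v_doppler = radial_velocity_from_doppler(shift_hz, carrier_hz)

# Two frames 50 ms apart recover velocity from displacement.
v_tracked = velocity_from_tracking((10.0, 0.0, 0.0), (10.5, 0.0, 0.0), 0.05)

print(v_doppler, v_tracked)
```

The tracking route needs a second frame and a correct association between the two detections; the Doppler route needs neither, but on its own it says nothing about motion perpendicular to the line of sight.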
The spin check: “4D” is accurate here but easy to oversell. It does not mean the car perceives time or some exotic extra axis; it means velocity is part of the per-object output. That is genuinely useful (knowing a pedestrian's speed in the first frame you see them is better than guessing it over the next three), but it is an engineering refinement, not magic. The verb “detect” still applies; the system measures, it does not foresee.
Notice also the three-sensor choice. This is the maximalist stack — camera and radar and lidar — the precision-first architecture, opposite to the cost-sensitive camera-plus-radar approach. The sensor list in a fusion patent is always a bet, and NIO's 2021 bet was on redundancy and precision over cheapness. Different companies make this bet differently, and the patent record is where you can read which side each is on.
The caveat stands: a granted method for single-frame 4D fusion is not a deployed, validated self-driving system. Fusing three sensors reduces the chance all are wrong at once but multiplies the arbitration problem when they disagree. “4D detection” is a real capability with a precise meaning — three dimensions plus velocity, in one frame — and reading it correctly is how you tell substance from spin.
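The redundancy trade-off can be put in rough numbers. The sketch below is back-of-envelope arithmetic, not anything claimed in the patent: it assumes each sensor errs independently with the same probability, which real sensors often do not (heavy rain can degrade camera and lidar together), and the 1% error rate is purely illustrative. The function names are hypothetical.

```python
from math import comb, isclose

def chance_all_wrong(p_each, n_sensors):
    """Probability that every sensor is wrong at once, ASSUMING independent errors."""
    return p_each ** n_sensors

def pairwise_disagreements(n_sensors):
    """Number of distinct sensor pairs that can disagree and need arbitration."""
    return comb(n_sensors, 2)

p = 0.01  # illustrative per-sensor error rate (an assumption, not measured data)

for n in (2, 3):
    print(n, "sensors:",
          "all wrong with probability", chance_all_wrong(p, n),
          "| sensor pairs to arbitrate:", pairwise_disagreements(n))
```

Under these assumptions, going from two sensors to three lowers the all-wrong probability by a factor of 1/p while raising the number of sensor pairs that can conflict from one to three.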