Tesla AI Day
Deep Understanding Tesla FSD Part 2: Vector Space
From Theory to Reality: Analyzing the Evolution of Tesla Full Self-Driving
This is the second article in my series on Deep Understanding Tesla FSD.
- Deep Understanding Tesla FSD Part 1: HydraNet
- Deep Understanding Tesla FSD Part 2: Vector Space
- Deep Understanding Tesla FSD Part 3: Planning & Control, Auto Labeling, Simulation
- Deep Understanding Tesla FSD Part 4: Labeling, Simulation, etc
In the previous article, we discussed the architecture of Tesla’s neural network — HydraNet. At present, HydraNet can only process the input from a single camera.
As the Tesla AI team worked towards FSD, they quickly found that this is not enough. They need more cameras, and the predictions of the perception system must be converted into a three-dimensional space, which is also the foundation of the Planning & Control system. Tesla calls this 3D space "Vector Space". Information about the vehicle and the space it occupies, such as its position, speed, lanes, signs, traffic lights, and surrounding objects, is digitized and then visualized in this space.
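To make the idea concrete, here is a minimal sketch (not Tesla's actual code; all names and fields are illustrative assumptions) of what one snapshot of such a "Vector Space" might contain: perception outputs expressed in 3D ego-centric coordinates rather than 2D image coordinates.

```python
# Hypothetical sketch of a Vector Space snapshot: perception outputs
# digitized into a 3D, ego-centric coordinate frame.
from dataclasses import dataclass, field

@dataclass
class TrackedObject:
    position: tuple   # (x, y, z) in meters, ego vehicle frame
    velocity: tuple   # (vx, vy, vz) in m/s
    kind: str         # e.g. "car", "pedestrian", "sign", "traffic_light"

@dataclass
class VectorSpace:
    ego_speed: float                                   # m/s
    lane_lines: list = field(default_factory=list)     # polylines of (x, y) points
    objects: list = field(default_factory=list)        # TrackedObject instances

# Build one snapshot: a lead car 8.5 m ahead and a lane line to the left.
snapshot = VectorSpace(ego_speed=12.0)
snapshot.objects.append(TrackedObject((8.5, -1.2, 0.0), (11.0, 0.0, 0.0), "car"))
snapshot.lane_lines.append([(0.0, 1.8), (10.0, 1.8), (20.0, 1.9)])

print(len(snapshot.objects))  # 1
```

A structure like this is what the Planning & Control system would consume: everything is already in metric 3D coordinates, so no per-camera image reasoning is needed downstream.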
To bridge this gap, they developed a system named the Occupancy Tracker, written in C++. It stitched together curb detections from individual images, across camera boundaries and over time. But this design had two problems:
Problem 1: Cross-camera fusion and tracking are very difficult to write explicitly. Tuning the Occupancy Tracker and all of its hyperparameters was extremely complicated, and tuning a C++ program by hand is a nightmare for any programmer.
Problem 2: Image space is not the right output space. You should make predictions in the vector…