About This Architecture
The FlowMap Gradient Descent Pipeline combines optical flow estimation and monocular depth estimation in a unified neural architecture that pairs a frozen depth module with a learnable flow network. Images I0–I3 feed into a frozen MDE (Monocular Depth Estimator), which produces depths D0–D3, while a Flow NN generates optical flows F01, F12, and F23 that correlate with dense correspondence maps. Dense correlations C0–C3 yield per-pixel flow and depth gradient terms (S^i, f_x^i, f_y^i) that feed an iterative gradient descent optimization loop, with stop-gradient barriers preventing backpropagation through the frozen depth estimates. The architecture demonstrates selective gradient flow: only the learnable parameters are optimized while the pretrained depth knowledge is preserved, a critical pattern for multi-task vision pipelines. Fork and customize this diagram on Diagrams.so to adapt the optimization loop, add loss functions, or integrate alternative depth or flow backbones. The stop-gradient markers (❄) and purple gradient paths show which components participate in training, which helps practitioners debug convergence or memory issues in large-scale video understanding systems.
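The selective gradient flow described above can be illustrated with a deliberately tiny sketch. It is not the pipeline in the diagram. It assumes scalar stand-ins for the two networks (`w_depth` for the frozen MDE, `w_flow` for the Flow NN), a made-up squared-error loss, and a hypothetical `scale` constant linking depth to a flow target. All of these names are illustrative. Gradients are written out by hand so that the stop-gradient barrier is explicit. In an autodiff framework the same effect would normally come from an operator such as `jax.lax.stop_gradient` or `Tensor.detach()` in PyTorch.

```python
from dataclasses import dataclass


@dataclass
class Param:
    value: float
    trainable: bool  # False marks a frozen parameter (the snowflake in the diagram)


def depth(x, w_depth):
    # Stand-in for the frozen depth estimator: a single scalar weight.
    return w_depth.value * x


def flow(x, w_flow):
    # Stand-in for the learnable flow network: a single scalar weight.
    return w_flow.value * x


def loss_and_grads(xs, w_depth, w_flow, scale=0.5):
    """Mean squared error between predicted flow and a depth-derived target.

    The depth output is treated as a constant (stop-gradient), so the
    gradient reported for the depth weight is zero by construction.
    """
    n = len(xs)
    loss = 0.0
    grad_flow = 0.0
    for x in xs:
        target = scale * depth(x, w_depth)  # constant w.r.t. the optimizer
        residual = flow(x, w_flow) - target
        loss += residual * residual / n
        grad_flow += 2.0 * residual * x / n  # d(loss)/d(w_flow)
    return loss, {"flow": grad_flow, "depth": 0.0}


def train(xs, steps=200, lr=0.05):
    w_depth = Param(2.0, trainable=False)  # pretrained value, kept fixed
    w_flow = Param(0.0, trainable=True)
    history = []
    for _ in range(steps):
        loss, grads = loss_and_grads(xs, w_depth, w_flow)
        history.append(loss)
        # Plain gradient descent, applied only to trainable parameters.
        for name, param in (("depth", w_depth), ("flow", w_flow)):
            if param.trainable:
                param.value -= lr * grads[name]
    return w_depth, w_flow, history


if __name__ == "__main__":
    w_depth, w_flow, history = train([0.5, 1.0, 1.5, 2.0])
    print("depth weight (frozen):", w_depth.value)
    print("flow weight (learned):", w_flow.value)
    print("first and last loss:", history[0], history[-1])
```

In this toy setup the flow weight moves toward `scale * w_depth` while the depth weight never changes. That is the behaviour the stop-gradient markers indicate: the frozen branch shapes the loss but receives no updates.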