Structure-from-Motion (SfM) & Dense MVS
VRGS can rebuild a 3D scene from a set of overlapping photographs. The pipeline runs as a chain of stages — the first always runs, the rest are optional follow-ons you trigger when you want them:
- Sparse SfM — detects features in each photo, matches them across overlapping pairs, and solves for every camera's position/orientation and a sparse coloured point cloud (the matched tie points). This stage always runs.
- Dense MVS (optional) — takes the solved cameras and estimates a depth map per view, then fuses them into a dense coloured point cloud of the surface.
- Create Mesh (optional) — turns the dense cloud into a triangulated surface (Poisson or Greedy Projection), carrying the cloud's per-vertex colour.
- Texture Mesh from Photos (optional) — projects the solved photographs back onto the mesh and bakes a high-detail photographic texture.
Typical inputs are drone/UAV imagery or hand-held photos of an outcrop. Typical outputs are a georeferenced camera set (with image thumbnails in the 3D view), a sparse point cloud, and—if enabled—a dense point cloud, a surface mesh, and a photo-textured mesh you can interpret.
Create a new SfM workflow → Add Photos… → set parameters → Run Reconstruction → (optional) Create Mesh → (optional) Texture Mesh from Photos.
Running a reconstruction
All actions live on the SFM branch of the Data Tree.
| Step | Action | Notes |
|---|---|---|
| 1. Create a workflow | Right-click the SFM Group → New | Creates Model 1, Model 2, … |
| 2. Add images | Right-click the workflow → Add Photos… | Browse to your JPEGs |
| 3. Configure | Select the workflow; edit the Parameters in the Properties panel | See Parameter reference |
| 4. Reconstruct | Right-click the workflow → Run Reconstruction | Runs on a background thread |
| 5. Review images | Right-click the workflow → Open Photo Browser | Per-photo keypoint/track stats |
| 6. Mesh (optional) | Right-click the workflow → Create Mesh (Poisson)… or Create Mesh (Greedy Projection)… | Needs a dense cloud; see Building a mesh |
| 7. Texture (optional) | Right-click the workflow → Texture Mesh from Photos… | Needs a mesh; see Texturing the mesh |
The reconstruction runs in the background; progress and a final summary appear in the log (and the workflow's Status property). When it finishes, the results are added to the project automatically:
- <name> sparse: the sparse tie-point cloud (e.g. Model 1 sparse).
- <name> dense: the dense MVS cloud, only if Dense MVS was enabled.
- <name> mesh: the surface mesh, once you run Create Mesh (it later carries the baked texture too).
- Camera poses: each photo is placed in the 3D view with its image thumbnail.
The mesh and texturing stages also run on background threads; the mesh links back to its workflow, so Texture Mesh from Photos always finds the mesh that workflow created.
With Use GPS prior on and valid EXIF GPS, the model is aligned to a local metric East-North-Up frame (real-world scale, correct orientation). Without GPS the model is reconstructed up to an arbitrary scale and orientation fixed by the first camera pair.
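To make the East-North-Up frame concrete, here is a minimal sketch of the standard WGS84 geodetic → ECEF → local ENU conversion. It illustrates the coordinate frame only; how VRGS picks its reference origin and performs the alignment is not documented here, and the function names are hypothetical.

```python
import math

# WGS84 ellipsoid constants
WGS84_A = 6378137.0                    # semi-major axis (m)
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert latitude/longitude/ellipsoidal height to Earth-centred XYZ (m)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z


def geodetic_to_enu(lat_deg, lon_deg, alt_m, ref_lat_deg, ref_lon_deg, ref_alt_m):
    """Express a GPS fix as (east, north, up) metres relative to a reference fix."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, alt_m)
    x0, y0, z0 = geodetic_to_ecef(ref_lat_deg, ref_lon_deg, ref_alt_m)
    dx, dy, dz = x - x0, y - y0, z - z0
    lat0 = math.radians(ref_lat_deg)
    lon0 = math.radians(ref_lon_deg)
    # Rotate the ECEF offset into the tangent plane at the reference point.
    east = -math.sin(lon0) * dx + math.cos(lon0) * dy
    north = (-math.sin(lat0) * math.cos(lon0) * dx
             - math.sin(lat0) * math.sin(lon0) * dy
             + math.cos(lat0) * dz)
    up = (math.cos(lat0) * math.cos(lon0) * dx
          + math.cos(lat0) * math.sin(lon0) * dy
          + math.sin(lat0) * dz)
    return east, north, up
```

A camera 0.001° north of the reference at mid-latitudes lands roughly 111 m along the north axis, which is why a GPS-aligned model carries real-world scale.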
Parameter reference
Parameters are grouped below the way they appear in the Properties panel. Every value has a sensible default—you can run a first reconstruction without changing anything and tune afterwards.
Feature extraction & matching
These control how features are found in each image and matched across images. They have the biggest effect on how many photos register and how complete the sparse cloud is.
Detector
- What it is: the feature detector/descriptor algorithm.
- Options / default: SIFT (default), ORB, AKAZE.
- What it does: SIFT gives the most robust, repeatable matches on natural rock texture and is the best default for quality. ORB is a fast binary detector: much quicker and lighter on memory, but it produces fewer reliable matches. AKAZE sits in between.
- Example: keep SIFT for final outcrop reconstructions. Switch to ORB for a fast sanity check on a very large image set, or when you only need a rough camera layout quickly.
Max image dimension (px)
- Default: 3200. 0 disables downsampling (full resolution).
- What it does: images are downsampled so their longest side is at most this many pixels before features are extracted. Smaller is faster and uses less memory but finds fewer fine features; larger captures more detail but is slower.
- Example: 1600 for quick tests; 3200 for normal work; 0 (or a large value like 6000) to squeeze maximum detail from a small, high-resolution set.
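The longest-side rule can be sketched as below. The function name is hypothetical, and the assumption that images already smaller than the limit are left alone (never upsampled) is mine rather than something the text states.

```python
def downsampled_size(width, height, max_dim):
    """Return the (width, height) used for feature extraction.

    max_dim <= 0 means "no downsampling", mirroring the 0 setting above.
    Assumption: images already within the limit are not upsampled.
    """
    if max_dim <= 0:
        return width, height
    longest = max(width, height)
    if longest <= max_dim:
        return width, height
    scale = max_dim / longest
    # Keep the aspect ratio; never let a side collapse to zero pixels.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 6000 × 4000 photo the default of 3200 gives 3200 × 2133, roughly 28% of the original pixel count, which is where most of the speed-up comes from.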
Max features per image
- Default: 8000.
- What it does: the upper bound on keypoints kept per image. More features give more matches and a denser sparse cloud at the cost of speed and memory.
- Example: raise to 12000–20000 on low-texture scenes (smooth, uniform rock) where pairs are hard to verify; drop to 4000 for speed.
Lowe ratio test
- Default: 0.8 (typical range 0.7–0.8).
- What it does: a match is accepted only if the best descriptor distance is smaller than this ratio times the second-best distance, i.e. the best candidate is clearly better than the runner-up. Lower is stricter (fewer but cleaner matches); higher keeps more matches but admits more noise.
- Example: lower to 0.75 on repetitive texture (e.g. bedded sandstone, brickwork) to suppress ambiguous matches.
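A minimal brute-force sketch of the ratio test, using plain Euclidean distance on small descriptor vectors. This shows the acceptance rule only; the matcher VRGS actually uses is not described here, and the function name is hypothetical.

```python
import math


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Return (index_in_a, index_in_b) pairs that pass the Lowe ratio test."""
    matches = []
    for i, a in enumerate(desc_a):
        # Distance from descriptor a to every candidate in the other image.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the test needs a best and a second-best candidate
        (best, j_best), (second, _) = dists[0], dists[1]
        # Accept only when the best match clearly beats the runner-up.
        if best < ratio * second:
            matches.append((i, j_best))
    return matches
```

A descriptor with two equally good candidates (the repetitive-texture case) fails the test at any ratio below 1, which is why lowering the ratio suppresses ambiguous matches.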
Min geometric inliers
- Default: 20.
- What it does: the minimum number of geometrically verified inlier matches for an image pair to be trusted. It is also the floor used when choosing the initial pair and registering later views. Higher = stricter (cleaner pose graph, but a thin-overlap dataset may fragment); lower admits weaker pairs.
- Example: raise to 30–40 on clean, high-overlap datasets for robustness; only lower below 20 (cautiously) when overlap is genuinely sparse.
Cameras & calibration
Share intrinsics per camera
- Default: on.
- What it does: groups photos by camera make/model/focal length so they share one set of intrinsics (focal length, principal point, distortion). Fewer unknowns means a more stable, faster bundle adjustment.
- Turn off when: every image may have different intrinsics—mixed cameras, or a zoom lens used at varying focal lengths. Per-image intrinsics need more overlap to solve reliably.
- Example: leave on for a single drone camera at a fixed focal length; consider off for an ad-hoc mix of phone and drone photos.
GPS & georeferencing
These only matter when your photos carry EXIF GPS (most drone imagery does).
Use GPS prior
- Default: on.
- What it does: uses EXIF GPS to (1) optionally pre-filter image pairs that are too far apart to overlap, and (2) after a purely visual reconstruction, align the whole model to a metric East-North-Up frame and refine it with GPS as a soft constraint. The result has real-world scale and georeferencing.
- Turn off when: photos have no/poor GPS, or you deliberately want a scale-free visual reconstruction.
- Example: on for drone surveys; off for ground-based photos with no GPS.
GPS sigma XY (m)
- Default: 5.0.
- What it does: the assumed horizontal uncertainty of the GPS positions, i.e. how much to trust GPS versus the visual geometry. Smaller pulls cameras harder onto their GPS coordinates; larger lets the photo geometry dominate.
- Example: 1–2 m for RTK/PPK drones; 5–10 m for consumer/phone EXIF.
GPS sigma Z (m)
- Default: 10.0.
- What it does: the assumed vertical uncertainty. GPS altitude is usually much less accurate than horizontal position, so this is normally larger than GPS sigma XY.
- Example: 10–20 m for consumer EXIF altitude; smaller for survey-grade vertical control.
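One common way to express a soft GPS constraint is to divide each position offset by its sigma, so a 5 m horizontal offset and a 10 m vertical offset count equally. The sketch below assumes that formulation; the exact cost VRGS minimises is not documented here, and the function names are hypothetical.

```python
def gps_residual(camera_enu, gps_enu, sigma_xy=5.0, sigma_z=10.0):
    """Offset between a solved camera and its GPS fix, in units of sigma."""
    de = (camera_enu[0] - gps_enu[0]) / sigma_xy
    dn = (camera_enu[1] - gps_enu[1]) / sigma_xy
    du = (camera_enu[2] - gps_enu[2]) / sigma_z
    return de, dn, du


def gps_cost(camera_enu, gps_enu, sigma_xy=5.0, sigma_z=10.0):
    """Sum of squared sigma-normalised offsets (assumed least-squares form)."""
    return sum(r * r for r in gps_residual(camera_enu, gps_enu, sigma_xy, sigma_z))
```

Halving a sigma quadruples the penalty for the same offset, which is the sense in which a smaller value pulls cameras harder onto their GPS coordinates.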
GPS pair max distance (m)
- Default: 0 (disabled).
- What it does: when greater than 0 and both photos in a pair have GPS, matching is skipped if the cameras are farther apart than this. On large flights most pairs cannot overlap, so this is a big speed-up.
- Example: set to roughly 2–3× your photo spacing/footprint (e.g. 30–50 m) on a big survey; leave 0 for small sets where every pair is worth trying.
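The pre-filter rule can be sketched as follows, assuming camera positions are already expressed in metres (for example in a local ENU frame) and that photos without GPS are stored as None. The data layout and function name are illustrative assumptions.

```python
import math
from itertools import combinations


def candidate_pairs(positions, max_distance_m=0.0):
    """List the image pairs that would still be sent to matching.

    positions maps photo name -> (east, north, up) in metres, or None
    when the photo has no GPS. max_distance_m <= 0 disables the filter.
    """
    pairs = []
    for a, b in combinations(sorted(positions), 2):
        pa, pb = positions[a], positions[b]
        both_tagged = pa is not None and pb is not None
        if max_distance_m > 0 and both_tagged and math.dist(pa, pb) > max_distance_m:
            continue  # too far apart to overlap: skip matching
        pairs.append((a, b))
    return pairs
```

Pairs where either photo lacks GPS are always kept, matching the rule that the filter applies only when both photos in a pair are tagged.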
Bundle adjustment
BA max iterations (global)
- Default: 100.
- What it does: the maximum optimiser iterations per global bundle-adjustment pass. The solver usually converges well before the cap; more iterations can refine a difficult solve at the cost of time.
- Example: leave at 100; drop to 30–50 for fast previews.
Dense MVS
Dense MVS runs after sparse SfM and produces the dense surface cloud. It is off by default.
Dense MVS (learned)
- Default: off.
- What it does: the master on/off switch for the dense stage. When on, a dense coloured point cloud (<name> dense) is produced in addition to the sparse one.
- Example: turn on whenever you want a dense surface to mesh or interpret.
Dense MVS backend
- Options / default: PLANE_SWEEP (default), CASMVSNET.
- What it does: chooses the depth estimator. PLANE_SWEEP is the built-in CPU plane-sweep (normalised cross-correlation); it needs no model file and runs anywhere. CASMVSNET is a learned neural-network estimator that requires a casmvsnet.onnx model in the project MODELS folder and a CUDA-capable GPU.
- Example: use PLANE_SWEEP for almost everything. Choose CASMVSNET only if you have the model file and a GPU and want to compare learned depth.
Dense MVS neighbor views
- Default: 5.
- What it does: how many neighbouring source views are used to estimate depth for each reference view. More views give more robust depth (slower); fewer are faster but noisier.
- Example: 5 is a good default; try 7 for wide-baseline captures, 3 for speed. (The CASMVSNET backend uses the view count baked into its model.)
Dense MVS confidence cutoff
- Default: 0.75 (used by the PLANE_SWEEP backend).
- What it does: per-pixel confidence threshold on the plane-sweep result, mapped as (NCC + 1) / 2, so 0.75 corresponds to NCC >= 0.5 (a well-matched, textured pixel). Pixels below the cutoff are discarded before fusion. Raise for sparser/cleaner output; lower for denser/noisier.
- Example: 0.75 default; 0.6 (NCC >= 0.2) to densify a sparse cloud once depth is trustworthy; 0.85 for a cleaner cloud.
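The (NCC + 1) / 2 mapping and its inverse are easy to check by hand. The helper names below are hypothetical and exist only to make the arithmetic explicit.

```python
def ncc_to_confidence(ncc):
    """Map a normalised cross-correlation score in [-1, 1] to [0, 1]."""
    return (ncc + 1.0) / 2.0


def confidence_to_ncc(confidence):
    """Inverse mapping: which NCC score a given cutoff corresponds to."""
    return 2.0 * confidence - 1.0


def passes_cutoff(ncc, cutoff=0.75):
    """True when a pixel with this NCC score survives the confidence cutoff."""
    return ncc_to_confidence(ncc) >= cutoff
```

So the cutoffs 0.6, 0.75 and 0.85 correspond to NCC thresholds of 0.2, 0.5 and 0.7 respectively.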
Dense MVS CasMVSNet confidence cutoff
- Default: 0.3 (used only when backend is CASMVSNET).
- What it does: the equivalent cutoff for the learned backend. Its confidence is a network probability on a different scale (it peaks much lower than the plane-sweep value), so this default is far below 0.75. Applying the plane-sweep cutoff to it would reject almost every point.
- Example: 0.3 default; raise to 0.4–0.5 for a cleaner learned cloud.
Dense MVS consistent views
- Default: 2.
- What it does: a fused point is kept only if at least this many other reference views agree on its depth (so 2 means a 3-view consensus including the source view). Higher = cleaner/sparser; lower = denser/noisier.
- Example: 2 default; 1 for maximum density (noisier); 3 for a clean cloud on high-overlap data.
Dense MVS depth tolerance
- Default: 0.025 (= 2.5%).
- What it does: the relative depth-agreement tolerance used by the cross-view consistency check above. Loosen it to keep more points; tighten it for a cleaner cloud.
- Example: raise to 0.04 if the cloud is too sparse once per-view depth is good; drop to 0.015 for a tighter, cleaner surface.
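How Dense MVS consistent views and Dense MVS depth tolerance interact can be sketched as a simple vote. The sketch assumes the depths predicted by the other views have already been reprojected into the source view, and that the relative difference is normalised by the source depth; both are illustrative assumptions, not a description of the VRGS implementation.

```python
def is_consistent(source_depth, other_depths, tolerance=0.025, min_views=2):
    """Keep a point only if enough other views agree on its depth.

    other_depths holds the depth each other view predicts for the same
    point (0 or negative meaning "no estimate").
    """
    agreeing = sum(
        1
        for d in other_depths
        if d > 0 and abs(d - source_depth) / source_depth <= tolerance
    )
    return agreeing >= min_views
```

At a depth of 10 m the default tolerance of 0.025 accepts estimates within 25 cm; raising it to 0.04 widens that window to 40 cm.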
Dense MVS sky filter
- Default: off.
- What it does: removes the fringe of sky-coloured points that can appear along outcrop silhouettes (where the estimator is forced to assign a depth to sky-adjacent pixels). A point is dropped only if it is both sky-coloured (saturated blue, or bright near-white) and weakly matched—so confidently reconstructed blue/white rock or water is kept.
- Example: turn on for outcrop or landscape scenes that show a blue/white halo around the model. When on, the fusion log reports a sky-cut count so you can see how many points it removed.
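The two-condition rule (sky-coloured and weakly matched) can be sketched as below. Every threshold here is a hypothetical placeholder chosen for illustration; the actual colour and confidence limits used by VRGS are not given in this document.

```python
import colorsys


def is_sky_coloured(r, g, b):
    """Rough colour test on 8-bit RGB: saturated blue, or bright near-white.

    All numeric limits are illustrative assumptions.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    saturated_blue = 0.5 <= h <= 0.72 and s >= 0.3 and v >= 0.4
    near_white = s <= 0.1 and v >= 0.9
    return saturated_blue or near_white


def drop_as_sky(rgb, confidence, weak_confidence=0.85):
    """Drop a point only when it is sky-coloured AND weakly matched."""
    return is_sky_coloured(*rgb) and confidence < weak_confidence
```

Requiring both conditions is what protects confidently reconstructed blue or white surfaces: a strongly matched pale limestone face passes the colour test but is never dropped.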
Diagnostics & output
Write undistorted previews
- Default: on.
- What it does: saves an undistorted JPEG per registered view to <workspace>/undistorted/. Useful for quality control and for downstream tools that expect undistorted images.
- Example: turn off to save disk space and a little time once you no longer need the previews.
Verbose logging
- Default: off.
- What it does: emits extra per-photo, per-pair, and per-cache diagnostic messages to the log.
- Example: turn on while tuning parameters or troubleshooting a run; turn off for clean logs.
Building a mesh
Once you have a dense cloud (<name> dense), turn it into a triangulated
surface. Both methods run on a background thread, copy the cloud's per-vertex
colour onto the mesh, and add <name> mesh to the project. Right-click the
workflow and pick one:
Create Mesh works on the dense MVS cloud, so enable Dense MVS (learned) and run a reconstruction first. If the dense cloud is missing or still loading, the action tells you so.
Create Mesh (Poisson)
Poisson reconstruction fits a single watertight surface through the points — it fills small gaps and gives clean, closed geometry. Best default for outcrops.
Octree depth
- Default: 8 (8–10 typical).
- What it does: the resolution of the reconstruction octree. Higher = finer detail but slower and more memory; lower is coarser and faster.
- Example: 8 for a first pass; 9–10 for a detailed outcrop where the dense cloud is large and clean.
Trim factor (× point spacing)
- Default: 6. 0 keeps the full watertight surface.
- What it does: Poisson extrapolates a "bubble" beyond the real data; this trims away triangles whose vertices sit farther than factor × the local point spacing from any input point. Lower trims more aggressively (tighter to the data, but can punch holes); higher keeps more surface; 0 disables trimming.
- Example: 6 removes the balloon-like overhang around the edges while keeping the surface intact. Lower to 3–4 if the bubble is still obvious; set 0 if you specifically want a closed watertight mesh.
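The trimming rule can be sketched with a brute-force nearest-point search. Two assumptions are made for illustration: a single global point spacing stands in for the local spacing, and a triangle is dropped as soon as any one of its vertices is too far from the data. The function name is hypothetical.

```python
import math


def keep_triangle(triangle, points, spacing, trim_factor=6.0):
    """Decide whether a Poisson triangle survives trimming.

    triangle: three (x, y, z) vertices; points: the input cloud.
    trim_factor <= 0 disables trimming (every triangle is kept).
    """
    if trim_factor <= 0:
        return True
    limit = trim_factor * spacing
    # Every vertex must lie within the limit of some input point.
    return all(
        min(math.dist(vertex, p) for p in points) <= limit
        for vertex in triangle
    )
```

A real implementation would use a spatial index rather than scanning every point, but the acceptance rule is the same.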
Create Mesh (Greedy Projection)
Greedy projection triangulates the points directly — it follows the cloud closely and is fast, but it does not fill gaps (sparse areas stay holed). Use it when you want the raw measured surface rather than an interpolated one.
| Field | Default | What it does |
|---|---|---|
| Search radius | 0.5 | Maximum edge length / connection distance, in model units. The single most important value — set it to a few times the point spacing. |
| Max edge multiplier (μ) | 2.5 | Caps an edge at μ × the local point distance, so dense areas use short edges and sparse areas longer ones. |
| Max nearest neighbours | 100 | How many neighbours each point may connect to. Higher closes more triangles (slower). |
| Max surface angle | 45° | Don't connect points across a normal change larger than this — preserves sharp edges. |
| Min triangle angle | 10° | Lower bound on triangle angles (avoids slivers). |
| Max triangle angle | 120° | Upper bound on triangle angles. |
Use Poisson for a clean, closed surface you'll texture or interpret. Use Greedy for a fast, faithful triangulation of exactly the measured points when you don't want gaps filled in.
Texturing the mesh
Texture Mesh from Photos… projects the solved photographs back onto the mesh and bakes a photographic texture — far more detail than the per-vertex colour the mesh inherits from the cloud. It needs a mesh (run Create Mesh first) and the workflow's photos. The bake runs on a background thread and the texture is saved with the project.
When you start it you choose how the texture resolution is set, then a couple of values.
Resolution mode
The first prompt picks the mode:
- Target pixel size (answer Yes) — you specify the real-world size each texel should cover (the ground sample distance). VRGS packs the atlas at exactly that density: a target of 0.005 m means each texel is 5 mm on the outcrop (200 texels per metre). This is the most intuitive control — ask for the detail you need and let the page count follow.
- Page budget (answer No) — you instead fix the number of texture pages and VRGS spreads the available texels evenly over the surface. Use this when you want a hard limit on texture memory rather than a guaranteed detail.
Page size
- Default: 4096 texels per side (rounded up to a power of two).
- What it does: the dimensions of each atlas page. Larger pages hold more detail per page (so fewer pages) but use more memory: a page is size × size × 4 bytes (a 4096 page ≈ 64 MB).
- Example: 4096 is a good default; 8192 for a single very detailed mesh; 2048 to keep memory down.
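The memory figure follows directly from size × size × 4 bytes after rounding the side up to a power of two. A small sketch of that arithmetic, with hypothetical helper names:

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def page_bytes(page_size):
    """Memory for one atlas page: side x side texels, 4 bytes per texel."""
    side = next_power_of_two(page_size)
    return side * side * 4


def page_megabytes(page_size):
    """Same figure in binary megabytes (1 MB = 2**20 bytes)."""
    return page_bytes(page_size) / 2**20
```

Doubling the page side quadruples the memory: 2048 gives 16 MB, 4096 gives 64 MB and 8192 gives 256 MB per page.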
Target pixel size (pixel-size mode)
- Default: 0.01 m/texel.
- What it does: the real-world size of one texel. Smaller = sharper texture but more pages and more memory. There is a safety cap of 8 pages: if your pixel size would need more, VRGS coarsens the density automatically and notes it in the log rather than exhausting memory.
- Example: 0.005 for close-range outcrop detail; 0.02 for a quick, light texture of a large area.
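A rough way to estimate whether a target pixel size will hit the 8-page cap is to divide the mesh surface area by the texel area. The sketch below assumes you know the surface area and that the atlas packs with a given efficiency (1.0 means no wasted texels, which real packing never achieves); both inputs and the function name are illustrative assumptions, not VRGS behaviour.

```python
import math

MAX_PAGES = 8  # safety cap stated above


def plan_atlas(surface_area_m2, pixel_size_m, page_size=4096, packing_efficiency=1.0):
    """Estimate (pages, effective pixel size) for a target texel size.

    If more than MAX_PAGES would be needed, the pixel size is coarsened
    until the surface fits in MAX_PAGES pages.
    """
    texels_needed = surface_area_m2 / (pixel_size_m ** 2)
    texels_per_page = page_size * page_size * packing_efficiency
    pages = max(1, math.ceil(texels_needed / texels_per_page))
    if pages <= MAX_PAGES:
        return pages, pixel_size_m
    # Coarsest density that still fits the capped page count.
    coarsened = math.sqrt(surface_area_m2 / (MAX_PAGES * texels_per_page))
    return MAX_PAGES, coarsened
```

Under these assumptions a 100 m² outcrop at 0.005 m fits in a single 4096 page, while a 10 000 m² surface at the same target would be coarsened to roughly 0.0086 m per texel.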
Number of pages (page-budget mode)
- Default: 2 (range 1–8).
- What it does: how many pages to fill. More pages = finer texture over the same surface.
- Example: 1 for a light single-page texture; 4 for more detail at a fixed memory budget.
How a triangle is textured
For each triangle VRGS picks the single best photograph that can see it — most head-on and highest-resolution wins, with back-facing and very oblique views rejected and occlusion checked so a photo can't texture a surface hidden behind the mesh. Lens distortion is corrected when sampling. Triangles no photo can see fall back to the mesh's per-vertex colour, so there are no black holes.
Each triangle is textured from one source (no multi-photo blending yet), so faint seams can appear where neighbouring chart regions drew from different photos or exposures. Patches that show flat colour instead of photographic detail are triangles no photo could see (occluded or outside every image).
Reading the run log
Two summary lines tell you most of what you need to tune a run.
Sparse summary
SFM done: 20/21 views, 22196 points (reproj px: med=0.34, p90=0.81, p99=2.10, rms=0.55)
- 20/21 views: registered vs. total photos. If many photos fail to register, the cause is usually too little overlap, too few features, or too strict matching thresholds.
- 22196 points: size of the sparse cloud.
- reproj px: per-point reprojection error percentiles. A sub-pixel median is healthy. A small median with a large p99 means a few outliers; a large median everywhere means a weak solve.
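If you want to compare runs in a script, the sparse summary can be pulled apart with a regular expression. The pattern below is written against the single example line shown above and assumes the format does not vary; the function name and dictionary keys are hypothetical.

```python
import re

SPARSE_RE = re.compile(
    r"SFM done: (\d+)/(\d+) views, (\d+) points "
    r"\(reproj px: med=([\d.]+), p90=([\d.]+), p99=([\d.]+), rms=([\d.]+)\)"
)


def parse_sparse_summary(line):
    """Return the summary fields as a dict, or None if the line does not match."""
    m = SPARSE_RE.search(line)
    if m is None:
        return None
    registered, total, points = (int(m.group(i)) for i in (1, 2, 3))
    med, p90, p99, rms = (float(m.group(i)) for i in (4, 5, 6, 7))
    return {
        "registered": registered,
        "total": total,
        "registered_fraction": registered / total,
        "points": points,
        "median_px": med,
        "p90_px": p90,
        "p99_px": p99,
        "rms_px": rms,
    }
```

For the example line this yields 20 of 21 views registered and a median reprojection error of 0.34 px.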
Dense fusion funnel
Fusion: 5836800 -> 4443855 depth>0 -> 3841509 conf>=0.75 -> 2241200 consistency>=2 -> 706202 voxels -> 698061 after SOR
Each arrow shows how many pixels/points survive a stage, so you can see where points are lost and which knob to turn:
- big drop at conf>= → lower Dense MVS confidence cutoff;
- big drop at consistency>= → loosen Dense MVS depth tolerance or lower Dense MVS consistent views;
- with the sky filter on, an extra sky-cut term shows how many sky points were removed.
Run once with defaults, read the funnel, then change the one stage that is cutting too much (or too little). Re-run and compare.
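Reading the funnel is easier as survival ratios per stage. This sketch parses the example line shown above, assuming the stages are separated by " -> " and each stage is a count followed by an optional label; the format of the extra sky-cut term is not shown in this document, so it is not handled specially. Function names are hypothetical.

```python
def parse_fusion_funnel(line):
    """Return [(label, count), ...] for each stage of a 'Fusion:' log line."""
    body = line.split("Fusion:", 1)[1]
    stages = []
    for part in body.split(" -> "):
        tokens = part.split(None, 1)
        count = int(tokens[0])
        # The first stage has no label: it is the raw pixel count.
        label = tokens[1].strip() if len(tokens) > 1 else "input"
        stages.append((label, count))
    return stages


def stage_survival(stages):
    """Fraction of points surviving each stage relative to the previous one."""
    return [
        (label, count / prev_count)
        for (_, prev_count), (label, count) in zip(stages, stages[1:])
    ]
```

For the example line, about 86% of pixels survive the confidence cutoff but only about 58% survive the consistency check, so depth tolerance or consistent views would be the first knob to try. The large drop at the voxels stage reflects merging of points into voxels rather than rejection.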
Worked examples
Drone survey with GPS (nadir)
SIFT, Max image dimension 3200, Share intrinsics per camera on,
Use GPS prior on, GPS sigma XY 2–5, GPS pair max distance set to a
few times the photo spacing. Enable Dense MVS with the PLANE_SWEEP backend.
Convergent outcrop (oblique, high overlap)
SIFT, Max features 8000+, Use GPS prior on if the photos are tagged. Enable
Dense MVS; turn on the Sky filter if the outcrop is shot against the sky.
Densest possible dense cloud
Dense MVS confidence cutoff 0.6, Dense MVS consistent views 1, Dense MVS
depth tolerance 0.04. (Expect more noise—clean up afterwards with point-cloud
filters.)
Cleanest dense cloud
Dense MVS confidence cutoff 0.85, Dense MVS consistent views 3, Dense MVS
depth tolerance 0.015.
Fast preview
ORB, Max image dimension 1600, Max features 4000, BA max iterations
40, Dense MVS off (or on with neighbor views 3).
Textured outcrop mesh
Run the reconstruction with Dense MVS on, then Create Mesh (Poisson)
with Octree depth 9, Trim factor 6. Finally Texture Mesh from Photos
in Target pixel size mode at 0.005 m with Page size 4096 for a sharp,
photo-detailed surface.
Troubleshooting
| Symptom | Likely cause / fix |
|---|---|
| Few photos register (5/20 views) | Too little overlap, too few features, or matching too strict. Raise Max features and/or Max image dimension; lower Lowe ratio test slightly; only lower Min geometric inliers as a last resort. Check the photos aren't blurry. |
| Dense cloud nearly empty / very sparse | Read the fusion funnel and relax the stage that cuts most: lower confidence cutoff, loosen depth tolerance, or lower consistent views. |
| Dense cloud is noisy | Tighten the same three: raise confidence cutoff, raise consistent views, tighten depth tolerance. |
| Blue/white halo around the outcrop | Turn on the Dense MVS sky filter. |
| GPS-tagged run looks worse than expected | Check the EXIF GPS is valid; raise GPS sigma XY/Z (trust GPS less), or turn Use GPS prior off to reconstruct in a purely visual frame. |
| CASMVSNET backend does nothing | It needs casmvsnet.onnx in the project MODELS folder and a CUDA GPU; otherwise use PLANE_SWEEP. |
| "No mesh to texture" | Run Create Mesh on the workflow first — texturing needs a <name> mesh. |
| Texture looks coarse / blurry | Use a smaller Target pixel size or more pages, and/or a larger Page size. Check the log for a "density coarsened" note — you may have hit the 8-page safety cap. |
| Mesh has holes | Poisson Trim factor too aggressive — raise it or set 0 for a watertight surface; or you used Greedy, which leaves gaps in sparse areas (switch to Poisson). |
| Patches show flat colour, not photo detail | Those triangles were seen by no photograph (occluded or outside every image), so they fall back to per-vertex colour. Add photos covering that area or re-shoot it. |
| Faint seams across the texture | Expected in v1 — each triangle uses one source photo, so exposure/source changes can show at chart boundaries. |