Structure-from-Motion (SfM) & Dense MVS

VRGS can rebuild a 3D scene from a set of overlapping photographs. The pipeline runs as a chain of stages — the first always runs, the rest are optional follow-ons you trigger when you want them:

  1. Sparse SfM — detects features in each photo, matches them across overlapping pairs, and solves for every camera's position/orientation and a sparse coloured point cloud (the matched tie points). This stage always runs.
  2. Dense MVS (optional) — takes the solved cameras and estimates a depth map per view, then fuses them into a dense coloured point cloud of the surface.
  3. Create Mesh (optional) — turns the dense cloud into a triangulated surface (Poisson or Greedy Projection), carrying the cloud's per-vertex colour.
  4. Texture Mesh from Photos (optional) — projects the solved photographs back onto the mesh and bakes a high-detail photographic texture.

Typical inputs are drone/UAV imagery or hand-held photos of an outcrop. Typical outputs are a georeferenced camera set (with image thumbnails in the 3D view), a sparse point cloud, and—if enabled—a dense point cloud, a surface mesh, and a photo-textured mesh you can interpret.

Workflow in one line

Create a new SfM workflow → Add Photos… → set parameters → Run Reconstruction → (optional) Create Mesh → (optional) Texture Mesh from Photos.


Running a reconstruction

All actions live on the SFM branch of the Data Tree.

Step | Action | Notes
1. Create a workflow | Right-click the SFM Group → New | Creates Model 1, Model 2, …
2. Add images | Right-click the workflow → Add Photos… | Browse to your JPEGs
3. Configure | Select the workflow; edit the Parameters in the Properties panel | See Parameters
4. Reconstruct | Right-click the workflow → Run Reconstruction | Runs on a background thread
5. Review images | Right-click the workflow → Open Photo Browser | Per-photo keypoint/track stats
6. Mesh (optional) | Right-click the workflow → Create Mesh (Poisson)… or Create Mesh (Greedy Projection)… | Needs a dense cloud; see Building a mesh
7. Texture (optional) | Right-click the workflow → Texture Mesh from Photos… | Needs a mesh; see Texturing the mesh

The reconstruction runs in the background; progress and a final summary appear in the log (and the workflow's Status property). When it finishes, the results are added to the project automatically:

  • <name> sparse — the sparse tie-point cloud (e.g. Model 1 sparse).
  • <name> dense — the dense MVS cloud, only if Dense MVS was enabled.
  • <name> mesh — the surface mesh, once you run Create Mesh (it later carries the baked texture too).
  • Camera poses — each photo is placed in the 3D view with its image thumbnail.

The mesh and texturing stages also run on background threads; the mesh links back to its workflow, so Texture Mesh from Photos always finds the mesh that workflow created.

Coordinate frame

With Use GPS prior on and valid EXIF GPS, the model is aligned to a local metric East-North-Up frame (real-world scale, correct orientation). Without GPS the model is reconstructed up to an arbitrary scale and orientation fixed by the first camera pair.


Parameter reference

Parameters are grouped below the way they appear in the Properties panel. Every value has a sensible default—you can run a first reconstruction without changing anything and tune afterwards.

Feature extraction & matching

These control how features are found in each image and matched across images. They have the biggest effect on how many photos register and how complete the sparse cloud is.

Detector

  • What it is: the feature detector/descriptor algorithm.
  • Options / default: SIFT (default), ORB, AKAZE.
  • What it does: SIFT gives the most robust, repeatable matches on natural rock texture and is the best default for quality. ORB is a fast binary detector—much quicker and lighter on memory, but produces fewer reliable matches. AKAZE sits in between.
  • Example: keep SIFT for final outcrop reconstructions. Switch to ORB for a fast sanity check on a very large image set, or when you only need a rough camera layout quickly.

Max image dimension (px)

  • Default: 3200. 0 disables downsampling (full resolution).
  • What it does: images are downsampled so their longest side is at most this many pixels before features are extracted. Smaller is faster and uses less memory but finds fewer fine features; larger captures more detail but is slower.
  • Example: 1600 for quick tests; 3200 for normal work; 0 (or a large value like 6000) to squeeze maximum detail from a small, high-resolution set.
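
The size cap described above can be sketched in a few lines. This is an illustrative calculation only: `downsample_size` is a hypothetical helper, and the assumption that images smaller than the cap are left untouched (never upsampled) is ours, not something the parameter description states.

```python
def downsample_size(width, height, max_dim=3200):
    """Return (width, height) after capping the longest side at max_dim pixels."""
    if max_dim <= 0:
        # 0 disables downsampling: keep full resolution.
        return width, height
    longest = max(width, height)
    if longest <= max_dim:
        # Assumption: images already under the cap are not enlarged.
        return width, height
    scale = max_dim / longest
    return round(width * scale), round(height * scale)


# A 20 MP drone frame at the default cap and at the quick-test value.
print(downsample_size(5472, 3648))        # default 3200
print(downsample_size(5472, 3648, 1600))  # quick test
print(downsample_size(5472, 3648, 0))     # full resolution
```

Halving the cap roughly quarters the pixel count, which is why 1600 is so much faster than 3200.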

Max features per image

  • Default: 8000.
  • What it does: the upper bound on keypoints kept per image. More features give more matches and a denser sparse cloud at the cost of speed and memory.
  • Example: raise to 12000–20000 on low-texture scenes (smooth, uniform rock) where pairs are hard to verify; drop to 4000 for speed.

Lowe ratio test

  • Default: 0.8 (typical range 0.7–0.8).
  • What it does: a match is accepted only if the best descriptor match is clearly better than the second-best by this ratio. Lower is stricter (fewer but cleaner matches); higher keeps more matches but admits more noise.
  • Example: lower to 0.75 on repetitive texture (e.g. bedded sandstone, brickwork) to suppress ambiguous matches.
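
For readers who want to see the ratio test in action, here is a minimal brute-force sketch in plain Python. It is not VRGS's matcher: `ratio_test_matches` is a hypothetical name, the descriptors are toy 2-D vectors, and Euclidean distance is assumed (binary detectors such as ORB would normally use Hamming distance).

```python
import math


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Return (i, j) index pairs from desc_a to desc_b that pass the ratio test."""
    matches = []
    for i, da in enumerate(desc_a):
        # Distance from this descriptor to every candidate in the other image.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # a second-best match is needed to form the ratio
        (best, j), (second, _) = dists[0], dists[1]
        # Accept only if the best match is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((i, j))
    return matches


# One distinctive feature and one ambiguous feature (two near-equal candidates).
image_a = [(0.0, 0.0), (5.0, 5.0)]
image_b = [(0.1, 0.0), (5.0, 6.0), (5.0, 4.1)]
print(ratio_test_matches(image_a, image_b, ratio=0.8))
print(ratio_test_matches(image_a, image_b, ratio=0.95))
```

At 0.8 only the distinctive feature survives; at 0.95 the ambiguous one is admitted too, which is the trade-off the parameter controls.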

Min geometric inliers

  • Default: 20.
  • What it does: the minimum number of geometrically-verified inlier matches for an image pair to be trusted. It is also the floor used when choosing the initial pair and registering later views. Higher = stricter (cleaner pose graph, but a thin-overlap dataset may fragment); lower admits weaker pairs.
  • Example: raise to 30–40 on clean, high-overlap datasets for robustness; only lower below 20 (cautiously) when overlap is genuinely sparse.

Cameras & calibration

Share intrinsics per camera

  • Default: on.
  • What it does: groups photos by camera make/model/focal length so they share one set of intrinsics (focal length, principal point, distortion). Fewer unknowns means a more stable, faster bundle adjustment.
  • Turn off when: every image may have different intrinsics—mixed cameras, or a zoom lens used at varying focal lengths. Per-image intrinsics need more overlap to solve reliably.
  • Example: leave on for a single drone camera at a fixed focal length; consider off for an ad-hoc mix of phone and drone photos.

GPS & georeferencing

These only matter when your photos carry EXIF GPS (most drone imagery does).

Use GPS prior

  • Default: on.
  • What it does: uses EXIF GPS to (1) optionally pre-filter image pairs that are too far apart to overlap, and (2) after a purely visual reconstruction, align the whole model to a metric East-North-Up frame and refine it with GPS as a soft constraint. The result has real-world scale and georeferencing.
  • Turn off when: photos have no/poor GPS, or you deliberately want a scale-free visual reconstruction.
  • Example: on for drone surveys; off for ground-based photos with no GPS.

GPS sigma XY (m)

  • Default: 5.0.
  • What it does: the assumed horizontal uncertainty of the GPS positions— how much to trust GPS versus the visual geometry. Smaller pulls cameras harder onto their GPS coordinates; larger lets the photo geometry dominate.
  • Example: 1–2 m for RTK/PPK drones; 5–10 m for consumer/phone EXIF.

GPS sigma Z (m)

  • Default: 10.0.
  • What it does: the assumed vertical uncertainty. GPS altitude is usually much less accurate than horizontal position, so this is normally larger than GPS sigma XY.
  • Example: 10–20 m for consumer EXIF altitude; smaller for survey-grade vertical control.
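
One way to picture how the two sigma values act as a soft constraint is as weights on a squared GPS residual. The sketch below is illustrative only: `gps_cost` and its input are hypothetical, and the cost function VRGS actually uses is not described in this guide.

```python
def gps_cost(offset_enu, sigma_xy=5.0, sigma_z=10.0):
    """Sigma-normalised squared distance between a solved camera and its GPS fix.

    offset_enu: (east, north, up) difference in metres (hypothetical input).
    """
    de, dn, du = offset_enu
    return (de / sigma_xy) ** 2 + (dn / sigma_xy) ** 2 + (du / sigma_z) ** 2


# A camera solved 5 m away horizontally and 10 m away vertically from its GPS fix.
offset = (3.0, 4.0, 10.0)
loose = gps_cost(offset)             # consumer-grade defaults
tight = gps_cost(offset, 1.0, 10.0)  # RTK-like horizontal sigma
print(loose, tight)
```

The same 5 m horizontal disagreement costs 25 times more with a 1 m sigma than with a 5 m sigma, so an optimiser minimising such a term would pull the camera much harder towards its GPS position.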

GPS pair max distance (m)

  • Default: 0 (disabled).
  • What it does: when greater than 0 and both photos in a pair have GPS, matching is skipped if the cameras are farther apart than this. On large flights most pairs cannot overlap, so this is a big speed-up.
  • Example: set to roughly 2–3× your photo spacing/footprint (e.g. 30–50 m) on a big survey; leave 0 for small sets where every pair is worth trying.
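
The pair pre-filter can be sketched as follows. The function names and the dictionary of EXIF positions are hypothetical, and the haversine great-circle distance is our assumption; the guide does not say how VRGS measures the camera separation.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6371000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidate_pairs(gps, max_dist_m=0.0):
    """gps maps photo name -> (lat, lon) or None. Returns the pairs worth matching."""
    pairs = []
    for a, b in combinations(sorted(gps), 2):
        both_tagged = gps[a] is not None and gps[b] is not None
        # The filter applies only when enabled (> 0) and both photos carry GPS.
        if max_dist_m > 0 and both_tagged:
            if haversine_m(*gps[a], *gps[b]) > max_dist_m:
                continue
        pairs.append((a, b))
    return pairs


photos = {"a": (0.0, 0.0), "b": (0.0, 0.0001), "c": (0.0, 0.01), "d": None}
print(len(candidate_pairs(photos)))        # filter disabled: every pair
print(len(candidate_pairs(photos, 50.0)))  # pairs more than 50 m apart are skipped
```

Untagged photos are never filtered out in this sketch, matching the rule that the check needs GPS on both photos of a pair.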

Bundle adjustment

BA max iterations (global)

  • Default: 100.
  • What it does: the maximum optimiser iterations per global bundle-adjustment pass. The solver usually converges well before the cap; more iterations can refine a difficult solve at the cost of time.
  • Example: leave at 100; drop to 30–50 for fast previews.

Dense MVS

Dense MVS runs after sparse SfM and produces the dense surface cloud. It is off by default.

Dense MVS (learned)

  • Default: off.
  • What it does: the master on/off switch for the dense stage. When on, a dense coloured point cloud (<name> dense) is produced in addition to the sparse one.
  • Example: turn on whenever you want a dense surface to mesh or interpret.

Dense MVS backend

  • Options / default: PLANE_SWEEP (default), CASMVSNET.
  • What it does: chooses the depth estimator. PLANE_SWEEP is the built-in CPU plane-sweep (normalised cross-correlation) — it needs no model file and runs anywhere. CASMVSNET is a learned neural-network estimator that requires a casmvsnet.onnx model in the project MODELS folder and a CUDA-capable GPU.
  • Example: use PLANE_SWEEP for almost everything. Choose CASMVSNET only if you have the model file and a GPU and want to compare learned depth.

Dense MVS neighbor views

  • Default: 5.
  • What it does: how many neighbouring source views are used to estimate depth for each reference view. More views give more robust depth (slower); fewer are faster but noisier.
  • Example: 5 is a good default; try 7 for wide-baseline captures, 3 for speed. (The CASMVSNET backend uses the view count baked into its model.)

Dense MVS confidence cutoff

  • Default: 0.75 (used by the PLANE_SWEEP backend).
  • What it does: per-pixel confidence threshold on the plane-sweep result, mapped as (NCC + 1) / 2, so 0.75 corresponds to NCC >= 0.5 (a well-matched, textured pixel). Pixels below the cutoff are discarded before fusion. Raise for sparser/cleaner output; lower for denser/noisier.
  • Example: 0.75 default; 0.6 (NCC >= 0.2) to densify a sparse cloud once depth is trustworthy; 0.85 for a cleaner cloud.
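
The mapping between the cutoff and the underlying NCC score is simple enough to check by hand. The two helper names below are hypothetical; the formula is the (NCC + 1) / 2 mapping quoted above.

```python
def ncc_to_confidence(ncc):
    """Map an NCC score in [-1, 1] to a confidence in [0, 1]."""
    return (ncc + 1.0) / 2.0


def confidence_to_ncc(conf):
    """Inverse mapping: the NCC a pixel needs to reach a given confidence."""
    return 2.0 * conf - 1.0


for cutoff in (0.6, 0.75, 0.85):
    print(f"cutoff {cutoff:.2f} keeps pixels with NCC >= {confidence_to_ncc(cutoff):.2f}")
```

Use the inverse mapping when you think in terms of correlation quality and need the matching cutoff value for the Properties panel.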

Dense MVS CasMVSNet confidence cutoff

  • Default: 0.3 (used only when backend is CASMVSNET).
  • What it does: the equivalent cutoff for the learned backend. Its confidence is a network probability on a different scale (it peaks much lower than the plane-sweep value), so this default is far below 0.75. Applying the plane-sweep cutoff to it would reject almost every point.
  • Example: 0.3 default; raise to 0.4–0.5 for a cleaner learned cloud.

Dense MVS consistent views

  • Default: 2.
  • What it does: a fused point is kept only if at least this many other views agree on its depth (so 2 means a 3-view consensus, counting the reference view the point came from). Higher = cleaner/sparser; lower = denser/noisier.
  • Example: 2 default; 1 for maximum density (noisier); 3 for a clean cloud on high-overlap data.

Dense MVS depth tolerance

  • Default: 0.025 (= 2.5%).
  • What it does: the relative depth-agreement tolerance used by the cross-view consistency check above. Loosen it to keep more points; tighten it for a cleaner cloud.
  • Example: raise to 0.04 if the cloud is too sparse once per-view depth is good; drop to 0.015 for a tighter, cleaner surface.
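
The interaction between Dense MVS consistent views and Dense MVS depth tolerance can be sketched as a simple counting check. This is a simplified illustration under stated assumptions: `is_consistent` is a hypothetical name, the depths from the other views are assumed to be already reprojected into the reference view, and the tolerance is taken relative to the reference depth.

```python
def is_consistent(ref_depth, other_depths, min_views=2, rel_tol=0.025):
    """Keep a point if enough other views agree with its depth.

    ref_depth: depth of the point in its reference view.
    other_depths: depths of the same point as seen from other views
                  (assumed already expressed in the reference view; <= 0 means no depth).
    """
    agree = sum(
        1
        for d in other_depths
        if d > 0 and abs(d - ref_depth) <= rel_tol * ref_depth
    )
    return agree >= min_views


depths = [10.3, 9.9, 12.0]  # three other views, one clearly off
print(is_consistent(10.0, depths))                # default 2.5% tolerance
print(is_consistent(10.0, depths, rel_tol=0.04))  # loosened tolerance
print(is_consistent(10.0, depths, min_views=1))   # fewer views required
```

With the defaults only one view agrees (9.9 is within 2.5% of 10.0, 10.3 is not), so the point is dropped; loosening the tolerance or lowering the view count keeps it.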

Dense MVS sky filter

  • Default: off.
  • What it does: removes the fringe of sky-coloured points that can appear along outcrop silhouettes (where the estimator is forced to assign a depth to sky-adjacent pixels). A point is dropped only if it is both sky-coloured (saturated blue, or bright near-white) and weakly matched—so confidently reconstructed blue/white rock or water is kept.
  • Example: turn on for outcrop or landscape scenes that show a blue/white halo around the model. When on, the fusion log reports a sky-cut count so you can see how many points it removed.

Diagnostics & output

Write undistorted previews

  • Default: on.
  • What it does: saves an undistorted JPEG per registered view to <workspace>/undistorted/. Useful for quality control and for downstream tools that expect undistorted images.
  • Example: turn off to save disk space and a little time once you no longer need the previews.

Verbose logging

  • Default: off.
  • What it does: emits extra per-photo, per-pair, and per-cache diagnostic messages to the log.
  • Example: turn on while tuning parameters or troubleshooting a run; turn off for clean logs.

Building a mesh

Once you have a dense cloud (<name> dense), turn it into a triangulated surface. Both methods run on a background thread, copy the cloud's per-vertex colour onto the mesh, and add <name> mesh to the project. Right-click the workflow and pick one:

Needs a dense cloud

Create Mesh works on the dense MVS cloud, so enable Dense MVS (learned) and run a reconstruction first. If the dense cloud is missing or still loading, the action tells you so.

Create Mesh (Poisson)

Poisson reconstruction fits a single watertight surface through the points — it fills small gaps and gives clean, closed geometry. Best default for outcrops.

Octree depth

  • Default: 8 (8–10 typical).
  • What it does: the resolution of the reconstruction octree. Higher = finer detail but slower and more memory; lower is coarser and faster.
  • Example: 8 for a first pass; 9–10 for a detailed outcrop where the dense cloud is large and clean.

Trim factor (× point spacing)

  • Default: 6. 0 keeps the full watertight surface.
  • What it does: Poisson extrapolates a "bubble" beyond the real data; this trims away triangles whose vertices sit farther than factor × the local point spacing from any input point. Lower trims more aggressively (tighter to the data, but can punch holes); higher keeps more surface; 0 disables trimming.
  • Example: 6 removes the balloon-like overhang around the edges while keeping the surface intact. Lower to 3–4 if the bubble is still obvious; set 0 if you specifically want a closed watertight mesh.

Create Mesh (Greedy Projection)

Greedy projection triangulates the points directly — it follows the cloud closely and is fast, but it does not fill gaps (sparse areas stay holed). Use it when you want the raw measured surface rather than an interpolated one.

Field | Default | What it does
Search radius | 0.5 | Maximum edge length / connection distance, in model units. The single most important value: set it to a few times the point spacing.
Max edge multiplier (μ) | 2.5 | Caps an edge at μ × the local point distance, so dense areas use short edges and sparse areas longer ones.
Max nearest neighbours | 100 | How many neighbours each point may connect to. Higher closes more triangles (slower).
Max surface angle | 45° | Don't connect points across a normal change larger than this; this preserves sharp edges.
Min triangle angle | 10° | Lower bound on triangle angles (avoids slivers).
Max triangle angle | 120° | Upper bound on triangle angles.

Poisson vs Greedy

Use Poisson for a clean, closed surface you'll texture or interpret. Use Greedy for a fast, faithful triangulation of exactly the measured points when you don't want gaps filled in.


Texturing the mesh

Texture Mesh from Photos… projects the solved photographs back onto the mesh and bakes a photographic texture — far more detail than the per-vertex colour the mesh inherits from the cloud. It needs a mesh (run Create Mesh first) and the workflow's photos. The bake runs on a background thread and the texture is saved with the project.

When you start it you choose how the texture resolution is set, then a couple of values.

Resolution mode

The first prompt picks the mode:

  • Target pixel size (answer Yes) — you specify the real-world size each texel should cover (the ground sample distance). VRGS packs the atlas at exactly that density: a target of 0.005 m means each texel is 5 mm on the outcrop (200 texels per metre). This is the most intuitive control — ask for the detail you need and let the page count follow.
  • Page budget (answer No) — you instead fix the number of texture pages and VRGS spreads the available texels evenly over the surface. Use this when you want a hard limit on texture memory rather than a guaranteed detail.

Page size

  • Default: 4096 texels per side (rounded up to a power of two).
  • What it does: the dimensions of each atlas page. Larger pages hold more detail per page (so fewer pages) but use more memory: a page is size × size × 4 bytes (a 4096 page ≈ 64 MB).
  • Example: 4096 is a good default; 8192 for a single very detailed mesh; 2048 to keep memory down.
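
The memory figures above follow directly from size × size × 4 bytes. The helpers below are hypothetical names for that arithmetic, including the round-up to a power of two.

```python
def round_up_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length() if n > 1 else 1


def page_mib(size):
    """Memory of one atlas page in MiB, at 4 bytes per texel."""
    side = round_up_pow2(size)
    return side * side * 4 / (1024 * 1024)


for requested in (2048, 3000, 4096, 8192):
    print(requested, "->", round_up_pow2(requested), "texels per side,", page_mib(requested), "MiB")
```

Doubling the page size quadruples the memory per page, so 8192 costs four times as much as the 4096 default.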

Target pixel size (pixel-size mode)

  • Default: 0.01 m/texel.
  • What it does: the real-world size of one texel. Smaller = sharper texture but more pages and more memory. There is a safety cap of 8 pages — if your pixel size would need more, VRGS coarsens the density automatically and notes it in the log rather than exhausting memory.
  • Example: 0.005 for close-range outcrop detail; 0.02 for a quick, light texture of a large area.
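
To get a feel for when the 8-page cap bites, you can estimate the page count from the mesh's surface area. This is a rough, hypothetical estimate: `plan_texture` is not a VRGS function, and it assumes perfect atlas packing with no wasted texels, so a real bake will need somewhat more space than this suggests.

```python
import math

MAX_PAGES = 8  # the safety cap described above


def plan_texture(area_m2, texel_m=0.01, page_size=4096):
    """Estimate (pages, effective texel size in metres) for a surface area.

    Assumes ideal packing; treat the result as a lower bound on pages.
    """
    texels_per_page = page_size * page_size
    pages = math.ceil(area_m2 / texel_m ** 2 / texels_per_page)
    if pages <= MAX_PAGES:
        return max(pages, 1), texel_m
    # Over the cap: coarsen the texel size until the surface fits in MAX_PAGES.
    coarser = math.sqrt(area_m2 / (MAX_PAGES * texels_per_page))
    return MAX_PAGES, coarser


print(plan_texture(100.0, 0.005))    # small outcrop at 5 mm
print(plan_texture(10000.0, 0.005))  # one hectare at 5 mm exceeds the cap
```

Under these assumptions a 100 m² outcrop at 5 mm fits on a single 4096 page, while a hectare at the same density would need far more than 8 pages and ends up at roughly 9 mm per texel.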

Number of pages (page-budget mode)

  • Default: 2 (range 1–8).
  • What it does: how many pages to fill. More pages = finer texture over the same surface.
  • Example: 1 for a light single-page texture; 4 for more detail at a fixed memory budget.

How a triangle is textured

For each triangle VRGS picks the single best photograph that can see it — most head-on and highest-resolution wins, with back-facing and very oblique views rejected and occlusion checked so a photo can't texture a surface hidden behind the mesh. Lens distortion is corrected when sampling. Triangles no photo can see fall back to the mesh's per-vertex colour, so there are no black holes.

v1 behaviour

Each triangle is textured from one source (no multi-photo blending yet), so faint seams can appear where neighbouring chart regions drew from different photos or exposures. Patches that show flat colour instead of photographic detail are triangles no photo could see (occluded or outside every image).


Reading the run log

Two summary lines tell you most of what you need to tune a run.

Sparse summary

SFM done: 20/21 views, 22196 points (reproj px: med=0.34, p90=0.81, p99=2.10, rms=0.55)

  • 20/21 views — registered vs. total photos. If many photos fail to register, the cause is usually too little overlap, too few features, or too strict matching thresholds.
  • 22196 points — size of the sparse cloud.
  • reproj px — per-point reprojection error percentiles. A sub-pixel median is healthy. A small median with a large p99 means a few outliers; a large median everywhere means a weak solve.
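
If you compare many runs, it can help to pull these numbers out of the log automatically. The sketch below parses the exact line format shown above; `parse_sparse_summary` is a hypothetical helper, and the pattern may need adjusting if your log lines differ.

```python
import re

SPARSE_RE = re.compile(
    r"SFM done: (?P<registered>\d+)/(?P<total>\d+) views, (?P<points>\d+) points "
    r"\(reproj px: med=(?P<med>[\d.]+), p90=(?P<p90>[\d.]+), "
    r"p99=(?P<p99>[\d.]+), rms=(?P<rms>[\d.]+)\)"
)


def parse_sparse_summary(line):
    """Return the summary fields as a dict, or None if the line does not match."""
    m = SPARSE_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    out = {k: int(fields[k]) for k in ("registered", "total", "points")}
    out.update({k: float(fields[k]) for k in ("med", "p90", "p99", "rms")})
    return out


line = "SFM done: 20/21 views, 22196 points (reproj px: med=0.34, p90=0.81, p99=2.10, rms=0.55)"
summary = parse_sparse_summary(line)
print(f"{summary['registered'] / summary['total']:.0%} of photos registered")
print("sub-pixel median:", summary["med"] < 1.0)
```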

Dense fusion funnel

Fusion: 5836800 -> 4443855 depth>0 -> 3841509 conf>=0.75 -> 2241200 consistency>=2 -> 706202 voxels -> 698061 after SOR

Each arrow shows how many pixels/points survive a stage, so you can see where points are lost and which knob to turn:

  • big drop at conf>= → lower Dense MVS confidence cutoff;
  • big drop at consistency>= → loosen Dense MVS depth tolerance or lower Dense MVS consistent views;
  • with the sky filter on, an extra sky-cut term shows how many sky points were removed.

Tune from the funnel, not by guesswork

Run once with defaults, read the funnel, then change the one stage that is cutting too much (or too little). Re-run and compare.
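
Reading the funnel is easier when each stage is expressed as a survival fraction. The sketch below parses the example line shown above; the helper names are hypothetical and the parser assumes the "count label" layout of that example.

```python
def parse_funnel(line):
    """Split a 'Fusion: ...' log line into (label, count) stages."""
    if "Fusion:" not in line:
        raise ValueError("not a fusion funnel line")
    body = line.split("Fusion:", 1)[1]
    stages = []
    for part in body.split("->"):
        tokens = part.split(None, 1)
        count = int(tokens[0])
        # The first entry has no label: it is the raw input pixel count.
        label = tokens[1].strip() if len(tokens) > 1 else "input"
        stages.append((label, count))
    return stages


def survival(stages):
    """Fraction of points surviving each stage, relative to the previous one."""
    return [(label, count / prev) for (_, prev), (label, count) in zip(stages, stages[1:])]


line = ("Fusion: 5836800 -> 4443855 depth>0 -> 3841509 conf>=0.75 -> "
        "2241200 consistency>=2 -> 706202 voxels -> 698061 after SOR")
stages = parse_funnel(line)
for label, frac in survival(stages):
    print(f"{label:<16} keeps {frac:.1%}")
```

In the example line the confidence stage keeps about 86% of its input but the consistency stage keeps only about 58%, so the depth tolerance or consistent-views setting is the first knob to try if the cloud is too sparse.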


Worked examples

Drone survey with GPS (nadir)

SIFT, Max image dimension 3200, Share intrinsics per camera on, Use GPS prior on, GPS sigma XY 2–5, GPS pair max distance set to a few times the photo spacing. Enable Dense MVS with the PLANE_SWEEP backend.

Convergent outcrop (oblique, high overlap)

SIFT, Max features 8000+, Use GPS prior on if the photos are tagged. Enable Dense MVS; turn on the Sky filter if the outcrop is shot against the sky.

Densest possible dense cloud

Dense MVS confidence cutoff 0.6, Dense MVS consistent views 1, Dense MVS depth tolerance 0.04. (Expect more noise—clean up afterwards with point-cloud filters.)

Cleanest dense cloud

Dense MVS confidence cutoff 0.85, Dense MVS consistent views 3, Dense MVS depth tolerance 0.015.

Fast preview

ORB, Max image dimension 1600, Max features 4000, BA max iterations 40, Dense MVS off (or on with neighbor views 3).

Textured outcrop mesh

Run the reconstruction with Dense MVS on, then Create Mesh (Poisson) with Octree depth 9, Trim factor 6. Finally Texture Mesh from Photos in Target pixel size mode at 0.005 m with Page size 4096 for a sharp, photo-detailed surface.


Troubleshooting

Symptom | Likely cause / fix
Few photos register (e.g. 5/20 views) | Too little overlap, too few features, or matching too strict. Raise Max features and/or Max image dimension; raise Lowe ratio test slightly (higher is more permissive); only lower Min geometric inliers as a last resort. Check the photos aren't blurry.
Dense cloud nearly empty / very sparse | Read the fusion funnel and relax the stage that cuts most: lower confidence cutoff, loosen depth tolerance, or lower consistent views.
Dense cloud is noisy | Tighten the same three: raise confidence cutoff, raise consistent views, tighten depth tolerance.
Blue/white halo around the outcrop | Turn on the Dense MVS sky filter.
GPS-tagged run looks worse than expected | Check the EXIF GPS is valid; raise GPS sigma XY/Z (trust GPS less), or turn Use GPS prior off to reconstruct in a purely visual frame.
CASMVSNET backend does nothing | It needs casmvsnet.onnx in the project MODELS folder and a CUDA GPU; otherwise use PLANE_SWEEP.
"No mesh to texture" | Run Create Mesh on the workflow first; texturing needs a <name> mesh.
Texture looks coarse / blurry | Use a smaller Target pixel size or more pages, and/or a larger Page size. Check the log for a "density coarsened" note; you may have hit the 8-page safety cap.
Mesh has holes | Poisson Trim factor too aggressive: raise it, or set 0 for a watertight surface. Alternatively, you used Greedy, which leaves gaps in sparse areas (switch to Poisson).
Patches show flat colour, not photo detail | Those triangles were seen by no photograph (occluded or outside every image), so they fall back to per-vertex colour. Add photos covering that area or re-shoot it.
Faint seams across the texture | Expected in v1: each triangle uses one source photo, so exposure/source changes can show at chart boundaries.