Large-Data Downsampling
GlyphX automatically downsamples large datasets before SVG generation to keep file size small, browser rendering fast, and tooltip hit-testing reliable. SVG performance degrades visibly above roughly 5 000 points; GlyphX handles datasets with millions of points transparently.
All downsampling is implemented in glyphx.downsample and is fully
vectorised with NumPy. Every algorithm can be called manually as well as
used automatically through the series classes.
Algorithms
GlyphX uses different algorithms depending on the series type, because each has different structural assumptions about the data.
2-D Line: M4 + LTTB two-stage pipeline
For ordered, axis-aligned line data GlyphX runs two stages automatically:
Stage 1 — M4 fires when the point count exceeds M4_THRESHOLD
(default 50 000). For each pixel-wide column of the canvas it retains
four points: first, last, minimum-Y, and maximum-Y. At the actual render
resolution this is visually lossless. M4 requires monotone (sorted) X values
and is fully vectorised via np.digitize and np.minimum.reduceat.
Stage 2 — LTTB (Largest-Triangle-Three-Buckets, Steinarsson 2013) fires
when the result still exceeds AUTO_THRESHOLD (default 5 000). From each
bucket it keeps the point that forms the largest triangle with the previously
kept point and the average of the next bucket, which preserves peaks, troughs,
and inflection points far better than simple decimation. The inner bucket scan
is vectorised using NumPy slice expressions
and np.argmax.
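The two stages can be sketched in plain NumPy. This is an illustrative sketch under stated assumptions, not the code in glyphx.downsample: the names m4_sketch and lttb_sketch are hypothetical, the M4 column loop is left unvectorised for readability, and the bucket boundaries in the LTTB part are one reasonable choice among several.

```python
import numpy as np


def m4_sketch(x, y, pixel_width):
    """Keep first, last, min-Y and max-Y per pixel column. x must be sorted."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if x.size == 0:
        return x, y
    span = x[-1] - x[0]
    # Map every x to a pixel column in [0, pixel_width - 1].
    scaled = (x - x[0]) / span * pixel_width if span > 0 else np.zeros_like(x)
    cols = np.minimum(scaled.astype(np.intp), pixel_width - 1)
    # Sorted x means each column is one contiguous run of indices.
    starts = np.flatnonzero(np.r_[True, cols[1:] != cols[:-1]])
    ends = np.r_[starts[1:], x.size]
    keep = []
    for s, e in zip(starts, ends):
        seg = y[s:e]
        keep += [s, s + np.argmin(seg), s + np.argmax(seg), e - 1]
    idx = np.unique(keep)  # sorted, duplicates removed
    return x[idx], y[idx]


def lttb_sketch(x, y, threshold):
    """Largest-Triangle-Three-Buckets; returns the input if no thinning applies."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    if threshold >= n or threshold < 3:
        return x, y
    # threshold - 2 buckets over the interior points; endpoints are always kept.
    edges = np.linspace(1, n - 1, threshold - 1).astype(np.intp)
    picked = np.empty(threshold, dtype=np.intp)
    picked[0], picked[-1] = 0, n - 1
    a = 0  # index of the most recently kept point
    for i in range(threshold - 2):
        lo, hi = edges[i], edges[i + 1]
        next_hi = edges[i + 2] if i + 2 < edges.size else n
        cx, cy = x[hi:next_hi].mean(), y[hi:next_hi].mean()
        # Twice the triangle area for every candidate in the bucket at once.
        area = np.abs((x[a] - cx) * (y[lo:hi] - y[a])
                      - (x[a] - x[lo:hi]) * (cy - y[a]))
        a = lo + int(np.argmax(area))
        picked[i + 1] = a
    return x[picked], y[picked]


x = np.linspace(0.0, 1.0, 100_000)
y = np.sin(x * 100.0)
x_m4, y_m4 = m4_sketch(x, y, pixel_width=800)    # at most 4 * 800 points
x_lt, y_lt = lttb_sketch(x, y, threshold=2_000)  # 2 000 points
```

With 800 pixel columns the M4 pass can return at most 3 200 points, which is already below the default AUTO_THRESHOLD, so in a pipeline built this way the LTTB stage would only run for wider canvases or lower thresholds.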
2-D Scatter: Voxel grid thinning
Scatter data has no ordering assumption, so LTTB and M4 do not apply.
GlyphX divides the bounding box into a ceil(sqrt(max_points))² grid and
keeps the point nearest to each occupied cell centroid. This preserves the
spatial distribution of the cloud. The nearest-centroid selection uses
np.minimum.reduceat after a single sort by cell ID — no full distance
argsort.
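A minimal sketch of the grid-thinning idea follows. It is an assumption-laden illustration rather than GlyphX's implementation: the function name voxel_thin_2d_sketch is hypothetical, and it uses np.lexsort to find the nearest point per cell instead of the np.minimum.reduceat approach described above.

```python
import numpy as np


def voxel_thin_2d_sketch(x, y, max_points):
    """Keep the point closest to the centre of each occupied grid cell."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if x.size <= max_points:
        return x, y
    g = int(np.ceil(np.sqrt(max_points)))  # the grid has g x g cells

    def cell_and_offset(v):
        lo, span = v.min(), v.max() - v.min()
        pos = (v - lo) / span * g if span > 0 else np.zeros_like(v)
        idx = np.minimum(pos.astype(np.intp), g - 1)
        return idx, pos - (idx + 0.5)  # offset from the cell centre, in cell units

    ix, dx = cell_and_offset(x)
    iy, dy = cell_and_offset(y)
    cell = ix * g + iy
    dist2 = dx * dx + dy * dy
    # Sort by cell id, breaking ties by distance; the first entry of each
    # run of equal cell ids is then the point nearest that cell's centre.
    order = np.lexsort((dist2, cell))
    first = np.r_[True, np.diff(cell[order]) != 0]
    keep = np.sort(order[first])  # restore the original point order
    return x[keep], y[keep]


rng = np.random.default_rng(0)
xs, ys = rng.uniform(size=(2, 200_000))
x_thin, y_thin = voxel_thin_2d_sketch(xs, ys, max_points=5_000)
```

Because the grid side is rounded up, the output can slightly exceed max_points: for max_points=5 000 the grid is 71×71, so up to 5 041 points may survive.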
3-D Line: LTTB in screen space
For a 3-D polyline, GlyphX first projects all points through the camera using
a vectorised NumPy matrix transform, then runs LTTB on the resulting 2-D
screen coordinates. This means the camera angle determines which kinks are
visually significant — a bend that is invisible from the current viewpoint is
correctly discarded. Results are cached per (camera-state, data-fingerprint)
pair in a WeakKeyDictionary so repeated renders at the same angle are free.
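To see why the viewpoint matters, the sketch below projects the same three-point polyline from two angles and compares the triangle area that LTTB would score. It assumes a simple orthographic camera defined only by azimuth and elevation; project_sketch and triangle_area are hypothetical helpers and do not reproduce Camera3D.

```python
import numpy as np


def project_sketch(points, azimuth_deg, elevation_deg):
    """Rotate (N, 3) points into camera space and drop the depth axis."""
    az, el = np.radians([azimuth_deg, elevation_deg])
    rot_z = np.array([[np.cos(az), -np.sin(az), 0.0],
                      [np.sin(az), np.cos(az), 0.0],
                      [0.0, 0.0, 1.0]])
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(el), -np.sin(el)],
                      [0.0, np.sin(el), np.cos(el)]])
    camera = points @ (rot_x @ rot_z).T
    return camera[:, :2]  # screen x and y; depth is discarded


def triangle_area(p):
    """Area of the triangle formed by three 2-D points (shoelace formula)."""
    (ax, ay), (bx, by), (cx, cy) = p
    return 0.5 * abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay))


# A polyline whose only bend is along the z axis.
kink = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [2.0, 0.0, 0.0]])
area_front = triangle_area(project_sketch(kink, 0, 0))   # bend points at the viewer
area_side = triangle_area(project_sketch(kink, 0, 90))   # bend lies in the screen plane
```

From the front the three points are collinear on screen, so the middle point scores zero area and is a natural candidate to drop; from the side the same point scores an area of about 1 and would be kept.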
3-D Scatter: 3-D voxel grid thinning
Same principle as 2-D voxel thinning but in three dimensions. The grid is
ceil(cbrt(max_points))³. Per-point color lists are reindexed to match
the thinned output.
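The reindexing of per-point colours can be illustrated with a short sketch. It is a simplification under stated assumptions: voxel_thin_3d_sketch is a hypothetical name, and it keeps the first point found in each cell rather than the one nearest the cell centre, so that the colour bookkeeping stays in focus.

```python
import numpy as np


def voxel_thin_3d_sketch(x, y, z, colors, max_points):
    """Thin a 3-D cloud on a g x g x g grid and reindex the colour list."""
    pts = np.column_stack([x, y, z]).astype(float)
    g = int(np.ceil(np.cbrt(max_points)))
    lo = pts.min(axis=0)
    span = pts.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against flat axes
    ijk = np.minimum(((pts - lo) / span * g).astype(np.intp), g - 1)
    cell = (ijk[:, 0] * g + ijk[:, 1]) * g + ijk[:, 2]
    # Simplification: np.unique reports the first occurrence of each cell id.
    _, keep = np.unique(cell, return_index=True)
    keep.sort()  # restore the original point order
    kept_colors = [colors[i] for i in keep]  # same indices as the coordinates
    return pts[keep, 0], pts[keep, 1], pts[keep, 2], kept_colors


rng = np.random.default_rng(0)
xs, ys, zs = rng.uniform(size=(3, 50_000))
labels = [f"p{i}" for i in range(50_000)]  # stand-in for colour strings
x3, y3, z3, c3 = voxel_thin_3d_sketch(xs, ys, zs, labels, max_points=5_000)
```

The key point is that the same index array selects both the coordinates and the colours, so each surviving point keeps the colour it started with.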
3-D Surface: Grid decimation + face culling
Surface3DSeries renders a regular M×N grid of quad faces. GlyphX applies
two reductions:
Grid decimation: independent step sizes per axis (proportional to the grid aspect ratio) reduce the grid so the quad count stays below AUTO_THRESHOLD. A 1 000×10 grid is decimated mostly along the long axis, not the short one.
Face culling: after decimation, quads whose projected screen area is below MIN_FACE_AREA (default 0.5 px²) are removed before the painter’s sort. This is fully vectorised using the shoelace formula in NumPy.
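Both reductions can be sketched briefly. The step-selection rule below (grow the step on whichever axis still has more samples) is an assumption chosen to reproduce the long-axis behaviour described above, not necessarily the rule GlyphX uses; decimation_steps_sketch and quad_areas are hypothetical names.

```python
import numpy as np


def decimation_steps_sketch(rows, cols, max_faces):
    """Pick per-axis strides so that the quad count is at most max_faces."""
    def kept(n, step):
        return len(range(0, n, step))  # samples surviving a stride of `step`

    step_r = step_c = 1
    while (kept(rows, step_r) - 1) * (kept(cols, step_c) - 1) > max_faces:
        # Thin the axis that currently has more samples.
        if kept(rows, step_r) >= kept(cols, step_c):
            step_r += 1
        else:
            step_c += 1
    return step_r, step_c


def quad_areas(quads):
    """Shoelace area for an (F, 4, 2) array of projected quad corners."""
    x, y = quads[..., 0], quads[..., 1]
    cross = x * np.roll(y, -1, axis=1) - np.roll(x, -1, axis=1) * y
    return 0.5 * np.abs(cross.sum(axis=1))


steps = decimation_steps_sketch(1_000, 10, max_faces=5_000)

unit = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
quads = np.stack([unit, unit * 0.5])       # areas 1.0 and 0.25 square pixels
visible = quads[quad_areas(quads) >= 0.5]  # drop faces below the area floor
```

For the 1 000×10 grid the sketch returns a stride of 2 on the long axis and 1 on the short axis, and the culling step keeps only the first of the two example quads.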
Thresholds and defaults
| Constant | Default | Meaning |
|---|---|---|
| AUTO_THRESHOLD | 5 000 | Point count above which LTTB and voxel thinning activate |
| M4_THRESHOLD | 50 000 | Point count above which M4 fires as a first pass for line data |
| MIN_FACE_AREA | 0.5 px² | Minimum projected quad area below which surface faces are culled |
Per-series threshold override
Every series constructor accepts a threshold keyword argument that overrides
AUTO_THRESHOLD for that series only. threshold=None (the default) uses
the module-level AUTO_THRESHOLD.
from glyphx.series import LineSeries, ScatterSeries
from glyphx.line3d import Line3DSeries
from glyphx.scatter3d import Scatter3DSeries
from glyphx.surface3d import Surface3DSeries
# LineSeries: keep only 500 points instead of the default 5 000
ls = LineSeries(x, y, threshold=500)
# ScatterSeries: only thin when above 20 000 (looser than default)
sc = ScatterSeries(x, y, threshold=20_000)
# 3-D series
l3 = Line3DSeries(x, y, z, threshold=1_000)
s3 = Scatter3DSeries(x, y, z, threshold=2_000)
sf = Surface3DSeries(x, y, Z, threshold=100) # max faces
Inspecting downsampling after render
After to_svg() is called, each series exposes last_downsample_info — a
dict with keys algorithm, original_n, and thinned_n — or None
if no downsampling occurred.
from glyphx.scatter3d import Scatter3DSeries
from glyphx import Figure3D

# xs, ys, zs: existing 1-D coordinate arrays
s3 = Scatter3DSeries(xs, ys, zs)
fig = Figure3D()
fig.add(s3)
fig.render_svg()  # triggers to_svg internally

info = s3.last_downsample_info
if info:
    print(f"{info['algorithm']}: {info['original_n']} → {info['thinned_n']}")
    # e.g. "voxel-3D: 50000 → 4847"
The same property is available on LineSeries, ScatterSeries,
Line3DSeries, and Surface3DSeries.
SVG comments are also embedded inline whenever downsampling occurs:
<!-- glyphx: M4+LTTB downsampled 200000 -> 1631 points -->
<!-- glyphx: voxel-2D downsampled 100000 -> 5041 points -->
<!-- glyphx: grid-decimate (faces) downsampled 39601 -> 3481 points -->
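If only the SVG text is available, the comments can be read back with a regular expression. The sketch below assumes the comment format is exactly as shown in the three examples above; downsample_comments is a hypothetical helper, not part of GlyphX.

```python
import re

COMMENT_RE = re.compile(
    r"<!-- glyphx: (?P<algorithm>.+?) downsampled "
    r"(?P<original>\d+) -> (?P<thinned>\d+) points -->"
)


def downsample_comments(svg_text):
    """Return (algorithm, original_n, thinned_n) for every matching comment."""
    return [
        (m["algorithm"], int(m["original"]), int(m["thinned"]))
        for m in COMMENT_RE.finditer(svg_text)
    ]


svg = """<svg>
<!-- glyphx: M4+LTTB downsampled 200000 -> 1631 points -->
<!-- glyphx: grid-decimate (faces) downsampled 39601 -> 3481 points -->
</svg>"""
found = downsample_comments(svg)
```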
Global kill-switch
Call disable() to turn off all downsampling on the current thread, for
example when generating publication-quality SVG exports where file size is not
a concern. The kill-switch is stored in a threading.local so disabling on
one thread does not affect others.
import threading

import glyphx.downsample as ds

ds.disable()
# ... render high-fidelity charts ...
ds.enable()

# Check status
print(ds.is_enabled())  # True / False

# Thread safety: disabling in a worker thread does not affect the main thread
def worker():
    ds.disable()
    # render without downsampling on this thread only

t = threading.Thread(target=worker)
t.start()
t.join()
print(ds.is_enabled())  # still True on the main thread
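The per-thread isolation follows directly from how threading.local behaves. The stand-in below is a minimal sketch of that pattern, assuming a flag that defaults to enabled; it mirrors the function names above for readability but is not the code in glyphx.downsample.

```python
import threading

_state = threading.local()  # each thread sees its own attributes


def disable():
    _state.enabled = False


def enable():
    _state.enabled = True


def is_enabled():
    # Threads that never touched the flag fall back to the default (enabled).
    return getattr(_state, "enabled", True)


seen_in_worker = []


def worker():
    disable()
    seen_in_worker.append(is_enabled())  # False inside this thread


t = threading.Thread(target=worker)
t.start()
t.join()
main_still_enabled = is_enabled()  # the main thread never set the flag
```

One consequence of this design is that a thread pool needs disable() called inside each worker that should render at full fidelity; setting it once on the main thread does not propagate.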
Manual use of the downsampling API
All functions are importable and callable directly:
import numpy as np

from glyphx.downsample import (
    lttb,
    m4,
    maybe_downsample_line,
    voxel_thin_2d,
    voxel_thin_3d,
    lttb_3d,
    decimate_grid,
    cull_faces,
)
from glyphx.projection3d import Camera3D

# LTTB: ordered 2-D line
x = np.linspace(0, 1, 100_000)
y = np.sin(x * 100)
x_down, y_down = lttb(x, y, threshold=2_000)

# M4: pixel-aligned first pass
x_m4, y_m4 = m4(x, y, pixel_width=800)

# Two-stage pipeline (what LineSeries uses internally)
x_pipe, y_pipe = maybe_downsample_line(x, y, pixel_width=800,
                                       threshold=5_000, m4_threshold=50_000)

# Voxel thinning: 2-D scatter (unordered)
xs = np.random.uniform(0, 1, 500_000)
ys = np.random.uniform(0, 1, 500_000)
c = np.arange(500_000, dtype=np.int32)  # per-point class labels
x_thin, y_thin, c_thin = voxel_thin_2d(xs, ys, c=c, max_points=5_000)
# c_thin dtype is preserved as int32

# Voxel thinning: 3-D scatter
zs = np.random.uniform(0, 1, 500_000)
cols = [f"#{i % 0xFFFFFF:06x}" for i in range(500_000)]
x3, y3, z3, c3 = voxel_thin_3d(xs, ys, zs, colors=cols, max_points=5_000)

# LTTB in 3-D screen space
cam = Camera3D(azimuth=45, elevation=30, cx=320, cy=240, scale=200)
t = np.linspace(0, 4 * np.pi, 100_000)
lx, ly, lz = np.cos(t), np.sin(t), t / (4 * np.pi)
lx_d, ly_d, lz_d = lttb_3d(lx, ly, lz, cam, threshold=2_000)

# Grid decimation: 3-D surface
x1 = np.linspace(-3, 3, 500)
y1 = np.linspace(-3, 3, 500)
Z = np.sin(np.sqrt(x1[None, :]**2 + y1[:, None]**2))
x_dec, y_dec, Z_dec = decimate_grid(x1, y1, Z, max_faces=5_000)

# Face culling (called automatically by Surface3DSeries)
# `faces` stands for an existing collection of projected quads; building it is not shown here
faces_kept = cull_faces(faces, min_area=0.5)
Benchmark reference
The table below shows approximate wall-clock times on a typical laptop
(measured with timeit, 5 runs each). All hot paths are fully vectorised.
| Algorithm | Input | Output | Approx. time |
|---|---|---|---|
| LTTB | 500 000 pts | 5 000 pts | ~35 ms |
| M4 | 1 000 000 pts | ~3 200 pts | ~75 ms |
| M4 + LTTB pipeline | 500 000 pts | 5 000 pts | ~33 ms |
| Voxel thin 2-D | 1 000 000 pts | ~5 000 pts | ~250 ms |
| Voxel thin 3-D | 200 000 pts | ~5 000 pts | ~62 ms |
| LTTB-3D | 100 000 pts | 5 000 pts | ~29 ms |
| cull_faces (vectorised) | 50 000 faces | varies | ~53 ms |
| decimate_grid | 500×500 grid | ~70×70 grid | < 1 ms |
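Timings of this kind can be reproduced with a small timeit harness. The sketch below is an assumption about the methodology (best of 5 single calls; the source only says "5 runs each"), and it times np.sort as a stand-in workload so that it runs without GlyphX installed; best_ms is a hypothetical helper.

```python
import timeit

import numpy as np


def best_ms(func, runs=5):
    """Best wall-clock time of `func` over `runs` single calls, in milliseconds."""
    return min(timeit.repeat(func, number=1, repeat=runs)) * 1_000.0


data = np.random.default_rng(0).uniform(size=1_000_000)
elapsed = best_ms(lambda: np.sort(data))
print(f"np.sort on 1 000 000 floats: {elapsed:.1f} ms")
```

To time one of the downsampling functions instead, replace the lambda with a call such as the lttb example from the manual-use section.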
Running the test suite
The downsampling module ships with a self-contained test suite that covers both correctness and performance. It requires only the stdlib and NumPy:
# Full suite — correctness tests + speed benchmarks
python glyphx/test_downsample.py
# Correctness tests only (faster)
python glyphx/test_downsample.py --fast
The suite covers 47 cases across 8 test classes:
Output length, first/last preservation, peak retention, monotonicity
Empty inputs, length mismatch errors, dtype preservation
Spatial coverage (voxel thinning)
Cache hit/miss (LTTB-3D)
Thread safety (kill-switch isolation)
Deprecation warning (legacy wrapper)
Performance ceiling (50 000 faces culled in under 500 ms)