
Project Case Study

Riverbreak

A browser-first system for finding typographic rivers in justified text, then using those detections to compare candidate layouts and choose better line breaks.

Typographic rivers are the vertical or diagonal whitespace channels that appear when justification stretches inter-word gaps until they align into accidental paths. They are easy for a reader to notice and hard for a layout engine to reason about because they emerge across lines, not inside a single word or sentence.

I built Riverbreak as a complete research-to-demo loop: a Knuth-Plass-style browser line breaker, a real-time geometric detector, synthetic weak-label data generation, a small U-Net segmentation model, hand-labeled validation and test sets, ONNX browser inference, and a detector-guided reranker that compares baseline and optimized paragraph layouts.

  • Python
  • PyTorch
  • Small U-Net
  • ONNX Runtime
  • JavaScript
  • HTML Canvas
[Image: Riverbreak project card showing justified text with detected whitespace rivers]
  • 0.5018: test Dice for the focal-loss SmallUNet, up from 0.4182 for the heuristic baseline
  • 24 + 24: hand-labeled validation and test images kept separate from synthetic weak-label training data
  • ONNX: a local browser model runs live neural overlays on the rendered paragraph canvas
  • Reranker: candidate layouts are scored by classical demerits plus river severity

Problem

Justified text layout is usually optimized around line badness: avoid awkward spacing, avoid extreme stretch, and choose a visually balanced set of line breaks. That misses a cross-line failure mode. A paragraph can look acceptable line by line while repeated whitespace gaps align into a visible river.

Riverbreak asks a practical question: can a detector make this failure mode inspectable enough to influence layout decisions? The project is not only a model. It is a toolchain for rendering paragraphs, detecting rivers, comparing detectors, and feeding the signal back into line-breaking choices.

Product goal

Give a user an interactive way to see where rivers form, adjust layout settings, and compare baseline text against river-aware alternatives.

Engineering goal

Connect classic layout logic, computer vision, annotated evaluation, and browser inference without hiding which parts are proven and which parts are still prototype-grade.

System Shape

I split the project into four surfaces so each part could be tested and explained independently: the browser layout demo, the heuristic detector, the neural detector, and the reranking prototype.

Browser line breaker

The demo implements a Knuth-Plass-style paragraph layout simulator with beam search. It renders justified lines with explicit per-line spacing, which makes gap geometry available for inspection instead of leaving it implicit in browser text layout.
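To make the scoring concrete, here is a minimal Python sketch of the per-line badness and demerits a Knuth-Plass-style breaker accumulates. The demo itself is JavaScript, and the constants and helper names here are illustrative, not values taken from the project.

```python
# Illustrative sketch of Knuth-Plass-style line scoring. IDEAL_SPACE and
# LINE_PENALTY are made-up constants; the demo's JavaScript has its own.

IDEAL_SPACE = 4.0    # target inter-word gap, in arbitrary units
LINE_PENALTY = 10.0  # flat cost added to every line


def line_badness(word_widths: list[float], line_width: float) -> float:
    """TeX-style badness: the cube of the relative spacing deviation."""
    gaps = len(word_widths) - 1
    if gaps <= 0:
        return 0.0
    slack = line_width - sum(word_widths)  # whitespace to distribute
    ratio = (slack / gaps - IDEAL_SPACE) / IDEAL_SPACE
    return 100.0 * abs(ratio) ** 3


def line_demerits(word_widths: list[float], line_width: float) -> float:
    """Squared so that one very loose line outweighs several mild ones."""
    return (LINE_PENALTY + line_badness(word_widths, line_width)) ** 2
```

Beam search then keeps only the lowest-demerit partial break sequences at each step, which is what keeps the simulator responsive.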

Heuristic detector

The first detector traces inter-line whitespace corridors using gap-center alignment, diagonal drift tolerance, chain length, and confidence scoring. It is fast enough to update interactively while users change width, tolerance, and river-weight settings.
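The core idea fits in a short sketch. This Python version is illustrative only (the demo's detector is JavaScript and layers confidence and severity scoring on top): chains of gap centers are extended line by line while they stay within a horizontal drift tolerance.

```python
# Illustrative gap-chaining sketch. DRIFT_TOL and MIN_CHAIN are made-up
# values; each inner list holds the x-centers of one line's expanded gaps.

DRIFT_TOL = 6.0  # max horizontal drift (px) between adjacent lines
MIN_CHAIN = 3    # a river needs gaps aligned across at least this many lines


def chain_gaps(gap_centers_per_line: list[list[float]]) -> list[list[float]]:
    """Greedily extend chains of gap centers down neighboring lines."""
    finished: list[list[float]] = []
    open_chains: list[list[float]] = []
    for centers in gap_centers_per_line:
        used: set[int] = set()
        next_open: list[list[float]] = []
        for chain in open_chains:
            # Nearest unused gap on this line within the drift tolerance.
            candidates = [
                (abs(c - chain[-1]), i)
                for i, c in enumerate(centers)
                if i not in used and abs(c - chain[-1]) <= DRIFT_TOL
            ]
            if candidates:
                _, i = min(candidates)
                used.add(i)
                next_open.append(chain + [centers[i]])
            elif len(chain) >= MIN_CHAIN:
                finished.append(chain)  # chain ended; keep it if long enough
        # Gaps that continued no chain start fresh chains for the next line.
        next_open += [[c] for i, c in enumerate(centers) if i not in used]
        open_chains = next_open
    finished += [c for c in open_chains if len(c) >= MIN_CHAIN]
    return finished
```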

Neural detector

The ML path uses a small U-Net trained as a binary segmentation model. It takes a 256 x 256 grayscale paragraph image and returns a per-pixel probability map for river-like regions.
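The shape contract is easy to pin down in code. The model below is not the repo's SmallUNet, just a minimal one-level stand-in with the same input and output contract:

```python
import torch
from torch import nn


class TinyUNet(nn.Module):
    """Stand-in for the project's SmallUNet: one encoder/decoder level,
    same 1-channel 256 x 256 input, per-pixel logits out."""

    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        skip = self.enc(x)                           # 16 x 256 x 256
        mid = self.mid(self.down(skip))              # 32 x 128 x 128
        up = self.up(mid)                            # 16 x 256 x 256
        return self.head(torch.cat([up, skip], 1))   # 1 x 256 x 256 logits


model = TinyUNet().eval()
image = torch.rand(1, 1, 256, 256)  # one grayscale paragraph render
with torch.no_grad():
    probs = torch.sigmoid(model(image))  # per-pixel river probability
assert probs.shape == (1, 1, 256, 256)
```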

Detector-guided reranker

The reranker generates bounded candidate layouts, scores each one with a combined classical and river-severity objective, then shows baseline and optimized paragraphs side by side.

Heuristic First

I started with geometry because the browser demo needed instant feedback. A model that takes a second to run is useful for analysis, but it is the wrong default for dragging a slider. The heuristic detector gave the project a fast baseline and a way to generate weak labels for early model training.

  • Rendered layout as data: the line breaker knows where every word and expanded space sits, so river detection can operate on explicit gap positions instead of screenshots alone.
  • Corridor-style matching: the detector looks for gap chains that stay aligned vertically or diagonally across neighboring lines, then assigns severity based on chain strength and count.
  • Fast interaction loop: heuristic overlays update in real time, so users can explore how paragraph width and line-breaking penalties affect river formation.
  • Honest baseline: the heuristic is kept in the demo and benchmark reports even after adding the neural model, making progress visible instead of replacing the baseline silently.

Training Pipeline

The neural detector is intentionally small and reproducible. I trained a focal-loss SmallUNet on 128 synthetic weak-label paragraphs, then evaluated it against separate hand-labeled validation and test images. The training signal is imperfect by design, so the annotated split is the evidence that matters.

Input and target

Each sample is normalized to a 256 x 256 grayscale paragraph render. The model outputs a probability map, allowing overlay visualization and pixel-level comparison against masks.

Loss choice

Focal loss fits the task because positive river pixels are sparse. The model needs pressure on the difficult minority class instead of optimizing mostly for background.
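For reference, a standard binary focal loss in PyTorch looks like the sketch below; the alpha and gamma values are the common defaults from the focal-loss literature, not necessarily the ones this project trains with.

```python
import torch
import torch.nn.functional as F


def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy background pixels so the sparse
    river pixels dominate the gradient. alpha and gamma are illustrative."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)  # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()
```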

Data boundary

Synthetic weak labels are used for training. Hand-labeled validation and test sets are kept separate so evaluation is not just measuring agreement with the heuristic label generator.

Tracked artifacts

The repo tracks annotations, the selected benchmark report, browser samples, and the ONNX model. Large PyTorch checkpoints and generated synthetic data are regenerated locally.

Evaluation

The current best checkpoint is a focal-loss SmallUNet trained on synthetic weak labels. Threshold selection is done on validation data and then applied unchanged to the test split, so the test numbers are not tuned after the fact.
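The selection procedure is worth pinning down. A sketch, with an assumed threshold grid (the project's actual sweep is not documented here):

```python
import numpy as np


def dice(pred: np.ndarray, mask: np.ndarray) -> float:
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, mask).sum()
    denom = pred.sum() + mask.sum()
    return 2.0 * inter / denom if denom else 1.0


def pick_threshold(val_probs, val_masks, grid=np.linspace(0.1, 0.9, 17)):
    """Choose the cutoff that maximizes mean validation Dice; the same
    cutoff is then applied unchanged to the test split."""
    def mean_dice(t):
        return np.mean([dice(p >= t, m) for p, m in zip(val_probs, val_masks)])
    return max(grid, key=mean_dice)
```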

Best test result: neural focal model Dice 0.5018 and IoU 0.3349, compared with heuristic baseline Dice 0.4182 and IoU 0.2644. On the same test split, neural precision is 0.4177 and recall is 0.6282.

  • Validation split: heuristic Dice 0.4150 and IoU 0.2618; neural Dice 0.5069 and IoU 0.3395.
  • Test split: heuristic Dice 0.4182 and IoU 0.2644; neural Dice 0.5018 and IoU 0.3349.
  • Interpretation: the benchmark is deliberately small, so I treat the numbers as directional evidence and failure-analysis support rather than a final statistical claim.
  • Practical value: the model improves agreement with human labels while keeping qualitative outputs inspectable through overlays and contact sheets.

Browser Inference

I exported the focal checkpoint to a single ONNX model and wired it into the browser as an optional live neural overlay. The goal was not to replace the heuristic toggle. It was to make the model inspectable on the exact paragraph the user is viewing.
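The export itself is a few lines of PyTorch. The trained checkpoint is replaced here by a trivial stand-in module so the snippet runs on its own; the tensor names and opset are assumptions, and only the 1 x 1 x 256 x 256 input contract comes from the project.

```python
import torch

# Trivial stand-in for the trained focal checkpoint; the real export
# would load the SmallUNet weights instead.
model = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1).eval()

dummy = torch.rand(1, 1, 256, 256)  # grayscale paragraph render
torch.onnx.export(
    model,
    dummy,
    "river_detector.onnx",
    input_names=["image"],   # assumed tensor names
    output_names=["logits"],
    opset_version=17,        # assumed opset
)
```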

  • Lazy runtime loading: the demo loads ONNX Runtime Web from a CDN only when the user turns on the live neural toggle, keeping the default demo lightweight.
  • Local model execution: the browser fetches the local web/models/river_detector.onnx model and runs CPU WASM inference against a rasterized 256 x 256 paragraph canvas (the same file can be checked offline, as sketched after this list).
  • Fresh overlays: each layout change can trigger a new forward pass, and the returned probability map is painted as a pink overlay on top of the rendered paragraph.
  • Tradeoff made visible: the model file is roughly 30 MB and recurring inference can take hundreds of milliseconds to seconds, so live neural remains opt-in while heuristic overlays stay instant.
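The browser path runs through ONNX Runtime Web, but the same exported file can be sanity-checked offline with the Python onnxruntime package. The model path matches the file the demo fetches; the rest of the snippet is illustrative.

```python
import numpy as np
import onnxruntime as ort

# Exercise the published model outside the browser with the CPU provider.
session = ort.InferenceSession(
    "web/models/river_detector.onnx",
    providers=["CPUExecutionProvider"],
)
image = np.random.rand(1, 1, 256, 256).astype(np.float32)
(input_name,) = [i.name for i in session.get_inputs()]
(output,) = session.run(None, {input_name: image})
print(output.shape)  # expect (1, 1, 256, 256): one value per pixel
```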

Reranking

The most interesting product step was turning detection into action. The reranker does not rewrite the line breaker. It generates a small set of bounded candidate layouts around the current settings, scores them, and picks the best candidate for side-by-side comparison.

Scoring objective: classical_demerits + alpha * river_severity. Classical demerits come from the line breaker. River severity currently comes from the geometry detector, not from a live U-Net call.
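As a sketch, with hypothetical names and an illustrative alpha (the demo computes both terms in JavaScript):

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    demerits: float        # classical line-breaker demerits, precomputed
    river_severity: float  # geometry-detector severity, precomputed


ALPHA = 1.0  # weight on river severity; the real value is a tuning choice


def score(c: Candidate) -> float:
    """The combined objective: classical demerits plus weighted severity."""
    return c.demerits + ALPHA * c.river_severity


def rerank(candidates: list[Candidate]) -> Candidate:
    """Return the candidate layout with the lowest combined score."""
    return min(candidates, key=score)


# A slightly worse classical layout wins if it removes a strong river.
best = rerank([Candidate(100.0, 8.0), Candidate(104.0, 1.0)])
```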

  • Candidate generation: variants include classical, light, current, firm, aggressive, and deeper-lookback settings so the comparison stays fast and understandable.
  • Delta-focused UI: the demo shows baseline and optimized layouts side by side, with changed lines flagged so the user can inspect what the detector actually influenced.
  • Why a surrogate: running six live U-Net forward passes per rerank would make the browser interaction sluggish. The geometry surrogate is cheap and aligned with the same river-chain structures used for weak-label training.
  • Migration path: the scoring function is isolated so future work can replace the surrogate with a rasterize-and-run-ONNX path without redesigning the UI.

What This Project Shows

Riverbreak is the project on my portfolio that best shows how I connect product demos, algorithms, and ML evaluation without treating any one layer as the whole system.

Algorithmic systems

The line-breaking simulator, beam search, layout demerits, and candidate reranking show comfort working with algorithmic product behavior, not only UI presentation.

Applied ML judgment

The model is scoped to a concrete inspection task, measured against hand-labeled data, and documented with caveats instead of oversized claims.

Browser delivery

ONNX export, lazy runtime loading, canvas rasterization, and local overlays turn the model into something a user can actually try in a browser.

Research hygiene

The repo separates tracked evidence from regenerable artifacts, documents benchmark limits, and keeps lightweight verification checks for published files.

Limitations and Next Work

The project is intentionally transparent about what is done and what still needs work. That matters because typography quality is subjective, and the current benchmark is small.

  • Expand annotation: increase the hand-labeled validation and test sets before treating the metrics as strong statistical evidence.
  • Quantize the browser model: reduce the roughly 30 MB ONNX payload so live neural inference feels less expensive on first use.
  • Use neural scoring in reranking: replace the geometry surrogate with live model probability mass once performance is good enough for multiple candidates.
  • Broaden typography coverage: evaluate more fonts, widths, paragraph lengths, and browser rendering differences so the detector is less tied to the synthetic training distribution.