Understanding Comparison Algorithms
Diffing is the process of identifying the differences between two data sets. While it seems simple to the human eye, teaching a computer to find the "shortest edit script" for text or detect visual changes in images is a complex computer science problem.
The Myers Difference Algorithm
At the heart of most diff tools, including diff.quest and Git, lies the Myers Diff Algorithm. Published by Eugene W. Myers in 1986, this algorithm finds the shortest edit script between two sequences, which is equivalent to finding their longest common subsequence (LCS).
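It works by tracing a path through a grid of edits that transforms String A into String B with the minimum number of additions and deletions. It runs in $O(ND)$ time, where $N$ is the combined length of the two inputs and $D$ is the size of the minimal edit script, so it is fastest when the inputs are similar.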
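The greedy search described above can be sketched in a few lines. This is a minimal illustration of the published algorithm's forward pass, not diff.quest's actual code; the function name `myers_edit_distance` is an assumption made for this example, and it returns only the length of the shortest edit script rather than the script itself.

```python
def myers_edit_distance(a, b):
    """Length D of the shortest edit script (insertions + deletions) from a to b."""
    n, m = len(a), len(b)
    max_d = n + m
    # v[k] is the furthest x reached so far on diagonal k, where k = x - y.
    v = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # move down: insert an element of b
            else:
                x = v[k - 1] + 1    # move right: delete an element of a
            y = x - k
            # Follow the "snake": matching elements cost nothing.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return max_d
```

For the example pair used in Myers's paper, `myers_edit_distance("ABCABBA", "CBABAC")` returns 5. A full diff tool would also record each round of `v` so the edit path can be recovered by backtracking.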
Adaptive Granularity: Text vs. Data
One size does not fit all when it comes to comparison. diff.quest now employs an adaptive strategy based on your input type:
1. Granular Character Diff (Prose)
When you select the Text tool, we assume you are comparing prose, letters, or unstructured documents.
- Algorithm: Character-level Myers Diff.
- Behavior: Every single character change is highlighted. If you fix a typo by changing "teh" to "the", we highlight only the letters that changed.
- Goal: Precision for human-readable text.
2. Token-Based Diff (Code & Config)
When you select JSON, YAML, XML, or TOML, strict character diffing often creates a "ransom note" effect—where highlighting individual quotes or brackets makes the output unreadable.
- Algorithm: Token-based Diff.
- Behavior: We break your code into semantic "tokens" (words, numbers, symbols). If a variable name changes, we highlight the whole word, not just the differing letters.
- Goal: Readability for code and structured data.
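The difference between the two granularities can be illustrated with a short sketch. It makes several assumptions: the tokenizer regex and the names `tokenize` and `changed_spans` are hypothetical, and Python's `difflib.SequenceMatcher` stands in for the diff engine (it is not a Myers implementation, but it shows the same effect).

```python
import difflib
import re

# Hypothetical tokenizer: runs of word characters, runs of whitespace,
# or a single symbol such as a quote or bracket.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")

def tokenize(text):
    return TOKEN_RE.findall(text)

def changed_spans(old, new, unit="token"):
    """Return (tag, old_text, new_text) for every non-equal region."""
    a = tokenize(old) if unit == "token" else list(old)
    b = tokenize(new) if unit == "token" else list(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            spans.append((tag, "".join(a[i1:i2]), "".join(b[j1:j2])))
    return spans
```

Comparing `'"userName": 1'` with `'"userId": 1'` in token mode reports the whole word `userName` replaced by `userId`, while passing `unit="char"` reports only `Name` replaced by `Id`.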
View Modes: Split vs. Unified
We offer two distinct ways to visualize changes:
- Split View: A side-by-side comparison. This view uses our adaptive highlighting (character vs. token) to show you exactly what changed within a line. It is best for detailed analysis.
- Unified View: A linear representation standard in command-line tools (like `git diff`). To maintain high scannability, this view focuses purely on Line Differences. It shows which lines were added or removed without the visual noise of intra-line highlighting.
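A line-only unified view of this kind can be approximated with Python's standard library. This is a sketch for illustration; the function name `unified_view` and the file labels are assumptions, and diff.quest's own rendering may differ.

```python
import difflib

def unified_view(old_text, new_text):
    """Return unified-diff lines: '-' removed, '+' added, ' ' unchanged context."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="original",   # illustrative labels
            tofile="modified",
            lineterm="",
        )
    )
```

Each changed line appears twice, once prefixed with `-` and once with `+`, with no highlighting inside the line, which is what keeps the output easy to scan.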
Structured Data Normalization
For formats like JSON or YAML, diff.quest "normalizes" your input before comparing:
- Parsing: We validate and parse the structure.
- Sorting: Keys are sorted alphabetically to ensure that `{"a": 1, "b": 2}` matches `{"b": 2, "a": 1}`.
- Formatting: The data is reprinted with consistent indentation.
This ensures that you only see meaningful data changes, ignoring irrelevant formatting differences.
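For JSON, the parse, sort, and reprint steps can be sketched with the standard `json` module. The function name `normalize_json` and the two-space indent are assumptions for this example, not a description of diff.quest's implementation.

```python
import json

def normalize_json(text):
    """Parse, sort keys, and reprint with consistent indentation."""
    data = json.loads(text)   # raises json.JSONDecodeError on invalid input
    return json.dumps(data, sort_keys=True, indent=2)
```

After normalization, `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` produce identical text, so a subsequent diff reports no changes. Note that `sort_keys=True` sorts keys at every nesting level but leaves array order untouched, since array order is meaningful in JSON.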
Image Comparison: Perceptual Hashing
Unlike text, where differences are discrete characters or tokens, images contain millions of pixels. diff.quest uses multiple techniques to detect and visualize image differences:
Perceptual Hash (pHash)
To measure overall similarity between images, we use a DCT-based perceptual hash algorithm:
- Resize: Scale the image down to 32×32 pixels to reduce complexity.
- Grayscale: Convert to grayscale to focus on structure, not color.
- DCT Transform: Apply Discrete Cosine Transform to extract frequency information.
- Hash Generation: Extract the 8×8 low-frequency components and compare against the median to generate a 64-bit binary hash.
- Similarity Score: Calculate Hamming distance between hashes to produce a 0-100% similarity score.
pHash is resilient to minor changes like compression, resizing, or slight color adjustments, making it ideal for detecting whether two images are "perceptually similar."
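The hashing steps above can be sketched in pure Python. This sketch assumes the image has already been resized to 32×32 and converted to grayscale (a list of 32 rows of 32 numbers), uses an unnormalized DCT-II for brevity, and uses the hypothetical names `phash` and `similarity`; production implementations often differ in scaling and in how they treat the DC coefficient.

```python
import math
import statistics

SIZE = 32   # side length of the downscaled grayscale image
LOW = 8     # side length of the low-frequency block kept for the hash

# Cosine table for a 1-D DCT-II, restricted to the LOW lowest frequencies.
COS = [
    [math.cos((2 * i + 1) * u * math.pi / (2 * SIZE)) for i in range(SIZE)]
    for u in range(LOW)
]

def phash(gray):
    """gray: SIZE x SIZE grayscale values. Returns a 64-bit integer hash."""
    coeffs = []
    for u in range(LOW):            # vertical frequency
        for v in range(LOW):        # horizontal frequency
            total = 0.0
            for y in range(SIZE):
                row, cy = gray[y], COS[u][y]
                for x in range(SIZE):
                    total += row[x] * cy * COS[v][x]
            coeffs.append(total)
    median = statistics.median(coeffs)
    bits = 0
    for c in coeffs:                # one bit per coefficient: above median or not
        bits = (bits << 1) | (1 if c > median else 0)
    return bits

def similarity(hash_a, hash_b):
    """Map Hamming distance between two 64-bit hashes to a 0-100 score."""
    distance = bin(hash_a ^ hash_b).count("1")
    return 100.0 * (1 - distance / (LOW * LOW))
```

Two identical images hash to the same value and score 100. Because each bit only records whether a frequency component is above or below the median, small pixel-level noise tends to leave most bits unchanged.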
Pixel-Level Comparison
For detailed visual analysis, we also support pixel-by-pixel comparison:
- Highlight Mode: Differing pixels are highlighted in red, with a configurable threshold to ignore minor variations.
- Subtract Mode: Shows the absolute RGB difference for each pixel, creating a heatmap of changes.
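Both modes can be sketched on images represented as flat lists of `(r, g, b)` tuples of equal length. The names `subtract` and `highlight`, and the choice to compare the largest per-channel difference against the threshold, are assumptions for this example; the source does not specify how the threshold is applied.

```python
RED = (255, 0, 0)

def subtract(img_a, img_b):
    """Absolute per-channel RGB difference for each pixel."""
    return [
        tuple(abs(ca - cb) for ca, cb in zip(pa, pb))
        for pa, pb in zip(img_a, img_b)
    ]

def highlight(img_a, img_b, threshold=0):
    """Paint a pixel red when its largest channel difference exceeds threshold."""
    out = []
    for pa, pb in zip(img_a, img_b):
        delta = max(abs(ca - cb) for ca, cb in zip(pa, pb))
        out.append(RED if delta > threshold else pb)
    return out
```

Raising `threshold` suppresses small differences such as compression artifacts, at the cost of hiding genuinely subtle edits.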
Image View Modes
We offer five visualization modes for image comparison:
- Split View: Classic side-by-side comparison (default).
- Fade View: Overlay images with an opacity slider to fade between them.
- Slider View: Interactive swipe divider for before/after comparison.
- Highlight View: Marks differing pixels in red.
- Subtract View: Absolute pixel difference visualization.
Privacy-First Architecture
All processing on diff.quest—whether text parsing, diffing, or image analysis—happens entirely in your browser using Web Workers. Your data never touches our servers. This ensures complete privacy while maintaining high performance.