Foundations of Computer Vision

Computational Imaging & Graph-Based Vision — Dr. Ofer Miller

Research Perspective

Miller Perspective

Human vision naturally interprets and understands the surrounding world as a three-dimensional (3D) environment. In contrast, most common visual sensors, such as cameras, still capture only two-dimensional (2D) projections of this world. During the projection from 3D to 2D, a significant amount of information, particularly depth information, is lost.

While humans can effortlessly interpret the dynamic structure of 2D image sequences, achieving comparable understanding computationally is challenging, especially when relying on a single visual sensor. Although multiple sensors can facilitate 3D reconstruction, many multimedia applications rely on a single-sensor setup, making the absence of explicit 3D information one of the fundamental challenges in computer vision.

Consequently, the effective use of spatial and temporal cues becomes essential for understanding the dynamic structure of scenes captured with a single camera.

Complexity Note

Graph-Based Algorithmic Structure

In much of the published research listed below, I rely on graph-based data structures and their associated algorithms to obtain efficient implementations. These include breadth-first search (BFS), depth-first search (DFS), graph contraction, minimum spanning tree (MST), and finding the k shortest paths between two vertices.

This provides a set of related constructive algorithms for high-level processing with linear, almost-linear, and polynomial time complexity. The linear and almost-linear algorithms run in time proportional, or nearly proportional, to the image size N.

Some algorithms have polynomial time complexity, but their cost depends on the number of arcs E in the image segmentation boundaries rather than on the image size. The complexity is then O(E × E) rather than O(N × N), where E << N. For illustration, with N = 10^6 and E = 10^3, E × E is 10^6 while N × N is 10^12.
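As a minimal sketch of a linear-time graph traversal on an image, the following labels 4-connected regions of equal pixel value with BFS. The function name `label_components`, the toy `image`, and the choice of 4-connectivity are assumptions made for illustration; they are not taken from the published algorithms.

```python
from collections import deque


def label_components(grid):
    """Label 4-connected regions of equal value using BFS.

    Every pixel is enqueued exactly once and its four neighbours are
    inspected once, so the running time is linear in the pixel count N.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue  # already reached from an earlier seed
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and grid[ny][nx] == grid[y][x]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels, next_label


# Hypothetical 3x3 image with three uniform regions.
image = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 1],
]
labels, count = label_components(image)
print(count, labels)
```

Later stages that work on the region boundaries, rather than on every pixel, are the ones whose cost depends on E instead of N.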

Research Perspective

Miller Perspective: K-means to AI

A useful starting point for understanding artificial intelligence is the K-means algorithm, a classical unsupervised machine-learning method that groups data points into k clusters according to similarity. Each data point is assigned to the cluster whose centroid is closest, and each centroid is then updated as the mean of the points assigned to it. This assignment-and-update procedure is repeated until the cluster memberships stabilize.
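The assignment-and-update procedure can be sketched in a few lines of plain Python. This is an illustrative version of the standard iteration, assuming the initial centroids are supplied by the caller; the function name `kmeans`, the parameter `max_iter`, and the sample points are hypothetical.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until memberships stabilize."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_assignment = [
            min(range(len(centroids)),
                key=lambda k: sum((p - m) ** 2
                                  for p, m in zip(point, centroids[k])))
            for point in points
        ]
        if new_assignment == assignment:
            break  # memberships did not change, so the procedure has converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
    return assignment, centroids


# Two well-separated groups of 2D points (made-up data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centroids = kmeans(points, [(0, 0), (10, 10)])
print(assignment)
print(centroids)
```

In practice the result depends on the initial centroids, which is why implementations usually try several initializations.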

Although K-means is conceptually simple, it captures a fundamental computational principle: complex data can be organized into structured groups by iteratively refining an internal representation. It is useful in applications such as customer segmentation, image analysis, and visual-data organization. Its limitations are also important: the number of clusters must be specified in advance, and the method is most effective when the underlying clusters are approximately spherical in Euclidean space.

Artificial intelligence systems learn statistical patterns from data and use those patterns to make predictions, classifications, decisions, or generative outputs. Modern deep neural networks extend this idea by optimizing large collections of parameters through iterative training. The model compares its predicted output with the desired output, computes an error, and updates its internal weights by gradient-based optimization, with the gradients computed by backpropagation.
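The predict, compare, and update loop can be illustrated on the smallest possible model, a line y = w·x + b fitted by gradient descent on the mean squared error. This is a sketch under stated assumptions, not a neural network: the function name `train_linear`, the learning rate, the epoch count, and the data are all chosen for illustration, and the gradients are written out by hand rather than produced by backpropagation.

```python
def train_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward pass: prediction minus desired output for each sample.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Update step: move the parameters against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear(xs, ys)
print(w, b)
```

A deep network follows the same loop; the difference is that the model has many layers of parameters and the gradients are obtained automatically.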

Conceptually, deep learning can be viewed as a broad generalization of the intuition embodied in K-means. While K-means partitions data in a relatively simple geometric space, neural networks learn high-dimensional feature spaces in which similar inputs are mapped to nearby regions and dissimilar inputs are separated. In this sense, AI systems perform a dynamic, learned, and highly expressive form of pattern organization, preserving the core idea of grouping similarity while expanding it toward abstraction, generalization, reasoning, and complex decision-making.

Conceptual Evolution

From Clustering to Learned Representation

The conceptual link between K-means and modern AI is not that they are the same algorithm, but that both organize data by discovering structure. K-means does this explicitly through centroids; neural networks do it implicitly through learned features, internal layers, and optimized parameters.

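K-means → Feature Space → Deep AI
K-means: Nearest-centroid grouping in a geometric space.
Representation: Data becomes organized into compact internal structure.
Deep AI: High-dimensional features are learned by optimization.
Centroids → learned feature layers
K-means minimizes:
J = \sum_i \| x_i - \mu_{c_i} \|^2
where x_i is a data point and \mu_{c_i} is the centroid of the cluster c_i to which it is assigned.

Deep learning generalizes optimization through:
\theta^* = \arg\min_\theta L(f_\theta(x), y)
where f_\theta is the model with parameters \theta and L is the loss comparing the prediction with the desired output y.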
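A short derivation shows why the K-means update step uses the mean. With the assignments c_i held fixed, setting the gradient of J with respect to one centroid to zero gives the average of the points in that cluster. The index k, denoting a single cluster, is introduced here for illustration.

```latex
\frac{\partial J}{\partial \mu_k}
  = -2 \sum_{i \,:\, c_i = k} (x_i - \mu_k) = 0
\quad\Longrightarrow\quad
\mu_k = \frac{1}{\left|\{\, i : c_i = k \,\}\right|} \sum_{i \,:\, c_i = k} x_i
```

The same idea, following the gradient of an objective, is what the deep learning formulation applies to the parameters \theta, although there no closed-form solution is available and the minimum is approached iteratively.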
Basic and Visual Terms
Visual AI in Motion

What is Computer Vision?

Computer vision enables machines to interpret visual information: capturing an image, detecting objects, extracting patterns, and turning pixels into meaningful understanding.

Figure: Human vision meets machine perception (Human Vision → Camera Input → Object Detection → AI Perception). Human vision inspires machine perception: a camera captures visual data, features are extracted, and an AI model detects meaningful objects.
Mathematical Structure in Motion

What is Graph Theory?

Graph theory studies networks made of nodes and connections. In computer vision, graphs can represent pixels, image regions, objects, relationships, motion paths, or visual structures.

A graph = vertices / nodes + edges / connections. Example: image regions can become nodes, and their visual relationships become edges.
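As a small illustration of vertices and edges, the sketch below stores an undirected graph as an adjacency map and computes each node's degree. The region names and the helper `build_graph` are hypothetical and stand in for image regions and their visual relationships.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


# Hypothetical image regions; an edge means two regions touch.
edges = [
    ("sky", "building"),
    ("sky", "tree"),
    ("building", "road"),
    ("tree", "road"),
    ("building", "tree"),
]
graph = build_graph(edges)

# The degree of a node is the number of edges incident to it.
degree = {node: len(neighbours) for node, neighbours in graph.items()}
print(degree)
```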
Computational Growth in Motion

What is Algorithm Complexity?

Algorithm complexity uses mathematical expressions to describe how computation grows: running time, memory cost, recursive behavior, input size, and scalability.

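Asymptotic analysis
Divide-and-conquer recurrence: T(n) = 2T(n/2) + Θ(n), which solves to T(n) = Θ(n log n)
Summation: Σᵢ₌₁ⁿ i = n(n+1)/2
Linear recurrence: T(n) = T(n−1) + n, which solves to T(n) = Θ(n²)
Memory complexity: S(n) = Θ(n²)
Limit: limₙ→∞ T(n) / n² = c with 0 < c < ∞, which implies T(n) = Θ(n²)
Big-O definition: O(f(n)) = { g(n) : there exist c > 0 and n₀ such that 0 ≤ g(n) ≤ c·f(n) for all n ≥ n₀ }
Together, divide-and-conquer recurrences, summations, memory complexity, limits, and Big-O definitions describe how computation scales.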
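The two recurrences can be checked numerically. The sketch below evaluates them directly and compares the results with their closed forms. It assumes the base cases T(1) = 1 and T(0) = 0, replaces the Θ(n) term with exactly n, and restricts the first recurrence to powers of two; the function names are hypothetical.

```python
def t_divide(n):
    """T(n) = 2 T(n/2) + n with T(1) = 1, for n a power of two."""
    return 1 if n == 1 else 2 * t_divide(n // 2) + n


def t_linear(n):
    """T(n) = T(n-1) + n with T(0) = 0, unrolled as a loop."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


n = 1024
log2_n = n.bit_length() - 1  # exact base-2 logarithm for a power of two

# Under these base cases the closed form is n * log2(n) + n, i.e. Θ(n log n).
print(t_divide(n), n * log2_n + n)

# The second recurrence equals the sum 1 + 2 + ... + n = n(n+1)/2, i.e. Θ(n²).
print(t_linear(100), 100 * 101 // 2)
```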
Multidimensional Intelligence Flow

What is Artificial Intelligence?

Artificial intelligence can be visualized as information flowing through multiple graph-like dimensions. Each layer transforms the signal, detects patterns, and passes learned representations forward.

Figure: Signal moving through AI layers. Panels: Input Graph, Hidden Representation, Decision Space. Stages: Input Layer, Feature Graph, Electric Signal Flow, Decision Output. AI transforms information across multiple graph-like layers: raw input becomes features, features become representations, and representations become decisions.
Research Signature

From Graph Theory to Visual Understanding

A graph-based vision model transforms visual data into structure: pixels become regions, regions become nodes, relationships become edges, and the resulting graph supports segmentation, recognition, and scene understanding.
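The step from regions to nodes and edges can be sketched as a region adjacency graph built from a label image. This is a minimal illustration, assuming a segmentation is already available as a 2D grid of region labels and that two regions are related when they share a 4-connected border; the name `region_adjacency` and the sample grid are hypothetical.

```python
def region_adjacency(labels):
    """Return the set of undirected edges between regions that touch.

    Only the right and down neighbours of each pixel are inspected, so
    every adjacent pixel pair is visited once and the cost is linear in
    the number of pixels.
    """
    rows, cols = len(labels), len(labels[0])
    edges = set()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    a, b = labels[r][c], labels[nr][nc]
                    if a != b:
                        edges.add((min(a, b), max(a, b)))
    return edges


# Hypothetical segmentation with three regions labelled 0, 1 and 2.
label_image = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 1],
]
rag_edges = region_adjacency(label_image)
print(sorted(rag_edges))
```

The resulting graph has one node per region rather than one per pixel, which is what keeps later graph algorithms small relative to the image.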

Figure: Graph-based visual intelligence. Panels: Image Regions, Graph Model, Visual Output. Stages: Pixels → Regions → Graph Structure → Visual Meaning. Image data is transformed into a graph: nodes represent regions or features, edges represent relationships, and graph structure enables visual interpretation.
Research Pipeline

From Pixels to Intelligent Vision

A unified research pipeline: raw visual data is processed, transformed into structured representations, analyzed through computational complexity, passed through AI models, and finally converted into visual understanding.

01 Image Processing

Raw pixels are filtered, enhanced, denoised, and prepared for higher-level analysis.

02 Graph Representation

Image regions, objects, or features become nodes, while visual relationships become edges.

03 Algorithm Complexity

Computation is evaluated for scalability, time growth, memory use, and efficient execution.

04 AI Model

Signals flow through learned representations, extracting patterns and forming predictions.

05 Computer Vision

The system detects, recognizes, segments, and understands the visual world.

This pipeline connects the core ideas of the site: image processing, graph theory, algorithmic efficiency, artificial intelligence, and computer vision into one coherent research flow.

Segmentation

Is it possible to segment still images "correctly" when no motion information is available?

Video

Here is the official publication link for the above article.

The published algorithm was recognized as the most cited paper.

Moving Objects Segmentation

Graph-Based Segmentation of Moving Objects Based on Connectivity Analysis of Spatio-Temporal Information

Ofer Miller: object-based changes

Here is the official publication link for the above article.


Change Detection

Identifying changes between two grayscale images taken under extremely different illumination.

Video

Here is the publication link for the above article.

Smart Tracking





Tracking occluded objects in linear time complexity

Video

Here is the publication link for the above article.

Graph Model Representation

Ultimate Face Recognition

Video

Graph Connectivity Analysis