Ofer Miller Research Page

Ofer Miller: Official Research Publications Site

Foundations of Computer Vision

Computational Imaging & Graph-Based Vision — Dr. Ofer Miller

Miller's Research Publications

Miller's Perspectives

Research Perspective

Miller Perspective

Human vision naturally interprets and understands the surrounding world as a three-dimensional (3D) environment. In contrast, most common visual sensors, such as cameras, still capture only two-dimensional (2D) projections of this world. During the projection from 3D to 2D, a significant amount of information, particularly depth information, is lost.

While humans can effortlessly interpret the dynamic structure of 2D image sequences, achieving comparable understanding computationally is challenging, especially when relying on a single visual sensor. Although multiple sensors can facilitate 3D reconstruction, many multimedia applications rely on a single-sensor setup, making the absence of explicit 3D information one of the fundamental challenges in computer vision.

Consequently, the effective use of spatial and temporal cues becomes essential for understanding the dynamic structure of scenes captured with a single camera.

Complexity Note

Graph-Based Algorithmic Structure

In much of my published research listed below, I rely on graph-based data structures and their associated algorithms, including breadth-first search (BFS), depth-first search (DFS), graph contraction, minimum spanning tree (MST), and finding the k shortest paths between two vertices, to obtain efficient implementations.

This yields a family of related constructive algorithms for high-level processing with linear, almost-linear, or polynomial time complexity. For the linear and almost-linear algorithms, the running time grows in proportion (or nearly in proportion) to the image size N.
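As an illustration of a linear-time graph algorithm on an image, the following minimal sketch labels 4-connected foreground regions of a binary image with BFS. It is not taken from the publications listed on this site; the function name `label_components` and the toy image are assumptions made for the example. Each pixel is enqueued at most once, so the work is proportional to the number of pixels.

```python
from collections import deque

def label_components(grid):
    """Label 4-connected foreground regions (value 1) of a binary image using BFS."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means background or not yet labeled
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 or labels[r][c] != 0:
                continue
            # Start a new region and flood it with a breadth-first search.
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label

# Hypothetical 3x4 binary image with two separate foreground regions.
image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
labels, count = label_components(image)
print(count)
for row in labels:
    print(row)
```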

Some algorithms have polynomial time complexity, but they operate on the E arcs of the image segmentation boundaries rather than on the N image elements. Their complexity is then O(E × E) rather than O(N × N), where E << N.
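To make the benefit of working on a small graph concrete, here is a minimal sketch of Kruskal's minimum spanning tree algorithm with a union-find structure. The four-region graph and its edge weights (imagined as dissimilarities between neighboring regions) are hypothetical, and this is only one standard way to compute an MST, not necessarily the variant used in the research. The cost depends on the number of edges in the graph, not on the number of pixels in the image.

```python
def minimum_spanning_tree(num_nodes, edges):
    """Kruskal's algorithm. edges is a list of (weight, u, v) tuples."""
    parent = list(range(num_nodes))

    def find(v):
        # Follow parent links to the set representative, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(edges):  # lightest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # the edge joins two different components
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree

# Hypothetical region graph: 4 regions, 5 weighted adjacencies.
region_edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 2, 3), (5, 1, 3)]
mst = minimum_spanning_tree(4, region_edges)
total_weight = sum(weight for weight, _, _ in mst)
print(mst, total_weight)
```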

Keywords: BFS, DFS, MST, O(E × E), E << N, Polynomial Time

Research Perspective

Miller Perspective: K-means to AI

A useful starting point for understanding artificial intelligence is the K-means algorithm, a classical unsupervised machine-learning method that groups data points into k clusters according to similarity. Each data point is assigned to the cluster whose centroid is closest, and each centroid is then updated as the mean of the points assigned to it. This assignment-and-update procedure is repeated until the cluster memberships stabilize.
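The assignment-and-update procedure can be sketched in a few lines of plain Python. This is a minimal illustration, assuming squared Euclidean distance and caller-supplied initial centroids; the function name `kmeans`, the sample points, and the starting centroids are all made up for the example.

```python
def kmeans(points, initial_centroids, max_iters=100):
    """Alternate nearest-centroid assignment and mean update until memberships stabilize."""
    centroids = [tuple(c) for c in initial_centroids]  # work on a copy
    assignment = None
    for _ in range(max_iters):
        # Assignment step: index of the closest centroid for every point.
        new_assignment = [
            min(range(len(centroids)),
                key=lambda j: sum((p - q) ** 2 for p, q in zip(point, centroids[j])))
            for point in points
        ]
        if new_assignment == assignment:  # memberships unchanged: converged
            break
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignment

# Hypothetical 2D data forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignment = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(assignment)
print(centroids)
```

The result depends on the initial centroids, which is why practical implementations usually try several initializations.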

Although K-means is conceptually simple, it captures a fundamental computational principle: complex data can be organized into structured groups by iteratively refining an internal representation. It is useful in applications such as customer segmentation, image analysis, and visual-data organization. Its limitations are also important: the number of clusters must be specified in advance, and the method is most effective when the underlying clusters are approximately spherical in Euclidean space.

Artificial intelligence systems learn statistical patterns from data and use those patterns to make predictions, classifications, decisions, or generative outputs. Modern deep neural networks extend this idea by optimizing large collections of parameters through iterative training. The model compares its predicted output with the desired output, computes an error, and updates its internal weights using gradient-based optimization, with the gradients computed by backpropagation.
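The predict, compare, and update loop can be shown on the smallest possible model: a single weight w fitted so that w·x approximates y under mean squared error. This is a deliberately simplified sketch; the data, learning rate, and function name `fit_slope` are assumptions, and with only one parameter the gradient is written out by hand rather than obtained by backpropagation through layers.

```python
def fit_slope(xs, ys, learning_rate=0.05, steps=100):
    """Fit y ≈ w * x by gradient descent on the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of (1/n) * sum((w*x - y)^2) with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad  # step against the gradient
    return w

# Hypothetical data generated by y = 2x, so the fitted slope should approach 2.
w = fit_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(w)
```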

Conceptually, deep learning can be viewed as a broad generalization of the intuition embodied in K-means. While K-means partitions data in a relatively simple geometric space, neural networks learn high-dimensional feature spaces in which similar inputs are mapped to nearby regions and dissimilar inputs are separated. In this sense, AI systems perform a dynamic, learned, and highly expressive form of pattern organization, preserving the core idea of grouping similarity while expanding it toward abstraction, generalization, reasoning, and complex decision-making.

Conceptual Evolution

From Clustering to Learned Representation

The conceptual link between K-means and modern AI is not that they are the same algorithm, but that both organize data by discovering structure. K-means does this explicitly through centroids; neural networks do it implicitly through learned features, internal layers, and optimized parameters.

K-means → Feature Space → Deep AI

K-means: Nearest-centroid grouping in a geometric space.

Representation: Data becomes organized into compact internal structure.

Deep AI: High-dimensional features are learned by optimization.

K-means minimizes:
J = Σᵢ || xᵢ − μ_cᵢ ||²

Deep learning generalizes optimization through:
θ* = argmin_θ L(f_θ(x), y)

Keywords: Unsupervised Learning, Centroids, Feature Space, Representation Learning, Backpropagation, Deep AI

Basic and Visual Terms

Visual AI in Motion

What is Computer Vision?

Computer vision enables machines to interpret visual information: capturing an image, detecting objects, extracting patterns, and turning pixels into meaningful understanding.

[Figure: Human vision meets machine perception. Stages: Human Vision, Camera Input, Object Detection, AI Perception.]

Human vision inspires machine perception: a camera captures visual data, features are extracted, and an AI model detects meaningful objects.

Mathematical Structure in Motion

What is Graph Theory?

Graph theory studies networks made of nodes and connections. In computer vision, graphs can represent pixels, image regions, objects, relationships, motion paths, or visual structures.

A graph = vertices / nodes + edges / connections. Example: image regions can become nodes, and their visual relationships become edges.
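A small sketch of this definition, assuming an undirected graph stored as an adjacency list. The region names and their relationships are hypothetical and serve only to show how nodes, edges, and node degree relate.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency

# Hypothetical image regions as nodes; an edge means the two regions touch.
edges = [
    ("sky", "tree"),
    ("sky", "building"),
    ("tree", "grass"),
    ("building", "grass"),
    ("sky", "grass"),
]
graph = build_graph(edges)

# The degree of a node is the number of edges incident to it.
degree = {node: len(neighbors) for node, neighbors in graph.items()}
print(degree)
```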

Computational Growth in Motion

What is Algorithm Complexity?

Algorithm complexity uses mathematical expressions to describe how computation grows: running time, memory cost, recursive behavior, input size, and scalability.

Asymptotic analysis

T(n) = 2T(n/2) + Θ(n)

T(n) = Θ(n log n)

Σᵢ₌₁ⁿ i = n(n+1)/2

T(n) = T(n-1) + n

S(n) = Θ(n²)

limₙ→∞ T(n) / n² = c

O(f(n)) = { g(n) : there exist c > 0 and n₀ such that 0 ≤ g(n) ≤ c·f(n) for all n ≥ n₀ }

Examples: divide-and-conquer recurrences, summations, memory complexity, limits, and Big-O definitions describe how computation scales.
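The two recurrences above can be checked numerically. The sketch below assumes the base case T(1) = 1 (the page does not state one) and, for the divide-and-conquer recurrence, takes the Θ(n) term to be exactly n and restricts n to powers of two. Under those assumptions the first recurrence evaluates to n·log₂n + n, and the second to n(n+1)/2.

```python
def divide_and_conquer_cost(n):
    """T(n) = 2T(n/2) + n with assumed base case T(1) = 1, for n a power of two."""
    if n == 1:
        return 1
    return 2 * divide_and_conquer_cost(n // 2) + n

def linear_recursion_cost(n):
    """T(n) = T(n-1) + n with assumed base case T(1) = 1, evaluated iteratively."""
    total = 1
    for i in range(2, n + 1):
        total += i
    return total

for n in (2, 8, 64, 1024):
    log2_n = n.bit_length() - 1  # exact log2 for powers of two
    print(n, divide_and_conquer_cost(n), n * log2_n + n,
          linear_recursion_cost(n), n * (n + 1) // 2)
```

In each printed row the second and third values agree, as do the fourth and fifth, which matches the Θ(n log n) and Θ(n²) growth rates.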

Multidimensional Intelligence Flow

What is Artificial Intelligence?

Artificial intelligence can be visualized as information flowing through multiple graph-like dimensions. Each layer transforms the signal, detects patterns, and passes learned representations forward.

[Figure: Signal moving through AI layers. Panels: Input Graph, Hidden Representation, Decision Space. Stages: Input Layer, Feature Graph, Electric Signal Flow, Decision Output.]

AI transforms information across multiple graph-like layers: raw input becomes features, features become representations, and representations become decisions.
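A minimal sketch of this layer-by-layer flow, assuming a tiny fully connected network with one hidden layer. The weights here are hand-picked for readability rather than learned, and the names `dense`, `relu`, and `sigmoid` are just conventional labels for the example.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One layer: weighted sum of the inputs plus bias, then a nonlinearity, per output unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical parameters: 2 inputs -> 2 hidden units -> 1 output.
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [-1.0]

x = [2.0, 1.0]                            # raw input
hidden = dense(x, W1, b1, relu)           # hidden representation
output = dense(hidden, W2, b2, sigmoid)   # decision score between 0 and 1
print(hidden, output)
```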

Research Signature

From Graph Theory to Visual Understanding

A graph-based vision model transforms visual data into structure: pixels become regions, regions become nodes, relationships become edges, and the resulting graph supports segmentation, recognition, and scene understanding.

[Figure: Graph-based visual intelligence. Panels: Image Regions, Graph Model, Visual Output. Stages: Pixels, Regions, Graph Structure, Visual Meaning.]

Image data is transformed into a graph: nodes represent regions or features, edges represent relationships, and graph structure enables visual interpretation.
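One way to sketch the step from regions to a graph is a region adjacency graph built from a segmentation label image: every distinct label becomes a node, and two labels are connected when their regions share a pixel boundary. The function name `region_adjacency` and the toy segmentation are assumptions for illustration, and 4-connectivity is assumed.

```python
def region_adjacency(labels):
    """Return the set of undirected edges between regions that touch (4-connectivity)."""
    rows, cols = len(labels), len(labels[0])
    edges = set()
    for r in range(rows):
        for c in range(cols):
            # Checking only the right and lower neighbors visits each pixel pair once.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols and labels[r][c] != labels[nr][nc]:
                    a, b = labels[r][c], labels[nr][nc]
                    edges.add((min(a, b), max(a, b)))
    return edges

# Hypothetical segmentation of a 3x4 image into three regions.
segmentation = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 3, 2],
]
nodes = {label for row in segmentation for label in row}
edges = region_adjacency(segmentation)
print(sorted(nodes), sorted(edges))
```

Here 12 pixels reduce to 3 nodes and 3 edges, so later graph algorithms run on a much smaller structure than the pixel grid.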

Research Pipeline

From Pixels to Intelligent Vision

A unified research pipeline: raw visual data is processed, transformed into structured representations, analyzed through computational complexity, passed through AI models, and finally converted into visual understanding.

Image Processing

Raw pixels are filtered, enhanced, denoised, and prepared for higher-level analysis.

Graph Representation

Image regions, objects, or features become nodes, while visual relationships become edges.

Algorithm Complexity

Computation is evaluated for scalability, time growth, memory use, and efficient execution.

AI Model

Signals flow through learned representations, extracting patterns and forming predictions.

Computer Vision

The system detects, recognizes, segments, and understands the visual world.

This pipeline connects the core ideas of the site (image processing, graph theory, algorithmic efficiency, artificial intelligence, and computer vision) into one coherent research flow.