Image Segmentation w/ EM Algorithm
(2016)


Overview

It is possible to segment images using a clustering method, where each segment is the cluster center to which a pixel belongs. Images are represented by their R, G, and B values. In this implementation, image pixels are clustered by applying an Expectation Maximization (EM) algorithm to a mixture-of-normal-distributions model. The image is then segmented by mapping each pixel to the cluster center with the highest posterior probability for that pixel. The results of segmentation are easy to display as simple images, where each pixel's color is replaced with the mean color of its segment. This is the same basic technique used in some image compression, since storage only requires mapping pixels to a limited set of colors (the cluster centers).
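As a rough illustration of this final mapping step, here is a minimal sketch in MATLAB (the variable names are hypothetical: img is the H-by-W-by-3 input image, mu holds the K cluster mean colors, and w holds the per-pixel posterior probabilities):

% Sketch: replace every pixel's color with the mean color of its most
% probable cluster (hypothetical variable names, colors on a 0-255 scale).
[H, W, ~] = size(img);                % img: H-by-W-by-3 input image
[~, idx]  = max(w, [], 2);            % w: (H*W)-by-K posterior probabilities
segColors = mu(idx, :);               % mu: K-by-3 cluster mean colors
segmented = reshape(segColors, H, W, 3);
imshow(uint8(segmented));             % display the segmented image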

EM Algorithm (Concept)

A normal distribution is capable of modelling a single blob of data points, but a single normal distribution will not produce a good model if there are multiple blobs of data points. The multivariate normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. It is often used to describe a set of (possibly) correlated components (random variables), each of which should cluster around a mean value. With the help of an affine transformation, any multivariate Gaussian can be converted to have zero mean and independent components. Therefore, under the right transformation, any Gaussian can be represented as a product of one-dimensional normal distributions (each with zero mean and unit standard deviation). This suggests a way to simulate multivariate normal distributions: simply simulate a standard normal distribution for each component and apply the appropriate transformation afterwards.
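As a small sketch of that simulation idea (the mean and covariance below are made-up example values):

% Sketch: sample from a multivariate Gaussian by transforming independent
% standard normals with an affine map derived from the desired covariance.
mu    = [2; -1];                      % example mean vector (made up)
Sigma = [2 0.8; 0.8 1];               % example covariance (symmetric positive definite)
n     = 1000;                         % number of samples
Z = randn(length(mu), n);             % independent standard normal components
L = chol(Sigma, 'lower');             % Cholesky factor: Sigma = L*L'
X = L * Z + repmat(mu, 1, n);         % samples with the desired mean and covariance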

Extrapolating this idea to clustering, think of the data as if it were generated by a certain process and then figure out the specifics of that process. Assume that each data item was drawn from one of several distinct models: for k clusters, assume there were k distinct models, and that some hidden process determined which model to use when producing each data item. We know neither the models nor the origin of each data item (its assigned cluster). This is a very common situation, applicable to many types of problems.

A natural approach to solving this problem is to alternate between estimating the assignments of data items to models and re-estimating the models themselves. For example, K-means iteratively assigns points to clusters and then re-estimates the cluster centers. But what if the models are probabilistic, or there is no clear "distance function"? In that case, an Expectation Maximization (EM) algorithm is the standard approach.
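For contrast with the soft EM updates described next, here is a bare-bones sketch of that hard-assignment alternation (K-means style; the data and variable names are hypothetical):

% Sketch of the hard-assignment alternation: assign each point to its nearest
% center, then re-estimate each center as the mean of its assigned points.
X = rand(500, 3);                     % example data: 500 random 3-component points
K = 4;  numIters = 10;
N = size(X, 1);
centers = X(randperm(N, K), :);       % initialize centers from random points
for iter = 1:numIters
    D = zeros(N, K);
    for j = 1:K
        d = X - repmat(centers(j, :), N, 1);
        D(:, j) = sum(d.^2, 2);                       % squared distance to center j
    end
    [~, idx] = min(D, [], 2);                         % hard assignment to nearest center
    for j = 1:K
        if any(idx == j)                              % skip empty clusters
            centers(j, :) = mean(X(idx == j, :), 1);  % re-estimate center j
        end
    end
end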

At a high level, the EM algorithm works by averaging out what we don't know, alternating between soft assignment and parameter re-estimation. The E step (expectation) constructs the expected log-likelihood evaluated using the current parameter estimates. The M step (maximization) re-computes the parameters to maximize the function created in the E step (the expected log-likelihood). On the next iteration, the E step rebuilds the expected log-likelihood using the new parameter estimates, and so on. The algorithm repeats until acceptable convergence.
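To make the loop concrete for the image segmentation case, here is a simplified sketch rather than the actual project code: it assumes unit-covariance mixture components (so only the mixing weights and the means are estimated), a made-up input image, and hypothetical variable names (ppi stands in for pi to avoid shadowing MATLAB's built-in constant):

% Simplified EM skeleton for clustering pixel colors (not the original code).
X = double(reshape(imread('example.jpg'), [], 3)) / 255;  % N-by-3 pixel colors
K = 10;  numIters = 15;
N = size(X, 1);
ppi = ones(1, K) / K;                    % near-even mixing weights
mu  = X(randperm(N, K), :);              % initialize means from random pixels
for iter = 1:numIters
    % E step: soft-assign every pixel to every cluster
    A = zeros(N, K);
    for j = 1:K
        d = X - repmat(mu(j, :), N, 1);
        A(:, j) = ppi(j) * exp(-0.5 * sum(d.^2, 2));  % unnormalized posterior
    end
    w = A ./ repmat(sum(A, 2), 1, K);    % normalize so each row sums to 1
    % (in practice this step is computed in the log domain; see Implementation)
    % M step: re-estimate the parameters from the soft assignments
    for j = 1:K
        mu(j, :) = (w(:, j)' * X) / sum(w(:, j));     % weighted mean color
    end
    ppi = sum(w, 1) / N;                 % updated mixing weights
end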

Results

Results were generated by segmenting 5 different images into 10, 20, and 50 segments. The resulting pixel-to-cluster mappings were stored as images after each iteration, and the results were compiled into animated GIFs. Additionally, one image (polar lights) was segmented into 20 segments using 6 different random initializations of the EM algorithm.
Balloons (10, 20 and 50 segments, 15 iterations)
Mountains (10, 20 and 50 segments, 15 iterations)
Nature (10, 20 and 50 segments, 15 iterations)
Ocean (10, 20 and 50 segments, 15 iterations)
Polar Lights (10, 20 and 50 segments, 15 iterations)

Exploring Variation in Results

To keep everything consistent, the EM algorithm was run for a fixed 30 iterations on the polar lights image using 6 different initial conditions for "pi" and "mu". For each of the 6 runs, the initial values for pi were randomized and then unitized so they summed to 1; the same process was used for each cluster's mu vector. The origins of the variation are easiest to see in the progression of iterations in the animated GIFs. Although each run converges to a very similar segmentation, there are slight differences. Depending on the random initial values for pi and mu, the first assignment of pixels to segments differs across the 6 trials. Certain colors appear in the early iterations and then drift toward the original image in later iterations. In every run the center of the image shifts toward shades of green, the right toward blues, and the left toward purples. These broad trends hold across runs despite the different initial conditions, since they reflect the "ground truth" structure of the original image. The variation in the final images reflects the fact that the EM algorithm converges to parameters that achieve a local (not global) maximum of the likelihood of the data given the model. Intuitively, the process finds different maxima given different initial conditions: a cluster assigned a high pi value at the start will find it relatively easy to attract assignments, while a cluster assigned a low pi value will struggle to do so.
Polar Lights (20 segments, 30 iterations) -- Randomized Starting Values (6 different cases)

Implementation

All of the code was written from scratch in MATLAB and has been included in its entirety below. Further optimizations will no doubt be apparent to those experienced with vectorizing code, but this worked well enough for my purposes. A few notes are worth mentioning...
  • While conceptually the EM algorithm involves selecting parameters that maximize an entire function, the only values required from one iteration to the next are the weights and the parameters themselves. Instead of evaluating the full objective, the weights are recalculated and the parameters are adjusted each iteration.
  • Here the pi's (the probability that a given cluster was chosen to create a data point) and mu's (the mean value for each model/distribution) are initialized to values that are close to even, plus a small amount of Gaussian random noise. A snippet that instead initializes the pi's and mu's in a uniform random fashion is shown later.
  • The E step contains terms that may lead to underflow errors because of the multiplication of many tiny numbers. To avoid underflow, the implementation performs the E step in the log domain, where a product of k terms becomes a sum of k terms. Unfortunately, the denominator of one element in the equation contains a sum of products, which cannot be decomposed into a sum of logs of individual terms. Instead, I compute the logarithm of this sum of products directly. One could use a built-in log-sum-exp routine to accomplish this, but I did it from scratch. I'll try to elaborate on this below, with an example from the topic model case (see INSERT LINK). The same logic applies to this implementation. Consider the following expression:
\[p\left ( \delta _{ij}=1|\theta ^{(n)},x \right )=\frac{[\prod _{k}p_{j,k}^{x_k}]\pi_{j}}{\sum _{l}[\prod _{k}p_{l,k}^{x_k}]\pi_{l}}\]
To avoid underflow, this will have to be computed in the log domain. But that denominator is a sum of products which doesn't decompose into a sum of logs of individual terms. Instead, compute the logarithm of sums of products. For easy notation, denote:
\[\begin{align*} Y &= p\left ( \delta _{ij}=1|\theta ^{(n)},x \right )\\ A_{j} &= [\prod _{k}p_{j,k}^{x_k}]\pi_{j}\\ A_{max} &= \max_{j} A_{j} \end{align*} \]
Therefore,
\[\log A_{j} = \log \pi_{j} + \sum_{k} x_{k}\log p_{j, k}\\ \log A_{max} = \log\left(\max_{j}A_{j}\right) =\max_{j} \log\left(A_{j}\right)\]
Then, derive and calculate the logarithm of Y as:
\[\begin{align*} \log Y &= {\log A_{j}} - {\log\left(\sum_{l} A_{l}\right)}\\ &={\log A_{j}} - {\log\left(\sum_{l} e^{\log A_{l}}\right)}\\ &={\log A_{j}} - {\log\left(e^{\log A_{max}}\sum_{l} e^{\log A_{l} - \log A_{max}}\right)}\\ &={\log A_{j}} - {\log e^{\log A_{max}}} - {\log\left(\sum_{l} e^{\log A_{l} - \log A_{max}}\right)}\\ &={\log A_{j}} - {\log A_{max}} - {\log\left(\sum_{l} e^{\log A_{l} - \log A_{max}}\right)} \end{align*}\]
Then, simply exponentiate to recover Y. This means that on each iteration, the E step proceeds as follows:
For every data item "i", do:
  1. Compute log(Aj) for every j.
  2. While performing step 1, note the largest value computed (over all of the log(Aj) values) and remember it as log(Amax).
  3. Because it is identical for all j, compute the following term from the final equation for log Y:
\[{\log\left(\sum_{l} e^{\log A_{l} - \log A_{max}}\right)} \]
  4. For each cluster j, compute log Y.
  5. Exponentiate it, i.e. exp(log Y), to yield w(i,j).
The M step changes the values of Aj, so each E step effectively computes a new set of weights corresponding to the assignments of data points to clusters.
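Put into code for this implementation's mixture-of-Gaussians case, the steps above look roughly like the following sketch (same hypothetical names and unit-covariance simplification as in the earlier skeleton):

% Sketch of the E step carried out in the log domain (X is N-by-3, mu is
% K-by-3, ppi is 1-by-K; unit-covariance components assumed).
logA = zeros(N, K);
for j = 1:K
    d = X - repmat(mu(j, :), N, 1);
    logA(:, j) = log(ppi(j)) - 0.5 * sum(d.^2, 2);   % log(Aj) for every pixel
end
logAmax = max(logA, [], 2);                          % log(Amax), one value per pixel
% log of the denominator: log(Amax) + log(sum over l of exp(log(Al) - log(Amax)))
logDen = logAmax + log(sum(exp(logA - repmat(logAmax, 1, K)), 2));
logW = logA - repmat(logDen, 1, K);                  % logY for every pixel/cluster pair
w = exp(logW);                                       % weights w(i,j); each row sums to 1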
For the random initializations used for the exploration of variation, the following snippets were used.
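Roughly, that uniform random initialization looks like the sketch below (a reconstruction based on the description above, not the original snippet; same hypothetical names as before):

% Reconstruction sketch of the uniform random initialization: draw random
% values for the mixing weights, then unitize them so they sum to 1; the
% text above describes the same treatment being applied to each cluster's mean.
K   = 20;                              % number of segments in the variation runs
ppi = rand(1, K);
ppi = ppi / sum(ppi);                  % unitize so the mixing weights sum to 1
mu  = rand(K, 3);
mu  = mu ./ repmat(sum(mu, 2), 1, K);  % same unitization for each cluster's mean vector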
