  1. Random Component: The distribution of $\{Y_t \mid Y_{t-1}, \ldots\}$ belongs to the exponential family of distributions,
    $$f(y_t; \theta_t, \varphi \mid \mathcal{F}_{t-1}) = \exp\left\{ \frac{y_t \theta_t - b(\theta_t)}{a_t(\varphi)} + c(y_t, \varphi) \right\},$$
    where $\theta_t$ is the natural parameter of the distribution and $\varphi$ is a dispersion parameter. In the Poisson case we have $a_t(\varphi) = 1$, $\theta_t = \log(\mu_t)$, $b(\theta_t) = \mu_t$, and $c(y_t; \varphi) = -\log(y_t!)$.
  2. Systematic Component: $\mu_t$, the mean of $Y_t$, is modeled through a monotone link function $g(\cdot)$ such that
    $$g(\mu_t) = X_t^T \beta,$$
    where $X_t$ is the vector of covariates and $\beta$ is a vector of coefficients. In the Poisson case $\mu = \lambda$ and $g(\lambda) = \log(\lambda)$. A minimal fitting sketch follows this list.
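
To make the two components concrete, here is a minimal sketch (with hypothetical, simulated data) that fits a Poisson GLM with a log link using statsmodels:

```python
# A minimal sketch (hypothetical data): simulating counts from a Poisson GLM and
# fitting it with statsmodels, illustrating the random and systematic components.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical exogenous covariate and coefficient vector
n = 200
x = rng.normal(size=n)
X = sm.add_constant(x)              # design matrix rows X_t = (1, x_t)
beta_true = np.array([0.5, 0.3])

# Systematic component: log(mu_t) = X_t^T beta; random component: Y_t ~ Poisson(mu_t)
mu = np.exp(X @ beta_true)
y = rng.poisson(mu)

# The Poisson family uses the canonical log link by default
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)                # estimated beta
```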

For time series data, we can augment the exogenous covariates in the model with lagged values of the response variable, i.e., the observed counts at previous time points; the model is thus “observation-driven.” Lags of exogenous covariates can also be included. For instance, let the new covariate matrix be represented by $Z$, where

$$Z_t^T = (X_t, X_{t-1}, \ldots, X_{t-p}, Y_{t-1}, \ldots, Y_{t-n}).$$

For more details see [link] .
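
For illustration, a lagged design matrix of this form can be assembled as follows; the data frame, column names, and lag orders $p$ and $n$ here are hypothetical choices, not part of the original model:

```python
# Sketch: assembling Z_t = (X_t, X_{t-1}, ..., X_{t-p}, Y_{t-1}, ..., Y_{t-n})
# for an observation-driven Poisson model. Column names and lag orders are hypothetical.
import pandas as pd

def build_design(df, x_cols, y_col, p=2, n_lags=1):
    """Return the augmented covariate matrix Z and the aligned response."""
    Z = df[x_cols].copy()
    for lag in range(1, p + 1):                      # lags of exogenous covariates
        for col in x_cols:
            Z[f"{col}_lag{lag}"] = df[col].shift(lag)
    for lag in range(1, n_lags + 1):                 # lagged observed counts
        Z[f"{y_col}_lag{lag}"] = df[y_col].shift(lag)
    Z = Z.dropna()                                   # drop rows lost to lagging
    return Z, df.loc[Z.index, y_col]
```

The resulting matrix and aligned response can then be passed to the same GLM fit sketched earlier.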

Clustering of TSC models

Armed with the Poisson regression GLM, we can begin clustering the TSC. To determine the similarity or dissimilarity between two TSC, a metric is needed to measure their “distance.” The classic Euclidean metric is not adequate for data with time dependence. We will use the empirical Kullback-Leibler (KL) likelihood metric [link], which computes the distance between two TSC by evaluating the relative fit of their respective models.

Let $\lambda_j$ be a given “model structure” for the data, i.e., an observation-driven Poisson model with specified covariates. The KL metric has the following expression:

$$D_K(\lambda_k, \lambda_j) = \frac{1}{|Y_K|} \sum_{y \in Y_K} \left( \log p(y \mid \lambda_k) - \log p(y \mid \lambda_j) \right),$$

where $Y_K$ is the set of data objects belonging to cluster $k$. Note that $\log p(y \mid \lambda_k)$ is the log-likelihood of the model; see [link] for a discussion of the likelihood of observation-driven Poisson models. The measure is made symmetric by

$$D_{SK} = \frac{D_K(\lambda_k, \lambda_j) + D_K(\lambda_j, \lambda_k)}{2}.$$
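
As a sketch of how these quantities might be computed in practice, suppose `loglik(model, y)` is a hypothetical helper returning the log-likelihood $\log p(y \mid \lambda)$ of a data object $y$ under a fitted model structure:

```python
# Sketch of the empirical KL likelihood metric between two fitted model structures.
# `loglik(model, y)` is a hypothetical helper returning log p(y | lambda) for a
# single time series y under a fitted observation-driven Poisson model.

def kl_distance(model_k, model_j, cluster_k_series, loglik):
    """D_K(lambda_k, lambda_j): mean log-likelihood gap over the objects in cluster k."""
    gaps = [loglik(model_k, y) - loglik(model_j, y) for y in cluster_k_series]
    return sum(gaps) / len(cluster_k_series)

def symmetric_kl(model_k, model_j, cluster_k_series, cluster_j_series, loglik):
    """D_SK: symmetrized empirical KL distance."""
    return 0.5 * (kl_distance(model_k, model_j, cluster_k_series, loglik)
                  + kl_distance(model_j, model_k, cluster_j_series, loglik))
```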

With the KL metric, we apply a hierarchical bottom-up clustering algorithm. A flowchart of the algorithm is displayed in Figure 1.

The MBC algorithm

The algorithm produces a cluster tree similar to the figure below. The bottom-up method makes it easy to visualize how objects are broken down into groups, and it eliminates the need for a stopping criterion.

Sample hierarchical cluster tree
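
The full MBC algorithm refits the observation-driven models as clusters merge, following the flowchart above; as a simplified sketch, the bottom-up tree itself can be built from a precomputed matrix of symmetrized KL distances with scipy (the distance matrix here is a hypothetical input):

```python
# Simplified sketch: bottom-up (agglomerative) clustering from a precomputed
# matrix of symmetrized KL distances D_SK. This illustrates only the tree
# construction, not the model refitting performed by the full MBC algorithm.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

def plot_cluster_tree(dist_matrix, labels=None):
    """dist_matrix: symmetric (m x m) array of D_SK values between m TSC."""
    condensed = squareform(dist_matrix, checks=False)  # condensed distances for linkage
    tree = linkage(condensed, method="average")        # bottom-up agglomeration
    dendrogram(tree, labels=labels)                    # hierarchical cluster tree
    plt.show()
    return tree
```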

Data

Finding relevant data

Though count data are prevalent in consumer behavior, obtaining commercial data for MBC is expensive. Thus, for this project, we use results from previous studies on marketing data to create a data set that realistically mimics consumer behavior.

Data simulation

Niraj et al. [link] proposed an economic model for consumer purchases of bacon and eggs. Based on store scanner data, the authors studied consumer sensitivities to variables such as personal utility, product prices, product displays, and purchase history. For the purpose of data simulation, we borrowed key elements from this economic model to create our own consumer bacon-and-eggs purchase data.

We let $Y_{b,t}$ and $Y_{e,t}$ be a bivariate Poisson random vector representing a consumer's purchases of bacon and eggs, respectively, during time window $t$. Then $Y_{b,t}$ and $Y_{e,t}$ can be modeled using a trivariate reduction [link], as sketched below.
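
In the standard trivariate reduction, three independent Poisson variables are drawn and one of them is shared by both counts, which induces positive correlation between bacon and egg purchases. The following sketch simulates such a purchase series; the rate parameters are hypothetical placeholders rather than values taken from the Niraj et al. model, where they would instead be driven by covariates such as price and display.

```python
# Sketch: simulating bivariate Poisson purchase counts (bacon, eggs) by trivariate
# reduction. The rates lam_b, lam_e, lam_c are hypothetical placeholders.
import numpy as np

def simulate_purchases(T, lam_b=0.8, lam_e=1.2, lam_c=0.5, seed=None):
    rng = np.random.default_rng(seed)
    u_b = rng.poisson(lam_b, size=T)   # bacon-only component
    u_e = rng.poisson(lam_e, size=T)   # eggs-only component
    u_c = rng.poisson(lam_c, size=T)   # shared component, induces correlation
    return u_b + u_c, u_e + u_c        # (Y_{b,t}, Y_{e,t})
```

Marginally, $Y_{b,t} \sim \text{Poisson}(\lambda_b + \lambda_c)$ and $Y_{e,t} \sim \text{Poisson}(\lambda_e + \lambda_c)$, with $\mathrm{Cov}(Y_{b,t}, Y_{e,t}) = \lambda_c$.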
