How Customer Prospecting by using PCA works

Customer Prospecting by using PCA is a tool for analyzing a customer database quickly and easily. It relieves the analyst of having to determine demographic thresholds for the customer data. You simply identify the demographic variables that characterize your customers, and the tool does the rest. It analyzes the selected demographics within the geographies that contain your customer points and compares them using the Principal Component Analysis (PCA) method, ranking the geographies by how similar their demographic values are to your customer data. The output is a list of geographies, thematically mapped by rank. Geographies are ranked from 1 to x, where x is the number of geographies selected for ranking and 1 represents the best possible match to the customer input.

Principal Component Analysis (PCA) Method

The PCA method removes the burden of variable selection while still ranking the sites by their level of similarity. You can score similarity using a set of variables you choose, or use all the variables provided.

The figure below illustrates how the variables or neighbors can be selected, where K is the number of neighbors to be found.

[Figure: PCA]

The PCA algorithm treats the set of variables for each site as a vector. It then takes the vectors for all potential sites together with the major site and performs PCA on them in the following sequence:

  1. It builds a covariance matrix.
  2. It finds the eigenvectors and eigenvalues of the covariance matrix.
  3. Using the Kaiser criterion, it drops eigenvectors with eigenvalues less than 1.
  4. The remaining eigenvectors form a subspace of the initial space.
  5. The projections of all vectors onto this subspace are calculated.
  6. It standardizes the projected data to the [0, 1] interval.
  7. It uses the L2 (Euclidean) distance to choose the K potential sites most similar to the major site.

The resulting layer, containing the K potential sites closest to the major site, is color-coded according to each site's L2 distance from the major site.
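
The sequence above can be sketched in a few lines of Python with NumPy. This is an illustration only, not the tool's actual implementation. The function name rank_similar_sites and its parameters are hypothetical. It also assumes that each variable is z-scored before the covariance matrix is built, so that the eigenvalue threshold of 1 is meaningful, and that step 6 uses min-max scaling; the description above does not state either detail.

```python
import numpy as np

def rank_similar_sites(sites, major_site, k):
    """Illustrative sketch: indices and distances of the k sites closest to major_site.

    sites is an (n_sites, n_variables) array and major_site is a vector of
    length n_variables. Both names are hypothetical.
    """
    # Treat each site as a vector; row 0 is the major site.
    data = np.vstack([major_site, sites]).astype(float)

    # Assumption (not stated in the description): z-score each variable.
    std = data.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid dividing by zero for constant variables
    z = (data - data.mean(axis=0)) / std

    # Steps 1-2: covariance matrix, then its eigenvalues and eigenvectors.
    cov = np.atleast_2d(np.cov(z, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 3: Kaiser criterion, drop eigenvectors with eigenvalue < 1.
    keep = eigvals >= 1.0
    if not keep.any():
        # Fallback for degenerate input: keep the largest component.
        keep[np.argmax(eigvals)] = True

    # Steps 4-5: project every vector onto the retained subspace.
    projected = z @ eigvecs[:, keep]

    # Step 6: scale each projected component to the [0, 1] interval
    # (min-max scaling is an assumption).
    low = projected.min(axis=0)
    span = projected.max(axis=0) - low
    span[span == 0] = 1.0
    scaled = (projected - low) / span

    # Step 7: L2 (Euclidean) distance from each potential site to the major site.
    dist = np.linalg.norm(scaled[1:] - scaled[0], axis=1)
    order = np.argsort(dist)[:k]
    return order, dist[order]
```

The returned indices are ordered from most to least similar, so the site at position 0 would receive rank 1, and the returned distances are the values that would drive the color coding.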

3/3/2014