How to Do Principal Component Analysis in Excel
Trying to make sense of a dataset with dozens of variables can feel like you're staring at a wall of static. Principal Component Analysis, or PCA, is a powerful statistical technique that cuts through that noise by simplifying your complex data into a few meaningful "principal components." This article will give you a practical, step-by-step guide to performing PCA directly within Excel.
What is Principal Component Analysis (PCA) and Why Use It?
Imagine you're a marketer analyzing the performance of 20 different ad campaigns. You have a spreadsheet with columns for ad spend, impressions, clicks, click-through rate (CTR), conversion rate, cost per click, and more. Feeling overwhelmed yet? It’s hard to see the big picture when you're juggling so many metrics.
This is where PCA comes in. It's a dimensionality reduction technique, which is a fancy way of saying it distills numerous related variables into a smaller, more manageable set of new variables called "principal components." These new components capture most of the crucial information from the original data, making it much easier to analyze and visualize.
Instead of trying to compare 10 different metrics, PCA might find that most of your performance can be explained by just two components:
- Component 1 (maybe you call it "Campaign Efficiency"): This component could be a combination of low cost per click, high CTR, and high conversion rate.
- Component 2 (maybe you call it "Audience Reach"): This one might be primarily driven by ad spend and a high number of impressions.
Suddenly, instead of a cluttered table, you can plot your 20 campaigns on a simple 2D chart to see which ones offered efficient performance and which ones prioritized broad reach. You’ve simplified the story without losing the plot.
Step-by-Step Guide: Performing PCA in Excel
Let’s walk through how to conduct a PCA from start to finish. We’ll analyze campaign data to make it relatable.
A Quick Note: Excel Requires an Add-In for PCA
While Excel is a powerhouse for many types of analysis, it doesn’t have a built-in, one-click button for PCA. The underlying math involves matrix algebra (specifically, calculating eigenvalues and eigenvectors), which is beyond the scope of Excel’s standard functions. Therefore, a specialized statistics add-in is the most practical way to get the job done without diving deep into linear algebra.
We’ll use XLSTAT, a popular and robust statistics add-in for Excel. It has a free trial you can use to follow along. Other options exist, but this is a very common tool for this purpose.
Step 1: Install the XLSTAT Add-In
First, head over to the XLSTAT website, download the free trial, and follow the installation instructions. Once installed, it will appear as a new tab in your Excel ribbon, giving you access to a suite of advanced statistical tools.
Step 2: Prepare Your Data
Your data needs to be organized properly before you can run the PCA. The setup is simple:
- Organize your data in a table. Each row should represent an observation (e.g., a single ad campaign), and each column should represent a variable (e.g., Ad Spend, Conversions).
- Use only numeric data. PCA works with numbers. Remove any columns with text or non-numeric values that you don't intend to analyze.
- Standardize your variables (optional but highly recommended). If your variables are measured on different scales (e.g., Ad Spend in thousands of dollars vs. CTR as a small percentage), the variable with the largest scale will disproportionately influence the results. Standardization rescales the data so everything has a comparable weight. Luckily, the XLSTAT add-in can do this for you automatically, so you don’t have to do it manually with formulas.
Step 3: Run the Principal Component Analysis in XLSTAT
With your data prepped and XLSTAT installed, you're ready to perform the analysis. This part is surprisingly straightforward.
- Navigate to the XLSTAT tab in your Excel ribbon.
- Click on Analyzing data > Principal Component Analysis (PCA).
This will open the PCA dialog box. Now, we just need to tell it what to do:
- In the General tab, under "Observations/Quantitative variables," click the icon and select your entire data table, including the headers (e.g., Column A to E).
- Check the box for Variable labels since our selection includes the column headers.
- Crucially, set the PCA type to Correlation. This automatically standardizes your data, solving the scaling issue we talked about earlier. Choose this method anytime your variables are on different scales.
- Go to the Outputs tab. Check all the boxes you need. For a standard analysis, make sure Eigenvalues, Factor loadings, and Variable correlations are checked.
- Now go to the Charts tab. Check Scree plot and Correlation circle. These are two of the most important visuals for interpreting your results.
- Click OK. XLSTAT will run the analysis and generate a new sheet in your workbook containing all the results.
How to Interpret the PCA Results in Excel
XLSTAT generates a lot of outputs, which can look intimidating at first. Don’t worry, you only need to focus on a few key pieces to get the core insights.
1. The Eigenvalues Table & Scree Plot
The first thing to look at is the eigenvalues table and the corresponding scree plot. An eigenvalue represents the amount of variance explained by a single principal component. Larger eigenvalues mean more variance explained.
- % Variance: Shows the percentage of total variance captured by each component.
- Cumulative %: Shows the cumulative total. This is the one to watch.
In our example, the first component (PC1) explains a whopping 61.64% of the variation in our data. The second component (PC2) explains another 25.20%. Together, the first two components explain 86.84% of the total variance! This is great — it means we can condense our 5 variables into just 2 and still keep most of the story intact.
The Scree Plot is just a visual representation of this table. It helps you decide how many components to keep. A common method is the "elbow rule": keep all components before the point where the line starts to flatten out (the "elbow").
In this plot, the elbow is clearly at component 3, confirming that keeping the first two components is a good choice.
2. The Factor Loadings Table & Correlation Circle
Okay, so we know we need two components. But what do they mean? That's where the factor loadings table comes in. It shows how strongly each original variable is correlated with each new principal component.
An easier way to interpret this is with the Correlation Circle (also called a Biplot).
Here’s how to read this chart:
- Axes represent components: The horizontal axis is PC1, and the vertical axis is PC2.
- Vectors are your original variables: Each arrow represents one of your original metrics.
- Interpreting the directions:
From our chart, we can give stories to our components:
- Story for PC1 (Horizontal Axis): The variables pointing strongly to the right are CTR and Conversions. The variable pointing to the left is Cost per Conversion. This component seems to represent campaign efficiency. High PC1 scores mean high CTR/conversions and low cost — very efficient. Low scores mean the opposite. We can name PC1 "Campaign Efficiency".
- Story for PC2 (Vertical Axis): The variables pointing straight up are Reach and Ad Spend. This component is simply about raw volume and budget. High PC2 scores mean high spend and high reach. We can name PC2 "Campaign Scale".
You have successfully reduced five complex metrics into two intuitive concepts: Efficiency and Scale. Now you can analyze your campaigns through this much simpler lens without losing critical insight.
Final Thoughts
Principal Component Analysis converts cluttered datasets into clear, powerful insights by boiling down many variables into just a few. While performing the underlying math manually in Excel is complex, using an add-in like XLSTAT makes the entire process manageable for anyone comfortable in a spreadsheet.
As valuable as this is, we know that setting up spreadsheets, installing add-ins, and wading through statistical outputs can be time-consuming. We created Graphed to remove this friction entirely. Instead of running a multi-step analysis, you can just connect your data platforms and ask in plain English, "What are the main drivers of my campaign performance?" or "Group my ad campaigns into clusters based on their key metrics." Graphed serves up a digestible analysis almost instantly, turning hours of work into a 30-second conversation.
Related Articles
What SEO Tools Work with Google Analytics?
Discover which SEO tools integrate seamlessly with Google Analytics to provide a comprehensive view of your site's performance. Optimize your SEO strategy now!
Looker Studio vs Metabase: Which BI Tool Actually Fits Your Team?
Looker Studio and Metabase both help you turn raw data into dashboards, but they take completely different approaches. This guide breaks down where each tool fits, what they are good at, and which one matches your actual workflow.
How to Create a Photo Album in Meta Business Suite
How to create a photo album in Meta Business Suite — step-by-step guide to organizing Facebook and Instagram photos into albums for your business page.