Capstone Project
The Capstone project is completed at the culmination of the program.
Students have the opportunity to identify, review and interpret data
that pertains to a significant problem, issue or opportunity through
statistical and operational analysis. This comprehensive analysis will
include the use of statistical tests, predictive models (using both
supervised and unsupervised techniques), and visualizations of data
that address the research question.
This is generally an individual project. The project specifications are
described below. The final analysis will result in a report and a
presentation. The presentation specifications are described separately
following the capstone report specifications.
CAPSTONE REPORT SPECIFICATIONS:
The capstone report should be a professionally formatted report that is
10-20 pages in length. Use single spacing and a font size of 12. The
report should include [clearly label each section with topic headings in
bold; the items in bold below are topic heading names]:
1. Introduction – this is a one-paragraph summary of the project.
What are you investigating, and why is it important?
2. Research Question(s) – State the specific research
question(s).
3. Literature Review – Discuss any prior research in the area of
study. Consider 3-5 sources. What were their conclusions? What
were their variables? How does this pertain to your research
question? Use peer-reviewed research sources only; cite them
within the report and list them in a references section at the end
of the report.
4. Data Utilized for Analysis – Clearly indicate where your data
came from, how many and which data sets you used, and what
the format of the file(s) was (e.g., SAS, Excel, SQL, etc.).
How many rows (observations) are contained within each file?
In an Appendix to the report:
Create a table that lists:
1. Variable Name
2. SAS Level
3. SAS Role
4. Missing Values – the number of rows containing missing
values
5. Missing Value Method – what you did to handle the missing
values (e.g., mean, mode, etc.)
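As an illustration only, the Missing Values column of this table could be cross-checked outside SAS along the following lines. This is a minimal Python sketch, assuming the data has been loaded as a list of dictionaries with None marking a missing value; the variable names and rows are hypothetical, and the counts in the report should still come from your SAS output.

```python
def count_missing(rows, variables):
    """Return {variable: number of rows in which that variable is missing (None)}."""
    return {
        var: sum(1 for row in rows if row.get(var) is None)
        for var in variables
    }

# Hypothetical example data; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "region": "West"},
    {"age": None, "income": 61000, "region": "East"},
    {"age": 45, "income": None, "region": None},
    {"age": None, "income": 48000, "region": "South"},
]

missing_counts = count_missing(rows, ["age", "income", "region"])
for var, n in missing_counts.items():
    print(f"{var}: {n} row(s) with a missing value")
```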
5. Data Preparation –
Data
Describe which variables are inputs, which were rejected and
which variable(s) are your target. How does your target support
your research question(s)?
Missing Values
For any variables that had missing values describe what method
you used to handle them and why you chose that method.
Include a figure from the output that shows the number
of missing values.
Outliers
Which variables contained outliers? Include graphs to show
them. How did you handle the outliers? Explain the method you
chose and why it was appropriate.
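One common way to flag outliers, shown here only as an example, is the interquartile range (IQR) rule. The Python sketch below assumes a single numeric variable held in a list; the 1.5 multiplier and the sample values are assumptions chosen for illustration, and this is not the method SAS Enterprise Miner necessarily applies.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical values for one numeric variable.
values = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]
outliers = iqr_outliers(values)
print("Flagged as outliers:", outliers)
```

Whichever rule you use, the report should state the threshold and explain why it suits the variable in question.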
Skewness and Kurtosis
Which variables contained skewed data, and how did you handle
it? Justify the technique you used. Did any of the variables
indicate kurtosis? Did you transform the variable as a result? If
so, describe the method used and the results. Show the output
which indicated the variables that had kurtosis and/or skewness.
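To make the two measures concrete, the Python sketch below computes moment-based skewness and excess kurtosis and shows the effect of a log transformation on a right-skewed variable. The estimator formulas, the sample values and the choice of a log transform are assumptions for illustration; SAS output may differ slightly because statistical packages often apply sample-size adjustments.

```python
import math

def skewness_and_excess_kurtosis(values):
    """Moment-based (unadjusted) skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis

# Hypothetical right-skewed variable (all values must be positive for log).
raw = [1, 2, 2, 3, 3, 3, 4, 4, 50]
logged = [math.log(v) for v in raw]

raw_skew, raw_kurt = skewness_and_excess_kurtosis(raw)
log_skew, log_kurt = skewness_and_excess_kurtosis(logged)
print(f"raw:    skewness={raw_skew:.2f}, excess kurtosis={raw_kurt:.2f}")
print(f"logged: skewness={log_skew:.2f}, excess kurtosis={log_kurt:.2f}")
```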
Data Partition
How did you partition your data? What other partitions did you
try? Explain the partition you chose and why.
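For reference, a random partition into training, validation and test sets can be sketched as follows. The 60/20/20 split, the fixed seed and the function name are assumptions chosen for the example, not a required configuration.

```python
import random

def partition(rows, train=0.6, validate=0.2, seed=42):
    """Shuffle rows reproducibly and split into train/validation/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validate)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],  # the remainder becomes the test set
    )

# Hypothetical data set of 100 row identifiers.
rows = list(range(100))
train_set, valid_set, test_set = partition(rows)
print(len(train_set), len(valid_set), len(test_set))
```

Trying a second split (for example 70/30 with no test set) and comparing model stability is one way to justify the partition you finally choose.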
6. Data Analysis
Descriptive statistics, predictive modeling (supervised), data
mining (unsupervised), and visualization must be used. Other
methods of analysis may be used but are optional.
You must use at least 5 different predictive and/or data mining
techniques using SAS Enterprise Miner (e.g., decision tree, neural
network, regression, gradient boosting, ensemble model, cluster
analysis, two-stage models, random forests, market basket
analysis, etc.). Each of these counts as one technique, though you
will find that within each you may try multiple variations to
optimize the technique. There must be at least one supervised
and at least one unsupervised technique.
Describe each technique you used and how you optimized each
technique; if you used multiple variations, describe each one.
When describing, discuss what you did. For example, if a neural
network was used, what type of network, what functions were
used, how many hidden units, etc.
Include at least 3 different data visualizations and describe how
they support the analysis of your research question(s).
7. Results
Describe the selection statistic used and why it was the
appropriate statistic. Present the output statistics from a model
comparison that shows the results of all of the techniques used.
Describe which model/technique provided the best fit and why.
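The logic of a model comparison can be sketched in a few lines of Python. The example assumes the selection statistic is the validation misclassification rate (lower is better); the model names and rates are placeholders, not real results, and your own report should use the statistics from your SAS model comparison output.

```python
# Placeholder validation misclassification rates; NOT real results.
validation_misclassification = {
    "Decision Tree": 0.21,
    "Neural Network": 0.18,
    "Regression": 0.20,
}

# Lower misclassification is better, so the best model has the minimum rate.
best_model = min(validation_misclassification, key=validation_misclassification.get)

# Rank all models from best to worst on the selection statistic.
ranking = sorted(validation_misclassification.items(), key=lambda item: item[1])
for name, rate in ranking:
    print(f"{name}: {rate:.2f}")
print("Selected:", best_model)
```

A different selection statistic (for example average squared error for an interval target) would change only the dictionary values and, for statistics where higher is better, the use of max instead of min.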
Conclusions
Discuss the results of all of your analysis (e.g., why are they
significant? How can they be applied in the future?).
8. References (list any external references used; use APA format)
9. Appendices. In addition to your data table, include a copy of
your SAS diagram. It is ok if it spans multiple pages but it must
be portrait and readable. Any other outputs may be embedded
within the report or included within the appendix. Each exhibit
should include only one output and be specifically referenced
within the report.
Your capstone project is a report contained within a single Microsoft
Word document and submitted via the Assignment Link.
Title Page Requirements
Include a title page and please number your pages. On your title page,
in addition to your name, include the path to the location of your SAS
project (it must be on the BANUSERDRIVE\@STUDENTS within your
student folder).
Include the name of the project file and the name of each SAS
diagram within your project. If there is more than one SAS diagram,
include the purpose of each diagram.
Include the location of your data source(s) – for SAS or Excel files they
must also be in your BAN690 folder within your student folder on the