# Introduction to big data midterm exam solution

QUESTION 1
What are the three characteristics of Big Data, and what are the main considerations in processing Big Data?

QUESTION 2
Explain the differences between Business Intelligence (BI) and Data Science.

QUESTION 3
Briefly describe each of the four classifications of Big Data structure types (i.e., from structured to unstructured).

QUESTION 4
List and briefly describe each of the phases in the Data Analytics Lifecycle.

QUESTION 5
In which phase would the team expect to invest most of the project time? Why? Where would the team expect to spend the least time?

QUESTION 6
Which R command would create a scatterplot for the data frame “df”, assuming df contains values for x and y?

QUESTION 7
What is a rug plot used for in a density plot?

QUESTION 8
What is a type I error? What is a type II error? Is one always more serious than the other? Why?

QUESTION 9
Why do we consider K-means clustering an unsupervised machine learning algorithm?

QUESTION 10
Detail the four steps in the K-means clustering algorithm.
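
One common way to state the four steps is: (1) choose k and k initial centroids, (2) assign each point to its nearest centroid, (3) recompute each centroid as the mean of its assigned points, and (4) repeat steps 2 and 3 until the assignments stop changing. The minimal Python sketch below follows that loop. It assumes numeric tuples, Euclidean distance and initial centroids drawn at random from the data. `kmeans` is a hypothetical helper name, not a library function. The function receives only coordinates and never any class labels, which is also why K-means is treated as unsupervised.

```python
import random
from math import dist


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of numeric tuples (illustrative sketch, hypothetical helper)."""
    rng = random.Random(seed)
    # Step 1: k is given; pick k data points at random as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid (Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        converged = new_assignments == assignments
        assignments = new_assignments
        # Step 3: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
        # Step 4: repeat steps 2 and 3 until no point changes cluster.
        if converged:
            break
    return centroids, assignments
```

For example, `kmeans([(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)], 2)` should separate the three points near the origin from the three near (8.5, 8.5). In practice you would normally call a library implementation, such as R's `kmeans()` or scikit-learn's `KMeans`, rather than this sketch.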

QUESTION 11
List three popular use cases of association rule mining algorithms.

QUESTION 12
Define support and confidence.
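
As a small illustration, the support of an itemset X is commonly computed as the fraction of transactions that contain X. The confidence of a rule X → Y is then support(X ∪ Y) / support(X), an estimate of how often Y appears given that X does. The Python sketch below applies those two formulas to a tiny, hypothetical market-basket data set. The helper names `support` and `confidence` are illustrative and do not come from any particular library.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent: support(X ∪ Y) / support(X).

    Raises ZeroDivisionError if the antecedent never occurs in the data.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical market-basket data: one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support({"bread", "milk"}, baskets))       # 0.5 (2 of 4 baskets)
print(confidence({"bread"}, {"milk"}, baskets))  # about 0.667 (0.5 / 0.75)
```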

QUESTION 13
How do you use a “hold-out” dataset to evaluate the effectiveness of the rules generated?
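
The usual idea is to mine rules on one portion of the transactions and then re-measure them on transactions the mining step never saw. A rule that only reflected noise in the training sample tends to lose confidence on the hold-out data. The Python sketch below shows the split and the re-measurement. It assumes transactions are Python sets and that the rules were already mined elsewhere, for example with an Apriori implementation, which is not shown. The helper names and the 0.3 hold-out fraction are hypothetical choices.

```python
import random


def split_transactions(transactions, holdout_fraction=0.3, seed=0):
    """Randomly partition transactions into a mining (training) set and a hold-out set."""
    rng = random.Random(seed)
    shuffled = list(transactions)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


def rule_confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    firing = [t for t in transactions if antecedent <= t]
    if not firing:
        return 0.0  # the rule never applies in this data
    return sum(1 for t in firing if consequent <= t) / len(firing)


def evaluate_on_holdout(rules, holdout, min_confidence=0.5):
    """Re-measure rules (mined on the training split) against unseen hold-out transactions."""
    results = []
    for antecedent, consequent in rules:
        conf = rule_confidence(antecedent, consequent, holdout)
        # Keep the rule only if it still meets the confidence threshold on unseen data.
        results.append((antecedent, consequent, conf, conf >= min_confidence))
    return results
```

In use, you would call `split_transactions` once, mine rules on the first part, and pass those rules together with the second part to `evaluate_on_holdout`. Support or lift could be checked the same way.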

QUESTION 14
List two use cases of linear regression models.

QUESTION 15
Compare and contrast linear and logistic regression methods.