Introduction to big data midterm exam solution

QUESTION 1

What are the three characteristics of Big Data, and what are the main considerations in processing Big Data?

QUESTION 2

Explain the differences between BI and Data Science.

QUESTION 3

Briefly describe each of the four classifications of Big Data structure types (i.e., from structured to unstructured).

QUESTION 4

List and briefly describe each of the phases in the Data Analytics Lifecycle.

QUESTION 5

In which phase would the team expect to invest most of the project time? Why? Where would the team expect to spend the least time?

QUESTION 6

Which R command would create a scatterplot for the dataframe “df”, assuming df contains values for x and y?

QUESTION 7

What is a rug plot used for in a density plot?

QUESTION 8

What is a type I error? What is a type II error? Is one always more serious than the other? Why?

QUESTION 9

Why do we consider K-means clustering an unsupervised machine learning algorithm?

QUESTION 10

Detail the four steps in the K-means clustering algorithm.
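
As a point of reference for this question, the commonly taught four-step form of K-means (choose initial centroids, assign points, recompute centroids, repeat until stable) can be sketched as follows. This is a minimal illustration, not the course's official answer: the function name `kmeans`, the sample `points`, and the choice of the first k points as initial centroids are all assumptions made for the example.

```python
from math import dist


def kmeans(points, k, max_iter=100):
    """Hypothetical helper: plain K-means on a list of coordinate tuples."""
    # Step 1: choose k initial centroids.
    # Assumption: we simply take the first k points; random choice is also common.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[c])
        # Step 4: repeat steps 2 and 3 until the centroids stop moving.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Illustrative data: two well-separated groups of points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels = kmeans(points, k=2)
```

No labels are passed in at any point; the groups come from the distances alone, which is what makes the method unsupervised.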

QUESTION 11

List three popular use cases of the Association Rules mining algorithms.

QUESTION 12

Define Support and Confidence.
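
For orientation, the standard definitions can be written as a short sketch: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X -> Y is support(X and Y together) divided by support(X). The function names and the toy `transactions` list below are assumptions made for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs)."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        raise ValueError("the left-hand side never occurs in the transactions")
    return support(transactions, set(lhs) | set(rhs)) / lhs_support


# Hypothetical market-basket data, invented for this example.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

s = support(transactions, {"bread", "milk"})       # 2 of 4 transactions
c = confidence(transactions, {"bread"}, {"milk"})  # 0.5 / 0.75
```

Here {bread, milk} appears in 2 of 4 transactions, so its support is 0.5, and the rule bread -> milk holds in 2 of the 3 transactions that contain bread, giving a confidence of about 0.67.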

QUESTION 13

How do you use a “hold-out” dataset to evaluate the effectiveness of the rules generated?

QUESTION 14

List two use cases of linear regression models.

QUESTION 15

Compare and contrast linear and logistic regression methods.

