Free Databricks-Certified-Professional-Data-Scientist Sample Questions and 100% Cover Real Exam Questions (Updated 140 Questions)
NEW QUESTION 74
Select the statements that apply correctly to Naive Bayes:
- A. Works with a small amount of data
- B. Sensitive to how the input data is prepared
- C. Works with nominal values
Answer: A,B,C
NEW QUESTION 75
What is the considerable difference between L1 and L2 regularization?
- A. L1 regularization has more accuracy of the resulting model
- B. Size of the model can be much smaller in L1 regularization than that produced by L2-regularization
- C. All of the above are correct
- D. L2-regularization can be of vital importance when the application is deployed in resource-tight environments such as cell-phones.
Answer: B
Explanation:
The two most common regularization methods are called L1 and L2 regularization. L1 regularization penalizes the weight vector for its L1-norm (i.e. the sum of the absolute values of the weights), whereas L2 regularization uses its L2-norm. There is usually not a considerable difference between the two methods in terms of the accuracy of the resulting model (Gao et al 2007), but L1 regularization has a significant advantage in practice. Because many of the weights of the features become zero as a result of L1-regularized training, the size of the model can be much smaller than that produced by L2-regularization. Compact models require less space on memory and storage, and enable the application to start up quickly. These merits can be of vital importance when the application is deployed in resource-tight environments such as cell-phones.
Regularization works by adding the penalty associated with the coefficient values to the error of the hypothesis. This way, an accurate hypothesis with unlikely coefficients would be penalized, while a somewhat less accurate but more conservative hypothesis with low coefficients would not be penalized as much.
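To illustrate why L1-regularized training tends to drive weights to exactly zero, here is a minimal sketch for the simplest case, assuming each weight can be solved independently of the others. The function names, the penalty strength `lam`, and the sample weights are illustrative assumptions, not part of the original explanation.

```python
def l1_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w): soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def l2_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return z / (1.0 + lam)


# Hypothetical unregularized weights for five features.
unregularized = [2.0, -0.3, 0.1, -1.5, 0.4]
lam = 0.5

l1_weights = [l1_shrink(z, lam) for z in unregularized]
l2_weights = [l2_shrink(z, lam) for z in unregularized]

# Count the weights a compact model would actually need to store.
l1_nonzero = sum(1 for w in l1_weights if w != 0.0)
l2_nonzero = sum(1 for w in l2_weights if w != 0.0)
print("L1:", l1_weights, "nonzero:", l1_nonzero)
print("L2:", l2_weights, "nonzero:", l2_nonzero)
```

In this toy setting the L1 penalty zeroes every weight whose magnitude is at most `lam`, while the L2 penalty only scales weights down, so all of them stay nonzero.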
NEW QUESTION 76
Stories appear on the front page of Digg as they are "voted up" (rated positively) by the community. As the community becomes larger and more diverse, the promoted stories can better reflect the average interest of the community members. Which of the following techniques is used to build such a recommendation engine?
- A. Collaborative filtering
- B. Naive Bayes classifier
- C. Content-based filtering
- D. Logistic Regression
Answer: A
Explanation:
One application of collaborative filtering is to recommend interesting or popular information as judged by the community. As a typical example, stories appear on the front page of Digg as they are "voted up" (rated positively) by the community. As the community becomes larger and more diverse, the promoted stories can better reflect the average interest of the community members.
NEW QUESTION 77
Which method is used to solve for the coefficients b0, b1, ..., bn in your linear regression model?
- A. Ridge and Lasso
- B. Apriori Algorithm
- C. Integer programming
- D. Ordinary Least squares
Answer: D
Explanation:
Y = b0 + b1x1 + b2x2 + ... + bnxn
In the linear model, the bi's represent the unknown parameters. The estimates for these unknown parameters are chosen so that, on average, the model provides a reasonable estimate of the outcome (for example, a person's income based on age and education). In other words, the fitted model should minimize the overall error between the linear model and the actual observations. Ordinary Least Squares (OLS) is a common technique to estimate the parameters.
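Written out, the least-squares criterion described above can be stated as follows. The observation index $i$, the sample size $m$, and the matrix notation ($X$ with a leading column of ones, $y$ the vector of observed outcomes) are assumptions introduced here for illustration.

```latex
\hat{b} = \arg\min_{b_0,\dots,b_n} \sum_{i=1}^{m}
  \Bigl( y_i - b_0 - \sum_{j=1}^{n} b_j x_{ij} \Bigr)^{2},
\qquad
\hat{b} = (X^{\top} X)^{-1} X^{\top} y
  \quad \text{when } X^{\top} X \text{ is invertible.}
```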
NEW QUESTION 78
Suppose A, B, and C are events. The probability of A given B, relative to P(·|C), is the same as the probability of A given B and C (relative to P). That is,
- A. P(A,B|C) / P(B|C) = P(B|A,C)
- B. P(A,B|C) / P(B|C) = P(C|B,C)
- C. P(A,B|C) / P(B|C) = P(A|B,C)
- D. P(A,B|C) / P(B|C) = P(A|C,B)
Answer: C
Explanation:
From the definition,
P(A,B|C) / P(B|C) = [P(A,B,C) / P(C)] / [P(B,C) / P(C)] = P(A,B,C) / P(B,C) = P(A|B,C)
This follows from the definition of conditional probability, applied twice: P(A,B) = P(A|B) P(B).
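The identity can be checked numerically on a small example. The sketch below uses a made-up joint distribution over three binary events; the weights and the helper `prob` are assumptions for illustration only.

```python
from fractions import Fraction

# Hypothetical joint distribution over three binary events (A, B, C).
# The weights are arbitrary positive integers, normalized to sum to 1.
weights = {
    (0, 0, 0): 1, (0, 0, 1): 2, (0, 1, 0): 3, (0, 1, 1): 4,
    (1, 0, 0): 5, (1, 0, 1): 6, (1, 1, 0): 7, (1, 1, 1): 8,
}
total = sum(weights.values())
joint = {key: Fraction(value, total) for key, value in weights.items()}


def prob(a=None, b=None, c=None):
    """Probability that the named events take the given values (None = any)."""
    return sum(
        p for (x, y, z), p in joint.items()
        if (a is None or x == a)
        and (b is None or y == b)
        and (c is None or z == c)
    )


p_ab_given_c = prob(a=1, b=1, c=1) / prob(c=1)   # P(A,B|C)
p_b_given_c = prob(b=1, c=1) / prob(c=1)         # P(B|C)
lhs = p_ab_given_c / p_b_given_c
rhs = prob(a=1, b=1, c=1) / prob(b=1, c=1)       # P(A|B,C)
print(lhs, rhs)
```

Exact fractions are used so that the two sides can be compared without floating-point rounding.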
NEW QUESTION 79
What is the probability that the total of two dice will be greater than 8, given that the first die is a 6?
- A. 2/6
- B. 2/3
- C. 1/3
- D. 1/6
Answer: B
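The answer can be verified by enumeration. This is a small sketch, assuming fair six-sided dice; the variable names are illustrative.

```python
from fractions import Fraction

# The first die is fixed at 6, so only the second die varies.
first = 6
favorable = [second for second in range(1, 7) if first + second > 8]
probability = Fraction(len(favorable), 6)
print(favorable, probability)
```

The second die must show 3, 4, 5, or 6, which gives 4 of 6 equally likely outcomes.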
NEW QUESTION 80
Under which circumstance do you need to implement N-fold cross-validation after creating a regression model?
- A. There is not enough data to create a test set.
- B. The data is unformatted.
- C. There are missing values in the data.
- D. There are categorical variables in the model.
Answer: A
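As a rough sketch of how N-fold cross-validation reuses a small dataset, the helper below splits sample indices into folds so that every observation is used for testing exactly once. The function name and the sample sizes are hypothetical.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs for N-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, n_folds=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

In practice the indices are usually shuffled first; that step is omitted here to keep the sketch deterministic.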
NEW QUESTION 81
A problem statement is given below:
Hospital records show that of patients suffering from a certain disease, 75% die of it. What is the probability that of 6 randomly selected patients, 4 will recover?
Which of the following models would you use to solve it?
- A. Poisson
- B. Any of the above
- C. Normal
- D. Binomial
Answer: D
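Although the question only asks for the model, the binomial calculation itself is short. A sketch, assuming patients recover independently with probability 1 - 0.75 = 0.25:

```python
from math import comb

n = 6              # patients selected
k = 4              # patients who recover
p_recover = 0.25   # 1 - 0.75, since 75% die of the disease

# Binomial probability of exactly k recoveries among n patients.
p_four_recover = comb(n, k) * p_recover**k * (1 - p_recover) ** (n - k)
print(p_four_recover)
```

The result is roughly 0.033, so four recoveries out of six would be unusual under these assumptions.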
NEW QUESTION 82
In which of the following scenarios can you use a linear regression model?
- A. Predicting sales of the text book based on the number of students in state
- B. Predicting demand of the goods and services based on the weather
- C. Predicting tumor size reduction based on input as number of radiation treatment
- D. Predicting Home Price based on the location and house area
Answer: A,B,C,D
Explanation:
You can use a linear regression model to predict a continuous output variable from the input variables. In every option listed in the question, the output can be predicted from the input variables.
Option A: Input: number of students; Output: sales quantity of the textbook
Option B: Input: weather conditions; Output: demand for goods and services
Option C: Input: number of radiation treatments; Output: tumor size reduction
Option D: Input: location and house area; Output: home price
NEW QUESTION 83
Projecting a multi-dimensional dataset onto which vector yields the greatest variance?
- A. second principal component
- B. second eigenvector
- C. not enough information given to answer
- D. first eigenvector
- E. first principal component
Answer: E
Explanation:
The method based on principal component analysis (PCA) evaluates the features according to the projection of the largest eigenvector of the correlation matrix on the initial dimensions, whereas the method based on Fisher's linear discriminant analysis evaluates them according to the magnitude of the components of the discriminant vector.
The first principal component corresponds to the greatest variance in the data, by definition. If we project the data onto the first principal component line, the data is more spread out (higher variance) than if projected onto any other line, including other principal components.
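The claim about the first principal component can be checked on a toy dataset. The sketch below assumes five made-up 2-D points and uses the closed-form angle of the leading eigenvector of a symmetric 2x2 covariance matrix, then compares the projected variance against other directions.

```python
import math

# Hypothetical 2-D data, roughly spread along the line y = x.
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Entries of the sample covariance matrix.
sxx = sum(x * x for x, _ in centered) / (n - 1)
syy = sum(y * y for _, y in centered) / (n - 1)
sxy = sum(x * y for x, y in centered) / (n - 1)

# Angle of the leading eigenvector of a symmetric 2x2 matrix.
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
first_pc = (math.cos(theta), math.sin(theta))


def projected_variance(direction):
    """Sample variance of the centered data projected onto a unit vector."""
    dx, dy = direction
    scores = [x * dx + y * dy for x, y in centered]
    return sum(s * s for s in scores) / (n - 1)


var_first = projected_variance(first_pc)
# Unit vectors at one-degree steps around the half circle, for comparison.
var_others = [
    projected_variance((math.cos(math.radians(d)), math.sin(math.radians(d))))
    for d in range(180)
]
print(var_first, max(var_others))
```

No sampled direction exceeds the variance along `first_pc`, which is what the definition of the first principal component requires.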
NEW QUESTION 84
You are working on a Data Science project, and during the project you have been given the responsibility of interviewing all the project stakeholders. Which phase of the project are you in?
- A. Operationalize the models
- B. Data Preparations
- C. Creating Models
- D. Creating visuals from the outcome
- E. Executing Models
- F. Discovery
Answer: F
Explanation:
During the discovery phase you interview all the project stakeholders, because they have considerable knowledge of the problem domain you will be working in. By also interviewing the project sponsors, you learn what is expected once the project is complete. You therefore record all the expectations for the project and draw on the stakeholders' domain expertise.
NEW QUESTION 85
RMSE is a useful metric for evaluating which types of models?
- A. All of the above
- B. Linear regression
- C. Naive Bayes classifier
- D. Logistic regression
Answer: B
Explanation:
Error calculation allows you to see how well a machine learning method is performing. One way of determining this performance is to calculate a numerical error. This number is sometimes a percentage, but it can also be a score or a distance. The goal is usually to minimize an error percentage or distance; however, the goal may instead be to minimize or maximize a score. Encog supports the following error calculation methods:
- Sum of Squares Error (ESS)
- Root Mean Square Error (RMS)
- Mean Square Error (MSE) (default)
- SOM Error (Euclidean Distance Error)
RMSE measures error of a predicted numeric value, and so applies to contexts like regression and some recommender system techniques, which rely on predicting a numeric value. It is not relevant to classification techniques like logistic regression and Naive Bayes, which predict categorical values.
It is also not relevant to unsupervised techniques like clustering.
The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the differences between values predicted by a model or an estimator and the values actually observed. Basically, the RMSD represents the sample standard deviation of the differences between predicted values and observed values.
These individual differences are called residuals when the calculations are performed over the data sample that was used for estimation, and are called prediction errors when computed out-of-sample. The RMSD serves to aggregate the magnitudes of the errors in predictions for various times into a single measure of predictive power. RMSD is a good measure of accuracy, but only to compare forecasting errors of different models for a particular variable and not between variables, as it is scale-dependent.
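As a concrete illustration, RMSE can be computed in a few lines. The function name and the sample predictions below are made up for this sketch.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length numeric sequences."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predictions from a regression model and the observed values.
predicted = [2.5, 0.0, 2.0, 8.0]
observed = [3.0, -0.5, 2.0, 7.0]
error = rmse(predicted, observed)
print(error)
```

Because the result is in the same units as the target variable, it is only comparable across models that predict the same variable.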
NEW QUESTION 86
Which of the following statements are true with regard to the linear regression model?
- A. Ordinary Least Square is a sum of the squared individual distances between each point and the fitted line of the regression model.
- B. Ordinary Least Square can be used to estimate the parameters in a linear model.
- C. A linear model tries to find multiple lines which can approximate the relationship between the outcome and input variables.
- D. Ordinary Least Square is a sum of the individual distances between each point and the fitted line of the regression model.
Answer: A,B
Explanation:
A linear regression model is represented by the equation below:
y = B(0) + B(1) x
where B(0) is the intercept and B(1) is the slope. As B(0) and B(1) change, the fitted line shifts accordingly on the plot. The purpose of the Ordinary Least Squares method is to estimate the parameters B(0) and B(1).
Its criterion is the sum of the squared distances between the observed points and the fitted line: ordinary least squares (OLS) regression minimizes the sum of the squared residuals. A model fits the data well if the differences between the observed values and the model's predicted values are small and unbiased.
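The following sketch fits a single-predictor model with the standard closed-form OLS estimates and shows that moving either parameter away from the fitted value increases the sum of squared residuals. The data points and function names are illustrative assumptions.

```python
def fit_simple_ols(xs, ys):
    """Closed-form OLS estimates (intercept, slope) for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
shifted = sum_squared_residuals(xs, ys, b0 + 0.1, b1)   # move the intercept
tilted = sum_squared_residuals(xs, ys, b0, b1 + 0.1)    # move the slope
print(b0, b1, best, shifted, tilted)
```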
NEW QUESTION 87
You are doing advanced analytics for a medical application using regression. You have two input variables, weight and height, which are very important and cannot be ignored, and they are also highly correlated. What is the best solution?
- A. You will take square of the height.
- B. You will take square root of weight
- C. You would consider using BMI (Body Mass Index)
- D. You will take cube root of height
Answer: C
Explanation:
If multiple variables are highly correlated, it is better either to use only the one that correlates more strongly with the outcome (which is not among the given options) or to introduce a new variable that is a function of both. In this case that could be BMI (Body Mass Index), because it is a function of both weight and height, as per the formula below.
BMI = Weight / (Height * Height)
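A minimal sketch of replacing the two correlated inputs with a single derived feature follows. The units (kilograms and metres) and the patient values are assumptions; the original formula does not state units.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight divided by height squared (assumed kg and m)."""
    return weight_kg / (height_m * height_m)


# Hypothetical (weight, height) pairs for three patients.
patients = [(70.0, 1.75), (85.0, 1.80), (54.0, 1.60)]

# One BMI value per patient replaces the two correlated columns.
bmi_feature = [bmi(weight, height) for weight, height in patients]
print(bmi_feature)
```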
NEW QUESTION 88
What is the best way to evaluate the quality of the model found by an unsupervised algorithm like k-means clustering, given metrics for the cost of the clustering (how well it fits the data) and its stability (how similar the clusters are across multiple runs over the same data)?
- A. The most stable clustering
- B. The lowest cost clustering subject to a stability constraint
- C. The lowest cost clustering
- D. The most stable clustering subject to a minimal cost constraint
Answer: B
Explanation:
There is a tradeoff between cost and stability in unsupervised learning. The more tightly you fit the data, the less stable the model will be, and vice versa. The idea is to find a good balance with more weight given to the cost. Typically a good approach is to set a stability threshold and select the model that achieves the lowest cost above the stability threshold.
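The selection rule described above can be sketched as a filter followed by a minimum. The candidate clusterings, their cost and stability values, and the threshold are all made up for illustration.

```python
# Hypothetical results for clusterings with different numbers of clusters.
# "cost" is lower-is-better; "stability" is higher-is-better.
candidates = [
    {"k": 2, "cost": 120.0, "stability": 0.95},
    {"k": 3, "cost": 80.0, "stability": 0.90},
    {"k": 5, "cost": 55.0, "stability": 0.82},
    {"k": 8, "cost": 40.0, "stability": 0.60},
]
stability_threshold = 0.80  # assumed minimum acceptable stability

# Keep only clusterings that are stable enough, then take the cheapest.
stable = [c for c in candidates if c["stability"] >= stability_threshold]
best = min(stable, key=lambda c: c["cost"])
print(best)
```

Here the clustering with the lowest overall cost (k = 8) is rejected because it falls below the stability threshold.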
NEW QUESTION 89
You are building a text classification model to determine whether a book written by HadoopExam Learning Resources is about Hadoop or cloud computing. To cut down the size of the feature space, you use the mutual information of each word with the label (hadoop or cloud) to select the 1000 best features as input to a Naive Bayes model. When you compare the performance of a model built with the 250 best features to a model built with the 1000 best features, you notice that the model with only 250 features performs slightly better on your test data.
What would help you choose better features for your model?
- A. Include least mutual information with other selected features as a feature selection criterion
- B. Decrease the size of our training data
- C. Include the number of times each of the words appears in the book in your model
- D. Evaluate a model that only includes the top 100 words
Answer: A
Explanation:
Correlation measures the linear relationship (Pearson's correlation) or monotonic relationship (Spearman's correlation) between two variables, X and Y.
Mutual information is more general and measures the reduction of uncertainty in Y after observing X.
It is the KL divergence between the joint density and the product of the individual densities, so MI can measure non-monotonic relationships and other more complicated relationships. Mutual information is a quantification of the dependency between random variables. It is sometimes contrasted with linear correlation, since mutual information captures nonlinear dependence.
Features with high mutual information with the predicted value are good. However a feature may have high mutual information because it is highly correlated with another feature that has already been selected.
Choosing another feature with somewhat less mutual information with the predicted value, but low mutual information with other selected features, may be more beneficial. Hence it may help to also prefer features that are less redundant with other selected features.
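One way to act on this idea is a greedy rule that scores each candidate by its mutual information with the label minus its average mutual information with the features already chosen. This is only one of several possible criteria, and the word features, labels, and function names below are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def select_features(features, labels, k):
    """Greedily pick k features by relevance minus mean redundancy."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], labels)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s])
                for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: 1 = word present in the passage, label 1 = "hadoop".
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "hdfs":      [1, 1, 1, 0, 0, 0, 0, 0],
    "mapreduce": [1, 1, 1, 0, 0, 0, 0, 0],  # duplicates "hdfs"
    "cluster":   [1, 0, 1, 1, 0, 1, 0, 0],
}
chosen = select_features(features, labels, k=2)
print(chosen)
```

Ranking by relevance alone would pick both `hdfs` and `mapreduce`, even though the second adds no new information; the redundancy term makes the less relevant but more independent `cluster` the second choice.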
NEW QUESTION 90
What are the advantages of feature hashing?
- A. One less pass through the training data
- B. Requires less memory
- C. Easily reverse engineer vectors to determine which original feature mapped to a vector location
Answer: A,B
Explanation:
SGD-based classifiers avoid the need to predetermine vector size by simply picking a reasonable size and shoehorning the training data into vectors of that size. This approach is known as feature hashing. The shoehorning is done by picking one or more locations using a hash of the variable name for continuous variables, or a hash of the variable name and the category name or word for categorical, text-like, or word-like data.
This hashed feature approach has the distinct advantage of requiring less memory and one less pass through the training data, but it can make it much harder to reverse engineer vectors to determine which original feature mapped to a vector location. This is because multiple features may hash to the same location. With large vectors or with multiple locations per feature, this isn't a problem for accuracy but it can make it hard to understand what a classifier is doing.
An additional benefit of feature hashing is that the unknown and unbounded vocabularies typical of word-like variables aren't a problem.
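The hashing scheme described above can be sketched as follows. The vector size, the use of MD5 as a stable hash, the `name=category` key format, and the sample features are all assumptions made for this illustration.

```python
import hashlib


def hashed_index(key, vector_size):
    """Map a feature key to a vector slot using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % vector_size


def hash_features(continuous, categorical, vector_size=16):
    """Encode features into a fixed-size vector without a vocabulary pass."""
    vector = [0.0] * vector_size
    # Continuous variables: hash the variable name, add the value.
    for name, value in continuous.items():
        vector[hashed_index(name, vector_size)] += value
    # Categorical or word-like variables: hash name plus category, add 1.
    for name, category in categorical.items():
        vector[hashed_index(f"{name}={category}", vector_size)] += 1.0
    return vector


vector = hash_features(
    continuous={"age": 42.0},
    categorical={"topic": "hadoop", "word": "cluster"},
)
print(vector)
```

Because different keys may land in the same slot, the vector alone does not reveal which original feature produced a given value, which is the reverse-engineering difficulty mentioned above.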
NEW QUESTION 91
......
Databricks Databricks-Certified-Professional-Data-Scientist Exam Syllabus Topics:
| Topic | Details |
|---|---|
| Topic 1 | |
| Topic 2 | |
| Topic 3 | |
New Databricks-Certified-Professional-Data-Scientist exam dumps Use Updated Databricks Exam: https://pdfvce.trainingdumps.com/Databricks-Certified-Professional-Data-Scientist-valid-vce-dumps.html

