Data Analyst Interview Questions

A comprehensive guide covering SQL, Python/R, Statistics, Data Visualization, Excel, Data Cleaning, Business Domain Knowledge, and Machine Learning basics for Data Analyst roles.

Total Questions:225
Difficulty Levels:
BeginnerIntermediateAdvanced
0%

Overall Progress

0/225

1.What is the difference between WHERE and HAVING clauses?

2.Explain the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.

3.What are window functions and when would you use them?

4.How do you find duplicate records in a table?

5.What is the difference between DELETE, TRUNCATE, and DROP?

6.Explain primary key vs foreign key.

7.What are indexes and why are they important?

8.How do you optimize a slow-running SQL query?

9.What is a subquery and when would you use one?

10.Explain GROUP BY and its use cases.

11.What is the difference between UNION and UNION ALL?

12.How do you handle NULL values in SQL?

13.What are CTEs (Common Table Expressions)?

14.Explain the RANK, DENSE_RANK, and ROW_NUMBER functions.

15.How do you find the second highest salary in a table?

16.What is normalization and denormalization?

17.Explain the different types of keys in SQL (candidate, composite, surrogate).

18.What is a self-join and when would you use it?

19.How do you calculate running totals in SQL?

20.What is the difference between correlated and non-correlated subqueries?

21.Explain CASE statements with examples.

22.What are aggregate functions? Name some common ones.

23.How do you use PARTITION BY in window functions?

24.What is the difference between clustered and non-clustered indexes?

25.How do you find the Nth highest value in a table?

26.What are stored procedures and when would you use them?

27.Explain the COALESCE function.

28.How do you perform date calculations in SQL?

29.What is the difference between CHAR and VARCHAR?

30.How do you pivot and unpivot data in SQL?

31.What Python libraries do you use for data analysis?

32.Explain the difference between a list and a tuple.

33.What is pandas and why is it useful?

34.How do you handle missing data in pandas?

35.What is the difference between loc and iloc?

36.Explain list comprehension with an example.

37.What are lambda functions?

38.How do you merge or join dataframes in pandas?

39.What is NumPy and when would you use it?

40.How do you read CSV and Excel files in Python?

41.What is the difference between apply, map, and applymap in pandas?

42.How do you group data in pandas (groupby)?

43.What are dictionaries in Python and when do you use them?

44.Explain broadcasting in NumPy.

45.How do you handle datetime data in pandas?

46.What is the difference between shallow copy and deep copy?

47.How do you sort data in pandas?

48.What are generators in Python?

49.How do you filter data in pandas?

50.What is matplotlib and seaborn?

51.How do you create custom functions in Python?

52.What is the difference between Series and DataFrame?

53.How do you handle categorical data in pandas?

54.What are pandas pivot tables?

55.How do you optimize pandas code for large datasets?

56.What is the difference between mean, median, and mode?

57.Explain standard deviation and variance.

58.What is a p-value and how do you interpret it?

59.What is the difference between correlation and causation?

60.Explain Type I and Type II errors.

61.What is a confidence interval?

62.What is the Central Limit Theorem?

63.Explain hypothesis testing.

64.What is the difference between population and sample?

65.What are outliers and how do you detect them?

66.Explain normal distribution.

67.What is statistical significance?

68.What is the difference between parametric and non-parametric tests?

69.Explain regression analysis.

70.What is the difference between descriptive and inferential statistics?

71.What is the difference between linear and logistic regression?

72.Explain R-squared and adjusted R-squared.

73.What is the difference between one-tailed and two-tailed tests?

74.What is probability distribution?

75.Explain Bayes' Theorem.

76.What is sampling and what are different sampling methods?

77.What is the law of large numbers?

78.Explain skewness and kurtosis.

79.What is multicollinearity and how do you detect it?

80.What is the difference between covariance and correlation?

81.What is a z-score?

82.Explain the t-test and when to use it.

83.What is chi-square test?

84.What is ANOVA?

85.What is time series analysis?

86.What tools do you use for data visualization?

87.When would you use a bar chart vs a line chart?

88.What is Tableau and have you used it?

89.Explain the importance of data visualization.

90.What makes a good dashboard?

91.How do you choose the right chart type?

92.What is Power BI?

93.What are some best practices for data visualization?

94.How do you visualize data in Python?

95.What is the difference between a histogram and a bar chart?

96.When would you use a scatter plot?

97.What is a heat map and when would you use it?

98.How do you handle too many categories in a visualization?

99.What is a box plot and what does it show?

100.How do you make your visualizations accessible and easy to understand?

101.What Excel functions do you use most frequently?

102.Explain VLOOKUP and its limitations.

103.What is the difference between VLOOKUP and INDEX-MATCH?

104.How do you create a pivot table?

105.What are some advanced Excel formulas you know?

106.How do you handle large datasets in Excel?

107.What is Power Query?

108.Explain conditional formatting.

109.How do you remove duplicates in Excel?

110.What is the difference between absolute and relative cell references?

111.What are array formulas in Excel?

112.How do you use SUMIF, SUMIFS, COUNTIF, COUNTIFS?

113.What is Power Pivot?

114.How do you create dynamic charts in Excel?

115.What are Excel macros and VBA?

116.What is data cleaning and why is it important?

117.How do you handle missing values in a dataset?

118.What is data normalization?

119.How do you detect and handle outliers?

120.What is ETL and have you worked with it?

121.How do you deal with inconsistent data formats?

122.What is data validation?

123.How do you handle duplicate records?

124.What is data transformation?

125.Explain the 80/20 rule in data cleaning.

126.What is feature engineering?

127.How do you handle imbalanced datasets?

128.What is data profiling?

129.How do you validate your cleaned data?

130.What is data standardization vs normalization?

131.How do you translate business requirements into analytical tasks?

132.What KPIs have you worked with?

133.How do you communicate findings to non-technical stakeholders?

134.What is A/B testing and when would you use it?

135.How do you measure customer churn?

136.What is customer segmentation?

137.How do you calculate ROI?

138.What business metrics are most important to track?

139.How do you prioritize analytical projects?

140.Give an example of how your analysis impacted business decisions.

141.What is funnel analysis?

142.How do you measure customer lifetime value (CLV)?

143.What is cohort analysis?

144.How do you build a business case for your recommendations?

145.What is RFM analysis?

146.How do you measure marketing campaign effectiveness?

147.What is the difference between leading and lagging indicators?

148.How do you handle stakeholder disagreements about data interpretation?

149.What is revenue forecasting?

150.How do you measure product performance?

151.What is the data analysis process you follow?

152.What is the difference between structured and unstructured data?

153.What is data warehousing?

154.Explain OLAP vs OLTP.

155.What is big data and have you worked with it?

156.What is data modeling?

157.What is the difference between quantitative and qualitative data?

158.What are data pipelines?

159.What is data governance?

160.Explain dimensional modeling.

161.What is the star schema and snowflake schema?

162.What is data quality and how do you measure it?

163.What is the difference between batch processing and real-time processing?

164.What are fact and dimension tables?

165.What is data lineage?

166.What is the difference between data lake and data warehouse?

167.What is master data management?

168.What is metadata?

169.What are the challenges of working with big data?

170.What is data migration?

171.What is the difference between supervised and unsupervised learning?

172.What is overfitting and underfitting?

173.What is cross-validation?

174.Explain the bias-variance tradeoff.

175.What is feature selection and why is it important?

176.What is clustering and name some clustering algorithms.

177.What is classification vs regression?

178.What is a confusion matrix?

179.What is precision vs recall?

180.What is k-means clustering?

181.What BI tools have you worked with?

182.Have you used Google Analytics? Explain key metrics.

183.What is Git and version control?

184.What cloud platforms have you used (AWS, Azure, GCP)?

185.What is Jupyter Notebook?

186.Have you worked with API integrations?

187.What is Apache Spark?

188.What is Hadoop?

189.What database management systems have you used?

190.What is the difference between SQL and NoSQL databases?

191.How would you analyze a sudden drop in user engagement?

192.A stakeholder questions your data findings. How do you respond?

193.You're given a dataset with 80% missing values. What do you do?

194.How would you design a dashboard for executive leadership?

195.Your SQL query is running too slowly. How do you troubleshoot?

196.Two data sources show conflicting numbers. How do you resolve this?

197.You need to present complex findings in 5 minutes. What's your approach?

198.How would you analyze the success of a new product launch?

199.You discover an error in a report you sent last week. What do you do?

200.How would you identify factors driving customer churn?

201.You're asked to analyze data you don't have access to. How do you proceed?

202.How would you measure the impact of a price change?

203.Your analysis shows a result opposite to what the team expected. How do you handle it?

204.How would you automate a repetitive reporting task?

205.You have one week to learn a new tool for a project. How do you approach it?

206.How would you validate data from a third-party vendor?

207.Your team disagrees on which metric to focus on. How do you decide?

208.How would you analyze seasonal trends in sales data?

209.You're asked to reduce report generation time by 50%. What's your strategy?

210.How would you handle a request to manipulate data to show desired results?

211.Describe a challenging data project you worked on.

212.How do you ensure data accuracy?

213.Tell me about a time you found an insight that changed business strategy.

214.How do you handle tight deadlines?

215.What do you do when you don't know the answer to a data question?

216.How do you stay updated with data analysis trends?

217.Describe your experience working with cross-functional teams.

218.How do you handle conflicting requirements from stakeholders?

219.What's your approach to learning new tools or technologies?

220.Why do you want to be a Data Analyst?

221.Tell me about a time you made a mistake in your analysis.

222.How do you manage multiple projects simultaneously?

223.Describe a time you had to explain technical concepts to a non-technical audience.

224.What's the most complex dataset you've worked with?

225.How do you handle feedback and criticism on your work?