MINI CASE: CAR RECALL DATA
UCL School of Management
MSIN0116 Business Strategy and Analytics
INSTRUCTIONS: The mini case is to be completed and discussed during Lecture 2.
Every year, a large number of cars are recalled by automobile manufacturers for safety reasons. In the U.S., all
car recall data is recorded and stored by the National Highway Traffic Safety Administration (NHTSA) of the U.S.
Department of Transportation. This dataset is available to the public and contains all NHTSA safety-related defect
and compliance campaigns since 1967. The data is in text format with 89,431 records. All records are
tab-delimited, and all dates are in YYYYMMDD format. The data includes 24 variables, listed in the table below.
As a data analyst, you are interested in the car recalls related to Ford Focus and Honda Accord. To analyse the
data, please follow the steps below.
1. Download the car recall data from Moodle.
2. Convert the text file to a data file that you can analyse (e.g., in Stata, SAS, Excel, SPSS or Python).
3. For the Ford Focus and the Honda Accord respectively, find out how many records in the data set relate
to each model.
4. For the Ford Focus and the Honda Accord respectively, how many of the recalls were initiated by the
manufacturer (MFR), the Office of Vehicle Safety Compliance (OVSC) or the Office of Defects Investigation
(ODI)? Tabulate the results, showing the frequency and percentage of each.
5. For the Ford Focus and the Honda Accord respectively, draw a bar chart showing the number of potentially
affected cars for each model year. (Hint: use the variables “YEARTXT” and “POTAFF”.)
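Steps 2 to 5 above can be sketched in Python, one of the tools listed in step 2, using only the standard library. This is a minimal sketch under stated assumptions. It assumes the raw file has no header row, so column names are supplied by hand. It also assumes the column names MAKETXT, MODELTXT, CAMPNO, INFLUENCED_BY and ODATE, taken from NHTSA's public field description of the recall file. Only YEARTXT and POTAFF are named in this handout, so check every name against the variable table before use. The campaign codes and rows in the example are made up, and the function names are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime


def parse_yyyymmdd(text):
    """Convert a YYYYMMDD string to a date; return None for blanks or invalid values."""
    try:
        return datetime.strptime(text.strip(), "%Y%m%d").date()
    except ValueError:
        return None


def load_recalls(handle, fieldnames, date_fields=()):
    """Step 2: read TAB-delimited records (assumed to have no header row) into dicts."""
    # QUOTE_NONE stops stray quote characters in free-text fields from merging records.
    reader = csv.DictReader(handle, fieldnames=fieldnames,
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        for field in date_fields:
            row[field] = parse_yyyymmdd(row.get(field) or "")
        rows.append(row)
    return rows


def select_model(rows, make, model):
    """Step 3: rows for one make and model, ignoring case and surrounding spaces."""
    return [r for r in rows
            if r["MAKETXT"].strip().upper() == make.upper()
            and r["MODELTXT"].strip().upper() == model.upper()]


def frequency_table(values):
    """Step 4: (category, frequency, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]


def affected_by_year(rows):
    """Step 5: total POTAFF per YEARTXT, skipping blank or non-numeric POTAFF."""
    totals = defaultdict(int)
    for r in rows:
        potaff = r["POTAFF"].strip()
        if potaff.isdigit():
            totals[r["YEARTXT"].strip()] += int(potaff)
    return dict(sorted(totals.items()))


# Made-up rows using a hypothetical subset of columns (the real file has 24).
FIELDS = ["CAMPNO", "MAKETXT", "MODELTXT", "YEARTXT", "POTAFF", "INFLUENCED_BY", "ODATE"]
raw = io.StringIO(
    "C1\tFORD\tFOCUS\t2000\t1000\tMFR\t20000315\n"
    "C1\tFORD\tFOCUS\t2001\t500\tMFR\t20000315\n"
    "C2\tFORD\tFOCUS\t2001\t\tODI\t20010620\n"
    "C3\tHONDA\tACCORD\t1998\t2000\tMFR\t19981102\n"
)
rows = load_recalls(raw, FIELDS, date_fields=["ODATE"])
for make, model in [("FORD", "FOCUS"), ("HONDA", "ACCORD")]:
    subset = select_model(rows, make, model)
    n_campaigns = len({r["CAMPNO"] for r in subset})
    print(make, model, "records:", len(subset), "campaigns:", n_campaigns)
    print("  initiated by:", frequency_table(r["INFLUENCED_BY"] for r in subset))
    print("  affected by model year:", affected_by_year(subset))
```

**Using this with the real file.** Open it with `open(path, newline="")` and pass the 24 variable names from the table. An explicit `encoding=` argument may also be needed.

**Records versus campaigns.** Each record appears to describe one campaign for one make, model and model year. The step 3 record count can therefore exceed the number of distinct campaigns, which is why the sketch prints both.

**Checking POTAFF.** If POTAFF turns out to be a campaign-wide total repeated on every record of that campaign, summing it by year would double-count. Inspect a few campaigns before relying on the totals.

**Plotting.** The step 5 totals can be passed to matplotlib's `plt.bar`.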
In 2016, Vauxhall decided to recall all its Zafira B models in the UK, involving 234,938 cars manufactured from
2009 to 2014. The root cause of this large recall was that all affected cars used the same faulty thermal
fuse, which could cause a fire (see footnote 2). In fact, sharing the same component across different products is
a common practice in the manufacturing industry. However, when a shared component fails, firms may have to recall a
large number of products, resulting in very high costs and damage to their public image.
6. Use the car recall data to verify that component sharing indeed exists. (Hint: you can create a new
variable called “sharing”, which counts how many different car models use the same defective
component, then draw a histogram of “sharing”. You can also study how many cars are recalled for
each defective component.)
7. Briefly discuss the advantages and disadvantages of component sharing in the context of product
recalls (word limit: 300). You can make use of the references listed below.
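The “sharing” variable in the hint to step 6 can be built in several ways. The sketch below is one possible operationalisation, an assumption rather than the handout's prescribed definition. It identifies each defective component by a key column, COMPNAME by default. For each component it counts the distinct make/model pairs and totals POTAFF. It also tallies how many components are shared by 1, 2, 3, … models, which is the data behind the histogram.

The column names COMPNAME, CAMPNO, MAKETXT and MODELTXT are assumed from NHTSA's field description and should be checked against the variable table. The rows and function names are hypothetical.

```python
from collections import Counter, defaultdict


def sharing_by_component(rows, key_field="COMPNAME"):
    """Return {component: (number of distinct make/model pairs, total POTAFF)}."""
    models = defaultdict(set)
    affected = defaultdict(int)
    for r in rows:
        component = r[key_field].strip()
        models[component].add((r["MAKETXT"].strip().upper(), r["MODELTXT"].strip().upper()))
        potaff = r["POTAFF"].strip()
        if potaff.isdigit():  # skip blank or non-numeric POTAFF
            affected[component] += int(potaff)
    return {c: (len(models[c]), affected[c]) for c in models}


def sharing_histogram(sharing):
    """How many components are shared by 1, 2, 3, ... models (histogram data)."""
    return dict(sorted(Counter(n for n, _ in sharing.values()).items()))


# Made-up rows for illustration.
rows = [
    {"COMPNAME": "FUEL SYSTEM", "MAKETXT": "FORD", "MODELTXT": "FOCUS", "POTAFF": "100"},
    {"COMPNAME": "FUEL SYSTEM", "MAKETXT": "FORD", "MODELTXT": "FOCUS", "POTAFF": "50"},
    {"COMPNAME": "FUEL SYSTEM", "MAKETXT": "MAZDA", "MODELTXT": "3", "POTAFF": "80"},
    {"COMPNAME": "AIR BAGS", "MAKETXT": "HONDA", "MODELTXT": "ACCORD", "POTAFF": "200"},
    {"COMPNAME": "BRAKES", "MAKETXT": "HONDA", "MODELTXT": "CIVIC", "POTAFF": ""},
]
sharing = sharing_by_component(rows)
print(sharing)
print(sharing_histogram(sharing))
```

A long right tail in the histogram (components used by many models) would point to sharing. COMPNAME is likely a broad component category rather than a specific part. Grouping by recall campaign instead (`key_field="CAMPNO"`) is therefore a stricter alternative worth comparing.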
1 Based on the car recall data downloaded in July 2011 from the National Highway Traffic Safety Administration (NHTSA),
Department of Transportation, USA. This mini case can only be used in the class of MSIN0116 Business Strategy and Analytics
at UCL. All rights reserved. Dr. Yufei Huang.
Davidson III, W.N. and Worrell, D.L., 1992. Research notes and communications: The effect of product recall
announcements on shareholder wealth. Strategic Management Journal, 13(6), pp.467-473.
Fisher, M., Ramdas, K. and Ulrich, K., 1999. Component sharing in the management of product variety: A study of
automotive braking systems. Management Science, 45(3), pp.297-315.
Haunschild, P.R. and Rhee, M., 2004. The role of volition in organizational learning: The case of automotive product
recalls. Management Science, 50(11), pp.1545-1560.
Oshri, I. and Newell, S., 2005. Component sharing in complex products and systems: challenges, solutions, and
practical implications. IEEE Transactions on Engineering Management, 52(4), pp.509-521.
Ramdas, K., Fisher, M. and Ulrich, K., 2003. Managing variety for assembled products: Modeling component systems
sharing. Manufacturing & Service Operations Management, 5(2), pp.142-156.
Ramdas, K. and Randall, T., 2008. Does component sharing help or hurt reliability? An empirical study in the
automotive industry. Management Science, 54(5), pp.922-938.
Rhee, M. and Haunschild, P.R., 2006. The liability of good reputation: A study of product recalls in the US automobile
industry. Organization Science, 17(1), pp.101-117.
In 2016, the Summer Olympic Games were held in Rio. We are interested in the
performance of the countries at these Games. Please investigate the following
questions. The data file, “Rio Olympic Medal Data.xlsx”, can be downloaded.
1. Researchers argue that a country’s GDP can affect its total number of
medals and its number of gold medals. Please use a regression
model to study the effect of a 1% change in GDP on the number of
total medals and gold medals, respectively. Report and explain your results.
2. Other researchers think that, besides GDP, a country’s Population
may also influence its performance in the 2016 Rio Olympic Games, so we
need to consider both GDP and Population in our regression. Please
study the influence of Population and of a 1% change in GDP on total
medals and gold medals, respectively. Report and explain your results.
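**Specification.** One common way to read “the effect of a 1% change in GDP” is a level-log model. This is an assumption about the intended specification; the handout does not state it:

Medals = b0 + b1·ln(GDP) + b2·Population + e

A 1% rise in GDP changes expected medals by b1·ln(1.01), which is approximately b1/100. Dropping the Population term gives the Q1 model, and the same fit applies to gold medals.

**The sketch.** The code below fits the model by ordinary least squares via the normal equations, using only the Python standard library. The function names and data are hypothetical. It returns point estimates only, so a statistics package (for example Stata's `regress`) is still needed for standard errors and significance tests.

```python
import math


def ols(X, y):
    """Least-squares coefficients b solving (X'X) b = X'y by Gauss-Jordan elimination.

    Each row of X must start with 1.0 for the intercept.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the regressors are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def medal_regression(gdp, medals, population=None):
    """Regress medals on ln(GDP), plus Population if given.

    Returns (coefficients, change in medals implied by a 1% rise in GDP).
    """
    if population is None:
        X = [[1.0, math.log(g)] for g in gdp]
    else:
        X = [[1.0, math.log(g), float(p)] for g, p in zip(gdp, population)]
    b = ols(X, medals)
    return b, b[1] * math.log(1.01)  # exact level-log effect, close to b[1] / 100


# Made-up data generated from medals = 2 + 4 ln(GDP) + 0.5 Population, for checking.
gdp = [1.0, 10.0, 100.0, 1000.0, 50.0]
pop = [5.0, 3.0, 8.0, 1.0, 2.0]
medals = [2 + 4 * math.log(g) + 0.5 * p for g, p in zip(gdp, pop)]
coefs, effect = medal_regression(gdp, medals, pop)
print([round(c, 6) for c in coefs], round(effect, 4))
```

GDP must be positive for the logarithm. Expressing Population in millions rather than persons keeps X'X better scaled for this simple solver.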
Courneya and Carron (1992) propose that a home advantage exists in sports
competitions, namely that the home team usually performs much better. Moreover,
for the Olympic Games, researchers argue that such a home advantage may even
last until the next Olympic Games.
3. Suppose that you are focusing on the performance of China
(2008 home team) and Team GB (2012 home team) in the Summer
Olympic Games from 1984 (Los Angeles) to 2016 (Rio). Based on the
historical data, for each of the two countries respectively, draw bar
charts of total medals, gold medals, and the country’s rankings based
on total medals and on gold medals, to visualise the two countries’
performance from 1984 to 2016. Can you observe the home advantage
and its lasting effect? Discuss your results.
(Hint: Olympic Games data can easily be found online, e.g.,
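For the ranking series in Q3, a country's rank in each Games can be computed from that year's medal table. The sketch below assumes standard competition ranking: tied countries share a rank, and the next rank is skipped.

Medal tables are commonly sorted by gold, then silver, then bronze. Passing (gold, silver, bronze) tuples as scores reproduces that order, because Python compares tuples element by element. The function name, country names and counts are made up for illustration.

```python
def competition_rank(scores):
    """Rank keys from highest to lowest score; tied scores share a rank ("1224" style)."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    prev_score, prev_rank = None, 0
    for position, (country, score) in enumerate(ordered, start=1):
        rank = prev_rank if score == prev_score else position
        ranks[country] = rank
        prev_score, prev_rank = score, rank
    return ranks


# Made-up medal table for one Games: (gold, silver, bronze) per country.
table = {
    "Country A": (10, 5, 3),
    "Country B": (4, 9, 9),
    "Country C": (4, 9, 2),
    "Country D": (1, 3, 14),
}
by_total = competition_rank({c: sum(m) for c, m in table.items()})
by_gold = competition_rank(table)  # tuples compare gold first, then silver, then bronze
print(by_total, by_gold)
```

Running this once per Games in the period gives the rank series for each country. Because a lower rank is better, inverting the vertical axis of the ranking bar chart may make it easier to read alongside the medal counts.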
Courneya, K.S. & Carron, A.V. (1992). The home advantage in sport competitions: A literature
review. Journal of Sport and Exercise Psychology, 14(1), 13-27.
Forrest, D., Sanz, I., & Tena, J. D. D. (2010). Forecasting national team medal totals at the
Summer Olympic Games. International Journal of Forecasting, 26(3), 576-588.
Shibli, S., Gratton, C., & Bingham, J. (2012). A forecast of the performance of Great Britain
and Northern Ireland in the London 2012 Olympic Games. Managing Leisure, 17(2-3), 274-
Online keyword auctions in search engines have become a billion-dollar business. As a data analyst,
you are interested in the online auction data. Please download the data and answer the following
questions. (Students are encouraged to use Stata to complete this mini case, especially Q4. Stata is
available via Desktop@UCL Anywhere. More information can be found here:
1. For the variable PRICE, report the descriptive statistics (i.e., mean, median, mode, range,
minimum and maximum).
2. Find out how many bids are placed manually and by an automatic bidding program. Tabulate
the results in a table with the frequency and percentage.
3. Count how many different ACCOUNT_IDs there are in the data. Each ACCOUNT_ID may bid on
multiple phrases. Find the ACCOUNT_ID that places bids on the most phrases. How many
different PHRASE_IDs does this ACCOUNT_ID bid on? What are they?
4. Count how many different PHRASE_IDs there are in the data. Draw a histogram of the
average price of each PHRASE_ID.
5. For PHRASE_ID=1, visualize the price trend over time for the bidders with ACCOUNT_ID 741
and 4265. What can you infer from the trends of the two bidders? (Hint: focus on
PHRASE_ID=1, draw a scatter plot of the bidding prices of bidders 741 and 4265 over time,
6. Read Example 1 in the reading material, “Stata-Kmeans Clustering”. Now, focus on
PHRASE_ID=1, follow the same steps as in Example 1, and perform a clustering analysis of the
variables PRICE and ACCOUNT_ID using the online auction data.
• Generate a summary table and draw a graph matrix for the two variables PRICE and ACCOUNT_ID.
• Perform clustering analysis and set the number of clusters to 3.
• Show a graph matrix for the clusters, i.e., label the data points using the cluster
• Interpret the graph matrix, i.e., why the data has clusters in the way as shown in the
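Q5 and Q6 can also be prototyped outside Stata. The sketch below assumes bids have already been parsed into (timestamp text, PHRASE_ID, ACCOUNT_ID, PRICE) tuples with integer IDs, following the data description at the end of this case. The function names and sample values are hypothetical.

- **`price_series` (Q5):** extracts each bidder's time-ordered prices for one phrase, ready for a scatter plot.
- **`kmeans` (Q6):** a plain Lloyd's-algorithm k-means on (PRICE, ACCOUNT_ID) points.

This `kmeans` is not Stata's `cluster kmeans`. It starts from a deterministic farthest-point initialisation chosen for this sketch, so cluster numbering, and possibly membership, can differ from Stata's output.

```python
import math
from datetime import datetime


def price_series(bids, phrase_id, account_ids):
    """Q5: time-ordered (datetime, price) points per account for one phrase.

    bids: iterable of (timestamp_text, phrase_id, account_id, price) tuples,
    with timestamps as MM/DD/YYYY HH:MM:SS.
    """
    series = {a: [] for a in account_ids}
    for ts, phrase, account, price in bids:
        if phrase == phrase_id and account in series:
            series[account].append((datetime.strptime(ts, "%m/%d/%Y %H:%M:%S"), price))
    for points in series.values():
        points.sort()
    return series


def kmeans(points, k, max_iter=100):
    """Q6: Lloyd's k-means on equal-length numeric tuples; returns (labels, centres)."""
    # Deterministic farthest-point start: the first point, then repeatedly the
    # point whose distance to its nearest chosen centre is largest.
    centres = [tuple(points[0])]
    while len(centres) < k:
        centres.append(tuple(max(points, key=lambda p: min(math.dist(p, c) for c in centres))))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        # Update step: each centre moves to the mean of its members (unchanged if empty).
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centres.append(tuple(sum(col) / len(members) for col in zip(*members))
                               if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


# Made-up bids: (timestamp, PHRASE_ID, ACCOUNT_ID, PRICE).
bids = [
    ("01/02/2002 10:15:00", 1, 741, 0.40),
    ("01/01/2002 09:00:00", 1, 741, 0.35),
    ("01/01/2002 09:15:00", 1, 4265, 0.36),
    ("01/01/2002 09:30:00", 2, 4265, 1.10),
]
series = price_series(bids, 1, [741, 4265])
points = [(price, account) for _, phrase, account, price in bids if phrase == 1]
labels, centres = kmeans(points, k=2)
print(series[741], labels)
```

**Plotting Q5.** The points in `series[741]` and `series[4265]` can be drawn with matplotlib's `plt.scatter`.

**Running Q6.** Use `kmeans(points, 3)` on all PHRASE_ID=1 bids.

**Scaling caveat.** ACCOUNT_ID values such as 741 and 4265 are likely much larger than dollar bid prices. Euclidean distance on raw (PRICE, ACCOUNT_ID) pairs is therefore driven mostly by ACCOUNT_ID, which is worth keeping in mind when interpreting the graph matrix.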
1 Based on the online keyword auction data from an anonymous search engine in 2002. This mini case can only be used in
the class of MSIN0116 Business Strategy and Analytics at UCL, 2019-2020. All rights reserved. Dr. Yufei Huang.
This dataset includes advertiser bid data in the following format:
FILE: ( LINE '\n' ) +
LINE: TIMESTAMP '\t' PHRASE_ID '\t' ACCOUNT_ID '\t' PRICE '\t' AUTO
TIMESTAMP: MM/DD/YYYY HH:MM:SS
AUTO: 0 or 1
For the AUTO field, 0 means that the bid was placed manually and 1 that the bid was placed by an automatic
bidding program. Bids are given in 15-minute increments. The original data set has 18 million records; only
the first 1 million records are provided in this excerpt of the data set. Price is denominated in US dollars.
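The sketch below reads the line format above and computes the quantities asked for in Q1 to Q4, using only the Python standard library. It assumes PHRASE_ID and ACCOUNT_ID are integers and that every line has exactly the five tab-separated fields listed. The function names and sample lines are hypothetical.

```python
import statistics
from collections import Counter, defaultdict
from datetime import datetime


def parse_bid_line(line):
    """Split one tab-separated LINE into typed fields."""
    ts, phrase, account, price, auto = line.rstrip("\r\n").split("\t")
    return {
        "TIMESTAMP": datetime.strptime(ts, "%m/%d/%Y %H:%M:%S"),
        "PHRASE_ID": int(phrase),     # assumed integer ID
        "ACCOUNT_ID": int(account),   # assumed integer ID
        "PRICE": float(price),        # US dollars
        "AUTO": int(auto),            # 0 = manual, 1 = automatic bidding program
    }


def summarise(bids):
    """Quantities for Q1 to Q4 from a list of parsed bids."""
    prices = [b["PRICE"] for b in bids]
    # Q1: descriptive statistics; multimode lists every most-frequent value.
    price_stats = {
        "mean": statistics.fmean(prices),
        "median": statistics.median(prices),
        "mode": statistics.multimode(prices),
        "min": min(prices),
        "max": max(prices),
        "range": max(prices) - min(prices),
    }
    # Q2: frequency and percentage of manual vs automatic bids.
    auto_counts = Counter(b["AUTO"] for b in bids)
    auto_table = {("Automatic" if code == 1 else "Manual"): (n, 100 * n / len(bids))
                  for code, n in sorted(auto_counts.items())}
    # Q3 and Q4: distinct phrases per account, prices per phrase.
    phrases_by_account = defaultdict(set)
    prices_by_phrase = defaultdict(list)
    for b in bids:
        phrases_by_account[b["ACCOUNT_ID"]].add(b["PHRASE_ID"])
        prices_by_phrase[b["PHRASE_ID"]].append(b["PRICE"])
    # Account bidding on the most distinct phrases; ties go to the smaller ID.
    top = min(phrases_by_account, key=lambda a: (-len(phrases_by_account[a]), a))
    return {
        "price_stats": price_stats,
        "auto_table": auto_table,
        "n_accounts": len(phrases_by_account),
        "top_account": (top, sorted(phrases_by_account[top])),
        "n_phrases": len(prices_by_phrase),
        "avg_price_by_phrase": {p: statistics.fmean(v) for p, v in prices_by_phrase.items()},
    }


# Made-up lines in the documented format, for illustration only.
lines = [
    "01/01/2002 09:00:00\t1\t741\t0.35\t0\n",
    "01/01/2002 09:15:00\t1\t4265\t0.40\t1\n",
    "01/01/2002 09:15:00\t2\t4265\t1.00\t1\n",
    "01/01/2002 09:30:00\t3\t4265\t0.40\t1\n",
]
bids = [parse_bid_line(line) for line in lines if line.strip()]
result = summarise(bids)
print(result["price_stats"], result["auto_table"], result["top_account"])
```

**Q4 histogram.** `result["avg_price_by_phrase"].values()` can be passed to matplotlib's `plt.hist`.

**Memory.** Holding all 1 million parsed lines as dicts is simple but memory-hungry. Streaming the file, or using Stata as the case suggests, may be more practical.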