Category Archives: Statistics

Statistical Analysis of Medical Screening

Source:, Mar 2020

Of the 950,000 folks who are not infected, nearly all (95%) get a clean bill of health, and of the 50,000 infected souls, nearly all (95%) get a red card. But along the way 2,500 infected people go undetected, while 47,500 uninfected get red-carded nonetheless.

Perhaps they ate a poppy seed bagel that morning, or the test was run late at night by a dog-tired technician. In any case, you will notice that half of all those getting flagged are not in fact infected.

Consequently, the media reports that the infection rate is 9.5% [(47,500+47,500)/1,000,000] rather than 5%, although what they really mean is that the test-positive rate is 9.5%.
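The arithmetic behind these figures can be checked with a short sketch. It assumes a population of 1,000,000 and reads the two 95% figures as the test's sensitivity and specificity; the variable names are illustrative and do not come from the original post.

```python
# Screening arithmetic for the example in the text.
# Assumptions: population of 1,000,000, 5% truly infected, and a test that is
# right 95% of the time for both infected and uninfected people.
population = 1_000_000
prevalence = 0.05    # share of the population that is truly infected
sensitivity = 0.95   # share of infected people who test positive
specificity = 0.95   # share of uninfected people who test negative

infected = round(population * prevalence)
uninfected = population - infected

true_positives = round(infected * sensitivity)
false_negatives = infected - true_positives
true_negatives = round(uninfected * specificity)
false_positives = uninfected - true_negatives

flagged = true_positives + false_positives
test_positive_rate = flagged / population
share_of_flagged_infected = true_positives / flagged

print(f"false negatives: {false_negatives:,}")
print(f"false positives: {false_positives:,}")
print(f"test-positive rate: {test_positive_rate:.1%}")
print(f"share of flagged who are infected: {share_of_flagged_infected:.0%}")
```

Under these assumptions only half of the flagged people are infected, even though the test is 95% accurate in each group, because the uninfected group is nineteen times larger than the infected one.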

This also affects estimates of the mortality rates. It’s not called the Dreaded Red Squamish for nothing.

But the denominator has been inflated by the false positives, so deaths will be divided by 95,000 rather than the [unknown] 50,000. There will be 47,500 “recoveries” of people who never actually had the disease. OTOH, some of the undetected 2,500 may also shuffle off the coil of mortality, but these will be assigned to collateral conditions (heart disease, asthma, etc.).
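A rough sketch of how the inflated denominator distorts the reported mortality rate follows. The fatality rate below is a purely hypothetical number chosen for illustration, since the post gives none. The sketch also assumes that only detected infections contribute counted deaths, as the paragraph above suggests.

```python
# HYPOTHETICAL: the post does not give a fatality rate; 2% is for illustration only.
HYPOTHETICAL_FATALITY = 0.02

detected_infected = 47_500   # infected people who were flagged
false_positives = 47_500     # uninfected people who were flagged
flagged = detected_infected + false_positives  # the inflated denominator

# Deaths among undetected infections are attributed to other conditions,
# so only deaths among detected infections are counted.
deaths_counted = detected_infected * HYPOTHETICAL_FATALITY

reported_rate = deaths_counted / flagged
understatement = reported_rate / HYPOTHETICAL_FATALITY

print(f"reported mortality rate: {reported_rate:.2%}")
print(f"reported rate as a fraction of the assumed rate: {understatement:.2f}")
```

Whatever fatality rate is assumed, the reported rate is scaled by 47,500 / 95,000, so in this example it comes out at half the assumed rate.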

10 Data-Driven Stories

Source: HBR blog, May 2014


  1. Past: reporting stories that use descriptive analytics to tell what happened.
  2. Present: stories most likely to involve some form of survey, that is, an analysis of what people or objects are currently up to.
  3. Future: predictions; they use … predictive analytics. They take data from the past … to create a statistical model, which is then used to predict the future.
    Quants create prediction stories all the time: about what customers are likely to buy, about how likely it is for an event to happen, about future economic conditions. These types of prediction stories always involve assumptions (notably that the future will be like the past in some key respects) and probability.


  1. What: What stories are like reporting stories—they simply tell what happened.
  2. Why: Why stories go into the underlying factors that caused the outcome.
  3. How: “How to address the issue” stories explore various ways to improve the situation identified in the what and the why stories.


  1. CSI projects—relatively small, ad hoc investigations to find out why something suboptimal was happening. 
  2. Eureka stories, which involve long, analytically-driven searches for a solution to a complex problem.


  1. a correlation story, in which you show that variables rose or fell at the same time
  2. a causation story, in which you argue that one variable caused the other

Difference between Analytics and Experiments

Source: HBR blog, May 2014

In their 2012 feature on big data, Andrew McAfee and Erik Brynjolfsson describe the opportunity and report that “companies in the top third of their industry in the use of data-driven decision making were, on average, 5% more productive and 6% more profitable than their competitors” even after accounting for several confounding factors.

One of the most important distinctions to make is between analytics and experiments.

The former provides data on what is happening in a business; the latter actively tests out different approaches with different consumer or employee segments and measures the difference in response.


Statistics Done Wrong (FREE online book)

Source: Statistics Done Wrong website, date indeterminate

Statistics Done Wrong is a guide to the most popular statistical errors and slip-ups committed by scientists every day, in the lab and in peer-reviewed journals. Many of the errors are prevalent in vast swathes of the published literature, casting doubt on the findings of thousands of papers. Statistics Done Wrong assumes no prior knowledge of statistics, so you can read it before your first statistics course or after thirty years of scientific practice.


Type I and Type II errors

Source: Flowing Data, May 2014

Type I errors, also known as false positives, occur when you see things that are not there. Type II errors, or false negatives, occur when you don’t see things that are there.
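The two error types can be spelled out as a small lookup over what a test reports versus what is actually true. This is a minimal sketch; the function name `classify_outcome` and its parameters are illustrative and do not come from the source.

```python
def classify_outcome(test_positive: bool, condition_present: bool) -> str:
    """Label a single test result against the ground truth."""
    if test_positive and not condition_present:
        # Saw something that is not there.
        return "Type I error (false positive)"
    if not test_positive and condition_present:
        # Missed something that is there.
        return "Type II error (false negative)"
    return "true positive" if test_positive else "true negative"


# Walk through all four combinations of test result and reality.
for test_positive in (True, False):
    for condition_present in (True, False):
        label = classify_outcome(test_positive, condition_present)
        print(f"test={test_positive!s:5} truth={condition_present!s:5} -> {label}")
```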

Global Internet Statistics (2014)

Source: Tech in Asia, Jan 2014

A new 180-page slideshow has been created by the WeAreSocial Singapore team, showing all the facts and stats we have so far on Earth’s 2.5 billion web users. Some Asia-related highlights:

  • There are now 1.86 billion active social network users around the world.
  • Across Asia, 635 million people have mobile data subscriptions so that they can go online on their phones.
  • Southeast Asia is the most mobile-centric area of the continent, with a 109 percent mobile penetration rate.
  • Southeast Asia and South Asia have internet penetration rates below the world average of 35 percent.
  • Boosted by China, East Asia’s internet penetration rate is above average, at 48 percent.
  • 3 of the top 10 social networks (by active usage) are messaging apps: WeChat with 272 million active users; WhatsApp with 400 million; and QQ with 816 million. China’s Tencent (HKG:0700) owns both QQ and WeChat.