Why Predictive Modeling Performs Worse than the Ancient Theory

Lulu Yan
6 min read · Apr 30, 2020
Comic source: FiveThirtyEight. “A Comic Strip Tour of the Wild World of Pandemic Modeling”. Image copyright belongs to the original author/publisher.

For a few months we have been talking about why mining existing COVID-19 data does not work. Recently I came across two articles that speak to the same issue from an easier-to-understand angle:

Covid-19 Modeling: Impact of Missing Data and Ignoring Key Features

What 5 Coronavirus Models Say the Next Month Will Look Like

and found an earlier one through search: Why predicting COVID-19 is like forecasting with a broken weather model

Drawing on my professional background as a former data scientist, and on my early career and academic training as a statistician, I see the core problem this way: there was a sudden data shift, and the underlying assumptions that those “predictive models” make about the population being modeled no longer hold true. Input data captured in the pre-pandemic era no longer represents conditions during the fast changes. It is just like the small print in most financial product sales: “past performance doesn’t guarantee future success”. Throughout my data science career across industries, GIGO, which stands for “garbage in, garbage out”, was the first thing I had to check about input data. The same goes for any underlying modeling assumptions: when they stop holding, the model drifts or decays accordingly.
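As a rough illustration of the kind of input check described above, here is a minimal sketch of the Population Stability Index (PSI), one common way to compare a reference sample with newer data. The data is synthetic, the function name `psi` and the bin count are my own choices for illustration, and the often-quoted rule of thumb (below 0.1 stable, above 0.25 a major shift) is a convention rather than a law.

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a new sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def shares(sample):
        counts = [0] * bins
        for x in sample:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Synthetic data only: a "pre-shift" reference, a fresh sample from the same
# distribution, and a sample whose mean has moved.
rng = random.Random(0)
pre = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.5, 1.0) for _ in range(5000)]

psi_same = psi(pre, same)        # small: inputs still look like the reference
psi_shifted = psi(pre, shifted)  # large: inputs no longer look like the reference
print(f"PSI same={psi_same:.3f}, shifted={psi_shifted:.3f}")
```

A check like this only says that the inputs have moved. It says nothing about whether the relationship between inputs and outcomes has also changed.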

In fact, the epidemic was initially understood in the United States and the Western world as clusters of pneumonia rather than a pandemic. No one regarded it as a potentially deadly pandemic that would expand to a global scale. This understanding was sensible at the beginning, but the past data did not predict what would happen: since the start of 2020, climatic conditions have not been conducive to lung health. This is the defect of the models established by the Western health system, including the mortality and infection base models.

After all, the illness appears not because people are sick, but because nature is sick. The more serious nature’s disease is, the more serious the sickness is. This can only be recognized by traditional Chinese medicine, under its concept of harmony between man and nature. Modern medicine cannot observe this point; the only thing that can accurately predict it is the “nuclear weapon” of traditional Chinese medicine: Yun Qi (the Five Yun Six Qi theory). If I were Trump, or the head of any Western country, I would have made the same decisions they did based on the previous data, regardless of who the leader is.

Weathermen in trouble over lies and inaccurate forecasts. Image copyright belongs to the original author/publisher.

We at WeCare Holistic have been educating partnering doctors and patient/consumer clients since January 21, when the first confirmed COVID-19 case in the US was reported. Back then some people laughed at us and mocked our over-reaction. (Now they regard those early warnings as “over-anticipation”, which turns out to be better than genuine “over-reaction”.) We were explaining a similar idea, namely why you can’t over-rely on data to date, and we referred folks with super linear thinking to some examples in “The Art of Critical Decision Making” where intuition from experts could win over incomplete data. That’s why we are glad to see posts from US writers getting attention, at long last!

Source: Cranky Uncle. “Critical thinking about COVID-19: downgraded death toll”. Image copyright belongs to the original author/publisher.

The underlying assumptions were violated in this case, as they are in literally most early-stage black swan events. But an unpopular theory from traditional medicine, Five Yun Six Qi, stands constant and has proved accurate time and again (from thousands of years ago to SARS to swine flu to COVID-19). It is a deep theory explaining the impact of climate on diseases (not just the four seasons or the 24 solar terms, but a much more detailed investigation of every 60-year as well as 180-year cycle). The exact time windows were consistent with what the theory says, even when the data was missing. It is just like in physics: a physicist knows the law, yet experimental data and results can vary, especially when the process is not complete. Thought experiments would not have happened by mining historical data alone.
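For readers unfamiliar with the 60-year cycle mentioned above, it is the sexagenary cycle, which pairs ten heavenly stems with twelve earthly branches. The sketch below only shows how a year maps to its stem-branch pair; the function name is my own, and it uses the common simplification that the Gregorian year stands in for the Chinese year (dates before the Chinese New Year actually belong to the previous year). It does not implement any part of the Five Yun Six Qi analysis itself.

```python
STEMS = ["Jia", "Yi", "Bing", "Ding", "Wu", "Ji", "Geng", "Xin", "Ren", "Gui"]
BRANCHES = ["Zi", "Chou", "Yin", "Mao", "Chen", "Si",
            "Wu", "Wei", "Shen", "You", "Xu", "Hai"]

def sexagenary_year(year):
    """Stem-branch name for a Gregorian year (simplified: ignores the
    offset between January 1 and the start of the Chinese year)."""
    offset = (year - 4) % 60  # year 4 CE was a Jia-Zi year, the start of a cycle
    return f"{STEMS[offset % 10]}-{BRANCHES[offset % 12]}"

for y in (2003, 2009, 2020):  # SARS, swine flu, COVID-19
    print(y, sexagenary_year(y))
```

Because 10 and 12 share a factor of 2, only 60 of the 120 possible pairs occur, which is why the cycle repeats every 60 years.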

Image Source: On the formation of taiji diagram from the perspective of twenty-four solar terms. Retrieved from https://doi.org/10.1016/j.sjbs.2016.09.006

Even with two teammates who have statistical and data science backgrounds, so far we have not used any public disease event data to do forecasting. Why? Part of the reason was stated earlier. To put it as an analogy: imagine you are a physicist who understands the fundamental laws of physics and you face two options for deriving results: A) using unrepresentative data (what we call “data drift” or “covariate shift”) and past models that do not offer a satisfactory level of interpretability in times of dramatic concept drift, or B) using a classic law of physics that has proven timeless over the past hundred years and that you already deeply understand. Which option would you choose? While the choice is a personal one, the results are likely to be better when the physicist opts for B, employing the timeless and time-tested theory to produce results.
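To make the risk in option A concrete, here is a toy sketch of concept drift using purely synthetic numbers (the slopes 2.0 and 5.0 and the noise level are arbitrary assumptions, not estimates from any disease data). A model fitted on the old regime keeps predicting with the old relationship after the relationship itself has changed, even though the inputs look exactly the same.

```python
import random

def fit_slope(xs, ys):
    """Least-squares slope for the no-intercept model y = a * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def mean_abs_error(slope, xs, ys):
    """Average absolute gap between observed y and the prediction slope * x."""
    return sum(abs(y - slope * x) for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(1)
xs = [rng.uniform(1.0, 10.0) for _ in range(200)]

# Hypothetical regimes: identical inputs, different input-outcome relationship.
ys_before = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
ys_after = [5.0 * x + rng.gauss(0.0, 0.5) for x in xs]

slope = fit_slope(xs, ys_before)  # "trained" on past data only
err_before = mean_abs_error(slope, xs, ys_before)
err_after = mean_abs_error(slope, xs, ys_after)
print(f"slope={slope:.2f}, error before={err_before:.2f}, after={err_after:.2f}")
```

Note that an input-distribution check would not flag anything here, since `xs` is unchanged; only the outcomes reveal the drift, and they arrive after the forecast was needed.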

Similarly, I am switching to thinking in the framework of the ancient theory instead. It is so profound that it does a better job than a modern weather forecast, in particular when data and assumptions are not ideal. I have blogged more in Chinese, but I did think of a gentle reminder, based on the theory, for the last lunar year before the first case appeared, and the outbreak fell right in that window. Then in our newsletters we mentioned some important time points: the initial peak on January 20 (China); the best containment opportunity window of February 4-March 20 (New York); a change in the disease, but with climate more conducive to spread, on April 20; slightly better conditions in July, though you still cannot lower your guard; … a possible recurrence with a different appearance of symptoms (lung system -> liver and gall system) this fall in October-November; and February next year, when the climate will almost surely help bring the disease under control and the world back to peace. Amazingly, but also sadly, they have come true thus far…

For more information, check out Climate and COVID-19 Pandemic Forecast here on my blog.

Here is a former post that mentions a few common health issues on a high level this year caused by climate.

The solar terms form a calendar of twenty-four periods, each with its own climate, that governed agricultural arrangements in ancient China and still functions today. Each solar term lasts about 15 days and is determined by the position of the sun in the sky. Minor Snow is one of them; it was when we warned about conditions conducive to a pandemic, based on the ancient theory in “Yellow Emperor’s Canon of Internal Medicine”.
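The “position of the sun” can be made concrete: each solar term covers 15 degrees of the sun’s ecliptic longitude, with the Spring Equinox at 0 degrees, so 24 terms cover the full 360. The sketch below assumes the longitude is already known (for example from an astronomical almanac; it is not computed here), and the English names are common translations that vary between sources.

```python
# Solar terms in order of ecliptic longitude, starting at 0 degrees.
SOLAR_TERMS = [
    "Spring Equinox", "Clear and Bright", "Grain Rain", "Start of Summer",
    "Grain Buds", "Grain in Ear", "Summer Solstice", "Minor Heat",
    "Major Heat", "Start of Autumn", "End of Heat", "White Dew",
    "Autumn Equinox", "Cold Dew", "Frost's Descent", "Start of Winter",
    "Minor Snow", "Major Snow", "Winter Solstice", "Minor Cold",
    "Major Cold", "Start of Spring", "Rain Water", "Awakening of Insects",
]

DEGREES_PER_TERM = 360 / len(SOLAR_TERMS)  # 15 degrees each

def solar_term(longitude_deg):
    """Name of the solar term that contains the given ecliptic longitude."""
    return SOLAR_TERMS[int(longitude_deg % 360 // DEGREES_PER_TERM)]

# Average length of a term: one tropical year divided by 24.
avg_days = 365.2422 / len(SOLAR_TERMS)

print(solar_term(240))                   # Minor Snow begins at 240 degrees
print(f"{avg_days:.1f} days per term on average")
```

The average is only about 15 days because the sun’s apparent speed along the ecliptic varies through the year, so individual terms are slightly longer or shorter.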

References & Relevant readings:

“Climate and COVID-19 Forecast” https://medium.com/@luluyan/climate-and-covid-19-forecast-d1ba29ca8237?sk=ea9cbe78016cf369c78136cc10ec3b42

In case you are interested in learning more starting from basic theoretical foundations, start with the sexagenary cycle here.

https://www.eurekalert.org/multimedia/pub/148780.php

“Circular Dynamics of Ancient Chinese Medicine I” https://www.thewanderingcloud.com/the-archives/circulardynamics1 The translation explains Six Qi as “Six Environmental Factors”, yet, like many scholarly talks in Chinese, the article does not mention Five Yun (NOT Five Elements or Five Phases; some places explain it as “Five Circuits”).

https://www.leafly.com/news/health/new-covid-19-model-predicts-when-your-state-could-reopen Errrk. This influential model does not take climate into consideration, obviously.

Forecasting inaccuracies do not just happen to the Fed. Image copyright belongs to the original author/publisher.


Lulu Yan

Visionary Data Scientist; 20+ Yrs Work Adventurist since age 16; Avocationist for HealthTech in Integrative Medicine: WeCare Holistic, Herbal-Pal® & Denti-Pal®