Data sources?

Status
Not open for further replies.

SecondCor521

Give me a museum and I'll fill it. (Picasso) Give me a forum ...
Joined
Jun 11, 2006
Messages
7,897
Location
Boise
I'm interested in data sources for coronavirus.

I'd like one that is impartial, meaning having no significant interest in overstating or understating the statistics.

I'd like one that is accurate.

I'd like one that shows trends in number of tests, positive cases, recovered, and deaths over time.

I'd like one that covers the US.

I don't need fancy graphics.

The one I've been using is https://www.worldometers.info/coronavirus/country/us. It meets most of my needs.

I note that its numbers don't completely match up with numbers from other sources. Of course it's difficult to get 100% accurate numbers given the present situation.

Anyone have other sources they're using that they think are good?
 
The "flatten the curve" meme has succeeded in infecting human minds as successfully as the new coronavirus has succeeded in infecting human bodies. What is frustrating (for me) is that I don't have access to high quality data that would give me an indication of the extent to which we are "flattening the curve". For example, it would be useful to know on a daily basis for every medical facility capable of treating COVID-19 patients:
• current resource utilization percentage, all illnesses
• current resource utilization percentage, due to COVID-19
• is COVID-19 triage occurring (i.e., are doctors rationing health care due to limited medical resources?)

Summary U.S. medical resource utilization statistics could be calculated from the raw data, and also plotted on a map. This would be more useful to me than the two numbers that currently feature prominently on the CDC website: total COVID-19 cases and total deaths.
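
A rough sketch of how such a summary could be calculated, assuming every facility reported simple bed counts. The `FacilityReport` fields and the two sample hospitals are made up for illustration; they are not real data or any agency's actual reporting format.

```python
from dataclasses import dataclass


@dataclass
class FacilityReport:
    # Hypothetical daily report from one medical facility.
    name: str
    beds_total: int      # staffed capacity
    beds_occupied: int   # beds in use, all illnesses
    beds_covid: int      # subset of occupied beds used by COVID-19 patients


def summarize(reports):
    """Return capacity-weighted utilization percentages across facilities."""
    total = sum(r.beds_total for r in reports)
    if total == 0:
        raise ValueError("no capacity reported")
    occupied = sum(r.beds_occupied for r in reports)
    covid = sum(r.beds_covid for r in reports)
    return {
        "utilization_pct": 100.0 * occupied / total,
        "covid_utilization_pct": 100.0 * covid / total,
    }


# Illustrative numbers only.
reports = [
    FacilityReport("Hospital A", beds_total=200, beds_occupied=120, beds_covid=30),
    FacilityReport("Hospital B", beds_total=50, beds_occupied=45, beds_covid=20),
]
summary = summarize(reports)
print(summary)
```

Summing beds before dividing weights each facility by its size. Averaging the per-facility percentages instead (60% and 90% here) would overstate how full the system is when small hospitals are the crowded ones.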

Disclaimer: I'm not an expert on public health policy. The CDC may already have the data I'm requesting and just doesn't make it public for some reason. :)
 
I think this is the best data available: https://covidtracking.com/data/

For your criteria:
  • I'd like one that is impartial, meaning having no significant interest in overstating or understating the statistics. Yes
  • I'd like one that is accurate. It's straight from each state's official tallies. They provide a letter grade for the quality of each state's data and give the rubric they used to calculate it.
  • I'd like one that shows trends in number of tests, positive cases, recovered, and deaths over time. You can download the entire data set and massage it any way you like. Most of the U.S. is not tracking recoveries, so you just won't find accurate data for that.
  • I'd like one that covers the US. Yes
  • I don't need fancy graphics. Yes
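
As an example of "massaging" a downloaded data set, here is a minimal sketch that turns cumulative totals into day-over-day increases. The column names (`date`, `positive`, `death`) and the sample rows are assumptions for illustration; check the headers of the file you actually download.

```python
import csv
import io

# Made-up sample in the assumed layout: one row per day, cumulative totals.
SAMPLE_CSV = """date,positive,death
2020-03-28,100,2
2020-03-29,130,3
2020-03-30,170,5
"""


def daily_increases(csv_text, column):
    """Return (date, increase) pairs for a cumulative column."""
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    rows = sorted(csv.DictReader(io.StringIO(csv_text)), key=lambda r: r["date"])
    totals = [int(r[column]) for r in rows]
    return [
        (rows[i]["date"], totals[i] - totals[i - 1])
        for i in range(1, len(rows))
    ]


print(daily_increases(SAMPLE_CSV, "positive"))
```

The first day is dropped because there is no earlier total to subtract from it.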
 
My county public health dept publishes daily stats for the county with the info the OP asked about. The numbers do not match the stats on the web places that aggregate data from everywhere, so it is clear to me that those other sites are behind a few days in gathering and publishing stats.

While it may be interesting to know what is going on in other states, I want to know if the folks living around me are testing positive, staying at home, getting hospitalized, dying, and/or recovering. The county stats also show the number of people with pending test results.
 
https://covid19.healthdata.org/projections if you are interested in model projections for the US and by each state.

Judging from today's White House press briefing, Dr. Birx and Dr. Fauci either use this model or it matches their own internal models.

Of course models are only as good as their assumptions, but it is nice to see the projected peak of my state.
 
I like

https://covidtracking.com/data/

You can look at your state's historical data to see how fast positives are doubling. The numbers may be misleading if increased testing confounds the rate.
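
A small sketch of both ideas, assuming growth is roughly exponential between the two dates you pick. The function names and the sample numbers are illustrative, not taken from the site.

```python
import math


def doubling_time_days(start_count, end_count, days_elapsed):
    """Estimate doubling time from two cumulative counts, assuming exponential growth."""
    if start_count <= 0 or end_count <= start_count:
        raise ValueError("counts must be positive and increasing")
    return days_elapsed * math.log(2) / math.log(end_count / start_count)


def positivity_rate(new_positives, new_tests):
    """Share of tests that came back positive over the same period."""
    if new_tests <= 0:
        raise ValueError("need at least one test")
    return new_positives / new_tests


# Illustrative numbers: positives grow from 100 to 400 over 6 days,
# i.e. two doublings, so the estimate is about 3 days.
print(round(doubling_time_days(100, 400, 6), 2))

# If tests tripled while positives quadrupled, positivity rose only modestly.
print(round(positivity_rate(100, 1000), 3), round(positivity_rate(400, 3000), 3))
```

Looking at the positivity rate next to the doubling time is one way to tell whether positives are climbing mainly because more people are being tested.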
I think this is the best data available: https://covidtracking.com/data/

Thank you! I like this one, and bookmarked it. I especially like the historical info provided, so we can see at a glance how fast or slow COVID-19 is increasing in different states.
 
The "flatten the curve" meme has succeeded in infecting human minds as successfully as the new coronavirus has succeeded in infecting human bodies. What is frustrating (for me) is that I don't have access to high quality data that would give me an indication of the extent to which we are "flattening the curve". For example, it would be useful to know on a daily basis for every medical facility capable of treating COVID-19 patients:
• current resource utilization percentage, all illnesses
• current resource utilization percentage, due to COVID-19
• is COVID-19 triage occurring (i.e., are doctors rationing health care due to limited medical resources?)

Summary U.S. medical resource utilization statistics could be calculated from the raw data, and also plotted on a map. This would be more useful to me than the two numbers that currently feature prominently on the CDC website: total COVID-19 cases and total deaths.

Disclaimer: I'm not an expert on public health policy. The CDC may already have the data I'm requesting and just doesn't make it public for some reason. :)
There are a lot of people who are very busy right now trying to gather data and understand its meaning. It is no surprise to me that publicly available information is a little raggedy-andy and even inconsistent. With respect, that is data mostly for tourists and I would submit that entertaining tourists should not be a particularly high priority. Said another way, the priority for data should be to provide it to those who can determine whether it is actionable and who can actually take action.

As I said in another thread, disaster incident management is not a well-oiled and proven machine that was just waiting to be started. Yes, there is structure to the extent of the FEMA Incident Command System (ICS) that has been trained nationally for a number of years but each disaster is different and the people pulled together to do the work are going to be mostly people who do not know each other and have not worked together before. This is complicated by a plethora of politicians ranging from local mayors and sheriffs to the POTUS. All of them think they are in charge and all of them are trying to elbow out competitor politicians for television time. Demand from this mostly useless lot makes things even more difficult for those who are trying to compile actionable information and get it to the right folks. Tourists have to be lower priority.

Anybody really bored at home, here is an online introduction to the ICS: https://emilms.fema.gov/IS0100c/curriculum/1.html It will give you a feel for what is happening behind the scenes. FWIW I attended the three-day in-person ICS-300 class maybe 5 years ago and one of the agencies present was our state health department --- worried about planning for pandemics! We also had wildland fire, sheriffs, local police & fire, etc. The discussions were fascinating.
 
Side note: this epidemic with different term definitions, sampling techniques, and amateur interpretations is how you drive a test engineer crazy.
"Lies, Damn Lies, and Statistics".

As you were. I'll go back to sucking my thumb in the corner.
 
This is not a new data source, but I thought it was an interesting way of looking at the existing data and identifying some of the smaller areas that we might not realize are being hit hard because their total case counts are low.

https://www.scientificamerican.com/article/map-reveals-hidden-u-s-hotspots-of-coronavirus-infection/

The data reveal some surprising patterns in infection rates at the county level after adjusting for population size. For example, many county clusters—such as those around Albany, Ga., Detroit, Nashville, Tenn., and parts of Mississippi and Arkansas—had relatively large numbers of cases per capita. As of March 29, the county cluster encompassing New York State, New Jersey and Massachusetts still had the most confirmed infections both overall and per capita: 76,273 cases, or about 22 per 10,000 people. Yet Albany, Ga., had the second-highest number per capita: 13 cases per 10,000 people. That figure was much higher than those of other well-known hotspots, such as Seattle, which had about eight cases per 10,000, and San Francisco, which had two per 100,000.
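
The excerpt quotes rates per 10,000 and, in one place, per 100,000, so it helps to put them on one scale before comparing. A minimal sketch of the arithmetic follows; the population figure is back-calculated from the quoted "76,273 cases, or about 22 per 10,000" and is an assumption, not a number from the article.

```python
def cases_per(cases, population, per=10_000):
    """Cases per `per` residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * per / population


def convert_rate(rate, from_per, to_per):
    """Re-express a rate given per `from_per` people as a rate per `to_per` people."""
    return rate * to_per / from_per


# Assumed population (~34.67 million), implied by the quoted figures.
implied_population = 34_670_000
print(round(cases_per(76_273, implied_population), 1))

# "Two per 100,000" expressed per 10,000 for comparison with the other figures.
print(convert_rate(2, from_per=100_000, to_per=10_000))
```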
 
This site shows information gathered by the Kinsa Smart thermometers. They have been tracking fevers for a number of years in order to predict where flu outbreaks will occur. They are doing the same now with Covid-19. You can look up your county.

https://healthweather.us/?mode=Atypical
 
All of the sites need to be taken with a grain of salt.

California ranks 3rd for number of cases in the US but has a huge backlog of tests. They also have extremely strict testing criteria... You pretty much need to be ready to be hospitalized in order to be tested. This is, without doubt, lowering the number of confirmed cases.
 
It is no surprise to me that publicly available information is a little raggedy-andy and even inconsistent. With respect, that is data mostly for tourists and I would submit that entertaining tourists should not be a particularly high priority.

There are a couple of possible meanings for "tourist": (1) someone intending to travel or currently travelling, or (2) a powerless bystander / observer / rubber-necker. I assume that your intended meaning is (2). Regardless, I agree that getting high quality data into the hands of empowered decision makers is top priority. However, my point is that it would be more useful for me to know that local hospitals are using 30% of capacity with 25% of that capacity allocated to COVID-19 cases (e.g.) rather than merely knowing that X number of local people have tested positive and Y people have died. :popcorn:
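
To make the example concrete, here is the arithmetic as a short sketch. It reads "25% of that capacity" as a quarter of the beds in use, which is one interpretation, and the 200-bed hospital is hypothetical.

```python
def covid_share_of_total(utilization_pct, covid_pct_of_used):
    """COVID-19 usage as a percentage of total capacity."""
    return utilization_pct * covid_pct_of_used / 100.0


# 30% of capacity in use, 25% of that in-use capacity going to COVID-19.
share = covid_share_of_total(30, 25)
print(share)  # 7.5 (percent of total capacity)

# The same thing in beds, for a hypothetical 200-bed hospital.
beds_total = 200
beds_in_use = beds_total * 30 // 100      # 60 beds occupied
beds_covid = beds_in_use * 25 // 100      # 15 of those are COVID-19 patients
beds_free = beds_total - beds_in_use      # 140 beds still available
print(beds_in_use, beds_covid, beds_free)
```

Under that reading, COVID-19 patients occupy 7.5% of total capacity and 70% of the hospital is still free, which says more about strain than a raw case count does.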

Thanks for the info on the FEMA ICS. I've never heard of it. :flowers:
 
... However, my point is that it would be more useful for me to know that local hospitals are using 30% of capacity with 25% of that capacity allocated to COVID-19 cases (e.g.) rather than merely knowing that X number of local people have tested positive and Y people have died. ...
Nothing wrong with that, and I think it is the extremely rare case where that kind of information is not willingly released as it becomes available. But just be patient. My objective was just to point out that behind the scenes in any of these situations is closer to complete chaos than it is to a well-trained marching band. Hence triage to prioritize the work is critical. One of the ICS positions is PIO, Public Information Officer, often reporting directly to the Incident Commander. This is an important position. His/her job is to inform to the extent information is available while simultaneously protecting the rest of the staff from the reporters and the gawkers.

I just laughed at the recent headline where some governor was bitching because the "Federal Government" sent some respirators that didn't work. The culprit was probably some guy in Logistics who was working double shifts, had missed dinner, and was trying desperately to deal with requisitions falling on him like a hailstorm.
 
For those who are interested in data science and modeling, here's an interesting podcast: https://fivethirtyeight.com/feature...r-is-trying-to-forecast-the-toll-of-covid-19/

This is Nate Silver of FiveThirtyEight discussing modeling with Dr. Chris Murray who manages the team that creates the IHME COVID-19 models. They talk about why they were created, how they've changed, and how they're being used. It's about as nerdy as you can get, but it's a good look at the process.
 
(Thanks to the mods for reopening this thread for me to post this update.)

I was using https://covid19.healthdata.org/united-states-of-america/idaho to look at projections for my state. It was eerily accurate for a while, but now it claims deaths that haven't occurred yet.

As of right now, my state's official site says 85 deaths:

https://coronavirus.idaho.gov/

The last actual datapoint at:

https://covid19.healthdata.org/united-states-of-america/idaho

is for 91 deaths on June 6th, three days ago.

I just thought some people might find the discrepancy interesting.
 