But the important statement was "this was statistically significant"... how significant might not be known with great accuracy, but it is not zero...
Actually, we do know how significant. That's what the p-value quantifies. The researchers were testing a hypothesis. They likely assumed that the response (fate) of patients with a certain characteristic was the same as that of the general population of all patients. This was their null hypothesis. They then performed a rigorous test of that hypothesis and rejected it, concluding that people with the characteristic did have a different outcome. To be precise, the p-value is the probability of seeing data at least as extreme as what was observed, assuming the null hypothesis (people with the characteristic are the same as everyone else) is true. A small p-value means the data would be very surprising if there were really no difference; strictly speaking it is not the probability that the null hypothesis is true, and 1-p is not the probability that the alternative is true, though it is often loosely read that way. The confidence interval stuff is a different, but related, type of analysis that estimates what the "average outcome" is for the population.
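To make the mechanics concrete, here is a stdlib-only Python sketch of the kind of test described above. The counts (5 of 31) come from the thread; the general-population survival rate of 40% is a made-up illustration value, since the actual rate isn't given, and the normal approximation is an assumption about the method.

```python
# Sketch of a two-tailed test of an observed proportion against an
# assumed population rate, using the normal approximation.
from math import sqrt
from statistics import NormalDist

n, k = 31, 5      # sample size and observed survivors (from the thread)
p0 = 0.40         # assumed population survival rate -- illustration only

p_hat = k / n                      # observed proportion, about 0.161
se = sqrt(p0 * (1 - p0) / n)       # standard error under the null hypothesis
z = (p_hat - p0) / se              # test statistic

# Two-tailed p-value: the probability, *assuming the null is true*, of a
# result at least this far from p0 in either direction.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, two-tailed p = {p_value:.4f}")
```

With these made-up numbers the difference comes out significant at the usual 5% level, which is the shape of the conclusion the researchers reported.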
There is a lot of good interpretation going on here. I've had more than a few statistics classes and I wouldn't seriously criticize any of the statements I've read so far, though I do recognize both frequentist and Bayesian influences.
It's easy to dismiss such a study because it is small. But often you can't help doing a small study because there are simply very few cases (usually a good thing when speaking of disease). So it is more useful to get as much information out of the study as possible, in spite of the limitations.
In this case, if I knew that 5 of 31 patients with certain characteristics survived after 10 years, I'd spend my time trying to figure out what those 5 had in common, or how to be in that group of 5, rather than getting lost in the statistics. Whether the chance of survival (or non-survival) is 16%, or between 3% and 29%, would be less meaningful to me than "How do I become one of those that survive?"
Incidentally, this whole small-sample issue was addressed over 100 years ago in the interest of saving beer. Quality inspections at Guinness performed at high statistical confidence required wasting too much product, so William Gosset developed the t-test (and with it the entire science of small-sample statistics) to allow accurate estimation from small samples. But in this case, with a sample size of 31, the t-distribution is essentially identical to the normal.
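That last claim is easy to check numerically. A stdlib-only Python sketch (it approximates the t CDF by numerically integrating the t density, so the numbers are approximate): with 30 degrees of freedom, the 97.5th-percentile t value is about 2.04 versus the normal's 1.96.

```python
# Compare the t critical value at 30 degrees of freedom with the
# standard normal critical value for a 95% two-sided interval.
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF via Simpson's-rule integration from 0 to x; symmetry gives the rest."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Invert the CDF by bisection (for p > 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_crit = t_quantile(0.975, df=30)          # about 2.04
z_crit = NormalDist().inv_cdf(0.975)       # about 1.96
print(f"t(30): {t_crit:.3f}  vs  normal: {z_crit:.3f}")
```

The two critical values differ by only about 4%, so at n = 31 the small-sample correction barely matters.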
In this study the Poisson distribution was used, according to the original statement. The Poisson is appropriate here because the data are counts, a discrete quantity, whereas the normal distribution is continuous. And for Poisson means of about 10 or larger, the normal distribution becomes a reasonable approximation to the Poisson anyway, so the two approaches converge.
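The closeness of that approximation can be seen directly. A stdlib-only sketch, using an illustrative Poisson mean of 10 (not a number from the study): the normal curve with matching mean and variance, with a continuity correction, tracks the Poisson probabilities closely.

```python
# Compare Poisson probabilities with the matching normal approximation.
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 10  # Poisson mean -- illustrative, not from the study
normal = NormalDist(mu=lam, sigma=sqrt(lam))  # same mean and variance

comparison = {}
for k in (5, 10, 15):
    poisson_pmf = exp(-lam) * lam ** k / factorial(k)
    # Normal approximation with a continuity correction
    normal_prob = normal.cdf(k + 0.5) - normal.cdf(k - 0.5)
    comparison[k] = (poisson_pmf, normal_prob)
    print(f"k={k:2d}: Poisson {poisson_pmf:.4f}  normal approx {normal_prob:.4f}")
```

At the mean itself the two probabilities agree to about three decimal places.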
When an estimate is given as 16% +/- 13%, you already have all the information about the "accuracy" of the study. Knowing whether it had 31 participants or 31 million adds no new information. It's true that the larger sample should make the confidence limits smaller. But if there is a lot of variability in the large study, the limits could still be large. Those limits are telling you how certain the estimate is. If knowing the sample size further influences your opinion of the "reliability" of the study, then you probably don't understand the statistical interpretation as well as you think you do. Dismissing a study just because it is small is a pet peeve of mine. There is no mathematical basis for it; it is rooted in the statistical ignorance of researchers. Some of the worst studies are the very large ones, because they tend to pick up random events that confound results.
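In fact the quoted interval illustrates the point: the half-width of a confidence interval already bakes the sample size in. A stdlib-only sketch, assuming the 16% +/- 13% came from a normal-approximation binomial interval on 5 of 31 (the study may well have used a Poisson-based interval instead):

```python
# Reconstruct a 95% normal-approximation confidence interval for 5/31.
from math import sqrt
from statistics import NormalDist

n, k = 31, 5
p_hat = k / n                                   # about 0.161, i.e. 16%
z = NormalDist().inv_cdf(0.975)                 # about 1.96
half_width = z * sqrt(p_hat * (1 - p_hat) / n)  # about 0.13, i.e. +/-13%
print(f"estimate {p_hat:.0%} +/- {half_width:.0%}")
```

The +/-13% falls out of n = 31 directly; quoting the sample size alongside the interval is redundant, which is the point being made above.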
Basically, this study is telling you that if the group of people with those certain characteristics actually had the same outcome as the group of all people, a difference this large would show up by chance only about 3% of the time. When they say they used a two-tailed test, they are telling you they can only detect a difference, not whether one is higher or lower.
The confidence interval gives you some additional information: if the study were repeated a large number of times, about 95% of the intervals constructed this way would contain the true "success" rate for the population.