Michael Lydeamore
Department of Econometrics and Business Statistics
Joint work with Brett Mitchell, Tracey Bucknall, Allen Cheng, Phil Russo & Andrew Stewardson
Healthcare associated infections (HAIs) are associated with increased morbidity and mortality.
Five of the most common HAIs are:
But HAIs are not notifiable, so we have no robust way to track whether their prevalence is increasing or decreasing.
HAIs are actively monitored across Europe through the ECDC.
In 2016 (based on 2012 data), 2,609,911 new HAIs were estimated to have occurred.
The data for this was a point prevalence survey, on an enormous scale:
A point prevalence survey counts the number of people with a condition on a given day.
We ran a PPS in Australia in 2019, consisting of:
All acute care wards were included. Non-acute, paediatric, NICU, rehab, and ED wards were excluded.
The hospitals sampled make up approximately 60% of all overnight separations in Australia.
We have to go from point prevalence to annual incidence…
Hospital Prevalence (\(P\)) is estimated as:
\[P = r \times \text{Beta}(n_\text{obs}, N - n_\text{obs}+1) + (1-r) \times \text{Beta}(n_\text{obs}+1, N-n_\text{obs}),\]
where \(n_\text{obs}\) is the number of patients observed with a HAI and \(N\) is the total number of patients in the PPS.
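As a sketch, the prevalence uncertainty can be sampled directly from this Beta mixture. The mixture weight \(r\) is assumed here to be a fair Bernoulli draw (the equal-weight randomised case), since the talk does not state it, and the counts used below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def prevalence_samples(n_obs, N, n_draws=10_000):
    """Draw from the Beta-mixture prevalence estimator
    P = r*Beta(n_obs, N-n_obs+1) + (1-r)*Beta(n_obs+1, N-n_obs).
    Assumption: the mixture indicator r is Bernoulli(0.5)."""
    pick_first = rng.random(n_draws) < 0.5
    comp1 = rng.beta(n_obs, N - n_obs + 1, n_draws)
    comp2 = rng.beta(n_obs + 1, N - n_obs, n_draws)
    return np.where(pick_first, comp1, comp2)

# Hypothetical counts: 276 of 2,767 surveyed patients with a HAI
draws = prevalence_samples(n_obs=276, N=2767)
lo, med, hi = np.percentile(draws, [2.5, 50, 97.5])
```

The percentiles of the draws give the point estimate and its 95% credible interval directly.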
This is a standard extrapolation from a binomial sample with a lot of zeros.
Hospital incidence (\(I\)) is calculated as:
\[I = P \frac{LA}{LOI},\]
where \(LA\) is the length of admission and \(LOI\) is the length of infection.
🎉🎉 Australia actually captures \(LA\) through the AIHW 🎉🎉
\(LOI\) is not available, but instead we have \(LOI_{pps}\) — the length of infection until the date of survey. We can calculate
\[P(LOI_{pps} = 1),\]
which is just the probability that a patient is in the first day of their HAI. Then,
\[E[LOI] = 1/P(LOI_{pps} = 1).\]
For a small sample size, this is heavily biased, so we take a mixture of this estimator and \(E[LOI_{pps}]\).
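A minimal sketch of this estimator, assuming a simple 50/50 blend since the talk does not give the mixture weight:

```python
import numpy as np

def expected_loi(loi_pps, w=0.5):
    """Blend the inverse-first-day estimator E[LOI] = 1/P(LOI_pps = 1)
    with the naive mean E[LOI_pps].
    loi_pps: days of infection up to the survey date (>= 1), per patient.
    w: weight on the inverse estimator (assumed value)."""
    loi_pps = np.asarray(loi_pps, dtype=float)
    p_first_day = np.mean(loi_pps == 1)
    inverse_est = 1.0 / p_first_day  # heavily biased for small samples
    return w * inverse_est + (1 - w) * loi_pps.mean()

est = expected_loi([1, 2, 3, 6])  # P(day 1) = 0.25, mean = 3.0
```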
Calculate population incidence simply as
\[I_\text{pop} = I \times N_\text{discharges}.\]
🎉🎉 We capture this too! \(N_\text{discharges} = 3,713,513,\) 60% of the total admissions in a given year.
In the ECDC PPS, this quantity is not captured. They estimate using patient-days and the number of patients.
Use a multinomial likelihood function with a Dirichlet prior, with weights taken from the number of cases in each age/sex category.
A pseudocount is added to each stratum (\(0.001 \sum{\text{weights}}\)) so the likelihood can be evaluated for empty strata.
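A sketch of the stratified posterior. The Dirichlet prior parameters are assumed here to be the weights themselves; the exact prior scaling is not given in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

def stratum_proportions(counts, weights, n_draws=1000):
    """Multinomial likelihood with a Dirichlet prior over age/sex strata.
    counts:  observed HAI cases per stratum in the PPS.
    weights: reference case counts per stratum (the 'weights').
    A pseudocount of 0.001 * sum(weights) is added to every stratum so
    empty strata still have a proper posterior."""
    counts = np.asarray(counts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    pseudo = 0.001 * weights.sum()
    alpha = counts + weights + pseudo   # Dirichlet posterior parameters
    return rng.dirichlet(alpha, size=n_draws)

draws = stratum_proportions(counts=[5, 0, 2], weights=[30, 10, 20])
```

Note that the middle stratum has zero observed cases yet still receives posterior mass, which is the point of the pseudocount.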
Just because it’s called \(w\) doesn’t make it a weight
L. Kennedy
But for today it is a weight. We’ll survive.
Use the “McCabe score”, which gives the life expectancy according to the severity of a patient’s underlying disease.
Patients are categorised as: non-fatal, ultimately fatal, or rapidly fatal.
These scores, combined with disease outcome trees, give DALYs and deaths.
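For example, the years-of-life-lost side of the DALY can be sketched as capping remaining life expectancy by the McCabe category. The caps below (3 years for "ultimately fatal", 0.5 for "rapidly fatal") are the conventional values, used here purely as assumptions.

```python
# Illustrative McCabe life-expectancy caps, in years (assumed values);
# "non-fatal" patients keep their life-table expectancy uncapped.
MCCABE_CAP = {"non-fatal": None, "ultimately fatal": 3.0, "rapidly fatal": 0.5}

def years_of_life_lost(deaths, remaining_life_expectancy, mccabe):
    """YLL = deaths * remaining life expectancy, with the expectancy
    capped according to the patient's McCabe category."""
    cap = MCCABE_CAP[mccabe]
    life = remaining_life_expectancy if cap is None else min(remaining_life_expectancy, cap)
    return deaths * life

yll = years_of_life_lost(100, 25.0, "ultimately fatal")  # capped at 3 years
```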
| | Number of HAIs (95% CI) | Deaths (95% CI) | DALYs (95% CI) |
|---|---|---|---|
| SSI | 44,238 (31,176 - 73,797) | 876 (617 - 1,263) | 13,197 (9,298 - 19,001) |
| UTI | 42,408 (25,200 - 68,735) | 729 (259 - 1,772) | 16,087 (5,939 - 37,218) |
| CDI | 5,125 (2,360 - 10,740) | 262 (13 - 836) | 2,757 (241 - 8,655) |
| HAP | 51,499 (31,343 - 82,877) | 1,904 (462 - 4,430) | 39,276 (17,608 - 77,915) |
| BSI | 23,979 (15,658 - 36,245) | 3,512 (1,874 - 6,075) | 46,773 (26,205 - 79,104) |
| All | 170,574 (135,779 - 213,898) | 7,583 (4,941 - 11,135) | 122,376 (85,136 - 172,784) |
That’s 1 in 20 admissions resulting in an avoidable infection!
First estimate of HAI burden in Australia using (relatively) robust survey data in an established framework
Based on the first point prevalence survey since 1984
There is no routine surveillance of HAIs in Australia
Point prevalence surveys remain the only way to understand the burden of these conditions
This work has informed guidance on HAI surveillance in Australia, including new funding schemes to better understand these conditions.
And all this based on just 2767 patients from 19 hospitals…
They represent great opportunity for improvement, and we have a long way to go to prevent them entirely.
J. Lakshika, D. Cook, P. Harrison, T. Talagala, M. Lydeamore
The world collects a lot of data. Much of this data is very wide.
This “high-dimensional” data is very challenging to visualise, as we can only see in two dimensions.
One key application area is single-cell RNA data. The main task is to identify groups of cells with similar expression profiles.
Data is collected on the amount of a gene expressed by a cell. Similar types of cells should express similar types of certain “marker” genes.
Dimension reduction is a common way to reduce a high dimensional dataset into a lower number of dimensions.
As with anything designed to maximise variance, it is important to check these techniques aren’t inventing structures.
Formally speaking, the problem can be specified as:
Consider the High-D data a rectangular matrix \(X_{n\times p}\), with \(n\) observations and \(p\) dimensions.
We aim to discover a projection \(Y_{n \times d}\), i.e. an \(n \times d\) matrix with \(d \ll p\).
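As a baseline sketch of finding such a projection, here is linear PCA via the SVD; the NLDR methods the talk is about replace this with a non-linear map, but the \(X_{n\times p} \to Y_{n\times d}\) shape of the problem is the same.

```python
import numpy as np

def pca_project(X, d=2):
    """Project X (n x p) onto its top-d principal components,
    giving Y (n x d) with d << p. Linear baseline only; NLDR methods
    such as t-SNE or UMAP replace this with a non-linear map."""
    Xc = X - X.mean(axis=0)                      # centre the columns
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                         # scores in d dimensions

rng = np.random.default_rng(2)
Y = pca_project(rng.normal(size=(100, 50)), d=2)
```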
Five different NLDR methods give similar, but definitely different, curves:
We have developed a method to project the 2D dimension reduction back into higher dimensions where we can visualise discrepancies.
The algorithm has four steps:
Use Delaunay triangulation to create a triangular mesh that is representative of the 2D data.
Because our hexagons are regular, the resulting triangles will be mostly equilateral (which is nice)
Sometimes the triangulation gives edges between two very distant nodes. These are not an accurate representation of the 2D surface, so we trim them off
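A sketch of the triangulate-and-trim step using `scipy.spatial.Delaunay`; the edge-length threshold is a user-chosen parameter here, as the talk does not specify the trimming rule.

```python
import numpy as np
from scipy.spatial import Delaunay

def trimmed_edges(points2d, max_len):
    """Delaunay-triangulate the 2D points, then drop edges longer than
    max_len, since long edges misrepresent the 2D surface."""
    points2d = np.asarray(points2d, dtype=float)
    tri = Delaunay(points2d)
    edges = set()
    for a, b, c in tri.simplices:              # each simplex is a triangle
        for i, j in ((a, b), (b, c), (c, a)):
            edges.add((min(i, j), max(i, j)))  # undirected, deduplicated
    return [e for e in edges
            if np.linalg.norm(points2d[e[0]] - points2d[e[1]]) <= max_len]

# Near-square example: the long diagonal edge gets trimmed off
edges = trimmed_edges([[0, 0], [1, 0], [0, 1], [1, 1.01]], max_len=1.2)
```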
Define the function \(f: \mathbb{R}^p \rightarrow \mathbb{R}^2\) which maps the high-D point to its NLDR equivalent.
Let the set of all points in a single hexagon \(i\) be denoted by \(\mathbb{H}_i\), with centroid \(h_i\).
Then, let \(g: \mathbb{R}^2 \rightarrow \mathbb{R}^2\) be a function that maps each point in 2D space to its closest centroid.
It follows that \(g(f(x))\) maps the high-D point \(x\) to its centroid in 2D.
Define the high-dimension mean of all the points in \(\mathbb{H}_i\) by \(\hat{h}_i\). That is,
\[\hat{h}_i = \frac{1}{|\mathbb{H}_i|}\sum_{x \in \mathbb{H}_i} x,\]
where the mean is taken component-wise over all \(p\) dimensions.
Finally, we choose a function \(v: \mathbb{R}^2 \rightarrow \mathbb{R}^p\) such that \[v(h_i) = \hat{h}_i.\]
That is, the function \(v\) maps the 2D centroid to the high-D mean of the points in the hexagon.
So, \((g \circ f)(x)\) gives the 2D centroid associated with \(x\), and \(v(h_i)\) gives the high-D centroid associated with the 2D centroid \(h_i\).
Thus, \((v \circ g \circ f)(x)\) gives the high-D centroid associated with the 2D embedding of the point \(x\).
We use this process to define a model of the 2D embedding in high dimensions, allowing us to visualise and compute error.
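The whole pipeline above can be sketched with a nearest-centroid stand-in for the hexagon binning; this is a hypothetical helper, not the authors' implementation.

```python
import numpy as np

def embedding_model_error(X_high, Y_2d, centroids):
    """g: assign each embedded point to its nearest 2D centroid;
    v: map each centroid to the component-wise mean of its high-D points;
    error: distance from each x to v(g(f(x))), its high-D centroid.
    Assumes every centroid has at least one assigned point."""
    X_high, Y_2d, centroids = map(np.asarray, (X_high, Y_2d, centroids))
    # g(f(x)): nearest-centroid assignment in 2D
    d2 = ((Y_2d[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    # v: high-D mean per centroid
    v = np.stack([X_high[assign == i].mean(axis=0)
                  for i in range(len(centroids))])
    # model error for each observation
    return np.linalg.norm(X_high - v[assign], axis=1)

err = embedding_model_error(
    X_high=[[0, 0, 0], [0, 0, 2], [10, 10, 10], [10, 10, 12]],
    Y_2d=[[0, 0], [0.1, 0], [5, 5], [5.1, 5]],
    centroids=[[0, 0], [5, 5]],
)
```

In this toy example each cluster's high-D centroid sits one unit from each of its points, so every observation gets the same model error.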