<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Quantitative and Statistical Consulting Blog &#187; inclusion probabilities</title>
	<atom:link href="https://missionalconsulting.com/methods/tag/inclusion-probabilities/feed/" rel="self" type="application/rss+xml" />
	<link>https://missionalconsulting.com/methods</link>
	<description>The Consulting Blog</description>
	<lastBuildDate>Tue, 11 Nov 2014 18:15:19 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.0.38</generator>
	<item>
		<title>R code: computation of inclusion probabilities in nested case-control studies</title>
		<link>https://missionalconsulting.com/methods/r-code-computation-of-inclusion-probabilities-in-nested-case-control-studies/</link>
		<comments>https://missionalconsulting.com/methods/r-code-computation-of-inclusion-probabilities-in-nested-case-control-studies/#comments</comments>
		<pubDate>Tue, 14 Oct 2014 21:54:30 +0000</pubDate>
		<dc:creator><![CDATA[Ryung S. Kim]]></dc:creator>
				<category><![CDATA[R codes]]></category>
		<category><![CDATA[inclusion probabilities]]></category>
		<category><![CDATA[nested case-control]]></category>
		<category><![CDATA[R code]]></category>

		<guid isPermaLink="false">http://missionalconsulting.com/methods/?p=5</guid>
		<description><![CDATA[Someone recently emailed me for a code to compute inclusion probabilities in nested case-control studies. A nested case-control study design, along with case-cohort study design, is a schema to collect a statistically representative and powerful sub-sample from a cohort. They are commonly used in epidemiological studies to reduce the cost of exposure assessment when the [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Someone recently emailed me asking for code to compute inclusion probabilities in nested case-control studies. The nested case-control design, like the case-cohort design, is a scheme for drawing a statistically representative and powerful sub-sample from a cohort. Both designs are commonly used in epidemiological studies to reduce the cost of exposure assessment when the outcome of interest is a time to event (e.g. time to death or to disease incidence).</p>
<p><a href="http://biomet.oxfordjournals.org/content/84/2/379.abstract">Samuelsen (1997)</a> first proposed analyzing nested case-control studies with the inverse probability weighting (IPW) method. I showed that it is fine to use a simpler variance estimator that existing software can compute <a href="http://www.koreascience.or.kr/article/ArticleFullRecord.jsp?cn=GCGHC8_2013_v20n6_455">(Kim 2013)</a>. More recently, I also proposed using the IPW method to analyze secondary outcomes in nested case-control study designs <a href="http://onlinelibrary.wiley.com/doi/10.1002/sim.6231/abstract">(Kim and Kaplan 2014)</a>. To use the IPW method, you must compute the probability that each subject is included in the sub-sample.</p>
<p>Now back to calculating the inclusion probabilities. To compute them, you need access to the full cohort data. Suppose the full cohort data look like this (only the first 10 subjects are shown).</p>
<pre style="width: 711px; height: 252px;">&gt; dt[1:10,]
  X.delta time.to.X Y.delta time.to.Y GENDER HeavyDrinking
1       0      2050       1      1741 female       FALSE
2       0       475       0       438 female        TRUE
3       0      1626       0      1155 female       FALSE
4       0      1018       1      1185   male        TRUE
5       0       427       0       550 female        TRUE
6       0      1207       0      1728 female       FALSE
7       0       490       0       616   male        TRUE
8       0      1219       1      1062   male       FALSE
9       0      1137       0      1382   male        TRUE
10      0       615       0       675   male        TRUE</pre>
<p>Now consider a nested case-control study drawn from this full cohort based on the primary outcome (X), with two controls per case (i.e. m=2) matched on two variables, GENDER and HeavyDrinking.</p>
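<p>As a rough illustration of the design itself, the sketch below draws m=2 matched controls for a single case from that case&#8217;s risk set. The toy cohort, its column names, and the use of only one matching variable are assumptions made up for this example; it is not the sampling code of any actual study.</p>
<pre>set.seed(2014)  # reproducible draw
m &lt;- 2          # number of controls per case

# Hypothetical toy cohort with one case (ID 5, failing at time 6)
cohort &lt;- data.frame(
  ID     = 1:8,
  time   = c(5, 8, 3, 9, 6, 7, 4, 10),
  delta  = c(0, 0, 0, 0, 1, 0, 0, 0),
  GENDER = c("female","female","male","female","female","male","female","female")
)
case &lt;- cohort[cohort$delta == 1, ]

# Eligible controls: still at risk at the case's failure time,
# in the same matching stratum, and not the case itself
eligible &lt;- cohort[cohort$time &gt;= case$time &amp;
                   cohort$GENDER == case$GENDER &amp;
                   cohort$ID != case$ID, ]

# Draw m controls at random (or all eligible subjects if fewer than m)
controls &lt;- eligible[sample.int(nrow(eligible), min(m, nrow(eligible))), ]</pre>
<p>In a real study this draw is repeated at every failure time, so the same subject can be eligible many times. That is why the inclusion probability has to accumulate over all the failure times at which a subject is at risk.</p>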
<p>You need the following two functions to calculate the inclusion probability of each subject. The first function builds a risk table that lists, for each failure time, the number of cases and the number of subjects at risk. The second function uses that table to calculate the inclusion probabilities.</p>
<p>(If you are performing a secondary outcome analysis using a nested case-control study, note that you do not need to consider the failure time of the secondary outcome, Y in this example, when computing inclusion probabilities.)</p>
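<pre>risk.table.f&lt;-function(fail.nm, data, t.exit.nm, t.entry.nm){
 if(is.null(t.entry.nm)){t.entry&lt;-0} else {t.entry &lt;- data[,t.entry.nm]}
 t.exit&lt;-data[,t.exit.nm]
 fail&lt;-data[,fail.nm]==1 #TRUE for subjects who failed
 FT&lt;-unique(sort(t.exit[fail])) #Failure times
 if(length(FT)==0){ #No failures (e.g. a stratum without cases): return an empty table
  return(data.frame(failure.time=numeric(0), cases=numeric(0), at.risk=numeric(0)))
 }
 #For each failure time: number of cases and number at risk
 #(only failures count as cases, not subjects censored at the same time)
 risk.table&lt;- cbind(FT,
t(sapply(FT, function(x){c(sum(t.exit==x &amp; fail), sum(t.exit&gt;=x &amp; t.entry&lt;=x))}))
)
 risk.table&lt;-data.frame(risk.table)
 colnames(risk.table)&lt;-c("failure.time","cases","at.risk")
 return(risk.table)
}</pre>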
<pre>inclusion.prob.ncc.f &lt;- function(data,t.entry.nm, t.exit.nm, fail.nm, controls, risk.table, match.nm=NULL){
 mm&lt;-length(match.nm) #Number of matching variables
 t.exit&lt;-data[,t.exit.nm]
 if(is.null(t.entry.nm)){t.entry&lt;-0} else {t.entry&lt;-data[,t.entry.nm]}
 #Unmatched design: risk.table is a single data frame
 if(mm==0 &amp;&amp; is.data.frame(risk.table)){
 CS&lt;-risk.table$cases
 AR&lt;-risk.table$at.risk
 FT&lt;-risk.table$failure.time
 inclusion.prob&lt;-apply(cbind(t.entry,t.exit), 1, function(x){
 p&lt;- pmin(1, controls*CS/(AR-CS)) #prob. of being sampled as a control at each failure time
 p[FT &gt; x[2] | FT &lt; x[1]]&lt;-0 #zero when not at risk
 1-prod(1-p) #inclusion prob
 })
 inclusion.prob[data[,fail.nm]==1]&lt;-1 #cases are always included
}
 #Matched design: risk.table holds one risk table per stratum, indexed by stratum label
if(mm&gt;0 &amp;&amp; !is.data.frame(risk.table)){
 match &lt;- data[,match.nm]
 inclusion.prob&lt;-apply(cbind(t.entry,t.exit,match), 1, function(x){
 i.design.strata &lt;- as.vector(paste(x[-(1:2)],collapse=":")) #stratum label of this subject
 CS&lt;-risk.table[[i.design.strata]]$cases
 AR&lt;-risk.table[[i.design.strata]]$at.risk
 FT&lt;-risk.table[[i.design.strata]]$failure.time
 p &lt;- pmin(1, controls*CS/(AR-CS) )
 p[FT &gt; as.numeric(x[2]) | FT &lt; as.numeric(x[1])]&lt;- 0 #zero when not at risk
 1-prod(1-p) #inclusion prob
 })
 inclusion.prob[data[,fail.nm]==1]&lt;-1 #cases are always included
}
 return(inclusion.prob)
}</pre>
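<p>To make the calculation concrete, here is a minimal by-hand sketch of the same computation in the simplest setting: no matching, no delayed entry, and m=1. The five-subject cohort and every variable name in it are made up for illustration; the block does not call the two functions above.</p>
<pre># Hypothetical toy cohort: exit times and failure indicators
t.exit &lt;- c(2, 4, 5, 7, 9)
fail   &lt;- c(1, 0, 1, 0, 0)
m      &lt;- 1                                  # controls per case

FT      &lt;- sort(unique(t.exit[fail == 1]))   # failure times: 2 and 5
cases   &lt;- sapply(FT, function(x) sum(t.exit == x &amp; fail == 1))
at.risk &lt;- sapply(FT, function(x) sum(t.exit &gt;= x))

# Probability of being drawn as a control at each failure time
p &lt;- pmin(1, m * cases / (at.risk - cases))

# Inclusion probability: 1 for cases; for everyone else,
# 1 minus the chance of never being drawn while at risk
incl &lt;- sapply(seq_along(t.exit), function(i) {
  if (fail[i] == 1) return(1)
  1 - prod(1 - p[FT &lt;= t.exit[i]])
})</pre>
<p>Subject 2 is at risk only at the first failure time, so its inclusion probability is 1-(1-1/4)=0.25. Subjects 4 and 5 are at risk at both failure times, which gives 1-(1-1/4)(1-1/2)=0.625. The two cases are included with probability 1.</p>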
<p>Using the two functions, you can compute the inclusion probabilities the following way:</p>
<pre>t.entry.nm &lt;- NULL                        #Entry time
t.exit.nm1 &lt;- "time.to.X"                 #Time to failure
  fail.nm1 &lt;- "X.delta"                   #Indicator for failure
  match.nm &lt;- c("GENDER","HeavyDrinking") #Matching variables
         m &lt;- 2                           #Number of controls</pre>
<pre>design.strata &lt;- as.vector(apply(dt[,match.nm],1,paste,collapse=":"))

risk.table.strata &lt;- by(dt,design.strata, function(x){risk.table.f(fail.nm=fail.nm1, x, t.entry.nm=t.entry.nm, t.exit.nm=t.exit.nm1)})

inclusion.prob &lt;- inclusion.prob.ncc.f(data=dt, t.entry.nm=t.entry.nm, t.exit.nm=t.exit.nm1, fail.nm=fail.nm1, controls=m, risk.table=risk.table.strata, match.nm=match.nm)</pre>
<p>Invert the inclusion probabilities to obtain the weights, and add them as a column (&#8216;wt&#8217;) to your nested case-control study data. For this illustration, I&#8217;m going to call that data frame &#8216;nccdata&#8217;. To fit the Cox model with the IPW method, use the following command from the survival package. The cluster(ID) term assumes that nccdata has a subject identifier column named ID and requests a robust variance estimate. You can find the justification for this method in my 2013 article.</p>
<pre>library(survival) #provides coxph() and Surv()
fit&lt;-coxph(formula=Surv(time.to.X, X.delta) ~ GENDER + HeavyDrinking + cluster(ID), data=nccdata, weights=wt)</pre>
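<p>The &#8216;wt&#8217; column used in the command above can be built along the following lines. This is only a sketch under assumptions: the five-subject table, the ID column, and the vector sampled.ID are hypothetical, and it assumes one row per sampled subject. In practice the data frame would also carry the outcome and covariate columns.</p>
<pre># Hypothetical full-cohort table: subject ID plus the inclusion
# probability computed from the full cohort
cohort &lt;- data.frame(ID = 1:5,
                     inclusion.prob = c(1, 0.25, 1, 0.625, 0.625))

# Inverse probability weights
cohort$wt &lt;- 1 / cohort$inclusion.prob

# Hypothetical IDs of the subjects who ended up in the
# nested case-control sample (cases and sampled controls)
sampled.ID &lt;- c(1, 3, 4)

# Keep only the sampled subjects, with their weights attached
nccdata &lt;- cohort[cohort$ID %in% sampled.ID, ]</pre>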
<p>If you are performing a secondary outcome analysis in a nested case-control study, use the following command instead. Note that the weights are computed from the primary outcome (X), while the risk sets and failure times are based on the secondary outcome (Y). You can find the justification for this method in my 2014 article.</p>
<pre>fit&lt;-coxph(formula=Surv(time.to.Y, Y.delta) ~ GENDER + HeavyDrinking + cluster(ID), data=nccdata, weights=wt)</pre>
<p>References</p>
<p>Samuelsen SO. A Pseudolikelihood Approach to Analysis of Nested Case-Control Studies. <em>Biometrika</em>, 1997; 84(2): 379-94.</p>
<p>Kim RS. Analysis of Nested Case-Control Study Designs: Revisiting the Inverse Probability Weighting Method. <em>Communications for Statistical Applications and Methods</em>, 2013; 20(6): 455-66.</p>
<p>Kim RS, Kaplan R. Analysis of Secondary Outcomes in Nested Case-Control Study Designs. <em>Statistics in Medicine</em>, 2014; 33(24): 4215-26.</p>
<p>Kim RS. R code: computation of inclusion probabilities in nested case-control studies. 2014 Oct. Retrieved from http://missionalconsulting.com/methods</p>
]]></content:encoded>
			<wfw:commentRss>https://missionalconsulting.com/methods/r-code-computation-of-inclusion-probabilities-in-nested-case-control-studies/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
