Title: | Prepare Electronic Prescription Record Data to Estimate Drug Exposure |
---|---|
Description: | Prepare prescription data (such as from the Clinical Practice Research Datalink) into an analysis-ready format, with start and stop dates for each patient's prescriptions. Based on Pye et al (2018) <doi:10.1002/pds.4440>. |
Authors: | Belay Birlie Yimer [aut] |
Maintainer: | David Selby <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.5.9000 |
Built: | 2025-02-23 03:17:19 UTC |
Source: | https://github.com/belayb/drugprepr |
Given a prescription length limit, truncate any prescriptions that appear to be longer than this, or mark them as missing.
clean_duration(data, max_months = Inf, method = c("truncate", "remove"))
data |
A data frame containing a column called duration |
max_months |
The maximum plausible prescription length in months |
method |
Either 'truncate' or 'remove'. See details |
The method 'truncate' causes any duration longer than max_months to be replaced with the value of max_months (albeit converted to days). The method 'remove' causes such durations to be replaced with NA.
There is no explicit 'ignore' method, but if you want to 'do nothing', simply set max_months to an arbitrarily high number. By default, the maximum is infinite, so nothing should happen. (Of course, you could also just not run the function...)
A data frame of the same structure as the input, possibly with some elements of the duration column changed.
Currently the variable name is hard-coded as 'duration', but in principle this could be parametrised for datasets where the column has a different name.
long_presc <- data.frame(duration = c(100, 300, 400, 800))
clean_duration(long_presc, 6)
clean_duration(long_presc, 12, 'remove')
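The two methods can be illustrated with a small base-R sketch. This is not the package's source code: the helper name sketch_clean_duration is hypothetical, and the conversion of one month to 30 days is an assumption made only for the illustration.

```r
# Illustrative sketch only; not the implementation used by clean_duration().
# Assumption: one month is approximated as 30 days.
sketch_clean_duration <- function(data, max_months = Inf,
                                  method = c("truncate", "remove")) {
  method <- match.arg(method)
  max_days <- max_months * 30
  # Flag durations that exceed the limit, leaving existing NAs alone
  too_long <- !is.na(data$duration) & data$duration > max_days
  data$duration[too_long] <- if (method == "truncate") max_days else NA_real_
  data
}

long_presc <- data.frame(duration = c(100, 300, 400, 800))
truncated <- sketch_clean_duration(long_presc, 6)
removed <- sketch_clean_duration(long_presc, 12, "remove")
untouched <- sketch_clean_duration(long_presc)
```

With the default max_months = Inf no duration exceeds the limit, so the data are returned unchanged.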
Given a series of prescriptions in data, if one prescription (for the same patient and drug) starts no more than min_gap days after the previous one finishes, we extend the length of the previous prescription to cover the gap.
close_small_gaps(data, min_gap = 0L)
data |
A data frame containing columns patid, prodcode, start_date and stop_date |
min_gap |
Size of largest gaps to close. Default is zero, i.e. do nothing |
The input data frame data, possibly with some of the stop_dates changed.
gappy_data <- data.frame(
  patid = 1,
  prodcode = 'a',
  start_date = Sys.Date() + (0:6) * 7,
  stop_date = Sys.Date() + (0:6) * 7 + 4
)
close_small_gaps(gappy_data)
close_small_gaps(gappy_data, 7)
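The gap-closing idea can be sketched in base R as follows. The helper sketch_close_small_gaps is hypothetical and only approximates the behaviour described above; it assumes that closing a gap means moving the earlier stop_date up to the next start_date, and it uses a fixed first day so that the result is reproducible.

```r
# Illustrative base-R sketch; not the package's own source code.
sketch_close_small_gaps <- function(data, min_gap = 0L) {
  data <- data[order(data$patid, data$prodcode, data$start_date), ]
  n <- nrow(data)
  if (n < 2) return(data)
  i <- seq_len(n - 1)
  # Only compare consecutive rows for the same patient and drug
  same_series <- data$patid[i] == data$patid[i + 1] &
    data$prodcode[i] == data$prodcode[i + 1]
  gap <- as.numeric(data$start_date[i + 1] - data$stop_date[i])
  to_close <- same_series & gap > 0 & gap <= min_gap
  # Extend the earlier prescription up to the start of the next one
  data$stop_date[i[to_close]] <- data$start_date[i + 1][to_close]
  data
}

first_day <- as.Date("2021-01-04")  # fixed date, chosen arbitrarily
gappy <- data.frame(
  patid = 1,
  prodcode = "a",
  start_date = first_day + (0:6) * 7,
  stop_date = first_day + (0:6) * 7 + 4
)
unchanged <- sketch_close_small_gaps(gappy)
closed <- sketch_close_small_gaps(gappy, 7)
```

Each gap in this toy data is three days long, so a min_gap of 7 closes all of them while the default of zero leaves the data as they were.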
The function calls the R package doseminer to extract dose information from free-text prescribing instructions, then computes the average numerical daily dose according to a given decision rule.
compute_ndd(data, dose_fn = mean, freq_fn = mean, interval_fn = mean)
data |
a data frame containing free-text prescribing instructions in a
column called |
dose_fn |
function to summarise range of numbers by a single value |
freq_fn |
function to summarise range of frequencies by a single value |
interval_fn |
function to summarise range of intervals by a single value |
The general formula for computing numerical daily dose (ndd) is given by
$$\mathrm{ndd} = \frac{\mathrm{DF} \times \mathrm{DN}}{\mathrm{DI}}$$
where
DF is dose frequency, the number of dose 'events' per day
DN is dose number, or number of units of drug taken during each dose 'event'
DI is dose interval, or the number of days between 'dose days', where an interval of 1 means every day
Prescriptions can have a variable dose frequency or dose number, such as '2-4 tablets up to 3 times per day'. In this case, the user can choose to reduce these ranges to single values by taking the minimum, maximum or average of these endpoints.
A data frame mapping the raw text to structured dosage information.
compute_ndd(cprd, min, min, mean)
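As a worked illustration of the decision rule, the sketch below computes the numerical daily dose as dose frequency times dose number divided by dose interval, reducing any range to a single value with the chosen summary function. The helper sketch_ndd and its argument names dn, df and di are hypothetical; the free-text parsing done by doseminer is not reproduced, and the instruction '2-4 tablets up to 3 times per day' is read here, as an assumption, as a dose number of 2 to 4 with a fixed frequency of 3.

```r
# Hypothetical helper for illustration; dn, df and di are assumed names for
# dose number, dose frequency and dose interval respectively.
sketch_ndd <- function(dn, df, di = 1,
                       dose_fn = mean, freq_fn = mean, interval_fn = mean) {
  freq_fn(df) * dose_fn(dn) / interval_fn(di)
}

# '2-4 tablets up to 3 times per day', taken every day (interval of 1)
lowest <- sketch_ndd(dn = c(2, 4), df = 3, dose_fn = min)
average <- sketch_ndd(dn = c(2, 4), df = 3)
highest <- sketch_ndd(dn = c(2, 4), df = 3, dose_fn = max)
```

The three results show how much the choice of summary function can change the estimated exposure for the same prescription text.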
A dataset containing prescription information for two individuals. The dataset is a hypothetical dataset resembling the real CPRD data.
cprd
A data frame with 18 rows and 9 variables:
unique identifier given to a patient in CPRD GOLD
unique identifier given to a practice in CPRD GOLD
Beginning of the prescription period
CPRD unique code for the treatment selected by the GP
Identifier that allows dosage information on the event to be retrieved from a Common Dosages lookup table
Prescription instruction for the prescribed product, as entered by the GP
Total quantity entered by the GP for the prescribed product
Number of treatment days prescribed for a specific therapy event
an estimated prescription duration, as entered by CPRD
...
https://cprdcw.cprd.com/_docs/CPRD_GOLD_Full_Data_Specification_v2.0.pdf
A light wrapper around impute_qty.
decision_1(data, decision = "a")
data |
a data frame |
decision |
one of the following strings:
|
Decisions f and g are not yet implemented.
Other decision functions: decision_10(), decision_2(), decision_3(), decision_4(), decision_5(), decision_6(), decision_7(), decision_8(), decision_9(), drug_prep()
Where one prescription (for the same drug and patient) starts only a short time after the previous finishes, this function can close the gap, as if the prescription was continuous over the entire period.
decision_10(data, decision = "a")
data |
a data frame |
decision |
one of the following strings:
|
The underlying function is called close_small_gaps
Other decision functions: decision_1(), decision_2(), decision_3(), decision_4(), decision_5(), decision_6(), decision_7(), decision_8(), decision_9(), drug_prep()
A light wrapper around impute_qty.
decision_2(data, decision = "a")
data |
a data frame |
decision |
one of the following strings:
|
Decisions e and f are not yet implemented.
Other decision functions: decision_10(), decision_1(), decision_3(), decision_4(), decision_5(), decision_6(), decision_7(), decision_8(), decision_9(), drug_prep()
A light wrapper around impute_ndd.
decision_3(data, decision = "a")
data |
a data frame |
decision |
one of the following strings:
|
Decisions f and g are not yet implemented.
Other decision functions: decision_10(), decision_1(), decision_2(), decision_4(), decision_5(), decision_6(), decision_7(), decision_8(), decision_9(), drug_prep()
A light wrapper around impute_ndd.
decision_4(data, decision = "a")
data |
a data frame |
decision |
one of the following strings:
|
Decisions e and f are not yet implemented.
Other decision functions: decision_10(), decision_1(), decision_2(), decision_3(), decision_5(), decision_6(), decision_7(), decision_8(), decision_9(), drug_prep()
A light wrapper around clean_duration.
decision_5(data, decision = "a")
data |
a data frame |
decision |
one of the following strings:
|
Other decision functions: decision_10(), decision_1(), decision_2(), decision_3(), decision_4(), decision_6(), decision_7(), decision_8(), decision_9(), drug_prep()
This is just shorthand for defining a column equal to one of the specified formulae. If the column(s) corresponding to decision are missing, an error will be thrown.
If you have already calculated or obtained the column duration from elsewhere, this step is not necessary.
decision_6(data, decision = "c")
data |
a data frame |
decision |
one of the following strings:
|
This step actually takes place before decision_5.
Other decision functions: decision_10(), decision_1(), decision_2(), decision_3(), decision_4(), decision_5(), decision_7(), decision_8(), decision_9(), drug_prep()
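To make the idea of defining duration from a formula concrete, here is a minimal base-R sketch using the formula qty / ndd, which appears elsewhere in this manual. The toy data frame and the explicit column check are illustrative assumptions; the mapping between decision codes and formulae is not reproduced here.

```r
# Toy data for illustration only
therapy <- data.frame(qty = c(28, 56, 30), ndd = c(1, 2, 0.5))

# Fail early if the columns needed by the formula are absent
formula_columns <- c("qty", "ndd")
missing_columns <- setdiff(formula_columns, names(therapy))
if (length(missing_columns) > 0) {
  stop("Missing column(s): ", paste(missing_columns, collapse = ", "))
}

# Duration in days: total quantity divided by numerical daily dose
therapy$duration <- therapy$qty / therapy$ndd
```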
A light wrapper around impute_duration.
decision_7(data, decision = "a")
data |
a data frame |
decision |
one of the following strings:
|
Other decision functions: decision_10(), decision_1(), decision_2(), decision_3(), decision_4(), decision_5(), decision_6(), decision_8(), decision_9(), drug_prep()
A light wrapper around impute_duration, followed by removing duplicate rows with the same combination of prodcode, patid and start_date.
decision_8(data, decision = "a")
data |
a data frame |
decision |
one of the following strings
|
Other decision functions: decision_10(), decision_1(), decision_2(), decision_3(), decision_4(), decision_5(), decision_6(), decision_7(), decision_9(), drug_prep()
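The de-duplication step described for decision_8 can be sketched in base R as below. This only illustrates keeping one row per combination of prodcode, patid and start_date; the toy data are hypothetical, and which of the duplicate rows the package actually keeps is not stated here, so keeping the first is an assumption.

```r
# Toy data: rows 1 and 2 share the same patient, product and start date
therapy <- data.frame(
  patid = c(1, 1, 1, 2),
  prodcode = c("a", "a", "b", "a"),
  start_date = as.Date(rep("2021-01-01", 4)),
  duration = c(28, 28, 14, 7)
)

keys <- c("prodcode", "patid", "start_date")
# Assumption: keep the first row of each duplicated key combination
deduplicated <- therapy[!duplicated(therapy[keys]), ]
```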
In situations where one prescription starts before another (for the same patient and drug) finishes, this function will either implicitly sum the doses (i.e. do nothing) or it will divide the intervals into non-overlapping subsets, shifting these sub-intervals forward in time until there is no overlap.
decision_9(data, decision = "a")
data |
a data frame |
decision |
one of the following strings:
|
The underlying algorithm for shifting overlapping intervals is implemented by the internal function shift_interval.
Other decision functions: decision_10(), decision_1(), decision_2(), decision_3(), decision_4(), decision_5(), decision_6(), decision_7(), decision_8(), drug_prep()
Run drug preparation algorithm
drug_prep(data, plausible_values, decisions = rep("a", 10))
data |
data frame containing prescription data |
plausible_values |
data frame containing variables prodcode, min_qty, max_qty, min_ndd and max_ndd |
decisions |
character vector of length 10 |
A data frame including estimated stop_date for each prescription
Other decision functions: decision_10(), decision_1(), decision_2(), decision_3(), decision_4(), decision_5(), decision_6(), decision_7(), decision_8(), decision_9()
plausible_values <- data.frame(
  prodcode = c('a', 'b', 'c'),
  min_qty = 0,
  max_qty = c(50, 100, 200),
  min_ndd = 0,
  max_ndd = c(10, 20, 30)
)
drug_prep(example_therapy, plausible_values, decisions = c('a', 'a', 'a', 'a', 'a', 'c', 'a', 'a', 'a', 'a'))
Based on a hypothetical 'therapy' file from the Clinical Practice Research Datalink (CPRD), a UK database of primary care records.
example_therapy
An object of class data.frame with 30 rows and 6 columns.
This dataset is now generated deterministically, so it will not vary between sessions.
Get the mode (most common value) of a vector
get_mode(v, na.rm = TRUE)
v |
a vector |
na.rm |
Logical. If |
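One common way to compute a mode in base R is sketched below. The helper sketch_mode is hypothetical and is not the package's get_mode; in particular, returning the first value encountered when there is a tie is an assumption of this sketch.

```r
# Illustrative sketch; ties are resolved in favour of the first value seen.
sketch_mode <- function(v, na.rm = TRUE) {
  if (na.rm) v <- v[!is.na(v)]
  distinct <- unique(v)
  # Count how often each distinct value occurs and return the most frequent
  counts <- tabulate(match(v, distinct))
  distinct[which.max(counts)]
}

numeric_mode <- sketch_mode(c(1, 2, 2, 3, NA))
character_mode <- sketch_mode(c("a", "b", "b"))
```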
This is a workhorse function used by impute_ndd, impute_qty and others.
impute(
  data,
  variable,
  method = c("ignore", "mean", "median", "mode", "replace", "min", "max", "sum"),
  where = is.na,
  group,
  ...,
  replace_with = NA_real_
)
data |
A data frame containing columns |
variable |
Unquoted name of the column in |
method |
Method for imputing the values. See details. |
where |
Logical vector, or function applied to |
group |
Level of structure for imputation. Defaults to whole study population. |
... |
Extra arguments, currently ignored |
replace_with |
if the method 'replace' is selected, which value should be inserted?
|
The argument where indicates which values are to be imputed. It can be specified as either a vector or as a function. Thus you can specify, for example, is.na to impute all missing values, or you can pass in a vector, if it depends on something else rather than just the current values of the variable to be imputed.
This design may change in future. In particular, if we want to impute implausible values and impute missing values separately, it's important that these steps are independent.
A data frame of the same structure as data, with values imputed
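The sketch below illustrates how a where argument that is either a function or a logical vector might be combined with group-level imputation. It is a simplified stand-in, not the package's impute: the helper sketch_impute_mean is hypothetical, only the 'mean' method is shown, the column is passed as a quoted string rather than an unquoted name, and flagged values are excluded from the mean used to replace them.

```r
# Illustrative sketch of mean imputation with a flexible 'where' argument.
sketch_impute_mean <- function(data, variable, where = is.na, group = NULL) {
  x <- data[[variable]]
  # 'where' may be a function of the column or a ready-made logical vector
  flag <- (if (is.function(where)) where(x) else where) %in% TRUE
  g <- if (is.null(group)) rep(1L, nrow(data)) else interaction(data[group], drop = TRUE)
  # Group means computed from the values that are not being replaced
  group_mean <- ave(replace(x, flag, NA), g,
                    FUN = function(z) mean(z, na.rm = TRUE))
  x[flag] <- group_mean[flag]
  data[[variable]] <- x
  data
}

therapy <- data.frame(patid = c(1, 1, 1, 2, 2, 2),
                      qty = c(10, NA, 30, 100, NA, 200))
by_population <- sketch_impute_mean(therapy, "qty")
by_patient <- sketch_impute_mean(therapy, "qty", group = "patid")
# Passing a vector: replace only values flagged as too large
too_big <- sketch_impute_mean(therapy, "qty",
                              where = therapy$qty > 150, group = "patid")
```

In the last call the missing values stay missing, because only the rows flagged by the vector are touched; this is the kind of independence between steps that the details above refer to.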
Instead of replacing missing stop dates, we impute the durations and then infer the stop dates from there.
impute_duration(
  data,
  method,
  where = is.na,
  group = c("patid", "start_date"),
  ...
)
data |
A data frame containing columns |
method |
Method for imputing the values. See details. |
where |
Logical vector, or function applied to |
group |
Level of structure for imputation. Defaults to whole study population. |
... |
Extra arguments, currently ignored |
We can fix clashing start dates by setting group to start_date and patid, i.e. average over groups with more than one member; any metric should return the original values if the group size is one.
A data frame of the same structure as data, with values imputed
example_duration <- transform(example_therapy, duration = qty / ndd)
impute_duration(example_duration, method = 'mean', group = 'patid')
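The idea of imputing durations first and then inferring stop dates can be sketched in base R. The toy data and the choice of a per-patient mean are assumptions for illustration; the package's own handling may differ.

```r
# Toy data: the second prescription has no recorded duration
therapy <- data.frame(
  patid = c(1, 1, 2, 2),
  start_date = as.Date(c("2021-01-01", "2021-02-01", "2021-01-01", "2021-03-01")),
  duration = c(28, NA, 10, 20)
)

# Impute missing durations with the mean duration for the same patient
patient_mean <- ave(therapy$duration, therapy$patid,
                    FUN = function(d) mean(d, na.rm = TRUE))
is_missing <- is.na(therapy$duration)
therapy$duration[is_missing] <- patient_mean[is_missing]

# Infer the stop date from the start date and the (imputed) duration
therapy$stop_date <- therapy$start_date + therapy$duration
```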
Replace implausible or missing numerical daily doses (NDD)
impute_ndd(data, method, where = is.na, group = "population", ...)
data |
A data frame containing columns |
method |
Method for imputing the values. See details. |
where |
Logical vector, or function applied to |
group |
Level of structure for imputation. Defaults to whole study population. |
... |
Extra arguments, currently ignored |
A data frame of the same structure as data, with values imputed
impute_ndd(example_therapy, 'mean')
Find implausible entries. Replace implausible or missing prescription quantities
impute_qty(data, method, where = is.na, group = "population", ...)
data |
A data frame containing columns |
method |
Method for imputing the values. See details. |
where |
Logical vector, or function applied to |
group |
Level of structure for imputation. Defaults to whole study population. |
... |
Extra arguments, currently ignored |
A data frame of the same structure as data, with values imputed
impute_qty(example_therapy, 'mean')
Run this function and then you can either simply discard overlapping intervals or shift them around using an appropriate algorithm.
isolate_overlaps(data)
data |
A data frame including variables |
The older implementation used isolateoverlaps from the intervalaverage package and Overlap from the DescTools package. Here we refactor it using functions from tidyverse instead.
A data frame of patid, prodcode, start_date and stop_date, where intervals are either exactly overlapping or mutually non-overlapping (but not partially overlapping), such that the union of such intervals is equivalent to those originally provided in data
This function currently doesn't use any keys except patid and prodcode. It may be desirable to add a row ID, for matching each partial interval back to the original interval from which it was derived. This may be relevant to models using weighted dosages.
intervalaverage::isolateoverlaps, foverlaps
set.seed(1)
overlapping_data <- data.frame(
  rowid = 1:20,
  patid = 1:2,
  prodcode = 'a',
  start_date = Sys.Date() + c(round(rexp(19, 1/7)), -20),
  qty = rpois(20, 64),
  ndd = sample(seq(.5, 12, by = .5), 20, replace = TRUE),
  stringsAsFactors = FALSE
)
overlapping_data <- transform(overlapping_data,
  stop_date = start_date + qty / ndd
)
isolate_overlaps(overlapping_data)
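The property described in the return value, intervals that are either identical or disjoint, can be illustrated with a small base-R sketch for a single patient and drug. The helper sketch_isolate is hypothetical; it cuts every interval at every start and stop point, uses plain numbers (days) instead of dates, and treats intervals as continuous, all of which are simplifying assumptions rather than the package's method.

```r
# Illustrative sketch: split each interval at every boundary point so that
# any two resulting pieces are either identical or non-overlapping.
sketch_isolate <- function(start, stop) {
  cuts <- sort(unique(c(start, stop)))
  pieces <- lapply(seq_along(start), function(i) {
    b <- cuts[cuts >= start[i] & cuts <= stop[i]]
    data.frame(id = rep(i, length(b) - 1),
               start = head(b, -1),
               stop = tail(b, -1))
  })
  do.call(rbind, pieces)
}

# Two partially overlapping intervals: days 0 to 10 and days 5 to 15
pieces <- sketch_isolate(start = c(0, 5), stop = c(10, 15))
```

The shared stretch from day 5 to day 10 appears twice, once for each original interval, while the remaining pieces do not overlap with anything; the id column plays the role of the row ID suggested in the note above.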
A helper function that allows specifying decision rules using English words rather than alphanumeric codes. Translates the rules into the corresponding codes and then passes them to drug_prep functions.
make_decisions(
  implausible_qty,
  missing_qty,
  implausible_ndd,
  missing_ndd,
  implausible_duration,
  calculate_duration,
  missing_duration,
  clash_start,
  overlapping,
  small_gaps
)
implausible_qty |
implausible total drug quantities |
missing_qty |
missing total drug quantities |
implausible_ndd |
implausible daily dosage |
missing_ndd |
missing daily dosage |
implausible_duration |
overly-long prescription durations |
calculate_duration |
formula or variable to compute prescription duration |
missing_duration |
missing prescription duration |
clash_start |
how to disambiguate prescriptions that start on the same date |
overlapping |
how to handle prescription periods that overlap with one another |
small_gaps |
how to handle short gaps between successive prescriptions The argument
|
A character vector suitable for passing to the decisions argument of the drug_prep function.
make_decisions('ignore', 'mean population', 'missing', 'mean practice', 'truncate 6', 'qty / ndd', 'mean individual', 'mean', 'allow', 'close 15')
Minimum and maximum plausible values for total prescription quantity and numerical daily dose of prescriptions given in the cprd dataset. Both datasets are hypothetical.
min_max_dat
A data frame with 2 rows and 5 variables:
CPRD unique code for the treatment selected by the GP
maximum possible quantity to be prescribed for the product
minimum possible quantity to be prescribed for the product
maximum possible numerical daily dose to be prescribed for the product
minimum possible numerical daily dose to be prescribed for the product
...
A utility function for indicating if elements of a vector are implausible.
outside_range(x, lower, upper, open = TRUE)
x |
numeric vector |
lower |
minimum plausible value |
upper |
maximum plausible value |
open |
logical. If |
Though the function between already exists, it is not vectorised over the bounds.
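A range check that is vectorised over the bounds could look like the sketch below, where each value is compared with its own lower and upper limit (for example, product-specific limits). The helper sketch_outside_range is hypothetical, and the meaning given to open here, that values equal to a bound count as implausible when open = TRUE, is an assumption because the argument's description is incomplete above.

```r
# Illustrative sketch; the interpretation of 'open' is an assumption.
sketch_outside_range <- function(x, lower, upper, open = TRUE) {
  if (open) {
    x <= lower | x >= upper  # bounds themselves are treated as implausible
  } else {
    x < lower | x > upper    # bounds themselves are treated as plausible
  }
}

# Each quantity has its own plausible range
qty <- c(5, 50, 150, 10)
min_qty <- c(0, 0, 0, 0)
max_qty <- c(10, 40, 200, 10)
flag_open <- sketch_outside_range(qty, min_qty, max_qty)
flag_closed <- sketch_outside_range(qty, min_qty, max_qty, open = FALSE)
```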
This is a function used by decision_9.
shift_interval(x)
x |
a data frame containing variables |
A data frame with time intervals moved such that they no longer overlap
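A much simplified version of interval shifting is sketched below. It is not the algorithm used by shift_interval: instead of dividing intervals into sub-intervals, this hypothetical helper moves each whole interval forward, keeping its length, until it starts no earlier than the previous interval ends. It assumes a single patient and drug.

```r
# Simplified illustration: shift whole intervals forward, preserving length.
sketch_shift <- function(x) {
  x <- x[order(x$start_date), ]
  len <- as.numeric(x$stop_date - x$start_date)
  for (i in seq_len(nrow(x))[-1]) {
    if (x$start_date[i] < x$stop_date[i - 1]) {
      # Start where the previous interval stops and keep the same length
      x$start_date[i] <- x$stop_date[i - 1]
      x$stop_date[i] <- x$start_date[i] + len[i]
    }
  }
  x
}

overlapping <- data.frame(
  start_date = as.Date(c("2021-01-01", "2021-01-05", "2021-01-08")),
  stop_date = as.Date(c("2021-01-10", "2021-01-12", "2021-01-20"))
)
shifted <- sketch_shift(overlapping)
```

After shifting, the total number of covered days is the same as the sum of the original interval lengths, which is the motivation for shifting rather than discarding overlapping periods.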