TSGAN: Individual treatment effect estimation for multi-intervention with continuous dosage

In recent years, causal inference has achieved strong results in recommendation systems, causal chain analysis, and individual treatment effect estimation. The individual treatment effect (ITE), also known as the conditional average treatment effect (CATE), is a focus of research in the medical, economic, and political fields. It addresses the difficulty of predicting an intervention's impact on a given individual when the effects of interventions vary with individual differences. Current research focuses on estimating the counterfactual, that is, predicting the difference in treatment effect between an individual receiving one treatment and the same individual receiving another. However, existing studies are largely limited to two interventions and do not consider the therapeutic dose. This paper proposes a method that combines the matching idea prevalent in traditional ITE estimation with a generative adversarial network (GAN) to estimate individual effects under multiple interventions with continuous dosage. The paper first proposes the idea of the treatment effect space (TES) and then a GAN-based neural network with an improved discriminator. Unlike a common GAN, it uses multiple discriminators in a parallel structure to discriminate true samples within the treatment effect space. The model was tested and validated on semi-simulated data.


Introduction
In the medical field, treating a patient often requires several drugs acting together, and different doses also affect the final therapeutic effect. In the political field, governance usually requires introducing multiple policies, and the intensity of each policy has an important impact on the outcome. In the economic field, multiple complex factors affect stock prices, exchange rates, and so on. In all of these areas, the result (treatment effect) changes with the individual patient, governance region, or market behavior. These fields are rich in observational data, and using causal inference models to exploit that data is the core purpose of this paper. Although this paper uses precision medicine as its example, the model may also extend to the political and economic fields. Previous work typically estimated individual treatment effects for a single dummy (binary) treatment variable, and most existing methods predict counterfactual outcomes from observed factual data; few models address treatment effect estimation under multiple interventions with continuous dosage. That gap is the focus of this paper.
Estimating individual treatment effects requires counterfactual outcomes inferred from observational data, which is usually extremely difficult: unlike experimental data, observational data cannot control variables, so causal and merely correlational relationships between variables are mixed together [1]. If traditional machine learning or numerical models are used to predict the target treatment effect directly, they are likely to introduce heterogeneity bias, resulting in inaccurate effect estimates. Moreover, traditional individual effect estimation usually relies on a dummy variable (i.e., intervention or no intervention, with only one type of intervention), and such simple, direct models cannot be extended to treatment effect estimation under multiple interventions.
TSGAN (estimating Treatment Spaces using GANs) is a novel GAN model proposed for this problem. Its structure is similar to that of SCIGAN [2]. Inspired by SCIGAN's improved counterfactual discriminator, this paper further improves the discriminator, using a series of discriminators in a parallel structure to discriminate the treatment effect space. The treatment effect space (TES) is a new idea proposed in this paper: each individual owns a space of treatment effects corresponding to the different dose levels of each intervention, which will be described in detail in the following sections. Because of the complexity of the TES, the ordinary generator of a traditional GAN cannot generate it, nor can the traditional discriminator distinguish true samples from the generated TES. There is only one real sample in the TES, but an entire space of counterfactuals. For the generator, if the input consists only of random noise and some features of the individual, there is too little information to generate the TES; for the discriminator, there are too many counterfactual samples to discriminate, which makes accurate judgments difficult. To solve these two problems, this paper applies the matching idea of traditional ITE estimation to the generator and a parallel discriminator structure to the discriminator. The generator (G) generates the TES corresponding to the input sample and its nearest-neighbor matched group, and each discriminator discriminates a 1-D segment of the TES, taking the individual's covariates and the partial treatment plan as input.
The remainder of the paper is organized in four parts. First, the literature on individual effect estimation is reviewed and various models are described. Second, the problem addressed in this paper is defined in detail, and the corresponding assumptions are stated. Third, the model is constructed part by part, with the generated results and loss functions of each part explained in detail. Fourth, using the TCGA dataset, the data required for model validation are semi-simulated, and the performance metrics and advantages of the model are compared and explained [2].

Literature Review
The study of treatment effects began as early as the 1980s. The idea has evolved from early propensity score matching, which avoids heterogeneity bias and thereby estimates the conditional average treatment effect, to machine learning methods such as representation learning and doubly robust regression [1,3,4].
The core idea of estimating individual effects is that the treatment effect obtained after treatment varies from person to person. Suppose an individual with covariates $x$ has outcome $y$ after receiving treatment $t$. The first question is how to estimate the outcome $y'$ that the same individual would have had under a different treatment $t'$. The second is how to build a model from $x$, $t$, and $y$ that chooses the optimal treatment plan for each patient who has not yet received treatment.
The earliest research in this area used numerical imputation methods, most commonly interpolation, including covariate adjustment, backdoor adjustment, response curve estimation, and so on [3][4][5][6].
After this, a statistical matching method, also known as "strategic downsampling", is often used to balance the inter-group bias between treatment and control groups [7]. Other methods for similar purposes include adjusting backdoor variables, adding instrumental variables, etc. The most notable feature of this type of matching method is that it requires two "regressions" (matches) to estimate the treatment effect.
The currently popular approach is causal machine learning, most commonly representation learning. Representation learning adjusts the covariates through a neural network to balance the distributions of the treatment and control groups. Representative models include BNN, SITE, Dragonnet, and TARNet [8][9][10][11].
Another class of machine learning methods generates counterfactuals with adversarial neural networks and typically also includes an inference network to generalize the estimation of treatment effects. Representatives are GANITE and SCIGAN [1,12].
There is also a two-step neural network estimation approach in the spirit of double machine learning (DML) [13]. The focus of this paper is treatment effect estimation under multiple interventions with continuous dosage. Among existing methods, GANITE and SCIGAN handle multiple treatment options and largely inspired this paper, while continuous-dosage treatment effect estimation is achieved by DRNet, SCIGAN, and others. SCIGAN estimates the treatment effect of continuous dosage with one intervention [1,14] (i.e., patients may receive different dose levels but receive only one treatment at a time, and there is no combination of multiple treatment options). This paper builds on this idea and aims to estimate treatment effects under multiple interventions, each with continuous dosage.

Problem Restatement
Taking precision therapy as an example, assume that the observed data include each patient's covariates, the treatment received, and the actual treatment effect. The covariates are denoted $X = \{x_i\}_{i=1}^{N}$, the treatments received $T = \{t_1, t_2, \dots\}_{i=1}^{N}$, and the actual effect of the treatment $y$. Here $w$ is the type of intervention received, $d$ is the dose of a given intervention, and $t$ is a function of $w$ and $d$, so that $t = (w, d)$. Each observed sample is a triple $(x, t, y)$, where $x$ is a feature vector in the covariate space $\mathcal{X}$ and $t$ is a feature vector in the treatment space $\mathcal{T} = \{(d_1, d_2, \dots, d_K) : d_w \in \mathcal{D}_w\}$. $\mathcal{D}_w$ is the dosage range of the $w$-th intervention; for example, if doses range between 0 and 1, then $d_w \in [0, 1]$. As mentioned by Paul R., the purpose of ITE is to calculate $\mathbb{E}[y(1) - y(0) \mid x]$, i.e., the difference in treatment effect between the control group and the experimental group under consistent (localized) covariates. This paper, however, studies the treatment effect under multiple interventions with continuous dosage, so the treatment effect cannot be expressed by a single value. On this basis, we propose the treatment effect space (TES), defined over $t \in \mathcal{T}$ and $x \in \mathcal{X}$, and our goal is an unbiased estimate of the TES for each patient based on the observed data.
To make the treatment effect estimable, the data should satisfy the following assumptions. Assumption 1 (Overlap): for all $x \in \mathcal{X}$, the probability of receiving any treatment plan is positive, $p(t \mid x) > 0$ for $t \in \mathcal{T}$. Assumption 2 (Unconfoundedness): the treatment plan $t$ and the treatment effect $y(t)$ resulting from that plan are conditionally independent given $x$.
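The TES definition and the two assumptions can be written compactly as follows. This is a sketch in standard potential-outcome notation and not necessarily the authors' original equation: the symbols $\mu$, $Y(t)$, $\mathcal{X}$ and $\mathcal{T}$ are assumptions introduced here for illustration.

```latex
% Assumed notation: Y(t) is the potential outcome under treatment plan t,
% and mu(t, x) is its conditional expectation given covariates x.
\begin{align}
  \mu(t, x) &= \mathbb{E}\left[ Y(t) \mid X = x \right],
  \qquad t \in \mathcal{T},\; x \in \mathcal{X}, \\
  \mathrm{TES}(x) &= \left\{ \mu(t, x) : t \in \mathcal{T} \right\}, \\
  \text{(Overlap)} \quad & p(t \mid x) > 0
  \quad \forall\, t \in \mathcal{T},\; x \in \mathcal{X}, \\
  \text{(Unconfoundedness)} \quad &
  \{ Y(t) \}_{t \in \mathcal{T}} \perp T \mid X .
\end{align}
```

Under this reading, the binary ITE is the special case in which $\mathcal{T}$ contains only two plans and the TES collapses to a single difference.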

TSGAN Architecture
To estimate the TES, we propose a GAN generator modified with group matching. For each observed sample with covariates $x_i$, define the matched group $J$ as the set of samples $x_j$ with $j \in \arg\min_{j} \mathrm{dist}(x_i, x_j)$ subject to $j \neq i$, where $m$ is the number of samples within the matched group. Since the estimation of the TES cannot rely solely on unique samples, estimating the TES also needs the samples closest to $x$ as input. This paper refers to the GAN framework proposed in [1] and improves its generator and discriminator so that they can generate and discriminate the treatment effect space, as shown in Figure 2. As shown in Figure 3, the counterfactual generator can be represented as $G: \mathcal{X} \times \mathcal{J} \times \mathcal{Y} \times \mathcal{T} \times \mathcal{Z} \to \mathrm{TES}$, where the inputs are the covariates $x \in \mathcal{X}$, the group of nearest samples adjacent to $x$, $J \in \mathcal{J}$, the treatment effect $y \in \mathcal{Y}$, the treatment plan $t \in \mathcal{T}$, and the Gaussian noise $z \in \mathcal{Z}$. The output is a function from the treatment space $\mathcal{T}$ to the treatment result $\mathcal{Y}$, which is the unbiased estimate of the TES.
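The matching step can be illustrated with a minimal sketch. The function name `matched_group`, the use of Euclidean distance, and the toy data are assumptions made here for illustration; the text above does not specify the distance metric.

```python
import numpy as np

def matched_group(X, i, m):
    """Return the indices of the m samples closest to sample i, excluding i.

    Euclidean distance is an assumption of this sketch.
    """
    dists = np.linalg.norm(X - X[i], axis=1)  # distance from sample i to every sample
    dists[i] = np.inf                         # enforce the constraint j != i
    return np.argsort(dists, kind="stable")[:m]

# Toy covariates: five samples with two features each (hypothetical values).
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.0, 0.3], [5.0, 5.0]])
J = matched_group(X, i=0, m=2)  # indices of the matched group for sample 0
```

The covariates, treatments, and outcomes of the samples indexed by `J` would then be passed to the generator together with the sample itself.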

Counterfactual Generator
Because the TES is generated from the sample's covariates and from the samples closest to it, the generator's loss function must take these elements into account together.
As emphasized throughout this paper, the discriminator of a traditional GAN cannot be applied to the TES. This paper therefore reduces the burden on the discriminator by dividing the TES by intervention type and applying multiple discriminators. With parallel discriminators, each discriminator is responsible for one 1-D treatment effect space belonging to one intervention. We discretize the dose of each treatment into $n \in \mathbb{Z}^{+}$ dose levels, so that $\mathcal{D}_w = \{d_1, \dots, d_n\}$; figuratively speaking, the TES becomes a mesh once the dosage levels are discretized. The improved discriminator takes as input the covariates $x$, the partial treatment plan $t_{-w}$, and the TES estimated by the generator, where $D_w$ is the discriminator dedicated to dose discrimination for the $w$-th intervention and $t_{-w}$ is the treatment plan with the $w$-th intervention left out. Define $D_w : \mathcal{X} \times \mathcal{T}_{-w} \times \mathrm{TES}_w \to [0, 1]^{n}$, where $\mathrm{TES}_w$ is the linear space cut from the original TES when only the dosage of the $w$-th intervention varies. The loss function of $D_w$ is defined accordingly.
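The meshing of the TES and the 1-D slice seen by one discriminator can be sketched as follows. The helper names, the evenly spaced levels on [0, 1], and the example plan are assumptions for illustration only.

```python
import numpy as np

def dose_levels(n, low=0.0, high=1.0):
    """n evenly spaced dose levels; even spacing on [low, high] is an assumption."""
    return np.linspace(low, high, n)

def tes_mesh(num_interventions, n):
    """Grid of all discretized treatment plans, shape (n,)*K + (K,)."""
    levels = dose_levels(n)
    grids = np.meshgrid(*[levels] * num_interventions, indexing="ij")
    return np.stack(grids, axis=-1)

def slice_for_discriminator(t, w, n):
    """Vary only the w-th dose over its n levels and hold the other doses fixed.

    Returns the partial plan (plan without intervention w) and the n plans
    along the 1-D slice that the w-th discriminator would examine.
    """
    t = np.asarray(t, dtype=float)
    plans = np.tile(t, (n, 1))
    plans[:, w] = dose_levels(n)
    return np.delete(t, w), plans

mesh = tes_mesh(num_interventions=3, n=5)
t_minus_w, plans = slice_for_discriminator([0.25, 0.5, 0.75], w=1, n=5)
```

Each discriminator therefore sees n candidate outcomes along one axis of the mesh instead of the full grid of n^K plans, which is the reduction in burden described above.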

Discriminator
The output of each discriminator $D_w$, given intervention type $w$, is a probability between 0 and 1. For each dosage level, the discriminator outputs one probability value: the probability that the expected effect at that dosage is trustworthy. In the loss function above, $\mathbb{1}(d = d_f) \log D_w(x, t_{-w}, \mathrm{TES}_w)$ is the logarithm of the probability assigned when the predicted level is the actual (factual) dose level $d_f$; the closer the predicted probability is to 1, the smaller the loss. The term $\mathbb{1}(d \neq d_f) \log\left(1 - D_w(x, t_{-w}, \mathrm{TES}_w)\right)$ is the logarithm for the remaining, non-factual dose levels; the closer the prediction is to 0, the smaller the loss. In summary, the GAN is optimized alternately. With the discriminators fixed, the generator $G^*$ is obtained by optimizing the sum over interventions of the adversarial terms $\sum_{w} \mathcal{L}(D_w; G)$ plus the generator's own loss (8). With the generator fixed at $G^*$, each discriminator $D_w^*$ is obtained by optimizing its loss $\mathcal{L}_{D}(D_w; G^*)$. The asterisk represents the iteration relationship.
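The per-dose-level loss described above behaves like a binary cross-entropy in which the factual dose level is the positive class. The sketch below assumes this reading; the function name, the averaging over levels, and the clipping constant are assumptions, not the paper's exact definition.

```python
import numpy as np

def discriminator_loss(probs, factual_level, eps=1e-12):
    """Negative log-likelihood over the dose levels of one intervention.

    probs: discriminator output, one probability per dose level.
    factual_level: index of the dose level actually observed.
    Averaging over levels (rather than summing) is an assumption.
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    is_factual = np.zeros(probs.shape, dtype=bool)
    is_factual[factual_level] = True
    # log D at the factual level, log(1 - D) at every other level
    log_lik = np.where(is_factual, np.log(probs), np.log(1.0 - probs))
    return -log_lik.mean()

# High probability on the factual level and low elsewhere gives a small loss.
good = discriminator_loss([0.05, 0.9, 0.05], factual_level=1)
# Confusing the factual level with another one gives a large loss.
bad = discriminator_loss([0.9, 0.1, 0.5], factual_level=1)
```

With K interventions, one such loss would be computed per discriminator, and the K values combined in the generator objective.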

Inference Network
After the generator and discriminator are optimized, we use the generator to generate a corresponding TES for each sample, and then train the inference network using the generated results and the original sample covariate X, so as to predict the TES for new samples.
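The second stage amounts to supervised learning from covariates to the generated TES. The sketch below substitutes a linear least-squares model for the inference network purely to show the data flow. The function names, the flattening of the TES grid, and the synthetic data are all assumptions, and a real implementation would use a neural network.

```python
import numpy as np

def fit_inference_model(X, tes_generated):
    """Least-squares stand-in for the inference network.

    X: covariates, shape (num_samples, num_features).
    tes_generated: generator output on the dose grid, shape (num_samples, ...).
    """
    num_samples = X.shape[0]
    targets = tes_generated.reshape(num_samples, -1)    # flatten each TES grid
    design = np.hstack([X, np.ones((num_samples, 1))])  # add a bias column
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights

def predict_tes(weights, X_new, grid_shape):
    """Predict the TES grid for new samples from their covariates alone."""
    design = np.hstack([X_new, np.ones((X_new.shape[0], 1))])
    return (design @ weights).reshape((X_new.shape[0],) + tuple(grid_shape))

# Synthetic check: a TES that is exactly linear in x should be recovered.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
true_w = rng.normal(size=(5, 9))
tes = (np.hstack([X, np.ones((50, 1))]) @ true_w).reshape(50, 3, 3)
W = fit_inference_model(X, tes)
pred = predict_tes(W, X[:2], grid_shape=(3, 3))
```

The key point is that prediction needs only the covariates of a new sample, whereas the generator also requires an observed treatment and outcome.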

Semi-simulated Data Validation
This paper uses The Cancer Genome Atlas (TCGA) database, which has a sample size of over 10,000 and covers omics data such as the genome, transcriptome, epigenome, and proteome, providing comprehensive, multidimensional data. Using these data as covariates, three treatment plans are constructed, with pre-defined simulation terms.

Figure 5. Validation results. Light grey: data simulated without interaction terms. Dark grey: data simulated with interaction terms.
In the validation process, TSGAN performed well without interaction terms, predicting the optimal dose fairly accurately for treatment plans 1, 2, and 3, although plan 3 shows a large nonlinear deviation. After interaction terms are added, however, the accuracy drops sharply, and the model clearly needs further improvement to accommodate interaction effects under multiple treatment regimens.

Conclusion
This paper proposes a method that combines the matching idea prevalent in traditional ITE estimation with a generative adversarial network (GAN) to estimate individual effects under multiple interventions with continuous dosage. It introduces the treatment effect space (TES), a new concept, together with a GAN-based network whose improved discriminator differs from that of an ordinary GAN: multiple discriminators in a parallel structure distinguish real samples within the treatment effect space. Because of the complexity of the TES, neither the generator nor the discriminator of a traditional GAN can be used directly, either to generate the TES or to distinguish real samples from the generated TES. The matching idea of traditional ITE estimation addresses the generator problem, and the parallel discriminator structure addresses the discriminator problem. Better methods and ideas are expected to be found in future work.