Opinion Creative Commons, CC-BY

# On Random Center Grouping in Multicenter Clinical Trials

**\*Corresponding author:** Weijia Mai, Department of Biostatistics and Bioinformatics, Duke University School of Medicine, 2424 Erwin Road, Hock Suite 1102, Durham, North Carolina.

**Received:** January 12, 2021; **Published:** January 26, 2021

DOI: 10.34297/AJBSR.2021.11.001667

#### Abstract

With the intention of achieving the desired power within a pre-specified timeframe in clinical development, a multicenter trial is often conducted to expedite patient recruitment. However, a multicenter trial with an excessive number of centers is very likely to result in numerous small centers, which causes treatment imbalance (an unequal number of patients per arm within a center) and/or center imbalance (a few large centers together with a number of small centers). In practice, treatment imbalance within and between centers inevitably occurs regardless of the randomization models and/or methods used. Treatment imbalance within centers and/or center imbalance will not only (i) increase the probability of observing a treatment-by-center interaction, but also (ii) decrease the power for detecting a clinically meaningful difference (or treatment effect) of the test treatment under investigation. In this paper, we propose a method for determining whether an observed significant interaction is a false alarm and a method of random grouping for an unbiased and reliable assessment of the treatment effect when significant treatment and center imbalance are observed. The impact of treatment and center imbalance is also examined in terms of the potential loss of power.

**Keywords:** Treatment Imbalance; Center Imbalance; Treatment-by-Center Interaction; Simple Sampling without replacement

#### Introduction

As indicated in the 1988 guidance entitled Guideline for the Format and Content of the Clinical and Statistical Sections of New Drug Applications, published by the United States Food and Drug Administration (FDA), adequate and well-controlled clinical trials need to be conducted in order to provide ample evidence of safety and effectiveness for approval of a new medication. In practice, a multicenter, randomized, well-controlled clinical trial is often considered for this purpose. When multiple clinical trials are conducted concurrently at various centers under the same protocol, these single trials can be viewed together as a single study called a multicenter trial. With the intention of achieving the desired power within a pre-specified timeframe in clinical development, a multicenter trial is often performed to speed up patient recruitment. There are two main purposes of using a multicenter clinical trial in clinical development. First, it shows that the clinical results are reproducible from center to center and do not occur by chance. Second, it illustrates that the clinical results can be generalized to populations from different geographical locations. However, the occurrence of too many small centers may result in treatment imbalance and center imbalance, which may in turn cause a treatment-by-center interaction and make it difficult to combine the centers for an overall assessment. In practice, it is suggested that appropriate statistical tests for treatment-by-center interaction be performed.

As indicated in Chow and Liu [1], a multicenter trial is not equivalent to separate single-site trials, since the purpose is to gather data from different centers and make an overall analysis. It is suggested that proper statistical tests for homogeneity across centers be conducted to check for possible quantitative or qualitative treatment-by-center interactions [2,3] before pooling data for an overall assessment of the safety and effectiveness of the study medication. When significant treatment effects occur in different directions across centers, a qualitative interaction is said to occur. When the treatment differences have different magnitudes but the same direction across centers, a quantitative interaction, which is relatively less important than a qualitative interaction [4], is said to occur. Also, as pointed out by Pan and Wolfe (1997) [12], an overall statistical inference regarding the treatment effect is not valid if there is a substantial qualitative interaction between treatment and center, in which case the centers cannot be grouped and the treatment effect must be assessed by study center.

For a multicenter trial, Lewis [5] posed several questions that are helpful for the design and analysis of multicenter trials. The first is whether the existence of extremely small centers will affect the reliability of separate interpretation of results. The second, in contrast to the first, is whether large centers with large amounts of data will dominate the results of the analysis. The third is whether there are outliers in the results. Finally, the question that we care most about is whether a treatment-by-center interaction will cause the trial to be invalid. In this paper, however, our focus is placed on the statistical evaluation of random center grouping when a multicenter trial results in a number of small centers. We propose a method for determining whether an observed significant treatment-by-center interaction is a false alarm (which may be due to too many small centers) through center grouping based on the optimal selection of the number of centers proposed by Shao and Chow [8]. Following the idea of Shao and Chow [8], we also propose a method of random center grouping to handle the issue of treatment and center imbalance.

In the next section, following the idea of the two-stage sampling strategy proposed by Shao and Chow (1993) [8], a rule of thumb for optimal selection of the number of centers is proposed. Section 3 discusses the impact of treatment imbalance on statistical power for testing the treatment effect. Section 4 illustrates the use of center grouping for determining whether an observed significant treatment-by-center interaction (as a result of too many small centers) is a false alarm. In Section 5, we propose a method for center grouping when the multicenter trial results in a number of small centers, followed by a simulation study of the change in power. The method utilizing random center grouping is proposed in Section 6. Section 7 compares the two methods of center grouping. An example concerning a multicenter trial for treating patients with breast cancer is given in Section 8 to illustrate the application of the proposed methods. Discussion and concluding remarks are given in the last section.

#### The Selection of The Number of Centers

As mentioned in the previous section, one purpose of multicenter trial is to reduce the time consumption of patient recruitment process. The more the centers, the sooner the recruitment process and hence the quicker the study would be completed. Nevertheless, more centers would certainly result in small centers (i.e., fewer patients in each center).

Since comparisons in comparative clinical trials are made between patients within the same center to test the treatment effects, the assessments and inferences made may be statistically invalid and biased if the number of patients in each center is too small. We therefore intend to combine the data of small centers to form a larger dummy center. Nevertheless, both FDA and ICH guidelines require that statistical tests for homogeneity across centers be conducted to check for a significant treatment-by-center interaction before pooling data. Therefore, when a qualitative treatment-by-center interaction is observed and found to be significant, the sponsor may be required to examine the treatment effect of each center independently instead of pooling, since pooling would be statistically invalid.

The increase in number of centers will certainly increase the chance for significant qualitative treatment-by-center interaction to occur, especially when both large and small centers exist.

The observed statistical significance may be due to the following two reasons [9]:

**Heterogeneity among centers:** Some centers exhibit relatively large variability.

**Heterogeneity across centers:** Some centers do not constitute a representative sample of the target patient population.

Accordingly, how to determine the optimal number of centers is noteworthy for investigators. However, for simplicity and convenience, centers are often grouped without statistical consideration in many multicenter trials. The method of random grouping in this paper is based on the two-stage sampling strategy in clinical trials proposed by Shao and Chow [8]. Given that the total sample size of the intended trial is $N$, we propose the rule of thumb that the number of patients in each center should be at least equal to the number of centers. Thus, if the intended clinical trial calls for $N$ patients in total, it is suggested that around $\sqrt{N}$ study centers with around $\sqrt{N}$ patients in each be selected for achieving optimal statistical properties for treatment assessment.
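The rule of thumb can be sketched in a few lines. This is a minimal illustration only; the function name `rule_of_thumb` and the rounding to the nearest integer are assumptions made here, not part of the original proposal.

```python
from math import sqrt


def rule_of_thumb(total_patients):
    """Return (number of centers, patients per center), both about sqrt(N).

    Rounding to the nearest integer is an assumption of this sketch.
    """
    root = round(sqrt(total_patients))
    return root, root


# Example with the total sample size used later in the paper (N = 288).
centers, per_center = rule_of_thumb(288)
print(centers, per_center)
```

For a trial of 288 patients this suggests roughly 17 centers of roughly 17 patients each.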

#### The Impact of Treatment Imbalance on Power

Without loss of generality, consider a multicenter clinical trial with two treatments. In practice, the sample size is often chosen for the purpose of detecting a clinically meaningful difference at a prespecified significance level with a desired power [10].

Under the assumption of normality, the sample size of a balanced trial (i.e., the same number of patients in each arm) is often calculated from the usual normal-theory formula, together with its corresponding power, where $\sigma$ is the standard deviation of the random error, $z_{\alpha}$ represents the $\alpha$th quantile of the standard normal distribution, and $\Delta$ is the difference of clinical importance.

Since, under the normal distribution, one part of the power expression is very small and negligible, the power can, for simplicity, be rewritten in a simplified form, referred to below as (1).

This expression of the power is obtained by neglecting both the center effect and the effect due to treatment-by-center interaction; hence it is the optimal power that we can possibly get.
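For orientation, the standard normal-theory expressions that match the definitions above can be sketched as follows. This is a reference sketch under stated assumptions and not necessarily the authors' exact notation: $n$ is taken to be the number of patients per arm, the test is two-sided at level $\alpha$, $1-\beta$ is the desired power, and $\Phi$ is the standard normal distribution function.

```latex
% Assumptions: n = patients per arm, two-sided level alpha, power 1 - beta.
n = \frac{2\sigma^{2}\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{\Delta^{2}},
\qquad
\text{power} = \Phi\!\left(\frac{\Delta}{\sigma\sqrt{2/n}} - z_{1-\alpha/2}\right)
             + \Phi\!\left(-\frac{\Delta}{\sigma\sqrt{2/n}} - z_{1-\alpha/2}\right).

% Dropping the second (far-tail) term gives the simplified form:
\text{power} \approx \Phi\!\left(\frac{\Delta}{\sigma\sqrt{2/n}} - z_{1-\alpha/2}\right).
```

The second term is the probability of rejecting in the wrong direction, which is why it is negligible whenever $\Delta/\sigma$ is not close to zero.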

Treatment imbalance (i.e., a different number of patients in each arm) is sometimes inevitable despite plans for treatment groups of the same size, and it may therefore cause differences among centers. Under these circumstances, the power takes a different form, referred to as (2), which is less than the power of a balanced trial. Hence, in order to achieve the optimal power shown in (1), we set (1) and (2) to be equal, which leads to a condition referred to as (3).

Clearly, even if the total sample size $N$ is fixed, the $n_i$, $i = 1, 2$, are not fixed, since we cannot predict how many patients will be in each center after the completion of the trial. Thus, from (3), increasing the total sample size $N$ is the only way to make the equation hold (i.e., to achieve the optimal power) when the variance $\sigma^2$ is assumed to be unchanged. It is noteworthy that the variance of the test statistic calculated with an equal number of patients in each center is the same as that of a single-center trial. Hence, the minimum variance of the test statistic is achieved in the balanced case.
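The loss of power under imbalance can be illustrated numerically. The sketch below assumes the usual two-sample normal approximation, power $\approx \Phi\big(\Delta / (\sigma\sqrt{1/n_1 + 1/n_2}) - z_{1-\alpha/2}\big)$, which may differ in notation from the authors' expressions. The function name `approx_power` and all numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n1, n2, delta, sigma, alpha=0.05):
    """Approximate two-sided power of a two-sample z-test.

    The negligible far-tail term is ignored, as in the simplified
    power expression discussed in the text.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    std_err = sigma * sqrt(1 / n1 + 1 / n2)
    return std_normal.cdf(delta / std_err - z_crit)


# Same total sample size (100), balanced versus imbalanced allocation.
balanced = approx_power(50, 50, delta=0.5, sigma=1.0)
imbalanced = approx_power(70, 30, delta=0.5, sigma=1.0)
print(round(balanced, 3), round(imbalanced, 3))
```

Because $1/n_1 + 1/n_2$ is smallest when $n_1 = n_2$ for a fixed total, the balanced allocation always gives the larger value in this approximation.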

#### Interaction Detection Using Center Grouping

A statistical test for homogeneity across centers is the most common way to detect a potential interaction between treatment and center; however, center grouping provides another way of doing this.

Suppose there are 5 centers: 3 small ones with 4, 4, and 2 patients, respectively, and 2 large ones with 12 and 10 patients, respectively. Two different scenarios follow:

**Scenario 1:** As Figure 1 shows, the result of one small center differs from the others, which may cause a significant interaction. One way to check the significance of the interaction is the F test, which gives Table 1; the interaction is significant since the p-value is less than 0.05. Now we combine the 3 small centers and relabel them as center 1; centers 4 and 5 remain the same and are relabeled as centers 2 and 3. The effects after center grouping are shown in Figure 2, which agrees with the conclusion drawn from the F test, since the interaction still exists after center grouping.

**Scenario 2:** In this scenario, one small center again shows the opposite trend compared with the other centers and appears to cause an interaction, as can be seen in Figure 3. The F test, however, suggests that there is no significant interaction (Table 2). Again, we combine the three small centers and repeat the analysis; Figure 4 illustrates that the interaction disappears after center grouping, which agrees with the outcome of the F test. We therefore conclude that the method of detecting interaction using center grouping is valid, and if the interaction disappears after grouping, we are able to make a general assessment of the drug across centers.
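The idea behind Scenario 2 can be sketched with made-up numbers. The center sizes follow the 4, 4, 2, 12, 10 layout above, but the outcome values are hypothetical and chosen only so that one small center shows the opposite trend. The sketch compares the direction of the per-center treatment effect before and after pooling the three small centers; it is a descriptive check and not a substitute for the F test.

```python
from statistics import mean

# Hypothetical outcomes per center: (test arm, control arm).
# Sizes follow the 4, 4, 2, 12, 10 layout in the text; values are made up.
centers = {
    1: ([5.1, 4.9], [4.0, 4.2]),
    2: ([5.3, 5.0], [4.1, 4.4]),
    3: ([3.0], [4.5]),  # small center with the opposite trend
    4: ([5.2, 5.0, 4.8, 5.1, 5.3, 4.9], [4.1, 4.3, 4.0, 4.2, 4.4, 3.9]),
    5: ([5.0, 5.2, 4.9, 5.1, 5.3], [4.2, 4.0, 4.3, 4.1, 4.4]),
}


def effect(test, control):
    """Observed treatment effect: difference of arm means."""
    return mean(test) - mean(control)


def pool(center_ids):
    """Pool both arms of the listed centers into one dummy center."""
    test = [y for c in center_ids for y in centers[c][0]]
    control = [y for c in center_ids for y in centers[c][1]]
    return test, control


before = {c: effect(*arms) for c, arms in centers.items()}
# Dummy center 1 = old centers 1-3; old centers 4 and 5 become 2 and 3.
after = {1: effect(*pool([1, 2, 3])), 2: before[4], 3: before[5]}

print({c: round(v, 2) for c, v in before.items()})
print({c: round(v, 2) for c, v in after.items()})
```

In this toy data set the sign reversal in center 3 disappears once it is pooled with the other two small centers, which is the pattern Scenario 2 describes.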

#### Center Grouping

Without loss of generality, consider the following two-way random effects model [6]:

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},$$

where $\mu$ is the general mean, $\alpha_i = A_i - \mu$ is the main effect of the $i$th level of $A$, with $A_i$ the fixed effect due to the $i$th treatment (factor A), and $\beta_j = B_j - \mu$ is the main effect of the $j$th level of $B$, with $B_j$ the random effect due to the $j$th center (factor B). We then define $\mu_{ij}$ to be the cell mean and thereby define the random effect due to interaction to be $(\alpha\beta)_{ij} = \mu_{ij} - \mu - \alpha_i - \beta_j$. The following constraints are placed on this model.

Both $\alpha_i$ and $\beta_j$ are independently and identically normally distributed with variances $\sigma^2_A$ and $\sigma^2_B$, respectively, and $\varepsilon_{ijk}$ is the random error, which is also independently and identically normally distributed with mean 0 and variance $\sigma^2$ [7]. Then, as indicated in Scheffé (1959), the sums of squares can be calculated for this model.

If centers (factor B) are combined, then it is easy to show that the error sum of squares $SS_E$ changes.

From the resulting expressions, we draw the following conclusions. First, the change in $SS_E$ is not 0. Second, it increases as the number of centers ($J$) being combined increases, and it also depends on the similarity of the centers being combined. It can therefore be deduced that the change in $SS_E$ might depend not only on $I$, $J$, $K$, but also on the ratio of the variance components. Before combining the $J$ centers, the treatment effect can be tested by an $F$ statistic which, under the alternative hypothesis, follows a non-central $F$ distribution with degrees of freedom $I - 1$ and $IJ(K - 1)$.

After grouping, we can test the treatment effect by using the new $SS_E$.

We ran simulations with 20,000 iterations to generate tables illustrating the power before and after center grouping. As discussed above, since $I$, $J$, $K$, the variance ratio, and $\delta$ might affect the test results, we varied these values and summarized the results of the power analysis in Tables 3-5. In the simulations, $\sigma^2$ is set to 1, and $\sigma^2_A$ and $\sigma^2_B$ are then set according to the ratios. In the tables, $P_1$ denotes the power before grouping and $P_2$ denotes the power after grouping, which is the power within each dummy center. The relative improvement in power is denoted by $\Delta$.

Since even combining some small centers into a dummy center can increase the power, grouping all dummy centers together can be expected to give a power that is considerably higher than before. We can therefore use the relative improvement in power as an indicator of the center grouping effect, and we aim to combine centers so as to reach the maximum $\Delta$.
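As a small illustration, the relative improvement can be computed as a percentage change in power. The definition used below, $100 \times (P_2 - P_1)/P_1$, is an assumption of this sketch; it is consistent with the improvement being reported as a percentage, but the original display formula is not reproduced here. The function name is hypothetical.

```python
def relative_improvement(power_before, power_after):
    """Percentage change in power after grouping (assumed definition)."""
    return 100.0 * (power_after - power_before) / power_before


# Illustrative values only: power rises from 0.50 to 0.60 after grouping.
print(relative_improvement(0.50, 0.60))
```

A negative value would indicate that grouping reduced the power.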

From Tables 3-5, the following conclusions can be drawn:

A. Power can increase significantly if the small centers are properly grouped.

B. The maximum percentage increase in power ($\Delta$) can be reached by an appropriate choice of the number of centers in each dummy center ($J$) if the sample size per arm ($K$) of the small centers is fixed.

C. Under some circumstances with certain simulated data, however, power may also decrease after center grouping.

#### Random Center Grouping

As mentioned above, the power of dummy centers could be higher than that of smaller centers; hence it is suggested to group these smaller centers together with the goal of increasing power. Since we would like to keep the generalizability of the multicenter trial, grouping by geographical location is not a good choice, although it is the easiest to think of. The reason is that efficacy varies significantly between population groups with different demographic factors such as culture, environment, and income; it is therefore better to group patients with different demographic factors together to remove the heterogeneity in results caused by these differences. It is recommended to form dummy centers by grouping small centers randomly, since the outcomes may be invalid with non-random center grouping [9]. Generally, the between-center variability can be evaluated from the center means; we therefore argue that achieving an unbiased estimate of this between-center variability means that the randomization is valid. Assume we would like to group $J$ small centers into dummy centers, each consisting of, say, $m$ small centers. We propose the following randomization method and believe it to be valid. First, randomly permute the indices $j = 1, \ldots, J$ into $j_1, j_2, \ldots, j_J$. Then assign the first $m$ indices in $j_1, j_2, \ldots, j_J$ to the first dummy center [9], and follow a similar procedure for the remaining dummy centers.
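The permutation-and-assignment procedure can be sketched as follows. The function name `random_center_grouping`, the `seed` argument, and the handling of a group count that does not divide evenly are assumptions of this sketch rather than part of the proposal.

```python
import random


def random_center_grouping(num_centers, group_size, seed=None):
    """Randomly permute center indices 1..J and cut them into dummy centers.

    If group_size does not divide num_centers, the last dummy center
    is smaller (a choice made for this sketch only).
    """
    rng = random.Random(seed)
    order = list(range(1, num_centers + 1))
    rng.shuffle(order)  # random permutation j_1, ..., j_J
    return [order[i:i + group_size] for i in range(0, num_centers, group_size)]


# Example: 24 small centers grouped into dummy centers of 4 centers each.
groups = random_center_grouping(24, 4, seed=2021)
for number, members in enumerate(groups, start=1):
    print(number, sorted(members))
```

Fixing the seed makes the allocation reproducible, which may be useful for documenting the grouping in an analysis plan.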

The above randomization method is simple random sampling without replacement; hence, according to [9], the mean of each center within a dummy center can be regarded as a simple random sample without replacement from a population of $J$ small centers with means $\bar{y}_{\cdot j \cdot}$, $j = 1, \ldots, J$. Thus, the sum of squares within each dummy center can be written in terms of these center means and can be used to estimate the between-center variability. Lohr (1999) [11] proved that, under simple random sampling without replacement, the sample variance is an unbiased estimator of the population variance.

Thus, the randomization procedure that we proposed for center grouping is valid.
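The unbiasedness property behind this argument can be checked by exact enumeration on a toy population. The sketch below uses hypothetical center means and averages the sample variance over every possible sample of size $m$ drawn without replacement; both variances use the "count minus one" divisor, which is the convention under which the classical result holds.

```python
from fractions import Fraction
from itertools import combinations


def variance(values):
    """Variance with the (count - 1) divisor, in exact arithmetic."""
    vals = [Fraction(v) for v in values]
    avg = sum(vals) / len(vals)
    return sum((v - avg) ** 2 for v in vals) / (len(vals) - 1)


center_means = [3, 5, 6, 8, 11, 12]  # hypothetical means of J = 6 small centers
m = 3                                # small centers per dummy center

# Every sample of size m is equally likely under sampling without replacement.
samples = list(combinations(center_means, m))
expected_s2 = sum(variance(s) for s in samples) / len(samples)
population_S2 = variance(center_means)

print(expected_s2, population_S2)
```

The two printed values coincide exactly, which is the finite-population statement that the sample variance is unbiased for the population variance.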

#### Procedure for Random Center Grouping

In this section, we introduce two methods for random center grouping based on the idea, proposed by Shao and Chow (1993) [8], of selecting an optimal number of centers for achieving optimal statistical properties for treatment assessment in multicenter trials. The proposed method under two scenarios is summarized below.

##### Scenario 1

**Step 1:** Following the rule of thumb of Shao and Chow (1993) [8], we start with the optimal selection of approximately $\sqrt{N}$ study centers with approximately $\sqrt{N}$ patients in each center if there are $N$ patients in total. Hence the first step is to calculate the value of $\sqrt{N}$ to see roughly how many patients a center should optimally have and how many centers there should be.

**Step 2:** Let $n \approx \sqrt{N}$ be the optimal number of patients per center and $K \approx \sqrt{N}$ be the optimal number of centers, such that $N = nK$. The study sites with a number of patients greater than or equal to $n$, or not much less than $n$, are retained; suppose the number of retained centers is $K_1$. If $K_{\text{old}}$ is the number of centers before grouping, the remaining $K_{\text{old}} - K_1$ centers should be randomly grouped into $K_2 = K - K_1$ dummy centers.

If $N_1$ is the total number of patients in the retained centers, then $N_2 = N - N_1$ is the total number of patients in the centers that need to be grouped. It is optimal for all dummy centers to have the same number of patients. However, the exact value of $K_2$ should be determined using tables like Tables 3-5, since we would like to maximize the power increase.
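The bookkeeping in Scenario 1 can be sketched as follows. The center sizes are hypothetical, and the `tolerance` parameter is an assumption introduced here to make "not much less than $n$" concrete; the starting value of $K_2$ would still be refined using the power tables.

```python
from math import sqrt


def split_centers(center_sizes, tolerance=0.8):
    """Split centers into retained and to-be-grouped sets (Scenario 1 sketch).

    tolerance is a hypothetical cut-off: a center is retained when its
    size is at least tolerance * n, where n is about sqrt(N).
    """
    total = sum(center_sizes)                 # N
    n_opt = round(sqrt(total))                # optimal patients per center, n
    k_opt = round(total / n_opt)              # optimal number of centers, K
    retained = [s for s in center_sizes if s >= tolerance * n_opt]
    to_group = [s for s in center_sizes if s < tolerance * n_opt]
    # Starting value K2 = K - K1, kept at 1 or more if anything needs grouping.
    k2 = max(k_opt - len(retained), 1) if to_group else 0
    return n_opt, retained, to_group, k2


# Hypothetical trial: 3 larger centers and 10 small ones, N = 64.
sizes = [12, 11, 10, 3, 2, 4, 3, 2, 3, 4, 2, 4, 4]
n_opt, retained, to_group, k2 = split_centers(sizes)
print(n_opt, len(retained), sum(to_group), k2)
```

Here $n = 8$, three centers are retained, and the remaining 31 patients would be spread over about 5 dummy centers.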

##### Scenario 2

**Step 1:** Step 1 here is the same as Step 1 of Scenario 1.

**Step 2:** Let $n \approx \sqrt{N}$ be the optimal number of patients per center and $K \approx \sqrt{N}$ be the optimal number of centers, such that $N = nK$. If some study sites have a number of patients much greater than $n$, these centers are retained; suppose the number of retained centers is $K_1$. If $K_{\text{old}}$ is the number of centers before grouping, the remaining $K_{\text{old}} - K_1$ centers should be randomly grouped into dummy centers such that the size of each study site after grouping is approximately equal to the size of the retained centers, so that the overall power can be increased as mentioned in the previous section. Similarly, the exact number of dummy centers should be determined using tables like Tables 3-5 in order to achieve a maximum increase in power.

Table 6 compares the advantages and disadvantages of these two methods.

#### An Example

Here we demonstrate the proposed method for center grouping in multicenter trials with small centers. The example is a real clinical trial testing a drug against placebo for treating patients with metastatic breast cancer; the trial is comparative, parallel-group, randomized, and double-blinded. According to the study protocol, 288 patients were to be recruited in approximately 43 centers so that the desired statistical power for evaluation could be achieved; each center should therefore have 6 to 7 patients. Although the 43 centers sped up the patient recruitment process, 7 centers had more than 10 patients each while the other 36 centers enrolled fewer than 10 patients each, which caused significant variation among centers. Consequently, it is essential for these small centers to be grouped into dummy centers of similar size, not only to address Lewis' questions [5] but also to make an unbiased and reliable assessment of the safety and efficacy of the test drug.

A total of 288 patients suggests that $\sqrt{288} \approx 17$ would be the optimal number of patients in each center. As mentioned in Section 7, the centers with more than 10 patients remain as single centers. Next, the grouping of the 36 centers with fewer than 10 patients must be considered. Since, among these 36 centers, 24 have patients in both treatment groups and 12 have patients in only one treatment group, we first consider grouping the 24 centers. Suppose each of these centers has around 2 patients per treatment group. Then, as can be seen from Table 6, if the size of the dummy center is selected to be 2, i.e., $J = 2$, the power will improve by 11.262%, i.e., $\Delta = 11.262\%$, whereas if $J = 3$ or $J = 4$ is selected, the value of $\Delta$ would be 8.005% or 12.355%, respectively. As we aim to maximize the improvement in power, the size of the dummy center should be chosen to be 4. It is therefore suggested to group these 24 centers into 6 dummy centers randomly using the randomization method described in Section 6, so that each dummy center contains 4 centers selected at random from the 24 small centers. The total number of centers will then be 13, and the patients from the 12 centers with patients in only one treatment group are randomly assigned to these 13 centers. This example is summarized in Table 7.
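The selection step in this example amounts to picking the dummy-center size with the largest tabulated improvement. The sketch below uses only the three $\Delta$ values quoted above; the variable names are illustrative.

```python
# Relative power improvement (percent) quoted in the example for each J.
improvement_by_J = {2: 11.262, 3: 8.005, 4: 12.355}

# Choose the number of small centers per dummy center that maximizes Delta.
best_J = max(improvement_by_J, key=improvement_by_J.get)

small_centers = 24      # small centers with patients in both treatment groups
retained_centers = 7    # centers with more than 10 patients
dummy_centers = small_centers // best_J
total_centers = retained_centers + dummy_centers

print(best_J, dummy_centers, total_centers)
```

This reproduces the choice of 4 centers per dummy center, 6 dummy centers, and 13 centers in total.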

#### Concluding Remarks

As discussed above, an observed significant treatment-by-center interaction may be due to chance alone or to the existence of a large number of small centers. We therefore introduce random grouping as a way to check whether the observed treatment-by-center interaction is a false positive. In addition, random grouping can remove heterogeneity within/across centers and thereby achieve better statistical properties (e.g., power) for statistical inference. The two scenarios discussed in Section 7 provide the optimal ways of grouping under two different circumstances, and the choice of method should be based on the actual situation of the centers after the clinical trial. The proposed methods have certain limitations. Although we aim to increase the power by combining small centers, the simulations show that center grouping can also decrease the power under some circumstances. Also, the simulations assume the ideal case of the same number of patients in each treatment group per center before and after grouping. This is usually not the case in practice; hence, in some real cases, the combined centers may still have fewer patients than the optimal number per center, but the power after combining centers is believed to increase in most cases.

#### References

1. Chow SC, Liu JP (1998) Design and Analysis of Clinical Trials. (3rd edn) John Wiley & Sons, New York, pp: 892.
2. FDA (1988) Guideline for the Format and Content of the Clinical and Statistical Sections of New Drug Applications. US Food and Drug Administration, Rockville, Maryland, USA.
3. ICH (1995) Guideline for Structure and Content of Clinical Study Report. International Conference on Harmonization, Yokohama, Japan.
4. Bayman EÖ, Chaloner K, Cowles MK (2010) Detecting qualitative interaction: A Bayesian approach. Statistics in Medicine 29(4): 455-463.
5. Lewis JA (1995) Statistical issues in regulation of medicine. Statistics in Medicine 14(2): 127-136.
6. Nandy S (2018) Chapter 19: Two-way ANOVA [lecture notes]. Applied Regression Analysis STAT 512.
7. Scheffé H (1959) The Analysis of Variance. Wiley, New York.
8. Shao J, Chow SC (1993) Two-stage sampling with pharmaceutical applications. Statistics in Medicine 12(21): 1999-2008.
9. Chow SC (2011) Controversial Statistical Issues in Clinical Trials. Chapman and Hall/CRC Press, Taylor & Francis, New York.
10. Chow SC, Shao J, Wang H, Lokhnygina Y (2018) Sample Size Calculations in Clinical Research. 3rd Edition, Chapman and Hall/CRC Press, Taylor & Francis, New York, 107(31-32): 552-556.
11. Lohr SL (1999) Sampling Design and Analysis. Duxbury Press, Pacific Grove, CA 93950.
12. Pan G, Wolfe DA (1997) Test for qualitative interaction of clinical significance. Statistics in Medicine 16(14): 1645-1652.