Volume 20 - Issue 4

Research Article | Biomedical Science and Research | Creative Commons CC-BY

Biomathematical Analysis on the Evaluation Scheme for Large Scale Innovation Competition

*Corresponding author: Bin Zhao, School of Science, Hubei University of Technology, Wuhan, Hubei, China.

Received: November 14, 2023; Published: November 23, 2023

DOI: 10.34297/AJBSR.2023.20.002736

Abstract

Large-scale innovation contests are becoming increasingly popular globally, with many innovators and enterprises actively participating. Work allocation optimization and fair judging are two significant concerns in innovation contests. This academic paper aims to recommend an optimized “cross-assignment” program by leveraging the optimal objective function method to improve the comparability of scores given by various judges. We conducted data-based descriptive statistical analysis and concluded that the two-stage and weighted evaluation schemes are more beneficial than the traditional judging scheme. Nonetheless, there are still some shortcomings that require addressing. To enhance fairness, we propose an improved two-stage evaluation scheme. In the first stage, we normalize the scores with a normal distribution. In the second stage of the process, we implement a system utilizing the Borda sorting technique to categorize submissions into five distinct groups for judges to evaluate based on their perceptions. We also detail a method for weighting tied scores to determine the final rankings. Testing indicates that this approach yields a Normalized Discounted Cumulative Gain (NDCG) of 0.8667, implying greater fairness and precision in the assessment of submissions.

Keywords: Evaluation program, Normalization of a normal distribution, Borda sorting method, Grey correlation analysis, Analysis of variance

Introduction

Large-scale innovation-based competitions are an effective means of fostering science, technology, innovation, and entrepreneurship. They attract innovators from different fields to bring innovative solutions to society. The judging program largely determines the success of a competition, and its fairness and transparency are essential to attracting talented participants. Therefore, to ensure the sustainability of large-scale innovation competitions, it is necessary to research and improve the effectiveness of the judging program. These competitions lack a standardized judging mechanism: a two-stage (online and on-site judging) or three-stage (online judging, on-site judging, and defense judging) process is usually used. The critical aspect of this type of competition is innovation, the ability to perceive what others do not yet see. Evaluation of the same work by different experts can lead to divergent opinions, while innovation leads to novel solutions to problems. It is, therefore, essential to develop an unbiased, impartial, and systematic selection scheme for innovation competitions to ensure credibility and recognition.

The design and improvement of evaluation programs for large-scale innovation competitions have long interested scholars and experts across disciplines, driven by recent rapid advances in science and technology. In their 2006 co-authored publication, Henry Chesbrough, et al. introduced the concept of open innovation [1], discussed its implications for competitions, provided a framework for evaluating large-scale innovation programs, and explored the impact of open innovation on competitions. Innovation is the result of discovery, and societal progress results from innovation. The evaluation of innovative ideas has generated much debate, and there are ongoing efforts to establish unbiased evaluation methods, including the framework developed by Poetz and Schreier in 2012. Their model emphasizes the importance of involving regular users to achieve diverse innovations [2] while considering feasibility and social impact.

You Qinggen [3] developed a user-friendly evaluation index system for experts, which provides a theoretical contribution to improving current evaluation indexes. In addition, Changchao and Minglong [4] investigated the feasibility of evaluation using AHP hierarchical analysis, grey cluster analysis, and other algorithms in the "Challenge Cup" start-up project. For large innovation competitions, such as joint national and provincial competitions, existing programs are typically based on those developed for smaller competitions; they are impractical at scale and tend to produce inconsistent results in large judging rounds, leading participants to question the outcomes.

This paper, therefore, focuses on the following. For innovation competitions of significant scale, we offer a thorough evaluation process. Our approach encompasses suggesting an optimal "cross-distribution" format for the blind judging phase, refining the calculation of the standard scores to augment the impartiality of the verdicts, and producing an appropriate evaluative framework for resolving contentious submissions. Our research has both practical and theoretical implications. Theoretically, these models improve the current evaluation criteria and suggest a methodology and confidence level for evaluators assessing the projects submitted to the competition. We have advanced the relevant theoretical research by identifying, raising, analyzing, and resolving issues and integrating them into a coherent theoretical framework grounded in the practice of innovation competitions.

Data

The data utilized in this paper was provided by a large-scale innovation-based competition. The dataset was divided into three copies, each undergoing two stages of judging. In the first stage, five experts assessed each entry and generated raw and standard scores. In the second stage, another panel of three experts reviewed the entries, generating raw and standard scores together with a concordance score.

In the initial phase, the mean scores of the five experts were calculated, and the works positioned within the top 16% of all teams were admitted to the subsequent evaluation stage. After the standard scores were reassessed, with appropriate adjustments made for a few works showing significant differences, the standard scores of the five first-stage experts and the three second-stage experts were averaged into four scores; the resulting final total scores were then ranked to establish the ordering of the works. The dataset comprises 3,000 teams and 125 experts. Each work was randomly assigned to five experts in the first stage and to three experts in the second stage. The experts worked independently, without interacting with one another, throughout the process.
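As a minimal sketch of this first-stage aggregation (a Python stand-in, since the original computations were done in Matlab), the averaging of the five expert scores per work and the top-16% cut-off could look like the following; the function name and the random stand-in data are illustrative assumptions, not the competition data.

```python
import numpy as np

def first_stage_shortlist(raw_scores, advance_fraction=0.16):
    """Average each team's five first-stage expert scores and return the
    indices of the top `advance_fraction` of teams, best first."""
    team_means = raw_scores.mean(axis=1)                 # mean of the five experts per team
    order = np.argsort(-team_means)                      # highest mean first
    n_advance = int(np.ceil(advance_fraction * len(team_means)))
    return order[:n_advance]

# Illustrative run with random data standing in for the 3,000-team dataset.
rng = np.random.default_rng(0)
scores = rng.normal(loc=75, scale=10, size=(3000, 5))   # 3,000 teams x 5 experts
shortlist = first_stage_shortlist(scores)
print(len(shortlist), "teams advance to the second stage")
```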

Experimental Design for Controlling Subjective Factors

With judges applying their own independent criteria, innovation competitions suffer from inconsistent scoring: lenient judges may award higher scores, while stricter judges award lower ones [15]. The lack of clearly defined, exam-like criteria contributes to this subjectivity. Prior work found that subjective ratings are influenced mainly by emotional factors tied to the subject and object of the assessment and by the methods and mechanisms used in the evaluation, and that an expert's judging ability is not reflected in the actual rating [16]. To improve the evaluation process, a more impartial method should be used. We conducted a hypothesis test, and the results confirmed our assumption. The first stage of the scoring process standardizes each judge's scores using the normal distribution. This removes the influence of the judges' subjective opinions, ensuring fairer and more objective scores [17]. The independent criteria of each judge therefore no longer affect the scores, which instead follow the distributional characteristics of the standard normal distribution, enhancing consistency and fairness. In addition, a new ranking approach based on the Borda sorting method is applied to rank the entries in the second round of judging [18].
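The text does not spell out the hypothesis test used; as a hedged sketch consistent with the analysis-of-variance keyword, one way to check whether judges score systematically differently is a one-way ANOVA across judges, as below. The judge data here is invented for illustration and this is not the authors' actual test.

```python
import numpy as np
from scipy import stats

# Illustrative check: do mean raw scores differ systematically across judges?
rng = np.random.default_rng(1)
judge_a = rng.normal(78, 8, size=120)   # a relatively lenient judge
judge_b = rng.normal(70, 8, size=120)   # a relatively strict judge
judge_c = rng.normal(74, 8, size=120)

f_stat, p_value = stats.f_oneway(judge_a, judge_b, judge_c)
if p_value < 0.05:
    print(f"Judges differ systematically (F={f_stat:.2f}, p={p_value:.4f})")
else:
    print("No evidence of systematic differences between judges")
```

A small p-value would support standardizing each judge's scores before combining them.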

Establishment of Nonlinear Programming Model

A comprehensive two-stage integration method has been proposed [19]. In the first stage of the evaluation, the scoring results of each expert are modeled with a normal distribution and then normalized, ensuring that every expert adheres to the same scoring criteria [20]. In the second stage, we adopt a new categorization approach based on the Borda method [21,22]. The main objective is to categorize the works submitted for the second evaluation into five grades, A through E, worth 5, 4, 3, 2, and 1 points respectively. Each work is evaluated by three experts, who assign a grade based on the quality of its content; the points corresponding to the three grades are then summed to give the work's final score.

The maximum total is therefore 15 points and the minimum is 3. Innovation competitions, such as modeling contests, generally announce first, second, and third prizes without a full ranking, so the works must be sorted and segmented. When works at a segmentation boundary obtain the same total, a second weighting is applied [23,24]: weights for the three expert judges are determined first, each judge's score is multiplied by the corresponding weight, and the weighted scores are summed; the resulting totals differ, and the higher weighted score receives the higher rank, which gives a reasonable and fair ordering (a small scoring sketch follows the limitations listed below). This segmentation makes it easier to identify the winning entries in creative competitions and ensures that the judging aligns with the predetermined award standards [25]. For better modeling in the second stage of judging, we make the following assumptions and limitations:

Limitations

Each work is scored by three experts and categorized into five grades: A, B, C, D, and E.

The scores of the works are calculated based on the scores corresponding to the grades.

The total score of the work is the sum of the scores rated by the three experts.

The total score of the work lies within the range [T_min, T_max].

The works are ranked according to the total score, with the highest score being the first place.

The entries are segmented based on their total scores to determine the first, second, and third-place entries.
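As referenced above, the grade-to-point conversion, total score, and weighted tie-breaking can be sketched as follows; this is a minimal illustration under the stated limitations, and the expert weights and example grades are assumptions rather than values from the competition.

```python
import numpy as np

GRADE_POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}

def total_scores(grades):
    """grades: one (g1, g2, g3) tuple of letter grades per work.
    Returns each work's summed point total (always in the range 3..15)."""
    return np.array([sum(GRADE_POINTS[g] for g in row) for row in grades])

def weighted_scores(grades, expert_weights):
    """Second weighting, used only to separate works with equal totals."""
    pts = np.array([[GRADE_POINTS[g] for g in row] for row in grades], dtype=float)
    return pts @ np.asarray(expert_weights, dtype=float)

grades = [("A", "B", "A"), ("B", "A", "A"), ("C", "C", "B")]
totals = total_scores(grades)                           # [14, 14, 10]
weighted = weighted_scores(grades, [0.4, 0.35, 0.25])   # separates the two 14-point works
# rank primarily by total score, breaking ties with the weighted score
order = np.lexsort((-weighted, -totals))
print(totals, weighted.round(2), list(order))
```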

The specific flowchart of the two-stage judging program is shown in Figure 7.


Figure 7: Model building flowchart.

In this flowchart:

First, let $f(x)$ denote the probability density at a given score $x$, where $\mu$ is the mean, $\sigma$ is the standard deviation, $\pi$ is the circular constant, and $\exp$ is the natural exponential function. Each expert's scores are modeled with the normal density and normalized as follows [26]:

$$f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

Let $Z$ represent the transformed random variable, $X$ the original normally distributed random variable, $\mu$ the mean of the original variable, and $\sigma$ its standard deviation. The standardization formula for the normal distribution is:

$$Z=\frac{X-\mu}{\sigma}$$
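A minimal sketch of this per-judge standardization (a Python stand-in for the authors' Matlab implementation; the score matrix and judge parameters are illustrative):

```python
import numpy as np

def standardize_per_judge(raw):
    """Apply Z = (X - mu) / sigma column by column, so that every judge's
    scores end up with mean 0 and standard deviation 1.

    raw : array of shape (n_works, n_judges).
    """
    mu = raw.mean(axis=0, keepdims=True)
    sigma = raw.std(axis=0, keepdims=True)
    return (raw - mu) / sigma

rng = np.random.default_rng(2)
# three judges with different scoring habits (different means and spreads)
raw = rng.normal(loc=[72, 80, 65], scale=[6, 12, 9], size=(200, 3))
z = standardize_per_judge(raw)
print(z.mean(axis=0).round(3), z.std(axis=0).round(3))   # ~0 and ~1 per judge
```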

Finally, the Borda ranking method is divided into the following five main steps:

Step 1: Calculate the degree of membership, where $r_{ij}$ denotes the result given by the $i$th review expert under the $j$th evaluation method, obtained by a simple normalization:

Step 2: Calculate the fuzzy frequency number:

Step 3: Transform the ranking into a score:

Step 4: Calculate the Borda number:

where a larger value of $B_i$ corresponds to a higher ranking.
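The formulas for these steps could not be recovered from the source, so the following is only a hedged sketch of one standard way to obtain Borda numbers from the three experts' scores (each expert's scores converted to rank points, then summed); it is not necessarily the authors' exact formulation.

```python
import numpy as np
from scipy.stats import rankdata

def borda_numbers(score_matrix):
    """score_matrix : shape (n_works, n_experts), higher score = better work.

    Within each expert's column, scores are converted to rank points
    (best work gets n_works - 1 points, worst gets 0, ties share the average),
    and the Borda number B_i of work i is the sum of its points over all experts.
    """
    points = rankdata(score_matrix, method="average", axis=0) - 1
    return points.sum(axis=1)

scores = np.array([[14, 13, 15],
                   [12, 15, 12],
                   [10,  9, 11]])
B = borda_numbers(scores)
order = np.argsort(-B)        # a larger B_i means a higher ranking
print(B, list(order))
```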

Results and Discussion

To ensure the fairness, impartiality, and scientific validity of the judging process in the innovation category of the competition, we investigate whether the subjective evaluations of the experts affect the judging results. Accordingly, we designed a two-stage scoring system consisting of normal-distribution standardization and a Borda-ranking-based sorting method. This approach allows us to optimize the judging process and reduce the potential influence of human factors on the final decision [27]. We implemented the two-stage scoring scheme in Matlab using the competition scoring data. In the first stage, we normalized the scores using the normal distribution: we fitted the normal distribution of each judge's scores and obtained the normalized result. The figures below show the normal distribution of selected judges' scores and the resulting normalization.

Four judges (P005, P022, P127, and P230) were randomly selected from the pool of 125 experts. As can be seen in Figure 8, their scores met the criteria for a normal distribution [28]. This procedure aimed to establish consistency in the scoring criteria of each expert. Figure 8 shows the normalization of the experts' scores in the first review, achieved by standardizing the distribution, and Figure 9 shows the results of the first review, which involved developing ratings and league tables. By transforming each expert's scores to a standard normal distribution, we removed the bias introduced by differing scoring criteria and ensured fairer scoring results.


Figure 8: Normal Distribution of Ratings of Selected Experts in the First Review.


Figure 9: Normalization of the Normal Distribution of Selected Expert Ratings in the First Review.

As shown in Figure 10, we obtained the scores and rankings from the first round of evaluations. Scoring bias can arise when different judges apply different scoring criteria, producing unfair results; by normalizing the scores with the normal distribution, we transformed them into a standard normal distribution, eliminating this bias and ensuring a fairer outcome [29].


Figure 10: Ranking table of scores after normalization in the first review.

In the second stage of the evaluation, we implemented an improved approach based on the Borda ranking method, as illustrated in Figure 11, and obtained the results of the second round of evaluations. Since the competition adopted a two-stage evaluation model, this represents the final outcome of the review, and the ranking and position of each team can be observed. To verify the validity of our sorting method and model, the Normalized Discounted Cumulative Gain (NDCG) is used. NDCG ranges from 0 to 1, where 1 indicates the best possible ordering and 0 the worst; higher values indicate better ranking quality [30]. In general, the following guidelines can be used to evaluate NDCG values:


Figure 11: Ranking of scores in the second review.

When 1>NDCG ≥ 0.8, it is an excellent sorting result;

When 0.8>NDCG≥0.6, it is a good sorting result;

When 0.6>NDCG≥0.4, it is a fair sorting result;

When 0.4>NDCG≥0, it is a poor sorting result.

Using Matlab, we calculated the NDCG of the reordering model for the second judging stage as 0.8667 > 0.8, which means that our proposed ranking model achieves an excellent ordering result in the judging process.
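For reference, NDCG can be computed as in the minimal sketch below (a Python stand-in for the Matlab calculation; the relevance labels are illustrative, not the competition's):

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance labels."""
    rel = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(2, rel.size + 2))      # log2(rank + 1)
    return np.sum((2.0 ** rel - 1.0) / discounts)

def ndcg(relevances_in_predicted_order):
    """NDCG = DCG of the predicted ordering / DCG of the ideal ordering."""
    ideal = np.sort(relevances_in_predicted_order)[::-1]
    denom = dcg(ideal)
    return dcg(relevances_in_predicted_order) / denom if denom > 0 else 0.0

# relevance of each work, listed in the order produced by the ranking model
print(round(ndcg([3, 2, 3, 1, 2, 0, 1]), 4))
```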

This paper proposes a more streamlined judging technique that divides the competition judging into two phases and applies a different method in each. First, we applied a standardization technique based on the normal distribution to each judge's scores. This ensured that the judges shared identical scoring criteria, thereby increasing the impartiality of the scoring. The first judging stage was crucial, enabling many outstanding teams to be selected. The second stage introduced a new judging method based on the Borda method: the entries were divided into five levels, with the judges assigning a grade to each entry based on their judgment. This keeps the evaluation process fair and objective. The final ranking of each entry was determined by the total number of points it received, and when multiple entries received the same score, a weighting process increased the accuracy and fairness of the final ranking. In addition, our methodology shows solid practical performance. It has been validated by NDCG scores, which confirm that the judging scheme controls subjective factors and produces quality ranking results, ensuring a fair and accurate judging process and a reasonable ranking of competition entries.

Future Work and Improvements

The judging scheme proposed in this study for large-scale innovation contests can be applied not only to innovation contests but also to various other large-scale assessments and evaluations, such as art examinations, multi-level exams, and elections of public officials, while achieving a more equitable level of accuracy. Its benefits are the enhancement of impartiality, consistency, and objectivity throughout the assessment process. Implementing and designing large-scale innovation competitions is a complex process influenced by several interacting factors. This paper focuses on controlling the subjective factors that influence the scoring of entries and only provides solutions for controversial entries. To ensure the quality and longevity of such competitions, further research into additional features for factor analysis is necessary, and a continued focus on improving the judging scheme is imperative.

Conflict of Interest

We have no conflict of interest to disclose, and the manuscript has been read and approved by all named authors.

Acknowledgments

This work was supported by the Philosophical and Social Sciences Research Project of the Hubei Education Department (19Y049) and the Starting Research Foundation for the Ph.D. of Hubei University of Technology (BSQD2019054), Hubei Province, China.

References

  • Chesbrough HW, Vanhaverbeke W, West J (2006) Open Innovation: Researching a New Paradigm. Oxford University Press, Oxford.
  • Poetz MK, Schreier M (2012) The Value of Crowdsourcing: Can Users Really Compete with Professionals in Generating New Product Ideas? Journal of Product Innovation Management 29(2): 245-256.
  • You Qinggen (2014) Research and Implementation of Expert Selection Model for WISCO Research Project Review. Huazhong University of Science and Technology 32(7): 56-60.
  • Kim Y, Kim Y, Kim J, Lee S, Kwon S (2009) Boosting on the functional ANOVA decomposition. Statistics & Its Interface 2(3): 361-368.
  • Kannemann K (2010) The Exact Evaluation of 2-way Cross-classifications: An Algorithmic Solution. Biometrical Journal 24(2): 157-169.
  • Yonghao S, Xiao S, Bin H (2018) Team Building Algorithm Based on Multi-Objective Greedy Strategy for Gain Maximization. High Technology Communications 28(4): 279-290.
  • Mielikäinen T, Ukkonen E (2006) The Complexity of Maximum Matroid-Greedoid Intersection and Weighted Greedoid Maximization. Discrete Applied Mathematics 154(4): 684-691.
  • Jia Tianli (2005) Hypothesis Analysis Model of the Causes of Mean Differences. Mathematics Practice and Understanding 64(9): 212-215.
  • Daoji S (1988) An analysis of variance test for the extreme value distribution. Tianjin Daxue Xuebao 1988(2): 116-121.
  • Shapiro SS, Wilk MB (1975) An analysis of variance test for normality (complete samples). Biometrika 67(3): 215-216.
  • Guohong S, Zhenhui Z (1999) Application of Gray Correlation Analysis in Fault Tree Diagnosis. China Safety Science Journal 30(2): 1505-1507.
  • Liu G, Yu J (2007) Gray correlation analysis and prediction models of living refuse generation in Shanghai city. Waste Management 27(3): 345-351.
  • Victor G, Juan DJTM (2022) The Eurovision Song Contest: voting rules, biases, and rationality. Journal of Cultural Economics 47(2): 247-277.
  • Looney MA (2004) Evaluating Judge Performance in Sport. Journal of Applied Measurement 5(1): 31-47.
  • Cliff MA, King MC (1996) A proposed approach for evaluating expert wine judge performance using descriptive statistics. Journal of Wine Research 7(2): 83-90.
  • Yonglin W (2017) Subjective Indicators in Educational Evaluation and Factors Affecting Their Judgment. Education Science 33(3): 14-19.
  • Kai X (2017) Exploration and Research on Retrospective Evaluation Selection Method Based on Peer Review Experts. Beijing University of Chinese Medicine 7(4): 47-50.
  • Ryo I, Kazumasa O (2022) Borda Count Method for Fiscal Policy: A Political Economic Analysis. The Institute of Comparative Economic Studies, Hosei University 36(7): 25-40.
  • Chen Yuan (2011) Research on Decision-Making Methods for Scientific Fund Project Review and Selection. Northeastern University 9(5): 23-26.
  • Lee SM, Kim KH, Kim EJ (2012) Selection and Classification of Bacterial Strains Using Standardization and Cluster Analysis. Journal of Animal Science and Technology 32(6): 54-56.
  • Rui Z (2015) Ranking Risk with Borda Method. Enterprise Reform and Management 32(11): 154-156.
  • Orouskhani M, Shi D, Cheng X (2021) A Fuzzy Adaptive Dynamic NSGA-II With Fuzzy-Based Borda Ranking Method and its Application to Multimedia Data Analysis. IEEE Transactions on Fuzzy Systems 29(8): 118-128.
  • Huanling T, Jingdong W, Yuchang L (2004) A New Weight Allocation Strategy to Reduce Similar Topic Classification Errors. Computer Engineering and Applications 58(13): 185-188.
  • Zeyan W, Hongfang G, Xiaoxin Y, Shenru Z (2003) A Linear Combination Weighting Method Based on Entropy. Systems Engineering Theory and Practice 69(3): 112-116.
  • Kilgour, Marc D, Grégoire (2021) Correction to: Weighted scoring elections: is Borda best? Social Choice and Welfare 58(2): 1-2.
  • Pierrat L, Samuel K (2013) Standardization of the Logistic Distribution based on Entropy. International Journal of Performability Engineering 9(3): 352-354.
  • Jian W, Qianqian M, Hu-Chen L (2021) A meta-evaluation model on science and technology project review experts using IVIF-BWM and MULTIMOORA. Expert Systems with Applications 36(3): 68-70.
  • Nianri K, Chaomo Z, Yingwei W (2010) Application of Normal Distribution Method in Well Logging Curve Standardization in M Oilfield. Journal of Yangtze University (Natural Science Edition) 7(4): 76-78.
  • Aroian LA (1996) Handbook of the Normal Distribution. Technometrics 25(1): 112-115.
  • Taylor M (2008) SoftRank: Optimising Non-Smooth Rank Metrics. In: Proceedings of the International Conference on Web Search and Data Mining (WSDM): 77-86.
