On Statistical Bias Found in Horse Racing Data (1)
The purpose of the present paper is to report the types of statistical bias the author has found in horse racing data (based on [3]). To explain these types of bias, consider a race with m runners. We denote by {a, b, c} (1 ≦ a < b < c ≦ m) the set of numbers of the horses that finish first, second and third. Since each runner's number is assigned by lot, we are led to the following null hypothesis: H0: The set {a, b, c} is nothing but the result of randomly sampling three numbers, without replacement, from {1, 2, …, m}. By studying the probability distributions of various random variables arising from the random sampling in H0, we can use the chi-square test to examine how the frequency distributions observed in the data deviate from those expected under H0. Our method of condensing the original data consists of studying two random variables, R = c - a (the range) and D = min{b - a, c - b} (the smaller of the two adjacent gaps among the three numbers), together with the following pair of partitions of the whole sample space: A0 = {2b = a + c}, A1 = {2b < a + c} and A2 = {2b > a + c}; B0 = {a + b = c}, B1 = {a + b < c} and B2 = {a + b > c}. In the present paper (1), we take up three racetracks, Chukyo, Hanshin and Kyoto, and examine all races with m = 16 (and also m = 14) run at these racetracks. Specifically, we summarize the original data in two kinds of contingency tables: one corresponding to the joint probability distribution of (R, D), and the other to the 3 × 3 probability table of the product events Ai ∩ Bj (i, j = 0, 1, 2). Performing chi-square tests on these contingency tables, we detect several types of statistical bias at each racetrack. Furthermore, the type of bias detected turns out to depend on the racetrack, which suggests that the individual character of a racetrack can be extracted from long-term race records [3].
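To make the null hypothesis H0 concrete, here is a minimal Python sketch (not the author's code) that enumerates all C(m, 3) equally likely sets {a, b, c}, computes the exact null distributions of (R, D) and of the 3 × 3 table of cells Ai ∩ Bj, and evaluates Pearson's chi-square goodness-of-fit statistic for a table of observed counts. The helper names (`null_distribution`, `rd_key`, `ab_key`, `chi_square`) are hypothetical, and the demo uses races simulated under H0, not the racetrack data of [3]. The sketch assumes H0 is fully specified, so the degrees of freedom are the number of cells minus one; in practice, cells with small expected counts would usually be pooled before testing, a step omitted here.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
import random


def triples(m):
    """All sets {a, b, c} with 1 <= a < b < c <= m; each is equally likely under H0."""
    return combinations(range(1, m + 1), 3)


def rd_key(a, b, c):
    """Range R = c - a and smaller adjacent gap D = min(b - a, c - b)."""
    return (c - a, min(b - a, c - b))


def _cmp_index(lhs, rhs):
    """0 if lhs == rhs, 1 if lhs < rhs, 2 if lhs > rhs (matches the A_i / B_j indexing)."""
    if lhs == rhs:
        return 0
    return 1 if lhs < rhs else 2


def ab_key(a, b, c):
    """Cell (i, j) of the product event A_i ∩ B_j."""
    return (_cmp_index(2 * b, a + c), _cmp_index(a + b, c))


def null_distribution(m, key):
    """Exact distribution of key(a, b, c) under H0, as exact fractions."""
    counts = Counter(key(a, b, c) for a, b, c in triples(m))
    total = sum(counts.values())  # equals C(m, 3)
    return {cell: Fraction(n, total) for cell, n in counts.items()}


def chi_square(observed, expected_probs):
    """Pearson goodness-of-fit statistic and degrees of freedom.

    observed: Counter mapping cell -> observed count.
    expected_probs: dict mapping cell -> probability under H0.
    Assumes H0 is fully specified, so df = (number of cells) - 1.
    No pooling of sparse cells is done here.
    """
    n = sum(observed.values())
    stat = 0.0
    for cell, p in expected_probs.items():
        e = n * p  # expected count under H0
        stat += float((observed.get(cell, 0) - e) ** 2 / e)
    return stat, len(expected_probs) - 1


if __name__ == "__main__":
    m = 16
    rng = random.Random(0)
    # Simulated finishes drawn under H0 (hypothetical data, NOT real race results).
    races = [tuple(sorted(rng.sample(range(1, m + 1), 3))) for _ in range(1000)]
    for name, key in [("(R, D)", rd_key), ("A_i ∩ B_j", ab_key)]:
        probs = null_distribution(m, key)
        observed = Counter(key(*race) for race in races)
        stat, df = chi_square(observed, probs)
        print(f"{name}: chi2 = {stat:.2f}, df = {df}")
```

Exact enumeration is cheap here: for m = 16 there are only C(16, 3) = 560 triples, so the expected probabilities can be kept as exact fractions. For instance, A0 (the three numbers form an arithmetic progression) has probability 56/560 = 1/10, and A1 and A2 each have probability 9/20, since the reflection k ↦ m + 1 - k maps A1 onto A2.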
