Investigation of the Effects of Simple Outlier Replacement Protocols in Forecasting Analyses: Are they Robust-in-Utility?
Article History
Received: 30.12.2020, Revision: 10.01.2021, Accepted: 20.01.2021, Published: 30.01.2021
Author Details
Frank Heilig1 and Edward J. Lusk2
Authors Affiliations
1Strategic Risk Management, Volkswagen Leasing GmbH, Braunschweig, Germany
2Emeritus: The Wharton School, [Dept. Statistics], The University of Pennsylvania, USA & School of Business and Economics, SUNY: Plattsburgh, USA & Chair: International School of Management: Otto-von-Guericke, Magdeburg, Germany
Abstract: Outliers caused by errors are troublesome in any forecasting analysis. Sometimes errors can be corrected; errors that relate to aggregated or downloaded data often can be identified but sometimes cannot be corrected. In the latter case, the analyst employs outlier screens; if the screen signals a likely outlier, then common practice in the forecasting domain is to replace the suspicious Panel datapoint. A standard and very simple replacement protocol is to replace the outlier with the Average of the Nearest Neighbor Panel-points. This is the ANN-replacement protocol. Previous research indicates that more than 50% of the time the ANN-protocol is likely to be Smoothing re: the relative OLS Regression standard error. There is a predilection to rationalize the use of simple replacement protocols, such as the ANN, by assuming that they are “Robust-in-Utility” in the sense that if they are used when not needed they are usually neutral in their effect; thus, they can only be useful. To test this “rationalization”, we (i) collected accounting information to be forecasted from firms on the Bloomberg™ terminals for Income Statement- and Balance Sheet-sensitive variables, and (ii) formed four informative forecasting effect-variables to measure the impact of the ANN-modifications. Using an impact ±2.5% Comfort-Zone, we find that the ANN-protocol had a dramatic effect on relative precision, contrary to the often-asserted Robustness.
Keywords: Vetting, Smoothing, Provoking, Transformations, Precision, Confidence Intervals.
1. Introduction
1.1 Context In a recent research report, (Heilig F et al., 2020) [H&L] investigated various aspects of the effect of the ANN outlier Panel-replacement protocol with respect to its effect on the 95% Confidence Intervals of the Excel OLS two-parameter [Intercept & Slope] linear forecasting equation [OLSR]. The ANN-protocol is an acronym for the Average of the Nearest Neighbor Points. Specifically, if the Panel of data under examination is:
Panel: {y[1], y[2], …, y[j], …, y[n]}
Where: y[j] represents an error of such a magnitude that it is identified as an outlier using a standard outlier-screening protocol.
In this case, the ANN-replacement is: y[j] ← (y[j−1] + y[j+1]) / 2
H&L selected a Panel of 12 Time Series [TS] observations from 22 firms randomly selected from the Bloomberg Markets platform [See the Appendix] and examined three ANN-replacements:
ANN: Replacement[Both[Early & Late]]: y[2] ← (y[1] + y[3]) / 2 & y[n−1] ← (y[n−2] + y[n]) / 2,
ANN: Replacement[Late]: The Next-to-Last Panel Point: y[n−1] ← (y[n−2] + y[n]) / 2, and
ANN: Replacement[Early]: The Second Panel Point: y[2] ← (y[1] + y[3]) / 2.
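In code, the three ANN-replacement protocols reduce to a few lines. The sketch below is a minimal illustration using 0-based list indices; the helper name `ann_replace` is ours, not from H&L:

```python
def ann_replace(panel, mode):
    """Return a copy of the Panel with the ANN (Average of the
    Nearest Neighbor points) replacement applied.
    mode: 'Early' -> second Panel point, 'Late' -> next-to-last
    Panel point, 'Both' -> both replacements."""
    y = list(panel)
    n = len(y)
    if mode in ("Early", "Both"):
        y[1] = (y[0] + y[2]) / 2.0            # y[2] <- (y[1] + y[3]) / 2
    if mode in ("Late", "Both"):
        y[n - 2] = (y[n - 3] + y[n - 1]) / 2.0  # y[n-1] <- (y[n-2] + y[n]) / 2
    return y
```

Note that the replacements are made on the original values, so the order of the two assignments in the `Both` case does not matter for Panels with n ≥ 5.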
Their effect-variable of interest was the ratio:
Ratio: Precision[ANN] / Precision[BP] = σ̂[ANN] / σ̂[BP]
Where: Precision represents the precision of the 95% OLSR Confidence Interval, and σ̂ is the root of the Mean Square Error [MSE] of the OLSR-fit for the Panel under analysis.
If the ratio is < 1.0, the effect of the ANN-modification is labeled as Smoothing, as the OLSR MSE-variation created by the ANN-modification decreased relative to the OLSR MSE-variation of the BASIC, unmodified, series. Thus, Smoothing produces a CI with smaller precision than was the case for the Basic Panel. Provoking is the designation for ratios > 1.0, and No Effect is the label if the ratio = 1.0. H&L report that, of the 488 tests, 314 were Smoothing, or 64.3% [314/488].
Further, they created a Screen, called the Comfort-Zone [CZ], of ±2.5% for reclassifying the Smoothing or Provoking ratio-effects: [97.5% ; 102.5%] or [0.975 ; 1.025]. Specifically, if the Smoothing ratio was < 0.975 it was labeled as a Serious Smoothing Effect; for ratios > 1.025 a Serious Provoking Effect was recorded. H&L note, in this CZ-calibration, there were 201 Serious events and 173 were Smoothing: [Both[79] + Late[51] + Early[43]] or 86.1% [173/201]. Also see Table 2 [Panel A: Col[RRP]] in this research report.
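The H&L classification of an ANN-effect by the standard-error ratio, together with the ±2.5% Comfort-Zone, can be sketched as follows (`classify_ratio` is an illustrative name, not from H&L):

```python
def classify_ratio(se_ann, se_basic, cz=0.025):
    """Classify an ANN modification by the ratio of the OLSR standard
    errors, using the H&L ComfortZone of +/-2.5% around 1.0."""
    r = se_ann / se_basic
    if r < 1.0 - cz:
        return "Serious Smoothing"     # ratio below 0.975
    if r > 1.0 + cz:
        return "Serious Provoking"     # ratio above 1.025
    return "Within ComfortZone"
```

For example, the HSY illustration of Section 4.4 (σ̂ moving from 225.9506 to 244.5366) yields a ratio of 1.0823 and is classified as Serious Provoking.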
1.2 Details of the Research This Smoothing tendency presented by H&L was heretofore not reported in the peer-reviewed literature. For this reason, their Smoothing-study begs additional testing-arms. This is the point of departure of our re-analysis of their data. Specifically, we:
present a behavioral context that tacitly cajoles replacement of outliers,
detail the essentials of the Time Series [TS] Regression Forecasting Model used to form the forecasts and the related 95% Confidence Intervals,
extend and detail the TS ANN-Replacement Protocols used to create comparative ANN-information,
discuss the Accrual of the Firms and their sensitive account Panel variables used to develop the TS-forecasts of the accounting data,
offer various Vetting Tests, the intention of which is to engender understanding and confidence in the meaning of inferences drawn from the various probability tests to be reported,
detail four [4] variable measures sensitive to Forecast & Precision Modification-Effects,
present detailed Tabular Profiles of all the study impact variables and selected Inferential Illustrations that can be used by readers to examine aspects of the modifications that would inform their decision needs, and
offer a Summary and an Extension of this study.
2. The Behavioral Context for Outliers
2.1 Overview The label “Outlier” has a pejorative linguistic connotation that all but requires the analyst to “take a corrective action”. The Merriam-Webster^{i} definition of an outlier in a statistical context is:
A statistical observation that is markedly different in value from the others of the sample; values that are outliers give disproportionate weight to larger over smaller values.
This standard definition effectively “prods” the decision-maker [DM] to “do something about this problematic or troublesome datapoint” that has “objectively” been labeled as an outlier. Thus, this interesting coupling of {Outlier & Corrective Action} has promoted the tendency to replace data in TS-Panels, or at least to compensate for them in the modeling process. In most standard texts, such as the one that we used in our forecasting course (Hanke J et al., 2003), there are advisories against making an inference where there are likely to be outliers in the dataset. For this reason, many forecasting and data-analytic software platforms offer a number of techniques for the identification of outliers, with the obvious intention to “deal with them in some acceptable manner”. For example, the SAS (2014) publication: Econometrics and Time Series Analyses, in the Regression Section [Ch 9, p. 120], notes regarding options for this analytic platform:
Specify Outlier Options: Enables you to control the reporting of additive outliers (AO) and level shifts (LS) in the response series.
Additionally, SAS has a model that can be used to “tease out” the effect of possible outliers, called Recursive Partitioning, based upon eliminating or re-positing model parameters^{ii}.
Usually, outliers are identified using an outlier-screening platform. The “gold-standard” screening protocol is the Box-Plot (Tukey J, 1977), where points outside of the whiskers are designated as outliers, thus all but requiring that an action be taken. Further, the SAS/JMP^{®} Statistics and Graphics Guide (2005) lists the Histogram, Normal Quantile Plot, Quantile Box-Plot, and the Stem & Leaf as possible outlier-screens.
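The Tukey Box-Plot fence is easy to state in code. The sketch below assumes the common k = 1.5 whisker multiplier; note that the quartile-interpolation scheme varies across software platforms, so boundary cases may differ from a given package:

```python
def tukey_outliers(data, k=1.5):
    """Flag points outside the Box-Plot whiskers:
    [Q1 - k*IQR, Q3 + k*IQR], with k = 1.5 per Tukey (1977)."""
    s = sorted(data)
    n = len(s)
    def quantile(p):
        # simple linear interpolation between order statistics
        idx = p * (n - 1)
        lo, hi = int(idx), min(int(idx) + 1, n - 1)
        return s[lo] + (s[hi] - s[lo]) * (idx - lo)
    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]
```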
2.2 Focus of this research report: Testing the ANN-effects As context, it was detailed above that there:
is a linguistic behavioral “imperative” for analysts to correct for outliers,
is a plethora of outlier-screening platforms, the use of which is synonymous with best practices in creating meaningful forecasts and/or data-analytic recommendations in the Audit context, and
are numerous analytic platforms that have output-conditioning features to adjust for outliers.
These three features combine with “a” conventional wisdom in the practicing forecasting community that replacing outliers is (i) critical (usually true), and (ii) best effected with Simple and Logical replacement protocols, a mode d’emploi that would enhance the information content of the forecast. The principal support for this logical montage is that Simple and Logical replacement protocols are robust, to wit: they would have No Material Effect on forecasts drawn from the basic Panel IF there were to be no actual outliers. Thus, such protocols would be harmless if used when not needed and so can only be useful. This concept, as it pertains to Outlier-Protocols, is best characterized as Robust-in-Utility.
Given this preamble that borders on a homily, we should expect that an extended study that addresses the impact of replacement of outliers would be of practical value to the practice of forecasting. Such a study could be a baseline for Missing-Data Panel-replacement protocols in a time-series context. Thus, the focus of this research extension is to address the presumed Robustness-in-Utility of the ANN-Protocol. In this regard, we will:
Investigate the effects of ANN-replacements by: (i) forming meaningful and sensitive forecasting variables composed of {Forecasts & Precisions}, (ii) varying Panel-lengths, (iii) examining the symmetry of these ANN-effects, and (iv) vetting the “2-to-1” Smoothing percentage reported by H&L.
3. The Forecasting Model
To generate a forecast, we will use the Excel™ Platform [Data-Analysis[Regression]]. This form will be the OLS two-parameter linear [Intercept & Slope[Trend]] Time Series Regression model [OLSR] used to forecast from a TS-Panel. In the TS-version, the dependent variable is the traditional Y-variate [Ordinate] and the integer Time Index is the equally spaced independent-variate [Abscissa], where the first point is indexed as 1 and subsequent points follow as integers to the last point, all of which are used in the OLS-fitting.
3.1 OLSR Inference from the Excel Parameter-Range Model The Excel Regression functionality [Data[Data-Analysis[Regression]]] forms a “wide-covering” confidence interval for the one-period-ahead forecasts compared to the Fixed or Random Effects projections. See (Gaber M, et al., 2017). For this reason, we are NOT interested in the capture-rate of these Excel 95%CIs; almost certainly, these wide 95%CIs will capture most one-period-ahead holdbacks. We are forming these 95%CIs as they will be used in the creation of sensitive test variables; also, the ratio of the precisions of the 95%CIs is the ratio of the OLSR Standard Errors [σ̂] and so is a simple measure for calibrating the effect of the ANN-modification: Smoothing or Provoking.
3.2 Instructive Illustration It is usual to script an illustration to clarify these computations. This will be done using the data from the Hershey Corporation, Inc. [HSY^{iii}] downloaded from the Bloomberg Markets^{™} platform.
Table 1. HSY Panel: Other Assets: Deferred Charges [t = 1, …, 12]
1227.158, 1088.453, 1280.824, 830.825, 884.83, 829.813, 800.819, 967.283, 1064.809, 1223.922, 1255.313, 1530.307
For a one-period-ahead forecast, h = 1, we produced the following information:
Forecast: ŷ[n+h] = â + b̂ × [n + h] (1)
1,214.45 = 949.61 + 20.372 × 13 (1.a)
The TSversion of the 95%CIs for the HSYdataset are:
Extreme Left Side [Lower-Limit [LL]] 95% Boundary for the first projection horizon [h=1] is:
LL = LL[Intercept] + LL[Slope] × [n + h] (2)
357.3 = 639.8 + [−21.73] × 13 (2.a)
Extreme Right Side [Upper-Limit [UL]] 95% Boundary for the first projection horizon [h=1] is:
UL = UL[Intercept] + UL[Slope] × [n + h] (3)
2,071.6 = 1,259.5 + 62.47 × 13 (3.a)
Using the usual definition of precision, i.e., 50% of the spanning-length of the confidence interval, the precision of this forecast is: 50% × [2,071.6 − 357.3] = 857.2. Note, as a computation-check, that the mid-point of the confidence interval is the forecast, e.g., [357.3 + 2,071.6] / 2 = 1,214.45. Finally, the Mean Square Error [MSE] is 51,053.65 and so the Standard Error [σ̂] of the OLSR is: √51,053.65 = 225.95.
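The Section 3.2 computations can be reproduced from first principles. The sketch below fits the OLS trend line to the Table 1 Panel and forms the one-period-ahead 95%CI from the 95% parameter ranges of the Excel Regression output; the helper name `olsr_param_range_ci` is ours, and t(0.975, 10) ≈ 2.228 is assumed:

```python
import math

# HSY Panel of Table 1: Other Assets: Deferred Charges, t = 1..12
hsy = [1227.158, 1088.453, 1280.824, 830.825, 884.83, 829.813,
       800.819, 967.283, 1064.809, 1223.922, 1255.313, 1530.307]

def olsr_param_range_ci(y, h=1, t_crit=2.228):
    """Two-parameter OLS trend fit on t = 1..n; the one-period-ahead
    95%CI is formed from the 95% parameter ranges of the Excel
    Regression output (Section 3.1). t_crit = t(0.975, n - 2),
    which is 2.228 for n = 12."""
    n = len(y)
    t = range(1, n + 1)
    tbar, ybar = (n + 1) / 2.0, sum(y) / n
    sxx = sum((ti - tbar) ** 2 for ti in t)
    sxy = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
    b = sxy / sxx                        # Slope
    a = ybar - b * tbar                  # Intercept
    sse = sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(t, y))
    se = math.sqrt(sse / (n - 2))        # root of the MSE
    se_b = se / math.sqrt(sxx)
    se_a = se * math.sqrt(1.0 / n + tbar ** 2 / sxx)
    fc = a + b * (n + h)                 # one-period-ahead forecast
    ll = (a - t_crit * se_a) + (b - t_crit * se_b) * (n + h)
    ul = (a + t_crit * se_a) + (b + t_crit * se_b) * (n + h)
    return fc, ll, ul, se

fc, ll, ul, se = olsr_param_range_ci(hsy)
# fc ~ 1,214.45; [ll, ul] ~ [357.3, 2,071.6]; se ~ 225.95
```

By construction, the mid-point of this parameter-range CI is exactly the forecast, which is the computation-check noted above.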
4. Modification Protocols: The Account Base
4.1 Accounting Variable Set for Forecasting We used the exact same H&L dataset so as to control for inter-test randomization effects. For each firm, we selected four (4) Income Statement variables: {Gross Profit; Operating Income; Earnings for the Common Shareholders; Shares for Diluted Earnings per Share} and four (4) Balance Sheet variables: {Current Assets; Other Assets & Deferred Charges; Current Liabilities; Current Ratio}. For each of these eight accounts, we selected 12 datapoints from [2005: y[1]] to [2016: y[12]].
4.2 Panel Modifications Download Panel: No modifications: Baseline Panel [BP]: {y[1], …, y[12]}. To test for the Panel-Length effect, we created two variations from the BP[n=12]: the Mid-Length Panel, MP[n=9], created by removing three Points from the Download Panel BP, and the Short-Length Panel, SP[n=6], created by removing three further Points from MP[n=9]; each Panel is re-indexed from 1 to n. For each of these three Panels, {BP; MP & SP}, we made the following three ANN-replacements.
ANN: Replacement[Early]: The Second Panel Point: y[2] ← (y[1] + y[3]) / 2,
ANN: Replacement[Late]: The Next-to-Last Panel Point: y[n−1] ← (y[n−2] + y[n]) / 2, and
ANN: Replacement[Both[Early & Late]]: y[2] ← (y[1] + y[3]) / 2 & y[n−1] ← (y[n−2] + y[n]) / 2.
Thus, there are nine test-arms: {BP[E; L; B]; MP[E; L; B]; SP[E; L; B]}.
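The nine test-arms can be generated mechanically. In the sketch below, the assumption that MP and SP retain the most recent 9 and 6 observations of the 12-point Download Panel is ours, made for illustration, and the helper name `build_test_arms` is hypothetical:

```python
def build_test_arms(bp):
    """Sketch of the nine test-arms {BP; MP; SP} x {Early; Late; Both}.
    Assumption (ours): MP and SP retain the most recent 9 and 6
    observations of the 12-point Download Panel, re-indexed 1..n."""
    def ann(y, mode):
        y = list(y)
        n = len(y)
        if mode in ("Early", "Both"):
            y[1] = (y[0] + y[2]) / 2.0
        if mode in ("Late", "Both"):
            y[n - 2] = (y[n - 3] + y[n - 1]) / 2.0
        return y
    panels = {"BP": bp, "MP": bp[-9:], "SP": bp[-6:]}
    return {(p, m): ann(v, m) for p, v in panels.items()
            for m in ("Early", "Late", "Both")}
```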
4.3 Nature of the Replacement Protocols The critical questions of this research report are: For the ANN-protocol, what are the (i) magnitude, and (ii) nature of the effects relative to the BP likely to be? If the conventional wisdom of Robust-in-Utility is correct re: the ANN-protocol, then there should be basically No or Unimportant effects relative to the BP; to wit, inferentially the ANN-protocol will be Robust-in-Utility. The counterpoint is the H&L research report, where about two-thirds of the time the ANN-modifications created important Smoothing effects. As noted above, one of the test-foci of this research report is to vet the H&L indication of a Smoothing tendency.
4.4 Illustrative Profile At this point, it would be useful to offer a graphical profile of the nature of the effects created by the ANN-protocol. For example, for the HSY-Panel [Table 1], we made the ANN[Both] transformation. Results: The BP[σ̂] was 225.9506 and the ANN:Both[σ̂] was 244.5366. The ratio [ANN:Both[σ̂] / BP[σ̂]] is 1.0823. This is, of course, also the ratio of the Precisions of the 95%CIs of the two series: [927.67 / 857.16]. Thus, we can reject that the ANN[Both] is, by nature, Smoothing, as 1.0823 > 1; this indicates that the BP[ANN[Both]] is Provoking. The graphic is most illuminating:
Figure is Available in PDF Format
Figure 1. Plot of a Provoking Replacement
The dashed lines are the OLSR:Excel regression-lines of the BP and the BP[ANN]. The graphic highlights the change at points n=2 and n=11, where the ANN[Both] modification was effected. Essentially, controlling for the OLSR re-orientation effected by the ANN, the transformation moved the transformed series “away” from the regression line of the BP series; this is why the σ̂ went from 225.9506 for the BP to 244.5366 for the ANN:Both. Specifically, the Both[ANN-transformation] shifted Point[2] and Point[11] away from the regression lines formed from the BP and the BP[ANN] datasets, and overall the orientations resulted in an increase in the OLS-distance from the respective regression-line. However, the operative issue is also the magnitude of the Provoking-effect; that will be considered subsequently.
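This Provoking result can be verified directly: apply ANN[Both] to the Table 1 Panel and compare the OLSR standard errors. The sketch below re-implements the trend fit; list positions are 0-based, so Panel points 2 and 11 are indices 1 and 10:

```python
import math

def olsr_se(y):
    """Root of the MSE of the two-parameter OLS trend fit on t = 1..n."""
    n = len(y)
    t = range(1, n + 1)
    tbar, ybar = (n + 1) / 2.0, sum(y) / n
    b = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y)) / \
        sum((ti - tbar) ** 2 for ti in t)
    a = ybar - b * tbar
    sse = sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(t, y))
    return math.sqrt(sse / (n - 2))

hsy = [1227.158, 1088.453, 1280.824, 830.825, 884.83, 829.813,
       800.819, 967.283, 1064.809, 1223.922, 1255.313, 1530.307]
both = list(hsy)
both[1] = (hsy[0] + hsy[2]) / 2.0     # ANN[Early] part of Both
both[10] = (hsy[9] + hsy[11]) / 2.0   # ANN[Late] part of Both

ratio = olsr_se(both) / olsr_se(hsy)  # ~1.0823 > 1.0: Provoking
```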
5. Testing Variables: The Inference Context
To develop informative profiles that address the forecasting and precision effects of the proposed ANN-replacement modifications {BP, Both, Late & Early}, as well as the {Small, Medium & Large} blocking, we will create four variables in Sections {5.2, 5.3, 5.4 & 5.5}. Additionally, to address the magnitude of possible effects of the ANN-Protocol, we will re-calibrate the “Comfort-Zone” used by H&L.
5.1 Recalibration of the H&L-version of the Comfort-Zone [CZ] The logic of using a CZ-filter is to focus on “Events of Interest”. Rationale: In any measurement calibration in a forecasting context using accounting data from firms traded on active exchanges, there is stochastic variation that is generally labeled as “noise”. This being the case, event-measurements falling into this unavoidable “noisy” stochastic-zone are of little analytic interest. To focus on Events of Interest, we have selected a bandwidth of 5%, as this is also the usual calibration for the False Positive Error for the 95% non-directional level of confidence often used in most analytic contexts. Thus, our CZ-band will be noted as: ±2.5%CZ. H&L applied their CZ-version directly to the ratio σ̂[ANN] / σ̂[BP]; in this research extension, we will use the ±2.5%CZ-screen with a sensitive benchmark for the variables that we will use in this study. Point of Information: In the H&L-version, the Left Hand Side [LHS] of the CZ indicated a Smoothing Event, and Events in the Right Hand Side [RHS] were Provoking in Nature. In our recalibration, there are still Smoothing and Provoking aspects, but the LHS and RHS of our recalibration are NOT synonymous with Smoothing and Provoking; rather, the LHS & RHS are CZ-partitions for examining the symmetry of the ANN-effects.
5.2 The Relative Forecast Error [RFE] For the RFE blocked by the Panel-size, first we compute the OLSR one-period-ahead forecasts for each of the four ANN-versions. Note this as: Forecast[j], j ∈ {BP; B; L; E}. Then, we compute the Average of these OLSR-forecasts using only the three ANN-modified Panels. Note this as: AveF[B;L;E]. Thus, RFE[j] computes for each [Firm & Account]-variable:
RFE[j] = [Forecast[j] − AveF[B;L;E]] / AveF[B;L;E]
The Sensitivity Context
RFE[j] is a relative percentage measure of directional divergence [−; 0; +] of a particular [j-th] forecast FROM the average of the ANN-protocol forecasts. The relative ratio-magnitude benchmark is: AveF[B;L;E]. For the Panel j=BP there is no conditioning, as AveF[B;L;E] is independent of BP.
Discussion The ±2.5%CZ-screen for RFE[j] is: [−2.5% : +2.5%]. Point of Information:
This ±2.5%CZ-screen is the directional difference of the numerator relative to a benchmarking-magnitude. This will be the case for the RFE, RPE, and PERF. The RRP is a ratio of Precisions; thus, its ±2.5%CZ-screen is the same as is used by H&L and so is a direct measure of H&L Smoothing or Provoking.
In addition, the filter-coding using the ±2.5%CZ-screen will be used to create the profile of the percentage of Events of Interest that are outside the CZ-interval [−2.5% : +2.5%], (i) on the Left-Hand-Side [LHS], i.e., < −2.5%, or (ii) on the Right-Hand-Side [RHS], i.e., > +2.5%. Regarding an a priori expectation of the profile of the LHS v. the RHS, the operative query asks: Is there a structural aspect of the formulation of RFE[j] that would bias the directional difference [Forecast[j] − AveF[B;L;E]], as benchmarked by AveF[B;L;E], to favor the creation of an event difference in magnitude outside the “Zone of No Interest” of ±2.5%CZ on the LHS [i.e., a relative negative-value of the benchmarked-percentage magnitude] or the RHS [i.e., a relative positive-value of the benchmarked-percentage magnitude]? In the OLSR-context, a forecast-projection is independent of its precision; thus, there is no indication that the RFE[j] is conditioned in such a way as to be biased to over-populate the LHS or the RHS. In this case, for the [LHS v. RHS] population-profiles, an a priori expectation of a balance of 50% seems reasonable. This is effectively driven by the directional component and assumes that the magnitude of the benchmark will be, on balance, symmetric.
5.3 The Relative Precision Error [RPE] For the RPE blocked by the Panel-size, first we compute the Precision^{iv} of the 95%CIs individually for each of the four Panels. Note this as: Precision[j], j ∈ {BP; B; L; E}. Then, we compute the magnitude-benchmark: the Average of the Precisions for the three ANN-modified Panels. Note this as: AveP[B;L;E]. The computation for each [Firm & Account]-variable is:
RPE[j] = [Precision[j] − AveP[B;L;E]] / AveP[B;L;E]
The Sensitivity Context
RPE[j] is a relative percentage measure of directional divergence [−; 0; +] of the j-th Precision FROM the average Precision of the ANN-protocols, as benchmarked. For the Panel j=BP there is no conditioning, as AveP[B;L;E] is independent of BP.
Discussion The CZ-screen for RPE[j] is: [−2.5% : +2.5%]. Regarding the balance of the LHS & RHS, given the likelihood of a Smoothing bias [reported by H&L], we proffer that for RPE[BP] there could be a bias, as the ANN-modifications, according to H&L, have a Smoothing-tendency. If this is the case, and the magnitude benchmark is not associated with the directional partition, then Precision[BP] will likely be > the AveP[B;L;E], thus over-populating the RHS of the CZ-screen. Thus, in the case of RPE[BP], the a priori expectation is for RPE[BP] to populate the RHS in comparison to the LHS. However, for the precision comparisons for j = B, L or E, there is no indication of how Smoothing or the magnitude calibration would differentially impact the RPE[j] so as to over-populate the LHS or the RHS. For these indeterminate effects, the a priori expectation is a balance of 50%.
5.4 The Relative Ratio of Precisions [RRP] For the RRP blocked by the Panel-size, the computation for each firm and for each Account-variable is:
RRP[j] = Precision[j] / Precision[BP]
Where j ∈ {B; L; E}, j ≠ BP, and the denominator is always Precision[BP].
The Sensitivity Context
RRP[j] is a ratio measure of the magnitudes of the relative precisions. The RRP uses the σ̂-metric and thus can be used to intuit Smoothing or Provoking tendencies, as demonstrated by H&L.
Discussion In this case, the CZ-screen for RRP[j] is [0.975 : 1.025] and is exactly the same as that used by H&L. Further,
if the precision of a modified Panel were to be less than that of the BP, the relative ratio would be < 1.0 and thus be located on the LHS of 1.0. This would then be the Smoothing-event. Thus, for the Large-Panel, n=12, these results are those presented by H&L, and
if the precision of a modified Panel were to be more than that of the BP, the relative ratio would be > 1.0 and thus be located on the RHS of 1.0, e.g., Figure 1. This would then be the Provoking-event. Thus, for the Large-Panel, n=12, these results are also those presented by H&L and thus a re-calculation check on their results.
The a priori expectation is that the LHS for RRP[j] will be over-populated vis-à-vis the RHS, given the Smoothing tendency reported by H&L; thus, we will be able to examine the Smoothing and Provoking for the RRP-variate for the various Panel-sizes tested.
5.5 The Precision Errors Relative to their Forecast [PERF] For the PERF blocked by the Panel-size, the computation for each [Firm & Account-variable] is:
PERF[j] = [Precision[j] − Precision[BP]] / Forecast[j]
Where: j = B, L & E
The Sensitivity Context
PERF[j] is a relative percentage measure of directional divergence [−; 0; +] of the Precision of an ANN-protocol[j] FROM the Precision[BP], as benchmarked by the Forecast[j].
Discussion Thus, the CZ-band for PERF[j] is [−2.5% : +2.5%]. Recalling that the MSE-ratio σ̂[j] / σ̂[BP] is the Smoothing or Provoking measure, and given the H&L results, the expectation is that the numerator will be negative > 50% of the time; thus, the a priori expectation is that the LHS for PERF[j] will be over-populated vis-à-vis the RHS. In this case, the directional designation will follow the Smoothing and Provoking partition; however, this will indicate not only the Nature, Smoothing or Provoking, but also the Magnitude relative to the j-th forecast.
6. Vetting Expectations: Assurance of the Testing Analytics
6.1 Vetting Context In the current practice of statistical analyses, it is the usual case, when the p-value is the inferential measure, that “a background check” of sorts is performed. This speaks to assuring those interested in the inferential results that the context of the modeling methods pertains to the expected population context. Thus, vetting is simply the analysis of the analytic protocol that addresses the assurance that the sample accrual and the expected population are in-sync. Vetting is thus a derivative of the first work of (Fisher: Sir R, 1925), the Meta-analysis of (Rosenthal R, 1984), and Meta-Science. See: (Aryal U et al., 2013) and (Fahey L, 2019).
6.2 Vetting Test Expectation: Low Associational Screen We expect that the association between the following three pairings will be Low. As the Pearson Product Moment correlation is sensitive to outliers, we have elected to use the Spearman rank-order correlation [r_S, the estimate of Spearman’s ρ] as the association measure. The Low-expectation Vetting-Sets are:
{Spearman [RPE & RFE] ; Spearman [PERF & RFE] ; Spearman [RRP & RFE]}
Rationale In this case, given the nature of these computations, the correlation-pairing of the Relative Forecasting Error with the three Relative Precision partitions would not be associated, as they are formed from independent constructions. For a strong vetting test, we computed these Spearman tests blocked by the Panel sample-size groups {Large, Middle and Small}. We are using the square of r_S as the test measure. We find, for these blocked pairwise tests, that the average of the nine r_S² values was 0.0042. This is a strong indication that there is a lack of association in these variable pairs, as proffered. Further, as an aggregated result, the p-values for the Spearman Null were: [RPE&RFE[0.89]; RRP&RFE[0.59]; PERF&RFE[0.61]]. Thus, the vetting is founded.
6.3 Vetting Test Expectation: High Associational Screen As for the RPE, PERF and RRP, we expect that for these three correlation pairs the Spearman association would be High. The reason for this is that these variables are Precision Measures that are formed from their directional disposition, Percentage Error or Relative Ratio, around the Precision[BP]. Thus, these “same-motion” drivers should be highly Spearman-correlated. In this context, we offer that High would mean that the Spearman correlation is higher than the Harman cut-off. In this case, for the Spearman correlations blocked by Panel-size, all nine r_S values were greater than this cut-off^{v}. Further, as an aggregated result, the p-values for the Spearman Null were: [RPE&PERF[<0.001]; RPE&RRP[<0.001]; RRP&PERF[<0.001]]. Thus, the vetting is founded.
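The Spearman screen used for both vetting tests needs only ranks. A minimal sketch follows, assuming no ties in the data; production code would typically use a library routine such as `scipy.stats.spearmanr`:

```python
def spearman(x, y):
    """Spearman rank-order correlation r_S (no-ties case), via
    r_S = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```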
6.4 Generalizability Recall that the reason for these vetting tests is to explore the nature of the generalizability of subsequent inferential testing. The summary of these pre-analysis vetting tests suggests clearly that the variables and the accruals are likely to form a useful sample from a population of non-random effects of firms traded on active exchanges.
Specifically, the analyst can draw and justify the inferences from the profile results that we are to offer in the profiling stage of the research report.
7. Testing Profiles: Events Relative to the “ComfortZone”
The information presented in the following is the impact direction and magnitude of the modifications on the Panels for the four test variables. We will block the profiles using the ±2.5% Comfort-Zone impact-frontier for each of the four test variables.
7.1 The FPE & FNE Credibility of the Inferential Framework In this study, as is common practice in the required Best Practices context of inferential testing, we will offer a “reasonable” inferential testing context; reasonable means that if the sample size is too small, it is difficult to find a p-value where the False Positive Error [FPE] is rejected re: the a priori expectation. If, on the other hand, the sample size is too large, most any minuscule Null difference will produce a p-value that justifies rejecting the a priori expectation. We need to be in the Goldilocks-Zone: a sample size that is Just Right!^{vi}
7.2 Sample Accrual used in the ANN-Protocol Assume we are interested in the most dis-aggregated non-directional test of proportions [ToP] for two sampled populations as the practical benchmark. As is “standard practice”, we used FPE[90%[CV=1.645]] & FNE[75%[CV=0.675]] as a guide to determining the number of firms to accrue for the tests proposed. In this case, we assumed that Population[A] has a proportion of H%A and the other Population[B] has a proportion of H%B. To form the sample-accrual information in this two-population context, we used the following standard ToP sample-size formula: See (Wang H et al., 2007):
Sample size = [1.645 × √(2 × H̄ × (1 − H̄)) + 0.675 × √(H%A × (1 − H%A) + H%B × (1 − H%B))]² / (H%A − H%B)², where H̄ = (H%A + H%B) / 2
In this case, to initialize the computations, we used as a typical proportion-set for our context: [75% v. 85%]. We selected [75% & 85%] as they give identical results with their binary-partner of [25% & 15%]. This gives, more or less, a boundary range of a sample size of 170 per generalized ToP-testing partition. For our study, we will have three Panel Sizes [Large, Medium, Small]. Each Panel block will have four Panel modifications [BP, B, L & E]. Each one of these will have the four Measures {RFE, RPE, RRP, PERF}. Thus, there will be three [4 × 4] Tableaux and overall 42 Cells: [[4 × 4 × 3] − [2 × 3]], as the PERF and RRP are not defined for the BP. Each cell will have 176 [22 × 8] sample points, or 7,392 [176 × 42] sample points estimated. However, as is usually the case, there were accrual “glitches” in the execution of the design. Specifically, not all of the eight accounts were fully populated in the BBTs using the exact account descriptors that we used in the study. Further, all of the variables were selected assuming that the “normal” values would be positive; however, there were a number of instances where the forecast or the lower-limit of the 95%CI was not in the positive CC-quadrant. We eliminated these as atypical. The final study-accrual used was 6,706 sample points: [Panels A:B:C]: {[[652 + 644 + 620] + [489 + 483 + 465]] × 2}, or a shortfall of about 10%.
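The ToP sample-size computation can be checked numerically. The sketch below implements the standard two-proportion formula cited from (Wang H et al., 2007), with the CVs 1.645 [FPE 90%] and 0.675 [FNE 75%] given above:

```python
import math

def top_sample_size(pa, pb, z_fpe=1.645, z_fne=0.675):
    """Two-population test-of-proportions [ToP] sample size per arm,
    FPE 90% [CV = 1.645] and FNE 75% [CV = 0.675]."""
    pbar = (pa + pb) / 2.0
    num = (z_fpe * math.sqrt(2 * pbar * (1 - pbar))
           + z_fne * math.sqrt(pa * (1 - pa) + pb * (1 - pb))) ** 2
    return num / (pa - pb) ** 2

n = top_sample_size(0.75, 0.85)  # ~171, i.e., about 170 per partition
```

The formula is symmetric under p → 1 − p, which is why [75% & 85%] and its binary-partner [25% & 15%] give identical results.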
8. Results: Impact of the ANNProtocol
8.1 Profiling of the Test Modifications We present the following three profile partitions, blocked on Panel Size.
Table 2. Panel Presentation of the ANN-Impact
In these cases, the LHS is a filter that indicates the number of cases outside the ±2.5%CZ, meaning the number of cases that were lower than −2.5% for the RFE, RPE & PERF [i.e., < −2.5%] and, for the RRP, lower than 0.975. A calculation will aid in clarifying the computations in Table 2. For Panel A, for the None[BP] & RFE cell, there were 163 instances created using the formula:
RFE[BP] = [Forecast[BP] − AveF[B;L;E]] / AveF[B;L;E]
Assume that we are using Table 1: [HSY Panel: Other Assets: Deferred Charges]. There are four (4) Panel forecasts: {BP, Both, Late & Early}. The computation for the None[BP] uses the following information:
Forecasts for [BP, B, L & E] respectively were: {1,214.45, 1,229.45, 1,249.51 & 1,194.38}
The relevant Average is: Ave[1,229.45, 1,249.51 & 1,194.38] = 1,224.45
Thus, RFE[BP] is: [1,214.45 − 1,224.45] / 1,224.45 = −0.00816 ≈ −0.816%
As −0.816% is > −2.5%, it is NOT counted as an LHS event, since it is still in the Comfort-Zone. For the full dataset accrued of 163 given this blocking, there were 18 LHS events, or 11.0% [18/163], in the RFE & None[BP] cell.
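The worked RFE[BP] calculation above reduces to three lines:

```python
# HSY forecasts for {BP, Both, Late, Early} from Section 3.2 / Table 2 worked example
forecasts = {"BP": 1214.45, "B": 1229.45, "L": 1249.51, "E": 1194.38}
ave_f = (forecasts["B"] + forecasts["L"] + forecasts["E"]) / 3.0
rfe_bp = (forecasts["BP"] - ave_f) / ave_f       # ~ -0.816%
in_comfort_zone = -0.025 <= rfe_bp <= 0.025      # True: not an LHS event
```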
Table 2: Panel A: LP Large Sample, n = 12, Blocked by ANN-Protocols

Mods | RFE[n=163] | RPE[n=163] | PERF[n=163] | RRP[n=163]
None[BP] | 23[14.1%]RHS 18[11.0%]LHS | 70[43.0%]RHS 3[2%]LHS | N/A | N/A
Both | 5[3.1%]RHS 11[6.8%]LHS | 1[<1%]RHS 51[31.3%]LHS | 4[2.5%]RHS 50[30.7%]LHS | 12[7.4%]RHS 79[48.5%]LHS
Late[n=11] | 8[4.9%]RHS 9[5.5%]LHS | 33[20.3%]RHS 26[16.0%]LHS | 1[<1%]RHS 28[17.2%]LHS | 6[3.7%]RHS 51[31.3%]LHS
Early[n=2] | 17[10.5%]RHS 16[9.8%]LHS | 40[24.5%]RHS 15[9.2%]LHS | 7[4.3%]RHS 24[14.7%]LHS | 10[6.1%]RHS 43[26.4%]LHS
Total | 53[8.2%]RHS 54[8.3%]LHS | 144[22.1%]RHS 95[14.7%]LHS | 12[2.5%]RHS 102[20.9%]LHS | 28[5.7%]RHS 173[35.4%]LHS
Overall Total | 16.4%[107/652] | 36.7%[239/652] | 23.3%[114/489] | 41.1%[201/489]
Table 2: Panel B: MP Medium Sample, n = 9, Blocked by ANN-Protocols

Mods | RFE[n=161] | RPE[n=161] | PERF[n=161] | RRP[n=161]
None[BP] | 14[8.7%]RHS 35[21.7%]LHS | 84[52.2%]RHS 2[1.2%]LHS | N/A | N/A
Both | 14[8.7%]RHS 6[3.7%]LHS | 0[0%]RHS 62[38.5%]LHS | 5[3.1%]RHS 71[44.1%]LHS | 12[7.5%]RHS 94[58.4%]LHS
Late[n=8] | 18[11.2%]RHS 7[4.4%]LHS | 44[27.3%]RHS 29[18.0%]LHS | 3[1.9%]RHS 41[25.5%]LHS | 3[1.9%]RHS 64[39.8%]LHS
Early[n=2] | 13[8.1%]RHS 29[18.0%]LHS | 44[27.3%]RHS 26[16.1%]LHS | 4[2.5%]RHS 38[23.6%]LHS | 8[5.0%]RHS 54[33.5%]LHS
Total | 59[9.2%]RHS 77[12.0%]LHS | 172[26.7%]RHS 119[18.5%]LHS | 12[2.5%]RHS 150[31.1%]LHS | 23[4.8%]RHS 212[43.9%]LHS
Overall Total | 21.1%[136/644] | 45.2%[291/644] | 33.5%[162/483] | 48.7%[235/483]
Table 2: Panel C: SP (Small Sample, n = 6), Blocked by ANN-Protocols

| Mods | RFE[n=155] | RPE[n=155] | PERF[n=155] | RRP[n=155] |
| None[BP] | 31[20.0%]RHS 30[19.4%]LHS | 132[85.2%]RHS 0[0%]LHS | N/A | N/A |
| Both | 15[9.7%]RHS 18[11.6%]LHS | 0[0%]RHS 120[77.4%]LHS | 3[1.9%]RHS 116[74.8%]LHS | 3[1.9%]RHS 134[86.5%]LHS |
| Late[n=5] | 16[10.3%]RHS 17[11.0%]LHS | 62[40.0%]RHS 52[33.6%]LHS | 2[1.3%]RHS 79[51.0%]LHS | 0[0%]RHS 98[63.2%]LHS |
| Early[n=2] | 27[17.4%]RHS 21[13.6%]LHS | 70[45.2%]RHS 36[23.2%]LHS | 1[<1%]RHS 68[43.9%]LHS | 2[1.3%]RHS 90[58.1%]LHS |
| Total | 89[14.4%]RHS 86[13.9%]LHS | 264[42.6%]RHS 208[33.6%]LHS | 6[1.3%]RHS 263[56.6%]LHS | 5[1.1%]RHS 322[69.2%]LHS |
| Overall Total | 28.2% [175/620] | 76.1% [472/620] | 57.8% [269/465] | 70.3% [327/465] |
9. Profile Testing: Illustrative Examples
9.1 Overview of Results To economize on the notation, we will refer to Table 2 [Panels A, B and C] as the Profile Tables. In the Profile Tables, we give the complete details of the research report useful for selecting inferential queries of interest. However, as a practical gestalt, the Null specifications are, grosso modo, indicative of those that we proffered in the four sections where the variables were detailed: {5.2; 5.3; 5.4 & 5.5}. To indicate this "non-inferential" correspondence, we have shaded & bolded the cells in Table 2 that are in sync with the a priori specification. As a summary overview, the 95% Confidence Intervals for the four arms, aggregated over the Panel-size Blocks, are presented below:
Table 3: Overall Percentage Not in the CZ-screen, i.e., Events of Interest

| Summary | Percentage | Lower Limit | Upper Limit |
| RFE | 21.8% | 20.0% | 23.7% |
| RPE | 52.3% | 50.1% | 54.5% |
| PERF | 37.9% | 35.4% | 40.4% |
| RRP | 53.1% | 50.5% | 55.7% |
Discussion To enhance the clarity of the information in Table 3, consider the following demonstration computation for the 95%CI of the RRP:
For the Panel-Arms we had 763 [201 + 235 + 327] RRP values that were not in the ±2.5%CZ, i.e., were variable-events of interest as bolded in Table 2. The total number of all events was 1,437 [(163 + 161 + 155) × 3]. This gives the Percentage average as 53.1% [763/1,437]. The Standard Error is: √[0.531 × (1 − 0.531) / 1,437] ≈ 1.32%. This gives the RRP 95%CI as: 53.1% ± 1.96 × 1.32% = [50.5% : 55.7%].
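The demonstration computation above can be reproduced directly from the Table 2 counts. A minimal sketch (our own code, using the normal-approximation standard error the text describes):

```python
import math

# Sketch of the RRP 95% CI computation from Table 3 / the demonstration above.
# Counts are taken directly from Table 2 (Panels A, B & C); no new data.
not_in_cz = 201 + 235 + 327          # RRP events outside the +/-2.5% CZ
total = (163 + 161 + 155) * 3        # all RRP events across the three blocks
p_hat = not_in_cz / total            # 763 / 1,437, approx. 0.531
se = math.sqrt(p_hat * (1 - p_hat) / total)   # normal-approximation SE
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)
print(f"{p_hat:.1%} [{ci[0]:.1%} : {ci[1]:.1%}]")  # approx. 53.1% [50.5% : 55.7%]
```

The other three rows of Table 3 follow from the same three lines with the corresponding counts substituted.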
The important summary of the overall results of the testing presented in Tables 2 & 3 is: the simple ANN-modification protocol had a dramatic effect on the collection of sensitive percentage and ratio variables in this study.
In the computational context, however, we leave it to interested readers to form inferential tests germane to their areas of interest. Following are illustrations of possible analyses that we found useful in forming insights into ANN-modification effects as they impact the forecasting decision-making context. Indeed, many others are possible. These illustrations are offered to stimulate further analytic considerations.
9.2 Inferential Context An interesting aspect of the profiles presented in the Profile Tables is that there appear to be a number of differences in the impact of the ANN-protocols as viewed through the four variable-screens: {RFE, RPE, PERF & RRP}. To experientially test relationships of interest, readers of this research report can use the usual statistical tests applied to the Profile Table information as given; however, as these tests address relationships offered in the above tables, most of which do not have an a priori context, their p-values are merely ad hoc guidelines, useful to be sure, but bereft of a defensible statistical FPE- or FNE-context. With this as the inferential caveat, we will examine a few interesting relationships using the above profiling tables.
9.3 Test of Proportions [ToP]: Suggested Testing Vignettes For the RFE overall, across the three sample sizes, there seems to be an interesting effect difference between the LP Sample size and the SP Sample size. This is a test of the percentage of time that the RFE was NOT IN the relative capture interval ±2.5%CZ. Specifically,
RFE:LP: 16.4% [107/652] & RFE:SP: 28.2% [175/620]
In this case, the ToP conservative, or two-tailed, p-value is << 0.0001 for the non-directional test of RFE:LP[16.4%] v. RFE:SP[28.2%]. This is a most interesting result. We know that forecasts for Small Panels are much more sensitive to perturbations or modifications compared to Large Panels. This length-sensitivity "inverse-dominance" finding is worth further investigation. As an extended confirmatory insight for the RFE-test above, the results for {LP : MP : SP} fall in the expected confirmatory order {16.4% < 21.1% < 28.2%}.
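The ToP used here appears to be the standard two-sample z-test of proportions; a hedged sketch follows (the function name and pooled-variance implementation are ours, not necessarily the authors' exact procedure):

```python
import math

# Sketch of a two-tailed Test of Proportions [ToP] on the Table 2 counts.
def top_two_tailed(x1, n1, x2, n2):
    """Two-sample z-test of proportions with a pooled standard error;
    returns the non-directional (two-tailed) p-value."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-tailed p-value from the standard normal CDF via erf
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

p = top_two_tailed(107, 652, 175, 620)  # RFE:LP v. RFE:SP
print(p)  # << 0.0001
```

The same helper reproduces the other ToP vignettes in this section by substituting the relevant cell counts.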
9.4 RFE v. RPE Regarding the LHS v. RHS Sensitivity In scanning the information in Table 2, there seems to be an empirical divergence in the symmetry of the LHS and the RHS for Panels [A:B:C] for the RFE v. the RPE. In this case, it behooves the analyst to investigate this even though, a priori, as noted in sections 5.2 & 5.3, such an asymmetry was anticipated. In this case, the test is a vetting examination of sorts but, nonetheless, is valuable. We began our examination with the None[BP] Partition for the Large Panel for the events only on the RHS. The operative information from Table 2 Panel A re: RFE[RHS] v. RPE[RHS] is:
RFE[RHS]:LP: 14.1% [23/163] & RPE[RHS]:LP: 43.0% [70/163]
In the case of the Large dataset, the exploratory question is: Are the RHSs of the RFE & RPE equally populated for the basic dataset of the Large accrual set, i.e., None:LP[BP]? The non-directional p-value for the Null of no difference for the above relationship [14.1% v. 43.0%] is p << 0.0001. This suggests that there is a marked tendency for the RFE to be less RHS-populated compared to the RPE. This is exactly the relationship expected given the nature of these variables.
As a follow-up, we then extended testing of this "RHS Tendency" of the RPE v. the RFE to the other two Panel sizes. These two tests are:
RFE:RHS:MP: 8.7% [14/161] & RPE:RHS:MP: 52.2% [84/161]
RFE:RHS:SP: 20.0% [31/155] & RPE:RHS:SP: 85.2% [132/155]
The p-values for both are << 0.0001; this confirms the "inference-indication" that we made from the Large dataset, to wit: the RHS seems less populated for the RFE compared to that of the RPE, as expected. This will now be further considered.
In this case, for a more complete analysis, the Chi2 is recommended as it is an inference tool that offers an overall p-value for an extensive set of comparisons and not just the two-sample context. The Chi2 indicates a departure from a "random" assignment relative to the Marginal [Row & Column] totals; thus, one dataset is directly benchmarked by another, or by another set of datasets. Further, there is a creative way to examine the inference-impact on a cell-by-cell basis, termed the Chi2 Cell Contribution [C2CC], due to Tamhane & Dunlop (2000). In the Chi2 context, a significant p-value indicates that the values in the cells are not expected given the marginal distributions, i.e., considering all of the data used in the classification matrix, and, in that sense, are indicative of an "unexpected" classification result. Also, for the 2 × 2 category case there is an exact, and also directional, p-value test, Fisher's Exact Test, to consider along with the usual Pearson probability measures.
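For the 2 × 2 case, the exact test mentioned above can be sketched from first principles. This pure-Python helper (our own, not the authors' software) sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one:

```python
from math import comb

# Sketch of a two-sided Fisher Exact Test, applied to the LP None[BP] RHS
# counts of Table 2 Panel A: RFE 23 of 163 v. RPE 70 of 163.
def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    r1, c1, n = a + b, a + c, a + b + c + d
    r2 = n - r1
    denom = comb(n, c1)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # hypergeometric probability of each table consistent with the margins
    probs = [comb(r1, x) * comb(r2, c1 - x) / denom for x in range(lo, hi + 1)]
    p_obs = probs[a - lo]
    # sum the probabilities no larger than the observed table's probability
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))

print(fisher_exact_two_sided(23, 140, 70, 93))  # << 0.0001
```

The small tolerance factor guards against floating-point ties, a standard precaution in this computation.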
Continuing with the RFE & RPE theme, here is the 8 × 4 classification matrix [Table 4] for the aggregated sample-size partitions [{LP:MP:SP} {LHS:RHS} {RFE & RPE}]. This extended analysis can easily be formed using information in Table 2 [Panels A:B:C]. For Table 4 the codex is as follows: the first cell number is the number of events in that cross [Row × Column] category; the second number, in []s, is the expectation given the marginal values. This expectation is formed as the cross values of the proportions allocated only using the Marginals relative to the total of the sampled events. For example, the expected number of events proportioned over the Row Marginals [RM] & Column Marginals [CM] relative to the Total for Cell[5,3], bolded, is: 138.3 [(1002/1420) × 196]. The third number, also in []s, is the C2CC-value. This is computed as follows: the C2CC is the difference between the Actual and the Expectation, squared, as a ratio to that Expectation. Thus, for Cell[5,3]: 2.69 [(119 − 138.3)² / 138.3]. Tamhane & Dunlop (2000, p. 324) suggest that any C2CC > 1.0 provides an indication of an interesting departure from the marginal proportional allocation and so would be of analytic interest relative to investigative inference.
Table 4: Overall Profile of the RFE & RPE Blocked on Size & LHS/RHS

| Partition | RFE | RPE | RM |
| LP:RHS | 53[58.0][0.43] | 144[139.0][0.18] | 197 |
| LP:LHS | 54[43.9][2.34] | 95[105.1][0.98] | 149 |
| MP:RHS | 59[68.0][1.19] | 172[163.0][0.50] | 231 |
| MP:LHS | 77[57.7][6.46] | 119[138.3][2.69] | 196 |
| SP:RHS | 89[103.9][2.14] | 264[249.1][0.89] | 353 |
| SP:LHS | 86[86.5][0.003] | 208[207.5][0.001] | 294 |
| CM | 418 | 1002 | 1420 |
Discussion: The Pearson p-value for this classification table is 0.003, strongly suggesting that there are interesting departures from the expectations given the marginal information. These are easily identified by scanning Table 4 for C2CC-values > 1.0. These five cells are bolded in Table 4. Alert: recall, the inference context is for the Actual realizations relative to the Chi2-Expectations. This is NOT the same as the ToP calculation. For example, the LP:RHS for the RFE relative to the LP:RHS for the RPE is not identified as a relationship of inferential interest in the Chi2-context. However, the ToP-test for:
8.2% [53/652] v. 22.1% [144/652]
has a non-directional ToP p-value of < 0.0001. This illustrates that the same events examined with different inference tools give different results; thus, each test must be considered in its proper context.
Now considering the Chi2 context, we observe that the balance between the LP, MP and SP partitions is different between the RFE and RPE; the divergence is leveraged by the relative ratios for the RFE & RPE compared to the relative ratios overall for the Marginals [RM & CM]. Specifically, for the RFE there are usually more LHS events than expected and fewer RHS events than expected given the Marginal profiles of the RFE & RPE; and, of course, vice versa for the RPE. For example, for the RFE:LHS:LP, the Actual number of events was 54; the expectation was 43.86 [149 × (418/1420)]; and the C2CC was 2.34 [(54 − 43.86)² / 43.86]. The reason that this happens is that the relative ratio of 54 to 95 [36.2% v. 63.8%] is "much" different from the relative ratio of 418 to 1002 [29.4% v. 70.6%]. Recall, the Marginal relative ratios are used to allocate the expectations to the cells. Therefore, for the RFE:LHS:LP only 43.86 [149 × 29.4%] is allocated, as 29.4% is less than the actual relative ratio of 36.2%. Thus, as the RPE for the RHS is relatively dominant overall in percentage terms, usually less will be allocated to the RFE:LHS and more to the RFE:RHS; and, when these relative ratios are very different, the C2CC will be greater than 1.0. Given that the Chi2 Pearson p-value is 0.003, the summary indication is: there is strong evidence that (i) the populations of the LHSs and the RHSs of the RFE and RPE are not inferentially similar in proportion, and (ii) the propensity for the RHS to be greater in relative proportion compared to that of the LHS for the RPE compared to that of the RFE is the likely population state of nature.
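The Chi2 expectations and C2CC values discussed above can be recomputed from the Table 4 Actuals alone. A sketch (our own code, not the authors'):

```python
# Recompute the Chi2 expectations and Cell Contributions [C2CC] for Table 4
# from the Actual cell counts; marginals are derived, not re-entered.
actual = {
    "LP:RHS": (53, 144), "LP:LHS": (54, 95),
    "MP:RHS": (59, 172), "MP:LHS": (77, 119),
    "SP:RHS": (89, 264), "SP:LHS": (86, 208),
}
row_m = {k: sum(v) for k, v in actual.items()}                 # RM: 197, 149, ...
col_m = [sum(v[j] for v in actual.values()) for j in (0, 1)]   # CM: 418, 1002
total = sum(col_m)                                             # 1420

chi2_stat = 0.0
flagged = []   # cells with C2CC > 1.0, the Tamhane & Dunlop cutoff
for part, cells in actual.items():
    for j, obs in enumerate(cells):
        exp = row_m[part] * col_m[j] / total    # marginal allocation
        c2cc = (obs - exp) ** 2 / exp           # Chi2 Cell Contribution
        chi2_stat += c2cc
        if c2cc > 1.0:
            flagged.append((part, ("RFE", "RPE")[j], round(c2cc, 2)))

print(round(chi2_stat, 2))  # approx. 17.8 on 5 df; Pearson p approx. 0.003
print(flagged)              # the five bolded cells of Table 4
```

Running this reproduces the five flagged cells and the overall Pearson result reported in the Discussion.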
9.5 Panel Length
Factoring out the Smoothing v. Provoking results blocked by Panel length, by eliminating any RRP = 1, we observe an interesting result for the RRP-arm.
The classification grid for the Smoothing v. Provoking results by Panel length is presented in Table 5.
Table 5: Panel Length vis-à-vis Smoothing Propensity

| | Provoking | Smoothing | Total |
| LP | 28[11.9] | 173[0.9] | 201 |
| MP | 23[1.9] | 212[0.2] | 235 |
| SP | 5[15.0] | 322[1.2] | 327 |
| Total | 56 | 707 | 763 |
The numbers in the []s are the C2CC values. The Pearson p-value is < 0.0001. This Chi2-analysis suggests that the Smoothing percentages are inversely related to the Panel sizes: LP [86.1% [173/201]], MP [90.2%] & SP [98.5%].
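The Table 5 C2CC values and Smoothing percentages can likewise be recomputed from the counts alone. A sketch under the same assumptions (our own code):

```python
# Recompute Table 5's Smoothing percentages and Chi2 statistic from the counts.
rows = {"LP": (28, 173), "MP": (23, 212), "SP": (5, 322)}      # (Provoking, Smoothing)
col_tot = [sum(r[j] for r in rows.values()) for j in (0, 1)]   # 56, 707
grand = sum(col_tot)                                           # 763

smoothing_pct = {}
chi2_stat = 0.0
for size, (prov, smooth) in rows.items():
    n = prov + smooth
    smoothing_pct[size] = smooth / n
    for j, obs in enumerate((prov, smooth)):
        exp = n * col_tot[j] / grand            # marginal allocation
        chi2_stat += (obs - exp) ** 2 / exp     # C2CC, accumulated

print({k: f"{v:.1%}" for k, v in smoothing_pct.items()})  # LP 86.1%, MP 90.2%, SP 98.5%
print(round(chi2_stat, 1))  # approx. 31.1 on 2 df; Pearson p << 0.0001
```

The monotone ordering of the Smoothing percentages is the inverse length-relation the text describes.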
10. Summary Insights and Extensions
10.1 Summary Using the LHS/RHS screening-filter ±2.5%CZ, we have created a copious amount of information, all of which is reported in Table 2 and summarized in Table 3. Further, we have offered vetting tests and exploratory analyses re: the ANN-replacement issues at the heart of the study. Clearly, many more studies are enabled by the data-profiles offered in Table 2; thus, researchers can test relationships that they find interesting. However, as an overview, let us step aside from the statistical infrastructure that was the driver of most of the discussion and offer an en bref conversational summary:
The ANN-protocol had a dramatic directional & magnitude effect on the Forecast- and Precision-variables [RFE, RPE, RRP & PERF], almost all of which are in sync with the H&L results. Specifically,
Filtering the ANN-results using the ±2.5%CZ screen, Smoothing seems to be the tendency for the LP, MP & SP Panels for the RRP-variable; further, there is initial suggestive evidence that Smoothing dominance increases as the Panel size decreases: Table 5. This is original information and also a vetting of the H&L initial results,
Overall, a uniform balance between the LHS v. the RHS does not seem to be the case; there is confirmatory inferential evidence using Table 2 that the populating tendency for the LHS v. the RHS is "functionally" conditioned on the nature of the variables in play, and
The essential takeaway for this study: using the CZ-Impact screen, the ANN-modifications tested over the variables {RPE, RRP & PERF} are likely to have a dramatic directional & magnitude effect on the precision of forecasting CIs using Panel-data for firms traded on active exchanges. Of additional interest, regarding the ANN-impact on the one-period-ahead forecasts, the RFE does not seem to be biased re: the LHS v. the RHS. The support for this is that the overall percentage of CZ-events in the LHS was 51.9%, with a p-value of 43.4% against chance, a result clearly suggesting that the LHS & RHS balance-Null is the likely state of nature. These directional and magnitude ANN-impact results are interesting counterpoints to the conventional wisdom that simple outlier replacement protocols, such as the ANN, are inherently useful as they are in fact neutral relative to the Basic Panel. This research report shows that Robust-in-Utility for the ANN-protocol is NOT LIKELY to be the case for the variables tested.
10.2 Further Investigations The following are productive extensions of this research report:
One could examine larger Panels, n > 12, to evaluate the impact of ANN-replacements. A sample size of 12 for the OLSR-fit may be close to the practical or reasonable limit of a time series; see Adya & Lusk (2016), who suggest that n = 13 is at the frontier of a reasonable limit for a time series in the Rule-Based Forecasting context. We have reported that for smaller sample sizes there seems to be an increase in the Smoothing proportions. Perhaps the inverse may be the case: for large Panels, perhaps there is more of a balance between the Smoothing and Provoking effects,
It would be interesting to examine other basic replacement protocols, such as Median- or Average-replacements,
We used Excel [OLSR] to create the ANN-variable set. Another standard forecasting model is the Moving Average Model [MAM]: Excel[Data[Data Analysis[Moving Average]]]. It would be of interest to research the ANN-impacts on this model, and
We have taken up the analysis of the ANN-protocol, clearly the most basic replacement protocol. There are also expectation models that offer promise and could be investigated; these are called Missing Data Analyses. In this context, one proffers a missing-data generating function and then constructs a likelihood dataset, thus tacitly forming actual missing-data replacements. See Enders (2010).
Acknowledgments Appreciation is due to Mr. John Conners, Senior Vice President, Financial Counseling, West Coast Region, AYCO, for his generous philanthropy, which funded the establishment of the John and Diana Conners Finance Trading Lab at the State University of New York College at Plattsburgh and the Bloomberg Terminals that were instrumental in this research. Further thanks are due to Prof. Dr. H. Wright, Boston University, Department of Mathematics and Statistics, and the participants of the SUNY:SBE Workshop Series, in particular Prof. Dr. Kameliia Petrova, Dept. of Economics [Statistics], for their helpful comments and suggestions.
References
Adya, M., & Lusk, E. (2016). Time series complexity: The development and validation of a Rule-Based complexity scoring technique. Decision Support Systems. https://doi.org/10.1016/j.dss.2015.12.009
Aryal, U., & Khanal, K. (2013). Sharing the ideas of Meta-science to improve quality of research. KUMJ, 11, 75–77.
Benjamin, D., …, & Johnson, V. E. (2018). Redefine statistical significance. Nat. Hum. Behav., 2, 6–10.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Royal Statist. Soc. [B], 57, 289–300.
Enders, C. (2010). Applied missing data analysis. Guilford Publications, Ltd.
Fahey, L. (2019). Getting to insight: the value and use of small data. Strategy & Leadership, 47, 27–33. https://doi.org/10.1108/SL0320190034
Fisher, Sir R. A. (1925). Statistical methods for research workers. Oliver & Boyd.
Gaber, M., & Lusk, E. (2017). Analytical procedures phase of PCAOB audits: A note of caution in selecting the forecasting model. Journal of Applied Finance and Accounting, 4, 76–84.
Hanke, J., & Wichern, D. (2003). Business forecasting. Upper Saddle River, NJ: Pearson Prentice Hall.
Harman, H. (1960). Modern factor analysis. U. of Chicago Press. [Revision: ISBN-13: 9780226316529]
Heilig, F., & Lusk, E. (2020). Forecasting confidence intervals: Sensitivity respecting panel-data point-value replacement protocols. Accounting and Finance Studies, 3. http://dx.doi.org/10.22158/ijafs.v3n2p104
Kim, J., Ahmed, K., & Ji, P. (2018). Significance testing in accounting research: A critical evaluation based on evidence. Abacus, 54, 524–537. https://doi.org/10.1111/abac.12141
Rosenthal, R. (1984). Meta-analytic procedures for social research. Sage Publications (Applied Social Research, 1st edition, Series v. 6).
SAS®. (2005). Statistics and graphics guide: JMP[6]®. SAS Institute Inc. ISBN 1590478169.
SAS®. (2014). Econometrics and time series analyses 2 for JMP®. SAS Institute Inc.
Tamhane, A., & Dunlop, D. (2000). Statistics and data analysis. Prentice Hall, New Jersey, USA.
Tukey, J. (1977). Exploratory data analysis. Addison-Wesley. ISBN-13: 9780201076165.
Wang, H., & Chow, S.-C. (2007). Sample size calculation for comparing proportions. Test for equality: Wiley encyclopedia of clinical trials. https://onlinelibrary.wiley.com/doi/abs/10.1002/9780471462422.eoct005
Appendix
We randomly sampled 22 organizations from the Bloomberg Terminals in the John and Diana Conners Finance Trading Lab at the SUNY:SBE College at Plattsburgh. For each organization, we selected a Panel of yearly reported information from 2005 through 2016. This created three forecasting Panels: {LP(n=12), MP(n=9) & SP(n=6)}.
Table A1. Accrual Firms Tickers found on the BICS-Platform: Bloomberg Terminals

| 6758JP | ACN | AIR | AXE | BA | BAE | CVS | EFX |
| HSY | HUM | HYS | JBLU | LMT | LUV | RAD | ROK |
| SIE GR | SNA | SPGI | SWK | UTX | WBA | | |
i. https://www.merriamwebster.com/dictionary/outlier?src=searchdictbox Accessed: 15-Dec-2020
ii. https://support.sas.com/content/dam/SAS/support/en/books/freebooks/forecastingwithsas.pdf
iii. https://www.thehersheycompany.com/en_us/home.html. We took this information from the Bloomberg Market Navigation Platform [BBT]: https://www.bloomberg.com/quote/HSY:US.
iv. Recall, the precision relates to the 95% Forecasting Confidence Interval. Specifically, Precision is 50% of the width of the 95%CI, or [50% × [Upper Limit − Lower Limit]]. In the example in the OLS section, it was 857.2.
v. The Harman-cutoff is taken from Harman (1960) where, in the Factor Loading context, a Loading greater than the cutoff guarantees a unique variable-loading on a Factor. In addition, the square of a correlation coefficient is often used to calibrate the impact of a binary relationship in the Pearson Product Moment calculation domain. However, we are using this as an "approximation" of the binary-impact for the Spearman correlation.
vi. This is not a trivial issue, and it has been researched over the years. For those interested, we recommend Kim et al. (2018), who document the widespread fallacious indications reported in the commercial context. Additionally, an excellent and slightly more technical discussion is found in the articles published by a collective of concerned statistical scientists: lead author Benjamin et al. (2018); also see Benjamini & Hochberg (1995). As a final note, it is very beneficial to compute the sample size using parametrizations for both the FPE & the FNE because then, usually, it is only necessary to report the FPE and not an FNE derived from a judgmental α-test.
Available Online: https://iarconsortium.org/journalinfo/IARJBM