October 29, 2025

New Research is NOT Better Research

The belief that “newer research is better research” is a fallacy. Date of publication does not predict accuracy; however, it could promote recency bias.

Brent Brookbush

DPT, PT, MS, CPT, HMS, IMT

Why would new research be better? Is publication date correlated with accuracy?

Introduction

The belief that “newer research is better research” is a teaching failure, not a scholarly position. Date of publication does not predict accuracy. Factors that affect accuracy include methods, outcome measurement sensitivity, appropriate statistical analysis, transparent synthesis, and many other aspects of the scientific research process. If you believe that only research published in the last 5–10 years is acceptable, then you must tell us what massive paradigm shift occurred that rendered all prior research inferior. Seriously, what happened ten years ago that made anyone publishing before that date look like an idiot, and everyone who has published since look like an innovative genius?

Do you question the theory of relativity (1905), the synaptic model of the nervous system (1906), the structure of DNA (1953), or the sliding filament theory (1954)? If you published a paper tomorrow on any of these topics, without additional education, would you claim to be more accurate simply because your publication is more recent?

The Fallacy

Appeal to Novelty (argumentum ad novitatem) presumes an assertion is superior or correct solely because it is newer. Sometimes new work is better, but not because it is new. New research may be better when prior research is used to identify opportunities for improved methods, when new technology becomes available at a price that can be integrated into a study, or when more sensitive outcomes or larger samples allow more rigorous analysis. Although you might expect iterative improvement to be common, it is not. As discussed in “Research Then and Now” below, methods used over the last 50–70 years have been relatively similar.

Fallacy in Action

This bias is built into our systems. Teachers and professors assign papers, yet often accept citations only from the last 5 or 10 years. Instruction on using a research database often begins by setting a date filter. Capstone and thesis templates may require a fixed proportion of citations from the last five years. Textbook adoption policies privilege the “latest edition” over seminal sources, pushing instructors to mirror this bias in their syllabi. Unfortunately, many educators are not contributing to a potential solution, as few teach a valid method for appraising study methodology, and almost none address the fallacies of "appeal to novelty" or "recency bias."

Accreditation reinforces the pattern. Continuing-education approvers and accreditors often require references from the last 5 years for course approval (e.g., AOTA). Learning-objective templates may ask for “current best evidence,” but then operationalize “current” as a recent time window rather than a criterion that includes all available studies from the past to the present. It is surprising how few approvers and accreditors actively promote evidence-based practice, and how many of those that do rely on inherently flawed policies.

Social media completes the loop. Influencer content norms built around “New study says…” reframe recency as authority, critics dismiss opposing evidence as outdated, and the rampant promotion of cherry-picked studies fails to acknowledge the full body of research from past to present.

These practices are likely narrowing the acceptable body of literature for convenience rather than accuracy. It is obviously easier to review, aggregate, and build conclusions from fewer studies. But it is still nothing more than complacency. It's analogous to using “levels of evidence” hierarchies to dismiss strong observational studies simply because RCTs are labeled “best.”

Caption: Research quality is determined by methods, not publication date.

Recency Bias

Restricting evidence to the last three to five, or even ten, years does more than reduce the number of studies under review. It reshapes the sample toward what is fashionable at the moment. Funding calls, editorial priorities, media cycles, and product launches all concentrate attention on a narrow set of topics. Searches that are limited by date of publication harvest a trend sample, not a neutral sample, and the conclusions drawn from that slice often diverge from what the full body of research would show. (Further, a larger body of research often allows for more nuanced conclusions, which may be impossible to reach from a small or recent-only sample.)

This bias enters through several predictable channels. Funding agencies and journals favor themes that attract attention, which compresses near-term publications around the same idea. Industry cycles add momentum: new devices or branded methods spur sponsored or convenience studies that surge for a few years, then fade regardless of long-term value. Measurement practices drift as new tools make some outcomes easier to capture, reducing comparability with earlier work that assessed the same construct differently. Terminology also drifts; fresh labels and keywords replace older indexing terms, so date-restricted searches miss earlier, equivalent studies. Finally, citation behavior herds authors toward recently published studies to frame novelty, amplifying an illusion of consensus within short windows.

Note that not all of these channels are inherently negative in themselves. A surge in interest in a topic may lead to a surge in research; however, that surge increases vulnerability to recency bias. The result is distortion at the level that matters: inference. Trend-heavy samples overestimate effects that align with the current fashion and underestimate additional evidence that may refine conclusions and improve accuracy. They can manufacture an apparent consensus around a hypothesis that evaporates when all studies are included. In fast-moving cycles, adjacent three- to five-year windows can even flip the apparent direction of an effect, not because the underlying phenomenon changed, but because more data added nuance to conclusions.

Research Then and Now

After assembling more than 150 reviews, I can confidently say there is almost no difference in the average quality of published research between 1980 and now, and most research from the 1960s and 1970s used similar methodologies (although publication standards were slightly lower). Across fitness, athletic performance, and physical medicine, study protocols from the 1980s, 1990s, 2000s, and 2010s often look similar: comparable sampling, blinding when feasible, standard reliability statistics, and familiar outcome measures. New tools exist, but high-cost technologies are rarely used at scale, and statistical practice today is often identical, sometimes worse, due to the flood of poorly conceived systematic reviews. In summary, include studies based on their merits. Publication date is not a proxy for quality or accuracy.

Try this experiment: Based on the quoted snippets from study methodologies alone, can you put these studies in chronological order?

  1. “...Maneuvers used were forward bending, side bending left and right, and rotation left and right. Two evaluation sessions were held 13 days apart….The therapist performed under two conditions within each evaluation session: a "blind" and a "normal" condition…”
  2. “…The therapists pressed on the subject's spine and then the mechanical device, the task being to match the stiffness of the back to 1 of the 11 stimuli provided by the device…Interrater reliability was evaluated with 2-way analysis of variance (ANOVA) intraclass correlation coefficient for a single rating (ICC…”
  3. Each examiner identified the thoracic segment of maximal restriction, and also whether they were “very confident” or “not confident” in their finding. For all subjects combined, the examiners' calls were “poor”: intraclass correlation coefficient = .3110 (95% CI, .0458-.5358). In contrast, interexaminer agreement was “good” when both examiners were very confident: intraclass correlation coefficient = .8266 (95% CI, 0.6257-0.9253).
  4. "The mobilization technique studied was the central posterior to anterior (PA) joint mobilization of the L3 vertebra. Reliability and accuracy data for the reference standard were collected over four time periods spanning 16 weeks. Intrarater reliability of the expert physical therapist for R1 and R2 joint forces was…"

How did you do? The examples were in chronological order. All studies were well done and of equal quality. Ironically, only the oldest study used “blinding”, and the 2010 study (not the most recent) arguably used the most “sophisticated” statistical analysis.

  1. Gonnella, C., Paris, S. V., & Kutner, M. (1982). Reliability in evaluating passive intervertebral motion. Physical Therapy, 62(4), 436-444.
  2. Chiradejnant, A., Maher, C. G., & Latimer, J. (2003). Objective manual assessment of lumbar posteroanterior stiffness is now possible. Journal of Manipulative and Physiological Therapeutics, 26(1), 34-39.
  3. Cooperstein, R., Haneline, M., & Young, M. (2010). Interexaminer reliability of thoracic motion palpation using confidence ratings and continuous analysis. Journal of Chiropractic Medicine, 9(3), 99-106.
  4. Petersen, E. J., Thurmond, S. M., Shaw, C. A., Miller, K. N., Lee, T. W., & Koborsi, J. A. (2020). Reliability and accuracy of an expert physical therapist as a reference standard for a manual therapy joint mobilization trial. Journal of Manual & Manipulative Therapy, 1-7.

These citations are from our systematic review and course: Joint Mobilization and Manipulation: Palpation, Assessment, and Reliability

Caption: Don't believe "New research says...", instead believe "All the research in the field suggests..."

Common Objections, Brief Replies

  • “Your studies are outdated.” This is not a reasonable statement. Facts do not change. Many of our most important discoveries were made 100 or more years ago. Great research does not “go out of date.”
  • “Older studies had lower standards.” Sometimes, although you would need to go back at least 60 years before methods look meaningfully different; the larger issue is that publication criteria were less rigorous before about 1980.
  • “Methods have improved.” Sometimes, although this is rare in fitness, human performance, and physical medicine. Outcome measures, statistics, and designs have been fairly consistent.
  • “Science changes all the time.” No, it does not. Our understanding evolves slowly, usually becoming more nuanced as we accumulate evidence. The more evidence that has accumulated, the less likely conclusions are to change. After all, one new study is simply another data point added to a large body of evidence.
  • “Fields evolve.” Agreed. That is a better description of the scientific process than “changing all the time.” However, evolution usually does not change an idea from right to wrong; it adds context and nuance.
  • “A new study was published this year that says…” This is appeal to novelty. A better approach is to integrate that study into a review of all previous research. The hypothesis that accounts for all outcomes is likely the most accurate.

Practical Application

There are straightforward safeguards:

  • Start with the full time range unless a clear safety concern, retraction, or true obsolescence justifies restriction.
  • Subgroup by methodological congruence (population, comparator, outcomes, measurement) rather than by publication year. Only stratify by era when studies are very old (approximately 60 years or more) or when there is a documented discontinuity in definitions, diagnostic criteria, or measurement technology that compromises comparability.
  • Use sensitivity analyses for one purpose only: to detect trend-slice artifacts. Compare conclusions from the full corpus with a recent-only slice; if conclusions drift, treat that as evidence of trend bias, not time-based superiority (see the sketch after this list).
  • Be attentive to changes in vocabulary and definitions across eras. Before searching and coding, create a robust list of synonyms, legacy labels, and operational definitions so older studies are not missed.
  • Remember that data is just data; it is not trying to lie to you. Peer-reviewed and published studies have been evaluated by teams of well-educated individuals for decades. Your task is to develop conclusions that reflect all available evidence, not to select which research is “correct.” Data is data.
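As a concrete example of the sensitivity-analysis safeguard, the sketch below tallies the direction of effect across a full corpus and across a recent-only slice. The record structure, field names, years, and cutoff are hypothetical; the point is only that a date-restricted slice can suggest a different conclusion than the full body of evidence.

```python
from collections import Counter

# Hypothetical coded studies: each record holds a publication year and the
# direction of the between-group comparison ("A", "B", or "similar").
# The studies, years, and cutoff below are invented for illustration.
studies = [
    {"year": 1982, "direction": "similar"},
    {"year": 1995, "direction": "A"},
    {"year": 2003, "direction": "A"},
    {"year": 2010, "direction": "similar"},
    {"year": 2018, "direction": "A"},
    {"year": 2021, "direction": "B"},
    {"year": 2023, "direction": "B"},
]

def direction_counts(records):
    """Tally the direction of effect across a set of coded studies."""
    return Counter(r["direction"] for r in records)

full_corpus = direction_counts(studies)
recent_slice = direction_counts(r for r in studies if r["year"] >= 2020)

print("Full corpus:", dict(full_corpus))   # {'similar': 2, 'A': 3, 'B': 2}
print("Recent-only:", dict(recent_slice))  # {'B': 2}

# If the recent-only slice suggests a different conclusion than the full
# corpus, treat the drift as a possible trend-slice artifact rather than
# evidence that newer studies are more accurate.
```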

Brookbush Institute Systematic Review Guidelines:

Include all available peer-reviewed and published original research, regardless of publication date.

  • This principle may be considered the “anti-cherry-picking” guideline. Cherry-picking refers to the biased selection of research to support a predetermined assertion. However, any less-than-comprehensive approach, whether intentional or inadvertent, risks introducing selection bias. To minimize this bias and avoid the exclusion of conflicting data (confirmation bias), systematic reviews should include all relevant peer-reviewed and published original research, without restriction based on arbitrary quality ratings or oversimplified evidence hierarchies.

Review topics, not narrowly defined research questions.

  • Rather than starting with a predefined hypothesis, reviews should begin with a broad topic and allow conclusions to emerge from the full body of available evidence. This approach reduces hypothesis generation errors and confirmation bias by preventing early commitment to a specific claim that the researcher might subconsciously seek to validate.

Prioritize comparative research whenever available.

  • Intervention effectiveness is inherently relative. Comparative research is required to determine whether one intervention is more effective or reliable than another, and is essential for establishing best-practice recommendations. Additionally, comparative outcomes provide the data needed to refine probabilistic models used to optimize intervention selection. (See: ...Single Best Approach.)

Recognize that study design alone does not determine methodological rigor.

  • Most traditional “levels of evidence” hierarchies are flawed by the assumption that study design is the primary determinant of research quality. In reality, randomized controlled trials (RCTs), observational studies, and cohort studies may each be well-designed or poorly executed. Further, the study design should align with the research question. For example, RCTs are ideal for comparing acute effects between previously studied interventions, prospective cohort studies are useful for modeling risk over time, and retrospective observational studies are appropriate for assessing prevalence or historical patterns. A more defensible hierarchy considers the number and type of controls used (such as peer review, replication, blinding, and statistical analysis) rather than assuming intrinsic superiority of one study design over another. (See: Levels of Evidence.)

Apply a structured vote-counting method.

  • Vote counting synthesizes directional trends across studies and is less susceptible to the distortions that may arise in meta-analysis. It avoids the compounding of unknown confounding variables and reduces the interpretive errors that arise from combining heterogeneous datasets into a single aggregate statistic. Importantly, vote counting must follow a clearly defined rubric, such as the one used by the Brookbush Institute:
    • A is better than B in all studies → Choose A
    • A is better than B in most studies, and additional studies show similar results between A and B → Choose A
    • A is better than B in some studies, and most studies show similar results between A and B → Choose A (with reservations)
    • Some studies show A is better, some show similar results, and some show B is better → Results are likely similar (unless there is a clear moderator variable such as age, sex, or injury status that explains the divergence)
    • A and B show similar results in the gross majority of studies → Results are likely similar.
    • Some studies favor A, others favor B → Unless the number of studies overwhelmingly supports one side, results are likely similar.
  • This method avoids reliance on null-hypothesis significance testing across pooled data (i.e., meta-analysis) and instead identifies the most probable trend in the literature.
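To make the rubric concrete, here is a minimal sketch of how it might be expressed in code. It assumes each study's outcome has already been coded as "A", "B", or "similar"; the function name, coding scheme, and thresholds are illustrative assumptions, not the Institute's exact implementation.

```python
from collections import Counter

def vote_count(outcomes):
    """Apply a simplified, directional vote-counting rubric.

    `outcomes` is a list of per-study results coded as "A" (A better than B),
    "B" (B better than A), or "similar". For brevity, this sketch only
    evaluates the case for A; swap the labels to evaluate B.
    """
    tally = Counter(outcomes)
    n = len(outcomes)
    a, b, similar = tally["A"], tally["B"], tally["similar"]

    if n == 0:
        return "No studies to synthesize"
    if a == n:
        return "Choose A (A better than B in all studies)"
    if b == 0 and a > similar:
        return "Choose A (A better in most studies, remainder similar)"
    if b == 0 and 0 < a <= similar:
        return "Choose A, with reservations (A better in some studies, most similar)"
    if a > 0 and b > 0:
        return ("Results are likely similar, unless one side overwhelmingly "
                "dominates or a moderator variable explains the divergence")
    return "Results are likely similar"

# Example: 4 studies favor A, 3 find similar results, none favor B.
print(vote_count(["A", "A", "A", "A", "similar", "similar", "similar"]))
# -> Choose A (A better in most studies, remainder similar)
```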

Be cautious with meta-analyses (MA).

  • While meta-analyses can provide useful aggregate statistics, they should not be interpreted as inherently superior to trend-based synthesis (the methods of systematic review mentioned above). Potential problems with meta-analyses include aggregating effect sizes (averages of averages), which may obscure clinically meaningful patterns. Combining heterogeneous studies increases the risk of confounding. Failure to reject the null hypothesis in a meta-analysis may reflect regression to the mean or methodological flaws, rather than a true lack of difference. Meta-analyses should never be elevated above consistent trends demonstrated by direct, well-controlled, comparative research. For more on this topic, check out: Meta-analysis Problems: Why do so many imply that nothing works?
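To illustrate the “averages of averages” problem, here is a small numeric sketch. The effect sizes and subgroups are invented for illustration only; the point is that a single pooled estimate can obscure a clinically meaningful subgroup pattern.

```python
# Hypothetical effect sizes (standardized mean differences) from six studies:
# three in trained athletes, three in sedentary adults. Invented numbers.
athletes  = [0.62, 0.55, 0.48]    # intervention clearly helps this subgroup
sedentary = [-0.05, 0.02, -0.08]  # essentially no effect in this subgroup

pooled = athletes + sedentary
pooled_mean = sum(pooled) / len(pooled)

print(f"Pooled mean effect: {pooled_mean:.2f}")              # ~0.26, looks "small"
print(f"Athletes: {sum(athletes) / len(athletes):.2f}")      # ~0.55, meaningful
print(f"Sedentary: {sum(sedentary) / len(sedentary):.2f}")   # ~-0.04, negligible

# The single aggregate statistic (an average of averages) hides a pattern
# that a trend-based, subgrouped synthesis would surface.
```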

© 2025 Brent Brookbush (B2C Fitness, LLC d.b.a. Brookbush Institute)

Comments, critiques, and questions are welcome!
