[Image: organised boxes filled with colour-coded folders and paper archives, showing a physical system for storing documents and research materials]

Scientific Documentation: Mindful Archives vs. Instant Access

July 1, 2025

The way I have stored and accessed data over the past 40 years has changed markedly. In the 1980s, I mostly managed ‘information’ in the form of numbered photocopies of book chapters and journal articles that I indexed in a small brown book (I still have it somewhere). Most of the data for my PhD was first kept in lab books and later on floppy disks, with capacities measured in kilobytes. It is now terrifying to think that all my crucial information was stored locally, difficult to duplicate, and vulnerable to loss. The 1990s introduced optical media like CD-ROMs, which meant that you no longer had to search the literature using Index Medicus [1]. Hard drives were larger (no longer 5–10 MB), but I still had an ever-growing number of photocopies (indexed in that same book) and my data remained in my lab books, because hard drives were notoriously fragile.

The early 2000s marked the rise of very large hard drives and broadband connections. Search engines like Google transformed how we accessed knowledge. Personal information management began shifting toward online retrieval rather than offline archiving. By the 2010s, solid-state drives and mobile devices offered rapid, portable storage, but cloud services became dominant for both personal and business data. Enterprise systems embraced Big Data, managing petabytes of structured and unstructured information. In the 2020s, information storage has shifted largely to cloud-native systems. While this offers immediacy and constant updates, it also raises concerns about data privacy, knowledge permanence, and cognitive offloading.

The fact that the storage of information and data evolved alongside the digitisation of our lives reflects not only technological progress but also fundamental shifts in how we perceive information. The rise of search engines and cloud-based services has made it possible to access information almost instantaneously.

Data Deluge and Cognitive Offloading

Modern professionals contend with unprecedented volumes of information. In 2011, IDC estimated that the digital universe was doubling every 2 years [2], a trend that has continued unabated. My own collection of information has also grown exponentially. It now consists of photocopies, photographs, pages torn out of magazines and a host of information in electronic formats maintained on external hard drives and online. My book is obsolete (but many boxes of paper remain; see above). Many people are like me in that they externalize memory and knowledge management to digital devices and systems. This is termed “cognitive offloading” because it reduces cognitive load, allowing individuals to focus on problem-solving rather than memorization [3]. And yet, many professionals, researchers, and knowledge workers (particularly from my era) maintain personal offline archives (folders of articles, images, datasets, and notes), meticulously organized by subject to support their intellectual projects.

The process of managing your information was (temporarily) made a lot easier by the introduction of Google Desktop (sadly discontinued in 2011) [4], a computer program with desktop search capabilities that allowed text searches of a user's email messages, computer files, music, photos, chats, web pages and more. Things may have turned out very differently if Google had continued to develop this product. Imagine how adding a large language model agent to Google Desktop would have changed how you might interact with your self-curated library of information. How and where information is stored affects not just access but also the ways people think, reason, and make decisions. The advent of online search engines like Google introduced a form of distributed cognition, where access to external information becomes an extension of human memory and reasoning [5]. Although online searches provide speed and breadth, I find that they often fracture the continuity of my thought processes. Offline, subject-sorted storage systems, by contrast, reflect my conceptual model of topics and how they interrelate, preserving a consistent intellectual framework that I have revisited and expanded over the years.
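For readers curious what a bare-bones desktop text search over a subject-sorted archive might look like, here is a minimal sketch in Python. It is not how Google Desktop worked (that product built an index and handled many file types); this simply scans plain-text files line by line. The function name `search_archive` and the default file suffixes are assumptions made for illustration.

```python
from pathlib import Path


def search_archive(root, term, suffixes=(".txt", ".md")):
    """Return (path, line_number, line) for every line containing `term`.

    Hypothetical helper: only plain-text files with the given suffixes are
    scanned, and matching is case-insensitive.
    """
    term = term.lower()
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and file types we cannot read as plain text.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if term in line.lower():
                hits.append((path, number, line.strip()))
    return hits
```

Because the search walks the folder tree you built yourself, every hit arrives with its subject folder attached (the path), which is the context an online search result usually lacks.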

Offline and Sorted

Maintaining personal archives of digital information in offline directories organised by subject works for me and offers several distinct advantages. First, it promotes alignment between my thought process and the way I structure information. Scholars and strategists alike appear to develop conceptual taxonomies unique to their fields of interest. Storing information in a manner that reflects these frameworks supports deeper engagement with the material and fosters the development of cohesive, internally consistent models of knowledge [6].

Offline storage enhances source integrity and locks the information ‘as it was’ when I first absorbed it. By archiving original articles, datasets, and images, it is possible to verify information provenance and track intellectual debts. In the context of academic publishing or regulated industries like pharmaceuticals, this can be critical for ensuring compliance with ethical and legal standards [7]. I also find that my curated offline archive reduces my exposure to irrelevant, distracting, or contradictory information when I am working on a specific subject. Online searches (when not using AI) will often yield tangentially related content or commercial results prioritised by someone else’s algorithms, which diverts attention and fragments my cognitive effort [8]. In contrast, subject-sorted directories contain only the materials I deemed valuable, fostering a more coherent and distraction-free intellectual environment. My information archive also offers a certain longevity and autonomy: it remains accessible independent of internet connectivity, subscription services, or changes to external platforms. This is particularly valuable when working with sensitive or proprietary data.

Nevertheless, offline storage has disadvantages. Chief among them is the risk of information obsolescence. Without regular updates, personal archives can become repositories of outdated knowledge, especially in fast-moving fields like science and medicine [9]. Organizing, updating (and deleting), and backing up your archives requires time, discipline, and technical literacy. Hardware failure, theft, or human error can lead to data loss unless you maintain robust backup protocols [10]. And this is the main reason that I have been moving my archive online.
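One small part of a backup protocol can be sketched in code: checking that a backup copy still matches the archive. The Python sketch below fingerprints every file with SHA-256 and reports files that are missing from the backup or whose contents differ. The function names `build_manifest` and `compare_manifests` are hypothetical, and the sketch assumes both copies are reachable as ordinary folders.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for documents, but very
            # large files would be better hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(archive, backup):
    """Return (missing, changed): files absent from the backup, and files
    present in both copies whose digests differ."""
    missing = sorted(set(archive) - set(backup))
    changed = sorted(
        name for name in archive
        if name in backup and archive[name] != backup[name]
    )
    return missing, changed
```

Running `compare_manifests(build_manifest(archive_dir), build_manifest(backup_dir))` after each backup gives a quick list of anything that failed to copy or has silently changed. It does not replace keeping more than one backup.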

Search Speed and Serendipity

Online search engines offer compelling advantages that have transformed the way people engage with information. The immediacy of search results enables rapid access to the most recent, comprehensive, and globally sourced data. This facilitates intellectual agility, allowing knowledge workers to update their understanding in real time. It also increases the potential for serendipitous discovery [11]. Unlike curated offline archives, online searches expose you to unexpected perspectives, adjacent fields, and novel methodologies. Such chance encounters can spark innovation by challenging established assumptions and expanding your intellectual horizons [12].

Reliance on online search reduces storage management overheads: you don’t need to classify, back up, or curate large numbers of information sources, and you retrieve only what you need when you need it. However, search results are mediated by algorithms that shape the information they provide based on commercial interests, personalisation, and opaque relevance criteria [13]. This skews your information landscape, privileging certain viewpoints while marginalising others. Moreover, it can sometimes be difficult to trace the provenance and reliability of online content, raising challenges for evidence-based decision-making [14]. Let’s not even get started on the potentially confounding impact of artificial intelligence [15].

I have found that the fragmented nature of online information retrieval, particularly through Google or PubMed searches, disrupts the continuity of my thought processes. Each search generates a fresh set of results, often disconnected from my existing mental model. The cognitive reshuffle this requires tends to impede the development of coherent, long-term intellectual projects, with the context-switching and information filtering increasing the cognitive load [16]. I am also a little worried that reliance on online search engines fosters what I might call digital transience. For example, people who believe information is readily accessible online are less likely to remember the content itself, relying instead on knowledge of where to find it [16]. I am sure that, like me, you also have a host of ‘Bookmarks’ in your browser. While this can free cognitive resources for higher-order reasoning, it also risks superficial engagement with complex topics, and bookmarks don’t stay valid forever. This is one of the criticisms levelled at the use of AI to create outputs [17].

Cognitive and Epistemic Implications

The choice between offline, subject-sorted storage and online searching carries significant cognitive and epistemic implications. Offline archives support knowledge consistency and epistemic coherence by preserving the structure of an individual’s conceptual framework. This aligns with cognitive load theory, which posits that reducing extraneous cognitive processing enhances learning and problem-solving [18]. Conversely, online searches prioritise information currency and epistemic diversity, facilitating exposure to new data but potentially disrupting mental continuity. Although this may increase cognitive flexibility and adaptive reasoning, it can also dilute the integrity of long-term intellectual projects.

In moving my files to online storage I believe that I am adopting a hybrid approach to knowledge management: I keep the information storage structure I have devised over the last 40 years, while benefiting from cloud-based resilience that ensures data availability even in the event of device failure. Collaborative tools facilitate information sharing across geographically dispersed teams. Sadly, I have yet to adopt tools such as reference managers (e.g., Zotero, Mendeley) and personal knowledge management systems (e.g., Obsidian, Notion) that can integrate both approaches, combining the stability of offline archives with the dynamism of online discovery. This old dog has not been able to learn these new tricks (yet?) [20]. As I continually review the scientific landscape, I add ‘interesting’ articles, photos and web pages to my electronic library every day.

Conclusion

Over the past decade, the volume of data created and consumed globally has grown at an extraordinary pace. In 2016, the total amount of data generated worldwide was estimated at around 12 zettabytes (ZB). By 2020, this figure had risen to 64.2 ZB, a more than fivefold increase in just 4 years. This trend has continued, with global data creation reaching an estimated 147 ZB in 2024, and projections suggesting it will surpass 181 ZB by 2025 [21]. A widely cited statistic from IBM Marketing Cloud noted that 90% of the world’s data had been created in the preceding 2 years [22], and enterprise cloud storage alone is expected to grow from 10 ZB in 2023 to 20 ZB by 2027 [23].
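The growth figures quoted above can be turned into annual growth rates and doubling times with a few lines of arithmetic. The Python sketch below assumes smooth compound growth between two data points, which is a simplification, and treats 2016 to 2020 as a four-year interval. The function name `growth_summary` is hypothetical.

```python
import math


def growth_summary(start_value, end_value, years):
    """Return (fold_increase, annual_growth_rate, doubling_time_in_years),
    assuming constant compound growth between the two values."""
    fold = end_value / start_value
    # Compound annual growth rate: the yearly factor that, applied `years`
    # times, turns start_value into end_value.
    annual_rate = fold ** (1 / years) - 1
    # Time needed to double at that constant rate.
    doubling_time = math.log(2) / math.log(1 + annual_rate)
    return fold, annual_rate, doubling_time


# Figures quoted in the text, in zettabytes.
fold, rate, doubling = growth_summary(12, 64.2, 4)
print(f"2016-2020: {fold:.1f}-fold, {rate:.0%} per year, doubling every {doubling:.1f} years")
```

On these figures the doubling time for 2016 to 2020 comes out at about 1.7 years, a little faster than the two-year doubling IDC estimated in 2011 [2]. Calling `growth_summary(64.2, 147, 4)` makes the same calculation for 2020 to 2024.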

In navigating the information-rich environment of the digital age, we need to balance the advantages of curated, subject-sorted offline archives with the immediacy and breadth of online search engines. Structured self-storage fosters thought process alignment, source integrity, and intellectual continuity, while online search delivers current, diverse information and serendipitous discovery [11]. Each method entails cognitive, practical, and epistemic trade-offs. No single approach is universally optimal, particularly as the size of your collection grows, diluting the amount you can genuinely engage with any single source. Here I believe I can give you one valuable piece of advice in the form of a combination of quotes:

Life is too short for too much stuff.

Clutter is nothing more than postponed decisions.

Downsizing our lives is not a step back but a leap forward.

References

  1. Hardman, TC. (2018). Do you remember Index Medicus? Ramblings of an old fool. https://www.linkedin.com/pulse/do-you-remember-index-medicus-ramblings-old-fool-tim-hardman/
  2. Gantz, J, Reinsel, D. (2011). Extracting Value from Chaos. IDC.
  3. Risko, EF, Gilbert, SJ. (2016). Cognitive Offloading. Trends Cogn Sci 20(9), 676–688.
  4. Wikipedia. Google Desktop. https://en.wikipedia.org/wiki/Google_Desktop
  5. Sparrow, B, et al. (2011). Google Effects on Memory: Cognitive Consequences of Having Information at Our Fingertips. Science, 333(6043), 776–778.
  6. Apostel, L. (1972). Interdisciplinarity: Problems of Teaching and Research in Universities. OECD.
  7. Committee on Publication Ethics. (2021). Guidelines on Good Publication Practice. https://publicationethics.org
  8. Carr, N. (2010). The Shallows: What the Internet Is Doing to Our Brains. W. W. Norton & Company.
  9. Nickerson, RS. (1998). Confirmation Bias: A Ubiquitous Phenomenon in Many Guises. Rev Gen Psychol 2(2), 175–220.
  10. Kissel, R, et al. (2007). Guidelines for Media Sanitization. NIST Special Publication 800-88.
  11. Hardman, TC. (2025). Artificial Intelligence vs. PubMed. https://www.linkedin.com/pulse/artificial-intelligence-vs-pubmed-tim-hardman-trowe/
  12. Merton, RK, Barber, E. (2004). The Travels and Adventures of Serendipity. Princeton University Press.
  13. Pariser, E. (2011). The Filter Bubble: What the Internet Is Hiding from You. Penguin.
  14. Lewandowsky, S., et al. (2017). Beyond Misinformation: Understanding and Coping with the “Post-Truth” Era. J Appl Res Mem Cogn 6(4), 353–369.
  15. Peterson, AJ. (2025). AI and the Problem of Knowledge Collapse. AI and Society. arXiv:2404.03502
  16. Ophir, E, et al. (2009). Cognitive Control in Media Multitaskers. Proc Nat Acad Sci 106(37), 15583–15587.
  17. Gong, C, Yang, Y. (2024). Google effects on memory: a meta-analytical review of the media effects of intensive Internet search behavior. Front Public Health 12.
  18. Chandler, P, Sweller, J. (1992). The split-attention effect as a factor in the design of instruction. Br J Educ Psychol 62, 233–246.
  19. Sweller, J. (1988). Cognitive Load During Problem Solving: Effects on Learning. Cogn Sci 12(2), 257–285.
  20. Hardman, TC. (2025). The curmudgeon conundrum. https://www.linkedin.com/pulse/curmudgeon-conundrum-tim-hardman-sdv1e/
  21. Statista (2024). Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2023, with forecasts from 2024 to 2028. https://joingenius.com/statistics/data-generated-per-day/
  22. Duarte, F. (2025). Amount of Data Created Daily (2025). https://explodingtopics.com/blog/data-generated-per-day
  23. Telecoms. (2024). Report finds hyperscale data centre capacity doubling every four years. https://www.telecoms.com/public-cloud/report-finds-hyperscale-data-centre-capacity-doubling-every-four-years

About the author

Tim Hardman
Managing Director
The Managing Director of Niche Science & Technology Ltd., a 30+ person bespoke services CRO based in the UK, Dr Tim Hardman founded the company in 1998. With over 40 years of experience in clinical research, Dr Hardman is highly regarded for his expertise in translational science, clinical pharmacology, and the strategic design and implementation of clinical studies. Dr Hardman began his career with a solid foundation in pharmacology, earning his doctorate in the field and gaining early experience in academic and clinical research settings. His career path saw him working in the field of regulatory science, where he developed a deep understanding of clinical trial design, data interpretation, and regulatory requirements across various therapeutic areas. Dr Hardman’s expertise spans early-phase studies, first-in-human trials, and advanced regulatory submissions, helping numerous clients bring innovative therapies from concept to clinical reality.

© 2025 Niche.org.uk     All rights reserved