Getting started with AI for Researchers
The AI Competency Centre can help Oxford researchers use AI to unlock their research potential.
What is the role of generative AI in research?
Generative AI is distinguished from other forms of Artificial Intelligence by its reliance on Large Language Models (LLMs): general semantic engines that can produce and transform text based on its semantic properties. It is represented by general-purpose tools such as ChatGPT and Claude, and by specialised tools such as Elicit and Cursor.
Generative AI can play three roles in the research process:
- Productivity and ideation assistant (personal companion)
- Research tool (from text analysis to coding)
- Research subject (understanding the behaviours of Large Language Models in particular contexts)
Examples of the uses researchers can put generative AI to include:
Productivity Assistant
- Grant writing and proposal development
- Literature review summarisation and synthesis
- Email drafting and administrative communication
- Converting between different document formats and styles
- Ideation and brainstorming for research directions
Research Tool
- Qualitative data analysis and thematic coding
- Text analysis across medium-sized corpora
- Programming and code development
- Converting between different data modalities (text to structured data, images to text descriptions)
- Multi-language transcription and translation
Research Subject
- Evaluating AI system performance and capabilities
- User experience studies with AI interfaces
- Disciplinary impact assessment of AI technologies
- Developing evaluation frameworks for AI outputs
Tools
Tool Categories
Personal Ideation and Productivity Assistants: Tools for general-purpose thinking and writing support, such as chatbots (for instance, ChatGPT, Gemini or Claude) or an LLM-powered notetaking tool such as NotebookLM.
Specialized Research Tools: Advanced capabilities for literature analysis, coding, and custom workflows. These may include a literature review tool like Elicit or a dedicated data analysis tool such as Archive Studio. Many of these tools have to be developed specifically for individual research projects.
University-Supported Platforms
ChatGPT Edu: This is the University's version of ChatGPT with data protection. It includes the new reasoning models (GPT-5 Thinking) that can serve as an ideal ideation partner. You can build custom GPTs for specific research tasks. ChatGPT also contains a feature called Advanced Data Analysis which uses a Code Interpreter (a virtual machine inside a chat that can run Python code). This means you can upload a data set and ChatGPT will be able to analyse it by writing and running appropriate Python code.
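To illustrate, the code the Code Interpreter writes and runs behind the scenes typically looks something like the following sketch (the file and column names here are hypothetical):

```python
import pandas as pd

# Load the uploaded dataset (hypothetical file name)
df = pd.read_csv("survey_responses.csv")

# Profile the data: dimensions, column types, missing values
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# Descriptive statistics and a simple group comparison (hypothetical columns)
print(df.describe())
print(df.groupby("condition")["score"].agg(["mean", "std", "count"]))
```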
Google Gemini: The main strength of Gemini is its models, which can take audio, video, and images alongside text and can work with massive amounts of context: up to a million tokens (roughly 750,000 words of English text) at once. This means you can ask for insights on a large corpus of data, multiple video transcripts, or even video and audio recordings.
Claude: Demonstrates strong performance in extended analytical tasks, particularly effective for identifying patterns and inconsistencies across large textual corpora. Note that Claude lacks the enterprise data protection agreements available through University-supported tools.
Google AI Studio: This is a platform for developers to test Google's models, but it also provides free access to Google's best models with full control over how they work. If you want to experiment with what is currently possible without paying anything, this is the best option available.
Microsoft 365 Copilot: The main strength of Copilot is its integration with Exchange, Word, Excel, PowerPoint and other Office apps.
Specialised Research Applications
Elicit: Built specifically for assisting with literature reviews. Elicit accesses academic literature (either via scholarly databases or using files uploaded by researchers).
NotebookLM is another tool in the Google LLM family of products. It allows you to collect a large number of documents and interrogate them. You can combine your notes with notes generated by the Gemini Large Language Model.
Archive Studio is an example of a narrowly specialised tool powered by Large Language Models. It is designed to process archival data through various workflows.
Coding Tools
Many people use chatbots to write computer code but there are also many specialised tools that make it easier to manage an entire code base.
GitHub Copilot: An extension to VS Code that can autocomplete and generate code, or even write an entire application from scratch. GitHub Copilot is a paid service, but educators and students can apply for a free tier.
Cursor: An LLM-powered integrated code editor (a clone of VS Code) that many professional programmers prefer to GitHub Copilot.
Google Colab: Google’s hosted implementation of Jupyter Notebooks now also includes AI-assisted code generation powered by the Gemini model. It can write or explain code in the notebook or create entire notebooks. Colab has a very generous free tier.
Planning for using generative AI in research projects
The needs of individual research projects are often too extensive and specialised to be covered by the centrally provided tools. We increasingly recommend that researchers consider building AI costs into project funding:
- API access budgets (£5,000-£10,000 for substantial projects)
- Custom orchestration development costs
- Fine-tuning budgets for specialised model adaptation
- Short-term subscriptions for tools needed intensively (for example, for three months) rather than continuously
The University's OpenAI API access programme can cover some of the costs for early stages of custom development but not for extensive projects.
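When sizing an API budget, a back-of-the-envelope token calculation is a useful starting point. The prices and volumes below are hypothetical placeholders; always check current provider pricing:

```python
# Hypothetical prices in GBP per million tokens - check current provider rates
PRICE_INPUT_PER_M = 2.00
PRICE_OUTPUT_PER_M = 8.00

docs = 10_000                  # documents to process
input_tokens_per_doc = 3_000   # source text plus instructions
output_tokens_per_doc = 500    # e.g. a structured summary

input_cost = docs * input_tokens_per_doc / 1_000_000 * PRICE_INPUT_PER_M
output_cost = docs * output_tokens_per_doc / 1_000_000 * PRICE_OUTPUT_PER_M
print(f"Single-pass estimate: £{input_cost + output_cost:,.2f}")
# Development iterations, repeated passes, and larger models typically
# multiply a single-pass figure like this many times over.
```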
Tips
Understanding AI Capabilities and Limitations
The Semantic Baseline: Modern large language models can be thought of as primarily semantic information processors. They can identify themes, extract structured information from unstructured text, and convert between different representational formats with remarkable accuracy.
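For instance, a single model call can turn unstructured prose into structured data. The sketch below uses the OpenAI Python SDK; the model name, passage, and output fields are illustrative assumptions:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

passage = "The interviewee noted that funding pressures shaped the 2019 fieldwork in Nairobi."

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whichever model you have access to
    temperature=0,
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": 'Extract {"themes": [], "locations": [], "years": []} '
                   "as JSON from this passage:\n" + passage,
    }],
)
print(json.loads(resp.choices[0].message.content))
```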
Context is Everything: LLMs process information differently from humans. Where researchers rely on accumulated knowledge and mental schemas, AI models work entirely from textual context provided in each interaction. Understanding this distinction is crucial for effective prompt design.
Inference vs. Orchestration: Advanced AI capabilities result from two components:
- Inference: The model's core reasoning abilities
- Orchestration: How applications structure and sequence model interactions
Chatbots and tools like Elicit are themselves orchestrations of Large Language Models, but researchers often need to design custom orchestrations to fit their own workflows.
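As an illustration of the distinction, orchestration can be as simple as a script that sequences inference calls: summarise each document first, then synthesise across the summaries. A minimal sketch using the OpenAI Python SDK (the model name and prompts are placeholder assumptions, not a prescribed design):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o-mini"  # placeholder model name

def ask(prompt: str) -> str:
    """One inference call; orchestration is how we sequence these calls."""
    resp = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

documents = ["...", "..."]  # your source texts

# Step 1: per-document inference
summaries = [ask(f"Summarise the key claims in five bullet points:\n{d}")
             for d in documents]

# Step 2: cross-document inference over the intermediate outputs
print(ask("Compare these summaries and list points of agreement and disagreement:\n\n"
          + "\n\n".join(summaries)))
```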
Context Engineering: The Core Challenge
Understanding Context Limitations: Large language models can only generate text based on the context provided in each interaction. Unlike humans who draw upon accumulated knowledge and mental schemas, language models work entirely from textual input within their context window.
The Context Window Challenge: While some models can now process up to a million tokens of context, researchers must strategically engineer what information to include. This represents the primary technical challenge in applying language models to research contexts.
Context Compression Strategies: When working with extensive research materials, consider creating structured intermediary representations (a short sketch follows this list):
- Standardised document summaries using consistent formats
- Systematic extraction of key claims and evidence
- Normalised abstracts that facilitate cross-document analysis
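To make this concrete, here is a minimal sketch of context compression using the OpenAI Python SDK; the summary template, field headings, and model name are illustrative assumptions rather than a prescribed format:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SUMMARY_TEMPLATE = """Summarise the document below using exactly this structure:
RESEARCH QUESTION: <one sentence>
METHOD: <one sentence>
KEY CLAIMS: <up to three bullet points>
EVIDENCE: <up to three bullet points>

Document:
{document}"""

def compress(document: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0,
        messages=[{"role": "user",
                   "content": SUMMARY_TEMPLATE.format(document=document)}],
    )
    return resp.choices[0].message.content

documents = ["...", "..."]  # your full-text sources

# Each summary is a few hundred tokens, so hundreds of them fit in one
# context window where the raw documents would not.
compressed_corpus = "\n\n---\n\n".join(compress(d) for d in documents)
```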
Methodological Considerations
Avoiding Completeness Illusion: Large language models are capable of sophisticated semantic understanding, but they are not systematically exhaustive. For exhaustive analysis (counting instances, finding all examples), use structured approaches rather than relying on single-pass generation.
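One such structured approach is to have the model make one small, checkable judgement per unit of text and to do the actual counting deterministically in code. A minimal sketch (the model name, corpus, and classification question are illustrative assumptions):

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify(sentence: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0,
        messages=[{"role": "user",
                   "content": "Answer YES or NO only: does this sentence "
                              "mention research funding?\n" + sentence}],
    )
    return resp.choices[0].message.content.strip().upper()

sentences = ["...", "..."]  # your corpus, split into units beforehand

# Exhaustiveness comes from iterating over every unit in code;
# the model only ever makes one small, checkable judgement at a time.
counts = Counter(classify(s) for s in sentences)
print(counts["YES"], "of", len(sentences), "sentences mention funding")
```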
Reproducibility and evaluation
Indeterminacy: Large Language Models do not reliably produce the same output from the same prompt. This makes it important to keep a record of past interactions.
Replicability: When reporting on use of Large Language Models in research, it is important to report on:
- The exact model version used
- The orchestration used (chatbot, custom application)
- The prompts used
- Dates of the interaction (this is particularly important when conducting experiments using chatbots such as ChatGPT or Claude, as providers often release new versions of models).
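A lightweight way to keep such records is to log every model call together with its parameters and date. Below is a minimal sketch using the OpenAI Python SDK; the log file name and record fields are assumptions, and note that even a fixed temperature and seed do not guarantee identical outputs:

```python
import json
from datetime import datetime, timezone
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def logged_call(prompt: str, model: str = "gpt-4o-mini") -> str:
    resp = client.chat.completions.create(
        model=model,
        temperature=0,  # reduces, but does not eliminate, variation
        seed=42,        # best-effort determinism where the provider supports it
        messages=[{"role": "user", "content": prompt}],
    )
    record = {
        "date": datetime.now(timezone.utc).isoformat(),
        "model": resp.model,  # the exact model version the provider served
        "orchestration": "custom script",
        "prompt": prompt,
        "output": resp.choices[0].message.content,
    }
    with open("llm_interactions.jsonl", "a") as f:  # hypothetical log file
        f.write(json.dumps(record) + "\n")
    return record["output"]
```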
Support from AI Competency Centre
Direct Consultation: Request project-specific guidance through our Expression of Interest form. Our research software engineers and AI consultants provide technical guidance for implementing AI in research workflows.
Training Programmes: Attend our workshops designed specifically for research contexts. View upcoming training sessions or request custom training for your research group.
Community Engagement: Join the AI Builders User Group (BUG) for technical discussions about building AI-powered research tools, or the Generative AI Special Interest Group for broader discussions about AI in academic contexts.
Additional Resources
Tool Onboarding: Access our comprehensive onboarding guides for ChatGPT Edu, Google Gemini, and Microsoft 365 Copilot to ensure optimal configuration for research use.
Beginner Foundation: If you're new to AI, start with our Generative AI for Beginners guide before diving into research-specific applications.
Funding and Collaboration: Explore opportunities through initiatives like the AI Teaching and Learning Exploratory Fund for research projects investigating AI in academic contexts.
Getting Started Recommendations
- Begin with low-stakes experimentation: Use AI for preliminary literature reviews or grant application drafts before moving to core analytical work.
- Develop your evaluation intuition: Spend focused time with frontier AI models to understand their capabilities and limitations in your specific research domain.
- Document your methodology: Maintain clear records of how you used language models or other AI tools as part of the research process - especially when used as analytic tools.
- Engage with the community: Share experiences and learn from colleagues through our online communities and training programmes.
The key to successful AI integration in research lies not in replacing scholarly judgement, but in augmenting analytical capabilities while maintaining rigorous methodological standards.
Policy & Guidance
The University of Oxford wishes to enable and support the safe and productive use of GenAI by the Oxford research community. The University has laid out a policy for the use of GenAI in research to ensure its responsible use and to provide clear guidelines for its integration into the research process.
Users are advised to interpret the use of GenAI in the context of research practice standards on transparency, rigour and respect; data protection, intellectual property and export control legislation; and information security and compliance guidance.
Researchers should refer to the University's Policy for using Generative AI in Research when considering the use of a GenAI tool in their work.