ChatGPT vs Perplexity vs Gemini: Comprehensive Evaluation of Academic Research Capabilities

ChatGPT, Perplexity, and Gemini - these three popular AI products each offer a deep research capability, but with different emphases: some prioritize accuracy and depth, others speed and efficiency of output. Which is the best tool for academic research? This article answers that question through a systematic evaluation across six dimensions.

Introduction

Let's first look at the three major AI products selected for this evaluation, their deep research capabilities, and pricing models:

ChatGPT (OpenAI)

  • Base model: OpenAI o3
  • Methodology: Uses o3's advanced reasoning and multi-step research capabilities to break complex queries into steps, browse the internet automatically, and retrieve, analyze, and synthesize information from sources including text, images, and PDFs, ultimately generating a structured report.
  • Pricing: Pro subscription ($200/month) includes 120 deep research queries; Plus subscription ($20/month) includes 10. Team, Edu, and Enterprise subscriptions include the deep research feature, with specific quotas not published.

Perplexity

  • Base model: Proprietary, not publicly disclosed
  • Methodology: Performs multiple searches, reviews hundreds of sources, simulates expert-level research techniques, dynamically adjusts its search strategy, evaluates source reliability, and generates detailed, exportable reports.
  • Pricing: Free users get up to 5 deep research queries per day; Pro users get 500 per day.

Gemini (Google)

  • Base model: 2.0 Flash Thinking (experimental version)
  • Methodology: Relies on the 2.0 Flash Thinking model, which combines speed with strong performance, excels in science and mathematics, and exposes its reasoning on complex problems. It also integrates deeply with Google services (Docs, Sheets, Gmail) to process complex queries and generate detailed research reports.
  • Pricing: Deep research was initially available only to premium subscribers at $20/month and is now free to all users with monthly trial limits; paid users get additional features, including more requests and longer context windows.

01.

Evaluation Models and Experimental Design

Evaluation Framework

We established a professional assessment framework with quantitative analysis across six core dimensions, each scored on a five-point scale (see the scoring sketch after this list):

  1. Response capability: report length/generation speed
  2. Content structure: section organization/logical rigor
  3. Reference quality: source authority/data timeliness/citation transparency
  4. Analysis depth: critical thinking/perspective comparison/supporting cases
  5. Writing quality: academic language/presentation of technical descriptions
  6. Writing standards: APA execution/in-text citation compliance
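The six per-dimension scores sum to a total out of 30, which is how the totals in Section 03 are reached. Here is a minimal sketch of that roll-up in Python; the per-dimension values are placeholders, since this article only reports the content-structure scores and the final totals:

```python
# Minimal sketch of the scoring roll-up: six dimensions, each on a 1-5 scale,
# summed to a total out of 30. Values below are illustrative placeholders,
# except content_structure, which uses the score reported in Section 02.
DIMENSIONS = [
    "response_capability", "content_structure", "reference_quality",
    "analysis_depth", "writing_quality", "writing_standards",
]

def total_score(scores: dict[str, int]) -> int:
    """Validate one model's per-dimension scores and return the /30 total."""
    assert set(scores) == set(DIMENSIONS), "score every dimension exactly once"
    assert all(1 <= s <= 5 for s in scores.values()), "each dimension is scored 1-5"
    return sum(scores.values())

# Hypothetical example: filling the unreported dimensions with 4s happens to
# reproduce Perplexity's reported 25/30, but the true per-dimension breakdown
# is not published in this article.
perplexity = dict.fromkeys(DIMENSIONS, 4) | {"content_structure": 5}
print(f"Perplexity: {total_score(perplexity)}/30")  # -> Perplexity: 25/30
```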

Standardized Test Task

  • Task: Generate a comprehensive report on how deep learning is transforming financial markets
  • Requirements: Use critical analysis to extract insights, cite sources categorized by credibility (peer-reviewed articles, industry reports, etc.)
  • Data sources: Extract information from multiple formats, including datasets and visualizations, identifying and interpreting charts and graphs
  • Comparative analysis: Summarize key trends, compare different research methodologies, and highlight contradictions in findings
  • Format: Use APA citation format (including proper in-text citations), structured paragraphs, bullet points, and tables to present insights, maintaining a professional and formal tone

Prompt

Generate a comprehensive report on how deep learning is transforming financial markets. Use critical analysis to extract insights. Provide sources categorized by credibility (peer-reviewed, industry reports etc.). Extract insights from multiple formats, such as using data sources and data visualizations, ensuring that charts/graphs are identified and interpreted. Summarize key trends, compare methodologies across different studies, and highlight contradictions in findings. Use APA citation format, including properly formatted in-text citations. Present insights using structured paragraphs, bullet points, and tables. Provide a graphical summary if possible and maintain a professional and formal tone.
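The deep research modes under test run inside each product's chat interface, so these runs were started by hand. For repeatable side-by-side experiments, it can help to script plain model calls as a rough proxy; below is a minimal sketch using OpenAI's Python SDK, where the model name is an illustrative assumption and not the deep research pipeline itself:

```python
# Rough sketch: scripting the test prompt against an API-accessible model.
# This is only a proxy for comparison runs; the in-product "deep research"
# modes evaluated in this article are not exposed through this call, and the
# model name below is an illustrative placeholder.
from openai import OpenAI

PROMPT = (
    "Generate a comprehensive report on how deep learning is transforming "
    "financial markets. Use critical analysis to extract insights. ..."
    # (full prompt text as given above)
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative assumption, not the o3 deep research stack
    messages=[{"role": "user", "content": PROMPT}],
)
print(response.choices[0].message.content)
```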


02.

Test Results Comparison

We input this prompt simultaneously into all three models and initiated deep research.

After 30 minutes of synchronous testing, the three models showed significant differences across the six dimensions. Using a 5-point scale, we scored each model's performance on every dimension. Here are the detailed evaluation results.

1. Response Capability

  • ChatGPT's output was extremely detailed, with a total report of 23 pages, including approximately 21 pages of main content plus 2 pages of references
  • Perplexity's report was shorter, about 10 pages with 16 references at the end
  • Gemini's output fell between the two at roughly 22 pages in total: approximately 13 pages of text, with tables bringing the main body to about 17 pages, plus several pages of references

Perplexity and Gemini generated their reports much faster than ChatGPT; judging from the results, generation time appears roughly proportional to the length of the output.

2. Content Structure

  • ChatGPT (introduction → topic-based discussion → summary): Opens with an introduction, then discusses algorithmic trading and high-frequency trading using detection algorithms. The report uses themes and topics as headings, such as portfolio optimization and financial forecasting, exploring model outputs, market predictions, and so on under each topic, and concludes with trends, challenges, and regulatory perspectives.
  • Perplexity (industry-analysis paradigm, focused on trend overview): Titled "Deep Learning's Transformation of Financial Markets: A Comprehensive Analysis," it first introduces deep learning and financial markets (historical background and development), then presents deep learning methods and a comparative analysis of deep learning models (the strengths and limitations of different architectures), and concludes with predictions about emerging trends and future directions.
  • Gemini (application-scenario-oriented, highlighting technology implementation): Covers an introduction; the rise of deep learning in financial markets; deep learning for enhanced financial fraud detection; deep learning for financial risk management and prediction; revolutionized credit scoring; and the underlying deep learning technologies. It then analyzes the broader impact of deep learning on financial markets, concluding with a summary and predictions of future development.

Both Gemini and ChatGPT focused more on themes and topics, with Gemini providing some broader analysis. Perplexity responded most completely to the prompt with the best structure. Therefore, in terms of content structure performance: Perplexity (5 points) > Gemini (4 points) > ChatGPT (3 points).

3. Analysis Depth

  • ChatGPT: In the highly technical sections, it cited numerous empirical studies while maintaining a clear structure and steady topic progression. It noted that other studies have similarly found deep learning models to predict intraday patterns and quote imbalances more effectively, showing that it built on findings from the literature, and it described how two companies apply AI, handling the source material well.

ChatGPT clearly led the three models in analysis depth, a lead reflected in the comprehensive scoring below.

03.

Comprehensive Scoring and Conclusion

After comprehensive evaluation, the three major models each demonstrate distinct strengths in academic research capabilities:

ChatGPT

Total score: 23/30

Strengths: Strongest analysis depth, richest citations, most professional technical descriptions

Weaknesses: Slower generation speed, less clear structure

Suitable for: Professionals requiring in-depth academic research

Perplexity

Total score: 25/30

Strengths: Clearest structure, fast response speed, high citation transparency

Weaknesses: Slightly insufficient content depth

Suitable for: Users needing quick access to structured research reports

Gemini

Total score: 22/30

Strengths: Rich application scenario analysis, tight integration of technology and practice

Weaknesses: Citation standardization needs improvement

Suitable for: Researchers focused on practical technology applications

Final Recommendation

If your work is primarily academic research, we recommend choosing among the tools in the following order of priority:

  1. Perplexity: Clear structure, fast speed, standardized citations - the best choice for most research scenarios
  2. ChatGPT: The preferred choice when in-depth technical analysis and professional literature reviews are needed
  3. Gemini: Suitable for research requiring technology application cases and practical references

Regardless of which tool you choose, we recommend combining manual review and multi-source verification to ensure research quality and accuracy.