Dev Log: Cursor-Inspired ArXiv Research Tool

I’ve been trying to get back into reading research papers, but it’s been years since my linear algebra classes. Every paper feels like it’s written in a foreign language. So I built a tool to help me (and hopefully others) understand them better.

The Problem

ArXiv papers are dense. You’re reading along, following the introduction, then BAM - you hit a wall of equations and terminology that assumes you remember your graduate-level math.

The traditional approach: Open 15 Wikipedia tabs, lose context, forget why you were reading the paper in the first place.

The Inspiration

I’ve been using Cursor for coding, and their CMD+K feature is brilliant. Select some code, hit the shortcut, and get an AI explanation in context.

Why not do the same for research papers?

Synthetic Trail

That’s the placeholder name for now. Here’s how it works:

  1. Find an ArXiv paper you want to read
  2. Open the HTML version (not PDF)
  3. Prepend r.apurn.com/ to the URL
  4. Select any confusing text
  5. Hit CMD+Shift+L
  6. Get an explanation that knows the paper’s context

The Technical Approach

The implementation is surprisingly straightforward:

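// Most recent non-empty selection, shared by both listeners
let currentSelection = ''

// Listen for text selection
document.addEventListener('mouseup', () => {
  const text = window.getSelection().toString().trim()
  if (text.length > 0) {
    currentSelection = text
  }
})

// Listen for keyboard shortcut (CMD+Shift+L)
document.addEventListener('keydown', (e) => {
  // With Shift held, e.key is 'L', so compare case-insensitively
  if (e.metaKey && e.shiftKey && e.key.toLowerCase() === 'l') {
    e.preventDefault() // keep the browser from also handling the shortcut
    explainSelection(currentSelection)
  }
})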

The magic happens in explainSelection(). It:

  1. Grabs the selected text
  2. Adds context from surrounding paragraphs
  3. Includes the paper title and abstract
  4. Sends everything to GPT-4 with a prompt optimized for academic explanations

The Context Window

The key insight: don’t send the selected text on its own. Also include:

  • The paragraph it’s from
  • The section heading
  • The paper’s abstract
  • Previously explained terms from this session

This way, when you select “We use a VAE to…”, the AI knows what paper you’re reading and can explain VAE in that specific context.
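
Roughly, the prompt assembly could look like the sketch below. It is a simplified illustration, not the production code: the function name buildPrompt, the field names on the context object, and the prompt wording are all assumptions made up for this example.

```javascript
// Hypothetical prompt builder. Field names and wording are assumptions.
function buildPrompt(selection, context) {
  const parts = [
    `Paper abstract:\n${context.abstract}`,
    `Section: ${context.sectionHeading}`,
    `Paragraph:\n${context.paragraph}`,
  ]
  // Only mention earlier terms when there are some to mention
  if (context.explainedTerms.length > 0) {
    parts.push(`Already explained this session: ${context.explainedTerms.join(', ')}`)
  }
  parts.push(`Explain the following in the context of this paper:\n"${selection}"`)
  return parts.join('\n\n')
}
```

Putting the selection last keeps the question close to the end of the prompt, after the model has seen everything it needs to interpret it.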

Current Features

Smart Explanations

The AI adjusts its explanation based on what you select:

  • Equations: Step-by-step breakdown
  • Terms: Definition + how it’s used in this paper
  • Paragraphs: Plain English summary
  • Citations: What the referenced paper contributes
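
This adjustment could live entirely in the prompt, but one way to do it in code is a rough classifier that picks the explanation style up front. The sketch below is only a guess at such a heuristic: classifySelection, its patterns, and its word-count thresholds are assumptions, and it will misjudge some inputs (a long paragraph containing an equals sign counts as an equation, for instance).

```javascript
// Hypothetical heuristic: guesses what kind of text was selected.
function classifySelection(text) {
  const t = text.trim()
  const wordCount = t.split(/\s+/).length

  // Bracketed references like [12] or [3, 4]
  const isBracketCitation = /^\[\d+(\s*[,-]\s*\d+)*\]$/.test(t)
  // Short author-year references like "Smith et al. (2020)"
  const isAuthorYear = wordCount <= 6 && /et al\.?,?\s*\(?\d{4}\)?/.test(t)
  if (isBracketCitation || isAuthorYear) return 'citation'

  // LaTeX commands, =, ^, _, or Greek letters suggest math
  if (/\\[a-zA-Z]+|[=^_]|[\u0370-\u03FF]/.test(t)) return 'equation'

  // A few words reads as a term, anything longer as a passage
  return wordCount <= 4 ? 'term' : 'paragraph'
}
```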

Session Memory

Each paper gets its own context. Previously explained terms are remembered, so later explanations can build on earlier ones instead of starting from scratch.
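
A minimal sketch of that per-paper memory, assuming a plain in-memory Map keyed by paper ID. The names sessions, rememberTerm, and explainedTerms are hypothetical, and nothing here survives a page refresh.

```javascript
// Hypothetical in-memory store: one Map of terms per paper ID.
const sessions = new Map()

// Record an explanation for a term within one paper's session
function rememberTerm(paperId, term, explanation) {
  if (!sessions.has(paperId)) sessions.set(paperId, new Map())
  // Lowercase the key so "VAE" and "vae" count as the same term
  sessions.get(paperId).set(term.toLowerCase(), explanation)
}

// List the terms already explained for a paper (empty if none yet)
function explainedTerms(paperId) {
  const memory = sessions.get(paperId)
  return memory ? [...memory.keys()] : []
}
```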

Future Plans

Local Storage

Currently, explanations vanish when you refresh. Planning to store chat history by ArXiv ID:

localStorage.setItem(`arxiv_${paperId}`, JSON.stringify(explanations))
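
A slightly fuller sketch of the save and load round trip is below. It assumes explanations is an array, and the helper names saveExplanations and loadExplanations are made up for illustration. The storage parameter is also an assumption, added so the helpers can run against any object with getItem and setItem.

```javascript
// Sketch: `storage` defaults to the browser's localStorage.
function saveExplanations(paperId, explanations, storage = localStorage) {
  try {
    storage.setItem(`arxiv_${paperId}`, JSON.stringify(explanations))
    return true
  } catch (err) {
    // setItem throws when the quota is exceeded or storage is unavailable
    return false
  }
}

function loadExplanations(paperId, storage = localStorage) {
  try {
    const raw = storage.getItem(`arxiv_${paperId}`)
    // getItem returns null when nothing has been stored under the key
    return raw === null ? [] : JSON.parse(raw)
  } catch (err) {
    return [] // corrupted JSON or blocked storage: start fresh
  }
}
```

Wrapping both calls in try/catch matters because localStorage has a small quota and can be disabled entirely in some private browsing modes.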

Authentication

Considering Google OAuth so your reading history follows you across devices. Privacy-first though - all data stays client-side unless you explicitly sync.

Annotations

Thinking about letting users highlight and annotate papers, Genius-style. Build up a personal knowledge base of paper notes.

Paper Graph

Every paper cites others. What if you could visualize the citation graph and see explanations for how papers connect?

Technical Challenges

CORS

ArXiv doesn’t set CORS headers, so I had to proxy requests. Adds latency but it works.

LaTeX Rendering

ArXiv HTML has LaTeX equations as images. Extracting the actual math for better explanations is tricky.

Rate Limits

OpenAI rate limits are real. Implemented caching and request queuing to avoid hitting them.
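
One way to combine the two ideas is sketched below. It is not the actual implementation: createThrottledClient, callApi, and minGapMs are hypothetical names, and the one-second default gap is an arbitrary placeholder rather than a real OpenAI limit.

```javascript
// Hypothetical wrapper: caches answers by key and runs uncached requests
// one at a time, with a minimum gap after each one settles.
function createThrottledClient(callApi, minGapMs = 1000) {
  const cache = new Map()
  let queue = Promise.resolve() // tail of the request chain

  return function request(key) {
    if (cache.has(key)) return cache.get(key)

    const result = queue.then(() => callApi(key))
    // Cache the promise so duplicate requests share one API call
    cache.set(key, result)
    // Drop failed requests from the cache so they can be retried
    result.catch(() => cache.delete(key))
    // The next request waits for this one to settle, plus the gap
    queue = result
      .catch(() => {})
      .then(() => new Promise((resolve) => setTimeout(resolve, minGapMs)))
    return result
  }
}
```

Caching the promise rather than the resolved value means two identical selections made in quick succession still trigger only one API call.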

Why This Matters

Academic papers are humanity’s knowledge repository, but they’re locked behind jargon. If we can lower the barrier to entry, more people can learn from and contribute to research.

I’ve already used it to finally understand a paper on transformer architectures that’s been on my reading list for months.

Try It

It’s live at r.apurn.com. Just prepend r.apurn.com/ to any ArXiv HTML URL.

Fair warning: It’s still rough. Sometimes the explanations are too verbose. Sometimes they miss nuance. But it’s already helpful enough that I use it daily.

The Code

Thinking about open-sourcing it. The core is just:

  • A Chrome extension for the keyboard shortcuts
  • A simple proxy server
  • OpenAI API calls
  • Some JavaScript glue

If there’s interest, I’ll clean it up and put it on GitHub.


The goal isn’t to replace deep understanding. It’s to help you build it. Sometimes you just need someone to explain that one concept that’s blocking everything else.

That’s what Synthetic Trail tries to be - your patient study buddy who never judges you for forgetting what an eigenvalue is.