How to Summarize YouTube Videos with LLMs
Ever find yourself wanting to get the key points from a long YouTube video without watching the entire thing? This tutorial shows you how to automatically generate summaries using YouTube transcripts and AI.
We’ll walk through three simple steps: getting video transcripts, feeding them to an LLM, and enjoying the results.
Prerequisites
Before we start, you’ll need:
- Python 3.7+
- An OpenAI API key
- YouTube videos with available transcripts
Install the required packages:
pip install youtube-transcript-api openai
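Before running the later examples, it can help to confirm that the API key is actually visible to Python. The snippet below is a minimal sketch: the helper name require_api_key is hypothetical (it is not part of the openai package), and it assumes the key is stored in the OPENAI_API_KEY environment variable, which is where the code in Step 2 reads it from.

```python
import os

def require_api_key(env_var="OPENAI_API_KEY"):
    """Return the key stored in env_var, or raise a readable error if it is missing.

    Hypothetical helper for this tutorial; not part of the openai package.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the examples."
        )
    return key
```

Calling require_api_key() once at startup fails fast with a clear message instead of surfacing an authentication error on the first request.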
Step 1: Extract YouTube Transcripts
YouTube automatically generates transcripts for many videos, and the youtube-transcript-api library makes accessing them straightforward.
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Extract video ID from various YouTube URL formats"""
    if 'youtu.be/' in url:
        return url.split('youtu.be/')[-1].split('?')[0]
    elif 'youtube.com/watch?v=' in url:
        parsed = urlparse(url)
        return parse_qs(parsed.query)['v'][0]
    return None

def get_video_transcript(video_id, languages=('en',)):
    """Fetch transcript for a YouTube video"""
    try:
        transcript_api = YouTubeTranscriptApi()
        transcript_list = transcript_api.fetch(video_id=video_id, languages=languages)

        if not transcript_list:
            return None

        # Join the transcript snippets into plain text
        formatter = TextFormatter()
        transcript_text = formatter.format_transcript(transcript_list)

        # Clean up the text: collapse newlines and repeated whitespace
        transcript_text = transcript_text.replace('\n', ' ')
        transcript_text = ' '.join(transcript_text.split())

        return transcript_text

    except Exception as e:
        print(f"Could not retrieve transcript: {e}")
        return None

# Example usage
video_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
video_id = extract_video_id(video_url)
transcript = get_video_transcript(video_id)
if transcript:  # get_video_transcript returns None on failure
    print(f"Transcript length: {len(transcript)} characters")
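The languages argument takes language codes in order of preference, so ['en', 'de'] tries English first and falls back to German. Most popular videos have transcripts available; when one is missing or disabled, the library raises an error, which get_video_transcript prints before returning None.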
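The extract_video_id function above relies on substring checks, so it misses URLs where v is not the first query parameter, as well as /embed/ and /shorts/ links. The following is a sketch of a stricter parser, not a definitive implementation. The function name extract_video_id_strict is hypothetical, the list of accepted hosts and path prefixes is an assumption, and so is the validation rule that a video ID is 11 characters drawn from letters, digits, "-" and "_".

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: video IDs are 11 characters from the URL-safe base64 alphabet.
_VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")

# Assumption: these are the hosts and path prefixes we want to accept.
_YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")
_PATH_PREFIXES = ("embed", "shorts", "live")

def extract_video_id_strict(url):
    """Return the video ID from a YouTube URL, or None if it cannot be found."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    candidate = None

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in _YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # The v parameter may appear anywhere in the query string.
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in _PATH_PREFIXES:
                candidate = parts[1]

    if candidate and _VIDEO_ID.match(candidate):
        return candidate
    return None
```

Because the ID is validated, placeholder URLs and links to other sites return None instead of a string that would only fail later at the transcript request.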
Step 2: Generate Summaries with OpenAI
Now we feed the transcript to an LLM to generate an intelligent summary.
import os

import openai

class VideoSummarizer:
    def __init__(self):
        # Read the API key from the environment rather than hard-coding it
        self.client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def summarize_video(self, transcript, model='gpt-5-mini'):
        """Generate summary from video transcript"""

        # Truncate if too long (roughly 8000 chars ≈ 2000 tokens)
        if len(transcript) > 8000:
            transcript = transcript[:8000] + "..."

        prompt = f"""
Analyze this YouTube video transcript and provide a clear, structured summary.

Instructions:
- Identify the main topic and key points
- Highlight important insights or conclusions
- Use bullet points for clarity
- Keep it concise but comprehensive

Transcript:
{transcript}

Summary:
"""

        try:
            response = self.client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_completion_tokens=4096
            )

            return response.choices[0].message.content

        except Exception as e:
            print(f"Error generating summary: {e}")
            return f"Summary failed. First 500 chars: {transcript[:500]}..."

# Example usage
summarizer = VideoSummarizer()
summary = summarizer.summarize_video(transcript)
print(summary)
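Note that summarize_video cuts the transcript off at 8000 characters, so anything after the first few minutes of a long video never reaches the model. One possible alternative, sketched below, is to split the transcript into chunks, summarize each chunk, and then summarize the combined partial summaries. This is a swapped-in technique rather than part of the code above: chunk_text and summarize_long_transcript are hypothetical helpers, and summarize stands for any callable that maps text to a summary, such as VideoSummarizer().summarize_video.

```python
def chunk_text(text, max_chars=8000):
    """Split text into chunks of at most max_chars, breaking on whitespace.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks, current, length = [], [], 0
    for word in text.split():
        # Count the joining space only when the chunk already has words.
        extra = len(word) + (1 if current else 0)
        if current and length + extra > max_chars:
            chunks.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks

def summarize_long_transcript(transcript, summarize, max_chars=8000):
    """Summarize each chunk, then summarize the joined partial summaries."""
    chunks = chunk_text(transcript, max_chars)
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize(chunks[0])
    partials = [summarize(chunk) for chunk in chunks]
    # Second pass: condense the per-chunk summaries into one.
    return summarize("\n\n".join(partials))
```

The trade-off is cost: a transcript split into N chunks needs N + 1 requests. For very long videos the joined partial summaries can themselves exceed the character limit, in which case the second pass would be truncated again.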
The key is crafting a clear prompt that tells the AI exactly what kind of summary you want.
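To make that concrete, the prompt can be assembled from a few parameters instead of being hard-coded. The sketch below is illustrative only: build_summary_prompt and its audience, max_bullets, and focus parameters are hypothetical names, and the wording of the instructions is just one reasonable choice modeled on the prompt in VideoSummarizer.

```python
def build_summary_prompt(transcript, audience="a general audience",
                         max_bullets=5, focus=None):
    """Build a summarization prompt with adjustable instructions."""
    lines = [
        "Analyze this YouTube video transcript and provide a clear, structured summary.",
        "",
        "Instructions:",
        f"- Write for {audience}",
        f"- Use at most {max_bullets} bullet points",
        "- Highlight important insights or conclusions",
    ]
    if focus:
        # Optional instruction steering the summary toward one topic.
        lines.append(f"- Pay particular attention to: {focus}")
    lines += ["", "Transcript:", transcript, "", "Summary:"]
    return "\n".join(lines)

# Example: a short, pricing-focused summary for a technical reader
example_prompt = build_summary_prompt(
    "...transcript text...",
    audience="software engineers",
    max_bullets=3,
    focus="pricing and licensing",
)
```

The resulting string can be sent as the user message in place of the fixed prompt, which makes it easy to compare how different instructions change the summary.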
Step 3: Put It All Together
Here’s the complete pipeline:
def summarize_youtube_video(video_url):
    """Complete pipeline to summarize a YouTube video"""

    print(f"Processing: {video_url}")

    # Step 1: Extract video ID and transcript
    video_id = extract_video_id(video_url)
    if not video_id:
        return "Invalid YouTube URL"

    transcript = get_video_transcript(video_id)
    if not transcript:
        return "No transcript available for this video"

    print(f"Transcript extracted: {len(transcript)} characters")

    # Step 2: Generate summary
    summarizer = VideoSummarizer()
    summary = summarizer.summarize_video(transcript)

    return summary

# Example usage
video_url = "https://www.youtube.com/watch?v=your_video_here"

summary = summarize_youtube_video(video_url)
print("\n" + "="*60)
print("VIDEO SUMMARY")
print("="*60)
print(summary)
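If you have a watch list rather than a single link, the same pipeline can be applied in a loop. The helper below is a minimal sketch under stated assumptions: summarize_many is a hypothetical name, and summarize_one stands for any function that takes a URL and returns a summary string, such as summarize_youtube_video above. It is passed in as an argument so the loop can be tried without network access.

```python
def summarize_many(video_urls, summarize_one):
    """Run summarize_one on each URL and collect the results by URL.

    A failure on one video is recorded as a message and does not stop the batch.
    """
    results = {}
    for url in video_urls:
        try:
            results[url] = summarize_one(url)
        except Exception as exc:
            results[url] = f"Failed: {exc}"
    return results
```

For example, summarize_many(urls, summarize_youtube_video) returns a dictionary that maps each URL to its summary or to a failure message.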
Conclusion
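With about 100 lines of Python, you can build a system that automatically summarizes any YouTube video that has a transcript available. The three-step process (extract the transcript, process it with an LLM, enjoy the results) works reliably for most content, though the 8000-character truncation in Step 2 means that long videos are summarized from their opening portion only.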
This approach is particularly useful for educational videos, interviews, and technical talks where you want to quickly understand the main points without watching the entire video. Or, if you’re like me, you can find out what’s behind clickbait titles.
The complete code is available here