N8N Automation: Auto-Translate YouTube Transcripts to 10+ Languages in Minutes
What if you could take any YouTube video and instantly make it accessible to speakers of Spanish, German, Japanese, Portuguese, and seven other languages—all without lifting a finger? With N8N, Scriptube, and DeepL working together, you can build exactly that.
The $50 Billion Localization Problem
The global localization market is worth over $50 billion, and for good reason. Every day, businesses leave money on the table by publishing content in only one language. Consider these statistics:
- 75% of consumers prefer to buy products in their native language
- YouTube has 2.5 billion monthly users across 100+ countries
- Only 25% of internet users are native English speakers
- Localized content gets 6x more engagement than English-only content
Yet most creators and businesses don't translate their YouTube content. Why? Because traditional translation is expensive and slow:
- Professional human translation: $0.10-$0.30 per word
- A 10-minute video transcript (~1,500 words) × 10 languages = $1,500-$4,500
- Turnaround time: 3-7 days per language
With the automation we're building today, that same 10-minute video can be translated into 10 languages in under 2 minutes for less than $2.
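The human-translation estimate above is simple multiplication, and it can be reproduced with a few lines of JavaScript. This is only a sketch of the arithmetic: the function name `humanTranslationCost` is made up for illustration, and the word count and per-word rates are the figures quoted in the list above.

```javascript
// Hypothetical helper: cost = words x languages x rate per word.
function humanTranslationCost(wordCount, languageCount, ratePerWord) {
  return wordCount * languageCount * ratePerWord;
}

const words = 1500;    // ~10-minute video, per the estimate above
const languages = 10;

const low = humanTranslationCost(words, languages, 0.10);
const high = humanTranslationCost(words, languages, 0.30);

console.log(`Estimated range: $${low.toFixed(0)} to $${high.toFixed(0)}`);
```

Swapping in your own word count and language list gives a quick baseline to compare against your actual API bills.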
Workflow Overview: From Video to 10 Languages
Here's what our N8N automation will do:
- Trigger: New YouTube video URL received (via webhook, form, or schedule)
- Extract: Scriptube API fetches the full transcript with timestamps
- Split: Break transcript into optimal chunks for translation (max 5,000 chars)
- Translate: DeepL API translates each chunk into all target languages simultaneously
- Reassemble: Merge translated chunks back into complete transcripts
- Store: Save all versions to Airtable with metadata
- Notify: Send Slack/email notification when complete
The beauty of this approach is the parallelization. While a human translator works sequentially, our workflow translates to all languages at once. A single video becomes accessible to billions of additional potential viewers.
Setup & Prerequisites
What You'll Need
| Service | Purpose | Cost |
|---|---|---|
| N8N | Workflow automation | Free self-hosted or $20/mo cloud |
| Scriptube | YouTube transcript extraction | Free tier: 100 transcripts/mo |
| DeepL API | Neural machine translation | Free tier: 500K chars/mo |
| Airtable | Database & organization | Free tier available |
Target Languages
DeepL supports 29 languages. For this tutorial, we'll target 10 widely spoken ones (speaker counts are approximate):
- Spanish (ES) - 550M speakers
- Portuguese (PT-BR) - 260M speakers
- German (DE) - 130M speakers
- French (FR) - 280M speakers
- Japanese (JA) - 125M speakers
- Italian (IT) - 65M speakers
- Dutch (NL) - 25M speakers
- Polish (PL) - 45M speakers
- Russian (RU) - 260M speakers
- Chinese (ZH) - 1.1B speakers
Step-by-Step: Building the Pipeline
Step 1: Create the Webhook Trigger
Start your N8N workflow with a Webhook node. This allows you to send video URLs via HTTP POST:
{
  "video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "target_languages": ["ES", "PT-BR", "DE", "FR", "JA"]
}
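Configure the webhook to accept POST requests, then read video_url and the optional target_languages array from the request body. The N8N Webhook node typically nests the posted JSON under a body property, so depending on how you map the fields, the expressions in later steps may need to read $json.body.video_url rather than $json.video_url.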
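If you want to validate the payload before calling any paid API, a small helper can do it. The sketch below is an assumption-heavy illustration, not part of the Scriptube or N8N APIs: `parseTranslationRequest` and `DEFAULT_LANGUAGES` are hypothetical names, the default list simply mirrors the ten languages in this tutorial, and only `watch?v=` and `youtu.be` style links are handled.

```javascript
// Hypothetical default: the ten target languages used in this tutorial.
const DEFAULT_LANGUAGES = ["ES", "PT-BR", "DE", "FR", "JA", "IT", "NL", "PL", "RU", "ZH"];

// Hypothetical helper: validates the webhook payload and fills in defaults.
function parseTranslationRequest(body) {
  if (!body || typeof body.video_url !== "string") {
    throw new Error("video_url is required");
  }

  const url = new URL(body.video_url); // throws on malformed URLs
  const host = url.hostname.replace(/^www\./, "");

  // Covers watch?v=ID and youtu.be/ID links only; other URL shapes are rejected.
  const videoId = host === "youtu.be" ? url.pathname.slice(1) : url.searchParams.get("v");
  if (!videoId) {
    throw new Error("Could not find a video ID in video_url");
  }

  const requested =
    Array.isArray(body.target_languages) && body.target_languages.length > 0
      ? body.target_languages
      : DEFAULT_LANGUAGES;

  // Normalize case and drop duplicates while keeping the original order.
  const target_languages = [...new Set(requested.map((code) => String(code).toUpperCase()))];

  return { video_url: body.video_url, video_id: videoId, target_languages };
}
```

In a Code node you would pass in the webhook body (for example `$input.first().json.body`, depending on your setup) and return the result as the item's `json`.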
Step 2: Fetch Transcript via Scriptube API
Add an HTTP Request node to call the Scriptube API:
POST https://api.scriptube.com/v1/transcripts
Authorization: Bearer YOUR_SCRIPTUBE_API_KEY
Content-Type: application/json

{
  "url": "{{ $json.video_url }}",
  "format": "text",
  "include_timestamps": true
}
Scriptube returns the complete transcript in seconds, including timestamps for each segment. The response looks like:
{
  "video_id": "dQw4w9WgXcQ",
  "title": "Video Title Here",
  "duration": 213,
  "transcript": "Never gonna give you up, never gonna let you down...",
  "segments": [
    {"start": 0.0, "end": 5.2, "text": "Never gonna give you up"},
    {"start": 5.2, "end": 10.1, "text": "never gonna let you down"}
  ],
  "word_count": 1523,
  "language": "en"
}
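The segments array is what you would reach for if you later want subtitles rather than a flat block of text. As a rough sketch, assuming the response shape shown above, the following turns segments into timestamped lines. The helper names `formatTimestamp` and `segmentsToLines` are hypothetical.

```javascript
// Formats a number of seconds as m:ss, dropping fractions of a second.
function formatTimestamp(seconds) {
  const total = Math.floor(seconds);
  const minutes = Math.floor(total / 60);
  const secs = String(total % 60).padStart(2, "0");
  return `${minutes}:${secs}`;
}

// Hypothetical helper: one "[m:ss] text" line per transcript segment.
function segmentsToLines(segments) {
  return segments.map((seg) => `[${formatTimestamp(seg.start)}] ${seg.text}`);
}

// Sample data copied from the example response above.
const sampleSegments = [
  { start: 0.0, end: 5.2, text: "Never gonna give you up" },
  { start: 5.2, end: 10.1, text: "never gonna let you down" }
];

console.log(segmentsToLines(sampleSegments).join("\n"));
```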
Step 3: Chunk the Transcript
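Long transcripts are easier to translate and retry in small pieces, so this tutorial caps each DeepL request at roughly 5,000 characters. Add a Code node (called the Function node in older N8N versions) to split the transcript on sentence boundaries: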
const transcript = $input.first().json.transcript;
const MAX_CHUNK = 4500; // Leave a buffer below 5,000 characters

// Split on whitespace that follows sentence-ending punctuation
const sentences = transcript.split(/(?<=[.!?])\s+/);

const chunks = [];
let currentChunk = "";

for (const sentence of sentences) {
  const candidate = currentChunk ? currentChunk + " " + sentence : sentence;
  if (candidate.length > MAX_CHUNK && currentChunk) {
    // Adding this sentence would overflow, so close the current chunk
    chunks.push(currentChunk);
    currentChunk = sentence;
  } else {
    currentChunk = candidate;
  }
}
if (currentChunk) chunks.push(currentChunk);

// Emit one N8N item per chunk
return chunks.map((text, idx) => ({
  json: { chunk_index: idx, text, total_chunks: chunks.length }
}));
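One caveat with sentence-based splitting: auto-generated captions often contain little or no punctuation, in which case the whole transcript can come back as a single oversized "sentence". A possible fallback, sketched here under that assumption, is to split on word boundaries instead. The function name `splitByWords` is hypothetical.

```javascript
// Hypothetical fallback: splits text into chunks of at most maxLen characters,
// breaking only on whitespace. A single word longer than maxLen is kept whole,
// so that one chunk can still exceed the limit.
function splitByWords(text, maxLen) {
  const chunks = [];
  let current = "";

  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length > maxLen && current) {
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);

  return chunks;
}
```

You could run this on any chunk that is still longer than MAX_CHUNK after the sentence split, so well-punctuated transcripts keep their sentence boundaries.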
Step 4: Translate with DeepL (Parallel)
Here's where the magic happens. Add a Split In Batches node followed by a DeepL node (or HTTP Request to DeepL API):
POST https://api-free.deepl.com/v2/translate
Authorization: DeepL-Auth-Key YOUR_DEEPL_KEY
Content-Type: application/json

{
  "text": ["{{ $json.text }}"],
  "source_lang": "EN",
  "target_lang": "{{ $json.target_language }}"
}
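Use a Split Out node before this to create one item per target language, so that every chunk is sent to DeepL once for each language. N8N then handles all 10 languages in the same workflow run, with no manual hand-offs between them.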
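The request above reads `target_language` from each item, but the chunks produced in Step 3 do not carry that field yet. One way to bridge the gap, sketched here as an assumption rather than the only approach, is to build the chunk-by-language combinations yourself in a Code node. The function name `fanOut` is hypothetical.

```javascript
// Hypothetical helper: pairs every chunk with every target language.
// `chunks` are the Step 3 payloads ({ chunk_index, text, total_chunks }).
function fanOut(chunks, targetLanguages) {
  const items = [];
  for (const target_language of targetLanguages) {
    for (const chunk of chunks) {
      items.push({ json: { ...chunk, target_language } });
    }
  }
  return items;
}

const exampleItems = fanOut(
  [
    { chunk_index: 0, text: "First chunk.", total_chunks: 2 },
    { chunk_index: 1, text: "Second chunk.", total_chunks: 2 }
  ],
  ["ES", "DE", "JA"]
);

console.log(exampleItems.length); // 2 chunks x 3 languages = 6 items
```

Because each item keeps its chunk_index alongside target_language, the reassembly step can sort and group the results regardless of the order in which they come back.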
Step 5: Reassemble Translations
After DeepL returns, use an Aggregate node to group chunks by language, then a Function node to merge:
// Group translated chunks by target language
const grouped = {};
for (const item of $input.all()) {
  const lang = item.json.target_language;
  if (!grouped[lang]) grouped[lang] = [];
  grouped[lang].push({
    index: item.json.chunk_index,
    text: item.json.translated_text
  });
}

// Restore the original chunk order, then join into one transcript per language
return Object.entries(grouped).map(([lang, chunks]) => ({
  json: {
    language: lang,
    translated_transcript: chunks
      .sort((a, b) => a.index - b.index)
      .map(c => c.text)
      .join(" ")
  }
}));
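The merge code above expects every item to carry `chunk_index`, `target_language`, and `translated_text`. DeepL's translate endpoint returns its results inside a `translations` array, so a small mapping step is needed first. The sketch below shows one way to do it. The function name `attachTranslation` is hypothetical, and how you pair each response with its originating request item depends on how your nodes are wired.

```javascript
// Hypothetical helper: combines the request item (chunk_index, target_language)
// with DeepL's response ({ translations: [{ detected_source_language, text }] }).
function attachTranslation(requestItem, deeplResponse) {
  const translations = (deeplResponse && deeplResponse.translations) || [];
  if (translations.length === 0) {
    throw new Error(`No translation returned for chunk ${requestItem.chunk_index}`);
  }
  return {
    json: {
      chunk_index: requestItem.chunk_index,
      target_language: requestItem.target_language,
      // One text was sent per request, so this is normally a single entry.
      translated_text: translations.map((t) => t.text).join(" ")
    }
  };
}

const exampleTranslated = attachTranslation(
  { chunk_index: 0, target_language: "ES", text: "Hello world." },
  { translations: [{ detected_source_language: "EN", text: "Hola mundo." }] }
);

console.log(exampleTranslated.json.translated_text);
```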
Step 6: Save to Airtable
Add an Airtable node to store all translations with rich metadata:
{
  "Video ID": "{{ $json.video_id }}",
  "Video Title": "{{ $json.title }}",
  "Language": "{{ $json.language }}",
  "Translated Transcript": "{{ $json.translated_transcript }}",
  "Word Count": {{ $json.word_count }},
  "Processed At": "{{ $now.toISO() }}",
  "Status": "Complete"
}
Complete N8N Workflow JSON
Here's a simplified version of the workflow that you can import into N8N as a starting point:
{
  "name": "YouTube Transcript Multilingual Pipeline",
  "nodes": [
    {
      "name": "Webhook",
      "type": "n8n-nodes-base.webhook",
      "position": [250, 300],
      "parameters": {
        "path": "translate-video",
        "httpMethod": "POST"
      }
    },
    {
      "name": "Scriptube API",
      "type": "n8n-nodes-base.httpRequest",
      "position": [450, 300],
      "parameters": {
        "url": "https://api.scriptube.com/v1/transcripts",
        "method": "POST",
        "authentication": "genericCredentialType",
        "body": {
          "url": "={{ $json.video_url }}",
          "format": "text"
        }
      }
    },
    {
      "name": "Split Languages",
      "type": "n8n-nodes-base.splitOut",
      "position": [650, 300],
      "parameters": {
        "fieldToSplitOut": "target_languages"
      }
    },
    {
      "name": "DeepL Translate",
      "type": "n8n-nodes-base.deepL",
      "position": [850, 300],
      "parameters": {
        "text": "={{ $json.transcript }}",
        "targetLanguage": "={{ $json.language_code }}"
      }
    },
    {
      "name": "Airtable",
      "type": "n8n-nodes-base.airtable",
      "position": [1050, 300],
      "parameters": {
        "operation": "create",
        "table": "Translations"
      }
    }
  ],
  "connections": {
    "Webhook": {"main": [[{"node": "Scriptube API"}]]},
    "Scriptube API": {"main": [[{"node": "Split Languages"}]]},
    "Split Languages": {"main": [[{"node": "DeepL Translate"}]]},
    "DeepL Translate": {"main": [[{"node": "Airtable"}]]}
  }
}
Note: This is simplified—the full workflow includes chunking, error handling, and retry logic. Get the complete template when you sign up for Scriptube.
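If you would rather add basic retry logic yourself, exponential backoff is a common pattern for flaky HTTP calls. The following is a generic sketch and not the logic from the Scriptube template: `backoffDelay` and `withRetry` are hypothetical names, and the delay values and attempt count are arbitrary defaults you should tune.

```javascript
// Delay before the next try after a failed attempt (0-based):
// base, 2x base, 4x base, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical helper: calls fn until it succeeds or maxAttempts is reached.
// `sleep` is injectable so the wait can be skipped in tests.
async function withRetry(
  fn,
  maxAttempts = 4,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final failure.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelay(attempt));
      }
    }
  }
  throw lastError;
}
```

You would wrap the translation or transcript request in `withRetry(() => callApi(...))`, where `callApi` stands in for whatever function performs the HTTP call in your setup.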
Organizing Output in Airtable
Structure your Airtable base for maximum utility:
Recommended Table Schema
| Field | Type | Purpose |
|---|---|---|
| Video ID | Single Line Text | YouTube video identifier |
| Video Title | Single Line Text | Original video title |
| Source Language | Single Select | Original transcript language |
| Target Language | Single Select | ES, DE, FR, JA, etc. |
| Original Transcript | Long Text | English source text |
| Translated Transcript | Long Text | Translated text |
| Word Count | Number | For tracking/analytics |
| Character Count | Number | DeepL billing reference |
| Processed At | Date | Timestamp |
| Status | Single Select | Pending/Complete/Error |
With this structure, you can build Airtable views to filter by language, create localization dashboards, and track translation coverage across your video library.
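To populate every column in the schema above, you need to combine the original Scriptube response with each translated transcript. The sketch below shows one way to build the field map. `buildAirtableFields` is a hypothetical helper, and using the source transcript's length for Character Count is an assumption based on the "DeepL billing reference" purpose listed in the table.

```javascript
// Hypothetical helper: builds one Airtable record's fields.
// `video` is the Scriptube response; `translation` is one Step 5 result
// ({ language, translated_transcript }).
function buildAirtableFields(video, translation) {
  return {
    "Video ID": video.video_id,
    "Video Title": video.title,
    "Source Language": String(video.language || "en").toUpperCase(),
    "Target Language": translation.language,
    "Original Transcript": video.transcript,
    "Translated Transcript": translation.translated_transcript,
    "Word Count": video.word_count,
    // Assumption: count source characters, since that is what was sent to DeepL.
    "Character Count": video.transcript.length,
    "Processed At": new Date().toISOString(),
    "Status": "Complete"
  };
}

const exampleFields = buildAirtableFields(
  { video_id: "dQw4w9WgXcQ", title: "Video Title Here", language: "en", transcript: "Hello world.", word_count: 2 },
  { language: "ES", translated_transcript: "Hola mundo." }
);

console.log(exampleFields["Target Language"], exampleFields["Character Count"]);
```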
Bonus: Generate Audio in Each Language
Want to go further? Add an ElevenLabs integration to convert your translated transcripts into natural-sounding audio:
POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}
xi-api-key: YOUR_ELEVENLABS_KEY
Content-Type: application/json

{
  "text": "{{ $json.translated_transcript }}",
  "model_id": "eleven_multilingual_v2",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75
  }
}
ElevenLabs' multilingual model supports 29 languages with natural pronunciation. Combined with Scriptube translations, you can create:
- Audio versions of transcripts for podcast distribution
- Voiceovers for video localization
- Accessibility audio for visually impaired audiences
- Language learning materials with native pronunciation
Scriptube's Pro plan includes ElevenLabs integration—check our pricing for details.
Real-World ROI Examples
Case Study: SaaS Company Expands to LATAM
A B2B SaaS company had 200 product tutorial videos in English. They wanted to expand to Latin America but faced a localization estimate of $45,000 from translation agencies.
Using this N8N pipeline:
- Time: 4 hours to process all 200 videos
- Cost: ~$150 (Scriptube Pro + DeepL API)
- Result: 200 videos × 3 languages (ES, PT-BR, FR) = 600 translated transcripts
- Savings: $44,850 (99.7% cost reduction)
Case Study: Online Course Creator
An online educator with 50 hours of course content wanted to reach Japanese and German markets:
- Traditional quote: $12,000 + 6 weeks
- With automation: $40 + 2 hours
- Used translated transcripts as subtitles AND ElevenLabs for dubbed audio
- Result: 300% increase in international enrollments
ROI Calculator
| Videos | Languages | Traditional Cost | Automated Cost | Savings |
|---|---|---|---|---|
| 10 | 5 | $3,750 | $15 | 99.6% |
| 50 | 10 | $37,500 | $75 | 99.8% |
| 100 | 10 | $75,000 | $150 | 99.8% |
| 500 | 10 | $375,000 | $400 | 99.9% |
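The Savings column is (traditional cost - automated cost) / traditional cost, and the Traditional Cost column works out to $75 per video per language in every row. The sketch below recomputes both from the table's own numbers so you can plug in your own figures. The function names are hypothetical.

```javascript
// Percentage saved by automating, relative to the traditional cost.
function savingsPercent(traditionalCost, automatedCost) {
  return ((traditionalCost - automatedCost) / traditionalCost) * 100;
}

// Traditional cost per video per language implied by a table row.
function impliedUnitCost(row) {
  return row.traditional / (row.videos * row.languages);
}

// Rows copied from the ROI table above.
const roiRows = [
  { videos: 10, languages: 5, traditional: 3750, automated: 15 },
  { videos: 50, languages: 10, traditional: 37500, automated: 75 },
  { videos: 100, languages: 10, traditional: 75000, automated: 150 },
  { videos: 500, languages: 10, traditional: 375000, automated: 400 }
];

for (const row of roiRows) {
  const saved = savingsPercent(row.traditional, row.automated).toFixed(1);
  console.log(`${row.videos} videos x ${row.languages} languages: ${saved}% saved`);
}
```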
Conclusion: Global Content in One Click
The workflow we built today transforms what used to require a team of translators, weeks of time, and thousands of dollars into a one-click operation. With Scriptube handling transcript extraction, DeepL powering neural translation, and N8N orchestrating the entire pipeline, you can make any YouTube content accessible to billions of additional viewers.
The best part? Once you set it up, it runs automatically. New video uploaded? Translations appear in your Airtable within minutes. That's the power of automation.
Ready to Go Global?
Start building your multilingual content pipeline today. Scriptube's API handles the transcript extraction—you handle the world domination.
Start Free with Scriptube →
Free tier includes 100 transcripts/month. No credit card required.