Gmail Integration
Index Gmail conversations and enable AI-powered email responses that cite your knowledge base.
Features
| Feature | Description |
|---|---|
| Email Indexing | Sync inbox, sent, and labeled emails |
| Thread Reconstruction | Preserve conversation context |
| Auto-Response | AI-generated replies with citations |
| Smart Routing | Route emails to appropriate teams |
| Attachment Processing | Index PDFs, docs, and images |
| Label Filtering | Sync specific labels only |
Prerequisites
- Google Workspace account (or personal Gmail)
- Google Cloud project with Gmail API enabled
- OAuth consent screen configured
Setup
Step 1: Google Cloud Configuration
- Go to Google Cloud Console
- Create or select a project
- Enable the Gmail API
- Configure OAuth consent screen:
  - User type: Internal (for Workspace) or External
  - Add scopes:
    - `gmail.readonly` - Read emails
    - `gmail.send` - Send replies (if auto-response enabled)
    - `gmail.labels` - Access labels
- Create OAuth credentials:
  - Application type: Web application
  - Authorized redirect URI: `https://app.lakehouse42.com/api/auth/google/callback`
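
For orientation, the scopes and redirect URI configured above end up in an authorization request like the one sketched here. LH42 presumably builds this request itself when you click Connect with Google; the function name `build_authorization_url` and the `demo-client-id` value are illustrative assumptions, while the endpoint and scope URLs are Google's standard OAuth 2.0 values.

```python
from urllib.parse import urlencode, urlparse, parse_qs

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
SCOPE_PREFIX = "https://www.googleapis.com/auth/"
REDIRECT_URI = "https://app.lakehouse42.com/api/auth/google/callback"


def build_authorization_url(client_id: str, auto_response: bool = False) -> str:
    """Build a Google consent URL for the scopes listed in Step 1 (hypothetical helper)."""
    scopes = ["gmail.readonly", "gmail.labels"]
    if auto_response:
        scopes.append("gmail.send")  # only needed when replies are sent
    params = {
        "client_id": client_id,
        "redirect_uri": REDIRECT_URI,  # must match the authorized redirect URI exactly
        "response_type": "code",
        "scope": " ".join(SCOPE_PREFIX + s for s in scopes),
        "access_type": "offline",  # ask for a refresh token
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


# "demo-client-id" is a placeholder, not a real credential.
url = build_authorization_url("demo-client-id", auto_response=True)
print(parse_qs(urlparse(url).query)["scope"][0].split())
```

Requesting `gmail.send` only when auto-response is enabled keeps the consent screen limited to the permissions actually used.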
Step 2: Connect in LH42
- Go to Settings > Integrations > Gmail
- Click Connect with Google
- Sign in and authorize access
- Configure sync settings
Step 3: Configure Email Capture
```python
# `client` is assumed to be an already-authenticated LH42 SDK client.
client.integrations.gmail.configure({
    "sync": {
        "labels": ["INBOX", "IMPORTANT", "Support"],
        "exclude_labels": ["SPAM", "TRASH", "PROMOTIONS"],
        "lookback_days": 90,
        "include_attachments": True,
        "max_attachment_size_mb": 25
    },
    "auto_response": {
        "enabled": False,  # Enable with caution
        "labels": ["Support"],
        "require_approval": True,
        "signature": "— Sent via LH42 AI Assistant"
    }
})
```
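
To make the `sync` settings concrete, here is a minimal sketch of how such a filter could be evaluated for one message. The helper names `should_sync` and `attachment_allowed` are hypothetical, and the rule that an excluded label overrides an included one is an assumption; the source does not say how LH42 resolves conflicts.

```python
def should_sync(message_labels, sync_config):
    """Return True if a message with these labels passes the label filter."""
    labels = set(message_labels)
    if labels & set(sync_config.get("exclude_labels", [])):
        return False  # assumption: exclusion takes precedence over inclusion
    return bool(labels & set(sync_config.get("labels", [])))


def attachment_allowed(size_bytes, sync_config):
    """Return True if an attachment of this size would be indexed."""
    if not sync_config.get("include_attachments", False):
        return False
    limit_bytes = sync_config.get("max_attachment_size_mb", 25) * 1024 * 1024
    return size_bytes <= limit_bytes


sync = {
    "labels": ["INBOX", "IMPORTANT", "Support"],
    "exclude_labels": ["SPAM", "TRASH", "PROMOTIONS"],
    "include_attachments": True,
    "max_attachment_size_mb": 25,
}
print(should_sync(["INBOX", "Support"], sync))
print(should_sync(["INBOX", "SPAM"], sync))
```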
Email Processing Pipeline
```
Gmail Inbox
    │
    ├── Webhook notification (or IMAP poll)
    │
    ▼
Email Capture Service
    │
    ├── Thread reconstruction
    ├── Quote parsing (extract new content)
    ├── Attachment extraction
    │
    ▼
Processing Pipeline
    │
    ├── Generate embeddings
    ├── Extract entities
    ├── Classify intent
    │
    ▼
Storage (Iceberg)
    │
    ├── emailCaptures table (threads)
    └── emailCaptureMessages table (messages)
```
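
The quote-parsing stage in the diagram can be illustrated with a simple heuristic. This is only a sketch, not LH42's actual parser: it assumes replies quote earlier messages with `>` prefixes or below an "On ... wrote:" attribution line, and the function name `extract_new_content` is hypothetical.

```python
import re

# Matches attribution lines such as
# "On Mon, Jan 20, 2026 at 10:30 AM Agent <sales@company.com> wrote:"
ATTRIBUTION = re.compile(r"^On .+ wrote:$")


def extract_new_content(body: str) -> str:
    """Return the part of a reply that is not quoted from earlier messages."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        if ATTRIBUTION.match(stripped):
            break  # everything below the attribution line is the quoted thread
        if stripped.startswith(">"):
            continue  # skip inline-quoted lines
        kept.append(line)
    return "\n".join(kept).strip()


reply = (
    "Do you offer discounts?\n"
    "\n"
    "On Mon, Jan 20, 2026 at 10:30 AM Agent <sales@company.com> wrote:\n"
    "> Thanks for asking! Our pricing...\n"
)
print(extract_new_content(reply))
```

Real mail clients vary widely in how they mark quoted text (localized attribution lines, HTML blockquotes), so a production parser needs more patterns than this.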
What Gets Indexed
| Content | Indexed | Notes |
|---|---|---|
| Email body | ✅ | HTML stripped, text preserved |
| Subject | ✅ | Searchable metadata |
| Sender/Recipients | ✅ | For filtering |
| Timestamps | ✅ | For time-based queries |
| Thread context | ✅ | Full conversation history |
| Attachments | ✅ | PDF, DOCX, images (OCR) |
| Labels | ✅ | As metadata tags |
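
As an illustration of the "HTML stripped, text preserved" note for email bodies, the following sketch reduces an HTML body to plain text with the standard library. It is an assumption about the general approach, not LH42's implementation; `html_to_text` and the chosen tag sets are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    BLOCK_TAGS = {"p", "div", "br", "li", "tr"}  # start a new line of text
    SKIP_TAGS = {"script", "style"}              # content is never indexed

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags, keep visible text, and collapse whitespace per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


print(html_to_text("<p>Hello <b>world</b></p><style>p{color:red}</style><p>Bye</p>"))
```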
Auto-Response (AI Replies)
Enable AI-powered email responses:
```python
client.integrations.gmail.configure({
    "auto_response": {
        "enabled": True,
        "mode": "draft",  # draft | send | approval_queue
        "trigger_labels": ["Support", "Sales"],
        "response_template": "professional",
        "include_citations": True,
        "max_response_length": 500,
        "signature": "Best regards,\nLH42 AI Assistant"
    }
})
```
Response modes:
- `draft`: Create drafts for human review
- `send`: Send immediately (use carefully)
- `approval_queue`: Queue for approval in LH42 dashboard
API Reference
List Email Captures
```bash
GET /api/integrations/gmail/threads?label=Support&limit=50

# Response
{
  "threads": [
    {
      "id": "thread_abc123",
      "subject": "Question about product pricing",
      "participants": ["customer@example.com", "sales@company.com"],
      "message_count": 5,
      "last_message_at": "2026-01-20T10:30:00Z",
      "labels": ["Support", "INBOX"],
      "status": "indexed"
    }
  ]
}
```
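
A client-side call to this endpoint might look like the sketch below. The base URL is assumed from the redirect URI in Step 1, and the bearer-token header is an assumed authentication scheme; neither is confirmed by this page. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://app.lakehouse42.com"  # assumption: same host as the OAuth redirect URI


def list_threads_request(api_token: str, label: str, limit: int = 50) -> Request:
    """Build (but do not send) the GET request for listing captured threads."""
    query = urlencode({"label": label, "limit": limit})
    return Request(
        f"{BASE_URL}/api/integrations/gmail/threads?{query}",
        headers={
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
            "Accept": "application/json",
        },
        method="GET",
    )


def indexed_thread_ids(payload: dict) -> list:
    """Pick the IDs of threads whose status is "indexed" from a response body."""
    return [t["id"] for t in payload.get("threads", []) if t.get("status") == "indexed"]


# Parsing a response shaped like the example above.
sample = json.loads('{"threads": [{"id": "thread_abc123", "status": "indexed"}]}')
print(indexed_thread_ids(sample))
```

To actually send the request, pass it to `urllib.request.urlopen` and decode the body with `json.load`.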
Generate Response
```bash
POST /api/integrations/gmail/generate-response
{
  "thread_id": "thread_abc123",
  "context": "Reply to the customer's question about pricing"
}

# Response
{
  "draft": {
    "subject": "Re: Question about product pricing",
    "body": "Thank you for your inquiry...\n\n📄 Based on: Pricing Guide v2.0",
    "citations": [
      {"title": "Pricing Guide", "page": 5, "url": "..."}
    ]
  }
}
```
Send Response
```bash
POST /api/integrations/gmail/send
{
  "thread_id": "thread_abc123",
  "body": "Thank you for your inquiry...",
  "citations": true
}
```
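
The generate and send endpoints naturally chain: request a draft, review it, then send the approved body. The sketch below only builds the two requests; the base URL, the bearer-token header, and the helper names are assumptions for illustration rather than documented LH42 behavior.

```python
import json
from urllib.request import Request

BASE_URL = "https://app.lakehouse42.com"  # assumption: same host as the OAuth redirect URI


def post_json(path: str, payload: dict, api_token: str) -> Request:
    """Build a JSON POST request without sending it."""
    return Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_request(thread_id: str, context: str, api_token: str) -> Request:
    return post_json(
        "/api/integrations/gmail/generate-response",
        {"thread_id": thread_id, "context": context},
        api_token,
    )


def send_request(thread_id: str, draft: dict, api_token: str) -> Request:
    """Turn a reviewed draft (the "draft" object of the generate response) into a send call."""
    payload = {
        "thread_id": thread_id,
        "body": draft["body"],
        "citations": bool(draft.get("citations")),  # keep citations if the draft has any
    }
    return post_json("/api/integrations/gmail/send", payload, api_token)
```

Keeping a human review step between the two calls mirrors the `draft` and `approval_queue` modes described earlier.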
Thread Handling
Gmail threads are reconstructed with full context:
```python
# Quote parsing extracts only new content
# Full thread context maintained for RAG queries
{
    "thread_id": "...",
    "messages": [
        {"role": "customer", "content": "What's your pricing?", "timestamp": "..."},
        {"role": "agent", "content": "Thanks for asking! Our pricing...", "timestamp": "..."},
        {"role": "customer", "content": "Do you offer discounts?", "timestamp": "..."}
    ]
}
```
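
One way a reconstructed thread could feed a RAG query is by flattening its most recent messages into a transcript. This is a hedged sketch: `thread_to_context` and the `max_messages` cutoff are hypothetical, and LH42 may assemble context differently.

```python
def thread_to_context(thread: dict, max_messages: int = 10) -> str:
    """Render the latest messages of a thread as a role-prefixed transcript."""
    recent = thread["messages"][-max_messages:]  # messages are assumed to be in chronological order
    return "\n".join(f"{m['role']}: {m['content']}" for m in recent)


thread = {
    "thread_id": "thread_abc123",
    "messages": [
        {"role": "customer", "content": "What's your pricing?"},
        {"role": "agent", "content": "Thanks for asking! Our pricing..."},
        {"role": "customer", "content": "Do you offer discounts?"},
    ],
}
print(thread_to_context(thread, max_messages=2))
```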
Sync Frequency
| Mode | Frequency | Use Case |
|---|---|---|
| Webhook (Push) | Real-time | Immediate processing |
| IMAP (Pull) | Every 5 min | Fallback/batch |
| Manual | On-demand | Re-sync specific threads |
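
If the webhook mode is built on Gmail's own push notifications (an assumption; this page does not say how LH42 receives them), each notification arrives as a Cloud Pub/Sub push envelope whose `message.data` field is base64url-encoded JSON carrying `emailAddress` and `historyId`. A receiver could decode it as sketched here; `decode_gmail_notification` is a hypothetical name.

```python
import base64
import json


def decode_gmail_notification(envelope: dict) -> dict:
    """Decode a Pub/Sub push envelope carrying a Gmail mailbox-change notification."""
    data = envelope["message"]["data"]
    padded = data + "=" * (-len(data) % 4)  # tolerate stripped base64 padding
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return {
        "email": payload["emailAddress"],
        "history_id": int(payload["historyId"]),  # may arrive as a string or a number
    }


# Build a sample envelope the same way the payload is encoded on the wire.
raw = json.dumps({"emailAddress": "user@example.com", "historyId": "9876543210"}).encode()
envelope = {"message": {"data": base64.urlsafe_b64encode(raw).decode(), "messageId": "1"}}
print(decode_gmail_notification(envelope))
```

The notification only signals that the mailbox changed; the receiver still has to fetch the new messages, which is why a periodic pull remains useful as a fallback.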
Security & Privacy
- OAuth 2.0: Secure token-based authentication
- Minimal scopes: Only request necessary permissions
- Token encryption: Credentials encrypted at rest
- No password storage: Never stores email passwords
- Audit logging: All email access logged
Troubleshooting
| Issue | Solution |
|---|---|
| Emails not syncing | Check OAuth token validity |
| Missing threads | Verify label filter configuration |
| Attachment errors | Check file size limits |
| Auto-response not working | Verify trigger labels match |
Next Steps
- Outlook Integration - Similar setup for Microsoft
- Communication Overview - Architecture deep-dive