SharePoint Connector
Sync documents from SharePoint sites and document libraries into LH42.
Features
- Sites - Sync entire SharePoint sites
- Document libraries - Sync specific document libraries
- Office documents - Word, Excel, PowerPoint extraction
- PDFs - Full text extraction
- Metadata - SharePoint columns as searchable fields
- Incremental sync - Only changed files are re-indexed
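This page doesn't describe how the connector detects changes. One common approach is to remember each file's `eTag` and re-index only files whose tag is new or different. Microsoft Graph `driveItem` resources do expose an `eTag`. The sketch below assumes that approach, and `RemoteFile` and `plan_incremental_sync` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteFile:
    """A file as listed from SharePoint (hypothetical, trimmed-down shape)."""
    file_id: str
    etag: str  # changes whenever the file is modified


def plan_incremental_sync(remote_files, last_seen_etags):
    """Decide which files to re-index and which to drop from the index.

    last_seen_etags maps file_id -> eTag recorded at the previous sync
    (a hypothetical local state store).
    """
    current_ids = set()
    to_reindex = []
    for f in remote_files:
        current_ids.add(f.file_id)
        # New files (no recorded eTag) and changed files both get re-indexed.
        if last_seen_etags.get(f.file_id) != f.etag:
            to_reindex.append(f.file_id)
    # Files indexed previously that no longer appear in the listing are dropped.
    to_delete = sorted(set(last_seen_etags) - current_ids)
    return to_reindex, to_delete
```

For example, take a previous state of `{"a": "v1", "b": "v1", "c": "v1"}` and a listing of `a` (unchanged), `b` (changed to `v2`) and `d` (new). The function returns `(['b', 'd'], ['c'])`. Microsoft Graph also offers a delta query on drives that returns only changed items, which avoids listing every file on each run.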
Prerequisites
- Microsoft 365 subscription with SharePoint
- Azure AD app registration (or use the managed app)
- SharePoint site access
Setup
Step 1: Connect via OAuth
- Go to Settings > Integrations
- Find SharePoint and click Connect
- Sign in with your Microsoft account
- Grant the requested permissions:
  - Sites.Read.All - Read SharePoint sites
  - Files.Read.All - Read files in sites
> Note: Admin consent may be required for organization-wide access.
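OAuth sign-in to Microsoft 365 typically uses the OAuth 2.0 authorization code flow on the Microsoft identity platform. Below is a minimal sketch of the authorization URL that requests the two delegated permissions listed above. The `client_id` and `redirect_uri` values, the `organizations` tenant segment, and the helper name are illustrative assumptions, not LH42's actual values.

```python
from urllib.parse import urlencode

# Delegated Microsoft Graph permissions requested during sign-in (Step 1).
SCOPES = [
    "https://graph.microsoft.com/Sites.Read.All",
    "https://graph.microsoft.com/Files.Read.All",
]


def build_authorize_url(client_id, redirect_uri, state, tenant="organizations"):
    """Authorization-code request URL for the Microsoft identity platform (v2.0).

    client_id, redirect_uri and tenant are placeholders, not LH42's values.
    """
    params = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "scope": " ".join(SCOPES),
        # Random per-request value; the callback must return the same one (CSRF check).
        "state": state,
    }
    query = urlencode(params)
    return f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize?{query}"
```

After the user consents, Microsoft redirects to `redirect_uri` with `code` and `state` query parameters. The code is then exchanged for tokens at the matching `/oauth2/v2.0/token` endpoint.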
Step 2: Select Sites
After connecting, choose which sites to sync:
```python
# Assumes `client` is an authenticated LH42 client
client.connectors.configure("sharepoint", {
    "settings": {
        "site_id": "specific-site-id",  # Optional: limit sync to one site
        "site_url": "https://company.sharepoint.com/sites/TeamSite",
        "extract_content": True,        # Extract text from Office documents
        "max_file_size_mb": 50          # Skip files larger than 50 MB
    }
})
```
Options:
- Omit `site_id` to sync all accessible sites
- Set `"extract_content": False` for metadata-only indexing
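The hypothetical helpers below make the option semantics concrete by restating a `settings` dict. `sync_plan` and `within_size_limit` are not part of the LH42 SDK. Two assumptions are labeled in the code: a missing `extract_content` is treated as `True`, and a megabyte is taken as 1024 * 1024 bytes.

```python
MB = 1024 * 1024  # assumption: max_file_size_mb counts binary megabytes


def sync_plan(settings):
    """Restate a SharePoint `settings` dict as a readable plan (hypothetical helper)."""
    site_id = settings.get("site_id")
    limit_mb = settings.get("max_file_size_mb")
    return {
        # Omitting site_id widens the scope to every accessible site.
        "scope": site_id if site_id else "all accessible sites",
        # extract_content=False means metadata-only indexing;
        # treating a missing value as True is an assumption.
        "metadata_only": not settings.get("extract_content", True),
        "max_file_bytes": limit_mb * MB if limit_mb is not None else None,
    }


def within_size_limit(size_bytes, settings):
    """True if a file of `size_bytes` would not be skipped for being too large."""
    limit = sync_plan(settings)["max_file_bytes"]
    # Only files *over* the limit are skipped, so a file exactly at it passes.
    return limit is None or size_bytes <= limit
```

For example, `sync_plan({"extract_content": False, "max_file_size_mb": 50})` returns `{'scope': 'all accessible sites', 'metadata_only': True, 'max_file_bytes': 52428800}`.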
Step 3: Start Initial Sync
```python
client.connectors.sync("sharepoint", mode="full")
```
Supported File Types
| Type | Extensions | Processing |
|---|---|---|
| Word | .docx, .doc | Full text extraction |
| Excel | .xlsx, .xls | Cell content extraction |
| PowerPoint | .pptx, .ppt | Slide text extraction |
| PDF | .pdf | Full text extraction |
| Text | .txt, .md, .csv | Direct indexing |
| HTML | .html, .htm | Content extraction |
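The table can be expressed as a lookup, which is useful for checking ahead of time whether a file will be indexed. This is a sketch. `processing_for` is a hypothetical helper, and matching the final suffix case-insensitively is an assumption.

```python
from pathlib import PurePosixPath

# Extension -> processing, transcribed from the Supported File Types table.
PROCESSING = {
    ".docx": "Full text extraction", ".doc": "Full text extraction",
    ".xlsx": "Cell content extraction", ".xls": "Cell content extraction",
    ".pptx": "Slide text extraction", ".ppt": "Slide text extraction",
    ".pdf": "Full text extraction",
    ".txt": "Direct indexing", ".md": "Direct indexing", ".csv": "Direct indexing",
    ".html": "Content extraction", ".htm": "Content extraction",
}


def processing_for(filename):
    """Return the processing applied to `filename`, or None if unsupported."""
    # Case-insensitive match on the final suffix, e.g. "Report.DOCX" -> ".docx".
    return PROCESSING.get(PurePosixPath(filename).suffix.lower())
```

`processing_for("Q3 Report.DOCX")` returns `'Full text extraction'`. `processing_for("diagram.vsdx")` returns `None`, so that file would not be indexed.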
API Reference
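The endpoints in this section are plain HTTP calls. The sketch below builds, but does not send, a `urllib.request.Request` for each endpoint. It also picks out unsynced sites from a List Sites response. The base URL and the bearer-token `Authorization` header are assumptions, because this page documents neither the host nor the REST API's auth scheme. The helper names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://lh42.example.com"  # hypothetical host; replace with yours
API_TOKEN = "your-api-token"           # hypothetical; bearer auth is an assumption


def _headers():
    return {"Authorization": f"Bearer {API_TOKEN}", "Accept": "application/json"}


def _connector_url(connector_id, path):
    return f"{BASE_URL}/api/connectors/{quote(connector_id, safe='')}/{path}"


def list_sites_request(connector_id):
    """GET /api/connectors/{connector_id}/sites"""
    return Request(_connector_url(connector_id, "sites"), headers=_headers())


def list_libraries_request(connector_id, site_id):
    """GET /api/connectors/{connector_id}/libraries?site_id=..."""
    query = urlencode({"site_id": site_id})
    return Request(_connector_url(connector_id, f"libraries?{query}"), headers=_headers())


def sync_sites_request(connector_id, site_ids, mode="incremental"):
    """POST /api/connectors/{connector_id}/sync, limited to the given sites."""
    body = json.dumps({"mode": mode, "filters": {"site_ids": list(site_ids)}})
    headers = {**_headers(), "Content-Type": "application/json"}
    return Request(
        _connector_url(connector_id, "sync"),
        data=body.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def unsynced_site_ids(list_sites_response):
    """IDs of sites whose `synced` flag is false in a List Sites response."""
    return [site["id"] for site in list_sites_response["sites"] if not site.get("synced")]
```

Send a request with `urllib.request.urlopen`, for example `json.load(urlopen(list_sites_request(connector_id)))`. Then pass the result of `unsynced_site_ids(...)` to `sync_sites_request` to sync only those sites.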
List Sites
```bash
GET /api/connectors/{connector_id}/sites

# Response
{
  "sites": [
    {
      "id": "site-guid",
      "name": "Team Site",
      "url": "https://company.sharepoint.com/sites/Team",
      "document_count": 500,
      "synced": true
    }
  ]
}
```
List Libraries
```bash
GET /api/connectors/{connector_id}/libraries?site_id=site-guid
```
Sync Specific Site
```bash
POST /api/connectors/{connector_id}/sync
{
  "mode": "incremental",
  "filters": {
    "site_ids": ["site-guid"]
  }
}
```
Azure AD App Registration
For self-managed authentication:
- Go to Azure Portal → App Registrations
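- Create a new registration
- Add API permissions (as application permissions, because the client credentials flow runs without a signed-in user):
  - Microsoft Graph: Sites.Read.All
  - Microsoft Graph: Files.Read.All
- Create a client secret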
- Grant admin consent
- Configure in LH42:
```python
client.connectors.create("sharepoint", {
    "auth_type": "client_credentials",
    "credentials": {
        "tenant_id": "your-tenant-id",
        "client_id": "your-client-id",
        "client_secret": "your-client-secret"
    }
})
```
Permissions
SharePoint permissions are respected:
- Users only see documents they can access in SharePoint
- Permission changes sync within 1 hour
- Site-level and item-level permissions are supported
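This page doesn't document how LH42 enforces these rules internally. The sketch below models the general SharePoint behavior under simple assumptions. A document with unique (item-level) permissions uses its own access list; otherwise it inherits its site's list. A user sees a document if they, or one of their groups, appear in that list. The data shapes and function names are hypothetical.

```python
def effective_acl(doc, site_acls):
    """Principals allowed to read `doc` (hypothetical data model)."""
    # Item-level (unique) permissions take precedence; otherwise the
    # document inherits the permissions of its site.
    item_acl = doc.get("item_acl")
    return set(item_acl) if item_acl is not None else set(site_acls[doc["site_id"]])


def visible_documents(docs, site_acls, user, groups):
    """Paths of the documents `user` (directly or via `groups`) may read."""
    principals = {user, *groups}
    return [d["path"] for d in docs if principals & effective_acl(d, site_acls)]
```

Consider `site_acls = {"hr": {"hr-team"}, "eng": {"everyone"}}` and an `eng` document whose item-level list contains only `alice`. A user `bob` in group `everyone` sees the other `eng` documents but not that one, and sees nothing from `hr`.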
Sync Frequency
| Mode | Frequency | Use Case |
|---|---|---|
| Scheduled | Every 4 hours | Standard sync |
| On-demand | Manual trigger | Immediate updates |
| Webhook | Near real-time | Microsoft Graph subscriptions |
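Webhook mode relies on Microsoft Graph change notifications. LH42's subscription details aren't documented here, so what follows is a generic sketch covering two pieces. The first is the request body for `POST https://graph.microsoft.com/v1.0/subscriptions` that watches one document library (a drive, in Graph terms). The second is the validation handshake that a notification endpoint must answer. The helper names and the 48-hour expiry are illustrative choices.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs


def subscription_body(drive_id, notification_url, client_state, hours=48):
    """Body for POST /v1.0/subscriptions watching one document library."""
    expires = datetime.now(timezone.utc) + timedelta(hours=hours)
    return {
        "changeType": "updated",  # driveItem subscriptions support only "updated"
        "notificationUrl": notification_url,
        "resource": f"/drives/{drive_id}/root",
        "expirationDateTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Graph echoes this in every notification; compare it on receipt.
        "clientState": client_state,
    }


def validation_response(query_string):
    """Answer Graph's validation handshake.

    When the notification URL is called with ?validationToken=..., reply
    200 with the decoded token as the text/plain body.
    """
    token = parse_qs(query_string).get("validationToken")
    if token:
        return 200, "text/plain", token[0]
    return None  # not a validation request; treat it as a change notification
```

Graph subscriptions expire, so a real integration must also renew them before `expirationDateTime`.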
Troubleshooting
"Site not accessible" errors
- Verify you have access to the SharePoint site
- Check that the Azure AD app has the required permissions
Missing documents
- Verify the file type is supported
- Check that the file isn't over the size limit (`max_file_size_mb`)
Slow sync
- Large libraries may take longer
- Use site/library filters to limit scope
Next Steps
- Dropbox Connector - Connect Dropbox
- Integrations Overview - Architecture overview