A self-contained Python script for creating and managing custom datasources in Glean.
This tool provides both interactive and automated modes for datasource setup, configuration export/import, and template generation.
If you have uv installed, you can immediately run the CLI without downloading anything - no dependency installation steps required!
uv run https://raw.githubusercontent.com/nathancatania/glean-datasource-manager/refs/heads/main/manage.py
Usage: manage.py [OPTIONS] COMMAND [ARGS]...
Glean Custom Datasource Management Tool.
Manage Glean custom datasources with commands for setup, configuration, and
templates.
Options:
-v, --version Show the version and exit.
--help Show this message and exit.
Commands:
check-categories Show available datasource categories.
config Fetch datasource configuration from Glean.
generate-example-env Generate an example .env.setup.example file.
generate-object-template Generate a sample object_types.json...
generate-quicklinks-template Generate a sample quick_links.json template.
setup Setup a new Glean datasource.
Note
This project is not affiliated with Glean.
This is a personal project created and published to assist in the automation of configuring datasources in Glean.
You use this tool at your own risk! If it somehow deletes all of your company's data, burns down your house, and nukes your dog from orbit - it will be your own fault.
That being said, I don't recall adding any code that would do this... 🤷♂️
- Prerequisites
- Quick Start
- Commands
  - setup - Create or Update a Datasource
  - config - View and Export Datasource Configuration
  - generate-object-template - Create Object Types Template
  - generate-quicklinks-template - Create Quick Links Template
  - generate-example-env - Create Environment File Template
  - check-categories - List Datasource Categories
- Environment Variables
- Configuration Files
- Example Workflows
- Tips
- Troubleshooting
- After Setup
This script uses uv for Python package management:
# macOS
brew install uv
# Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
- Requires Python 3.13 or later
- Set the GLEAN_INDEXING_API_KEY and GLEAN_INSTANCE_NAME ENVs
  - Copy .env.setup.example and populate them, OR
  - Export them directly, e.g. export GLEAN_INDEXING_API_KEY=.....
  - Link: Creating Indexing API Tokens (developers.glean.com)
  - Link: Finding your Instance Name (developers.glean.com)
Tip
If your custom connector has an ID of backstage, then the API key must have the scope of backstage as well.
Tip
The Glean Instance Name is the part of your environment's Glean Backend Domain that comes before -be.glean.com.
E.g. If your domain is mycompany-prod-be.glean.com, then your Instance Name is mycompany-prod.
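The rule in the tip above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, it is not part of manage.py, and it assumes every backend domain has the form `<instance>-be.glean.com`, as in the example.

```python
def instance_name_from_backend_domain(domain: str) -> str:
    """Derive the instance name from a backend domain.

    Assumption: the backend domain always ends in '-be.glean.com'.
    """
    suffix = "-be.glean.com"
    host = domain.strip().lower()
    # Tolerate a pasted URL by dropping the scheme and any path.
    host = host.split("://", 1)[-1].split("/", 1)[0]
    if not host.endswith(suffix) or host == suffix:
        raise ValueError(f"not a Glean backend domain: {domain!r}")
    return host[: -len(suffix)]


print(instance_name_from_backend_domain("mycompany-prod-be.glean.com"))  # → mycompany-prod
```

The result is the value you would place in GLEAN_INSTANCE_NAME.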
- To set up a new custom connector, you will also need to populate the GLEAN_DATASOURCE_ID, GLEAN_DATASOURCE_DISPLAY_NAME and GLEAN_DATASOURCE_HOME_URL variables. For information on these, see the Environment Variables section below.
- You can either download the manage.py script on its own, or just point uv to the script on GitHub directly:

# Run the script (shows help)
uv run manage.py
# Run the script without downloading it
uv run https://raw.githubusercontent.com/nathancatania/glean-datasource-manager/refs/heads/main/manage.py
# Read config of custom datasource from Glean
uv run manage.py config <datasource_id>
# Export config for a custom datasource from Glean
# 💡 Tip: You can push this again to Glean using the 'setup' commands below!
uv run manage.py config <datasource_id> --save
# Interactive setup (recommended for first-time users)
uv run manage.py setup
# Silent setup (no confirmation prompts)
uv run manage.py setup --silent
Interactive or automated datasource creation.
# Interactive mode (default)
uv run manage.py setup
# Silent mode - no prompts, uses environment variables
uv run manage.py setup --silent
# Force overwrite existing datasource (requires --silent)
uv run manage.py setup --silent --force

Fetch configuration from an existing Glean datasource.
# View configuration
uv run manage.py config <datasource_id>
# Export configuration to files
uv run manage.py config <datasource_id> --save

When using --save, creates a directory called <datasource_id>-config containing:
- <datasource_id>.env - Environment configuration
- object_types.json - Object type definitions (if any)
- quick_links.json - Quick link definitions (if any)
- Icon files (extracted from data URLs)
This configuration can be immediately restored by using the setup command from within the directory containing the exported config. Just add the GLEAN_INDEXING_API_KEY to the <datasource_id>.env that was created.
Generate a sample object_types.json file with examples.
uv run manage.py generate-object-template

Generate a sample quick_links.json file with examples.
uv run manage.py generate-quicklinks-template

Generate a .env.setup.example file with all available variables.
uv run manage.py generate-example-env

Display available Glean datasource categories and their descriptions.
uv run manage.py check-categories

| Variable | Description |
|---|---|
| GLEAN_INDEXING_API_KEY | API key for Glean Indexing API |
| GLEAN_INSTANCE_NAME | Your Glean instance name (e.g., "mycompany") |

| Variable | Description |
|---|---|
| GLEAN_INDEXING_API_KEY | API key for Glean Indexing API |
| GLEAN_INSTANCE_NAME | Your Glean instance name, e.g. mycompany-prod |
| GLEAN_DATASOURCE_DISPLAY_NAME | Display name for the datasource (max 50 chars), e.g. The Hub |
| GLEAN_DATASOURCE_ID | Unique ID for the datasource (lowercase, alphanumeric + hyphens), e.g. intranet |
| GLEAN_DATASOURCE_HOME_URL | Main landing page for the app after the user logs in, e.g. https://intranet.example.com/dashboard |

| Variable | Description | Default |
|---|---|---|
| GLEAN_DATASOURCE_CATEGORY | Category that best describes the majority of the content in the datasource. See here, or use the check-categories command for a list. | KNOWLEDGE_HUB |
| GLEAN_DATASOURCE_URL_REGEX | URL pattern that will match every document indexed. This could be https://intranet.example.com/.*, or something more specific like https://intranet.example.com/app/content/.* | <base_home_url>/.* |
| GLEAN_DATASOURCE_ICON_FILENAME_LIGHTMODE | Path to light mode icon file. Must be either a PNG or SVG. The tool will automatically look for a file called icon-lightmode.png in the directory the script is run from. | icon-lightmode.png |
| GLEAN_DATASOURCE_ICON_URL_LIGHTMODE | Alternative to a local file: a URL that links to the light mode icon directly, e.g. https://example.com/lightmode-icon.svg | - |
| GLEAN_DATASOURCE_ICON_FILENAME_DARKMODE | Path to dark mode icon file. Must be either a PNG or SVG. The tool will automatically look for a file called icon-darkmode.png in the directory the script is run from. If not specified, the light mode icon is used for dark mode as well. | icon-darkmode.png |
| GLEAN_DATASOURCE_ICON_URL_DARKMODE | Alternative to a local file: a URL that links to the dark mode icon directly, e.g. https://example.com/darkmode-icon.svg | - |
| GLEAN_DATASOURCE_SUGGESTION_TEXT | Search suggestion text for this datasource that appears in the UI, e.g. "What would you like to search for in The Hub?" | Search for anything in <datasource_display_name>... |
| GLEAN_DATASOURCE_USER_REFERENCED_BY_EMAIL | Whether user identities in this datasource are referenced by email (or some other ID). E.g. when fetching a document, if a field like created_by is examined, will it have a user's email as a value, or some other ID representing the user? Examples: "created_by": "sam.sample@company.com" → Email; "created_by": "sam.sample" → ID; "created_by": "6de7677a-b68a-4183-83e4-a57c589f74e6" → ID | true |
| GLEAN_DATASOURCE_IS_TEST_MODE | Whether the datasource should be hidden from non-test users by default. Ranking signals are also turned off while in Test Mode. | true |
| GLEAN_DATASOURCE_TEST_USER_EMAILS | Comma-separated test user emails, e.g. user1@company.com,user2@company.com | - |
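
Before running setup, it can help to sanity-check the variables against the constraints listed in the tables above. The following is a minimal sketch and not how manage.py itself validates input: the function name is hypothetical, and the regular expression is only one reading of "lowercase, alphanumeric + hyphens".

```python
import re

# Variables the tables above list as required for the setup command.
REQUIRED_FOR_SETUP = (
    "GLEAN_INDEXING_API_KEY",
    "GLEAN_INSTANCE_NAME",
    "GLEAN_DATASOURCE_DISPLAY_NAME",
    "GLEAN_DATASOURCE_ID",
    "GLEAN_DATASOURCE_HOME_URL",
)


def check_setup_env(env):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = [f"{name} is not set" for name in REQUIRED_FOR_SETUP if not env.get(name)]

    datasource_id = env.get("GLEAN_DATASOURCE_ID", "")
    # Assumption: the ID rule corresponds to the pattern [a-z0-9-]+.
    if datasource_id and not re.fullmatch(r"[a-z0-9-]+", datasource_id):
        problems.append("GLEAN_DATASOURCE_ID must be lowercase alphanumeric characters or hyphens")

    display_name = env.get("GLEAN_DATASOURCE_DISPLAY_NAME", "")
    if len(display_name) > 50:
        problems.append("GLEAN_DATASOURCE_DISPLAY_NAME is longer than 50 characters")

    return problems
```

Calling `check_setup_env(dict(os.environ))` (after `import os`) checks the current shell environment.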
When creating a new custom datasource, the tool will automatically look for certain files in the same directory as where the script is being run to enhance the configuration of the datasource.
- Icons
- Object Types
- Quick Links
The script looks for icons in this order:
- File specified in GLEAN_DATASOURCE_ICON_FILENAME_* environment variable
- URL specified in GLEAN_DATASOURCE_ICON_URL_* environment variable
- Default files: icon-lightmode.png and icon-darkmode.png in current directory
Supported formats: PNG, SVG
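
The lookup order above can be illustrated with a short Python sketch. It is not the tool's actual implementation: the function name and return shape are hypothetical, an explicitly configured filename is returned without checking that the file exists, and the dark-mode fallback to the light-mode icon is not modelled.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_icon(mode: str, env: dict, cwd: Path) -> Optional[Tuple[str, str]]:
    """Resolve an icon for mode 'LIGHTMODE' or 'DARKMODE'.

    Returns ('file', path), ('url', url), or None when nothing is found.
    """
    # 1. An explicit filename from the environment wins.
    filename = env.get(f"GLEAN_DATASOURCE_ICON_FILENAME_{mode}")
    if filename:
        return ("file", filename)
    # 2. Otherwise fall back to an explicit URL.
    url = env.get(f"GLEAN_DATASOURCE_ICON_URL_{mode}")
    if url:
        return ("url", url)
    # 3. Finally, look for the default file in the working directory.
    default = cwd / f"icon-{mode.lower()}.png"
    if default.is_file():
        return ("file", str(default))
    return None
```

For example, `resolve_icon("DARKMODE", dict(os.environ), Path.cwd())` reports which dark mode icon source would be picked under these assumptions.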
Object types define the kinds of content that a document from your datasource can be classified as.
For example: 'Document', 'Folder', 'Message', 'Announcement', 'Article', 'Channel', etc.
These can be anything you want and should closely align with the types of content you have in the datasource that will be indexed.
Each object type can be rendered differently in Glean's search results. For example, you would want the "status" field to be shown in the search results for each "Ticket" object type, but for an "Article" object type, you would likely want the "created_at" and "author" metadata fields to be shown instead.
Any object types defined in object_types.json will automatically be pushed and associated with your custom datasource.
{
  "objectTypes": [
    {
      "name": "article",
      "display_label": "Article",
      "doc_category": "PUBLISHED_CONTENT",
      "summarizable": true,
      "property_definitions": [...],
      "property_groups": [...]
    },
    {
      "name": "announcement",
      "display_label": "Announcement",
      "doc_category": "PUBLISHED_CONTENT",
      "summarizable": true
    }
  ]
}

You can examine object_types_example.json for a more detailed example.
An indexed document will be associated with one of these object types. For example, consider an app like Slack:
- There are one or more Workspaces
- A Workspace has different Channels.
- A Channel contains different Messages.
- A Message may have threaded Replies.
- A Message or Reply may have a File Attachment.
When content is crawled, it will need to be categorised into one of the above object types as either a Workspace, Channel, Message, Reply, or Attachment.
Object types help Glean rank content more effectively, and establish a hierarchy between content, e.g. Reply → Message → Channel → Workspace
The minimum representation is as follows:
{
  "objectTypes": [
    {
      "name": "default",
      "display_label": "Default",
      "doc_category": "PUBLISHED_CONTENT",
      "summarizable": true
    }
  ]
}
Property definitions are only required if you plan to push custom properties for a specific type of document; i.e. additional metadata associated with an asset that doesn't map to any of the fields that Glean has available when indexing the content - e.g. language, publish_date, lifecycle, classifier, etc.
Property groups are always optional and serve to group certain property definitions together.
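
If you would rather generate object_types.json than hand-edit it, a small Python sketch like the one below writes the minimum representation and reads it back. The helper names are hypothetical and not part of manage.py. Generating the file with `json.dumps` avoids syntax slips such as trailing commas, which Python's `json` parser rejects.

```python
import json
from pathlib import Path

# The minimum representation described above, expressed as Python data.
MINIMAL_OBJECT_TYPES = {
    "objectTypes": [
        {
            "name": "default",
            "display_label": "Default",
            "doc_category": "PUBLISHED_CONTENT",
            "summarizable": True,
        }
    ]
}


def write_object_types(path, config=MINIMAL_OBJECT_TYPES):
    """Serialize config to path as strict JSON and return the text written."""
    text = json.dumps(config, indent=2) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text


def load_object_types(path):
    """Parse the file strictly; raises json.JSONDecodeError on invalid syntax."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Calling `write_object_types("object_types.json")` in the directory you run the tool from produces a file with a single "Default" object type.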
Quick links are optional and define one or more quick action links for your datasource. Typically these are added to the "New..." shortcut menu at the top-right of the Glean UI.
Structure:
{
  "quicklinks": [
    {
      "name": "Create Document",
      "short_name": "Document",
      "url": "https://myapp.com/app/new/document",
      "icon_config": {
        "icon_type": "URL",
        "url": "https://example.com/icon.png"
      },
      "scopes": [
        "APP_CARD",
        "NEW_TAB_PAGE",
        "AUTOCOMPLETE_FUZZY_MATCH",
        "AUTOCOMPLETE_ZERO_QUERY",
        "AUTOCOMPLETE_EXACT_MATCH"
      ]
    }
  ]
}
- Generate configuration templates
  # Generate templates
  uv run manage.py generate-example-env
  uv run manage.py generate-object-template
  uv run manage.py generate-quicklinks-template
- Update the .env with your values.
- Find a PNG icon for your datasource. Name it icon-lightmode.png. Optionally, do the same with a separate icon for the Dark Mode UI: icon-darkmode.png.
- Categorize your content in the object_types.json file (or just create a single "Default" object type).
- (Optional) Add any shortcut links you would like to appear in the "New..." menu in the Glean UI to quick_links.json
- Run the interactive setup
  # Run interactive setup
  uv run https://raw.githubusercontent.com/nathancatania/glean-datasource-manager/refs/heads/main/manage.py setup
- Check the Glean UI for your new custom datasource (Admin Settings > Datasources), or use the tool to check the config:
  # Check the config you just pushed
  uv run https://raw.githubusercontent.com/nathancatania/glean-datasource-manager/refs/heads/main/manage.py config <datasource_id>

# Set all required environment variables
export GLEAN_INDEXING_API_KEY="your-api-key"
export GLEAN_INSTANCE_NAME="mycompany"
export GLEAN_DATASOURCE_DISPLAY_NAME="My Datasource"
export GLEAN_DATASOURCE_ID="my-datasource"
export GLEAN_DATASOURCE_HOME_URL="https://myapp.com"
# Run silent setup
uv run manage.py setup --silent --force

# Export existing datasource configuration
uv run manage.py config my-datasource-id --save
# This creates my-datasource-id-config/ directory with all settings
# To recreate on another system:
cd my-datasource-id-config/
cp my-datasource-id.env .env
# Add your API key to .env
uv run https://raw.githubusercontent.com/nathancatania/glean-datasource-manager/refs/heads/main/manage.py setup --silent

# Set test mode variables
export GLEAN_DATASOURCE_IS_TEST_MODE=true
export GLEAN_DATASOURCE_TEST_USER_EMAILS="user1@company.com,user2@company.com"
# Create datasource in test mode
uv run https://raw.githubusercontent.com/nathancatania/glean-datasource-manager/refs/heads/main/manage.py setup

- Icon Requirements: Icons should be square and at least 256x256 pixels for best results.
- Datasource IDs: Once created, datasource IDs cannot be changed. Choose carefully.
- URL Regex: The URL pattern determines which documents belong to your datasource.
  - By default, this is set to the base URL specified for the GLEAN_DATASOURCE_HOME_URL
    - E.g. If this is set to https://myapp.com/dashboard, the URL regex will automatically be set to https://myapp.com/.*.
  - If you wish to modify this yourself, the regex MUST encompass every URL for every document/asset indexed. If you are confident that all indexed assets will have the same path prefix, you can optionally make this more specific, e.g. https://myapp.com/app/content/.*
- Test Mode: Always start in test mode to verify your configuration before making the datasource available to all users.
  [!IMPORTANT] Ranking signals are disabled in Test Mode!
  While the datasource is in test mode, Glean search results for its content will not be indicative of how they would be ranked once the datasource is made live.
- Categories: Use check-categories to see available categories and choose the most appropriate one for your content type.

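
The default described in the URL Regex tip above can be sketched in Python as follows. The function name is hypothetical and this is not taken from manage.py. It leaves the dots in the hostname unescaped, as the document's own examples do, so `.` matches any character. Using `re.fullmatch` to test a URL is also an assumption about how the pattern is applied.

```python
import re
from urllib.parse import urlsplit


def default_url_regex(home_url: str) -> str:
    """Build '<scheme>://<host>/.*' from the datasource home URL."""
    parts = urlsplit(home_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {home_url!r}")
    return f"{parts.scheme}://{parts.netloc}/.*"


pattern = default_url_regex("https://myapp.com/dashboard")
print(pattern)  # → https://myapp.com/.*

# Every indexed document URL should match the pattern.
print(bool(re.fullmatch(pattern, "https://myapp.com/app/content/42")))  # → True
```

A quick check like this against a sample of your document URLs is a cheap way to confirm that a custom GLEAN_DATASOURCE_URL_REGEX covers everything you plan to index.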
- Ensure icon files exist in the current directory or specified path
- Check file extensions match (.png or .svg)
- Use absolute paths in environment variables
- Use --force flag with --silent to overwrite
- Or delete the datasource in Glean UI first
- Verify your API key has indexing permissions
- Check instance name matches your Glean Backend Domain (e.g., if URL is mycompany-be.glean.com, instance is mycompany)
- Your Indexing API token MUST have the right scope assigned to it.
  - To be able to read/write a custom datasource's config, the scope should be set to the ID for that datasource.
  - E.g. For a datasource ID of mycompanyapp, the Indexing API key must also have the scope mycompanyapp.
Once your datasource is created, you can:
- View it in Glean UI at: https://app.glean.com/admin/setup/apps/custom/<datasource_id>
- Start indexing documents using the Glean Indexing API
- Manage test users and permissions in the Glean admin interface


