Environment Variables

This page summarizes the most important environment variables that control how DataHub behaves.


DataHub Java Components

This section covers GMS, System Update, and the MAE/MCE Consumers.

Authentication & Authorization

Authentication Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| METADATA_SERVICE_AUTH_ENABLED | true | Enable if you want all requests to the Metadata Service to be authenticated | GMS, MAE Consumer, MCE Consumer, PE Consumer, Frontend |
| DATAHUB_SYSTEM_CLIENT_SECRET |  | System client secret used by AuthServiceController | GMS, MAE Consumer, MCE Consumer, PE Consumer, Actions, Frontend |
| METADATA_SERVICE_AUTHENTICATOR_EXCEPTIONS_ENABLED | false | Normally, authenticator failures are only logged as warnings; enable this to throw them as exceptions | GMS |
| DATAHUB_TOKEN_SERVICE_SIGNING_KEY |  | Key used to validate incoming tokens and sign new tokens | GMS |
| DATAHUB_TOKEN_SERVICE_SALT |  | Salt used for token validation and signing | GMS |
| DATAHUB_TOKEN_SERVICE_SIGNING_ALGORITHM | HS256 | Signing algorithm for DataHub tokens | GMS |
| SESSION_TOKEN_DURATION_MS | 86400000 | Maximum duration of a UI session in milliseconds (defaults to 1 day) | GMS |
| GUEST_AUTHENTICATION_USER | guest | Guest user for unauthenticated access | GMS |
| GUEST_AUTHENTICATION_ENABLED | false | Enable guest authentication | GMS |
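
Millisecond values such as SESSION_TOKEN_DURATION_MS are easy to misread. The following is a minimal sketch for sanity-checking such a value before deploying; the helper name `session_duration` is hypothetical and not part of DataHub.

```python
from datetime import timedelta


def session_duration(raw_ms: str = "86400000") -> timedelta:
    """Convert a millisecond string (as it would appear in the environment) to a timedelta.

    The default mirrors the documented default of SESSION_TOKEN_DURATION_MS.
    """
    return timedelta(milliseconds=int(raw_ms))


if __name__ == "__main__":
    # Print the documented default as a human-readable duration.
    print(session_duration())
```

For example, a value of 3600000 would correspond to a one-hour session.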

Authorization Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| AUTH_POLICIES_ENABLED | true | Enable the default DataHub policies-based authorizer | GMS |
| POLICY_CACHE_REFRESH_INTERVAL_SECONDS | 120 | Cache refresh interval for policies in seconds | GMS |
| POLICY_CACHE_FETCH_SIZE | 1000 | Cache policy fetch size | GMS |
| REST_API_AUTHORIZATION_ENABLED | true | Enable authorization of reads, writes, and deletes on REST APIs | GMS |
| VIEW_AUTHORIZATION_ENABLED | false | Controls whether entity pages can limit access based on policies | GMS |
| VIEW_AUTHORIZATION_RECOMMENDATIONS_PEER_GROUP_ENABLED | true | Enable peer group recommendations for view authorization | GMS |

Ingestion Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| UI_INGESTION_ENABLED | true | Enable UI-based ingestion | GMS, MAE Consumer |
| INGESTION_BATCH_REFRESH_COUNT | 100 | Number of entities to refresh in a single batch when refreshing entities after ingestion | GMS |
| INGESTION_SOURCE_REFRESH_INTERVAL_SECONDS | 43200 | Interval at which the ingestion source scheduler will check for new or updated ingestion sources | GMS |

Telemetry & Analytics

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| INGESTION_REPORTING_ENABLED | false | Enable ingestion reporting | GMS |
| ENABLE_THIRD_PARTY_LOGGING | false | Whether Mixpanel tracking is enabled | GMS |

DataHub Core Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_SERVER_TYPE | prod | DataHub server type | GMS |
| DATAHUB_GMS_ASYNC_REQUEST_TIMEOUT_MS | 55000 | Async request timeout for GMS | GMS |
| DATAHUB_GMS_HOST | localhost | GMS host | Frontend |
| DATAHUB_GMS_PORT | 8080 | GMS port | Frontend |
| DATAHUB_GMS_USE_SSL | false | Use SSL for GMS connections | Frontend |
| DATAHUB_GMS_URI | null | URI instead of separate host/port/ssl parameters (takes priority) | Frontend |
| DATAHUB_GMS_SSL_PROTOCOL | null | SSL protocol for GMS | Frontend |
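
The precedence between DATAHUB_GMS_URI and the separate host, port, and SSL variables can be sketched as follows. This is an illustration of the documented rule ("takes priority"), not the Frontend's actual code; the function name `resolve_gms_base_url` and the trailing-slash handling are assumptions.

```python
import os
from typing import Mapping, Optional


def resolve_gms_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the GMS base URL, preferring DATAHUB_GMS_URI when it is set."""
    env = os.environ if env is None else env

    # The URI variable, if present and non-empty, wins over host/port/ssl.
    uri = env.get("DATAHUB_GMS_URI")
    if uri:
        return uri.rstrip("/")  # assumption: normalize away a trailing slash

    # Otherwise fall back to the individual variables and their documented defaults.
    host = env.get("DATAHUB_GMS_HOST", "localhost")
    port = env.get("DATAHUB_GMS_PORT", "8080")
    use_ssl = env.get("DATAHUB_GMS_USE_SSL", "false").strip().lower() == "true"
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    print(resolve_gms_base_url())
```

Passing a plain dictionary instead of the process environment makes it easy to check a planned configuration before applying it.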

Plugin Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| PLUGIN_SECURITY_MODE | RESTRICTED | Plugin security mode (RESTRICTED or LENIENT) | GMS |
| ENTITY_REGISTRY_PLUGIN_PATH | /etc/datahub/plugins/models | Path for entity registry plugins | GMS |
| ENTITY_REGISTRY_PLUGIN_LOAD_DELAY_SECONDS | 60 | Rate at which plugin runnable executes | GMS |
| RETENTION_PLUGIN_PATH | /etc/datahub/plugins/retention | Path for retention plugins | GMS |
| AUTH_PLUGIN_PATH | /etc/datahub/plugins/auth | Path for auth plugins | GMS |

Metrics Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_METRICS_HOOK_LATENCY_PERCENTILES | 0.5,0.95,0.99,0.999 | Hook latency percentiles | GMS, MAE Consumer |
| DATAHUB_METRICS_HOOK_LATENCY_SERVICE_LEVEL_OBJECTIVES | 300,1800,3000,10800,21600,43200 | Hook latency SLOs in seconds | GMS, MAE Consumer |
| DATAHUB_METRICS_HOOK_LATENCY_MAX_EXPECTED_VALUE | 86000 | Maximum expected hook latency value in seconds | GMS, MAE Consumer |
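
The percentile and SLO variables are comma-separated lists of numbers. A small sketch for parsing them and reading the default SLO buckets in minutes is shown below; `parse_number_list` is a hypothetical helper, and how DataHub itself parses these values is not specified here.

```python
from typing import List


def parse_number_list(raw: str) -> List[float]:
    """Parse a comma-separated list such as '0.5,0.95,0.99' into floats.

    Empty items (for example from a trailing comma) are skipped.
    """
    return [float(item) for item in raw.split(",") if item.strip()]


if __name__ == "__main__":
    percentiles = parse_number_list("0.5,0.95,0.99,0.999")
    slos_seconds = parse_number_list("300,1800,3000,10800,21600,43200")
    # Express each SLO bucket boundary in minutes for readability.
    slos_minutes = [seconds / 60 for seconds in slos_seconds]
    print(percentiles)
    print(slos_minutes)
```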

Entity Service Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ENTITY_SERVICE_IMPL | ebean | Entity service implementation | GMS, MCE Consumer |
| ENTITY_SERVICE_ENABLE_RETENTION | true | Enable entity retention | GMS, MCE Consumer |
| ENTITY_SERVICE_APPLY_RETENTION_BOOTSTRAP | false | Apply retention on bootstrap | GMS, MCE Consumer |

Graph Service Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| GRAPH_SERVICE_IMPL | elasticsearch | Graph service implementation | GMS, MAE Consumer |
| GRAPH_SERVICE_LIMIT_RESULTS_MAX | 10000 | Maximum allowed result count for queries | GMS |
| GRAPH_SERVICE_LIMIT_RESULTS_API_DEFAULT | 5000 | Default API result limit | GMS |
| GRAPH_SERVICE_LIMIT_RESULTS_STRICT | false | Throw exception if strict is true, otherwise override with default and warn | GMS |
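
The three LIMIT_RESULTS variables (MAX, API_DEFAULT, STRICT) recur across several services. One plausible reading of how they interact is sketched below; this is an interpretation of the table descriptions, not DataHub's implementation, and the function name `resolve_result_limit` is hypothetical.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_result_limit(
    requested: Optional[int],
    max_limit: int = 10000,    # e.g. GRAPH_SERVICE_LIMIT_RESULTS_MAX
    api_default: int = 5000,   # e.g. GRAPH_SERVICE_LIMIT_RESULTS_API_DEFAULT
    strict: bool = False,      # e.g. GRAPH_SERVICE_LIMIT_RESULTS_STRICT
) -> int:
    """Return the result count to use for a query (assumed semantics)."""
    if requested is None:
        # No explicit count requested: use the API default.
        return api_default
    if requested <= max_limit:
        return requested
    if strict:
        # Strict mode: reject requests above the maximum.
        raise ValueError(f"Requested {requested} results, maximum is {max_limit}")
    # Lenient mode: warn and fall back to the default.
    logger.warning("Requested %d results exceeds maximum %d; using default %d",
                   requested, max_limit, api_default)
    return api_default
```

Under these assumptions, a request for 20000 results returns 5000 with a warning when strict is false, and raises an error when strict is true.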

Search Service Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SEARCH_SERVICE_BATCH_SIZE | 100 | Search service batch size | GMS |
| SEARCH_SERVICE_ENABLE_CACHE | false | Enable search service cache | GMS |
| SEARCH_SERVICE_ENABLE_CACHE_EVICTION | false | Enable search service cache eviction | GMS |
| SEARCH_SERVICE_CACHE_IMPLEMENTATION | caffeine | Search service cache implementation | GMS |
| SEARCH_SERVICE_HAZELCAST_SERVICE_NAME | hazelcast-service | Hazelcast service name for search cache | GMS |
| SEARCH_SERVICE_FILTER_CONTAINER_EXPANSION_ENABLED | true | Enable container expansion in search filters | GMS |
| SEARCH_SERVICE_FILTER_CONTAINER_EXPANSION_PAGE_SIZE | 100 | Page size for container expansion | GMS |
| SEARCH_SERVICE_FILTER_CONTAINER_EXPANSION_LIMIT | 100 | Limit for container expansion | GMS |
| SEARCH_SERVICE_FILTER_DOMAIN_EXPANSION_ENABLED | true | Enable domain expansion in search filters | GMS |
| SEARCH_SERVICE_FILTER_DOMAIN_EXPANSION_PAGE_SIZE | 100 | Page size for domain expansion | GMS |
| SEARCH_SERVICE_FILTER_DOMAIN_EXPANSION_LIMIT | 100 | Limit for domain expansion | GMS |
| SEARCH_SERVICE_LIMIT_RESULTS_MAX | 10000 | Maximum allowed result count for queries | GMS |
| SEARCH_SERVICE_LIMIT_RESULTS_API_DEFAULT | 5000 | Default API result limit | GMS |
| SEARCH_SERVICE_LIMIT_RESULTS_STRICT | false | Throw exception if strict is true, otherwise override with default and warn | GMS |

Timeseries Aspect Service

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| TIMESERIES_ASPECT_SERVICE_QUERY_CONCURRENCY | 10 | Parallel threads for timeseries queries | GMS |
| TIMESERIES_ASPECT_SERVICE_QUERY_QUEUE_SIZE | 500 | Queue size for timeseries queries | GMS |
| TIMESERIES_ASPECT_SERVICE_QUERY_THREAD_KEEP_ALIVE | 60 | Thread keep alive time for timeseries queries | GMS |
| TIMESERIES_ASPECT_SERVICE_LIMIT_RESULTS_MAX | 10000 | Maximum allowed result count for queries | GMS |
| TIMESERIES_ASPECT_SERVICE_LIMIT_RESULTS_API_DEFAULT | 5000 | Default API result limit | GMS |
| TIMESERIES_ASPECT_SERVICE_LIMIT_RESULTS_STRICT | false | Throw exception if strict is true, otherwise override with default and warn | GMS |

System Metadata Service

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SYSTEM_METADATA_SERVICE_LIMIT_RESULTS_MAX | 10000 | Maximum allowed result count for queries | GMS |
| SYSTEM_METADATA_SERVICE_LIMIT_RESULTS_API_DEFAULT | 5000 | Default API result limit | GMS |
| SYSTEM_METADATA_SERVICE_LIMIT_RESULTS_STRICT | false | Throw exception if strict is true, otherwise override with default and warn | GMS |

Platform Analytics

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_ANALYTICS_ENABLED | true | Enable platform analytics | GMS, MAE Consumer, Frontend |
| DATAHUB_ANALYTICS_TRACING_ENABLED | true | Enable backend usage tracing | GMS |
| ANALYTICS_DATAHUB_USAGE_EVENT_TYPES | CreateAccessTokenEvent,CreatePolicyEvent,UpdatePolicyEvent,CreateIngestionSourceEvent,UpdateIngestionSourceEvent,RevokeAccessTokenEvent,CreateUserEvent,UpdateUserEvent,DeletePolicyEvent | Comma-separated list of usage event types to listen to | GMS |
| ANALYTICS_GENERIC_ASPECT_TYPES | (empty) | Filter list for generic aspect events | GMS |
| ANALYTICS_USER_FILTERS | (empty) | Filter out specific users' events from being published | GMS |

Visual Configuration

Queries Tab

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| REACT_APP_QUERIES_TAB_RESULT_SIZE | 5 | Queries tab result size (experimental) | Frontend |

Theme Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| REACT_APP_CUSTOM_THEME_ID | (empty) | Custom theme ID for rendering specific theme file | Frontend |

Assets Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| REACT_APP_LOGO_URL | /assets/platforms/datahublogo.png | Logo URL for the application | Frontend |
| REACT_APP_FAVICON_URL | /assets/icons/favicon.ico | Favicon URL for the application | Frontend |
| REACT_APP_TITLE | (empty) | Application title | Frontend |

UI Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| REACT_APP_HIDE_GLOSSARY | false | Hide glossary in the UI | Frontend |
| REACT_APP_SHOW_FULL_TITLE_IN_LINEAGE | false | Show full title in lineage | Frontend |
| DOMAIN_DEFAULT_TAB | (empty) | Default tab for domains (set to DOCUMENTATION_TAB to show documentation tab first) | Frontend |
| APPLICATION_SHOW_SIDEBAR_SECTION_WHEN_EMPTY | false | Show sidebar section when empty (deprecated) | Frontend |
| SEARCH_RESULT_NAME_HIGHLIGHT_ENABLED | true | Enable visual highlighting on search result names/descriptions | Frontend |

Storage Layer Configuration

EBean Configuration (MySQL/PostgreSQL)

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| EBEAN_DATASOURCE_USERNAME | datahub | Database username | GMS, MCE Consumer, System Update |
| EBEAN_DATASOURCE_PASSWORD | datahub | Database password | GMS, MCE Consumer, System Update |
| EBEAN_DATASOURCE_URL | jdbc:mysql://localhost:3306/datahub | JDBC URL | GMS, MCE Consumer, System Update |
| EBEAN_DATASOURCE_DRIVER | com.mysql.jdbc.Driver | JDBC Driver | GMS, MCE Consumer, System Update |
| EBEAN_MIN_CONNECTIONS | 2 | Minimum database connections | GMS, MCE Consumer, System Update |
| EBEAN_MAX_CONNECTIONS | 50 | Maximum database connections | GMS, MCE Consumer, System Update |
| EBEAN_MAX_INACTIVE_TIME_IN_SECS | 120 | Maximum inactive time in seconds | GMS, MCE Consumer, System Update |
| EBEAN_MAX_AGE_MINUTES | 120 | Maximum age in minutes | GMS, MCE Consumer, System Update |
| EBEAN_LEAK_TIME_MINUTES | 15 | Leak time in minutes | GMS, MCE Consumer, System Update |
| EBEAN_WAIT_TIMEOUT_MILLIS | 1000 | Wait timeout in milliseconds | GMS, MCE Consumer, System Update |
| EBEAN_AUTOCREATE | false | Auto-create DDL | GMS, MCE Consumer, System Update |
| EBEAN_POSTGRES_USE_AWS_IAM_AUTH | false | Use AWS IAM authentication for PostgreSQL | GMS, MCE Consumer, System Update |
| EBEAN_BATCH_GET_METHOD | IN | Batch get method (IN or UNION) | GMS, MCE Consumer, System Update |

Cassandra Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| CASSANDRA_DATASOURCE_USERNAME | cassandra | Cassandra username | GMS, MCE Consumer, System Update |
| CASSANDRA_DATASOURCE_PASSWORD | cassandra | Cassandra password | GMS, MCE Consumer, System Update |
| CASSANDRA_HOSTS | cassandra | Cassandra hosts | GMS, MCE Consumer, System Update |
| CASSANDRA_PORT | 9042 | Cassandra port | GMS, MCE Consumer, System Update |
| CASSANDRA_DATACENTER | datacenter1 | Cassandra datacenter | GMS, MCE Consumer, System Update |
| CASSANDRA_KEYSPACE | datahub | Cassandra keyspace | GMS, MCE Consumer, System Update |
| CASSANDRA_USE_SSL | false | Use SSL for Cassandra | GMS, MCE Consumer, System Update |

Elasticsearch Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_HOST | localhost | Elasticsearch host | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_PORT | 9200 | Elasticsearch port | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_THREAD_COUNT | 2 | Elasticsearch thread count | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_CONNECTION_REQUEST_TIMEOUT | 5000 | Connection request timeout | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_USERNAME | null | Elasticsearch username | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_PASSWORD | null | Elasticsearch password | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_PATH_PREFIX | null | Elasticsearch path prefix | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_USE_SSL | false | Use SSL for Elasticsearch | GMS, MAE Consumer, MCE Consumer, System Update |
| OPENSEARCH_USE_AWS_IAM_AUTH | false | Use AWS IAM authentication for OpenSearch | GMS, MAE Consumer, MCE Consumer, System Update |
| AWS_REGION | null | AWS region | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_IMPLEMENTATION | elasticsearch | Implementation (elasticsearch or opensearch) | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTIC_ID_HASH_ALGO | MD5 | ID hash algorithm | GMS, MAE Consumer, MCE Consumer, System Update |

SSL Context Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_SSL_PROTOCOL | null | SSL protocol | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_SECURE_RANDOM_IMPL | null | SSL secure random implementation | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_TRUSTSTORE_FILE | null | SSL truststore file | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_TRUSTSTORE_TYPE | null | SSL truststore type | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_TRUSTSTORE_PASSWORD | null | SSL truststore password | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_KEYSTORE_FILE | null | SSL keystore file | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_KEYSTORE_TYPE | null | SSL keystore type | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_KEYSTORE_PASSWORD | null | SSL keystore password | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_KEY_PASSWORD | null | SSL key password | GMS, MAE Consumer, MCE Consumer, System Update |

Bulk Operations Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ES_BULK_DELETE_BATCH_SIZE | 5000 | Bulk delete batch size | GMS, MAE Consumer |
| ES_BULK_DELETE_SLICES | auto | Bulk delete slices | GMS, MAE Consumer |
| ES_BULK_DELETE_POLL_INTERVAL | 30 | Bulk delete poll interval | GMS, MAE Consumer |
| ES_BULK_DELETE_POLL_UNIT | SECONDS | Bulk delete poll unit | GMS, MAE Consumer |
| ES_BULK_DELETE_TIMEOUT | 30 | Bulk delete timeout | GMS, MAE Consumer |
| ES_BULK_DELETE_TIMEOUT_UNIT | MINUTES | Bulk delete timeout unit | GMS, MAE Consumer |
| ES_BULK_DELETE_NUM_RETRIES | 3 | Bulk delete number of retries | GMS, MAE Consumer |
| ES_BULK_ASYNC | true | Enable async bulk operations | GMS, MAE Consumer |
| ES_BULK_REQUESTS_LIMIT | 1000 | Bulk requests limit | GMS, MAE Consumer |
| ES_BULK_FLUSH_PERIOD | 1 | Bulk flush period | GMS, MAE Consumer |
| ES_BULK_NUM_RETRIES | 3 | Bulk number of retries | GMS, MAE Consumer |
| ES_BULK_RETRY_INTERVAL | 1 | Bulk retry interval | GMS, MAE Consumer |
| ES_BULK_REFRESH_POLICY | NONE | Bulk refresh policy | GMS, MAE Consumer |
| ES_BULK_ENABLE_BATCH_DELETE | false | Enable batch delete | GMS, MAE Consumer |
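
The bulk delete poll interval and timeout are each split into a value and a unit. The sketch below combines them into durations to show what the defaults amount to. It assumes the unit strings follow Java `TimeUnit`-style names and maps only a subset of them; the helper `to_timedelta` is hypothetical.

```python
from datetime import timedelta

# Assumed mapping from unit names (as in ES_BULK_DELETE_POLL_UNIT) to timedelta keywords.
_UNIT_TO_KWARG = {
    "MILLISECONDS": "milliseconds",
    "SECONDS": "seconds",
    "MINUTES": "minutes",
    "HOURS": "hours",
    "DAYS": "days",
}


def to_timedelta(value: int, unit: str) -> timedelta:
    """Combine a numeric value and a unit name into a timedelta."""
    return timedelta(**{_UNIT_TO_KWARG[unit.upper()]: value})


if __name__ == "__main__":
    poll = to_timedelta(30, "SECONDS")     # ES_BULK_DELETE_POLL_INTERVAL / _POLL_UNIT
    timeout = to_timedelta(30, "MINUTES")  # ES_BULK_DELETE_TIMEOUT / _TIMEOUT_UNIT
    # Number of whole poll intervals that fit into the timeout window.
    print(timeout // poll)
```

With the documented defaults, the timeout window spans 60 poll intervals.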

Index Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| INDEX_PREFIX | (empty) | Index prefix | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_INDEX_DOC_IDS_SCHEMA_FIELD_HASH_ID_ENABLED | false | Enable hash ID for schema field doc IDs | GMS, MAE Consumer, MCE Consumer, System Update |

Build Indices Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_BUILD_INDICES_ALLOW_DOC_COUNT_MISMATCH | false | Allow document count mismatch when clone indices is enabled | System Update |
| ELASTICSEARCH_BUILD_INDICES_CLONE_INDICES | true | Clone indices | System Update |
| ELASTICSEARCH_BUILD_INDICES_RETENTION_UNIT | DAYS | Retention unit for indices | System Update |
| ELASTICSEARCH_BUILD_INDICES_RETENTION_VALUE | 60 | Retention value for indices | System Update |
| ELASTICSEARCH_BUILD_INDICES_REINDEX_OPTIMIZATION_ENABLED | true | Enable reindex optimization | System Update |
| ELASTICSEARCH_NUM_SHARDS_PER_INDEX | 1 | Number of shards per index | System Update |
| ELASTICSEARCH_NUM_REPLICAS_PER_INDEX | 1 | Number of replicas per index | System Update |
| ELASTICSEARCH_INDEX_BUILDER_NUM_RETRIES | 3 | Index builder number of retries | System Update |
| ELASTICSEARCH_INDEX_BUILDER_REFRESH_INTERVAL_SECONDS | 3 | Index builder refresh interval | System Update |
| SEARCH_DOCUMENT_MAX_ARRAY_LENGTH | 1000 | Maximum array length in search documents | System Update |
| SEARCH_DOCUMENT_MAX_OBJECT_KEYS | 1000 | Maximum object keys in search documents | System Update |
| SEARCH_DOCUMENT_MAX_VALUE_LENGTH | 4096 | Maximum value length in search documents | System Update |
| ELASTICSEARCH_MAIN_TOKENIZER | null | Main tokenizer | System Update |
| ELASTICSEARCH_INDEX_BUILDER_MAPPINGS_REINDEX | false | Enable mappings reindex | System Update |
| ELASTICSEARCH_INDEX_BUILDER_SETTINGS_REINDEX | false | Enable settings reindex | System Update |
| ELASTICSEARCH_INDEX_BUILDER_MAX_REINDEX_HOURS | 0 | Maximum reindex hours (0 = no timeout) | System Update |
| ELASTICSEARCH_INDEX_BUILDER_SETTINGS_OVERRIDES | null | Index builder settings overrides | System Update |
| ELASTICSEARCH_MIN_SEARCH_FILTER_LENGTH | 3 | Minimum search filter length | System Update |
| ELASTICSEARCH_INDEX_BUILDER_ENTITY_SETTINGS_OVERRIDES | null | Entity settings overrides | System Update |

Search Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_QUERY_MAX_TERM_BUCKET_SIZE | 60 | Maximum term bucket size | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_EXCLUSIVE | false | Only return exact matches when using quotes | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_WITH_PREFIX | true | Include prefix match in exact match results | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_FACTOR | 16.0 | Boost multiplier on a true exact match | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_PREFIX_FACTOR | 1.1 | Boost multiplier on a prefix match | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_CASE_FACTOR | 0.0 | Stacked boost multiplier when case mismatch | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_ENABLE_STRUCTURED | true | Enable exact match on structured search | GMS |
| ELASTICSEARCH_QUERY_TWO_GRAM_FACTOR | 1.2 | Boost multiplier when match on 2-gram tokens | GMS |
| ELASTICSEARCH_QUERY_THREE_GRAM_FACTOR | 1.5 | Boost multiplier when match on 3-gram tokens | GMS |
| ELASTICSEARCH_QUERY_FOUR_GRAM_FACTOR | 1.8 | Boost multiplier when match on 4-gram tokens | GMS |
| ELASTICSEARCH_QUERY_PARTIAL_URN_FACTOR | 0.5 | Multiplier on Urn token match | GMS |
| ELASTICSEARCH_QUERY_PARTIAL_FACTOR | 0.4 | Multiplier on possible non-Urn token match | GMS |
| ELASTICSEARCH_QUERY_CUSTOM_CONFIG_ENABLED | true | Enable search query and ranking customization | GMS |
| ELASTICSEARCH_QUERY_CUSTOM_CONFIG_FILE | search_config.yaml | Location of search customization configuration | GMS |
| ELASTICSEARCH_QUERY_SEARCH_FIELD_CONFIG_DEFAULT | legacy | Default field configuration for search | GMS |
| ELASTICSEARCH_QUERY_AUTOCOMPLETE_FIELD_CONFIG_DEFAULT | legacy | Default field configuration for autocomplete | GMS |

Graph Search Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_SEARCH_GRAPH_TIMEOUT_SECONDS | 50 | Graph DAO timeout seconds | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_BATCH_SIZE | 1000 | Graph DAO batch size | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_MULTI_PATH_SEARCH | false | Allow path retraversal for all paths | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_BOOST_VIA_NODES | true | Boost graph edges with via nodes | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_STATUS_ENABLED | false | Enable soft delete tracking of URNs on edges | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_LINEAGE_MAX_HOPS | 20 | Maximum hops to traverse lineage graph | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_IMPACT_MAX_HOPS | 1000 | Maximum hops to traverse for impact analysis | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_IMPACT_MAX_THREADS | 32 | Maximum parallel lineage graph queries | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_QUERY_OPTIMIZATION | true | Reduce query nesting if possible | GMS |

Neo4j Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| NEO4J_USERNAME | neo4j | Neo4j username | GMS, MAE Consumer, System Update |
| NEO4J_PASSWORD | datahub | Neo4j password | GMS, MAE Consumer, System Update |
| NEO4J_URI | bolt://localhost | Neo4j URI | GMS, MAE Consumer, System Update |
| NEO4J_DATABASE | graph.db | Neo4j database | GMS, MAE Consumer, System Update |
| NEO4J_MAX_CONNECTION_POOL_SIZE | 100 | Maximum connection pool size | GMS, MAE Consumer, System Update |
| NEO4J_MAX_CONNECTION_ACQUISITION_TIMEOUT_IN_SECONDS | 60 | Maximum connection acquisition timeout | GMS, MAE Consumer, System Update |
| NEO4j_MAX_CONNECTION_LIFETIME_IN_SECONDS | 3600 | Maximum connection lifetime | GMS, MAE Consumer, System Update |
| NEO4J_MAX_TRANSACTION_RETRY_TIME_IN_SECONDS | 30 | Maximum transaction retry time | GMS, MAE Consumer, System Update |
| NEO4J_CONNECTION_LIVENESS_CHECK_TIMEOUT_IN_SECONDS | -1 | Connection liveness check timeout | GMS, MAE Consumer, System Update |

Kafka Configuration

Topic Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_USAGE_EVENT_NAME | DataHubUsageEvent_v1 | DataHub usage event topic name | GMS, MAE Consumer, MCE Consumer, Actions, Frontend |

Bootstrap Servers

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| KAFKA_BOOTSTRAP_SERVER | http://localhost:9092 | Kafka bootstrap servers | GMS, MAE Consumer, MCE Consumer, PE Consumer, Actions, Frontend |

Producer Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| KAFKA_PRODUCER_RETRY_COUNT | 3 | Producer retry count | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_DELIVERY_TIMEOUT | 30000 | Producer delivery timeout | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_REQUEST_TIMEOUT | 3000 | Producer request timeout | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_BACKOFF_TIMEOUT | 500 | Producer backoff timeout | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_COMPRESSION_TYPE | snappy | Producer compression algorithm | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_MAX_REQUEST_SIZE | 5242880 | Maximum bytes sent by producer | GMS, MCE Consumer, System Update |

Consumer Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| KAFKA_LISTENER_CONCURRENCY | 1 | Number of Kafka consumer threads | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MAX_PARTITION_FETCH_BYTES | 5242880 | Maximum data per partition | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_STOP_ON_DESERIALIZATION_ERROR | true | Stop on deserialization error | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_HEALTH_CHECK_ENABLED | true | Enable health check for consumers | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MCP_AUTO_OFFSET_RESET | earliest | MCP consumer auto offset reset | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MCL_AUTO_OFFSET_RESET | earliest | MCL consumer auto offset reset | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MCL_FINE_GRAINED_LOGGING_ENABLED | false | Enable fine-grained logging for MCL | GMS, MAE Consumer |
| KAFKA_CONSUMER_MCL_ASPECTS_TO_DROP | (empty) | Aspects to drop for MCL | GMS, MAE Consumer |
| KAFKA_CONSUMER_PE_AUTO_OFFSET_RESET | latest | PE consumer auto offset reset | GMS, PE Consumer |
| KAFKA_CONSUMER_PERCENTILES | 0.5,0.95,0.99,0.999 | Consumer percentiles | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_SERVICE_LEVEL_OBJECTIVES | 300,1800,3000,10800,21600,43200 | Consumer SLOs in seconds | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MAX_EXPECTED_VALUE | 86000 | Maximum expected consumer value in seconds | GMS, MAE Consumer, MCE Consumer, PE Consumer |

Consumer Pool Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| KAFKA_CONSUMER_POOL_INITIAL_SIZE | 1 | Consumer pool initial size | GMS |
| KAFKA_CONSUMER_POOL_MAX_SIZE | 5 | Consumer pool maximum size | GMS |

Schema Registry Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SCHEMA_REGISTRY_TYPE | KAFKA | Schema registry type (INTERNAL, KAFKA, or AWS_GLUE) | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_SCHEMAREGISTRY_URL | http://localhost:8081 | Schema registry URL | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| SCHEMA_REGISTRY_URL | http://localhost:8081 | Schema registry URL (Actions) | Actions |
| AWS_GLUE_SCHEMA_REGISTRY_REGION | us-east-1 | AWS Glue schema registry region | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| AWS_GLUE_SCHEMA_REGISTRY_NAME | null | AWS Glue schema registry name | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_PROPERTIES_SECURITY_PROTOCOL | PLAINTEXT | Kafka security protocol | GMS, MAE Consumer, MCE Consumer, PE Consumer, Actions |

Spring Configuration

Kafka Security

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| spring.kafka.security.protocol | PLAINTEXT | Kafka security protocol | GMS, MAE Consumer, MCE Consumer, PE Consumer |

Management & Monitoring

JMX Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| spring.jmx.enabled | true | Enable JMX | GMS, MAE Consumer, MCE Consumer, PE Consumer |

Endpoints Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| management.endpoints.web.exposure.include | prometheus,info,healthcheck,metrics | Exposed web endpoints | GMS |
| management.endpoints.jmx.enabled | true | Enable JMX endpoints | GMS |

Metrics Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| management.metrics.cache.enabled | false | Enable cache metrics | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| management.metrics.export.jmx.enabled | true | Enable JMX metrics export | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| management.metrics.export.prometheus.enabled | true | Enable Prometheus metrics export | GMS, MAE Consumer, MCE Consumer, PE Consumer |

Server Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| server.server-header | false | Server header | GMS |

Feature Flags

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SHOW_SIMPLIFIED_HOMEPAGE_BY_DEFAULT | false | Show simplified homepage with just datasets, charts and dashboards | GMS |
| LINEAGE_SEARCH_CACHE_ENABLED | true | Enable in-memory cache for searchAcrossLineage query | GMS |
| GRAPH_SERVICE_DIFF_MODE_ENABLED | true | Enable diff mode for graph writes | GMS |
| POINT_IN_TIME_CREATION_ENABLED | false | Enable creation of point in time snapshots for scroll API | GMS |
| ALWAYS_EMIT_CHANGE_LOG | false | Always emit MCL even when no changes detected | GMS |
| SEARCH_SERVICE_DIFF_MODE_ENABLED | true | Enable diff mode for search document writes | GMS |
| READ_ONLY_MODE_ENABLED | false | Enable read only mode for instance | GMS |
| SHOW_ACCESS_MANAGEMENT | false | Show AccessManagement tab in UI | GMS |
| SHOW_SEARCH_FILTERS_V2 | true | Show search filters V2 experience | GMS |
| SHOW_BROWSE_V2 | true | Show browse v2 sidebar experience | GMS |
| PLATFORM_BROWSE_V2 | true | Enable platform browse experience | GMS |
| LINEAGE_GRAPH_V2 | true | Enable new lineage visualization | GMS |
| PRE_PROCESS_HOOKS_UI_ENABLED | true | Circumvent Kafka for UI changes | GMS |
| PRE_PROCESS_HOOKS_UI_ENABLED | false | Reprocess UI sourced events asynchronously | GMS |
| SHOW_ACRYL_INFO | false | Show CTAs around moving to DataHub Cloud | GMS |
| ER_MODEL_RELATIONSHIP_FEATURE_ENABLED | false | Enable Join Tables Feature | GMS |
| NESTED_DOMAINS_ENABLED | true | Enable nested Domains feature | GMS |
| SCHEMA_FIELD_ENTITY_FETCH_ENABLED | true | Enable fetching schema field entities | GMS |
| BUSINESS_ATTRIBUTE_ENTITY_ENABLED | false | Enable business attribute entity | GMS |
| DATA_CONTRACTS_ENABLED | true | Enable Data Contracts feature | GMS |
| ALTERNATE_MCP_VALIDATION | false | Enable alternate MCP validation flow | GMS |
| THEME_V2_ENABLED | true | Allow theme v2 to be turned on | GMS |
| THEME_V2_DEFAULT | true | Set default theme for users | GMS |
| THEME_V2_TOGGLEABLE | true | Allow theme v2 to be toggled (Acryl only) | GMS |
| SCHEMA_FIELD_CLL_ENABLED | false | Enable schema field-level lineage links | GMS |
| SCHEMA_FIELD_LINEAGE_IGNORE_STATUS | true | Ignore schema field status in lineage | GMS |
| SHOW_SEPARATE_SIBLINGS | false | Separate siblings with no combined view | GMS |
| EDITABLE_DATASET_NAME_ENABLED | false | Enable editing dataset name in UI | GMS |
| SHOW_MANAGE_STRUCTURED_PROPERTIES | true | Show manage structured properties button | GMS |
| HIDE_DBT_SOURCE_IN_LINEAGE | false | Hide dbt sources in lineage | GMS |
| SHOW_NAV_BAR_REDESIGN | true | Show newly designed nav bar | GMS |
| SHOW_AUTO_COMPLETE_RESULTS | true | Show auto complete results in search bar | GMS |
| ENTITY_VERSIONING_ENABLED | false | Enable entity versioning APIs | GMS |
| SHOW_HAS_SIBLINGS_FILTER | false | Show "has siblings" filter in search | GMS |
| SHOW_SEARCH_BAR_AUTOCOMPLETE_REDESIGN | false | Show redesigned search bar autocomplete | GMS |
| SHOW_MANAGE_TAGS | true | Allow users to manage tags in UI | GMS |
| SHOW_INTRODUCE_PAGE | true | Show introduce page in V2 UI | GMS |
| SHOW_INGESTION_PAGE_REDESIGN | false | Show re-designed Ingestion page | GMS |
| SHOW_LINEAGE_EXPAND_MORE | true | Show expand more button in lineage graph | GMS |
| SHOW_HOME_PAGE_REDESIGN | false | Show re-designed home page | GMS |
| LINEAGE_GRAPH_V3 | false | Enable redesign of lineage v2 graph | GMS |
| SHOW_PRODUCT_UPDATES | true | Show in-product update popover | GMS |
| LOGICAL_MODELS_ENABLED | false | Enable logical models feature | GMS |
| SHOW_HOMEPAGE_USER_ROLE | false | Display homepage user role underneath name | GMS |
| VIEWS_ENABLED | true | Enable views feature | GMS |

System Updates

Bootstrap Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_POLICIES_FILE | classpath:boot/policies.json | Bootstrap policies file | GMS |
| BOOTSTRAP_SERVLETS_WAITTIMEOUT | 60 | Total waiting time for servlets to initialize | GMS |

System Update Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_INITIAL_BACK_OFF_MILLIS | 5000 | Initial back off for system updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_MAX_BACK_OFFS | 50 | Maximum back offs for system updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_BACK_OFF_FACTOR | 2 | Multiplicative factor for back off | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_WAIT_FOR_SYSTEM_UPDATE | true | Wait for system update to complete | System Update |
| SYSTEM_UPDATE_BOOTSTRAP_MCP_CONFIG | bootstrap_mcps.yaml | Bootstrap MCP configuration | System Update |
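
The initial back off and the multiplicative factor together describe an exponential schedule. A minimal sketch of the first few delays under the documented defaults follows. It assumes each delay is the previous one multiplied by the factor; whether DataHub caps the delay is not stated in the table, and `backoff_schedule_ms` is a hypothetical helper.

```python
from typing import List


def backoff_schedule_ms(initial_ms: int = 5000, factor: int = 2, attempts: int = 5) -> List[int]:
    """Return the delay before each of the first `attempts` retries, in milliseconds.

    Assumes an uncapped schedule: delay_i = initial_ms * factor ** i.
    """
    return [initial_ms * factor ** i for i in range(attempts)]


if __name__ == "__main__":
    # Defaults mirror BOOTSTRAP_SYSTEM_UPDATE_INITIAL_BACK_OFF_MILLIS and _BACK_OFF_FACTOR.
    print(backoff_schedule_ms())
```

Under this assumption the first five delays are 5, 10, 20, 40, and 80 seconds.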

Data Job Node CLL Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_ENABLED | false | Enable data job node CLL | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_BATCH_SIZE | 1000 | Data job node CLL batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_DELAY_MS | 30000 | Data job node CLL delay in milliseconds | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_LIMIT | 0 | Data job node CLL limit | System Update |

Domain Description Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_DOMAIN_DESCRIPTION_ENABLED | true | Enable domain description updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DOMAIN_DESCRIPTION_BATCH_SIZE | 1000 | Domain description batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DOMAIN_DESCRIPTION_DELAY_MS | 30000 | Domain description delay in milliseconds | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DOMAIN_DESCRIPTION_CLL_LIMIT | 0 | Domain description CLL limit | System Update |

Dashboard Info Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_DASHBOARD_INFO_ENABLED | true | Enable dashboard info updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DASHBOARD_INFO_BATCH_SIZE | 1000 | Dashboard info batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DASHBOARD_INFO_DELAY_MS | 30000 | Dashboard info delay in milliseconds | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DASHBOARD_INFO_CLL_LIMIT | 0 | Dashboard info CLL limit | System Update |

Browse Paths V2 Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_BROWSE_PATHS_V2_ENABLED | true | Enable browse paths V2 updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_BROWSE_PATHS_V2_BATCH_SIZE | 5000 | Browse paths V2 batch size | System Update |
| REPROCESS_DEFAULT_BROWSE_PATHS_V2 | false | Reprocess default browse paths V2 | System Update |

Ingestion Indices Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_INGESTION_INDICES_ENABLED | true | Enable ingestion indices updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_INGESTION_INDICES_BATCH_SIZE | 5000 | Ingestion indices batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_INGESTION_INDICES_DELAY_MS | 1000 | Ingestion indices delay in milliseconds | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_INGESTION_INDICES_CLL_LIMIT | 0 | Ingestion indices CLL limit | System Update |

Policy Fields Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_POLICY_FIELDS_ENABLED | true | Enable policy fields updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_POLICY_FIELDS_BATCH_SIZE | 5000 | Policy fields batch size | System Update |
| REPROCESS_DEFAULT_POLICY_FIELDS | false | Reprocess default policy fields | System Update |

Ownership Types Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_OWNERSHIP_TYPES_ENABLED | true | Enable ownership types updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_OWNERSHIP_TYPES_BATCH_SIZE | 1000 | Ownership types batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_OWNERSHIP_TYPES_REPROCESS | false | Reprocess ownership types | System Update |

Schema Fields Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SYSTEM_UPDATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA_ENABLED | false | Enable schema fields from schema metadata | System Update |
| SYSTEM_UPDATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA_BATCH_SIZE | 500 | Schema fields from schema metadata batch size | System Update |
| SYSTEM_UPDATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA_DELAY_MS | 1000 | Schema fields from schema metadata delay | System Update |
| SYSTEM_UPDATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA_LIMIT | 0 | Schema fields from schema metadata limit | System Update |
| SYSTEM_UPDATE_SCHEMA_FIELDS_DOC_IDS_ENABLED | false | Enable schema fields doc IDs | System Update |
| SYSTEM_UPDATE_SCHEMA_FIELDS_DOC_IDS_BATCH_SIZE | 500 | Schema fields doc IDs batch size | System Update |
| SYSTEM_UPDATE_SCHEMA_FIELDS_DOC_IDS_DELAY_MS | 5000 | Schema fields doc IDs delay | System Update |
| SYSTEM_UPDATE_SCHEMA_FIELDS_DOC_IDS_LIMIT | 0 | Schema fields doc IDs limit | System Update |

Process Instance Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_ENABLED | true | Enable process instance has run events | System Update |
| SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_BATCH_SIZE | 100 | Process instance has run events batch size | System Update |
| SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_DELAY_MS | 1000 | Process instance has run events delay | System Update |
| SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_TOTAL_DAYS | 90 | Process instance has run events total days | System Update |
| SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_WINDOW_DAYS | 1 | Process instance has run events window days | System Update |
| SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_REPROCESS | false | Reprocess process instance has run events | System Update |

Edge Status Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_EDGE_STATUS_ENABLED | false | Enable edge status updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_EDGE_STATUS_BATCH_SIZE | 1000 | Edge status batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_EDGE_STATUS_DELAY_MS | 5000 | Edge status delay in milliseconds | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_EDGE_STATUS_LIMIT | 0 | Edge status limit | System Update |

Property Definitions Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_PROPERTY_DEFINITIONS_ENABLED | true | Enable property definitions updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_PROPERTY_DEFINITIONS_BATCH_SIZE | 500 | Property definitions batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_PROPERTY_DEFINITIONS_DELAY_MS | 1000 | Property definitions delay in milliseconds | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_PROPERTY_DEFINITIONS_CLL_LIMIT | 0 | Property definitions CLL limit | System Update |

Remove Query Edges Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_REMOVE_QUERY_EDGES_ENABLED | true | Enable remove query edges | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_REMOVE_QUERY_EDGES_RETRIES | 20 | Remove query edges retries | System Update |

Additional Environment Variables

The following environment variables are used in the codebase but may not be explicitly defined in the application.yaml file:

Ingestion and Processing

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ASYNC_INGEST_DEFAULT | false | Asynchronously process ingestProposals by writing to Kafka | GMS |
| STRICT_URN_VALIDATION_ENABLED | false | Enable stricter URN validation logic | GMS |
| DATAHUB_DATASET_URN_TO_LOWER | null | Convert dataset URN names to lowercase | GMS |
| BUSINESS_ATTRIBUTE_ENTITY_ENABLED | false | Enable business attribute entity feature | GMS |

REST and Servlet Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| RESTLI_SERVLET_THREADS | null | Number of threads for REST servlet | GMS, MCE Consumer |
| RESTLI_TIMEOUT_SECONDS | 60 | REST timeout in seconds | GMS, MCE Consumer |

System and Version Information

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_GMS_PROTOCOL | http | GMS protocol (http/https) | GMS |

Upgrade and Migration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SKIP_REINDEX_EDGE_STATUS | false | Skip reindexing edge status | System Update |
| SKIP_REINDEX_DATA_JOB_INPUT_OUTPUT | false | Skip reindexing data job input/output | System Update |
| SKIP_GENERATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA | false | Skip generating schema fields from schema metadata | System Update |
| SKIP_MIGRATE_SCHEMA_FIELDS_DOC_ID | false | Skip migrating schema fields doc IDs | System Update |
| BACKFILL_BROWSE_PATHS_V2 | false | Enable backfilling browse paths V2 | System Update |
| READER_POOL_SIZE | null | Reader pool size for restore operations | System Update |
| WRITER_POOL_SIZE | null | Writer pool size for restore operations | System Update |

OpenTelemetry Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| OTEL_METRICS_EXPORTER | none | OpenTelemetry metrics exporter | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| OTEL_TRACES_EXPORTER | none | OpenTelemetry traces exporter | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| OTEL_LOGS_EXPORTER | none | OpenTelemetry logs exporter | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| OTEL_PROPAGATORS | null | OpenTelemetry propagators | GMS, MAE Consumer, MCE Consumer, PE Consumer |

Secret Service Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SECRET_SERVICE_ENCRYPTION_KEY | ENCRYPTION_KEY | Secret service encryption key | GMS |
| SECRET_SERVICE_V1_ALGORITHM_ENABLED | true | Enable v1 algorithm for secret service | GMS |

Health Check Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| HEALTH_CHECK_CACHE_DURATION_SECONDS | 5 | Health check cache duration | GMS |

Metadata Tests Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| METADATA_TESTS_ENABLED | false | Enable metadata tests | GMS |

Hooks Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ENABLE_SIBLING_HOOK | true | Enable automatic sibling associations | GMS, MAE Consumer |
| SIBLINGS_HOOK_CONSUMER_GROUP_SUFFIX | `` | Siblings hook consumer group suffix | GMS, MAE Consumer |
| ENABLE_UPDATE_INDICES_HOOK | true | Enable update indices hook | GMS, MAE Consumer |
| UPDATE_INDICES_CONSUMER_GROUP_SUFFIX | `` | Update indices consumer group suffix | GMS, MAE Consumer |
| ENABLE_INGESTION_SCHEDULER_HOOK | true | Enable ingestion scheduling | GMS, MAE Consumer |
| INGESTION_SCHEDULER_HOOK_CONSUMER_GROUP_SUFFIX | `` | Ingestion scheduler hook consumer group suffix | GMS, MAE Consumer |
| ENABLE_INCIDENTS_HOOK | true | Enable incidents hook | GMS, MAE Consumer |
| MAX_INCIDENT_HISTORY | 100 | Maximum incident history | GMS, MAE Consumer |
| INCIDENTS_HOOK_CONSUMER_GROUP_SUFFIX | `` | Incidents hook consumer group suffix | GMS, MAE Consumer |
| ENABLE_STRUCTURED_PROPERTIES_HOOK | true | Enable structured properties mappings | GMS, MAE Consumer |
| ENABLE_STRUCTURED_PROPERTIES_WRITE | true | Enable writing structured property values | GMS, MAE Consumer |
| ENABLE_STRUCTURED_PROPERTIES_SYSTEM_UPDATE | false | Enable structured property mappings in system update | GMS, MAE Consumer |
| ENABLE_ENTITY_CHANGE_EVENTS_HOOK | true | Enable entity change events hook | GMS, MAE Consumer |
| ECE_CONSUMER_GROUP_SUFFIX | `` | Entity change events consumer group suffix | GMS, MAE Consumer |
| ECE_ENTITY_EXCLUSIONS | schemaField | Entities to exclude from ECE hook | GMS, MAE Consumer |
| FORMS_HOOK_ENABLED | true | Enable forms hook | GMS, MAE Consumer |
| FORMS_HOOK_CONSUMER_GROUP_SUFFIX | `` | Forms hook consumer group suffix | GMS, MAE Consumer |

Search and API Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SEARCH_BAR_API_VARIANT | AUTOCOMPLETE_FOR_MULTIPLE | Search bar API variant | Frontend |
| FIRST_IN_PERSONAL_SIDEBAR | YOUR_ASSETS | First item in personal sidebar | Frontend |

Client Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ENTITY_CLIENT_RETRY_INTERVAL | 2 | Entity client retry interval | GMS |
| ENTITY_CLIENT_NUM_RETRIES | 3 | Entity client number of retries | GMS |
| ENTITY_CLIENT_JAVA_GET_BATCH_SIZE | 375 | Entity client Java get batch size | GMS |
| ENTITY_CLIENT_JAVA_INGEST_BATCH_SIZE | 375 | Entity client Java ingest batch size | GMS |
| ENTITY_CLIENT_RESTLI_GET_BATCH_SIZE | 100 | Entity client RESTli get batch size | GMS, MAE Consumer, PE Consumer |
| ENTITY_CLIENT_RESTLI_GET_BATCH_CONCURRENCY | 2 | Entity client RESTli get batch concurrency | GMS, MAE Consumer, PE Consumer |
| ENTITY_CLIENT_RESTLI_GET_BATCH_QUEUE_SIZE | 500 | Entity client RESTli get batch queue size | GMS, MAE Consumer, PE Consumer |
| ENTITY_CLIENT_RESTLI_GET_BATCH_THREAD_KEEP_ALIVE | 60 | Entity client RESTli get batch thread keep alive | GMS, MAE Consumer, PE Consumer |
| ENTITY_CLIENT_RESTLI_INGEST_BATCH_SIZE | 50 | Entity client RESTli ingest batch size | GMS, MAE Consumer, PE Consumer |
| ENTITY_CLIENT_RESTLI_INGEST_BATCH_CONCURRENCY | 2 | Entity client RESTli ingest batch concurrency | GMS, MAE Consumer, PE Consumer |
| ENTITY_CLIENT_RESTLI_INGEST_BATCH_QUEUE_SIZE | 500 | Entity client RESTli ingest batch queue size | GMS, MAE Consumer, PE Consumer |
| ENTITY_CLIENT_RESTLI_INGEST_BATCH_THREAD_KEEP_ALIVE | 60 | Entity client RESTli ingest batch thread keep alive | GMS, MAE Consumer, PE Consumer |
| USAGE_CLIENT_RETRY_INTERVAL | 2 | Usage client retry interval | GMS, MAE Consumer, PE Consumer |
| USAGE_CLIENT_NUM_RETRIES | 0 | Usage client number of retries | GMS, MAE Consumer, PE Consumer |
| USAGE_CLIENT_TIMEOUT_MS | 3000 | Usage client timeout in milliseconds | GMS, MAE Consumer, PE Consumer |

Cache Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| CACHE_TTL_SECONDS | 600 | Default cache time to live | GMS |
| CACHE_MAX_SIZE | 10000 | Maximum number of items to cache | GMS |
| CACHE_ENTITY_COUNTS_TTL_SECONDS | 600 | Homepage entity count time to live | GMS |
| CACHE_SEARCH_LINEAGE_TTL_SECONDS | 86400 | Search lineage cache time to live | GMS |
| CACHE_SEARCH_LINEAGE_LIGHTNING_THRESHOLD | 300 | Lineage graphs exceeding this limit will use local cache | GMS |
| CACHE_CLIENT_USAGE_CLIENT_ENABLED | true | Enable usage client cache | GMS, MAE Consumer, PE Consumer |
| CACHE_CLIENT_USAGE_CLIENT_STATS_ENABLED | true | Enable usage client cache stats | GMS, MAE Consumer, PE Consumer |
| CACHE_CLIENT_USAGE_CLIENT_STATS_INTERVAL_SECONDS | 120 | Usage client cache stats interval | GMS, MAE Consumer, PE Consumer |
| CACHE_CLIENT_USAGE_CLIENT_TTL_SECONDS | 86400 | Usage client cache TTL | GMS, MAE Consumer, PE Consumer |
| CACHE_CLIENT_USAGE_CLIENT_MAX_BYTES | 52428800 | Usage client cache max bytes (50MB) | GMS, MAE Consumer, PE Consumer |
| CACHE_CLIENT_ENTITY_CLIENT_ENABLED | true | Enable entity client cache | GMS, MAE Consumer, PE Consumer |
| CACHE_CLIENT_ENTITY_CLIENT_STATS_ENABLED | true | Enable entity client cache stats | GMS, MAE Consumer, PE Consumer |
| CACHE_CLIENT_ENTITY_CLIENT_STATS_INTERVAL_SECONDS | 120 | Entity client cache stats interval | GMS, MAE Consumer, PE Consumer |
| CACHE_CLIENT_ENTITY_CLIENT_TTL_SECONDS | 0 | Entity client cache TTL (0 = no cache) | GMS, MAE Consumer, PE Consumer |
| CACHE_CLIENT_ENTITY_CLIENT_MAX_BYTES | 104857600 | Entity client cache max bytes (100MB) | GMS, MAE Consumer, PE Consumer |

GraphQL Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| GRAPHQL_CONCURRENCY_SEPARATE_THREAD_POOL | false | Enable separate thread pool for GraphQL | GMS |
| GRAPHQL_CONCURRENCY_STACK_SIZE | 256000 | GraphQL thread pool stack size | GMS |
| GRAPHQL_CONCURRENCY_CORE_POOL_SIZE | -1 | GraphQL core pool size (default 5 * cores) | GMS |
| GRAPHQL_CONCURRENCY_MAX_POOL_SIZE | -1 | GraphQL max pool size (default 100 * cores) | GMS |
| GRAPHQL_CONCURRENCY_KEEP_ALIVE | 60 | GraphQL thread keep alive time | GMS |
| GRAPHQL_QUERY_COMPLEXITY_LIMIT | 2000 | GraphQL query complexity limit | GMS |
| GRAPHQL_QUERY_DEPTH_LIMIT | 50 | GraphQL query depth limit | GMS |
| GRAPHQL_QUERY_INTROSPECTION_ENABLED | true | Enable GraphQL introspection | GMS |
| GRAPHQL_METRICS_ENABLED | true | Enable GraphQL metrics collection | GMS |
| GRAPHQL_PERCENTILES | 0.5,0.75,0.95,0.98,0.99,0.999 | GraphQL percentiles | GMS |
| GRAPHQL_METRICS_FIELD_LEVEL_ENABLED | false | Enable field-level GraphQL metrics | GMS |
| GRAPHQL_METRICS_FIELD_LEVEL_OPERATIONS | getSearchResultsForMultiple,searchAcrossLineageStructure | GraphQL field-level operations | GMS |
| GRAPHQL_METRICS_FIELD_LEVEL_PATH_ENABLED | false | Include field path in GraphQL metrics | GMS |
| GRAPHQL_METRICS_FIELD_LEVEL_PATHS | `` | GraphQL field-level paths | GMS |
| GRAPHQL_METRICS_TRIVIAL_DATA_FETCHERS_ENABLED | false | Include trivial data fetchers in GraphQL metrics | GMS |
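
The `-1` defaults for `GRAPHQL_CONCURRENCY_CORE_POOL_SIZE` and `GRAPHQL_CONCURRENCY_MAX_POOL_SIZE` stand for a size derived from the number of CPU cores. A minimal Python sketch of that resolution follows; it is not DataHub code, the helper name `resolve_pool_size` is hypothetical, and it assumes `-1` is the only sentinel value.

```python
import os


def resolve_pool_size(configured: int, per_core_multiplier: int, cores: int) -> int:
    """Return the configured size, or multiplier * cores when the value is -1."""
    if configured == -1:
        return per_core_multiplier * cores
    return configured


cores = os.cpu_count() or 1
# The multipliers 5 and 100 come from the table descriptions above.
core_pool = resolve_pool_size(
    int(os.environ.get("GRAPHQL_CONCURRENCY_CORE_POOL_SIZE", "-1")), 5, cores
)
max_pool = resolve_pool_size(
    int(os.environ.get("GRAPHQL_CONCURRENCY_MAX_POOL_SIZE", "-1")), 100, cores
)
print(core_pool, max_pool)
```

On an 8-core host with both variables unset, this sketch would give a core pool of 40 and a max pool of 800.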

Chrome Extension Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| CHROME_EXTENSION_ENABLED | true | Enable Chrome extension | Frontend |
| CHROME_EXTENSION_LINEAGE_ENABLED | true | Enable Chrome extension lineage | Frontend |

Business Attribute Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BUSINESS_ATTRIBUTE_RELATED_ENTITIES_COUNT | 20000 | Business attribute related entities count | GMS |
| BUSINESS_ATTRIBUTE_RELATED_ENTITIES_BATCH_SIZE | 1000 | Business attribute related entities batch size | GMS |
| BUSINESS_ATTRIBUTE_PROPAGATION_CONCURRENCY_THREAD_COUNT | -1 | Business attribute propagation thread count (default 2 * cores) | GMS |
| BUSINESS_ATTRIBUTE_PROPAGATION_CONCURRENCY_KEEP_ALIVE | 60 | Business attribute propagation keep alive time | GMS |

Metadata Change Proposal Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| MCP_CONSUMER_BATCH_ENABLED | false | Enable MCP consumer batch processing | GMS, MCE Consumer |
| MCP_CONSUMER_BATCH_SIZE | 15744000 | MCP consumer batch size | GMS, MCE Consumer |
| MCP_VALIDATION_IGNORE_UNKNOWN | true | Ignore unknown fields in MCP validation | GMS, MCE Consumer |
| MCP_VALIDATION_PRIVILEGE_CONSTRAINTS | true | Enable privilege constraints in MCP validation | GMS, MCE Consumer |
| MCP_VALIDATION_EXTENSIONS_ENABLED | false | Enable extensions in MCP validation | GMS, MCE Consumer |
| MCP_SIDE_EFFECTS_SCHEMA_FIELD_ENABLED | false | Enable schema field side effects | GMS, MCE Consumer |
| MCP_SIDE_EFFECTS_DATA_PRODUCT_UNSET_ENABLED | true | Enable data product unset side effects | GMS, MCE Consumer |
| MCP_THROTTLE_UPDATE_INTERVAL_MS | 60000 | MCP throttle update interval | GMS, MCE Consumer |
| MCP_MCE_CONSUMER_THROTTLE_ENABLED | false | Enable MCE consumer throttling | GMS, MCE Consumer |
| MCP_API_REQUESTS_THROTTLE_ENABLED | false | Enable API requests throttling | GMS, MCE Consumer |
| MCP_VERSIONED_THROTTLE_ENABLED | false | Enable versioned MCL topic throttling | GMS, MCE Consumer |
| MCP_VERSIONED_THRESHOLD | 4000 | Versioned throttle threshold | GMS, MCE Consumer |
| MCP_VERSIONED_MAX_ATTEMPTS | 1000 | Versioned max attempts | GMS, MCE Consumer |
| MCP_VERSIONED_INITIAL_INTERVAL_MS | 100 | Versioned initial interval | GMS, MCE Consumer |
| MCP_VERSIONED_MULTIPLIER | 10 | Versioned multiplier | GMS, MCE Consumer |
| MCP_VERSIONED_MAX_INTERVAL_MS | 30000 | Versioned max interval | GMS, MCE Consumer |
| MCP_TIMESERIES_THROTTLE_ENABLED | false | Enable timeseries MCL topic throttling | GMS, MCE Consumer |
| MCP_TIMESERIES_THRESHOLD | 4000 | Timeseries throttle threshold | GMS, MCE Consumer |
| MCP_TIMESERIES_MAX_ATTEMPTS | 1000 | Timeseries max attempts | GMS, MCE Consumer |
| MCP_TIMESERIES_INITIAL_INTERVAL_MS | 100 | Timeseries initial interval | GMS, MCE Consumer |
| MCP_TIMESERIES_MULTIPLIER | 10 | Timeseries multiplier | GMS, MCE Consumer |
| MCP_TIMESERIES_MAX_INTERVAL_MS | 30000 | Timeseries max interval | GMS, MCE Consumer |
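
The initial interval, multiplier, and max interval settings for the versioned and timeseries throttles read like exponential backoff parameters. The Python sketch below shows how the default values would play out under that assumption: the delay starts at the initial interval, is multiplied after each attempt, and is capped at the max interval. The actual throttle logic may differ, and the generator name `backoff_intervals_ms` is hypothetical.

```python
from itertools import islice
from typing import Iterator


def backoff_intervals_ms(initial_ms: int, multiplier: int, max_ms: int) -> Iterator[int]:
    """Yield successive backoff delays in milliseconds, capped at max_ms."""
    interval = initial_ms
    while True:
        yield min(interval, max_ms)
        interval = min(interval * multiplier, max_ms)


# Defaults from the table: 100 ms initial interval, multiplier 10, 30000 ms cap.
print(list(islice(backoff_intervals_ms(100, 10, 30000), 5)))
# prints [100, 1000, 10000, 30000, 30000]
```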

Events API Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| EVENTS_API_ENABLED | true | Enable events API | GMS |

Iceberg Catalog Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ENABLE_PUBLIC_READ | false | Enable public read for Iceberg catalog | GMS |
| PUBLICLY_READABLE_TAG | PUBLICLY_READABLE | Publicly readable tag for Iceberg catalog | GMS |

Component Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| MCP_CONSUMER_ENABLED | true | When running in standalone mode, disable on GMS and enable on the separate MCE Consumer. | GMS, MCE Consumer |
| MCL_CONSUMER_ENABLED | true | When running in standalone mode, disable on GMS and enable on the separate MAE Consumer. | GMS, MAE Consumer |
| PE_CONSUMER_ENABLED | true | When running in standalone mode, disable on GMS and enable on the separate PE Consumer. | GMS, PE Consumer |

DataHub Frontend

Play Framework Configuration

Secret Key Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_SECRET | null | Secret key used to secure cryptographic functions | Frontend |

HTTP Parser Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_PLAY_MEM_BUFFER_SIZE | 10MB | Maximum memory buffer size for HTTP parser | Frontend |

Server Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_AKKA_MAX_HEADER_COUNT | 64 | Maximum number of headers allowed | Frontend |
| DATAHUB_AKKA_MAX_HEADER_VALUE_LENGTH | 32k | Maximum header value length | Frontend |

Session Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| AUTH_COOKIE_SAME_SITE | LAX | SameSite attribute for authentication cookies | Frontend |
| AUTH_COOKIE_SECURE | false | Whether authentication cookies should be secure | Frontend |

Authentication Configuration

OIDC Configuration

Required OIDC Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| AUTH_OIDC_ENABLED | false | Enable OIDC authentication | Frontend |
| AUTH_OIDC_CLIENT_ID | null | Unique client ID issued by the identity provider | Frontend |
| AUTH_OIDC_CLIENT_SECRET | null | Unique client secret issued by the identity provider | Frontend |
| AUTH_OIDC_DISCOVERY_URI | null | The IdP OIDC discovery URL | Frontend |
| AUTH_OIDC_BASE_URL | null | The base URL associated with your DataHub deployment | Frontend |
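
Because the required OIDC variables default to `null`, it can help to check them before starting the frontend. The following Python sketch is a hypothetical preflight check, not part of DataHub; the helper name `missing_oidc_vars` is an assumption, and it treats an empty string the same as an unset variable.

```python
import os
from typing import List, Mapping

# Required variables taken from the table above.
REQUIRED_OIDC_VARS = (
    "AUTH_OIDC_CLIENT_ID",
    "AUTH_OIDC_CLIENT_SECRET",
    "AUTH_OIDC_DISCOVERY_URI",
    "AUTH_OIDC_BASE_URL",
)


def missing_oidc_vars(env: Mapping[str, str]) -> List[str]:
    """List required OIDC variables that are unset or empty when OIDC is enabled."""
    if env.get("AUTH_OIDC_ENABLED", "false").lower() != "true":
        return []  # OIDC is off, so nothing is required
    return [name for name in REQUIRED_OIDC_VARS if not env.get(name)]


print(missing_oidc_vars(os.environ))
```

An empty list means either OIDC is disabled or every required variable has a value.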

Optional OIDC Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| AUTH_OIDC_USER_NAME_CLAIM | preferred_username | The attribute/claim used to derive the DataHub username | Frontend |
| AUTH_OIDC_USER_NAME_CLAIM_REGEX | (.*) | The regex used to parse the DataHub username from the user name claim | Frontend |
| AUTH_OIDC_SCOPE | oidc email profile | String representing the requested scope from the IdP | Frontend |
| AUTH_OIDC_CLIENT_AUTHENTICATION_METHOD | client_secret_basic | Authentication method to pass credentials to token endpoint | Frontend |
| AUTH_OIDC_JIT_PROVISIONING_ENABLED | true | Whether DataHub users should be provisioned on login if they don't exist | Frontend |
| AUTH_OIDC_PRE_PROVISIONING_REQUIRED | false | Whether the user should already exist in DataHub on login | Frontend |
| AUTH_OIDC_EXTRACT_GROUPS_ENABLED | true | Whether groups should be extracted from a claim in the OIDC profile | Frontend |
| AUTH_OIDC_GROUPS_CLAIM | groups | The OIDC claim to extract groups information from | Frontend |
| AUTH_OIDC_RESPONSE_TYPE | null | OIDC response type | Frontend |
| AUTH_OIDC_RESPONSE_MODE | null | OIDC response mode | Frontend |
| AUTH_OIDC_USE_NONCE | null | Whether to use nonce in OIDC flow | Frontend |
| AUTH_OIDC_CUSTOM_PARAM_RESOURCE | null | Custom resource parameter for OIDC | Frontend |
| AUTH_OIDC_READ_TIMEOUT | null | OIDC read timeout | Frontend |
| AUTH_OIDC_CONNECT_TIMEOUT | null | OIDC connect timeout | Frontend |
| AUTH_OIDC_EXTRACT_JWT_ACCESS_TOKEN_CLAIMS | false | Whether to extract claims from JWT access token | Frontend |
| AUTH_OIDC_PREFERRED_JWS_ALGORITHM | null | Which JWS algorithm to use | Frontend |
| AUTH_OIDC_ACR_VALUES | null | OIDC ACR values | Frontend |
| AUTH_OIDC_GRANT_TYPE | null | OIDC grant type | Frontend |
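
To illustrate how `AUTH_OIDC_USER_NAME_CLAIM_REGEX` can shape the username, here is a rough Python sketch. It assumes the first match of the pattern against the claim value becomes the username; the frontend runs on the JVM, so its regex engine and matching rules may differ from Python's `re`. The helper name `extract_username` and the sample claim value are hypothetical.

```python
import re
from typing import Optional


def extract_username(claim_value: str, pattern: str = "(.*)") -> Optional[str]:
    """Return the first match of pattern in the claim value, or None if no match."""
    match = re.search(pattern, claim_value)
    return match.group(0) if match else None


# The default pattern keeps the whole claim value.
print(extract_username("jdoe@example.com"))             # prints jdoe@example.com
# A custom pattern can keep only the part before the "@".
print(extract_username("jdoe@example.com", "([^@]+)"))  # prints jdoe
```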

Authentication Methods Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| AUTH_JAAS_ENABLED | true | Enable JAAS authentication | Frontend |
| AUTH_NATIVE_ENABLED | true | Enable native authentication | Frontend |
| GUEST_AUTHENTICATION_ENABLED | false | Enable guest authentication | Frontend |
| GUEST_AUTHENTICATION_USER | guest | The name of the guest user ID | Frontend |
| GUEST_AUTHENTICATION_PATH | null | The path to bypass login page and get logged in as guest | Frontend |
| ENFORCE_VALID_EMAIL | true | Enforce the usage of a valid email for user sign up | Frontend |

Authentication Logging

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| AUTH_VERBOSE_LOGGING | false | Enable verbose authentication logging | Frontend |

Session Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| AUTH_SESSION_TTL_HOURS | 24 | Login session expiration time in hours | Frontend |
| MAX_SESSION_TOKEN_AGE | 24h | Maximum age of session token | Frontend |

Metadata Service Configuration

Connection Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_GMS_HOST | localhost | Metadata service host | Frontend |
| DATAHUB_GMS_PORT | 8080 | Metadata service port | Frontend |
| DATAHUB_GMS_USE_SSL | false | Whether to use SSL for metadata service connection | Frontend |
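
The three connection variables together describe where the frontend reaches the metadata service. The Python sketch below shows one way they could combine into a base URL. It is illustrative only: the helper name `gms_base_url` is hypothetical, and how the frontend actually builds its connection is not specified here.

```python
import os
from typing import Mapping


def gms_base_url(env: Mapping[str, str]) -> str:
    """Build a metadata service base URL from the connection variables."""
    host = env.get("DATAHUB_GMS_HOST", "localhost")
    port = env.get("DATAHUB_GMS_PORT", "8080")
    use_ssl = env.get("DATAHUB_GMS_USE_SSL", "false").lower() == "true"
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}"


print(gms_base_url(os.environ))
```

With none of the variables set, the defaults in the table give `http://localhost:8080`.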

Authentication Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| METADATA_SERVICE_AUTH_ENABLED | false | Enable metadata service authentication | Frontend |
| DATAHUB_SYSTEM_CLIENT_SECRET | JohnSnowKnowsNothing | System client secret for metadata service | Frontend |

Entity Client Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ENTITY_CLIENT_RETRY_INTERVAL | 2 | Entity client retry interval | Frontend |
| ENTITY_CLIENT_NUM_RETRIES | 3 | Entity client number of retries | Frontend |
| ENTITY_CLIENT_RESTLI_GET_BATCH_SIZE | 50 | Entity client RESTli get batch size | Frontend |
| ENTITY_CLIENT_RESTLI_GET_BATCH_CONCURRENCY | 2 | Entity client RESTli get batch concurrency | Frontend |

Notes

  • Environment variables follow the pattern of converting YAML property paths to uppercase with underscores
  • Default values are shown in the tables above
  • For Kafka configuration, refer to the official Spring Kafka documentation for additional properties
  • Feature flags control experimental or optional functionality
  • System update configurations control various background maintenance tasks
  • Cache configurations help optimize performance for different use cases
  • GraphQL configurations control query complexity and performance monitoring
  • OpenTelemetry variables control observability and tracing behavior
  • Play Framework properties are converted to environment variables by:
    • Converting dots (.) to underscores (_)
    • Converting to uppercase
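
The two conversion steps in the last bullet can be sketched in Python. This is an illustrative helper rather than DataHub code; the function name `property_to_env_var` and the sample property path `auth.oidc.enabled` are assumptions chosen for the example.

```python
def property_to_env_var(property_path: str) -> str:
    """Convert a dotted property path to an environment variable name."""
    # Replace dots with underscores, then convert to uppercase.
    return property_path.replace(".", "_").upper()


print(property_to_env_var("auth.oidc.enabled"))  # prints AUTH_OIDC_ENABLED
```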