Purpose
This post explains how to monitor database performance with Prometheus and Grafana, using the metrics that doobie and HikariCP expose.
Connection Pooling & Thread Pools
The docs for doobie, the functional-programming JDBC access library used nearly ubiquitously at Banno, offer a helpful explanation of
Connection Pooling and Thread Pools: https://tpolecat.github.io/doobie/docs/14-Managing-Connections.html#about-transactors.
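In short, a doobie transactor separates a small, bounded pool on which callers block while awaiting a connection from a pool for the blocking JDBC work itself. A minimal, dependency-free sketch of that layout (the names and sizes here are illustrative, not doobie's):

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Two executors, per the doobie docs:
//  - a small, fixed-size pool on which callers block while *awaiting* a
//    connection from the (also small) connection pool;
//  - an unbounded, cached pool for the blocking JDBC calls themselves.
val awaitConnectionEC = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))
val blockingJdbcEC    = ExecutionContext.fromExecutorService(Executors.newCachedThreadPool())

// Illustrative use: run (pretend) blocking JDBC work on the blocking pool.
val answer = Await.result(Future(21 * 2)(blockingJdbcEC), 2.seconds)

// Shut both pools down so the JVM can exit cleanly.
awaitConnectionEC.shutdown()
blockingJdbcEC.shutdown()
```

Keeping the "awaiting" pool small and bounded means backpressure shows up as threads queued for a connection, which is exactly what the hikaricp_pending_threads metric below measures.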

Collecting Metrics via Prometheus
Code
def createDeviceRegistrationRepositoryConfig[F[_]: ContextShift](
  ...,
  cr: CollectorRegistry,
  ...
  hikariConfig <- Stream.eval(
    F.delay {
      val c = new HikariConfig()
      c.setDriverClassName(connection.dbDriver)
      c.setJdbcUrl(connection.dbServer)
      c.setUsername(creds.username)
      c.setPassword(creds.password)
      // Writes metrics to the CollectorRegistry, so Prometheus can scrape
      // (collect) them for display in Grafana, alerting, etc.
      c.setMetricsTrackerFactory(new PrometheusMetricsTrackerFactory(cr))
      c.setMinimumIdle(connection.minIdle)             // minimum idle connections
      c.setMaximumPoolSize(connection.maximumPoolSize) // max pool size
      c
    }
  )
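Stripped of the effect wrapping, the config above ultimately backs a plain HikariCP pooled DataSource. The sketch below shows that underlying Java API directly; the driver, URL, and credentials are placeholders, and in the real service the pool's lifecycle is managed by doobie rather than by hand:

```scala
import com.zaxxer.hikari.{HikariConfig, HikariDataSource}

val c = new HikariConfig()
c.setDriverClassName("org.postgresql.Driver")            // placeholder driver
c.setJdbcUrl("jdbc:postgresql://localhost:5432/devices") // placeholder URL
c.setUsername("app")
c.setPassword("secret")
c.setMinimumIdle(2)
c.setMaximumPoolSize(8)

// HikariDataSource builds the pool from the config; close it on shutdown.
val ds = new HikariDataSource(c)
try {
  val conn = ds.getConnection() // blocks until a pooled connection is free
  conn.close()                  // "close" returns the connection to the pool
} finally ds.close()            // shuts the whole pool down
```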
Collecting Metrics for > 1 Database in a Service
If you’re connecting to multiple databases from a single service, share a single PrometheusMetricsTrackerFactory instance among the connection pools. Otherwise, you’ll likely run into exceptions while initializing the underlying metrics and end up recording metrics for only one of the pools.
Full source - https://github.com/Banno/device-service/blob/1.180.0/src/main/scala/com/banno/device/Config.scala#L87-L137
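Concretely, that sharing might look like the sketch below; the pool names and JDBC URLs are made up for illustration:

```scala
import com.zaxxer.hikari.HikariConfig
import com.zaxxer.hikari.metrics.prometheus.PrometheusMetricsTrackerFactory
import io.prometheus.client.CollectorRegistry

val registry = new CollectorRegistry()
// Create ONE factory and hand the same instance to every pool in the process.
val metricsFactory = new PrometheusMetricsTrackerFactory(registry)

def poolConfig(name: String, jdbcUrl: String): HikariConfig = {
  val c = new HikariConfig()
  c.setPoolName(name)                        // distinct names keep per-pool metric labels apart
  c.setJdbcUrl(jdbcUrl)
  c.setMetricsTrackerFactory(metricsFactory) // the shared instance, not a new one per pool
  c
}

val usersDb   = poolConfig("users-db",   "jdbc:postgresql://users-db:5432/users")
val devicesDb = poolConfig("devices-db", "jdbc:postgresql://devices-db:5432/devices")
```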
Minimum Idle and Max Pool Size
HikariCP (https://github.com/brettwooldridge/HikariCP), the Java connection pooler that powers doobie’s connection pooling, notes:
Watch a video from the Oracle Real-world Performance group, and learn about why connection pools do not need to be sized as large as they often are. In fact, oversized connection pools have a clear and demonstrable negative impact on performance
Please watch that video for motivation and experimentation that shows the benefits of using a small pool size.
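For a concrete starting point, HikariCP’s “About Pool Sizing” wiki (linked from the repo above) suggests a formula along these lines; treat it as a rule of thumb to experiment from, not a hard rule:

```scala
// Rule of thumb from HikariCP's "About Pool Sizing" wiki:
//   pool size = (core_count * 2) + effective_spindle_count
// Note how small the result is compared to the 50-100 connection pools
// often configured in practice.
def suggestedPoolSize(coreCount: Int, effectiveSpindleCount: Int): Int =
  (coreCount * 2) + effectiveSpindleCount

println(suggestedPoolSize(8, 1)) // an 8-core host with one disk: prints 17
```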
Hikari Prometheus Metrics
HikariCPCollector.java documents the following Gauges:
hikaricp_active_connections - Active connections, i.e. the current number of active (in-use) connections in the pool
hikaricp_idle_connections - Idle connections
hikaricp_pending_threads - Pending threads, i.e. threads awaiting connections from the pool
hikaricp_connections - The number of current connections
hikaricp_max_connections - Max connections
hikaricp_min_connections - Min connections
Note that hikaricp_max_connections and hikaricp_min_connections are static values, i.e. they’re set when
constructing the doobie connection manager (see the code above).
There are also a few Summary metrics:
hikaricp_connection_acquired_nanos - Connection acquired time (ns), i.e. the amount of time threads calling .getConnection() spend waiting for a connection from the pool
hikaricp_connection_usage_millis - Connection usage (ms), i.e. the amount of time connections are used before being returned to the pool; also known as the “out of pool” or “in-use” time
hikaricp_connection_creation_millis - Connection creation (ms), i.e. the amount of time required to create a new connection to the database
Visualizing Metrics in Grafana
Monitoring Database Performance is a template Grafana Dashboard.
Note that the device-service application is selected since it has 10 instances and connects to a database.
It includes nine different graphs.
HikariCP Max Pool Size (Set at App Startup)
max by (pool)(hikaricp_max_connections{app=~"$service"}) - shows the max pool size, which is static, i.e. set when the app starts up.

HikariCP Min Pool Size (Set at App Startup)
max by (pool)(hikaricp_min_connections{app=~"$service"}) - shows the min pool size, which is static, i.e. set when the app starts up.

HikariCP Threads Awaiting Connections
sum by (instance, pool)(hikaricp_pending_threads{app=~"$service"}) - a spike in this value could indicate increased database contention, i.e. multiple threads are trying to get a connection to the DB.

HikariCP Active Database Connections
sum by (instance, pool)(hikaricp_active_connections{app=~"$service"})

HikariCP Idle Connections
sum by (instance, pool)(hikaricp_idle_connections{app=~"$service"})

HikariCP Total Connections
sum by (instance, pool)(hikaricp_connections{app=~"$service"})

Max p99 Connection Acquired Time (ns)
max(hikaricp_connection_acquired_nanos{app=~"$service", quantile="0.99"})

Max p99 Connection Usage (ms)
max(hikaricp_connection_usage_millis{app=~"$service", quantile="0.99"})

Max p99 Connection Creation (ms)
max(hikaricp_connection_creation_millis{app=~"$service", quantile="0.99"})
Note that sum is used for the gauges above because it adds the current value of every matching gauge series within each (instance, pool) group.
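As a toy model of that aggregation (not the PromQL engine itself), sum by (instance, pool) groups the current gauge samples by those two labels and adds the values within each group:

```scala
final case class Sample(instance: String, pool: String, value: Double)

// Toy model of PromQL's `sum by (instance, pool)`: group current gauge
// samples by the kept labels and sum the values within each group.
def sumByInstanceAndPool(samples: List[Sample]): Map[(String, String), Double] =
  samples
    .groupBy(s => (s.instance, s.pool))
    .view
    .mapValues(_.map(_.value).sum)
    .toMap

val samples = List(
  Sample("app-1", "devices", 3.0),
  Sample("app-1", "devices", 2.0),
  Sample("app-2", "devices", 5.0)
)
// ("app-1","devices") -> 5.0 and ("app-2","devices") -> 5.0
println(sumByInstanceAndPool(samples))
```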
banno_all
Multiple applications, such as MDS, consumer-enrollment, banno-users, etc., connect to the banno_all database. Folks from Team Infrastructure and
Data Services have improved its overall performance through experience and experimentation.
For questions involving DB performance of banno_all, please consider reaching out in the #temp-mds-pgbouncer-commits Slack channel.
Conclusions
doobie and HikariCP make it simple to collect Prometheus metrics. Strongly consider the advice from Oracle (per the HikariCP link above)
and set a small pool size. Lastly, closely monitor hikaricp_pending_threads: in the past, that number has spiked for prolonged periods
during Espresso incidents.