Spring JPA & Hibernate: Importing data before your beans are initialized

Published by Bilal Kaun on

The problem

At times, it may make sense to load system settings for an application from the database at ‘start time’. Sometimes this means attempting to query for such data early in a bean’s lifecycle, such as at construction time or slightly afterwards (e.g. ‘afterPropertiesSet()’ or init-method/@PostConstruct). This may work if the application is backed by a dedicated database that is ever present for the application lifecycle, or at least is started before the application (container dependencies, etc).

When it comes to integration testing, however, trying to ensure the data exist in a table prior to the bean instantiation can be tricky, especially if the test spins up an in-memory db and populates the schema objects from entities.

Launch Sequence

If you’re using Spring Boot JPA and Hibernate to create/update/validate the schema, especially when testing with H2 and the like, and, as the problem definition above states, you need to inject your data before your bean is ready, there are a couple of routes you can take. First, we need to see what the relevant bits of the launch sequence look like.

The components

There are several components that, in concert, launch your application. Keeping in line with Spring’s idioms, naturally they’re in the form of Beans.

  • Datasource – the subsystem that provisions connections to, and indeed creates the instance of, the in-memory db
  • EntityManagerFactory – the Jakarta-defined subsystem that eventually kicks off Hibernate schema creation
  • Database Initializers – the Spring circuitry responsible for populating the database (DML and DDL)
  • ‘Business-Beans’ – my own jargon for any non-lazy-init singleton bean in your project that depends on the database having data at creation

Initialization order

  1. Datasource bean (e.g. EmbeddedDatabaseFactoryBean) instantiated
  2. Database Initializers
    • Auto-Configurer instantiates and schedules this bean after data.sql’s presence is detected (see DataSourceInitializationConfiguration, DataSourceScriptDatabaseInitializer and ResourceDatabasePopulator)
  3. EntityManagerFactory bean init
    • dependent on Datasource, thus ordered to be init after it
    • by default, dependent on Database Initializers (see: DatabaseInitializationDependencyConfigurer, JpaDependsOnDatabaseInitializationDetector)
  4. Remaining non-lazy Bean Instantiation

The default order is problematic because we’re relying on Hibernate to populate the schema objects from annotations, and that can only happen in step #3. Meanwhile, our SQL data script is scheduled to run in step #2. The result is a data.sql that attempts to insert records into tables that do not yet exist and fails with a ScriptStatementFailedException, which ultimately prevents the framework from launching by throwing an IllegalStateException.

Solutions

Naturally, the obvious solution is to move #2, the db init stage, to after Hibernate has completed schema creation.

Defer Initialization Method

Spring has a provision for situations just like this, an aptly named property that prevents the auto-configurer from binding db-init as a dependency of the EntityManagerFactory bean: spring.jpa.defer-datasource-initialization
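
As a minimal sketch, the property could be set in a test-scoped properties file. The file location and the ddl-auto value below are assumptions chosen for illustration; only the second property comes from the discussion above.

```properties
# src/test/resources/application.properties (assumed location for test-only settings)

# Assumption: Hibernate builds the schema from the entity annotations.
spring.jpa.hibernate.ddl-auto=create-drop

# Stop the EntityManagerFactory from waiting on the SQL script initializers.
spring.jpa.defer-datasource-initialization=true
```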

Setting this property to true will sever the virtual dependence and allow the bean factory to run the database initializers at a later time. However, this introduces a new problem. The order of bean instantiation between the “business-beans” and db-init is undefined, and more often than not the db-initializers are instantiated after the business-beans. Now when your on-construction logic queries for data, it won’t find it. Once again your error-handling routine takes over during bean construction, potentially crashing the app at launch.
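
The two orderings discussed so far can be reproduced in a toy model. The sketch below is plain Java and does not use Spring: it only imitates the idea that a container creates beans in registration order and creates each bean's dependencies first. The bean names, the registration order, and the assumption that the deferred initializer waits for the EntityManagerFactory are all illustrative choices, not Spring's actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InitOrderSketch {

    /** Creates beans in registration order, always creating a bean's dependencies first. */
    static List<String> creationOrder(Map<String, List<String>> dependsOn) {
        List<String> created = new ArrayList<>();
        List<String> inProgress = new ArrayList<>();
        for (String bean : dependsOn.keySet()) {
            create(bean, dependsOn, created, inProgress);
        }
        return created;
    }

    private static void create(String bean, Map<String, List<String>> dependsOn,
                               List<String> created, List<String> inProgress) {
        if (created.contains(bean)) {
            return; // already created on behalf of an earlier bean
        }
        if (inProgress.contains(bean)) {
            throw new IllegalStateException("Circular dependency involving " + bean);
        }
        inProgress.add(bean);
        for (String dependency : dependsOn.getOrDefault(bean, List.of())) {
            create(dependency, dependsOn, created, inProgress);
        }
        inProgress.remove(bean);
        created.add(bean);
    }

    /** Illustrative wiring; the map's insertion order stands in for registration order. */
    static Map<String, List<String>> wiring(boolean deferInitialization) {
        Map<String, List<String>> dependsOn = new LinkedHashMap<>();
        dependsOn.put("dataSource", List.of());
        // Default: the EntityManagerFactory waits for the script initializer.
        // Deferred: that edge is removed.
        dependsOn.put("entityManagerFactory", deferInitialization
                ? List.of("dataSource")
                : List.of("dataSource", "dbInitializer"));
        // The business bean only declares what it injects, not the initializer.
        dependsOn.put("businessBean", List.of("entityManagerFactory"));
        // Assumption: when deferred, the scripts wait for the schema to exist.
        dependsOn.put("dbInitializer", deferInitialization
                ? List.of("dataSource", "entityManagerFactory")
                : List.of("dataSource"));
        return dependsOn;
    }

    public static void main(String[] args) {
        System.out.println("default:  " + creationOrder(wiring(false)));
        System.out.println("deferred: " + creationOrder(wiring(true)));
    }
}
```

In this model the default wiring creates dbInitializer before entityManagerFactory, so the script runs before the tables exist. The deferred wiring creates businessBean before dbInitializer, so the on-construction query finds no data.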

One way to rectify this situation is to simulate what the auto-configurer was doing with EntityManagerFactory and force the beans that require data-on-construction to be created after the initializer has run. This can be done by adding the following to your components: @DependsOn("dataSourceScriptDatabaseInitializer")
In fact, you can even annotate the db Service component to depend on the initializer, which keeps the code change more contained. You must take care, however, to register such components through a @Bean method rather than a class-level annotation (@Service/@Component), so that you can leave the @DependsOn annotation out outside of the testing environment.

Spring offers another, more purpose-built annotation to do the same thing: @DependsOnDatabaseInitialization. This annotation can be used in exactly the same way as the @DependsOn above, yet has the benefit of not requiring knowledge of internal implementation details (it takes no parameter, unlike @DependsOn, which requires you to know the bean name).
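
A wiring fragment along these lines might look as follows. It is a sketch rather than a drop-in solution: it needs Spring Boot on the classpath, and SettingsService, SettingsRepository, the configuration class name and the "test" profile are hypothetical names invented for illustration.

```java
import java.util.Map;

import org.springframework.boot.sql.init.dependency.DependsOnDatabaseInitialization;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;

@Configuration
@Profile("test") // assumption: this profile is only active during tests
class TestSettingsConfiguration {

    // Created only after the database initializers have run.
    // The name-based alternative is @DependsOn("dataSourceScriptDatabaseInitializer").
    @Bean
    @DependsOnDatabaseInitialization
    SettingsService settingsService(SettingsRepository repository) {
        return new SettingsService(repository);
    }
}

/** Hypothetical repository; assumed to be provided as a bean elsewhere in the project. */
interface SettingsRepository {
    Map<String, String> loadAll();
}

/** Hypothetical business bean that reads its settings at construction time. */
class SettingsService {
    private final Map<String, String> settings;

    SettingsService(SettingsRepository repository) {
        this.settings = repository.loadAll(); // the query that needs the data to be present
    }
}
```

Because the bean is declared through a @Bean method in a profile-specific configuration class, the production wiring can declare the same bean without the annotation.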

One thing to be mindful of is that Spring Boot’s script-based database initializers only run against an embedded (in-memory) database unless spring.sql.init.mode=always is set.

Hibernate / Spring Boot Hybrid Method

Another option is to forgo Spring’s database initializers and rely on Hibernate for data population, in addition to schema population.

Hibernate’s Schema Creation Utility (see SchemaCreatorImpl) runs to create the schema and load any user-defined data, but only if the property hibernate.hbm2ddl.auto is set to one of: create, create-only, create-drop. The creator searches for import.sql at the base of the classpath (e.g. src/main/resources) and, if it is found, runs the script. The script name can be overridden by defining the property in your environment (properties, etc.): spring.jpa.properties.hibernate.hbm2ddl.import_files = /baseline_dml.sql
There are two other properties that can define import scripts that should execute around the same time:

  • javax.persistence.sql-load-script-source
  • jakarta.persistence.sql-load-script-source

Note that these properties are considered the ‘same’ by the hibernate importer (it checks the first one and then the second one, taking the first non-null value).
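
Put together, a test configuration for this approach might look like the sketch below. The file location and the create-drop value are assumptions; the script name is the example used above.

```properties
# src/test/resources/application.properties (assumed location)

# Hibernate only runs import scripts for create, create-only or create-drop.
spring.jpa.hibernate.ddl-auto=create-drop

# Replace the default import.sql with a custom script.
spring.jpa.properties.hibernate.hbm2ddl.import_files=/baseline_dml.sql

# Alternative using the Jakarta property, passed through to Hibernate the same way:
# spring.jpa.properties.jakarta.persistence.sql-load-script-source=baseline_dml.sql
```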

Note, Spring devs do not recommend mixing initialization technologies, but this is a reasonable methodology to pursue. By renaming your data.sql as import.sql, you prevent Spring DB-initializers from running and ensure Hibernate populates your tables immediately after populating your schema objects.

One thing to be mindful of is that Spring’s db-initialization implementation is more flexible, allowing developers to target different profiles and database vendors, whereas Hibernate’s import.sql assumes a fixed approach (same file for any profile and any vendor). However, this is not an issue if all your testing occurs inside an in-memory db.

Categories: Java, Spring