Getting Started with Spring Boot Batch Processing
Spring Batch is a lightweight, open-source framework for building scalable batch processing applications. Batch processing is used by applications that need to process large volumes of data in bulk, typically at scheduled times and without user interaction. For example, payroll systems use batch processing to send out payments to employees at a fixed time each month.
Spring Batch does not include a built-in scheduler. It can be combined with scheduling frameworks such as Quartz or Control-M to process data at scheduled times.
In this tutorial, we will develop a Spring Boot application that reads customer records from a CSV file, converts the names to uppercase, and stores them in an H2 in-memory SQL database.
Table of contents
- Prerequisites
- Application setup
- Data layer
- Repository layer
- Processor
- Configuration layer
- Controller layer
- Application configuration
- Testing
- Conclusion
Prerequisites
- Java Development Kit (JDK) installed on your computer.
- Some knowledge of Spring Boot.
Application setup
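- In your browser, navigate to Spring Initializr.
- Set the project name to springbatch.
- Add Lombok, Spring Web, H2 Database, Spring Data JPA, and Spring Batch as the project dependencies.
- Click Generate to download the generated project zip file.
- Decompress the downloaded file and open it in your preferred IDE.
The code in this tutorial targets Spring Boot 2.x, which ships with Spring Batch 4 (JobBuilderFactory, StepBuilderFactory, and the javax.persistence annotations). On Spring Boot 3.x (Spring Batch 5), JobBuilderFactory and StepBuilderFactory are deprecated and no longer exposed as beans, so you would use JobBuilder and StepBuilder with an injected JobRepository instead, and import the JPA annotations from jakarta.persistence.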
Data layer
- Create a new package named domain in the root project package.
- In the domain package created above, create a file named Customer and add the code below.
// Spring Boot 2.x uses javax.persistence; on Spring Boot 3.x import these from jakarta.persistence instead
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;

@Entity(name = "person")
@Getter // Lombok annotation to generate getters for the fields
@Setter // Lombok annotation to generate setters for the fields
@AllArgsConstructor // Lombok: constructor with all fields, in declaration order (id, lastName, firstName)
@NoArgsConstructor // Lombok: empty constructor, required by JPA and by the CSV field mapper
public class Customer {
    @Id // Sets the id field as the primary key in the database table
    @Column(name = "id") // Sets the column name for the id property
    @GeneratedValue(strategy = GenerationType.AUTO) // The id is generated by the persistence provider
    private Long id;
    @Column(name = "last_name")
    private String lastName;
    @Column(name = "first_name")
    private String firstName;

    // Returns firstName and lastName when an object of the class is logged
    @Override
    public String toString() {
        return "firstName: " + firstName + ", lastName: " + lastName;
    }
}
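The class above has an id field for the primary key in the database, and lastName and firstName fields whose values come from the data.csv file. Place data.csv in the src/main/resources directory; each line should contain a first name and a last name separated by a comma, with no header row, because the reader we configure later maps the two columns to firstName and lastName by position and is not set up to skip a header line.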
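To make the positional mapping concrete, here is a plain-Java sketch of what happens to each line of data.csv. In the real application, .delimited() splits the line and a BeanWrapperFieldSetMapper copies the columns onto Customer in the order given to .names(...). CsvCustomerSketch and CustomerRow are hypothetical names used only for illustration, and the sketch ignores details the real reader handles, such as quoted fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Customer entity: only the two CSV-backed fields.
final class CustomerRow {
    final String firstName;
    final String lastName;

    CustomerRow(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

// Plain-Java sketch of positional CSV mapping; not the Spring Batch implementation.
final class CsvCustomerSketch {
    // Column names in file order, mirroring .names("firstName", "lastName").
    static final String[] NAMES = {"firstName", "lastName"};

    // Splits one line on commas and assigns the columns by position.
    static CustomerRow mapLine(String line) {
        String[] tokens = line.split(",", -1); // -1 keeps trailing empty columns
        if (tokens.length != NAMES.length) {
            // Spring Batch's default (strict) delimited tokenizer also rejects a wrong column count.
            throw new IllegalArgumentException(
                    "Expected " + NAMES.length + " columns but got " + tokens.length + ": " + line);
        }
        return new CustomerRow(tokens[0], tokens[1]);
    }

    // Maps every line of the file contents, in order.
    static List<CustomerRow> mapAll(List<String> lines) {
        List<CustomerRow> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(mapLine(line));
        }
        return rows;
    }
}
```

A line such as Jane,Doe becomes a row with firstName "Jane" and lastName "Doe", while a header line like firstName,lastName would simply be mapped as another customer, which is why the file should not contain one.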
Repository layer
- Create a new package named repositories in the root project package.
- In the repositories package created above, create an interface named CustomerRepository and add the code below.
import org.springframework.data.jpa.repository.JpaRepository;

// The interface extends JpaRepository, which provides the CRUD operation methods (save, findAll, and so on)
public interface CustomerRepository extends JpaRepository<Customer, Long> {
}
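Spring Data generates the implementation of this interface at runtime. When save is called, it decides between an insert and an update by checking whether the entity is new, which for an entity like Customer means whether its id is null. The in-memory sketch below mimics that save/findAll contract; InMemoryCustomerStore and StoredCustomer are hypothetical illustration classes, not Spring Data types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity stand-in with a nullable, store-generated id.
final class StoredCustomer {
    Long id;
    final String firstName;
    final String lastName;

    StoredCustomer(Long id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

// In-memory sketch of the save/findAll contract; not Spring Data's implementation.
final class InMemoryCustomerStore {
    private final Map<Long, StoredCustomer> rows = new LinkedHashMap<>();
    private long nextId = 1;

    // A null id means "new": assign the next id, as @GeneratedValue would.
    // A non-null id overwrites the row stored under that id, similar to an update.
    StoredCustomer save(StoredCustomer customer) {
        if (customer.id == null) {
            customer.id = nextId++;
        }
        rows.put(customer.id, customer);
        return customer;
    }

    // Returns all stored rows in insertion order.
    List<StoredCustomer> findAll() {
        return new ArrayList<>(rows.values());
    }
}
```

In this stand-in, saving several customers that all carry id 1 keeps only the last one, which is why a processor that creates new entities should leave the id null and let the store generate it.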
Processor
- Create a new package named processor in the root project package.
- In the processor package, create a new Java file named CustomerProcessor, then add the code below.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ItemProcessor;

public class CustomerProcessor implements ItemProcessor<Customer, Customer> {
    // Creates a logger
    private static final Logger logger = LoggerFactory.getLogger(CustomerProcessor.class);

    // Transforms each customer read from the CSV file before it is written
    @Override
    public Customer process(final Customer customer) throws Exception {
        final String firstName = customer.getFirstName().toUpperCase();
        final String lastName = customer.getLastName().toUpperCase();
        // Creates a new Customer; the id is left null so the database generates it on save
        final Customer transformedCustomer = new Customer();
        transformedCustomer.setFirstName(firstName);
        transformedCustomer.setLastName(lastName);
        // Logs the original and the transformed customer to the application logs
        logger.info("Converting ({}) into ({})", customer, transformedCustomer);
        return transformedCustomer;
    }
}
The class above transforms data from one form to another. An ItemProcessor<I, O> takes an input item of type I, transforms it, and returns an output item of type O.
In our case, Customer is both the input and the output type, so the shape of the data is maintained; only the first and last names are converted to uppercase.
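The ItemProcessor contract boils down to one method that takes an item and returns an item, possibly of a different type. The sketch below restates that contract as a hypothetical plain-Java interface so it runs without Spring; SimpleProcessor, NamePair and UppercaseProcessorSketch are illustrative names. Like CustomerProcessor, it builds a new object instead of mutating its input, and the processAll helper mirrors the Spring Batch rule that a processor returning null filters that item out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical restatement of the ItemProcessor<I, O> contract (not the Spring interface).
interface SimpleProcessor<I, O> {
    // Returns the transformed item, or null to drop the item from further processing.
    O process(I item);
}

// Hypothetical immutable value type holding the two CSV fields.
final class NamePair {
    final String firstName;
    final String lastName;

    NamePair(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

final class UppercaseProcessorSketch implements SimpleProcessor<NamePair, NamePair> {
    @Override
    public NamePair process(NamePair item) {
        // Builds a new object; Locale.ROOT avoids locale-specific casing surprises.
        return new NamePair(item.firstName.toUpperCase(Locale.ROOT),
                item.lastName.toUpperCase(Locale.ROOT));
    }

    // Applies a processor to every item and skips null results,
    // mirroring how Spring Batch filters items for which process returns null.
    static <I, O> List<O> processAll(SimpleProcessor<I, O> processor, List<I> items) {
        List<O> out = new ArrayList<>();
        for (I item : items) {
            O result = processor.process(item);
            if (result != null) {
                out.add(result);
            }
        }
        return out;
    }
}
```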
Configuration layer
- Create a new package named config in the root project package. This package will contain all of our configurations.
- In the config package, create a new Java file named BatchConfiguration and add the code below.
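import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.data.RepositoryItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Lazy;
import org.springframework.core.io.ClassPathResource;
// Also import Customer, CustomerRepository and CustomerProcessor from your project's own packages

@Configuration // Informs Spring that this class contains configurations
@EnableBatchProcessing // Enables Spring Batch and provides JobBuilderFactory and StepBuilderFactory (Spring Batch 4)
public class BatchConfiguration {
    @Autowired
    public JobBuilderFactory jobBuilderFactory;
    @Autowired
    public StepBuilderFactory stepBuilderFactory;
    @Autowired
    @Lazy
    public CustomerRepository customerRepository;

    // Reads data.csv from the classpath and creates a Customer for each line
    @Bean
    public FlatFileItemReader<Customer> reader() {
        return new FlatFileItemReaderBuilder<Customer>()
                .name("customerReader")
                .resource(new ClassPathResource("data.csv"))
                .delimited() // comma-separated columns
                .names("firstName", "lastName") // column order in the file
                .targetType(Customer.class) // copies the columns onto Customer through its setters
                .build();
    }

    // Creates the writer, configuring the repository and the method used to save each item
    @Bean
    public RepositoryItemWriter<Customer> writer() {
        RepositoryItemWriter<Customer> writer = new RepositoryItemWriter<>();
        writer.setRepository(customerRepository);
        writer.setMethodName("save");
        return writer;
    }

    // Creates the CustomerProcessor that transforms each item; here the data form is maintained
    @Bean
    public CustomerProcessor processor() {
        return new CustomerProcessor();
    }

    // Batch jobs are built from steps. This step reads, processes and writes items in chunks of 5
    @Bean
    public Step step1(ItemReader<Customer> itemReader, ItemWriter<Customer> itemWriter) {
        return this.stepBuilderFactory.get("step1")
                .<Customer, Customer>chunk(5)
                .reader(itemReader)
                .processor(processor())
                .writer(itemWriter)
                .build();
    }

    // Defines the job that saves the data from the .csv file into the database
    @Bean
    public Job customerUpdateJob(JobCompletionNotificationListener listener, Step step1) {
        return this.jobBuilderFactory.get("customerUpdateJob")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .start(step1)
                .build();
    }
}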
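The .chunk(5) call makes step1 chunk-oriented: Spring Batch reads items until the chunk holds five (or the reader returns null at the end of the input), processes them, and passes the processed items to the writer in a single call, committing one transaction per chunk. The plain-Java sketch below reproduces only that read, process, write loop so the effect of the chunk size is easy to see; ChunkLoopSketch is a hypothetical name, and transactions, restart metadata and error handling are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Plain-Java sketch of a chunk-oriented step; not Spring Batch's implementation.
final class ChunkLoopSketch {
    // Reads up to chunkSize items (a null read means "no more input"), processes them,
    // then hands the surviving items to the writer as one list. Returns the number of chunks written.
    static <I, O> int runStep(Supplier<I> reader, Function<I, O> processor,
                              Consumer<List<O>> writer, int chunkSize) {
        int chunks = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize) {
                I item = reader.get();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                items.add(item);
            }
            if (items.isEmpty()) {
                break;
            }
            List<O> processed = new ArrayList<>();
            for (I item : items) {
                O result = processor.apply(item);
                if (result != null) { // null means "filter this item out"
                    processed.add(result);
                }
            }
            writer.accept(processed); // in Spring Batch this write happens inside the chunk's transaction
            chunks++;
        }
        return chunks;
    }

    // Adapts a list to the "return null when exhausted" reader convention.
    static <T> Supplier<T> readerOf(List<T> data) {
        Iterator<T> it = data.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }
}
```

With twelve input lines and a chunk size of 5, the writer is called three times, with 5, 5 and 2 items.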
- In the config package, create another Java class named JobCompletionNotificationListener and add the code below.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {
    // Creates an instance of the logger
    private static final Logger log = LoggerFactory.getLogger(JobCompletionNotificationListener.class);
    private final CustomerRepository customerRepository;

    @Autowired
    public JobCompletionNotificationListener(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    // Callback that Spring Batch invokes after the job finishes, whether it succeeded or failed
    @Override
    public void afterJob(JobExecution jobExecution) {
        // If the job completed successfully, retrieve the saved customers and log them
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            log.info("!!! JOB COMPLETED! Verify the results");
            customerRepository.findAll()
                    .forEach(customer -> log.info("Found ({}) in the database.", customer));
        }
    }
}
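Spring Batch calls afterJob after both successful and failed executions, which is why the listener checks for BatchStatus.COMPLETED before querying the database. The plain-Java sketch below illustrates that ordering: beforeJob runs first, then the job body, and afterJob always receives the final status. RunStatus, JobHooks and ListenerSketch are hypothetical, simplified types, not Spring Batch classes (the real BatchStatus has more states).

```java
// Hypothetical simplified status values; Spring Batch's BatchStatus has more states.
enum RunStatus { COMPLETED, FAILED }

// Hypothetical listener contract with no-op defaults, similar in spirit to JobExecutionListener.
interface JobHooks {
    default void beforeJob(String jobName) { }

    default void afterJob(String jobName, RunStatus status) { }
}

final class ListenerSketch {
    // Runs the job body between the two hooks; afterJob is invoked on success and on failure.
    static RunStatus run(String jobName, Runnable body, JobHooks hooks) {
        hooks.beforeJob(jobName);
        RunStatus status;
        try {
            body.run();
            status = RunStatus.COMPLETED;
        } catch (RuntimeException e) {
            status = RunStatus.FAILED;
        }
        hooks.afterJob(jobName, status);
        return status;
    }
}
```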
Controller layer
- Create a new package named controllers in the root project package.
- In the controllers package created above, create a Java class named BatchController and add the code snippet below.
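import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.JobParametersInvalidException;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException;
import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException;
import org.springframework.batch.core.repository.JobRestartException;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping(path = "/batch") // Root path
public class BatchController {
    @Autowired
    private JobLauncher jobLauncher;
    @Autowired
    private Job job;
    // Accepts a GET request, launches the batch job, and returns a message describing the outcome
    @GetMapping(path = "/start") // Start batch process path
    public ResponseEntity<String> startBatch() {
        // A unique startAt value makes every request a new job instance
        JobParameters parameters = new JobParametersBuilder()
                .addLong("startAt", System.currentTimeMillis()).toJobParameters();
        try {
            jobLauncher.run(job, parameters);
        } catch (JobExecutionAlreadyRunningException | JobRestartException
                | JobInstanceAlreadyCompleteException | JobParametersInvalidException e) {
            // Report the failure instead of claiming that the job started
            return new ResponseEntity<>("Batch process failed to start: " + e.getMessage(), HttpStatus.INTERNAL_SERVER_ERROR);
        }
        return new ResponseEntity<>("Batch Process started!!", HttpStatus.OK);
    }
}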
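The controller adds a startAt timestamp because Spring Batch identifies a job instance by the job name plus its identifying parameters, and launching an instance that has already completed fails with JobInstanceAlreadyCompleteException. The RunIdIncrementer on the job is applied by launchers that ask for the next instance (such as Spring Boot's startup runner), not by a direct jobLauncher.run call, so the controller supplies its own unique value. The sketch below models the identity rule with a hypothetical in-memory registry (JobInstanceRegistrySketch); it is not how JobRepository is implemented.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of "job name + identifying parameters = job instance"; not JobRepository.
final class JobInstanceRegistrySketch {
    private final Set<String> completedInstances = new HashSet<>();

    // Builds a stable key from the job name and the parameters (sorted so their order does not matter).
    static String instanceKey(String jobName, Map<String, ?> parameters) {
        return jobName + new TreeMap<String, Object>(parameters);
    }

    // Refuses to re-run an instance that already completed; otherwise records it as completed.
    void launch(String jobName, Map<String, ?> parameters) {
        String key = instanceKey(jobName, parameters);
        if (completedInstances.contains(key)) {
            throw new IllegalStateException("Job instance already complete: " + key);
        }
        // The job itself would run here; this sketch assumes it always completes successfully.
        completedInstances.add(key);
    }
}
```

Launching customerUpdateJob twice with the same startAt value is rejected, while a new timestamp produces a new instance that runs normally.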
Application configuration
In the src/main/resources directory, add the properties below to the application.properties file.
# Sets the server port from where we can access our application
server.port=8080
# Prevents the batch job from running automatically on application startup
spring.batch.job.enabled=false
Testing
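Open Postman and send a GET request to http://localhost:8080/batch/start to start the batch process.
After sending the GET request, you can see the batch process running in the application logs: the processor logs each converted customer, and the listener logs the saved customers once the job completes.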
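If you prefer not to use Postman, the same request can be sent from code. This is a minimal sketch using the JDK's java.net.http.HttpClient (Java 11+); BatchStartClient is a hypothetical helper name, and the base URL assumes the server.port=8080 setting above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper that triggers the batch endpoint exposed by BatchController.
final class BatchStartClient {
    // Builds GET {baseUrl}/batch/start, e.g. with baseUrl = "http://localhost:8080".
    static HttpRequest startRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/batch/start"))
                .GET()
                .build();
    }

    // Sends the request and returns "status body"; requires the application to be running.
    static String startBatch(HttpClient client, String baseUrl) throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(startRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }
}
```

Calling BatchStartClient.startBatch(HttpClient.newHttpClient(), "http://localhost:8080") while the application is running sends the request. With Spring Batch 4's default launcher, which runs jobs synchronously, the response arrives only after the job has finished.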
Conclusion
Now that you have learned how to run a batch job, try configuring the application to use Spring's scheduling support (@EnableScheduling and @Scheduled) so that the job runs automatically at a given time instead of being started with an HTTP request.
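Inside Spring, the usual route is @EnableScheduling on a configuration class plus a @Scheduled method that calls jobLauncher.run with fresh JobParameters. To show the triggering idea without a Spring context, the sketch below swaps in the JDK's ScheduledExecutorService instead; PeriodicTriggerSketch is a hypothetical name, and the Runnable it schedules stands in for the launch code from BatchController.startBatch.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// JDK stand-in for Spring's @Scheduled: runs a job trigger at a fixed rate.
final class PeriodicTriggerSketch {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Schedules launchJob every periodMillis after an initial delay. launchJob is a hypothetical
    // placeholder for "build unique JobParameters and call jobLauncher.run". If it throws,
    // ScheduledExecutorService suppresses later runs, so it should catch its own exceptions.
    ScheduledFuture<?> start(Runnable launchJob, long initialDelayMillis, long periodMillis) {
        return scheduler.scheduleAtFixedRate(launchJob, initialDelayMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Stops all future runs.
    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

In the Spring version, an annotation such as @Scheduled(cron = "0 0 1 * * *") would fire the launch method daily at 01:00, using Spring's six-field cron syntax (second, minute, hour, day of month, month, day of week).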
You can download the complete source code here.
Happy coding!
Peer Review Contributions by: Odhiambo Paul