This little blogpost describes my current design for batch processing in my YPPO [9] project. The image comes straight from Spring Batch.
As you are no doubt aware, batch processing in Java has been standardized in JEE 7 [2]. It is based on existing batch implementations, for example Spring Batch [7].
I am interested in it because I initially wanted to solve my problem using a JMS queue, but after getting a warning that I was creating too many messages in one transaction, I decided to give batch processing a try instead. Besides, it felt a little (oh, all right, a lot!) like I was abusing JMS for batch processing in the first place.
Definitions:
- Job
- encapsulates the entire batch process. So, in our case, we have two jobs to define for YPPO. Job "Import new Photographs" and job "Verify existing Photographs".
- Step
- a domain object that encapsulates an independent, sequential phase of the job. There are two kinds, chunk-style and batchlet-style. The standard chunk-style step suffices here. Each job needs only one step; we keep it simple, as that is all that is needed.
- chunk-style step "Import Photograph" for job "Import new Photographs"
- chunk-style step "Verify Photograph" for job "Verify existing Photographs"
- JobOperator
- operates on jobs: starts them, stops them, retrieves their steps, etc.
- JobRepository
- contains the history: jobs that are running, jobs that have run, etc.
I shall be focussing on the "Verify existing Photographs" job for the remainder of the blogpost.
Batch Artifacts
The Batch Artifacts are injected into the job system as beans using CDI. Thus the artifacts are beans identified by @Named annotations.
In the package gallery.jobs.verify above, the classes Listener, Reader, Processor, Writer are Beans annotated with @Named with names verifyPhotographListener, verifyPhotographReader, verifyPhotographProcessor and verifyPhotographWriter respectively. These names will be used later in defining the job.
In my example, I also wanted that, if an actual error occurs, the item being processed is skipped and an error log entry is written to the database, so that the user can check and correct it afterwards. Errors on individual items are no reason to stop the job.
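One way to get this skip-and-log behaviour declaratively, sketched here as an assumption rather than taken from the YPPO sources, is to mark exceptions as skippable on the chunk. With java.lang.Exception included, an exception thrown by the reader, processor or writer causes the item(s) involved to be skipped instead of failing the step, and because no skip-limit attribute is given there is no limit on the number of skips. Writing the error to the database could then be done in a listener implementing javax.batch.api.chunk.listener.SkipProcessListener, whose onSkipProcessItem(Object item, Exception ex) method is called for every item skipped during processing.

```xml
<!-- Sketch only: skip any Exception instead of failing the step. -->
<chunk item-count="4">
  <reader ref="verifyPhotographReader"/>
  <processor ref="verifyPhotographProcessor"/>
  <writer ref="verifyPhotographWriter"/>
  <!-- must come after reader/processor/writer in the chunk element -->
  <skippable-exception-classes>
    <include class="java.lang.Exception"/>
  </skippable-exception-classes>
</chunk>
```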
In JEE 7, the otherwise empty beans.xml file has a setting called "bean-discovery-mode", which is required. If set to "all", CDI checks all classes. If set to "annotated", it only picks up classes that carry a bean-defining annotation, such as a scope annotation. I've set it to "annotated". At first I put only the @Named annotation on the classes I needed, but they weren't injected into the batch jobs, because @Named on its own is not a bean-defining annotation. An additional scope annotation (@ApplicationScoped) is required before the classes are picked up.
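For reference, a CDI 1.1 beans.xml that selects "annotated" discovery looks like this (the schema location is the standard Java EE 7 one). In this mode a batch artifact annotated only with @Named is not discovered; adding a scope annotation such as @ApplicationScoped (or @Dependent) to the class makes it a bean that the batch runtime can look up by name.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Only classes with a bean-defining annotation (e.g. a scope) become CDI beans. -->
<beans xmlns="http://xmlns.jcp.org/xml/ns/javaee"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/beans_1_1.xsd"
       version="1.1" bean-discovery-mode="annotated">
</beans>
```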
Below is a sequence diagram of a job.
Job Specification Language
JSL (Job Specification Language) is the XML format used to define jobs. Its grammar is defined in [5].
In the picture on the left, you can see where in my project structure in NetBeans the Job definition files are located. In this location, they are automatically picked up by the Application Server.
For YPPO you can find the files AddPhotographs.xml and VerifyPhotographs.xml, as visible below, in my git repository YourPersonalPhotographOrganiser [9].
My jobs cannot currently be restarted, as they are not configured for it (restartable="false", as you can see). I'll devote a future blog to restarting once I figure out how to do that.
In my example, the chunk size is 4 (item-count="4"): items are read and processed one at a time, and every 4 items are written and committed together.
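To make the chunk semantics concrete, here is a minimal plain-Java model that uses no batch runtime at all; the class ChunkModel, its runStep method and the sample file names are hypothetical. It reads items one at a time, processes each one, and collects the results into chunks of item-count items, each standing for one write committed at a checkpoint. It also hard-codes the skip-and-log behaviour described earlier: an exception from the processor skips the item and records it instead of stopping the step. In a real job that behaviour comes from the job configuration or the artifacts themselves, and counting every item read toward the chunk boundary is a simplification of this model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Plain-Java model of a chunk-style step; NOT the javax.batch API.
// All names here (ChunkModel, runStep, Result) are hypothetical.
public class ChunkModel {

    /** What the step "committed", chunk by chunk, plus a log of skipped items. */
    public static final class Result {
        public final List<List<String>> committedChunks = new ArrayList<>();
        public final List<String> skipLog = new ArrayList<>();
    }

    /**
     * Reads items one at a time, processes each one and hands the results to
     * the "writer" in chunks of itemCount items read. An exception from the
     * processor skips that item and logs it instead of failing the step.
     */
    public static Result runStep(List<String> items, int itemCount,
                                 UnaryOperator<String> processor) {
        Result result = new Result();
        List<String> buffer = new ArrayList<>();
        int readInChunk = 0;
        for (String item : items) {                 // "reader": one item per call
            readInChunk++;
            try {
                String processed = processor.apply(item);
                if (processed != null) {            // null means "filter this item out"
                    buffer.add(processed);
                }
            } catch (RuntimeException e) {          // skip: log the error and keep going
                result.skipLog.add(item + ": " + e.getMessage());
            }
            if (readInChunk == itemCount) {         // checkpoint: write and commit the chunk
                result.committedChunks.add(new ArrayList<>(buffer));
                buffer.clear();
                readInChunk = 0;
            }
        }
        if (readInChunk > 0) {                      // final, partial chunk
            result.committedChunks.add(new ArrayList<>(buffer));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> photos = List.of("a.jpg", "b.jpg", "broken.jpg", "c.jpg", "d.jpg",
                                      "e.jpg", "f.jpg", "g.jpg", "h.jpg", "i.jpg");
        Result r = runStep(photos, 4, name -> {
            if (name.startsWith("broken")) {
                throw new IllegalStateException("cannot read image");
            }
            return "ok:" + name;
        });
        System.out.println(r.committedChunks);
        System.out.println(r.skipLog);
    }
}
```

With the ten sample names in main, broken.jpg ends up in the skip log and the step commits three chunks: three items, four items, and a final partial chunk of two.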
Within the step, listeners must be defined before other elements, for example before the chunk.
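Putting the pieces together, a job definition along these lines would wire the @Named artifacts into one chunk-style step. This is a sketch rather than the actual VerifyPhotographs.xml from the repository: the step id "verifyPhotograph" is a made-up name, and passing the "location" job parameter to the reader through a property is one common option, not necessarily what YPPO does. The reader would receive that value in a field annotated with @Inject @BatchProperty(name = "location"). Job XML files are loaded from META-INF/batch-jobs, and the name passed to JobOperator.start is the file name without ".xml".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<job id="VerifyPhotographs" xmlns="http://xmlns.jcp.org/xml/ns/javaee"
     version="1.0" restartable="false">
  <step id="verifyPhotograph">
    <!-- listeners come before the chunk element -->
    <listeners>
      <listener ref="verifyPhotographListener"/>
    </listeners>
    <chunk item-count="4">
      <reader ref="verifyPhotographReader">
        <properties>
          <!-- the "location" job parameter passed to JobOperator.start -->
          <property name="location" value="#{jobParameters['location']}"/>
        </properties>
      </reader>
      <processor ref="verifyPhotographProcessor"/>
      <writer ref="verifyPhotographWriter"/>
    </chunk>
  </step>
</job>
```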
Starting a Batch Job
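import java.util.Properties;
import javax.batch.operations.JobOperator;
import javax.batch.runtime.BatchRuntime;

public void verifyPhotographs(Location location)
{
  JobOperator operator = BatchRuntime.getJobOperator();
  // job parameters are Strings; the job XML can read them as #{jobParameters['location']}
  Properties jobParameters = new Properties();
  jobParameters.setProperty("location", String.valueOf(location.getId()));
  // "VerifyPhotographs" is the job XML name (VerifyPhotographs.xml); start() returns the execution id
  operator.start("VerifyPhotographs", jobParameters);
}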
Retrieving Details About Batch Jobs
Notes
It seems NetBeans has neither wizards nor GUI widgets to generate Batch Job scaffolding automatically, but it is on the roadmap.
There are lots of tutorials on Batch Jobs in JEE 7; however, many of them were written in the first half of 2013, and in the meantime the definition of Batch Artifacts has changed from annotations to a simple set of interfaces that your Batch Artifacts implement [1][3].
[4] gives a good general view of Batch Applications in Java EE 7. The JEE 7 Tutorial [8] is also very good.
References
- [1] Batch Applications in Java EE - https://blogs.oracle.com/arungupta/entry/batch_applications_in_java_ee
- [2] JSR 352 - Java Batch - https://java.net/projects/jbatch
- [3] Batch Applications for Java Processing - SlideShare - http://www.slideshare.net/arungupta1/jbatch-21153200
- [4] Java EE 7 Introduction to Batch JSR 352 - http://jaxenter.com/java-ee-7-introduction-to-batch-jsr-352-47400.html
- [5] Batch Applications for the Java Platform - Version 1.0 Final Release - JSR-352-1.0-Final-Release.pdf
- [6] JavaDoc - JEE 7 - http://docs.oracle.com/javaee/7/api/
- [7] Spring Batch in Action - Arnaud Cogoluègnes, Thierry Templier, Gary Gregory, Olivier Bazoud
- [8] JEE 7 Tutorial - Batch Processing - http://docs.oracle.com/javaee/7/tutorial/doc/batch-processing.htm
- [9] GitHub - YourPersonalPhotographOrganiser (YPPO) - https://github.com/maartenl/YourPersonalPhotographOrganiser