Tuesday, 15 April 2014

"if it hurts, do it more often."

I recently came across the phrase "if it hurts, do it more often." [1] on the website of Martin Fowler.

Now, this might not apply to, say, slamming your hand in a car door, but it does have its uses, for example in sports.

Personally, I think it is one of the defining characteristics of humans versus animals that humans can suffer through a bad cause (pain, hardship, discomfort) if they know the later effect is appropriately good. In other words, humans have the ability to reason about causality [3], usually with a perception of time.

Unfortunately, sometimes the animal instincts prove stronger.

Continuous Integration

The soundbite in the title comes straight out of the realm of Continuous Integration [2].

I came across the sentence while I was wrestling with a problem at work. We have several branches that are specific to certain customers. A complaint often heard nowadays is that these are so-called 'long-lived branches'. Now, the more I consider it, the more I think long-lived branches are a pretty bad idea.

The complaint I hear most often is that some of our software developers spend more time merging changes into the different branches, keeping them in sync and retesting than actually developing new software.

I just thought I'd put down some references (see [4] and [5]) on how to get rid of long-lived branches, and keep everything in the main branch. I don't feel sanguine about convincing management, though.

References

[1] Martin Fowler - Frequency Reduces Difficulty
http://martinfowler.com/bliki/FrequencyReducesDifficulty.html
[2] Wikipedia - Continuous Integration
http://en.wikipedia.org/wiki/Continuous_integration
[3] Causality
http://en.wikipedia.org/wiki/Causality
Lean into the pain
http://www.aaronsw.com/weblog/dalio
[4] Martin Fowler - Feature Branch
http://martinfowler.com/bliki/FeatureBranch.html
[5] Martin Fowler - Branch by Abstraction
http://martinfowler.com/bliki/BranchByAbstraction.html

Tuesday, 8 April 2014

Batch Jobs in JEE 7

This little blogpost describes my current design for Batch processing in my YPPO [9] project. The image comes straight from Spring Batch.

As no doubt you are aware, Batch Processing in Java has been standardized in JEE 7 [2]. It is based on already existing Batch implementations, for example Spring Batch [7].

I am interested in it because I initially wanted to solve my problem using a JMS queue. After getting the warning that I was creating too many messages in one transaction, I decided to give batch processing a try instead. Besides, it felt a little (oh, all right, a lot!) like I was abusing JMS for Batch processing in the first place.

Definitions:
Job
encapsulates the entire batch process. In our case, we have two jobs to define for YPPO: "Import new Photographs" and "Verify existing Photographs".
Step
a domain object that encapsulates an independent, sequential phase of the job. There are two kinds: chunk-style and batchlet-style. The standard chunk-style suffices here, and each job needs only one step. I am keeping it simple, as that is all that is needed.
  • chunk-style step "Import Photograph" for job "Import new Photographs"
  • chunk-style step "Verify Photograph" for job "Verify existing Photographs"
JobOperator
operates on jobs: starts them, stops them, retrieves required steps, etc.
JobRepository
contains the job history: jobs running, jobs that have run, etc.
I shall be focussing on the "Verify existing Photographs" job for the remainder of the blogpost.

Batch Artifacts

The Batch Artifacts are injected as Beans into the job system using CDI. Thus the artifacts used are Beans identified with @Named annotations.

In the package gallery.jobs.verify above, the classes Listener, Reader, Processor, Writer are Beans annotated with @Named with names verifyPhotographListener, verifyPhotographReader, verifyPhotographProcessor and verifyPhotographWriter respectively. These names will be used later in defining the job.

In my example, I also wanted an item that causes an actual error to be skipped and the error to be logged to the database, so that the user can check and correct it afterwards. Errors on individual items are no excuse for stopping the job.
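The skip-and-log behaviour described above can be sketched in plain Java. This is not the batch API (the real job would rely on the batch runtime and the listener bean); it is only an illustration of the intended semantics. The class name SkipOnErrorSketch, the Result type and the in-memory error log standing in for the database are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Plain-Java illustration (hypothetical names): skip a failing item, log it, keep going. */
final class SkipOnErrorSketch {

    /** Outcome of a run: successfully processed items plus one log line per skipped item. */
    static final class Result<O> {
        final List<O> processed = new ArrayList<>();
        final List<String> errorLog = new ArrayList<>(); // stands in for the database log
    }

    /** Applies the processor to every item; an exception skips that item only. */
    static <I, O> Result<O> processAll(List<I> items, Function<I, O> processor) {
        Result<O> result = new Result<>();
        for (I item : items) {
            try {
                result.processed.add(processor.apply(item));
            } catch (RuntimeException e) {
                // Record the failure for later correction instead of stopping the job.
                result.errorLog.add(item + ": " + e.getMessage());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Function<String, Integer> parse = Integer::parseInt;
        Result<Integer> result = processAll(Arrays.asList("1", "x", "3"), parse);
        System.out.println("processed: " + result.processed);
        System.out.println("skipped:   " + result.errorLog);
    }
}
```

The point of the design is that a failure is a property of one item, not of the whole run: the loop carries on and the log tells the user what to fix afterwards.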

In JEE 7, the otherwise empty beans.xml file has a required setting called "bean-discovery-mode". If set to "all", all classes are considered for CDI. If set to "annotated", only classes with a bean-defining annotation, such as a scope annotation, are considered. I've set it to "annotated". At first I put only the @Named annotation on the classes I needed, but they weren't injected into the Batch jobs, because @Named on its own is not a bean-defining annotation. An additional scope annotation (@ApplicationScoped) is required before the classes are picked up.

Below is a sequence diagram of a job.

Job Specification Language

JSL, or Job Specification Language, is the XML format used to define jobs. Its grammar is defined in [5].

In the picture on the left, you can see where in my project structure in NetBeans the Job definition files are located. In this location, they are automatically picked up by the Application Server.

For YPPO you can find the files AddPhotographs.xml and VerifyPhotographs.xml as visible below on my git repository YourPersonalPhotographOrganiser.

My jobs currently cannot be restarted: they are configured with restartable="false", as you can see. I'll devote a future blogpost to it once I figure out how to do that.

In my example, the chunk size of the Batch job is 4 (item-count="4").

Listeners should be defined in the step before other things, such as a chunk.
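To make the chunk size concrete: in a chunk-style step, items are read and processed one at a time, and the writer receives them in groups of item-count. Below is a minimal plain-Java sketch of that loop. It does not use the batch API; the name ChunkSketch and the use of Iterator, Function and Consumer as stand-ins for reader, processor and writer are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Plain-Java illustration (hypothetical names) of a read/process/write loop with a chunk size. */
final class ChunkSketch {

    /** Runs the loop and returns the number of chunks handed to the writer. */
    static <I, O> int runChunkStep(Iterator<I> reader, Function<I, O> processor,
                                   Consumer<List<O>> writer, int itemCount) {
        if (itemCount < 1) {
            throw new IllegalArgumentException("itemCount must be at least 1");
        }
        int chunksWritten = 0;
        List<O> buffer = new ArrayList<>();
        while (reader.hasNext()) {
            // Read and process one item at a time.
            buffer.add(processor.apply(reader.next()));
            if (buffer.size() == itemCount) {
                // A full chunk: hand a copy to the writer and start a new one.
                writer.accept(new ArrayList<>(buffer));
                buffer.clear();
                chunksWritten++;
            }
        }
        if (!buffer.isEmpty()) {
            // The last chunk may hold fewer than itemCount items.
            writer.accept(new ArrayList<>(buffer));
            chunksWritten++;
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        List<Integer> chunkSizes = new ArrayList<>();
        Function<Integer, Integer> processor = photoId -> photoId; // no-op processing
        Consumer<List<Integer>> writer = chunk -> chunkSizes.add(chunk.size());
        int chunks = runChunkStep(
                Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).iterator(), processor, writer, 4);
        System.out.println(chunks + " chunks with sizes " + chunkSizes);
    }
}
```

With ten items and a chunk size of 4, the writer is called three times, with 4, 4 and 2 items.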

Starting a Batch Job

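import java.util.Properties;
import javax.batch.operations.JobOperator;
import javax.batch.runtime.BatchRuntime;

public void verifyPhotographs(Location location)
{
    JobOperator operator = BatchRuntime.getJobOperator();
    // Job parameters are passed to the job as strings.
    Properties jobParameters = new Properties();
    jobParameters.setProperty("location", String.valueOf(location.getId()));
    // The job name maps to the JSL file VerifyPhotographs.xml.
    operator.start("VerifyPhotographs", jobParameters);
}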

Retrieving Details About Batch Jobs

Notes

It seems NetBeans has no wizards, GUIs or widgets to generate Batch Job scaffolding automatically, but it is on the roadmap.

There are lots of tutorials on Batch Jobs in JEE 7. However, a lot of them were written in the first half of 2013, and in the meantime the definition of Batch Artifacts has changed from annotations to a simple set of interfaces your Batch Artifacts need to implement. [1] [3]

[4] gives a good general view of Batch Applications in Java EE 7. The JEE 7 Tutorial [8] is also very good.

References

[1] Batch Applications in Java EE
https://blogs.oracle.com/arungupta/entry/batch_applications_in_java_ee
[2] JSR 352 - Java Batch
https://java.net/projects/jbatch
[3] Batch Applications for Java Processing - SlideShare
http://www.slideshare.net/arungupta1/jbatch-21153200
[4] Java EE 7 Introduction to Batch JSR 352
http://jaxenter.com/java-ee-7-introduction-to-batch-jsr-352-47400.html
[5] Batch Applications for the Java Platform - Version 1.0 Final Release
JSR-352-1.0-Final-Release.pdf
[6] JavaDoc - JEE 7
http://docs.oracle.com/javaee/7/api/
[7] Spring Batch in Action
Arnaud Cogluègnes, Thierry Templier, Gary Gregory, Olivier Bazoud
[8] JEE 7 Tutorial - Batch Processing
http://docs.oracle.com/javaee/7/tutorial/doc/batch-processing.htm
[9] GitHub - YourPersonalPhotographOrganiser (YPPO)
https://github.com/maartenl/YourPersonalPhotographOrganiser