The Blog
Execution order of servlet filters
published January 13, 2010
The Java EE Servlet 2.5 deployment descriptor schema defines the servlet filter execution order like this:
Declaration of the filter mappings in this web application is done by using filter-mappingType. The container uses the filter-mapping declarations to decide which filters to apply to a request, and in what order. The container matches the request URI to a Servlet in the normal way. To determine which filters to apply it matches filter-mapping declarations either on servlet-name, or on url-pattern for each filter-mapping element, depending on which style is used. The order in which filters are invoked is the order in which filter-mapping declarations that match a request URI for a servlet appear in the list of filter-mapping elements. The filter-name value must be the value of the filter-name sub-elements of one of the filter declarations in the deployment descriptor.
I never remember whether it is the order of the “filter” or the “filter-mapping” elements that counts. It is the order of the “filter-mapping” elements, and now you and I will find the answer here.
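The rule quoted above can be sketched in plain Java. This is only a simplified model, not container code: the class `FilterOrderSketch`, the filter names, and the matching logic (exact paths and patterns ending in `/*` only, no servlet-name mappings) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterOrderSketch {

    /** One filter-mapping element: a filter name plus a url-pattern (hypothetical model). */
    static final class Mapping {
        final String filterName;
        final String urlPattern;

        Mapping(String filterName, String urlPattern) {
            this.filterName = filterName;
            this.urlPattern = urlPattern;
        }
    }

    /** Simplified matching: an exact path, or a prefix pattern ending in "/*". */
    static boolean matches(String urlPattern, String path) {
        if (urlPattern.endsWith("/*")) {
            String prefix = urlPattern.substring(0, urlPattern.length() - 2);
            return path.equals(prefix) || path.startsWith(prefix + "/");
        }
        return urlPattern.equals(path);
    }

    /** Returns the matching filter names in filter-mapping declaration order. */
    static List<String> filtersFor(List<Mapping> mappings, String path) {
        List<String> result = new ArrayList<String>();
        for (Mapping mapping : mappings) {
            if (matches(mapping.urlPattern, path)) {
                result.add(mapping.filterName);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The list order stands in for the order of filter-mapping elements in web.xml.
        List<Mapping> mappings = new ArrayList<Mapping>();
        mappings.add(new Mapping("loggingFilter", "/*"));
        mappings.add(new Mapping("adminFilter", "/admin/*"));
        mappings.add(new Mapping("encodingFilter", "/*"));

        System.out.println(filtersFor(mappings, "/admin/users")); // [loggingFilter, adminFilter, encodingFilter]
        System.out.println(filtersFor(mappings, "/index.html"));  // [loggingFilter, encodingFilter]
    }
}
```

Reordering the `mappings.add(...)` calls changes the invocation order, whereas the order in which the filters themselves are declared plays no role in this model.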
The pain with Liferay themes
published December 23, 2009
During this year, I have seen the development of five Liferay themes and I do not like what I see. This stuff takes way more time than it should. Usually in our projects, an HTML layout is created by a UI designer working “outside of a project”, and that layout is afterwards converted into a Liferay theme by a developer or UI specialist working “inside our project”. Problems arise immediately if the HTML layout is designed the wrong way.
The most basic error is to completely ignore how a portal framework creates portlets and the HTML it produces. You should know how HTML divs and other Liferay elements are created. Ignoring this will create a nasty mismatch when the layout is converted into a Liferay theme.
If you manage to get a good HTML layout, theme conversion is much easier but still takes a lot of time. It is common to see 90% of the theme ready quite quickly and then spend another 90% of the time fixing the remaining 10%. Common problems include:
- Cross-browser functionality: getting all browsers to render the theme perfectly.
- Fixing broken Liferay controls, e.g. portlet controls and drag and drop.
- Fixing broken themes during a Liferay upgrade. This is a major one.
You will need a specialist who can handle these things and who knows what she is doing. Without her, it will cost you a lot. I would like to see solutions for creating a theme easily, without the hassle.
Character encoding, character set, and more
published December 21, 2009
These things come up in every project I work on, and I have to admit that I have never fully understood every detail of character sets and encodings. So now I finally had to find out. In this article I will answer at least the following questions:
- What is a character encoding?
- What is a character set?
- What is the difference between a character set and a character encoding?
- What is Unicode and how does it relate to UTF-8, UTF-16, and UTF-32?
- What is the difference between the HTTP Content-Type header and the HTML content-type meta tag, and why do we need both?
- What about MySQL connection string parameters, database character set and collation?
Let’s get started!
The basics
We all use characters to form words. In the magic world of computers we need a system that converts characters into bits. Sure, it would be nice to have only one set of rules that could handle each and every case, but as we know, it is not that simple. While there are older encodings, we will start with the ASCII standard, which was introduced back in the 1960s. It defines 128 code-character pairs, which together form a coded character set that uses 7 bits to represent each character on a medium. Each of these codes is called a code point. So, the ASCII standard defines a character repertoire of 128 characters, in which the first 32 code points are reserved for control characters.
What are character set and character encoding?
A character set defines the characters available in the set and their code points, whereas a character encoding defines how code points are represented on a medium. In practice the two terms are used interchangeably. Historically they were synonyms because the same standard used to define both the available characters and the actual encoding rules. Now things have changed, and one character set can be encoded with several different encodings.
ASCII, ISO-8859-1, and Unicode all use the value 65 for the character “A”. However, ASCII uses 7 bits to encode a character, whereas ISO-8859-1 uses 8 bits.
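A small Java sketch can make the split between code point and encoding visible. The class name `CodePointVersusEncoding` and the helper `encode` are made up for this illustration; the charsets come from `java.nio.charset.StandardCharsets`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CodePointVersusEncoding {

    /** Encodes the text with the given charset and returns the raw bytes. */
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String a = "A";
        // The code point is a property of the character set, not of the encoding.
        System.out.println(a.codePointAt(0)); // 65

        // Different encodings of the same code point.
        System.out.println(Arrays.toString(encode(a, StandardCharsets.US_ASCII)));   // [65]
        System.out.println(Arrays.toString(encode(a, StandardCharsets.ISO_8859_1))); // [65]
        System.out.println(Arrays.toString(encode(a, StandardCharsets.UTF_8)));      // [65]
        System.out.println(Arrays.toString(encode(a, StandardCharsets.UTF_16BE)));   // [0, 65]

        // U+00E9 (e with acute accent, code point 233) is outside the ASCII repertoire.
        // Java bytes are signed, so values above 127 print as negative numbers.
        String e = "\u00e9";
        System.out.println(Arrays.toString(encode(e, StandardCharsets.ISO_8859_1))); // [-23]
        System.out.println(Arrays.toString(encode(e, StandardCharsets.UTF_8)));      // [-61, -87]
        System.out.println(Arrays.toString(encode(e, StandardCharsets.US_ASCII)));   // [63], i.e. "?"
    }
}
```

The last line shows what happens when a character is not in the target repertoire: `String.getBytes` silently substitutes the charset's replacement byte, which for US-ASCII is a question mark.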
Unicode, UTF-8, UTF-16 and UTF-32
Unicode is a character set. It defines 17 planes, each of which can contain up to 65 536 code points. This leaves room for 17 × 65 536 = 1 114 112 code points. Related characters are collected on the same plane, and most of the characters in the languages spoken today are in the first one, called the Basic Multilingual Plane. Each character in the Unicode character set can be represented using the UTF-8, UTF-16, or UTF-32 character encoding.
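The plane arithmetic can be checked with a few lines of Java. The class `UnicodePlanes` and the method `planeOf` are hypothetical names; `Character.MAX_CODE_POINT` and `Character.isValidCodePoint` are part of the standard library.

```java
public class UnicodePlanes {

    static final int PLANE_COUNT = 17;
    static final int CODE_POINTS_PER_PLANE = 65536;

    /** Returns the plane (0-16) a code point belongs to: the bits above the lowest 16. */
    static int planeOf(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a Unicode code point: " + codePoint);
        }
        return codePoint >>> 16;
    }

    public static void main(String[] args) {
        // 17 planes of 65 536 code points each.
        System.out.println(PLANE_COUNT * CODE_POINTS_PER_PLANE); // 1114112
        // Code points are numbered from 0, so the highest one is one less than the total.
        System.out.println(Character.MAX_CODE_POINT + 1);        // 1114112

        System.out.println(planeOf('A'));      // 0, Basic Multilingual Plane
        System.out.println(planeOf(0x20AC));   // 0, euro sign
        System.out.println(planeOf(0x1F600));  // 1, an emoji outside the first plane
        System.out.println(planeOf(0x10FFFF)); // 16, the last code point
    }
}
```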
UTF-8 encodes each character using 1-4 bytes. The length of the byte sequence depends on the character being encoded. It is backwards compatible with ASCII, which often makes it the encoding of choice today.
UTF-16 is also a variable-length encoding. It uses either 2 or 4 bytes to represent a character.
UTF-32 is the most straightforward encoding scheme: it always uses 4 bytes to represent a character.
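The byte counts above can be verified with a short Java sketch. The class and method names are invented for this example. UTF-8 and UTF-16 lengths are measured by actually encoding the text (big-endian UTF-16, so that no byte order mark is added); the UTF-32 length is computed as four bytes per code point instead of relying on a UTF-32 charset being installed.

```java
import java.nio.charset.StandardCharsets;

public class UtfLengths {

    /** Number of bytes the text occupies in UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes the text occupies in UTF-16 (big-endian, no byte order mark). */
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    /** Number of bytes the text would occupy in UTF-32: always 4 per code point. */
    static int utf32Length(String text) {
        return 4 * text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, ASCII range
            "\u00e4",       // U+00E4, a with diaeresis
            "\u20ac",       // U+20AC, euro sign
            "\uD83D\uDE00"  // U+1F600, outside the Basic Multilingual Plane
        };
        for (String sample : samples) {
            System.out.println("U+" + Integer.toHexString(sample.codePointAt(0)).toUpperCase()
                    + " UTF-8=" + utf8Length(sample)
                    + " UTF-16=" + utf16Length(sample)
                    + " UTF-32=" + utf32Length(sample));
        }
        // U+41 UTF-8=1 UTF-16=2 UTF-32=4
        // U+E4 UTF-8=2 UTF-16=2 UTF-32=4
        // U+20AC UTF-8=3 UTF-16=2 UTF-32=4
        // U+1F600 UTF-8=4 UTF-16=4 UTF-32=4
    }
}
```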
What is the difference between the HTTP Content-Type header and the HTML content-type meta tag, and why do we need both?
When HTML content is transferred over a network, both the server and the client need to know how to encode and decode the stream. In the initial HTTP request, the client can send an “Accept-Charset” header, which tells the server how it wants the result encoded. In the response, the server should include a “Content-Type” HTTP header as well as a content-type meta tag inside the HTML, and both should declare the same encoding.
There is a nice article you can read. Basically, there are precedence rules that tell the client which declaration to use. In the case of an XHTML response, the precedence is
1) HTTP header
2) Meta tag
When the HTTP header is present, the browser will use it. But when the content is afterwards saved to disk, either by a user or by a proxy, the Content-Type HTTP response header is lost. In that case, the character encoding can be read from the meta tag in the saved file. That is why you need both.
In the Java world, the character encoding of the HTTP response can be set using ServletResponse.setCharacterEncoding() if the default ISO-8859-1 is not suitable. If you use Apache, you can use the AddCharset directive.
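To see why the sender and the receiver have to agree on the encoding, here is a minimal Java sketch that needs no servlet container. It only simulates the situation: the byte array stands in for a response body, and the class `WrongCharsetDemo` and the method `decode` are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {

    /** Decodes raw bytes with the charset the receiver believes was used. */
    static String decode(byte[] bytes, Charset assumedCharset) {
        return new String(bytes, assumedCharset);
    }

    public static void main(String[] args) {
        // The "server" encodes U+00E4 (a with diaeresis) as UTF-8: two bytes, 0xC3 0xA4.
        byte[] body = "\u00e4".getBytes(StandardCharsets.UTF_8);

        // A client that was told "UTF-8" gets the original character back.
        String correct = decode(body, StandardCharsets.UTF_8);
        System.out.println(correct.equals("\u00e4")); // true

        // A client that falls back to ISO-8859-1 reads each byte as its own character.
        String garbled = decode(body, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length());                // 2
        System.out.println(garbled.equals("\u00c3\u00a4"));  // true: the familiar garbled pair
    }
}
```

No bytes were changed on the way; only the assumption about the encoding differs, which is exactly what the header and the meta tag are there to prevent.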
How do the MySQL database character set and collation fit into this picture?
All text in a database is encoded in some format. MySQL encodes characters into the database using the character set defined for it. A collation is a set of rules for comparing characters within a character set. The collation tells the sorting engine whether the character “A” comes before or after the character “B”.
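The idea of a collation can be illustrated on the Java side as well. Note that this is only an analogy: MySQL applies its own collations inside the server, and `java.text.Collator` is used here merely to show how comparison rules change a sort order. The class and method names are invented for the example.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollationSketch {

    /** Sorts by raw code values, much like a binary comparison would. */
    static String[] sortBinary(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    /** Sorts using the language-aware comparison rules of the given locale. */
    static String[] sortWithCollator(String[] words, Locale locale) {
        String[] copy = words.clone();
        Collator collator = Collator.getInstance(locale);
        Arrays.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"b", "A", "a", "B"};

        // Code value order: all upper-case letters (65, 66) precede lower-case ones (97, 98).
        System.out.println(Arrays.toString(sortBinary(words))); // [A, B, a, b]

        // Collator order: both forms of "a" come before both forms of "b".
        System.out.println(Arrays.toString(sortWithCollator(words, Locale.ENGLISH)));
    }
}
```

The same data sorts differently depending on the comparison rules, which is why the choice of collation affects ORDER BY results and string comparisons in the database.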
MySQL connection string encodings
You can define many properties for a MySQL JDBC connection. For example, you can use the “characterEncoding” property to tell the JDBC driver to encode queries using the given character encoding. You can change the encoding of result sets with the “characterSetResults” property.
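As a rough sketch, the two properties can be appended to the connection URL as query parameters. The host, port, database name, and the helper `buildUrl` below are placeholders invented for illustration, and the example only builds the string; it does not open a connection, since that would require the MySQL driver on the classpath.

```java
public class MySqlUrlSketch {

    /** Builds a MySQL JDBC URL that sets both encoding-related properties. */
    static String buildUrl(String host, int port, String database, String encoding) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?characterEncoding=" + encoding
                + "&characterSetResults=" + encoding;
    }

    public static void main(String[] args) {
        // Hypothetical connection details.
        String url = buildUrl("localhost", 3306, "exampledb", "UTF-8");
        System.out.println(url);
        // jdbc:mysql://localhost:3306/exampledb?characterEncoding=UTF-8&characterSetResults=UTF-8

        // With the driver available, the URL would be passed on like this:
        // Connection connection = DriverManager.getConnection(url, user, password);
    }
}
```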
Gimme some pictures!
In the picture below you can see character encoding at work:

Strangling legacy code
published November 30, 2009
How do you refactor a legacy module which is used by many clients? Well, you strangle it. In the picture below we have two clients using some legacy code that we want to get rid of.

You start by creating a new interface and switching part of the usage to that interface.

Gradually, you refactor parts of your legacy code and move more clients over to the new implementation.

At some point, you have managed to move all usage from the legacy interfaces to the new interface, but you still might be using the legacy implementation under the hood.

The final step is to completely remove the legacy code once you have refactored all the features you need.

And actually delete all your legacy code. I do not want to see huge code blocks commented out “just in case we need this in the future”. You don’t. And if you do, you can still find it in your SVN, right?
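The steps above could look roughly like this in Java. Everything in the sketch is hypothetical: the billing domain, the class names, and the methods are invented only to show the shape of the refactoring, with a client that depends on the new interface while the implementation behind it is swapped.

```java
public class StranglerSketch {

    /** The legacy code that clients used to call directly. */
    static class LegacyBilling {
        int calcTotalInCents(int[] pricesInCents) {
            int sum = 0;
            for (int i = 0; i < pricesInCents.length; i++) {
                sum = sum + pricesInCents[i];
            }
            return sum;
        }
    }

    /** The new interface that clients are gradually moved to. */
    interface Billing {
        int total(int[] pricesInCents);
    }

    /** Intermediate step: new interface, legacy implementation under the hood. */
    static class LegacyBackedBilling implements Billing {
        private final LegacyBilling legacy = new LegacyBilling();

        public int total(int[] pricesInCents) {
            return legacy.calcTotalInCents(pricesInCents);
        }
    }

    /** Final step: refactored implementation with no dependency on the legacy class. */
    static class RefactoredBilling implements Billing {
        public int total(int[] pricesInCents) {
            int sum = 0;
            for (int price : pricesInCents) {
                sum += price;
            }
            return sum;
        }
    }

    /** A client that only knows the new interface. */
    static String receipt(Billing billing, int[] pricesInCents) {
        return "Total: " + billing.total(pricesInCents) + " cents";
    }

    public static void main(String[] args) {
        int[] prices = {100, 250};
        // The client code is identical before and after the legacy code is removed.
        System.out.println(receipt(new LegacyBackedBilling(), prices)); // Total: 350 cents
        System.out.println(receipt(new RefactoredBilling(), prices));   // Total: 350 cents
    }
}
```

Once every client goes through `Billing` and `RefactoredBilling` covers the needed features, `LegacyBilling` and `LegacyBackedBilling` can be deleted without touching the clients.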
Keeping your development environment in sync with the production environment
published November 29, 2009
In one of my current projects, we are developing a semantic portal that provides welfare-related information to the public. Portal content is updated daily by content providers while we develop more features in three-week sprints. We have a very nice way of keeping the development environment in sync with the production environment.

1) At the beginning of every sprint, production content is copied into the Maven repository using custom-made scripts.
2) Test environment data is brought up to date with the production environment.
3) The development environment's content data dependency is updated to the latest version in our pom.xml files.
4) Developers update their environments from the SCM and the Maven repository. Development data is now in sync with production data.
5) Developers start to build the new features planned for the current sprint.
6) Once a feature is nearly ready, a new installation is made into the test environment and a developer makes the necessary configurations for it.
7) A tester works with the developer to finish the feature. In our team, tester is a role and can be virtually anyone.
8) Test environment content is copied into the Maven repository using custom-made scripts.
9) The development environment's content data dependency is updated to the latest version in our pom.xml files.
10) Developers update their environments from the SCM and the Maven repository. Development data now contains the just-finished feature.
11) Steps 6 to 10 repeat n times.
12) Once all features are ready (or we run out of development time), testing begins. During this time, only bug fix commits are allowed as we are getting ready for the production release update.
13) The production deployment is made for the new release.
This cycle works well, and this way we are always developing against the latest production data. The key is to make the export and import scripts very easy to use. This approach will help you tremendously if you are planning to release after every sprint.
