Entaxy Docs

Tokenize

Since Camel 2.0

The tokenizer language is a built-in language in camel-core. It is most often used with the Splitter EIP to split a message using a token-based strategy.
The tokenizer language is intended to tokenize text documents using a specified delimiter pattern. It can also tokenize XML documents, with limited capability. For truly XML-aware tokenization, use the XMLTokenizer language instead, as it offers faster, more efficient tokenization specifically for XML documents. For more details see Splitter.
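To make the delimiter-based strategy concrete, here is a minimal sketch in plain Java. It is not Camel code and not Camel's implementation: the class name TokenSplitSketch and the method split are hypothetical, and the parameters only mirror the token, regex and trim options by name. Handling of empty parts in particular is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of token-based splitting; not part of Camel.
final class TokenSplitSketch {

    // Splits text on a token. When regex is false the token is matched
    // literally; when true it is compiled as a regular expression.
    // When trim is true, surrounding whitespace and line breaks are removed
    // from each part.
    static List<String> split(String text, String token, boolean regex, boolean trim) {
        Pattern delimiter = Pattern.compile(regex ? token : Pattern.quote(token));
        List<String> parts = new ArrayList<>();
        for (String part : delimiter.split(text)) {
            parts.add(trim ? part.trim() : part);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Split a three-line document on the new line token.
        System.out.println(split("line1\nline2\nline3", "\n", false, true));
    }
}
```

Quoting the token when regex is false keeps characters such as "." or "|" from being interpreted as pattern syntax, which is the distinction the regex option describes.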

Tokenize Options

The Tokenize language supports 11 options, which are listed below.

| Name | Default | Java Type | Description |
|---|---|---|---|
| token | | String | The (start) token to use as tokenizer, for example you can use the new line token. You can use simple language as the token to support dynamic tokens. |
| endToken | | String | The end token to use as tokenizer if using start/end token pairs. You can use simple language as the token to support dynamic tokens. |
| inheritNamespaceTagName | | String | To inherit namespaces from a root/parent tag name when using XML. You can use simple language as the tag name to support dynamic names. |
| headerName | | String | Name of header to tokenize instead of using the message body. |
| regex | false | Boolean | If the token is a regular expression pattern. The default value is false. |
| xml | false | Boolean | Whether the input is XML messages. This option must be set to true if working with XML payloads. |
| includeTokens | false | Boolean | Whether to include the tokens in the parts when using pairs. The default value is false. |
| group | | String | To group N parts together, for example to split big files into chunks of 1000 lines. You can use simple language as the group to support dynamic group sizes. |
| groupDelimiter | | String | Sets the delimiter to use when grouping. If this has not been set then token will be used as the delimiter. |
| skipFirst | false | Boolean | To skip the very first element. |
| trim | true | Boolean | Whether to trim the value to remove leading and trailing whitespaces and line breaks. |
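The grouping and token-pair options can be easier to follow with a small model of what they describe. The following is a plain-Java sketch under stated assumptions, not Camel's implementation: the class OptionSemanticsSketch and its methods are hypothetical, and the parameters only mirror the group, groupDelimiter, skipFirst, token, endToken and includeTokens options by name. Applying skipFirst before grouping is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the option semantics; not part of Camel.
final class OptionSemanticsSketch {

    // Joins every `size` consecutive parts with `delimiter`, optionally
    // dropping the very first part (for example a header line) beforehand.
    static List<String> group(List<String> parts, int size, String delimiter, boolean skipFirst) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<String> chunks = new ArrayList<>();
        for (int i = skipFirst ? 1 : 0; i < parts.size(); i += size) {
            int end = Math.min(i + size, parts.size());
            chunks.add(String.join(delimiter, parts.subList(i, end)));
        }
        return chunks;
    }

    // Extracts the text between each start/end token pair, keeping the
    // tokens themselves in each part only when includeTokens is true.
    static List<String> pairs(String text, String startToken, String endToken, boolean includeTokens) {
        if (startToken.isEmpty() || endToken.isEmpty()) {
            throw new IllegalArgumentException("tokens must not be empty");
        }
        List<String> parts = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(startToken, from);
            if (start < 0) break;
            int end = text.indexOf(endToken, start + startToken.length());
            if (end < 0) break;
            parts.add(includeTokens
                    ? text.substring(start, end + endToken.length())
                    : text.substring(start + startToken.length(), end));
            from = end + endToken.length();
        }
        return parts;
    }
}
```

In this model, passing the token itself as the delimiter to group corresponds to leaving groupDelimiter unset, since the table states that the token is then used as the delimiter.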