CSV

The settings for CSV files are:

  • Character encoding: specifies the character encoding used in the file. (default='UTF-8')
  • Column separator character: specifies the separator in the file. Must be a single character. Use ‘\t’ for tab. (default=,)
  • Contains header row: specifies whether the first row (after the skipped rows) contains the column names. (default=true)
  • Batch size (internal): number of rows to emit in a single XML fragment. A higher value usually improves performance but also increases memory usage. Usually a batch size of 256 is fine. Make sure that your indexer template is able to cope with multiple rows in a single XML fragment. (default=1)
  • Rows to skip (internal): specifies how many initial lines to skip. (default=0)
  • Trim whitespace from values: specifies whether the whitespace around the strings in the CSV file needs to be stripped. This feature is useful when parsing older databases which pad values with spaces up to a specific width. (default=false)
  • Method to escape strings: specifies how a literal quoting character is escaped. Two modes are supported: DOUBLING, where the quoting character is written twice (for example, "" expresses a literal double quote), and BACKSLASH, where a preceding backslash (\) marks the next character as literal (for example, \" expresses a literal "). (default=BACKSLASH)
  • Allow rows to be spread out over multiple rows: specifies whether quoted text fields may contain line breaks. (default=false)
  • Character to use for quoting a string: specifies the character that encloses text values. Usually a single or a double quote. (default=")
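
To make the interplay of these settings concrete, the following is a minimal sketch in Python using the standard csv module. It is not the product's implementation: the function read_batches and its parameter names are hypothetical stand-ins for the settings above, and the fallback names col1, col2, ... for files without a header row are an assumption made only for this illustration.

```python
import csv
import io
from itertools import islice


def read_batches(text, separator=",", quote='"', escape_mode="BACKSLASH",
                 has_header=True, rows_to_skip=0, trim=False, batch_size=1):
    """Yield lists of row dicts, with at most batch_size rows per list.

    Hypothetical helper: the parameters mirror the settings described
    above, not an actual API.
    """
    lines = io.StringIO(text, newline="")

    # Rows to skip: drop the initial lines before any CSV parsing happens.
    for _ in range(rows_to_skip):
        next(lines, None)

    if escape_mode == "DOUBLING":
        # A literal quote is written as two quote characters.
        reader = csv.reader(lines, delimiter=separator, quotechar=quote,
                            doublequote=True)
    else:
        # BACKSLASH: a preceding backslash makes the next character literal.
        reader = csv.reader(lines, delimiter=separator, quotechar=quote,
                            doublequote=False, escapechar="\\")

    # Contains header row: the first row after the skipped lines.
    header = next(reader, None) if has_header else None

    def to_record(row):
        if trim:
            row = [value.strip() for value in row]
        # Assumed fallback names when the file has no header row.
        names = header if header is not None else [
            f"col{i + 1}" for i in range(len(row))]
        return dict(zip(names, row))

    records = (to_record(row) for row in reader)

    # Batch size: group the records so each emitted fragment holds several rows.
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch


sample = 'exported 2024\nname;quote\n Ann ;"say \\"hi\\""\nBob;plain\nCy;x\n'
for batch in read_batches(sample, separator=";", rows_to_skip=1,
                          trim=True, batch_size=2):
    print(batch)
```

With a batch size of 2, the three data rows of the sample arrive as one batch of two rows followed by one batch of a single row, which is the situation an indexer template has to handle when the batch size is greater than 1.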