Ran into an interesting bug a few days ago that I’ve been meaning to document for anyone who has written their own truncating function in Rails. If you have, I’d guess it probably looked something like this:
def truncate(str, length)
  return '' if str.blank?
  truncated = str.size > length
  # Keep length - 3 characters so the result, including "...", is at most length long.
  truncated ? str[0, length - 3] + "..." : str
end
Fine enough. Unless you happen to be truncating user-given strings.
If you let your users enter the text that gets truncated, chances are that some of them are entering Unicode characters for accents, quotation marks, etc. In UTF-8, every non-ASCII character takes 2-4 bytes, and on a Ruby where String#size and String#[] count bytes rather than characters (as Ruby 1.8 does), the truncate above can cut a character in half and cause general headaches whenever the text contains multibyte characters.
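To make the failure concrete, here is a minimal sketch in plain Ruby. It assumes Ruby 1.9.3 or later, where String#byteslice exists, and uses it to imitate byte-based slicing. The helper name byte_truncate and the sample string are made up for illustration.

```ruby
# Hypothetical helper: cuts a string at a byte offset, imitating
# byte-based slicing (the behavior of String#[] on Ruby 1.8).
def byte_truncate(str, byte_length)
  str.byteslice(0, byte_length)
end

# "\u00E9" is an accented e, which takes 2 bytes in UTF-8.
sample = "caf\u00E9 au lait"

puts sample.size      # 12 characters on Ruby 1.9+
puts sample.bytesize  # 13 bytes

# Cutting after 4 bytes keeps "caf" plus only the first byte of the accented e.
cut = byte_truncate(sample, 4)
puts cut.valid_encoding?           # false: the last character is split in half

# Cutting after 4 characters keeps the accented e whole.
puts sample[0, 4].valid_encoding?  # true on Ruby 1.9+
```

The string for which valid_encoding? returns false is the kind of output that a serializer may later refuse to handle.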
Split-in-half characters are bad news. They will cause errors like “illegal JSON output,” which is how I originally spotted this as a problem with our truncate method.
The solution is to take a page from Rails’ own truncate and use #mb_chars, which makes the slicing operate on characters instead of bytes. So a hand-written truncate that works for Unicode would be:
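def truncate(str, length)
  return '' if str.blank?
  # mb_chars wraps the string so that size and slicing count characters, not bytes.
  chars = str.mb_chars
  return str if chars.size <= length
  # Keep length - 3 characters so the result, including "...", is at most length long.
  (chars[0, length - 3] + "...").to_s
end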
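For comparison, here is a rough sketch of the same idea without ActiveSupport. It assumes Ruby 1.9 or later and a UTF-8 encoded string, where String#size and String#[] already count characters. The name truncate_chars and the omission parameter are hypothetical, not part of Rails.

```ruby
# Hypothetical plain-Ruby variant; assumes Ruby 1.9+ and UTF-8 input.
def truncate_chars(str, length, omission = "...")
  # Stand-in for Rails' blank? check: nil, empty, or whitespace-only.
  return '' if str.nil? || str.strip.empty?
  return str if str.size <= length

  # Reserve room for the omission marker; never go below zero.
  keep = [length - omission.size, 0].max
  str[0, keep] + omission
end

# 17 characters, three of them multibyte in UTF-8.
text = "na\u00EFve caf\u00E9 r\u00E9sum\u00E9"

short = truncate_chars(text, 10)
puts short.size             # 10
puts short.valid_encoding?  # true: no character was split
```

Counting in characters means the result never ends in a partial byte sequence, which is the same guarantee the mb_chars version provides.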