I'm looking for a part-time remote job.

Hire me


I'm the author of:

Mastering Redmine is a comprehensive guide with tips, tricks and best practices, and an easy-to-learn structure.

Check the book's project or

Buy the book

Social pages of the book:

By buying this book you also donate to Redmine (see this page).


Follow me:

4-byte UTF-8 problem in Redmine and how to fix it

I’ll start with explaining the cause of the problem. If you don’t need this, you can skip the padded paragraphs.

You have probably already heard about issues with weird symbols in iPhones, that led to freeze or reboot? Something like these can also be experienced on Redmine instances, that were installed some time ago. iPhones crash because they fail to render the weird symbols and such Redmine instances give errors because they fail to save corresponding symbols to the database. Yes, the problem is actually in the underlying MySQL database, not in Redmine. In iPhones this happens with some unusual 4-byte unicode symbols and in Redmine/MySQL this can happen with any 4-byte symbol. Thus, you can experience this issue in Redmine with emoji, some Chinese, Japanese and Korean characters.

Some time ago 4-byte unicode symbols were not used so widely as nowadays. Probably for this reason at those days MySQL developers implemented a limited support of UTF-8 in their database. This limited implementation, which is called internally as just utf8, supported only 3-byte UTF-8 characters. That was done on purpose as less space is needed to store an array of 3-byte UTF-8 characters than 4-bytes ones (767 bytes can store 255 3-byte characters and 191 4-bytes ones). As a result, if an application attempts to save a 4-byte UTF-8 character into a table with utf8 character set, the SQL query gives the error (an example):

ActiveRecord::StatementInvalid (Mysql2::Error: Incorrect string value: '\xF0\x9F\x98\x81' for column 'notes' ...

You can check, if your Redmine is affected by this issue, by adding the following text in, e.g., an issue note (actually, in any textual content, which is saved to the database):

My Redmine database is already in utf8mb4. 😉

The full support of UTF-8 (which includes support for 4-byte characters) was added to MySQL later, in version 5.5.3. As the name utf8 was already in use, the new MySQL character set got the name utf8mb4 (utf8 is also known as utf8mb3). So, if you experience the aforementioned UTF-8 problem, you need to migrate your Redmine database to this new character set. (If you installed Redmine recently, it’s possible, that you already use utf8mb4, as it’s now recommended in official Redmine installation instructions. You know how to verify this. However, you can still experience the column size issue – see below.)

Generally, UTF-8 is a character encoding with variable character length, what means, that, technically, it’s possible to store a 255-character UTF-8 string with some 4-byte characters into 767 bytes (if other characters are two-bytes, for example). However, when indexing such strings, MySQL uses arrays of characters, each item of which is of the maximum size, i.e., 4 bytes. And, 767 bytes is the size of index key, which is used by MySQL for COMPACT and REDUNDANT row formats. Therefore, MySQL can’t store 255 UTF-8 characters in such indexes. In other words, if some tables contain indexed columns of the type VARCHAR(255) (VARCHAR(192) and bigger, actually), you won’t be able to convert them to utf8mb4. If you’ll try to do this, you will get the following error:

Index column size too large. The maximum column size is 767 bytes ...

As it has been mentioned, this happens due to the limit of 767 bytes of the MySQL’s COMPACT row format (which is default before MySQL 5.7). Luckily, MySQL supports other row formats, such as DYNAMIC and COMPRESSED, which have much larger limit – 3072 bytes. But, these row formats are incompatible with the older MySQL table file format, which is called Antelope and which you are probably using (if you experience the UTF-8 problem and/or the column size issue). So, the solution is to also migrate your database tables to the new Barracuda file format and to (for example) the DYNAMIC row format (these file and row formats are default in MySQL since version 5.7).

There is also another option: to change sizes of the corresponding failing columns to VARCHAR(191). In pure Redmine such columns are revision and scmid in the changesets table and version in the schema_migrations table. For these columns 191 bytes are more than enough, so it’s safe to. But, if you choose this solution, you will also have to do this for all similar columns (if any) in tables, that have been added by plugins. Moreover, to be able to install new Redmine plugins, that come with similar columns, you will first need to modify their tables creation code (so that they used 191 bytes limit for indexed string columns)! That’s why I believe, migrating to Barracuda/DYNAMIC is a better option.

From this moment we’ll modify the database (and the related configuration parameters), therefore you should stop Redmine before proceeding. How to do this depends on how did you install it and what operating system are you using, so unfortunately I can’t say how exactly (you have to look for this in corresponding tutorials). What I can say is that Redmine will need to remain stopped for up to a hour, or even more (depends on the size of the database).

We’ll start with adding support for the new Barracuda format in MySQL server. So, locate the my.cfg file and add the following options there (into the [mysqld] section):

innodb_file_format    = Barracuda
innodb_large_prefix   = 1
innodb_file_per_table = 1

When ready, restart MySQL (again, how to do this depends on how MySQL server was installed, so I can’t be more exact).

The above options just enable the Barracuda file format and large indexed columns. To convert existing tables to the new format and character set you need to run the following SQL query for each of them:

ALTER TABLE <table> ROW_FORMAT=DYNAMIC, CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

You can also write a script, that will extract the list of all tables (e.g., using SHOW TABLES) and then will run this SQL for them. I chose not to do so to see, if any errors happen (that’s how I know, that migration of changesets and schema_migrations fails without first switching to the DYNAMIC row format). So, if you write such script, make sure, that it handles and reports errors.

In MySQL character sets are also specified for the whole database. In this way, the default character set, which will be used for all new tables in this database, is configured. So, this default character set should be changed as well (otherwise new tables, created on Redmine upgrade or by newly installed Redmine plugins will use the limited 3-byte UTF-8 implementation). To do this execute the following SQL query:

ALTER DATABASE <database> CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

Finally, you need to tell Redmine’s MySQL client (ok, the Rails’ one, actually) to use utf8mb4, when connecting to the database. This should to be done in Redmine’s config/database.yml file – just replace utf8 with utf8mb4 there. As a result you should get something like:

production:
  adapter: mysql2
  database: <database>
  host: localhost
  ...
  encoding: utf8mb4

Now, you can start Redmine (you should already know, how to do this).

If you are using a MySQL server version prior 5.7, you should also patch Redmine to make it use the DYNAMIC row format for all newly created tables. Without this Redmine will continue to use the COMPACT format (remember, the default row format was changed only in 5.7). And, this means, that, when you’ll attempt to install a plugin with indexed string columns, you may still get the aforementioned column size error. So, to fix this create the file create_table_patch.rb in Redmine’s config/initializers with the following content:

1
2
3
4
5
6
7
8
9
10
11
12
13
if ActiveRecord::Base.connection.select_value('SELECT VERSION()').to_f < 5.7
  module ActiveRecord
    module ConnectionAdapters
      class MysqlAdapter < AbstractMysqlAdapter
        def create_table_with_dynamic_row_format(table_name, options = {})
          create_table_without_dynamic_row_format(table_name, options.merge(:options => 'ROW_FORMAT=DYNAMIC'), &block)
        end
 
        alias_method_chain :create_table, :dynamic_row_format
      end
    end
  end
end

This is it!

 

Comments

Also available in: Atom

Add a comment