Today I decided to take the initiative to add UTF-8 support to our Item Import and Update tools. Easier said than done. I have had my share of Unicode issues with PHP, as I am sure everyone has. This is the first one that I have not been able to conquer.
Our tools use an uploaded CSV file to both import new items and update existing ones in the WorkXpress application. The file can be either comma- or tab-delimited. The Unicode problem arises when the uploaded file contains any UTF-8 characters. We use fopen to open the file and fgetcsv to parse it. However, fgetcsv does not read the UTF-8 characters correctly. After an hour of experimenting, I could not get any function to read the UTF-8 characters properly, not even file_get_contents.
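For reference, the parsing step boils down to something like the following. This is a minimal sketch rather than our actual tool code; the function name read_items_file and the way the delimiter is passed in are assumptions made for illustration.

```php
<?php
// Hypothetical helper: reads a comma- or tab-delimited file into an array of rows.
function read_items_file(string $path, string $delimiter = ','): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Unable to open $path");
    }

    $rows = [];
    // Length 0 means no line-length limit; enclosure and escape are passed explicitly.
    while (($row = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        // fgetcsv returns an array holding a single null for a blank line; skip those.
        if ($row === [null]) {
            continue;
        }
        $rows[] = $row;
    }
    fclose($handle);

    return $rows;
}
```

A tab-delimited upload would be read with read_items_file($path, "\t"), a comma-delimited one with the default delimiter.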
For my test, I used a three-line file I called utf_import.csv. The file looked similar to the following:
However, I received the following results:
The Internet was of little help. I found several posts suggesting setlocale(LC_ALL, 'en_US.UTF-8'); which would make sense based on this note in the fgetcsv documentation:
Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.
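In code, the suggestion amounts to roughly the following. This is a sketch of the idea, not the exact code from any of those posts, and the helper name read_csv_with_locale is made up for illustration. One detail worth checking is the return value of setlocale, since it returns false when the requested locale is not installed on the server.

```php
<?php
// Hypothetical helper: switch to a UTF-8 locale, then parse the file with fgetcsv.
function read_csv_with_locale(string $path, string $locale = 'en_US.UTF-8'): array
{
    // setlocale() returns the new locale string, or false if the locale is unavailable.
    $applied = setlocale(LC_ALL, $locale);

    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Unable to open $path");
    }

    $rows = [];
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($handle);

    // Return the setlocale() result too, so the caller can see whether the locale took effect.
    return ['locale' => $applied, 'rows' => $rows];
}
```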
Unfortunately, even this does not work. Some time later, I came across PHP Bug #38471: fgetcsv(): locale dependency of delimiter / enclosure arg. The response to this bug:
We’re working Unicode support in PHP6. but it won’t appear in previous versions.