
Why I Can't Wait For PHP 6: Part 1

10 Jun 2008 . category: blog
#charset #PHP #UTF-8

Today I decided to take the initiative to add UTF-8 support to our Item Import and Update tools. Easier said than done. I have had my share of Unicode issues with PHP, as I am sure everyone has. This is the first one that I have not been able to conquer.

Our tools use an uploaded CSV file to both import new items and update existing ones in the WorkXpress application. The file can be either comma- or tab-delimited. The Unicode problem arises when the uploaded file contains non-ASCII characters encoded as UTF-8. We use fopen to open the file and fgetcsv to parse it. However, fgetcsv does not support UTF-8 characters. After an hour of experimenting, I could not get any function to read the UTF-8 characters properly, not even file_get_contents.
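For context, the fopen and fgetcsv combination described above looks roughly like the following. This is a minimal sketch, not our actual import code: the function name read_rows and its parameters are made up for illustration, and the escape argument to fgetcsv is only accepted on PHP 5.3 and later.

```php
<?php
// Hypothetical helper: read every row of a delimited file into an array.
// Pass "\t" as $delimiter for tab-delimited uploads.
function read_rows($path, $delimiter = ',')
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Unable to open $path");
    }

    $rows = [];
    // A length of 0 lets fgetcsv read lines of any size.
    while (($row = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        // fgetcsv returns an array holding a single null for a blank line.
        if ($row === [null]) {
            continue;
        }
        $rows[] = $row;
    }

    fclose($handle);
    return $rows;
}
```

Calling read_rows('utf_import.csv') would return one array per line of the file, which can then be passed to var_dump for inspection.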

For my test, I used a three-line file I called utf_import.csv. The file looked similar to the following:

"cafe Good","bold1"
"café Not So Good","bold2"
"cafae Okay I Guess","bold3"

However, I received the following results:

array
  0 => string 'cafe Good'
  1 => string 'bold1'

array
  0 => string 'caf� Not So Good'
  1 => string 'bold2'

array
  0 => string 'cafae Okay I Guess'
  1 => string 'bold3'

The Internet offered little help. I found several posts suggesting the use of setlocale(LC_ALL, 'en_US.UTF-8'); which would make sense based on this note in the fgetcsv documentation:

Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.
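The suggested workaround amounts to switching the locale before parsing. Here is a rough sketch of that attempt, assuming the en_US.UTF-8 locale is installed on the server. The function name read_rows_with_locale and the shape of its return value are made up for illustration.

```php
<?php
// Hypothetical helper: try to switch the locale, then parse the file.
function read_rows_with_locale($path, $locale = 'en_US.UTF-8')
{
    // setlocale() returns the new locale string on success, or false when
    // the requested locale is not installed (the call then changes nothing).
    $applied = setlocale(LC_ALL, $locale);

    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Unable to open $path");
    }

    $rows = [];
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($handle);

    // Return the setlocale() result too, so the caller can tell whether
    // the locale was really switched before the file was parsed.
    return ['locale' => $applied, 'rows' => $rows];
}
```

Checking the 'locale' entry is worth the extra line: if it is false, the locale was never changed and the parse ran under the previous settings.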

Unfortunately, even this does not work. Some time later, I came across PHP Bug #38471: fgetcsv(): locale dependency of delimiter / enclosure arg. The response to this bug:

We’re working on Unicode support in PHP6, but it won’t appear in previous versions.


Me

James Armes is a software engineer and open source enthusiast from central Pennsylvania.