Commits
Ryan Lovelett committed 7dbb4127f55
[gyb] Force Unicode strings in Python 2
All strings are sequences of Unicode characters in Python 3. This is
entirely different than that of Python 2. Python 2's strings were of
bytes. However, Python 2 does have the concept of Unicode strings. This
patch changes the behavior of the file reader to use the same the codecs
module on Python 2 to properly read a string into a unicode string. From
there the strings are meant to be equivalent on 2 and 3. The rest of the
patch just updates the code to natively work with unicode strings.
To test the class `GraphemeClusterBreakPropertyTable`:
$ python2 utils/gyb --test \
-DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \
-DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \
-DCMAKE_SIZEOF_VOID_P=8 \
-o /tmp/UnicodeExtendedGraphemeClusters.cpp.2.7.tmp \
./stdlib/public/stubs/UnicodeExtendedGraphemeClusters.cpp.gyb
$ python3 utils/gyb --test \
-DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \
-DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \
-DCMAKE_SIZEOF_VOID_P=8 \
-o /tmp/UnicodeExtendedGraphemeClusters.cpp.3.5.tmp \
./stdlib/public/stubs/UnicodeExtendedGraphemeClusters.cpp.gyb
$ diff -u /tmp/UnicodeExtendedGraphemeClusters.cpp.2.7.tmp \
/tmp/UnicodeExtendedGraphemeClusters.cpp.3.5.tmp
To test the method `get_grapheme_cluster_break_tests_as_UTF8`:
$ python2 utils/gyb --test \
-DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \
-DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \
-DCMAKE_SIZEOF_VOID_P=8 \
-o /tmp/UnicodeGraphemeBreakTest.cpp.2.7.tmp \
./unittests/Basic/UnicodeGraphemeBreakTest.cpp.gyb
$ python3 utils/gyb --test \
-DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \
-DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \
-DCMAKE_SIZEOF_VOID_P=8 \
-o /tmp/UnicodeGraphemeBreakTest.cpp.3.5.tmp \
./unittests/Basic/UnicodeGraphemeBreakTest.cpp.gyb
$ diff -u /tmp/UnicodeGraphemeBreakTest.cpp.2.7.tmp \
/tmp/UnicodeGraphemeBreakTest.cpp.3.5.tmp