author Stefan Brüns <stefan.bruens@rwth-aachen.de> 2018-08-19 02:32:14 +0200
committer Stefan Brüns <stefan.bruens@rwth-aachen.de> 2018-09-14 14:45:41 +0200
commit 28da9e614f663210f4e5e0790b56055ce517bf38
tree 2190d55ac8c3d11a82a605a62e651f5c496e0635 /find-modules
parent 5ecea2ff63c36afc2dad77f33899bdf9eb260e51
Bindings: Correct handling of sources containing utf-8

Summary:
Depending on the locale, python3 may try to decode the source as ASCII when the file is opened in text mode. This fails as soon as the code contains non-ASCII utf-8 characters, e.g. (c) symbols.

While it is possible to specify the encoding when reading the file, this is bad for several reasons:
- only a very small part of the source is processed via _read_source, so there is no need to decode the complete source and store it as string objects
- the clang Cursor.extent.{start,end}.column refers to bytes, not multibyte characters.

While python2 processes sources containing utf-8 without error messages, wrong extent borders are also an issue there.

The practical impact is low, as the issue only manifests if there is a multibyte character in front of *and* on the same line as the read token.

Test Plan:
Python3:
Build any bindings that contain sources with non-ASCII codepoints, e.g. kcoreaddons. The unpatched version fails when using e.g. LANG=C.

Python2:
Both versions generate sources successfully.

Bytes vs characters test:
```
#define Q_SLOTS
class foo {
/* a */ public Q_SLOTS:
/* ä */ public Q_SLOTS:
};
```
`sip_generator.py --flags "" /usr/lib64/libclang.so Qt5Ruleset.py test.h out.sip`

Both lines should result in the same code; the unfixed version generates `public Q_SLOTS:` vs `public:`.

Reviewers: #frameworks, lbeltrame
Reviewed By: lbeltrame
Subscribers: lbeltrame, bcooksley, jtamate, kde-frameworks-devel, kde-buildsystem
Tags: #frameworks, #build_system
Differential Revision: https://phabricator.kde.org/D15068
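
The bytes-versus-characters mismatch described in the summary can be illustrated with a minimal sketch. It does not call libclang; the byte-based column is computed with `bytes.find()` as a stand-in for what `Cursor.extent` would report, and the names `line_bytes`, `line_text`, `start_col` and `end_col` are hypothetical.

```python
# One source line with a multibyte character in front of the token,
# mirroring the "ä" line from the test plan.
line_bytes = "/* ä */ public Q_SLOTS:\n".encode("utf-8")
line_text = line_bytes.decode("utf-8")

# Assumption: stand-in for the 1-indexed, byte-based columns that
# libclang reports for the token "public" (end column is exclusive).
token = b"public"
start_col = line_bytes.find(token) + 1
end_col = start_col + len(token)

# Slicing the raw bytes with byte columns yields the intended token.
from_bytes = line_bytes[start_col - 1:end_col - 1].decode("utf-8")

# Slicing the decoded string with the same columns is shifted by one,
# because "ä" is two bytes in utf-8 but a single character.
from_text = line_text[start_col - 1:end_col - 1]

# Decoding as ASCII, as python3 may do under e.g. LANG=C, fails outright.
try:
    line_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(repr(from_bytes), repr(from_text), ascii_ok)
```

Keeping the source as bytes and decoding only the extracted slice avoids both problems at once, which appears to be the design choice this patch makes.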
Diffstat (limited to 'find-modules')
-rw-r--r-- find-modules/sip_generator.py 6
1 files changed, 4 insertions, 2 deletions
diff --git a/find-modules/sip_generator.py b/find-modules/sip_generator.py
index 1ebaba61..8dcab566 100644
--- a/find-modules/sip_generator.py
+++ b/find-modules/sip_generator.py
@@ -129,7 +129,7 @@ class SipGenerator(object):
         #
         source = h_file
         self.unpreprocessed_source = []
-        with open(source, "rU") as f:
+        with open(source, "rb") as f:
             for line in f:
                 self.unpreprocessed_source.append(line)
@@ -739,6 +739,7 @@ class SipGenerator(object):
         :param extent: The range of text required.
         """
+        # Extent columns are specified in bytes
         extract = self.unpreprocessed_source[extent.start.line - 1:extent.end.line]
         if extent.start.line == extent.end.line:
             extract[0] = extract[0][extent.start.column - 1:extent.end.column - 1]
@@ -747,8 +748,9 @@ class SipGenerator(object):
             extract[-1] = extract[-1][:extent.end.column - 1]
         #
         # Return a single line of text.
+        # Replace all kinds of newline variants (DOS, UNIX, MAC style) by single spaces
         #
-        return "".join(extract).replace("\n", " ")
+        return b''.join(extract).decode('utf-8').replace("\r\n", " ").replace("\n", " ").replace("\r", " ")
     @staticmethod
     def _report_ignoring(parent, child, text=None):
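
For readers who want to try the patched logic outside of `SipGenerator`, here is a rough standalone sketch. The function names `read_source_lines` and `read_extent` are hypothetical, the extent is passed as four plain integers instead of a clang extent object, and the multi-line branch for the first line is an assumption, since that line is not part of the hunks shown above.

```python
import os
import tempfile


def read_source_lines(path):
    # Binary mode: no locale-dependent decoding, line endings kept as-is.
    with open(path, "rb") as f:
        return list(f)


def read_extent(lines, start_line, start_col, end_line, end_col):
    # Lines and columns are 1-indexed; columns count bytes, end is exclusive.
    extract = lines[start_line - 1:end_line]
    if start_line == end_line:
        extract[0] = extract[0][start_col - 1:end_col - 1]
    else:
        # Assumed handling of the first line of a multi-line extent.
        extract[0] = extract[0][start_col - 1:]
        extract[-1] = extract[-1][:end_col - 1]
    # Decode only the extracted slice, then flatten DOS, UNIX and MAC
    # style newlines into single spaces.
    text = b"".join(extract).decode("utf-8")
    return text.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")


# Usage: a two-line declaration with DOS line endings and a multibyte
# character in front of the token on the first line.
data = "/* ä */ void\r\nfoo();\r\n".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    tmp_path = tmp.name
try:
    lines = read_source_lines(tmp_path)
    result = read_extent(lines, 1, 10, 2, 7)
finally:
    os.remove(tmp_path)

print(repr(result))
```

The order of the `replace` calls matters: handling `"\r\n"` first ensures a DOS line ending becomes one space rather than two.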