author | Stefan Brüns <stefan.bruens@rwth-aachen.de> | 2018-08-19 02:32:14 +0200 |
---|---|---|
committer | Stefan Brüns <stefan.bruens@rwth-aachen.de> | 2018-09-14 14:45:41 +0200 |
commit | 28da9e614f663210f4e5e0790b56055ce517bf38 (patch) | |
tree | 2190d55ac8c3d11a82a605a62e651f5c496e0635 | |
parent | 5ecea2ff63c36afc2dad77f33899bdf9eb260e51 (diff) | |
Bindings: Correct handling of sources containing utf-8
Summary:
Depending on the locale, Python 3 may try to decode the source as ASCII
when the file is opened in text mode. This fails as soon as the
code contains non-ASCII UTF-8 characters, e.g. © symbols.
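A minimal sketch of the failure, using a throwaway header written to a temporary file. Passing `encoding="ascii"` explicitly stands in for what a C/POSIX locale may select. The actual default depends on the Python 3 version and the locale settings, so this is an assumption made to keep the example deterministic. Binary mode, as used by the fix, never decodes at all.

```python
import os
import tempfile

# Write a small header containing a UTF-8 encoded copyright sign (U+00A9).
with tempfile.NamedTemporaryFile("wb", suffix=".h", delete=False) as tmp:
    tmp.write("// \u00a9 2018 Example\nclass foo {};\n".encode("utf-8"))
    path = tmp.name

# Text mode with an ASCII codec: assumed stand-in for LANG=C on Python 3.
try:
    with open(path, encoding="ascii") as f:
        f.read()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Binary mode yields raw bytes per line, independent of the locale.
with open(path, "rb") as f:
    raw_lines = f.readlines()

os.remove(path)
print(ascii_failed)   # True
print(raw_lines[0])   # b'// \xc2\xa9 2018 Example\n'
```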
While it is possible to specify the encoding when reading the file,
this is undesirable for two reasons:
- only a very small part of the source is processed via `_read_source`,
so there is no need to decode the complete source and store it as string objects
- the clang `Cursor.extent.{start,end}.column` values refer to bytes, not
multibyte characters.
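The second point can be illustrated without libclang. In this sketch, `start_col` and `end_col` are hypothetical stand-ins for `Cursor.extent.start.column` and `Cursor.extent.end.column`. They are computed by hand as 1-based byte offsets (end exclusive, matching the slicing in `_read_source`) for the token `public` on the `/* ä */` test line:

```python
line_text = "/* ä */ public Q_SLOTS:\n"
token = "public"

# Byte-based, 1-based columns as clang would report them (computed by hand).
prefix = line_text[:line_text.index(token)]
start_col = len(prefix.encode("utf-8")) + 1        # 10: "ä" occupies 2 bytes
end_col = start_col + len(token.encode("utf-8"))   # 16

# Slicing bytes with byte columns recovers the token.
from_bytes = line_text.encode("utf-8")[start_col - 1:end_col - 1].decode("utf-8")
# Slicing a decoded str with the same columns is off by one character.
from_str = line_text[start_col - 1:end_col - 1]

print(repr(from_bytes))  # 'public'
print(repr(from_str))    # 'ublic '
```

The shift equals the number of extra bytes used by multibyte characters earlier on the same line, which is why only tokens preceded by such characters on their own line are affected.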
Python 2 processes sources containing UTF-8 without error messages, but
wrong extent borders are an issue there as well.
The practical impact is low, as the issue only manifests if there is a
multibyte character in front of *and* on the same line as the read token.
Test Plan:
Python3: Build any bindings which contain sources with non-ASCII codepoints,
e.g. kcoreaddons. The unpatched version fails when using e.g. LANG=C.
Python2: Both the unpatched and the patched version generate sources successfully.
Bytes vs characters test:
```
#define Q_SLOTS
class foo {
/* a */ public Q_SLOTS:
/* ä */ public Q_SLOTS:
};
```
`sip_generator.py --flags "" /usr/lib64/libclang.so Qt5Ruleset.py test.h out.sip`
Both lines should obviously result in the same code, but the unfixed version generates `public Q_SLOTS:` for one and `public:` for the other.
Reviewers: #frameworks, lbeltrame
Reviewed By: lbeltrame
Subscribers: lbeltrame, bcooksley, jtamate, kde-frameworks-devel, kde-buildsystem
Tags: #frameworks, #build_system
Differential Revision: https://phabricator.kde.org/D15068
-rw-r--r-- | find-modules/sip_generator.py | 6 |
1 files changed, 4 insertions, 2 deletions
```diff
diff --git a/find-modules/sip_generator.py b/find-modules/sip_generator.py
index 1ebaba61..8dcab566 100644
--- a/find-modules/sip_generator.py
+++ b/find-modules/sip_generator.py
@@ -129,7 +129,7 @@ class SipGenerator(object):
         #
         source = h_file
         self.unpreprocessed_source = []
-        with open(source, "rU") as f:
+        with open(source, "rb") as f:
             for line in f:
                 self.unpreprocessed_source.append(line)
 
@@ -739,6 +739,7 @@ class SipGenerator(object):
 
         :param extent: The range of text required.
         """
+        # Extent columns are specified in bytes
         extract = self.unpreprocessed_source[extent.start.line - 1:extent.end.line]
         if extent.start.line == extent.end.line:
             extract[0] = extract[0][extent.start.column - 1:extent.end.column - 1]
@@ -747,8 +748,9 @@ class SipGenerator(object):
             extract[-1] = extract[-1][:extent.end.column - 1]
         #
         # Return a single line of text.
+        # Replace all kinds of newline variants (DOS, UNIX, MAC style) by single spaces
         #
-        return "".join(extract).replace("\n", " ")
+        return b''.join(extract).decode('utf-8').replace("\r\n", " ").replace("\n", " ").replace("\r", " ")
 
     @staticmethod
     def _report_ignoring(parent, child, text=None):
```
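The patched `_read_source` logic can be sketched as a standalone function. `Location`, `Extent` and `read_source` are hypothetical stand-ins for clang's source location/range objects and for the method in `sip_generator.py`. The sketch assumes 1-based line numbers and 1-based, end-exclusive byte columns, as the slicing in the diff implies.

```python
from collections import namedtuple

# Hypothetical stand-ins for clang's location and range objects.
Location = namedtuple("Location", "line column")
Extent = namedtuple("Extent", "start end")


def read_source(lines, extent):
    """Return the text covered by extent as a single line.

    lines holds the raw bytes of each source line, as produced by
    iterating over a file opened with "rb".
    """
    # Slicing the list makes a copy, so the stored lines stay untouched.
    extract = lines[extent.start.line - 1:extent.end.line]
    if extent.start.line == extent.end.line:
        extract[0] = extract[0][extent.start.column - 1:extent.end.column - 1]
    else:
        extract[0] = extract[0][extent.start.column - 1:]
        extract[-1] = extract[-1][:extent.end.column - 1]
    # Decode only the extracted slice, then fold DOS, UNIX and Mac line
    # endings into single spaces ("\r\n" first so it yields one space).
    text = b"".join(extract).decode("utf-8")
    return text.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")


source = [
    "/* a */ public Q_SLOTS:\r\n".encode("utf-8"),
    "/* ä */ public Q_SLOTS:\r\n".encode("utf-8"),
]
# "public Q_SLOTS:" starts at byte column 9 on line 1 but 10 on line 2.
print(read_source(source, Extent(Location(1, 9), Location(1, 24))))
print(read_source(source, Extent(Location(2, 10), Location(2, 25))))
print(repr(read_source(source, Extent(Location(1, 9), Location(2, 10)))))
```

Both single-line extents print `public Q_SLOTS:`, and the multi-line extent prints `'public Q_SLOTS: /* ä */ '`. Replacing `"\r\n"` before the lone `"\n"` and `"\r"` is what keeps a DOS line ending from turning into two spaces.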