Fix regex for <ref> tag when it's not self-closing

In some articles the <ref> tag appears like this:

    <ref name="Ahmed Rashid/The Telegraph">{{cite

The previous regex fails to match when the tag's attributes contain a forward slash ("Rashid/The"), because its character class [^>/] excludes "/". The new regex matches lazily up to the first closing bracket, so there is no need to pre-filter characters.
Roman Prokofyev 2015-06-10 15:40:27 +02:00
parent c15d93c40a
commit b99aaf19aa

@@ -215,7 +215,7 @@ comment = re.compile(r'<!--.*?-->', re.DOTALL)
 # Match ignored tags
 ignored_tag_patterns = []
 def ignoreTag(tag):
-    left = re.compile(r'<%s\b[^>/]*>' % tag, re.IGNORECASE) # both <ref> and <reference>
+    left = re.compile(r'<%s\b.*?>' % tag, re.IGNORECASE) # both <ref> and <reference>
     right = re.compile(r'</\s*%s>' % tag, re.IGNORECASE)
     ignored_tag_patterns.append((left, right))
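
The difference between the two opening-tag patterns can be checked in isolation. The following is a minimal sketch, not part of the commit: the helper name `make_left_patterns` is hypothetical, and only the two regex strings and the sample tag are taken from the diff and the commit message.

```python
import re

def make_left_patterns(tag):
    """Build the old and new opening-tag patterns from the diff for one tag name.

    Hypothetical helper for illustration; the real code compiles a single
    pattern inside ignoreTag().
    """
    old = re.compile(r'<%s\b[^>/]*>' % tag, re.IGNORECASE)  # excludes '/' and '>'
    new = re.compile(r'<%s\b.*?>' % tag, re.IGNORECASE)     # lazy, stops at first '>'
    return old, new

old_left, new_left = make_left_patterns('ref')

# Sample opening tag from the commit message (slash inside the attribute value).
sample = '<ref name="Ahmed Rashid/The Telegraph">{{cite'

old_match = old_left.search(sample)
new_match = new_left.search(sample)

print(old_match)           # None
print(new_match.group(0))  # <ref name="Ahmed Rashid/The Telegraph">
```

One side effect worth noting: the lazy pattern also matches a self-closing tag such as `<ref name="a"/>`, which the old character class rejected. Whether that matters depends on how self-closing tags are handled elsewhere in the file, which this hunk does not show. Also, since `re.DOTALL` is not set, `.*?` will not cross a newline inside a tag.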