添加链接
link之家
链接快照平台
  • 输入网页链接,自动生成快照
  • 标签化管理网页链接

如何解析一个字符串并返回一个嵌套数组?

12 人关注

我想要一个 Python 函数,它接收一个字符串,并返回一个数组,其中数组中的每一项要么是一个字符,要么是另一个类似的数组。嵌套数组在输入字符串中以'('开始,以')结束。

因此,该函数将像这样行事。

1) foo("abc") == ["a", "b", "c"]
2) foo("a(b)c") == ["a", ["b"], "c"]
3) foo("a(b(c))") == ["a", ["b", ["c"]]]
4) foo("a(b(c)") == error: closing bracket is missing
5) foo("a(b))c") == error: opening bracket is missing
6) foo("a)b(c") == error: opening bracket is missing

注:我更希望有一个纯粹的功能性的解决方案。

3 个评论
在这里使用递归,它是一个完美的选择。在令牌流中找到一个'('意味着递归到。 在顶层调用中找到一个')'意味着有一个平衡不匹配。
确实是递归。只要字符串不是太长,就是这样。请记住,这可能需要一个封装函数,因为我们需要检查'('的数量是否与')'的数量相符。如果是,就调用递归函数。如果不是,则给出适当的错误信息。
是的,我知道我将在这里使用递归。我知道如何写一个函数(使用堆栈)来返回一个字符串中的括号是否平衡和有序,但下一步尝试返回嵌套数组才是我的难题。这不是家庭作业。这是一个更大的解析程序的一部分。
python
arrays
string
parsing
nested
Tespa42
Tespa42
发布于 2013-06-17
7 个回答
Jared
Jared
发布于 2013-06-17
已采纳
0 人赞同
def foo(s):
    def foo_helper(level=0):
            token = next(tokens)
        except StopIteration:
            if level != 0:
                raise Exception('missing closing paren')
            else:
                return []
        if token == ')':
            if level == 0:
                raise Exception('missing opening paren')
            else:
                return []
        elif token == '(':
            return [foo_helper(level+1)] + foo_helper(level)
        else:
            return [token] + foo_helper(level)
    tokens = iter(s)
    return foo_helper()
>>> foo('a((b(c))d)(e)')
['a', [['b', ['c']], 'd'], ['e']]
    
你想尝试用功能化的方式重写吗?
@FinalZero 你是什么意思?这已经是一个递归的解决方案了。
我的意思是在不使用 iter next 的情况下。
这个答案不是纯函数式的,但我还是接受了,因为它帮助我构思和编写了纯函数式的解决方案,它要求 foo 返回一个元组,而不仅仅是一个数组。
falsetru
falsetru
发布于 2013-06-17
0 人赞同
def foo(xs):
    stack = [[]]
    for x in xs:
        if x == '(':
            stack[-1].append([])
            stack.append(stack[-1][-1])
        elif x == ')':
            stack.pop()
            if not stack:
                return 'error: opening bracket is missing'
                #raise ValueError('error: opening bracket is missing')
        else:
            stack[-1].append(x)
    if len(stack) > 1:
        return 'error: closing bracket is missing'
        #raise ValueError('error: closing bracket is missing')
    return stack.pop()
assert foo("abc") == ["a", "b", "c"]
assert foo("a(b)c") == ["a", ["b"], "c"]
assert foo("a(b(c))") == ["a", ["b", ["c"]]]
assert foo("a((b(c))d)(e)") == ['a', [['b', ['c']], 'd'], ['e']]
assert foo("a(b(c)") == "error: closing bracket is missing"
assert foo("a(b))c") == "error: opening bracket is missing"
assert foo("a)b(c") == 'error: opening bracket is missing'
    
FMc
+1 而且你不需要 result :只要设置 stack = [[]] ,然后返回 stack.pop() 。这样看起来更纯粹 -- 只有堆栈。
fricke
fricke
发布于 2013-06-17
0 人赞同

我建议有两种方法。

要么就自己编写递归解析器,如 here , or use 语法分析 , something like

import pyparsing as pp
expr = pp.Forward()
expr << pp.Word(pp.alphas) + pp.Optional('(' + expr + ')') + pp.Optional(pp.Word(pp.alphas))

here you describe a recursive expression as a sequence of alphas, which can be interleaved by balanced parentheses. When you check this example for the outputs you will see how to get the desired output structure (though it will require some tweaking on your side and require some learning about 语法分析).

Ashwini Chaudhary
Ashwini Chaudhary
发布于 2013-06-17
0 人赞同

Using regex and ast.literal_eval

>>> import re
>>> from ast import literal_eval
>>> def listit(t):
...         return list(map(listit, t)) if isinstance(t, (list, tuple)) else t
def solve(strs):
    s = re.sub(r'[A-Za-z]', "'\g<0>',", strs)
    s = re.sub(r"\)", "\g<0>,", s)
    try: return listit( literal_eval('[' + s  + ']') )
    except : return "Invalid string! "
>>> solve("abc")
['a', 'b', 'c']
>>> solve("a(b)c")
['a', ['b'], 'c']
>>> solve("a(b(c))")
['a', ['b', ['c']]]
>>> solve("a(b(c)")
'Invalid string! '
>>> solve("a)b(c")
'Invalid string! '
>>> solve("a(b))c")
'Invalid string! '
>>> solve('a((b(c))d)(e)')
['a', [['b', ['c']], 'd'], ['e']]
    
Alex L
Alex L
发布于 2013-06-17
0 人赞同

一个相当快速和讨厌的方法(只是为了不同的东西)。

import json, re
def foo(x):
    # Split continuous strings
    # Match consecutive characters
    matches = re.findall('[a-z]{2,}', x)
    for m in matches:
        # Join with ","
        x = x.replace(m, '","'.join(y for y in list(m))) 
    # Turn curvy brackets into square brackets
    x = x.replace(')', '"],"')
    x = x.replace('(', '",["')
    # Wrap whole string with square brackets
    x = '["'+x+'"]'
    # Remove empty entries
    x = x.replace('"",', '')
    x = x.replace(',""', '')
        # Load with JSON
        return json.loads(x)
    except:
        # TODO determine error type
        return "error"
def main():
    print foo("abc")     # ['a', 'b', 'c']
    print foo("a(b)c")   # ['a', ['b'], 'c']
    print foo("a(b(c))") # ['a', ['b', ['c']]]
    print foo("a(b))c")  # error
    print foo('a((b(c))d)(e)') # ['a', [['b', ['c']], 'd'], ['e']]
    
Janne Karila
Janne Karila
发布于 2013-06-17
0 人赞同
def parse_nested(iterator, level=0):
    result = []
    for c in iterator:
        if c == '(':
            result.append(parse_nested(iterator, level+1))
        elif c == ')':
            if level:
                return result
            else:
                raise ValueError("Opening parenthesis missing")
        else:
            result.append(c)
    if level:
        raise ValueError("Closing parenthesis missing")
    else:
        return result
print parse_nested(iter('a((b(c))d)(e)'))       
    
Harry Wang
Harry Wang
发布于 2013-06-17
0 人赞同

递归是非常强大的东西,你应该尝试使用。

Here is my code:

# encoding: utf-8 # Python33 def check(s): cs = [c for c in s if c == '(' or c ==')'] flag = 0 for c in cs: if flag < 0: return 'opening bracket is missing' if c == '(': flag += 1 else: flag -= 1 if flag < 0: return 'opening bracket is missing' elif flag > 0: return 'closing bracket is missing' else: return '' def _foo(cs): result = [] while len(cs): c = cs.pop(0) if c == '(': result.append(_foo(cs)) elif c == ')': return result else: result.append(c) return result def foo(s): valiad = check(s) if valiad: return valiad cs = list(s) return _foo(cs) if __name__ == '__main__':