Custom parser for a file format similar to JSON (Python)

I'm attempting to create a parser to translate a "custom" file format into JSON so I can more easily manipulate its contents (for argument's sake, call the "custom" format .qwerty).



I've already created a Lexer which breaks the file down into individual lexemes (tokens), each with the structure [token_type, token_value]. Now I am struggling to parse the lexemes into their correct dictionaries: it is difficult to insert data into a sub-sub-dictionary, since the keys aren't constant, and likewise to insert data into arrays stored in dictionaries.



It should be noted that I am attempting to sequentially parse the tokens into an actual Python object (nested dicts and lists), then dump that object as JSON.



An example of the file can be seen below, along with what the end result is meant to resemble.



FILE: ABC.querty



Dict_abc_1{

    Dict_abc_2{
        HeaderGUID="";
        Version_TPI="999";
        EncryptionType="0";
    }

    Dict_abc_3{
        FamilyName="John Doe";
    }

    Dict_abc_4{
        Array_abc{
            {TimeStamp="2018-11-07 01:00:00"; otherinfo="";}
            {TimeStamp="2018-11-07 01:00:00"; otherinfo="";}
            {TimeStamp="2018-11-07 01:00:00"; otherinfo="";}
            {TimeStamp="2018-11-07 02:53:57"; otherinfo="";}
            {TimeStamp="2018-11-07 02:53:57"; otherinfo="";}
        }

        Dict_abc_5{
            LastContact="2018-11-08 01:00:00";
            BatteryStatus=99;
            BUStatus=PowerOn;
            LastCallTime="2018-11-08 01:12:46";
            LastSuccessPoll="2018-11-08 01:12:46";
            CallResult=Successful;
        }
    }
}
Code=999999;


FILE: ABC.json



{
    "Dict_abc_1":{
        "Dict_abc_2":{
            "HeaderGUID":"",
            "Version_TPI":"999",
            "EncryptionType":"0"
        },

        "Dict_abc_3":{
            "FamilyName":"John Doe"
        },

        "Dict_abc_4":{
            "Array_abc":[
                {"TimeStamp":"2018-11-07 01:00:00", "otherinfo":""},
                {"TimeStamp":"2018-11-07 01:00:00", "otherinfo":""},
                {"TimeStamp":"2018-11-07 01:00:00", "otherinfo":""},
                {"TimeStamp":"2018-11-07 02:53:57", "otherinfo":""},
                {"TimeStamp":"2018-11-07 02:53:57", "otherinfo":""}
            ],

            "Dict_abc_5":{
                "LastContact":"2018-11-08 01:00:00",
                "BatteryStatus":99,
                "BUStatus":"PowerOn",
                "LastCallTime":"2018-11-08 01:12:46",
                "LastSuccessPoll":"2018-11-08 01:12:46",
                "CallResult":"Successful"
            }
        }
    },
    "Code":999999
}


Additional token information.
Token types can be one of the following (with their possible values):

  • IDENTIFIER: contains the name of the variable identifier

  • VARIABLE: contains the actual data belonging to the parent IDENTIFIER

  • OPERATOR: always "="

  • OPEN_BRACKET: always "{"

  • CLOSE_BRACKET: always "}"


An example of ABC.querty's lexemes can be seen HERE
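Since the pastebin link is not reproduced here, below is a hypothetical illustration of what the lexeme stream for a single line such as `HeaderGUID="";` would look like, given the [token_type, token_value] structure described above (the real lexer output may differ slightly):

```python
# Hypothetical lexeme stream for the line: HeaderGUID="";
# Token type names follow the list above.
tokens = [
    ["IDENTIFIER", "HeaderGUID"],
    ["OPERATOR", "="],
    ["VARIABLE", ""],
]

for token_type, token_value in tokens:
    print("%s \t %s" % (token_type, token_value))
```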



Fundamental logical extract of main.py:



import json

def main():
    content = open_file(file_name)       ## read file

    lexer = Lexer(content)               ## create lexer instance
    tokens = lexer.tokenize()            ## create lexemes as seen in the pastebin

    parsed = Parser(tokens).parse()      ## parse the tokens into a Python object

    print(json.dumps(parsed, sort_keys=True, indent=4, separators=(',', ': ')))


parser.py



class Parser(object):
    def __init__(self, tokens):
        self.tokens = tokens
        self.token_index = 0
        self.json_object = {}                    # root object
        self.current_object = self.json_object   # container currently being filled
        self.path = [self.json_object]           # stack of parent containers

    def parse(self):
        while self.token_index < len(self.tokens):
            token = self.getToken()
            token_type = token[0]
            token_value = token[1]

            print("%s \t %s" % (token_type, token_value))

            if token_type == "IDENTIFIER":
                self.increment()
                next_token = self.getToken()
                if next_token[0] == "OPEN_BRACKET":
                    token_after_bracket = self.getToken(1)
                    if token_after_bracket[0] in ["OPERATOR", "IDENTIFIER"]:
                        ## make dict in current dict
                        pass
                    elif token_after_bracket[0] == "OPEN_BRACKET":
                        ## make array in current dict
                        pass
                elif next_token[0] == "OPERATOR":
                    ## insert data into current dict
                    pass

            if token_type == "CLOSE_BRACKET":
                next_token = self.getToken(1)    # peek at the following token
                if next_token[0] == "OPEN_BRACKET":
                    ## still in array of current dict
                    pass
                elif next_token[0] == "IDENTIFIER":
                    self.changeDirectory()
                else:
                    ## end of input
                    pass

            self.increment()

        print(self.path)
        return self.json_object

    def changeDirectory(self):
        ## ascend one level: drop the current container and make
        ## its parent the current one
        if len(self.path) > 1:
            self.path.pop()
            self.current_object = self.path[-1]

    def increment(self):
        if self.token_index < len(self.tokens):
            self.token_index += 1

    def getToken(self, x=0):
        return self.tokens[self.token_index + x]


Additional parse information.
Currently I am trying to store the current dictionary in a path list, to allow me to insert into dictionaries and into arrays within dictionaries.
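To make the path idea concrete, here is a minimal standalone sketch (the helper names are made up for illustration; this is not the full parser). Keeping a stack of references to whichever dict or list is currently being filled makes inserting into nested dictionaries, and into arrays inside dictionaries, straightforward:

```python
import json

# Minimal sketch of the "path" idea: a stack of references to the
# container (dict or list) currently being filled.
root = {}
path = [root]            # path[-1] is always the current container

def open_dict(key):
    new = {}
    path[-1][key] = new  # attach to the current dict
    path.append(new)     # descend into it

def open_array(key):
    new = []
    path[-1][key] = new
    path.append(new)

def open_array_item():
    new = {}
    path[-1].append(new) # current container is a list here
    path.append(new)

def close():
    path.pop()           # ascend to the parent container

def insert(key, value):
    path[-1][key] = value

open_dict("Dict_abc_4")
open_array("Array_abc")
open_array_item()
insert("TimeStamp", "2018-11-07 01:00:00")
close()                  # close the array item
close()                  # close Array_abc
close()                  # close Dict_abc_4

print(json.dumps(root))
```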



Any suggestions or solutions are very much appreciated,



Thanks.

  • In querty files there doesn't seem to be a way to distinguish empty dictionaries from empty arrays. Maybe you want to fix that?
    – Matt Timmermans
    Nov 13 '18 at 4:56










  • Unfortunately, it's not my format @MattTimmermans but thanks for the heads up.
    – AlexMika
    Nov 13 '18 at 23:34

python json algorithm parsing lexical-analysis

asked Nov 13 '18 at 0:38
AlexMika
1 Answer
The last time I solved this kind of problem, I found that a finite-state machine is very helpful. I want to recommend an approach for the stage after you already have tokens, though I don't know what it's called in English. The principle: you go through the tokens and push them onto a stack one by one. After each push you check the stack against a set of rules, combining primitive tokens into expressions that may themselves become part of more complex expressions.



For example, take "FamilyName":"John Doe". The tokens are "FamilyName", : and "John Doe".



You push the first token onto the stack:
stack = ["FamilyName"].
Rule 1: str_obj -> E. So you create Expression(type='str', value="FamilyName"), and the stack is now stack = [Expression].



Then you push the next token:
stack = [Expression, ':']. No rule matches ':'. Move on.



stack = [Expression, ':', "John Doe"]. Rule 1 applies again, so the stack becomes stack = [Expression, ':', Expression]. Now another rule matches. Rule 2: E:E -> E. Apply it as Expression(type='kv_pair', value=(Expression, Expression)), and the stack becomes stack = [Expression].



If you describe all the rules this way, the whole parse falls out of the same loop. Hope it helps.
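The stack-and-rules idea above can be sketched in a few lines of Python. This is a minimal illustration of the "FamilyName":"John Doe" example only, not a complete .querty parser, and the Expression class and rule functions are made-up names for the sketch:

```python
# Minimal shift-reduce sketch for the key:value example in this answer.
class Expression:
    def __init__(self, type, value):
        self.type = type
        self.value = value

def try_reduce(stack):
    # Rule 1: a raw string token (other than ':') becomes a 'str' Expression.
    if stack and isinstance(stack[-1], str) and stack[-1] != ':':
        stack[-1] = Expression('str', stack[-1])
    # Rule 2: E ':' E on top of the stack reduces to one 'kv_pair' Expression.
    if (len(stack) >= 3
            and isinstance(stack[-1], Expression)
            and stack[-2] == ':'
            and isinstance(stack[-3], Expression)):
        value = stack.pop()
        stack.pop()          # discard ':'
        key = stack.pop()
        stack.append(Expression('kv_pair', (key, value)))

stack = []
for token in ['FamilyName', ':', 'John Doe']:
    stack.append(token)      # shift the token onto the stack
    try_reduce(stack)        # then try to apply the rules

result = stack[0]
print(result.type)                                   # kv_pair
print(result.value[0].value, result.value[1].value)  # FamilyName John Doe
```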

        answered Nov 13 '18 at 1:09
sashaaero