
NLP 100 Knocks (言語処理100本ノック)

35. Noun sequences

Extract noun sequences (nouns that appear consecutively) using longest match.
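As a minimal sketch of what "longest match" means here, assume one sentence is already a list of token dicts with `surface` and `pos` keys. A run keeps growing while tokens are nouns and is closed only by a non-noun or by the end of the sentence, so each reported run is maximal. The function name `longest_noun_runs`, its `min_len` parameter, and the toy tokens are hypothetical, chosen for illustration only.

```python
from typing import Dict, List


def longest_noun_runs(tokens: List[Dict[str, str]], min_len: int = 2) -> List[List[str]]:
    """Return the maximal runs of consecutive nouns in one sentence (hypothetical helper)."""
    runs: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok["pos"] == "名詞":
            current.append(tok["surface"])  # a noun extends the current run
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = []  # any non-noun closes the run
    if len(current) >= min_len:  # flush a run that reaches the end of the sentence
        runs.append(current)
    return runs


# Hand-written toy tokens (hypothetical, not real MeCab output)
tokens = [
    {"surface": "四", "pos": "名詞"},
    {"surface": "五", "pos": "名詞"},
    {"surface": "遍", "pos": "名詞"},
    {"surface": "を", "pos": "助詞"},
    {"surface": "書生", "pos": "名詞"},
    {"surface": "以外", "pos": "名詞"},
]
print(longest_noun_runs(tokens))  # → [['四', '五', '遍'], ['書生', '以外']]
```

The final flush matters: without it, a run such as 書生 以外 that ends the sentence would be silently dropped.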

import re

sentences = []
with open("D:\\nlp100\\neko.txt.mecab", encoding="UTF-8") as fr:
    keitaiso = []  # morphemes of the sentence being read
    for line in fr:
        if line.rstrip("\n") == "EOS":  # exact match marks the end of a sentence
            if keitaiso:
                sentences.append(keitaiso)
                keitaiso = []
        else:
            # surface \t pos,pos1,pos2,pos3,conj. type,conj. form,base,reading,pronunciation
            cols = re.split(r'[\t,]', line)
            keitaiso.append({"surface": cols[0], "base": cols[7], "pos": cols[1], "pos1": cols[2]})

for sentence in sentences:
    rensetu = []  # reset per sentence so a run never spans two sentences
    for morpheme in sentence:
        if morpheme["pos"] == "名詞":
            rensetu.append(morpheme["surface"])
        else:
            if len(rensetu) > 1:
                print(rensetu)
            rensetu = []
    if len(rensetu) > 1:  # flush a run that reaches the end of the sentence
        print(rensetu)

<Results (excerpt)>
['人間', '中']
['一番', '獰悪']
['時', '妙']
['一', '毛']
['その後', '猫']
['一', '度']
['ぷうぷうと', '煙']
['邸', '内']
['三', '毛']
['書生', '以外']
['四', '五', '遍']
['この間', 'おさん']
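An alternative sketch uses `itertools.groupby`, which splits a sequence into maximal blocks of consecutive items that share a key; with the key "is this token a noun?", every noun block is already a longest match. The reader also splits each line on the tab before splitting the features on commas, so a comma in a surface form cannot shift the feature columns. The helper names `read_mecab` and `noun_runs` and the two-sentence `SAMPLE` are hypothetical: the sample is hand-written in MeCab's IPADIC output format, and its feature values are illustrative rather than copied from neko.txt.mecab.

```python
import io
from itertools import groupby
from typing import Dict, List, TextIO

Token = Dict[str, str]


def read_mecab(fr: TextIO) -> List[List[Token]]:
    """Parse MeCab (IPADIC-style) output into sentences of token dicts (hypothetical helper)."""
    sentences: List[List[Token]] = []
    tokens: List[Token] = []
    for line in fr:
        line = line.rstrip("\n")
        if line == "EOS":  # exact match, so a surface containing "EOS" is not a boundary
            if tokens:
                sentences.append(tokens)
            tokens = []
        elif line:
            # split on the tab first, then split only the feature string on commas
            surface, feature = line.split("\t", 1)
            cols = feature.split(",")
            tokens.append({"surface": surface, "base": cols[6], "pos": cols[0], "pos1": cols[1]})
    return sentences


def noun_runs(sentence: List[Token], min_len: int = 2) -> List[List[str]]:
    """Maximal noun runs of at least min_len tokens, via groupby (hypothetical helper)."""
    runs: List[List[str]] = []
    for is_noun, group in groupby(sentence, key=lambda t: t["pos"] == "名詞"):
        surfaces = [t["surface"] for t in group]
        if is_noun and len(surfaces) >= min_len:
            runs.append(surfaces)
    return runs


# Hand-written sample in IPADIC output format (illustrative feature values)
SAMPLE = "\n".join([
    "一\t名詞,数,*,*,*,*,一,イチ,イチ",
    "EOS",
    "吾輩\t名詞,代名詞,一般,*,*,*,吾輩,ワガハイ,ワガハイ",
    "は\t助詞,係助詞,*,*,*,*,は,ハ,ワ",
    "人間\t名詞,一般,*,*,*,*,人間,ニンゲン,ニンゲン",
    "中\t名詞,接尾,副詞可能,*,*,*,中,ジュウ,ジュー",
    "で\t助詞,格助詞,一般,*,*,*,で,デ,デ",
    "EOS",
]) + "\n"

sentences = read_mecab(io.StringIO(SAMPLE))
for sentence in sentences:  # runs are computed per sentence, so they never cross an EOS
    for run in noun_runs(sentence):
        print(run)  # prints ['人間', '中']
```

Because runs are computed inside each sentence, the lone 一 in the first sentence is not joined with 吾輩 at the start of the second one.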