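Yan Jingsong (严景松), a graduate student in the Artificial Intelligence Application Research Group, has had a paper accepted in 2023 by IEEE/ACM Transactions on Audio, Speech, and Language Processing, one of the top journals in the NLP field.
Paper title: "Does the Order Matter? A Random Generative Way to Learn Label Hierarchy for Hierarchical Text Classification"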
Abstract: Hierarchical Text Classification (HTC) is an essential and challenging task due to the difficulty of modeling label hierarchy. Recent generative methods have achieved state-of-the-art performance by flattening the local label hierarchy into a label sequence with a specific order. However, the order between labels does not naturally exist and the generation of the current label should incorporate the information in all other target labels. Moreover, the generative methods usually suffer from the error accumulation problem. To this end, we propose a new framework named sequence-to-label (Seq2Label) with a random generative way to learn label hierarchy for hierarchical text classification. Instead of using only one specific order, we shuffle the label sequence by a Label Sequence Random Shuffling (LSRS) mechanism so that a text will be mapped to several different order label sequences during the training phase. To alleviate the error accumulation problem, we further propose a Hierarchy-aware Negative Sampling (HNS) strategy with a negative label-aware loss to better distinguish target labels and negative labels. In this way, our model can capture the hierarchical and co-occurrence information of the target labels of each text. The experimental results on three benchmark datasets show that Seq2Label achieves state-of-the-art results.
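The shuffling idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that LSRS amounts to drawing uniform random permutations of a text's target labels, whereas the paper may constrain the shuffle by the hierarchy. The function name `shuffled_label_sequences`, its parameters, and the example labels are hypothetical.

```python
import random
from typing import List, Sequence


def shuffled_label_sequences(
    labels: Sequence[str], num_orders: int, seed: int = 0
) -> List[List[str]]:
    """Return `num_orders` random permutations of one text's target labels.

    Assumption: a plain uniform shuffle stands in for the paper's LSRS
    mechanism, whose exact constraints are not given in the abstract.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    sequences = []
    for _ in range(num_orders):
        order = list(labels)   # copy so the caller's labels stay untouched
        rng.shuffle(order)     # in-place random permutation
        sequences.append(order)
    return sequences


# Hypothetical target labels for a single training text.
labels = ["Science", "Physics", "Optics"]
sequences = shuffled_label_sequences(labels, num_orders=4)
for seq in sequences:
    print(seq)
```

Each training text is thus paired with several label orders rather than one fixed order; every sequence contains the same set of target labels.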
Model architecture figure:
Paper link: TASLP-Seq2Label.pdf