IMPARA: Impact-Based Metric for GEC Using Parallel Data
Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022) · October 2022
Proposal of a new impact-based metric for grammatical error correction using parallel datasets.
BibTeX
@inproceedings{maeda2022impara,
  title = {IMPARA: Impact-Based Metric for GEC Using Parallel Data},
  author = {Koki Maeda and Masahiro Kaneko and Naoaki Okazaki},
  booktitle = {Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022)},
  pages = {3578--3588},
  year = {2022},
  address = {Gyeongju, Republic of Korea},
  publisher = {International Committee on Computational Linguistics}
}
Abstract
The paper introduces IMPARA (Impact-based Metric for GEC using PARAllel data), a novel reference-less metric for automatic evaluation of grammatical error correction (GEC) systems. Unlike existing methods relying on manual assessments or multiple reference sentences, IMPARA uses parallel data (pairs of grammatical and ungrammatical sentences) to compute the impact of individual corrections. This approach significantly reduces data creation costs and adapts well across different domains and correction styles.
Methodology
- Architecture:
  - Combines a Quality Estimator (QE) and a Similarity Estimator (SE), both based on BERT.
  - The QE assesses correction quality; the SE measures semantic similarity between the original and the corrected sentence.
  - The final score uses the QE output only when the semantic similarity exceeds a threshold, ensuring that corrections preserve meaning (see the scoring sketch after this list).
- Quality Estimator (QE):
  - Trained on pairs of partially corrected sentences, ordered by preference according to impact scores computed from the edit operations identified by ERRANT (a pair-generation sketch also follows the list).
  - The QE output is produced by a fine-tuned BERT model with a linear layer.
- Similarity Estimator (SE):
  - Computes the cosine similarity between sentence embeddings obtained from a pre-trained BERT model.
- Impact Calculation:
  - Edits are extracted automatically with ERRANT.
  - The impact of each edit is defined by the semantic change it induces, measured with BERT embeddings.
  - The overall impact of a sentence is the sum of the impacts of its individual edits.
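The following is a minimal sketch of how the two estimators could be combined into a single score, based on the description above: the SE is a frozen pre-trained BERT whose mean-pooled sentence embeddings are compared by cosine similarity, and the QE is a fine-tuned BERT with a linear head whose output counts only when the similarity exceeds a threshold. The checkpoint names, the mean-pooling strategy, the sigmoid on the QE logit, the default threshold `tau`, and the choice to feed the QE only the corrected sentence are illustrative assumptions, not details taken from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
# Similarity Estimator: frozen pre-trained BERT encoder.
se_encoder = AutoModel.from_pretrained("bert-base-cased")
# Quality Estimator: BERT with a single-output linear head. "bert-base-cased" is a
# stand-in; in practice this would be the QE checkpoint fine-tuned on preference pairs.
qe_model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=1)

def sentence_embedding(text: str) -> torch.Tensor:
    """Mean-pooled BERT embedding used by the Similarity Estimator (pooling is an assumption)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = se_encoder(**inputs).last_hidden_state  # (1, seq_len, hidden_dim)
    return hidden.mean(dim=1).squeeze(0)

def impara_score(source: str, corrected: str, tau: float = 0.9) -> float:
    """Return the QE score when SE similarity exceeds the threshold, else 0."""
    sim = torch.cosine_similarity(
        sentence_embedding(source), sentence_embedding(corrected), dim=0
    ).item()
    if sim <= tau:
        return 0.0  # meaning not preserved, so the correction receives no credit
    inputs = tokenizer(corrected, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logit = qe_model(**inputs).logits.squeeze()
    return torch.sigmoid(logit).item()  # squashing to (0, 1) is an assumption
```

In this reading, the threshold acts as a hard gate: a fluent rewrite that drifts too far from the source meaning scores zero regardless of how grammatical it is.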
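The next sketch illustrates one plausible reading of the impact calculation and of the QE training-pair generation: edits are extracted with ERRANT, each edit's impact is taken to be the embedding shift it causes when applied to the source in isolation, and two randomly sampled partial corrections are ordered by the summed impact of their applied edits. The exact impact formula, the random sampling scheme, and the reuse of `sentence_embedding` from the previous sketch are assumptions made for illustration, not the paper's verbatim definitions.

```python
import random

import errant  # requires the spaCy English model used by ERRANT
import torch

annotator = errant.load("en")

def apply_edits(tokens, edits):
    """Apply a subset of ERRANT edits to a token list, right-to-left so spans stay valid."""
    tokens = list(tokens)
    for e in sorted(edits, key=lambda e: e.o_start, reverse=True):
        tokens[e.o_start:e.o_end] = e.c_str.split() if e.c_str else []
    return " ".join(tokens)

def edit_impacts(source, reference):
    """Assumed impact: embedding shift caused by applying each edit on its own."""
    src_doc, ref_doc = annotator.parse(source), annotator.parse(reference)
    edits = annotator.annotate(src_doc, ref_doc)
    src_tokens = [t.text for t in src_doc]
    src_emb = sentence_embedding(source)  # helper from the previous sketch
    impacts = []
    for e in edits:
        edited = apply_edits(src_tokens, [e])
        sim = torch.cosine_similarity(src_emb, sentence_embedding(edited), dim=0).item()
        impacts.append((e, 1.0 - sim))  # larger embedding shift -> larger assumed impact
    return impacts, src_tokens

def make_preference_pair(source, reference):
    """Sample two partial corrections; the one with higher summed edit impact is preferred."""
    impacts, src_tokens = edit_impacts(source, reference)
    subset_a = [e for e, _ in impacts if random.random() < 0.5]
    subset_b = [e for e, _ in impacts if random.random() < 0.5]
    score_a = sum(i for e, i in impacts if e in subset_a)
    score_b = sum(i for e, i in impacts if e in subset_b)
    sent_a = apply_edits(src_tokens, subset_a)
    sent_b = apply_edits(src_tokens, subset_b)
    # The partially corrected sentence whose edits carry more total impact is preferred,
    # giving the QE a ranked pair to learn from.
    return (sent_a, sent_b) if score_a >= score_b else (sent_b, sent_a)
```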
Experimental Results
- Evaluation:
  - IMPARA performs comparably to or better than existing reference-less metrics.
  - It correlates highly with human assessments and adapts substantially better to different domains and correction styles.
- Datasets:
  - Evaluated on multiple benchmarks, including CoNLL-2014, JFLEG, CWEB, and FCE.
Key Contributions
- Provides a cost-effective and scalable metric for the automatic evaluation of GEC systems.
- Shows robustness and adaptability across diverse domains without requiring additional human annotations.
Conclusion and Future Work
The paper proposes IMPARA as a practical and effective evaluation method. Future directions include enhancing interpretability and integrating synthetic parallel data to further reduce data creation costs and improve evaluation quality.