CICS Doctoral Student Kalpesh Krishna Awarded Google PhD Fellowship
Robert and Donna Manning College of Information and Computer Sciences (CICS) doctoral student Kalpesh Krishna has been announced as a 2021 Google PhD Fellowship recipient in recognition of his research work in the field of natural language processing (NLP).
Krishna’s proposal, “Towards Real-World Deployment of Text Generation,” presents his research towards making text generation AI technologies more accurate, secure, and robust in real-world deployments.
Text generation technologies, found in a number of online applications, can generate fluent text with minimal grammatical errors. As an example, users of Google Docs are familiar with the autocomplete function, which can complete a word or even a simple sentence after the user types just two or three letters.
However, as Krishna explains, current text generators still have several fundamental limitations when deployed in the real world. First, it is difficult to change the style of text (such as tone, level of formality, or type of emotion) without also changing its essential meaning. Second, text generators are vulnerable to hallucination (making up false facts) and factual inaccuracies. Third, these generators can have significant security vulnerabilities.
Style transfer of text is a fundamental task in NLP, providing the ability to automatically generate text that adapts to different situations and purposes. For instance, when a user chats with friends, they may want an autocomplete tool that can generate text with a bit of humor and some abbreviations, but if they are speaking to a supervisor or professor, they may want suggestions that are more formal and polite. While systems exist to perform this task, many of them tend to accidentally alter the semantic properties (the content and meaning) of the text.
With John Wieting of Google Research and Assistant Professor Mohit Iyyer of CICS, Krishna improved existing style transfer systems by using a simple two-level paraphrasing algorithm: first normalizing text by removing its stylistic properties, and then rewriting the plain text in the desired style. With this algorithm, the models are able to learn and use the distinct features of the style while keeping the semantics the same. This work was published at EMNLP 2020.
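The two-step idea described above can be sketched as a small pipeline. This is only an illustration, assuming toy word-lookup tables in place of the learned paraphrase models; the names `normalize`, `stylize_formal`, and `transfer_style` are hypothetical and do not come from the published system.

```python
from typing import Callable, Dict

# Hypothetical toy tables. In the work described, both steps are learned
# paraphrase models, not dictionaries.
INFORMAL_TO_PLAIN: Dict[str, str] = {"thx": "thanks", "u": "you", "r": "are"}
PLAIN_TO_FORMAL: Dict[str, str] = {"thanks": "thank you", "hi": "hello"}


def replace_words(text: str, table: Dict[str, str]) -> str:
    """Lowercase the text and swap each word that appears in the table."""
    return " ".join(table.get(word, word) for word in text.lower().split())


def normalize(text: str) -> str:
    """Step 1: strip stylistic markers, leaving plain text."""
    return replace_words(text, INFORMAL_TO_PLAIN)


def stylize_formal(text: str) -> str:
    """Step 2: rewrite plain text in the target (formal) style."""
    return replace_words(text, PLAIN_TO_FORMAL)


def transfer_style(
    text: str,
    normalizer: Callable[[str], str],
    stylizer: Callable[[str], str],
) -> str:
    """Chain the two steps: normalize first, then apply the target style."""
    return stylizer(normalizer(text))
```

For example, `transfer_style("hi thx for the help", normalize, stylize_formal)` first produces the plain form "hi thanks for the help" and then the formal form "hello thank you for the help". Separating the steps means the stylizer only ever sees style-neutral input, which is the intuition behind keeping the meaning fixed.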
According to Krishna, hallucination and factual inaccuracies, which are fundamental shortcomings of AI systems deployed in the real world, can be addressed by teaching text generators to use information retrieval. “It's the same as how you or I would look up facts in external sources like books and search engines,” he explains. “Similarly, the models may need to get input from sources of relevant information to expand their real-world knowledge.” Krishna recently explored this topic in a research paper on long-form question answering with Iyyer and Aurko Roy from Google Research, which was published at NAACL 2021.
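As a rough illustration of the retrieval idea, the sketch below looks up the most relevant passage for a question and places it in front of the question before it would be handed to a text generator. It assumes a simple word-overlap score as the retriever; the functions `retrieve` and `build_prompt` are hypothetical and are not the method used in the paper.

```python
from typing import List


def retrieve(question: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents sharing the most words with the question."""
    question_words = set(question.lower().split())

    def overlap(document: str) -> int:
        return len(question_words & set(document.lower().split()))

    return sorted(documents, key=overlap, reverse=True)[:k]


def build_prompt(question: str, documents: List[str], k: int = 1) -> str:
    """Prepend the retrieved passages so a generator can ground its answer."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

A real system would replace the word-overlap score with a learned retriever and pass the prompt to a generation model; the sketch only shows where the external information enters.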
Finally, Krishna plans to remedy the lack of current research on NLP privacy vulnerabilities, specifically for systems that have publicly accessible machine learning inference APIs. According to Krishna, these models are vulnerable to model extraction attacks (ICLR 2020 paper), in which an attacker copies a model by querying its API, stealing its intellectual property. Private information contained in the original model’s training data could then be extracted from the reconstructed model. He proposes to design better defenses against these attacks by applying diverse paraphrasing to the outputs of the API, which could make it more difficult to extract specific answers.
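The proposed defense can be pictured as a thin wrapper around the API that varies the wording of each response. The sketch below is an assumption-laden toy: `paraphrase_candidates` uses fixed templates where a real defense would use a paraphrasing model, and `guarded_api` is a hypothetical name.

```python
import random
from typing import Callable, List


def paraphrase_candidates(answer: str) -> List[str]:
    """Hypothetical stand-in for a paraphrase model: fixed rewordings."""
    return [answer, f"The answer is {answer}.", f"It is {answer}."]


def guarded_api(
    model: Callable[[str], str], query: str, rng: random.Random
) -> str:
    """Answer the query, then return a randomly chosen rewording of it."""
    answer = model(query)
    return rng.choice(paraphrase_candidates(answer))
```

Because repeated queries may come back with different surface forms, an attacker collecting API outputs sees a noisier signal than the model's exact answers, which is the effect the proposal aims for.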
Krishna’s research interests lie at the intersection of NLP, text generation, and machine learning security and privacy. He is currently working on problems in text generation under the supervision of Assistant Professor Mohit Iyyer in the UMass Natural Language Processing lab.
The Google PhD Fellowship Program is designed to recognize outstanding graduate students doing exceptional work in computer science, related disciplines, or promising research areas.