Techiexpert.com

AI Models to Process Text in 11 Indian Regional Languages

By Srikanth
November 15, 2020
in Tech news

This is a unique attempt in academia to develop and publicly release large-scale multilingual AI models containing millions of parameters, trained on billions of tokens from Indian languages.

Faculty at the Indian Institute of Technology Madras have developed Artificial Intelligence models and datasets to process text in 11 Indian regional languages. The work was taken up jointly with ‘AI4Bharat,’ a platform for building AI solutions for problems of relevance to India.

The researchers from IIT Madras and AI4Bharat released AI models and datasets for the following languages: Tamil, Hindi, Malayalam, Telugu, Kannada, Punjabi, Bengali, Odia, Assamese, Gujarati, and Marathi. The multilingual AI models and datasets developed through this initiative will provide the essential building blocks to students, faculty, start-ups and industry to work on Indian language tools and push the frontiers of technology.

The faculty have released these resources as open source and free of cost, so anyone can access them. The models can be downloaded from a GitHub repository (https://indicnlp.ai4bharat.org/). An accompanying research paper describing the methodology and evaluation has been accepted at EMNLP-Findings (a companion publication of EMNLP, one of the top Natural Language Processing conferences).

Elaborating on this initiative, Dr. Mitesh M. Khapra, Assistant Professor, Department of Computer Science and Engineering, IIT Madras, said, “We have a very rich diversity of languages in our country. As we move towards a digital economy, it is important that our languages find a space online. This requires a lot of innovation in creating input tools, datasets, and AI models for Indian languages.”

For example, imagine a learner who posts a question on an e-learning platform in Tamil, Hindi, or any of the numerous other Indian regional languages. There is a need for tools that can automatically process such questions and classify them into specific topics.

“While such tools are available for English and other foreign languages, there are hardly any tools for Indian languages and this is the critical gap that we are trying to address through this initiative. These models are available free of cost as we want the entire country to benefit from them,” added Dr. Mitesh Khapra.
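The topic-classification scenario described above can be illustrated with a deliberately small sketch. It is not the IIT Madras and AI4Bharat models, which are large pre-trained neural networks. It is a toy stand-in that compares character trigram counts, and the Hindi questions, topic labels, and function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (works on any script)."""
    padded = f" {text.strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(examples):
    """Build one summed n-gram profile per topic."""
    profiles = {}
    for text, topic in examples:
        profiles.setdefault(topic, Counter()).update(char_ngrams(text))
    return profiles


def classify(text, profiles):
    """Return the topic whose profile is most similar to the text."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda topic: cosine(grams, profiles[topic]))


# Hypothetical labelled questions in Hindi (illustrative data only).
EXAMPLES = [
    ("त्रिभुज का क्षेत्रफल कैसे निकालें", "math"),
    ("भिन्न को कैसे जोड़ें", "math"),
    ("भारत की राजधानी क्या है", "geography"),
    ("गंगा नदी कहाँ से निकलती है", "geography"),
]

profiles = train(EXAMPLES)
print(classify("वर्ग का क्षेत्रफल कैसे निकालें", profiles))
```

A real system would replace the trigram counts with representations from a pre-trained language model and would be trained on far more than four questions. The sketch only shows the shape of the task: labelled questions in, a topic label out.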

AI4Bharat is an initiative co-founded by Dr. Mitesh M. Khapra and Dr. Pratyush Kumar of IIT Madras that works to solve India-specific problems in a community-driven, open-source manner. Both Dr. Mitesh Khapra and Dr. Pratyush Kumar are also associated with the Robert Bosch Centre for Data Science and Artificial Intelligence.

Speaking about the technology behind this initiative, Dr. Anoop Kunchukuttan, a volunteer at AI4Bharat and the lead researcher on this project, said, “We have an urgent responsibility to take the rapid advances of AI and make them accessible to the common man. One way of achieving this is to improve interactions between humans and machines. That is where the field of Natural Language Processing (NLP) comes in. NLP is a branch of AI that deals with the interaction between computers and humans using natural language.”

Dr. Pratyush Kumar, Assistant Professor, Department of Computer Science and Engineering, IIT Madras, added, “This initiative is one of the few attempts in academia to develop and publicly release such large-scale multilingual AI models containing millions of parameters trained on billions of tokens from 11 Indian languages, completely free and open-source.”

Over the past year, a team of researchers comprising students, faculty and volunteers from IIT Madras and AI4Bharat collected data and trained powerful models for processing text written in Indian languages. The models take advantage of the similarities between Indian languages to make efficient use of data. With these models, the researchers have advanced the state of the art for Indian language processing on several tasks, including document classification, sentiment analysis, semantic matching and paraphrase detection.

Highlighting the work done on Natural Language Processing, Dr. Pratyush Kumar said, “Modern NLP systems are driven by Deep Learning. A fundamental piece of these systems are language models, which capture meanings of words and sentences and their relations and require a large amount of data to train. The unavailability of such data has prevented the development of such models for Indian languages. As a result, Indian NLP has not been able to progress at the rate at which it should.”

Dr. Anoop Kunchukuttan added, “We really hope that start-ups and social initiatives working on Indian language technologies will be able to take our pre-trained models and adapt them to specific use cases by collecting smaller amounts of in-domain data.”

Mr. Divyanshu Kakwani, a Master’s student at IIT Madras and the Lead Student Researcher on this project, said, “I am happy to make a contribution to this project which has the potential to create impact. I hope our efforts inspire other students to work on Indian language technologies. All our models are publicly available and I am curious to see how others take this forward.”

The research team hopes that this initiative will serve as a ‘call to action’ for academia, government and industry to come together and develop bigger and more diverse datasets for Indian languages. Data drives AI technology, and it is time to make a serious investment in building datasets for Indian languages.

Natural language processing is a subfield of linguistics, computer science, and Artificial Intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.

Tags: AI News, IIT Madras