
Large organizations such as banks, government departments, universities, and corporations need to handle massive databases of postal addresses. These databases are often poorly structured and frequently accumulate several duplicate entries for the same person. Such organizations therefore periodically carry out a data cleaning or warehousing exercise in which addresses are stored in a standard format and duplicates are removed. A key step in this process is address segmentation: extracting individual structured fields such as 'Landmark', 'House number', and 'State' from free-form address strings. In the less structured Indian addressing system, existing commercial approaches require extensive manual effort for several reasons: non-uniform building numbering schemes, reliance on ad hoc descriptive landmarks, changing city names, non-standard abbreviations of state names, varying styles of writing addresses, spelling mistakes, and optional postal codes.
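To illustrate why segmentation comes before duplicate removal, the following minimal sketch compares addresses field by field once they have been segmented. The field names, the abbreviation table, and the sample records are hypothetical and are not taken from the tool described here; a real system would need far richer normalization and approximate matching.

```python
import re

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"mah": "maharashtra", "rd": "road", "nr": "near"}

def normalize(value):
    """Lowercase, replace punctuation with spaces, expand known abbreviations."""
    words = re.sub(r"[^\w\s]", " ", value.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)

def deduplicate(records):
    """Keep the first record seen for each normalized field signature."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted((field, normalize(value))
                           for field, value in record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Already-segmented addresses; the first two describe the same place.
records = [
    {"house_number": "12-B", "landmark": "Nr. City Hospital",
     "city": "Pune", "state": "Mah."},
    {"house_number": "12 b", "landmark": "near city hospital",
     "city": "PUNE", "state": "Maharashtra"},
    {"house_number": "7", "landmark": "opp. bus stand",
     "city": "Nagpur", "state": "Maharashtra"},
]
unique_records = deduplicate(records)
print(len(unique_records), "unique addresses")
```

Comparing the raw strings would treat the first two records as different; comparing normalized fields lets simple variations in case, punctuation, and abbreviation collapse into one entry.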

Prof. Sunita Sarawagi and her team at the Kanwal Rekhi School of Information Technology (KReSIT) have developed a software tool that 'learns' a model for segmenting unseen addresses when 'trained' on examples of already segmented addresses. The underlying model is a statistical machine-learning technique that handles new data robustly, is computationally efficient, and is easy for humans to interpret and adjust when segmentation errors need correcting. Experiments on nationwide, heterogeneous collections of actual addresses showed encouraging results, with high levels of accuracy. The software has been licensed to a data cleaning company in India and is being deployed commercially.
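The idea of learning a segmenter from labelled examples can be sketched as follows. The article does not name the model used, so this sketch assumes a small hidden Markov model with add-one smoothing and Viterbi decoding as a stand-in; it is not the team's implementation. The field labels, the `token_feature` helper, and the training addresses are all hypothetical.

```python
import math
from collections import defaultdict

def token_feature(token):
    """Map digit strings to a length-based symbol so unseen numbers generalize."""
    t = token.lower().strip(".,")
    return "<num%d>" % len(t) if t.isdigit() else t

def train(examples):
    """Count start, transition, and emission events in labelled addresses."""
    start = defaultdict(int)
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for example in examples:
        prev = None
        for token, label in example:
            feat = token_feature(token)
            vocab.add(feat)
            emit[label][feat] += 1
            if prev is None:
                start[label] += 1
            else:
                trans[prev][label] += 1
            prev = label
    # One extra vocabulary slot is reserved for tokens never seen in training.
    return start, trans, emit, sorted(emit), len(vocab) + 1

def log_prob(count, total, size):
    """Add-one smoothed log probability."""
    return math.log((count + 1) / (total + size))

def segment(tokens, model):
    """Return (token, label) pairs for the most likely label sequence (Viterbi)."""
    if not tokens:
        return []
    start, trans, emit, labels, vsize = model
    n = len(labels)
    feats = [token_feature(t) for t in tokens]

    def emission(label, feat):
        counts = emit[label]
        return log_prob(counts.get(feat, 0), sum(counts.values()), vsize)

    total_start = sum(start.values())
    best = [{l: log_prob(start.get(l, 0), total_start, n) + emission(l, feats[0])
             for l in labels}]
    back = []
    for feat in feats[1:]:
        scores, pointers = {}, {}
        for l in labels:
            cand = {p: best[-1][p]
                    + log_prob(trans[p].get(l, 0), sum(trans[p].values()), n)
                    for p in labels}
            pointers[l] = max(cand, key=cand.get)
            scores[l] = cand[pointers[l]] + emission(l, feat)
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    label = max(best[-1], key=best[-1].get)
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return list(zip(tokens, path))

# Hypothetical training data: each address is a list of (token, field) pairs.
training = [
    [("12", "house_number"), ("MG", "road"), ("Road", "road"),
     ("Pune", "city"), ("411001", "pincode")],
    [("7", "house_number"), ("Station", "road"), ("Road", "road"),
     ("Mumbai", "city"), ("400076", "pincode")],
    [("45", "house_number"), ("Link", "road"), ("Road", "road"),
     ("Nagpur", "city"), ("440010", "pincode")],
]
model = train(training)
result = segment(["221", "Hill", "Road", "Pune", "411038"], model)
for token, label in result:
    print(token, "->", label)
```

In this toy run the house number "221" and the word "Hill" never appear in the training data, yet the learned ordering of fields still places them sensibly. The count tables are also easy to inspect and edit by hand, which is one way a model of this kind can be interpreted and adjusted.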

Contact: Prof. Sunita Sarawagi, sunita@iitb.ac.in
