User Tools

Site Tools


projects:year7:7a.029.tut

7a.029.TUT - Very Fast Nearest Neighbor Retrieval in High-Dimensional Domains

Project - Summary

Nearest neighbor search is a core part of several machine learning algorithms. It finds applications in areas such as information retrieval, clustering and visualization. In many of the problem domains, the number of samples and the dimensionality of data render the exact neighbor search infeasible. This project investigates variants of approximate nearest neighbor (ANN) algorithms based on random projection trees and their possible applications to text, audio, image and other signal data.

Project - Team

Team Member Role Email Phone Number Academic Site/IAB
Teemu Roos PI teemu.roos@cs.helsinki.fi Not available University of Helsinki
Petri Myllymäki Co-PI petri.myllymaki@helsinki.fi +358 40 5531162 University of Helsinki
Jukka Kohonen Researcher jukka.kohonen@cs.helsinki.fi Not available University of Helsinki
Ville Hyvonen Student ville.hyvonen@cs.helsinki.fi Not available University of Helsinki
Janne Leppa-Aho Student Not Available Not available University of Helsinki
Gur Ersalan Student Not Available Not available University of Helsinki
Kimmo Valtonen Project Mentor kimmo.valtonen@m-brain.com +358 45 1218075 M-Brain

Project - Novelty of Approach

We use multiple random projection trees, which allows for a trade-off between the computational cost and the probability of retrieving the closest neighbor or neighbors

Project - Deliverables

Deliverables
1 An open-source package for parallelized ANN.
2 Concrete applications of ANN in large-scale document classification, clustering, etc.
3 Theoretical framework for the study of parallel ANN and other machine learning procedures.
4 Optional deliverable: Incremental algorithms for dynamic (streaming) data.

Project - Benefits to IAB

The research conducted under this project is featured in published papers [2] and [3]. The research regarding the automatic tuning is incorporated to an open-source package for ANN search available on GitHub: https://github.com/vioshyvo/mrpt

[2] U. Sheth, S. Dutta, M. Chaudhari, H. Jeong, Y. Yang, J. Kohonen, T. Roos, and P. Grover (2018). An application of storage-optimal MatDot codes for coded matrix multiplication: Fast k-nearest neighbors estimation, IEEE Big Data Conference, Seattle, Dec. 10–13, 2018.

[3] E. Jääsaari, V. Hyvönen, T. Roos (2019). Efficient autotuning of hyperparameters in approximate nearest neighbor search, in Proc. 23rd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-2019), Macau, China.

Project - Presentation Video

Project - Documents

FilenameFilesizeLast modified
7a.029.tut_ip_info_sheet.docx18.7 KiB2019/08/19 12:47
7a.029.tut_final_report.docx576.9 KiB2019/08/19 12:43
7a.029.tut_quad_chart_2018_spring_meeting.pdf4.5 MiB2019/08/13 15:08
7a.029.tut_very_fast_nearest_neighbor_retrieval_quad_2017_fall_meeting.pdf4.4 MiB2019/08/13 15:08
7a.029.tut_confluence_project_page.pdf148.0 KiB2019/08/13 15:08
7a.029.tut_2018_fall_meeting_poster.pptx263.7 KiB2019/08/13 15:08
7a.029.tut_mid-year_report.pdf313.8 KiB2019/08/13 15:08
7a.029.tut_executive_summary.pdf106.8 KiB2019/08/13 15:08
projects/year7/7a.029.tut.txt · Last modified: 2021/06/02 15:21 by sally.johnson