Naoaki Okazaki

Professor, PhD

Department of Computer Science, School of Computing, Tokyo Institute of Technology

Biography

  • Professor, Okazaki Lab, School of Computing, Tokyo Institute of Technology (2017)
  • Associate Professor, Inui-Okazaki Lab, Graduate School of Information Sciences, Tohoku University (2011)
  • Researcher, Tsujii Lab, Graduate School of Interdisciplinary Information Studies, University of Tokyo (2009)
  • Researcher, Graduate School of Information Science and Technology, University of Tokyo (2007)
  • Research fellow, National Centre for Text Mining, University of Manchester (2005)
  • PhD, Graduate School of Information Science and Technology, University of Tokyo (2007)
  • MSc, Graduate School of Information Science and Technology, University of Tokyo (2003)
  • BSc, School of Engineering, University of Tokyo (2001)
  • Tochigi Prefectural Utsunomiya High School (1997)

Courses

  • Foundations of Computer Science I, II (V5.6) (2017-)
  • Information Communication Theory (2011-2016) (past at Tohoku University)
  • Programming Practice A (2011-2017) (past at Tohoku University)
  • Experiments in Electrical, Communication, Electronic, and Information Engineering C (2013-2016) (past at Tohoku University)
  • Basic Computer Science (2013-2016) (past at Tohoku University)

Software

CRFsuite

CRFsuite is an implementation of Conditional Random Fields (CRFs) for labeling sequential data. Its first priority is to train and use CRF models as fast as possible, even at the expense of memory usage and code generality. CRFsuite runs 5.4 to 61.8 times faster than C++ implementations for training.

libLBFGS

libLBFGS is a C port of the implementation of the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method written by Jorge Nocedal in FORTRAN. Unlike C code generated automatically by f2c (a Fortran 77 to C converter), this port includes changes based on my interpretations, improvements, optimizations, and clean-ups so that the ported code is well suited to C. It also provides enhancements such as a callback interface, thread safety, and SSE/SSE2 optimization.
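As an illustration of the method itself, the following is a minimal Python sketch of L-BFGS. It is not libLBFGS code and does not use its API: the function names (`lbfgs_direction`, `minimize`), the history size `m`, and the simple Armijo backtracking line search are all assumptions made for this example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lbfgs_direction(grad, s_hist, y_hist):
    """Two-loop recursion: approximate -H^{-1} * grad from recent (s, y) pairs."""
    q = list(grad)
    alphas = []
    # First loop: newest pair to oldest pair.
    for s, y in reversed(list(zip(s_hist, y_hist))):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        alphas.append((alpha, rho))
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
    # Scale by an initial Hessian estimate taken from the newest pair.
    if s_hist:
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
        q = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest pair.
    for (s, y), (alpha, rho) in zip(zip(s_hist, y_hist), reversed(alphas)):
        beta = rho * dot(y, q)
        q = [qi + (alpha - beta) * si for qi, si in zip(q, s)]
    return [-qi for qi in q]

def minimize(f, grad_f, x, m=5, max_iter=100, tol=1e-8):
    """Minimize f from the starting point x, keeping at most m (s, y) pairs."""
    s_hist, y_hist = [], []
    g = grad_f(x)
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        d = lbfgs_direction(g, s_hist, y_hist)
        # Backtracking line search (Armijo condition only; a simplification).
        step, fx, slope = 1.0, f(x), dot(g, d)
        while f([xi + step * di for xi, di in zip(x, d)]) > fx + 1e-4 * step * slope:
            step *= 0.5
        x_new = [xi + step * di for xi, di in zip(x, d)]
        g_new = grad_f(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if dot(s, y) > 1e-12:  # keep the pair only if the curvature is positive
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > m:
                s_hist.pop(0)
                y_hist.pop(0)
        x, g = x_new, g_new
    return x

# Usage: minimize a simple convex quadratic with its minimum at (1, -2).
f = lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2
grad_f = lambda x: [2 * (x[0] - 1), 20 * (x[1] + 2)]
solution = minimize(f, grad_f, [0.0, 0.0])
```

The two-loop recursion only needs the last `m` pairs of position and gradient differences, which is why the method avoids storing a full Hessian approximation.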

SimString

SimString is an implementation of a simple and efficient algorithm for approximate string matching. Approximate string matching is the operation of retrieving the strings in a string collection (database) whose similarity with a query string is no smaller than a threshold. Because it finds similar strings as well as identical ones, SimString supports various applications, including spelling correction, fuzzy search, approximate dictionary matching, duplicate record detection, and database merging.
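The operation defined above can be sketched in Python as a brute-force scan. This is only an illustration of the definition, not SimString's algorithm or API: the choice of character trigrams, cosine similarity, and the names `ngrams`, `cosine`, and `retrieve` are assumptions for this example.

```python
from math import sqrt

def ngrams(s, n=3):
    """Return the set of character n-grams of s, padded with boundary spaces."""
    padded = " " * (n - 1) + s + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def cosine(a, b):
    """Cosine similarity between two n-gram sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def retrieve(database, query, threshold):
    """Return the strings whose similarity with query is no smaller than threshold."""
    q = ngrams(query)
    return [s for s in database if cosine(ngrams(s), q) >= threshold]

database = ["spelling", "speling", "database"]
matches = retrieve(database, "spelling", 0.8)
```

A linear scan like this compares the query against every entry, so it only suits small collections; an efficient implementation avoids examining most of the database.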

Classias

Classias is a collection of machine-learning algorithms for classification. Currently, this software supports the following formalizations: L1/L2-regularized logistic regression (a.k.a. Maximum Entropy); L1/L2-regularized L1-loss linear-kernel Support Vector Machine (SVM); and averaged perceptron. It implements several algorithms for training classifiers: averaged perceptron, L-BFGS, OWL-QN, Pegasos, and Truncated Gradient.
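To illustrate one of the training algorithms listed above, here is a minimal Python sketch of a binary averaged perceptron. It is not Classias code and does not reflect its data format or interface; the function names, the dense feature lists, and the labels in {-1, +1} are assumptions for this example.

```python
def train_averaged_perceptron(data, n_features, epochs=10):
    """Train on a list of (features, label) pairs with labels in {-1, +1}."""
    w = [0.0] * n_features      # current weight vector
    total = [0.0] * n_features  # running sum of weight vectors
    count = 0
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + y * xi for wi, xi in zip(w, x)]
            total = [ti + wi for ti, wi in zip(total, w)]
            count += 1
    # The averaged weights are less sensitive to the last few updates.
    return [ti / count for ti in total]

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# Usage: the first feature is a constant bias term.
data = [([1.0, 2.0], 1), ([1.0, 1.0], 1), ([1.0, -1.0], -1), ([1.0, -2.0], -1)]
weights = train_averaged_perceptron(data, n_features=2)
```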

C++ implementation of Constant Database (CDB++)

C++ implementation of Constant Database (CDB++) is a lightweight library for static hash databases. By including a single header file (cdbpp.h), one can build a compact hash database and search its entries very quickly. However, CDB++ does not support dynamic updates or deletion of elements in an existing database. CDB++ is suitable for implementing a database in which fast look-ups of keys and their values are essential while database updates rarely occur.

Static Double Array Trie (DASTrie)

Static Double Array Trie (DASTrie) is a C++ template library for static double-array tries. For simplicity and efficiency, DASTrie focuses on building a static double array from a list of records sorted in dictionary order of their keys. One can implement associative arrays (e.g., std::map) with arbitrary value types and/or sets (e.g., std::set) just by including a header file. DASTrie implements double arrays in which each element is 4 or 5 bytes long, whereas most implementations consume 8 bytes per double-array element.
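The idea of building a static double array from sorted keys can be sketched in Python as follows. This is a simplified illustration, not DASTrie's implementation: it uses plain `base` and `check` lists rather than the compact 4- or 5-byte elements, and the names `build` and `lookup`, the terminator code 0, and the convention of storing the record index in a leaf as a negative `base` value are assumptions for this example.

```python
def encode(key):
    """Map a key to transition codes; code 0 marks the end of the key."""
    return [b + 1 for b in key.encode()] + [0]

def build(keys):
    """Build base/check arrays from unique keys sorted in dictionary order."""
    seqs = [encode(k) for k in keys]
    assert seqs == sorted(seqs), "keys must be sorted"
    base, check = [0], [-1]  # state 0 is the root; check == -1 means free

    def ensure(n):
        while len(base) <= n:
            base.append(0)
            check.append(-1)

    def insert(state, lo, hi, depth):
        # Group the records lo..hi-1 by their code at this depth.
        groups, i = [], lo
        while i < hi:
            c, j = seqs[i][depth], i
            while j < hi and seqs[j][depth] == c:
                j += 1
            groups.append((c, i, j))
            i = j
        # Find the smallest base value whose child slots are all free.
        b = 1
        while True:
            ensure(b + groups[-1][0])  # the last group has the largest code
            if all(check[b + c] == -1 for c, _, _ in groups):
                break
            b += 1
        base[state] = b
        for c, _, _ in groups:
            check[b + c] = state
        for c, i, j in groups:
            if c == 0:
                base[b] = -(i + 1)  # leaf: store the record index
            else:
                insert(b + c, i, j, depth + 1)

    insert(0, 0, len(keys), 0)
    return base, check

def lookup(base, check, key):
    """Return the index of key in the sorted list, or None if it is absent."""
    state = 0
    for c in encode(key):
        t = base[state] + c
        if t >= len(check) or check[t] != state:
            return None
        state = t
    return -base[state] - 1

keys = ["car", "card", "care", "cat", "dog"]
base, check = build(keys)
```

Each transition is one addition and one comparison (`check[base[s] + c] == s`), which is what makes look-ups in a double array fast.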

Activities (International)

Executives

  • Member-at-Large (MAL), Asian Federation of Natural Language Processing (AFNLP), 2017-2018.

Editor for international journals

  • Editorial board, Computational Intelligence, January 2015 to December 2017.
  • Standing reviewer team, Transactions of the Association for Computational Linguistics, November 2014 to June 2018.

Reviewer for international journals

  • AI Communications (2014)
  • American Society for Information Science and Technology (2009)
  • Applied Clinical Informatics (2014)
  • Bioinformatics (2016, 2017)
  • BMC Bioinformatics (2010)
  • Cheminformatics (2014)
  • Computational Intelligence (2011, 2012, 2013)
  • Computers in Industry (2015)
  • Data and Knowledge Engineering (2016)
  • IEICE Transactions on Information and Systems (2010, 2012, 2016)
  • IEEE Transactions on Neural Networks and Learning Systems (2016)
  • Information Processing (2015)
  • Information Processing and Management (2011)
  • Information Sciences (2011)
  • Journal of Cheminformatics (2014)
  • Language Resources and Evaluation (2012)
  • Machine Learning Research (2009, 2012, 2015, 2016)
  • Transactions of the Association for Computational Linguistics (2014, 2015, 2016, 2017)
  • Transactions on Knowledge and Data Engineering (2012)
  • Transactions on Management Information Systems (2013)
  • Journal of Medical Internet Research (2017)

International conferences

  • General co-chair, Young Researchers Symposium on Natural Language Processing 2016 (YRSNLP 2016)
  • Area co-chair, ACL 2012 (for Lexical Semantics)
  • Area co-chair, ACL 2016 (for Machine Learning)
  • Workshop co-chair, IJCNLP 2013
  • Publication chair, EMNLP-CoNLL 2012
  • Program committee, AAAI 2011, 2014, 2015, 2017
  • Program committee, ACL 2009, 2010, 2013, 2015, 2016, 2017
  • Program committee, BigComp 2015, 2016
  • Program committee, BioNLP 2011, 2013, 2015, 2016, 2017
  • Program committee, BioTxtM 2012, 2014, 2016
  • Program committee, Coling 2008, 2010, 2012, 2014, 2016
  • Program committee, CoNLL 2014, 2015
  • Program committee, DTMBIO 2012
  • Program committee, EACL 2012, 2014, 2017
  • Program committee, EDB 2016
  • Program committee, EMNLP 2010, 2012, 2013, 2014, 2015, 2016, 2017
  • Program committee, IJCAI 2011, 2016
  • Program committee, IJCNLP 2011, 2017
  • Program committee, KIKE 2016
  • Program committee, NAACL 2016
  • Program committee, SMBM 2010, 2012
  • Program committee, W-NUT 2016, 2017

Activities (Domestic)

Executives

  • General co-chair, Young Researcher Association for NLP Studies (YANS), 2015 - 2017
  • Publicity co-chair, Tohoku Branch, Information Processing Society of Japan (IPSJ), FY 2012 - 2013

Editor for journals

  • Editor, IPSJ Transactions on Databases (TOD), FY 2015 - 2016
  • Editor, Journal of the Japanese Society for Artificial Intelligence, June 2013 - May 2017
  • Editor, Journal of Natural Language Processing, Oct 2012 - Sep 2014
  • Student editor, Journal of the Japanese Society for Artificial Intelligence, FY 2005 - 2007

Reviewer for journals

  • Journal of Natural Language Processing (2011, 2012, 2013, 2014, 2016, 2017)
  • IPSJ Journal (2008, 2011, 2012, 2013)
  • IPSJ Transactions on Databases (TOD) (2008, 2010, 2011, 2012)
  • Journal of Digital Practices (2012)
  • Transactions of the Japanese Society for Artificial Intelligence (2008, 2009, 2010, 2012, 2013, 2014, 2016)
  • IEICE Transactions on Information and Systems (2010, 2015)

Domestic conferences

  • Board member, IPSJ SIG of Natural Language Processing, FY 2014 - 2017
  • Program committee, the 23rd Annual Meeting of The Association for Natural Language Processing, 2017
  • Local organizing committee, the 22nd Annual Meeting of The Association for Natural Language Processing, 2016
  • IPSJ SIG committee, the 13th Forum on Information Technology (FIT), 2014
  • Local organizing committee, the 75th National Convention of IPSJ, 2013
  • Local organizing committee, the 16th Annual Meeting of The Association for Natural Language Processing, 2010
  • Program committee, the 4th Symposium of Young Researcher Association for NLP Studies, 2009
  • Program committee, the 3rd Symposium of Young Researcher Association for NLP Studies, 2008