I'm an Assistant Professor in the Computer Science Department of the City University of Hong Kong. My research focuses on designing new distributed and parallel algorithms, the distributed processing of big data, achieving fault tolerance in communication networks against adversarial attacks, and developing robust protocols that work in highly dynamic environments such as peer-to-peer blockchain networks and mobile ad-hoc networks.
My research has been supported by the General Research Fund (Hong Kong), the Natural Sciences and Engineering Research Council (Canada), IBM Research, and the London Mathematical Society.
Tags: Asynchrony, Big Data, Byzantine Failures, Churn, Communication Complexity, Distributed Agreement, Distributed Storage, Dynamic Network, Fault-Tolerance, Gossip Communication, Graph Algorithm, Haskell, Information Complexity, Leader Election, Machine Learning, Mobile Ad-Hoc Network, Natural Language Processing, P2P, Secure Computation in Networks, Self-Healing, Symmetry Breaking, Wireless Networks
- Log File Processing by Machine Learning and Information Extraction
Peter Robinson. Master's Thesis. TU Vienna, Institute of Computer Languages, 2006. Nominated for Distinguished Young Alumnus Award.
Abstract: In today's computer network systems, large numbers of events are constantly written to log files. Unfortunately, there is no common standard defining the structure of these event messages, which are partly in human-readable natural language form. This lack of structure makes automatic processing considerably more difficult. This master's thesis describes the architecture and implementation of the LoP-System, a system that attempts to create machine-readable event structures from ordinary log file events by natural language processing. The thesis explains implementation details as well as the theoretical concepts used. The core of the system consists of a series of cascaded but independent components, partly enhanced with machine learning techniques. The raw input is first processed by a simple recursive descent parser, which recognizes syntactical features (e.g. IP addresses), and is then passed on to a part-of-speech tagger based on a hidden Markov model. Regular expression patterns are then applied to the tagged words to combine them into basic word groups (e.g. noun groups), which are subsequently semantically analyzed. The final step is the construction of the output events by a rule-based event constructor. All components are implemented in Haskell, a purely functional programming language. Some of the components developed during this thesis, especially the part-of-speech tagger, are general natural language processing tools and can be applied to other domains.
- concurrent hash table: a thread-safe hash table that scales to multicore machines.
- data dispersal: an implementation of an (m,n)-threshold information dispersal scheme that is space-optimal.
- secret sharing: an implementation of a secret sharing scheme that provides information-theoretic security.
- tskiplist: a data structure with range-query support for software transactional memory.
- stm-io-hooks: An extension of Haskell's Software Transactional Memory (STM) monad with commit and retry IO hooks.
- Mathgenealogy: Visualize your (academic) genealogy! A program for extracting data from the Mathematics Genealogy project.
- I extended Haskell's Cabal to use a "world" file for keeping track of installed packages. (Now part of the main distribution.)
- Computer Networks, Fall 2020, 2019.
- Database Systems, Spring 2020.
- Distributed Computing, Spring 2019.
- Randomized Algorithms, Fall 2018: Intro slides. Part 1 on Concentration Bounds.
- Advanced Distributed Systems, Fall 2016, 2017.
- Computation with Data, Fall 2016.
- Internet and Web Technologies, Spring 2016.