pyenv, virtualenv and using them with Jupyter
The projects you work on often depend on different versions of Python. This is where pyenv comes in handy for Python developers, as it lets you switch between Python versions easily. With pyenv-virtualenv, it can also be used together with virtualenv to create isolated development environments for differen...
BERT - Tokenization and Encoding
To use a pre-trained BERT model, we need to convert the input data into an appropriate format so that each sentence can be sent to the pre-trained model to obtain the corresponding embedding. This article introduces how this can be done using modules and functions available in Hugging Face’s transformers package (https://huggingface.co/transform...
Implementing Trie in Python
What is a Trie?
Trie is a very useful data structure. It is commonly used to represent a dictionary for looking up words in a vocabulary.
For example, consider the task of implementing a search bar with auto-completion or query suggestion. When the user enters a query, the search bar will automatically suggest common queries starting with the...
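As a rough illustration of the auto-completion idea, here is a minimal sketch of a trie with prefix lookup. The class and method names (`TrieNode`, `Trie`, `insert`, `starts_with`) are chosen for this example and are not necessarily those used in the full post.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Add a word to the trie, creating nodes as needed."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def starts_with(self, prefix):
        """Return all stored words that begin with prefix, sorted."""
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every complete word in the subtree below that node.
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


trie = Trie()
for query in ["what", "where", "when", "who"]:
    trie.insert(query)

suggestions = trie.starts_with("whe")
```

A real search bar would probably rank suggestions by popularity rather than alphabetically; sorting is used here only to keep the output deterministic.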
Creating a Website with Vue.js backed by Google Sheets
It has been quite a long time since I last developed a website. The last one I made was probably the website that I designed and developed for my brother (http://www.simusic.hk/) many years ago. At that time, the website was developed using PHP, jQuery and Bootstrap. Since then, I have been mainly working in the areas of machine learning, data m...
Displaying CJK Characters in Matplotlib Plots
Matplotlib by default does not support displaying Unicode characters such as Chinese, Japanese and Korean characters. This post introduces two different methods to allow these characters to be shown in the graphs.
The issue here is that we need to configure Matplotlib to use fonts that support the characters that we want to display. To configur...
Talk on Deploying ML Models in Python @ PyCon HK 2018
PyCon HK 2018 was held on 23rd-24th November 2018 at Cyberport. I gave a talk on how to deploy machine learning models in Python. The slides of the talk can be found at the link: http://talks.albertauyeung.com/pycon2018-deploy-ml-models/. The video of the talk can be found on YouTube at https://www.youtube.com/watch?v=U2YSFWDjfM.
Generating N-grams from Sentences in Python
N-grams are contiguous sequences of n items in a sentence. N can be 1, 2 or any other positive integer, although usually we do not consider very large N because those n-grams rarely appear in many different places.
When performing machine learning tasks related to natural language processing, we usually need to generate n-grams from input sen...
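To make the definition concrete, the following is a minimal sketch that generates word-level n-grams from a sentence. It assumes simple lowercasing and whitespace tokenization, and the function name `generate_ngrams` is chosen for this example; the full post may tokenize differently.

```python
def generate_ngrams(sentence, n):
    """Return the word-level n-grams of a sentence as space-joined strings."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Assumption: lowercase and split on whitespace; punctuation is not handled.
    tokens = sentence.lower().split()
    # Slide a window of n tokens across the sentence.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bigrams = generate_ngrams("Natural language processing is fun", 2)
```

If the sentence has fewer than n tokens, the function returns an empty list.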
Talk on Using Gradient Boosting Machines in Python @ PyCon HK 2017
PyCon HK 2017 was held on 3rd-4th November 2017 at the City University of Hong Kong. I gave a talk on using gradient boosting machines in Python to perform machine learning. The slides of the talk can be found at the link: http://talks.albertauyeung.com/pycon2017-gradient-boosting/. The video of the talk can be found on YouTube at https://www.yo...
Deep Learning and Its Applications - Research Seminar at HSMC
I gave a talk on deep learning and its applications in a research seminar at the Deep Learning Research & Application Centre (DLC), Hang Seng Management College on 20th July, 2017. The slides of the talk can be found here: http://talks.albertauyeung.com/deep-learning
Making pandas Operations Faster
pandas is one of the most commonly used Python libraries in data analysis and machine learning. It is versatile and can be used to handle many different types of data. Before feeding a model with training data, one would most probably pre-process the data and perform feature extraction on data stored in a pandas DataFrame. I have been using pandas e...
Performing Sequence Labelling using CRF in Python
Sequence Labelling in NLP
In natural language processing, it is a common task to extract words or phrases of particular types from a given sentence or paragraph. For example, when performing analysis of a corpus of news articles, we may want to know which countries are mentioned in the articles, and how many articles are related to each of thes...
Matrix Factorization: A Simple Tutorial and Implementation in Python
There is probably no need to say that there is too much information on the Web nowadays. Search engines help us a little bit. What is better is to have something interesting recommended to us automatically without asking. Indeed, from as simple as a list of the most popular questions and answers on Quora to some more personalized recommendations...
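Einstein and Bern
I.
The year 1905 was a year full of breakthroughs in physics. Within that single short year, Albert Einstein published five papers on photoelectric physics, molecular motion and relativity. People call it the "miracle year" (Annus Mirabilis) of physics, or of Einstein. During this period, when he pursued his physics research outside working hours, Einstein lived in Bern, Switzerland. Bern became the place where Einstein made his name, and the name of the city has been inseparably linked ever since with that of the greatest scientist of the twentieth century.
II.
Switzerland is, in most people's minds, a tourist destination and a fine place for Europeans to enjoy skiing. Geneva is home to the headquarters of many international institutions and organizations, while Zürich is one of Europe's important financial centres. Yet Bern, the capital, is surprisingly low-key. Those who have never been to Switzerland, or who are not very familiar with it, probably...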
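A Visit to Château de Chillon (石庸古堡)
Taking the train from Lausanne to Montreux and looking along the shore of Lake Geneva (Lac Léman), one can see an enormous castle standing at the water's edge. This castle is quite famous. Its ancient, solemn architecture and the beautiful scenery around it naturally attract many visitors, and it also owes much of its fame to a poem by the English poet Lord Byron. This castle is Château de Chillon.
The castle's Chinese name does not seem to have been standardized. The booklet the ticket clerk gave me calls it "石庸古堡", but the books sold in the small shop outside the castle call it "西庸古堡", and I have seen quite a few other renderings online, such as "詩庸" and "希隆". Compared with the French name, none of them really differs very much. In any case, the booklet's translation can be considered the more "official" one, so I will use "石庸古堡" for now.
To get to 石庸古堡, take the train to Veyt...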