Even if you take several courses on deep learning, a few certificates won't get you a job; you have to demonstrate your skills somehow. An attractive GitHub profile impresses recruiters and showcases your abilities, so we aim for stars and forks on our repositories as evidence of engineering skill. A project gets attention when it offers something valuable to the community, but how do you create a valuable machine learning project? Quality, useful projects require new ideas and solutions, and finding those is the hard part. In what follows, we'll see how I, and some of the people I follow, try to create something new and useful.
This article shows how to build a solid machine learning project that earns you a job and credit in the ML community. I'll walk through how I created my own GitHub repositories, and I hope it proves helpful.
Finding out what already exists, or what has already been done, is always the first step; afterward, you can add your own value (project) to that huge pool. Here's my first story: since I started ML with computer vision, I became interested in face recognition. I searched for all the libraries related to the subject and found a few, but none of them offered both a user-friendly interface and state-of-the-art models. They were either research implementations (SOTA) or inaccurate packages built for deployment. On top of that, each package provided only one element of face analysis; for example, one package only predicted age and gender, while another only predicted the location of the face in the image. This was an opportunity to fill the gap: I thought that a minimal package providing state-of-the-art models for every face task would catch the community's attention. That was the story behind FaceLib, a package that performs all face analysis tasks in only two lines of code and had the best performance at the time. It was my first project and I made lots of mistakes, so it could be much better, but I learned many things from it and got a job at a face recognition start-up 😃
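FaceLib's actual API may differ from this; purely as a design sketch, here is what a minimal "everything in two lines" face-analysis interface could look like. `FaceAnalyzer`, `Face`, and the dummy detector are all hypothetical, with stubbed predictions standing in for real models:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Face:
    box: tuple                    # (x, y, w, h) bounding box in pixels
    age: Optional[int] = None
    gender: Optional[str] = None

class FaceAnalyzer:
    """A single entry point that bundles every face task behind one call."""

    def detect(self, image) -> List[Face]:
        # A real implementation would run a detector such as MTCNN or RetinaFace;
        # a fixed dummy face keeps this sketch self-contained and runnable.
        return [Face(box=(10, 10, 50, 50))]

    def analyze(self, image) -> List[Face]:
        faces = self.detect(image)
        for face in faces:
            # Real models would predict these attributes from the face crop.
            face.age, face.gender = 30, "unknown"
        return faces

# The "two lines" a user of such a package would actually write:
analyzer = FaceAnalyzer()
faces = analyzer.analyze(image=None)  # pass a real image array in practice
```

The point of the design is that the user never touches the individual models; one object hides detection and every attribute predictor behind a single `analyze` call.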
- How can I find exciting projects in an area? I'd suggest searching Papers With Code for the task you're interested in.
- How do I grasp the big picture of a new field? I'd definitely go with reading survey papers; they are the distillation of an area.
To show you know something, teach it; it's always a good idea. Here's another story from a friend: after its release, OpenAI's CLIP attracted a lot of attention in the field, but there were no good code resources for learning or implementing it. A friend of mine (Moein Shariatinia) implemented a simple, annotated version of CLIP that explains the idea in detail and makes it easy to understand. The project was even cited in a paper about CLIP. Another option is contributing to existing libraries that teach new ideas through implementation; for example, you can write a new example for keras.io/examples or the Hugging Face community examples. You can also draw inspiration from other people's implementations: you've seen StyleGAN create new faces that don't exist, so wouldn't it be nice to generate new mobile phone images? You get the idea, right?
- This idea became quite popular and spawned sites like labml.ai, which provide annotated paper implementations.
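The core idea those annotated implementations walk through is CLIP's symmetric contrastive loss. As a minimal NumPy sketch (not the official implementation; the temperature value here is illustrative), matched image and text embeddings should score higher against each other than against every other pair in the batch:

```python
import numpy as np

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss at the heart of CLIP (NumPy sketch)."""
    # L2-normalize both sets so the dot product becomes cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = image_emb @ text_emb.T / temperature  # (n, n) similarity matrix
    labels = np.arange(len(logits))                # i-th image matches i-th text

    def cross_entropy(logits, labels):
        # Row-wise log-softmax, then pick out the true-label entries.
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    # Average the image-to-text and text-to-image directions.
    return (cross_entropy(logits, labels) + cross_entropy(logits.T, labels)) / 2

rng = np.random.default_rng(0)
loss = clip_loss(rng.normal(size=(8, 32)), rng.normal(size=(8, 32)))
```

Writing the loss out like this, with every shape and index visible, is exactly the kind of annotation that makes such a repository a good teaching resource.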
Let's start with an example: learning Git is hard and time-consuming, so let's build an NLP system that converts natural language into git commands. I'm a mentor in an ML internship and I'm supposed to review the code the interns write; to be honest, it's a rather boring task. So I thought we could build a model that takes code and scores it on readability, documentation, cleanliness, and so on. You could create a system that scores football matches based on the quality of play, or one that predicts the success rate of an interview from a resume. AI is a great tool and many problems remain unsolved; try to find one and solve it. These projects don't have to be the best; just create something fun and useful. For example, you could train a language model that talks exactly like you, based on your social media messages. Sometimes you don't have to invent anything new, just adapt existing solutions to meet your needs. In the next section, you'll find out what I mean.
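To make the natural-language-to-git idea concrete, here is a toy keyword-matching baseline. A real system would fine-tune a seq2seq model on (query, command) pairs; the git commands below are standard, but the mapping rules are purely illustrative:

```python
# Each rule pairs a set of trigger keywords with a git command.
RULES = [
    ({"undo", "last", "commit"}, "git reset --soft HEAD~1"),
    ({"new", "branch"}, "git checkout -b <branch-name>"),
    ({"show", "history"}, "git log --oneline"),
]

def nl_to_git(query: str) -> str:
    words = set(query.lower().split())
    # Pick the rule whose keywords overlap the query the most.
    best = max(RULES, key=lambda rule: len(rule[0] & words))
    keywords, command = best
    return command if keywords & words else "unknown"

print(nl_to_git("how do I undo my last commit"))  # git reset --soft HEAD~1
```

Even a baseline this crude is useful in a repository: it defines the task's input/output contract and gives the learned model something to beat.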
The first thing that interested me when learning NLP was question answering systems. I thought it would be great to have such a system for Persian NLP, but no related Persian dataset existed at the time. It was a good opportunity to collect some data and learn about the data collection process along the way. My teammates and I collected a 10K-entry dataset and trained the first Persian QA model, which was later cited in a related paper. The project also earned me some friends in the Persian NLP community. Open-source datasets are sometimes valuable enough on their own to share with the community. Keep in mind that you don't necessarily have to collect data by hand; automatic labeling and similar ideas can work instead. As an example, I created 400K pairs of Persian captions and images using translation and filtering to build the Persian version of CLIP, and many other practitioners now use the dataset for other multimodal tasks as well. The next section illustrates the general framework for solution adaptation 👇
- Experience: take the task of collecting data seriously, and consult an expert.
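The translate-then-filter recipe mentioned above can be sketched as a quality gate over machine-translated caption pairs. These thresholds are illustrative, not the ones actually used for the Persian caption dataset:

```python
def keep_pair(english: str, translated: str) -> bool:
    """Heuristic filters for a machine-translated caption pair."""
    en_len = len(english.split())
    tr_len = len(translated.split())
    if tr_len < 3:                                   # drop near-empty translations
        return False
    if not 0.5 <= tr_len / max(en_len, 1) <= 2.0:    # drop suspicious length ratios
        return False
    # Drop translations that kept too much untranslated Latin text.
    latin = sum(ch.isascii() and ch.isalpha() for ch in translated)
    return latin / max(len(translated), 1) < 0.3

pairs = [
    ("a cat on a sofa", "گربه‌ای روی مبل"),
    ("a cat on a sofa", "a cat on a sofa"),  # untranslated, should be rejected
]
kept = [p for p in pairs if keep_pair(*p)]
```

Cheap automatic filters like these trade a little recall for much cleaner training data, which usually matters more when the labels come from a machine translator rather than humans.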
I'd rather discuss solution adaptation with an example. Imagine a great new NLP research paper comes out; you can apply it to your own language. If English isn't your mother tongue, any idea can be re-implemented for your native language; that's called specification (reduction). To serve the Farsi community, I developed a Farsi version of CLIP called CLIPfa. It has been used in a few companies, which shows it can be useful. Generalization means that if new research appears in one area of NLP, you try extending it to other areas. For example, Transformers ("Attention Is All You Need") were first introduced for machine translation; today they are used everywhere, from GPT-3 to state-of-the-art vision models. In the early days of deep learning, the U-Net architecture was first proposed for segmenting medical images; it is now a core component of diffusion models, currently considered the most promising generative models. Never underestimate the power of adaptation.
The more you learn, the better your ideas will be. How come? As you gain more knowledge about the field, you can see the big picture better and connect ideas and solutions across different areas. That's why I believe an NLP practitioner should keep up with the latest research in computer vision, and vice versa. I'm not an expert in the field yet, just learning, so many of you may disagree with the statements in this article. I'd appreciate it if you shared your thoughts with me so that we can learn from each other ✌️
Final words: a project should let you learn or earn (credit); if it does neither, move on to the next one. How to advertise a project is another matter for another discussion.