JupyterLab is the latest web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning.
In a traditional IDE, when we run a program, we usually execute the whole file at once. In Data Science projects, however, we often work differently: depending on the analysis or requirements, we need to execute certain parts of the code separately.
Imagine that one specific part of your code takes 30 minutes to execute. If you make changes to other parts of your code & then run the file in a normal environment, the whole file executes again, so that unchanged part also runs again & costs you 30 extra minutes.
Using notebooks, we run cells (blocks of code) individually, which avoids redundant execution time & lets us check our outputs at each step.
(Note: many IDEs, such as VS Code, now support this cell format & notebooks as well.)
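As a rough illustration of that cell format, the sketch below uses `# %%` "percent" cell markers, which editors such as VS Code (with its Python/Jupyter support) and Spyder treat as cell boundaries. The file still runs top to bottom as an ordinary script, but in those editors each cell can be run on its own. The sum over a range is a hypothetical stand-in for the slow, 30-minute step described above.

```python
# %% Cell 1: the slow step (hypothetical stand-in for a long-running computation)
import time

start = time.perf_counter()
expensive_result = sum(i * i for i in range(1_000_000))
print(f"slow cell took {time.perf_counter() - start:.2f}s")

# %% Cell 2: quick analysis that can be edited and re-run on its own
# In a cell-aware editor, re-running only this cell reuses expensive_result
# from the live session instead of recomputing it.
digit_count = len(str(expensive_result))
print("result has", digit_count, "digits")
```

Run as a plain script, both cells execute every time; the savings only appear when the editor keeps a live session between cell runs, which is exactly what notebooks do.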
Google Colab, or ‘Colaboratory’, allows you to write and execute Python in your browser with zero configuration required, free access to GPUs and easy sharing.
It is similar to the notebooks we learnt about in the previous point. The difference is that these notebooks run on Google’s cloud servers, so you don’t need to set up a local machine with capable hardware.
You even get free access to GPUs for processing larger ML models, which is awesome!
Regex101 is a website that allows you to compose & test regular expressions.
In many ML tasks, we include a cleaning phase that makes sure our data is clean & ready for further analysis. In NLP, cleaning the data often means removing certain text, symbols, numbers or other unimportant pieces from the data.
Sometimes these can be removed with simple logic (for example, plain string operations), but more complicated cleaning procedures require a regular expression to be composed for the job.
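To make this concrete, here is a minimal cleaning sketch using Python's built-in `re` module. The function name `clean_text` and the specific rules (strip URLs, digits and symbols, then normalise whitespace and case) are illustrative assumptions; what counts as "unimportant" depends on the task.

```python
import re

# Hypothetical cleaning rules; adjust them to whatever is noise in your own data.
URL_RE = re.compile(r"https?://\S+")  # http(s) links up to the next whitespace
DIGIT_RE = re.compile(r"\d+")         # runs of digits
SYMBOL_RE = re.compile(r"[^\w\s]")    # anything that is not a word character or whitespace
SPACE_RE = re.compile(r"\s+")         # runs of whitespace


def clean_text(text: str) -> str:
    """Remove URLs, numbers and symbols, then collapse whitespace and lowercase."""
    text = URL_RE.sub(" ", text)
    text = DIGIT_RE.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip().lower()


print(clean_text("Check https://example.com for 50% off!!! #deal"))
# → check for off deal
```

Each rule replaces matches with a space rather than deleting them outright, so neighbouring words are not glued together; the final whitespace pass then tidies up the extra spaces.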
Regex101 will help you compose expressions & test them on sample text. It provides a quick reference of the tokens you can use & what they mean. Whenever your regex matches part of the test text, the match is highlighted & the site shows details such as the matched groups & an explanation of each part of the pattern.
Once you are happy with the results, you can copy this regex & employ it in your code.
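When moving a tested pattern into Python, it helps to select the Python flavour on Regex101 first, since regex dialects differ slightly, and to paste the pattern into a raw string. The sketch below shows one way to keep it readable: `re.VERBOSE` lets the token-by-token explanation live as comments, and named groups expose the same kind of per-match details (span & captured groups) that the site displays. The date pattern and sample text are hypothetical examples.

```python
import re

# Hypothetical pattern for ISO-style dates such as 2023-07-14.
# re.VERBOSE ignores whitespace inside the pattern and allows # comments.
DATE_RE = re.compile(
    r"""
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month
    -
    (?P<day>\d{2})    # two-digit day
    """,
    re.VERBOSE,
)

text = "Released on 2023-07-14, patched on 2023-08-01."
for match in DATE_RE.finditer(text):
    # span() gives the start/end offsets; groupdict() maps group names to captures
    print(match.group(0), match.span(), match.groupdict())
```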
This website allows you to visualize the POS tags & dependencies within a given piece of text.
In NLP, we use POS (parts of speech) tagging & dependency parsing to find the POS tags of words or phrases & the relationships that words have with each other within the given context.
For the text you provide, this website shows the POS tags for the different phrases & the dependencies between them.
You can also turn off the “merge Phrases” option, in which case it will output the POS tags for each individual word.
(Note: you can also produce this visualization in your own code using the displacy module from the spaCy library. Check out the small code sample below.)
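import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")  # first run: python -m spacy download en_core_web_sm
doc = nlp("This is a sentence.")
displacy.render(doc, style="dep")  # draws the tree inline in Jupyter; use displacy.serve(doc, style="dep") in a plain script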