The primary role of Node.js is the development of web servers and other network programs. That’s why it has a lot in common with PHP, which is often used on servers. The significant difference between the two is in their core: PHP is a blocking language, while Node.js is non-blocking. Commands in PHP are executed only after the previous command has completed, while Node.js hands I/O off asynchronously and keeps executing, so it never sits waiting on a single slow operation.
Non-blocking nature of Node.js
Node.js provides server developers with event-driven programming. As a result, developers are able to create highly scalable servers. They use a simplified model of event-driven programming that relies on callbacks to signal the completion of a task.
The key technical features of Node.js are its threading model, the V8 engine, package management, a unified API, and the event loop. Each is explained in detail below.
First of all, we should mention two things about threading in Node.js: it relies on non-blocking I/O calls, and it operates on a single thread. As a result, Node.js supports a huge number of concurrent connections while eliminating the cost of thread context switching. By sharing a single thread between all requests, Node.js can be used to develop highly concurrent apps. In Node.js applications, any function that performs I/O should use a callback.
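As a minimal illustration of that callback style (the file path is just an example), a non-blocking read lets the script keep running while the I/O completes:
// Non-blocking read: Node registers a callback and keeps going.
var fs = require('fs');

fs.readFile('/etc/hosts', 'utf8', function (err, contents) {
  if (err) {
    return console.error('Read failed: ' + err);
  }
  console.log('File length: ' + contents.length);
});

console.log('This line runs before the file has been read.');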
But there is a downside to this approach: scaling across the number of CPU cores is impossible unless you use an additional module such as pm2.
The absence of threads doesn’t cancel out the advantage of multiple cores in your environment. You can always spawn child processes, with easy communication between them, by using the child_process.fork() API. The cluster module, which is built upon that same interface, also lets you enable load balancing over the cores; in this situation sockets are shared between processes.
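A minimal sketch of that setup, forking one worker per core with the built-in cluster module (the port is arbitrary):
// Fork one worker per CPU core; the workers share the listening socket.
var cluster = require('cluster');
var http = require('http');
var os = require('os');

if (cluster.isMaster) {
  os.cpus().forEach(function () {
    cluster.fork();
  });
} else {
  http.createServer(function (req, res) {
    res.end('Handled by worker ' + process.pid + '\n');
  }).listen(8000);
}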
The Node.js server platform ships with a pre-installed package manager, npm, which is used to install Node.js programs from its registry. npm improves development productivity by organizing the installation and management of third-party Node.js programs. Keep in mind that npm and the CommonJS require() statement are not the same thing: the package manager is not used to load code, it is designed for installing code and managing dependencies from the command line (a short example follows the list below). The npm registry can include all possible kinds of packages, from simple helper libraries to task runners, for instance:
express – Express.js is a Sinatra-inspired framework.
connect – an extensible HTTP server framework, which provides a collection of middleware plugins and serves as a base foundation for Express.
forever – a utility that keeps a Node script running continuously.
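To make that distinction concrete, here is a tiny example: npm install fetches Express from the registry into node_modules on the command line, and require() then loads it in code.
// After running "npm install express" in the project directory,
// require() loads the installed package from node_modules.
var express = require('express');

var app = express();
app.get('/', function (req, res) {
  res.send('Hello from Express');
});
app.listen(3000);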
In order to offer a unified JavaScript development stack, Node.js is combined with browser JavaScript, JSON, and document databases. This fits the growing attention to client-side frameworks and such server-side development patterns as MVC, MVP, and MVVM. As a result, Node.js makes it possible to use the same service interface on both the client side and the server side.
Influenced by EventMachine and Twisted, Node shares a lot of their design but takes the event model slightly further. Node.js is notified when a connection is made, and the operating system issues a callback; Node.js then runs an event loop, which the system uses to scale to all simultaneous connections. The event loop doesn’t need to be called explicitly: once the callbacks are defined, the server enters the event loop at the end of the definition process and leaves it when there is no work left to do.
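A tiny example of that behavior (the port is arbitrary): the script only registers callbacks, and the event loop starts running by itself once the script finishes.
// Callbacks are registered; Node enters the event loop automatically
// when this script ends, and exits once no work remains.
var net = require('net');

var server = net.createServer(function (socket) {
  // The OS notifies Node of each new connection; this callback fires then.
  socket.end('hello\n');
});

server.listen(7000, function () {
  console.log('Listening; the event loop is now running implicitly.');
});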
Chat is a real-time application used by several people at once. Being lightweight, high-traffic, and data-intensive, a chat app is a perfect example of Node.js use. It is also a good place to start learning, because such apps are relatively simple yet cover most of the paradigms typical of Node.js applications.
Imagine a single chatroom on a website with, say, three users chatting.
On the server side, there is an Express.js app. It implements two things: 1) a GET ‘/’ request handler; 2) a websockets server. The first serves the webpage, which contains both a ‘Send’ button and a message board for entering a new message. The second listens for messages emitted by websocket clients.
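The article doesn’t name a specific websocket library; assuming socket.io, a minimal server-side sketch might look like this:
var app = require('express')();
var http = require('http').Server(app);
var io = require('socket.io')(http);

// 1) GET '/' serves the chat page with the message board and 'Send' button.
app.get('/', function (req, res) {
  res.sendFile(__dirname + '/index.html');
});

// 2) The websockets server listens for messages emitted by clients.
io.on('connection', function (socket) {
  socket.on('chat message', function (msg) {
    // Forward the message to every other connected client.
    socket.broadcast.emit('chat message', msg);
  });
});

http.listen(3000);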
On the client side, there is an HTML page with a couple of handlers set up. One is tied to the ‘Send’ button click: it picks up the input message and sends it down the websocket. The other listens for incoming messages on the websockets client.
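The matching client-side script, again assuming socket.io and with placeholder element ids (send, message, board), might look like this:
// Client-side script on the chat page; assumes the socket.io client is loaded.
var socket = io();

// Handler 1: the 'Send' button picks up the input and sends it down the socket.
document.getElementById('send').onclick = function () {
  var input = document.getElementById('message');
  socket.emit('chat message', input.value);
  input.value = '';
};

// Handler 2: incoming messages are appended to the message board.
socket.on('chat message', function (msg) {
  var item = document.createElement('li');
  item.textContent = msg;
  document.getElementById('board').appendChild(item);
});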
Here’s what happens when someone posts a message:
First, the server-side component of the websocket connection receives the message; its second action is to forward it to the other connected clients, using the broadcast method.
As a result, all clients receive the message as a push message via the client-side websockets component running within the web page. They then pick up the content of the message and append it to the board, updating the web page.
Keep in mind that this was a basic example. There are more robust solutions, based on a simple cache or Redis; some of them provide a better delivery mechanism that protects users from connection losses and stores messages for registered visitors. Regardless of the implementation, Node.js always operates under the same principles: it reacts to events, handles all the concurrent connections, and maintains a fluid user experience.
Object DBs (for instance, MongoDB) are another area of use for Node.js. Because the data is stored as JSON, Node.js can work with it without data conversion and without impedance mismatch.
In the case of Rails, you would have to perform far more steps; Node.js simplifies the process. JSON objects are exposed through a REST API for the client to consume, and there is no need to worry about converting JSON when reading from or writing to the database. With Node.js you avoid multiple conversions altogether by keeping a uniform data serialization format across the client, the server, and the database.
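A minimal sketch of that pattern, using Express and an older callback-style MongoDB driver API (the connection string, collection name, and route are placeholders): the stored documents go to the client as JSON without any conversion step.
var express = require('express');
var MongoClient = require('mongodb').MongoClient;

var app = express();

MongoClient.connect('mongodb://localhost:27017/demo', function (err, db) {
  if (err) throw err;

  // The stored documents are already JSON-like objects; send them as-is.
  app.get('/api/items', function (req, res) {
    db.collection('items').find().toArray(function (err, docs) {
      if (err) return res.status(500).json({ error: 'query failed' });
      res.json(docs);
    });
  });

  app.listen(3000);
});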
Node.js also lets you defer database writes and perform them later. Under a high volume of concurrent data, the database can become a problem: Node.js handles the concurrent connections with ease, but because database access is a blocking operation, you can run into trouble. Luckily, there is a reliable solution: acknowledge the client’s request before the data is actually written to the DB.
Such an approach keeps the system responsive even under heavy load. It is especially useful when the client doesn’t require firm confirmation of a successful data write. In practice this covers things like writing user-tracking data (which is processed in batches and used some time later), and operations that don’t need to be reflected instantly, where eventual consistency is acceptable.
To queue the data, message-queuing infrastructure such as RabbitMQ or ZeroMQ is used; the data is then digested by a separate database batch-write process. Another option is to hand computation-intensive processing off to backend services written on a platform better suited to such tasks. Keep in mind that similar behavior is possible with other frameworks and languages, but the difference is in the hardware: to achieve the same throughput, you would need more of it.
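As a rough sketch of the “acknowledge first, write later” idea: here a plain in-memory array stands in for a real queue such as RabbitMQ or ZeroMQ, and the actual database call is left out.
var express = require('express');
var app = express();

app.use(express.json());

// A plain in-memory array standing in for a real message queue.
var pendingWrites = [];

app.post('/track', function (req, res) {
  pendingWrites.push(req.body);
  // The client is acknowledged before anything touches the database.
  res.status(202).json({ queued: true });
});

// A separate batch-write step (here just a timer) drains the queue.
setInterval(function () {
  if (pendingWrites.length === 0) return;
  var batch = pendingWrites.splice(0, pendingWrites.length);
  // A bulk insert into the database would go here.
  console.log('Flushing ' + batch.length + ' queued writes');
}, 5000);

app.listen(3000);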
In more traditional web platforms, HTTP requests and responses are treated as isolated events, when in fact they are streams. Node.js takes advantage of this principle to enable some genuinely useful features: you can process files while they are still being uploaded, which is handy for real-time audio or video encoding, or for proxying between different data sources.
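A small sketch of streaming an upload to disk as it arrives, rather than buffering the whole body first (the file name and port are placeholders):
var http = require('http');
var fs = require('fs');

http.createServer(function (req, res) {
  if (req.method === 'POST' && req.url === '/upload') {
    // The request body is piped to disk chunk by chunk as it arrives.
    req.pipe(fs.createWriteStream('upload.tmp'));
    req.on('end', function () {
      res.end('Upload stored\n');
    });
    return;
  }
  res.statusCode = 404;
  res.end();
}).listen(3000);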
Node.js also supports server-side proxying. It is often used to collect data from multiple sources or to proxy services with different response times, and it is helpful when no proxying infrastructure exists, or as a tool for local development.
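A tiny pass-through proxy can be sketched with nothing but the core http module (the upstream host and port are placeholders): the incoming request is piped to the upstream service and the upstream response is piped straight back.
var http = require('http');

http.createServer(function (clientReq, clientRes) {
  var upstream = http.request({
    host: 'internal-api.example.com',
    port: 80,
    path: clientReq.url,
    method: clientReq.method,
    headers: clientReq.headers
  }, function (upstreamRes) {
    clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
    upstreamRes.pipe(clientRes);
  });

  clientReq.pipe(upstream);
}).listen(8080);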
A few reasons to use Node.js
A wide variety of tools provided by npm, the Node.js package manager.
Real-time and multi-user support.
The ability to write web apps that run from a single codebase on both server and client, with automatic data synchronization between them.
Huge community with volunteer maintainers and reliable investors.
Rapid adoption by Node.js hosting providers.
Node.js web scraping
With the increase of data on the web, scraping – the process of retrieving information programmatically – has become widespread and simplified. There are plenty of ready-made tools, but you can always use Node.js to build your own powerful web-scraping solution. Below you will find information about Request and Cheerio, two Node.js modules for web scraping, and two apps: one fetches and displays data, the other finds keywords related to a Google search.
To bring in the aforementioned modules you will have to use NPM – the Node Package Manager.
Node.js can download Internet data over both HTTP and HTTPS, but in the core modules the two are kept separate. The Request module merges these methods and abstracts the difficulties away, providing a unified interface for making requests; you can use it to download web pages into memory. Installation is simple: go to the directory with your main Node.js file, open a terminal, and run “npm install request”.
With the Cheerio module, you can use jQuery syntax while working with downloaded web data. Cheerio lets developers focus on the downloaded data rather than on parsing it. Installation is the same: go to the directory with your main Node.js file, open a terminal, and run “npm install cheerio”.
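A two-line taste of that jQuery-style syntax once Cheerio is installed:
var cheerio = require('cheerio');

// cheerio.load() returns a jQuery-like function for the given markup.
var $ = cheerio.load('<h2 class="title">Hello world</h2>');
console.log($('h2.title').text()); // prints "Hello world"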
The code below grabs the temperature reading from a weather website; append the code for your area to the end of the URL. Also check that the aforementioned modules have been installed successfully.
console.log("We’ve encountered an error: "+error);
And this is how the app works. The modules are required so they can be used later, and then the URL of the page to download is defined.
Next, the Request module is used to download the page at that URL: you call the request function, pass in the URL, and supply a callback that handles the result of the request.
When the data comes back, that callback is invoked with three arguments: error, response, and body.
If the Request module is unable to retrieve the data, it passes a valid error object to the callback and the body variable is null. Keep in mind that before working with the data you have to check for errors and log them to see what went wrong.
If everything works well, the data is passed to Cheerio, where you can handle it with standard jQuery-style syntax.
You can also create a selector responsible for grabbing the chosen elements from the page. Just use your browser and developer tools to explore the page with the required data.
In the browser, you have to open the page you’ve decided to scrape and create a jQuery selector for the elements you are going to get data from.
In the code, you perform three steps: use request to download the page, pass the returned data into Cheerio to get a jQuery-like interface, and finally apply the selector you wrote in advance.
Data mining is a more advanced use of web scraping. The process also relies on downloading web pages, extracting data from them, and generating reports. Of course, you can always use Node.js in this process.
Below you will find a simple data-mining Node.js application. It looks for the top terms associated with the results of a given Google search. The application runs the Google search, downloads all the result pages, parses the text out of each page, analyzes it, and presents the most popular words. Hit this link for the full code.
How to download the Google search
First of all, decide what to analyze: find the URL for the search you want, then download it and parse the results. This will give you the list of URLs you need.
The Request module is used to download the page, while Cheerio parses it. Here is the code:
console.log("Couldn't get page because of error: " + error);
// load the body of the page into Cheerio so we can traverse the DOM
// get the href attribute of each link
// strip out unnecessary junk
// this link counts as a result, so increment results
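Putting those steps together, a minimal sketch might look like this (looping over plain anchor tags and stripping parameters after ‘&’ are assumptions about Google’s markup at the time):
var request = require('request');
var cheerio = require('cheerio');

var url = 'https://www.google.com/search?q=data+mining';
var results = 0;
var links = [];

request(url, function (error, response, body) {
  if (error) {
    console.log("Couldn't get page because of error: " + error);
    return;
  }
  // load the body of the page into Cheerio so we can traverse the DOM
  var $ = cheerio.load(body);

  $('a').each(function () {
    // get the href attribute of each link
    var href = $(this).attr('href');
    // skip internal links, which start with a "/"
    if (!href || href.indexOf('/') === 0) return;
    // strip out unnecessary junk (extra URL parameters)
    href = href.split('&')[0];
    // this link counts as a result, so increment results
    results++;
    links.push(href);
  });

  console.log('Found ' + results + ' result links');
});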
The URL variable you pass in is a Google search for “data mining”.
First of all, you should get the contents of the page. To do this, you’ll have to make a request.
The second step is loading the page contents into Cheerio, so you can query the DOM for the elements that link to the relevant results.
The third stage requires looping through the links and stripping out extra URL parameters.
And don’t forget to check each URL: it shouldn’t start with a “/”.
How to pull words from every page
This step is almost identical to the simple example above, except that the URL variable now refers to the URL of a page collected in the loop above.
// load the page into Cheerio
Request and Cheerio are used once again to do two things: 1) download the page; 2) get access to its DOM. In this example, that access is used to pull the text out of the page.
The next step is to work with that text. Clean it up by compressing the white space into single spaces, getting rid of characters that are not spaces or letters, and converting everything to lowercase.
Now you can split the text on spaces to get an array of all the rendered words on the page. The next stage is looping through them and adding each one to the corpus.
Use this code to perform the above actions:
// Throw away extra white space and non-alphanumeric characters.
// Split on spaces for a list of all the words on that page and
// loop through that list.
// We don't want to include very short or long words because they're
// probably bad data.
// If this word is already in our corpus, our collection
// of terms, increase the count for appearances of that
// word by one.
// Otherwise, say that we've found one of that word so far.
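Pulled together, a minimal sketch of this step might look like the following; the addPageToCorpus helper, the word-length bounds, and reading text from the page body are all assumptions made for the sake of the example.
var request = require('request');
var cheerio = require('cheerio');

// corpus maps each word to the number of times it has been seen so far.
var corpus = {};

function addPageToCorpus(url, done) {
  request(url, function (error, response, body) {
    if (error) {
      console.log("Couldn't get page because of error: " + error);
      return done();
    }
    // load the page into Cheerio
    var $ = cheerio.load(body);
    var text = $('body').text();

    // Throw away extra white space and non-alphanumeric characters.
    text = text.replace(/[^a-zA-Z\s]/g, '')
               .replace(/\s+/g, ' ')
               .toLowerCase();

    // Split on spaces for a list of all the words on that page and
    // loop through that list.
    text.split(' ').forEach(function (word) {
      // We don't want to include very short or long words because they're
      // probably bad data.
      if (word.length < 3 || word.length > 20) return;

      if (corpus[word]) {
        // If this word is already in our corpus, increase the count
        // for appearances of that word by one.
        corpus[word]++;
      } else {
        // Otherwise, say that we've found one of that word so far.
        corpus[word] = 1;
      }
    });

    done();
  });
}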
How to analyze the words
With all the words in the corpus, you can sort them by popularity. First, gather them in an array:
// stick all words in an array
// sort array based on how often they occur
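Continuing the sketch above, the sorting step could look like this (printing the ten most common terms is just one way to present the result):
// Stick all the words from the corpus into an array of {word, count} pairs.
var terms = Object.keys(corpus).map(function (word) {
  return { word: word, count: corpus[word] };
});

// Sort the array based on how often each word occurs.
terms.sort(function (a, b) {
  return b.count - a.count;
});

// Show the ten most common terms.
terms.slice(0, 10).forEach(function (term) {
  console.log(term.word + ': ' + term.count);
});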
The finished result is a ranked list of the most common terms across the result pages.
And that is how web scraping (and data mining) is done with Node.js.
KeystoneJS provides one of the easiest ways to build database-driven sites, apps, and APIs in Node.js. It configures Express.js automatically and gives you access to MongoDB. The Admin UI is intuitive and offers all the necessary features, designed to save your time. Other useful features include readable asynchronous code, secure form processing, and effective session management and data encryption. KeystoneJS also includes email management features. This CMS is based on Bootstrap and jQuery, but its styling options are endless. KeystoneJS on GitHub.
Calipso is a very fast, extremely flexible, and at the same time simple Node.js CMS, built along lines similar to Drupal and WordPress. Because of its minimalistic approach to design, this CMS is best suited to sites that don’t rely on tons of media content; check an example site to get a better sense of this. Another important aspect of Calipso is its modular approach to delivering functionality: all core features of this CMS are provided by modules, with bootstrapping, forms, and theming being the exceptions.
Apostrophe, an open-source Node.js CMS, provides maximum flexibility alongside a minimal learning curve. As a result, you can focus on building things rather than on learning. Apostrophe offers a wide variety of features, so it appeals to both business owners and community websites. In addition, Apostrophe is a good way to learn more about Node.js in practice.
Hatch is built on Node.js and Redis to provide fast and highly consistent performance. Thanks to the Twitter Bootstrap framework, it lets you apply a lot of existing themes, and there are powerful editing tools designed for rapid site building: you can build pages with a WYSIWYG-style editor. The open-source platform offers both free and paid add-ons, and because Hatch works as an npm package you can build your own features with ease. We should also mention that it is responsive, supports raw CSS editing, and provides a lot of social features.
Buckets is another open-source content management system built on Node.js with MongoDB. It lets you store content in a structured yet flexible way, and it is fast and responsive: Buckets looks good both on sleek big-screen layouts and on fast, responsive smartphone layouts. It makes adding, modifying, removing, and updating content as quick as possible.
PencilBlue is fully responsive and easy to use. It was designed for content-driven websites for businesses and publications. The platform provides SEO and relational data management tools and the freedom to host this Node.js CMS on virtually any server architecture. Being 100% extendable, it relies on a plugin system, so developers can modify even the core functionality of the platform. Furthermore, PencilBlue is totally scalable: built on Node.js, it abstracts its services and data providers. Keep in mind that MongoDB and Redis are supported out of the box.
StoreHippo is a platform for e-commerce and m-commerce. It is based on the MEAN stack, which consists of MongoDB, Express, AngularJS, and Node.js, all technologies used to create dynamic web sites. The use of the MEAN stack makes the user experience seamless, and StoreHippo-based e-commerce stores are fully responsive, so there is no need to deploy separate solutions for mobile and web. This Node.js e-commerce platform is a solid solution for digital retail.
Nodeshop is a Node.js-powered solution designed for creating e-commerce web stores. Its front end is based on Bootstrap, so the user interface looks clean. The project is still in progress, so all suggestions are appreciated. NodeShop on GitHub.
Forward is a developer-centric platform for e-commerce. It makes custom coding easier by providing an expressive syntax and powerful templates, and it aims at a more innovative approach to e-commerce design. Another positive aspect of this platform is its attempt to reduce the complexity and cost of building an e-commerce store. Forward Node.js client library on GitHub.
The Ottemo e-commerce platform provides a quick-and-easy downloadable solution with 100 features for both small and medium enterprises. The platform is mobile-friendly, so you can access all tools and features anywhere, and a single code base is used for the mobile and web interfaces. Ottemo is cost-efficient and makes scaling a breeze, so you can grow from a small business into a medium or even large enterprise. Ottemo Node.js experiment on GitHub.