In this tutorial, **we’ll talk about three key components of a Machine Learning (ML) model: Features, Parameters, and Classes.**

Over the past few years, **the field of ML has revolutionized many aspects of our lives**, from engineering and finance to medicine and biology. Its applications range from self-driving cars to predicting deadly diseases such as cancer. Generally, the goal of ML is to understand the structure of data and fit that data into models that can be understood and utilized by people. These models are mathematical representations of real-world processes and are divided into:

- **supervised** – where we use labeled datasets to train algorithms into classifying data or predicting outcomes accurately
- **unsupervised** – where we analyze and cluster unlabeled datasets without the need for human intervention

First, let’s talk about features that act as input to the model. **Features are individual and independent variables that measure a property or characteristic of the task.** Choosing informative, discriminative, and independent features is the first important decision when implementing any model. In classical ML, it is our responsibility to hand-craft and choose some useful features of the data, while in modern deep learning, the features are learned automatically by the underlying algorithm.

To better explain the concept of features let’s imagine that we want to implement **a model that predicts if a student will be accepted for graduate studies in a university.** To choose our features, we should think of variables that are correlated with the outcome meaning that they influence the outcome of a grad application. For example, we can have the following features:

- The GPA of their undergraduate studies
- The recommendation letters of their previous professors or employers
- Their scores in standardized tests like GRE, GMAT and more
- If they have previous publications
- The projects they worked on during undergrad

Since the result (accepted or not) depends more or less on all of these variables, they can be used as input features to the ML model. In case we have too many features, we can use feature selection methods.
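To make this concrete, here’s a minimal sketch of how such an applicant could be represented as a numeric feature vector. The field names and encodings below are illustrative assumptions, not a fixed schema:

```python
# A hypothetical applicant encoded as features; the field names and
# encodings below are illustrative assumptions, not a fixed schema.
applicant = {
    "gpa": 3.7,                 # undergraduate GPA
    "recommendation_score": 4,  # e.g., letters rated on a 1-5 scale
    "gre_score": 325,           # standardized test score
    "has_publications": 1,      # binary flag: 1 = yes, 0 = no
    "num_projects": 3,          # projects worked on during undergrad
}

# Most ML libraries expect each sample as a numeric feature vector:
feature_order = ("gpa", "recommendation_score", "gre_score",
                 "has_publications", "num_projects")
feature_vector = [applicant[name] for name in feature_order]
print(feature_vector)  # [3.7, 4, 325, 1, 3]
```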

The next step is to choose the ML model that we will train for our task. **Model parameters are defined as the internal variables of this model. They are learned or estimated purely from the data during training as every ML algorithm has mechanisms to optimize these parameters.**

Training typically starts with the parameters initialized to some values. As training progresses, the initial values are updated using an optimization algorithm (e.g., gradient descent). The learning algorithm continuously updates the parameter values as learning progresses. At the end of the learning process, model parameters are what constitute the model itself.

To better understand the concept of parameters, let’s see what the parameters are in some common ML models:

- In a simple Linear Regression model y = b0 + b1x, the intercept b0 and the slope b1 are the parameters of the model.
- In a Neural Network model, the weights and biases are the parameters of the model.
- In a Clustering model, the centroids of the clusters are the parameters of the model.

We often confuse parameters with hyperparameters. However, **hyperparameters are not learned during training by the model but are set beforehand.** For example, in clustering, the number of clusters is a hyperparameter while the centroids of the clusters are parameters.
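To illustrate this distinction, here’s a minimal 1-D k-means sketch (a toy version, not a library implementation): the number of clusters is a hyperparameter we pass in, while the centroids are parameters the algorithm learns from the data:

```python
# Minimal 1-D k-means sketch: n_clusters is a hyperparameter
# (set beforehand), the centroids are parameters (learned from data).
def kmeans_1d(points, n_clusters, iters=10):
    centroids = points[:n_clusters]  # naive initialization
    for _ in range(iters):
        # assign each point to its nearest centroid
        clusters = [[] for _ in range(n_clusters)]
        for p in points:
            idx = min(range(n_clusters), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # recompute centroids as cluster means
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

data = [1.0, 1.2, 0.8, 10.0, 10.4, 9.6]
print(kmeans_1d(data, n_clusters=2))  # centroids near 1.0 and 10.0
```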

Our last term **applies only to classification tasks** where we want to learn a mapping function from our input features to some discrete output variables. These output variables are referred to as classes (or labels):

In our previous task of the grad application, we have only two classes: “Accepted” and “Not Accepted”.

In this article, we talked about three key components of ML models: Features, Parameters, and Classes. First, we defined them and then we presented some practical examples.

The post Features, Parameters and Classes in Machine Learning first appeared on Baeldung on Computer Science.

In this tutorial, we’ll discuss learning rate and batch size, two neural network hyperparameters that we need to set up before model training. We’ll introduce them both and, after that, analyze how to tune them accordingly.

Also, we’ll see how one influences another and what work has been done on this topic.

Learning rate is a term that we use in machine learning and statistics. Briefly, it controls how large the steps are that an algorithm takes on its way towards a solution. Learning rate is one of the most important hyperparameters for training neural networks. Thus, it’s very important to set its value as close to the optimal one as possible.

Usually, when we need to train a neural network model, we use some optimization technique based on the gradient descent algorithm. After calculating the gradient of the loss function with respect to the weights, the negative gradient points in the direction of a local optimum. We use the learning rate hyperparameter to control how far the weights move in that direction when optimizing the model.

**The learning rate indicates the step size that gradient descent takes towards a local optimum:** w ← w − η · ∇L(w), where η is the learning rate and ∇L(w) is the gradient of the loss with respect to the weights w.

Consequently, if the learning rate is too low, gradient descent will take more time to reach the optimum. Conversely, if the learning rate is too large, gradient descent might start to diverge, and it’ll never reach the optimal solution.
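We can see both effects with a tiny sketch: gradient descent on the quadratic f(w) = w², whose gradient is 2w and whose minimum is at w = 0. The learning rates below are illustrative choices:

```python
# Gradient descent on f(w) = w^2, whose gradient is 2w and whose
# minimum is at w = 0; learning rates below are illustrative.
def descend(lr, steps=50, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w  # w <- w - lr * f'(w)
    return w

print(abs(descend(lr=0.01)))  # too small: still far from the optimum
print(abs(descend(lr=0.4)))   # reasonable: very close to 0
print(abs(descend(lr=1.1)))   # too large: the iterates diverge
```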

Also, the learning rate doesn’t have to have a fixed value. For example, we might define a rule that the learning rate will decrease as epochs for training increase. Besides that, some adaptive learning rate optimization methods modify the learning rate during the training. We can find more details about choosing the learning rate and gradient descent method in this article.
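For instance, a simple step-decay rule could look like the sketch below. The halving schedule and its constants are illustrative assumptions:

```python
# A simple step-decay schedule: halve the learning rate every
# 10 epochs (both constants are illustrative assumptions).
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    return initial_lr * (drop ** (epoch // epochs_per_drop))

print(step_decay(0.1, epoch=0))   # 0.1
print(step_decay(0.1, epoch=10))  # 0.05
print(step_decay(0.1, epoch=25))  # 0.025
```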

**Batch size defines the number of samples we propagate through the network before updating the model parameters, i.e., in one training iteration.** There are three types of gradient descent with respect to the batch size:

- Batch gradient descent – uses all samples from the training set in one iteration.
- Stochastic gradient descent – uses only one random sample from the training set in one iteration.
- Mini-batch gradient descent – uses a predefined number of samples from the training set in one iteration.

The mini-batch gradient descent is the most common, empirically showing the best results. For instance, let’s consider a training set of 1000 samples and a batch size of 100. The neural network will take the first 100 samples in the first iteration and do forward and backward propagation. It’ll then take the subsequent 100 samples in the second iteration and repeat the process; one pass over all ten batches constitutes one epoch.
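The batching scheme just described can be sketched as a minimal generator, assuming an in-memory list of samples:

```python
# Splitting a 1000-sample training set into mini-batches of 100;
# one full pass over all batches constitutes one epoch.
def mini_batches(samples, batch_size):
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

data = list(range(1000))              # stand-in training set
batches = list(mini_batches(data, 100))
print(len(batches), len(batches[0]))  # 10 100
```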

Overall, the network will be trained for the predefined number of epochs or until the desired condition is met.

We explained the reason why mini-batch gradient descent performs better than single-batch gradient descent in this article.

**The batch size affects some indicators such as overall training time, training time per epoch, quality of the model, and similar. Usually, we choose the batch size as a power of two, in the range between 16 and 512. But generally, a size of 32 is a rule of thumb and a good initial choice.**

The question arises: is there any relationship between the learning rate and the batch size? Do we need to change the learning rate if we increase or decrease the batch size? First of all, if we use an adaptive gradient descent optimizer, such as Adam, Adagrad, or any other, there’s no need to change the learning rate after changing the batch size.

Because of that, in what follows, we’ll assume we’re talking about the classic mini-batch gradient descent method.

There is some work done on this problem. Some authors suggest that when multiplying the batch size by k, we should also multiply the learning rate by √k to keep the variance of the gradient estimate constant. **Also, more commonly, a simple linear scaling rule is used.** It means that when the batch size is multiplied by k, the learning rate should also be multiplied by k, while other hyperparameters stay unchanged.
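As a trivial sketch, the linear scaling rule amounts to the following (the baseline values here are illustrative, not taken from the cited experiments):

```python
# Linear scaling rule: multiplying the batch size by k multiplies
# the learning rate by k (baseline values are illustrative).
def scale_lr(base_lr, base_batch, new_batch):
    return base_lr * (new_batch / base_batch)

print(scale_lr(0.1, base_batch=32, new_batch=256))  # 0.8
```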

In that work, the authors experimented with different batch sizes and learning rates. Using the linear scaling rule, they achieved the same accuracy and matched learning curves. They achieved this concurrence with large batch sizes in their ImageNet experiment.

Also, they accomplished it using a gradual warmup that increases the learning rate by a constant amount in the first five epochs of training. This strategy prevents early training instability caused by a large learning rate.

In our example, we tried to apply the linear scaling rule. It was an experiment with the MNIST dataset and a simple CNN with one convolutional, one dropout, and one fully connected layer. We compared a baseline batch size and learning rate against their scaled values, multiplying both by the same small integers. The results confirm that the learning curves are well matched.

The theoretical approach works well in controlled experiments, when most of the variables and the dataset are nearly perfect. The situation is usually slightly different in real-world tasks. **First of all, our goal is not to match the same accuracy and learning curve using two sets of batch size and learning rate, but to achieve the best possible results.**

For instance, if we increase the batch size and our accuracy increases, there’s no sense in modifying the learning rate just to reproduce the prior results. Also, since there are many more hyperparameters to tune, we don’t know if the initial values of batch size and learning rate are optimal.

Usually, in practice, we tune these two hyperparameters jointly with the others. Besides that, it’s also common to adjust them independently of each other. For example, if the neural network trains too slowly, we might increase the batch size and monitor how the results change. Also, we might increase the learning rate if the network converges too slowly.

In this article, we’ve briefly described the terms batch size and learning rate. We’ve presented some theoretical background for both terms. The rule of thumb is to scale the two hyperparameters linearly together. But more importantly, we should keep the focus on the results of the neural network and not on the ratio between batch size and learning rate.

After all, our goal is to get the best possible results.

The post Relation Between Learning Rate and Batch Size first appeared on Baeldung on Computer Science.

This is not the typical code-focused style of an article I usually publish here on Baeldung.

Jumping right into it – the site is growing, more and more developers are applying to become authors, and the current editorial team (9 editors) is starting to need help again.

I’m looking for a new **part-time technical editor** to join the editorial team.

And what better way to find a solid content editor for the site than reaching out to readers and the community?

First – you need to be active in the Computer Science area yourself, teaching or working in R&D departments. All of these articles are code-centric, so being in the trenches and able to code is instrumental.

Any experience with Scala, Kotlin, or Linux is a plus as well.

Second – and it almost goes without saying – you should have an excellent command of the English language.

You’re going to work with authors, **review their new article drafts,** and provide helpful feedback.

The goal is to make sure that the article hits a high level of quality before it gets published. More specifically – articles should match the Baeldung formatting, code, and style guidelines.

**Beyond formatting and style**, articles should be code-focused, clean, and easy to understand. Sometimes an article is almost there, but not quite – and the author needs to be guided towards a better solution or a better way of explaining some technical concepts.

You’re going to be **working with 6-7 authors** to help them with their articles (you can choose to work with fewer or more authors depending on your availability).

An article usually takes about two rounds of review until it’s ready to go. All of this usually takes about 30 to 45 minutes of work for a small to medium article and can take 60 minutes for larger writeups.

The payment is based on the number of words edited and your level as an editor. A typical budget is $5/100 words edited as an L10 editor (the max level is 20). Your level is determined by internal metrics such as response time and feedback from the senior editors. **The average earnings are around $500/month.**

If you think you’re well suited for this work, I’d love to work with you to help grow Baeldung.

Send me a quick email at *hiring@baeldung.com* with your details and a link to your LinkedIn profile.

Cheers,

*Eugen*

**In this tutorial, we’re going to go over the N-tier architecture in software development**. To start, we’ll give a definition, review its components, and outline its characteristics. Then, we’ll describe the benefits and drawbacks and give examples of situations where this architecture is appropriate.

Before we look at N-tier architectures, let’s recall that software architecture is the basic structure of any software system, incorporating every aspect that makes a system function and behave as it should. Different software architectures are in use today, such as client-server, microservices, microkernel, and layered architectures.

Now, **an N-tier architecture is a multilayered client-server architecture in which the presentation, processing, and data functions are divided into logically and physically separate tiers**. Being physically separated means that each of these functions executes on a different physical machine, sometimes in a different geographical location.

Let’s take a look at some key characteristics.

**N-tier architectures are usually connected linearly**; that is to say, we must pass through one tier to get to the next.

Furthermore, despite the physical and logical separation of components, **an N-tier application appears and functions as a single unit** **to the user**. This is an example of distributed networking. The different components usually communicate through communication links such as high-speed buses.

Additionally, **N-tier architectures also adopt a client-server model**. In the client-server model, multiple clients request and receive service from a centralized server. In this case, the client and server are both computer programs running in different tiers.

Lastly, **N-tier also means that the architecture can have an arbitrary number of tiers**; however, 3-tier is the most common. For example, in a 1-tier architecture, all components would be placed on a single tier, hence a single machine.

The N-tier architecture divides its main components into logically and physically separate components. **These are the presentation tier, processing or logic tier, and data tier:**

The presentation tier is responsible for presenting information in a format that can be easily understood and manipulated by the user. For example, a user interacting with a web application on a laptop is the presentation tier at work.

Additionally, the logic tier handles all processing functions, which include command execution, error handling, calculations, and any logical decisions. In the example of a web application, the logic tier would contain the underlying program that renders the web pages, written in a programming language such as PHP.

Lastly, the data tier handles the data store, usually a database, and any communication to and from the database. In the example of a web application, the data tier would be the database running SQL statements.

Let’s now take a look at a real use-case of N-tier architecture.

Let’s consider a web-based patient appointment application for a medical centre, which enables patients to schedule appointments with doctors and specialists:

The presentation layer consists of the user interacting with a web browser on a computing device such as a laptop or phone. The user logs into the system with their login details and selects the doctor with whom they would like to make an appointment.

It is important to note that one of the perks of the N-tier design is being able to change one layer easily without touching the others. In this case, the graphical interface of the web browser can be adjusted to work on different computing devices, such as laptops or phones, without affecting the other tiers.

The logic tier connects to the presentation tier and consists of user authorization processes. In addition, there are scheduling and validation forms on a web server hosted on a different machine.

Lastly, the logic tier connects to the data tier, which consists of a MySQL database hosted on another machine. The database hosts the details of patients, doctors, and specialists. It also contains information on available times for each of the doctors and specialists. Here, the appointment scheduled by the user is also saved in the database.

**Scalability is significantly improved** **when an N-tier architecture is used**. Due to the separation of individual components in the architecture, it is easy to upgrade one component without affecting the performance or operation of the other components. For example, if storage runs low on the database in the data tier, the storage size can be increased easily and this will not affect the presentation and logic tiers.

In addition to scalability, **having separate components makes it easier to maintain each of these components in the architecture without affecting the performance of all the others**.

**Another notable benefit of adopting this architecture is reusability**. This is due to the logical separation of components which allows the architecture to be reused for different applications or projects.

**Security is also heightened** **in an N-tier architecture because the different tiers can be secured with the appropriate security privileges as required**.

**There is always the possibility of increased latency due to the physical separation of components.** However, a fast communication link can largely mitigate this problem.

Additionally,** an increase in the number of tiers tends to increase the complexity of the architecture**. More tiers usually mean there are additional components to maintain and operate.

N-tier architectures are suitable for situations where the communication links are very fast in terms of transmitting and receiving information. This is vital because the communication links are what enable the communication between the different components.

Furthermore, this architecture is mostly suitable for applications where scalability, security, ease of maintenance, and reusability are a priority.

In this article, we’ve reviewed N-tier architectures. First, we defined software architecture and the N-tier architecture. Second, we listed some characteristics and the basic components of the architecture. We also went over the benefits and drawbacks associated with this architecture. Lastly, we discussed situations where it is sensible to use the N-tier architecture.

The post N-Tier Architecture first appeared on Baeldung on Computer Science.

Flowcharts are an excellent resource for systematically illustrating processes and algorithms. **With flowcharts, we can define, for instance, conditional decisions and loops in a visual and simple way.**

LaTeX, in summary, is a system that uses tags and marks to format a document. Several packages are available for LaTeX, including different drawing packages. These drawing packages enable the users to create several graphical resources from scratch.

Among these packages, there exists the one called TikZ. **TikZ is a vector drawing package for general use.** In particular, TikZ provides multiple tools for drawing flowcharts.

In this tutorial, we’ll explore how to draw flowcharts using LaTeX/TikZ. First, we’ll study the organization of a LaTeX/TikZ picture environment. Then, we’ll see a set of TikZ elements usable for drawing flowcharts. Finally, we’ll look at some examples of complete TikZ flowcharts.

Drawing with LaTeX/TikZ requires importing specific packages and organizing the environment with particular tags. In this way, this section presents how to prepare a LaTeX document to draw flowcharts with TikZ.

In our examples, we will use the following code as the base document to draw flowcharts:

```
(1) \documentclass{standalone}
(2) \usepackage{tikz}
(3) \usetikzlibrary{shapes, arrows}
(4) \begin{document}
(5) \begin{tikzpicture}
(6) ...
(7) \end{tikzpicture}
(8) \end{document}
```

Our base document uses the *standalone* class (line 1). Thus, the LaTeX compiler will create a PDF containing only the TikZ image defined in the document. However, we can employ TikZ with other document classes, such as *article* or *book*, with no problems.

In the following line (2), we import the TikZ package to the LaTeX document. This importing line enables us to use the TikZ resources. Line 3, in turn, specifies that we’ll use the shapes and arrows library from TikZ.

Note that it is common to use TikZ to draw charts and graphs besides flowcharts.

Next, we can open the environments of our example document. Line 4 defines the beginning of the LaTeX document itself. Line 5, in turn, indicates the beginning of a TikZ picture environment.

After opening our environments, we can define the TikZ image. In our example, line 6 represents the image definition. But, it is relevant to notice that this definition can have multiple lines.

Finally, lines 7 and 8 show the end of the TikZ picture environment (7) and the LaTeX document (8).

We highlight that, by default, the LaTeX compiler generates PDF documents. However, in the case of standalone images, it is possible to convert them to other file formats, such as PNG and JPG.

**Creating flowcharts with TikZ consists of including graphical elements into an image.** Thus, we must consider the image’s origin point as a reference to properly place these elements.

For drawing our flowcharts, we’ll consider the 2D coordinate system (X, Y) of the TikZ package. In the following image, we can see a simple cartesian plane system of TikZ:

**The origin of the cartesian plane of TikZ is at the middle of the image (yellow point).** Taking the origin point as the referential, the coordinate system to locate graphical elements in the image behaves as follows:

- **Positive X and Y**: the middle point of the graphical element will be located in the upper-right section of the cartesian plane (for example, the red circle)
- **Negative X and Y**: the middle point of the graphical element will be located in the lower-left section of the cartesian plane (for example, the green circle)
- **Negative X and positive Y**: the middle point of the graphical element will be located in the upper-left section of the cartesian plane (for example, the blue circle)
- **Positive X and negative Y**: the middle point of the graphical element will be located in the lower-right section of the cartesian plane (for example, the purple circle)

There are several elements for drawing flowcharts. These elements are mostly simple geometric figures. In this section, we’ll explore some of these flowchart elements in the context of LaTeX and TikZ.

It is relevant to highlight that, in the following subsections, we will show the definition command to create flowchart elements using TikZ. Therefore, we should include the definition commands before the begin document command in the LaTeX file (\begin{document}).

The *terminator* element indicates the beginning or end of a process. To define a terminator in TikZ, we can use the following definition command:

`\tikzstyle{terminator} = [rectangle, draw, text centered, rounded corners, minimum height=2em]`

Next, we can see the terminator element image:

Finally, we can include this element as a node in a TikZ picture environment as follows:

`\node at (X,Y) [terminator] (t-id) {Terminator};`

The *process* element represents a processing function in the flowchart. So, the TikZ definition line for the process element is presented next:

`\tikzstyle{process} = [rectangle, draw, text centered, minimum height=2em]`

The following image depicts the process element:

We can reproduce the image in a TikZ picture environment with the subsequent command:

`\node [process] at (0,0) (p-id) {Process};`

The *decision* element represents two or more branches on the flowchart. Thus, it defines different paths taken according to particular conditions. We can define the decision element with the line in next:

`\tikzstyle{decision} = [diamond, draw, text centered, minimum height=2em]`

The image shown next depicts the decision element:

With the previous definitions done, we can insert the decision element in the flowchart using the following command:

`\node [decision] at (X,Y) (d-id) {Decision};`

The data element indicates an operation of input or output in the flowchart. So, this element means that data is being introduced to or provided by the system. A viable manner to define this element is presented next:

`\tikzstyle{data}=[trapezium, draw, text centered, trapezium left angle=60, trapezium right angle=120, minimum height=2em]`

The image below demonstrates the data input/output element:

The following line inserts a data element into a flowchart:

`\node [data] at (X,Y) (dio-id) {Data\\In/Out};`

The connector element represents the relationship between two elements. A set of connectors, in turn, creates a path through the flowchart. We can see the definition of the connector element next:

`\tikzstyle{connector} = [draw, -latex']`

The following image presents the connector element shape:

We can use the subsequent line to produce a connector between two elements (*e1-id* as the origin and *e2-id* as the destination):

`\path [connector] (e1-id) -- (e2-id);`

We highlight that the double hyphen between the elements indicates a straight line connecting them. To create right-angled (elbow) connections in the flowchart, we can replace one hyphen with a pipe: |- or -|.

**In this section, we’ll see different ways to create flowcharts using the elements shown in the previous section.** First, we’ll understand how to build a flowchart by explicitly defining the coordinates of the elements. Then, we’ll build the same flowchart through relative positions instead of explicit coordinates.

**Using the coordinates, we can place the elements in any location of the TikZ cartesian plane with no restrictions.** Thus, employing explicit coordinates makes the process of creating flowcharts very generic and customizable.

The following LaTeX/TikZ code shows the TikZ picture environment to create a simple flowchart using explicit coordinates:

```
\begin{tikzpicture}
\node [terminator, fill=blue!20] at (0,0) (start) {\textbf{Start}};
\node [data, fill=blue!20] at (0,-2) (data) {Provide data};
\node [decision, fill=blue!20] at (0,-5) (decision) {Valid data?};
\node [process, fill=red!20] at (3.5,-5) (error) {Error};
\node [process, fill=green!20] at (0,-8) (success) {Success};
\node [terminator, fill=blue!20] at (0,-10) (end) {\textbf{End}};
\node[draw=none] at (1.85, -4.75) (no) {No};
\node[draw=none] at (0.35, -6.75) (yes) {Yes};
\path [connector] (start) -- (data);
\path [connector] (data) -- (decision);
\path [connector] (decision) -- (error);
\path [connector] (decision) -- (success);
\path [connector] (error) |- (end);
\path [connector] (success) -- (end);
\end{tikzpicture}
```

The image below depicts the flowchart generated by the previously presented code:

The flowchart contains a total of 12 elements: two terminators (*start* and *end*), one data element (*data*), one decision element (*decision*), two processes (*error* and *success*), and six connectors.

The coordinates of the elements were explicitly defined (*at (X, Y)*). The exception, however, is the connectors. **Connectors don’t require a coordinate definition, only the identifiers of the elements they connect.**

We also employed the node modifier *fill* to set colors for the elements. Furthermore, we used extra nodes for labeling specific connectors (*no* and *yes*).

Despite decreasing the customization possibilities compared to explicitly defining coordinates, the relative positioning of elements in a flowchart can bring several advantages.

Among these advantages, we can cite, for example, the intuitiveness of building flowcharts and the standardization of the distance between elements in the flowchart.

The following LaTeX/TikZ code shows the TikZ picture environment to create a simple flowchart using relative positioning:

```
\begin{tikzpicture}[node distance = 3cm]
\node [terminator, fill=blue!20] (start) {\textbf{Start}};
\node [data, fill=blue!20, below of=start] (data) {Provide data};
\node [decision, fill=blue!20, below of=data] (decision) {Valid data?};
\node [process, fill=red!20, right of=decision] (error) {Error};
\node [process, fill=green!20, below of=decision] (success) {Success};
\node [terminator, fill=blue!20, below of=success] (end) {\textbf{End}};
\node[draw=none] at (1.60, -5.75) (no) {No};
\node[draw=none] at (0.35, -7.80) (yes) {Yes};
\path [connector] (start) -- (data);
\path [connector] (data) -- (decision);
\path [connector] (decision) -- (error);
\path [connector] (decision) -- (success);
\path [connector] (error) |- (end);
\path [connector] (success) -- (end);
\end{tikzpicture}
```

The image generated by this code is presented next:

It is relevant to note that the definition of coordinates got replaced with node positioning modifiers (*below of*, *above of*, *right of*, *left of*). So, we need to provide a reference to position each new element.

The reference, in our case, is any previously placed element, and the global reference is the first placed element (the terminator *start*).

Furthermore, we employed the *node distance* modifier of the TikZ picture environment to determine the standard distance between the centers of neighboring nodes.

Finally, we defined the coordinates of the labeling nodes (*no* and *yes*) once they demand a specific positioning on the image.

**In this tutorial, we learned about drawing flowcharts with LaTeX/TikZ.** First, we took a look at how the LaTeX/TikZ environment works. Then, we studied the main elements used to build flowcharts, observing how to create them with LaTeX/TikZ. Finally, we saw how to build complete flowcharts with these elements.

We can conclude that LaTeX/TikZ is a very versatile tool for creating flowcharts. Specifically, the TikZ package enables the user to draw many shapes natively, and its libraries provide many of these shapes already defined.

The post How to Draw Flowcharts With LaTeX first appeared on Baeldung on Computer Science.

In this tutorial, we’ll present three ways to check the periodicity of a string. In other words, we’ll check if it’s a repetition of one of its substrings.

**We have a string s of length n and want to check if we can obtain it by concatenating one of its proper prefixes to itself multiple times.** The substring of s we repeatedly concatenate to get s has to be its prefix. Otherwise, it wouldn’t be possible to form the whole string.

For example, if s = abcabcabc, then we get s by concatenating abc two times to itself:

abc + abc + abc = abcabcabc

which we write as s = (abc)^3. Since, for instance, bca isn’t a prefix of s, we can’t get s from it.

So, formally, **the prefix we want to find is of the form p = s[1..k], where 1 ≤ k ≤ n/2** (since p must occur at least twice, its length can’t exceed n/2).

Let’s see how we can determine it.

**We can iterate over the prefixes of s and check if concatenating them gives us s.** To test if s[1..k] concatenates to s, we iterate over s and check if s[i] = s[1 + ((i − 1) mod k)] for all i = k + 1, k + 2, …, n. This way, we cyclically iterate over the candidate prefix:

If we find an i for which the test fails, we can stop the iteration. Otherwise, we return true, since we’ve proven that s = p^m for some integer m (here, m = n/k, so this conclusion also requires k to divide n).

Here’s the pseudocode:
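As a concrete sketch of the naive algorithm (with an explicit length check, since a prefix can only tile the string when its length divides the string’s length):

```python
# Naive check: for every candidate prefix length k, compare each
# symbol against the cyclically repeated prefix (O(n^2) worst case).
def is_periodic_naive(s):
    n = len(s)
    for k in range(1, n // 2 + 1):
        matches = all(s[i] == s[i % k] for i in range(k, n))
        # the prefix can only tile s if its length divides n
        if matches and n % k == 0:
            return True
    return False

print(is_periodic_naive("abcabcabc"))  # True  (s = (abc)^3)
print(is_periodic_naive("abcab"))      # False
```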

**This algorithm’s worst-case time complexity is O(n²).** Here’s why. The worst-case input is a string of the form aa…ab: n − 1 copies of the same symbol followed by a different one. Given such an input, the algorithm runs about n − k symbol equality tests for each value of k in the outer loop. So, the total number of checks is Θ(n²).

The complexity would stay quadratic even if we started checks from i = m + 1 in the inner loop, skipping the unnecessary checks of s[1], s[2], …, s[m]:

**The problem with this approach is that it tries to get s even from the prefixes for which we can conclude in advance that they can't form s**. The key is to observe that **m has to divide n without remainder for s to be a repetition of s[1..m]**. For instance, let n = 12 and m = 5. Even without trying to get the string from the prefix, we can say that it's an impossible task because 5 doesn't divide 12.

**So, we start from m = 1 in the outer loop as before but don't test whether s = (s[1..m])^k unless m is a divisor of n**.

**The iteration can stop at m = n/2** because no number greater than n/2 divides n except n itself, and m has to be strictly lower than n:

Here’s the pseudocode:
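A Python sketch of this divisor-based version (the function name is ours):

```python
def is_periodic_divisors(s):
    """Like the naive check, but only prefix lengths m that divide
    n = len(s) are tested, and m goes up to n // 2 at most."""
    n = len(s)
    for m in range(1, n // 2 + 1):
        if n % m != 0:               # m can't form s; skip the inner loop
            continue
        # cyclically compare the rest of s with the prefix s[:m]
        if all(s[i] == s[i % m] for i in range(m, n)):
            return True
    return False
```

The divisibility test costs O(1) per m, so the expensive inner loop runs only for the d(n) divisors of n.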

Another way would be to identify the divisors of n in advance to skip the unnecessary checks of whether m divides n.

Let's derive the worst-case complexity of this approach. The worst-case input is of the same form as in the previous algorithm (a^(n−1)b). The iterations of the inner loop still perform O(n) checks, but since we enter the inner loop only if m divides n, **the number of inner loops isn't n/2 but is equal to d(n), where d(n) is the number of different divisors of n**.

**Since d(n) ≤ n/2, the total complexity is O(n · d(n))**, so the algorithm is definitely sub-quadratic. We can get a tighter bound if we use the number-theoretic result that d(n) = o(n^ε) for every ε > 0.

From there, we conclude that n · d(n) = o(n^(1+ε)), **so the algorithm's complexity is**:

o(n^(1+ε)) for every ε > 0

That means that the algorithm's worst-case performance is better than quadratic but worse than linear. Is there a way to solve the problem in linear time?

We can design a linear-time algorithm by using the theory of strings.

We'll use the theorem which states that **a string s is a repetition of one of its substrings if and only if s is a non-trivial rotation of itself**. Let's prove it.

If s = p^k for some prefix p of length m, then by removing the first m symbols and appending them after the last occurrence of p in s, we get the same string.

To prove the theorem in the other direction, we assume that s is a non-trivial rotation of itself. That means that there's an m (1 ≤ m < n) such that we can remove the first m symbols, append them to the rest of s in the same order, and obtain the starting string, s:

Since the LHS and RHS are equal, it must hold that s[1] = s[m+1], s[2] = s[m+2], …, s[n−m] = s[n]. So, we've proven that:

s[1..m] = s[m+1..2m]

However, since the rotation of s is equal to s, we can rotate it again by m symbols to get the same string and conclude that s[1] = s[2m+1], s[2] = s[2m+2], …, so s[1..m] = s[2m+1..3m]. Following the approach, we get that s = (s[1..m])^(n/m).

**So, the problem is now to check whether the input string s is a non-trivial rotation of itself. If that's the case, s will be a proper substring of ss (not starting at the 1st or the (n+1)-st position)**. Therefore, **we reduce the original problem to a string search.** We try to find an index of s in ss different from 1 and n + 1. If we find one, say j, we'll have shown that s is periodic, and the building block will be s[1..j−1].

We can use an efficient string-matching algorithm that uses the substring index of ss (constructible in O(n) time) and then looks for a non-trivial occurrence of s in O(n) time. Therefore, **we can solve the original problem in linear worst-case time**.
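The reduction can be sketched in Python; here, `str.find` stands in for an efficient matcher (CPython's implementation is effectively linear, but that's an implementation detail):

```python
def is_periodic_rotation(s):
    """s is periodic iff it occurs in s + s at a position other than
    0 and len(s), i.e., iff it's a non-trivial rotation of itself."""
    n = len(s)
    j = (s + s).find(s, 1)   # first occurrence starting after index 0
    return 0 < j < n         # a hit before position n is non-trivial
```

For example, `"abab"` occurs in `"abababab"` at index 2, so the function returns `True`; for `"abc"`, the only further occurrence is the trivial one at index 3.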

**In this article, we presented three algorithms to check string periodicity.** The first and the second ones are easier to understand and implement than the third one. However, the rotation algorithm is more efficient. Its worst-case complexity is O(n), whereas the other two have super-linear time complexities.

In this tutorial, we’ll present Binary Insertion Sort. It’s a variant of Insertion Sort that uses Binary Search to improve performance.

**Insertion Sort sorts the input array a by iteratively growing a sorted sub-array at the start of a.**

So, Insertion Sort first checks if a[2] < a[1] and swaps them if that's the case. Then, it finds where it should insert a[3] so that it holds that a[1] ≤ a[2] ≤ a[3] (a[i] is the i-th element of the current array a, whereas a_i is the initial value of a[i]). It continues like this, growing the sorted sub-array one element at a time. Before it inserts a_i, the sorted sub-array consists of the elements initially at positions 1 through i − 1, but now in the sought order:

**Insertion Sort inserts a_i right at the place that makes a[1..i] also sorted.** When it inserts a_n at the appropriate position, the whole array becomes non-descending.

This is the pseudo-code of Insertion Sort:
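For reference, here's the algorithm as runnable Python (0-indexed):

```python
def insertion_sort(a):
    """Sort list a in place by growing a sorted prefix a[:i]."""
    for i in range(1, len(a)):
        x = a[i]                    # element to insert
        j = i - 1
        while j >= 0 and a[j] > x:  # shift larger elements one step right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x                # insert x at its place
    return a
```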

The worst-case input is an array sorted in the opposite way (a_1 > a_2 > … > a_n). In that case, Insertion Sort has to do i − 1 comparisons and swaps for each a_i. In total, it does 1 + 2 + … + (n − 1) = n(n − 1)/2 swaps and performs the same number of comparisons. Therefore, **the algorithm has a quadratic worst-case time complexity.**

The average-case complexity of Insertion Sort is also O(n^2).

**The idea behind Binary Insertion Sort is to use Binary Search to find where to place each a_i. The goal is to reduce the number of comparisons.**

This is the pseudo-code of BIS:
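A Python sketch using the standard bisect module for the Binary Search step:

```python
from bisect import bisect_right

def binary_insertion_sort(a):
    """Insertion Sort that finds each insertion point with Binary Search.
    The element shifts are unchanged; only comparisons are saved."""
    for i in range(1, len(a)):
        x = a[i]
        # binary search for the insertion point in the sorted prefix a[:i]
        pos = bisect_right(a, x, 0, i)
        # shift a[pos:i] one step right, then insert x
        a[pos + 1:i + 1] = a[pos:i]
        a[pos] = x
    return a
```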

**The number of swaps is the same as in the standard version of Insertion Sort.**

In the worst case, we perform approximately log2(i) comparisons for each a_i with i > 2, and do exactly one comparison for a_2:

sum_{i=2}^{n} log2(i) = log2(n!)

Using Stirling's approximation, we get:

log2(n!) ≈ n·log2(n) − n·log2(e) ∈ O(n log n)

So, we conclude that **the number of comparisons Binary Insertion Sort performs is log-linear in n**. **However, since the number of swaps is O(n^2), both algorithms are asymptotically the same in the worst case.** That's also true of their average-case complexities.

Why should we bother implementing binary search and using it within Insertion Sort if the resulting complexity doesn’t change? It seems Binary Insertion Sort isn’t worth the extra work. The answer is that, **although asymptotically equivalent to the standard version of Insertion Sort, Binary Insertion Sort usually works faster in practice. It compares fewer elements because of binary search.**

**If the elements are complex, and comparing two objects takes a lot of time, the time spent comparing the items will dominate that spent exchanging them.** In such cases, the improvement brought by binary search pays off significantly. If we deal with simple types such as numbers or characters, we probably won't notice any difference. However, in most real-world applications, our data will be more intricate.

In this article, we talked about Binary Insertion Sort. It's a variant of Insertion Sort that uses Binary Search to find where to place a_i in the sorted sub-array while iterating over the input.

Although Binary Search reduces the number of comparisons to O(n log n) in the worst case, **Binary Insertion Sort has a quadratic time complexity, just as Insertion Sort. Still, it's usually faster than Insertion Sort in practice, which is apparent when comparison takes significantly more time than swapping two elements.**

In this tutorial, we’ll analyze two common processing techniques used by the operating system (OS). We usually need these two concepts when we have multiple processes waiting to be executed.

Before getting into too much detail about concurrency and parallelism, let’s have a look at the key definitions used in the descriptions of these two processing methods:

- **Multiprocessing:** The employment of two or more central processing units (CPUs) within a single computer system is known as multiprocessing.
- **Multithreading:** This technique allows a single process to have multiple code segments, like threads. These segments run concurrently within the context of that process.
- **Distributed computing:** A distributed computing system consists of multiple computer systems that run as a single system. The computers in a system can be physically close to each other and connected by a local network, or they can be distant and connected by a wide area network.
- **Multicore processor:** It's a single integrated processor that includes multiple core processing units. It's also known as a chip multiprocessor (CMP).
- **Pipelining:** It's a technique where multiple instructions are overlapped during execution.

Concurrency actually means that multiple tasks can be executed in an overlapping time period. One of the tasks can begin before the preceding one is completed; however, they won’t be running at the same time. The CPU will adjust time slices per task and appropriately switch contexts. That’s why this concept is quite complicated to implement and especially debug.

The main aim of concurrency is to maximize the CPU by minimizing its idle time. **While the current thread or process is waiting for input-output operations, database transactions, or launching an external program, another process or thread receives the CPU allocation.** On the kernel side, the OS sends an interrupt to the active task to stop it:

If two or more jobs are running on the same core of a single-core or multi-core CPU, they can access the same resources at the same time. Data read operations performed in parallel are safe, but during write accesses, programmers must maintain data integrity.

Efficient process scheduling has a crucial role in a concurrent system. First-in, first-out (FIFO), shortest-job-first (SJF), and round-robin (RR) are popular task scheduling algorithms.
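To make the idea concrete, here's a toy round-robin simulation in Python (the task names, burst times, and quantum below are invented for illustration):

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Simulate round-robin scheduling and return completion times.
    burst_times maps task name -> total CPU time the task needs."""
    ready = deque(burst_times.items())   # FIFO ready queue
    clock, completion = 0, {}
    while ready:
        task, remaining = ready.popleft()
        time_slice = min(quantum, remaining)
        clock += time_slice              # the task runs for one slice
        if remaining > time_slice:
            ready.append((task, remaining - time_slice))  # preempted, requeue
        else:
            completion[task] = clock     # task finished
    return completion
```

For instance, with task A needing 3 time units, task B needing 2, and a quantum of 2, B finishes at time 4 and A at time 5.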

As we mentioned, it can be complicated to implement and debug concurrency, especially at the kernel level. For instance, there can be starvation between processes when one of the tasks gets the CPU for too long. Interrupts are designed to prevent this situation: they help the OS take the CPU away and allocate it to other processes. This is also called preemptive scheduling. The OS, like any other application, requires CPU time to adjust concurrent tasks.

Parallelism is the ability to execute independent tasks of a program in the same instant of time. **Contrary to concurrent tasks, these tasks can run simultaneously on another processor core, another processor, or an entirely different computer, as in a distributed system.** As the demand for computing speed from real-world applications increases, parallelism becomes more common and affordable.

**The figure below represents an example of distributed systems.** As we previously mentioned, a distributed computing system consists of multiple computer systems, but it’s run as a single system. The computers that are in a system can be physically close to each other and connected by a local network, or they can be distant and connected by a wide area network:

Parallelism is a must for performance gain. There's more than one benefit of parallelism, and we can implement it on different levels of abstraction:

- As we can see in the figure above, distributed systems are one of the most important examples of parallel systems. They’re basically independent computers with their own memory and IO.
- For example, we can have multiple functional units, like several adders and multipliers, managed by one instruction set.
- Process pipelining is another example of parallelism.
- Even at chip-level, parallelism can increase concurrency in operations.
- We can also take advantage of parallelism by using multiple cores on the same computer. This makes various edge devices, like mobile phones, possible.

Let’s take a look at how concurrency and parallelism work with the below example. As we can see, there are two cores and two tasks. In a concurrent approach, each core is executing both tasks by switching among them over time. In contrast, the parallel approach doesn’t switch among tasks, but instead executes them in parallel over time:

This simple example for concurrent processing can be any user-interactive program, like a text editor. In such a program, there can be some IO operations that waste CPU cycles. When we save a file or print it, the user can concurrently type. The main thread launches many threads for typing, saving, and similar activities concurrently. They may run in the same time period; however, they aren’t actually running in parallel.
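We can mimic this with Python's threading module (a toy sketch; the task names and delays are invented, and `time.sleep` stands in for the IO work):

```python
import threading
import time

results = []

def io_task(name, delay):
    """Simulate an IO-bound operation (e.g., saving or printing)."""
    time.sleep(delay)        # while sleeping, other threads get the CPU
    results.append(name)

# the main thread launches workers, much like a text editor would
# for saving and printing while the user keeps typing
threads = [
    threading.Thread(target=io_task, args=("save", 0.3)),
    threading.Thread(target=io_task, args=("print", 0.1)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))       # both IO tasks completed concurrently
```

The threads overlap in time, but due to the global interpreter lock, CPython runs their bytecode concurrently rather than in parallel, which matches the distinction drawn above.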

In contrast, we can give an example of Hadoop-based distributed data processing for a parallel system. It entails large-scale data processing on many clusters and it uses parallel processors. Programmers see the entire system as a single database.

As we noted earlier in this tutorial, concurrency and parallelism are complex ideas and require advanced development skills. Otherwise, there could be some potential risks that jeopardize the system’s reliability.

For example, if we don’t carefully design the concurrent environment, there can be deadlocks, race conditions, or starvation.

Similarly, we should also be careful when we’re doing parallel programming. We need to know where to stop and what to share. Otherwise, we could face memory corruption, leaks, or errors.

Simultaneously executing processes and threads is the main idea behind concurrent programming languages. On the other hand, languages that support parallelism offer programming constructs that can be executed on more than one machine. Instruction and data streams are key terms in the taxonomy of parallelism.

These languages include some important concepts. Instead of learning the language itself, it would be better to understand the fundamentals of these subjects:

- **Systems programming:** It's basic OS and hardware management that can include system call implementation and writing a new scheduler for an OS.
- **Distributed computing:** As we mentioned earlier, it's a must for parallel CPUs to be utilized.
- **Performance computing:** This concept is necessary for CPU resource optimization.

Now let’s categorize the different languages, frameworks, and APIs:

- **Shared memory languages:** Orca, Java, C (with some additional libraries)
- **Object-oriented parallelism:** Java, C++, Nexus
- **Distributed memory:** MPI, Concurrent C, Ada
- **Message passing:** Go, Rust
- **Parallel functional languages:** LISP
- **Frameworks and APIs:** Spark, Hadoop

These are just some of the languages we can use for concurrency and parallelism. Instead of a whole language, there are also library extensions, such as the POSIX threads (pthreads) library for the C programming language. With this library, we can implement almost all of the concurrent programming concepts, such as semaphores, multi-threading, and condition variables.

In this article, we discussed how concurrency and parallelism work, and the differences between them. We shared some examples related to these two concepts, and explained why we need them in the first place. Lastly, we gave a brief summary of the potential pitfalls in concurrency and parallelism, and listed the programming languages that support these two important concepts.

The post Concurrency vs Parallelism first appeared on Baeldung on Computer Science.

In this tutorial, we'll talk about tabulation and memoization as two techniques of dynamic programming.

Dynamic Programming (DP) is an optimization paradigm that finds the optimal solution to the initial problem by solving its sub-problems and combining their solutions, usually in polynomial time. In doing so, DP makes use of Bellman’s Principle of Optimality, which we state as follows:

A sub-solution of the entire problem’s optimal solution is the optimal solution to the corresponding sub-problem.

So, DP first divides the problem so that the optimal solution of the whole problem is a combination of the optimal solutions to its sub-problems. But, the same applies to the sub-problems: their optimal solutions are also combinations of their sub-problems’ optimal solutions. This division continues until we reach base cases.

Therefore, **each problem we solve using DP has a recursive structure that respects Bellman’s Principle.** We can solve the problem by traversing the problem’s structure top-down or bottom-up.

**Tabulation is a bottom-up method for solving DP problems.** It starts from the base cases and finds the optimal solutions to the problems whose immediate sub-problems are the base cases. Then, it goes one level up and combines the solutions it previously obtained to construct the optimal solutions to more complex problems.

Eventually, tabulation combines the solutions of the original problem’s subproblems and finds its optimal solution.

Let's say that we have an m × n grid. The cell (i, j) contains c(i, j) coins (c(i, j) ≥ 0). Our task is to find the maximal amount of coins we can collect by traversing a path from the cell (1, 1) to the cell (m, n). In moving across the board, we can move from a cell to its right or bottom neighbor. We consider c(i, j) collected once we reach the cell (i, j).

Here’s an example of a grid, with the optimal path highlighted:

Let's suppose that P is the optimal path from (1, 1) to (m, n) and that it goes through cell (i, j). Then, the part of P from (1, 1) to (i, j), let's call it Q, represents the optimal path between the two cells. If it wasn't so, then there would be a better path Q' from (1, 1) to (i, j). Consequently, P wouldn't be the optimal path to (m, n) because the concatenation of Q' and the rest of P would be better, and that would be a contradiction.

Now, let's say that we've determined the optimal path to (i, j) and its total yield f(i, j). Based on the previous analysis and the problem definition, we conclude that the path must pass through (i − 1, j) or (i, j − 1). So, the recursive definition of f(i, j) is:

f(i, j) = max{f(i − 1, j), f(i, j − 1)} + c(i, j)

However, what holds for (i, j), also holds for any cell of the grid. When we account for the base cases, we get the recursive definition of f:

(1)

f(1, 1) = c(1, 1)
f(i, 1) = f(i − 1, 1) + c(i, 1), for i > 1
f(1, j) = f(1, j − 1) + c(1, j), for j > 1
f(i, j) = max{f(i − 1, j), f(i, j − 1)} + c(i, j), for i, j > 1

To reconstruct the path, we can use an auxiliary matrix d, where d(i, j) = left if the optimal path reaches (i, j) from the left (i.e., by moving one cell right from the predecessor cell), and d(i, j) = up if it does so by moving one cell down. However, to keep things simple(r), we'll omit path tracing and focus only on computing f(m, n).

Now, we reverse the direction of the recursion. We start from the base case f(1, 1) = c(1, 1). From there, we find the yields of the paths to the cells in the first column and the first row. Then, we compute the values for the second column and row, and so on:

Processing the L-shaped stripes one by one, we get our tabulation algorithm:
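Here's a sketch of the tabulation algorithm in Python (0-indexed, with the grid given as a list of lists):

```python
def max_coins_tabulation(c):
    """Bottom-up DP over an m x n grid of coin counts c.
    f[i][j] is the best yield of a path from (0, 0) to (i, j)."""
    m, n = len(c), len(c[0])
    f = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                f[i][j] = c[0][0]                        # base case
            elif i == 0:
                f[i][j] = f[i][j - 1] + c[i][j]          # first row
            elif j == 0:
                f[i][j] = f[i - 1][j] + c[i][j]          # first column
            else:
                f[i][j] = max(f[i - 1][j], f[i][j - 1]) + c[i][j]
    return f[m - 1][n - 1]
```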

This algorithm's time and space complexities are O(mn).

**Tabulation algorithms are usually very efficient. Most of the time, they’ll have polynomial complexity.** And, since they’re iterative, they don’t risk throwing the stack overflow error. However, there are certain drawbacks.

First, we invest a lot of effort into identifying the problem’s recursive structure to get the recurrence right. However, **the tabulation algorithms aren’t recursive**. What’s more, they solve the problems in the reverse direction: from the base cases to the original problem. That’s the first drawback of tabulation. **It may be hard to get the tabulation algorithm from a recursive formula.**

Also, recursive relations such as (1) are natural descriptions of the problems to solve. So, they’re easier to understand than an iterative algorithm in which the recursive structure of the problem isn’t as apparent.

Finally, being systematic in constructing more complex problems from the simpler ones, tabulation algorithms may compute the solutions to sub-problems that aren't necessary for solving the original problem.

**Is there a way to have the efficiency of tabulation and keep the elegance and understandability of recursion?** Yes, there is, and it’s called memoization.

The idea behind it is as follows. **First, we write the recursive algorithm as usual. Then, we enrich it with a memory structure where we store the solutions to the sub-problems we solve.** If we reencounter the same sub-problem during the execution of the recursive algorithm, we don’t re-calculate its solution. Instead, we read it from memory.

**This way, we avoid repeated computation and reduce the time complexity to the number of different sub-problems.** We do so at the expense of the space complexity but don’t use more memory than the corresponding tabulation algorithm.

Here’s the memoization algorithm for the grid problem:
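A Python sketch using functools.lru_cache as the memory structure (0-indexed; the recursion mirrors the recurrence directly):

```python
from functools import lru_cache

def max_coins_memo(c):
    """Top-down DP for the grid problem: each sub-problem's solution
    is cached the first time it's computed and reused afterwards."""
    @lru_cache(maxsize=None)
    def f(i, j):
        if i == 0 and j == 0:
            return c[0][0]
        if i < 0 or j < 0:                  # off the grid
            return float("-inf")
        return max(f(i - 1, j), f(i, j - 1)) + c[i][j]

    return f(len(c) - 1, len(c[0]) - 1)
```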

This algorithm's time and space complexities are O(mn).

Let’s draw the first three levels of the memoization’s recursion tree:

The algorithm calculates the value f(m − 1, n − 1) while computing f(m − 1, n) in the root's left sub-tree. Later, when it calculates f(m, n − 1) in the right sub-tree, it doesn't have to compute f(m − 1, n − 1) from scratch. Instead, it reuses the readily available value from the memory.

So, we see that memoization effectively prunes the execution tree.

Even though it's intuitive and efficient, memoization sometimes isn't an option. **The problem is that it can cause the stack overflow error. That will be the case if the input size is too big.** Being iterative, the tabulation algorithms don't suffer from the same issue.

A more subtle but still relevant issue is related to the memory we use to store the results. **For memoization to work, we need an O(1)-access memory structure.** Hash maps offer constant access time in expectation. However, they require a hash function for the (sub-)problems. In the coin example, it's easy to devise a quality hash. However, the problems may be complex, so a hash could be very hard to design.

Finally, **some authors don’t consider memoization a DP tool**, whereas others do. That isn’t a drawback by itself but is worth keeping in mind.

**In this article, we talked about tabulation and memoization**. Those are the bottom-up and top-down approaches of dynamic programming (DP). Although there’s no consensus about the latter being a DP technique, **we can use both methods to obtain efficient algorithms**.

While the memoization algorithms are easier to understand and implement, they can cause the stack overflow (SO) error. The tabulation algorithms are iterative, so they don’t throw the SO error but are generally harder to design.

The post Tabulation vs. Memoization first appeared on Baeldung on Computer Science.

In this article, we'll elaborate on two cryptographic algorithms, namely MD5 (message-digest algorithm) and SHA (Secure Hash Algorithm). We'll discuss them in detail, and after that, we'll compare them.

To begin with, let’s define a cryptographic hash function, a fundamental element of both mentioned algorithms. **A cryptographic hash function takes a variable-length input and produces fixed-size output called a hash**. In other words, it maps an arbitrarily large input into a fixed-size array of bits (hash).

**A cryptographic hash function should be a one-way operation**. Therefore, retrieving the data using its hash should be impossible. In general, one shouldn't be able to guess or retrieve any useful information from the hash, so cryptographic hash functions are required to be pseudorandom. Moreover, **a cryptographic hash function needs to be collision-resistant**. It should be infeasible to find two different messages that produce the same hash.

Cryptographic hash functions are often used to check data integrity and identify files. **It’s easier and faster to compare hashes than to compare the data itself.** Further, they are used for authentication purposes, storing confidential data (e.g., passwords) in databases, or for password verification. As we can see, cryptographic hash functions are strongly related to an application or data security. Therefore, they should be secure and reliable.

**MD5 is a cryptographic hash function that takes arbitrarily long data and produces a 128-bit hash**. Although it's considered to be cryptographically broken, it's still widely used for some purposes. One of the most common uses is validating the integrity of publicly shared files. The MD5 algorithm processes data in 512-bit chunks split into 16 words composed of 32 bits each. The result is a 128-bit hash.

Let’s see the MD5 hashing in practice. Consider the following example:

`MD5("The grass is always greener on the other side of the fence.") = d78298e359ac826549e3030104241a57`

Just a simple change in the input (replacing dot with exclamation mark) produces an entirely different hash:

`MD5("The grass is always greener on the other side of the fence!") = 2e51f2f8daec292839411955bd77183d`

Such a property is called an avalanche effect.
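We can observe the avalanche effect with Python's hashlib:

```python
import hashlib

msg = "The grass is always greener on the other side of the fence."

# hash the sentence and a version with a single character changed
h1 = hashlib.md5(msg.encode()).hexdigest()
h2 = hashlib.md5(msg.replace(".", "!").encode()).hexdigest()

print(len(h1))    # 32 hex digits = 128 bits
print(h1 == h2)   # False: one changed character flips the whole hash
```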

As we mentioned earlier, the MD5 is considered to be cryptographically broken. Let’s talk in detail about its security.

Let's recall one of the most essential attributes of a cryptographic hash function: it needs to be **collision-resistant**. In simple words, **it should be infeasible to find two inputs that produce the same hash**.

In 2011, the Internet Engineering Task Force (IETF) published RFC 6151, describing possible attacks on MD5. Some attacks could generate collisions in less than a minute on an average computer. The research stated that:

the aforementioned results have provided sufficient reason to eliminate MD5 usage in applications where collision resistance is required such as digital signatures.

Thus, **MD5 is no longer recommended for solutions requiring a high level of security**. However, as we mentioned earlier, it's widely used as a checksum for files. Let's consider an example. An indie developer publishes a game free of charge. The game file has a specific hash value assigned. We download the game from a third-party site. If the hash of the downloaded file differs, it isn't the original one. Thus, it can be a virus, or the files may have been damaged while downloading (e.g., due to network issues).

To sum up, the MD5 algorithm has security vulnerabilities, and it’s considered cryptographically broken. Nowadays, there are more secure algorithms like SHA-2. Let’s introduce it.

SHA is a widely used family of hash algorithms. There are currently three main versions, namely SHA-1, SHA-2, and SHA-3. In this article, we'll focus on the popular SHA-2 algorithm. **SHA-2 consists of different variants which use the same algorithm but different constants.** Therefore, they produce outputs of different lengths, e.g., 224, 256, or 512 bits. The variants are often referred to as SHA-224, SHA-256, SHA-512, etc., although they are all versions of SHA-2. Let's use the examples from the MD5 section and see SHA-256 in practice:

`SHA256("The grass is always greener on the other side of the fence.") = d017bcafd6aa208df913d92796f670df44cb8d7f7b548d6f9eddcccf214ac08a`

`SHA256("The grass is always greener on the other side of the fence!") = a8c655db7f4d0a3a0b34209f3b89d4466332bbf2745e759e01567ac74b23a349`
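We can verify the variants' output lengths with Python's hashlib:

```python
import hashlib

msg = b"The grass is always greener on the other side of the fence."

# each SHA-2 variant produces a digest of a different fixed length
bits = [len(hashlib.new(name, msg).hexdigest()) * 4
        for name in ("sha224", "sha256", "sha512")]

print(bits)  # [224, 256, 512]
```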

SHA-2 is known for its security. It's used for multiple purposes like cryptocurrencies, TLS, SSL, SSH, password hashing, and digital signature verification. Moreover, SHA-2 is required by law in some U.S. government applications, primarily to protect confidential data.

Let's analyze the security of the SHA-256 algorithm. It's one of the most secure and popular hashing algorithms. First of all, it's a one-way operation. Therefore, it's practically impossible to reconstruct the input from the hash. Theoretically, a brute-force attack would need 2^256 attempts to achieve this.

Secondly, **SHA-256 is collision-resistant.** This is because there are 2^256 possible hash values. Therefore, there is almost no chance of a collision in practice.

Finally, the SHA-256 follows the avalanche effect. A small change in the input produces a completely different hash.

To sum up, **SHA-256 meets all of the important requirements of the cryptographic hash function**. Thus, it’s very often used in applications requiring a high level of security.

Now we know the fundamentals of MD5 and SHA-2, so let's compare them. First of all, MD5 produces 128-bit hashes. SHA-2 contains variants that can produce hashes of different lengths. The most common is SHA-256, which produces 256-bit hashes.

Secondly, **SHA-2 is more secure than MD5, especially in terms of collision resistance**. Therefore, MD5 isn't recommended for high-security purposes. On the other hand, SHA-2 is used for high-security purposes, e.g., digital signatures or SSL handshakes. Moreover, there are fewer reported attacks on SHA-2 than on MD5. MD5 is considered to be cryptographically broken and can be attacked by an average computer.

In terms of speed, MD5 is slightly faster than SHA-2. Therefore, MD5 is often used as a checksum for verifying file integrity.

To sum up, in most cases, SHA-2 will do better than MD5. It's more secure, reliable, and less likely to be broken. The fact that SHA-2 is slightly slower than MD5 doesn't really matter unless speed is the main criterion. SHA-2 has variants that produce hashes of different lengths, and the longer the hash, the slower the algorithm. Thus, SHA-256 seems to be the best balance between security and speed.

In this article, we discussed the MD5 and SHA-2 algorithms in detail. Then, we compared both. **The conclusion is that SHA-2 does better than MD5 in most cases, especially regarding security**. On the other hand, MD5 can be used in solutions that don't require a high level of security and where speed is the main criterion.