# 2.3: Arithmetic of inequality

Definition

Let \(a, b \in \mathbb{Z}\). Then

1. \(a < b\) provided \(b = a + k\) for some \(k \in \mathbb{Z}_+\).
2. \(a > b\) provided \(a = b + h\) for some \(h \in \mathbb{Z}_+\).

Theorem 1

Let \(a, b \in \mathbb{Z}\).

1. If \(a < b\) then \(a + c < b + c\), \(\forall c \in \mathbb{Z}\).
2. If \(a < b\) then \(ac < bc\), \(\forall c \in \mathbb{Z}_+\).
3. If \(a < b\) then \(ac > bc\), \(\forall c \in \mathbb{Z}_-\).
4. If \(a < b\) and \(c < d\) then \(a + c < b + d\).
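Proof

Let \(a, b, c \in \mathbb{Z}\) such that \(a < b\). Then \(b = a + k\) for some \(k \in \mathbb{Z}_+\).

1. Consider \(b + c = (a + k) + c = (a + c) + k\). Since \(k \in \mathbb{Z}_+\), it follows that \(a + c < b + c\).

2. Let \(c \in \mathbb{Z}_+\). Then \(bc = (a + k)c = ac + kc\), and \(kc \in \mathbb{Z}_+\) because it is a product of two positive integers. Thus \(ac < bc\).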
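The four properties in the theorem can also be spot-checked numerically. The following is a minimal sketch, not a proof: it tests every combination of integers in a small window, and the function name `check_order_properties` and the window bounds are illustrative assumptions.

```python
from itertools import product


def check_order_properties(lo=-5, hi=5):
    """Brute-force check of the four order properties on integers in [lo, hi]."""
    values = range(lo, hi + 1)
    for a, b, c, d in product(values, repeat=4):
        if a < b:
            assert a + c < b + c          # property 1: add any integer c
            if c > 0:
                assert a * c < b * c      # property 2: multiply by positive c
            if c < 0:
                assert a * c > b * c      # property 3: negative c flips the sign
            if c < d:
                assert a + c < b + d      # property 4: add two inequalities
    return True


print(check_order_properties())  # True
```

A finite check like this only illustrates the statements; the general result still rests on the argument in the proof.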

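Example 1

Determine all integers \(m\) that satisfy \(-12m \geq 324\).

Solution

Since \(-12m \geq 324\), dividing both sides by \(-12\) reverses the inequality, so \(m \leq -\dfrac{324}{12} = -27\). Thus the solutions are \(\{m \in \mathbb{Z} \mid m \leq -27\}\).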

Example 2

Determine all integers \(m\) that satisfy \(14m \geq 635\).

Solution

Since \(14m \geq 635\), \(m \geq \dfrac{635}{14} \approx 45.36\). Because \(m\) is an integer, the solutions are \(\{m \in \mathbb{Z} \mid m \geq 46\}\).

Example 3

Determine all integers \(k\) that satisfy \(-165 + 98k \geq 0\), \(-335 + 199k \geq 0\), \(-165 + 98k < 100\) and \(-335 + 199k < 100\).

Solution

Since \(-165 + 98k \geq 0\), \(k \geq \dfrac{165}{98} \approx 1.68\).

Since \(-335 + 199k \geq 0\), \(k \geq \dfrac{335}{199} \approx 1.68\).

Since \(-165 + 98k < 100\), \(98k < 265\), and \(k < \dfrac{265}{98} \approx 2.70\).

Since \(-335 + 199k < 100\), \(199k < 435\), and \(k < \dfrac{435}{199} \approx 2.19\).

Since \(\dfrac{165}{98} \leq k < \dfrac{435}{199}\) and \(k \in \mathbb{Z}\), \(k = 2\).
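The worked examples can be cross-checked by brute force. This is a small sketch under the assumption that a search window of \([-1000, 1000]\) is wide enough to contain the boundaries; the helper `integer_solutions` is a hypothetical name introduced here.

```python
def integer_solutions(condition, lo=-1000, hi=1000):
    """Return the integers n in [lo, hi] for which condition(n) is true."""
    return [n for n in range(lo, hi + 1) if condition(n)]


# Third example: all four inequalities must hold at once.
example_3 = integer_solutions(
    lambda k: 0 <= -165 + 98 * k < 100 and 0 <= -335 + 199 * k < 100
)
print(example_3)  # [2]

# The first two examples have infinitely many solutions, so only the
# boundary found inside the search window is meaningful.
example_1 = integer_solutions(lambda m: -12 * m >= 324)
example_2 = integer_solutions(lambda m: 14 * m >= 635)
print(max(example_1), min(example_2))  # -27 46
```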

## A Gentle Introduction to Jensen’s Inequality

It is common in statistics and machine learning to create a linear transform or mapping of a variable.

An example is a linear scaling of a feature variable. We have the natural intuition that the mean of the scaled values is the same as the scaled value of the mean of the raw values. For a linear transform, this intuition is correct.

Unfortunately, we bring this intuition with us when using nonlinear transformations of variables where this relationship no longer holds. Fixing this intuition involves the discovery of Jensen’s Inequality, which provides a standard mathematical tool used in function analysis, probability, and statistics.

In this tutorial, you will discover Jensen’s Inequality.

After completing this tutorial, you will know:

• The intuition of linear mappings does not hold for nonlinear functions.
• The mean of a convex function of a variable is always greater than or equal to the function of the mean of the variable, a result called Jensen’s Inequality.
• A common application of the inequality is in the comparison of arithmetic and geometric means when averaging the financial returns for a time interval.
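The first two points can be illustrated with a short numerical sketch. The data values and the two example functions (`linear` and `convex`) are arbitrary choices made here for illustration, not taken from the tutorial.

```python
from statistics import mean


def linear(x):
    """A linear transform: mean and transform commute."""
    return 2.0 * x + 1.0


def convex(x):
    """A convex transform: Jensen's inequality applies."""
    return x ** 2


data = [1.0, 2.0, 3.0, 4.0, 10.0]

mean_of_linear = mean(linear(x) for x in data)
linear_of_mean = linear(mean(data))

mean_of_convex = mean(convex(x) for x in data)
convex_of_mean = convex(mean(data))

print(mean_of_linear, linear_of_mean)   # 9.0 9.0
print(mean_of_convex, convex_of_mean)   # 26.0 16.0
```

For the linear function the two quantities agree, while for the convex function the mean of the transformed values is larger than the transform of the mean.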


## Social inequality

Social inequality refers to relational processes in society that have the effect of limiting or harming a group's social status, social class, and social circle.

Areas of social inequality include access to voting rights, freedom of speech and assembly, the extent of property rights and access to education, health care, quality housing, traveling, transportation, vacationing and other social goods and services.

It can also be seen in the quality of family and neighbourhood life, occupation, job satisfaction, and access to credit.

If these economic divisions harden, they can lead to social inequality. The reasons for social inequality can vary, but are often broad and far reaching.

Social inequality can emerge through a society's understanding of appropriate gender roles, or through the prevalence of social stereotyping.

Social inequality can also be established through discriminatory legislation.

Social inequalities exist between ethnic or religious groups, classes and countries making the concept of social inequality a global phenomenon.

Social inequality is different from economic inequality, though the two are linked.

Social inequality refers to disparities in the distribution of economic assets and income, as well as in the overall quality and luxury of each person's existence within a society, while economic inequality is caused by the unequal accumulation of wealth. Social inequality exists because the lack of wealth in certain areas prevents people there from obtaining the same housing, health care, etc. as the wealthy, in societies where access to these social goods depends on wealth.

Social inequality is linked to racial inequality, gender inequality, and wealth inequality.

The way people behave socially, through racist or sexist practices and other forms of discrimination, tends to trickle down and affect the opportunities and wealth individuals can generate for themselves.

Shapiro presents a hypothetical example of this in his book, The Hidden Cost of Being African American, in which he tries to demonstrate the level of inequality on the "playing field for blacks and whites."

One example he presents reports how a black family was denied a bank loan to use for housing, while a white family was approved.

Because home ownership is an important means of acquiring wealth, this situation created fewer opportunities for the black family to acquire wealth, producing social inequality.

## Vector Spaces, Hilbert Spaces, and the L2 Space

### Proof

By Hölder's inequality (Theorem 5.5.2), $|T(s)| \leq \|s\|_q \|t\|_p$, and hence for all nonzero $s \in L^q$, $\dfrac{|T(s)|}{\|s\|_q} \leq \|t\|_p$. Therefore $\|T\| \leq \|t\|_p$, and $T$ is bounded. We leave the proof of the equality as an exercise.

In fact, the converse of Lemma 5.5.2 holds for 1 ≤ p < ∞. This is the well-known Riesz Representation Theorem (the proof of which is rather long, so we omit it):

## Contents

### Example 1.

(Theory of operator inequalities.) T. Ando [a1] has proved that the following assertions are equivalent:

1) $\log A \geq \log B$

2) $A^{p} \geq ( A^{p/2} B^{p} A^{p/2} )^{1/2}$ for all $p \geq 0$

3) $A^{-p/2} ( A^{p/2} B^{p} A^{p/2} )^{1/2} A^{-p/2}$ is a decreasing function of $p \geq 0$.

An extension of Ando's result is the Fujii–Furuta–Kamei theorem, which states that the following assertions are equivalent:

1) $\log A \geq \log B$

2) $A^{s} \geq ( A^{s/2} B^{p} A^{s/2} )^{s/(p+s)}$ for all $p \geq 0$ and all $s \geq 0$

3) $A^{-r/2} ( A^{r/2} B^{p} A^{r/2} )^{(t+r)/(p+r)} A^{-r/2}$ is a decreasing function of both $p \geq t$ and $r \geq 0$ for any fixed $t \geq 0$.

### Example 2.

(Theory of operator inequalities.) This example concerns the estimation of the value of the relative operator entropy, defined by $S ( A \mid B ) = A^{1/2} ( \log A^{-1/2} B A^{-1/2} ) A^{1/2}$ for positive invertible operators $A$ and $B$. One has $\log C \geq \log A \geq \log B$ if and only if $S ( A^{-p} \mid C^{s} ) \geq S ( A^{-p} \mid A^{s} ) \geq S ( A^{-p} \mid B^{s} )$ for all $p \geq 0$ and all $s \geq 0$.

### Example 3.

(Theory of operator inequalities.) Ando and F. Hiai [a2] established the following useful and interesting inequality, equivalent to log majorization (the Ando–Hiai inequality): If $A \geq B \geq 0$ with $A > 0$, then

for all $p \geq 1$ and $r \geq 1$. The following inequality, proved in [a7], interpolates between the Furuta inequality and the inequality stated above. The grand Furuta inequality (1995): If $A \geq B \geq 0$ with $A > 0$, then for each $t \in [ 0,1 ]$ and $p \geq 1$,

is a decreasing function of both $r$ and $s$ for any $s \geq 1$ and $r \geq t$, and the following inequality holds:

$A^{1 - t} = F_{p,t} ( A,A,r,s ) \geq F_{p,t} ( A,B,r,s )$

for any $s \geq 1$, $p \geq 1$, $r \geq t$.

The mean-theoretic approach also has some advantages [a4]. See [a9] for the theory of operator means, in which the correspondence between non-negative operator monotone functions on $( 0, \infty )$ and operator means is given.

Examples in the theory of norm inequalities are given by generalizations of the Heinz–Kato inequality and a generalization of the Kosaki trace inequality and related trace inequalities. Examples in the theory of operator equations are a generalization of the Pedersen–Takesaki theorem, which is closely related to a non-commutative Radon–Nikodým theorem, and related results.

## Use of the Inequality

If we know more about the distribution that we’re working with, then we can usually guarantee that more of the data lies within a certain number of standard deviations of the mean. For example, if we know that we have a normal distribution, then about 95% of the data is within two standard deviations of the mean. Chebyshev’s inequality says that in this situation we know only that at least 75% of the data is within two standard deviations of the mean. As we can see in this case, the true proportion can be much higher than 75%.

The value of the inequality is that it gives us a “worst case” scenario in which the only things we know about our sample data (or probability distribution) are the mean and standard deviation. When we know nothing else about our data, Chebyshev’s inequality provides some additional insight into how spread out the data set is.
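The comparison between the Chebyshev guarantee and the normal distribution can be sketched numerically. The sketch assumes the standard form of the bound, at least \(1 - 1/k^2\) of the data within \(k\) standard deviations, and uses Python's `statistics.NormalDist` for the normal case; the function names are illustrative.

```python
from statistics import NormalDist


def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the bound is trivial, so clamp it at zero.
    return max(0.0, 1.0 - 1.0 / k ** 2)


def normal_fraction_within(k):
    """Fraction of a normal distribution within k standard deviations."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (2, 3):
    print(k, chebyshev_lower_bound(k), round(normal_fraction_within(k), 4))
```

For \(k = 2\) the bound is 0.75, while the normal distribution places roughly 95% of its mass in the same interval, which is the gap described above.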

## MML, Hybrid Bayesian Network Graphical Models, Statistical Consistency, Invariance and Uniqueness

### 2.3 Entropy

Let us re-visit our result from equation (1) and the standard accompanying approximation that $l_i = -\log p_i$.

Let us begin with the 2-state case. Suppose we have probabilities $p_1$ and $p_2 = 1 - p_1$ which we wish to encode with code-words of length $l_1 = -\log q_1$ and $l_2 = -\log q_2 = -\log(1 - q_1)$ respectively. As per the Huffman code construction (and Kraft's inequality), choosing such code lengths gives us a prefix code (when these code lengths are non-negative integers).

The negative of the expected code length would then be

and we wish to choose $q_1$ and $q_2 = 1 - q_1$ to make this code as short as possible on average, and so we differentiate the negative of the expected code length with respect to $q_1$.
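The differentiation step for the 2-state case can be sketched as follows, using only the code lengths $l_1 = -\log q_1$ and $l_2 = -\log(1 - q_1)$ defined above. The symbol $E$ for the negated expected code length is introduced here for illustration, and natural logarithms are assumed (another base only rescales the derivative).

```latex
\begin{align*}
E(q_1) &= -(p_1 l_1 + p_2 l_2) = p_1 \log q_1 + (1 - p_1) \log (1 - q_1), \\
\frac{dE}{dq_1} &= \frac{p_1}{q_1} - \frac{1 - p_1}{1 - q_1} = 0
\;\Longrightarrow\; p_1 (1 - q_1) = q_1 (1 - p_1)
\;\Longrightarrow\; q_1 = p_1 .
\end{align*}
```

Since the second derivative $-p_1/q_1^2 - (1 - p_1)/(1 - q_1)^2$ is negative, $E$ is maximised there, so the expected code length is minimised at $q_1 = p_1$.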

exactly as in the 2-state case above, where again $q_1 = p_1$.

This expected (or average) code length, $-\sum_i p_i \log p_i$, is the entropy of the distribution $p$.

Note that if we sample randomly from the distribution p with code-words of length − log p, then the (expected) average long-term cost is the entropy.
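A small numerical sketch can illustrate that the expected code length is smallest when the coding distribution matches the true one, and that the minimum is the entropy. Base-2 logarithms (lengths in bits), the probabilities in `p`, and the function names are assumptions made for this illustration.

```python
import math


def expected_code_length(p, q):
    """Expected length in bits when outcomes follow p but lengths are -log2(q)."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def entropy(p):
    """Entropy of p: the expected code length when coding with q = p."""
    return expected_code_length(p, p)


p = [0.7, 0.3]

# Search a grid of candidate values of q1 for the shortest expected length.
candidates = [i / 100 for i in range(1, 100)]
best_q1 = min(candidates, key=lambda q1: expected_code_length(p, [q1, 1 - q1]))

print(best_q1)  # 0.7
print(entropy(p) <= expected_code_length(p, [0.5, 0.5]))  # True
```

The grid search recovers $q_1 = p_1$, in line with the differentiation argument for the 2-state case.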

Where the distribution is continuous rather than (as above) discrete, the sum is replaced by an integral and (letting x be a variable being integrated over) the entropy is then defined as
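For reference, the standard discrete and continuous forms can be sketched side by side. The symbols $H$ for entropy and $f$ for the probability density of $x$ are assumptions introduced here and may differ from the source's own notation.

```latex
H(p) = -\sum_i p_i \log p_i
\qquad\text{and}\qquad
H(f) = -\int f(x) \log f(x) \, dx .
```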

And, of course, entropy can be defined for hybrid structures of both discrete and continuous variables, such as Bayesian network graphical models (of sec. 7.6). See sec. 3.6, where it is pointed out that for the hybrid continuous and discrete Bayesian net graphical models in [Comley and Dowe, 2003; 2005] (emanating from the current author's ideas in [Dowe and Wallace, 1998]), the log-loss scoring approximation to Kullback-Leibler distance has been used [Comley and Dowe, 2003, sec. 9].

The next section, sec. 2.4 , introduces Turing machines as an abstract model of computation and then discusses the formal relationship between MML and minimising the length of some (constrained) input to a Turing machine. The section can be skipped on first reading.

## Lesson 10 Summary

When we find the solutions to an inequality, we should think about its context carefully. A number may be a solution to an inequality outside of a context, but may not make sense when considered in context.

Suppose a basketball player scored more than 11 points in a game, and we represent the number of points she scored, $s$, with the inequality $s > 11$. By looking only at $s > 11$, we can say that numbers such as 12, $14\frac12$, and 130.25 are all solutions to the inequality because they each make the inequality true.

In a basketball game, however, it is only possible to score a whole number of points, so fractional and decimal scores are not possible. It is also highly unlikely that one person would score more than 130 points in a single game.

In other words, the context of an inequality may limit its solutions.
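The basketball example can be sketched in a few lines: first keep the numbers that satisfy $s > 11$, then keep only those that make sense in context. The candidate list and the whole-number rule are illustrative assumptions based on the discussion above.

```python
candidates = [7, 11, 12, 14.5, 130.25]

# Solutions of the inequality s > 11, ignoring context.
solutions = [s for s in candidates if s > 11]

# In a basketball game only whole numbers of points are possible.
solutions_in_context = [s for s in solutions if float(s).is_integer()]

print(solutions)             # [12, 14.5, 130.25]
print(solutions_in_context)  # [12]
```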

The solutions to $r < 30$ can include numbers such as $27\frac34$, 18.5, 0, and -7. But if $r$ represents the number of minutes of rain yesterday (and it did rain), then our solutions are limited to positive numbers. Zero or a negative number of minutes would not make sense in this context.

To show the upper and lower boundaries, we can write two inequalities: $r > 0$ and $r < 30$.

Inequalities can also represent comparison of two unknown numbers.

• Let’s say we knew that a puppy weighs more than a kitten, but we did not know the weight of either animal. We can represent the weight of the puppy, in pounds, with $p$ and the weight of the kitten, in pounds, with $k$, and write this inequality: $p > k$.

## Lesson 15 Summary

Here is an inequality: $3(10 - 2x) < 18$. The solution to this inequality is all the values you could use in place of $x$ to make the inequality true.

In order to solve this, we can first solve the related equation $3(10 - 2x) = 18$ to get the solution $x = 2$. That means 2 is the boundary between values of $x$ that make the inequality true and values that make the inequality false.

To solve the inequality, we can check numbers greater than 2 and less than 2 and see which ones make the inequality true.

Let’s check a number that is greater than 2: $x = 5$. Replacing $x$ with 5 in the inequality, we get $3(10 - 2 \cdot 5) < 18$ or just $0 < 18$. This is true, so $x = 5$ is a solution. This means that all values greater than 2 make the inequality true. We can write the solutions as $x > 2$ and also represent them on a number line. Notice that 2 itself is not a solution because it's the value of $x$ that makes $3(10 - 2x)$ equal to 18, and so it does not make $3(10 - 2x) < 18$ true.

For confirmation that we found the correct solution, we can also test a value that is less than 2. If we test $x = 0$, we get $3(10 - 2 \cdot 0) < 18$ or just $30 < 18$. This is false, so $x = 0$ and all values of $x$ that are less than 2 are not solutions.
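The boundary-and-test-point method can be sketched in code. The function names `left_side` and `satisfies` are hypothetical helpers introduced for this illustration.

```python
def left_side(x):
    """Left-hand side of the inequality 3(10 - 2x) < 18."""
    return 3 * (10 - 2 * x)


def satisfies(x):
    """True when x makes 3(10 - 2x) < 18 true."""
    return left_side(x) < 18


boundary = 2  # solution of the related equation 3(10 - 2x) = 18

print(left_side(boundary))   # 18, so the boundary itself is not a solution
print(satisfies(5))          # True: a test value greater than 2
print(satisfies(0))          # False: a test value less than 2
```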

## World Inequality Database on Education

The World Inequality Database on Education (WIDE) highlights the powerful influence of circumstances, such as wealth, gender, ethnicity and location, over which people have little control but which play an important role in shaping their opportunities for education and life. It draws attention to unacceptable levels of education inequality across countries and between groups within countries, with the aim of helping to inform policy design and public debate.

### Compare overlapping disparities

Selecting an indicator compares disparities between countries for different groups, such as wealth, gender or location. Groups are visualized as coloured dots.

Clicking on a country shows the disparities for different groups, such as gender, wealth or location within the selected country.

Clicking on one of the groups shows overlapping disparities within countries. Combining multiple dimensions of inequality, it can compare, for example, education for rural poor women with urban rich men within a given country.

### Sustainable Development Goal 4 – Education

#### Target 4.1: Universal primary completion

In 35 out of 114 countries, fewer than 50% of the poorest children have completed primary school

#### Target 4.1: Universal secondary completion

More than 50% of young people in 65 out of 115 countries have not completed upper secondary school

#### Target 4.2: Early childhood care and education

Since 2010, in 23 out of 50 countries fewer than 25% of children in rural areas have the opportunity to attend a pre-primary programme

#### Target 4.5: Equity by gender

In 30 out of 116 countries, fewer than 90 females for every 100 males completed lower secondary school. In 17 countries, fewer than 90 males for every 100 females completed lower secondary school.

#### Target 4.5: Equity by language

Grade 4 students who did not speak the language of the test at home were at least 10 percentage points less likely than other students to reach the lowest level of proficiency in reading in 20 out of 10 countries that took part in the PIRLS assessment.

#### Target 4.6: Youth literacy

In 35 out of 75 countries, at least 25% of the poorest young women are not literate.

Major progress has been made since 2000 in enrolling children in primary school. However, progress has stalled in recent years, and children from marginalized groups continue to face significant barriers to accessing, attending and completing primary school. In Uganda, only 12% of the poorest 14- to 16-year-olds had completed primary school in 2011.

According to the first target of the SDG agenda, all young people should complete upper secondary school by 2030. Around the world only 43% of young people did so in the period 2008-14. In Pakistan, only 20% of 20- to 22-year-olds had completed upper secondary school in 2012.

Early childhood is the critical period in which to lay the foundations for success in education and beyond. Yet children who would benefit most from early childhood care and education are least likely to receive it. For example, only 1% of 3- to 4-year-old children in rural Iraq have the opportunity to attend pre-primary education programmes.

Despite improvement since 2000, significant gender disparities remain. In the case of lower secondary completion, while the most extreme injustices are still at the expense of females, the disparities can also move in the opposite direction. In Afghanistan, only 33 females complete lower secondary school for every 100 males. By contrast, in Honduras, only 68 males complete lower secondary school for every 100 females.

The SDG target on equity refers to all vulnerable groups, not just those characterized by gender, location, wealth and their interactions. Learning assessments record whether students speak the language of the test at home and it is possible to infer whether they are at a disadvantage. In Bulgaria, 96% of those who spoke the language of the test at home achieved the minimum level of proficiency in PIRLS in 2011 compared to just 68% among those who did not.

Youth literacy rates are higher than ever as a result of progress in primary education. However, progress is still nowhere near fast enough for the most disadvantaged populations. In Yemen, only 21% of the poorest young women could read a simple sentence in 2013.