Expressions

Bag-Tuple

Coalesce

Supported Engines: Pig

Similar to SQL's COALESCE function. Uses the DataFu library to return the first non-null value among the arguments passed in. All arguments must be of the same type.

Example:

{
  "operation": "expression.Coalesce",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["A"]},
    {"operation": "expression.Field", "arguments": ["B"]},
    {"operation": "expression.Chararray", "arguments": ["not found"]}
  ]
}

Arguments are expressions. N-ary operation accepts unlimited arguments.

Sample Output:

datafu.pig.util.Coalesce(A, B, 'not found')
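To check expected results outside Pig, the following Python sketch mirrors the documented semantics: scan the arguments left to right and return the first one that is not null. It models Pig nulls as Python `None`. It only illustrates the behaviour and is not the DataFu implementation. The name `coalesce` is hypothetical.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def coalesce(*values: Optional[T]) -> Optional[T]:
    """Return the first argument that is not None, or None if every argument is None."""
    for value in values:
        if value is not None:
            return value
    return None


# One row where field A is null and field B holds a value.
row = {"A": None, "B": "beta"}
print(coalesce(row["A"], row["B"], "not found"))  # beta
```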

Boolean

And

Supported Engines: Pig

Performs a logical AND on the result of nested expressions, which must return a boolean.

Example:

{
  "operation": "expression.And",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["A"]},
    {"operation": "expression.Field", "arguments": ["B"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

(A) AND (B)

Equality

Supported Engines: Pig

Checks equality of two nested expressions. Automatically casts type if necessary/possible.

Example:

{
  "operation": "expression.Equality",
  "arguments": [
    {"operation": "expression.Integer", "arguments": [1]},
    {"operation": "expression.Integer", "arguments": [2]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

1 == 2

GreaterEqualThan

Supported Engines: Pig

Checks whether the left expression is greater than or equal to the right one. Automatically casts type if necessary/possible.

Example:

{
  "operation": "expression.GreaterEqualThan",
  "arguments": [
    {"operation": "expression.Integer", "arguments": [1]},
    {"operation": "expression.Integer", "arguments": [2]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

1 >= 2

GreaterThan

Supported Engines: Pig

Checks whether the left expression is greater than the right one. Automatically casts type if necessary/possible.

Example:

{
  "operation": "expression.GreaterThan",
  "arguments": [
    {"operation": "expression.Integer", "arguments": [1]},
    {"operation": "expression.Integer", "arguments": [2]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

1 > 2

In

Supported Engines: Pig

Returns true if the value of the first expression is equal to any of the values that follow it, and false otherwise.

Example:

{
  "operation": "expression.In",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["A"]},
    {"operation": "expression.Integer", "arguments": [1]},
    {"operation": "expression.Integer", "arguments": [2]},
    {"operation": "expression.Integer", "arguments": [3]}
  ]
}

Arguments are expressions. Requires at least 2 arguments: the value to test, followed by one or more candidate values.

Sample Output:

A IN (1, 2, 3)

Inequality

Supported Engines: Pig

Checks inequality of two nested expressions. Automatically casts type if necessary/possible.

Example:

{
  "operation": "expression.Inequality",
  "arguments": [
    {"operation": "expression.Integer", "arguments": [1]},
    {"operation": "expression.Integer", "arguments": [2]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

1 != 2

LessEqualThan

Supported Engines: Pig

Checks whether the left expression is less than or equal to the right one. Automatically casts type if necessary/possible.

Example:

{
  "operation": "expression.LessEqualThan",
  "arguments": [
    {"operation": "expression.Integer", "arguments": [1]},
    {"operation": "expression.Integer", "arguments": [2]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

1 <= 2

LessThan

Supported Engines: Pig

Checks whether the left expression is less than the right one. Automatically casts type if necessary/possible.

Example:

{
  "operation": "expression.LessThan",
  "arguments": [
    {"operation": "expression.Integer", "arguments": [1]},
    {"operation": "expression.Integer", "arguments": [2]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

1 < 2
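The six comparison operations above share one shape: two nested expressions joined by an infix operator. The following Python renderer is hypothetical. Its operator table is inferred from the sample outputs, and it handles only Field, Integer, and comparison nodes. It shows how such a JSON tree could be turned into Pig Latin text. It is a sketch, not the library's compiler.

```python
from typing import Any, Dict

# Hypothetical operator table, inferred from the sample outputs above.
COMPARISON_OPERATORS = {
    "expression.Equality": "==",
    "expression.Inequality": "!=",
    "expression.GreaterThan": ">",
    "expression.GreaterEqualThan": ">=",
    "expression.LessThan": "<",
    "expression.LessEqualThan": "<=",
}


def render(node: Dict[str, Any]) -> str:
    """Render a small subset of expression nodes as Pig Latin text."""
    op = node["operation"]
    args = node["arguments"]
    if op == "expression.Integer":
        return str(args[0])
    if op == "expression.Field":
        return args[0]
    if op in COMPARISON_OPERATORS:
        if len(args) != 2:
            raise ValueError(f"{op} requires 2 arguments, got {len(args)}")
        left, right = (render(arg) for arg in args)
        return f"{left} {COMPARISON_OPERATORS[op]} {right}"
    raise NotImplementedError(op)


example = {
    "operation": "expression.LessThan",
    "arguments": [
        {"operation": "expression.Integer", "arguments": [1]},
        {"operation": "expression.Integer", "arguments": [2]},
    ],
}
print(render(example))  # 1 < 2
```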

Matches

Supported Engines: Pig

Checks whether the left expression matches the regular expression given as the right argument. The right argument must be a string literal in [Java format](http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html).

Example:

{
  "operation": "expression.Matches",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["test"]},
    {"operation": "expression.Chararray", "arguments": ["(a|b|c)"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

test MATCHES '(a|b|c)'
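Pig's MATCHES tests the pattern against the whole value, not a substring. This is why Pig scripts commonly write patterns such as '.*foo.*'. Under the example above, only the values "a", "b", and "c" match. The following Python sketch approximates this with `re.fullmatch`. Python and Java regex syntax overlap for simple patterns like this one but are not identical. The name `pig_matches` is hypothetical.

```python
import re


def pig_matches(value: str, pattern: str) -> bool:
    """Approximate Pig's MATCHES: the pattern must match the entire string."""
    return re.fullmatch(pattern, value) is not None


print(pig_matches("a", "(a|b|c)"))    # True
print(pig_matches("abc", "(a|b|c)"))  # False: the pattern covers only part of the string
print(pig_matches("abc", ".*b.*"))    # True
```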

Not

Supported Engines: Pig

Performs a logical NOT on the result of a nested expression, which must return a boolean.

Example:

{
  "operation": "expression.Not",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["A"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

NOT (A)

Or

Supported Engines: Pig

Performs a logical OR on the result of nested expressions, which must return a boolean.

Example:

{
  "operation": "expression.Or",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["A"]},
    {"operation": "expression.Field", "arguments": ["B"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

(A) OR (B)

Case

Bincond

Supported Engines: Pig

Evaluates the first argument and returns the second argument if it is true, or the third if it is false. The first argument must return a boolean; the others may be of any type.

Example:

{
  "operation": "expression.Bincond",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["A"]},
    {"operation": "expression.Integer", "arguments": [1]},
    {"operation": "expression.Integer", "arguments": [2]}
  ]
}

Arguments are expressions. Ternary operation requires 3 arguments.

Sample Output:

(A ? 1 : 2)

Digitize

Supported Engines: Pig

Returns the index of the bin to which the value of the first argument belongs. The remaining arguments are the bin edges.

Example:

{
  "operation": "expression.Digitize",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Field", "arguments": ["bin_0"]},
    {"operation": "expression.Field", "arguments": ["bin_1"]},
    {"operation": "expression.Field", "arguments": ["bin_2"]},
    {"operation": "expression.Field", "arguments": ["bin_3"]}
  ]
}

Arguments are expressions. N-ary operation requires at least 2 arguments. The first is the value to digitize, and the remaining ones are the bin edges in ascending order.

Sample Output:

(CASE WHEN a < bin_0 THEN 0 WHEN a < bin_1 THEN 1 WHEN a < bin_2 THEN 2 WHEN a < bin_3 THEN 3 WHEN a >= bin_3 THEN 4 END)
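The CASE expression above assigns bin i when the value is below edge i and at or above every earlier edge, and assigns the last bin otherwise. If the edges are ascending, this equals the count of edges less than or equal to the value. Python's `bisect_right` computes that count directly. The following sketch can help check expected outputs offline. The function name `digitize` is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def digitize(value: float, edges: Sequence[float]) -> int:
    """Return the bin index the CASE expression above would produce.

    Assumes `edges` (bin_0, bin_1, ...) are sorted in ascending order. The
    result is the number of edges less than or equal to `value`, so a value
    below bin_0 lands in bin 0 and a value at or above the last edge lands in
    bin len(edges).
    """
    return bisect_right(edges, value)


edges = [10, 20, 30, 40]
print([digitize(v, edges) for v in (5, 10, 25, 40, 99)])  # [0, 1, 2, 4, 4]
```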

DataType

Bool

Supported Engines: Pig

Creates a boolean literal.

Example:

{
  "operation": "expression.Bool",
  "arguments": [true]
}

Argument is a literal value. Unary operation requires 1 argument.

Sample Output:

true

CastToChararray

Supported Engines: Pig

Casts an expression to a chararray.

Example:

{
  "operation": "expression.CastToChararray",
  "arguments": [
    {"operation": "expression.Integer", "arguments": [123]}
  ]
}

Argument is an expression. Unary operation requires 1 argument.

Sample Output:

(chararray)123

CastToDouble

Supported Engines: Pig

Casts an expression to a double.

Example:

{
  "operation": "expression.CastToDouble",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Argument is an expression. Unary operation requires 1 argument.

Sample Output:

(double)a

CastToFloat

Supported Engines: Pig

Casts an expression to a float.

Example:

{
  "operation": "expression.CastToFloat",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Argument is an expression. Unary operation requires 1 argument.

Sample Output:

(float)a

CastToInteger

Supported Engines: Pig

Casts an expression to an integer.

Example:

{
  "operation": "expression.CastToInteger",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Argument is an expression. Unary operation requires 1 argument.

Sample Output:

(int)a

CastToLong

Supported Engines: Pig

Casts an expression to a long.

Example:

{
  "operation": "expression.CastToLong",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Argument is an expression. Unary operation requires 1 argument.

Sample Output:

(long)a

Chararray

Supported Engines: Pig

Creates a string (chararray) literal.

Example:

{
  "operation": "expression.Chararray",
  "arguments": ["abc"]
}

Argument is a literal value. Unary operation requires 1 argument.

Sample Output:

'abc'

Datetime

Supported Engines: Pig

Creates a datetime object from a string in ISO8601 format.

Example:

{
  "operation": "expression.Datetime",
  "arguments": ["2009-05-19T14:39:22"]
}

Argument is a literal value. Unary operation requires 1 argument.

Sample Output:

ToDate('2009-05-19T14:39:22')

Double

Supported Engines: Pig

Creates a double literal.

Example:

{
  "operation": "expression.Double",
  "arguments": [2.123]
}

Argument is a literal value. Unary operation requires 1 argument.

Sample Output:

2.123

Float

Supported Engines: Pig

Creates a float literal.

Example:

{
  "operation": "expression.Float",
  "arguments": [2.123]
}

Argument is a literal value. Unary operation requires 1 argument.

Sample Output:

2.123F

Integer

Supported Engines: Pig

Creates an integer literal.

Example:

{
  "operation": "expression.Integer",
  "arguments": [2]
}

Argument is a literal value. Unary operation requires 1 argument.

Sample Output:

2

Long

Supported Engines: Pig

Creates a long literal.

Example:

{
  "operation": "expression.Long",
  "arguments": [2]
}

Argument is a literal value. Unary operation requires 1 argument.

Sample Output:

2L

NullBool

Supported Engines: Pig

Creates a null of type boolean.

Example:

{
  "operation": "expression.NullBool",
  "arguments": []
}

Nullary operation requires 0 arguments.

Sample Output:

null

NullChararray

Supported Engines: Pig

Creates a null of type chararray.

Example:

{
  "operation": "expression.NullChararray",
  "arguments": []
}

Nullary operation requires 0 arguments.

Sample Output:

null

NullDatetime

Supported Engines: Pig

Creates a null of type datetime.

Example:

{
  "operation": "expression.NullDatetime",
  "arguments": []
}

Nullary operation requires 0 arguments.

Sample Output:

null

NullFloat

Supported Engines: Pig

Creates a null of type float.

Example:

{
  "operation": "expression.NullFloat",
  "arguments": []
}

Nullary operation requires 0 arguments.

Sample Output:

null

NullInteger

Supported Engines: Pig

Creates a null of type integer.

Example:

{
  "operation": "expression.NullInteger",
  "arguments": []
}

Nullary operation requires 0 arguments.

Sample Output:

null

NullLong

Supported Engines: Pig

Creates a null of type long.

Example:

{
  "operation": "expression.NullLong",
  "arguments": []
}

Nullary operation requires 0 arguments.

Sample Output:

null

DateTime

AddDuration

Supported Engines: Pig

Adds a specified duration provided in ISO 8601 format to the given datetime.

Example:

{
  "operation": "expression.AddDuration",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Chararray", "arguments": ["P7Y0M0W0DT13H0M0S"]}
  ]
}

Binary operation requires 2 arguments.

Sample Output:

AddDuration(a, 'P7Y0M0W0DT13H0M0S')
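The example duration P7Y0M0W0DT13H0M0S means 7 years and 13 hours. The zero-valued fields could be omitted. Note that M means months before the T and minutes after it. The following Python sketch splits such a string into its components, which helps confirm that a duration literal says what you intend. It handles integer fields only. It is not the parser Pig uses, and the names `parse_duration` and `_DURATION` are hypothetical.

```python
import re
from typing import Dict

# PnYnMnWnDTnHnMnS layout, integer fields only; every field is optional.
_DURATION = re.compile(
    r"P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def parse_duration(text: str) -> Dict[str, int]:
    """Split an ISO 8601-style duration string into its integer components."""
    match = _DURATION.fullmatch(text)
    if match is None or text in ("P", "PT"):
        raise ValueError(f"not a supported duration: {text!r}")
    # Missing fields come back as None from the regex; treat them as zero.
    return {name: int(value or 0) for name, value in match.groupdict().items()}


print(parse_duration("P7Y0M0W0DT13H0M0S"))
```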

CurrentTime

Supported Engines: Pig

Returns a datetime object for the current timestamp, with millisecond precision.

Example:

{
  "operation": "expression.CurrentTime",
  "arguments": []
}

Nullary operation requires 0 arguments.

Sample Output:

CurrentTime()

DateToISOString

Supported Engines: Pig

Converts a date to an ISO8601 string.

Example:

{
  "operation": "expression.DateToISOString",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

ToString(a)

DateToString

Supported Engines: Pig

Converts a date to a string using the specified format.

Example:

{
  "operation": "expression.DateToString",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Chararray", "arguments": ["MM/dd/yyyy"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

ToString(a, 'MM/dd/yyyy')

DaysBetween

Supported Engines: Pig

Returns the number of days between two datetime objects.

Example:

{
  "operation": "expression.DaysBetween",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Field", "arguments": ["b"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

DaysBetween(a, b)

GetDay

Supported Engines: Pig

Returns the day of the month.

Example:

{
  "operation": "expression.GetDay",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

GetDay(a)

GetHour

Supported Engines: Pig

Returns the hour of the day.

Example:

{
  "operation": "expression.GetHour",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

GetHour(a)

GetMilliSecond

Supported Engines: Pig

Returns the millisecond of the second.

Example:

{
  "operation": "expression.GetMilliSecond",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

GetMilliSecond(a)

GetMinute

Supported Engines: Pig

Returns the minute of the hour.

Example:

{
  "operation": "expression.GetMinute",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

GetMinute(a)

GetMonth

Supported Engines: Pig

Returns the month of the year.

Example:

{
  "operation": "expression.GetMonth",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

GetMonth(a)

GetSecond

Supported Engines: Pig

Returns the second of the minute.

Example:

{
  "operation": "expression.GetSecond",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

GetSecond(a)

GetWeek

Supported Engines: Pig

Returns the week of the week-based year.

Example:

{
  "operation": "expression.GetWeek",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

GetWeek(a)

GetWeekYear

Supported Engines: Pig

Returns the week-based year.

Example:

{
  "operation": "expression.GetWeekYear",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

GetWeekYear(a)

GetYear

Supported Engines: Pig

Returns the year.

Example:

{
  "operation": "expression.GetYear",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

GetYear(a)

HoursBetween

Supported Engines: Pig

Returns the number of hours between two datetime objects.

Example:

{
  "operation": "expression.HoursBetween",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Field", "arguments": ["b"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

HoursBetween(a, b)

MilliSecondsBetween

Supported Engines: Pig

Returns the number of milliseconds between two datetime objects.

Example:

{
  "operation": "expression.MilliSecondsBetween",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Field", "arguments": ["b"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

MilliSecondsBetween(a, b)

MinutesBetween

Supported Engines: Pig

Returns the number of minutes between two datetime objects.

Example:

{
  "operation": "expression.MinutesBetween",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Field", "arguments": ["b"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

MinutesBetween(a, b)

MonthsBetween

Supported Engines: Pig

Returns the number of months between two datetime objects.

Example:

{
  "operation": "expression.MonthsBetween",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Field", "arguments": ["b"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

MonthsBetween(a, b)

SecondsBetween

Supported Engines: Pig

Returns the number of seconds between two datetime objects.

Example:

{
  "operation": "expression.SecondsBetween",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Field", "arguments": ["b"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

SecondsBetween(a, b)

SubtractDuration

Supported Engines: Pig

Subtracts a specified duration provided in ISO 8601 format from the given datetime.

Example:

{
  "operation": "expression.SubtractDuration",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Chararray", "arguments": ["P7Y0M0W0DT13H0M0S"]}
  ]
}

Binary operation requires 2 arguments.

Sample Output:

SubtractDuration(a, 'P7Y0M0W0DT13H0M0S')

ToDate

Supported Engines: Pig

Converts a chararray field in ISO 8601 format to datetime.

Example:

{
  "operation": "expression.ToDate",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

ToDate(a)

ToDateFormat

Supported Engines: Pig

Converts a chararray field to datetime using the specified format.

Example:

{
  "operation": "expression.ToDateFormat",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Chararray", "arguments": ["MM/dd/yyyy"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

ToDate(a, 'MM/dd/yyyy')

ToDateFormatTimezone

Supported Engines: Pig

Converts a chararray field to datetime using the specified format and time zone.

Example:

{
  "operation": "expression.ToDateFormatTimezone",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Chararray", "arguments": ["yyyy-MM-dd HH:mm:ssZ"]},
    {"operation": "expression.Chararray", "arguments": ["Asia/Calcutta"]}
  ]
}

Arguments are expressions. Ternary operation requires 3 arguments.

Sample Output:

ToDate(a, 'yyyy-MM-dd HH:mm:ssZ', 'Asia/Calcutta')

ToDateMillis

Supported Engines: Pig

Converts a long field containing time in milliseconds to datetime.

Example:

{
  "operation": "expression.ToDateMillis",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

ToDate(a)

WeeksBetween

Supported Engines: Pig

Returns the number of weeks between two datetime objects.

Example:

{
  "operation": "expression.WeeksBetween",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Field", "arguments": ["b"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

WeeksBetween(a, b)

YearsBetween

Supported Engines: Pig

Returns the number of years between two datetime objects.

Example:

{
  "operation": "expression.YearsBetween",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Field", "arguments": ["b"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

YearsBetween(a, b)

Eval

Addition

Supported Engines: Pig

Adds the result of two expressions together. Automatically casts type if possible.

Example:

{
  "operation": "expression.Addition",
  "arguments": [
    {"operation": "expression.Integer", "arguments": [2]},
    {"operation": "expression.Field", "arguments": ["test"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

2 + test

Division

Supported Engines: Pig

Divides the result of the left expression by the right. Automatically casts type if possible.

Example:

{
  "operation": "expression.Division",
  "arguments": [
    {"operation": "expression.Integer", "arguments": [2]},
    {"operation": "expression.Field", "arguments": ["test"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

2 / test

IfCase

Supported Engines: Pig

Implements an if / else-if / else structure: the initial if can be followed by any number of else-if branches and an optional else at the end.

Example:

{
  "operation": "expression.IfCase",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["A"]},
    {"operation": "expression.Integer", "arguments": [1]},
    {"operation": "expression.Field", "arguments": ["B"]},
    {"operation": "expression.Integer", "arguments": [2]},
    {"operation": "expression.Integer", "arguments": [3]}
  ]
}

Arguments are expressions. Requires at least 2 arguments: the initial if condition and its value. Further condition/value pairs add else-if branches, and a single trailing argument, if present, is the else value.

Sample Output:

(CASE WHEN A THEN 1 WHEN B THEN 2 ELSE 3 END)
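The argument list is read as condition/value pairs, and an odd final argument acts as the else. The following Python renderer is hypothetical. It takes already-rendered argument strings, makes this pairing rule explicit, and reproduces the sample output. It illustrates only the documented argument layout and is not the library's code.

```python
from typing import List


def render_if_case(args: List[str]) -> str:
    """Render pre-rendered IfCase arguments as a Pig CASE expression.

    Arguments are consumed as (condition, value) pairs; an odd trailing
    argument is treated as the ELSE value.
    """
    if len(args) < 2:
        raise ValueError("IfCase requires at least 2 arguments")
    parts = ["CASE"]
    for i in range(len(args) // 2):
        parts.append(f"WHEN {args[2 * i]} THEN {args[2 * i + 1]}")
    if len(args) % 2 == 1:
        parts.append(f"ELSE {args[-1]}")
    parts.append("END")
    return "(" + " ".join(parts) + ")"


print(render_if_case(["A", "1", "B", "2", "3"]))
# (CASE WHEN A THEN 1 WHEN B THEN 2 ELSE 3 END)
```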

Modulo

Supported Engines: Pig

Finds the remainder of the left divided by the right. Automatically casts type if possible.

Example:

{
  "operation": "expression.Modulo",
  "arguments": [
    {"operation": "expression.Integer", "arguments": [2]},
    {"operation": "expression.Field", "arguments": ["test"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

2 % test
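When both operands are integers, Division and Modulo above are evaluated with Java's integer semantics, because Pig runs on the JVM. Division truncates toward zero, and the remainder takes the sign of the left operand. Python's `//` and `%` round toward negative infinity instead, so the results differ for negative operands. The following sketch reproduces the Java behaviour for checking expected outputs. The helper names are hypothetical, and zero divisors are not handled.

```python
def java_int_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, as Java's / does for ints."""
    quotient = abs(a) // abs(b)
    return quotient if (a >= 0) == (b >= 0) else -quotient


def java_int_mod(a: int, b: int) -> int:
    """Remainder with the sign of the dividend, as Java's % does for ints."""
    return a - b * java_int_div(a, b)


for a, b in [(7, 2), (-7, 2), (7, -2)]:
    # Columns: operands, Java-style results, then Python's floor-based results.
    print(a, b, java_int_div(a, b), java_int_mod(a, b), a // b, a % b)
```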

Multiplication

Supported Engines: Pig

Multiplies the result of two expressions. Automatically casts type if possible.

Example:

{
  "operation": "expression.Multiplication",
  "arguments": [
    {"operation": "expression.Integer", "arguments": [2]},
    {"operation": "expression.Field", "arguments": ["test"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

2 * test

Subtraction

Supported Engines: Pig

Subtracts the right expression result from the left. Automatically casts type if possible.

Example:

{
  "operation": "expression.Subtraction",
  "arguments": [
    {"operation": "expression.Integer", "arguments": [2]},
    {"operation": "expression.Field", "arguments": ["test"]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

2 - test

SwitchCase

Supported Engines: Pig

Evaluates the switch expression and compares its result with each case value in turn, returning the result associated with the first matching case. If no case matches, the default value is returned, if one is provided.

Example:

{
  "operation": "expression.SwitchCase",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["A"]},
    {"operation": "expression.Integer", "arguments": [1]},
    {"operation": "expression.Integer", "arguments": [100]},
    {"operation": "expression.Integer", "arguments": [2]},
    {"operation": "expression.Integer", "arguments": [200]},
    {"operation": "expression.Integer", "arguments": [0]}
  ]
}

Arguments are expressions. Requires at least 3 arguments: the switch expression, a case value, and its result. Further case/result pairs can follow, with an optional default value at the end.

Sample Output:

(CASE A WHEN 1 THEN 100 WHEN 2 THEN 200 ELSE 0 END)
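The following Python renderer is hypothetical and takes already-rendered argument strings. The first argument is the switch expression, the rest are consumed as case/result pairs, and an odd leftover is the default. The sketch reproduces the sample output, but it illustrates only the documented layout and is not the library's code.

```python
from typing import List


def render_switch_case(args: List[str]) -> str:
    """Render pre-rendered SwitchCase arguments as a Pig CASE expression.

    args[0] is the switch expression; the rest are (case value, result)
    pairs, optionally followed by a single default value.
    """
    if len(args) < 3:
        raise ValueError("SwitchCase requires at least 3 arguments")
    subject, rest = args[0], args[1:]
    parts = ["CASE", subject]
    for i in range(0, len(rest) - 1, 2):
        parts.append(f"WHEN {rest[i]} THEN {rest[i + 1]}")
    if len(rest) % 2 == 1:
        parts.append(f"ELSE {rest[-1]}")
    parts.append("END")
    return "(" + " ".join(parts) + ")"


print(render_switch_case(["A", "1", "100", "2", "200", "0"]))
# (CASE A WHEN 1 THEN 100 WHEN 2 THEN 200 ELSE 0 END)
```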

Field Expressions

DereferencedField

Supported Engines: Pig

Dereferences a field in an input schema using the [dereference operator](http://pig.apache.org/docs/r0.15.0/basic.html#deref). In the example below, a and b must be tuples or bags. The expression reaches into the schemas for the specified fields and returns the type of the last field. Along the way, it checks that each field exists and has a supported type. Set dereferencing is not supported at this time.

Example:

{
  "operation": "expression.DereferencedField",
  "arguments": ["a.b.c"]
}

Argument is a field path. Unary operation requires 1 argument.

Sample Output:

a.b.c

Field

Supported Engines: Pig

References a field in an input schema.

Example:

{
  "operation": "expression.Field",
  "arguments": ["f1"]
}

Argument is a field name. Unary operation requires 1 argument.

Sample Output:

f1

IsEmpty

Supported Engines: Pig

Checks whether a bag or map is empty, that is, whether it contains no items.

Example:

{
  "operation": "expression.IsEmpty",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

IsEmpty(a)

IsNotNull

Supported Engines: Pig

Checks if the nested expression result is not null.

Example:

{
  "operation": "expression.IsNotNull",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["test"]}
  ]
}

Argument is an expression. Unary operation requires 1 argument.

Sample Output:

test IS NOT NULL

IsNull

Supported Engines: Pig

Checks if the nested expression result is null.

Example:

{
  "operation": "expression.IsNull",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["test"]}
  ]
}

Argument is an expression. Unary operation requires 1 argument.

Sample Output:

test IS NULL

Nullify

Supported Engines: Pig

Returns a null value if the condition in the second argument is true, otherwise the value of the specified field.

Example:

{
  "operation": "expression.Nullify",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["fieldA"]},
    {"operation": "expression.Equality", "arguments": [
      {"operation": "expression.Field", "arguments": ["fieldA"]},
      {"operation": "expression.Chararray", "arguments": ["None"]}
    ]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

(fieldA == 'None' ? null : fieldA)

Math

Absolute

Supported Engines: Pig

Computes the absolute value of an expression.

Example:

{
  "operation": "expression.Absolute",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

ABS(a)

Avg

Supported Engines: Pig

Computes the average of a single column bag of numeric values.

Example:

{
  "operation": "expression.Avg",
  "arguments": [
    {"operation": "expression.DereferencedField", "arguments": ["a.b"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

AVG(a.b)

Ceil

Supported Engines: Pig

Rounds the value of an expression up to the nearest integer, so the result is never less than the input.

Example:

{
  "operation": "expression.Ceil",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

CEIL(a)

Count

Supported Engines: Pig

Counts the number of items in a bag. Ignores null values.

Example:

{
  "operation": "expression.Count",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

COUNT(a)

CountStar

Supported Engines: Pig

Counts the number of items in a bag. Includes null values.

Example:

{
  "operation": "expression.CountStar",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

COUNT_STAR(a)
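The difference between Count and CountStar comes down to whether nulls are skipped. The following minimal sketch models a single-column bag as a Python list, with `None` standing in for null. It is not Pig's implementation, and the function names are hypothetical.

```python
from typing import Any, List


def count(bag: List[Any]) -> int:
    """Like COUNT: the number of items that are not null."""
    return sum(1 for item in bag if item is not None)


def count_star(bag: List[Any]) -> int:
    """Like COUNT_STAR: the number of items, nulls included."""
    return len(bag)


bag = [3, None, 7, None]
print(count(bag), count_star(bag))  # 2 4
```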

Exponential

Supported Engines: Pig

Computes and returns the value of Euler's number raised to the value of the expression.

Example:

{
  "operation": "expression.Exponential",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

EXP(a)

Floor

Supported Engines: Pig

Rounds the value of an expression down to the nearest integer, so the result is never greater than the input.

Example:

{
  "operation": "expression.Floor",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

FLOOR(a)

Log

Supported Engines: Pig

Computes the natural logarithm (base e) of an expression.

Example:

{
  "operation": "expression.Log",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

LOG(a)

Log10

Supported Engines: Pig

Computes the base-10 logarithm of an expression.

Example:

{
  "operation": "expression.Log10",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

LOG10(a)

Max

Supported Engines: Pig

Computes the maximum of a single column bag of numeric values.

Example:

{
  "operation": "expression.Max",
  "arguments": [
    {"operation": "expression.DereferencedField", "arguments": ["a.b"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

MAX(a.b)

Min

Supported Engines: Pig

Computes the minimum of a single column bag of numeric values.

Example:

{
  "operation": "expression.Min",
  "arguments": [
    {"operation": "expression.DereferencedField", "arguments": ["a.b"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

MIN(a.b)

OddsRatio

Supported Engines: Pig

Computes the odds ratio from 4 values. The position of each argument determines the cell of the 2x2 matrix into which it is placed. In the grid below, columns are feature bins, rows are target bins, and each cell holds the index of the argument placed there:

              feature 0   feature 1
  target 0        0           1
  target 1        2           3

Example:

{
  "operation": "expression.OddsRatio",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["feature_bin_0__target_bin_0"]},
    {"operation": "expression.Field", "arguments": ["feature_bin_1__target_bin_0"]},
    {"operation": "expression.Field", "arguments": ["feature_bin_0__target_bin_1"]},
    {"operation": "expression.Field", "arguments": ["feature_bin_1__target_bin_1"]}
  ]
}

Arguments are expressions. Quaternary operation requires 4 arguments.

Sample Output:

((double)feature_bin_0__target_bin_0 / (double)feature_bin_0__target_bin_1) / ((double)feature_bin_1__target_bin_0 / (double)feature_bin_1__target_bin_1)
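In the matrix layout above, argument 0 is the count for feature bin 0 and target bin 0, and argument 1 is the count for feature bin 1 and target bin 0. Arguments 2 and 3 follow the same pattern for target bin 1. The sample output divides the feature-0 odds by the feature-1 odds. This equals the usual cross-product ratio (arg0 * arg3) / (arg1 * arg2). The following Python sketch performs the same arithmetic for checking a result by hand. The function name is hypothetical. Unlike the generated Pig expression, the sketch raises ZeroDivisionError when a denominator is zero.

```python
def odds_ratio(f0_t0: float, f1_t0: float, f0_t1: float, f1_t1: float) -> float:
    """Odds ratio with arguments in the order used by expression.OddsRatio.

    Mirrors the generated expression (f0_t0 / f0_t1) / (f1_t0 / f1_t1),
    computed in floating point like the (double) casts in the Pig output.
    """
    return (float(f0_t0) / float(f0_t1)) / (float(f1_t0) / float(f1_t1))


print(odds_ratio(30, 5, 10, 10))  # 6.0
```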

Pow

Supported Engines: Pig

Computes and returns the value of the first expression raised to the value of the second.

Example:

{
  "operation": "expression.Pow",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Integer", "arguments": [2]}
  ]
}

Arguments are expressions. Binary operation requires 2 arguments.

Sample Output:

org.apache.pig.piggybank.evaluation.math.POW(a, 2)

RoundTo

Supported Engines: Pig

Computes the value of an expression rounded to the specified number of decimal digits. An optional third argument specifies the rounding mode as an integer.

Example:

{
  "operation": "expression.RoundTo",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Integer", "arguments": [2]},
    {"operation": "expression.Integer", "arguments": [4]}
  ]
}

Arguments are expressions. Requires 2 or 3 arguments: the value, the number of digits, and an optional rounding mode.

Sample Output:

ROUND_TO(a,2,4)
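To illustrate how the optional third argument changes the result, here is a Python sketch that emulates `ROUND_TO` with the standard `decimal` module. It assumes the mode integer follows Java's `java.math.BigDecimal` rounding constants (0 = ROUND_UP through 6 = ROUND_HALF_EVEN, so the 4 in the example would be ROUND_HALF_UP); `round_to` and `JAVA_ROUNDING_MODES` are hypothetical names. The mode is required here because this page does not state which default applies when it is omitted.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_UP)

# Assumed mapping from Java BigDecimal rounding-mode integers to Python decimal modes.
JAVA_ROUNDING_MODES = {
    0: ROUND_UP,         # away from zero
    1: ROUND_DOWN,       # toward zero
    2: ROUND_CEILING,    # toward positive infinity
    3: ROUND_FLOOR,      # toward negative infinity
    4: ROUND_HALF_UP,    # nearest, ties away from zero
    5: ROUND_HALF_DOWN,  # nearest, ties toward zero
    6: ROUND_HALF_EVEN,  # nearest, ties to the even neighbor
}


def round_to(value: float, digits: int, mode: int) -> float:
    """Round value to `digits` decimal places using a Java-style rounding-mode integer."""
    quantum = Decimal(1).scaleb(-digits)  # digits=2 gives Decimal("1E-2"), i.e. 0.01
    # str(value) keeps the short decimal form (2.345) instead of the float's binary expansion.
    rounded = Decimal(str(value)).quantize(quantum, rounding=JAVA_ROUNDING_MODES[mode])
    return float(rounded)


print(round_to(2.345, 2, 4))  # 2.35 (ROUND_HALF_UP)
print(round_to(2.345, 2, 6))  # 2.34 (ROUND_HALF_EVEN)
```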

RoundToInteger

Supported Engines: Pig

Computes the value of an expression rounded to the nearest integer, returned as an int or a long depending on the input type.

Example:

{
  "operation": "expression.RoundToInteger",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

ROUND(a)

Size

Supported Engines: Pig

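Computes the number of elements in a value, where the meaning depends on its type: 1 for a numeric value, the number of characters in a chararray, the number of fields in a tuple, the number of tuples in a bag, and the number of key/value pairs in a map.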

Example:

{
  "operation": "expression.Size",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

SIZE(a)
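A Python sketch of how the result depends on the argument type, following the behavior documented for Pig's built-in `SIZE`: numeric values count as 1, chararrays and bytearrays return their length, tuples their number of fields, bags their number of tuples, and maps their number of key/value pairs. Python types stand in for Pig types (a list for a bag, a dict for a map), and `pig_size` is a hypothetical name.

```python
def pig_size(value):
    """Element count of a value, by type, following Pig's SIZE rules."""
    if value is None:
        return None  # assumption: null in, null out
    if isinstance(value, (int, float)):
        return 1  # int, long, float, double all count as a single element
    # chararray (str), bytearray (bytes), tuple (tuple), bag (list), map (dict)
    return len(value)


print(pig_size("hello"))             # 5 characters
print(pig_size([("a",), ("b",)]))    # 2 tuples in the bag
print(pig_size({"k1": 1, "k2": 2}))  # 2 key/value pairs
print(pig_size(3.14))                # 1
```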

StreamingMedian

Supported Engines: Pig

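Computes an approximate median of a single-column bag of numeric values using DataFu's streaming quantile estimator; the median is read from the quantile_0_5 field of the result.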

Example:

{
  "operation": "expression.StreamingMedian",
  "arguments": [
    {"operation": "expression.DereferencedField", "arguments": ["a.b"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

datafu.pig.util.StreamingMedian(a.b).quantile_0_5

Sum

Supported Engines: Pig

Computes the sum of a single-column bag of numeric values.

Example:

{
  "operation": "expression.Sum",
  "arguments": [
    {"operation": "expression.DereferencedField", "arguments": ["a.b"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

SUM(a.b)

String

BagToString

Supported Engines: Pig

Creates a single string from the elements of a bag, similar to SQL's GROUP_CONCAT function.

Example:

{
  "operation": "expression.BagToString",
  "arguments": [
    {"operation": "expression.DereferencedField", "arguments": ["a.b"]},
    {"operation": "expression.Chararray", "arguments": ["_"]}
  ]
}

Binary operation requires 2 arguments: the first should be a DereferencedField expression and the second a Chararray delimiter.

Sample Output:

BagToString(a.b,'_')
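The joining behavior can be sketched in Python by modeling a bag as a list of tuples. The name `bag_to_string` is hypothetical, and two details are assumptions not documented here: multi-field tuples are flattened into the same delimited string, and null fields are not treated specially.

```python
def bag_to_string(bag, delimiter: str):
    """Join the fields of every tuple in a bag (a list of tuples) with delimiter."""
    if bag is None:
        return None
    # Assumption: multi-field tuples are flattened into the same delimited string.
    return delimiter.join(str(field) for tup in bag for field in tup)


print(bag_to_string([("x",), ("y",), ("z",)], "_"))  # x_y_z
```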

Concat

Supported Engines: Pig

Concatenates two or more values, for example chararrays. All arguments must be of the same type.

Example:

{
  "operation": "expression.Concat",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["test1"]},
    {"operation": "expression.Chararray", "arguments": ["_"]},
    {"operation": "expression.Field", "arguments": ["test2"]}
  ]
}

Arguments are expressions. N-ary operation requires 2 or more arguments.

Sample Output:

CONCAT(test1,'_',test2)

LPad

Supported Engines: Pig

Uses a custom Aunsight UDF to pad a string on the left side with the specified character.

Example:

{
  "operation": "expression.LPad",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Integer", "arguments": [6]},
    {"operation": "expression.Chararray", "arguments": ["0"]}
  ]
}

Arguments are expressions. Ternary operation requires 3 arguments.

  • Chararray expression
  • Number of characters to pad up to
  • Character to pad with

Sample Output:

com.aunalytics.pig.string.PadLeft(a, 6, '0')
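A minimal Python sketch of the padding, assuming the UDF leaves strings that already reach the target length unchanged rather than truncating them, and passes nulls through; `pad_left` is a hypothetical name, not the Aunsight UDF's API.

```python
def pad_left(s, width: int, pad_char: str):
    """Pad s on the left with pad_char until it is width characters long."""
    if s is None:
        return None  # assumption: nulls pass through
    # Assumption: strings already at or past width are returned unchanged, not truncated.
    return s.rjust(width, pad_char)


print(pad_left("42", 6, "0"))       # 000042
print(pad_left("1234567", 6, "0"))  # 1234567
```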

RegexExtract

Supported Engines: Pig

Performs regular expression matching and extracts the matched group defined by an index parameter.

Example:

{
  "operation": "expression.RegexExtract",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Chararray", "arguments": ["test"]},
    {"operation": "expression.Integer", "arguments": [0]}
  ]
}

Ternary operation requires 3 arguments.

  • expression.Field
  • regex
  • index to return

Sample Output:

REGEX_EXTRACT(a, 'test', 0)
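A Python sketch of the lookup, assuming the Pig function searches for the first match anywhere in the string and returns the capture group at the given index (0 being the whole match), or null when there is no match or no such group. `regex_extract` is a hypothetical name, and Python's `re` syntax differs from Java regular expressions in some details.

```python
import re


def regex_extract(s, pattern: str, index: int):
    """Return capture group `index` of the first match of pattern in s, or None."""
    if s is None:
        return None
    match = re.search(pattern, s)  # first match anywhere in s
    if match is None or index > len(match.groups()):
        return None  # no match, or the pattern has no group with that index
    return match.group(index)


print(regex_extract("order-123", r"(\w+)-(\d+)", 2))  # 123
print(regex_extract("order-123", r"(\w+)-(\d+)", 0))  # order-123
```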

Replace

Supported Engines: Pig

Replaces every match of the regular expression given as the second argument, within the first argument, with the third argument.

Example:

{
  "operation": "expression.Replace",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Chararray", "arguments": ["b"]},
    {"operation": "expression.Chararray", "arguments": ["c"]}
  ]
}

Arguments are expressions. Ternary operation requires 3 arguments.

Sample Output:

REPLACE(a, 'b', 'c')

StrSplitIdx

Supported Engines: Pig

Splits a chararray field using the given regex and returns the part at the specified zero-based index.

Example:

{
  "operation": "expression.StrSplitIdx",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    "-",
    2
  ]
}

Ternary operation requires 3 arguments.

  • expression.Field
  • regex string to split on
  • index to return

Sample Output:

STRSPLIT(a, '-', 4).$2
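The sample output passes a split limit of 4 for index 2, which appears to be index + 2: the limit stops splitting one part after the requested one, so that part never absorbs the unsplit remainder of the string. A Python sketch under that assumption (`str_split_idx` is a hypothetical name; Python's `maxsplit` counts splits, which is the limit minus one):

```python
import re


def str_split_idx(s, regex: str, index: int):
    """Split s on regex and return the zero-based part `index`, or None if absent."""
    if s is None:
        return None
    # Pig limit = index + 2 parts (as in the sample output); Python maxsplit = limit - 1.
    parts = re.split(regex, s, maxsplit=index + 1)
    return parts[index] if index < len(parts) else None


print(str_split_idx("a-b-c-d-e", "-", 2))  # c
print(str_split_idx("a-b", "-", 2))        # None
```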

SubString

Supported Engines: Pig

Returns a substring from a given string.

Example:

{
  "operation": "expression.SubString",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["a"]},
    {"operation": "expression.Integer", "arguments": [0]},
    {"operation": "expression.Integer", "arguments": [1]}
  ]
}

Ternary operation requires 3 arguments.

  • expression.Field
  • start index (zero-based, inclusive)
  • stop index (exclusive)

Sample Output:

SUBSTRING(a, 0, 1)

ToLowerCase

Supported Engines: Pig

Converts the chararray to all lowercase.

Example:

{
  "operation": "expression.ToLowerCase",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["test1"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

LOWER(test1)

ToLowerCaseFirst

Supported Engines: Pig

Converts only the first character of the chararray to lowercase.

Example:

{
  "operation": "expression.ToLowerCaseFirst",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["test1"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

LCFIRST(test1)

ToUpperCase

Supported Engines: Pig

Converts the chararray to all uppercase.

Example:

{
  "operation": "expression.ToUpperCase",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["test1"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

UPPER(test1)

ToUpperCaseFirst

Supported Engines: Pig

Converts only the first character of the chararray to uppercase.

Example:

{
  "operation": "expression.ToUpperCaseFirst",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["test1"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

UCFIRST(test1)

Trim

Supported Engines: Pig

Removes leading and trailing whitespace from a chararray.

Example:

{
  "operation": "expression.Trim",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["test1"]}
  ]
}

Arguments are expressions. Unary operation requires 1 argument.

Sample Output:

TRIM(test1)

Other

Hash

Supported Engines: Pig

Generates a Murmur3 hash of the string produced by the expression.

ToProperCase

Converts the chararray to proper case (e.g. from "proper case string" to "Proper Case String").
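A Python sketch of the conversion, assuming words are separated by spaces and that the remaining letters of each word are lowercased; both are assumptions, and `to_proper_case` is a hypothetical name. Splitting on a single space keeps runs of spaces intact.

```python
def to_proper_case(s):
    """Capitalize the first letter of each space-separated word; lowercase the rest."""
    if s is None:
        return None
    return " ".join(word[:1].upper() + word[1:].lower() for word in s.split(" "))


print(to_proper_case("proper case string"))  # Proper Case String
```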

NullDouble

Creates a null of the double type.

RoundToCents

Rounds a float or double to two decimal places.

Sqrt

Returns the square root of the value of the expression.

HaversinDistInMiles

Supported Engines: Pig

Computes the haversine (great-circle) distance in miles between two latitude/longitude pairs in the expression.
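For reference, the haversine formula can be sketched in Python as follows. The Earth radius of 3958.8 miles (a common mean-radius value) and the argument order (lat1, lon1, lat2, lon2, in decimal degrees) are assumptions; the function behind this expression may use a different constant or ordering, and `haversine_miles` is a hypothetical name.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius; the real UDF may differ


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against a drifting just above 1.0 from rounding near antipodal points.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


# One degree of longitude along the equator, about 69.09 miles with this radius.
print(round(haversine_miles(0.0, 0.0, 0.0, 1.0), 2))
```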

StandardDeviation

Supported Engines: Pig

Computes the standard deviation of the values in the expression.

Variance

Supported Engines: Pig

Computes the variance of the values in the expression.
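Whether StandardDeviation and Variance compute the population statistic (dividing by n) or the sample statistic (dividing by n - 1) is not stated here, so this Python sketch shows both using the standard `statistics` module; `variance` and `standard_deviation` are hypothetical helper names.

```python
import math
import statistics


def variance(values, sample: bool = False) -> float:
    """Population variance (divide by n) by default; sample variance (n - 1) if sample=True."""
    return statistics.variance(values) if sample else statistics.pvariance(values)


def standard_deviation(values, sample: bool = False) -> float:
    """Square root of the matching variance."""
    return math.sqrt(variance(values, sample))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(data), standard_deviation(data))  # 4.0 2.0
print(variance(data, sample=True))               # 32 / 7, about 4.571
```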

DateTruncDay

Truncates the date to the day.

DateTruncWeek

Truncates the date to the week.

DateTruncMonth

Truncates the date to the month.

DateTruncQuarter

Truncates the date to the quarter.

DateTruncYear

Truncates the date to the year.
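The five truncation functions can be sketched with Python's `datetime` module. The week case assumes weeks start on Monday (ISO 8601 style), which DateTruncWeek may not follow, quarters are taken to start in January, April, July, and October, and `date_trunc` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def date_trunc(dt: datetime, unit: str) -> datetime:
    """Truncate dt to the start of its day, week, month, quarter, or year."""
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "day":
        return midnight
    if unit == "week":
        # Assumption: weeks start on Monday (weekday() == 0).
        return midnight - timedelta(days=midnight.weekday())
    if unit == "month":
        return midnight.replace(day=1)
    if unit == "quarter":
        first_month = 3 * ((dt.month - 1) // 3) + 1  # 1, 4, 7, or 10
        return midnight.replace(month=first_month, day=1)
    if unit == "year":
        return midnight.replace(month=1, day=1)
    raise ValueError(f"unknown unit: {unit}")


print(date_trunc(datetime(2024, 8, 15, 13, 45), "week"))  # 2024-08-12 00:00:00
```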