Expressions¶
Bag-Tuple¶
Coalesce¶
Supported Engines: Pig
Similar to the COALESCE function in SQL. Uses the DataFu library to return the first non-null value among the arguments passed in. All arguments must be of the same type.
Example:
{
"operation": "expression.Coalesce",
"arguments": [
{"operation": "expression.Field", "arguments": ["A"]},
{"operation": "expression.Field", "arguments": ["B"]},
{"operation": "expression.Chararray", "arguments": ["not found"]}
]
}
Arguments are expressions. N-ary operation accepts unlimited arguments.
Sample Output:
datafu.pig.util.Coalesce(A, B, 'not found')
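The semantics described above can be sketched in plain Python. This is only an illustration of "first non-null wins"; the `coalesce` helper and the `row` dictionary are hypothetical names and are not part of this library or of DataFu.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def coalesce(*values: Optional[T]) -> Optional[T]:
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


# Mirrors the example: fields A and B are null, so the literal is returned.
row = {"A": None, "B": None}
result = coalesce(row["A"], row["B"], "not found")
```

Ending the argument list with a literal, as the example does, guarantees a non-null result.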
Boolean¶
And¶
Supported Engines: Pig
Performs a logical AND on the result of nested expressions, which must return a boolean.
Example:
{
"operation": "expression.And",
"arguments": [
{"operation": "expression.Field", "arguments": ["A"]},
{"operation": "expression.Field", "arguments": ["B"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
(A) AND (B)
Equality¶
Supported Engines: Pig
Checks equality of two nested expressions. Automatically casts type if necessary/possible.
Example:
{
"operation": "expression.Equality",
"arguments": [
{"operation": "expression.Integer", "arguments": [1]},
{"operation": "expression.Integer", "arguments": [2]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
1 == 2
GreaterEqualThan¶
Supported Engines: Pig
Checks if left expression is greater than or equal to the right one. Automatically casts type if necessary/possible.
Example:
{
"operation": "expression.GreaterEqualThan",
"arguments": [
{"operation": "expression.Integer", "arguments": [1]},
{"operation": "expression.Integer", "arguments": [2]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
1 >= 2
GreaterThan¶
Supported Engines: Pig
Checks if left expression is greater than the right one. Automatically casts type if necessary/possible.
Example:
{
"operation": "expression.GreaterThan",
"arguments": [
{"operation": "expression.Integer", "arguments": [1]},
{"operation": "expression.Integer", "arguments": [2]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
1 > 2
In¶
Supported Engines: Pig
Returns a boolean indicating whether the value of the first expression is present among the remaining values.
Example:
{
"operation": "expression.In",
"arguments": [
{"operation": "expression.Field", "arguments": ["A"]},
{"operation": "expression.Integer", "arguments": [1]},
{"operation": "expression.Integer", "arguments": [2]},
{"operation": "expression.Integer", "arguments": [3]}
]
}
Arguments are expressions. Operation requires a minimum of 2 arguments.
Sample Output:
A IN (1, 2, 3)
Inequality¶
Supported Engines: Pig
Checks inequality of two nested expressions. Automatically casts type if necessary/possible.
Example:
{
"operation": "expression.Inequality",
"arguments": [
{"operation": "expression.Integer", "arguments": [1]},
{"operation": "expression.Integer", "arguments": [2]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
1 != 2
LessEqualThan¶
Supported Engines: Pig
Checks if left expression is less than or equal to the right one. Automatically casts type if necessary/possible.
Example:
{
"operation": "expression.LessEqualThan",
"arguments": [
{"operation": "expression.Integer", "arguments": [1]},
{"operation": "expression.Integer", "arguments": [2]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
1 <= 2
LessThan¶
Supported Engines: Pig
Checks if left expression is less than the right one. Automatically casts type if necessary/possible.
Example:
{
"operation": "expression.LessThan",
"arguments": [
{"operation": "expression.Integer", "arguments": [1]},
{"operation": "expression.Integer", "arguments": [2]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
1 < 2
Matches¶
Supported Engines: Pig
Checks if left expression matches the right regex. Right argument must be a string literal in [Java format](http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html).
Example:
{
"operation": "expression.Matches",
"arguments": [
{"operation": "expression.Field", "arguments": ["test"]},
{"operation": "expression.Chararray", "arguments": ["(a|b|c)"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
test MATCHES '(a|b|c)'
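Pig's MATCHES operator follows Java's `String.matches`, so the pattern must match the whole value rather than a substring. A rough Python analogue is `re.fullmatch`; the `matches` helper below is a hypothetical name, and Java and Python regex syntax differ in some details, so this is a sketch of the matching rule only.

```python
import re


def matches(value: str, pattern: str) -> bool:
    """Full-string regex match, analogous to Pig's MATCHES operator."""
    return re.fullmatch(pattern, value) is not None


single_letter = matches("a", "(a|b|c)")       # whole value matches
two_letters = matches("ab", "(a|b|c)")        # only a prefix matches, so no match
two_letters_plus = matches("ab", "(a|b|c)+")  # pattern now covers the whole value
```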
Not¶
Supported Engines: Pig
Performs a logical NOT on the result of a nested expression, which must return a boolean.
Example:
{
"operation": "expression.Not",
"arguments": [
{"operation": "expression.Field", "arguments": ["A"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
NOT (A)
Or¶
Supported Engines: Pig
Performs a logical OR on the result of nested expressions, which must return a boolean.
Example:
{
"operation": "expression.Or",
"arguments": [
{"operation": "expression.Field", "arguments": ["A"]},
{"operation": "expression.Field", "arguments": ["B"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
(A) OR (B)
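The Boolean operations above take other expression objects as arguments, so they can be nested. As an illustration of how such a tree could map to output strings, here is a minimal Python sketch. The `render` function is hypothetical and is not this library's implementation; it covers only Field, Integer, Chararray, Not, And, Or, and Equality, and the parenthesization of nested results is an assumption extrapolated from the sample outputs.

```python
import json


def render(node: dict) -> str:
    """Recursively render a nested expression object to a Pig-style string."""
    op = node["operation"].split(".", 1)[1]  # drop the "expression." prefix
    args = node["arguments"]
    # Leaf operations take literal arguments, not nested expressions.
    if op == "Field":
        return str(args[0])
    if op == "Integer":
        return str(int(args[0]))
    if op == "Chararray":
        return "'" + str(args[0]) + "'"
    # Every other operation handled here takes nested expressions.
    rendered = [render(arg) for arg in args]
    if op == "Not":
        return "NOT (" + rendered[0] + ")"
    if op in ("And", "Or"):
        return "(" + rendered[0] + ") " + op.upper() + " (" + rendered[1] + ")"
    if op == "Equality":
        return rendered[0] + " == " + rendered[1]
    raise ValueError("unsupported operation: " + op)


spec = json.loads("""
{
  "operation": "expression.Or",
  "arguments": [
    {"operation": "expression.Field", "arguments": ["A"]},
    {"operation": "expression.Not", "arguments": [
      {"operation": "expression.Equality", "arguments": [
        {"operation": "expression.Field", "arguments": ["B"]},
        {"operation": "expression.Integer", "arguments": [2]}
      ]}
    ]}
  ]
}
""")
rendered_spec = render(spec)
```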
Case¶
Bincond¶
Supported Engines: Pig
Depending on the boolean result of the first argument, chooses the second if true or third if false. First argument must return boolean, others may be any type.
Example:
{
"operation": "expression.Bincond",
"arguments": [
{"operation": "expression.Field", "arguments": ["A"]},
{"operation": "expression.Integer", "arguments": [1]},
{"operation": "expression.Integer", "arguments": [2]}
]
}
Arguments are expressions. Ternary operation requires 3 arguments.
Sample Output:
(A ? 1 : 2)
Digitize¶
Supported Engines: Pig
Returns the index of the bin to which the value of the input field belongs.
Example:
{
"operation": "expression.Digitize",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Field", "arguments": ["bin_0"]},
{"operation": "expression.Field", "arguments": ["bin_1"]},
{"operation": "expression.Field", "arguments": ["bin_2"]},
{"operation": "expression.Field", "arguments": ["bin_3"]}
]
}
Arguments are expressions. N-ary operation requires multiple arguments. The first argument is the value to digitize; the remaining arguments are the bin edges.
Sample Output:
(CASE WHEN a < bin_0 THEN 0 WHEN a < bin_1 THEN 1 WHEN a < bin_2 THEN 2 WHEN a < bin_3 THEN 3 WHEN a >= bin_3 THEN 4 END)
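The CASE chain in the sample output amounts to counting how many bin edges are less than or equal to the value. A minimal Python sketch of the same rule follows. The `digitize` helper is a hypothetical name, the edges are assumed to be sorted in ascending order, and a null input is assumed to produce a null result because no WHEN branch matches.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def digitize(value: Optional[float], bins: Sequence[float]) -> Optional[int]:
    """Return the bin index for value, given ascending bin edges."""
    if value is None:
        # Assumption: comparisons against null never match a WHEN branch.
        return None
    # bisect_right counts the edges that are <= value, which equals the
    # index of the first WHEN branch (value < edge) that would succeed.
    return bisect_right(bins, value)


edges = [10, 20, 30, 40]
indices = [digitize(v, edges) for v in (5, 10, 25, 40, 100)]
```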
DataType¶
Bool¶
Supported Engines: Pig
Creates a boolean literal.
Example:
{
"operation": "expression.Bool",
"arguments": [true]
}
Argument is a literal value. Unary operation requires 1 argument.
Sample Output:
true
CastToBigDecimal¶
Supported Engines: Pig
Converts double values to bigdecimal type to avoid scientific notation.
Example:
{
"operation": "expression.CastToBigDecimal",
"arguments": [
{"operation": "expression.Double", "arguments": [123.0]}
]
}
Argument is an expression. Unary operation requires 1 argument.
Sample Output:
(bigdecimal)com.aunalytics.pig.string.PlainString((chararray)123.0)
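To illustrate why a plain-string conversion matters, the Python sketch below shows a double whose default string form uses scientific notation and a decimal-based rendering that does not. The `plain_string` helper is a hypothetical name that only mimics the idea; it is not the `PlainString` UDF shown in the sample output.

```python
from decimal import Decimal


def plain_string(value: float) -> str:
    """Render a float without scientific notation."""
    # str() keeps the shortest repr of the float; Decimal parses it exactly,
    # and the "f" format forces fixed-point output.
    return format(Decimal(str(value)), "f")


default_form = str(1.23e-07)          # scientific notation
plain_form = plain_string(1.23e-07)   # fixed-point notation
```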
CastToChararray¶
Supported Engines: Pig
Casts an expression to a chararray.
Example:
{
"operation": "expression.CastToChararray",
"arguments": [
{"operation": "expression.Integer", "arguments": [123]}
]
}
Argument is an expression. Unary operation requires 1 argument.
Sample Output:
(chararray)123
CastToDouble¶
Supported Engines: Pig
Casts an expression to a double.
Example:
{
"operation": "expression.CastToDouble",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Argument is an expression. Unary operation requires 1 argument.
Sample Output:
(double)a
CastToFloat¶
Supported Engines: Pig
Casts an expression to a float.
Example:
{
"operation": "expression.CastToFloat",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Argument is an expression. Unary operation requires 1 argument.
Sample Output:
(float)a
CastToInteger¶
Supported Engines: Pig
Casts an expression to an integer.
Example:
{
"operation": "expression.CastToInteger",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Argument is an expression. Unary operation requires 1 argument.
Sample Output:
(int)a
CastToLong¶
Supported Engines: Pig
Casts an expression to a long.
Example:
{
"operation": "expression.CastToLong",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Argument is an expression. Unary operation requires 1 argument.
Sample Output:
(long)a
Chararray¶
Supported Engines: Pig
Creates a string literal.
Example:
{
"operation": "expression.Chararray",
"arguments": ["abc"]
}
Argument is a literal value. Unary operation requires 1 argument.
Sample Output:
'abc'
Datetime¶
Supported Engines: Pig
Creates a datetime object from a string in ISO8601 format.
Example:
{
"operation": "expression.Datetime",
"arguments": ["2009-05-19 14:39:22"]
}
Argument is a literal value. Unary operation requires 1 argument.
Sample Output:
ToDate('2009-05-19 14:39:22')
Double¶
Supported Engines: Pig
Creates a double literal.
Example:
{
"operation": "expression.Double",
"arguments": [2.123]
}
Argument is a literal value. Unary operation requires 1 argument.
Sample Output:
2.123
Float¶
Supported Engines: Pig
Creates a float literal.
Example:
{
"operation": "expression.Float",
"arguments": [2.123]
}
Argument is a literal value. Unary operation requires 1 argument.
Sample Output:
2.123F
Integer¶
Supported Engines: Pig
Creates an integer literal.
Example:
{
"operation": "expression.Integer",
"arguments": [2]
}
Argument is a literal value. Unary operation requires 1 argument.
Sample Output:
2
Long¶
Supported Engines: Pig
Creates a long literal.
Example:
{
"operation": "expression.Long",
"arguments": [2]
}
Argument is a literal value. Unary operation requires 1 argument.
Sample Output:
2L
NullBool¶
Supported Engines: Pig
Creates a null of type boolean.
Example:
{
"operation": "expression.NullBool",
"arguments": []
}
Nullary operation requires 0 arguments.
Sample Output:
null
NullChararray¶
Supported Engines: Pig
Creates a null of type chararray.
Example:
{
"operation": "expression.NullChararray",
"arguments": []
}
Nullary operation requires 0 arguments.
Sample Output:
null
NullDatetime¶
Supported Engines: Pig
Creates a null of type datetime.
Example:
{
"operation": "expression.NullDatetime",
"arguments": []
}
Nullary operation requires 0 arguments.
Sample Output:
null
NullFloat¶
Supported Engines: Pig
Creates a null of type float.
Example:
{
"operation": "expression.NullFloat",
"arguments": []
}
Nullary operation requires 0 arguments.
Sample Output:
null
NullInteger¶
Supported Engines: Pig
Creates a null of type integer.
Example:
{
"operation": "expression.NullInteger",
"arguments": []
}
Nullary operation requires 0 arguments.
Sample Output:
null
NullLong¶
Supported Engines: Pig
Creates a null of type long.
Example:
{
"operation": "expression.NullLong",
"arguments": []
}
Nullary operation requires 0 arguments.
Sample Output:
null
DateTime¶
AddDuration¶
Supported Engines: Pig
Adds a specified duration provided in ISO 8601 format to the given datetime.
Example:
{
"operation": "expression.AddDuration",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Chararray", "arguments": ["P7Y0M0W0DT13H0M0S"]}
]
}
Binary operation requires 2 arguments.
Sample Output:
AddDuration(a, 'P7Y0M0W0DT13H0M0S')
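The duration string in the example reads as 7 years and 13 hours, with every other component zero. The Python sketch below splits such a string into its components. It is a simplified illustration: the `parse_duration` helper is a hypothetical name, only whole-number components are accepted, and it does not reproduce how Pig applies the duration to a datetime.

```python
import re

# Date part (years, months, weeks, days), then an optional time part after "T".
_DURATION = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?"
    r"(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text: str) -> dict:
    """Split an ISO 8601 duration string into integer components."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError("not an ISO 8601 duration: " + text)
    # Components that are absent from the string default to 0.
    return {name: int(value or 0) for name, value in match.groupdict().items()}


parts = parse_duration("P7Y0M0W0DT13H0M0S")
```

Note that "M" means months before the "T" separator and minutes after it.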
CurrentTime¶
Supported Engines: Pig
Returns a datetime object of current timestamp with millisecond accuracy.
Example:
{
"operation": "expression.CurrentTime",
"arguments": []
}
Nullary operation requires 0 arguments.
Sample Output:
CurrentTime()
DateToISOString¶
Supported Engines: Pig
Converts a date to an ISO8601 string.
Example:
{
"operation": "expression.DateToISOString",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
ToString(a)
DateToString¶
Supported Engines: Pig
Converts a date to a string using the specified format.
Example:
{
"operation": "expression.DateToString",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Chararray", "arguments": ["MM/dd/yyyy"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
ToString(a, 'MM/dd/yyyy')
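For readers more familiar with `strftime`-style codes, the Python sketch below translates a few Java-style pattern letters and formats a date with the result. The `to_strftime` helper and its token table are hypothetical and cover only the six tokens listed; they are an approximation for illustration, not the formatter Pig uses.

```python
from datetime import datetime

# Assumed mapping for a handful of Java-style pattern tokens.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}


def to_strftime(pattern: str) -> str:
    """Translate a small subset of Java-style date patterns to strftime codes."""
    out = []
    i = 0
    while i < len(pattern):
        for token, code in _TOKENS.items():
            if pattern.startswith(token, i):
                out.append(code)
                i += len(token)
                break
        else:
            # Anything that is not a known token is copied as a literal.
            out.append(pattern[i].replace("%", "%%"))
            i += 1
    return "".join(out)


translated = to_strftime("MM/dd/yyyy")
formatted = datetime(2009, 5, 19, 14, 39, 22).strftime(translated)
```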
DaysBetween¶
Supported Engines: Pig
Returns number of days between two datetime objects.
Example:
{
"operation": "expression.DaysBetween",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Field", "arguments": ["b"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
DaysBetween(a, b)
GetDay¶
Supported Engines: Pig
Returns day of the month.
Example:
{
"operation": "expression.GetDay",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
GetDay(a)
GetHour¶
Supported Engines: Pig
Returns hour of a day.
Example:
{
"operation": "expression.GetHour",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
GetHour(a)
GetMilliSecond¶
Supported Engines: Pig
Returns millisecond of a second.
Example:
{
"operation": "expression.GetMilliSecond",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
GetMilliSecond(a)
GetMinute¶
Supported Engines: Pig
Returns minute of an hour.
Example:
{
"operation": "expression.GetMinute",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
GetMinute(a)
GetMonth¶
Supported Engines: Pig
Returns month of a year.
Example:
{
"operation": "expression.GetMonth",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
GetMonth(a)
GetSecond¶
Supported Engines: Pig
Returns second of a minute.
Example:
{
"operation": "expression.GetSecond",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
GetSecond(a)
GetWeek¶
Supported Engines: Pig
Returns week of a week year.
Example:
{
"operation": "expression.GetWeek",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
GetWeek(a)
GetWeekYear¶
Supported Engines: Pig
Returns the week year.
Example:
{
"operation": "expression.GetWeekYear",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
GetWeekYear(a)
GetYear¶
Supported Engines: Pig
Returns year.
Example:
{
"operation": "expression.GetYear",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
GetYear(a)
HoursBetween¶
Supported Engines: Pig
Returns number of hours between two datetime objects.
Example:
{
"operation": "expression.HoursBetween",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Field", "arguments": ["b"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
HoursBetween(a, b)
MilliSecondsBetween¶
Supported Engines: Pig
Returns number of milliseconds between two datetime objects.
Example:
{
"operation": "expression.MilliSecondsBetween",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Field", "arguments": ["b"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
MilliSecondsBetween(a, b)
MinutesBetween¶
Supported Engines: Pig
Returns number of minutes between two datetime objects.
Example:
{
"operation": "expression.MinutesBetween",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Field", "arguments": ["b"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
MinutesBetween(a, b)
MonthsBetween¶
Supported Engines: Pig
Returns number of months between two datetime objects.
Example:
{
"operation": "expression.MonthsBetween",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Field", "arguments": ["b"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
MonthsBetween(a, b)
SecondsBetween¶
Supported Engines: Pig
Returns number of seconds between two datetime objects.
Example:
{
"operation": "expression.SecondsBetween",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Field", "arguments": ["b"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
SecondsBetween(a, b)
SubtractDuration¶
Supported Engines: Pig
Subtracts a specified duration provided in ISO 8601 format from the given datetime.
Example:
{
"operation": "expression.SubtractDuration",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Chararray", "arguments": ["P7Y0M0W0DT13H0M0S"]}
]
}
Binary operation requires 2 arguments.
Sample Output:
SubtractDuration(a, 'P7Y0M0W0DT13H0M0S')
ToDate¶
Supported Engines: Pig
Converts a chararray field to datetime.
Example:
{
"operation": "expression.ToDate",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
ToDate(a)
ToDateFormat¶
Supported Engines: Pig
Converts a chararray field to datetime using the specified format.
Example:
{
"operation": "expression.ToDateFormat",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Chararray", "arguments": ["MM/dd/yyyy"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
ToDate(a, 'MM/dd/yyyy')
ToDateFormatTimezone¶
Supported Engines: Pig
Converts a chararray field to datetime using the specified format, with the appropriate timezone adjustment.
Example:
{
"operation": "expression.ToDateFormatTimezone",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Chararray", "arguments": ["YYYY-MM-DD HH-mm-ssZ"]},
{"operation": "expression.Chararray", "arguments": ["Asia/Calcutta"]}
]
}
Arguments are expressions. Ternary operation requires 3 arguments.
Sample Output:
ToDate(a, 'YYYY-MM-DD HH-mm-ssZ', 'Asia/Calcutta')
ToDateFormatTimeZoneCorrected¶
Supported Engines: Pig
Converts datetime strings to objects with correctly adjusted DST.
Example:
{
"operation": "expression.ToDateFormatTimezoneCorrected",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Chararray", "arguments": ["yyyy-MM-dd HH:mm:ss.SSSX"]},
{"operation": "expression.Chararray", "arguments": ["UTC"]}
]
}
Sample Output:
(a is not null AND a != '' ? com.aunalytics.pig.string.ToDate(a, 'yyyy-MM-dd HH:mm:ss.SSSX', 'UTC', null) : null)
ToDateMillis¶
Supported Engines: Pig
Converts a long field containing time in milliseconds to datetime.
Example:
{
"operation": "expression.ToDateMillis",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
ToDate(a)
WeeksBetween¶
Supported Engines: Pig
Returns number of weeks between two datetime objects.
Example:
{
"operation": "expression.WeeksBetween",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Field", "arguments": ["b"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
WeeksBetween(a, b)
YearsBetween¶
Supported Engines: Pig
Returns number of years between two datetime objects.
Example:
{
"operation": "expression.YearsBetween",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Field", "arguments": ["b"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
YearsBetween(a, b)
Eval¶
Addition¶
Supported Engines: Pig
Adds the result of two expressions together. Automatically casts type if possible.
Example:
{
"operation": "expression.Addition",
"arguments": [
{"operation": "expression.Integer", "arguments": [2]},
{"operation": "expression.Field", "arguments": ["test"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
2 + test
Division¶
Supported Engines: Pig
Divides the result of the left expression by the right. Automatically casts type if possible.
Example:
{
"operation": "expression.Division",
"arguments": [
{"operation": "expression.Integer", "arguments": [2]},
{"operation": "expression.Field", "arguments": ["test"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
2 / test
IfCase¶
Supported Engines: Pig
Implements an if/else-if/else structure: an initial if condition, any number of else-if conditions, and an optional else at the end.
Example:
{
"operation": "expression.IfCase",
"arguments": [
{"operation": "expression.Field", "arguments": ["A"]},
{"operation": "expression.Integer", "arguments": [1]},
{"operation": "expression.Field", "arguments": ["B"]},
{"operation": "expression.Integer", "arguments": [2]},
{"operation": "expression.Integer", "arguments": [3]}
]
}
Arguments are expressions. Requires a minimum of 2 arguments: the initial if condition and its value. Additional else-if condition/value pairs can follow, with an optional else value at the end.
Sample Output:
(CASE WHEN A THEN 1 WHEN B THEN 2 ELSE 3 END)
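The way the flat argument list is grouped can be sketched in Python: arguments are read as condition/value pairs, and an unpaired final argument becomes the ELSE value. The `render_if_case` helper is a hypothetical name operating on already-rendered argument strings; it is an illustration consistent with the sample output, not this library's code.

```python
from typing import Sequence


def render_if_case(args: Sequence[str]) -> str:
    """Group a flat argument list into WHEN/THEN pairs plus an optional ELSE."""
    if len(args) < 2:
        raise ValueError("IfCase needs at least a condition and its value")
    parts = ["CASE"]
    # Each consecutive pair is (condition, value).
    for i in range(len(args) // 2):
        parts.append("WHEN " + args[2 * i] + " THEN " + args[2 * i + 1])
    # An odd argument count means the last argument is the else value.
    if len(args) % 2 == 1:
        parts.append("ELSE " + args[-1])
    parts.append("END")
    return "(" + " ".join(parts) + ")"


rendered_if = render_if_case(["A", "1", "B", "2", "3"])
```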
IfCaseNullElse¶
Supported Engines: Pig
Same as IfCase, but returns the ELSE value when the IF/THEN/ELSE statement evaluates to NULL.
Example:
{
"operation": "expression.IfCaseNullElse",
"arguments": [
{"operation": "expression.Equality", "arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Integer", "arguments": [100]}]},
{"operation": "expression.Integer", "arguments": [1]},
{"operation": "expression.Integer", "arguments": [0]}
]
}
Sample Output:
((CASE WHEN a == 100 THEN 1 ELSE 0 END) IS NULL ? 0 : (CASE WHEN a == 100 THEN 1 ELSE 0 END))
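The wrapper in the sample output only changes the result when the inner CASE evaluates to null. The Python sketch below models that, assuming that a comparison involving null is itself null and that a null condition makes the inner CASE return null; those null-propagation rules are assumptions about Pig behavior, and all three helper names are hypothetical.

```python
from typing import Any, Optional


def eq(left: Any, right: Any) -> Optional[bool]:
    """Equality that yields None when either side is None (assumed Pig rule)."""
    if left is None or right is None:
        return None
    return left == right


def if_case(cond: Optional[bool], then_value: Any, else_value: Any) -> Any:
    """Two-branch CASE that yields None for a None condition (assumed Pig rule)."""
    if cond is None:
        return None
    return then_value if cond else else_value


def if_case_null_else(cond: Optional[bool], then_value: Any, else_value: Any) -> Any:
    """Mirror of the sample output: fall back to the else value on None."""
    result = if_case(cond, then_value, else_value)
    return else_value if result is None else result


outcomes = [if_case_null_else(eq(a, 100), 1, 0) for a in (100, 5, None)]
```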
Modulo¶
Supported Engines: Pig
Finds the remainder of the left divided by the right. Automatically casts type if possible.
Example:
{
"operation": "expression.Modulo",
"arguments": [
{"operation": "expression.Integer", "arguments": [2]},
{"operation": "expression.Field", "arguments": ["test"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
2 % test
Multiplication¶
Supported Engines: Pig
Multiplies the result of two expressions. Automatically casts type if possible.
Example:
{
"operation": "expression.Multiplication",
"arguments": [
{"operation": "expression.Integer", "arguments": [2]},
{"operation": "expression.Field", "arguments": ["test"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
2 * test
Subtraction¶
Supported Engines: Pig
Subtracts the right expression result from the left. Automatically casts type if possible.
Example:
{
"operation": "expression.Subtraction",
"arguments": [
{"operation": "expression.Integer", "arguments": [2]},
{"operation": "expression.Field", "arguments": ["test"]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
2 - test
SwitchCase¶
Supported Engines: Pig
Compares the result of the switch-case expression with each case condition and returns the value associated with the matching condition. If none of the case conditions match, returns the default value, if provided.
Example:
{
"operation": "expression.SwitchCase",
"arguments": [
{"operation": "expression.Field", "arguments": ["A"]},
{"operation": "expression.Integer", "arguments": [1]},
{"operation": "expression.Integer", "arguments": [100]},
{"operation": "expression.Integer", "arguments": [2]},
{"operation": "expression.Integer", "arguments": [200]},
{"operation": "expression.Integer", "arguments": [0]}
]
}
Arguments are expressions. Requires a minimum of 3 arguments: the switch-case expression, a case condition, and its value. Additional case condition/value pairs can follow, with an optional default value at the end.
Sample Output:
(CASE A WHEN 1 THEN 100 WHEN 2 THEN 200 ELSE 0 END)
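The argument layout can be sketched in Python: the first argument is the switch expression, the rest are read as case/value pairs, and an unpaired final argument becomes the default. The `render_switch_case` helper is a hypothetical name operating on already-rendered argument strings; it is an illustration consistent with the sample output, not this library's code.

```python
from typing import Sequence


def render_switch_case(args: Sequence[str]) -> str:
    """Render a switch expression, its case/value pairs, and optional default."""
    if len(args) < 3:
        raise ValueError("SwitchCase needs an expression, a case, and a value")
    subject, rest = args[0], args[1:]
    parts = ["CASE " + subject]
    # Each consecutive pair after the subject is (case condition, value).
    for i in range(len(rest) // 2):
        parts.append("WHEN " + rest[2 * i] + " THEN " + rest[2 * i + 1])
    # An unpaired trailing argument is the default value.
    if len(rest) % 2 == 1:
        parts.append("ELSE " + rest[-1])
    parts.append("END")
    return "(" + " ".join(parts) + ")"


rendered_switch = render_switch_case(["A", "1", "100", "2", "200", "0"])
```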
SwitchCaseNullElse¶
Supported Engines: Pig
Same as SwitchCase, but returns the ELSE value when the switch statement evaluates to NULL.
Example:
{
"operation": "expression.SwitchCaseNullElse",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Integer", "arguments": [100]},
{"operation": "expression.Integer", "arguments": [1]},
{"operation": "expression.Integer", "arguments": [101]},
{"operation": "expression.Integer", "arguments": [2]},
{"operation": "expression.Integer", "arguments": [3]}
]
}
Sample Output:
((CASE a WHEN 100 THEN 1 WHEN 101 THEN 2 ELSE 3 END) IS NULL ? 3 : (CASE a WHEN 100 THEN 1 WHEN 101 THEN 2 ELSE 3 END))
Field Expressions¶
DereferencedField¶
Supported Engines: Pig
Dereferences a field in an input schema using the [dereference operator](http://pig.apache.org/docs/r0.15.0/basic.html#deref). In the example below, a and b must be tuples or bags. The expression reaches into the schemas for the specified fields and returns the type of the last field. It also checks for field existence and for supported types along the way. Set dereferencing is not supported at this time.
Example:
{
"operation": "expression.DereferencedField",
"arguments": ["a.b.c"]
}
Argument is a field path. Unary operation requires 1 argument.
Sample Output:
a.b.c
Field¶
Supported Engines: Pig
References a field in an input schema.
Example:
{
"operation": "expression.Field",
"arguments": ["f1"]
}
Argument is a field name. Unary operation requires 1 argument.
Sample Output:
f1
IsEmpty¶
Supported Engines: Pig
Checks if a bag or map is empty, that is, contains no items.
Example:
{
"operation": "expression.IsEmpty",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
IsEmpty(a)
IsNotNull¶
Supported Engines: Pig
Checks if the nested expression result is not null.
Example:
{
"operation": "expression.IsNotNull",
"arguments": [
{"operation": "expression.Field", "arguments": ["test"]}
]
}
Argument is an expression. Unary operation requires 1 argument.
Sample Output:
test IS NOT NULL
IsNull¶
Supported Engines: Pig
Checks if the nested expression result is null.
Example:
{
"operation": "expression.IsNull",
"arguments": [
{"operation": "expression.Field", "arguments": ["test"]}
]
}
Argument is an expression. Unary operation requires 1 argument.
Sample Output:
test IS NULL
Nullify¶
Supported Engines: Pig
Returns a null value if the condition in the second argument is true, otherwise the value of the specified field.
Example:
{
"operation": "expression.Nullify",
"arguments": [
{"operation": "expression.Field", "arguments": ["fieldA"]},
{"operation": "expression.Equality", "arguments": [
{"operation": "expression.Field", "arguments": ["fieldA"]},
{"operation": "expression.Chararray", "arguments": ["None"]}
]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
(fieldA == 'None' ? null : fieldA)
Math¶
Absolute¶
Supported Engines: Pig
Computes the absolute value of an expression.
Example:
{
"operation": "expression.Absolute",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
ABS(a)
Avg¶
Supported Engines: Pig
Computes the average of a single column bag of numeric values.
Example:
{
"operation": "expression.Avg",
"arguments": [
{"operation": "expression.DereferencedField", "arguments": ["a.b"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
AVG(a.b)
Ceil¶
Supported Engines: Pig
Computes the value of an expression rounded up to the nearest integer, never decreasing the value.
Example:
{
"operation": "expression.Ceil",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
CEIL(a)
Count¶
Supported Engines: Pig
Counts the number of items in a bag. Ignores null values.
Example:
{
"operation": "expression.Count",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
COUNT(a)
CountStar¶
Supported Engines: Pig
Counts the number of items in a bag. Includes null values.
Example:
{
"operation": "expression.CountStar",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
COUNT_STAR(a)
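The difference between Count and CountStar comes down to how nulls are treated. A minimal Python sketch follows, modeling a bag as a list of single values where `None` stands for null. Both helper names are hypothetical, and the model is a simplification of Pig bags, which hold tuples.

```python
from typing import Any, Sequence


def count(bag: Sequence[Any]) -> int:
    """Count items, skipping nulls (analogous to COUNT)."""
    return sum(1 for item in bag if item is not None)


def count_star(bag: Sequence[Any]) -> int:
    """Count items, including nulls (analogous to COUNT_STAR)."""
    return len(bag)


bag = [1, None, 3]
non_null_total = count(bag)
overall_total = count_star(bag)
```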
Exponential¶
Supported Engines: Pig
Computes and returns the value of Euler's number raised to the value of the expression.
Example:
{
"operation": "expression.Exponential",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
EXP(a)
Floor¶
Supported Engines: Pig
Computes the value of an expression rounded down to the nearest integer, never increasing the value.
Example:
{
"operation": "expression.Floor",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
FLOOR(a)
Log¶
Supported Engines: Pig
Computes the natural logarithm (base e) of an expression.
Example:
{
"operation": "expression.Log",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
LOG(a)
Log10¶
Supported Engines: Pig
Computes the logarithm value with respect to base 10 of an expression.
Example:
{
"operation": "expression.Log10",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
LOG10(a)
Max¶
Supported Engines: Pig
Computes the maximum of a single column bag of numeric values.
Example:
{
"operation": "expression.Max",
"arguments": [
{"operation": "expression.DereferencedField", "arguments": ["a.b"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
MAX(a.b)
Min¶
Supported Engines: Pig
Computes the minimum of a single column bag of numeric values.
Example:
{
"operation": "expression.Min",
"arguments": [
{"operation": "expression.DereferencedField", "arguments": ["a.b"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
MIN(a.b)
OddsRatio¶
Supported Engines: Pig
Computes the odds ratio given 4 values. The arguments order indicates the matrix cell into which they are placed.
The x axis is "feature" and the y axis is "target". The numbers inside the cells are argument positions:
      0   1
    ---------
    | 0 | 1 |  0
    ---------
    | 2 | 3 |  1
    ---------
Example:
{
"operation": "expression.OddsRatio",
"arguments": [
{"operation": "expression.Field", "arguments": ["feature_bin_0__target_bin_0"]},
{"operation": "expression.Field", "arguments": ["feature_bin_1__target_bin_0"]},
{"operation": "expression.Field", "arguments": ["feature_bin_0__target_bin_1"]},
{"operation": "expression.Field", "arguments": ["feature_bin_1__target_bin_1"]}
]
}
Arguments are expressions. Quaternary operation requires 4 arguments.
Sample Output:
((double)feature_bin_0__target_bin_0 / (double)feature_bin_0__target_bin_1) / ((double)feature_bin_1__target_bin_0 / (double)feature_bin_1__target_bin_1)
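The formula in the sample output can be checked by hand with a small Python sketch. The `odds_ratio` helper and the counts passed to it are hypothetical; the parameter order follows the argument order of the example.

```python
def odds_ratio(f0_t0: float, f1_t0: float, f0_t1: float, f1_t1: float) -> float:
    """Odds ratio with arguments in matrix-cell order 0, 1, 2, 3."""
    # Same arrangement as the sample output:
    # (cell 0 / cell 2) / (cell 1 / cell 3)
    return (float(f0_t0) / float(f0_t1)) / (float(f1_t0) / float(f1_t1))


# Made-up counts: (10 / 5) / (20 / 40) = 2.0 / 0.5 = 4.0
ratio = odds_ratio(10, 20, 5, 40)
```

A zero in cell 1, 2, or 3 raises `ZeroDivisionError` in this sketch. How the generated Pig expression behaves in that case is not covered here.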
Pow¶
Supported Engines: Pig
Computes and returns the value of the first expression raised to the value of the second.
Example:
{
"operation": "expression.Pow",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Integer", "arguments": [2]}
]
}
Arguments are expressions. Binary operation requires 2 arguments.
Sample Output:
org.apache.pig.piggybank.evaluation.math.POW(a, 2)
RoundTo¶
Supported Engines: Pig
Rounds the value of an expression to the specified number of decimal digits. An optional integer rounding mode can be given as a third argument.
Example:
{
"operation": "expression.RoundTo",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Integer", "arguments": [2]},
{"operation": "expression.Integer", "arguments": [4]}
]
}
Arguments are expressions. Requires at least 2 arguments; the third (rounding mode) is optional.
Sample Output:
ROUND_TO(a,2,4)
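The meaning of the integer rounding mode is not spelled out here. The Python sketch below assumes the integers follow Java's BigDecimal rounding constants (4 for half-up, 6 for half-even); that mapping, the default of 4, and the name `round_to` are assumptions made for illustration only.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# Assumed mapping of integer modes to rounding rules (Java BigDecimal constants).
_MODES = {4: ROUND_HALF_UP, 6: ROUND_HALF_EVEN}


def round_to(value, digits, mode=4):
    """Round value to `digits` decimal places using the given integer mode."""
    quantum = Decimal(1).scaleb(-digits)  # digits=2 gives Decimal('0.01')
    # Go through str() so floats are rounded from their printed value.
    return Decimal(str(value)).quantize(quantum, rounding=_MODES[mode])


print(round_to(3.14159, 2, 4))
print(round_to("2.5", 0, 6))
```

The two modes only differ on exact ties: under the assumed mapping, 2.5 rounded to 0 digits becomes 3 with mode 4 and 2 with mode 6.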
RoundToInteger¶
Supported Engines: Pig
Computes the value of an expression rounded to nearest integer or long.
Example:
{
"operation": "expression.RoundToInteger",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
ROUND(a)
Size¶
Supported Engines: Pig
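Computes the number of elements based on the type of the argument (for example, the number of characters in a chararray or the number of tuples in a bag).
Example:
{
"operation": "expression.Size",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]}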
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
SIZE(a)
StreamingMedian¶
Supported Engines: Pig
Computes the streaming median of a single column bag of numeric values.
Example:
{
"operation": "expression.StreamingMedian",
"arguments": [
{"operation": "expression.DereferencedField", "arguments": ["a.b"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
datafu.pig.util.StreamingMedian(a.b).quantile_0_5
Sum¶
Supported Engines: Pig
Computes the sum of a single column bag of numeric values.
Example:
{
"operation": "expression.Sum",
"arguments": [
{"operation": "expression.DereferencedField", "arguments": ["a.b"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
SUM(a.b)
String¶
BagToString¶
Supported Engines: Pig
Creates a single string from the elements of a bag, similar to SQL's GROUP_CONCAT function.
Example:
{
"operation": "expression.BagToString",
"arguments": [
{"operation": "expression.DereferencedField", "arguments": ["a.b"]},
{"operation": "expression.Chararray", "arguments": ["_"]}
]
}
The first argument should be a DereferencedField expression and the second a delimiter string.
Sample Output:
BagToString(a.b,'_')
Concat¶
Supported Engines: Pig
Concatenates two or more values together depending on type. Types of all arguments must match.
Example:
{
"operation": "expression.Concat",
"arguments": [
{"operation": "expression.Field", "arguments": ["test1"]},
{"operation": "expression.Chararray", "arguments": ["_"]},
{"operation": "expression.Field", "arguments": ["test2"]}
]
}
Arguments are expressions. N-ary operation requires 2 or more arguments.
Sample Output:
CONCAT(test1,'_',test2)
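To make the relationship between the nested JSON and the sample output concrete, here is a minimal Python sketch of a renderer for a handful of operations. It is not the product's implementation: the `render` function is hypothetical, handles only Field, Integer, Chararray and Concat, and does no quote escaping.

```python
def render(expr):
    """Render a nested expression dict to a Pig-like string (illustrative only)."""
    op = expr["operation"]
    args = expr["arguments"]
    if op == "expression.Field":
        return args[0]
    if op == "expression.Integer":
        return str(args[0])
    if op == "expression.Chararray":
        # String literals are single-quoted in the sample outputs.
        return "'" + args[0] + "'"
    if op == "expression.Concat":
        if len(args) < 2:
            raise ValueError("Concat requires 2 or more arguments")
        # Nested arguments are rendered recursively, then joined.
        return "CONCAT(" + ",".join(render(a) for a in args) + ")"
    raise ValueError("unsupported operation: " + op)


expr = {
    "operation": "expression.Concat",
    "arguments": [
        {"operation": "expression.Field", "arguments": ["test1"]},
        {"operation": "expression.Chararray", "arguments": ["_"]},
        {"operation": "expression.Field", "arguments": ["test2"]},
    ],
}
print(render(expr))  # CONCAT(test1,'_',test2)
```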
LPad¶
Supported Engines: Pig
Uses a custom Aunsight UDF to pad a string on the left side with the specified character.
Example:
{
"operation": "expression.LPad",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Integer", "arguments": [6]},
{"operation": "expression.Chararray", "arguments": ["0"]}
]
}
Arguments are expressions. Ternary operation requires 3 arguments.
- Chararray expression
- Number of characters to pad up to
- Character to pad with
Sample Output:
com.aunalytics.pig.string.PadLeft(a, 6, '0')
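The padding behaviour described above can be approximated in Python as follows. This is a sketch, not the UDF itself: the name `pad_left` is hypothetical, and leaving strings that are already longer than the target length unchanged is an assumption.

```python
def pad_left(value, length, pad_char):
    """Pad `value` on the left with `pad_char` until it is `length` characters long."""
    if len(pad_char) != 1:
        raise ValueError("pad_char must be a single character")
    # str.rjust never truncates: longer inputs are returned as-is (assumed behaviour).
    return value.rjust(length, pad_char)


print(pad_left("42", 6, "0"))  # 000042
```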
RegexExtract¶
Supported Engines: Pig
Performs regular expression matching and extracts the matched group defined by an index parameter.
Example:
{
"operation": "expression.RegexExtract",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Chararray", "arguments": ["test"]},
{"operation": "expression.Integer", "arguments": [0]}
]
}
Ternary operation requires 3 arguments.
- expression.Field
- regex
- index to return
Sample Output:
REGEX_EXTRACT(a, 'test', 0)
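For readers more familiar with Python, the sketch below shows roughly what the three arguments mean. It assumes that index 0 refers to the whole match, that higher indexes refer to capture groups, and that a non-matching input yields a null (`None` here). Python's `re` syntax also differs from Java's in places, so treat this as an approximation; `regex_extract` is a hypothetical name.

```python
import re


def regex_extract(value, pattern, index):
    """Return the group at `index` from the first match of `pattern`, or None."""
    match = re.search(pattern, value)
    if match is None:
        return None  # assumed to correspond to a null result
    return match.group(index)


print(regex_extract("192.168.1.5:8020", r"(.*):(.*)", 1))  # 192.168.1.5
```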
Replace¶
Supported Engines: Pig
Replaces every match of the regular expression given as the second argument, within the first argument, with the string given as the third argument.
Example:
{
"operation": "expression.Replace",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Chararray", "arguments": ["b"]},
{"operation": "expression.Chararray", "arguments": ["c"]}
]
}
Arguments are expressions. Ternary operation requires 3 arguments.
Sample Output:
REPLACE(a, 'b', 'c')
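Because the second argument is treated as a regular expression, characters such as `.` need escaping when a literal match is intended. The Python sketch below illustrates the difference; `replace` is a hypothetical helper and Python's regex dialect is only an approximation of the one the engine uses.

```python
import re


def replace(value, pattern, replacement):
    """Replace every regex match of `pattern` in `value` with `replacement`."""
    return re.sub(pattern, replacement, value)


print(replace("a.b.c", r"\.", "-"))  # a-b-c  (escaped dot matches literal dots)
print(replace("a.b.c", ".", "-"))    # -----  (bare dot matches any character)
```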
StrSplitIdx¶
Supported Engines: Pig
Splits a chararray field using the given regex and returns the specified part.
Example:
{
"operation": "expression.StrSplitIdx",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
"-",
2
]
}
Ternary operation requires 3 arguments.
- expression.Field
- regex string to split on
- index to return
Sample Output:
STRSPLIT(a, '-', 4).$2
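The split-then-index behaviour can be sketched in Python as below. The helper name `str_split_idx` is hypothetical, the limit argument that appears in the sample output is ignored, and returning `None` for an out-of-range index is an assumption rather than documented behaviour.

```python
import re


def str_split_idx(value, pattern, index):
    """Split `value` on the regex `pattern` and return the part at `index` (0-based)."""
    parts = re.split(pattern, value)
    # Out-of-range indexes return None in this sketch (assumed).
    return parts[index] if 0 <= index < len(parts) else None


print(str_split_idx("2020-01-15-x", "-", 2))  # 15
```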
SubString¶
Supported Engines: Pig
Returns a substring from a given string.
Example:
{
"operation": "expression.SubString",
"arguments": [
{"operation": "expression.Field", "arguments": ["a"]},
{"operation": "expression.Integer", "arguments": [0]},
{"operation": "expression.Integer", "arguments": [1]}
]
}
Ternary operation requires 3 arguments.
- expression.Field
- start index
- stop index
Sample Output:
SUBSTRING(a, 0, 1)
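Assuming the indexes behave as in Pig's built-in SUBSTRING (0-based start, stop index exclusive), the operation corresponds to a plain Python slice. The `substring` helper below is hypothetical and only illustrates that convention.

```python
def substring(value, start, stop):
    """Return characters from `start` (inclusive) up to `stop` (exclusive)."""
    return value[start:stop]


print(substring("hello", 0, 1))  # h
```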
ToLowerCase¶
Supported Engines: Pig
Converts the chararray to all lowercase.
Example:
{
"operation": "expression.ToLowerCase",
"arguments": [
{"operation": "expression.Field", "arguments": ["test1"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
LOWER(test1)
ToLowerCaseFirst¶
Supported Engines: Pig
Converts only the first character of the chararray to lowercase.
Example:
{
"operation": "expression.ToLowerCaseFirst",
"arguments": [
{"operation": "expression.Field", "arguments": ["test1"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
LCFIRST(test1)
ToUpperCase¶
Supported Engines: Pig
Converts the chararray to all uppercase.
Example:
{
"operation": "expression.ToUpperCase",
"arguments": [
{"operation": "expression.Field", "arguments": ["test1"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
UPPER(test1)
ToUpperCaseFirst¶
Supported Engines: Pig
Converts only the first character of the chararray to uppercase.
Example:
{
"operation": "expression.ToUpperCaseFirst",
"arguments": [
{"operation": "expression.Field", "arguments": ["test1"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
UCFIRST(test1)
Trim¶
Supported Engines: Pig
Removes leading and trailing whitespace from a chararray.
Example:
{
"operation": "expression.Trim",
"arguments": [
{"operation": "expression.Field", "arguments": ["test1"]}
]
}
Arguments are expressions. Unary operation requires 1 argument.
Sample Output:
TRIM(test1)
Other¶
Hash¶
Supported Engines: Pig
Generates the Murmur3 hash of the given string expression.
ToProperCase¶
Converts the chararray to proper case (e.g. from "proper case string" to "Proper Case String").
NullDouble¶
Creates a null of the double type.
RoundToCents¶
Rounds a float or double to two decimal places.
Sqrt¶
Returns the square root.
HaversinDistInMiles¶
Supported Engines: Pig
Adds a column with the Haversine Distance in miles between two lat/long pairs in the expression.
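For reference, the haversine formula itself can be sketched in Python as follows. The function name, the argument order (lat1, lon1, lat2, lon2, in degrees) and the Earth radius of 3958.8 miles are assumptions; the constants and argument layout used by the actual operation are not documented here.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# New York City to Los Angeles, roughly 2,400 to 2,500 miles.
print(haversine_miles(40.7128, -74.0060, 34.0522, -118.2437))
```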
StandardDeviation¶
Supported Engines: Pig
Adds a column with the standard deviation of values in the expression.
Variance¶
Supported Engines: Pig
Adds a column with the variance of values in the expression.
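Whether StandardDeviation and Variance use the population or the sample formula is not stated here. The Python sketch below shows both so the difference is visible; the helper name and the `population` flag are hypothetical.

```python
import statistics


def variance_and_stddev(values, population=True):
    """Return (variance, standard deviation) for a list of numbers."""
    if population:
        # Divide by N.
        return statistics.pvariance(values), statistics.pstdev(values)
    # Divide by N - 1.
    return statistics.variance(values), statistics.stdev(values)


values = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_and_stddev(values, population=True))
print(variance_and_stddev(values, population=False))
```

On small inputs the two conventions differ noticeably, so it is worth checking the engine's output against a known dataset before relying on either.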
DateTruncDay¶
Truncates the date to the day.
DateTruncWeek¶
Truncates the date to the week.
DateTruncMonth¶
Truncates the date to the month.
DateTruncQuarter¶
Truncates the date to the quarter.
DateTruncYear¶
Truncates the date to the year.
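The five DateTrunc operations can be pictured with the Python sketch below. The `date_trunc` helper and its `unit` strings are hypothetical, and treating Monday as the first day of the week is an assumption that the engine may not share.

```python
from datetime import datetime, timedelta


def date_trunc(value, unit):
    """Truncate a datetime to the start of its day, week, month, quarter or year."""
    day = value.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "day":
        return day
    if unit == "week":
        # weekday() is 0 for Monday, so this steps back to Monday (assumed week start).
        return day - timedelta(days=day.weekday())
    if unit == "month":
        return day.replace(day=1)
    if unit == "quarter":
        first_month = 3 * ((day.month - 1) // 3) + 1  # 1, 4, 7 or 10
        return day.replace(month=first_month, day=1)
    if unit == "year":
        return day.replace(month=1, day=1)
    raise ValueError("unknown unit: " + unit)


stamp = datetime(2021, 8, 19, 13, 45)
for unit in ("day", "week", "month", "quarter", "year"):
    print(unit, date_trunc(stamp, unit))
```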