Dataflow Operators

Script

Script

A Script operation is a container for many steps. Its toGraph() output contains children, which is a list of steps. Several additional methods are available for working with children.

Example:

{'operation': 'Script', 'meta': {}, 'steps': []}

Arguments:

  • None

Execution options:

  • execution_options (object):
    • properties:
      • default_parallel (integer):
        • minimum: 1
      • split_size (integer):
        • minimum: 1
      • mapper_memory (integer):
        • minimum: 1
      • reducer_memory (integer):
        • minimum: 1
      • disable_execution_statistics (boolean)
      • execution_statistics_output_location (string)
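As an illustration, the sketch below assembles a Script as a plain Python dict and checks execution options against the schema above. The helper names (`make_script`, `validate_execution_options`) are hypothetical, and placing `execution_options` as a top-level key next to `steps` is an assumption; the checks only mirror the listed constraints (integers with a minimum of 1, one boolean, one string).

```python
from typing import Any, Dict, List

# Integer options that the schema above constrains to "minimum: 1".
INT_OPTIONS = ("default_parallel", "split_size", "mapper_memory", "reducer_memory")


def validate_execution_options(options: Dict[str, Any]) -> None:
    """Hypothetical helper: raise ValueError if a known option violates the schema."""
    for name in INT_OPTIONS:
        if name in options:
            value = options[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if not isinstance(value, int) or isinstance(value, bool) or value < 1:
                raise ValueError(f"{name} must be an integer >= 1, got {value!r}")
    if not isinstance(options.get("disable_execution_statistics", False), bool):
        raise ValueError("disable_execution_statistics must be a boolean")
    if not isinstance(options.get("execution_statistics_output_location", ""), str):
        raise ValueError("execution_statistics_output_location must be a string")


def make_script(steps: List[Dict[str, Any]], **options: Any) -> Dict[str, Any]:
    """Hypothetical builder; the 'execution_options' key placement is an assumption."""
    validate_execution_options(options)
    script: Dict[str, Any] = {"operation": "Script", "meta": {}, "steps": steps}
    if options:
        script["execution_options"] = options
    return script


script = make_script(
    [{"operation": "reduce.GroupAll", "arguments": {"alias": "outthing", "relation": "thing"}}],
    default_parallel=10,
)
```

Each entry in `steps` is an ordinary operation dict, in the same shape as the examples in the rest of this reference.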

Steps:

field

field.Flatten

(no description found)

Arguments:

field.Range

(no description found)

Arguments:

field.Rename

Renames a field.

Arguments:

field.Star

(no description found)

Arguments:

  • None

field.Concat

(no description found)

Arguments:

join

join.AppendSortFill

Given two datasets, union them with null for any non-shared fields. Group by the shared key and order by a shared date field. For the target column on the right, for each group, forward fill any null values using the closest previous value. For any nulls without a previous value, backfill from the first non-null value. Remove any rows that are null for the target column on the left. The purpose is to join output_key from the left to output_key from the right.

  • Available in mapreduce engine only.

  • Webapp status: true

  • Disambiguation needed: true

Example:

{'operation':'join.AppendSortFill','arguments':{'alias':'thing','relations':[{'relation':'thing1','group_key':'a','date_field':'b','output_key':'c'},{'relation':'thing2','group_key':'d','date_field':'e','output_key':'f'}]}}

Arguments:

Sample Output:

thing = CROSS thing1 BY a, thing2 BY c;
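The fill logic described above can be sketched in plain Python over lists of dicts. This illustrates the semantics only and is not the engine's implementation. It assumes both inputs already use the same key and date column names (the real operator takes a separate group_key, date_field and output_key per relation) and that date values are mutually comparable. When a left row and a right row share a date, this sketch places the left row first, which is an arbitrary choice.

```python
from itertools import groupby
from typing import Any, Dict, List

Row = Dict[str, Any]


def append_sort_fill(left: List[Row], right: List[Row], key: str, date: str,
                     left_target: str, right_target: str) -> List[Row]:
    """Sketch of join.AppendSortFill semantics (assumes shared key/date column names)."""
    # Union the inputs, padding fields missing on either side with None.
    fields = {f for row in left + right for f in row}
    rows = [{f: row.get(f) for f in fields} for row in left + right]
    # Order by key, then date; sorted() is stable, so on ties left rows stay first.
    rows = sorted(rows, key=lambda r: (r[key], r[date]))
    result: List[Row] = []
    for _, group in groupby(rows, key=lambda r: r[key]):
        group_rows = list(group)
        # Forward fill the right-hand target from the closest previous value.
        previous = None
        for row in group_rows:
            if row[right_target] is None:
                row[right_target] = previous
            else:
                previous = row[right_target]
        # Any nulls left precede the first non-null value: backfill them.
        first = next((r[right_target] for r in group_rows if r[right_target] is not None), None)
        for row in group_rows:
            if row[right_target] is None:
                row[right_target] = first
        # Drop rows with no left-hand target (in practice, the right-only rows).
        result.extend(r for r in group_rows if r[left_target] is not None)
    return result


left = [{"k": 1, "d": 2, "lv": "x"}, {"k": 1, "d": 5, "lv": "y"}, {"k": 1, "d": 0, "lv": "z"}]
right = [{"k": 1, "d": 1, "rv": "A"}, {"k": 1, "d": 4, "rv": "B"}]
filled = append_sort_fill(left, right, "k", "d", "lv", "rv")
# [(r["lv"], r["rv"]) for r in filled] == [("z", "A"), ("x", "A"), ("y", "B")]
```

The row dated 0 has no earlier right-hand value, so it is backfilled with "A"; the others take the closest previous right-hand value.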

join.Complement

Performs a cartesian join on two or more relations. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps. Note that this is an expensive operation and should be used sparingly.

  • Disambiguation needed: true

Arguments:

join.ComplementCompound

Performs a complement compound join on two relations. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps. Note that this is an expensive operation and should be used sparingly.

  • Disambiguation needed: true

Arguments:

join.Cross

(no description found)

  • Webapp status: true

  • Disambiguation needed: true

  • Supports execution statistics: true

Example:

{'operation': 'join.Cross', 'arguments': {'alias': 'thing', 'relations': [{'relation': 'thing1'}, {'relation': 'thing2'}]}}

Arguments:

  • alias (aliasString)
  • relations (array):
    • minItems: 2
    • items (object):
      • properties:
      • required:
        • relation
      • additionalProperties: false

Sample Output:

thing = CROSS thing1, thing2;
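In Pig, CROSS computes a cartesian product: every row of each input is paired with every row of the others, so the output size is the product of the input sizes. The sketch below mimics that in Python and prefixes each output field with its relation name using Pig's `relation::field` convention, one form of the disambiguation this operator needs. Representing relations as a dict from name to a list of row dicts is an assumption made for illustration.

```python
from itertools import product
from typing import Any, Dict, List

Row = Dict[str, Any]


def cross(relations: Dict[str, List[Row]]) -> List[Row]:
    """Cartesian product of named relations with relation::field disambiguation."""
    names = list(relations)
    output: List[Row] = []
    # product() yields one tuple per combination: len(r1) * len(r2) * ... rows.
    for combo in product(*(relations[n] for n in names)):
        row: Row = {}
        for name, source in zip(names, combo):
            for field, value in source.items():
                row[f"{name}::{field}"] = value
        output.append(row)
    return output


thing = cross({"thing1": [{"a": 1}, {"a": 2}], "thing2": [{"c": "x"}, {"c": "y"}, {"c": "z"}]})
# len(thing) == 6
```

Two rows crossed with three already give six; the growth is multiplicative in every additional input.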

join.Inner

(no description found)

  • Webapp status: true

  • Disambiguation needed: true

  • Supports execution statistics: true

  • Group: join.Joins

  • Group description: Performs an inner (default), left-outer, or full-outer join on two relations. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps.

Example:

{'operation': 'join.Inner', 'arguments': {'alias': 'thing', 'relations': [{'relation': 'thing1', 'field': 'a'}, {'relation': 'thing2', 'field': 'c'}]}}

Arguments:

Sample Output:

thing = JOIN thing1 BY a, thing2 BY c;
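The difference between the inner, left-outer and full-outer variants in the join.Joins group can be sketched as a small hash join in Python. The function name `join` and its `how` values are hypothetical. Output fields use Pig's `relation::field` naming, and, as with Pig's JOIN, a null key never matches. The sketch assumes every row of a relation has the same fields.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]
Side = Tuple[str, List[Row], str]  # (relation name, rows, join field)


def _prefixed(name: str, fields: Sequence[str], row: Optional[Row]) -> Row:
    # Pig-style disambiguation; a missing side (outer join) becomes all nulls.
    return {f"{name}::{f}": (None if row is None else row[f]) for f in fields}


def join(left: Side, right: Side, how: str = "inner") -> List[Row]:
    """Hash-join sketch of inner / left-outer / full-outer semantics."""
    if how not in ("inner", "left", "full"):
        raise ValueError(f"unsupported join type: {how}")
    lname, lrows, lkey = left
    rname, rrows, rkey = right
    lfields = list(lrows[0]) if lrows else []
    rfields = list(rrows[0]) if rrows else []
    # Index right rows by key; null keys never match.
    index: Dict[Any, List[int]] = {}
    for i, row in enumerate(rrows):
        if row[rkey] is not None:
            index.setdefault(row[rkey], []).append(i)
    matched = set()
    out: List[Row] = []
    for lrow in lrows:
        hits = [] if lrow[lkey] is None else index.get(lrow[lkey], [])
        for i in hits:
            matched.add(i)
            out.append({**_prefixed(lname, lfields, lrow), **_prefixed(rname, rfields, rrows[i])})
        if not hits and how in ("left", "full"):
            out.append({**_prefixed(lname, lfields, lrow), **_prefixed(rname, rfields, None)})
    if how == "full":
        # Full outer also keeps right rows that matched nothing.
        for i, rrow in enumerate(rrows):
            if i not in matched:
                out.append({**_prefixed(lname, lfields, None), **_prefixed(rname, rfields, rrow)})
    return out


thing1 = [{"a": 1, "b": "p"}, {"a": 2, "b": "q"}]
thing2 = [{"c": 1, "d": "r"}, {"c": 3, "d": "s"}]
inner = join(("thing1", thing1, "a"), ("thing2", thing2, "c"))
left_outer = join(("thing1", thing1, "a"), ("thing2", thing2, "c"), how="left")
full_outer = join(("thing1", thing1, "a"), ("thing2", thing2, "c"), how="full")
```

A right-outer join yields the same rows as swapping the two sides and using `how="left"`.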

join.InnerCompound

(no description found)

  • Disambiguation needed: true

  • Supports execution statistics: true

Arguments:

  • alias (aliasString)
  • relations (array):
    • minItems: 2
    • maxItems: 2
    • items (object):

join.LeftAnti

(no description found)

  • Available in spark engine only.

  • Webapp status: true

  • Disambiguation needed: true

  • Supports execution statistics: true

Example:

{'operation': 'join.LeftAnti', 'arguments': {'alias': 'thing', 'relations': [{'relation': 'thing1', 'field': 'a'}, {'relation': 'thing2', 'field': 'c'}]}}

Arguments:

Sample Output:

thing = thing1.join(thing2,(thing1.a == thing2.c),'leftanti');

join.LeftOuter

Performs a left-outer join on two relations. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps.

  • Webapp status: true

  • Disambiguation needed: true

  • Supports execution statistics: true

  • Group: join.Joins

Example:

{'operation': 'join.LeftOuter', 'arguments': {'alias': 'thing', 'relations': [{'relation': 'thing1', 'field': 'a'}, {'relation': 'thing2', 'field': 'c'}]}}

Arguments:

Sample Output:

thing = JOIN thing1 BY a LEFT OUTER, thing2 BY c;

join.LeftOuterCompound

Performs a left-outer join on two relations with multiple field handling. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps.

  • Disambiguation needed: true

  • Supports execution statistics: true

Arguments:

  • alias (aliasString)
  • relations (array):
    • minItems: 2
    • maxItems: 2
    • items (object):

join.LeftAntiCompound

Performs a left-anti join on two relations with multiple field handling. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps.

  • Available in spark engine only.

  • Webapp status: true

  • Disambiguation needed: true

  • Supports execution statistics: true

Example:

{'operation': 'join.LeftAntiCompound', 'arguments': {'alias': 'thing', 'relations': [{'relation': 'thing1', 'fields': ['a','b']}, {'relation': 'thing2', 'fields': ['c','d']}]}}

Arguments:

  • alias (aliasString)
  • relations (array):
    • minItems: 2
    • maxItems: 2
    • items (object):

Sample Output:

thing = thing1.join(thing2,(thing1.a == thing2.c) &(thing1.b == thing2.d),'leftanti');
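A Spark left-anti join keeps only the left rows that have no match on the right, and returns only the left columns. The Python sketch below shows that for one or more key fields per side (a one-field list covers join.LeftAnti). The function name is hypothetical. Because Spark's `==` is not null-safe, a key containing null never matches, so such left rows are kept.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def left_anti(left: List[Row], left_fields: Sequence[str],
              right: List[Row], right_fields: Sequence[str]) -> List[Row]:
    """Keep left rows whose key tuple matches no right row."""
    if len(left_fields) != len(right_fields):
        raise ValueError("both sides need the same number of key fields")
    # Right-hand key tuples; any tuple containing a null can never match.
    right_keys = {
        tuple(r[f] for f in right_fields)
        for r in right
        if all(r[f] is not None for f in right_fields)
    }
    # Only left columns are returned.
    return [row for row in left if tuple(row[f] for f in left_fields) not in right_keys]


thing1 = [{"a": 1, "b": 1, "v": "m"}, {"a": 1, "b": 2, "v": "n"}, {"a": None, "b": 1, "v": "o"}]
thing2 = [{"c": 1, "d": 1}, {"c": None, "d": 1}]
thing = left_anti(thing1, ["a", "b"], thing2, ["c", "d"])
# [r["v"] for r in thing] == ["n", "o"]
```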

join.LeftOuterMulti

Performs a left-outer join on 3 or more relations. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps. Uses the DataFu library to populate empty joins with null values instead of the default Flatten behavior that produces nothing for empty collections(bags).

  • Webapp status: true

  • Disambiguation needed: true

  • Group: join.MultiJoins

Example:

{'operation': 'join.LeftOuterMulti', 'arguments': {'alias': 'thing', 'relations': [{'relation': 'thing1', 'field': 'id'}, {'relation': 'thing2', 'field': 'leftid'}]}}

Arguments:

Sample Output:

thing = COGROUP thing1 BY id, thing2 BY leftid; thing = FOREACH thing GENERATE FLATTEN(thing1), FLATTEN(datafu.pig.bags.EmptyBagToNullFields(thing2)), FLATTEN(datafu.pig.bags.EmptyBagToNullFields(thing3));

join.LeftOuterMultiCompound

Performs a left-outer join on 3 or more relations. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps. Uses the DataFu library to populate empty joins with null values instead of the default Flatten behavior that produces nothing for empty bags.

  • Disambiguation needed: true

Arguments:

  • alias (aliasString)
  • relations (array):
    • minItems: 3
    • maxItems: 128
    • items (object):

join.RightOuter

Performs a right-outer join on two relations. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps.

  • Webapp status: true

  • Disambiguation needed: true

  • Supports execution statistics: true

Arguments:

join.RightOuterCompound

Performs a right-outer join on two relations with multiple field handling. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps.

  • Disambiguation needed: true

  • Supports execution statistics: true

Arguments:

  • alias (aliasString)
  • relations (array):
    • minItems: 2
    • maxItems: 2
    • items (object):

join.FullOuter

Performs a full-outer join on two relations. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps.

  • Webapp status: true

  • Disambiguation needed: true

  • Supports execution statistics: true

  • Group: join.Joins

Example:

{'operation': 'join.FullOuter', 'arguments': {'alias': 'thing', 'relations': [{'relation': 'thing1', 'field': 'a'}, {'relation': 'thing2', 'field': 'c'}]}}

Arguments:

Sample Output:

thing = JOIN thing1 BY a FULL OUTER, thing2 BY c;

join.FullOuterCompound

Performs a full-outer join on two relations with multiple field handling. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps.

  • Disambiguation needed: true

  • Supports execution statistics: true

Arguments:

  • alias (aliasString)
  • relations (array):
    • minItems: 2
    • maxItems: 2
    • items (object):

join.FullOuterMulti

Performs a full-outer join on 3 or more relations. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps. Uses the DataFu library to populate empty joins with null values instead of the default Flatten behavior that produces nothing for empty collections(bags).

  • Webapp status: true

  • Disambiguation needed: true

  • Group: join.MultiJoins

  • Group description: Performs a left-outer or full-outer (default) join on 3 or more relations. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps. Uses the DataFu library to populate empty joins with null values instead of the default Flatten behavior that produces nothing for empty collections(bags).

Example:

{'operation': 'join.FullOuterMulti', 'arguments': {'alias': 'thing', 'relations': [{'relation': 'thing1', 'field': 'a'}, {'relation': 'thing2', 'field': 'b'}, {'relation': 'thing3', 'field': 'c'}]}}

Arguments:

Sample Output:

thing = COGROUP thing1 BY a, thing2 BY b, thing3 BY c; thing = FOREACH thing GENERATE group, FLATTEN(datafu.pig.bags.EmptyBagToNullFields(thing1)), FLATTEN(datafu.pig.bags.EmptyBagToNullFields(thing2)), FLATTEN(datafu.pig.bags.EmptyBagToNullFields(thing3));
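The COGROUP plus FLATTEN(EmptyBagToNullFields(...)) pattern in the sample output can be sketched in Python. Cogrouping collects one bag per relation for each key. Replacing an empty bag with a single all-null row keeps the key alive through FLATTEN, and flattening several bags in one GENERATE yields their cross product. The function name and the `how` switch are hypothetical. The sketch assumes non-null keys (Pig's treatment of null keys in COGROUP is not modeled) and omits the `group` column that the FullOuterMulti sample output also emits.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Relation = Tuple[str, List[Row], str]  # (relation name, rows, key field)


def multi_join(relations: Sequence[Relation], how: str = "full") -> List[Row]:
    """Sketch of COGROUP + FLATTEN(EmptyBagToNullFields(...)) on non-null keys."""
    if how not in ("full", "left"):
        raise ValueError(f"unsupported join type: {how}")
    # COGROUP: one bag per relation for every key seen in any relation.
    groups: Dict[Any, List[List[Row]]] = {}
    for i, (_, rows, key) in enumerate(relations):
        for row in rows:
            groups.setdefault(row[key], [[] for _ in relations])[i].append(row)
    # Assumes every row of a relation has the same fields.
    fields = [list(rows[0]) if rows else [] for _, rows, _ in relations]
    out: List[Row] = []
    for bags in groups.values():
        # Left-outer: plain FLATTEN of an empty first bag emits nothing.
        if how == "left" and not bags[0]:
            continue
        # An empty bag becomes one all-null row, so the key survives FLATTEN.
        padded = [bag if bag else [dict.fromkeys(f)] for bag, f in zip(bags, fields)]
        # Flattening several bags in one GENERATE yields their cross product.
        combos: List[Row] = [{}]
        for (name, _, _), bag in zip(relations, padded):
            combos = [
                {**acc, **{f"{name}::{k}": v for k, v in item.items()}}
                for acc in combos
                for item in bag
            ]
        out.extend(combos)
    return out


t1 = [{"a": 1, "x": "p"}, {"a": 2, "x": "q"}]
t2 = [{"b": 1, "y": "r"}]
t3 = [{"c": 3, "z": "s"}]
rels = [("thing1", t1, "a"), ("thing2", t2, "b"), ("thing3", t3, "c")]
full = multi_join(rels)
left = multi_join(rels, how="left")
```

Key 3 appears only in thing3, so the full-outer result keeps it with nulls for thing1 and thing2, while the left-outer result drops it.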

join.FullOuterMultiCompound

Performs a full-outer join on 3 or more relations. If alias_label is present, performs "friendly disambiguation" which renames disambiguated fields to the input relation's alias_label value to make it easier to reference in subsequent steps. Uses the DataFu library to populate empty joins will null values instead of the default Flatten behavior that produces nothing for empty bags.

  • Disambiguation needed: true

Arguments:

  • alias (aliasString)
  • relations (array):
    • minItems: 3
    • maxItems: 128
    • items (object):

join.Unjoin

(no description found)

Arguments:

load

load.HCat

(no description found)

Arguments:

load.Load

Loads data from a file using PigStorage.

Arguments:

load.Dsv

(no description found)

Arguments:

load.Json

Loads data from a file using JsonLoader.

Arguments:

load.Atlas

Loads a file from an Atlas record using the record's location, format, and schema information.

  • Webapp status: true

  • Supports execution statistics: true

Example:

operation = {'operation': 'load.Atlas', 'arguments': {'alias': 'output', 'record': 'sampleRecord'}}

Arguments:

  • alias (aliasString)
  • record (string):
    • pattern: (\w|-)+

Sample Output:

output = LOAD 'file.ldjson' USING JsonLoader('a:int,b:float,c:chararray');

load.AtlasEventSeries

Loads data from an Atlas record.

Arguments:

  • alias (aliasString)
  • series (string):
    • pattern: (\w|-)+
  • members (string):
    • pattern: ^(\w|-)+(,\s*(\w|-)+)*$

reduce

reduce.CoGroup

Groups multiple relations, like a nested join. Grouped rows from each input relation are nested in a collection(bag) with the name of the relation.

  • Webapp status: true

Example:

{'operation': 'reduce.CoGroup', 'arguments': {'alias': 'thing', 'relations': [{'relation': 'thing1', 'field': 'a'}, {'relation': 'thing2', 'field': 'c'}]}}

Arguments:

Sample Output:

thing = COGROUP thing1 BY a, thing2 BY c;

reduce.CoGroupComplex

Groups multiple relations by compound (multi-field) keys, like a nested join. Grouped rows from each input relation are nested in a collection(bag) with the name of the relation.

  • Webapp status: true

Example:

{'operation': 'reduce.CoGroupComplex', 'arguments': {'alias': 'thing', 'relations': [{'relation': 'thing1', 'fields': ['a', 'b']}, {'relation': 'thing2', 'fields': ['d', 'e']}]}}

Arguments:

  • alias (aliasString)
  • relations (array):
    • minItems: 1
    • items (object):

Sample Output:

thing = COGROUP thing1 BY (a, b), thing2 BY (d, e);

reduce.Count

(no description found)

Arguments:

reduce.OddsRatioCellsVsAll

(no description found)

Arguments:

reduce.StreamingQuantile

Approximates quantile boundaries on an unordered relation.

  • Available in mapreduce engine only.

  • Webapp status: true

Arguments:

reduce.FilterCount

Performs conditional counting based on the specified cases. All counts must be performed on the same collection(bag), but cases are specified via an output name and a conditional to apply. Expressions should be written relative to the collection(bag).

  • Webapp status: true

Example:

{'operation': 'reduce.FilterCount', 'arguments': {'relation': 'A', 'alias': 'B', 'bag': 'b', 'cases': [{'field': 'myfield_abc', 'expression': {'operation': 'expression.Equality', 'arguments': [{'operation': 'expression.Field', 'arguments': ['d']}, {'operation': 'expression.Chararray', 'arguments': ['abc']}]}}, {'field': 'myfield_def', 'expression': {'operation': 'expression.Equality', 'arguments': [{'operation': 'expression.Field', 'arguments': ['d']}, {'operation': 'expression.Chararray', 'arguments': ['def']}]}}]}}

Arguments:

Sample Output:

B = FOREACH A {myfield_abc = FILTER b BY d == 'abc'; myfield_def = FILTER b BY d == 'def'; GENERATE *, COUNT(myfield_abc) AS myfield_abc:long, COUNT(myfield_def) AS myfield_def:long;}

reduce.FilterSum

Performs conditional summations based on the specified cases. All summations must be performed on the same field in the same bag, but cases are specified via an output name and a conditional to apply. Expressions should be written relative to the collection(bag).

  • Webapp status: true

Example:

{'operation': 'reduce.FilterSum', 'arguments': {'relation': 'A', 'alias': 'B', 'bag': 'b', 'field': 'c', 'cases': [{'field': 'myfield_abc', 'expression': {'operation': 'expression.Equality', 'arguments': [{'operation': 'expression.Field', 'arguments': ['d']}, {'operation': 'expression.Chararray', 'arguments': ['abc']}]}}, {'field': 'myfield_def', 'expression': {'operation': 'expression.Equality', 'arguments': [{'operation': 'expression.Field', 'arguments': ['d']}, {'operation': 'expression.Chararray', 'arguments': ['def']}]}}]}}

Arguments:

Sample Output:

B = FOREACH A {myfield_abc = FILTER b BY d == 'abc'; myfield_def = FILTER b BY d == 'def'; GENERATE *, SUM(myfield_abc.c) AS myfield_abc:long, SUM(myfield_def.c) AS myfield_def:long;}
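The conditional aggregation performed by reduce.FilterCount and reduce.FilterSum can be sketched in Python. Each case filters the same inner bag independently and appends one output field per case. Here Python callables stand in for the `expression.*` dicts (an assumption for readability), the function name is hypothetical, and Pig's null-handling rules for COUNT and SUM are not modeled.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def filter_aggregate(rows: List[Row], bag: str, cases: Dict[str, Predicate],
                     sum_field: Optional[str] = None) -> List[Row]:
    """Sketch: FilterCount when sum_field is None, otherwise FilterSum."""
    out: List[Row] = []
    for row in rows:
        new_row = dict(row)  # mirrors GENERATE *, keeping existing fields
        for name, predicate in cases.items():
            # Every case filters the same inner bag independently.
            matches = [item for item in row[bag] if predicate(item)]
            if sum_field is None:
                new_row[name] = len(matches)
            else:
                new_row[name] = sum(item[sum_field] for item in matches)
        out.append(new_row)
    return out


A = [{"id": 1, "b": [{"d": "abc", "c": 2}, {"d": "def", "c": 5}, {"d": "abc", "c": 3}]}]
cases = {"myfield_abc": lambda t: t["d"] == "abc", "myfield_def": lambda t: t["d"] == "def"}
counts = filter_aggregate(A, "b", cases)
sums = filter_aggregate(A, "b", cases, sum_field="c")
```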

reduce.Group

Groups rows in a relation by a common key.

  • Webapp status: true

Example:

{'operation': 'reduce.Group', 'arguments': {'alias': 'outthing', 'relation': 'thing', 'field': 'a'}}

Arguments:

Sample Output:

outthing = GROUP thing BY a;

reduce.GroupAll

Groups all rows in a relation into one row. This is useful before some types of aggregation.

  • Webapp status: true

Example:

{'operation': 'reduce.GroupAll', 'arguments': {'alias': 'outthing', 'relation': 'thing'}}

Arguments:

Sample Output:

outthing = GROUP thing ALL;

reduce.GroupComplex

Groups rows in a relation by multiple keys.

  • Webapp status: true

Example:

{'operation': 'reduce.GroupComplex', 'arguments': {'alias': 'outthing', 'relation': 'thing', 'fields': ['b', 'a']}}

Arguments:

Sample Output:

outthing = GROUP thing BY (b,a);

store

store.Dump

Dumps a relation to the logger.

Arguments:

store.Dsv

(no description found)

Arguments:

store.Store

Stores a delimited file.

Arguments:

store.Avro

(no description found)

Arguments:

store.Json

Stores a line-delimited JSON file using JsonStorage.

Arguments:

store.Atlas

Stores a file to an Atlas record using the record's location, format & schema information.

  • Webapp status: true

  • Supports execution statistics: true

Example:

operation = {'operation': 'store.Atlas', 'arguments': {'relation': 'thing', 'record': 'sampleRecord2'}}

Arguments:

Sample Output:

STORE thing INTO 'file.ldjson' USING JsonStorage();

transform

transform.AddCompoundKey

Adds a column containing the supplied fields concatenated with the delimiter.

  • Webapp status: true

Example:

{'operation': 'transform.AddCompoundKey', 'arguments': {'relation': 'A', 'alias': 'B', 'delimiter': 'C', 'name': 'field_name', {'operation': 'expression.Field', 'arguments': ['field1']}, {'operation': 'expression.Field', 'arguments': ['field2']}, {'operation': 'expression.Field', 'arguments': ['field3']},}}

Arguments:

Sample Output:

B = FOREACH A GENERATE CONCAT((chararray)field1,'C',(chararray)field2,'C',(chararray)field3) AS field_name:chararray, *;

transform.AddField

Adds a new field to a relation based on the supplied expression.

  • Webapp status: true

Example:

{'operation': 'transform.AddField', 'arguments': {'relation': 'A', 'alias': 'B', 'field': 'f2', 'expression': {'operation': 'expression.Addition', 'arguments': [{'operation': 'expression.Field', 'arguments': ['f1']}, {'operation': 'expression.Float', 'arguments': [1.23]}]}}}

Arguments:

Sample Output:

B = FOREACH A GENERATE *, f1 + 1.23F AS f2:float;

transform.AddFields

Adds multiple new fields to a relation based on the supplied expression.

  • Webapp status: true

Example:

{'operation': 'transform.AddFields', 'arguments': {'relation': 'A', 'alias': 'B', 'fields': [{'field': 'f2', 'expression': {'operation': 'expression.Addition', 'arguments': [{'operation': 'expression.Field', 'arguments': ['f1']}, {'operation': 'expression.Float', 'arguments': [1.23]}]}}, {'field': 'f3', 'expression': {'operation': 'expression.Addition', 'arguments': [{'operation': 'expression.Field', 'arguments': ['f1']}, {'operation': 'expression.Integer', 'arguments': [123]}]}}]}}

Arguments:

Sample Output:

B = FOREACH A GENERATE *, f1 + 1.23F AS f2:float, f1 + 123 AS f3:int;

transform.AddRowNumbers

Adds a column for unique row numbers to a relation.

  • Webapp status: true

Example:

{'operation': 'transform.AddRowNumbers', 'arguments': {'relation': 'A', 'alias': 'B', 'name': 'row_number'}}

Arguments:

Sample Output:

B = RANK A; B = FOREACH B GENERATE rank_A as row_number, $1 ..;

transform.Flatten

Flattens a collection(bag) or tuple into the higher tuple. For tuples, merges it with the parent fields. For collections(bags), produces a new row for each item and merges it with the parent fields. New fields are disambiguated with the original field's name.

  • Webapp status: true

Example:

{'operation': 'transform.Flatten', 'arguments': {'alias': 'thing', 'relation': 'thing1', 'field': 'b'}}

Arguments:

Sample Output:

thing = FOREACH thing1 GENERATE a, FLATTEN(b), c;
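The flattening behavior described above can be sketched in Python, with a bag represented as a list of dicts and a tuple as a single dict. The `field::name` form of the new field names follows Pig's disambiguation convention; the exact names this operator produces are an assumption. As with Pig's FLATTEN, an empty bag produces no output rows.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def flatten(rows: List[Row], field: str) -> List[Row]:
    """Sketch of transform.Flatten; a bag is a list of dicts, a tuple is a dict."""
    out: List[Row] = []
    for row in rows:
        value = row[field]
        # A tuple is treated as a bag of exactly one item.
        items = value if isinstance(value, list) else [value]
        for item in items:  # an empty bag produces no output rows
            new_row: Row = {}
            for name, v in row.items():
                if name == field:
                    # Inner fields are disambiguated with the original field's name.
                    new_row.update({f"{field}::{k}": iv for k, iv in item.items()})
                else:
                    new_row[name] = v
            out.append(new_row)
    return out


thing1 = [{"a": 1, "b": [{"x": 10}, {"x": 20}], "c": "k"}, {"a": 2, "b": [], "c": "m"}]
thing = flatten(thing1, "b")
# thing == [{"a": 1, "b::x": 10, "c": "k"}, {"a": 1, "b::x": 20, "c": "k"}]
```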

transform.Conflicts

Finds any rows that have the same values for the specified fields, but different values for other fields.

Arguments:

transform.ContingencyTable

Given a relation with a binned feature and a binned target, counts occurrences of all feature and target combinations to create a contingency table. Binned fields must have values 0 <= value < #bins. Produces a single row containing the cells of the table matrix. Cell addresses are encoded in field names as <feature>_<bin>__<target>_<bin>, and the fields are ordered positionally from the upper left to the lower right of the matrix.
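The field naming and cell ordering can be sketched in Python. This sketch assumes feature bins index the rows and target bins the columns of the matrix (the description does not say which), so dict insertion order walks the matrix row by row. The function name and the example column names `age` and `churn` are hypothetical.

```python
from typing import Dict, List


def contingency_table(rows: List[Dict[str, int]], feature: str, feature_bins: int,
                      target: str, target_bins: int) -> Dict[str, int]:
    """Sketch: count feature/target bin pairs into a single row of named cells."""
    # Start every cell at zero so empty combinations still appear, in
    # upper-left to lower-right order (feature bins as rows: an assumption).
    cells = {
        f"{feature}_{fb}__{target}_{tb}": 0
        for fb in range(feature_bins)
        for tb in range(target_bins)
    }
    for row in rows:
        fb, tb = row[feature], row[target]
        if not (0 <= fb < feature_bins and 0 <= tb < target_bins):
            raise ValueError(f"bin out of range: {row}")
        cells[f"{feature}_{fb}__{target}_{tb}"] += 1
    return cells


table = contingency_table(
    [{"age": 0, "churn": 1}, {"age": 1, "churn": 0}, {"age": 0, "churn": 1}],
    "age", 2, "churn", 2,
)
# table == {"age_0__churn_0": 0, "age_0__churn_1": 2, "age_1__churn_0": 1, "age_1__churn_1": 0}
```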

Arguments:

transform.Convert

Maps a field via an expression.

  • Webapp status: true

Example:

{'operation': 'transform.Convert', 'arguments': {'relation': 'A', 'alias': 'B', 'field': 'a', 'expression': {'operation': 'expression.Multiplication', 'arguments': [{'operation': 'expression.Field', 'arguments': ['a']}, {'operation': 'expression.Float', 'arguments': [3.14]}]}}}

Arguments:

Sample Output:

B = FOREACH A GENERATE a * 3.14F AS a:float, b, c;

transform.ConvertFields

Maps multiple fields via expressions.

  • Webapp status: true

Example:

{'operation': 'transform.ConvertFields', 'arguments': {'relation': 'A', 'alias': 'B', 'fields': [{'field': 'a', 'expression': {'operation': 'expression.Multiplication', 'arguments': [{'operation': 'expression.Field', 'arguments': ['a']}, {'operation': 'expression.Float', 'arguments': [3.14]}]}}, {'field': 'c', 'expression': {'operation': 'expression.Multiplication', 'arguments': [{'operation': 'expression.Field', 'arguments': ['c']}, {'operation': 'expression.Float', 'arguments': [12.3]}]}}]}}

Arguments:

Sample Output:

B = FOREACH A GENERATE a * 3.14F AS a:float, b, c * 12.3F AS c:float;

transform.ConvertFieldsToDatetime

Converts the specified fields from any data type to the DateTime data type.

  • Webapp status: true

Example:

{'operation': 'transform.ConvertFieldsToDatetime', 'arguments': {'relation': 'A', 'alias': 'B', 'fields': ['a', 'c']}}

Arguments:

Sample Output:

B = FOREACH A GENERATE ToDate(a) AS a:datetime, b, ToDate(c) AS c:datetime;

transform.ConvertFieldsToJethroTimestamp

Converts the specified fields from the DateTime data type to JethroTimestamp.

  • Webapp status: true

Arguments:

transform.Rearrange

Creates a new relation using only the specified fields.

  • Webapp status: true

Example:

{'operation': 'transform.Rearrange', 'arguments': {'relation': 'A', 'alias': 'B', 'fields': ['a','b']}}

Arguments:

Sample Output:

B = FOREACH A GENERATE a,b;

transform.Cut

Creates a new relation using only the specified fields.

  • Webapp status: true

Example:

{'operation': 'transform.Cut', 'arguments': {'relation': 'A', 'alias': 'B', 'fields': ['a']}}

Arguments:

Sample Output:

B = FOREACH A GENERATE a;

transform.Cutout

The opposite of field.SelectFields: removes the specified fields and keeps the rest.

  • Webapp status: true

Example:

{'operation': 'transform.Cutout', 'arguments': {'relation': 'A', 'alias': 'B', 'fields': ['b']}}

Arguments:

Sample Output:

B = FOREACH A GENERATE a,c;

transform.CutRename

Creates a new relation using only the specified fields and renames those fields.

  • Webapp status: true

Example:

{'operation': 'transform.CutRename', 'arguments': {'relation': 'A', 'alias': 'B', 'fields': [{'old': 'a', 'new': 'd'}, {'old': 'b', 'new': 'e'}]}}

Arguments:

  • alias (aliasString)
  • relation (restrictedString)
  • fields (array):
    • title: Fields
    • minItems: 1
    • items (object):
      • properties:
        • old:
          • au-restriction: input-field
          • $ref: #/definitions/restrictedString
          • title: Old
        • new (restrictedString)

Sample Output:

B = FOREACH A GENERATE a AS d,b AS e;

transform.NestedCut

(no description found)

  • Webapp status: true

Arguments:

transform.NestedCutout

Performs a RemoveField(s) operation on a nested field.

  • Webapp status: true

Example:

{'operation': 'transform.NestedCutout', 'arguments': {'relation': 'A', 'alias': 'B', 'field': 'b', 'fields': ['d', 'f']}}

Arguments:

Sample Output:

B = FOREACH A GENERATE a,b.(c,e) AS b,g;

transform.Distinct

Produces a relation with distinct rows.

  • Webapp status: true

Example:

{'operation': 'transform.Distinct', 'arguments': {'relation': 'A', 'alias': 'B'}}

Arguments:

Sample Output:

B = DISTINCT A;

transform.DistinctCountBag

Adds a column with a count of the distinct values for a field within the collection(bag).

  • Webapp status: true

Example:

{'operation': 'transform.DistinctCountBag', 'arguments': {'relation': 'A', 'alias': 'B', 'bag': 'mybag', 'bag_field': 'itemtocount', 'field': 'myfieldname'}}

Arguments:

Sample Output:

B = FOREACH A {count_itemtocount = DISTINCT mybag.itemtocount; generate *, COUNT(count_itemtocount) as myfieldname;}

transform.EnumerateBag

Adds a column to each item in the collection(bag) with its index.

  • Webapp status: true

Arguments:

transform.MapRow

Maps a row to a new set of fields via expressions, discarding any not specified.

  • Webapp status: true

Example:

{'operation': 'transform.MapRow', 'arguments': {'relation': 'A', 'alias': 'B', 'fields': [{'field': 'a', 'expression': {'operation': 'expression.Multiplication', 'arguments': [{'operation': 'expression.Field', 'arguments': ['a']}, {'operation': 'expression.Float', 'arguments': [3.14]}]}}]}}

Arguments:

Sample Output:

B = FOREACH A GENERATE a * 3.14F AS a:float;

transform.Melt

Converts the specified fields into key-value pairs where the key is the field name, then flattens the resulting collection(bag). This produces a new row for each field's key-value pair.

  • Webapp status: true

Example:

{'operation': 'transform.Melt', 'arguments': {'relation': 'A', 'alias': 'B', 'index': 'a', 'fields': ['a' ,'b', 'c']}}

Arguments:

Sample Output:

B = FOREACH A GENERATE a, {('b', b), ('c', c)} AS kvp; B = FOREACH B GENERATE a, FLATTEN(kvp); B = FOREACH B GENERATE a,b,c; B = RANK B; B = FOREACH A GENERATE a, {('b', b), ('c', c)} AS kvp; B = FOREACH B GENERATE a, FLATTEN(kvp);
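The reshaping performed by the first two statements of the sample output (build a bag of (name, value) pairs, then flatten it) can be sketched in Python. The output column names `key` and `value` are hypothetical. Skipping the index field is inferred from the example, where 'a' is both the index and listed in fields, yet only b and c are melted.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def melt(rows: List[Row], index: str, fields: Sequence[str],
         key_name: str = "key", value_name: str = "value") -> List[Row]:
    """Sketch of transform.Melt: one output row per (field name, value) pair."""
    out: List[Row] = []
    for row in rows:
        for field in fields:
            if field == index:
                continue  # the index column is carried along, not melted
            out.append({index: row[index], key_name: field, value_name: row[field]})
    return out


B = melt([{"a": 1, "b": "x", "c": "y"}], "a", ["a", "b", "c"])
# B == [{"a": 1, "key": "b", "value": "x"}, {"a": 1, "key": "c", "value": "y"}]
```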

transform.MeltInPlace

Converts the specified fields into key-value pairs where the key is the field name, then flattens the resulting bag. This produces a new row for each field's key-value pair.

Arguments:

transform.GetFirstInBag

Sorts an inner collection(bag) and flattens the top item.

  • Webapp status: true

  • Group: collection.Get In Collection

  • Group description: Sorts an inner collection(bag) and flattens the top (default) or last item.

Example:

{'operation': 'transform.GetFirstInBag', 'arguments': {'relation': 'A', 'alias': 'B', 'bag': 'mybag', 'sort': 'myfield', 'field': 'topitem'}}

Arguments:

Sample Output:

B = FOREACH A {topitem = ORDER mybag BY myfield ASC; topitem = LIMIT topitem 1; GENERATE *, FLATTEN(topitem);}

transform.GetLastInBag

Sorts an inner collection(bag) and flattens the last item.

  • Webapp status: true

  • Group: collection.Get In Collection

Example:

{'operation': 'transform.GetLastInBag', 'arguments': {'relation': 'A', 'alias': 'B', 'bag': 'mybag', 'sort': 'myfield', 'field': 'topitem'}}

Arguments:

Sample Output:

B = FOREACH A {topitem = ORDER mybag BY myfield DESC; topitem = LIMIT topitem 1; GENERATE *, FLATTEN(topitem);}
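Both operators in the collection.Get In Collection group follow the same ORDER, LIMIT 1, FLATTEN pattern, differing only in sort direction. The Python sketch below shows that. The function name and `last` flag are hypothetical, and the `topitem::` prefix follows Pig's naming for flattened fields. Ties are broken here by input order, whereas Pig's ORDER gives no such guarantee.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def get_in_bag(rows: List[Row], bag: str, sort: str, field: str,
               last: bool = False) -> List[Row]:
    """Sketch of transform.GetFirstInBag (last=False) / GetLastInBag (last=True)."""
    out: List[Row] = []
    for row in rows:
        # ORDER bag BY sort ASC (first) or DESC (last), then LIMIT 1.
        ordered = sorted(row[bag], key=lambda item: item[sort], reverse=last)
        # FLATTEN of an empty bag produces nothing, so such rows are dropped.
        for item in ordered[:1]:
            flattened = {f"{field}::{k}": v for k, v in item.items()}
            out.append({**row, **flattened})  # GENERATE *, FLATTEN(...)
    return out


A = [{"id": 1, "mybag": [{"myfield": 3, "v": "c"}, {"myfield": 1, "v": "a"}, {"myfield": 2, "v": "b"}]}]
first = get_in_bag(A, "mybag", "myfield", "topitem")
last = get_in_bag(A, "mybag", "myfield", "topitem", last=True)
```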

transform.Head

Produces a relation with the specified number of rows from the beginning of the input relation.

  • Webapp status: true

  • Group: row.RowSubset

  • Group description: Performs different operations that return a subset of rows in a relation. Head: returns the specified number of rows from the beginning. Tail: returns the specified number of rows from the end. Skip: returns the remaining rows after skipping the specified count from the top. Limit: returns a relation with the specified number of rows.

Example:

{'operation': 'transform.Head', 'arguments': {'relation': 'A', 'alias': 'B', 'count': 10}}

Arguments:

Sample Output:

B = RANK A; B = LIMIT B 10; B = ORDER B BY rank_A; B = FOREACH B GENERATE $1 ..;
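The row.RowSubset operations, plus transform.RowSlice, can be sketched over a Python list whose order stands in for Pig's 1-based `rank_A`. The function names are hypothetical. Note that in Pig, LIMIT without an ORDER may return any rows, while this sketch always takes list order as rank order.

```python
from typing import List, TypeVar

T = TypeVar("T")

# List position i stands in for Pig's 1-based rank_A = i + 1.


def head(rows: List[T], count: int) -> List[T]:
    return rows[:count]  # ranks 1..count


def tail(rows: List[T], count: int) -> List[T]:
    # Guard: rows[-0:] would return every row.
    return rows[-count:] if count > 0 else []


def skip(rows: List[T], count: int) -> List[T]:
    return rows[count:]  # drop ranks 1..count, keep the rest


def limit(rows: List[T], count: int) -> List[T]:
    # -1 returns all rows, per the note on transform.Limit.
    return list(rows) if count == -1 else rows[:count]


def row_slice(rows: List[T], start: int, end: int) -> List[T]:
    # Keeps start <= rank < end, matching the RowSlice sample output.
    return rows[max(start, 1) - 1:max(end, 1) - 1]


ranked = list(range(1, 21))  # each value equals its rank
```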

transform.Limit

Produces a relation with the specified number of rows.

  • Webapp status: true

  • Group: row.RowSubset

Arguments:

Note

To see all rows returned, use "-1" as the value.

transform.NestedProjection

(no description found)

Arguments:

transform.Projection

(no description found)

Arguments:

transform.Pack

(no description found)

Arguments:

transform.RenameRelation

Renames a dataset (relation) as specified. This effectively makes a copy, which is useful for self-joins, for instance.

  • Webapp status: true

Example:

{'operation': 'transform.RenameRelation', 'arguments': {'relation': 'A', 'alias': 'B'}}

Arguments:

Sample Output:

B = FOREACH A GENERATE *;

transform.Rename

Renames a set of fields as specified.

  • Webapp status: true

Example:

{'operation': 'transform.Rename', 'arguments': {'relation': 'A', 'alias': 'B', 'fields': [{'old': 'a', 'new': 'd'}, {'old': 'b', 'new': 'e'}]}}

Arguments:

  • alias (aliasString)
  • relation (restrictedString)
  • fields (array):
    • minItems: 1
    • title: Fields
    • items (object):
      • properties:
        • old:
          • au-restriction: input-field
          • $ref: #/definitions/restrictedString
          • title: Old
        • new (restrictedString)

Sample Output:

B = FOREACH A GENERATE a AS d, b AS e, c;

transform.PrependToFieldNames

Renames all fields by prepending the specified prefix.

  • Webapp status: true

Arguments:

transform.AppendToFieldNames

Renames all fields by appending the specified postfix.

  • Webapp status: true

Arguments:

transform.RemoveDisambiguation

Renames any fields that are disambiguated with the specified prefix by removing that prefix. Throws an error if this results in duplicate field names.

Arguments:

transform.RemoveDisambiguations

Renames any fields that are disambiguated with any of the specified prefixes by removing the prefix. Throws an error if this results in duplicate field names.

  • Webapp status: true

Example:

{'operation': 'transform.RemoveDisambiguations', 'arguments': {'relation': 'A', 'alias': 'B', 'prefixes': ['test', 'test2']}}

Arguments:

Sample Output:

B = FOREACH A GENERATE test::a AS a, b, test2::c AS c;
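The renaming rule, including the duplicate-name failure, can be sketched in Python on a list of schema field names. The function name is hypothetical, and the sketch assumes Pig's `prefix::field` form of disambiguated names. Matching the full `prefix::` keeps a prefix such as `test` from stripping `test2::c`.

```python
from typing import List, Sequence


def remove_disambiguations(fields: Sequence[str], prefixes: Sequence[str]) -> List[str]:
    """Sketch: strip 'prefix::' from field names, failing on duplicate results."""
    renamed: List[str] = []
    for name in fields:
        new_name = name
        for prefix in prefixes:
            # Match the full 'prefix::' so 'test' does not strip 'test2::c'.
            if name.startswith(prefix + "::"):
                new_name = name[len(prefix) + 2:]
                break
        if new_name in renamed:
            raise ValueError(f"duplicate field name after renaming: {new_name!r}")
        renamed.append(new_name)
    return renamed


names = remove_disambiguations(["test::a", "b", "test2::c"], ["test", "test2"])
# names == ["a", "b", "c"]
```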

transform.RowSlice

Creates a relation from a slice of rows in the input relation.

  • Webapp status: true

Example:

{'operation': 'transform.RowSlice', 'arguments': {'relation': 'A', 'alias': 'B', 'start': 1, 'end': 2}}

Arguments:

Sample Output:

B = RANK A; B = FILTER B BY rank_A >= 1 AND rank_A < 2; B = FOREACH B GENERATE $1 ..;

transform.Skip

Creates a relation containing the remaining rows after skipping the specified number of rows from the top.

  • Webapp status: true

  • Group: row.RowSubset

Example:

{'operation': 'transform.Skip', 'arguments': {'relation': 'A', 'alias': 'B', 'count': 10}}

Arguments:

Sample Output:

B = RANK A; B = FILTER B BY rank_A > 10; B = FOREACH B GENERATE $1 ..;

transform.Sort

Sorts a relation.

  • Webapp status: true

Example:

{'operation': 'transform.Sort', 'arguments': {'relation': 'A', 'alias': 'B', 'keys': [{'name': 'a', 'order': 'ASC'}, {'name': 'b', 'order': 'DESC'}]}}

Arguments:

  • alias (aliasString)
  • relation (restrictedString)
  • keys (array):
    • minItems: 1
    • title: Keys
    • items (object):
      • properties:
        • name (string):
          • title: Field
        • order (string):
          • title: Order (ASC/DESC)
          • enum:
            • ASC
            • DESC

Sample Output:

B = ORDER A BY a ASC, b DESC;
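A multi-key sort with mixed ASC/DESC directions can be sketched in Python using successive stable sorts, applied from the least significant key to the most significant. Python's `list.sort` stays stable even with `reverse=True`, which makes this correct. The function name is hypothetical, and null values are not handled (comparing None with other values raises in Python).

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def sort_rows(rows: List[Row], keys: Sequence[Dict[str, str]]) -> List[Row]:
    """Sketch of transform.Sort: keys like {'name': 'a', 'order': 'ASC'}."""
    result = list(rows)
    # Stable sorts applied from the last key to the first combine into one ordering.
    for key in reversed(keys):
        if key["order"] not in ("ASC", "DESC"):
            raise ValueError(f"order must be ASC or DESC, got {key['order']!r}")
        result.sort(key=lambda r, n=key["name"]: r[n], reverse=key["order"] == "DESC")
    return result


A = [{"a": 1, "b": 1}, {"a": 0, "b": 5}, {"a": 1, "b": 3}]
B = sort_rows(A, [{"name": "a", "order": "ASC"}, {"name": "b", "order": "DESC"}])
# B == [{"a": 0, "b": 5}, {"a": 1, "b": 3}, {"a": 1, "b": 1}]
```

For transform.SortBags, the same function could be applied to each row's inner bag.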

transform.SortBags

Sorts nested rows in a relation.

  • Webapp status: true

Example:

{'operation': 'transform.SortBags', 'arguments': {'relation': 'A', 'alias': 'B', 'bags': [{'field': 'mybag', 'keys': [{'name': 'a', 'order': 'ASC'}, {'name': 'b', 'order': 'DESC'}]}]}}

Arguments:

  • alias (aliasString)
  • relation (restrictedString)
  • bags (array):
    • minItems: 1
    • title: Collections
    • items (object):
      • properties:
        • field (restrictedString)
        • keys (array):
          • minItems: 1
          • title: Field
          • items (object):
            • properties:
              • name (string):
                • title: Collection Field
              • order (string):
                • title: Order (ASC/DESC)
                • enum:
                  • ASC
                  • DESC
            • required:
              • name
              • order
      • required:
        • field
        • keys

Sample Output:

B = FOREACH A {mybag = ORDER mybag BY a ASC, b DESC; GENERATE id, mybag, other;}
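The nested ORDER can be sketched in Python, assuming each row is a dict and each bag is a list of dicts. The names sort_rows and sort_bags are hypothetical; the sketch only shows the per-row sorting of bag contents.

```python
def sort_rows(rows, keys):
    """Multi-key stable sort; each key is {'name': ..., 'order': 'ASC' | 'DESC'}."""
    result = list(rows)
    for key in reversed(keys):
        result.sort(key=lambda row: row[key["name"]], reverse=key["order"] == "DESC")
    return result


def sort_bags(rows, bags):
    """Return copies of rows with each listed bag field sorted; other fields are unchanged."""
    out = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the input rows are not modified
        for bag in bags:
            new_row[bag["field"]] = sort_rows(row[bag["field"]], bag["keys"])
        out.append(new_row)
    return out


rows = [{"id": 1, "mybag": [{"a": 2, "b": 0}, {"a": 1, "b": 1}, {"a": 1, "b": 2}], "other": "x"}]
bags = [{"field": "mybag", "keys": [{"name": "a", "order": "ASC"}, {"name": "b", "order": "DESC"}]}]
print(sort_bags(rows, bags)[0]["mybag"])
# [{'a': 1, 'b': 2}, {'a': 1, 'b': 1}, {'a': 2, 'b': 0}]
```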

transform.Tail

Similar to head, but takes the specified count of rows from the end of a relation instead of the beginning, keeping them in their original order.

  • Webapp status: true

  • Group: row.RowSubset

Example:

{'operation': 'transform.Tail', 'arguments': {'relation': 'A', 'alias': 'B', 'count': 10}}

Arguments:

Sample Output:

B = RANK A; B = ORDER B BY rank_A DESC; B = LIMIT B 10; B = ORDER B BY rank_A ASC; B = FOREACH B GENERATE $1 ..;
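Each generated statement can be mapped to one step of a short Python sketch, assuming rows are plain values in input order. The function name tail is hypothetical.

```python
def tail(rows, count):
    """Mirror RANK, ORDER BY rank DESC, LIMIT count, ORDER BY rank ASC."""
    ranked = list(enumerate(rows, start=1))               # B = RANK A;
    ranked.sort(key=lambda pair: pair[0], reverse=True)   # ORDER B BY rank_A DESC;
    ranked = ranked[:count]                               # LIMIT B count;
    ranked.sort(key=lambda pair: pair[0])                 # ORDER B BY rank_A ASC;
    return [row for _, row in ranked]                     # FOREACH B GENERATE $1 ..;


print("".join(tail(list("abcdefghijkl"), 10)))  # cdefghijkl
```

The final ascending ORDER is what restores the original row order; for a positive count the result equals the Python slice rows[-count:].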

transform.Sample

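Produces a relation containing a random sample of rows drawn from throughout the input relation, where percentage is the approximate share of rows to keep. Because sampling is probabilistic, the exact number of output rows can vary between runs.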

  • Webapp status: true

Example:

{'operation': 'transform.Sample', 'arguments': {'relation': 'A', 'alias': 'B', 'percentage': 10}}

Arguments:

Sample Output:

B = SAMPLE A 10 * 0.01;
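The sample output converts the percentage into a probability (10 * 0.01). A Python sketch of that idea keeps each row independently with that probability, similar in spirit to Pig's SAMPLE. The function name sample and the seed parameter are hypothetical.

```python
import random


def sample(rows, percentage, seed=None):
    """Keep each row independently with probability percentage * 0.01."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    rng = random.Random(seed)
    probability = percentage * 0.01  # same conversion as '10 * 0.01' above
    return [row for row in rows if rng.random() < probability]


kept = sample(list(range(1000)), 10, seed=42)
print(len(kept))  # expected value is 1000 * 0.10 = 100; the actual count varies
```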

transform.Select

Filters rows based on an expression, keeping only the rows for which the expression evaluates to true.

  • Webapp status: true

Example:

{'operation': 'transform.Select', 'arguments': {'relation': 'A', 'alias': 'B', 'expression': {'operation': 'expression.And', 'arguments': [{'operation': 'expression.Equality', 'arguments': [{'operation': 'expression.Field', 'arguments': ['sex']}, {'operation': 'expression.Chararray', 'arguments': ['F']}]}, {'operation': 'expression.GreaterThan', 'arguments': [{'operation': 'expression.Field', 'arguments': ['age']}, {'operation': 'expression.Integer', 'arguments': [25]}]}]}}}

Arguments:

Sample Output:

B = FILTER A BY (sex == 'F') AND (age > 25);
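To show how the nested expression tree in the example maps to the FILTER condition, here is a small recursive evaluator in Python. It is a sketch that covers only the expression operations appearing in the example; the names evaluate and select are hypothetical, rows are assumed to be dicts, and nulls are not modeled.

```python
def evaluate(expr, row):
    """Recursively evaluate an expression tree against one row (a dict of field -> value)."""
    op, args = expr["operation"], expr["arguments"]
    if op == "expression.Field":
        return row[args[0]]
    if op in ("expression.Chararray", "expression.Integer"):
        return args[0]  # literals carry their value as the single argument
    if op == "expression.Equality":
        return evaluate(args[0], row) == evaluate(args[1], row)
    if op == "expression.GreaterThan":
        return evaluate(args[0], row) > evaluate(args[1], row)
    if op == "expression.And":
        return all(evaluate(arg, row) for arg in args)
    raise ValueError(f"operation not covered by this sketch: {op}")


def select(rows, expression):
    """Keep rows for which the expression is true, like Pig's FILTER."""
    return [row for row in rows if evaluate(expression, row)]


expression = {'operation': 'expression.And', 'arguments': [
    {'operation': 'expression.Equality', 'arguments': [
        {'operation': 'expression.Field', 'arguments': ['sex']},
        {'operation': 'expression.Chararray', 'arguments': ['F']}]},
    {'operation': 'expression.GreaterThan', 'arguments': [
        {'operation': 'expression.Field', 'arguments': ['age']},
        {'operation': 'expression.Integer', 'arguments': [25]}]}]}
rows = [{"sex": "F", "age": 30}, {"sex": "M", "age": 40}, {"sex": "F", "age": 20}]
print(select(rows, expression))  # [{'sex': 'F', 'age': 30}]
```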

transform.Union

Appends multiple relations, aligning fields by name in the order they are first observed. A field that appears in more than one relation must have the same type in each, and complex types must match exactly. Fields missing from a relation are filled with null.

  • Webapp status: true

  • Supports execution statistics: true

Example:

{'operation': 'transform.Union', 'arguments': {'alias': 'thing3', 'relations': [{'relation': 'thing1'}, {'relation': 'thing2'}]}}

Arguments:

  • alias (aliasString)
  • relations (array):
    • minItems: 2
    • items (object):
      • properties:
      • required:
        • relation
      • additionalProperties: false

Sample Output:

thing3 = UNION ONSCHEMA thing1, thing2;
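Schema alignment for UNION ONSCHEMA can be sketched in Python. The sketch assumes each relation is a (schema, rows) pair, where schema is a list of (name, type) pairs and rows are tuples. The name union_onschema is hypothetical, and types are compared as plain strings.

```python
def union_onschema(*relations):
    """Each relation is (schema, rows): schema is a list of (name, type), rows are tuples."""
    merged = {}  # field name -> type; dict order records first-observed order
    for schema, _ in relations:
        for name, ftype in schema:
            if merged.setdefault(name, ftype) != ftype:
                raise TypeError(f"field {name!r} is both {merged[name]} and {ftype}")
    out_rows = []
    for schema, rows in relations:
        names = [name for name, _ in schema]
        for row in rows:
            values = dict(zip(names, row))
            # Fields this relation lacks are filled with None (null in Pig).
            out_rows.append(tuple(values.get(name) for name in merged))
    return list(merged.items()), out_rows


thing1 = ([("a", "int"), ("b", "chararray")], [(1, "x")])
thing2 = ([("b", "chararray"), ("c", "double")], [("y", 2.5)])
schema, rows = union_onschema(thing1, thing2)
print(schema)  # [('a', 'int'), ('b', 'chararray'), ('c', 'double')]
print(rows)    # [(1, 'x', None), (None, 'y', 2.5)]
```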

Definitions

meta

  • meta (object):
    • properties:
      • id (integer)
      • alias_label (aliasString)
      • label (string)
      • description (string)
      • collect_execution_statistics (boolean)
      • strict_not_disambiguate (boolean)
      • partial_not_disambiguate (boolean)
      • disabled (boolean)
    • required:
      • id
    • additionalProperties: true

restrictedString

  • restrictedString (string):
    • pattern: ^[A-Za-z0-9_]+((::|\.)?[A-Za-z0-9_]+)*$

prependString

  • prependString (string):
    • pattern: ^[A-Za-z0-9]+((::)?[A-Za-z0-9_]*)*$

appendString

  • appendString (string):
    • pattern: ^(::)?[A-Za-z0-9_]+$

aliasString

  • aliasString (string):
    • pattern: ^[A-Za-z0-9_]+$

fileString

  • fileString (string):
    • pattern: ^([\w-]+)?(/[\w-]+)*(\.[a-zA-Z]+?)$

restrictedDelimiter

  • restrictedDelimiter (string):
    • enum:
      • ,
      • \t
      • :
      • |
      • " "
      • ~
      • ;
      • _

stepsField

schema.fieldRequired

  • schema.fieldRequired (object):

schema.simpleTypeBase

  • schema.simpleTypeBase (object):
    • properties:
      • type (string):
        • enum:
          • int
          • long
          • float
          • double
          • chararray
          • bytearray
          • boolean
          • datetime
          • biginteger
          • bigdecimal
          • ""
    • required:
      • type

schema.simpleTypeField

schema.tupleTypeBase

  • schema.tupleTypeBase (object):
    • properties:
      • type (string):
        • pattern: ^tuple$
      • fields (array):
    • required:
      • type
      • fields

schema.tupleTypeField

schema.bagTypeBase

  • schema.bagTypeBase (object):
    • properties:
    • required:
      • type
      • tuple
      • fields

schema.bagTypeField

schema.mapTypeBase

  • schema.mapTypeBase (object):

schema.mapTypeField

schema.anyType

expressions