Skip to main content

Make arrays into fully-fledged Collections: T[] is shorthand for Array

18 replies [Last post]
lucretius2
Offline
Joined: 2004-12-19

This idea is an RFE and apparently along similar lines to something Sun have considered:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4654312

THE PROBLEM
To summarize the reasons for this idea:
* Java currently has 2 systems of parameterized types, generics and native arrays. T[] is (almost) equivalent to a fixed-size array class which we might call Array.
* Of course, Java also has 2 systems of collections, Collection and native arrays T[].
* When you write a method which takes a collection as a parameter you have to decide whether to make it an arrays (for speed) or a Collection (for flexibility). You shouldn't have to make this choice.
* Similarly when you declare a field (or other variable) which is a collection, you have to decide whether to make it an array or a Collection. It is difficult to revoke this decision later. Whereas if you (say), declare a field as a List, it's very easy to switch it from being an ArrayList to a LinkedList etc.

A SOLUTION?
The obvious solution is to add a new class java.lang.Array, which is a native array (not a wrapper for a native array). Then the existing syntax for creating/accessing arrays using T[] is just syntax sugar for using new, get() and set() on an Array object. It also removes the somewhat mysterious 'special' status that arrays have in the language, including oddities like their length field.

Array's API would look like this:

public final class java.util.Array extends AbstractList implements Cloneable, Serializable, RandomAccess {
public final int length; // same as size()
public Array(int size); // same as new T[size]
public Array(Collection ); // might be useful; simpler than Collection.toArray(T[])
public Object clone();
public boolean contains(Object o); // from Collection interface
public T get(int index); // from List interface; same as this


public Iterator iterator(); // from Collection interface
public int size(); // from Collection interface; same as this.length
// etc. etc. (various other methods from
the Collection/List interfaces)
}

Array would implement the other useful methods
of List such as indexOf(Object), iterator() etc. Since arrays are fixed-size, some optional operations would throw UnsupportedOperationException: add, addAll,
clear, remove, removeAll, retainAll.

Unfortunately there is an incompatibility in the way generics and native arrays handle subtypes:

Cat[] cats = whatever;
Animal[] animals = cats; // legal

Array cats = whatever;
Array animals = cats; // illegal

I don't know enough about the issues to have an answer to this, but maybe improvements to generics' handling of arrays will provide a solution? For instance new T[] isn't currently allowed.

There's another more minor problem: native arrays already implement the equals() method, but it just does an '==' check. So Array.equals() would break the contract of Collection.equals().

A DIFFERENT SOLUTION?

This may be a bit radical, but there is some demand for true rectangular arrays in Java instead of arrays-of-arrays. This would apparently speed up array-intensive code which is currently hampered by the possibility of aliasing (i.e. currently in a 2D array n[][], the subarrays n[0] and n[1] can refer to the same array). If rectangular arrays were introduced as a completely new type of array with their own syntax and bytecodes, side by side with the existing ones, then they could obey the generics subtyping rules and the Collection.equals() contract. Existing arrays could be gradually deprecated.

New-style arrays could have distinctive syntax like this (I don't know how to make 1D arrays distinctive though):
Cat[,] cats = new Cat[10,5]; // new-style array syntax equivalent to new Array>
x = cats[0,0]; // equivalent to cats.get(0).get(0);
List animals = new LinkedList();
animals.addAll(cats); // this works because cats is a Collection
Animal[,] animals = cats; // error: illegal because Array> is incompatible with Array>

Reply viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
jwenting
Offline
Joined: 2003-12-02

> > You'd have to build very convoluted code to
> achieve
> > either, and for 1.7 we'd get people demanding a
> > normal for loop be put back in.
>
> I do not suggest removing them. I suggest not using
> them in your code if you want it to be robust.
>
> > [code]
> > int i = 0;
> > for (SomeObject o: c) {
> > if (i % 3 == 0)
> > {
> > o.setValue(calculate(i));
> > }
> > i += 2;
> > }
> > [/code]
>
> Looks too Fortranish to me. Obscures the underlying
> algorithm. Think collections, not loops. If you write
> a loop, you prescribe too many details. If you don't,
> the process can be easily parallelized.
>
It's what you'd get by mandating foreach loops...
Collections without some way to loop over them are useless.
And looping without a means to control the loop while looping is useless as well.

vpatryshev
Offline
Joined: 2004-06-30

> Collections without some way to loop over them are
> useless.
> And looping without a means to control the loop while
> looping is useless as well.

Say we compare matrices/arrays and operators/vectors. Do you think one needs to control the loop while working with operators and vectors? A loop is always a detail of implementation; it lacks polymorphism.

So, in case when we need to project an n-tuple to an m-tuple, omitting certain elements, we could involve a projector operator; it does not work on tuple components, but on whole tuples. And, generally speaking, it does not necessarily mean a gross loss of performance.

vpatryshev
Offline
Joined: 2004-06-30

> > Well, plain arrays are probably the thing from the
> > past. From Fortran or something.

> bullocks.
> Arrays have their use still and always will have.
> For one they're more efficient than using
> collections.

Why do you think they are more efficient? asList() is a view, is not it? There's virtually no performance loss.

On the other hand, if you want "performance", in its '80s sense, you may start using C.

> > Another feature worth eliminating is the old 'for
> > (int i = 0...' loop. If you do not have arrays,
> you
> > do not need to extract the size, and in most, if
> not
> > all, cases, it is enough to iterate over an
> iterable,
> > using 'foreach'. Performance? No impact on
> > performance.
>
> Wrong again.
> the for/each loop can not be used to skip elements
> for example, nor do you have access to the loop
> counter.

Skipping can be done in two ways: a) [code]continue[/code], b) by using filtering.

> You'd have to build very convoluted code to achieve
> either, and for 1.7 we'd get people demanding a
> normal for loop be put back in.

I do not suggest removing them. I suggest not using them in your code if you want it to be robust.

> [code]
> int i = 0;
> for (SomeObject o: c) {
> if (i % 3 == 0)
> {
> o.setValue(calculate(i));
> }
> i += 2;
> }
> [/code]

Looks too Fortranish to me. Obscures the underlying algorithm. Think collections, not loops. If you write a loop, you prescribe too many details. If you don't, the process can be easily parallelized.

> OK, it's contrived but that's for brevity and to get
> 2 objections into 1 example...

:) This is an example of programming, not of algorithm implementation. The same way you can argue for using labels and goto, showing a piece of badly structured code.

But probably this is an issue of taste preferences.

markf
Offline
Joined: 2005-01-20

A foreach loop doesn't allow you to replace elements of the collection. That makes them completely useless in many situations. A classical for-loop permits this easily. Do you really think that loops that require replacing elements are so unusual that classical for-loops should be deprecated?

vpatryshev
Offline
Joined: 2004-06-30

Right. I agree with you here. Say, to implement Eratosphenes sieve. Have to think about it.

tackline
Offline
Joined: 2003-06-19

Array use cases:

o Compatibility with legacy code and ...
o Statically typed substitute for List, particularly as parameters or return types. Only significant prior to Tiger, so obsolete.
o Implementing ArrayList. Who cares if that was native?
o Increasing irrelevant costs of extra layer of indirection. Most of which can be removed.
o Better syntax than List. Assuming no operator overloading, not even like String.
o "List" of primitives. With improved JVMs, notably using escape analysis, I believe an implementation of List could store primitives in a primitive array without much overhead.

List use cases:

o Sequence of things not covered above.

lucretius2
Offline
Joined: 2004-12-19

> If you want an array, use an array.
> If you want a List, use a List.
>
> There's 2 different things for different use cases,

Hold on a sec - what makes you sure these are 2 different things?
I can only see one difference between a List and an array - arrays are not resizable. That just means they don't implement some of the optional methods in the List interface. (And since these methods are optional, an array *is* a List.)

I think you, like many others, are being misled by the different syntaxes of arrays and Lists. Logically they are the same. a[n] is the same as a.get(n) and so on.

> why dumb things down for those who can't see the
> difference?

Why make things look different when they are in fact the same? and generics makes it even more apparent that arrays and Lists should be the same: a T[] is a List.

Some advantages of unifying arrays and Lists:
* they would be interchangable (just as LinkedList and ArrayList currently are), instead of just interconvertible

* refactoring: it's currently difficult to change an array in your code to a List, if (say) you subsequently decide you need to resize it

* simplification of the language; currently Java has 2 incompatible systems of generics (yes, arrays are a system of generics) and 2 incompatible systems of collections

* the possibility of using array-like syntax for Lists, as well as List-like syntax for arrays. a[n] is much nicer than a.get(n). This could be done without going for full operator overloading.

Message was edited by: lucretius2

markf
Offline
Joined: 2005-01-20

We can already do this -- Arrays.asList() wraps an array and presents it as a List. No conversion is performed. It is an ideal adapter from array-based methods to list-based methods.

This is taken verbatim from "Arrays.java":
[code]
/**
* @serial include
*/
private static class ArrayList extends AbstractList
implements RandomAccess, java.io.Serializable
{
private static final long serialVersionUID = -2764017481108945198L;
private Object[] a;

ArrayList(E[] array) {
if (array==null)
throw new NullPointerException();
a = array;
}

public int size() {
return a.length;
}

public Object[] toArray() {
return (Object[])a.clone();
}

public E get(int index) {
return (E)a[index];
}

public E set(int index, E element) {
Object oldValue = a[index];
a[index] = element;
return (E)oldValue;
}

public int indexOf(Object o) {
if (o==null) {
for (int i=0; i if (a[i]==null)
return i;
} else {
for (int i=0; i if (o.equals(a[i]))
return i;
}
return -1;
}

public boolean contains(Object o) {
return indexOf(o) != -1;
}
}
[/code]

Is this not exactly what you are asking for?

vpatryshev
Offline
Joined: 2004-06-30

Well, plain arrays are probably the thing from the past. From Fortran or something. I would suggest never to use them in real life, treat them as a legacy feature, and refactor them to lists whenever found. Unfortunately, a lot of Java classes use arrays with no specific reason - for instance, where an [code]Iterable[/code] would be enough, a [code]String[][/code] is created (I'm talking about String.split()).

Another feature worth eliminating is the old 'for (int i = 0...' loop. If you do not have arrays, you do not need to extract the size, and in most, if not all, cases, it is enough to iterate over an iterable, using 'foreach'. Performance? No impact on performance.

jwenting
Offline
Joined: 2003-12-02

> Well, plain arrays are probably the thing from the
> past. From Fortran or something. I would suggest
> never to use them in real life, treat them as a
> legacy feature, and refactor them to lists whenever
> found. Unfortunately, a lot of Java classes use

bullocks.
Arrays have their use still and always will have.
For one they're more efficient than using collections.

>
> Another feature worth eliminating is the old 'for
> (int i = 0...' loop. If you do not have arrays, you
> do not need to extract the size, and in most, if not
> all, cases, it is enough to iterate over an iterable,
> using 'foreach'. Performance? No impact on
> performance.

Wrong again.
the for/each loop can not be used to skip elements for example, nor do you have access to the loop counter.
Both are frequently required.
You'd have to build very convoluted code to achieve either, and for 1.7 we'd get people demanding a normal for loop be put back in.
[code]
int i = 0;
for (SomeObject o: c) {
if (i % 3 == 0)
{
o.setValue(calculate(i);
}
i += 2;
}
[/code]
OK, it's contrived but that's for brevity and to get 2 objections into 1 example...

markf
Offline
Joined: 2005-01-20

Agreed. People shouldn't be afraid of low-level data manipulation. It has a very clear place in any language that expects to achieve good performance, and is necessary to be able to express algorithms in a clean way.

It's like Linus Torvalds always says: "mechanism, not policy". Java should provide good mechanisms, not enforce policies. The enhanced for loop is a godsend, but compelling its use in all cases would be disastrous.

jwenting
Offline
Joined: 2003-12-02

If you want an array, use an array.
If you want a List, use a List.

There's 2 different things for different use cases, why dumb things down for those who can't see the difference?

markf
Offline
Joined: 2005-01-20

> If you want an array, use an array.
> If you want a List, use a List.
>
> There's 2 different things for different use cases,
> why dumb things down for those who can't see the
> difference?

I tend to agree. I think that arrays should stay a simple, low-level kind of thing. There's nothing wrong with primitive, non-object-oriented variables.

Besides, we already have [i]Arrays.asList(T[])[/i], which provides exactly the functionality you describe. It's no different than keeping the methods for operating on primitive numeric types in the [i]Math[/i] class.

As much as I love the Python language, I think their insistence on unifying primitives and objects is counter-productive. I don't want Sun to go down that route -- keep primitives primitive. If anything, Sun needs to beef up the Java primitives by adding true multidimensional arrays, [url="http://citeseer.ist.psu.edu/vanreeuwijk02adding.html"]tuples [/url], and maybe a complex type and a rational type.

pauldv
Offline
Joined: 2005-02-03

> [b]A SOLUTION?
> [/b]The obvious solution is to add a new class
> java.lang.Array, which is a native array (not a
> wrapper for a native array). Then the existing syntax
> for creating/accessing arrays using T[] is just
> syntax sugar for using new, get() and set() on an
> Array object. It also removes the somewhat
> mysterious 'special' status that arrays have in the
> language, including oddities like their length
> field.

What about having arrays implement the Iterable interface. In many cases the only thing needed is to iterate over them in a for loop. Arrays can do this, now we need a way to pass them to an agnostic method

lucretius2
Offline
Joined: 2004-12-19

To spell out some other benefits of this idea: the java.util.Arrays class, which contains static methods for sorting/searching/filling arrays would mostly not be needed. Similar methods are provided in the Collections class - except that arrays currently aren't Collections. This violates the principle of 'once and only once'.

You'll find duplication like this all over any Java program which has to handle both arrays and Collections.

BTW I'm only proposing that arrays of [i]references [/i]be made into Collections. Arrays of primitive types can't be, unless you autobox the primitives.

dgreenbean
Offline
Joined: 2003-12-08

+0.5 :-)

It would be really nice to have the Collections methods available to arrays, but I don't think trying to create a class that is synonymous with the array syntax is the right way to do it.

An array, for reasons pointed out in the original posting, is not a Collection. Arrays are already a special kind of object, but how special do they need to be? What would be the detriment to the following code:
[code]
String[][] myStrings = new String[][] {{"h","e"},{"l","l"},{"o"}};
for (String[] array : myStrings)
for (String str : array)
System.out.print(str);
[/code]
Intuitively, this is just an iteration over the array, which is all it is supposed to be. Granted it would be a better design to use the current Collections for this sort of thing. Even so, everyone needs a foreach loop for arrays once in a while, even when using the arrays correctly. Keeping track of i/j/k dummy variables is annoying and messy. This just gets rid of that.

In summary, I propose to have the current array objects implement the Collection interface without adding any additional library classes for the purpose. As far as static utility methods, that would be great, but I think they need to be considered in more detail separate from this concern.

lucretius2
Offline
Joined: 2004-12-19

> +0.5 :-)
>
> It would be really nice to have the Collections
> methods available to arrays, but I don't think trying
> to create a class that is synonymous with the array
> syntax is the right way to do it.
>
> An array, for reasons pointed out in the original
> posting, is not a Collection. Arrays are already a
> special kind of object, but how special do they need
> to be? What would be the detriment to the following
> code:
> [code]
> String[][] myStrings = new String[][]
> {{"h","e"},{"l","l"},{"o"}};
> for (String[] array : myStrings)
> for (String str : array)
> System.out.print(str);
> [/code]
> Intuitively, this is just an iteration over the
> array, which is all it is supposed to be. Granted it
> would be a better design to use the current
> Collections for this sort of thing. Even so,
> everyone needs a foreach loop for arrays once in a
> while, even when using the arrays correctly. Keeping
> track of i/j/k dummy variables is annoying and messy.
> This just gets rid of that.

> In summary, I propose to have the current array
> objects implement the Collection interface without
> adding any additional library classes for the
> purpose.

It doesn't make that much difference whether native array's implementation of Collection is documented (as Array ) or is merely implicit. Documenting it as Array would IMHO be a lot clearer, and I don't see what we would lose by it.

The main advantages of the proposal are fixing problems like these:

public void myMethod(Foo[] f) {
}

public void otherMethod() {
List bar;
...
myMethod(bar); // can't do it! have to waste time/memory converting to an array
}

private Foo[] f; // on what basis can you decide upfront to make this field an array not a List? better if we could declare this as List and decide later whether to assign an array to it

tackline
Offline
Joined: 2003-06-19

If I get this right, the suggestion is arrays (of non-primitives) become synonymous with a new class Array. Array implements List.

An example given is a List treated as an array.

The example does not follow from the suggestion. In the new world array is a List, but a List is not an array. So you can treat an array as a list, but not vice versa.

The obvious correction to that is to make an array reference (mostly) synonymous with a List. new[] creates somethings concrete. The array notation follows array casting rules, and the List notation List rules. Similarly for overloading. Byte code is appropriately jiggled around. Kind of nightmare implications, I feel.

Why can't we all live happily in 5.0+-world using List<>s in place of arrays? If we do have an array hanging around we want to treat as a List/Collection, then Arrays.asList is our friend (the single, small allocation overhead is absolutely tiny, and you can always cache it if you are mad).

Incidentally, does anyone know why Arrays.ArrayList.a isn't final (as of 1.5.0-b64)?